[Xenomai] POSIX application running under xenomai -- what do wrapped functions do?

Philippe Gerum rpm at xenomai.org
Fri Jun 20 19:38:51 CEST 2014


On 06/20/2014 05:48 PM, Steve M. Robbins wrote:
> Hi,
>
> I've looked through the FAQ and read all the "Start Here" documentation on the
> wiki.  At the risk of sounding dense, I confess I am still a bit unsure what
> the wrapped POSIX functions are doing.  This is going to be a long-winded
> post, but here are the first questions:
>
> Q1: Generically, what are the wrapped functions doing?  I did look at the code
> for wrapped select().  I can see it calls some xenomai function and falls back
> to the regular select(), but well, let's just say I'm no wiser...

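Wrapping is only meant to keep the regular POSIX names for calling 
POSIX-compliant Xenomai services, instead of resorting to a parallel 
naming scheme for an API which is deemed portable and standardized (e.g. 
pthread_create_whatever_nonsense() instead of pthread_create()). 
Mechanically, the --wrap=<symbol> options passed to the linker redirect 
each call to, say, select() to Xenomai's __wrap_select(), which can still 
reach the glibc implementation as __real_select().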

When a wrapper ends up calling the regular glibc service, it means that 
Xenomai won't process the call, so the wrapper simply hands it over to 
the regular kernel via glibc, hoping for the best. In the select() case, 
this typically happens when Xenomai discovers that one or more file 
descriptors found in the sets are not RTDM ones (i.e. they do not belong 
to the Xenomai realm).
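
To make the fallback above concrete, here is a minimal sketch of the 
dispatch a wrapped select() performs. It is not Xenomai's code: the real 
wrapper is __wrap_select(), issues an actual Xenomai system call, and 
reaches glibc through __real_select(). Here hypothetical_rt_select() 
stands in for that system call, and HYPOTHETICAL_RTDM_FD_START is an 
invented boundary used only to tell RTDM descriptors apart from regular 
ones.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/select.h>

/* HYPOTHETICAL: pretend descriptors at or above this number belong to
 * RTDM. It stays below FD_SETSIZE so such fds fit in an fd_set. */
#define HYPOTHETICAL_RTDM_FD_START 900

enum route { ROUTE_XENOMAI, ROUTE_GLIBC };
static enum route last_route; /* which stack served the last call */

static int fd_in_any_set(int fd, fd_set *r, fd_set *w, fd_set *e)
{
    return (r && FD_ISSET(fd, r)) || (w && FD_ISSET(fd, w)) ||
           (e && FD_ISSET(fd, e));
}

/* HYPOTHETICAL stand-in for the Xenomai system call: refuses with -EBADF
 * as soon as a watched descriptor is not an RTDM one; otherwise it only
 * reports how many descriptors it would wait on (a real implementation
 * would block in primary mode). */
static int hypothetical_rt_select(int nfds, fd_set *r, fd_set *w, fd_set *e)
{
    int watched = 0;

    for (int fd = 0; fd < nfds; fd++) {
        if (!fd_in_any_set(fd, r, w, e))
            continue;
        if (fd < HYPOTHETICAL_RTDM_FD_START)
            return -EBADF;
        watched++;
    }
    return watched;
}

/* Sketch of the wrapper: try the Xenomai stack first, and hand the call
 * over to glibc (i.e. the Linux kernel) when Xenomai declines it. */
int sketch_wrapped_select(int nfds, fd_set *r, fd_set *w, fd_set *e,
                          struct timeval *timeout)
{
    int ret;

    assert(nfds >= 0 && nfds <= FD_SETSIZE);
    ret = hypothetical_rt_select(nfds, r, w, e);
    if (ret != -EBADF) {
        last_route = ROUTE_XENOMAI;
        return ret;
    }
    last_route = ROUTE_GLIBC;
    return select(nfds, r, w, e, timeout);
}
```

With the wrap flags in place, application code never names such a 
function: plain select() calls land in the wrapper. The last_route 
variable only exists here to show which kernel ended up serving the call.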

The key issue is that in such a system, with two kernels running 
side by side, there must be two service call stacks. One stack ends up 
calling into the real-time extension, the other into the host Linux 
(non-rt) kernel. Sometimes the Xenomai wrappers do some bridging between 
them to make things more transparent to the user (well, that was the 
plan), as happens with the select() call.

>
> Q2: The POSIX skin wraps I/O like read() and write().  Is it supported to mix
> wrapped and unwrapped calls?  I have inherited a code base where some I/O with
> FIFOs and files uses wrapped calls but socket code uses unwrapped code.

It is supported, but this also means that your code may switch back and 
forth between real-time and non-real-time modes, which we call "primary" 
and "secondary" modes respectively. Passing an RTDM fd to an unwrapped 
call will never work though, and should yield EBADF.

A thread in primary mode is switched to secondary mode whenever it 
issues a Linux system call. Conversely, a (Xenomai) thread in secondary 
mode switches back to primary mode when it calls a Xenomai service 
which requires it. Switching modes:

1) is time-consuming, and causes overhead when done very frequently 
(e.g. within some tight processing loop),

2) defeats the purpose of using a real-time extension as soon as the 
code moves to secondary mode.

That said, invoking regular kernel services from real-time threads in 
well-defined and delimited sections may be perfectly fine. We definitely 
need this for carrying out initialization/cleanup chores and such.
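
A common way to honor both points is to keep Linux system calls out of 
the tight real-time loop altogether, and let the loop hand its data to a 
non-rt thread through shared memory. The sketch below is a generic 
single-producer/single-consumer ring buffer built on C11 atomics, not a 
Xenomai API; struct log_ring, RING_SLOTS and the int payload are 
illustrative choices. Pushing involves no system call, so it should not 
by itself trigger a mode switch, while the consumer remains free to call 
printf() or write to a socket.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define RING_SLOTS 64
static_assert((RING_SLOTS & (RING_SLOTS - 1)) == 0,
              "RING_SLOTS must be a power of two");

struct log_ring {
    int slots[RING_SLOTS];
    atomic_size_t head; /* advanced by the producer (rt loop) only */
    atomic_size_t tail; /* advanced by the consumer (non-rt thread) only */
};

void ring_init(struct log_ring *r)
{
    atomic_init(&r->head, 0);
    atomic_init(&r->tail, 0);
}

/* rt side: no system call, no lock, never blocks. Returns false (and
 * drops the value) when the ring is full. */
bool ring_push(struct log_ring *r, int value)
{
    size_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
    size_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);

    if (head - tail == RING_SLOTS)
        return false;
    r->slots[head & (RING_SLOTS - 1)] = value;
    atomic_store_explicit(&r->head, head + 1, memory_order_release);
    return true;
}

/* non-rt side: returns false when empty; the caller may then sleep,
 * log, or send data over a socket without affecting the rt loop. */
bool ring_pop(struct log_ring *r, int *value)
{
    size_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    size_t head = atomic_load_explicit(&r->head, memory_order_acquire);

    if (head == tail)
        return false;
    *value = r->slots[tail & (RING_SLOTS - 1)];
    atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
    return true;
}
```

Dropping values when the ring is full, rather than waiting, keeps the rt 
side from ever depending on the non-rt side. The consumer has to poll 
here; if the two sides live in different processes, the structure would 
have to sit in the shared memory segment, which only works if 
atomic_size_t is lock-free on the target (atomic_is_lock_free() tells).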

>
> Q3: The POSIX skin wraps select() but not pselect().  The two functions are
> basically identical in function if you don't use signal masking (as we don't).
> So we have different behaviour if I choose pselect() over select(). Since
> Xenomai needs to wrap one, is there a chance that using the unwrapped
> alternative may confuse Xenomai?  Our code uses pselect() today.
>
>

Then your code invokes the regular pselect() service from the regular 
kernel, since Xenomai does not wrap it. Xenomai won't be confused; the 
operation happens entirely within the "other" kernel. This also means 
that the file descriptors must be regular ones, not ones obtained from 
the wrapped (i.e. RTDM-provided) open() call.
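
For instance, the helper below waits on a descriptor with the unwrapped 
pselect(), so the whole operation is served by the Linux kernel. With a 
NULL signal mask it behaves like select(), except that it takes a struct 
timespec. linux_wait_readable() and the two demo functions are 
hypothetical names for this sketch. A descriptor the Linux kernel does 
not know about (here, one that has just been closed) is rejected with 
EBADF.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/select.h>
#include <time.h>
#include <unistd.h>

/* Waits up to timeout_ms for fd to become readable, using the unwrapped
 * pselect(): the call is served by the Linux kernel only. A NULL signal
 * mask makes it equivalent to select() with a timespec timeout.
 * Returns 1 if readable, 0 on timeout, -errno on failure. */
int linux_wait_readable(int fd, long timeout_ms)
{
    fd_set rfds;
    struct timespec ts;
    int ret;

    assert(fd >= 0 && fd < FD_SETSIZE);
    ts.tv_sec = timeout_ms / 1000;
    ts.tv_nsec = (timeout_ms % 1000) * 1000000L;
    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);
    ret = pselect(fd + 1, &rfds, NULL, NULL, &ts, NULL);
    return ret < 0 ? -errno : ret;
}

/* Usage: a pipe is a regular Linux descriptor, so pselect() can watch
 * it once data is pending. Returns 1 on success. */
int demo_pipe_readable(void)
{
    int p[2], ret;

    if (pipe(p) != 0)
        return -errno;
    ret = write(p[1], "x", 1) == 1 ? linux_wait_readable(p[0], 10) : -EIO;
    close(p[0]);
    close(p[1]);
    return ret;
}

/* Usage: a descriptor Linux does not know about (here, one that was just
 * closed) is rejected, so this returns -EBADF. */
int demo_unknown_fd(void)
{
    int p[2];

    if (pipe(p) != 0)
        return -errno;
    close(p[0]);
    close(p[1]);
    return linux_wait_readable(p[0], 0);
}
```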

>
> So I'm working on a motion control project using a Delta Tau system which
> consists of a PowerPC running Linux with Xenomai 2.5.6.  Delta Tau has
> arranged things so that you can just write servo loop code in their IDE and
> the build process takes care of the details.  They also provide a way to write
> a "background" linux program, which we use as a communications bridge to a
> second user interface system, sending commands and data over a socket.  The
> bridge program is mainly doing logging and socket I/O. We use some shared
> memory to send commands down to a real-time task (called RTI) and servo
> routine tasks (all written in C) and read back status. Additionally, we have a
> pair of FIFOs sending data streams from RTI to the bridge process.
>
> Until a few days ago, I considered the bridge program as a regular POSIX C
> program, but digging into the build system I discovered that it links with the
> xenomai posix skin libraries, with all the --wrap commands passed to the
> linker.  Furthermore the threads of this program appear in /proc/xenomai/sched
> (with PRI=1) and /proc/xenomai/stat shows that the threads are performing a
> huge number of mode switches.
>

Yuck. Then the frenetic mode switching I described earlier may well 
apply. This code may have a basic issue with properly splitting the 
real-time and non-real-time activities.
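
To find out which calls cause those switches, Xenomai 2.x can notify a 
thread when it drops to secondary mode. The sketch below assumes the 2.x 
POSIX skin's pthread_set_mode_np(clrmask, setmask) with the 
PTHREAD_WARNSW bit, which makes Xenomai send SIGXCPU to the calling 
thread on such a switch; please check the headers and documentation of 
your 2.5.6 tree before relying on it. The handler prints a backtrace of 
the offending call site using glibc's backtrace_symbols_fd(). When 
PTHREAD_WARNSW is not defined (e.g. a plain Linux build), only the 
handler is installed. trace_mode_switches() is a hypothetical name.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <execinfo.h>
#include <pthread.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

static volatile sig_atomic_t mode_switch_count;

/* SIGXCPU handler: counts the event and dumps the call chain that caused
 * the switch to stderr. */
static void on_mode_switch(int sig)
{
    void *frames[32];
    int depth;

    assert(sig == SIGXCPU);
    (void)sig;
    mode_switch_count++;
    depth = backtrace(frames, 32);
    backtrace_symbols_fd(frames, depth, STDERR_FILENO);
}

/* Installs the handler, then asks for notifications when built against
 * Xenomai headers defining PTHREAD_WARNSW. Returns 0 on success. */
int trace_mode_switches(void)
{
    struct sigaction sa;
    void *warmup[1];

    /* Load libgcc's unwinder now rather than inside the handler. */
    (void)backtrace(warmup, 1);

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_mode_switch;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGXCPU, &sa, NULL) != 0)
        return -1;
#ifdef PTHREAD_WARNSW
    /* ASSUMPTION: Xenomai 2.x POSIX skin; sends SIGXCPU to this thread
     * whenever it switches to secondary mode. */
    return pthread_set_mode_np(0, PTHREAD_WARNSW);
#else
    return 0; /* not built against Xenomai: handler only */
#endif
}
```

The mode bits are per thread, so this would be called early from each 
thread you want to watch. backtrace() is not formally 
async-signal-safe, which is usually acceptable for a debugging aid; the 
warm-up call keeps the dynamic loading of libgcc out of the handler.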

> Our bridge suffers from random lockups. During one lockup, with the help of
> /proc/PID/status and /proc/PID/wchan, I was able to determine that the process
> was stuck in the system call "xnshadow_harden".  It stayed "stuck" for 30+
> hours until I rebooted the machine.

We have fixed quite a few bugs (read: a truckload) since 2.5.6, 
including a few in the mode transition paths and elsewhere. So I would 
not be surprised if some of them still bite here. Any chance you could 
upgrade your board to 2.6.3? API-wise, I'm confident you would have no 
significant issue, if any at all. Besides, you would not have to change 
your kernel release, although it's likely quite old as well, I guess.

>
> If interested, I posted some more details here:
> * http://forums.deltatau.com/showthread.php?tid=1654
> * http://forums.deltatau.com/showthread.php?tid=743
>
>
> Q4: Generically, what causes a process to get stuck in xnshadow_harden()?  How
> would I troubleshoot further?
>

As mentioned earlier, a Xenomai bug, and those have been nasty to chase 
down. xnshadow_harden() is part of the mode switch machinery: it handles 
the switch from secondary to primary mode.

> Q5: We do not call pthread_setschedparam() to change the scheduler or priority
> of the bridge program's threads, yet they appear as PRI=1 in
> /proc/xenomai/sched ... any ideas?  (Note that we do invoke an initialization
> function provided by Delta Tau which may be doing something under the covers).
>

If by threads you mean the main() thread, then it is possible that the 
library constructor of libpthread_rt switches that thread, originally 
SCHED_FIFO,1, to its Xenomai equivalent, which is then visible under 
/proc/xenomai/sched. But that would mean that your program inherits 
SCHED_FIFO,1 from its parent, not SCHED_OTHER.
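
To check that hypothesis, the program can report the scheduling policy 
it was started with, using plain Linux calls. The sketch below is 
illustrative (describe_inherited_policy() is a hypothetical name); 
calling it early in main() and logging the result should show 
SCHED_FIFO,1 if the inheritance explanation holds.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sched.h>
#include <stdio.h>

static const char *policy_name(int policy)
{
    switch (policy) {
    case SCHED_OTHER: return "SCHED_OTHER";
    case SCHED_FIFO:  return "SCHED_FIFO";
    case SCHED_RR:    return "SCHED_RR";
    default:          return "other";
    }
}

/* Writes "POLICY,PRIORITY" for the calling process into buf (e.g.
 * "SCHED_FIFO,1" if that is what it inherited) and returns the policy,
 * or -1 on error. */
int describe_inherited_policy(char *buf, size_t len)
{
    struct sched_param sp;
    int policy;

    assert(buf != NULL && len > 0);
    policy = sched_getscheduler(0);
    if (policy < 0 || sched_getparam(0, &sp) != 0)
        return -1;
    snprintf(buf, len, "%s,%d", policy_name(policy), sp.sched_priority);
    return policy;
}
```

From a shell, `chrt -p <pid>` run against the parent process gives the 
same information without touching the code.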

> Q6: I realize I haven't given terribly many details, but generically, what
> would cause a non-real time "background" process to switch to the primary
> domain, as ours seems to do?

Calling into any Xenomai service which requires it. There are two 
classes of services that do:

- those which might block/suspend the caller (e.g. waiting on a 
semaphore, reading from/selecting on RTDM-originated file descriptors, 
etc.)

- those which do some kind of introspection of the calling context, or 
would affect some properties of the current thread.

However, a thread may only switch to primary mode if it is a Xenomai 
thread in the first place. A Xenomai thread in user space is a regular 
POSIX thread on steroids, which received Xenomai capabilities because 
one of these events happened:

- it was started by the wrapped pthread_create() call
- the wrapped pthread_setschedparam() call was invoked for it.

There is a subtlety to keep in mind at this point: a Xenomai thread is 
not necessarily real-time capable. Xenomai threads created in or moved 
to the SCHED_OTHER class are able to call Xenomai services (e.g. wait 
for, use or signal Xenomai resources), but won't compete for the CPU 
with Xenomai threads which belong to SCHED_FIFO/RR.
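
As an illustration of the first event, the plain POSIX helper below 
creates a thread with explicit scheduling attributes. Linked with 
Xenomai's wrap flags, the pthread_create() call resolves to the wrapped 
service, so the new thread becomes a Xenomai thread, and the policy 
passed in decides whether it competes as a real-time thread 
(SCHED_FIFO/RR) or merely gains Xenomai capabilities (SCHED_OTHER). 
start_thread_with_policy() and example_body() are hypothetical names. 
Note that glibc defaults to PTHREAD_INHERIT_SCHED, so without 
PTHREAD_EXPLICIT_SCHED the requested policy is silently ignored and the 
creator's is inherited.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <sched.h>

/* Hypothetical thread body: returns its argument as a placeholder. */
void *example_body(void *arg)
{
    return arg;
}

/* Starts fn(arg) with the given policy and priority. Returns 0 or the
 * error number from the failing pthread call. */
int start_thread_with_policy(pthread_t *tid, int policy, int prio,
                             void *(*fn)(void *), void *arg)
{
    pthread_attr_t attr;
    struct sched_param sp = { .sched_priority = prio };
    int err;

    assert(tid != NULL && fn != NULL);
    err = pthread_attr_init(&attr);
    if (err != 0)
        return err;
    /* Without this, the policy/priority below would be ignored and the
     * creator's settings inherited instead. */
    err = pthread_attr_setinheritsched(&attr, PTHREAD_EXPLICIT_SCHED);
    if (err == 0)
        err = pthread_attr_setschedpolicy(&attr, policy);
    if (err == 0)
        err = pthread_attr_setschedparam(&attr, &sp);
    if (err == 0)
        err = pthread_create(tid, &attr, fn, arg);
    pthread_attr_destroy(&attr);
    return err;
}
```

On a plain Linux system, requesting SCHED_FIFO or SCHED_RR without 
sufficient privileges makes pthread_create() fail with EPERM.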

This is aimed at allowing non-rt threads to synchronize with rt threads 
using common IPCs (semaphores, mutexes, etc.), without having to resort 
to exchanging messages, provided both are Xenomai threads.
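
A minimal sketch of such synchronization, using a POSIX semaphore: the 
code is plain POSIX, and the assumption is that, when linked with the 
POSIX skin's wrap flags, sem_init(), sem_post() and sem_wait() resolve 
to Xenomai services. For that to hold, the waiting thread must itself be 
a Xenomai thread, e.g. one created in SCHED_OTHER through the wrapped 
pthread_create(). struct handoff and hypothetical_rt_producer() are 
illustrative names, and the value 42 is a placeholder for real data.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>
#include <stddef.h>

struct handoff {
    sem_t ready; /* posted by the producer once a sample is available */
    int sample;
};

int handoff_init(struct handoff *h)
{
    h->sample = 0;
    return sem_init(&h->ready, 0, 0); /* process-private, initially empty */
}

/* HYPOTHETICAL rt thread body: publishes one sample, then posts. */
void *hypothetical_rt_producer(void *arg)
{
    struct handoff *h = arg;

    h->sample = 42; /* placeholder for real data */
    sem_post(&h->ready);
    return NULL;
}

/* Consumer side: blocks until the producer posts, retrying if a signal
 * interrupts the wait. Returns 0 and stores the sample, or -errno. */
int handoff_wait(struct handoff *h, int *sample)
{
    int ret;

    assert(h != NULL && sample != NULL);
    do
        ret = sem_wait(&h->ready);
    while (ret != 0 && errno == EINTR);
    if (ret != 0)
        return -errno;
    *sample = h->sample;
    return 0;
}
```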

So, to sum up, a thread that is able to call Xenomai(-only) services 
must have been given at least Xenomai capabilities as described above 
(otherwise it would receive EPERM).

Unlike SCHED_FIFO/RR Xenomai threads, those special SCHED_OTHER Xenomai 
threads switch back to secondary mode automatically when leaving a 
Xenomai system call if they had to switch to primary mode for that 
purpose (unless they hold a real-time mutex, though).

>
> Q7: In addition to the inconsistent wrapping in bridge, the real-time task RTI
> does not wrap any of its calls, e.g. we use write() on the FIFO.  Is this
> going to cause trouble?
>

This is going to switch any real-time Xenomai caller to secondary 
(non-rt) mode for sure.

In case this applies, if you want to exchange a stream of data between 
the rt and non-rt worlds, you may want to have a look at the RTDM-based 
XDDP IPC, 
http://www.xenomai.org/documentation/xenomai-2.6/html/api/group__rtipc.html, 
with sample code in examples/rtdm/profiles/ipc.
This feature allows a non-rt thread reading from/writing to a /dev/rtp* 
pseudo-device to exchange messages with an rt thread reading from/writing 
to an RTDM socket. The rt side never leaves primary mode when doing so, 
and the non-rt program does not even have to link against the Xenomai 
libraries.
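
On the non-rt side, using XDDP boils down to ordinary file I/O. The 
sketch below assumes that XDDP port N maps to /dev/rtpN, as in the 
Xenomai 2.x XDDP examples, and that the rt side has bound its socket 
(AF_RTIPC with IPCPROTO_XDDP in those examples) to that port; the rt 
side itself is not shown. xddp_device_path(), open_xddp_peer() and 
read_from_rt() are hypothetical helper names.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* ASSUMPTION: XDDP port N is exposed to Linux as /dev/rtpN. Returns the
 * length snprintf() wanted to write. */
int xddp_device_path(char *buf, size_t len, int port)
{
    assert(port >= 0);
    return snprintf(buf, len, "/dev/rtp%d", port);
}

/* Opens the non-rt end of the channel: a plain open(), no Xenomai
 * library involved. Returns a file descriptor, or -1 with errno set. */
int open_xddp_peer(int port)
{
    char path[32];

    if (xddp_device_path(path, sizeof(path), port) >= (int)sizeof(path)) {
        errno = ENAMETOOLONG;
        return -1;
    }
    return open(path, O_RDWR);
}

/* Reads what the rt side sent, retrying if interrupted by a signal. */
ssize_t read_from_rt(int fd, void *buf, size_t len)
{
    ssize_t n;

    do
        n = read(fd, buf, len);
    while (n < 0 && errno == EINTR);
    return n;
}
```

Since this side only uses open() and read(), the bridge program could 
consume the RTI data stream this way without being a Xenomai thread at 
all.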

> Thanks for reading this far.  If you can provide clues for even one of my
> questions, I'd be very very grateful.
>

Hopefully I won't have increased the headache.

HTH,

-- 
Philippe.



