[Xenomai] EINTR in notifier.c (mercury)

Philippe Gerum rpm at xenomai.org
Fri Apr 11 19:02:04 CEST 2014


On 04/11/2014 06:44 PM, Matthias Schneider wrote:
>
>
>
>
> ----- Original Message -----
>> From: Philippe Gerum <rpm at xenomai.org>
>> To: Matthias Schneider <ma30002000 at yahoo.de>; "xenomai at xenomai.org" <xenomai at xenomai.org>
>> Cc:
>> Sent: Friday, April 11, 2014 5:07 PM
>> Subject: Re: [Xenomai] EINTR in notifier.c (mercury)
>>
>> On 04/11/2014 04:59 PM, Philippe Gerum wrote:
>>>   On 04/07/2014 06:49 PM, Matthias Schneider wrote:
>>>>   ----- Original Message -----
>>>>
>>>>>   From: Philippe Gerum <rpm at xenomai.org>
>>>>>   To: Matthias Schneider <ma30002000 at yahoo.de>;
>> "xenomai at xenomai.org"
>>>>>   <xenomai at xenomai.org>
>>>>>   Cc:
>>>>>   Sent: Sunday, April 6, 2014 5:04 PM
>>>>>   Subject: Re: [Xenomai] EINTR in notifier.c (mercury)
>>>>>
>>>>>   On 04/06/2014 01:11 PM, Matthias Schneider wrote:
>>>>>>>     On Monday, March 31, 2014 7:50 PM, Gilles Chanteperdrix
>>>>>   <gilles.chanteperdrix at xenomai.org> wrote:
>>>>>>
>>>>>>>>     On 03/31/2014 03:18 PM, Philippe Gerum wrote:
>>>>>>>>       On 03/31/2014 01:27 PM, Gilles Chanteperdrix wrote:
>>>>>>>>>       On 03/31/2014 12:34 PM, Matthias Schneider
>> wrote:
>>>>>>>>>>       Hi all,
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>       still working on thread suspension in
>> mercury, I noticed
>>>>>   that some
>>>>>>>>>>       threadobj_suspend() and threadobj_resume()
>> calls seemed
>>>>>   not to have the desired
>>>>>>>>>>       effect. Analyzing the issue, I found out
>> that sometimes
>>>>>   the read operations on
>>>>>>>>>>       the pipe in notifier_wait() seem to return
>> with EINTR,
>>>>>   especially in
>>>>>>>>>>       heavily loaded systems. Restarting the read
>> system call
>>>>>   in that case made
>>>>>>>>>>       thread suspension a lot more reliable in my
>> case.
>>>>>>>>>>
>>>>>>>>>>       I have attached a patch adding loops to
>> deal with the
>>>>>   EINTR situation in all
>>>>>>>>>
>>>>>>>>>       You can probably avoid testing for EINTR if all
>> signal
>>>>>   handlers are
>>>>>>>>>       registered with the SA_RESTART flag.
>>>>>>>>>
>>>>>>>>
>>>>>>>>       The app may not explicitly care for signals
>> (granted, most would
>>>>>   do, but
>>>>>>>>       we would then have to assume they do it right).
>>>>>>>>
>>>>>>>
>>>>>>>     Yes, applications may want to use signals to interrupt a
>> call to
>>>>>>>   read
>>>>>>>     and voluntarily get EINTR anyway.
>>>>>>>                              Gilles.
>>>>>>
>>>>>>
>>>>>>     Please find attached a third version of my patch dealing with
>>>>>>     EINTR syscalls. Note that I also stumbled over unexpected
>> behavior
>>>>>>     in threadobj_sleep and cancel_sync due to EINTRs. Considering
>> that
>>>>>>     mercury is using signals for user-space round-robin and
>> thread
>>>>>>     suspension, even without any application signals EINTR
>> happens
>>>>>>     quite often.
>>>>>
>>>>>   @@ -86,9 +88,13 @@ static void notifier_sighandler(int sig,
>> siginfo_t
>>>>>   *siginfo,
>>>>>   void *uc)
>>>>>            if (nf->owner && nf->owner != tid)
>>>>>                continue;
>>>>>
>>>>>   -        while (__STD(read(nf->psfd[0], &c, 1)) > 0)
>>>>>   -            /* Callee must run async-safe code only. */
>>>>>   -            nf->callback(nf);
>>>>>   +        do {
>>>>>   +            ret = __STD(read(nf->psfd[0], &c, 1));
>>>>>   +            if (ret > 0)
>>>>>   +                /* Callee must run async-safe code only. */
>>>>>   +                nf->callback(nf);
>>>>>   +        } while (ret > 0 || errno == EINTR);
>>>>>
>>>>>   In theory, we could receive a zero byte count on EOF, in which case
>>>>>   we should
>>>>>   not test errno, rather we should bail out immediately.
>>>>>
>>>>>   @@ -1093,7 +1093,9 @@ int threadobj_sleep(struct timespec *ts)
>>>>>         */
>>>>>        current->run_state = __THREAD_S_DELAYED;
>>>>>        threadobj_save_timeout(&current->core, ts);
>>>>>   -    ret = -__RT(clock_nanosleep(CLOCK_COPPERPLATE, TIMER_ABSTIME,
>>>>>   ts, NULL));
>>>>>   +    do
>>>>>   +        ret = -__RT(clock_nanosleep(CLOCK_COPPERPLATE,
>>>>>   TIMER_ABSTIME, ts,
>>>>>   NULL));
>>>>>
>>>>>   We should definitely pass and use the "remain" field in
>>>>>   clock_nanosleep(), not to restart a complete wait each time we get
>>>>>   interrupted.
>>>>>
>>>>>
>>>>>   --
>>>>>   Philippe.
>>>>>
>>>>
>>>>   I have attached a new version of the patch leaving the loop on ret ==
>>>>   0. Note
>>>>   that this will be a new behavior in compared to before the patch.
>>>>
>>>>   Gilles already pointed out that TIMER_ABSTIME allows restarting the
>>>>   same syscall
>>>>   in threadobj_sleep.
>>>>
>>>
>>>   Yes, but unfortunately, that part of the patch is still broken and
>>>   introduced a regression of other copperplate-based APÏs. Several RTOS
>>>   implement some kind of thread/task_unblock() service, which shall cause
>>>   sleep calls based on threadobj_sleep() to return upon receiving the
>>>   SIGRELS notification via a signal. This is the meaning of the comment
>>>   heading that code.
>>>
>>>   So the proper fix should go to the call site in the pSOS emulator,
>>>   instead of changing the generic behavior in threadobj_sleep().
>>>
>>
>> Or threadobj_sleep() should be made smarter, specifically detecting
>> SIGRELS in the mercury case for aborting the wait.
>>
>>
>> --
>> Philippe.
>>
>
> I think I would prefer to encapsulate that in threadobj_sleep(). Would
> storing a flag in threadobj_current() in unblock_sighandler() and
> checking this in threadobj_sleep() be an option?

We'll have to mate threadobj_unblock() and threadobj_sleep() logically 
instead, to make this work for both cobalt and mercury cores, but yes, 
using a TCB-based flag for handling this specific problem is the way to 
go. This way, only SIGRELS notifications would make threadobj_sleep() 
abort with EINTR as expected, silently resuming sleep after other signals.

Since the pSOS emulator should never receive SIGRELS, we should not have 
to handle EINTR from tm_wkafter/tm_wkwhen() actually.

I'm working on a patch.

  Sorry for having
> overlooked this,
>

No problem, copperplate still lacks documentation anyway.


-- 
Philippe.




More information about the Xenomai mailing list