A potential Xenomai Mutex issue
DIAO, Hanson
hanson.diao at siemens.com
Fri Aug 23 16:02:53 CEST 2019
Hi Jan,
Thank you for your reply. I will answer the questions one by one.
Q: This is on an ARMv7 multicore target, right?
HD: This is PowerPC target.
Q: Are you already able to reproduce the issue reliably, possibly in a synthetic environment?
HD: Both issues reproduce every time (for the second issue, the lock count of a recursively acquired mutex should be more than 1).
Q: Or does your whole stack have to run on the target for a long time to trigger this?
HD: I hit this issue while the system was in its initialization stage. It is easy to trigger and happens every time.
Q: Is the mutex shared between multiple processes or just between threads of the same process?
HD: The mutex is shared only within one process, between multiple tasks.
Q: Maybe something analogous was needed for native as well. And then you could look at what happened in 3.x mutex-wise to check if you are not missing a conceptual fix in 2.6.
HD: I will check the commit message. I compared version 2.6.4 with version 2.6.5, and the user-space mutex code appears to be the same in both.
Q: You seem to look at the wrong data structure. You need to examine RT_MUTEX_PLACEHOLDER fields.
HD: The data structure I examined is RT_MUTEX_PLACEHOLDER. Its definition is below.
typedef struct rt_mutex_placeholder {
    xnhandle_t opaque;
#ifdef CONFIG_XENO_FASTSYNCH
    xnarch_atomic_t *fastlock;
    int lockcnt;
#endif /* CONFIG_XENO_FASTSYNCH */
} RT_MUTEX_PLACEHOLDER;
-----Original Message-----
From: Jan Kiszka <jan.kiszka at siemens.com>
Sent: Friday, August 23, 2019 2:48 AM
To: DIAO, Hanson (DI PA CI RC R&D SW2) <hanson.diao at siemens.com>; xenomai at xenomai.org
Subject: Re: A potential Xenomai Mutex issue
On 22.08.19 20:42, DIAO, Hanson via Xenomai wrote:
> Hi all,
>
>
>
> I hope you are doing well. I am currently working on a critical deadlock issue with the Xenomai library (version 2.6.4). I found that the Xenomai lock count is not reliable after calling rt_mutex_release. The messages I printed are below. I hope a developer can help me fix this issue. I know that this version is EOL, but we still use it. Thank you so much.
>
This is on an ARMv7 multicore target, right? Are you already able to reproduce the issue reliably, possibly in a synthetic environment? Or does your whole stack have to run on the target for a long time to trigger this? Is the mutex shared between multiple processes or just between threads of the same process?
Next point: You are on 2.6.4 while the last release was 2.6.5. It contained e.g.
8047147aff9d (posix/mutex: handle recursion count completely in user-space).
>
>
> Issue 1:
>
> Before Mutex Lock Mutext addr = 0xb7c059e8,count = 0, owner = 0
> This message shows the status before rt_mutex_acquire.
>
> After Mutex Lock Mutext addr = 0xb7c059e8,count = 1, owner = 2bd
> This message shows the status after calling rt_mutex_acquire. Everything is right for rt_mutex_acquire in this scenario.
>
> Before Mutex unLock Mutext addr = 0xb7c059e8,count = 1, owner = 2bd
> This message shows the status before rt_mutex_release.
>
> After Mutex unLock Mutext addr = 0xb7c059e8,count = 1, owner = 0
> This message shows the status after rt_mutex_release. The owner is cleared, but the lock count is still 1, which does not look correct after calling rt_mutex_release.
>
>
>
> Issue 2:
>
> When our task takes the lock recursively, the mutex lock count should be more than 1, but it is still 1.
>
> For issue 1, I guess that there is something wrong in the release function. I highlighted the code, but I am not sure if it is the root cause.
>
Don't use HTML emails on public lists. They often get filtered, at the latest on the receiver side.
Jan
>
>
> int rt_mutex_release(RT_MUTEX *mutex)
> {
> #ifdef CONFIG_XENO_FASTSYNCH
>     unsigned long status;
>     xnhandle_t cur;
>
>     cur = xeno_get_current();
>     if (cur == XN_NO_HANDLE)
>         return -EPERM;
>
>     status = xeno_get_current_mode();
>     if (unlikely(status & XNOTHER))
>         /* See rt_mutex_acquire_inner() */
>         goto do_syscall;
>
>     if (unlikely(xnsynch_fast_owner_check(mutex->fastlock, cur) != 0))
>         return -EPERM;
>
>     if (mutex->lockcnt > 1) {
>         mutex->lockcnt--;
>         return 0;
>     }
>
>     if (likely(xnsynch_fast_release(mutex->fastlock, cur)))
>     {
>         return 0;
>     }
> do_syscall:
> #endif /* CONFIG_XENO_FASTSYNCH */
>
>     return XENOMAI_SKINCALL1(__native_muxid,
>                              __native_mutex_release, mutex);
> }
>
> For the mutex lock function, I am confused by the comments in the code below, which I highlighted. I am not sure if it supports recursive locking.
>
> static int rt_mutex_acquire_inner(RT_MUTEX *mutex, RTIME timeout,
>                                   xntmode_t mode)
> {
>     int err;
> #ifdef CONFIG_XENO_FASTSYNCH
>     unsigned long status;
>     xnhandle_t cur;
>
>     cur = xeno_get_current();
>     if (cur == XN_NO_HANDLE)
>         return -EPERM;
>
>     /*
>      * We track resource ownership for non real-time shadows in
>      * order to handle the auto-relax feature, so we must always
>      * obtain them via a syscall.
>      */
>     status = xeno_get_current_mode();
>     if (unlikely(status & XNOTHER))
>         goto do_syscall;
>
>     if (likely(!(status & XNRELAX))) {
>         err = xnsynch_fast_acquire(mutex->fastlock, cur);
>         if (likely(!err)) {
>             mutex->lockcnt = 1;
>             return 0;
>         }
>
>         if (err == -EBUSY) {
>             if (mutex->lockcnt == UINT_MAX)
>                 return -EAGAIN;
>
>             mutex->lockcnt++;
>             return 0;
>         }
>
>         if (timeout == TM_NONBLOCK && mode == XN_RELATIVE)
>             return -EWOULDBLOCK;
>     } else if (xnsynch_fast_owner_check(mutex->fastlock, cur) == 0) {
>         /*
>          * The application is buggy as it jumped to secondary mode
>          * while holding the mutex. Nevertheless, we have to keep the
>          * mutex state consistent.
>          *
>          * We make no efforts to migrate or warn here. There is
>          * XENO_DEBUG(SYNCH_RELAX) to catch such bugs.
>          */
>         if (mutex->lockcnt == UINT_MAX)
>             return -EAGAIN;
>
>         mutex->lockcnt++;
>         return 0;
>     }
> do_syscall:
> #endif /* CONFIG_XENO_FASTSYNCH */
>
>     err = XENOMAI_SKINCALL3(__native_muxid,
>                             __native_mutex_acquire, mutex, mode,
>                             &timeout);
>
> #ifdef CONFIG_XENO_FASTSYNCH
>     if (!err)
>         mutex->lockcnt = 1;
> #endif /* CONFIG_XENO_FASTSYNCH */
>
>     return err;
> }
>
--
Siemens AG, Corporate Technology, CT RDA IOT SES-DE Corporate Competence Center Embedded Linux