A potential Xenomai Mutex issue
DIAO, Hanson
hanson.diao at siemens.com
Thu Aug 22 20:42:33 CEST 2019
Hi all,
I hope you are doing well. I am currently working on a critical deadlock issue with the Xenomai library (version 2.6.4). I found that the Xenomai mutex lock count is not reliable after rt_mutex_release has been called. The messages I printed are shown below. I hope a developer can help me fix this issue. I know that this version is EOL, but we still use it. Thank you so much.
Issue 1:
Before Mutex Lock Mutext addr = 0xb7c059e8,count = 0, owner = 0
This message shows the status before rt_mutex_acquire.
After Mutex Lock Mutext addr = 0xb7c059e8,count = 1, owner = 2bd
This message shows the status after calling rt_mutex_acquire. Everything is correct for rt_mutex_acquire in this scenario.
Before Mutex unLock Mutext addr = 0xb7c059e8,count = 1, owner = 2bd
This message shows the status before rt_mutex_release.
After Mutex unLock Mutext addr = 0xb7c059e8,count = 1, owner = 0
This message shows the status after rt_mutex_release. The owner is cleared, but the lock count is still 1, which does not look correct after calling rt_mutex_release.
Issue 2:
When our task takes the lock recursively, the mutex lock count should be greater than 1, but it is still 1.
For issue 1, I suspect that something is wrong in the release function. I highlighted the code. I am not sure whether it is the root cause.
int rt_mutex_release(RT_MUTEX *mutex)
{
#ifdef CONFIG_XENO_FASTSYNCH
	unsigned long status;
	xnhandle_t cur;

	cur = xeno_get_current();
	if (cur == XN_NO_HANDLE)
		return -EPERM;

	status = xeno_get_current_mode();
	if (unlikely(status & XNOTHER))
		/* See rt_mutex_acquire_inner() */
		goto do_syscall;

	if (unlikely(xnsynch_fast_owner_check(mutex->fastlock, cur) != 0))
		return -EPERM;

	if (mutex->lockcnt > 1) {
		mutex->lockcnt--;
		return 0;
	}

	if (likely(xnsynch_fast_release(mutex->fastlock, cur)))
	{
		return 0;
	}

do_syscall:
#endif /* CONFIG_XENO_FASTSYNCH */
	return XENOMAI_SKINCALL1(__native_muxid, __native_mutex_release, mutex);
}
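To make the question about issue 1 concrete, here is a minimal stand-alone model of the fast path quoted above. It is only a sketch: `model_mutex`, `model_fast_release` and `model_release` are hypothetical names, the fast-lock helper is assumed to behave like a plain compare-and-clear, and the syscall fallback is left out.

```c
#include <errno.h>

/* Hypothetical stand-ins, not the real Xenomai definitions. */
typedef unsigned long model_handle;
#define MODEL_NO_HANDLE 0UL

struct model_mutex {
	model_handle fastlock;	/* owner handle, MODEL_NO_HANDLE when free */
	unsigned int lockcnt;	/* recursion count kept in user space */
};

/* ASSUMPTION: the release helper clears the owner only if cur holds it. */
static int model_fast_release(model_handle *fastlock, model_handle cur)
{
	if (*fastlock != cur)
		return 0;
	*fastlock = MODEL_NO_HANDLE;
	return 1;
}

/* Mirrors the fast path of the quoted rt_mutex_release(). */
static int model_release(struct model_mutex *m, model_handle cur)
{
	if (m->fastlock != cur)
		return -EPERM;

	if (m->lockcnt > 1) {
		m->lockcnt--;	/* nested release: only the count drops */
		return 0;
	}

	/*
	 * Outermost release: the owner is cleared, lockcnt is not
	 * written at all. The real code would fall back to a syscall
	 * if the fast release fails; that part is omitted here.
	 */
	return model_fast_release(&m->fastlock, cur) ? 0 : -EPERM;
}
```

Starting from count = 1 and owner = 0x2bd, a call to model_release() leaves count at 1 and sets owner to 0, which is the same pattern as the "After Mutex unLock" message. In this model the count is simply never written on the outermost release, so whether the leftover value is a real problem depends on whether anything reads it while the mutex has no owner.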
For the mutex lock function, I am confused by the comments highlighted below. I am not sure whether it supports recursive locking.
static int rt_mutex_acquire_inner(RT_MUTEX *mutex, RTIME timeout, xntmode_t mode)
{
	int err;
#ifdef CONFIG_XENO_FASTSYNCH
	unsigned long status;
	xnhandle_t cur;

	cur = xeno_get_current();
	if (cur == XN_NO_HANDLE)
		return -EPERM;

	/*
	 * We track resource ownership for non real-time shadows in
	 * order to handle the auto-relax feature, so we must always
	 * obtain them via a syscall.
	 */
	status = xeno_get_current_mode();
	if (unlikely(status & XNOTHER))
		goto do_syscall;

	if (likely(!(status & XNRELAX))) {
		err = xnsynch_fast_acquire(mutex->fastlock, cur);
		if (likely(!err)) {
			mutex->lockcnt = 1;
			return 0;
		}

		if (err == -EBUSY) {
			if (mutex->lockcnt == UINT_MAX)
				return -EAGAIN;

			mutex->lockcnt++;
			return 0;
		}

		if (timeout == TM_NONBLOCK && mode == XN_RELATIVE)
			return -EWOULDBLOCK;
	} else if (xnsynch_fast_owner_check(mutex->fastlock, cur) == 0) {
		/*
		 * The application is buggy as it jumped to secondary mode
		 * while holding the mutex. Nevertheless, we have to keep the
		 * mutex state consistent.
		 *
		 * We make no efforts to migrate or warn here. There is
		 * XENO_DEBUG(SYNCH_RELAX) to catch such bugs.
		 */
		if (mutex->lockcnt == UINT_MAX)
			return -EAGAIN;

		mutex->lockcnt++;
		return 0;
	}

do_syscall:
#endif /* CONFIG_XENO_FASTSYNCH */
	err = XENOMAI_SKINCALL3(__native_muxid,
				__native_mutex_acquire, mutex, mode, &timeout);

#ifdef CONFIG_XENO_FASTSYNCH
	if (!err)
		mutex->lockcnt = 1;
#endif /* CONFIG_XENO_FASTSYNCH */

	return err;
}
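For issue 2, the following stand-alone sketch models how I read the acquire path above. The names `model_rmutex`, `model_fast_acquire`, `model_acquire` and `model_after_syscall_success` are hypothetical, and the behaviour of the fast-acquire helper (0 when the lock was free, -EBUSY when the caller already owns it, -EAGAIN when another task owns it) is an assumption inferred from how the quoted code uses its return value. Blocking and timeouts are not modelled.

```c
#include <errno.h>
#include <limits.h>

/* Hypothetical stand-ins, not the real Xenomai definitions. */
typedef unsigned long model_rhandle;
#define MODEL_RNO_HANDLE 0UL

struct model_rmutex {
	model_rhandle fastlock;	/* owner handle, MODEL_RNO_HANDLE when free */
	unsigned int lockcnt;	/* recursion count kept in user space */
};

/* ASSUMPTION: return values inferred from the quoted caller. */
static int model_fast_acquire(model_rhandle *fastlock, model_rhandle cur)
{
	if (*fastlock == MODEL_RNO_HANDLE) {
		*fastlock = cur;
		return 0;
	}
	if (*fastlock == cur)
		return -EBUSY;	/* caller already owns the lock */
	return -EAGAIN;		/* owned by another task */
}

/* Mirrors the primary-mode fast path of rt_mutex_acquire_inner(). */
static int model_acquire(struct model_rmutex *m, model_rhandle cur)
{
	int err = model_fast_acquire(&m->fastlock, cur);

	if (!err) {
		m->lockcnt = 1;	/* first acquisition resets the count */
		return 0;
	}
	if (err == -EBUSY) {
		if (m->lockcnt == UINT_MAX)
			return -EAGAIN;
		m->lockcnt++;	/* recursive acquisition */
		return 0;
	}
	/* Contended: the real code would go on to the syscall here. */
	return err;
}

/* Mirrors the tail of the function: "if (!err) mutex->lockcnt = 1;". */
static void model_after_syscall_success(struct model_rmutex *m)
{
	m->lockcnt = 1;
}
```

In this model, two calls to model_acquire() by the same owner leave the count at 2, so the fast path does appear to handle recursion through the -EBUSY branch. The count is forced back to 1 only in the tail that runs after a successful syscall. A task flagged XNOTHER always takes that route, so if a nested acquire from such a task succeeds through the syscall, the user-space count would read 1. I have not confirmed that this is what happens in our application.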