From: lemonlime51@gmail.com (Matthias Bonne)
To: kernelnewbies@lists.kernelnewbies.org
Subject: Question on mutex code
Date: Sun, 15 Mar 2015 01:05:47 +0200
Message-ID: <5504BECB.50605@gmail.com>
In-Reply-To: <1425992639.3991.11.camel@opteya.com>

On 03/10/15 15:03, Yann Droneaud wrote:
> Hi,
>
> On Wednesday 04 March 2015 at 02:13 +0200, Matthias Bonne wrote:
>
>> I am trying to understand how mutexes work in the kernel, and I think
>> there might be a race between mutex_trylock() and mutex_unlock(). More
>> specifically, the race is between the functions
>> __mutex_trylock_slowpath and __mutex_unlock_common_slowpath (both
>> defined in kernel/locking/mutex.c).
>>
>> Consider the following sequence of events:
>>
[...]
>>
>> The end result is that the mutex count is 0 (locked), although the
>> owner has just released it, and nobody else is holding the mutex. So it
>> can no longer be acquired by anyone.
>>
>> Am I missing something that prevents the above scenario from happening?
>> If not, should I post a patch that fixes it to LKML? Or is it
>> considered too "theoretical" and cannot happen in practice?
>>
>
> I haven't looked at your explanations; you should have come up with a
> reproducible test case to demonstrate the issue (involving slowing
> down one CPU?).
>
> Anyway, such deep knowledge on the mutex implementation has to be found
> on lkml.
>
> Regards.
>

Thank you for your suggestions, and sorry for the long delay.

I see now that my explanation was unnecessarily complex. The problem is
this code from __mutex_trylock_slowpath():

	spin_lock_mutex(&lock->wait_lock, flags);
	prev = atomic_xchg(&lock->count, -1);
	if (likely(prev == 1)) {
		mutex_set_owner(lock);
		mutex_acquire(&lock->dep_map, 0, 1, _RET_IP_);
	}
	/* Set it back to 0 if there are no waiters: */
	if (likely(list_empty(&lock->wait_list)))
		atomic_set(&lock->count, 0);
	spin_unlock_mutex(&lock->wait_lock, flags);
	return prev == 1;

The above code assumes that the mutex cannot be unlocked while the
spinlock is held. However, mutex_unlock() sets the mutex count to 1
before taking the spinlock (even in the slowpath). If the unlock lands
between the atomic_xchg() and the atomic_set() above, and the mutex has
no waiters, then the atomic_set() sets the count back to 0 after
mutex_unlock() has already released the mutex, while mutex_trylock()
still returns failure (prev was not 1). The mutex is then marked as
locked with no owner, and it remains locked forever.
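
To make the interleaving concrete, here is a minimal sketch that replays it step by step on a single thread, so the outcome is deterministic. It assumes a simplified userspace model of the counter (1 = unlocked, 0 = locked, negative = locked with possible waiters); struct model_mutex, struct replay_result and replay_trylock_unlock_race() are hypothetical names, not kernel code, and owner bookkeeping is simplified.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Hypothetical userspace model of the mutex counter, not kernel code:
 *   1 = unlocked, 0 = locked, negative = locked with possible waiters.
 * The wait_list is reduced to a waiter count, and wait_lock is left out
 * because this replay runs every step on one thread, in program order.
 */
struct model_mutex {
	atomic_int count;
	int nr_waiters;		/* stands in for lock->wait_list */
	int owner;		/* 0 = nobody, otherwise a task id */
};

struct replay_result {
	bool trylock_ok;	/* what mutex_trylock() would return */
	int final_count;
	int final_owner;
};

/* Replays the interleaving described above, one step at a time. */
struct replay_result replay_trylock_unlock_race(void)
{
	struct model_mutex m = { .nr_waiters = 0, .owner = 0 };
	atomic_init(&m.count, 1);

	/* CPU 0, task 1: locks the mutex just before CPU 1 gets to it. */
	atomic_store(&m.count, 0);
	m.owner = 1;

	/* CPU 1, task 2: trylock slowpath, wait_lock now held. */
	int prev = atomic_exchange(&m.count, -1);
	assert(prev == 0);	/* it saw the mutex as locked */
	if (prev == 1)
		m.owner = 2;

	/*
	 * CPU 0: mutex_unlock() sets count to 1 before taking wait_lock,
	 * so CPU 1 holding wait_lock does not keep it out of this window.
	 * (Owner bookkeeping is simplified: cleared here.)
	 */
	atomic_store(&m.count, 1);
	m.owner = 0;

	/* CPU 1: no waiters, so "set it back to 0" overwrites the unlock. */
	if (m.nr_waiters == 0)
		atomic_store(&m.count, 0);
	/* CPU 1 drops wait_lock; CPU 0's slowpath then finds no waiters. */

	struct replay_result r = {
		.trylock_ok = (prev == 1),
		.final_count = atomic_load(&m.count),
		.final_owner = m.owner,
	};
	return r;
}
```

Calling replay_trylock_unlock_race() yields trylock_ok == false, final_count == 0 and final_owner == 0: the counter says "locked", but no task holds the mutex.
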
I don't know how to write a test case to demonstrate the issue, because
this race is very hard to trigger in practice: the mutex needs to be
locked immediately before the spinlock is acquired, and unlocked in the
very short interval between atomic_xchg() and atomic_set(). It also
requires that CONFIG_DEBUG_MUTEXES be set, since AFAICT the mutex
debugging code is currently the only user of __mutex_trylock_slowpath.
This is why I asked if it is acceptable to submit a patch for such
hard-to-trigger problems.
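
One way to get a deterministic reproducer instead of relying on timing is to inject the other CPU's action exactly inside the window. The sketch below does this in a userspace model; race_window_hook, model_trylock(), model_unlock() and run_race_test() are hypothetical test-only names (nothing like them exists in kernel/locking/mutex.c), and wait_lock is reduced to a C11 atomic_flag.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: 1 = unlocked, 0 = locked, negative = waiters. */
struct model_mutex {
	atomic_int count;
	atomic_flag wait_lock;	/* stand-in for lock->wait_lock */
	int nr_waiters;		/* stand-in for lock->wait_list */
};

/*
 * Hypothetical test-only hook, run between the xchg and the set in
 * model_trylock(), so a test can place "the other CPU" exactly there.
 */
static void (*race_window_hook)(struct model_mutex *m);

static void model_unlock(struct model_mutex *m)
{
	/* As described above: count goes to 1 before wait_lock is taken. */
	atomic_store(&m->count, 1);
	/* The real slowpath would now take wait_lock to wake waiters. */
}

static bool model_trylock(struct model_mutex *m)
{
	while (atomic_flag_test_and_set(&m->wait_lock))
		;	/* spin until wait_lock is ours */
	int prev = atomic_exchange(&m->count, -1);
	if (race_window_hook)
		race_window_hook(m);
	if (m->nr_waiters == 0)
		atomic_store(&m->count, 0);	/* "set it back to 0" */
	atomic_flag_clear(&m->wait_lock);
	return prev == 1;
}

/* Hook body: the current holder unlocks right inside the window. */
static void unlock_in_window(struct model_mutex *m)
{
	model_unlock(m);
}

/* Returns true if the race leaves the mutex "locked" with no holder. */
bool run_race_test(void)
{
	struct model_mutex m = { .nr_waiters = 0 };
	atomic_flag_clear(&m.wait_lock);
	atomic_init(&m.count, 0);	/* held by some other task */

	/* Control: no interference, the mutex correctly stays held. */
	race_window_hook = NULL;
	bool ok = model_trylock(&m);
	assert(!ok);
	assert(atomic_load(&m.count) == 0);

	/* Race: the holder releases it between the xchg and the set. */
	race_window_hook = unlock_in_window;
	bool got_it = model_trylock(&m);
	race_window_hook = NULL;

	return !got_it && atomic_load(&m.count) == 0;
}
```

run_race_test() returns true: in the control run a count of 0 is correct because the holder still owns the mutex, while in the hooked run the same 0 survives the holder's unlock. A kernel-side equivalent would need something similar between atomic_xchg() and atomic_set(), for example a temporary mdelay() in a test build to widen the window; that is an assumption about how one might slow down one CPU, not something the mutex code provides.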

I think I will just send a fix. Any further suggestions or guidance
would be appreciated.

Thread overview:
2015-03-04 0:13 Question on mutex code Matthias Bonne
2015-03-10 13:03 ` Yann Droneaud
2015-03-10 14:59 ` Valdis.Kletnieks at vt.edu
2015-03-14 23:08 ` Matthias Bonne
2015-03-14 23:05 ` Matthias Bonne [this message]
2015-03-15 1:03 ` Davidlohr Bueso
2015-03-15 1:09 ` Davidlohr Bueso
2015-03-15 21:49 ` Matthias Bonne
2015-03-15 22:10 ` Rabin Vincent
2015-03-16 3:40 ` Matthias Bonne
2015-03-15 22:18 ` Davidlohr Bueso
2015-03-15 22:23 ` Davidlohr Bueso