* [PATCH v2 0/2] locking/rwsem: Fix rwsem waiter optimistic spinning problem with RT tasks
@ 2022-10-12 13:33 Waiman Long
2022-10-12 13:33 ` [PATCH v2 1/2] locking/rwsem: Prevent non-first waiter from spinning in down_write() slowpath Waiman Long
2022-10-12 13:33 ` [PATCH v2 2/2] locking/rwsem: Limit # of null owner retries for handoff writer Waiman Long
0 siblings, 2 replies; 8+ messages in thread
From: Waiman Long @ 2022-10-12 13:33 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar, Will Deacon, Boqun Feng
Cc: linux-kernel, john.p.donnelly, Hillf Danton, Mukesh Ojha,
Ting11 Wang 王婷, Waiman Long
v2:
 - Add an additional patch to limit the # of optimistic spinning
   retries by the first waiter in the writer slowpath.

It turns out the current waiter optimistic spinning code does not work
that well if we have RT tasks in the mix. This patch series includes
two different fixes to resolve those issues.
Waiman Long (2):
locking/rwsem: Prevent non-first waiter from spinning in down_write()
slowpath
locking/rwsem: Limit # of null owner retries for handoff writer
kernel/locking/rwsem.c | 31 ++++++++++++++++++++++++++-----
1 file changed, 26 insertions(+), 5 deletions(-)
--
2.31.1
* [PATCH v2 1/2] locking/rwsem: Prevent non-first waiter from spinning in down_write() slowpath
  2022-10-12 13:33 [PATCH v2 0/2] locking/rwsem: Fix rwsem waiter optimistic spinning problem with RT tasks Waiman Long
@ 2022-10-12 13:33 ` Waiman Long
  2022-10-12 14:23   ` Mukesh Ojha
  2022-10-13 10:02   ` Peter Zijlstra
  2022-10-12 13:33 ` [PATCH v2 2/2] locking/rwsem: Limit # of null owner retries for handoff writer Waiman Long
  1 sibling, 2 replies; 8+ messages in thread
From: Waiman Long @ 2022-10-12 13:33 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Will Deacon, Boqun Feng
  Cc: linux-kernel, john.p.donnelly, Hillf Danton, Mukesh Ojha,
      Ting11 Wang 王婷, Waiman Long

A non-first waiter can potentially spin in the for loop of
rwsem_down_write_slowpath() without sleeping but fail to acquire the
lock even if the rwsem is free if the following sequence happens:

  Non-first waiter              First waiter              Lock holder
  ----------------              ------------              -----------
                                Acquire wait_lock
                                rwsem_try_write_lock():
                                  Set handoff bit if RT or
                                    wait too long
                                  Set waiter->handoff_set
                                Release wait_lock
  Acquire wait_lock
  Inherit waiter->handoff_set
  Release wait_lock
                                                          Clear owner
                                                          Release lock
  if (waiter.handoff_set) {
    rwsem_spin_on_owner();
    if (OWNER_NULL)
      goto trylock_again;
  }
  trylock_again:
  Acquire wait_lock
  rwsem_try_write_lock():
    if (first->handoff_set &&
        (waiter != first))
      return false;
  Release wait_lock

It is especially problematic if the non-first waiter is an RT task and
it is running on the same CPU as the first waiter, as this can lead to
a live lock.
Fixes: d257cc8cb8d5 ("locking/rwsem: Make handoff bit handling more consistent")
Signed-off-by: Waiman Long <longman@redhat.com>
---
 kernel/locking/rwsem.c | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/kernel/locking/rwsem.c b/kernel/locking/rwsem.c
index 44873594de03..3839b38608da 100644
--- a/kernel/locking/rwsem.c
+++ b/kernel/locking/rwsem.c
@@ -636,6 +636,11 @@ static inline bool rwsem_try_write_lock(struct rw_semaphore *sem,
 		new = count;
 
 		if (count & RWSEM_LOCK_MASK) {
+			/*
+			 * A waiter (first or not) can set the handoff bit
+			 * if it is an RT task or wait in the wait queue
+			 * for too long.
+			 */
 			if (has_handoff || (!rt_task(waiter->task) &&
 					    !time_after(jiffies, waiter->timeout)))
 				return false;
@@ -651,11 +656,13 @@ static inline bool rwsem_try_write_lock(struct rw_semaphore *sem,
 	} while (!atomic_long_try_cmpxchg_acquire(&sem->count, &count, new));
 
 	/*
-	 * We have either acquired the lock with handoff bit cleared or
-	 * set the handoff bit.
+	 * We have either acquired the lock with handoff bit cleared or set
+	 * the handoff bit. Only the first waiter can have its handoff_set
+	 * set here to enable optimistic spinning in slowpath loop.
 	 */
 	if (new & RWSEM_FLAG_HANDOFF) {
-		waiter->handoff_set = true;
+		if (waiter == first)
+			waiter->handoff_set = true;
 		lockevent_inc(rwsem_wlock_handoff);
 		return false;
 	}
-- 
2.31.1
* Re: [PATCH v2 1/2] locking/rwsem: Prevent non-first waiter from spinning in down_write() slowpath
  2022-10-12 13:33 ` [PATCH v2 1/2] locking/rwsem: Prevent non-first waiter from spinning in down_write() slowpath Waiman Long
@ 2022-10-12 14:23   ` Mukesh Ojha
  2022-10-13 10:02   ` Peter Zijlstra
  1 sibling, 0 replies; 8+ messages in thread
From: Mukesh Ojha @ 2022-10-12 14:23 UTC (permalink / raw)
  To: Waiman Long, Peter Zijlstra, Ingo Molnar, Will Deacon, Boqun Feng
  Cc: linux-kernel, john.p.donnelly, Hillf Danton, Ting11 Wang 王婷

Hi,

On 10/12/2022 7:03 PM, Waiman Long wrote:
> A non-first waiter can potentially spin in the for loop of
> rwsem_down_write_slowpath() without sleeping but fail to acquire the
> lock even if the rwsem is free if the following sequence happens:
>
> [...]
>
> It is especially problematic if the non-first waiter is an RT task and
> it is running on the same CPU as the first waiter, as this can lead to
> a live lock.
>
> Fixes: d257cc8cb8d5 ("locking/rwsem: Make handoff bit handling more consistent")
> Signed-off-by: Waiman Long <longman@redhat.com>

This patch is tested and covers the mentioned scenario.

Reviewed-and-Tested-by: Mukesh Ojha <quic_mojha@quicinc.com>

-Mukesh
* Re: [PATCH v2 1/2] locking/rwsem: Prevent non-first waiter from spinning in down_write() slowpath
  2022-10-12 13:33 ` [PATCH v2 1/2] locking/rwsem: Prevent non-first waiter from spinning in down_write() slowpath Waiman Long
  2022-10-12 14:23   ` Mukesh Ojha
@ 2022-10-13 10:02   ` Peter Zijlstra
  2022-10-13 13:33     ` Waiman Long
  1 sibling, 1 reply; 8+ messages in thread
From: Peter Zijlstra @ 2022-10-13 10:02 UTC (permalink / raw)
  To: Waiman Long
  Cc: Ingo Molnar, Will Deacon, Boqun Feng, linux-kernel,
      john.p.donnelly, Hillf Danton, Mukesh Ojha, Ting11 Wang 王婷

On Wed, Oct 12, 2022 at 09:33:32AM -0400, Waiman Long wrote:
> A non-first waiter can potentially spin in the for loop of
> rwsem_down_write_slowpath() without sleeping but fail to acquire the
> lock even if the rwsem is free if the following sequence happens:
>
> [...]
>
> It is especially problematic if the non-first waiter is an RT task and
> it is running on the same CPU as the first waiter, as this can lead to
> a live lock.

So why not do a better handoff? Specifically, have the owner set owner
to first-waiter instead of NULL? (like the normal mutex code)
* Re: [PATCH v2 1/2] locking/rwsem: Prevent non-first waiter from spinning in down_write() slowpath
  2022-10-13 10:02 ` Peter Zijlstra
@ 2022-10-13 13:33   ` Waiman Long
  2022-10-13 15:37     ` Waiman Long
  0 siblings, 1 reply; 8+ messages in thread
From: Waiman Long @ 2022-10-13 13:33 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Will Deacon, Boqun Feng, linux-kernel,
      john.p.donnelly, Hillf Danton, Mukesh Ojha, Ting11 Wang 王婷

On 10/13/22 06:02, Peter Zijlstra wrote:
> On Wed, Oct 12, 2022 at 09:33:32AM -0400, Waiman Long wrote:
>> A non-first waiter can potentially spin in the for loop of
>> rwsem_down_write_slowpath() without sleeping but fail to acquire the
>> lock even if the rwsem is free if the following sequence happens:
>>
>> [...]
>>
> So why not do a better handoff? Specifically, have the owner set owner
> to first-waiter instead of NULL? (like the normal mutex code)

I understand your desire to make the rwsem handoff process more like
what mutex is currently doing. I certainly think it is doable and will
put it on my todo list. However, that needs to be done at unlock and
wakeup time. I expect that will require a moderate amount of code
changes, which will make it less suitable for backporting to the
stable releases.

I would like to see these simple fixes get merged first; then we can
work on a major revamp of the handoff code. What do you think?

Cheers,
Longman
* Re: [PATCH v2 1/2] locking/rwsem: Prevent non-first waiter from spinning in down_write() slowpath
  2022-10-13 13:33 ` Waiman Long
@ 2022-10-13 15:37   ` Waiman Long
  0 siblings, 0 replies; 8+ messages in thread
From: Waiman Long @ 2022-10-13 15:37 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Will Deacon, Boqun Feng, linux-kernel,
      john.p.donnelly, Hillf Danton, Mukesh Ojha, Ting11 Wang 王婷

On 10/13/22 09:33, Waiman Long wrote:
> On 10/13/22 06:02, Peter Zijlstra wrote:
>> On Wed, Oct 12, 2022 at 09:33:32AM -0400, Waiman Long wrote:
>>> A non-first waiter can potentially spin in the for loop of
>>> rwsem_down_write_slowpath() without sleeping but fail to acquire the
>>> lock even if the rwsem is free if the following sequence happens:
>>>
>>> [...]
>>>
>> So why not do a better handoff? Specifically, have the owner set owner
>> to first-waiter instead of NULL? (like the normal mutex code)
>
> I understand your desire to make the rwsem handoff process more like
> what mutex is currently doing. I certainly think it is doable and will
> put it on my todo list. However, that needs to be done at unlock and
> wakeup time. I expect that will require a moderate amount of code
> changes, which will make it less suitable for backporting to the
> stable releases.
>
> I would like to see these simple fixes get merged first; then we can
> work on a major revamp of the handoff code. What do you think?

I am planning to post additional patches on top to rework the handoff
code sometime next week, but I will keep these fix patches for the
stable releases.

Cheers,
Longman
* [PATCH v2 2/2] locking/rwsem: Limit # of null owner retries for handoff writer
  2022-10-12 13:33 [PATCH v2 0/2] locking/rwsem: Fix rwsem waiter optimistic spinning problem with RT tasks Waiman Long
  2022-10-12 13:33 ` [PATCH v2 1/2] locking/rwsem: Prevent non-first waiter from spinning in down_write() slowpath Waiman Long
@ 2022-10-12 13:33 ` Waiman Long
  2022-10-12 14:38   ` Mukesh Ojha
  1 sibling, 1 reply; 8+ messages in thread
From: Waiman Long @ 2022-10-12 13:33 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Will Deacon, Boqun Feng
  Cc: linux-kernel, john.p.donnelly, Hillf Danton, Mukesh Ojha,
      Ting11 Wang 王婷, Waiman Long

Commit 91d2a812dfb9 ("locking/rwsem: Make handoff writer optimistically
spin on owner") assumes that when the owner field is changed to NULL,
the lock will become free soon. That assumption may not be correct,
especially if the handoff writer doing the spinning is an RT task which
may preempt another task from completing its action of either freeing
the rwsem or properly setting up owner.

To prevent this live lock scenario, we have to limit the number of
trylock attempts without sleeping. The limit is now set to 8 to allow
enough time for the other task to hopefully complete its action.

By adding new lock events to track the number of NULL owner retries with
handoff flag set before a successful trylock when running a 96-thread
locking microbenchmark with equal numbers of readers and writers running
on a 2-core 96-thread system for 15 seconds, the following stats are
obtained. Note that none of the locking threads are RT tasks.

  Retries of successful trylock    Count
  -----------------------------    -----
               1                    1738
               2                      19
               3                      11
               4                       2
               5                       1
               6                       1
               7                       1
               8                       0
               X                       1

The last row is the one failed attempt that needed more than 8 retries.
So a retry count maximum of 8 should capture most of them if no RT task
is in the mix.

Fixes: 91d2a812dfb9 ("locking/rwsem: Make handoff writer optimistically spin on owner")
Reported-by: Mukesh Ojha <quic_mojha@quicinc.com>
Signed-off-by: Waiman Long <longman@redhat.com>
---
 kernel/locking/rwsem.c | 18 ++++++++++++++++--
 1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/kernel/locking/rwsem.c b/kernel/locking/rwsem.c
index 3839b38608da..12eb093328f2 100644
--- a/kernel/locking/rwsem.c
+++ b/kernel/locking/rwsem.c
@@ -1123,6 +1123,7 @@ static struct rw_semaphore __sched *
 rwsem_down_write_slowpath(struct rw_semaphore *sem, int state)
 {
 	struct rwsem_waiter waiter;
+	int null_owner_retries;
 	DEFINE_WAKE_Q(wake_q);
 
 	/* do optimistic spinning and steal lock if possible */
@@ -1164,7 +1165,7 @@ rwsem_down_write_slowpath(struct rw_semaphore *sem, int state)
 	set_current_state(state);
 	trace_contention_begin(sem, LCB_F_WRITE);
 
-	for (;;) {
+	for (null_owner_retries = 0;;) {
 		if (rwsem_try_write_lock(sem, &waiter)) {
 			/* rwsem_try_write_lock() implies ACQUIRE on success */
 			break;
@@ -1190,8 +1191,21 @@ rwsem_down_write_slowpath(struct rw_semaphore *sem, int state)
 			owner_state = rwsem_spin_on_owner(sem);
 			preempt_enable();
 
-			if (owner_state == OWNER_NULL)
+			/*
+			 * A NULL owner doesn't guarantee the lock is free.
+			 * An incoming reader will temporarily increment the
+			 * reader count without changing owner and
+			 * rwsem_try_write_lock() will fail if the reader
+			 * is not able to decrement it in time. Allow 8
+			 * trylock attempts when hitting a NULL owner before
+			 * going to sleep.
+			 */
+			if ((owner_state == OWNER_NULL) &&
+			    (null_owner_retries < 8)) {
+				null_owner_retries++;
 				goto trylock_again;
+			}
+			null_owner_retries = 0;
 		}
 
 		schedule();
-- 
2.31.1
* Re: [PATCH v2 2/2] locking/rwsem: Limit # of null owner retries for handoff writer
  2022-10-12 13:33 ` [PATCH v2 2/2] locking/rwsem: Limit # of null owner retries for handoff writer Waiman Long
@ 2022-10-12 14:38   ` Mukesh Ojha
  0 siblings, 0 replies; 8+ messages in thread
From: Mukesh Ojha @ 2022-10-12 14:38 UTC (permalink / raw)
  To: Waiman Long, Peter Zijlstra, Ingo Molnar, Will Deacon, Boqun Feng
  Cc: linux-kernel, john.p.donnelly, Hillf Danton, Ting11 Wang 王婷

Hi,

On 10/12/2022 7:03 PM, Waiman Long wrote:
> Commit 91d2a812dfb9 ("locking/rwsem: Make handoff writer optimistically
> spin on owner") assumes that when the owner field is changed to NULL,
> the lock will become free soon. That assumption may not be correct,
> especially if the handoff writer doing the spinning is an RT task which
> may preempt another task from completing its action of either freeing
> the rwsem or properly setting up owner.
>
> [...]
>
> +			if ((owner_state == OWNER_NULL) &&
> +			    (null_owner_retries < 8)) {

Define MAX_NULL_OWNER_RETRY 8 instead of the magic number?

> +				null_owner_retries++;
>  				goto trylock_again;
> +			}
> +			null_owner_retries = 0;
>  		}

Thanks for considering this patch. LGTM.

Reviewed-and-Tested-by: Mukesh Ojha <quic_mojha@quicinc.com>

-Mukesh

>  		schedule();
end of thread, other threads: [~2022-10-13 15:38 UTC | newest]

Thread overview: 8+ messages
2022-10-12 13:33 [PATCH v2 0/2] locking/rwsem: Fix rwsem waiter optimistic spinning problem with RT tasks Waiman Long
2022-10-12 13:33 ` [PATCH v2 1/2] locking/rwsem: Prevent non-first waiter from spinning in down_write() slowpath Waiman Long
2022-10-12 14:23   ` Mukesh Ojha
2022-10-13 10:02   ` Peter Zijlstra
2022-10-13 13:33     ` Waiman Long
2022-10-13 15:37       ` Waiman Long
2022-10-12 13:33 ` [PATCH v2 2/2] locking/rwsem: Limit # of null owner retries for handoff writer Waiman Long
2022-10-12 14:38   ` Mukesh Ojha