Re: [PATCH 11/12] rwsem: wake all readers when first waiter is a reader

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

From: Michel Lespinasse <walken@google.com>
To: Peter Hurley <peter@hurleysoftware.com>
Cc: Alex Shi <alex.shi@intel.com>, Ingo Molnar <mingo@kernel.org>,
	David Howells <dhowells@redhat.com>,
	Peter Zijlstra <a.p.zijlstra@chello.nl>,
	Thomas Gleixner <tglx@linutronix.de>,
	Yuanhan Liu <yuanhan.liu@linux.intel.com>,
	Rik van Riel <riel@redhat.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH 11/12] rwsem: wake all readers when first waiter is a reader
Date: Thu, 14 Mar 2013 00:03:03 -0700	[thread overview]
Message-ID: <20130314070303.GA10956@google.com> (raw)
In-Reply-To: <1363034207.27803.8.camel@thor.lan>

On Mon, Mar 11, 2013 at 04:36:47PM -0400, Peter Hurley wrote:
> 
> On Wed, 2013-03-06 at 15:21 -0800, Michel Lespinasse wrote:
> > + retry_reader_grants:
> > +	oldcount = rwsem_atomic_update(adjustment, sem) - adjustment;
> > +	if (unlikely(oldcount < RWSEM_WAITING_BIAS)) {
> > +		/* A writer stole the lock.  Undo our reader grants. */
> > +		if (rwsem_atomic_update(-adjustment, sem) < RWSEM_WAITING_BIAS)
> > +			goto out;
> > +		/* The writer left.  Retry waking readers. */
> > +		goto retry_reader_grants;
> > +	}
> 
> This can be reduced to single looping cmpxchg in the grant reversal
> path; then if reversing the grant fails, the count can simply be
> re-tested for grant success, rather than trying to atomically re-grant.
> For example, with a helper function, rwsem_cmpxchg():
> 
> static inline int rwsem_cmpxchg(long *old, long new, struct rw_semaphore *sem)
> {
> 	long tmp = *old;
> 	*old = cmpxchg(&sem->count, *old, new);
> 	return tmp == *old;
> }
> 
> ... then above becomes ...
> 
> 	count = rwsem_atomic_update(adjustment, sem);
> 	do {
> 		if (count - adjustment >= RWSEM_WAITING_BIAS)
> 			break;
> 		if (rwsem_cmpxchg(&count, count - adjustment, sem))
> 			goto out;   /* or simply return sem */
> 	} while (1);
> 
> 	< wake up readers >

This would work, but I don't see a benefit as we still end up with a
retry loop. Also, the loop would have to retry whenever the counter
value changed instead of only when writer(s) appear or are removed.

> Also, this series and the original rwsem can mistakenly sleep reader(s)
> when the lock is transitioned from writer-owned to waiting readers-owned
> with no waiting writers. For example,
> 
> 
>   CPU 0                               |  CPU 1
>                                       |
>                                       | down_write()
> 
> ... CPU 1 has the write lock for the semaphore.
>     Meanwhile, 1 or more down_read(s) are attempted and fail;
>     these are put on the wait list.

I'm not sure of the relevance of these other down_read() calls -
please note that as these extra readers are put on the wait list,
their +read_bias adjustments are canceled so that the count value
ends up at write_bias + waiting_bias (for a total active count of 1)

> Then ...
> 
> down_read()                           | up_write()
>   local = atomic_update(+read_bias)   |
>   local <= 0?                         |   local = atomic_update(-write_bias)
>   if (true)                           |   local < 0?
>      down_read_failed()               |   if (true)
>                                       |      wake()
>                                       |         grab wait_lock
>         wait for wait_lock            |         wake all readers
>                                       |         release wait_lock
> 
> ... At this point, sem->count > 0 and the wait list is empty,
>     but down_read_failed() will sleep the reader.
> 
> In this case, CPU 0 has observed the sem count with the write lock (and
> the other waiters) and so is detoured to down_read_failed(). But if 
> CPU 0 can't grab the wait_lock before the up_write() does (via
> rwsem_wake()), then down_read_failed() will wake no one and sleep the
> reader.

Actually - down_read_failed() will insert the reader on the wait_list
and do an atomic update(-read_bias). If there are no active lockers at
the time of that atomic update, it calls __rwsem_do_wake() to unqueued
the first queued waiter - which may well be itself.

In your specific example, __rwsem_do_wake() will wake the previously queued
readers as well as current.

> Unfortunately, this means readers and writers which observe the sem
> count after the adjustment is committed by CPU 0 in down_read_failed()
> will sleep as well, until the sem count returns to 0.

Note that the active count does go back to 0 in your example.

However, thinking about it made me consider the following case:

CPU 0               CPU 1                                   CPU 2

down_write()

                    down_read()
                      local = atomic_update(+read_bias)

up_write()

                                                            down_read()

                      if (local <= 0)
                        down_read_failed()

At this point CPU 0 doesn't hold the write lock anymore;
CPU 2 grabbed a read lock;
CPU 1 should grab an additional read lock but it won't - instead, it will
queue itself (and get woken by CPU 2 when that read lock is released).

This is indeed slightly suboptimal. Could be fixed in an additional patch.

As you mentioned, this is not any new issue caused by this patch series
though - it's been there forever as far as I know.

Thanks,

-- 
Michel "Walken" Lespinasse
A program is never fully debugged until the last user dies.

next prev parent reply	other threads:[~2013-03-14  7:03 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-03-06 23:21 [PATCH 00/12] rwsem fast-path write lock stealing Michel Lespinasse
2013-03-06 23:21 ` [PATCH 01/12] rwsem: make the waiter type an enumeration rather than a bitmask Michel Lespinasse
2013-03-13 21:33   ` Rik van Riel
2013-03-06 23:21 ` [PATCH 02/12] rwsem: shorter spinlocked section in rwsem_down_failed_common() Michel Lespinasse
2013-03-06 23:21 ` [PATCH 03/12] rwsem: move rwsem_down_failed_common code into rwsem_down_{read,write}_failed Michel Lespinasse
2013-03-06 23:21 ` [PATCH 04/12] rwsem: simplify rwsem_down_read_failed Michel Lespinasse
2013-03-06 23:21 ` [PATCH 05/12] rwsem: simplify rwsem_down_write_failed Michel Lespinasse
2013-03-06 23:21 ` [PATCH 06/12] rwsem: more agressive lock stealing in rwsem_down_write_failed Michel Lespinasse
2013-03-06 23:21 ` [PATCH 07/12] rwsem: use cmpxchg for trying to steal write lock Michel Lespinasse
2013-03-06 23:21 ` [PATCH 08/12] rwsem: avoid taking wait_lock in rwsem_down_write_failed Michel Lespinasse
2013-03-06 23:21 ` [PATCH 09/12] rwsem: skip initial trylock " Michel Lespinasse
2013-03-06 23:21 ` [PATCH 10/12] rwsem-spinlock: wake all readers when first waiter is a reader Michel Lespinasse
2013-03-06 23:21 ` [PATCH 11/12] rwsem: " Michel Lespinasse
2013-03-09  0:32   ` Dave Chinner
2013-03-09  1:20     ` Michel Lespinasse
2013-03-11  0:16       ` Dave Chinner
2013-03-11  5:17         ` Michel Lespinasse
2013-03-12  2:36           ` Dave Chinner
2013-03-12  6:43             ` Michel Lespinasse
2013-03-13  3:23               ` Dave Chinner
2013-03-13 11:03                 ` Michel Lespinasse
2013-03-14  2:00                 ` Peter Hurley
2013-03-19  1:17                   ` Dave Chinner
2013-03-19 23:48                     ` Michel Lespinasse
2013-03-11  7:50         ` Ingo Molnar
2013-03-11 20:36   ` Peter Hurley
2013-03-14  7:03     ` Michel Lespinasse [this message]
2013-03-14 11:39       ` Peter Hurley
2013-03-14 15:20         ` Michel Lespinasse
2013-03-06 23:21 ` [PATCH 12/12] x86 rwsem: avoid taking slow path when stealing write lock Michel Lespinasse

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20130314070303.GA10956@google.com \
    --to=walken@google.com \
    --cc=a.p.zijlstra@chello.nl \
    --cc=akpm@linux-foundation.org \
    --cc=alex.shi@intel.com \
    --cc=dhowells@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@kernel.org \
    --cc=peter@hurleysoftware.com \
    --cc=riel@redhat.com \
    --cc=tglx@linutronix.de \
    --cc=yuanhan.liu@linux.intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox