public inbox for linux-kernel@vger.kernel.org
From: Dave Chinner <david@fromorbit.com>
To: Peter Hurley <peter@hurleysoftware.com>
Cc: Michel Lespinasse <walken@google.com>,
	Alex Shi <alex.shi@intel.com>, Ingo Molnar <mingo@kernel.org>,
	David Howells <dhowells@redhat.com>,
	Peter Zijlstra <a.p.zijlstra@chello.nl>,
	Thomas Gleixner <tglx@linutronix.de>,
	Yuanhan Liu <yuanhan.liu@linux.intel.com>,
	Rik van Riel <riel@redhat.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH 11/12] rwsem: wake all readers when first waiter is a reader
Date: Tue, 19 Mar 2013 12:17:54 +1100	[thread overview]
Message-ID: <20130319011754.GU6369@dastard> (raw)
In-Reply-To: <1363226451.25976.170.camel@thor.lan>

On Wed, Mar 13, 2013 at 10:00:51PM -0400, Peter Hurley wrote:
> On Wed, 2013-03-13 at 14:23 +1100, Dave Chinner wrote:
> > We don't care about the ordering between multiple concurrent
> > metadata modifications - what matters is whether the ongoing data IO
> > around them is ordered correctly.
> 
> Dave,
> 
> The point that Michel is making is that there never was any ordering
> guarantee by rwsem. It's an illusion.

Weasel words.

> The reason is simple: to even get to the lock the cpu has to be
> sleep-able. So every submission that you believe is ordered is, by
> its very nature, __not ordered__, even when used by kernel code.
>
> Why? Because any thread on its way to claim the lock (reader or writer)
> could be pre-empted for some other task, thus delaying the submission of
> whatever i/o you believed to be ordered.

You think I don't know this?  You're arguing fine grained, low level
behaviour between tasks is unpredictable. I get that. I understand
that. But I'm not arguing about fine-grained, low level, microsecond
semantics of the locking order....

What you (and Michel) appear to be failing to see is what happens
on a macro level when you have read locks being held for periods
measured in *seconds* (e.g. direct IO gets queued behind a few
thousand other IOs in the elevator waiting for a request slot),
and the subsequent effect of inserting an operation that requires a
write lock into that IO stream.

IOWs, it simply doesn't matter if there's a micro-level race between
the write lock and a couple of the readers. That's the level you
guys are arguing at but it simply does not matter in the cases I'm
describing. I'm talking about high level serialisation behaviours
that might take *seconds* to play out and the ordering behaviours
observed at that scale.

That is, I don't care if a couple of threads out of a few thousand
race with the write lock over a few tens to hundreds of microseconds,
but I most definitely care if a few thousand IOs issued seconds
after the write lock is queued jump over the write lock. That is a
gross behavioural change at the macro-level.....

> So just to reiterate: there is no 'queue' and no 'barrier'. The
> guarantees that rwsem makes are;
> 1. Multiple readers can own the lock.
> 2. Only a single writer can own the lock.
> 3. Readers will not starve writers.

You've conveniently ignored the fact that the current implementation
also provides the following guarantee:

4. New readers will block behind existing writers.

And that's the behaviour we currently depend on, whether you like it
or not.
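
To make the dispute concrete, here is a deliberately simplified toy model
of an rwsem wait queue (an illustration only, not the kernel
implementation -- the actual patch series changes the wakeup and
lock-stealing logic in more nuanced ways). The names and the
`readers_jump_writers` flag are hypothetical; the point is only to show
guarantee 4 versus the reader-preference behaviour being objected to:

```python
from collections import deque

class ToyRWSem:
    # Toy model of an rwsem wait queue, for illustration only.
    # readers_jump_writers=False models guarantee 4: a reader arriving
    # after a queued writer blocks behind it.  readers_jump_writers=True
    # models a reader-preference policy where late readers join the
    # current read-side holders, jumping over the queued writer.
    def __init__(self, readers_jump_writers=False):
        self.readers_jump_writers = readers_jump_writers
        self.readers = 0        # current read-side holders
        self.writer = False     # current write-side holder
        self.waiters = deque()  # FIFO queue of ('r'|'w', name)

    def down_read(self, name):
        # Returns True if acquired, False if queued (would block).
        if self.writer or (self.waiters and not self.readers_jump_writers):
            self.waiters.append(('r', name))
            return False
        self.readers += 1
        return True

    def down_write(self, name):
        if self.writer or self.readers or self.waiters:
            self.waiters.append(('w', name))
            return False
        self.writer = True
        return True

    def up_read(self):
        self.readers -= 1
        if self.readers == 0:
            self._wake()

    def up_write(self):
        self.writer = False
        self._wake()

    def _wake(self):
        # Wake the writer at the head of the queue, or a run of readers.
        if not self.waiters:
            return
        if self.waiters[0][0] == 'w':
            self.waiters.popleft()
            self.writer = True
        else:
            while self.waiters and self.waiters[0][0] == 'r':
                self.waiters.popleft()
                self.readers += 1

# Guarantee 4 (current behaviour): the queued writer acts as a barrier.
fair = ToyRWSem()
assert fair.down_read("io-1")            # reader holds the lock
assert not fair.down_write("truncate")   # writer queues behind it
assert not fair.down_read("io-2")        # late reader blocks behind writer
fair.up_read()
assert fair.writer                       # writer runs before io-2

# Reader-preference policy: the late reader jumps the queued writer.
# At macro scale, thousands of such reads could keep jumping it.
greedy = ToyRWSem(readers_jump_writers=True)
assert greedy.down_read("io-1")
assert not greedy.down_write("truncate")
assert greedy.down_read("io-2")          # jumps ahead of the writer
```

In the second model the queued "truncate" writer is no longer a barrier
for IO submitted after it, which is exactly the macro-level reordering
described above.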

> Where lock policy can have a significant impact is on performance. But
> predicting that impact is difficult -- it's better just to measure.

Predicting the impact in this case is trivial - it's obvious that
ordering of operations will change and break high level assumptions
that userspace currently makes about various IO operations on XFS
filesystems.

> It's not my intention to convince you (or anyone else) that there should
> only be One True Rwsem, because I don't believe that. But I didn't want
> the impression to persist that rwsem does anything more than implement a
> fair reader/writer semaphore.

I'm sorry, but redefining "fair" to suit your own needs doesn't
convince me of anything. rwsem behaviour has been unchanged for at
least 10 years and hence the current implementation defines what is
"fair", not what you say is fair....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


Thread overview: 30+ messages
2013-03-06 23:21 [PATCH 00/12] rwsem fast-path write lock stealing Michel Lespinasse
2013-03-06 23:21 ` [PATCH 01/12] rwsem: make the waiter type an enumeration rather than a bitmask Michel Lespinasse
2013-03-13 21:33   ` Rik van Riel
2013-03-06 23:21 ` [PATCH 02/12] rwsem: shorter spinlocked section in rwsem_down_failed_common() Michel Lespinasse
2013-03-06 23:21 ` [PATCH 03/12] rwsem: move rwsem_down_failed_common code into rwsem_down_{read,write}_failed Michel Lespinasse
2013-03-06 23:21 ` [PATCH 04/12] rwsem: simplify rwsem_down_read_failed Michel Lespinasse
2013-03-06 23:21 ` [PATCH 05/12] rwsem: simplify rwsem_down_write_failed Michel Lespinasse
2013-03-06 23:21 ` [PATCH 06/12] rwsem: more agressive lock stealing in rwsem_down_write_failed Michel Lespinasse
2013-03-06 23:21 ` [PATCH 07/12] rwsem: use cmpxchg for trying to steal write lock Michel Lespinasse
2013-03-06 23:21 ` [PATCH 08/12] rwsem: avoid taking wait_lock in rwsem_down_write_failed Michel Lespinasse
2013-03-06 23:21 ` [PATCH 09/12] rwsem: skip initial trylock " Michel Lespinasse
2013-03-06 23:21 ` [PATCH 10/12] rwsem-spinlock: wake all readers when first waiter is a reader Michel Lespinasse
2013-03-06 23:21 ` [PATCH 11/12] rwsem: " Michel Lespinasse
2013-03-09  0:32   ` Dave Chinner
2013-03-09  1:20     ` Michel Lespinasse
2013-03-11  0:16       ` Dave Chinner
2013-03-11  5:17         ` Michel Lespinasse
2013-03-12  2:36           ` Dave Chinner
2013-03-12  6:43             ` Michel Lespinasse
2013-03-13  3:23               ` Dave Chinner
2013-03-13 11:03                 ` Michel Lespinasse
2013-03-14  2:00                 ` Peter Hurley
2013-03-19  1:17                   ` Dave Chinner [this message]
2013-03-19 23:48                     ` Michel Lespinasse
2013-03-11  7:50         ` Ingo Molnar
2013-03-11 20:36   ` Peter Hurley
2013-03-14  7:03     ` Michel Lespinasse
2013-03-14 11:39       ` Peter Hurley
2013-03-14 15:20         ` Michel Lespinasse
2013-03-06 23:21 ` [PATCH 12/12] x86 rwsem: avoid taking slow path when stealing write lock Michel Lespinasse
