linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Mel Gorman <mgorman@suse.de>,
	Andrew Morton <akpm@linux-foundation.org>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Vlastimil Babka <vbabka@suse.cz>, Jan Kara <jack@suse.cz>,
	Michal Hocko <mhocko@suse.cz>, Hugh Dickins <hughd@google.com>,
	Dave Hansen <dave.hansen@intel.com>,
	Linux Kernel <linux-kernel@vger.kernel.org>,
	Linux-MM <linux-mm@kvack.org>,
	Linux-FSDevel <linux-fsdevel@vger.kernel.org>,
	Oleg Nesterov <oleg@redhat.com>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	David Howells <dhowells@redhat.com>
Subject: Re: [PATCH 19/19] mm: filemap: Avoid unnecessary barries and waitqueue lookups in unlock_page fastpath
Date: Tue, 13 May 2014 08:27:19 -0700	[thread overview]
Message-ID: <20140513152719.GF18164@linux.vnet.ibm.com> (raw)
In-Reply-To: <20140513141748.GD2485@laptop.programming.kicks-ass.net>

On Tue, May 13, 2014 at 04:17:48PM +0200, Peter Zijlstra wrote:
> On Tue, May 13, 2014 at 01:53:13PM +0100, Mel Gorman wrote:
> > On Tue, May 13, 2014 at 10:45:50AM +0100, Mel Gorman wrote:
> > >  void unlock_page(struct page *page)
> > >  {
> > > +	wait_queue_head_t *wqh = clear_page_waiters(page);
> > > +
> > >  	VM_BUG_ON_PAGE(!PageLocked(page), page);
> > > +
> > > +	/*
> > > +	 * No additional barrier needed due to clear_bit_unlock barriering all updates
> > > +	 * before waking waiters
> > > +	 */
> > >  	clear_bit_unlock(PG_locked, &page->flags);
> > > -	smp_mb__after_clear_bit();
> > > -	wake_up_page(page, PG_locked);
> > 
> > This is wrong. The smp_mb__after_clear_bit() is still required to ensure
> > that the cleared bit is visible before the wakeup on all architectures.
> 
> wakeup implies a mb, and I just noticed that our Documentation is
> 'obsolete' and only mentions it implies a wmb.
> 
> Also, if you're going to use smp_mb__after_atomic() you can use
> clear_bit() and not use clear_bit_unlock().
> 
> 
> 
> ---
> Subject: doc: Update wakeup barrier documentation
> 
> As per commit e0acd0a68ec7 ("sched: fix the theoretical signal_wake_up()
> vs schedule() race") both wakeup and schedule now imply a full barrier.
> 
> Furthermore, the barrier is unconditional when calling try_to_wake_up()
> and has been for a fair while.
> 
> Cc: Oleg Nesterov <oleg@redhat.com>
> Cc: Linus Torvalds <torvalds@linux-foundation.org>
> Cc: David Howells <dhowells@redhat.com>
> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> Signed-off-by: Peter Zijlstra <peterz@infradead.org>

Some questions below.

							Thanx, Paul

> ---
>  Documentation/memory-barriers.txt | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
> index 46412bded104..dae5158c2382 100644
> --- a/Documentation/memory-barriers.txt
> +++ b/Documentation/memory-barriers.txt
> @@ -1881,9 +1881,9 @@ The whole sequence above is available in various canned forms, all of which
>  	event_indicated = 1;
>  	wake_up_process(event_daemon);
> 
> -A write memory barrier is implied by wake_up() and co. if and only if they wake
> -something up.  The barrier occurs before the task state is cleared, and so sits
> -between the STORE to indicate the event and the STORE to set TASK_RUNNING:
> +A full memory barrier is implied by wake_up() and co. The barrier occurs

Last I checked, the memory barrier was guaranteed only if a wakeup
actually occurred.  If there is a sleep-wakeup race, for example,
between wait_event_interruptible() and wake_up(), then it looks to me
that the following can happen:

o	Task A invokes wait_event_interruptible(), waiting for
	X==1.

o	Before Task A gets anywhere, Task B sets Y=1, does
	smp_mb(), then sets X=1.

o	Task B invokes wake_up(), which invokes __wake_up(), which
	acquires the wait_queue_head_t's lock and invokes
	__wake_up_common(), which sees nothing to wake up.

o	Task A tests the condition, finds X==1, and returns without
	locks, memory barriers, atomic instructions, or anything else
	that would guarantee ordering.

o	Task A then loads from Y.  Because there have been no memory
	barriers, it might well see Y==0.

So what am I missing here?

On the other hand, if a wake_up() really does happen, then
the fast-path out of wait_event_interruptible() is not taken,
and __wait_event_interruptible() is called instead.  This calls
___wait_event(), which eventually calls prepare_to_wait_event(), which
in turn calls set_current_state(), which calls set_mb(), which does a
full memory barrier.  And if that isn't good enough, there is the
call to schedule() itself.  ;-)

So if a wait actually sleeps, it does imply a full memory barrier
several times over.

On the wake_up() side, wake_up() calls __wake_up(), which as mentioned
earlier calls __wake_up_common() under a lock.  This invokes the
wake-up function stored by the sleeping task, for example,
autoremove_wake_function(), which calls default_wake_function(),
which invokes try_to_wake_up(), which does smp_mb__before_spinlock()
before acquiring the to-be-waked task's PI lock.

The definition of smp_mb__before_spinlock() is smp_wmb().  There is
also an smp_rmb() in try_to_wake_up(), which still does not get us
to a full memory barrier.  It also calls select_task_rq(), which
does not seem to guarantee any particular memory ordering (but
I could easily have missed something).  It also calls ttwu_queue(),
which invokes ttwu_do_activate() under the RQ lock.  I don't see a
full memory barrier in ttwu_do_activate(), but again could easily
have missed one.  Ditto for ttwu_stat().

All the locks nest, so other than the smp_wmb() and smp_rmb(), things
could bleed in.

> +before the task state is cleared, and so sits between the STORE to indicate
> +the event and the STORE to set TASK_RUNNING:

If I am in fact correct, and if we really want to advertise the read
memory barrier, I suggest the following replacement text:

	A read and a write memory barrier (-not- a full memory barrier)
	are implied by wake_up() and co. if and only if they wake
	something up.  The write barrier occurs before the task state is
	cleared, and so sits between the STORE to indicate the event and
	the STORE to set TASK_RUNNING, and the read barrier after that:

	CPU 1				CPU 2
	===============================	===============================
	set_current_state();		STORE event_indicated
	  set_mb();			wake_up();
	    STORE current->state	  <write barrier>
	    <general barrier>		  STORE current->state
	LOAD event_indicated		  <read barrier>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2014-05-13 15:27 UTC|newest]

Thread overview: 103+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-05-13  9:45 [PATCH 00/19] Misc page alloc, shmem, mark_page_accessed and page_waitqueue optimisations v3r33 Mel Gorman
2014-05-13  9:45 ` [PATCH 01/19] mm: page_alloc: Do not update zlc unless the zlc is active Mel Gorman
2014-05-13  9:45 ` [PATCH 02/19] mm: page_alloc: Do not treat a zone that cannot be used for dirty pages as "full" Mel Gorman
2014-05-13  9:45 ` [PATCH 03/19] jump_label: Expose the reference count Mel Gorman
2014-05-13  9:45 ` [PATCH 04/19] mm: page_alloc: Use jump labels to avoid checking number_of_cpusets Mel Gorman
2014-05-13 10:58   ` Peter Zijlstra
2014-05-13 12:28     ` Mel Gorman
2014-05-13  9:45 ` [PATCH 05/19] mm: page_alloc: Calculate classzone_idx once from the zonelist ref Mel Gorman
2014-05-13 22:25   ` Andrew Morton
2014-05-14  6:32     ` Mel Gorman
2014-05-14 20:29     ` Mel Gorman
2014-05-13  9:45 ` [PATCH 06/19] mm: page_alloc: Only check the zone id check if pages are buddies Mel Gorman
2014-05-13  9:45 ` [PATCH 07/19] mm: page_alloc: Only check the alloc flags and gfp_mask for dirty once Mel Gorman
2014-05-13  9:45 ` [PATCH 08/19] mm: page_alloc: Take the ALLOC_NO_WATERMARK check out of the fast path Mel Gorman
2014-05-13  9:45 ` [PATCH 09/19] mm: page_alloc: Use word-based accesses for get/set pageblock bitmaps Mel Gorman
2014-05-22  9:24   ` Vlastimil Babka
2014-05-22 18:23     ` Andrew Morton
2014-05-22 18:45       ` Vlastimil Babka
2014-05-13  9:45 ` [PATCH 10/19] mm: page_alloc: Reduce number of times page_to_pfn is called Mel Gorman
2014-05-13 13:27   ` Vlastimil Babka
2014-05-13 14:09     ` Mel Gorman
2014-05-13  9:45 ` [PATCH 11/19] mm: page_alloc: Lookup pageblock migratetype with IRQs enabled during free Mel Gorman
2014-05-13 13:36   ` Vlastimil Babka
2014-05-13 14:23     ` Mel Gorman
2014-05-13  9:45 ` [PATCH 12/19] mm: page_alloc: Use unsigned int for order in more places Mel Gorman
2014-05-13  9:45 ` [PATCH 13/19] mm: page_alloc: Convert hot/cold parameter and immediate callers to bool Mel Gorman
2014-05-13  9:45 ` [PATCH 14/19] mm: shmem: Avoid atomic operation during shmem_getpage_gfp Mel Gorman
2014-05-13  9:45 ` [PATCH 15/19] mm: Do not use atomic operations when releasing pages Mel Gorman
2014-05-13  9:45 ` [PATCH 16/19] mm: Do not use unnecessary atomic operations when adding pages to the LRU Mel Gorman
2014-05-13  9:45 ` [PATCH 17/19] fs: buffer: Do not use unnecessary atomic operations when discarding buffers Mel Gorman
2014-05-13 11:09   ` Peter Zijlstra
2014-05-13 12:50     ` Mel Gorman
2014-05-13 13:49       ` Jan Kara
2014-05-13 14:30         ` Mel Gorman
2014-05-13 14:01       ` Peter Zijlstra
2014-05-13 14:46         ` Mel Gorman
2014-05-13 13:50   ` Jan Kara
2014-05-13 22:29   ` Andrew Morton
2014-05-14  6:12     ` Mel Gorman
2014-05-13  9:45 ` [PATCH 18/19] mm: Non-atomically mark page accessed during page cache allocation where possible Mel Gorman
2014-05-13 14:29   ` Theodore Ts'o
2014-05-20 15:49   ` [PATCH] mm: non-atomically mark page accessed during page cache allocation where possible -fix Mel Gorman
2014-05-20 19:34     ` Andrew Morton
2014-05-21 12:09       ` Mel Gorman
2014-05-21 22:11         ` Andrew Morton
2014-05-22  0:07           ` Mel Gorman
2014-05-22  5:35       ` Prabhakar Lad
2014-05-13  9:45 ` [PATCH 19/19] mm: filemap: Avoid unnecessary barries and waitqueue lookups in unlock_page fastpath Mel Gorman
2014-05-13 12:53   ` Mel Gorman
2014-05-13 14:17     ` Peter Zijlstra
2014-05-13 15:27       ` Paul E. McKenney [this message]
2014-05-13 15:44         ` Peter Zijlstra
2014-05-13 16:14           ` Paul E. McKenney
2014-05-13 18:57             ` Oleg Nesterov
2014-05-13 20:24               ` Paul E. McKenney
2014-05-14 14:25                 ` Oleg Nesterov
2014-05-13 18:22           ` Oleg Nesterov
2014-05-13 18:18         ` Oleg Nesterov
2014-05-13 18:24           ` Peter Zijlstra
2014-05-13 18:52           ` Paul E. McKenney
2014-05-13 19:31             ` Oleg Nesterov
2014-05-13 20:32               ` Paul E. McKenney
2014-05-14 16:11       ` Oleg Nesterov
2014-05-14 16:17         ` Peter Zijlstra
2014-05-16 13:51           ` [PATCH 0/1] ptrace: task_clear_jobctl_trapping()->wake_up_bit() needs mb() Oleg Nesterov
2014-05-16 13:51             ` [PATCH 1/1] " Oleg Nesterov
2014-05-21  9:29               ` Peter Zijlstra
2014-05-21 19:19                 ` Andrew Morton
2014-05-21 19:18             ` [PATCH 0/1] " Andrew Morton
2014-05-14 19:29         ` [PATCH 19/19] mm: filemap: Avoid unnecessary barries and waitqueue lookups in unlock_page fastpath Oleg Nesterov
2014-05-14 20:53           ` Mel Gorman
2014-05-15 10:48           ` [PATCH] mm: filemap: Avoid unnecessary barries and waitqueue lookups in unlock_page fastpath v4 Mel Gorman
2014-05-15 13:20             ` Peter Zijlstra
2014-05-15 13:29               ` Peter Zijlstra
2014-05-15 15:34               ` Oleg Nesterov
2014-05-15 15:45                 ` Peter Zijlstra
2014-05-15 16:18               ` Mel Gorman
2014-05-15 15:03             ` Oleg Nesterov
2014-05-15 21:24             ` Andrew Morton
2014-05-21 12:15               ` [PATCH] mm: filemap: Avoid unnecessary barries and waitqueue lookups in unlock_page fastpath v5 Mel Gorman
2014-05-21 13:02                 ` Peter Zijlstra
2014-05-21 15:33                   ` Mel Gorman
2014-05-21 16:08                     ` Peter Zijlstra
2014-05-21 21:26                 ` Andrew Morton
2014-05-21 21:33                   ` Peter Zijlstra
2014-05-21 21:50                     ` Andrew Morton
2014-05-22  0:07                       ` Mel Gorman
2014-05-22  7:20                         ` Peter Zijlstra
2014-05-22 10:40                           ` [PATCH] mm: filemap: Avoid unnecessary barriers and waitqueue lookups in unlock_page fastpath v7 Mel Gorman
2014-05-22 10:56                             ` Peter Zijlstra
2014-05-22 13:00                               ` Mel Gorman
2014-05-22 14:40                               ` Mel Gorman
2014-05-22 15:04                                 ` Peter Zijlstra
2014-05-22 15:36                                   ` Mel Gorman
2014-05-22 16:58                                   ` [PATCH] mm: filemap: Avoid unnecessary barriers and waitqueue lookups in unlock_page fastpath v8 Mel Gorman
2014-05-22  6:45                       ` [PATCH] mm: filemap: Avoid unnecessary barries and waitqueue lookups in unlock_page fastpath v5 Peter Zijlstra
2014-05-22  8:46                         ` Mel Gorman
2014-05-22 17:47                           ` Andrew Morton
2014-05-22 19:53                             ` Mel Gorman
2014-05-21 23:35                   ` Mel Gorman
2014-05-13 16:52   ` [PATCH 19/19] mm: filemap: Avoid unnecessary barries and waitqueue lookups in unlock_page fastpath Peter Zijlstra
2014-05-14  7:31     ` Mel Gorman
2014-05-19  8:57 ` [PATCH] mm: Avoid unnecessary atomic operations during end_page_writeback Mel Gorman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20140513152719.GF18164@linux.vnet.ibm.com \
    --to=paulmck@linux.vnet.ibm.com \
    --cc=akpm@linux-foundation.org \
    --cc=dave.hansen@intel.com \
    --cc=dhowells@redhat.com \
    --cc=hannes@cmpxchg.org \
    --cc=hughd@google.com \
    --cc=jack@suse.cz \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mgorman@suse.de \
    --cc=mhocko@suse.cz \
    --cc=oleg@redhat.com \
    --cc=peterz@infradead.org \
    --cc=torvalds@linux-foundation.org \
    --cc=vbabka@suse.cz \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).