From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>,
Mel Gorman <mgorman@suse.de>,
Andrew Morton <akpm@linux-foundation.org>,
Johannes Weiner <hannes@cmpxchg.org>,
Vlastimil Babka <vbabka@suse.cz>, Jan Kara <jack@suse.cz>,
Michal Hocko <mhocko@suse.cz>, Hugh Dickins <hughd@google.com>,
Dave Hansen <dave.hansen@intel.com>,
Linux Kernel <linux-kernel@vger.kernel.org>,
Linux-MM <linux-mm@kvack.org>,
Linux-FSDevel <linux-fsdevel@vger.kernel.org>,
Linus Torvalds <torvalds@linux-foundation.org>,
David Howells <dhowells@redhat.com>
Subject: Re: [PATCH 19/19] mm: filemap: Avoid unnecessary barries and waitqueue lookups in unlock_page fastpath
Date: Tue, 13 May 2014 11:52:50 -0700 [thread overview]
Message-ID: <20140513185250.GM18164@linux.vnet.ibm.com> (raw)
In-Reply-To: <20140513181852.GB12123@redhat.com>
On Tue, May 13, 2014 at 08:18:52PM +0200, Oleg Nesterov wrote:
> On 05/13, Paul E. McKenney wrote:
> >
> > On Tue, May 13, 2014 at 04:17:48PM +0200, Peter Zijlstra wrote:
> > >
> > > diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
> > > index 46412bded104..dae5158c2382 100644
> > > --- a/Documentation/memory-barriers.txt
> > > +++ b/Documentation/memory-barriers.txt
> > > @@ -1881,9 +1881,9 @@ The whole sequence above is available in various canned forms, all of which
> > > event_indicated = 1;
> > > wake_up_process(event_daemon);
> > >
> > > -A write memory barrier is implied by wake_up() and co. if and only if they wake
> > > -something up. The barrier occurs before the task state is cleared, and so sits
> > > -between the STORE to indicate the event and the STORE to set TASK_RUNNING:
> > > +A full memory barrier is implied by wake_up() and co. The barrier occurs
> >
> > Last I checked, the memory barrier was guaranteed
>
> I have to admit, I am confused. I simply do not understand what "memory
> barrier" actually means in this discussion.
>
> To me, wake_up/ttwu should only guarantee one thing: all the preceding
> STORE's should be serialized with all the subsequent manipulations with
> task->state (even with LOAD(task->state)).
I was thinking in terms of "everything done before the wake_up() is
visible after the wait_event*() returns" -- but only if the task doing
the wait_event*() actually sleeps and is awakened by that particular
wake_up().
Admittedly a bit of a weak guarantee!
> > If there is a sleep-wakeup race, for example,
> > between wait_event_interruptible() and wake_up(), then it looks to me
> > that the following can happen:
> >
> > o Task A invokes wait_event_interruptible(), waiting for
> > X==1.
> >
> > o Before Task A gets anywhere, Task B sets Y=1, does
> > smp_mb(), then sets X=1.
> >
> > o Task B invokes wake_up(), which invokes __wake_up(), which
> > acquires the wait_queue_head_t's lock and invokes
> > __wake_up_common(), which sees nothing to wake up.
> >
> > o Task A tests the condition, finds X==1, and returns without
> > locks, memory barriers, atomic instructions, or anything else
> > that would guarantee ordering.
> >
> > o Task A then loads from Y. Because there have been no memory
> > barriers, it might well see Y==0.
>
> Sure, but I can't understand "Because there have been no memory barriers".
>
> IOW. Suppose we add mb() into wake_up(). The same can happen anyway?
If the mb() is placed just after the fastpath condition check, then the
awakened task will be guaranteed to see Y=1. Either that memory barrier
or the wait_queue_head_t's lock will guarantee the serialization, I think,
anyway.
> And "if a wakeup actually occurred" is not clear to me too in this context.
> For example, suppose that ttwu() clears task->state but that task was not
> deactivated and it is going to check the condition, do we count this as
> "wakeup actually occurred" ? In this case that task still can see Y==0.
I was thinking in terms of the task doing the wait_event*() actually
entering the scheduler.
> > On the other hand, if a wake_up() really does happen, then
> > the fast-path out of wait_event_interruptible() is not taken,
> > and __wait_event_interruptible() is called instead. This calls
> > ___wait_event(), which eventually calls prepare_to_wait_event(), which
> > in turn calls set_current_state(), which calls set_mb(), which does a
> > full memory barrier.
>
> Can't understand this part too... OK, and suppose that right after that
> the task B from the scenario above does
>
> Y = 1;
> mb();
> X = 1;
> wake_up();
>
> After that task A checks the condition, sees X==1, and returns from
> wait_event() without spin_lock(wait_queue_head_t->lock) (if it also
> sees list_empty_careful() == T). Then it can see Y==0 again?
Yes. You need the barriers to be paired, and in this case, Task A isn't
executing a memory barrier. Yes, the mb() has forced Task B's CPU to
commit the writes in order (or at least pretend to), but Task A might
have speculated the read of Y.
Or am I missing your point?
> > A read and a write memory barrier (-not- a full memory barrier)
> > are implied by wake_up() and co. if and only if they wake
> > something up.
>
> Now this looks as if you document that, say,
>
> X = 1;
> wake_up();
> Y = 1;
>
> doesn't need wmb() before "Y = 1" if wake_up() wakes something up. Do we
> really want to document this? Is it fine to rely on this guarantee?
That is an excellent question. It would not be hard to argue that we
should either make the guarantee unconditional by adding smp_mb() to
the wait_event*() paths or alternatively just saying that there isn't
a guarantee to begin with.
Thoughts?
> > The write barrier occurs before the task state is
> > cleared, and so sits between the STORE to indicate the event and
> > the STORE to set TASK_RUNNING, and the read barrier after that:
>
> Plus: between the STORE to indicate the event and the LOAD which checks
> task->state, otherwise:
>
> >	CPU 1				CPU 2
> >	===============================	===============================
> >	set_current_state();		STORE event_indicated
> >	  set_mb();			wake_up();
> >	    STORE current->state	  <write barrier>
> >	    <general barrier>		  STORE current->state
> >	LOAD event_indicated		  <read barrier>
>
> this code is still racy.
Yeah, it is missing some key components. That said, we should figure
out exactly what we want to guarantee before I try to fix it. ;-)
> In short: I am totally confused and most probably misunderstood you ;)
Oleg, if it confuses you, it is in desperate need of help! ;-)
Thanx, Paul