From: Oleg Nesterov <oleg@redhat.com>
To: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>,
Mel Gorman <mgorman@suse.de>,
Andrew Morton <akpm@linux-foundation.org>,
Johannes Weiner <hannes@cmpxchg.org>,
Vlastimil Babka <vbabka@suse.cz>, Jan Kara <jack@suse.cz>,
Michal Hocko <mhocko@suse.cz>, Hugh Dickins <hughd@google.com>,
Dave Hansen <dave.hansen@intel.com>,
Linux Kernel <linux-kernel@vger.kernel.org>,
Linux-MM <linux-mm@kvack.org>,
Linux-FSDevel <linux-fsdevel@vger.kernel.org>,
Linus Torvalds <torvalds@linux-foundation.org>,
David Howells <dhowells@redhat.com>
Subject: Re: [PATCH 19/19] mm: filemap: Avoid unnecessary barries and waitqueue lookups in unlock_page fastpath
Date: Tue, 13 May 2014 21:31:46 +0200
Message-ID: <20140513193146.GA17051@redhat.com>
In-Reply-To: <20140513185250.GM18164@linux.vnet.ibm.com>

On 05/13, Paul E. McKenney wrote:
>
> On Tue, May 13, 2014 at 08:18:52PM +0200, Oleg Nesterov wrote:
> >
> > I have to admit, I am confused. I simply do not understand what "memory
> > barrier" actually means in this discussion.
> >
> > To me, wake_up/ttwu should only guarantee one thing: all the preceding
> > STORE's should be serialized with all the subsequent manipulations with
> > task->state (even with LOAD(task->state)).
>
> I was thinking in terms of "everything done before the wake_up() is
> visible after the wait_event*() returns" -- but only if the task doing
> the wait_event*() actually sleeps and is awakened by that particular
> wake_up().

Hmm. The question is: visible to whom? ;) To the woken task? Yes, sure,
and that is simply because both the sleeper and the waker take rq->lock.

> > > If there is a sleep-wakeup race, for example,
> > > between wait_event_interruptible() and wake_up(), then it looks to me
> > > that the following can happen:
> > >
> > >   o  Task A invokes wait_event_interruptible(), waiting for
> > >      X==1.
> > >
> > >   o  Before Task A gets anywhere, Task B sets Y=1, does
> > >      smp_mb(), then sets X=1.
> > >
> > >   o  Task B invokes wake_up(), which invokes __wake_up(), which
> > >      acquires the wait_queue_head_t's lock and invokes
> > >      __wake_up_common(), which sees nothing to wake up.
> > >
> > >   o  Task A tests the condition, finds X==1, and returns without
> > >      locks, memory barriers, atomic instructions, or anything else
> > >      that would guarantee ordering.
> > >
> > >   o  Task A then loads from Y. Because there have been no memory
> > >      barriers, it might well see Y==0.
> >
> > Sure, but I can't understand "Because there have been no memory barriers".
> >
> > IOW. Suppose we add mb() into wake_up(). The same can happen anyway?
>
> If the mb() is placed just after the fastpath condition check, then the
> awakened task will be guaranteed to see Y=1.

Of course. My point was that this has nothing to do with the barriers
provided by wake_up(); that is why I was confused.

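The interleaving quoted above can be modelled in userspace C11. This is
only a sketch under stated assumptions: struct toy_waitqueue,
toy_wake_up() and toy_wait_event() are hypothetical stand-ins, not the
kernel's wait-queue code, and atomic_thread_fence(memory_order_seq_cst)
is assumed to play the role of smp_mb(). It shows only the control flow:
wake_up() takes the queue lock but finds nobody to wake, and the waiter
later leaves through the fast path without touching the lock at all.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for wait_queue_head_t; not the kernel structure. */
struct toy_waitqueue {
	int nr_waiters;		/* tasks currently queued */
	int lock_acquires;	/* how often the queue lock was taken */
};

static atomic_int X, Y;		/* the two variables from the scenario */

/* Task B: Y = 1; smp_mb(); X = 1; */
static void task_b_publish(void)
{
	atomic_store_explicit(&Y, 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);	/* models smp_mb() */
	atomic_store_explicit(&X, 1, memory_order_relaxed);
}

/* Models wake_up(): takes the queue lock and wakes whoever is queued. */
static int toy_wake_up(struct toy_waitqueue *wq)
{
	assert(wq != NULL);
	wq->lock_acquires++;
	int woken = wq->nr_waiters;
	wq->nr_waiters = 0;
	return woken;
}

/* Models wait_event*(): returns true if the fast path was taken. */
static bool toy_wait_event(struct toy_waitqueue *wq)
{
	assert(wq != NULL);
	if (atomic_load_explicit(&X, memory_order_relaxed) == 1)
		return true;	/* fast path: no lock, no barrier */
	wq->lock_acquires++;	/* slow path: queue under the lock */
	wq->nr_waiters++;
	return false;
}
```

Calling task_b_publish(), then toy_wake_up(), then toy_wait_event() on a
zeroed queue reproduces the quoted sequence: nothing is woken, and
lock_acquires stays at 1 because only the waker ever took the lock.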
> > > On the other hand, if a wake_up() really does happen, then
> > > the fast-path out of wait_event_interruptible() is not taken,
> > > and __wait_event_interruptible() is called instead. This calls
> > > ___wait_event(), which eventually calls prepare_to_wait_event(), which
> > > in turn calls set_current_state(), which calls set_mb(), which does a
> > > full memory barrier.
> >
> > Can't understand this part too... OK, and suppose that right after that
> > the task B from the scenario above does
> >
> > 	Y = 1;
> > 	mb();
> > 	X = 1;
> > 	wake_up();
> >
> > After that task A checks the condition, sees X==1, and returns from
> > wait_event() without spin_lock(wait_queue_head_t->lock) (if it also
> > sees list_empty_careful() == T). Then it can see Y==0 again?
>
> Yes. You need the barriers to be paired, and in this case, Task A isn't
> executing a memory barrier. Yes, the mb() has forced Task B's CPU to
> commit the writes in order (or at least pretend to), but Task A might
> have speculated the read to Y.
>
> Or am I missing your point?

I only meant that this case doesn't really differ from the scenario you
described above.

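The pairing requirement quoted above can be written down with C11 fences.
Again this is a sketch, not kernel code: writer(), reader_unpaired() and
reader_paired() are hypothetical names, the seq_cst fence is assumed to
stand in for mb(), and the acquire fence is assumed to behave roughly
like smp_rmb() on the reading side.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

static atomic_int X, Y;

/* Task B: the fence orders the store to Y before the store to X. */
static void writer(void)
{
	atomic_store_explicit(&Y, 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);	/* models mb() */
	atomic_store_explicit(&X, 1, memory_order_relaxed);
}

/*
 * Task A without a barrier: nothing orders the two loads, so when this
 * runs concurrently with writer(), x == 1 with *y_seen == 0 is allowed.
 */
static bool reader_unpaired(int *y_seen)
{
	int x = atomic_load_explicit(&X, memory_order_relaxed);
	*y_seen = atomic_load_explicit(&Y, memory_order_relaxed);
	return x == 1;
}

/*
 * Task A with a fence that pairs with the writer's fence: once the load
 * of X returns 1, the later load of Y must return 1 as well.
 */
static bool reader_paired(int *y_seen)
{
	int x = atomic_load_explicit(&X, memory_order_relaxed);
	atomic_thread_fence(memory_order_acquire);	/* pairs with writer() */
	*y_seen = atomic_load_explicit(&Y, memory_order_relaxed);
	assert(x != 1 || *y_seen == 1);
	return x == 1;
}
```

The only difference between the two readers is the fence between the
loads. The writer's barrier alone does not constrain the order in which
the reader's CPU performs its loads.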
> > > A read and a write memory barrier (-not- a full memory barrier)
> > > are implied by wake_up() and co. if and only if they wake
> > > something up.
> >
> > Now this looks as if you document that, say,
> >
> > 	X = 1;
> > 	wake_up();
> > 	Y = 1;
> >
> > doesn't need wmb() before "Y = 1" if wake_up() wakes something up. Do we
> > really want to document this? Is it fine to rely on this guarantee?
>
> That is an excellent question. It would not be hard to argue that we
> should either make the guarantee unconditional by adding smp_mb() to
> the wait_event*() paths or alternatively just saying that there isn't
> a guarantee to begin with.

I'd vote for "no guarantees".

> > In short: I am totally confused and most probably misunderstood you ;)
>
> Oleg, if it confuses you, it is in desperate need of help! ;-)

Thanks, this helped ;)

Oleg.