From: Mel Gorman <mgorman@techsingularity.net>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
Andrew Morton <akpm@linux-foundation.org>,
"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
Johannes Weiner <hannes@cmpxchg.org>, Jan Kara <jack@suse.cz>,
Nicholas Piggin <npiggin@gmail.com>,
Rik van Riel <riel@redhat.com>, linux-mm <linux-mm@kvack.org>
Subject: Re: page_waitqueue() considered harmful
Date: Mon, 3 Oct 2016 11:47:43 +0100 [thread overview]
Message-ID: <20161003104743.GD3903@techsingularity.net> (raw)
In-Reply-To: <20160929130827.GX5016@twins.programming.kicks-ass.net>
On Thu, Sep 29, 2016 at 03:08:27PM +0200, Peter Zijlstra wrote:
> > is not racy (the add_wait_queue() will now already guarantee that
> > nobody else clears the bit).
> >
> > Hmm?
>
> Yes. I got my brain in a complete twist, but you're right, that is
> indeed required.
>
> Here's a new version with hopefully clearer comments.
>
> Same caveat about 32bit, naming etc..
>
I was able to run this with basic workloads over the weekend on small
UMA machines. Both machines behaved similarly so I'm only reporting results
from one of them, a single-socket Skylake machine. NUMA machines rarely show
anything much more interesting for these types of workload but, as always,
the full impact is machine and workload dependent. Generally, I expect this
type of patch to have a marginal but detectable impact.
This is a workload doing parallel dd of files large enough to trigger
reclaim, which locks/unlocks pages:
paralleldd
4.8.0-rc8 4.8.0-rc8
vanilla waitqueue-v1r2
Amean Elapsd-1 215.05 ( 0.00%) 214.53 ( 0.24%)
Amean Elapsd-3 214.72 ( 0.00%) 214.42 ( 0.14%)
Amean Elapsd-5 215.29 ( 0.00%) 214.88 ( 0.19%)
Amean Elapsd-7 215.75 ( 0.00%) 214.79 ( 0.44%)
Amean Elapsd-8 214.96 ( 0.00%) 215.21 ( -0.12%)
That's basically within the noise. CPU usage overall looks like
4.8.0-rc8 4.8.0-rc8
vanilla waitqueue-v1r2
User 3409.66 3421.72
System 18298.66 18251.99
Elapsed 7178.82 7181.14
Marginal decrease in system CPU usage. Profiles showed the vanilla
kernel spending less than 0.1% of its time in unlock_page; the patch
eliminates even that.
These are some microbenchmarks from the vm-scalability suite. They are
similar to dd in that they trigger reclaim from a single thread:
vmscale
4.8.0-rc8 4.8.0-rc8
vanilla waitqueue-v1r2
Ops lru-file-mmap-read-elapsed 19.50 ( 0.00%) 19.43 ( 0.36%)
Ops lru-file-readonce-elapsed 12.44 ( 0.00%) 12.29 ( 1.21%)
Ops lru-file-readtwice-elapsed 22.27 ( 0.00%) 22.19 ( 0.36%)
Ops lru-memcg-elapsed 12.18 ( 0.00%) 12.00 ( 1.48%)
4.8.0-rc8 4.8.0-rc8
vanilla waitqueue-v1r2
User 50.54 50.88
System 398.72 388.81
Elapsed 69.48 68.99
Again, the differences are marginal but detectable. I accidentally did not
collect profile data, but I have no reason to believe it's significantly
different to the dd case.
This is "gitsource" from mmtests: a checkout of the git source
tree and a run of "make test", which is where Linus first noticed the
problem. The metric here is time-based; I don't actually check the
results of the regression suite.
gitsource
4.8.0-rc8 4.8.0-rc8
vanilla waitqueue-v1r2
User min 192.28 ( 0.00%) 192.49 ( -0.11%)
User mean 193.55 ( 0.00%) 194.88 ( -0.69%)
User stddev 1.52 ( 0.00%) 2.39 (-57.58%)
User coeffvar 0.79 ( 0.00%) 1.23 (-56.51%)
User max 196.34 ( 0.00%) 199.06 ( -1.39%)
System min 122.70 ( 0.00%) 118.69 ( 3.27%)
System mean 123.87 ( 0.00%) 120.68 ( 2.57%)
System stddev 0.84 ( 0.00%) 1.65 (-97.67%)
System coeffvar 0.67 ( 0.00%) 1.37 (-102.89%)
System max 124.95 ( 0.00%) 123.14 ( 1.45%)
Elapsed min 718.09 ( 0.00%) 711.48 ( 0.92%)
Elapsed mean 724.23 ( 0.00%) 716.52 ( 1.07%)
Elapsed stddev 4.20 ( 0.00%) 4.84 (-15.42%)
Elapsed coeffvar 0.58 ( 0.00%) 0.68 (-16.66%)
Elapsed max 730.51 ( 0.00%) 724.45 ( 0.83%)
4.8.0-rc8 4.8.0-rc8
vanilla waitqueue-v1r2
User 2730.60 2808.48
System 2184.85 2108.68
Elapsed 9938.01 9929.56
Overall, it's showing a drop in system CPU usage as expected. The detailed
results show a drop of 2.57% in system CPU usage running the benchmark
itself and 3.48% overall, which measures everything and not just "make
test". The drop in elapsed time is marginal but measurable.
It may raise an eyebrow that the overall elapsed time doesn't match the
detailed results. The detailed results report 5 iterations of "make test"
without profiling enabled, which takes about an hour. The way I
configured it, the profiled run happened immediately afterwards; it's much
slower and also has to compile git itself, which takes a few minutes.
This is the top lock/unlock activity in the vanilla kernel
0.80% git [kernel.vmlinux] [k] unlock_page
0.28% sh [kernel.vmlinux] [k] unlock_page
0.20% git-rebase [kernel.vmlinux] [k] unlock_page
0.13% git [kernel.vmlinux] [k] lock_page_memcg
0.10% git [kernel.vmlinux] [k] unlock_page_memcg
0.07% git-submodule [kernel.vmlinux] [k] unlock_page
0.04% sh [kernel.vmlinux] [k] lock_page_memcg
0.03% git-rebase [kernel.vmlinux] [k] lock_page_memcg
0.03% sh [kernel.vmlinux] [k] unlock_page_memcg
0.03% sed [kernel.vmlinux] [k] unlock_page
0.03% perf [kernel.vmlinux] [k] unlock_page
0.02% git-rebase [kernel.vmlinux] [k] unlock_page_memcg
0.02% rm [kernel.vmlinux] [k] unlock_page
0.02% git-stash [kernel.vmlinux] [k] unlock_page
0.02% git-bisect [kernel.vmlinux] [k] unlock_page
0.02% diff [kernel.vmlinux] [k] unlock_page
0.02% cat [kernel.vmlinux] [k] unlock_page
0.02% wc [kernel.vmlinux] [k] unlock_page
0.01% mv [kernel.vmlinux] [k] unlock_page
0.01% git-submodule [kernel.vmlinux] [k] lock_page_memcg
This is with the patch applied
0.49% git [kernel.vmlinux] [k] unlock_page
0.14% sh [kernel.vmlinux] [k] unlock_page
0.13% git [kernel.vmlinux] [k] lock_page_memcg
0.11% git-rebase [kernel.vmlinux] [k] unlock_page
0.10% git [kernel.vmlinux] [k] unlock_page_memcg
0.04% sh [kernel.vmlinux] [k] lock_page_memcg
0.04% git-submodule [kernel.vmlinux] [k] unlock_page
0.03% sh [kernel.vmlinux] [k] unlock_page_memcg
0.03% git-rebase [kernel.vmlinux] [k] lock_page_memcg
0.02% git-rebase [kernel.vmlinux] [k] unlock_page_memcg
0.02% sed [kernel.vmlinux] [k] unlock_page
0.01% rm [kernel.vmlinux] [k] unlock_page
0.01% git-stash [kernel.vmlinux] [k] unlock_page
0.01% git-submodule [kernel.vmlinux] [k] lock_page_memcg
0.01% git-bisect [kernel.vmlinux] [k] unlock_page
0.01% diff [kernel.vmlinux] [k] unlock_page
0.01% cat [kernel.vmlinux] [k] unlock_page
0.01% wc [kernel.vmlinux] [k] unlock_page
0.01% git-submodule [kernel.vmlinux] [k] unlock_page_memcg
0.01% mv [kernel.vmlinux] [k] unlock_page
The drop in time spent by git in unlock_page is noticeable. I did not
drill down into the annotated profile, but this roughly matches what I
measured before when avoiding page_waitqueue lookups.
The full profile is not exactly great, but I didn't see anything in there
I haven't seen before. The top entries with the patch applied look like
this:
7.44% swapper [kernel.vmlinux] [k] intel_idle
1.25% git [kernel.vmlinux] [k] filemap_map_pages
1.08% git [kernel.vmlinux] [k] native_irq_return_iret
0.79% git [kernel.vmlinux] [k] unmap_page_range
0.56% git [kernel.vmlinux] [k] release_pages
0.51% git [kernel.vmlinux] [k] handle_mm_fault
0.49% git [kernel.vmlinux] [k] unlock_page
0.46% git [kernel.vmlinux] [k] page_remove_rmap
0.46% git [kernel.vmlinux] [k] _raw_spin_lock
0.42% git [kernel.vmlinux] [k] clear_page_c_e
Lot of map/unmap activity like you'd expect and release_pages being a pig
as usual.
Overall, this patch shows similar behaviour to my own patch from 2014.
There is a definite benefit but it's marginal. The big difference is
that this patch is a lot simpler than the 2014 version and may meet less
resistance as a result.
--
Mel Gorman
SUSE Labs
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .