From: Mel Gorman <mgorman@techsingularity.net>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
Andrew Morton <akpm@linux-foundation.org>,
"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
Johannes Weiner <hannes@cmpxchg.org>, Jan Kara <jack@suse.cz>,
Nicholas Piggin <npiggin@gmail.com>,
Rik van Riel <riel@redhat.com>, linux-mm <linux-mm@kvack.org>
Subject: Re: page_waitqueue() considered harmful
Date: Mon, 3 Oct 2016 11:47:43 +0100 [thread overview]
Message-ID: <20161003104743.GD3903@techsingularity.net> (raw)
In-Reply-To: <20160929130827.GX5016@twins.programming.kicks-ass.net>
On Thu, Sep 29, 2016 at 03:08:27PM +0200, Peter Zijlstra wrote:
> > is not racy (the add_wait_queue() will now already guarantee that
> > nobody else clears the bit).
> >
> > Hmm?
>
> Yes. I got my brain in a complete twist, but you're right, that is
> indeed required.
>
> Here's a new version with hopefully clearer comments.
>
> Same caveat about 32bit, naming etc..
>
I was able to run this with basic workloads over the weekend on small
UMA machines. Both machines behaved similarly so I'm only reporting results
from one of them, a single-socket Skylake machine. NUMA machines rarely show
anything much more interesting for these types of workload but, as always,
the full impact is machine and workload dependent. Generally, I expect this
type of patch to have a marginal but detectable impact.
This is a workload doing parallel dd of files large enough to trigger
reclaim, which locks/unlocks pages:
paralleldd
4.8.0-rc8 4.8.0-rc8
vanilla waitqueue-v1r2
Amean Elapsd-1 215.05 ( 0.00%) 214.53 ( 0.24%)
Amean Elapsd-3 214.72 ( 0.00%) 214.42 ( 0.14%)
Amean Elapsd-5 215.29 ( 0.00%) 214.88 ( 0.19%)
Amean Elapsd-7 215.75 ( 0.00%) 214.79 ( 0.44%)
Amean Elapsd-8 214.96 ( 0.00%) 215.21 ( -0.12%)
That's basically within the noise. CPU usage overall looks like
4.8.0-rc8 4.8.0-rc8
vanilla waitqueue-v1r2
User 3409.66 3421.72
System 18298.66 18251.99
Elapsed 7178.82 7181.14
Marginal decrease in system CPU usage. Profiles showed the vanilla
kernel spending less than 0.1% of its time in unlock_page; the patch
eliminates even that.
These are some microbenchmarks from the vm-scalability suite. They are
similar to dd in that they trigger reclaim from a single thread:
vmscale
4.8.0-rc8 4.8.0-rc8
vanilla waitqueue-v1r2
Ops lru-file-mmap-read-elapsed 19.50 ( 0.00%) 19.43 ( 0.36%)
Ops lru-file-readonce-elapsed 12.44 ( 0.00%) 12.29 ( 1.21%)
Ops lru-file-readtwice-elapsed 22.27 ( 0.00%) 22.19 ( 0.36%)
Ops lru-memcg-elapsed 12.18 ( 0.00%) 12.00 ( 1.48%)
4.8.0-rc8 4.8.0-rc8
vanilla waitqueue-v1r2
User 50.54 50.88
System 398.72 388.81
Elapsed 69.48 68.99
Again, the differences are marginal but detectable. I accidentally did not
collect profile data, but I have no reason to believe it's significantly
different to the dd case.
This is "gitsource" from mmtests: a checkout of the git source
tree and a run of "make test", which is where Linus first noticed the
problem. The metric here is time-based; I don't actually check the
results of the regression suite.
gitsource
4.8.0-rc8 4.8.0-rc8
vanilla waitqueue-v1r2
User min 192.28 ( 0.00%) 192.49 ( -0.11%)
User mean 193.55 ( 0.00%) 194.88 ( -0.69%)
User stddev 1.52 ( 0.00%) 2.39 (-57.58%)
User coeffvar 0.79 ( 0.00%) 1.23 (-56.51%)
User max 196.34 ( 0.00%) 199.06 ( -1.39%)
System min 122.70 ( 0.00%) 118.69 ( 3.27%)
System mean 123.87 ( 0.00%) 120.68 ( 2.57%)
System stddev 0.84 ( 0.00%) 1.65 (-97.67%)
System coeffvar 0.67 ( 0.00%) 1.37 (-102.89%)
System max 124.95 ( 0.00%) 123.14 ( 1.45%)
Elapsed min 718.09 ( 0.00%) 711.48 ( 0.92%)
Elapsed mean 724.23 ( 0.00%) 716.52 ( 1.07%)
Elapsed stddev 4.20 ( 0.00%) 4.84 (-15.42%)
Elapsed coeffvar 0.58 ( 0.00%) 0.68 (-16.66%)
Elapsed max 730.51 ( 0.00%) 724.45 ( 0.83%)
4.8.0-rc8 4.8.0-rc8
vanilla waitqueue-v1r2
User 2730.60 2808.48
System 2184.85 2108.68
Elapsed 9938.01 9929.56
Overall, it's showing a drop in system CPU usage as expected. The detailed
results show a drop of 2.57% in system CPU usage running the benchmark
itself and 3.48% overall, which measures everything and not just "make
test". The drop in elapsed time is marginal but measurable.
It may raise an eyebrow that the overall elapsed time doesn't match the
detailed results. The detailed results report 5 iterations of "make test"
without profiling enabled, which takes about an hour. The way I
configured it, the profiled run happened immediately afterwards; it's much
slower and also has to compile git itself, which takes a few minutes.
This is the top lock/unlock activity in the vanilla kernel
0.80% git [kernel.vmlinux] [k] unlock_page
0.28% sh [kernel.vmlinux] [k] unlock_page
0.20% git-rebase [kernel.vmlinux] [k] unlock_page
0.13% git [kernel.vmlinux] [k] lock_page_memcg
0.10% git [kernel.vmlinux] [k] unlock_page_memcg
0.07% git-submodule [kernel.vmlinux] [k] unlock_page
0.04% sh [kernel.vmlinux] [k] lock_page_memcg
0.03% git-rebase [kernel.vmlinux] [k] lock_page_memcg
0.03% sh [kernel.vmlinux] [k] unlock_page_memcg
0.03% sed [kernel.vmlinux] [k] unlock_page
0.03% perf [kernel.vmlinux] [k] unlock_page
0.02% git-rebase [kernel.vmlinux] [k] unlock_page_memcg
0.02% rm [kernel.vmlinux] [k] unlock_page
0.02% git-stash [kernel.vmlinux] [k] unlock_page
0.02% git-bisect [kernel.vmlinux] [k] unlock_page
0.02% diff [kernel.vmlinux] [k] unlock_page
0.02% cat [kernel.vmlinux] [k] unlock_page
0.02% wc [kernel.vmlinux] [k] unlock_page
0.01% mv [kernel.vmlinux] [k] unlock_page
0.01% git-submodule [kernel.vmlinux] [k] lock_page_memcg
This is with the patch applied
0.49% git [kernel.vmlinux] [k] unlock_page
0.14% sh [kernel.vmlinux] [k] unlock_page
0.13% git [kernel.vmlinux] [k] lock_page_memcg
0.11% git-rebase [kernel.vmlinux] [k] unlock_page
0.10% git [kernel.vmlinux] [k] unlock_page_memcg
0.04% sh [kernel.vmlinux] [k] lock_page_memcg
0.04% git-submodule [kernel.vmlinux] [k] unlock_page
0.03% sh [kernel.vmlinux] [k] unlock_page_memcg
0.03% git-rebase [kernel.vmlinux] [k] lock_page_memcg
0.02% git-rebase [kernel.vmlinux] [k] unlock_page_memcg
0.02% sed [kernel.vmlinux] [k] unlock_page
0.01% rm [kernel.vmlinux] [k] unlock_page
0.01% git-stash [kernel.vmlinux] [k] unlock_page
0.01% git-submodule [kernel.vmlinux] [k] lock_page_memcg
0.01% git-bisect [kernel.vmlinux] [k] unlock_page
0.01% diff [kernel.vmlinux] [k] unlock_page
0.01% cat [kernel.vmlinux] [k] unlock_page
0.01% wc [kernel.vmlinux] [k] unlock_page
0.01% git-submodule [kernel.vmlinux] [k] unlock_page_memcg
0.01% mv [kernel.vmlinux] [k] unlock_page
The drop in time spent by git in unlock_page is noticeable. I did not
drill down into the annotated profile, but this roughly matches what I
measured before when avoiding page_waitqueue lookups.
The full profile is not exactly great, but I didn't see anything in there
I haven't seen before. The top entries with the patch applied look like
this:
7.44% swapper [kernel.vmlinux] [k] intel_idle
1.25% git [kernel.vmlinux] [k] filemap_map_pages
1.08% git [kernel.vmlinux] [k] native_irq_return_iret
0.79% git [kernel.vmlinux] [k] unmap_page_range
0.56% git [kernel.vmlinux] [k] release_pages
0.51% git [kernel.vmlinux] [k] handle_mm_fault
0.49% git [kernel.vmlinux] [k] unlock_page
0.46% git [kernel.vmlinux] [k] page_remove_rmap
0.46% git [kernel.vmlinux] [k] _raw_spin_lock
0.42% git [kernel.vmlinux] [k] clear_page_c_e
Lot of map/unmap activity like you'd expect and release_pages being a pig
as usual.
Overall, this patch shows similar behaviour to my own patch from 2014.
There is a definite benefit but it's marginal. The big difference is
that this patch is a lot simpler than the 2014 version and may meet less
resistance as a result.
--
Mel Gorman
SUSE Labs
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .