From: Mel Gorman <mgorman@suse.de>
To: Ingo Molnar <mingo@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Rik van Riel <riel@redhat.com>, Hugh Dickins <hughd@google.com>,
Minchan Kim <minchan@kernel.org>,
Dave Hansen <dave.hansen@intel.com>,
Andi Kleen <andi@firstfloor.org>, H Peter Anvin <hpa@zytor.com>,
Linus Torvalds <torvalds@linux-foundation.org>,
Thomas Gleixner <tglx@linutronix.de>,
Peter Zijlstra <a.p.zijlstra@chello.nl>,
Linux-MM <linux-mm@kvack.org>,
LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 2/4] mm: Send one IPI per CPU to TLB flush all entries after unmapping pages
Date: Wed, 10 Jun 2015 10:58:26 +0100
Message-ID: <20150610095826.GD26425@suse.de>
In-Reply-To: <20150610082640.GA24483@gmail.com>
On Wed, Jun 10, 2015 at 10:26:40AM +0200, Ingo Molnar wrote:
>
> * Mel Gorman <mgorman@suse.de> wrote:
>
> > On a 4-socket machine the results were
> >
> >                                          4.1.0-rc6           4.1.0-rc6
> >                                      batchdirty-v6       batchunmap-v6
> > Ops lru-file-mmap-read-elapsed    121.27 (  0.00%)    118.79 (  2.05%)
> >
> >                4.1.0-rc6       4.1.0-rc6
> >            batchdirty-v6   batchunmap-v6
> > User              620.84          608.48
> > System           4245.35         4152.89
> > Elapsed           122.65          120.15
> >
> > In this case the workload completed faster and there was less CPU overhead
> > but as it's a NUMA machine there are a lot of factors at play. It's easier
> > to quantify on a single socket machine;
> >
> >                                          4.1.0-rc6           4.1.0-rc6
> >                                      batchdirty-v6       batchunmap-v6
> > Ops lru-file-mmap-read-elapsed     20.35 (  0.00%)     21.52 ( -5.75%)
> >
> >                  4.1.0-rc6         4.1.0-rc6
> >            batchdirty-v6r5   batchunmap-v6r5
> > User                 58.02             60.70
> > System               77.57             81.92
> > Elapsed              22.14             23.16
> >
> > That shows the workload takes 5.75% longer to complete with a similar
> > increase in the system CPU usage.
>
> Btw., do you have any stddev noise numbers?
>
                                            4.1.0-rc6           4.1.0-rc6           4.1.0-rc6           4.1.0-rc6
                                              vanilla      flushfull-v6r5     batchdirty-v6r5     batchunmap-v6r5
Ops lru-file-mmap-read-elapsed        25.43 (  0.00%)     20.59 ( 19.03%)     20.35 ( 19.98%)     21.52 ( 15.38%)
Ops lru-file-mmap-read-time_stddv      0.32 (  0.00%)      0.32 ( -1.30%)      0.39 (-23.00%)      0.45 (-40.91%)

flushfull  -- patch 2
batchdirty -- patch 3
batchunmap -- patch 4

So the impact of tracking the PFNs is outside the noise and there is
a definite direct cost to it. This was expected for both the PFN tracking
and the individual flushes.
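
For anyone who wants to redo the arithmetic, here is a minimal sketch that
recomputes the percentages from the single-socket figures above and expresses
the batchdirty/batchunmap gap in units of the reported stddev. The
"gap in stddevs" measure is only an illustrative assumption, not something
mmtests reports, and it is not a proper significance test because the number
of iterations is not given here.

```python
# Elapsed time (seconds) and stddev copied from the single-socket table above.
results = {
    "vanilla":         (25.43, 0.32),
    "flushfull-v6r5":  (20.59, 0.32),
    "batchdirty-v6r5": (20.35, 0.39),
    "batchunmap-v6r5": (21.52, 0.45),
}


def gain_pct(baseline, value):
    """Percentage improvement relative to baseline (positive means faster)."""
    return (baseline - value) / baseline * 100.0


def gap_in_stddevs(a, b):
    """Illustrative measure (an assumption of this sketch): difference of
    the two means divided by the larger of the two stddevs."""
    (mean_a, sd_a), (mean_b, sd_b) = a, b
    return abs(mean_a - mean_b) / max(sd_a, sd_b)


base_elapsed = results["vanilla"][0]
for name, (elapsed, stddev) in results.items():
    print(f"{name:16s} {elapsed:6.2f}s  gain {gain_pct(base_elapsed, elapsed):6.2f}%"
          f"  stddev {stddev:.2f}")

gap = gap_in_stddevs(results["batchdirty-v6r5"], results["batchunmap-v6r5"])
print(f"batchdirty vs batchunmap: {gap:.1f} stddevs apart")
```

The difference between patch 3 and patch 4 is 1.17 seconds against a stddev of
at most 0.45, which is the basis for calling it outside the noise.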
> The batching speedup is brutal enough to not need any noise estimations, it's a
> clear winner.
>
Agreed.
> But this PFN tracking patch is more difficult to judge as the numbers are pretty
> close to each other.
>
It's definitely measurable; there's no doubt about it and there never was.
The concerns were always the refill costs due to flushing potentially active
TLB entries unnecessarily. From https://lkml.org/lkml/2014/7/31/825, this
is potentially high where it says that a 512 DTLB refill takes 22,000
cycles which is higher than the individual flushes. However, this is an
estimate and it'll always be a case of "it depends". It's been asserted
that the refill costs are really low so let's just go with that, drop
patch 4 and wait and see who complains.
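
To make the "it depends" concrete, here is a rough sketch of the trade-off as a
simple linear cost model. Only the 22,000 cycle refill estimate comes from the
message linked above; the per-page flush cost, the cost of the full flush
itself, and all function names are hypothetical placeholders, and the model
assumes the worst case where every flushed entry was live and gets refilled.

```python
# Refill estimate quoted above for a 512 DTLB refill, in cycles.
REFILL_CYCLES_512 = 22_000


def full_flush_cost(refill_cycles=REFILL_CYCLES_512, flush_cycles=0):
    """Worst-case cost of one full flush: the flush itself plus refilling
    every entry afterwards. flush_cycles defaults to 0 (placeholder)."""
    return flush_cycles + refill_cycles


def individual_flush_cost(nr_pages, cycles_per_page):
    """Cost of flushing nr_pages one at a time; cycles_per_page is a
    hypothetical, hardware-dependent parameter."""
    return nr_pages * cycles_per_page


def break_even_pages(cycles_per_page, refill_cycles=REFILL_CYCLES_512,
                     flush_cycles=0):
    """Smallest page count at which individual flushes cost at least as
    much as a full flush plus worst-case refill (integer ceiling)."""
    total = full_flush_cost(refill_cycles, flush_cycles)
    return -(-total // cycles_per_page)


# 200 cycles per page is a made-up value, used only to exercise the model.
print(break_even_pages(200))
```

If the refill cost is really much lower than the estimate, the break-even
point drops accordingly and the full flush wins sooner.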
--
Mel Gorman
SUSE Labs