From: Mel Gorman <mgorman@suse.de>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andi Kleen <andi@firstfloor.org>,
Dave Hansen <dave.hansen@intel.com>,
Ingo Molnar <mingo@kernel.org>,
Andrew Morton <akpm@linux-foundation.org>,
Rik van Riel <riel@redhat.com>, Hugh Dickins <hughd@google.com>,
Minchan Kim <minchan@kernel.org>, H Peter Anvin <hpa@zytor.com>,
Linux-MM <linux-mm@kvack.org>,
LKML <linux-kernel@vger.kernel.org>,
Peter Zijlstra <a.p.zijlstra@chello.nl>,
Thomas Gleixner <tglx@linutronix.de>
Subject: Re: [PATCH 0/3] TLB flush multiple pages per IPI v5
Date: Wed, 10 Jun 2015 18:07:00 +0100
Message-ID: <20150610170700.GG26425@suse.de>
In-Reply-To: <CA+55aFwVUkdaf0_rBk7uJHQjWXu+OcLTHc6FKuCn0Cb2Kvg9NA@mail.gmail.com>
On Wed, Jun 10, 2015 at 09:17:15AM -0700, Linus Torvalds wrote:
> On Wed, Jun 10, 2015 at 6:13 AM, Andi Kleen <andi@firstfloor.org> wrote:
> >
> > Assuming the page tables are cache-hot... And hot here does not mean
> > L3 cache, but higher. But a memory intensive workload can easily
> > violate that.
>
> In practice, no.
>
> You'll spend all your time on the actual real data cache misses, the
> TLB misses won't be any more noticeable.
>
> And if your access patterns are even *remotely* cache-friendly (ie
> _not_ spending all your time just waiting for regular data cache
> misses), then a radix-tree-like page table like Intel will have much
> better locality in the page tables than in the actual data. So again,
> the TLB misses won't be your big problem.
>
> There may be pathological cases where you just look at one word per
> page, but let's face it, we don't optimize for pathological or
> unrealistic cases.
>
It's concerns like this that have me avoiding any micro-benchmarking approach
that tries to measure the indirect costs of refills. No matter what the
microbenchmark does, there will be other cases that render it irrelevant.
> And the thing is, you need to look at the costs. Single-page
> invalidation taking hundreds of cycles? Yeah, we definitely need to
> take the downside of trying to be clever into account.
>
> If the invalidation was really cheap, the rules might change. As it
> is, I really don't think there is any question about this.
>
> That's particularly true when the single-page invalidation approach
> has lots of *software* overhead too - not just the complexity, but
> even "obvious" costs feeding the list of pages to be invalidated
> across CPU's. Think about it - there are cache misses there too, and
> because we do those across CPU's those cache misses are *mandatory*.
>
> So trying to avoid a few TLB misses by forcing mandatory cache misses
> and extra complexity, and by doing lots of 200+ cycle operations?
> Really? In what universe does that sound like a good idea?
>
> Quite frankly, I can pretty much *guarantee* that you didn't actually
> think about any real numbers, you've just been taught that fairy-tale
> of "TLB misses are expensive". As if TLB entries were somehow sacred.
>
Everyone has been taught that one. Papers I've read from the last two
years on TLB implementations or page reclaim management bring this up as
a supporting point for whatever they are proposing. It was partially why
I kept PFN tracking; the other reason was to put much of the cost on the
reclaimer and minimise interference on the recipient of the IPI. I still
think it was a rational concern but will assume that refills are cheaper
than smart invalidations until it can be proven otherwise.
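
For illustration only, the tradeoff being argued about can be written down
as a back-of-envelope model. The 200-cycle figure is the one from the quoted
text above; every other constant, and all of the function names, are made-up
assumptions. The numbers it produces say nothing about real hardware and are
no substitute for a real workload.

```python
# Back-of-envelope model only. Every constant below is an assumption,
# not a measurement, except where noted.
INVLPG_CYCLES = 200      # per single-page invalidation ("200+ cycle operations" quoted above)
REFILL_CYCLES = 30       # hypothetical: one TLB refill with cache-hot page tables
LIST_MISS_CYCLES = 100   # hypothetical: cross-CPU cache miss per cache line of the PFN list
PFNS_PER_LINE = 8        # assumes a 64-byte cache line holding 8-byte PFNs
FULL_FLUSH_CYCLES = 500  # hypothetical: cost of the full flush itself


def targeted_cost(nr_pages):
    """Cycles spent by the IPI recipient invalidating nr_pages one at a time."""
    lines = -(-nr_pages // PFNS_PER_LINE)  # ceiling division
    return nr_pages * INVLPG_CYCLES + lines * LIST_MISS_CYCLES


def full_flush_cost(hot_entries):
    """Cycles for a full flush plus refilling hot_entries useful TLB entries."""
    return FULL_FLUSH_CYCLES + hot_entries * REFILL_CYCLES


def break_even_hot_entries(nr_pages):
    """Smallest number of refills at which a full flush stops being cheaper."""
    extra = targeted_cost(nr_pages) - FULL_FLUSH_CYCLES
    return max(0, -(-extra // REFILL_CYCLES))


if __name__ == "__main__":
    for nr_pages in (1, 8, 32):
        print(nr_pages, targeted_cost(nr_pages), break_even_hot_entries(nr_pages))
```

Under these assumed constants the full flush wins unless the recipient has
to refill a large number of entries, and the outcome flips as soon as the
refill cost is raised to model cache-cold page tables. That sensitivity is
the reason a model or microbenchmark like this settles nothing.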
> If somebody can show real numbers on a real workload, that's one
> thing.
The last adjustments made today to the series are at
http://git.kernel.org/cgit/linux/kernel/git/mel/linux-balancenuma.git/log/?h=mm-vmscan-lessipi-v7r5
I'll redo it on top of 4.2-rc1 whenever that happens so it gets a full round
in linux-next. Patch 4 can be revisited if a real workload is found that
is not deliberately pathological and runs on a CPU that matters. The forward
port of patch 4 for testing will be trivial.
The latest version also separates out the dynamic allocation of the structure
so that it can be excluded if deemed to be an unnecessary complication.
> So anyway, I like the patch series. I just think that the final patch
> - the one that actually saves the addresses, and limits things to
> BATCH_TLBFLUSH_SIZE, should be limited.
>
I see your logic but if it's limited then we send more IPIs and it's all
crappy tradeoffs. If a real workload complains, it'll be far easier to
work with.
--
Mel Gorman
SUSE Labs