From: Will Deacon <will.deacon@arm.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
Russell King - ARM Linux <linux@arm.linux.org.uk>,
Benjamin Herrenschmidt <benh@kernel.crashing.org>
Subject: Re: [RFC PATCH 1/2] zap_pte_range: update addr when forcing flush after TLB batching faiure
Date: Tue, 28 Oct 2014 17:07:15 +0000
Message-ID: <20141028170715.GJ29706@arm.com>
In-Reply-To: <CA+55aFxvz=guax4KcAszyjkqdqXGwV38O+G23xMvGFJDTrZqtg@mail.gmail.com>

On Tue, Oct 28, 2014 at 04:25:35PM +0000, Linus Torvalds wrote:
> On Tue, Oct 28, 2014 at 9:07 AM, Will Deacon <will.deacon@arm.com> wrote:
> > I was certainly seeing this issue trigger regularly when running firefox,
> > but I'll need to dig and find out the differences in range size.
>
> I'm wondering whether that was perhaps because of the mix-up with
> initialization of the range. Afaik, that would always break your
> min/max thing for the first batch (and since the batches are fairly
> large, "first" may be "only")
>
> But hey. it's possible that firefox does some big mappings but only
> populates the beginning. Most architectures don't tend to have
> excessive glass jaws in this area: invalidating things page-by-page is
> invariably so slow that at some point you just go "just do the whole
> range".
>
> > Since we have hardware broadcasting of TLB invalidations on ARM, it is
> > in our interest to keep the number of outstanding operations as small as
> > possible, particularly on large systems where we don't get the targetted
> > shootdown with a single message that you can perform using IPIs (i.e.
> > you can only broadcast to all or no CPUs, and that happens for each pte).
>
> Do you seriously *have* to broadcast for each pte?
>
> Because that is quite frankly moronic. We batch things up in software
> for a real good reason: doing things one entry at a time just cannot
> ever scale. At some point (and that point is usually not even very far
> away), it's much better to do a single invalidate over a range. The
> cost of having to refill the TLB's is *much* smaller than the cost of
> doing tons of cross-CPU invalidates.

I don't think that's necessarily true, at least not on the systems I'm
familiar with. A table walk can be comparatively expensive, particularly
when virtualisation is involved and the depth of the host and guest page
tables starts to grow -- we're talking >20 memory accesses per walk. By
contrast, the TLB invalidation messages are asynchronous and carried on
the interconnect (a DSB instruction is used to synchronise the updates).
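
As a rough illustration of where a figure like ">20 memory accesses per
walk" can come from, here is a back-of-the-envelope sketch. It assumes a
two-stage (guest plus host) walk with no walk caches and no cached
intermediate translations. The helper name and the level counts are made
up for illustration; they are not taken from the kernel or from
measurements on any particular core.

```c
/*
 * Worst-case memory accesses for one two-stage table walk, assuming
 * nothing is cached. Every guest table pointer, and the final output
 * address, is an intermediate physical address that needs its own
 * host walk before it can be used.
 *
 * nested_walk_accesses() is a hypothetical helper, not a kernel API.
 */
static unsigned int nested_walk_accesses(unsigned int guest_levels,
					 unsigned int host_levels)
{
	/* One host walk plus one descriptor read per guest level ... */
	unsigned int per_guest_level = host_levels + 1;

	/* ... plus a final host walk for the output address. */
	return guest_levels * per_guest_level + host_levels;
}
```

Under these assumptions a 4-level guest walk on top of a 4-level host
walk costs 4 * 5 + 4 = 24 accesses, against 4 for the same walk without
virtualisation. Real hardware caches parts of the walk, so treat this as
an upper bound rather than a typical cost.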
> That's true even for the cases where we track the CPU's involved in
> that mapping, and only invalidate a small subset. With a "all CPU's
> broadcast", the cross-over point must be even smaller. Doing thousands
> of CPU broadcasts is just crazy, even if they are hw-accelerated.
>
> Can't you just do a full invalidate and a SW IPI for larger ranges?

We already do that, but it's mainly there to catch *really* large ranges
(like the negative ones...), which can trigger the soft lockup detector.
The cases we've seen for this so far have been bugs (e.g. this thread and
also a related issue where we try to flush the whole of vmalloc space).
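
The kind of cut-off described above can be sketched as follows. This is
only an illustration: the names, the 4KiB page size and the 1024-page
threshold are all assumptions, not the values or code used by any
architecture. It also shows why an inverted ("negative") range ends up on
the full-invalidate path.

```c
#include <stdbool.h>

#define SKETCH_PAGE_SHIFT	12UL	/* assumed 4KiB pages */
#define SKETCH_MAX_TLBI_OPS	1024UL	/* made-up cut-off, not a tuned value */

/*
 * Decide whether [start, end) should be invalidated with one full
 * invalidate rather than page by page. If end < start, the unsigned
 * subtraction wraps to an enormous length, so a buggy "negative" range
 * also takes the full path instead of looping for a very long time.
 */
static bool sketch_want_full_flush(unsigned long start, unsigned long end)
{
	unsigned long nr_pages = (end - start) >> SKETCH_PAGE_SHIFT;

	return nr_pages > SKETCH_MAX_TLBI_OPS;
}
```

With these assumed values, a two-page range is still flushed page by
page, while anything over 1024 pages, including a wrapped range, falls
back to a single full invalidate.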
> And as mentioned, true sparse mappings are actually fairly rare, so
> making extra effort (and data structures) to have individual ranges
> sounds crazy.

Sure, I'll try and get some data on this. I'd like to resolve the THP case,
at least, which means keeping track of calls to __tlb_remove_pmd_tlb_entry.
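
A minimal sketch of the min/max range tracking being discussed, assuming
a gather structure that records nothing but the pending range. The
struct and helper names are hypothetical and are not the kernel's
mmu_gather interface. The point is that a pte removal and a pmd (THP)
removal can both feed the same range, just with different sizes, and
that the range has to start out empty or the first batch is wrong.

```c
#include <stdbool.h>

/* Hypothetical gather state: only the pending invalidation range. */
struct sketch_gather {
	unsigned long start;	/* lowest address needing invalidation */
	unsigned long end;	/* one past the highest such address */
};

/* Start from an empty range so the first add sets both bounds. */
static void sketch_range_reset(struct sketch_gather *g)
{
	g->start = ~0UL;
	g->end = 0;
}

/* Widen the range to cover [addr, addr + size). */
static void sketch_range_add(struct sketch_gather *g,
			     unsigned long addr, unsigned long size)
{
	if (addr < g->start)
		g->start = addr;
	if (addr + size > g->end)
		g->end = addr + size;
}

/* Nothing has been added since the last reset. */
static bool sketch_range_empty(const struct sketch_gather *g)
{
	return g->end <= g->start;
}
```

In this sketch a pte removal would call sketch_range_add() with the page
size and a pmd removal with the huge-page size, so one flush at the end
covers both.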
> Is this some hw-enforced thing? You really can't turn off the
> cross-cpu-for-each-pte braindamage?

We could use IPIs if we wanted to and issue local TLB invalidations on
the targeted cores, but I'd be surprised if this showed an improvement
on ARM-based systems.

Will