arm64 flushing 255GB of vmalloc space takes too long

From: catalin.marinas@arm.com (Catalin Marinas)
To: linux-arm-kernel@lists.infradead.org
Subject: arm64 flushing 255GB of vmalloc space takes too long
Date: Thu, 24 Jul 2014 15:24:17 +0100
Message-ID: <20140724142417.GE13371@arm.com>
In-Reply-To: <1406150734.12484.79.camel@deneb.redhat.com>

On Wed, Jul 23, 2014 at 10:25:34PM +0100, Mark Salter wrote:
> On Fri, 2014-07-11 at 13:45 +0100, Catalin Marinas wrote:
> > On Fri, Jul 11, 2014 at 02:26:48AM +0100, Laura Abbott wrote:
> > > Mark Salter actually proposed a fix to this back in May:
> > > 
> > > https://lkml.org/lkml/2014/5/2/311
> > > 
> > > I never saw any further comments on it though. It also matches what x86
> > > does with their TLB flushing. It fixes the problem for me and the threshold
> > > seems to be the best we can do unless we want to introduce options per
> > > platform. It will need to be rebased to the latest tree though.
> > 
> > There were other patches in this area and I forgot about this. The
> > problem is that the ARM architecture does not define the actual
> > micro-architectural implementation of the TLBs (and it shouldn't), so
> > there is no way to guess how many TLB entries there are. It's not an
> > easy figure to get either since there are multiple levels of caching for
> > the TLBs.
> > 
> > So we either guess some value here (we may not always be optimal) or we
> > put some time bound (e.g. based on sched_clock()) on how long to loop.
> > The latter is not optimal either, the only aim being to avoid
> > soft-lockups.
> 
> Sorry for the late reply...
> 
> So, what would you like to see wrt this, Catalin? A reworked patch based
> on time? IMO, something based on loop count or time seems better than
> the status quo of a CPU potentially wasting 10s of seconds flushing the
> tlb.

I think we could go with a loop count for simplicity, but with a larger
iteration limit whose only purpose is to avoid the lock-up (e.g. 1024,
which is a 4MB range with 4KB pages). My concern with falling back to a
full flush is that, for a few global mappings that may or may not be in
the TLB, we nuke both the L1 and L2 TLBs (the latter can have over 1K
entries). As for optimisation, I think we should look at the original
code generating such big ranges.
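
As a rough illustration of the loop-count idea (a userspace sketch, not
the actual patch): flush page by page while the range is at most 1024
pages, and fall back to a single full flush beyond that. Every name
below (the sketch_* functions, struct tlb_stats, the SKETCH_* macros) is
hypothetical, 4KB pages are assumed, and the two flush helpers only bump
counters in place of real TLB maintenance.

```c
#include <stdint.h>

#define SKETCH_PAGE_SHIFT     12   /* assumption: 4KB pages */
#define SKETCH_PAGE_SIZE      ((uint64_t)1 << SKETCH_PAGE_SHIFT)
#define SKETCH_MAX_TLB_LOOPS  1024 /* example limit from this mail: 4MB */

/* Counters standing in for real TLB maintenance operations. */
struct tlb_stats {
	uint64_t page_flushes;  /* per-page invalidations issued */
	uint64_t full_flushes;  /* whole-TLB invalidations issued */
};

/* Hypothetical stand-in for invalidating one page's TLB entry. */
static void sketch_flush_page(struct tlb_stats *s, uint64_t addr)
{
	(void)addr;
	s->page_flushes++;
}

/* Hypothetical stand-in for invalidating the entire TLB. */
static void sketch_flush_all(struct tlb_stats *s)
{
	s->full_flushes++;
}

/*
 * Flush [start, end). Ranges larger than SKETCH_MAX_TLB_LOOPS pages are
 * handled with one full flush, so the loop is bounded and cannot run
 * long enough to trigger a soft lockup.
 */
static void sketch_flush_kernel_range(struct tlb_stats *s,
				      uint64_t start, uint64_t end)
{
	uint64_t addr;

	if (end <= start)
		return;

	/* Round the range out to page boundaries. */
	start &= ~(SKETCH_PAGE_SIZE - 1);
	end = (end + SKETCH_PAGE_SIZE - 1) & ~(SKETCH_PAGE_SIZE - 1);

	if (((end - start) >> SKETCH_PAGE_SHIFT) > SKETCH_MAX_TLB_LOOPS) {
		sketch_flush_all(s);
		return;
	}

	for (addr = start; addr < end; addr += SKETCH_PAGE_SIZE)
		sketch_flush_page(s, addr);
}
```

With these assumptions, a 255GB range takes one full flush instead of
roughly 67 million per-page operations, while ranges up to 4MB still
leave unrelated TLB entries intact.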

Would you mind posting a patch against the latest kernel?

-- 
Catalin

WARNING: multiple messages have this Message-ID (diff)

From: Catalin Marinas <catalin.marinas@arm.com>
To: Mark Salter <msalter@redhat.com>
Cc: Laura Abbott <lauraa@codeaurora.org>,
	Eric Miao <eric.y.miao@gmail.com>,
	"linux-arm-kernel@lists.infradead.org"
	<linux-arm-kernel@lists.infradead.org>,
	Linux Memory Management List <linux-mm@kvack.org>,
	Will Deacon <Will.Deacon@arm.com>,
	Russell King <linux@arm.linux.org.uk>
Subject: Re: arm64 flushing 255GB of vmalloc space takes too long
Date: Thu, 24 Jul 2014 15:24:17 +0100
Message-ID: <20140724142417.GE13371@arm.com>
In-Reply-To: <1406150734.12484.79.camel@deneb.redhat.com>

Thread overview: 19+ messages
2014-07-09 16:53 arm64 flushing 255GB of vmalloc space takes too long Eric Miao
2014-07-09 17:40 ` Catalin Marinas
2014-07-09 17:40   ` Catalin Marinas
2014-07-09 18:04   ` Eric Miao
2014-07-09 18:04     ` Eric Miao
2014-07-11  1:26     ` Laura Abbott
2014-07-11  1:26       ` Laura Abbott
2014-07-11 12:45       ` Catalin Marinas
2014-07-11 12:45         ` Catalin Marinas
2014-07-23 21:25         ` Mark Salter
2014-07-23 21:25           ` Mark Salter
2014-07-24 14:24           ` Catalin Marinas [this message]
2014-07-24 14:24             ` Catalin Marinas
2014-07-24 14:56             ` [PATCH] arm64: fix soft lockup due to large tlb flush range Mark Salter
2014-07-24 14:56               ` Mark Salter
2014-07-24 17:47               ` Catalin Marinas
2014-07-24 17:47                 ` Catalin Marinas
  -- strict thread matches above, loose matches on Subject: below --
2014-07-09  1:43 arm64 flushing 255GB of vmalloc space takes too long Laura Abbott
2014-07-09  1:43 ` Laura Abbott
