Re: [PATCH 2/4] mm: Send one IPI per CPU to TLB flush all entries after unmapping pages

From: Mel Gorman <mgorman@suse.de>
To: Ingo Molnar <mingo@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Rik van Riel <riel@redhat.com>, Hugh Dickins <hughd@google.com>,
	Minchan Kim <minchan@kernel.org>,
	Dave Hansen <dave.hansen@intel.com>,
	Andi Kleen <andi@firstfloor.org>, H Peter Anvin <hpa@zytor.com>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Peter Zijlstra <a.p.zijlstra@chello.nl>,
	Linux-MM <linux-mm@kvack.org>,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 2/4] mm: Send one IPI per CPU to TLB flush all entries after unmapping pages
Date: Wed, 10 Jun 2015 09:51:30 +0100
Message-ID: <20150610085130.GA26425@suse.de>
In-Reply-To: <20150610082107.GA23575@gmail.com>

On Wed, Jun 10, 2015 at 10:21:07AM +0200, Ingo Molnar wrote:
> 
> * Mel Gorman <mgorman@suse.de> wrote:
> 
> > On Wed, Jun 10, 2015 at 09:47:04AM +0200, Ingo Molnar wrote:
> > > 
> > > * Mel Gorman <mgorman@suse.de> wrote:
> > > 
> > > > --- a/include/linux/sched.h
> > > > +++ b/include/linux/sched.h
> > > > @@ -1289,6 +1289,18 @@ enum perf_event_task_context {
> > > >  	perf_nr_task_contexts,
> > > >  };
> > > >  
> > > > +/* Track pages that require TLB flushes */
> > > > +struct tlbflush_unmap_batch {
> > > > +	/*
> > > > +	 * Each bit set is a CPU that potentially has a TLB entry for one of
> > > > +	 * the PFNs being flushed. See set_tlb_ubc_flush_pending().
> > > > +	 */
> > > > +	struct cpumask cpumask;
> > > > +
> > > > +	/* True if any bit in cpumask is set */
> > > > +	bool flush_required;
> > > > +};
> > > > +
> > > >  struct task_struct {
> > > >  	volatile long state;	/* -1 unrunnable, 0 runnable, >0 stopped */
> > > >  	void *stack;
> > > > @@ -1648,6 +1660,10 @@ struct task_struct {
> > > >  	unsigned long numa_pages_migrated;
> > > >  #endif /* CONFIG_NUMA_BALANCING */
> > > >  
> > > > +#ifdef CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH
> > > > +	struct tlbflush_unmap_batch *tlb_ubc;
> > > > +#endif
> > > 
> > > Please embed this constant size structure in task_struct directly so that the 
> > > whole per task allocation overhead goes away:
> > > 
> > 
> > That puts a structure (72 bytes in the config I used) within the task struct 
> > even when it's not required. On a lightly loaded system direct reclaim will not 
> > be active and for some processes, it'll never be active. It's very wasteful.
> 
> For certain values of 'very'.
> 
>  - 72 bytes suggests that you have NR_CPUS set to 512 or so? On a kernel sized to 
>    such large systems with 1000 active tasks we are talking about +72K of 
>    RAM...
> 

NR_CPUS comes from the openSUSE 13.1 distro config so yes, it's large, but
I also expect it to be a common configuration.
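
For reference, the size arithmetic behind the 72-byte figure can be sketched
in userspace. This is only an illustration: the struct layouts below are
stand-ins for the kernel definitions, NR_CPUS is fixed at 512 as an
assumption taken from the guess quoted above, and ubc_bytes() is a
hypothetical helper, not a kernel function.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed value; the real one comes from the kernel config. */
#define NR_CPUS			512
#define BITS_PER_LONG		(sizeof(unsigned long) * CHAR_BIT)
#define BITS_TO_LONGS(n)	(((n) + BITS_PER_LONG - 1) / BITS_PER_LONG)

/* Userspace stand-in for the kernel's struct cpumask. */
struct cpumask {
	unsigned long bits[BITS_TO_LONGS(NR_CPUS)];
};

/* Same fields as the structure in the quoted patch. */
struct tlbflush_unmap_batch {
	struct cpumask cpumask;
	bool flush_required;
};

/*
 * Hypothetical helper: expected structure size for a given CPU count.
 * The bitmap is rounded up to whole longs, and the trailing bool is
 * assumed to be padded out to the alignment of unsigned long.
 */
static size_t ubc_bytes(size_t nr_cpus)
{
	size_t longs = (nr_cpus + BITS_PER_LONG - 1) / BITS_PER_LONG;

	return longs * sizeof(unsigned long) + sizeof(unsigned long);
}
```

With 8-byte longs, ubc_bytes(512) comes to 64 + 8 = 72 bytes, and
ubc_bytes(8192) comes to 1024 + 8 = 1032 bytes.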

>  - Furthermore, by embedding it it gets packed better with neighboring task_struct 
>    fields, while by allocating it dynamically it's a separate cache line wasted.
> 

A separate cache line that is only used during direct reclaim, when the
process is taking a large hit anyway.

>  - Plus by allocating it separately you spend two cachelines on it: each slab will 
>    be at least cacheline aligned, and 72 bytes will allocate 128 bytes. So when 
>    this gets triggered you've just wasted some more RAM.
> 
>  - I mean, if it had dynamic size, or was arguably huge. But this is just a 
>    cpumask and a boolean!
> 

It gets larger with enterprise configs.

>  - The cpumask will be dynamic if you increase the NR_CPUS count any more than 
>    that - in which case embedding the structure is the right choice again.
> 

Enterprise configurations are larger. The most recent one I checked defined
NR_CPUS as 8192. If it's embedded in the structure, it means that we need
to call cpumask_clear on every fork even if it's never used. That adds
constant overhead to a fast path to avoid an allocation and a few cache
misses in a direct reclaim path. Are you certain you want that trade-off?
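
The two options can be contrasted with a rough userspace sketch. It is not
kernel code: the task_embedded and task_lazy structures, the fork_init_*()
helpers and get_ubc() are hypothetical names for illustration, memset() and
calloc() stand in for cpumask_clear() and the kernel allocator, and NR_CPUS
is assumed to be 8192 as in the enterprise config mentioned above.

```c
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define NR_CPUS			8192	/* assumed enterprise-style value */
#define BITS_PER_LONG		(sizeof(unsigned long) * CHAR_BIT)
#define BITS_TO_LONGS(n)	(((n) + BITS_PER_LONG - 1) / BITS_PER_LONG)

struct cpumask {
	unsigned long bits[BITS_TO_LONGS(NR_CPUS)];
};

struct tlbflush_unmap_batch {
	struct cpumask cpumask;
	bool flush_required;
};

/* Option A: structure embedded in the (hypothetical) task. */
struct task_embedded {
	struct tlbflush_unmap_batch tlb_ubc;
};

/* Option B: pointer in the task, allocated on first use. */
struct task_lazy {
	struct tlbflush_unmap_batch *tlb_ubc;
};

/* Every fork pays for clearing the whole mask, used or not. */
static void fork_init_embedded(struct task_embedded *t)
{
	memset(&t->tlb_ubc, 0, sizeof(t->tlb_ubc));
}

/* Every fork pays for a single pointer store. */
static void fork_init_lazy(struct task_lazy *t)
{
	t->tlb_ubc = NULL;
}

/* Only a task that reaches this path pays for the allocation. */
static struct tlbflush_unmap_batch *get_ubc(struct task_lazy *t)
{
	if (!t->tlb_ubc)
		t->tlb_ubc = calloc(1, sizeof(*t->tlb_ubc));
	return t->tlb_ubc;	/* may be NULL if the allocation failed */
}
```

In this sketch the embedded variant touches the full structure on every
fork, while the lazy variant defers all of the cost to the first call to
get_ubc(), which is the shape of the trade-off being asked about.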

-- 
Mel Gorman
SUSE Labs
