From: Ingo Molnar <mingo@kernel.org>
To: Mel Gorman <mgorman@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Rik van Riel <riel@redhat.com>, Hugh Dickins <hughd@google.com>,
	Minchan Kim <minchan@kernel.org>,
	Dave Hansen <dave.hansen@intel.com>,
	Andi Kleen <andi@firstfloor.org>, H Peter Anvin <hpa@zytor.com>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Peter Zijlstra <a.p.zijlstra@chello.nl>,
	Linux-MM <linux-mm@kvack.org>,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 2/4] mm: Send one IPI per CPU to TLB flush all entries after unmapping pages
Date: Wed, 10 Jun 2015 10:21:07 +0200
Message-ID: <20150610082107.GA23575@gmail.com>
In-Reply-To: <20150610081432.GY26425@suse.de>


* Mel Gorman <mgorman@suse.de> wrote:

> On Wed, Jun 10, 2015 at 09:47:04AM +0200, Ingo Molnar wrote:
> > 
> > * Mel Gorman <mgorman@suse.de> wrote:
> > 
> > > --- a/include/linux/sched.h
> > > +++ b/include/linux/sched.h
> > > @@ -1289,6 +1289,18 @@ enum perf_event_task_context {
> > >  	perf_nr_task_contexts,
> > >  };
> > >  
> > > +/* Track pages that require TLB flushes */
> > > +struct tlbflush_unmap_batch {
> > > +	/*
> > > +	 * Each bit set is a CPU that potentially has a TLB entry for one of
> > > +	 * the PFNs being flushed. See set_tlb_ubc_flush_pending().
> > > +	 */
> > > +	struct cpumask cpumask;
> > > +
> > > +	/* True if any bit in cpumask is set */
> > > +	bool flush_required;
> > > +};
> > > +
> > >  struct task_struct {
> > >  	volatile long state;	/* -1 unrunnable, 0 runnable, >0 stopped */
> > >  	void *stack;
> > > @@ -1648,6 +1660,10 @@ struct task_struct {
> > >  	unsigned long numa_pages_migrated;
> > >  #endif /* CONFIG_NUMA_BALANCING */
> > >  
> > > +#ifdef CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH
> > > +	struct tlbflush_unmap_batch *tlb_ubc;
> > > +#endif
> > 
> > Please embed this constant-size structure in task_struct directly so that the 
> > whole per-task allocation overhead goes away:
> > 
> 
> That puts a structure (72 bytes in the config I used) within the task struct 
> even when it's not required. On a lightly loaded system direct reclaim will not 
> be active and for some processes, it'll never be active. It's very wasteful.

For certain values of 'very'.

 - 72 bytes suggests that you have NR_CPUS set to 512 or so? On a kernel sized to 
   such large systems, with 1000 active tasks, we are talking about +72K of 
   RAM...

 - Furthermore, by embedding it, it gets packed better with neighboring task_struct 
   fields, while allocating it dynamically wastes a separate cache line.

 - Plus, by allocating it separately you spend two cache lines on it: each slab 
   object will be at least cache line aligned, so 72 bytes will allocate 128 bytes. 
   So when this gets triggered you've just wasted some more RAM.

 - I mean, it would be different if it had dynamic size, or was arguably huge. But 
   this is just a cpumask and a boolean!

 - The cpumask will be dynamic if you increase the NR_CPUS count any more than 
   that - in which case embedding the structure is the right choice again.

Thanks,

	Ingo


