Re: [PATCH v2] mm: limit mmu_gather batching to fix soft lockups on !CONFIG_PREEMPT

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Mel Gorman <mgorman@suse.de>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.cz>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Rik van Riel <riel@redhat.com>,
	Peter Zijlstra <a.p.zijlstra@chello.nl>
Subject: Re: [PATCH v2] mm: limit mmu_gather batching to fix soft lockups on !CONFIG_PREEMPT
Date: Thu, 20 Dec 2012 10:24:38 +0000	[thread overview]
Message-ID: <20121220102438.GA10819@suse.de> (raw)
In-Reply-To: <20121219131316.7d13fcb1.akpm@linux-foundation.org>

On Wed, Dec 19, 2012 at 01:13:16PM -0800, Andrew Morton wrote:
> On Wed, 19 Dec 2012 16:04:37 +0100
> Michal Hocko <mhocko@suse.cz> wrote:
> 
> > Since e303297 (mm: extended batches for generic mmu_gather) we are batching
> > pages to be freed until either tlb_next_batch cannot allocate a new batch or we
> > are done.
> > 
> > This works just fine most of the time but we can get in troubles with
> > non-preemptible kernel (CONFIG_PREEMPT_NONE or CONFIG_PREEMPT_VOLUNTARY)
> > on large machines where too aggressive batching might lead to soft
> > lockups during process exit path (exit_mmap) because there are no
> > scheduling points down the free_pages_and_swap_cache path and so the
> > freeing can take long enough to trigger the soft lockup.
> > 
> > The lockup is harmless except when the system is setup to panic on
> > softlockup which is not that unusual.
> > 
> > The simplest way to work around this issue is to limit the maximum
> > number of batches in a single mmu_gather for !CONFIG_PREEMPT kernels.
> > Let's use 1G of resident memory for the limit for now. This shouldn't
> > make the batching less effective and it shouldn't trigger lockups as
> > well because freeing 262144 should be OK.
> > 
> > ...
> >
> > diff --git a/include/asm-generic/tlb.h b/include/asm-generic/tlb.h
> > index ed6642a..5843f59 100644
> > --- a/include/asm-generic/tlb.h
> > +++ b/include/asm-generic/tlb.h
> > @@ -78,6 +78,19 @@ struct mmu_gather_batch {
> >  #define MAX_GATHER_BATCH	\
> >  	((PAGE_SIZE - sizeof(struct mmu_gather_batch)) / sizeof(void *))
> >  
> > +/*
> > + * Limit the maximum number of mmu_gather batches for non-preemptible kernels
> > + * to reduce a risk of soft lockups on huge machines when a lot of memory is
> > + * zapped during unmapping.
> > + * 1GB of resident memory should be safe to free up at once even without
> > + * explicit preemption point.
> > + */
> > +#if defined(CONFIG_PREEMPT_COUNT)
> > +#define MAX_GATHER_BATCH_COUNT	(UINT_MAX)
> > +#else
> > +#define MAX_GATHER_BATCH_COUNT	(((1UL<<(30-PAGE_SHIFT))/MAX_GATHER_BATCH))
> 
> Geeze.  I spent waaaaay too long staring at that expression trying to
> work out "how many pages is in a batch" and gave up.
> 

1G.

> Realistically, I don't think we need to worry about CONFIG_PREEMPT here
> - if we just limit the thing to, say, 64k pages per batch then that
> will be OK for preemptible and non-preemptible kernels.  The
> performance difference between "64k" and "infinite" will be miniscule
> and unmeasurable.
> 

That was my fault due to a private conversation. Michal originally had
a fixed counter that was commented to be related to address space size
on x86-64. I felt if it was based on address space size then it should be
expressed in terms of PAGE_SIZE. It really is about the number of TLB flush
operations though and a fixed counter works. I'm happy either way but the
comment should not mention address space size if it's a fixed counter.

> Also, the batch count should be independent of PAGE_SIZE.  Because
> PAGE_SIZE can vary by a factor of 16 and you don't want to fix the
> problem on 4k page size but leave it broken on 64k page size.
> 
> Also, while the patch might prevent softlockup warnings, the kernel
> will still exhibit large latency glitches and those are undesirable.
> 
> Also, does this patch actually work?  It doesn't add a scheduling
> point.  It assumes that by returning zero from tlb_next_batch(), the
> process will back out to some point where it hits a cond_resched()?
> 

I expected it to work for two reasons.

1. returning here hits the cond_resched() in zap_pmd_range()
2. The original soft lockup was in tlb_finish_mmu and this patch should
   limit the amount of work that thing has to do

I didn't test it though.

-- 
Mel Gorman
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

WARNING: multiple messages have this Message-ID (diff)

From: Mel Gorman <mgorman@suse.de>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.cz>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Rik van Riel <riel@redhat.com>,
	Peter Zijlstra <a.p.zijlstra@chello.nl>
Subject: Re: [PATCH v2] mm: limit mmu_gather batching to fix soft lockups on !CONFIG_PREEMPT
Date: Thu, 20 Dec 2012 10:24:38 +0000	[thread overview]
Message-ID: <20121220102438.GA10819@suse.de> (raw)
In-Reply-To: <20121219131316.7d13fcb1.akpm@linux-foundation.org>

On Wed, Dec 19, 2012 at 01:13:16PM -0800, Andrew Morton wrote:
> On Wed, 19 Dec 2012 16:04:37 +0100
> Michal Hocko <mhocko@suse.cz> wrote:
> 
> > Since e303297 (mm: extended batches for generic mmu_gather) we are batching
> > pages to be freed until either tlb_next_batch cannot allocate a new batch or we
> > are done.
> > 
> > This works just fine most of the time but we can get in troubles with
> > non-preemptible kernel (CONFIG_PREEMPT_NONE or CONFIG_PREEMPT_VOLUNTARY)
> > on large machines where too aggressive batching might lead to soft
> > lockups during process exit path (exit_mmap) because there are no
> > scheduling points down the free_pages_and_swap_cache path and so the
> > freeing can take long enough to trigger the soft lockup.
> > 
> > The lockup is harmless except when the system is setup to panic on
> > softlockup which is not that unusual.
> > 
> > The simplest way to work around this issue is to limit the maximum
> > number of batches in a single mmu_gather for !CONFIG_PREEMPT kernels.
> > Let's use 1G of resident memory for the limit for now. This shouldn't
> > make the batching less effective and it shouldn't trigger lockups as
> > well because freeing 262144 should be OK.
> > 
> > ...
> >
> > diff --git a/include/asm-generic/tlb.h b/include/asm-generic/tlb.h
> > index ed6642a..5843f59 100644
> > --- a/include/asm-generic/tlb.h
> > +++ b/include/asm-generic/tlb.h
> > @@ -78,6 +78,19 @@ struct mmu_gather_batch {
> >  #define MAX_GATHER_BATCH	\
> >  	((PAGE_SIZE - sizeof(struct mmu_gather_batch)) / sizeof(void *))
> >  
> > +/*
> > + * Limit the maximum number of mmu_gather batches for non-preemptible kernels
> > + * to reduce a risk of soft lockups on huge machines when a lot of memory is
> > + * zapped during unmapping.
> > + * 1GB of resident memory should be safe to free up at once even without
> > + * explicit preemption point.
> > + */
> > +#if defined(CONFIG_PREEMPT_COUNT)
> > +#define MAX_GATHER_BATCH_COUNT	(UINT_MAX)
> > +#else
> > +#define MAX_GATHER_BATCH_COUNT	(((1UL<<(30-PAGE_SHIFT))/MAX_GATHER_BATCH))
> 
> Geeze.  I spent waaaaay too long staring at that expression trying to
> work out "how many pages is in a batch" and gave up.
> 

1G.

> Realistically, I don't think we need to worry about CONFIG_PREEMPT here
> - if we just limit the thing to, say, 64k pages per batch then that
> will be OK for preemptible and non-preemptible kernels.  The
> performance difference between "64k" and "infinite" will be miniscule
> and unmeasurable.
> 

That was my fault due to a private conversation. Michal originally had
a fixed counter that was commented to be related to address space size
on x86-64. I felt if it was based on address space size then it should be
expressed in terms of PAGE_SIZE. It really is about the number of TLB flush
operations though and a fixed counter works. I'm happy either way but the
comment should not mention address space size if it's a fixed counter.

> Also, the batch count should be independent of PAGE_SIZE.  Because
> PAGE_SIZE can vary by a factor of 16 and you don't want to fix the
> problem on 4k page size but leave it broken on 64k page size.
> 
> Also, while the patch might prevent softlockup warnings, the kernel
> will still exhibit large latency glitches and those are undesirable.
> 
> Also, does this patch actually work?  It doesn't add a scheduling
> point.  It assumes that by returning zero from tlb_next_batch(), the
> process will back out to some point where it hits a cond_resched()?
> 

I expected it to work for two reasons.

1. returning here hits the cond_resched() in zap_pmd_range()
2. The original soft lockup was in tlb_finish_mmu and this patch should
   limit the amount of work that thing has to do

I didn't test it though.

-- 
Mel Gorman
SUSE Labs

next prev parent reply	other threads:[~2012-12-20 10:24 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-12-18 16:11 [PATCH] mm: cond_resched in tlb_flush_mmu to fix soft lockups on !CONFIG_PREEMPT Michal Hocko
2012-12-18 16:11 ` Michal Hocko
2012-12-18 18:01 ` Rik van Riel
2012-12-18 18:01   ` Rik van Riel
2012-12-18 22:02 ` Andrew Morton
2012-12-18 22:02   ` Andrew Morton
2012-12-18 23:50   ` Michal Hocko
2012-12-18 23:50     ` Michal Hocko
2012-12-19  0:00     ` Andrew Morton
2012-12-19  0:00       ` Andrew Morton
2012-12-19 15:04       ` [PATCH v2] mm: limit mmu_gather batching " Michal Hocko
2012-12-19 15:04         ` Michal Hocko
2012-12-19 21:13         ` Andrew Morton
2012-12-19 21:13           ` Andrew Morton
2012-12-20 10:24           ` Mel Gorman [this message]
2012-12-20 10:24             ` Mel Gorman
2012-12-20 12:47           ` Michal Hocko
2012-12-20 12:47             ` Michal Hocko
2012-12-20 20:27             ` Andrew Morton
2012-12-20 20:27               ` Andrew Morton
2012-12-20 22:36               ` [PATCH v3] " Michal Hocko
2012-12-20 22:36                 ` Michal Hocko
2012-12-21  8:09                 ` Michal Hocko
2012-12-21  8:09                   ` Michal Hocko
2013-04-27  7:50 ` [PATCH] mm: cond_resched in tlb_flush_mmu " Simon Jeons
2013-04-27  7:50   ` Simon Jeons

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20121220102438.GA10819@suse.de \
    --to=mgorman@suse.de \
    --cc=a.p.zijlstra@chello.nl \
    --cc=akpm@linux-foundation.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@suse.cz \
    --cc=riel@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.