From: Andrew Morton <akpm@linux-foundation.org>
To: Michal Hocko <mhocko@suse.cz>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Mel Gorman <mgorman@suse.de>, Rik van Riel <riel@redhat.com>,
	Peter Zijlstra <a.p.zijlstra@chello.nl>
Subject: Re: [PATCH] mm: cond_resched in tlb_flush_mmu to fix soft lockups on !CONFIG_PREEMPT
Date: Tue, 18 Dec 2012 16:00:30 -0800
Message-ID: <20121218160030.baf723aa.akpm@linux-foundation.org>
In-Reply-To: <20121218235042.GA10350@dhcp22.suse.cz>

On Wed, 19 Dec 2012 00:50:42 +0100
Michal Hocko <mhocko@suse.cz> wrote:

> On Tue 18-12-12 14:02:19, Andrew Morton wrote:
> > On Tue, 18 Dec 2012 17:11:28 +0100
> > Michal Hocko <mhocko@suse.cz> wrote:
> > 
> > > Since e303297 (mm: extended batches for generic mmu_gather) we are batching
> > > pages to be freed until either tlb_next_batch cannot allocate a new batch or we
> > > are done.
> > > 
> > > This works just fine most of the time but we can get into trouble with
> > > non-preemptible kernels (CONFIG_PREEMPT_NONE or CONFIG_PREEMPT_VOLUNTARY) on
> > > large machines where too aggressive batching might lead to soft lockups during
> > > process exit path (exit_mmap) because there are no scheduling points down the
> > > free_pages_and_swap_cache path and so the freeing can take long enough to
> > > trigger the soft lockup.
> > > 
> > > The lockup is harmless except when the system is set up to panic on
> > > softlockup, which is not that unusual.
> > > 
> > > The simplest way to work around this issue is to explicitly cond_resched per
> > > batch in tlb_flush_mmu (1020 pages on x86_64).
> > > 
> > > ...
> > >
> > > --- a/mm/memory.c
> > > +++ b/mm/memory.c
> > > @@ -239,6 +239,7 @@ void tlb_flush_mmu(struct mmu_gather *tlb)
> > >  	for (batch = &tlb->local; batch; batch = batch->next) {
> > >  		free_pages_and_swap_cache(batch->pages, batch->nr);
> > >  		batch->nr = 0;
> > > +		cond_resched();
> > >  	}
> > >  	tlb->active = &tlb->local;
> > >  }
> > 
> > tlb_flush_mmu() has a large number of callsites (or callsites which
> > call callers, etc), many in arch code.  It's not at all obvious that
> > tlb_flush_mmu() is never called from under spinlock?
> 
> free_pages_and_swap_cache calls lru_add_drain which in turn calls
> put_cpu (aka preempt_enable) which is a scheduling point for
> CONFIG_PREEMPT.

No, that inference doesn't work.  preempt_enable() inside a spinlock
is OK: it will not call schedule() because current->preempt_count is
still elevated (by spin_lock).
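
To make the point concrete, here is a minimal user-space sketch of the
counting rule on a CONFIG_PREEMPT kernel.  It is a toy model, not kernel
code: every model_* function and the three static variables are
hypothetical names invented for illustration, and the only behavior it
assumes is that spin_lock() raises the preempt count and that
preempt_enable() reschedules only when the count drops back to zero.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model state; these are illustrative stand-ins, not kernel symbols. */
static int preempt_count;   /* models the task's preemption counter */
static bool need_resched;   /* models a pending reschedule request */
static int schedule_calls;  /* counts how often the model "scheduled" */

static void model_schedule(void)
{
	schedule_calls++;
	need_resched = false;
}

static void model_preempt_disable(void)
{
	preempt_count++;
}

static void model_preempt_enable(void)
{
	assert(preempt_count > 0);
	/* Reschedule only when the count returns to zero. */
	if (--preempt_count == 0 && need_resched)
		model_schedule();
}

/* In this model, taking a spinlock raises the preempt count. */
static void model_spin_lock(void)   { model_preempt_disable(); }
static void model_spin_unlock(void) { model_preempt_enable(); }

/* Stands in for the get_cpu()/put_cpu() pair in lru_add_drain(). */
static void model_lru_add_drain(void)
{
	model_preempt_disable();	/* get_cpu() */
	model_preempt_enable();		/* put_cpu() */
}

static void model_reset(void)
{
	preempt_count = 0;
	schedule_calls = 0;
	need_resched = true;	/* a reschedule is pending throughout */
}

/* Returns how many times the model scheduled while the lock was held. */
int schedule_calls_under_lock(void)
{
	int inside;

	model_reset();
	model_spin_lock();
	model_lru_add_drain();	/* put_cpu() leaves the count at 1 */
	inside = schedule_calls;
	model_spin_unlock();	/* count reaches 0 here, not earlier */
	return inside;
}

/* Same call with no lock held: put_cpu() is a scheduling point. */
int schedule_calls_without_lock(void)
{
	model_reset();
	model_lru_add_drain();
	return schedule_calls;
}
```

In this model schedule_calls_under_lock() reports no scheduling while
the lock is held, and the pending reschedule only happens at
model_spin_unlock().  So a put_cpu() somewhere down the call chain says
nothing about whether the caller holds a spinlock.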

> There are more down the call chain probably. None of
> them for non-preempt kernel.




Thread overview:
2012-12-18 16:11 [PATCH] mm: cond_resched in tlb_flush_mmu to fix soft lockups on !CONFIG_PREEMPT Michal Hocko
2012-12-18 18:01 ` Rik van Riel
2012-12-18 22:02 ` Andrew Morton
2012-12-18 23:50   ` Michal Hocko
2012-12-19  0:00     ` Andrew Morton [this message]
2012-12-19 15:04       ` [PATCH v2] mm: limit mmu_gather batching " Michal Hocko
2012-12-19 21:13         ` Andrew Morton
2012-12-20 10:24           ` Mel Gorman
2012-12-20 12:47           ` Michal Hocko
2012-12-20 20:27             ` Andrew Morton
2012-12-20 22:36               ` [PATCH v3] " Michal Hocko
2012-12-21  8:09                 ` Michal Hocko
2013-04-27  7:50 ` [PATCH] mm: cond_resched in tlb_flush_mmu " Simon Jeons
