Re: kernel BUG at mm/truncate.c:475!

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Andrew Morton <akpm@linux-foundation.org>
To: Miklos Szeredi <miklos@szeredi.hu>
Cc: Michael Leun <lkml20101129@newton.leun.net>,
	hughd@google.com, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org
Subject: Re: kernel BUG at mm/truncate.c:475!
Date: Mon, 13 Dec 2010 14:20:59 -0800	[thread overview]
Message-ID: <20101213142059.643f8080.akpm@linux-foundation.org> (raw)
In-Reply-To: <E1PRQDn-0007jZ-5S@pomaz-ex.szeredi.hu>

On Sat, 11 Dec 2010 15:14:47 +0100
Miklos Szeredi <miklos@szeredi.hu> wrote:

> On Mon, 6 Dec 2010, Michael Leun wrote:
> > At the moment I'm trying to create an easy to reproduce scenario.
> > 
> 
> I've managed to reproduce the BUG.  First I thought it has to do with
> fork() racing with invalidate_inode_pages2_range() but it turns out,
> just two parallel invocation of invalidate_inode_pages2_range() with
> some page faults going on can trigger it.
> 
> The problem is: unmap_mapping_range() is not prepared for more than
> one concurrent invocation per inode.  For example:
> 
>   thread1: going through a big range, stops in the middle of a vma and
>      stores the restart address in vm_truncate_count.
> 
>   thread2: comes in with a small (e.g. single page) unmap request on
>      the same vma, somewhere before restart_address, finds that the

"restart_addr", please.

>      vma was already unmapped up to the restart address and happily
>      returns without doing anything.
> 
> Another scenario would be two big unmap requests, both having to
> restart the unmapping and each one setting vm_truncate_count to its
> own value.  This could go on forever without any of them being able to
> finish.
> 
> Truncate and hole punching already serialize with i_mutex.  Other
> callers of unmap_mapping_range() do not, however, and I see difficulty
> with doing it in the callers.  I think the proper solution is to add
> serialization to unmap_mapping_range() itself.
> 
> Attached patch attempts to do this without adding more fields to
> struct address_space.  It fixes the bug in my testing.
> 

That's a pretty old bug, isn't it?  5+ years.

> 
> 
> ---
>  include/linux/pagemap.h |    1 +
>  mm/memory.c             |   14 ++++++++++++++
>  2 files changed, 15 insertions(+)
> 
> Index: linux.git/include/linux/pagemap.h
> ===================================================================
> --- linux.git.orig/include/linux/pagemap.h	2010-11-26 10:52:17.000000000 +0100
> +++ linux.git/include/linux/pagemap.h	2010-12-11 13:39:32.000000000 +0100
> @@ -24,6 +24,7 @@ enum mapping_flags {
>  	AS_ENOSPC	= __GFP_BITS_SHIFT + 1,	/* ENOSPC on async write */
>  	AS_MM_ALL_LOCKS	= __GFP_BITS_SHIFT + 2,	/* under mm_take_all_locks() */
>  	AS_UNEVICTABLE	= __GFP_BITS_SHIFT + 3,	/* e.g., ramdisk, SHM_LOCK */
> +	AS_UNMAPPING	= __GFP_BITS_SHIFT + 4, /* for unmap_mapping_range() */
>  };
>  
>  static inline void mapping_set_error(struct address_space *mapping, int error)
> Index: linux.git/mm/memory.c
> ===================================================================
> --- linux.git.orig/mm/memory.c	2010-12-11 13:07:28.000000000 +0100
> +++ linux.git/mm/memory.c	2010-12-11 14:09:42.000000000 +0100
> @@ -2535,6 +2535,12 @@ static inline void unmap_mapping_range_l
>  	}
>  }
>  
> +static int mapping_sleep(void *x)
> +{
> +	schedule();
> +	return 0;
> +}
> +
>  /**
>   * unmap_mapping_range - unmap the portion of all mmaps in the specified address_space corresponding to the specified page range in the underlying file.
>   * @mapping: the address space containing mmaps to be unmapped.
> @@ -2572,6 +2578,9 @@ void unmap_mapping_range(struct address_
>  		details.last_index = ULONG_MAX;
>  	details.i_mmap_lock = &mapping->i_mmap_lock;
>  
> +	wait_on_bit_lock(&mapping->flags, AS_UNMAPPING, mapping_sleep,
> +			 TASK_UNINTERRUPTIBLE);
> +
>  	spin_lock(&mapping->i_mmap_lock);
>  
>  	/* Protect against endless unmapping loops */
> @@ -2588,6 +2597,11 @@ void unmap_mapping_range(struct address_
>  	if (unlikely(!list_empty(&mapping->i_mmap_nonlinear)))
>  		unmap_mapping_range_list(&mapping->i_mmap_nonlinear, &details);
>  	spin_unlock(&mapping->i_mmap_lock);
> +
> +	clear_bit_unlock(AS_UNMAPPING, &mapping->flags);
> +	smp_mb__after_clear_bit();
> +	wake_up_bit(&mapping->flags, AS_UNMAPPING);
> +

I do think this was premature optimisation.  The open-coded lock is
hidden from lockdep so we won't find out if this introduces potential
deadlocks.  It would be better to add a new mutex at least temporarily,
then look at replacing it with a MiklosLock later on, when the code is
bedded in.

At which time, replacing mutexes with MiklosLocks becomes part of a
general "shrink the address_space" exercise in which there's no reason
to exclusively concentrate on that new mutex!


How hard is it to avoid adding a new lock and using an existing one,
presumablt i_mutex?  Because if we can get i_mutex coverage over
unmap_mapping_range() then I suspect all the
vm_truncate_count/restart_addr stuff can go away?

WARNING: multiple messages have this Message-ID (diff)

From: Andrew Morton <akpm@linux-foundation.org>
To: Miklos Szeredi <miklos@szeredi.hu>
Cc: Michael Leun <lkml20101129@newton.leun.net>,
	hughd@google.com, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org
Subject: Re: kernel BUG at mm/truncate.c:475!
Date: Mon, 13 Dec 2010 14:20:59 -0800	[thread overview]
Message-ID: <20101213142059.643f8080.akpm@linux-foundation.org> (raw)
In-Reply-To: <E1PRQDn-0007jZ-5S@pomaz-ex.szeredi.hu>

On Sat, 11 Dec 2010 15:14:47 +0100
Miklos Szeredi <miklos@szeredi.hu> wrote:

> On Mon, 6 Dec 2010, Michael Leun wrote:
> > At the moment I'm trying to create an easy to reproduce scenario.
> > 
> 
> I've managed to reproduce the BUG.  First I thought it has to do with
> fork() racing with invalidate_inode_pages2_range() but it turns out,
> just two parallel invocation of invalidate_inode_pages2_range() with
> some page faults going on can trigger it.
> 
> The problem is: unmap_mapping_range() is not prepared for more than
> one concurrent invocation per inode.  For example:
> 
>   thread1: going through a big range, stops in the middle of a vma and
>      stores the restart address in vm_truncate_count.
> 
>   thread2: comes in with a small (e.g. single page) unmap request on
>      the same vma, somewhere before restart_address, finds that the

"restart_addr", please.

>      vma was already unmapped up to the restart address and happily
>      returns without doing anything.
> 
> Another scenario would be two big unmap requests, both having to
> restart the unmapping and each one setting vm_truncate_count to its
> own value.  This could go on forever without any of them being able to
> finish.
> 
> Truncate and hole punching already serialize with i_mutex.  Other
> callers of unmap_mapping_range() do not, however, and I see difficulty
> with doing it in the callers.  I think the proper solution is to add
> serialization to unmap_mapping_range() itself.
> 
> Attached patch attempts to do this without adding more fields to
> struct address_space.  It fixes the bug in my testing.
> 

That's a pretty old bug, isn't it?  5+ years.

> 
> 
> ---
>  include/linux/pagemap.h |    1 +
>  mm/memory.c             |   14 ++++++++++++++
>  2 files changed, 15 insertions(+)
> 
> Index: linux.git/include/linux/pagemap.h
> ===================================================================
> --- linux.git.orig/include/linux/pagemap.h	2010-11-26 10:52:17.000000000 +0100
> +++ linux.git/include/linux/pagemap.h	2010-12-11 13:39:32.000000000 +0100
> @@ -24,6 +24,7 @@ enum mapping_flags {
>  	AS_ENOSPC	= __GFP_BITS_SHIFT + 1,	/* ENOSPC on async write */
>  	AS_MM_ALL_LOCKS	= __GFP_BITS_SHIFT + 2,	/* under mm_take_all_locks() */
>  	AS_UNEVICTABLE	= __GFP_BITS_SHIFT + 3,	/* e.g., ramdisk, SHM_LOCK */
> +	AS_UNMAPPING	= __GFP_BITS_SHIFT + 4, /* for unmap_mapping_range() */
>  };
>  
>  static inline void mapping_set_error(struct address_space *mapping, int error)
> Index: linux.git/mm/memory.c
> ===================================================================
> --- linux.git.orig/mm/memory.c	2010-12-11 13:07:28.000000000 +0100
> +++ linux.git/mm/memory.c	2010-12-11 14:09:42.000000000 +0100
> @@ -2535,6 +2535,12 @@ static inline void unmap_mapping_range_l
>  	}
>  }
>  
> +static int mapping_sleep(void *x)
> +{
> +	schedule();
> +	return 0;
> +}
> +
>  /**
>   * unmap_mapping_range - unmap the portion of all mmaps in the specified address_space corresponding to the specified page range in the underlying file.
>   * @mapping: the address space containing mmaps to be unmapped.
> @@ -2572,6 +2578,9 @@ void unmap_mapping_range(struct address_
>  		details.last_index = ULONG_MAX;
>  	details.i_mmap_lock = &mapping->i_mmap_lock;
>  
> +	wait_on_bit_lock(&mapping->flags, AS_UNMAPPING, mapping_sleep,
> +			 TASK_UNINTERRUPTIBLE);
> +
>  	spin_lock(&mapping->i_mmap_lock);
>  
>  	/* Protect against endless unmapping loops */
> @@ -2588,6 +2597,11 @@ void unmap_mapping_range(struct address_
>  	if (unlikely(!list_empty(&mapping->i_mmap_nonlinear)))
>  		unmap_mapping_range_list(&mapping->i_mmap_nonlinear, &details);
>  	spin_unlock(&mapping->i_mmap_lock);
> +
> +	clear_bit_unlock(AS_UNMAPPING, &mapping->flags);
> +	smp_mb__after_clear_bit();
> +	wake_up_bit(&mapping->flags, AS_UNMAPPING);
> +

I do think this was premature optimisation.  The open-coded lock is
hidden from lockdep so we won't find out if this introduces potential
deadlocks.  It would be better to add a new mutex at least temporarily,
then look at replacing it with a MiklosLock later on, when the code is
bedded in.

At which time, replacing mutexes with MiklosLocks becomes part of a
general "shrink the address_space" exercise in which there's no reason
to exclusively concentrate on that new mutex!


How hard is it to avoid adding a new lock and using an existing one,
presumablt i_mutex?  Because if we can get i_mutex coverage over
unmap_mapping_range() then I suspect all the
vm_truncate_count/restart_addr stuff can go away?

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

next prev parent reply	other threads:[~2010-12-13 22:21 UTC|newest]

Thread overview: 46+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-11-30 18:49 kernel BUG at mm/truncate.c:475! Michael Leun
2010-11-30 23:00 ` Hugh Dickins
2010-11-30 23:00   ` Hugh Dickins
2010-12-01 10:25   ` Miklos Szeredi
2010-12-01 10:25     ` Miklos Szeredi
2010-12-01 11:45     ` Michael Leun
2010-12-01 11:45       ` Michael Leun
2010-12-01 17:22       ` Miklos Szeredi
2010-12-01 17:22         ` Miklos Szeredi
2010-12-02  7:41         ` Michael Leun
2010-12-02  7:41           ` Michael Leun
2010-12-02  8:15           ` Michael Leun
2010-12-02  8:15             ` Michael Leun
2010-12-02  9:42             ` Miklos Szeredi
2010-12-02  9:42               ` Miklos Szeredi
2010-12-02 10:57               ` Michael Leun
2010-12-02 10:57                 ` Michael Leun
2010-12-03  7:53                 ` Michael Leun
2010-12-03  7:53                   ` Michael Leun
2010-12-06 12:36                   ` Miklos Szeredi
2010-12-06 12:36                     ` Miklos Szeredi
2010-12-06 19:43                     ` Michael Leun
2010-12-06 19:43                       ` Michael Leun
2010-12-11 14:14                       ` Miklos Szeredi
2010-12-11 19:50                         ` Michael Leun
2010-12-11 19:50                           ` Michael Leun
2010-12-13 22:20                         ` Andrew Morton [this message]
2010-12-13 22:20                           ` Andrew Morton
2010-12-14  7:31                           ` Hugh Dickins
2010-12-14  7:31                             ` Hugh Dickins
2010-12-14  9:10                             ` Peter Zijlstra
2010-12-14  9:10                               ` Peter Zijlstra
2010-12-14  9:11                             ` Peter Zijlstra
2010-12-14  9:11                               ` Peter Zijlstra
2010-12-14 10:45                           ` Miklos Szeredi
2010-12-14 10:45                             ` Miklos Szeredi
2010-12-15  4:32                             ` Hugh Dickins
2010-12-15  4:32                               ` Hugh Dickins
2010-12-15 11:22                               ` Miklos Szeredi
2010-12-15 11:22                                 ` Miklos Szeredi
2010-12-16  0:26                                 ` Hugh Dickins
2010-12-16  0:26                                   ` Hugh Dickins
2010-12-16 14:50                               ` Robert Święcki
2010-12-16 14:50                                 ` Robert Święcki
2010-12-16 19:05                                 ` Hugh Dickins
2010-12-16 19:05                                   ` Hugh Dickins

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20101213142059.643f8080.akpm@linux-foundation.org \
    --to=akpm@linux-foundation.org \
    --cc=hughd@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=lkml20101129@newton.leun.net \
    --cc=miklos@szeredi.hu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.