* [PATCH v2] mm: Fix assertion mapping->nrpages == 0 in end_writeback()
@ 2011-06-15 15:37 Jan Kara
2011-06-21 0:18 ` Andrew Morton
0 siblings, 1 reply; 4+ messages in thread
From: Jan Kara @ 2011-06-15 15:37 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-mm, Miklos Szeredi, Jay, Jan Kara, Miklos Szeredi
Under heavy memory and filesystem load, users observe the assertion
mapping->nrpages == 0 in end_writeback() trigger. This can be caused
by page reclaim reclaiming the last page from a mapping in the following
race:
CPU0 CPU1
...
shrink_page_list()
__remove_mapping()
__delete_from_page_cache()
radix_tree_delete()
evict_inode()
truncate_inode_pages()
truncate_inode_pages_range()
pagevec_lookup() - finds nothing
end_writeback()
mapping->nrpages != 0 -> BUG
page->mapping = NULL
mapping->nrpages--
Fix the problem by doing a reliable check of mapping->nrpages under
mapping->tree_lock in end_writeback().
Analyzed by Jay <jinshan.xiong@whamcloud.com>, lost in LKML, and dug
out by Miklos Szeredi <mszeredi@suse.de>.
CC: Jay <jinshan.xiong@whamcloud.com>
CC: Miklos Szeredi <mszeredi@suse.de>
Signed-off-by: Jan Kara <jack@suse.cz>
---
fs/inode.c | 7 +++++++
include/linux/fs.h | 1 +
mm/truncate.c | 5 +++++
3 files changed, 13 insertions(+), 0 deletions(-)
Andrew, does this look better?
diff --git a/fs/inode.c b/fs/inode.c
index 33c963d..1133cb0 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -467,7 +467,14 @@ EXPORT_SYMBOL(remove_inode_hash);
void end_writeback(struct inode *inode)
{
might_sleep();
+ /*
+ * We have to cycle tree_lock here because reclaim can be still in the
+ * process of removing the last page (in __delete_from_page_cache())
+ * and we must not free mapping under it.
+ */
+ spin_lock(&inode->i_data.tree_lock);
BUG_ON(inode->i_data.nrpages);
+ spin_unlock(&inode->i_data.tree_lock);
BUG_ON(!list_empty(&inode->i_data.private_list));
BUG_ON(!(inode->i_state & I_FREEING));
BUG_ON(inode->i_state & I_CLEAR);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index cdf9495..1a9375b 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -636,6 +636,7 @@ struct address_space {
struct list_head i_mmap_nonlinear;/*list VM_NONLINEAR mappings */
spinlock_t i_mmap_lock; /* protect tree, count, list */
unsigned int truncate_count; /* Cover race condition with truncate */
+ /* protected by tree_lock together with the radix tree */
unsigned long nrpages; /* number of total pages */
pgoff_t writeback_index;/* writeback starts here */
const struct address_space_operations *a_ops; /* methods */
diff --git a/mm/truncate.c b/mm/truncate.c
index a956675..499d6ab 100644
--- a/mm/truncate.c
+++ b/mm/truncate.c
@@ -300,6 +300,11 @@ EXPORT_SYMBOL(truncate_inode_pages_range);
* @lstart: offset from which to truncate
*
* Called under (and serialised by) inode->i_mutex.
+ *
+ * Note: When this function returns, there can be a page in the process of
+ * deletion (inside __delete_from_page_cache()) in the specified range. Thus
+ * mapping->nrpages can be non-zero when this function returns even after
+ * truncation of the whole mapping.
*/
void truncate_inode_pages(struct address_space *mapping, loff_t lstart)
{
--
1.7.1
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply related [flat|nested] 4+ messages in thread
* Re: [PATCH v2] mm: Fix assertion mapping->nrpages == 0 in end_writeback()
2011-06-15 15:37 [PATCH v2] mm: Fix assertion mapping->nrpages == 0 in end_writeback() Jan Kara
@ 2011-06-21 0:18 ` Andrew Morton
2011-06-21 6:47 ` Miklos Szeredi
0 siblings, 1 reply; 4+ messages in thread
From: Andrew Morton @ 2011-06-21 0:18 UTC (permalink / raw)
To: Jan Kara; +Cc: linux-mm, Miklos Szeredi, Jay, Miklos Szeredi
On Wed, 15 Jun 2011 17:37:13 +0200
Jan Kara <jack@suse.cz> wrote:
> Under heavy memory and filesystem load, users observe the assertion
> mapping->nrpages == 0 in end_writeback() trigger. This can be caused
> by page reclaim reclaiming the last page from a mapping in the following
> race:
> CPU0 CPU1
> ...
> shrink_page_list()
> __remove_mapping()
> __delete_from_page_cache()
> radix_tree_delete()
> evict_inode()
> truncate_inode_pages()
> truncate_inode_pages_range()
> pagevec_lookup() - finds nothing
> end_writeback()
> mapping->nrpages != 0 -> BUG
> page->mapping = NULL
> mapping->nrpages--
>
> Fix the problem by doing a reliable check of mapping->nrpages under
> mapping->tree_lock in end_writeback().
>
> Analyzed by Jay <jinshan.xiong@whamcloud.com>, lost in LKML, and dug
> out by Miklos Szeredi <mszeredi@suse.de>.
>
> CC: Jay <jinshan.xiong@whamcloud.com>
> CC: Miklos Szeredi <mszeredi@suse.de>
> Signed-off-by: Jan Kara <jack@suse.cz>
> ---
> fs/inode.c | 7 +++++++
> include/linux/fs.h | 1 +
> mm/truncate.c | 5 +++++
> 3 files changed, 13 insertions(+), 0 deletions(-)
>
> Andrew, does this look better?
spose so.
> diff --git a/fs/inode.c b/fs/inode.c
> index 33c963d..1133cb0 100644
> --- a/fs/inode.c
> +++ b/fs/inode.c
> @@ -467,7 +467,14 @@ EXPORT_SYMBOL(remove_inode_hash);
> void end_writeback(struct inode *inode)
> {
> might_sleep();
> + /*
> + * We have to cycle tree_lock here because reclaim can be still in the
> + * process of removing the last page (in __delete_from_page_cache())
> + * and we must not free mapping under it.
> + */
> + spin_lock(&inode->i_data.tree_lock);
> BUG_ON(inode->i_data.nrpages);
> + spin_unlock(&inode->i_data.tree_lock);
That's an expensive assertion. We might want to wrap all this in
CONFIG_DEBUG_VM.
Or we could do
if (unlikely(inode->i_data.nrpages)) {
/* comment goes here */
spin_lock(&inode->i_data.tree_lock);
BUG_ON(inode->i_data.nrpages);
spin_unlock(&inode->i_data.tree_lock);
}
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH v2] mm: Fix assertion mapping->nrpages == 0 in end_writeback()
2011-06-21 0:18 ` Andrew Morton
@ 2011-06-21 6:47 ` Miklos Szeredi
2011-06-21 19:23 ` Jan Kara
0 siblings, 1 reply; 4+ messages in thread
From: Miklos Szeredi @ 2011-06-21 6:47 UTC (permalink / raw)
To: Andrew Morton; +Cc: Jan Kara, linux-mm, Jay, Miklos Szeredi
On Mon, 2011-06-20 at 17:18 -0700, Andrew Morton wrote:
> On Wed, 15 Jun 2011 17:37:13 +0200
> Jan Kara <jack@suse.cz> wrote:
>
> > Under heavy memory and filesystem load, users observe the assertion
> > mapping->nrpages == 0 in end_writeback() trigger. This can be caused
> > by page reclaim reclaiming the last page from a mapping in the following
> > race:
> > CPU0 CPU1
> > ...
> > shrink_page_list()
> > __remove_mapping()
> > __delete_from_page_cache()
> > radix_tree_delete()
> > evict_inode()
> > truncate_inode_pages()
> > truncate_inode_pages_range()
> > pagevec_lookup() - finds nothing
> > end_writeback()
> > mapping->nrpages != 0 -> BUG
> > page->mapping = NULL
> > mapping->nrpages--
> >
> > Fix the problem by doing a reliable check of mapping->nrpages under
> > mapping->tree_lock in end_writeback().
> >
> > Analyzed by Jay <jinshan.xiong@whamcloud.com>, lost in LKML, and dug
> > out by Miklos Szeredi <mszeredi@suse.de>.
> >
> > CC: Jay <jinshan.xiong@whamcloud.com>
> > CC: Miklos Szeredi <mszeredi@suse.de>
> > Signed-off-by: Jan Kara <jack@suse.cz>
> > ---
> > fs/inode.c | 7 +++++++
> > include/linux/fs.h | 1 +
> > mm/truncate.c | 5 +++++
> > 3 files changed, 13 insertions(+), 0 deletions(-)
> >
> > Andrew, does this look better?
>
> spose so.
>
> > diff --git a/fs/inode.c b/fs/inode.c
> > index 33c963d..1133cb0 100644
> > --- a/fs/inode.c
> > +++ b/fs/inode.c
> > @@ -467,7 +467,14 @@ EXPORT_SYMBOL(remove_inode_hash);
> > void end_writeback(struct inode *inode)
> > {
> > might_sleep();
> > + /*
> > + * We have to cycle tree_lock here because reclaim can be still in the
> > + * process of removing the last page (in __delete_from_page_cache())
> > + * and we must not free mapping under it.
> > + */
> > + spin_lock(&inode->i_data.tree_lock);
> > BUG_ON(inode->i_data.nrpages);
> > + spin_unlock(&inode->i_data.tree_lock);
>
> That's an expensive assertion. We might want to wrap all this in
> CONFIG_DEBUG_VM.
>
> Or we could do
>
> if (unlikely(inode->i_data.nrpages)) {
> /* comment goes here */
> spin_lock(&inode->i_data.tree_lock);
> BUG_ON(inode->i_data.nrpages);
> spin_unlock(&inode->i_data.tree_lock);
> }
>
It's not *just* the assertion that needs locking. Suppose that we are
in __remove_mapping() just before the
spin_unlock_irq(&mapping->tree_lock) and the inode is freed along with
the mapping at that point in evict(). In that case the spin_unlock
would be touching freed memory.
truncate_inode_pages() used to synchronize page reclaim with inode
eviction, but now that synchronization is gone.
Thanks,
Miklos
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH v2] mm: Fix assertion mapping->nrpages == 0 in end_writeback()
2011-06-21 6:47 ` Miklos Szeredi
@ 2011-06-21 19:23 ` Jan Kara
0 siblings, 0 replies; 4+ messages in thread
From: Jan Kara @ 2011-06-21 19:23 UTC (permalink / raw)
To: Miklos Szeredi; +Cc: Andrew Morton, Jan Kara, linux-mm, Jay, Miklos Szeredi
On Tue 21-06-11 08:47:49, Miklos Szeredi wrote:
> On Mon, 2011-06-20 at 17:18 -0700, Andrew Morton wrote:
> > On Wed, 15 Jun 2011 17:37:13 +0200
> > Jan Kara <jack@suse.cz> wrote:
> >
> > > Under heavy memory and filesystem load, users observe the assertion
> > > mapping->nrpages == 0 in end_writeback() trigger. This can be caused
> > > by page reclaim reclaiming the last page from a mapping in the following
> > > race:
> > > CPU0 CPU1
> > > ...
> > > shrink_page_list()
> > > __remove_mapping()
> > > __delete_from_page_cache()
> > > radix_tree_delete()
> > > evict_inode()
> > > truncate_inode_pages()
> > > truncate_inode_pages_range()
> > > pagevec_lookup() - finds nothing
> > > end_writeback()
> > > mapping->nrpages != 0 -> BUG
> > > page->mapping = NULL
> > > mapping->nrpages--
> > >
> > > Fix the problem by doing a reliable check of mapping->nrpages under
> > > mapping->tree_lock in end_writeback().
> > >
> > > Analyzed by Jay <jinshan.xiong@whamcloud.com>, lost in LKML, and dug
> > > out by Miklos Szeredi <mszeredi@suse.de>.
> > >
> > > CC: Jay <jinshan.xiong@whamcloud.com>
> > > CC: Miklos Szeredi <mszeredi@suse.de>
> > > Signed-off-by: Jan Kara <jack@suse.cz>
> > > ---
> > > fs/inode.c | 7 +++++++
> > > include/linux/fs.h | 1 +
> > > mm/truncate.c | 5 +++++
> > > 3 files changed, 13 insertions(+), 0 deletions(-)
> > >
> > > Andrew, does this look better?
> >
> > spose so.
> >
> > > diff --git a/fs/inode.c b/fs/inode.c
> > > index 33c963d..1133cb0 100644
> > > --- a/fs/inode.c
> > > +++ b/fs/inode.c
> > > @@ -467,7 +467,14 @@ EXPORT_SYMBOL(remove_inode_hash);
> > > void end_writeback(struct inode *inode)
> > > {
> > > might_sleep();
> > > + /*
> > > + * We have to cycle tree_lock here because reclaim can be still in the
> > > + * process of removing the last page (in __delete_from_page_cache())
> > > + * and we must not free mapping under it.
> > > + */
> > > + spin_lock(&inode->i_data.tree_lock);
> > > BUG_ON(inode->i_data.nrpages);
> > > + spin_unlock(&inode->i_data.tree_lock);
> >
> > That's an expensive assertion. We might want to wrap all this in
> > CONFIG_DEBUG_VM.
> >
> > Or we could do
> >
> > if (unlikely(inode->i_data.nrpages)) {
> > /* comment goes here */
> > spin_lock(&inode->i_data.tree_lock);
> > BUG_ON(inode->i_data.nrpages);
> > spin_unlock(&inode->i_data.tree_lock);
> > }
> >
>
> It's not *just* the assertion that needs locking. Suppose that we are
> in __remove_mapping() just before the
> spin_unlock_irq(&mapping->tree_lock) and the inode is freed along with
> the mapping at that point in evict(). In that case the spin_unlock
> would be touching freed memory.
Exactly. That's why I added the comment before the spin_lock. If it's not
clear enough, feel free to change it to a better wording.
> truncate_inode_pages() used to synchronize page reclaim with inode
> eviction, but now that synchronization is gone.
Yes, but strictly speaking I don't see a real reason for synchronization
other then evict_inode() (the page that is in __delete_from_page_cache()
is really almost dead and page->mapping is NULL for anyone who reliably
reads that). So I think synchronizing in evict_inode() should be OK.
Honza
--
Jan Kara <jack@suse.cz>
SUSE Labs, CR
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2011-06-21 19:23 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-06-15 15:37 [PATCH v2] mm: Fix assertion mapping->nrpages == 0 in end_writeback() Jan Kara
2011-06-21 0:18 ` Andrew Morton
2011-06-21 6:47 ` Miklos Szeredi
2011-06-21 19:23 ` Jan Kara
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).