All of lore.kernel.org
 help / color / mirror / Atom feed
From: Ross Zwisler <ross.zwisler@linux.intel.com>
To: Jan Kara <jack@suse.cz>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	linux-nvdimm@lists.01.org, Dave Chinner <david@fromorbit.com>,
	linux-kernel@vger.kernel.org,
	Matthew Wilcox <willy@infradead.org>,
	Christoph Hellwig <hch@lst.de>,
	stable@vger.kernel.org
Subject: Re: [PATCH 5/5] radix tree: fix multi-order iteration race
Date: Wed, 9 May 2018 09:09:38 -0600	[thread overview]
Message-ID: <20180509150938.GA3814@linux.intel.com> (raw)
In-Reply-To: <20180509124611.6hoa743z4qrx6bgc@quack2.suse.cz>

On Wed, May 09, 2018 at 02:46:11PM +0200, Jan Kara wrote:
> On Thu 03-05-18 13:24:30, Ross Zwisler wrote:
> > Fix a race in the multi-order iteration code which causes the kernel to hit
> > a GP fault.  This was first seen with a production v4.15 based kernel
> > (4.15.6-300.fc27.x86_64) utilizing a DAX workload which used order 9 PMD
> > DAX entries.
> > 
> > The race has to do with how we tear down multi-order sibling entries when
> > we are removing an item from the tree.  Remember for example that an order
> > 2 entry looks like this:
> > 
> > struct radix_tree_node.slots[] = [entry][sibling][sibling][sibling]
> > 
> > where 'entry' is in some slot in the struct radix_tree_node, and the three
> > slots following 'entry' contain sibling pointers which point back to
> > 'entry.'
> > 
> > When we delete 'entry' from the tree, we call :
> >   radix_tree_delete()
> >     radix_tree_delete_item()
> >       __radix_tree_delete()
> >         replace_slot()
> > 
> > replace_slot() first removes the siblings in order from the first to the
> > last, then at then replaces 'entry' with NULL.  This means that for a brief
> > period of time we end up with one or more of the siblings removed, so:
> > 
> > struct radix_tree_node.slots[] = [entry][NULL][sibling][sibling]
> > 
> > This causes an issue if you have a reader iterating over the slots in the
> > tree via radix_tree_for_each_slot() while only under
> > rcu_read_lock()/rcu_read_unlock() protection.  This is a common case in
> > mm/filemap.c.
> > 
> > The issue is that when __radix_tree_next_slot() => skip_siblings() tries to
> > skip over the sibling entries in the slots, it currently does so with an
> > exact match on the slot directly preceding our current slot.  Normally this
> > works:
> >                                     V preceding slot
> > struct radix_tree_node.slots[] = [entry][sibling][sibling][sibling]
> >                                             ^ current slot
> > 
> > This lets you find the first sibling, and you skip them all in order.
> > 
> > But in the case where one of the siblings is NULL, that slot is skipped and
> > then our sibling detection is interrupted:
> > 
> >                                            V preceding slot
> > struct radix_tree_node.slots[] = [entry][NULL][sibling][sibling]
> >                                                   ^ current slot
> > 
> > This means that the sibling pointers aren't recognized since they point all
> > the way back to 'entry', so we think that they are normal internal radix
> > tree pointers.  This causes us to think we need to walk down to a struct
> > radix_tree_node starting at the address of 'entry'.
> > 
> > In a real running kernel this will crash the thread with a GP fault when
> > you try and dereference the slots in your broken node starting at 'entry'.
> > 
> > We fix this race by fixing the way that skip_siblings() detects sibling
> > nodes.  Instead of testing against the preceding slot we instead look for
> > siblings via is_sibling_entry() which compares against the position of the
> > struct radix_tree_node.slots[] array.  This ensures that sibling entries
> > are properly identified, even if they are no longer contiguous with the
> > 'entry' they point to.
> > 
> > Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
> > Reported-by: CR, Sapthagirish <sapthagirish.cr@intel.com>
> > Fixes: commit 148deab223b2 ("radix-tree: improve multiorder iterators")
> > Cc: <stable@vger.kernel.org>
> 
> Looks good to me. You can add:
> 
> Reviewed-by: Jan Kara <jack@suse.cz>

Thank you for the review, Jan.
_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

WARNING: multiple messages have this Message-ID (diff)
From: Ross Zwisler <ross.zwisler@linux.intel.com>
To: Jan Kara <jack@suse.cz>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-kernel@vger.kernel.org,
	Matthew Wilcox <willy@infradead.org>,
	Christoph Hellwig <hch@lst.de>,
	Dan Williams <dan.j.williams@intel.com>,
	Dave Chinner <david@fromorbit.com>,
	linux-nvdimm@lists.01.org, stable@vger.kernel.org
Subject: Re: [PATCH 5/5] radix tree: fix multi-order iteration race
Date: Wed, 9 May 2018 09:09:38 -0600	[thread overview]
Message-ID: <20180509150938.GA3814@linux.intel.com> (raw)
In-Reply-To: <20180509124611.6hoa743z4qrx6bgc@quack2.suse.cz>

On Wed, May 09, 2018 at 02:46:11PM +0200, Jan Kara wrote:
> On Thu 03-05-18 13:24:30, Ross Zwisler wrote:
> > Fix a race in the multi-order iteration code which causes the kernel to hit
> > a GP fault.  This was first seen with a production v4.15 based kernel
> > (4.15.6-300.fc27.x86_64) utilizing a DAX workload which used order 9 PMD
> > DAX entries.
> > 
> > The race has to do with how we tear down multi-order sibling entries when
> > we are removing an item from the tree.  Remember for example that an order
> > 2 entry looks like this:
> > 
> > struct radix_tree_node.slots[] = [entry][sibling][sibling][sibling]
> > 
> > where 'entry' is in some slot in the struct radix_tree_node, and the three
> > slots following 'entry' contain sibling pointers which point back to
> > 'entry.'
> > 
> > When we delete 'entry' from the tree, we call :
> >   radix_tree_delete()
> >     radix_tree_delete_item()
> >       __radix_tree_delete()
> >         replace_slot()
> > 
> > replace_slot() first removes the siblings in order from the first to the
> > last, then at then replaces 'entry' with NULL.  This means that for a brief
> > period of time we end up with one or more of the siblings removed, so:
> > 
> > struct radix_tree_node.slots[] = [entry][NULL][sibling][sibling]
> > 
> > This causes an issue if you have a reader iterating over the slots in the
> > tree via radix_tree_for_each_slot() while only under
> > rcu_read_lock()/rcu_read_unlock() protection.  This is a common case in
> > mm/filemap.c.
> > 
> > The issue is that when __radix_tree_next_slot() => skip_siblings() tries to
> > skip over the sibling entries in the slots, it currently does so with an
> > exact match on the slot directly preceding our current slot.  Normally this
> > works:
> >                                     V preceding slot
> > struct radix_tree_node.slots[] = [entry][sibling][sibling][sibling]
> >                                             ^ current slot
> > 
> > This lets you find the first sibling, and you skip them all in order.
> > 
> > But in the case where one of the siblings is NULL, that slot is skipped and
> > then our sibling detection is interrupted:
> > 
> >                                            V preceding slot
> > struct radix_tree_node.slots[] = [entry][NULL][sibling][sibling]
> >                                                   ^ current slot
> > 
> > This means that the sibling pointers aren't recognized since they point all
> > the way back to 'entry', so we think that they are normal internal radix
> > tree pointers.  This causes us to think we need to walk down to a struct
> > radix_tree_node starting at the address of 'entry'.
> > 
> > In a real running kernel this will crash the thread with a GP fault when
> > you try and dereference the slots in your broken node starting at 'entry'.
> > 
> > We fix this race by fixing the way that skip_siblings() detects sibling
> > nodes.  Instead of testing against the preceding slot we instead look for
> > siblings via is_sibling_entry() which compares against the position of the
> > struct radix_tree_node.slots[] array.  This ensures that sibling entries
> > are properly identified, even if they are no longer contiguous with the
> > 'entry' they point to.
> > 
> > Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
> > Reported-by: CR, Sapthagirish <sapthagirish.cr@intel.com>
> > Fixes: commit 148deab223b2 ("radix-tree: improve multiorder iterators")
> > Cc: <stable@vger.kernel.org>
> 
> Looks good to me. You can add:
> 
> Reviewed-by: Jan Kara <jack@suse.cz>

Thank you for the review, Jan.

  reply	other threads:[~2018-05-09 15:09 UTC|newest]

Thread overview: 46+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-05-03 19:24 [PATCH 0/5] fix radix tree multi-order iteration race Ross Zwisler
2018-05-03 19:24 ` Ross Zwisler
2018-05-03 19:24 ` [PATCH 1/5] radix tree test suite: fix mapshift build target Ross Zwisler
2018-05-03 19:24   ` Ross Zwisler
2018-07-15 23:00   ` Matthew Wilcox
2018-07-15 23:00     ` Matthew Wilcox
2018-07-16 16:07     ` Ross Zwisler
2018-07-16 16:07       ` Ross Zwisler
2018-07-16 19:52       ` Matthew Wilcox
2018-07-16 19:52         ` Matthew Wilcox
2018-07-16 21:08         ` Ross Zwisler
2018-07-16 21:08           ` Ross Zwisler
2018-07-17  2:41           ` Matthew Wilcox
2018-07-17  2:41             ` Matthew Wilcox
2018-07-21 23:45             ` Dave Chinner
2018-07-21 23:45               ` Dave Chinner
2018-07-22  3:11               ` Ross Zwisler
2018-07-22  3:11                 ` Ross Zwisler
2018-07-17  3:18       ` Matthew Wilcox
2018-07-17  3:18         ` Matthew Wilcox
2018-07-17 17:17         ` Ross Zwisler
2018-07-17 17:17           ` Ross Zwisler
2018-05-03 19:24 ` [PATCH 2/5] radix tree test suite: fix compilation issue Ross Zwisler
2018-05-03 19:24   ` Ross Zwisler
2018-05-03 19:24 ` [PATCH 3/5] radix tree test suite: add item_delete_rcu() Ross Zwisler
2018-05-03 19:24   ` Ross Zwisler
2018-05-03 19:24 ` [PATCH 4/5] radix tree test suite: multi-order iteration race Ross Zwisler
2018-05-03 19:24   ` Ross Zwisler
2018-05-03 19:24 ` [PATCH 5/5] radix tree: fix " Ross Zwisler
2018-05-03 19:24   ` Ross Zwisler
2018-05-09 12:46   ` Jan Kara
2018-05-09 12:46     ` Jan Kara
2018-05-09 15:09     ` Ross Zwisler [this message]
2018-05-09 15:09       ` Ross Zwisler
2018-05-08 17:44 ` [PATCH 0/5] fix radix tree " Ross Zwisler
2018-05-08 17:44   ` Ross Zwisler
2018-05-10 22:48   ` Andrew Morton
2018-05-10 22:48     ` Andrew Morton
2018-05-10 22:54     ` Dan Williams
2018-05-10 22:54       ` Dan Williams
2018-05-10 23:12       ` Andrew Morton
2018-05-10 23:12         ` Andrew Morton
2018-05-10 23:19         ` Dan Williams
2018-05-10 23:19           ` Dan Williams
2018-05-11  4:04     ` Ross Zwisler
2018-05-11  4:04       ` Ross Zwisler

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180509150938.GA3814@linux.intel.com \
    --to=ross.zwisler@linux.intel.com \
    --cc=akpm@linux-foundation.org \
    --cc=david@fromorbit.com \
    --cc=hch@lst.de \
    --cc=jack@suse.cz \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-nvdimm@lists.01.org \
    --cc=stable@vger.kernel.org \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.