From: Ross Zwisler <ross.zwisler@linux.intel.com>
To: Jan Kara <jack@suse.cz>
Cc: Andrew Morton <akpm@linux-foundation.org>,
linux-nvdimm@lists.01.org, Dave Chinner <david@fromorbit.com>,
linux-kernel@vger.kernel.org,
Matthew Wilcox <willy@infradead.org>,
Christoph Hellwig <hch@lst.de>,
stable@vger.kernel.org
Subject: Re: [PATCH 5/5] radix tree: fix multi-order iteration race
Date: Wed, 9 May 2018 09:09:38 -0600 [thread overview]
Message-ID: <20180509150938.GA3814@linux.intel.com> (raw)
In-Reply-To: <20180509124611.6hoa743z4qrx6bgc@quack2.suse.cz>
On Wed, May 09, 2018 at 02:46:11PM +0200, Jan Kara wrote:
> On Thu 03-05-18 13:24:30, Ross Zwisler wrote:
> > Fix a race in the multi-order iteration code which causes the kernel to hit
> > a GP fault. This was first seen with a production v4.15 based kernel
> > (4.15.6-300.fc27.x86_64) utilizing a DAX workload which used order 9 PMD
> > DAX entries.
> >
> > The race has to do with how we tear down multi-order sibling entries when
> > we are removing an item from the tree. Remember for example that an order
> > 2 entry looks like this:
> >
> > struct radix_tree_node.slots[] = [entry][sibling][sibling][sibling]
> >
> > where 'entry' is in some slot in the struct radix_tree_node, and the three
> > slots following 'entry' contain sibling pointers which point back to
> > 'entry.'
> >
> > When we delete 'entry' from the tree, we call :
> > radix_tree_delete()
> > radix_tree_delete_item()
> > __radix_tree_delete()
> > replace_slot()
> >
> > replace_slot() first removes the siblings in order from the first to the
> > last, then at then replaces 'entry' with NULL. This means that for a brief
> > period of time we end up with one or more of the siblings removed, so:
> >
> > struct radix_tree_node.slots[] = [entry][NULL][sibling][sibling]
> >
> > This causes an issue if you have a reader iterating over the slots in the
> > tree via radix_tree_for_each_slot() while only under
> > rcu_read_lock()/rcu_read_unlock() protection. This is a common case in
> > mm/filemap.c.
> >
> > The issue is that when __radix_tree_next_slot() => skip_siblings() tries to
> > skip over the sibling entries in the slots, it currently does so with an
> > exact match on the slot directly preceding our current slot. Normally this
> > works:
> > V preceding slot
> > struct radix_tree_node.slots[] = [entry][sibling][sibling][sibling]
> > ^ current slot
> >
> > This lets you find the first sibling, and you skip them all in order.
> >
> > But in the case where one of the siblings is NULL, that slot is skipped and
> > then our sibling detection is interrupted:
> >
> > V preceding slot
> > struct radix_tree_node.slots[] = [entry][NULL][sibling][sibling]
> > ^ current slot
> >
> > This means that the sibling pointers aren't recognized since they point all
> > the way back to 'entry', so we think that they are normal internal radix
> > tree pointers. This causes us to think we need to walk down to a struct
> > radix_tree_node starting at the address of 'entry'.
> >
> > In a real running kernel this will crash the thread with a GP fault when
> > you try and dereference the slots in your broken node starting at 'entry'.
> >
> > We fix this race by fixing the way that skip_siblings() detects sibling
> > nodes. Instead of testing against the preceding slot we instead look for
> > siblings via is_sibling_entry() which compares against the position of the
> > struct radix_tree_node.slots[] array. This ensures that sibling entries
> > are properly identified, even if they are no longer contiguous with the
> > 'entry' they point to.
> >
> > Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
> > Reported-by: CR, Sapthagirish <sapthagirish.cr@intel.com>
> > Fixes: commit 148deab223b2 ("radix-tree: improve multiorder iterators")
> > Cc: <stable@vger.kernel.org>
>
> Looks good to me. You can add:
>
> Reviewed-by: Jan Kara <jack@suse.cz>
Thank you for the review, Jan.
_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm
next prev parent reply other threads:[~2018-05-09 15:09 UTC|newest]
Thread overview: 23+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-05-03 19:24 [PATCH 0/5] fix radix tree multi-order iteration race Ross Zwisler
2018-05-03 19:24 ` [PATCH 1/5] radix tree test suite: fix mapshift build target Ross Zwisler
2018-07-15 23:00 ` Matthew Wilcox
2018-07-16 16:07 ` Ross Zwisler
2018-07-16 19:52 ` Matthew Wilcox
2018-07-16 21:08 ` Ross Zwisler
2018-07-17 2:41 ` Matthew Wilcox
2018-07-21 23:45 ` Dave Chinner
2018-07-22 3:11 ` Ross Zwisler
2018-07-17 3:18 ` Matthew Wilcox
2018-07-17 17:17 ` Ross Zwisler
2018-05-03 19:24 ` [PATCH 2/5] radix tree test suite: fix compilation issue Ross Zwisler
2018-05-03 19:24 ` [PATCH 3/5] radix tree test suite: add item_delete_rcu() Ross Zwisler
2018-05-03 19:24 ` [PATCH 4/5] radix tree test suite: multi-order iteration race Ross Zwisler
2018-05-03 19:24 ` [PATCH 5/5] radix tree: fix " Ross Zwisler
2018-05-09 12:46 ` Jan Kara
2018-05-09 15:09 ` Ross Zwisler [this message]
2018-05-08 17:44 ` [PATCH 0/5] fix radix tree " Ross Zwisler
2018-05-10 22:48 ` Andrew Morton
2018-05-10 22:54 ` Dan Williams
2018-05-10 23:12 ` Andrew Morton
2018-05-10 23:19 ` Dan Williams
2018-05-11 4:04 ` Ross Zwisler
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20180509150938.GA3814@linux.intel.com \
--to=ross.zwisler@linux.intel.com \
--cc=akpm@linux-foundation.org \
--cc=david@fromorbit.com \
--cc=hch@lst.de \
--cc=jack@suse.cz \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-nvdimm@lists.01.org \
--cc=stable@vger.kernel.org \
--cc=willy@infradead.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).