From: "Paul E. McKenney" <paulmck-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
To: Sasha Levin <sasha.levin-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
Cc: CAI Qian <caiqian-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>,
Eric Dumazet
<eric.dumazet-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>,
Containers
<containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org>,
linux-kernel
<linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org
Subject: Re: 3.9-rc1 NULL pointer crash at find_pid_ns
Date: Thu, 7 Mar 2013 10:29:34 -0800 [thread overview]
Message-ID: <20130307182934.GY3268@linux.vnet.ibm.com> (raw)
In-Reply-To: <5138D8F2.5020900-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
On Thu, Mar 07, 2013 at 01:14:10PM -0500, Sasha Levin wrote:
> On 03/07/2013 01:05 PM, ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org wrote:
> > Sasha Levin <sasha.levin-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org> writes:
> >
> >> On 03/07/2013 12:46 PM, Eric Dumazet wrote:
> >>> On Thu, 2013-03-07 at 12:36 -0500, Sasha Levin wrote:
> >>>
> >>>> Looks like the hlist change is probably the issue, though it specifically
> >>>> uses:
> >>>>
> >>>> #define hlist_entry_safe(ptr, type, member) \
> >>>> (ptr) ? hlist_entry(ptr, type, member) : NULL
> >>>>
> >>>> I'm still looking at the code in question and it's assembly, but I can't
> >>>> figure out what's going wrong. I was also trying to see what's so special
> >>>> about this loop in find_pid_ns as opposed to the rest of the kernel code
> >>>> that uses hlist_for_each_entry_rcu() but couldn't find out why.
> >>>>
> >>>> Is it somehow possible that if we rcu_dereference_raw() the same thing twice
> >>>> inside the same rcu_read_lock() section we'll get different results? That's
> >>>> really the only reason for this crash that comes to mind at the moment, very
> >>>> unlikely - but that's all I have right now.
> >>>>
> >>>
> >>> Yep
> >>>
> >>> #define hlist_entry_safe(ptr, type, member) \
> >>> (ptr) ? hlist_entry(ptr, type, member) : NULL
> >>>
> >>> Is not safe, as ptr can be evaluated twice, and thats not good at all...
> >>
> >> ptr is being evaluated twice, but in this case this is an
> >> rcu_dereference_raw() value within the same rcu_read_lock() section.
> >>
> >> Is it still problematic?
> >
> > Definitely.
> >
> > Head in this instance the expression: &pid_hash[pid_hashfn(nr, ns)]
> >
> > And the crash clearly shows that when hilst_entry is being evaluated the
> > HEAD is NULL.
>
> Okay, I'm even more confused now.
>
> The expression in question is:
>
> hlist_entry_safe(rcu_dereference_bh(hlist_first_rcu(head)))
>
> You're saying that "rcu_dereference_bh(hlist_first_rcu(head))" can change between
> the two evaluations we do. That would mean that 'head' has changed in between, right?
>
> In that case, the list itself has changed - which means that RCU has changed the
> list underneath us.
>
> hlist_first_rcu() doesn't have any side-effects, it doesn't modify the list whatsoever,
> so the only thing that can change is 'head'. Why is it allowed to change if the list
> is protected by RCU?
RCU does not prevent the list from changing. Instead, it prevents anything
that was in the list from being freed during a given RCU read-side critical
section. Here is how it is supposed to happen:
head---->A
Task 1 picks up the pointer from head to A, and sees that it is non-NULL.
Task 2 removes A from the list, so that the pointer from head is now NULL:
head A
|
|
V
NULL
Now task 1 refetches from head, and is fatally disappointed to get a
NULL pointer.
Now, had task 1 avoided the refetch, it would be still working with
a pointer to A. Since A won't be freed until the end of an RCU grace
period, all would have been well. Again, one way to handle this is
as follows:
#define hlist_entry_safe(ptr, type, member) \
({ typeof(ptr) ____ptr = (ptr); \
____ptr ? hlist_entry(____ptr, type, member) : NULL; \
})
This way, "ptr" is executed exactly once, and the check and the
hlist_entry() are both using the same value.
Thanx, Paul
WARNING: multiple messages have this Message-ID (diff)
From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Sasha Levin <sasha.levin@oracle.com>
Cc: ebiederm@xmission.com, Eric Dumazet <eric.dumazet@gmail.com>,
Li Zefan <lizefan@huawei.com>, CAI Qian <caiqian@redhat.com>,
linux-kernel <linux-kernel@vger.kernel.org>,
Containers <containers@lists.linux-foundation.org>
Subject: Re: 3.9-rc1 NULL pointer crash at find_pid_ns
Date: Thu, 7 Mar 2013 10:29:34 -0800 [thread overview]
Message-ID: <20130307182934.GY3268@linux.vnet.ibm.com> (raw)
In-Reply-To: <5138D8F2.5020900@oracle.com>
On Thu, Mar 07, 2013 at 01:14:10PM -0500, Sasha Levin wrote:
> On 03/07/2013 01:05 PM, ebiederm@xmission.com wrote:
> > Sasha Levin <sasha.levin@oracle.com> writes:
> >
> >> On 03/07/2013 12:46 PM, Eric Dumazet wrote:
> >>> On Thu, 2013-03-07 at 12:36 -0500, Sasha Levin wrote:
> >>>
> >>>> Looks like the hlist change is probably the issue, though it specifically
> >>>> uses:
> >>>>
> >>>> #define hlist_entry_safe(ptr, type, member) \
> >>>> (ptr) ? hlist_entry(ptr, type, member) : NULL
> >>>>
> >>>> I'm still looking at the code in question and it's assembly, but I can't
> >>>> figure out what's going wrong. I was also trying to see what's so special
> >>>> about this loop in find_pid_ns as opposed to the rest of the kernel code
> >>>> that uses hlist_for_each_entry_rcu() but couldn't find out why.
> >>>>
> >>>> Is it somehow possible that if we rcu_dereference_raw() the same thing twice
> >>>> inside the same rcu_read_lock() section we'll get different results? That's
> >>>> really the only reason for this crash that comes to mind at the moment, very
> >>>> unlikely - but that's all I have right now.
> >>>>
> >>>
> >>> Yep
> >>>
> >>> #define hlist_entry_safe(ptr, type, member) \
> >>> (ptr) ? hlist_entry(ptr, type, member) : NULL
> >>>
> >>> Is not safe, as ptr can be evaluated twice, and thats not good at all...
> >>
> >> ptr is being evaluated twice, but in this case this is an
> >> rcu_dereference_raw() value within the same rcu_read_lock() section.
> >>
> >> Is it still problematic?
> >
> > Definitely.
> >
> > Head in this instance the expression: &pid_hash[pid_hashfn(nr, ns)]
> >
> > And the crash clearly shows that when hilst_entry is being evaluated the
> > HEAD is NULL.
>
> Okay, I'm even more confused now.
>
> The expression in question is:
>
> hlist_entry_safe(rcu_dereference_bh(hlist_first_rcu(head)))
>
> You're saying that "rcu_dereference_bh(hlist_first_rcu(head))" can change between
> the two evaluations we do. That would mean that 'head' has changed in between, right?
>
> In that case, the list itself has changed - which means that RCU has changed the
> list underneath us.
>
> hlist_first_rcu() doesn't have any side-effects, it doesn't modify the list whatsoever,
> so the only thing that can change is 'head'. Why is it allowed to change if the list
> is protected by RCU?
RCU does not prevent the list from changing. Instead, it prevents anything
that was in the list from being freed during a given RCU read-side critical
section. Here is how it is supposed to happen:
head---->A
Task 1 picks up the pointer from head to A, and sees that it is non-NULL.
Task 2 removes A from the list, so that the pointer from head is now NULL:
head A
|
|
V
NULL
Now task 1 refetches from head, and is fatally disappointed to get a
NULL pointer.
Now, had task 1 avoided the refetch, it would be still working with
a pointer to A. Since A won't be freed until the end of an RCU grace
period, all would have been well. Again, one way to handle this is
as follows:
#define hlist_entry_safe(ptr, type, member) \
({ typeof(ptr) ____ptr = (ptr); \
____ptr ? hlist_entry(____ptr, type, member) : NULL; \
})
This way, "ptr" is executed exactly once, and the check and the
hlist_entry() are both using the same value.
Thanx, Paul
next prev parent reply other threads:[~2013-03-07 18:29 UTC|newest]
Thread overview: 35+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-03-07 9:37 3.9-rc1 NULL pointer crash at find_pid_ns CAI Qian
[not found] ` <611667212.10748821.1362649031475.JavaMail.root-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2013-03-07 9:42 ` Li Zefan
2013-03-07 9:42 ` Li Zefan
[not found] ` <513860E8.4080807-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
2013-03-07 9:59 ` Eric W. Biederman
2013-03-07 9:59 ` Eric W. Biederman
[not found] ` <876213wmwt.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
2013-03-07 17:36 ` Sasha Levin
2013-03-07 17:36 ` Sasha Levin
[not found] ` <5138D001.8000409-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
2013-03-07 17:46 ` Eric Dumazet
2013-03-07 17:46 ` Eric Dumazet
2013-03-07 17:50 ` Sasha Levin
[not found] ` <5138D377.6040406-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
2013-03-07 18:04 ` Paul E. McKenney
2013-03-07 18:04 ` Paul E. McKenney
2013-03-07 18:05 ` Eric W. Biederman
2013-03-07 18:05 ` Eric W. Biederman
[not found] ` <87boavrspd.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
2013-03-07 18:14 ` Sasha Levin
2013-03-07 18:14 ` Sasha Levin
[not found] ` <5138D8F2.5020900-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
2013-03-07 18:18 ` Eric Dumazet
2013-03-07 18:18 ` Eric Dumazet
2013-03-07 18:21 ` Eric W. Biederman
2013-03-07 18:21 ` Eric W. Biederman
[not found] ` <87r4jrqdf6.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
2013-03-07 18:27 ` Sasha Levin
2013-03-07 18:27 ` Sasha Levin
2013-03-07 18:29 ` Paul E. McKenney [this message]
2013-03-07 18:29 ` Paul E. McKenney
[not found] ` <20130307182934.GY3268-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
2013-03-09 8:01 ` Li Zefan
2013-03-09 8:01 ` Li Zefan
[not found] ` <513AEC65.8000008-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
2013-03-09 15:51 ` Paul E. McKenney
2013-03-09 15:51 ` Paul E. McKenney
[not found] ` <20130309155146.GR3268-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
2013-03-14 20:00 ` Dave Jones
2013-03-14 20:00 ` Dave Jones
[not found] ` <20130314200054.GA5924-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2013-03-14 21:00 ` Paul E. McKenney
2013-03-14 21:00 ` Paul E. McKenney
2013-03-07 18:15 ` Paul E. McKenney
2013-03-07 18:15 ` Paul E. McKenney
2013-03-07 17:50 ` Sasha Levin
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20130307182934.GY3268@linux.vnet.ibm.com \
--to=paulmck-23vcf4htsmix0ybbhkvfkdbpr1lh4cv8@public.gmane.org \
--cc=caiqian-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org \
--cc=containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org \
--cc=ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org \
--cc=eric.dumazet-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org \
--cc=linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
--cc=sasha.levin-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.