From: Sasha Levin <sasha.levin@oracle.com>
To: ebiederm@xmission.com,
	"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Containers <containers@lists.linux-foundation.org>,
	CAI Qian <caiqian@redhat.com>,
	linux-kernel <linux-kernel@vger.kernel.org>
Subject: Re: 3.9-rc1 NULL pointer crash at find_pid_ns
Date: Thu, 07 Mar 2013 12:36:01 -0500
Message-ID: <5138D001.8000409@oracle.com>
In-Reply-To: <876213wmwt.fsf@xmission.com>

On 03/07/2013 04:59 AM, ebiederm@xmission.com wrote:
> Li Zefan <lizefan@huawei.com> writes:
> 
>> Cc: sasha.levin@oracle.com
>> Cc: "Eric W. Biederman" <ebiederm@xmission.com>
>> Cc: container
>>
>> This is a second report... and the same address: 0xfffffffffffffff0 
> 
> Actually this is the third report I have seen with that address, and the
> others were on x86_64.
> 
> The obvious answer is that there is something subtly wrong with:
> 
> commit b67bfe0d42cac56c512dd5da4b1b347a23f4b70a
> Author: Sasha Levin <sasha.levin@oracle.com>
> Date:   Wed Feb 27 17:06:00 2013 -0800
> 
>     hlist: drop the node parameter from iterators
> 
> 
> This is the only change to the pid namespace that I am aware of in 3.9-rc1.
> 
> If you can reproduce this somewhat readily can you please revert the
> hlist change and see if this continues to happen.  Right now there are
> no other code changes that I can see.  And the address
> 0xfffffffffffffff0 is consistent with a bug in hlist_for_each_entry_rcu.

Looks like the hlist change is probably the issue, though it specifically
uses:

	#define hlist_entry_safe(ptr, type, member) \
		(ptr) ? hlist_entry(ptr, type, member) : NULL

I'm still looking at the code in question and its assembly, but I can't
figure out what's going wrong. I was also trying to see what's so special
about this loop in find_pid_ns as opposed to the rest of the kernel code
that uses hlist_for_each_entry_rcu() but couldn't find out why.

Is it somehow possible that if we rcu_dereference_raw() the same thing twice
inside the same rcu_read_lock() section we'll get different results? That's
really the only reason for this crash that comes to mind at the moment, very
unlikely - but that's all I have right now.

Is this bug easily reproducible on your setup? I've managed to reproduce it
exactly 3 times in the past month or so: twice when I reported it and only
once since then. At some point I thought it was a freak compiler issue
that went away when the code changed, but since you're reporting it again
I guess that isn't the case.


Paul, any chance you can give hlist_for_each_entry_rcu() a second look please?
I know you've already acked it before, but is it possible I missed a subtle
detail with RCU that causes this?


Thanks,
Sasha
