From: ebiederm@xmission.com (Eric W. Biederman)
To: Sasha Levin <levinsasha928@gmail.com>
Cc: davem <davem@davemloft.net>,
Eric Dumazet <eric.dumazet@gmail.com>,
Eric Van Hensbergen <ericvh@gmail.com>,
Dave Jones <davej@redhat.com>,
linux-kernel <linux-kernel@vger.kernel.org>,
netdev <netdev@vger.kernel.org>
Subject: Re: net: kernel BUG() in net/netns/generic.h:45
Date: Fri, 06 Apr 2012 18:16:10 -0700 [thread overview]
Message-ID: <m1mx6otk85.fsf@fess.ebiederm.org> (raw)
In-Reply-To: <CA+1xoqdi4QOhRJvBA64pxiN3EphhThjPkf4eLRkYM8jxcJjh=w@mail.gmail.com> (Sasha Levin's message of "Fri, 6 Apr 2012 11:04:25 +0200")
Sasha Levin <levinsasha928@gmail.com> writes:
> On Fri, Apr 6, 2012 at 1:53 AM, Eric W. Biederman <ebiederm@xmission.com> wrote:
>> Sasha Levin <levinsasha928@gmail.com> writes:
>>
>>> Hi all,
>>>
>>> When an initialization of a network namespace in setup_net() fails, we
>>> try to undo everything by executing each of the exit callbacks of every
>>> namespace in the network.
>>>
>>> The problem is, it might be possible that the net_generic array wasn't
>>> initialized before we fail and try to undo everything. At that point,
>>> some of the networks assume that since we're already calling the exit
>>> callback, the net_generic structure is initialized and we hit the BUG()
>>> in net/netns/generic.h:45 .
>>>
>>> I'm not quite sure whether the right fix from the following three
>>> options is, and would be happy to figure it out before fixing it:
>>>
>>> 1. Don't assume net_generic was initialized in the exit callback, which
>>> is a bit problematic since we can't query that nicely anyway (a
>>> sub-option here would be adding an API to query whether the net_generic
>>> structure is initialized.
>>>
>>> 2. Remove the BUG(), switch it to a WARN() and let each subsystem
>>> handle the case of NULL on it's own. While it sounds a bit wrong, it's
>>> worth mentioning that that BUG() was initially added in an attempt to
>>> fix an issue in CAIF, which was fixed in a completely different way
>>> afterwards, so it's not strictly necessary here.
>>>
>>> 3. Only call the exit callback for subsystems we have called the init
>>> callback for.
>>
>> Your option 3 only calling the exit callbacks for subsystems we have
>> initialized should be what is implemented. Certainly it looks like we
>> are attempting to only call the exit callbacks for code whose init
>> callback has succeeded.
>>
>> What problem are you seeing?
>>
>> This smells suspiciously like a problem we had a little while ago caif
>> was registering as a pernet device instead of a pernet subsystem,
>> and because of that we had packets flying around after it had been
>> unregistered and was trying access it's net_generic data.
>
> It looks different from the caif problem a bit, here is a sample stacktrace:
>
> [ 163.733755] ------------[ cut here ]------------
> [ 163.734501] kernel BUG at include/net/netns/generic.h:45!
> [ 163.734501] invalid opcode: 0000 [#1] PREEMPT SMP
> [ 163.734501] CPU 2
> [ 163.734501] Pid: 19145, comm: trinity Tainted: G W
> 3.4.0-rc1-next-20120405-sasha-dirty #57
> [ 163.734501] RIP: 0010:[<ffffffff824d6062>] [<ffffffff824d6062>]
> phonet_pernet+0x182/0x1a0
> [ 163.734501] RSP: 0018:ffff8800674d5ca8 EFLAGS: 00010246
> [ 163.734501] RAX: 000000003fffffff RBX: 0000000000000000 RCX: ffff8800678c88d8
> [ 163.734501] RDX: 00000000003f4000 RSI: ffff8800678c8910 RDI: 0000000000000282
> [ 163.734501] RBP: ffff8800674d5cc8 R08: 0000000000000000 R09: 0000000000000000
> [ 163.734501] R10: 0000000000000000 R11: 0000000000000000 R12: ffff880068bec920
> [ 163.734501] R13: ffffffff836b90c0 R14: 0000000000000000 R15: 0000000000000000
> [ 163.734501] FS: 00007f055e8de700(0000) GS:ffff88007d000000(0000)
> knlGS:0000000000000000
> [ 163.734501] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [ 163.734501] CR2: 00007f055e6bb518 CR3: 0000000070c16000 CR4: 00000000000406e0
> [ 163.734501] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 163.734501] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [ 163.734501] Process trinity (pid: 19145, threadinfo
> ffff8800674d4000, task ffff8800678c8000)
> [ 163.734501] Stack:
> [ 163.734501] ffffffff824d5f00 ffffffff810e2ec1 ffff880067ae0000
> 00000000ffffffd4
> [ 163.734501] ffff8800674d5cf8 ffffffff824d667a ffff880067ae0000
> 00000000ffffffd4
> [ 163.734501] ffffffff836b90c0 0000000000000000 ffff8800674d5d18
> ffffffff824d707d
> [ 163.734501] Call Trace:
> [ 163.734501] [<ffffffff824d5f00>] ? phonet_pernet+0x20/0x1a0
> [ 163.734501] [<ffffffff810e2ec1>] ? get_parent_ip+0x11/0x50
> [ 163.734501] [<ffffffff824d667a>] phonet_device_destroy+0x1a/0x100
> [ 163.734501] [<ffffffff824d707d>] phonet_device_notify+0x3d/0x50
> [ 163.734501] [<ffffffff810dd96e>] notifier_call_chain+0xee/0x130
> [ 163.734501] [<ffffffff810dd9d1>] raw_notifier_call_chain+0x11/0x20
> [ 163.734501] [<ffffffff821cce12>] call_netdevice_notifiers+0x52/0x60
> [ 163.734501] [<ffffffff821cd235>] rollback_registered_many+0x185/0x270
> [ 163.734501] [<ffffffff821cd334>] unregister_netdevice_many+0x14/0x60
> [ 163.734501] [<ffffffff823123e3>] ipip_exit_net+0x1b3/0x1d0
> [ 163.734501] [<ffffffff82312230>] ? ipip_rcv+0x420/0x420
> [ 163.734501] [<ffffffff821c8515>] ops_exit_list+0x35/0x70
> [ 163.734501] [<ffffffff821c911b>] setup_net+0xab/0xe0
> [ 163.734501] [<ffffffff821c9416>] copy_net_ns+0x76/0x100
> [ 163.734501] [<ffffffff810dc92b>] create_new_namespaces+0xfb/0x190
> [ 163.734501] [<ffffffff810dca21>] unshare_nsproxy_namespaces+0x61/0x80
> [ 163.734501] [<ffffffff810afd1f>] sys_unshare+0xff/0x290
> [ 163.734501] [<ffffffff8187622e>] ? trace_hardirqs_on_thunk+0x3a/0x3f
> [ 163.734501] [<ffffffff82665539>] system_call_fastpath+0x16/0x1b
> [ 163.734501] Code: e0 c3 fe 66 0f 1f 44 00 00 48 c7 c2 40 60 4d 82
> be 01 00 00 00 48 c7 c7 80 d1 23 83 e8 48 2a c4 fe e8 73 06 c8 fe 48
> 85 db 75 0e <0f> 0b 0f 1f 40 00 eb fe 66 0f 1f 44 00 00 48 83 c4 10 48
> 89 d8
> [ 163.734501] RIP [<ffffffff824d6062>] phonet_pernet+0x182/0x1a0
> [ 163.734501] RSP <ffff8800674d5ca8>
> [ 163.861289] ---[ end trace fb5615826c548066 ]---
>
> It would appear that there's at least a one-off problem with the
> reverse iteration.
I don't see anything pointing to reverse iteration being the problem,
or even the call graph order.
What I see is phonet registering a netdevice notifier.
Then I see phonet register pernet_device operations.
Then I see phonet responding to all network devices no matter what
the network namespace is, and looking in the phonet net_generic data
for them.
Looking phonet does not implement any network devices phonet just has
per network device state. Which means phonet should be using
register_pernet_subsys and this roughly is the same issue we saw with
caif. Confusion of when you should use register_pernet_subsys and
register_pernet_device.
Issues that I see.
- There is a bunch of weird and stuff going on in phonet_net_exit
to handle the module unregister case where we don't synthesize
network device removal events. ( A generic bug we can fix).
- phonet should be using register_pernet_subsys instead of
register_pernet_device.
Patches to follow in a minute. Is there any chance you can reproduce
this so you can verify the problems don't continue to reproduce?
Eric
next prev parent reply other threads:[~2012-04-07 1:12 UTC|newest]
Thread overview: 12+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-04-05 22:20 net: kernel BUG() in net/netns/generic.h:45 Sasha Levin
2012-04-05 23:53 ` Eric W. Biederman
2012-04-06 9:04 ` Sasha Levin
2012-04-07 1:16 ` Eric W. Biederman [this message]
2012-04-07 1:31 ` [PATCH 1/2], net:In, unregister_netdevice_notifier, unregister, the, netdevices.
2012-04-07 1:33 ` [PATCH 1/2] net: In unregister_netdevice_notifier unregister the netdevices Eric W. Biederman
2012-04-07 1:35 ` [PATCH 2/2] phonet: Sort out initiailziation and cleanup code Eric W. Biederman
2012-04-10 6:39 ` Rémi Denis-Courmont
2012-04-10 10:47 ` Sasha Levin
2012-04-13 15:05 ` David Miller
2012-04-13 15:05 ` [PATCH 1/2] net: In unregister_netdevice_notifier unregister the netdevices David Miller
2012-04-14 0:03 ` Eric W. Biederman
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=m1mx6otk85.fsf@fess.ebiederm.org \
--to=ebiederm@xmission.com \
--cc=davej@redhat.com \
--cc=davem@davemloft.net \
--cc=eric.dumazet@gmail.com \
--cc=ericvh@gmail.com \
--cc=levinsasha928@gmail.com \
--cc=linux-kernel@vger.kernel.org \
--cc=netdev@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.