netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Stephen Hemminger <stephen@networkplumber.org>
To: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Cc: cera@cera.cz, mlxsw@mellanox.com, netdev@vger.kernel.org,
	peter@svinota.eu, bridge@lists.linux-foundation.org,
	idosch@mellanox.com, davem@davemloft.net
Subject: Re: [PATCH net] bridge: netlink: register netdevice before executing changelink
Date: Fri, 7 Apr 2017 13:37:53 -0400	[thread overview]
Message-ID: <20170407133753.1602f3cd@plumbers-lap.home.lan> (raw)
In-Reply-To: <c3769204-4803-6c5f-1286-c8f1f5e045be@cumulusnetworks.com>

On Fri, 7 Apr 2017 18:43:06 +0300
Nikolay Aleksandrov <nikolay@cumulusnetworks.com> wrote:

> On 07/04/17 18:36, Stephen Hemminger wrote:
> > On Fri, 7 Apr 2017 18:27:37 +0300
> > Nikolay Aleksandrov <nikolay@cumulusnetworks.com> wrote:
> >   
> >> On 07/04/17 18:22, Stephen Hemminger wrote:  
> >>> On Fri, 7 Apr 2017 17:19:48 +0300
> >>> Nikolay Aleksandrov <nikolay@cumulusnetworks.com> wrote:
> >>>     
> >>>> On 07/04/17 17:10, Stephen Hemminger wrote:    
> >>>>> On Fri, 7 Apr 2017 15:49:15 +0300
> >>>>> <idosch@mellanox.com> wrote:
> >>>>>       
> >>>>>> From: Ido Schimmel <idosch@mellanox.com>
> >>>>>>
> >>>>>> Peter reported a kernel oops when executing the following command:
> >>>>>>
> >>>>>> $ ip link add name test type bridge vlan_default_pvid 1
> >>>>>>
> >>>>>> [13634.939408] BUG: unable to handle kernel NULL pointer dereference at
> >>>>>> 0000000000000190
> >>>>>> [13634.939436] IP: __vlan_add+0x73/0x5f0
> >>>>>> [...]
> >>>>>> [13634.939783] Call Trace:
> >>>>>> [13634.939791]  ? pcpu_next_unpop+0x3b/0x50
> >>>>>> [13634.939801]  ? pcpu_alloc+0x3d2/0x680
> >>>>>> [13634.939810]  ? br_vlan_add+0x135/0x1b0
> >>>>>> [13634.939820]  ? __br_vlan_set_default_pvid.part.28+0x204/0x2b0
> >>>>>> [13634.939834]  ? br_changelink+0x120/0x4e0
> >>>>>> [13634.939844]  ? br_dev_newlink+0x50/0x70
> >>>>>> [13634.939854]  ? rtnl_newlink+0x5f5/0x8a0
> >>>>>> [13634.939864]  ? rtnl_newlink+0x176/0x8a0
> >>>>>> [13634.939874]  ? mem_cgroup_commit_charge+0x7c/0x4e0
> >>>>>> [13634.939886]  ? rtnetlink_rcv_msg+0xe1/0x220
> >>>>>> [13634.939896]  ? lookup_fast+0x52/0x370
> >>>>>> [13634.939905]  ? rtnl_newlink+0x8a0/0x8a0
> >>>>>> [13634.939915]  ? netlink_rcv_skb+0xa1/0xc0
> >>>>>> [13634.939925]  ? rtnetlink_rcv+0x24/0x30
> >>>>>> [13634.939934]  ? netlink_unicast+0x177/0x220
> >>>>>> [13634.939944]  ? netlink_sendmsg+0x2fe/0x3b0
> >>>>>> [13634.939954]  ? _copy_from_user+0x39/0x40
> >>>>>> [13634.939964]  ? sock_sendmsg+0x30/0x40
> >>>>>> [13634.940159]  ? ___sys_sendmsg+0x29d/0x2b0
> >>>>>> [13634.940326]  ? __alloc_pages_nodemask+0xdf/0x230
> >>>>>> [13634.940478]  ? mem_cgroup_commit_charge+0x7c/0x4e0
> >>>>>> [13634.940592]  ? mem_cgroup_try_charge+0x76/0x1a0
> >>>>>> [13634.940701]  ? __handle_mm_fault+0xdb9/0x10b0
> >>>>>> [13634.940809]  ? __sys_sendmsg+0x51/0x90
> >>>>>> [13634.940917]  ? entry_SYSCALL_64_fastpath+0x1e/0xad
> >>>>>>
> >>>>>> The problem is that the bridge's VLAN group is created after setting the
> >>>>>> default PVID, when registering the netdevice and executing its
> >>>>>> ndo_init().
> >>>>>>
> >>>>>> Fix this by changing the order of both operations, so that
> >>>>>> br_changelink() is only processed after the netdevice is registered,
> >>>>>> when the VLAN group is already initialized.
> >>>>>>
> >>>>>> The changelink() call is done on a best-effort basis since unregistering
> >>>>>> the netdevice upon failure won't perform a proper cleanup due to a
> >>>>>> missing ndo_uninit(), which I'll try to add for net-next.
> >>>>>>
> >>>>>> Fixes: b6677449dff6 ("bridge: netlink: call br_changelink() during br_dev_newlink()")
> >>>>>> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
> >>>>>> Signed-off-by: Ido Schimmel <idosch@mellanox.com>
> >>>>>> Reported-by: Peter V. Saveliev <peter@svinota.eu>
> >>>>>> ---
> >>>>>> Please consider this for 4.4.y, 4.9.y and 4.10.y as well.      
> >>>>>
> >>>>> Although this patch fixes the OOPS it breaks all the error handling
> >>>>> of br_changelink. If bad attributes are passed to newlink, you leave a
> >>>>> broken partially configured bridge device. The code needs to cleanup
> >>>>> and return the correct errno.
> >>>>>       
> >>>>
> >>>> The cleanup would require adding ndo_uninit() and a much bigger churn
> >>>> which doesn't seem okay for -net, it will be targetted at net-next.
> >>>> The bridge can always be reconfigured as all of the options can be set
> >>>> during runtime, so anything can be fixed, thus the best-effort changelink.
> >>>>
> >>>> If it is not desirable for -net then maybe we should just revert the
> >>>> patch there altogether and make it again correctly with cleanup and so
> >>>> on in net-next.
> >>>>
> >>>>
> >>>>    
> >>>
> >>> Why not just add pointer validation in the PVID attribute parsing.
> >>>     
> >>
> >> We cannot have the changelink() before the register  for many reasons,
> >> first the vlan config will not be applied so all of the vlan options
> >> won't get set even though they're passed, then the changelink() can
> >> cause more trouble via the STP calls (f.e. br_stp_start) which can use
> >> br->dev->ifindex (= 0 at that point), also can use br->dev->name (still
> >> haven't passed validation and is uninitialized, i.e.
> >> dev_get_valid_name() hasn't been called and we can have format
> >> specifiers in it), multicast code also has some codepaths that will
> >> cause various timers to get started...
> >>
> >> Moving changelink() after the register is much safer.
> >>  
> > 
> > Then just fix error handling. Why not call unregister?
> >   
> 
> Right, because there's no ndo_uninit() and this will not cleanup the
> bridge properly. That's the plan for net-next, reorg the code and add
> ndo_uninit() for that reason.
> 
> From Ido's commit message above:
> "The changelink() call is done on a best-effort basis since
> unregistering the netdevice upon failure won't perform a proper cleanup
> due to a missing ndo_uninit(), which I'll try to add for net-next."

Fix it right now. The patch you propose is too half baked.

  reply	other threads:[~2017-04-07 17:37 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-04-07 12:49 [PATCH net] bridge: netlink: register netdevice before executing changelink idosch
2017-04-07 14:10 ` Stephen Hemminger
2017-04-07 14:19   ` Nikolay Aleksandrov via Bridge
2017-04-07 15:22     ` Stephen Hemminger
2017-04-07 15:27       ` Nikolay Aleksandrov
2017-04-07 15:36         ` Stephen Hemminger
2017-04-07 15:43           ` Nikolay Aleksandrov via Bridge
2017-04-07 17:37             ` Stephen Hemminger [this message]
2017-04-07 18:16               ` Ido Schimmel
2017-04-07 18:44                 ` Stephen Hemminger
2017-04-07 18:45                 ` Stephen Hemminger

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170407133753.1602f3cd@plumbers-lap.home.lan \
    --to=stephen@networkplumber.org \
    --cc=bridge@lists.linux-foundation.org \
    --cc=cera@cera.cz \
    --cc=davem@davemloft.net \
    --cc=idosch@mellanox.com \
    --cc=mlxsw@mellanox.com \
    --cc=netdev@vger.kernel.org \
    --cc=nikolay@cumulusnetworks.com \
    --cc=peter@svinota.eu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).