netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Veaceslav Falico <vfalico@redhat.com>
To: Nikolay Aleksandrov <nikolay@redhat.com>
Cc: netdev@vger.kernel.org, fubar@us.ibm.com, andy@greyhouse.net,
	davem@davemloft.net
Subject: Re: [PATCH] bonding: remove sysfs before removing devices
Date: Sat, 6 Apr 2013 01:21:33 +0200	[thread overview]
Message-ID: <20130405232133.GA19437@redhat.com> (raw)
In-Reply-To: <515F4CEF.3030207@redhat.com>

On Sat, Apr 06, 2013 at 12:15:11AM +0200, Nikolay Aleksandrov wrote:
>Hi,
>Sorry for the late reply but I was travelling this week. In my opinion this
>fix is wrong because in bond_uninit() (called by rtnl_link_unregister)
>you have:
>list_del(&bond->bond_list);
>which is linked in the bond_net dev_list which is freed by
>unregister_pernet_subsys.

Yep, you're right, I've hit it recently with the patch applied, and now
working on it. However, I still think that the idea of the patch is correct
- i.e. to first disable sysfs (especially bonding_masters) and only
  afterwards to start removing everything else. Or, obviously, to finally
add normal locking to sysfs functions.

Anyway, this corruption is really rare, so I'll wait for your fix next
week.

>You'll get a corrupted list warning at best.
>
>Here's a sample from running insmod max_bonds=3/rmmod in a loop with the
>patch applied:
>Apr  5 23:54:54 dhcp-1-171 kernel: [   21.302186] ------------[ cut here
>]------------
>Apr  5 23:54:54 dhcp-1-171 kernel: [   21.302191] WARNING: at
>lib/list_debug.c:62 __list_del_entry+0x82/0xd0()
>Apr  5 23:54:54 dhcp-1-171 kernel: [   21.302192] Hardware name: VirtualBox
>Apr  5 23:54:54 dhcp-1-171 kernel: [   21.302194] list_del corruption.
>next->prev should be ffff880036bc6860, but was ffff88002ee23000
>Apr  5 23:54:54 dhcp-1-171 kernel: [   21.302194] Modules linked in:
>bonding(O-) ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6
>nf_conntrack_ipv4 nf_defrag_ipv4 ip6table_filter xt_conntrack
>nf_conntrack ip6_tables snd_hda_codec_idt snd_hda_intel snd_hda_codec
>snd_hwdep snd_seq snd_seq_device ppdev snd_pcm pcspkr snd_page_alloc
>i2c_piix4 joydev snd_timer snd soundcore i2c_core microcode parport_pc
>parport e1000 [last unloaded: bonding]
>Apr  5 23:54:54 dhcp-1-171 kernel: [   21.302214] Call Trace:
>Apr  5 23:54:54 dhcp-1-171 kernel: [   21.302219]  [<ffffffff8105e99f>]
>warn_slowpath_common+0x7f/0xc0
>Apr  5 23:54:54 dhcp-1-171 kernel: [   21.302221]  [<ffffffff8105ea96>]
>warn_slowpath_fmt+0x46/0x50
>Apr  5 23:54:54 dhcp-1-171 kernel: [   21.302225]  [<ffffffff8164efaf>]
>? printk+0x61/0x63
>Apr  5 23:54:54 dhcp-1-171 kernel: [   21.302227]  [<ffffffff8130c302>]
>__list_del_entry+0x82/0xd0
>Apr  5 23:54:54 dhcp-1-171 kernel: [   21.302229]  [<ffffffff8130c361>]
>list_del+0x11/0x40
>Apr  5 23:54:54 dhcp-1-171 kernel: [   21.302233]  [<ffffffffa0187f70>]
>bond_uninit+0x70/0xd0 [bonding]
>Apr  5 23:54:54 dhcp-1-171 kernel: [   21.302236]  [<ffffffff815486d8>]
>rollback_registered_many+0x158/0x220
>Apr  5 23:54:54 dhcp-1-171 kernel: [   21.302238]  [<ffffffff815487f9>]
>unregister_netdevice_many+0x19/0x60
>Apr  5 23:54:54 dhcp-1-171 kernel: [   21.302240]  [<ffffffff815575ae>]
>__rtnl_link_unregister+0x6e/0xb0
>Apr  5 23:54:54 dhcp-1-171 kernel: [   21.302243]  [<ffffffff81557b7e>]
>rtnl_link_unregister+0x1e/0x30
>Apr  5 23:54:54 dhcp-1-171 kernel: [   21.302246]  [<ffffffffa019359e>]
>bonding_exit+0x2d/0xa8f [bonding]
>Apr  5 23:54:54 dhcp-1-171 kernel: [   21.302249]  [<ffffffff810c1dd0>]
>sys_delete_module+0x170/0x2d0
>Apr  5 23:54:54 dhcp-1-171 kernel: [   21.302252]  [<ffffffff81014981>]
>? do_notify_resume+0x71/0xb0
>Apr  5 23:54:54 dhcp-1-171 kernel: [   21.302255]  [<ffffffff81661899>]
>system_call_fastpath+0x16/0x1b
>Apr  5 23:54:54 dhcp-1-171 kernel: [   21.302256] ---[ end trace
>31cca9f26623fa11 ]---
>Apr  5 23:54:54 dhcp-1-171 kernel: [   21.302417] bonding: bond1:
>released all slaves
>
>You can hit this also with a NULL pointer dereference.
>I have a correct fix for this bug which I intend to post next week when
>I get back and after some more testing.
>
>Please let me know if I've missed something about this patch.
>
>Best regards,
> Nik

  reply	other threads:[~2013-04-05 23:23 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-04-03 15:46 [PATCH] bonding: remove sysfs before removing devices Veaceslav Falico
2013-04-05  4:50 ` David Miller
2013-04-05 22:15 ` Nikolay Aleksandrov
2013-04-05 23:21   ` Veaceslav Falico [this message]
2013-04-05 23:29     ` Nikolay Aleksandrov
2013-04-06  1:49       ` Veaceslav Falico
2013-04-06  7:38         ` Nikolay Aleksandrov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20130405232133.GA19437@redhat.com \
    --to=vfalico@redhat.com \
    --cc=andy@greyhouse.net \
    --cc=davem@davemloft.net \
    --cc=fubar@us.ibm.com \
    --cc=netdev@vger.kernel.org \
    --cc=nikolay@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).