* Oops on aoe module removal
From: Josh Boyer @ 2013-01-03 13:25 UTC
To: Ed L. Cashin; +Cc: mitko, axboe, linux-kernel, kernel-team
Hello,
We have a user that has reported an oops when removing the aoe module.
This seems to have been happening since the 3.4 kernel, as you can see
in this bug: https://bugzilla.redhat.com/show_bug.cgi?id=853064
The steps to reproduce and the oops output from a 3.6.11 kernel are below. Any
thoughts on what could be causing this?
josh
I run the following commands sequentially
- modprobe aoe
- dmesg:
[699170.611997] aoe: AoE v47 initialised.
[699170.653980] aoe: e4.1: setting 8192 byte data frames on eth1:000423d36ac3
[699170.654106] aoe: e6.0: setting 8192 byte data frames on eth1:000423d36ac3
[699170.654961] aoe: e6.2: setting 8192 byte data frames on eth1:000423d36ac3
[699170.654961] aoe: e6.3: setting 8192 byte data frames on eth1:000423d36ac3
[699170.654961] aoe: e8.1: setting 8192 byte data frames on eth1:000423d36ac3
[699170.654961] aoe: e8.2: setting 8192 byte data frames on eth1:000423d36ac3
[699170.654961] aoe: e8.10: setting 8192 byte data frames on eth1:000423d36ac3
[699170.654961] aoe: e8.11: setting 8192 byte data frames on eth1:000423d36ac3
[699170.654961] aoe: 000423d36ac3 e4.1 v0100 has 33554432 sectors
[699170.654961] aoe: 000423d36ac3 e6.0 v0100 has 12582912 sectors
[699170.654961] aoe: 000423d36ac3 e6.2 v0100 has 16777216 sectors
[699170.702143] aoe: 000423d36ac3 e6.3 v0100 has 104857600 sectors
[699170.706391] aoe: 000423d36ac3 e8.1 v0100 has 272629760 sectors
[699170.710623] aoe: 000423d36ac3 e8.2 v0100 has 67108864 sectors
[699170.714851] aoe: 000423d36ac3 e8.10 v0100 has 33554432 sectors
[699170.719056] aoe: 000423d36ac3 e8.11 v0100 has 67108864 sectors
[699170.824774] etherd/e4.1: p1
[699170.829069] etherd/e6.0: p1 p2
[699170.833274] etherd/e8.1: p1 p2
[699170.837329] etherd/e8.2: p1
[699170.841204] etherd/e8.10: p1
[699170.845030] etherd/e8.11: p1
[699170.848706] etherd/e6.3: unknown partition table
[699170.852384] etherd/e6.2: unknown partition table
- lsmod |grep aoe
aoe 32214 0
- modprobe -vr aoe
- dmesg:
[699231.304689] ------------[ cut here ]------------
[699231.308319] WARNING: at lib/list_debug.c:62 __list_del_entry+0x82/0xd0()
[699231.312031] Hardware name: S5000VSA
[699231.315658] list_del corruption. next->prev should be ffff880009fa37e8, but was ffffffff81c79c00
[699231.319352] Modules linked in: aoe(-) ip6table_filter ip6_tables ebtable_nat ebtables lockd sunrpc bridge 8021q garp stp llc vfat fat binfmt_misc iTCO_wdt iTCO_vendor_support vhost_net lpc_ich radeon tun macvtap mfd_core serio_raw coretemp i2c_algo_bit ttm i5000_edac macvlan drm_kms_helper e1000e edac_core microcode i5k_amb shpchp i2c_i801 drm kvm_intel i2c_core kvm ioatdma dca raid1
[699231.336259] Pid: 8584, comm: modprobe Not tainted 3.6.11-1.fc17.x86_64 #1
[699231.340561] Call Trace:
[699231.344865] [<ffffffff8105c8ef>] warn_slowpath_common+0x7f/0xc0
[699231.349212] [<ffffffff8105c9e6>] warn_slowpath_fmt+0x46/0x50
[699231.353595] [<ffffffff812eee52>] __list_del_entry+0x82/0xd0
[699231.357954] [<ffffffff812eeeb1>] list_del+0x11/0x40
[699231.362319] [<ffffffff812f6458>] percpu_counter_destroy+0x28/0x50
[699231.366712] [<ffffffff8114c513>] bdi_destroy+0x43/0x140
[699231.371127] [<ffffffff812be20c>] blk_release_queue+0x8c/0xc0
[699231.375454] [<ffffffff812dc322>] kobject_cleanup+0x82/0x1b0
[699231.379675] [<ffffffff812dc1ab>] kobject_put+0x2b/0x60
[699231.383851] [<ffffffff812b80a5>] blk_put_queue+0x15/0x20
[699231.387899] [<ffffffff812bc659>] blk_cleanup_queue+0xc9/0xe0
[699231.391794] [<ffffffffa01f53f5>] aoedev_freedev+0x135/0x150 [aoe]
[699231.395668] [<ffffffffa01f59a5>] aoedev_exit+0x65/0x80 [aoe]
[699231.399493] [<ffffffffa01f5afe>] aoe_exit+0x2e/0x40 [aoe]
[699231.403273] [<ffffffff810bdefe>] sys_delete_module+0x16e/0x2d0
[699231.407119] [<ffffffff8161db56>] ? __schedule+0x3c6/0x7a0
[699231.411050] [<ffffffff8119054a>] ? sys_write+0x4a/0x90
[699231.415033] [<ffffffff81627329>] system_call_fastpath+0x16/0x1b
[699231.419117] ---[ end trace 9e1558af1964b569 ]---
[699231.423248] ------------[ cut here ]------------
* Re: Oops on aoe module removal
From: Ed Cashin @ 2013-01-03 14:02 UTC
To: Josh Boyer
Cc: mitko@banksoft-bg.com, axboe@kernel.dk,
linux-kernel@vger.kernel.org, kernel-team@fedoraproject.org,
Peter Zijlstra
On Jan 3, 2013, at 8:25 AM, Josh Boyer wrote:
> Hello,
>
> We have a user that has reported an oops when removing the aoe module.
> This seems to have been happening since the 3.4 kernel, as you can see
> in this bug: https://bugzilla.redhat.com/show_bug.cgi?id=853064
>
> The recreate steps and oops output from a 3.6.11 kernel is below. Any
> thoughts on what could be causing this?
>
> josh
>
>
> I run the following commands sequentially
>
> - modprobe aoe
> - dmesg:
> [699170.611997] aoe: AoE v47 initialised.
> [699170.653980] aoe: e4.1: setting 8192 byte data frames on eth1:000423d36ac3
> [699170.654106] aoe: e6.0: setting 8192 byte data frames on eth1:000423d36ac3
> [699170.654961] aoe: e6.2: setting 8192 byte data frames on eth1:000423d36ac3
> [699170.654961] aoe: e6.3: setting 8192 byte data frames on eth1:000423d36ac3
> [699170.654961] aoe: e8.1: setting 8192 byte data frames on eth1:000423d36ac3
> [699170.654961] aoe: e8.2: setting 8192 byte data frames on eth1:000423d36ac3
> [699170.654961] aoe: e8.10: setting 8192 byte data frames on eth1:000423d36ac3
> [699170.654961] aoe: e8.11: setting 8192 byte data frames on eth1:000423d36ac3
> [699170.654961] aoe: 000423d36ac3 e4.1 v0100 has 33554432 sectors
> [699170.654961] aoe: 000423d36ac3 e6.0 v0100 has 12582912 sectors
> [699170.654961] aoe: 000423d36ac3 e6.2 v0100 has 16777216 sectors
> [699170.702143] aoe: 000423d36ac3 e6.3 v0100 has 104857600 sectors
> [699170.706391] aoe: 000423d36ac3 e8.1 v0100 has 272629760 sectors
> [699170.710623] aoe: 000423d36ac3 e8.2 v0100 has 67108864 sectors
> [699170.714851] aoe: 000423d36ac3 e8.10 v0100 has 33554432 sectors
> [699170.719056] aoe: 000423d36ac3 e8.11 v0100 has 67108864 sectors
> [699170.824774] etherd/e4.1: p1
> [699170.829069] etherd/e6.0: p1 p2
> [699170.833274] etherd/e8.1: p1 p2
> [699170.837329] etherd/e8.2: p1
> [699170.841204] etherd/e8.10: p1
> [699170.845030] etherd/e8.11: p1
> [699170.848706] etherd/e6.3: unknown partition table
> [699170.852384] etherd/e6.2: unknown partition table
>
> - lsmod |grep aoe
> aoe 32214 0
>
> - modprobe -vr aoe
> - dmesg:
> [699231.304689] ------------[ cut here ]------------
> [699231.308319] WARNING: at lib/list_debug.c:62 __list_del_entry+0x82/0xd0()
> [699231.312031] Hardware name: S5000VSA
> [699231.315658] list_del corruption. next->prev should be ffff880009fa37e8, but was ffffffff81c79c00
> [699231.319352] Modules linked in: aoe(-) ip6table_filter ip6_tables ebtable_nat ebtables lockd sunrpc bridge 8021q garp stp llc vfat fat binfmt_misc iTCO_wdt iTCO_vendor_support vhost_net lpc_ich radeon tun macvtap mfd_core serio_raw coretemp i2c_algo_bit ttm i5000_edac macvlan drm_kms_helper e1000e edac_core microcode i5k_amb shpchp i2c_i801 drm kvm_intel i2c_core kvm ioatdma dca raid1
> [699231.336259] Pid: 8584, comm: modprobe Not tainted 3.6.11-1.fc17.x86_64 #1
> [699231.340561] Call Trace:
> [699231.344865] [<ffffffff8105c8ef>] warn_slowpath_common+0x7f/0xc0
> [699231.349212] [<ffffffff8105c9e6>] warn_slowpath_fmt+0x46/0x50
> [699231.353595] [<ffffffff812eee52>] __list_del_entry+0x82/0xd0
> [699231.357954] [<ffffffff812eeeb1>] list_del+0x11/0x40
> [699231.362319] [<ffffffff812f6458>] percpu_counter_destroy+0x28/0x50
> [699231.366712] [<ffffffff8114c513>] bdi_destroy+0x43/0x140
> [699231.371127] [<ffffffff812be20c>] blk_release_queue+0x8c/0xc0
> [699231.375454] [<ffffffff812dc322>] kobject_cleanup+0x82/0x1b0
> [699231.379675] [<ffffffff812dc1ab>] kobject_put+0x2b/0x60
> [699231.383851] [<ffffffff812b80a5>] blk_put_queue+0x15/0x20
> [699231.387899] [<ffffffff812bc659>] blk_cleanup_queue+0xc9/0xe0
> [699231.391794] [<ffffffffa01f53f5>] aoedev_freedev+0x135/0x150 [aoe]
> [699231.395668] [<ffffffffa01f59a5>] aoedev_exit+0x65/0x80 [aoe]
> [699231.399493] [<ffffffffa01f5afe>] aoe_exit+0x2e/0x40 [aoe]
> [699231.403273] [<ffffffff810bdefe>] sys_delete_module+0x16e/0x2d0
> [699231.407119] [<ffffffff8161db56>] ? __schedule+0x3c6/0x7a0
> [699231.411050] [<ffffffff8119054a>] ? sys_write+0x4a/0x90
> [699231.415033] [<ffffffff81627329>] system_call_fastpath+0x16/0x1b
> [699231.419117] ---[ end trace 9e1558af1964b569 ]---
> [699231.423248] ------------[ cut here ]------------
Thanks for the report. The problem seems to be older than that (see 2.6.32 below), and it seems to be related to changes that first appeared in 2.6.24. I'm going to investigate the changes introduced in the commit below to see whether the aoe driver needed updating when they went in. I'm Cc-ing Peter Zijlstra in case this rings any bells.
commit b2e8fb6efa209c82203c79b491b5bc952d44aa57
Author: Peter Zijlstra <a.p.zijlstra@chello.nl>
Date:   Tue Oct 16 23:25:47 2007 -0700

    mm: scalable bdi statistics counters

    Provide scalable per backing_dev_info statistics counters.

CentOS release 6.2 (Final)
Kernel 2.6.32 on an x86_64
localhost.localdomain login: aoe: AoE v47 initialised.
e1000: eth1 changing MTU from 1500 to 9000
aoe: e0.1: setting 8704 byte data frames on eth1:0800275abc70
e1000: eth1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
aoe: 0800275abc70 e0.1 v4014 has 20971520 sectors
etherd/e0.1: unknown partition table
------------[ cut here ]------------
WARNING: at lib/list_debug.c:51 list_del+0x81/0x90()
Hardware name: VirtualBox
list_del corruption. next->prev should be ffff880037524440, but was ffffffff817961c0
Modules linked in: aoe(-) ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 ppdev parport_pc parport pcspkr i2c_piix4 i2c_core snd_intel8x0 snd_ac97_codec ac97_bus snd_seq snd_seq_device snd_pcm snd_timer snd soundcore snd_page_alloc e1000 sg ext4 mbcache jbd2 sd_mod crc_t10dif sr_mod cdrom ahci pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
Pid: 1077, comm: rmmod Not tainted 2.6.32 #1
Call Trace:
[<ffffffff8106735b>] warn_slowpath_common+0x7b/0xc0
[<ffffffff81067401>] warn_slowpath_fmt+0x41/0x50
[<ffffffff8123a931>] list_del+0x81/0x90
[<ffffffff8123def8>] percpu_counter_destroy+0x28/0x50
[<ffffffff81118b39>] bdi_destroy+0xf9/0x150
[<ffffffff8121aa70>] blk_release_queue+0x60/0x80
[<ffffffff8122dccd>] kobject_release+0x8d/0x240
[<ffffffff8122dc40>] ? kobject_release+0x0/0x240
[<ffffffff8122f1e7>] kref_put+0x37/0x70
[<ffffffff8122db47>] kobject_put+0x27/0x60
[<ffffffff812174c7>] blk_cleanup_queue+0x57/0x70
[<ffffffffa03412d5>] aoedev_freedev+0x125/0x140 [aoe]
[<ffffffffa03416fd>] aoedev_exit+0x6d/0x90 [aoe]
[<ffffffffa03419e3>] aoe_exit+0x33/0x40 [aoe]
[<ffffffff810a6db8>] sys_delete_module+0x1a8/0x280
[<ffffffff81090aae>] ? up_read+0xe/0x10
[<ffffffff81013072>] system_call_fastpath+0x16/0x1b
---[ end trace a6163f827673f4fe ]---
--
Ed Cashin
ecashin@coraid.com
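
A note on reading the warning: both traces end in list_del() called from percpu_counter_destroy(), and the message text comes from a list debugging check that an entry's neighbours still point back at it. The userspace model below is a minimal sketch of that check, assuming a simplified list_head. The names checked_list_del, demo_single_registration, demo_double_registration and enum del_result are hypothetical and are not kernel API. The second demo shows one hypothetical way to get a message of the same shape (two entries linked onto the same list twice); it is an illustration only, not a diagnosis of the aoe problem.

```c
#include <stdio.h>

/* Simplified doubly linked list node, loosely modelled on struct list_head. */
struct list_head {
	struct list_head *next, *prev;
};

static void init_list_head(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

/* Insert entry right after head, with no sanity checking at all. */
static void list_add(struct list_head *entry, struct list_head *head)
{
	entry->next = head->next;
	entry->prev = head;
	head->next->prev = entry;
	head->next = entry;
}

/* Hypothetical result codes, used only by this sketch. */
enum del_result { DEL_OK, DEL_BAD_PREV_NEXT, DEL_BAD_NEXT_PREV };

/*
 * Unlink entry only if both neighbours still point back at it.
 * Simplification: on a failed check the list is left untouched.
 */
static enum del_result checked_list_del(struct list_head *entry)
{
	if (entry->prev->next != entry) {
		fprintf(stderr, "list_del corruption. prev->next should be %p, but was %p\n",
			(void *)entry, (void *)entry->prev->next);
		return DEL_BAD_PREV_NEXT;
	}
	if (entry->next->prev != entry) {
		fprintf(stderr, "list_del corruption. next->prev should be %p, but was %p\n",
			(void *)entry, (void *)entry->next->prev);
		return DEL_BAD_NEXT_PREV;
	}
	entry->next->prev = entry->prev;
	entry->prev->next = entry->next;
	entry->next = NULL;
	entry->prev = NULL;
	return DEL_OK;
}

/* Each entry linked exactly once: removal passes both checks. */
static enum del_result demo_single_registration(void)
{
	struct list_head counters, c1, c2;

	init_list_head(&counters);
	list_add(&c1, &counters);
	list_add(&c2, &counters);
	return checked_list_del(&c1);
}

/* Hypothetical misuse: the same two entries are linked a second time. */
static enum del_result demo_double_registration(void)
{
	struct list_head counters, c1, c2;

	init_list_head(&counters);
	list_add(&c1, &counters);
	list_add(&c2, &counters);
	list_add(&c1, &counters);	/* second, erroneous registration */
	list_add(&c2, &counters);
	return checked_list_del(&c1);
}
```

Called from a main(), demo_single_registration() returns DEL_OK, while demo_double_registration() prints a "next->prev should be ..." line to stderr and returns DEL_BAD_NEXT_PREV. In that demo the unexpected pointer is the address of the list head itself.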
* Re: Oops on aoe module removal
From: Jens Axboe @ 2013-01-03 14:09 UTC
To: Ed Cashin
Cc: Josh Boyer, mitko@banksoft-bg.com, linux-kernel@vger.kernel.org,
kernel-team@fedoraproject.org, Peter Zijlstra
On 2013-01-03 15:02, Ed Cashin wrote:
> On Jan 3, 2013, at 8:25 AM, Josh Boyer wrote:
>
>> Hello,
>>
>> We have a user that has reported an oops when removing the aoe module.
>> This seems to have been happening since the 3.4 kernel, as you can see
>> in this bug: https://bugzilla.redhat.com/show_bug.cgi?id=853064
>>
>> The recreate steps and oops output from a 3.6.11 kernel is below. Any
>> thoughts on what could be causing this?
>>
>> josh
>>
>>
>> I run the following commands sequentially
>>
>> - modprobe aoe
>> - dmesg:
>> [699170.611997] aoe: AoE v47 initialised.
>> [699170.653980] aoe: e4.1: setting 8192 byte data frames on eth1:000423d36ac3
>> [699170.654106] aoe: e6.0: setting 8192 byte data frames on eth1:000423d36ac3
>> [699170.654961] aoe: e6.2: setting 8192 byte data frames on eth1:000423d36ac3
>> [699170.654961] aoe: e6.3: setting 8192 byte data frames on eth1:000423d36ac3
>> [699170.654961] aoe: e8.1: setting 8192 byte data frames on eth1:000423d36ac3
>> [699170.654961] aoe: e8.2: setting 8192 byte data frames on eth1:000423d36ac3
>> [699170.654961] aoe: e8.10: setting 8192 byte data frames on eth1:000423d36ac3
>> [699170.654961] aoe: e8.11: setting 8192 byte data frames on eth1:000423d36ac3
>> [699170.654961] aoe: 000423d36ac3 e4.1 v0100 has 33554432 sectors
>> [699170.654961] aoe: 000423d36ac3 e6.0 v0100 has 12582912 sectors
>> [699170.654961] aoe: 000423d36ac3 e6.2 v0100 has 16777216 sectors
>> [699170.702143] aoe: 000423d36ac3 e6.3 v0100 has 104857600 sectors
>> [699170.706391] aoe: 000423d36ac3 e8.1 v0100 has 272629760 sectors
>> [699170.710623] aoe: 000423d36ac3 e8.2 v0100 has 67108864 sectors
>> [699170.714851] aoe: 000423d36ac3 e8.10 v0100 has 33554432 sectors
>> [699170.719056] aoe: 000423d36ac3 e8.11 v0100 has 67108864 sectors
>> [699170.824774] etherd/e4.1: p1
>> [699170.829069] etherd/e6.0: p1 p2
>> [699170.833274] etherd/e8.1: p1 p2
>> [699170.837329] etherd/e8.2: p1
>> [699170.841204] etherd/e8.10: p1
>> [699170.845030] etherd/e8.11: p1
>> [699170.848706] etherd/e6.3: unknown partition table
>> [699170.852384] etherd/e6.2: unknown partition table
>>
>> - lsmod |grep aoe
>> aoe 32214 0
>>
>> - modprobe -vr aoe
>> - dmesg:
>> [699231.304689] ------------[ cut here ]------------
>> [699231.308319] WARNING: at lib/list_debug.c:62 __list_del_entry+0x82/0xd0()
>> [699231.312031] Hardware name: S5000VSA
>> [699231.315658] list_del corruption. next->prev should be ffff880009fa37e8, but was ffffffff81c79c00
>> [699231.319352] Modules linked in: aoe(-) ip6table_filter ip6_tables ebtable_nat ebtables lockd sunrpc bridge 8021q garp stp llc vfat fat binfmt_misc iTCO_wdt iTCO_vendor_support vhost_net lpc_ich radeon tun macvtap mfd_core serio_raw coretemp i2c_algo_bit ttm i5000_edac macvlan drm_kms_helper e1000e edac_core microcode i5k_amb shpchp i2c_i801 drm kvm_intel i2c_core kvm ioatdma dca raid1
>> [699231.336259] Pid: 8584, comm: modprobe Not tainted 3.6.11-1.fc17.x86_64 #1
>> [699231.340561] Call Trace:
>> [699231.344865] [<ffffffff8105c8ef>] warn_slowpath_common+0x7f/0xc0
>> [699231.349212] [<ffffffff8105c9e6>] warn_slowpath_fmt+0x46/0x50
>> [699231.353595] [<ffffffff812eee52>] __list_del_entry+0x82/0xd0
>> [699231.357954] [<ffffffff812eeeb1>] list_del+0x11/0x40
>> [699231.362319] [<ffffffff812f6458>] percpu_counter_destroy+0x28/0x50
>> [699231.366712] [<ffffffff8114c513>] bdi_destroy+0x43/0x140
>> [699231.371127] [<ffffffff812be20c>] blk_release_queue+0x8c/0xc0
>> [699231.375454] [<ffffffff812dc322>] kobject_cleanup+0x82/0x1b0
>> [699231.379675] [<ffffffff812dc1ab>] kobject_put+0x2b/0x60
>> [699231.383851] [<ffffffff812b80a5>] blk_put_queue+0x15/0x20
>> [699231.387899] [<ffffffff812bc659>] blk_cleanup_queue+0xc9/0xe0
>> [699231.391794] [<ffffffffa01f53f5>] aoedev_freedev+0x135/0x150 [aoe]
>> [699231.395668] [<ffffffffa01f59a5>] aoedev_exit+0x65/0x80 [aoe]
>> [699231.399493] [<ffffffffa01f5afe>] aoe_exit+0x2e/0x40 [aoe]
>> [699231.403273] [<ffffffff810bdefe>] sys_delete_module+0x16e/0x2d0
>> [699231.407119] [<ffffffff8161db56>] ? __schedule+0x3c6/0x7a0
>> [699231.411050] [<ffffffff8119054a>] ? sys_write+0x4a/0x90
>> [699231.415033] [<ffffffff81627329>] system_call_fastpath+0x16/0x1b
>> [699231.419117] ---[ end trace 9e1558af1964b569 ]---
>> [699231.423248] ------------[ cut here ]------------
>
> Thanks for the report. The problem seems to be older than that (see
> 2.6.32 below), and it seems to be related to changes that first
> appeared in 2.6.24. I'm going to investigate the changes introduced
> in the commit below to see whether the aoe driver needed updating when
> they went in. I'm Cc-ing Peter Zijlstra in case this rings any bells.
I highly doubt that has anything to do with it. Since it triggers
immediately on rmmod after modprobe (and not having set a device up,
presumably, being the key), it looks like a generic bug in aoeblk.
Ed, can you reproduce the issue?
--
Jens Axboe
* Re: Oops on aoe module removal
From: Jens Axboe @ 2013-01-03 14:12 UTC
To: Ed Cashin
Cc: Josh Boyer, mitko@banksoft-bg.com, linux-kernel@vger.kernel.org,
kernel-team@fedoraproject.org, Peter Zijlstra
On 2013-01-03 15:09, Jens Axboe wrote:
> On 2013-01-03 15:02, Ed Cashin wrote:
>> On Jan 3, 2013, at 8:25 AM, Josh Boyer wrote:
>>
>>> Hello,
>>>
>>> We have a user that has reported an oops when removing the aoe module.
>>> This seems to have been happening since the 3.4 kernel, as you can see
>>> in this bug: https://bugzilla.redhat.com/show_bug.cgi?id=853064
>>>
>>> The recreate steps and oops output from a 3.6.11 kernel is below. Any
>>> thoughts on what could be causing this?
>>>
>>> josh
>>>
>>>
>>> I run the following commands sequentially
>>>
>>> - modprobe aoe
>>> - dmesg:
>>> [699170.611997] aoe: AoE v47 initialised.
>>> [699170.653980] aoe: e4.1: setting 8192 byte data frames on eth1:000423d36ac3
>>> [699170.654106] aoe: e6.0: setting 8192 byte data frames on eth1:000423d36ac3
>>> [699170.654961] aoe: e6.2: setting 8192 byte data frames on eth1:000423d36ac3
>>> [699170.654961] aoe: e6.3: setting 8192 byte data frames on eth1:000423d36ac3
>>> [699170.654961] aoe: e8.1: setting 8192 byte data frames on eth1:000423d36ac3
>>> [699170.654961] aoe: e8.2: setting 8192 byte data frames on eth1:000423d36ac3
>>> [699170.654961] aoe: e8.10: setting 8192 byte data frames on eth1:000423d36ac3
>>> [699170.654961] aoe: e8.11: setting 8192 byte data frames on eth1:000423d36ac3
>>> [699170.654961] aoe: 000423d36ac3 e4.1 v0100 has 33554432 sectors
>>> [699170.654961] aoe: 000423d36ac3 e6.0 v0100 has 12582912 sectors
>>> [699170.654961] aoe: 000423d36ac3 e6.2 v0100 has 16777216 sectors
>>> [699170.702143] aoe: 000423d36ac3 e6.3 v0100 has 104857600 sectors
>>> [699170.706391] aoe: 000423d36ac3 e8.1 v0100 has 272629760 sectors
>>> [699170.710623] aoe: 000423d36ac3 e8.2 v0100 has 67108864 sectors
>>> [699170.714851] aoe: 000423d36ac3 e8.10 v0100 has 33554432 sectors
>>> [699170.719056] aoe: 000423d36ac3 e8.11 v0100 has 67108864 sectors
>>> [699170.824774] etherd/e4.1: p1
>>> [699170.829069] etherd/e6.0: p1 p2
>>> [699170.833274] etherd/e8.1: p1 p2
>>> [699170.837329] etherd/e8.2: p1
>>> [699170.841204] etherd/e8.10: p1
>>> [699170.845030] etherd/e8.11: p1
>>> [699170.848706] etherd/e6.3: unknown partition table
>>> [699170.852384] etherd/e6.2: unknown partition table
>>>
>>> - lsmod |grep aoe
>>> aoe 32214 0
>>>
>>> - modprobe -vr aoe
>>> - dmesg:
>>> [699231.304689] ------------[ cut here ]------------
>>> [699231.308319] WARNING: at lib/list_debug.c:62 __list_del_entry+0x82/0xd0()
>>> [699231.312031] Hardware name: S5000VSA
>>> [699231.315658] list_del corruption. next->prev should be ffff880009fa37e8, but was ffffffff81c79c00
>>> [699231.319352] Modules linked in: aoe(-) ip6table_filter ip6_tables ebtable_nat ebtables lockd sunrpc bridge 8021q garp stp llc vfat fat binfmt_misc iTCO_wdt iTCO_vendor_support vhost_net lpc_ich radeon tun macvtap mfd_core serio_raw coretemp i2c_algo_bit ttm i5000_edac macvlan drm_kms_helper e1000e edac_core microcode i5k_amb shpchp i2c_i801 drm kvm_intel i2c_core kvm ioatdma dca raid1
>>> [699231.336259] Pid: 8584, comm: modprobe Not tainted 3.6.11-1.fc17.x86_64 #1
>>> [699231.340561] Call Trace:
>>> [699231.344865] [<ffffffff8105c8ef>] warn_slowpath_common+0x7f/0xc0
>>> [699231.349212] [<ffffffff8105c9e6>] warn_slowpath_fmt+0x46/0x50
>>> [699231.353595] [<ffffffff812eee52>] __list_del_entry+0x82/0xd0
>>> [699231.357954] [<ffffffff812eeeb1>] list_del+0x11/0x40
>>> [699231.362319] [<ffffffff812f6458>] percpu_counter_destroy+0x28/0x50
>>> [699231.366712] [<ffffffff8114c513>] bdi_destroy+0x43/0x140
>>> [699231.371127] [<ffffffff812be20c>] blk_release_queue+0x8c/0xc0
>>> [699231.375454] [<ffffffff812dc322>] kobject_cleanup+0x82/0x1b0
>>> [699231.379675] [<ffffffff812dc1ab>] kobject_put+0x2b/0x60
>>> [699231.383851] [<ffffffff812b80a5>] blk_put_queue+0x15/0x20
>>> [699231.387899] [<ffffffff812bc659>] blk_cleanup_queue+0xc9/0xe0
>>> [699231.391794] [<ffffffffa01f53f5>] aoedev_freedev+0x135/0x150 [aoe]
>>> [699231.395668] [<ffffffffa01f59a5>] aoedev_exit+0x65/0x80 [aoe]
>>> [699231.399493] [<ffffffffa01f5afe>] aoe_exit+0x2e/0x40 [aoe]
>>> [699231.403273] [<ffffffff810bdefe>] sys_delete_module+0x16e/0x2d0
>>> [699231.407119] [<ffffffff8161db56>] ? __schedule+0x3c6/0x7a0
>>> [699231.411050] [<ffffffff8119054a>] ? sys_write+0x4a/0x90
>>> [699231.415033] [<ffffffff81627329>] system_call_fastpath+0x16/0x1b
>>> [699231.419117] ---[ end trace 9e1558af1964b569 ]---
>>> [699231.423248] ------------[ cut here ]------------
>>
>> Thanks for the report. The problem seems to be older than that (see
>> 2.6.32 below), and it seems to be related to changes that first
>> appeared in 2.6.24. I'm going to investigate the changes introduced
>> in the commit below to see whether the aoe driver needed updating when
>> they went in. I'm Cc-ing Peter Zijlstra in case this rings any bells.
>
> I highly doubt that has anything to do with it. Since it triggers
> immediately on rmmod after modprobe (and not having set a device up,
> presumably, being the key), it looks like a generic bug in aoeblk.
>
> Ed, can you reproduce the issue?
Quick guess...
diff --git a/drivers/block/aoe/aoedev.c b/drivers/block/aoe/aoedev.c
index 98f2965..e4473af 100644
--- a/drivers/block/aoe/aoedev.c
+++ b/drivers/block/aoe/aoedev.c
@@ -280,8 +280,8 @@ freedev(struct aoedev *d)
 	if (d->gd) {
 		aoedisk_rm_sysfs(d);
 		del_gendisk(d->gd);
-		put_disk(d->gd);
 		blk_cleanup_queue(d->blkq);
+		put_disk(d->gd);
 	}
 	t = d->targets;
 	e = t + d->ntargets;
--
Jens Axboe
* Re: Oops on aoe module removal
From: Ed Cashin @ 2013-01-03 15:28 UTC
To: Jens Axboe
Cc: Josh Boyer, mitko@banksoft-bg.com, linux-kernel@vger.kernel.org,
kernel-team@fedoraproject.org, Peter Zijlstra
On Jan 3, 2013, at 9:12 AM, Jens Axboe wrote:
> On 2013-01-03 15:09, Jens Axboe wrote:
>> On 2013-01-03 15:02, Ed Cashin wrote:
>>> On Jan 3, 2013, at 8:25 AM, Josh Boyer wrote:
>>>
>>>> Hello,
>>>>
>>>> We have a user that has reported an oops when removing the aoe module.
>>>> This seems to have been happening since the 3.4 kernel, as you can see
>>>> in this bug: https://bugzilla.redhat.com/show_bug.cgi?id=853064
>>>>
>>>> The recreate steps and oops output from a 3.6.11 kernel is below. Any
>>>> thoughts on what could be causing this?
>>>>
>>>> josh
>>>>
>>>>
>>>> I run the following commands sequentially
>>>>
>>>> - modprobe aoe
>>>> - dmesg:
>>>> [699170.611997] aoe: AoE v47 initialised.
>>>> [699170.653980] aoe: e4.1: setting 8192 byte data frames on eth1:000423d36ac3
>>>> [699170.654106] aoe: e6.0: setting 8192 byte data frames on eth1:000423d36ac3
>>>> [699170.654961] aoe: e6.2: setting 8192 byte data frames on eth1:000423d36ac3
>>>> [699170.654961] aoe: e6.3: setting 8192 byte data frames on eth1:000423d36ac3
>>>> [699170.654961] aoe: e8.1: setting 8192 byte data frames on eth1:000423d36ac3
>>>> [699170.654961] aoe: e8.2: setting 8192 byte data frames on eth1:000423d36ac3
>>>> [699170.654961] aoe: e8.10: setting 8192 byte data frames on eth1:000423d36ac3
>>>> [699170.654961] aoe: e8.11: setting 8192 byte data frames on eth1:000423d36ac3
>>>> [699170.654961] aoe: 000423d36ac3 e4.1 v0100 has 33554432 sectors
>>>> [699170.654961] aoe: 000423d36ac3 e6.0 v0100 has 12582912 sectors
>>>> [699170.654961] aoe: 000423d36ac3 e6.2 v0100 has 16777216 sectors
>>>> [699170.702143] aoe: 000423d36ac3 e6.3 v0100 has 104857600 sectors
>>>> [699170.706391] aoe: 000423d36ac3 e8.1 v0100 has 272629760 sectors
>>>> [699170.710623] aoe: 000423d36ac3 e8.2 v0100 has 67108864 sectors
>>>> [699170.714851] aoe: 000423d36ac3 e8.10 v0100 has 33554432 sectors
>>>> [699170.719056] aoe: 000423d36ac3 e8.11 v0100 has 67108864 sectors
>>>> [699170.824774] etherd/e4.1: p1
>>>> [699170.829069] etherd/e6.0: p1 p2
>>>> [699170.833274] etherd/e8.1: p1 p2
>>>> [699170.837329] etherd/e8.2: p1
>>>> [699170.841204] etherd/e8.10: p1
>>>> [699170.845030] etherd/e8.11: p1
>>>> [699170.848706] etherd/e6.3: unknown partition table
>>>> [699170.852384] etherd/e6.2: unknown partition table
>>>>
>>>> - lsmod |grep aoe
>>>> aoe 32214 0
>>>>
>>>> - modprobe -vr aoe
>>>> - dmesg:
>>>> [699231.304689] ------------[ cut here ]------------
>>>> [699231.308319] WARNING: at lib/list_debug.c:62 __list_del_entry+0x82/0xd0()
>>>> [699231.312031] Hardware name: S5000VSA
>>>> [699231.315658] list_del corruption. next->prev should be ffff880009fa37e8, but was ffffffff81c79c00
>>>> [699231.319352] Modules linked in: aoe(-) ip6table_filter ip6_tables ebtable_nat ebtables lockd sunrpc bridge 8021q garp stp llc vfat fat binfmt_misc iTCO_wdt iTCO_vendor_support vhost_net lpc_ich radeon tun macvtap mfd_core serio_raw coretemp i2c_algo_bit ttm i5000_edac macvlan drm_kms_helper e1000e edac_core microcode i5k_amb shpchp i2c_i801 drm kvm_intel i2c_core kvm ioatdma dca raid1
>>>> [699231.336259] Pid: 8584, comm: modprobe Not tainted 3.6.11-1.fc17.x86_64 #1
>>>> [699231.340561] Call Trace:
>>>> [699231.344865] [<ffffffff8105c8ef>] warn_slowpath_common+0x7f/0xc0
>>>> [699231.349212] [<ffffffff8105c9e6>] warn_slowpath_fmt+0x46/0x50
>>>> [699231.353595] [<ffffffff812eee52>] __list_del_entry+0x82/0xd0
>>>> [699231.357954] [<ffffffff812eeeb1>] list_del+0x11/0x40
>>>> [699231.362319] [<ffffffff812f6458>] percpu_counter_destroy+0x28/0x50
>>>> [699231.366712] [<ffffffff8114c513>] bdi_destroy+0x43/0x140
>>>> [699231.371127] [<ffffffff812be20c>] blk_release_queue+0x8c/0xc0
>>>> [699231.375454] [<ffffffff812dc322>] kobject_cleanup+0x82/0x1b0
>>>> [699231.379675] [<ffffffff812dc1ab>] kobject_put+0x2b/0x60
>>>> [699231.383851] [<ffffffff812b80a5>] blk_put_queue+0x15/0x20
>>>> [699231.387899] [<ffffffff812bc659>] blk_cleanup_queue+0xc9/0xe0
>>>> [699231.391794] [<ffffffffa01f53f5>] aoedev_freedev+0x135/0x150 [aoe]
>>>> [699231.395668] [<ffffffffa01f59a5>] aoedev_exit+0x65/0x80 [aoe]
>>>> [699231.399493] [<ffffffffa01f5afe>] aoe_exit+0x2e/0x40 [aoe]
>>>> [699231.403273] [<ffffffff810bdefe>] sys_delete_module+0x16e/0x2d0
>>>> [699231.407119] [<ffffffff8161db56>] ? __schedule+0x3c6/0x7a0
>>>> [699231.411050] [<ffffffff8119054a>] ? sys_write+0x4a/0x90
>>>> [699231.415033] [<ffffffff81627329>] system_call_fastpath+0x16/0x1b
>>>> [699231.419117] ---[ end trace 9e1558af1964b569 ]---
>>>> [699231.423248] ------------[ cut here ]------------
>>>
>>> Thanks for the report. The problem seems to be older than that (see
>>> 2.6.32 below), and it seems to be related to changes that first
>>> appeared in 2.6.24. I'm going to investigate the changes introduced
>>> in the commit below to see whether the aoe driver needed updating when
>>> they went in. I'm Cc-ing Peter Zijlstra in case this rings any bells.
>>
>> I highly doubt that has anything to do with it. Since it triggers
>> immediately on rmmod after modprobe (and not having set a device up,
>> presumably, being the key), it looks like a generic bug in aoeblk.
>>
>> Ed, can you reproduce the issue?
>
> Quick guess...
>
>
> diff --git a/drivers/block/aoe/aoedev.c b/drivers/block/aoe/aoedev.c
> index 98f2965..e4473af 100644
> --- a/drivers/block/aoe/aoedev.c
> +++ b/drivers/block/aoe/aoedev.c
> @@ -280,8 +280,8 @@ freedev(struct aoedev *d)
>  	if (d->gd) {
>  		aoedisk_rm_sysfs(d);
>  		del_gendisk(d->gd);
> -		put_disk(d->gd);
>  		blk_cleanup_queue(d->blkq);
> +		put_disk(d->gd);
>  	}
>  	t = d->targets;
>  	e = t + d->ntargets;
Yes, I can reproduce it on 3.5.6. There are devices up, none down, when I do rmmod. If no aoe devices are present, the warnings do not appear.
The suggestion above to move put_disk after blk_cleanup_queue doesn't affect the list_del warnings, but thanks for the quick guess. Is that something we need to change regardless of the issue at hand?
--
Ed Cashin
ecashin@coraid.com
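
The result reported above, that swapping put_disk and blk_cleanup_queue leaves the warning unchanged, is what one would expect if this part of the teardown is only a matter of balanced reference counts. The toy model below is a sketch of that idea under stated assumptions: it supposes (hypothetically) that a disk pins its queue with one reference and that queue cleanup drops the allocation reference. struct toy_queue, struct toy_disk and teardown() are made up for illustration and do not mirror the real block layer structures.

```c
/* Hypothetical stand-ins for a request queue and a disk that pins it. */
struct toy_queue {
	int refs;	/* outstanding references */
	int released;	/* how many times the release path ran */
};

struct toy_disk {
	int refs;
	struct toy_queue *q;
};

/* Drop one queue reference; run the release path when it hits zero. */
static void toy_queue_put(struct toy_queue *q)
{
	if (--q->refs == 0)
		q->released++;
}

/* Dropping the last disk reference also drops the disk's pin on the queue. */
static void toy_disk_put(struct toy_disk *d)
{
	if (--d->refs == 0)
		toy_queue_put(d->q);
}

/* Assumed behaviour: cleanup gives up the reference taken at allocation. */
static void toy_queue_cleanup(struct toy_queue *q)
{
	toy_queue_put(q);
}

/*
 * Tear down in either order and report how often the queue was released.
 * With balanced references the answer is 1 in both cases.
 */
static int teardown(int put_disk_first)
{
	struct toy_queue q = { .refs = 1, .released = 0 };	/* allocation ref */
	struct toy_disk d = { .refs = 1, .q = &q };

	q.refs++;	/* the disk pins the queue */

	if (put_disk_first) {
		toy_disk_put(&d);
		toy_queue_cleanup(&q);
	} else {
		toy_queue_cleanup(&q);
		toy_disk_put(&d);
	}
	return q.released;
}
```

Both teardown(1) and teardown(0) return 1 in this model, so the release path runs exactly once whichever call comes first. The model says nothing about other reasons the order might matter in the real block layer.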
* Re: Oops on aoe module removal
From: Jens Axboe @ 2013-01-03 15:34 UTC
To: Ed Cashin
Cc: Josh Boyer, mitko@banksoft-bg.com, linux-kernel@vger.kernel.org,
kernel-team@fedoraproject.org, Peter Zijlstra
On 2013-01-03 16:28, Ed Cashin wrote:
> On Jan 3, 2013, at 9:12 AM, Jens Axboe wrote:
>
>> On 2013-01-03 15:09, Jens Axboe wrote:
>>> On 2013-01-03 15:02, Ed Cashin wrote:
>>>> On Jan 3, 2013, at 8:25 AM, Josh Boyer wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>> We have a user that has reported an oops when removing the aoe module.
>>>>> This seems to have been happening since the 3.4 kernel, as you can see
>>>>> in this bug: https://bugzilla.redhat.com/show_bug.cgi?id=853064
>>>>>
>>>>> The recreate steps and oops output from a 3.6.11 kernel is below. Any
>>>>> thoughts on what could be causing this?
>>>>>
>>>>> josh
>>>>>
>>>>>
>>>>> I run the following commands sequentially
>>>>>
>>>>> - modprobe aoe
>>>>> - dmesg:
>>>>> [699170.611997] aoe: AoE v47 initialised.
>>>>> [699170.653980] aoe: e4.1: setting 8192 byte data frames on eth1:000423d36ac3
>>>>> [699170.654106] aoe: e6.0: setting 8192 byte data frames on eth1:000423d36ac3
>>>>> [699170.654961] aoe: e6.2: setting 8192 byte data frames on eth1:000423d36ac3
>>>>> [699170.654961] aoe: e6.3: setting 8192 byte data frames on eth1:000423d36ac3
>>>>> [699170.654961] aoe: e8.1: setting 8192 byte data frames on eth1:000423d36ac3
>>>>> [699170.654961] aoe: e8.2: setting 8192 byte data frames on eth1:000423d36ac3
>>>>> [699170.654961] aoe: e8.10: setting 8192 byte data frames on eth1:000423d36ac3
>>>>> [699170.654961] aoe: e8.11: setting 8192 byte data frames on eth1:000423d36ac3
>>>>> [699170.654961] aoe: 000423d36ac3 e4.1 v0100 has 33554432 sectors
>>>>> [699170.654961] aoe: 000423d36ac3 e6.0 v0100 has 12582912 sectors
>>>>> [699170.654961] aoe: 000423d36ac3 e6.2 v0100 has 16777216 sectors
>>>>> [699170.702143] aoe: 000423d36ac3 e6.3 v0100 has 104857600 sectors
>>>>> [699170.706391] aoe: 000423d36ac3 e8.1 v0100 has 272629760 sectors
>>>>> [699170.710623] aoe: 000423d36ac3 e8.2 v0100 has 67108864 sectors
>>>>> [699170.714851] aoe: 000423d36ac3 e8.10 v0100 has 33554432 sectors
>>>>> [699170.719056] aoe: 000423d36ac3 e8.11 v0100 has 67108864 sectors
>>>>> [699170.824774] etherd/e4.1: p1
>>>>> [699170.829069] etherd/e6.0: p1 p2
>>>>> [699170.833274] etherd/e8.1: p1 p2
>>>>> [699170.837329] etherd/e8.2: p1
>>>>> [699170.841204] etherd/e8.10: p1
>>>>> [699170.845030] etherd/e8.11: p1
>>>>> [699170.848706] etherd/e6.3: unknown partition table
>>>>> [699170.852384] etherd/e6.2: unknown partition table
>>>>>
>>>>> - lsmod |grep aoe
>>>>> aoe 32214 0
>>>>>
>>>>> - modprobe -vr aoe
>>>>> - dmesg:
>>>>> [699231.304689] ------------[ cut here ]------------
>>>>> [699231.308319] WARNING: at lib/list_debug.c:62 __list_del_entry+0x82/0xd0()
>>>>> [699231.312031] Hardware name: S5000VSA
>>>>> [699231.315658] list_del corruption. next->prev should be ffff880009fa37e8, but was ffffffff81c79c00
>>>>> [699231.319352] Modules linked in: aoe(-) ip6table_filter ip6_tables ebtable_nat ebtables lockd sunrpc bridge 8021q garp stp llc vfat fat binfmt_misc iTCO_wdt iTCO_vendor_support vhost_net lpc_ich radeon tun macvtap mfd_core serio_raw coretemp i2c_algo_bit ttm i5000_edac macvlan drm_kms_helper e1000e edac_core microcode i5k_amb shpchp i2c_i801 drm kvm_intel i2c_core kvm ioatdma dca raid1
>>>>> [699231.336259] Pid: 8584, comm: modprobe Not tainted 3.6.11-1.fc17.x86_64 #1
>>>>> [699231.340561] Call Trace:
>>>>> [699231.344865] [<ffffffff8105c8ef>] warn_slowpath_common+0x7f/0xc0
>>>>> [699231.349212] [<ffffffff8105c9e6>] warn_slowpath_fmt+0x46/0x50
>>>>> [699231.353595] [<ffffffff812eee52>] __list_del_entry+0x82/0xd0
>>>>> [699231.357954] [<ffffffff812eeeb1>] list_del+0x11/0x40
>>>>> [699231.362319] [<ffffffff812f6458>] percpu_counter_destroy+0x28/0x50
>>>>> [699231.366712] [<ffffffff8114c513>] bdi_destroy+0x43/0x140
>>>>> [699231.371127] [<ffffffff812be20c>] blk_release_queue+0x8c/0xc0
>>>>> [699231.375454] [<ffffffff812dc322>] kobject_cleanup+0x82/0x1b0
>>>>> [699231.379675] [<ffffffff812dc1ab>] kobject_put+0x2b/0x60
>>>>> [699231.383851] [<ffffffff812b80a5>] blk_put_queue+0x15/0x20
>>>>> [699231.387899] [<ffffffff812bc659>] blk_cleanup_queue+0xc9/0xe0
>>>>> [699231.391794] [<ffffffffa01f53f5>] aoedev_freedev+0x135/0x150 [aoe]
>>>>> [699231.395668] [<ffffffffa01f59a5>] aoedev_exit+0x65/0x80 [aoe]
>>>>> [699231.399493] [<ffffffffa01f5afe>] aoe_exit+0x2e/0x40 [aoe]
>>>>> [699231.403273] [<ffffffff810bdefe>] sys_delete_module+0x16e/0x2d0
>>>>> [699231.407119] [<ffffffff8161db56>] ? __schedule+0x3c6/0x7a0
>>>>> [699231.411050] [<ffffffff8119054a>] ? sys_write+0x4a/0x90
>>>>> [699231.415033] [<ffffffff81627329>] system_call_fastpath+0x16/0x1b
>>>>> [699231.419117] ---[ end trace 9e1558af1964b569 ]---
>>>>> [699231.423248] ------------[ cut here ]------------
>>>>
>>>> Thanks for the report. The problem seems to be older than that (see
>>>> 2.6.32 below), and it seems to be related to changes that first
>>>> appeared in 2.6.24. I'm going to investigate the changes introduced
>>>> in the commit below to see whether the aoe driver needed updating when
>>>> they went in. I'm Cc-ing Peter Zijlstra in case this rings any bells.
>>>
>>> I highly doubt that has anything to do with it. Since it triggers
>>> immediately on rmmod after modprobe (and not having set a device up,
>>> presumably, being the key), it looks like a generic bug in aoeblk.
>>>
>>> Ed, can you reproduce the issue?
>>
>> Quick guess...
>>
>>
>> diff --git a/drivers/block/aoe/aoedev.c b/drivers/block/aoe/aoedev.c
>> index 98f2965..e4473af 100644
>> --- a/drivers/block/aoe/aoedev.c
>> +++ b/drivers/block/aoe/aoedev.c
>> @@ -280,8 +280,8 @@ freedev(struct aoedev *d)
>> if (d->gd) {
>> aoedisk_rm_sysfs(d);
>> del_gendisk(d->gd);
>> - put_disk(d->gd);
>> blk_cleanup_queue(d->blkq);
>> + put_disk(d->gd);
>> }
>> t = d->targets;
>> e = t + d->ntargets;
>
> Yes, I can reproduce it on 3.5.6. There are devices up, none down,
> when I do rmmod. If no aoe devices are present, the warnings do not
> appear.
OK, that's good at least. I can try here too.
> The suggestion above to move put_disk after blk_cleanup_queue doesn't
> affect the list_del warnings, but thanks for the quick guess---Is that
> something we need to change regardless of the issue at hand?
No, it should not matter.
--
Jens Axboe
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Oops on aoe module removal
2013-01-03 15:34 ` Jens Axboe
@ 2013-01-03 18:15 ` Ed Cashin
2013-01-03 19:28 ` Ed Cashin, Ed Cashin
0 siblings, 1 reply; 16+ messages in thread
From: Ed Cashin @ 2013-01-03 18:15 UTC (permalink / raw)
To: Jens Axboe
Cc: Josh Boyer, mitko@banksoft-bg.com, linux-kernel@vger.kernel.org,
kernel-team@fedoraproject.org, Peter Zijlstra
On Jan 3, 2013, at 10:34 AM, Jens Axboe wrote:
> On 2013-01-03 16:28, Ed Cashin wrote:
>> On Jan 3, 2013, at 9:12 AM, Jens Axboe wrote:
>>
>>> On 2013-01-03 15:09, Jens Axboe wrote:
>>>> On 2013-01-03 15:02, Ed Cashin wrote:
>>>>> On Jan 3, 2013, at 8:25 AM, Josh Boyer wrote:
>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> We have a user that has reported an oops when removing the aoe module.
>>>>>> This seems to have been happening since the 3.4 kernel, as you can see
>>>>>> in this bug: https://bugzilla.redhat.com/show_bug.cgi?id=853064
>>>>>>
>>>>>> The recreate steps and oops output from a 3.6.11 kernel is below. Any
>>>>>> thoughts on what could be causing this?
>>>>>>
>>>>>> josh
>>>>>>
>>>>>>
>>>>>> I run the following commands sequentially
>>>>>>
>>>>>> - modprobe aoe
>>>>>> - dmesg:
>>>>>> [699170.611997] aoe: AoE v47 initialised.
>>>>>> [699170.653980] aoe: e4.1: setting 8192 byte data frames on eth1:000423d36ac3
>>>>>> [699170.654106] aoe: e6.0: setting 8192 byte data frames on eth1:000423d36ac3
>>>>>> [699170.654961] aoe: e6.2: setting 8192 byte data frames on eth1:000423d36ac3
>>>>>> [699170.654961] aoe: e6.3: setting 8192 byte data frames on eth1:000423d36ac3
>>>>>> [699170.654961] aoe: e8.1: setting 8192 byte data frames on eth1:000423d36ac3
>>>>>> [699170.654961] aoe: e8.2: setting 8192 byte data frames on eth1:000423d36ac3
>>>>>> [699170.654961] aoe: e8.10: setting 8192 byte data frames on eth1:000423d36ac3
>>>>>> [699170.654961] aoe: e8.11: setting 8192 byte data frames on eth1:000423d36ac3
>>>>>> [699170.654961] aoe: 000423d36ac3 e4.1 v0100 has 33554432 sectors
>>>>>> [699170.654961] aoe: 000423d36ac3 e6.0 v0100 has 12582912 sectors
>>>>>> [699170.654961] aoe: 000423d36ac3 e6.2 v0100 has 16777216 sectors
>>>>>> [699170.702143] aoe: 000423d36ac3 e6.3 v0100 has 104857600 sectors
>>>>>> [699170.706391] aoe: 000423d36ac3 e8.1 v0100 has 272629760 sectors
>>>>>> [699170.710623] aoe: 000423d36ac3 e8.2 v0100 has 67108864 sectors
>>>>>> [699170.714851] aoe: 000423d36ac3 e8.10 v0100 has 33554432 sectors
>>>>>> [699170.719056] aoe: 000423d36ac3 e8.11 v0100 has 67108864 sectors
>>>>>> [699170.824774] etherd/e4.1: p1
>>>>>> [699170.829069] etherd/e6.0: p1 p2
>>>>>> [699170.833274] etherd/e8.1: p1 p2
>>>>>> [699170.837329] etherd/e8.2: p1
>>>>>> [699170.841204] etherd/e8.10: p1
>>>>>> [699170.845030] etherd/e8.11: p1
>>>>>> [699170.848706] etherd/e6.3: unknown partition table
>>>>>> [699170.852384] etherd/e6.2: unknown partition table
>>>>>>
>>>>>> - lsmod |grep aoe
>>>>>> aoe 32214 0
>>>>>>
>>>>>> - modprobe -vr aoe
>>>>>> - dmesg:
>>>>>> [699231.304689] ------------[ cut here ]------------
>>>>>> [699231.308319] WARNING: at lib/list_debug.c:62 __list_del_entry+0x82/0xd0()
>>>>>> [699231.312031] Hardware name: S5000VSA
>>>>>> [699231.315658] list_del corruption. next->prev should be ffff880009fa37e8, but was ffffffff81c79c00
>>>>>> [699231.319352] Modules linked in: aoe(-) ip6table_filter ip6_tables ebtable_nat ebtables lockd sunrpc bridge 8021q garp stp llc vfat fat binfmt_misc iTCO_wdt iTCO_vendor_support vhost_net lpc_ich radeon tun macvtap mfd_core serio_raw coretemp i2c_algo_bit ttm i5000_edac macvlan drm_kms_helper e1000e edac_core microcode i5k_amb shpchp i2c_i801 drm kvm_intel i2c_core kvm ioatdma dca raid1
>>>>>> [699231.336259] Pid: 8584, comm: modprobe Not tainted 3.6.11-1.fc17.x86_64 #1
>>>>>> [699231.340561] Call Trace:
>>>>>> [699231.344865] [<ffffffff8105c8ef>] warn_slowpath_common+0x7f/0xc0
>>>>>> [699231.349212] [<ffffffff8105c9e6>] warn_slowpath_fmt+0x46/0x50
>>>>>> [699231.353595] [<ffffffff812eee52>] __list_del_entry+0x82/0xd0
>>>>>> [699231.357954] [<ffffffff812eeeb1>] list_del+0x11/0x40
>>>>>> [699231.362319] [<ffffffff812f6458>] percpu_counter_destroy+0x28/0x50
>>>>>> [699231.366712] [<ffffffff8114c513>] bdi_destroy+0x43/0x140
>>>>>> [699231.371127] [<ffffffff812be20c>] blk_release_queue+0x8c/0xc0
>>>>>> [699231.375454] [<ffffffff812dc322>] kobject_cleanup+0x82/0x1b0
>>>>>> [699231.379675] [<ffffffff812dc1ab>] kobject_put+0x2b/0x60
>>>>>> [699231.383851] [<ffffffff812b80a5>] blk_put_queue+0x15/0x20
>>>>>> [699231.387899] [<ffffffff812bc659>] blk_cleanup_queue+0xc9/0xe0
>>>>>> [699231.391794] [<ffffffffa01f53f5>] aoedev_freedev+0x135/0x150 [aoe]
>>>>>> [699231.395668] [<ffffffffa01f59a5>] aoedev_exit+0x65/0x80 [aoe]
>>>>>> [699231.399493] [<ffffffffa01f5afe>] aoe_exit+0x2e/0x40 [aoe]
>>>>>> [699231.403273] [<ffffffff810bdefe>] sys_delete_module+0x16e/0x2d0
>>>>>> [699231.407119] [<ffffffff8161db56>] ? __schedule+0x3c6/0x7a0
>>>>>> [699231.411050] [<ffffffff8119054a>] ? sys_write+0x4a/0x90
>>>>>> [699231.415033] [<ffffffff81627329>] system_call_fastpath+0x16/0x1b
>>>>>> [699231.419117] ---[ end trace 9e1558af1964b569 ]---
>>>>>> [699231.423248] ------------[ cut here ]------------
>>>>>
>>>>> Thanks for the report. The problem seems to be older than that (see
>>>>> 2.6.32 below), and it seems to be related to changes that first
>>>>> appeared in 2.6.24. I'm going to investigate the changes introduced
>>>>> in the commit below to see whether the aoe driver needed updating when
>>>>> they went in. I'm Cc-ing Peter Zijlstra in case this rings any bells.
>>>>
>>>> I highly doubt that has anything to do with it. Since it triggers
>>>> immediately on rmmod after modprobe (and not having set a device up,
>>>> presumably, being the key), it looks like a generic bug in aoeblk.
>>>>
>>>> Ed, can you reproduce the issue?
>>>
>>> Quick guess...
>>>
>>>
>>> diff --git a/drivers/block/aoe/aoedev.c b/drivers/block/aoe/aoedev.c
>>> index 98f2965..e4473af 100644
>>> --- a/drivers/block/aoe/aoedev.c
>>> +++ b/drivers/block/aoe/aoedev.c
>>> @@ -280,8 +280,8 @@ freedev(struct aoedev *d)
>>> if (d->gd) {
>>> aoedisk_rm_sysfs(d);
>>> del_gendisk(d->gd);
>>> - put_disk(d->gd);
>>> blk_cleanup_queue(d->blkq);
>>> + put_disk(d->gd);
>>> }
>>> t = d->targets;
>>> e = t + d->ntargets;
>>
>> Yes, I can reproduce it on 3.5.6. There are devices up, none down,
>> when I do rmmod. If no aoe devices are present, the warnings do not
>> appear.
>
> OK, that's good at least. I can try here too.
>
>> The suggestion above to move put_disk after blk_cleanup_queue doesn't
>> affect the list_del warnings, but thanks for the quick guess---Is that
>> something we need to change regardless of the issue at hand?
>
> No should not matter.
Argh. I tried to send the email in a hurry and vger rejected the message as spam because of the HTML subpart. I'll get a process set up for sending replies as clean emails. Meanwhile, my message said,
"Thanks. I think I see the problem. The blk_alloc_queue has already done a bdi_init, and there's no need for aoeblk.c to do it, so when it does, the bdi_stat lists get messed up.
If you think that makes sense and the patch below (corrupted by our MS Exchange email server) also makes sense as a proof of concept, I'll send akpm a proper patch without email corruption. The patch below eliminates the list_del corruption messages in my tests, and I think the mainline aoe driver needs (the cleaned up version of) it as well as older stable kernels."
... and the patch just ifdeffed out the call to bdi_init in aoeblk.c:aoeblk_gdalloc(), along with the associated error handling.
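As a rough illustration of why a second init would upset list_del, here is a small userspace sketch. It is not the kernel code. It assumes that each counter init resets the counter's list node and then adds it to a global list, which is how I read the percpu_counter setup behind bdi_init. The names counter_init, list_entry_is_sane and sane_after_inits are made up for the example.

```c
#include <assert.h>
#include <stdbool.h>

/* Minimal doubly linked list node, shaped like the kernel's list_head. */
struct list_head {
	struct list_head *next, *prev;
};

static void init_list_head(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

/* Insert "new" right after "head". */
static void list_add(struct list_head *new, struct list_head *head)
{
	struct list_head *next = head->next;

	next->prev = new;
	new->next = next;
	new->prev = head;
	head->next = new;
}

/*
 * The same two conditions that the list debugging code checks before
 * deleting an entry: both neighbours must point back at the entry.
 */
static bool list_entry_is_sane(const struct list_head *e)
{
	return e->prev->next == e && e->next->prev == e;
}

/* Hypothetical stand-in for one counter init: reset node, then link it. */
static void counter_init(struct list_head *counter, struct list_head *all)
{
	init_list_head(counter);
	list_add(counter, all);
}

/* Init the same counter n times, then report whether list_del would be happy. */
static bool sane_after_inits(int n)
{
	struct list_head all, counter;
	int i;

	assert(n >= 1);
	init_list_head(&all);
	for (i = 0; i < n; i++)
		counter_init(&counter, &all);
	return list_entry_is_sane(&counter);
}
```

With one init the entry is linked normally. With two, the entry's next pointer ends up pointing at the entry itself while its prev pointer points at the list head, so the "next->prev" check fails and reports the head's address. That would be consistent with the warning above, where the unexpected value looks like a static kernel address rather than a heap address.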
--
Ed Cashin
ecashin@coraid.com
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Oops on aoe module removal
2013-01-03 18:15 ` Ed Cashin
@ 2013-01-03 19:28 ` Ed Cashin, Ed Cashin
2013-01-03 19:45 ` Jens Axboe
0 siblings, 1 reply; 16+ messages in thread
From: Ed Cashin, Ed Cashin @ 2013-01-03 19:28 UTC (permalink / raw)
To: Jens Axboe
Cc: Josh Boyer, ecashin, mitko@banksoft-bg.com,
linux-kernel@vger.kernel.org, kernel-team@fedoraproject.org,
Peter Zijlstra
On Thu, Jan 03, 2013 at 12:15:35PM -0600, Ed Cashin wrote:
...
> >>>>> On Jan 3, 2013, at 8:25 AM, Josh Boyer wrote:
...
> >>>>>> [699170.611997] aoe: AoE v47 initialised.
...
> >>>>>> [699231.308319] WARNING: at lib/list_debug.c:62 __list_del_entry+0x82/0xd0()
> >>>>>> [699231.312031] Hardware name: S5000VSA
> >>>>>> [699231.315658] list_del corruption. next->prev should be ffff880009fa37e8, but was ffffffff81c79c00
> >>>>>> [699231.319352] Modules linked in: aoe(-) ip6table_filter ip6_tables ebtable_nat ebtables lockd sunrpc bridge 8021q garp stp llc vfat fat binfmt_misc iTCO_wdt iTCO_vendor_support vhost_net lpc_ich radeon tun macvtap mfd_core serio_raw coretemp i2c_algo_bit ttm i5000_edac macvlan drm_kms_helper e1000e edac_core microcode i5k_amb shpchp i2c_i801 drm kvm_intel i2c_core kvm ioatdma dca raid1
> >>>>>> [699231.336259] Pid: 8584, comm: modprobe Not tainted 3.6.11-1.fc17.x86_64 #1
> >>>>>> [699231.340561] Call Trace:
> >>>>>> [699231.344865] [<ffffffff8105c8ef>] warn_slowpath_common+0x7f/0xc0
> >>>>>> [699231.349212] [<ffffffff8105c9e6>] warn_slowpath_fmt+0x46/0x50
> >>>>>> [699231.353595] [<ffffffff812eee52>] __list_del_entry+0x82/0xd0
> >>>>>> [699231.357954] [<ffffffff812eeeb1>] list_del+0x11/0x40
> >>>>>> [699231.362319] [<ffffffff812f6458>] percpu_counter_destroy+0x28/0x50
> >>>>>> [699231.366712] [<ffffffff8114c513>] bdi_destroy+0x43/0x140
> >>>>>> [699231.371127] [<ffffffff812be20c>] blk_release_queue+0x8c/0xc0
> >>>>>> [699231.375454] [<ffffffff812dc322>] kobject_cleanup+0x82/0x1b0
> >>>>>> [699231.379675] [<ffffffff812dc1ab>] kobject_put+0x2b/0x60
> >>>>>> [699231.383851] [<ffffffff812b80a5>] blk_put_queue+0x15/0x20
> >>>>>> [699231.387899] [<ffffffff812bc659>] blk_cleanup_queue+0xc9/0xe0
> >>>>>> [699231.391794] [<ffffffffa01f53f5>] aoedev_freedev+0x135/0x150 [aoe]
> >>>>>> [699231.395668] [<ffffffffa01f59a5>] aoedev_exit+0x65/0x80 [aoe]
> >>>>>> [699231.399493] [<ffffffffa01f5afe>] aoe_exit+0x2e/0x40 [aoe]
> >>>>>> [699231.403273] [<ffffffff810bdefe>] sys_delete_module+0x16e/0x2d0
> >>>>>> [699231.407119] [<ffffffff8161db56>] ? __schedule+0x3c6/0x7a0
> >>>>>> [699231.411050] [<ffffffff8119054a>] ? sys_write+0x4a/0x90
> >>>>>> [699231.415033] [<ffffffff81627329>] system_call_fastpath+0x16/0x1b
> >>>>>> [699231.419117] ---[ end trace 9e1558af1964b569 ]---
> >>>>>> [699231.423248] ------------[ cut here ]------------
The blk_alloc_queue has already done a bdi_init, so do not bdi_init again in
aoeblk_gdalloc.
The patch below applies to v3.5.6, with its v47 aoe driver. On my system it
eliminates the list_del corruption messages.
It updates VERSION for convenience during testing.
diff --git a/drivers/block/aoe/aoe.h b/drivers/block/aoe/aoe.h
index db195ab..2ccb9e2 100644
--- a/drivers/block/aoe/aoe.h
+++ b/drivers/block/aoe/aoe.h
@@ -1,5 +1,5 @@
/* Copyright (c) 2007 Coraid, Inc. See COPYING for GPL terms. */
-#define VERSION "47"
+#define VERSION "47nobdi1"
#define AOE_MAJOR 152
#define DEVICE_NAME "aoe"
diff --git a/drivers/block/aoe/aoeblk.c b/drivers/block/aoe/aoeblk.c
index 321de7b..7eca463 100644
--- a/drivers/block/aoe/aoeblk.c
+++ b/drivers/block/aoe/aoeblk.c
@@ -276,8 +276,6 @@ aoeblk_gdalloc(void *vp)
goto err_mempool;
blk_queue_make_request(d->blkq, aoeblk_make_request);
d->blkq->backing_dev_info.name = "aoe";
- if (bdi_init(&d->blkq->backing_dev_info))
- goto err_blkq;
spin_lock_irqsave(&d->lock, flags);
gd->major = AOE_MAJOR;
gd->first_minor = d->sysminor * AOE_PARTITIONS;
@@ -298,9 +296,6 @@ aoeblk_gdalloc(void *vp)
aoedisk_add_sysfs(d);
return;
-err_blkq:
- blk_cleanup_queue(d->blkq);
- d->blkq = NULL;
err_mempool:
mempool_destroy(d->bufpool);
err_disk:
^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: Oops on aoe module removal
2013-01-03 19:28 ` Ed Cashin, Ed Cashin
@ 2013-01-03 19:45 ` Jens Axboe
2013-01-03 19:57 ` Ed Cashin
0 siblings, 1 reply; 16+ messages in thread
From: Jens Axboe @ 2013-01-03 19:45 UTC (permalink / raw)
To: Ed Cashin
Cc: Josh Boyer, mitko@banksoft-bg.com, linux-kernel@vger.kernel.org,
kernel-team@fedoraproject.org, Peter Zijlstra
On 2013-01-03 20:28, Ed Cashin wrote:
> Lines: 75
>
> On Thu, Jan 03, 2013 at 12:15:35PM -0600, Ed Cashin wrote:
> ...
>>>>>>> On Jan 3, 2013, at 8:25 AM, Josh Boyer wrote:
> ...
>>>>>>>> [699170.611997] aoe: AoE v47 initialised.
> ...
>>>>>>>> [699231.308319] WARNING: at lib/list_debug.c:62 __list_del_entry+0x82/0xd0()
>>>>>>>> [699231.312031] Hardware name: S5000VSA
>>>>>>>> [699231.315658] list_del corruption. next->prev should be ffff880009fa37e8, but was ffffffff81c79c00
>>>>>>>> [699231.319352] Modules linked in: aoe(-) ip6table_filter ip6_tables ebtable_nat ebtables lockd sunrpc bridge 8021q garp stp llc vfat fat binfmt_misc iTCO_wdt iTCO_vendor_support vhost_net lpc_ich radeon tun macvtap mfd_core serio_raw coretemp i2c_algo_bit ttm i5000_edac macvlan drm_kms_helper e1000e edac_core microcode i5k_amb shpchp i2c_i801 drm kvm_intel i2c_core kvm ioatdma dca raid1
>>>>>>>> [699231.336259] Pid: 8584, comm: modprobe Not tainted 3.6.11-1.fc17.x86_64 #1
>>>>>>>> [699231.340561] Call Trace:
>>>>>>>> [699231.344865] [<ffffffff8105c8ef>] warn_slowpath_common+0x7f/0xc0
>>>>>>>> [699231.349212] [<ffffffff8105c9e6>] warn_slowpath_fmt+0x46/0x50
>>>>>>>> [699231.353595] [<ffffffff812eee52>] __list_del_entry+0x82/0xd0
>>>>>>>> [699231.357954] [<ffffffff812eeeb1>] list_del+0x11/0x40
>>>>>>>> [699231.362319] [<ffffffff812f6458>] percpu_counter_destroy+0x28/0x50
>>>>>>>> [699231.366712] [<ffffffff8114c513>] bdi_destroy+0x43/0x140
>>>>>>>> [699231.371127] [<ffffffff812be20c>] blk_release_queue+0x8c/0xc0
>>>>>>>> [699231.375454] [<ffffffff812dc322>] kobject_cleanup+0x82/0x1b0
>>>>>>>> [699231.379675] [<ffffffff812dc1ab>] kobject_put+0x2b/0x60
>>>>>>>> [699231.383851] [<ffffffff812b80a5>] blk_put_queue+0x15/0x20
>>>>>>>> [699231.387899] [<ffffffff812bc659>] blk_cleanup_queue+0xc9/0xe0
>>>>>>>> [699231.391794] [<ffffffffa01f53f5>] aoedev_freedev+0x135/0x150 [aoe]
>>>>>>>> [699231.395668] [<ffffffffa01f59a5>] aoedev_exit+0x65/0x80 [aoe]
>>>>>>>> [699231.399493] [<ffffffffa01f5afe>] aoe_exit+0x2e/0x40 [aoe]
>>>>>>>> [699231.403273] [<ffffffff810bdefe>] sys_delete_module+0x16e/0x2d0
>>>>>>>> [699231.407119] [<ffffffff8161db56>] ? __schedule+0x3c6/0x7a0
>>>>>>>> [699231.411050] [<ffffffff8119054a>] ? sys_write+0x4a/0x90
>>>>>>>> [699231.415033] [<ffffffff81627329>] system_call_fastpath+0x16/0x1b
>>>>>>>> [699231.419117] ---[ end trace 9e1558af1964b569 ]---
>>>>>>>> [699231.423248] ------------[ cut here ]------------
>
> The blk_alloc_queue has already done a bdi_init, so do not bdi_init again in
> aoeblk_gdalloc.
>
> The patch below applies to v3.5.6, with its v47 aoe driver. On my system it
> eliminates the list_del corruption messages.
Since the patch doesn't apply to current -git, does the problem not
exist there?
--
Jens Axboe
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Oops on aoe module removal
2013-01-03 19:45 ` Jens Axboe
@ 2013-01-03 19:57 ` Ed Cashin
2013-01-03 20:50 ` Ed Cashin
0 siblings, 1 reply; 16+ messages in thread
From: Ed Cashin @ 2013-01-03 19:57 UTC (permalink / raw)
To: Jens Axboe
Cc: Josh Boyer, mitko@banksoft-bg.com, linux-kernel@vger.kernel.org,
kernel-team@fedoraproject.org, Peter Zijlstra
On Jan 3, 2013, at 2:45 PM, Jens Axboe wrote:
> On 2013-01-03 20:28, Ed Cashin wrote:
>> Lines: 75
>>
>> On Thu, Jan 03, 2013 at 12:15:35PM -0600, Ed Cashin wrote:
>> ...
>>>>>>>> On Jan 3, 2013, at 8:25 AM, Josh Boyer wrote:
>> ...
>>>>>>>>> [699170.611997] aoe: AoE v47 initialised.
>> ...
>>>>>>>>> [699231.308319] WARNING: at lib/list_debug.c:62 __list_del_entry+0x82/0xd0()
>>>>>>>>> [699231.312031] Hardware name: S5000VSA
>>>>>>>>> [699231.315658] list_del corruption. next->prev should be ffff880009fa37e8, but was ffffffff81c79c00
>>>>>>>>> [699231.319352] Modules linked in: aoe(-) ip6table_filter ip6_tables ebtable_nat ebtables lockd sunrpc bridge 8021q garp stp llc vfat fat binfmt_misc iTCO_wdt iTCO_vendor_support vhost_net lpc_ich radeon tun macvtap mfd_core serio_raw coretemp i2c_algo_bit ttm i5000_edac macvlan drm_kms_helper e1000e edac_core microcode i5k_amb shpchp i2c_i801 drm kvm_intel i2c_core kvm ioatdma dca raid1
>>>>>>>>> [699231.336259] Pid: 8584, comm: modprobe Not tainted 3.6.11-1.fc17.x86_64 #1
>>>>>>>>> [699231.340561] Call Trace:
>>>>>>>>> [699231.344865] [<ffffffff8105c8ef>] warn_slowpath_common+0x7f/0xc0
>>>>>>>>> [699231.349212] [<ffffffff8105c9e6>] warn_slowpath_fmt+0x46/0x50
>>>>>>>>> [699231.353595] [<ffffffff812eee52>] __list_del_entry+0x82/0xd0
>>>>>>>>> [699231.357954] [<ffffffff812eeeb1>] list_del+0x11/0x40
>>>>>>>>> [699231.362319] [<ffffffff812f6458>] percpu_counter_destroy+0x28/0x50
>>>>>>>>> [699231.366712] [<ffffffff8114c513>] bdi_destroy+0x43/0x140
>>>>>>>>> [699231.371127] [<ffffffff812be20c>] blk_release_queue+0x8c/0xc0
>>>>>>>>> [699231.375454] [<ffffffff812dc322>] kobject_cleanup+0x82/0x1b0
>>>>>>>>> [699231.379675] [<ffffffff812dc1ab>] kobject_put+0x2b/0x60
>>>>>>>>> [699231.383851] [<ffffffff812b80a5>] blk_put_queue+0x15/0x20
>>>>>>>>> [699231.387899] [<ffffffff812bc659>] blk_cleanup_queue+0xc9/0xe0
>>>>>>>>> [699231.391794] [<ffffffffa01f53f5>] aoedev_freedev+0x135/0x150 [aoe]
>>>>>>>>> [699231.395668] [<ffffffffa01f59a5>] aoedev_exit+0x65/0x80 [aoe]
>>>>>>>>> [699231.399493] [<ffffffffa01f5afe>] aoe_exit+0x2e/0x40 [aoe]
>>>>>>>>> [699231.403273] [<ffffffff810bdefe>] sys_delete_module+0x16e/0x2d0
>>>>>>>>> [699231.407119] [<ffffffff8161db56>] ? __schedule+0x3c6/0x7a0
>>>>>>>>> [699231.411050] [<ffffffff8119054a>] ? sys_write+0x4a/0x90
>>>>>>>>> [699231.415033] [<ffffffff81627329>] system_call_fastpath+0x16/0x1b
>>>>>>>>> [699231.419117] ---[ end trace 9e1558af1964b569 ]---
>>>>>>>>> [699231.423248] ------------[ cut here ]------------
>>
>> The blk_alloc_queue has already done a bdi_init, so do not bdi_init again in
>> aoeblk_gdalloc.
>>
>> The patch below applies to v3.5.6, with its v47 aoe driver. On my system it
>> eliminates the list_del corruption messages.
>
> Since the patch doesn't apply to current -git, does the problem not
> exist there?
The original post is about an older kernel with the v47 aoe driver. The current mainline has a v81 aoe driver, so the patch for v3.5.6 isn't expected to apply to the mainline.
I'm currently investigating the state of the mainline with relation to this issue.
--
Ed Cashin
ecashin@coraid.com
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Oops on aoe module removal
2013-01-03 19:57 ` Ed Cashin
@ 2013-01-03 20:50 ` Ed Cashin
2013-01-03 21:00 ` Josh Boyer
2013-01-03 21:20 ` Ed Cashin
0 siblings, 2 replies; 16+ messages in thread
From: Ed Cashin @ 2013-01-03 20:50 UTC (permalink / raw)
To: Jens Axboe
Cc: Josh Boyer, mitko@banksoft-bg.com, linux-kernel@vger.kernel.org,
kernel-team@fedoraproject.org, Peter Zijlstra
On Jan 3, 2013, at 2:57 PM, Ed Cashin wrote:
> On Jan 3, 2013, at 2:45 PM, Jens Axboe wrote:
>
>> On 2013-01-03 20:28, Ed Cashin wrote:
>>> Lines: 75
>>>
>>> On Thu, Jan 03, 2013 at 12:15:35PM -0600, Ed Cashin wrote:
>>> ...
>>>>>>>>> On Jan 3, 2013, at 8:25 AM, Josh Boyer wrote:
>>> ...
>>>>>>>>>> [699170.611997] aoe: AoE v47 initialised.
>>> ...
>>>>>>>>>> [699231.308319] WARNING: at lib/list_debug.c:62 __list_del_entry+0x82/0xd0()
>>>>>>>>>> [699231.312031] Hardware name: S5000VSA
>>>>>>>>>> [699231.315658] list_del corruption. next->prev should be ffff880009fa37e8, but was ffffffff81c79c00
>>>>>>>>>> [699231.319352] Modules linked in: aoe(-) ip6table_filter ip6_tables ebtable_nat ebtables lockd sunrpc bridge 8021q garp stp llc vfat fat binfmt_misc iTCO_wdt iTCO_vendor_support vhost_net lpc_ich radeon tun macvtap mfd_core serio_raw coretemp i2c_algo_bit ttm i5000_edac macvlan drm_kms_helper e1000e edac_core microcode i5k_amb shpchp i2c_i801 drm kvm_intel i2c_core kvm ioatdma dca raid1
>>>>>>>>>> [699231.336259] Pid: 8584, comm: modprobe Not tainted 3.6.11-1.fc17.x86_64 #1
>>>>>>>>>> [699231.340561] Call Trace:
>>>>>>>>>> [699231.344865] [<ffffffff8105c8ef>] warn_slowpath_common+0x7f/0xc0
>>>>>>>>>> [699231.349212] [<ffffffff8105c9e6>] warn_slowpath_fmt+0x46/0x50
>>>>>>>>>> [699231.353595] [<ffffffff812eee52>] __list_del_entry+0x82/0xd0
>>>>>>>>>> [699231.357954] [<ffffffff812eeeb1>] list_del+0x11/0x40
>>>>>>>>>> [699231.362319] [<ffffffff812f6458>] percpu_counter_destroy+0x28/0x50
>>>>>>>>>> [699231.366712] [<ffffffff8114c513>] bdi_destroy+0x43/0x140
>>>>>>>>>> [699231.371127] [<ffffffff812be20c>] blk_release_queue+0x8c/0xc0
>>>>>>>>>> [699231.375454] [<ffffffff812dc322>] kobject_cleanup+0x82/0x1b0
>>>>>>>>>> [699231.379675] [<ffffffff812dc1ab>] kobject_put+0x2b/0x60
>>>>>>>>>> [699231.383851] [<ffffffff812b80a5>] blk_put_queue+0x15/0x20
>>>>>>>>>> [699231.387899] [<ffffffff812bc659>] blk_cleanup_queue+0xc9/0xe0
>>>>>>>>>> [699231.391794] [<ffffffffa01f53f5>] aoedev_freedev+0x135/0x150 [aoe]
>>>>>>>>>> [699231.395668] [<ffffffffa01f59a5>] aoedev_exit+0x65/0x80 [aoe]
>>>>>>>>>> [699231.399493] [<ffffffffa01f5afe>] aoe_exit+0x2e/0x40 [aoe]
>>>>>>>>>> [699231.403273] [<ffffffff810bdefe>] sys_delete_module+0x16e/0x2d0
>>>>>>>>>> [699231.407119] [<ffffffff8161db56>] ? __schedule+0x3c6/0x7a0
>>>>>>>>>> [699231.411050] [<ffffffff8119054a>] ? sys_write+0x4a/0x90
>>>>>>>>>> [699231.415033] [<ffffffff81627329>] system_call_fastpath+0x16/0x1b
>>>>>>>>>> [699231.419117] ---[ end trace 9e1558af1964b569 ]---
>>>>>>>>>> [699231.423248] ------------[ cut here ]------------
>>>
>>> The blk_alloc_queue has already done a bdi_init, so do not bdi_init again in
>>> aoeblk_gdalloc.
>>>
>>> The patch below applies to v3.5.6, with its v47 aoe driver. On my system it
>>> eliminates the list_del corruption messages.
>>
>> Since the patch doesn't apply to current -git, does the problem not
>> exist there?
>
> The original post is about an older kernel with the v47 aoe driver. The current mainline has a v81 aoe driver, so the patch for v3.5.6 isn't expected to apply to the mainline.
>
> I'm currently investigating the state of the mainline with relation to this issue.
I don't see the extra call to bdi_init in aoe driver v81 in the mainline git tree, and
I don't see the symptoms under discussion, either, when performing the same
test steps.
Kernels with aoe v47 can use the patch I just posted.
I'm going to go through the stable kernels and check, but I believe kernels after
v47 but before commit 0a41409c5180 should apply the fix in 0a41409c5180:
commit 0a41409c518083133e79015092585d68915865be
Author: Ed Cashin <ecashin@coraid.com>
Date: Mon Dec 17 16:03:58 2012 -0800
aoe: remove vestigial request queue allocation
Josh, can you confirm that the patch I posted in this thread today works for
your customer?
--
Ed Cashin
ecashin@coraid.com
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Oops on aoe module removal
2013-01-03 20:50 ` Ed Cashin
@ 2013-01-03 21:00 ` Josh Boyer
2013-01-04 12:35 ` Josh Boyer
2013-01-03 21:20 ` Ed Cashin
1 sibling, 1 reply; 16+ messages in thread
From: Josh Boyer @ 2013-01-03 21:00 UTC (permalink / raw)
To: Ed Cashin
Cc: Jens Axboe, mitko@banksoft-bg.com, linux-kernel@vger.kernel.org,
kernel-team@fedoraproject.org, Peter Zijlstra
On Thu, Jan 03, 2013 at 02:50:46PM -0600, Ed Cashin wrote:
> >>> The blk_alloc_queue has already done a bdi_init, so do not bdi_init again in
> >>> aoeblk_gdalloc.
> >>>
> >>> The patch below applies to v3.5.6, with its v47 aoe driver. On my system it
> >>> eliminates the list_del corruption messages.
> >>
> >> Since the patch doesn't apply to current -git, does the problem not
> >> exist there?
> >
> > The original post is about an older kernel with the v47 aoe driver. The current mainline has a v81 aoe driver, so the patch for v3.5.6 isn't expected to apply to the mainline.
> >
> > I'm currently investigating the state of the mainline with relation to this issue.
>
> I don't see the extra call to bdi_init in aoe driver v81 in the mainline git tree, and
> I don't see the symptoms under discussion, either, when performing the same
> test steps.
>
> Kernels with aoe v47 can use the patch I just posted.
>
> I'm going to go through the stable kernels and check, but I believe kernels after
> v47 but before commit 0a41409c5180 should apply the fix in 0a41409c5180:
>
> commit 0a41409c518083133e79015092585d68915865be
> Author: Ed Cashin <ecashin@coraid.com>
> Date: Mon Dec 17 16:03:58 2012 -0800
>
> aoe: remove vestigial request queue allocation
>
> Josh, can you confirm that the patch I posted in this thread today works for
> your customer?
Sure. I'll get a test kernel built with that patch and ask them to
test.
josh
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Oops on aoe module removal
2013-01-03 20:50 ` Ed Cashin
2013-01-03 21:00 ` Josh Boyer
@ 2013-01-03 21:20 ` Ed Cashin
2013-01-13 5:34 ` Ben Hutchings
1 sibling, 1 reply; 16+ messages in thread
From: Ed Cashin @ 2013-01-03 21:20 UTC (permalink / raw)
To: stable@vger.kernel.org
Cc: Jens Axboe, Josh Boyer, mitko@banksoft-bg.com,
linux-kernel@vger.kernel.org, kernel-team@fedoraproject.org,
Peter Zijlstra
On Jan 3, 2013, at 3:50 PM, Ed Cashin wrote:
> On Jan 3, 2013, at 2:57 PM, Ed Cashin wrote:
>
>> On Jan 3, 2013, at 2:45 PM, Jens Axboe wrote:
>>
>>> On 2013-01-03 20:28, Ed Cashin wrote:
>>>> Lines: 75
>>>>
>>>> On Thu, Jan 03, 2013 at 12:15:35PM -0600, Ed Cashin wrote:
>>>> ...
>>>>>>>>>> On Jan 3, 2013, at 8:25 AM, Josh Boyer wrote:
>>>> ...
>>>>>>>>>>> [699170.611997] aoe: AoE v47 initialised.
>>>> ...
>>>>>>>>>>> [699231.308319] WARNING: at lib/list_debug.c:62 __list_del_entry+0x82/0xd0()
>>>>>>>>>>> [699231.312031] Hardware name: S5000VSA
>>>>>>>>>>> [699231.315658] list_del corruption. next->prev should be ffff880009fa37e8, but was ffffffff81c79c00
>>>>>>>>>>> [699231.319352] Modules linked in: aoe(-) ip6table_filter ip6_tables ebtable_nat ebtables lockd sunrpc bridge 8021q garp stp llc vfat fat binfmt_misc iTCO_wdt iTCO_vendor_support vhost_net lpc_ich radeon tun macvtap mfd_core serio_raw coretemp i2c_algo_bit ttm i5000_edac macvlan drm_kms_helper e1000e edac_core microcode i5k_amb shpchp i2c_i801 drm kvm_intel i2c_core kvm ioatdma dca raid1
>>>>>>>>>>> [699231.336259] Pid: 8584, comm: modprobe Not tainted 3.6.11-1.fc17.x86_64 #1
>>>>>>>>>>> [699231.340561] Call Trace:
>>>>>>>>>>> [699231.344865] [<ffffffff8105c8ef>] warn_slowpath_common+0x7f/0xc0
>>>>>>>>>>> [699231.349212] [<ffffffff8105c9e6>] warn_slowpath_fmt+0x46/0x50
>>>>>>>>>>> [699231.353595] [<ffffffff812eee52>] __list_del_entry+0x82/0xd0
>>>>>>>>>>> [699231.357954] [<ffffffff812eeeb1>] list_del+0x11/0x40
>>>>>>>>>>> [699231.362319] [<ffffffff812f6458>] percpu_counter_destroy+0x28/0x50
>>>>>>>>>>> [699231.366712] [<ffffffff8114c513>] bdi_destroy+0x43/0x140
>>>>>>>>>>> [699231.371127] [<ffffffff812be20c>] blk_release_queue+0x8c/0xc0
>>>>>>>>>>> [699231.375454] [<ffffffff812dc322>] kobject_cleanup+0x82/0x1b0
>>>>>>>>>>> [699231.379675] [<ffffffff812dc1ab>] kobject_put+0x2b/0x60
>>>>>>>>>>> [699231.383851] [<ffffffff812b80a5>] blk_put_queue+0x15/0x20
>>>>>>>>>>> [699231.387899] [<ffffffff812bc659>] blk_cleanup_queue+0xc9/0xe0
>>>>>>>>>>> [699231.391794] [<ffffffffa01f53f5>] aoedev_freedev+0x135/0x150 [aoe]
>>>>>>>>>>> [699231.395668] [<ffffffffa01f59a5>] aoedev_exit+0x65/0x80 [aoe]
>>>>>>>>>>> [699231.399493] [<ffffffffa01f5afe>] aoe_exit+0x2e/0x40 [aoe]
>>>>>>>>>>> [699231.403273] [<ffffffff810bdefe>] sys_delete_module+0x16e/0x2d0
>>>>>>>>>>> [699231.407119] [<ffffffff8161db56>] ? __schedule+0x3c6/0x7a0
>>>>>>>>>>> [699231.411050] [<ffffffff8119054a>] ? sys_write+0x4a/0x90
>>>>>>>>>>> [699231.415033] [<ffffffff81627329>] system_call_fastpath+0x16/0x1b
>>>>>>>>>>> [699231.419117] ---[ end trace 9e1558af1964b569 ]---
>>>>>>>>>>> [699231.423248] ------------[ cut here ]------------
>>>>
>>>> The blk_alloc_queue has already done a bdi_init, so do not bdi_init again in
>>>> aoeblk_gdalloc.
>>>>
>>>> The patch below applies to v3.5.6, with its v47 aoe driver. On my system it
>>>> eliminates the list_del corruption messages.
>>>
>>> Since the patch doesn't apply to current -git, does the problem not
>>> exist there?
>>
>> The original post is about an older kernel with the v47 aoe driver. The current mainline has a v81 aoe driver, so the patch for v3.5.6 isn't expected to apply to the mainline.
>>
>> I'm currently investigating the state of the mainline with relation to this issue.
>
> I don't see the extra call to bdi_init in aoe driver v81 in the mainline git tree, and
> I don't see the symptoms under discussion, either, when performing the same
> test steps.
>
> Kernels with aoe v47 can use the patch I just posted.
>
> I'm going to go through the stable kernels and check, but I believe kernels after
> v47 but before commit 0a41409c5180 should apply the fix in 0a41409c5180:
>
> commit 0a41409c518083133e79015092585d68915865be
> Author: Ed Cashin <ecashin@coraid.com>
> Date: Mon Dec 17 16:03:58 2012 -0800
>
> aoe: remove vestigial request queue allocation
...
OK. Looks like 3.7.1 has aoe v50, exhibits the behavior in question, and needs
the above-mentioned commit 0a41409c5180 from Linus' tree as a fix.
The 3.6.11 kernel has aoe v47, so it and earlier stable kernels can use the patch
I posted in this thread.
I'm sending this to stable@vger.kernel.org to ask Greg, Ben, and other stable
maintainers (where's the list of stable kernel maintainers?) ...
Should I send git-generated patches to stable@vger.kernel.org based on each
linux-stable/linux-3.x.y branch? There are only two cases, really: 3.7.y and others.
I suppose 3.7.y can cherry pick from the mainline.
What's the best way to do this?
--
Ed Cashin
ecashin@coraid.com
* Re: Oops on aoe module removal
2013-01-03 21:00 ` Josh Boyer
@ 2013-01-04 12:35 ` Josh Boyer
0 siblings, 0 replies; 16+ messages in thread
From: Josh Boyer @ 2013-01-04 12:35 UTC (permalink / raw)
To: Ed Cashin
Cc: Jens Axboe, mitko@banksoft-bg.com, linux-kernel@vger.kernel.org,
kernel-team@fedoraproject.org, Peter Zijlstra
On Thu, Jan 03, 2013 at 04:00:46PM -0500, Josh Boyer wrote:
> On Thu, Jan 03, 2013 at 02:50:46PM -0600, Ed Cashin wrote:
> > >>> The blk_alloc_queue has already done a bdi_init, so do not bdi_init again in
> > >>> aoeblk_gdalloc.
> > >>>
> > >>> The patch below applies to v3.5.6, with its v47 aoe driver. On my system it
> > >>> eliminates the list_del corruption messages.
> > >>
> > >> Since the patch doesn't apply to current -git, does the problem not
> > >> exist there?
> > >
> > > The original post is about an older kernel with the v47 aoe driver. The current mainline has a v81 aoe driver, so the patch for v3.5.6 isn't expected to apply to the mainline.
> > >
> > > I'm currently investigating the state of the mainline with relation to this issue.
> >
> > I don't see the extra call to bdi_init in aoe driver v81 in the mainline git tree, and
> > I don't see the symptoms under discussion, either, when performing the same
> > test steps.
> >
> > Kernels with aoe v47 can use the patch I just posted.
> >
> > I'm going to go through the stable kernels and check, but I believe kernels after
> > v47 but before commit 0a41409c5180 should apply the fix in 0a41409c5180:
> >
> > commit 0a41409c518083133e79015092585d68915865be
> > Author: Ed Cashin <ecashin@coraid.com>
> > Date: Mon Dec 17 16:03:58 2012 -0800
> >
> > aoe: remove vestigial request queue allocation
> >
> > Josh, can you confirm that the patch I posted in this thread today works for
> > your customer?
>
> Sure. I'll get a test kernel built with that patch and ask them to
> test.
Dimitar confirmed in the bug that the kernel I built with the patch no
longer oopses on rmmod. Thanks for the quick turnaround!
josh
* Re: Oops on aoe module removal
2013-01-03 21:20 ` Ed Cashin
@ 2013-01-13 5:34 ` Ben Hutchings
2013-01-13 14:23 ` Ed Cashin
0 siblings, 1 reply; 16+ messages in thread
From: Ben Hutchings @ 2013-01-13 5:34 UTC (permalink / raw)
To: Ed Cashin
Cc: stable@vger.kernel.org, Jens Axboe, Josh Boyer,
mitko@banksoft-bg.com, linux-kernel@vger.kernel.org,
kernel-team@fedoraproject.org, Peter Zijlstra
On Thu, 2013-01-03 at 15:20 -0600, Ed Cashin wrote:
> On Jan 3, 2013, at 3:50 PM, Ed Cashin wrote:
[...]
> > Kernels with aoe v47 can use the patch I just posted.
> >
> > I'm going to go through the stable kernels and check, but I believe kernels after
> > v47 but before commit 0a41409c5180 should apply the fix in 0a41409c5180:
> >
> > commit 0a41409c518083133e79015092585d68915865be
> > Author: Ed Cashin <ecashin@coraid.com>
> > Date: Mon Dec 17 16:03:58 2012 -0800
> >
> > aoe: remove vestigial request queue allocation
>
> ...
>
> OK. Looks like 3.7.1 has aoe v50, exhibits the behavior in question, and needs
> the above-mentioned commit 0a41409c5180 from Linus' tree as a fix.
>
> The 3.6.11 kernel has aoe v47, so it and earlier stable kernels can use the patch
> I posted in this thread.
>
> I'm sending this to stable@vger.kernel.org to ask Greg, Ben, and other stable
> maintainers (where's the list of stable kernel maintainers?) ...
There is Willy Tarreau looking after 2.6.32.y, Paul Gortmaker with
2.6.34.y and unofficially Herton Ronaldo Krzesinski with 3.5.7.y.
All stable maintainers should be reading stable@vger.kernel.org anyway.
> Should I send git-generated patches to stable@vger.kernel.org based on each
> linux-stable/linux-3.x.y branch? There are only two cases, really: 3.7.y and others.
> I suppose 3.7.y can cherry pick from the mainline.
>
> What's the best way to do this?
Given that aoe is basically unchanged between Linux 2.6.32 and 3.6 I
agree that the one patch should be sufficient. However your previous
patch was lacking a signoff or much of a commit message, so please do
re-post it with those. The commit message should include a reference to
the corresponding mainline commit as you identified above.
Ben.
--
Ben Hutchings
Klipstein's 4th Law of Prototyping and Production:
A fail-safe circuit will destroy others.
* Re: Oops on aoe module removal
2013-01-13 5:34 ` Ben Hutchings
@ 2013-01-13 14:23 ` Ed Cashin
0 siblings, 0 replies; 16+ messages in thread
From: Ed Cashin @ 2013-01-13 14:23 UTC (permalink / raw)
To: Ben Hutchings
Cc: stable@vger.kernel.org, Jens Axboe, Josh Boyer,
mitko@banksoft-bg.com, linux-kernel@vger.kernel.org,
kernel-team@fedoraproject.org, Peter Zijlstra, Ed Cashin
On Jan 13, 2013, at 12:34 AM, Ben Hutchings wrote:
...
[Nice info on stable maintainers, thanks.]
...
> Given that aoe is basically unchanged between Linux 2.6.32 and 3.6 I
> agree that the one patch should be sufficient. However your previous
> patch was lacking a signoff or much of a commit message, so please do
> re-post it with those. The commit message should include a reference to
> the corresponding mainline commit as you identified above.
OK. Got it:
[PATCH <= 3.6.y] aoe: do not call bdi_init after blk_alloc_queue
http://thread.gmane.org/gmane.linux.kernel.stable/38957
That particular change (for <= 3.6.y) is not derived from the mainline
commit you mention, but the other patch for 3.7.y is, and I included a
reference to the mainline commit in the changelog message for the
3.7.y fix.
[PATCH 3.7.y] aoe: merge 0a41409c5180 to avoid list corruption from extra bdi_init
http://thread.gmane.org/gmane.linux.kernel.stable/38909
--
Ed Cashin
ecashin@coraid.com
Thread overview: 16+ messages
2013-01-03 13:25 Oops on aoe module removal Josh Boyer
2013-01-03 14:02 ` Ed Cashin
2013-01-03 14:09 ` Jens Axboe
2013-01-03 14:12 ` Jens Axboe
2013-01-03 15:28 ` Ed Cashin
2013-01-03 15:34 ` Jens Axboe
2013-01-03 18:15 ` Ed Cashin
2013-01-03 19:28 ` Ed Cashin, Ed Cashin
2013-01-03 19:45 ` Jens Axboe
2013-01-03 19:57 ` Ed Cashin
2013-01-03 20:50 ` Ed Cashin
2013-01-03 21:00 ` Josh Boyer
2013-01-04 12:35 ` Josh Boyer
2013-01-03 21:20 ` Ed Cashin
2013-01-13 5:34 ` Ben Hutchings
2013-01-13 14:23 ` Ed Cashin