2.6.38-rc2+ tcm_mvsas kernel oops

public inbox for linux-scsi@vger.kernel.org
 help / color / mirror / Atom feed

* 2.6.38-rc2+ tcm_mvsas kernel oops
@ 2011-01-30 19:02 Fubo Chen
  2011-01-30 21:34 ` Nicholas A. Bellinger
  0 siblings, 1 reply; 8+ messages in thread
From: Fubo Chen @ 2011-01-30 19:02 UTC (permalink / raw)
  To: Nicholas A. Bellinger; +Cc: linux-scsi

Hello,

Today I did what I should have done before: try to load and unload
tcm_mvsas kernel module. Surprised to see that this triggered kernel
oops. Did I make stupid mistake ?

What I did:

# rm -rf drivers/target/tcm_mvsas
# cd Documentation/target
# { echo yes; echo yes; } | ./tcm_mod_builder.py -m tcm_mvsas -p SAS
# cd ../..
# echo m | make oldconfig
# make prepare
# make M=drivers/target/tcm_mvsas modules modules_install
# modprobe tcm_mvsas
# rmmod tcm_mvsas
# rmmod target_core_mod
Segmentation fault

>From console:

<<<<<<<<<<<<<<<<<<<<<< BEGIN FABRIC API >>>>>>>>>>>>>>>>>>>>>>
Initialized struct target_fabric_configfs: ffff880027680000 for mvsas
<<<<<<<<<<<<<<<<<<<<<< END FABRIC API >>>>>>>>>>>>>>>>>>>>>>
TCM_MVSAS[0] - Set fabric -> tcm_mvsas_fabric_configfs
<<<<<<<<<<<<<<<<<<<<<< BEGIN FABRIC API >>>>>>>>>>>>>>>>>>>>>>
Target_Core_ConfigFS: DEREGISTER -> Releasing tf: mvsas
<<<<<<<<<<<<<<<<<<<<<< END FABRIC API >>>>>>>>>>>>>>>>>>>>>>
TCM_MVSAS[0] - Cleared tcm_mvsas_fabric_configfs
general protection fault: 0000 [#1] SMP
last sysfs file:
/sys/devices/pci0000:00/0000:00:11.0/0000:02:03.0/usb1/1-0:1.0/uevent
CPU 0
Modules linked in: target_core_mod(-) configfs netconsole iscsi_tcp
libiscsi_tcp libiscsi scsi_transport_iscsi binfmt_misc psmouse
serio_raw shpchp i2c_piix4 mptspi mptscsih mptbase scsi_transport_spi
e1000 floppy [last unloaded: tcm_mvsas]

Pid: 2346, comm: rmmod Not tainted 2.6.38-rc2+
RIP: 0010:[<ffffffff810946a4>]  [<ffffffff810946a4>] __lock_acquire+0x64/0x1510
RSP: 0018:ffff8800275cdb18  EFLAGS: 00010046
RAX: 0000000000000046 RBX: 6b6b6b6b6b6b6be3 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
RBP: ffff8800275cdbe8 R08: 0000000000000001 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000002
R13: 0000000000000000 R14: 0000000000000000 R15: ffff88002cb4a350
FS:  00007f9238be4700(0000) GS:ffff88003d600000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00007f92386d1fc0 CR3: 0000000027560000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process rmmod (pid: 2346, threadinfo ffff8800275cc000, task ffff88002cb4a350)
Stack:
 0000000000000004 ffff88002cb4a350 ffffffff82033ee0 ffffffff81010dfd
 ffff8800275cdb68 ffffffff81ed1590 ffff8800275cdb68 0000000000000000
 32a19d8cf067a674 ffff88002cb4ab08 ffff8800275cdc48 0000000000000002
Call Trace:
 [<ffffffff81010dfd>] ? save_stack_trace+0x2d/0x50
 [<ffffffff81095bf0>] lock_acquire+0xa0/0x150
 [<ffffffffa0146f8f>] ? detach_groups+0x2f/0x120 [configfs]
 [<ffffffff81545a04>] ? __mutex_lock_common+0x2a4/0x3e0
 [<ffffffffa0147004>] ? detach_groups+0xa4/0x120 [configfs]
 [<ffffffff815472b6>] _raw_spin_lock+0x36/0x70
 [<ffffffffa0146f8f>] ? detach_groups+0x2f/0x120 [configfs]
 [<ffffffffa0146f8f>] detach_groups+0x2f/0x120 [configfs]
 [<ffffffffa0146f46>] configfs_detach_group+0x16/0x30 [configfs]
 [<ffffffffa0147012>] detach_groups+0xb2/0x120 [configfs]
 [<ffffffffa0146f46>] configfs_detach_group+0x16/0x30 [configfs]
 [<ffffffffa0147012>] detach_groups+0xb2/0x120 [configfs]
 [<ffffffffa0146f46>] configfs_detach_group+0x16/0x30 [configfs]
 [<ffffffffa0147012>] detach_groups+0xb2/0x120 [configfs]
 [<ffffffffa0146f46>] configfs_detach_group+0x16/0x30 [configfs]
 [<ffffffffa0147012>] detach_groups+0xb2/0x120 [configfs]
 [<ffffffffa0146f46>] configfs_detach_group+0x16/0x30 [configfs]
 [<ffffffffa0147122>] configfs_unregister_subsystem+0xa2/0x130 [configfs]
 [<ffffffffa014fc84>] target_core_exit_configfs+0x184/0x1c0 [target_core_mod]
 [<ffffffff810a0a32>] sys_delete_module+0x1a2/0x280
 [<ffffffff81547019>] ? trace_hardirqs_on_thunk+0x3a/0x3f
 [<ffffffff81002f82>] system_call_fastpath+0x16/0x1b
Code: 8b 05 a1 64 9a 00 4c 89 75 f0 48 89 fb 41 89 d5 4c 8b 55 10 45
85 c0 0f 84 4a 04 00 00 8b 3d 08 96 cd 00 85 ff 0f 84 5c 04 00 00 <48>
81 3b 20 15 dd 81 b8 01 00 00 00 44 0f 44 e0 83 fe 01 0f 86
RIP  [<ffffffff810946a4>] __lock_acquire+0x64/0x1510
 RSP <ffff8800275cdb18>
---[ end trace f4ddfaa61a61623b ]---


Thanks for all help.

Fubo.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: 2.6.38-rc2+ tcm_mvsas kernel oops
  2011-01-30 19:02 2.6.38-rc2+ tcm_mvsas kernel oops Fubo Chen
@ 2011-01-30 21:34 ` Nicholas A. Bellinger
  2011-01-31 17:21   ` Fubo Chen
  0 siblings, 1 reply; 8+ messages in thread
From: Nicholas A. Bellinger @ 2011-01-30 21:34 UTC (permalink / raw)
  To: Fubo Chen; +Cc: linux-scsi

On Sun, 2011-01-30 at 20:02 +0100, Fubo Chen wrote:
> Hello,
> 
> Today I did what I should have done before: try to load and unload
> tcm_mvsas kernel module. Surprised to see that this triggered kernel
> oops. Did I make stupid mistake ?
> 

Hi Fubo,

> What I did:
> 
> # rm -rf drivers/target/tcm_mvsas
> # cd Documentation/target
> # { echo yes; echo yes; } | ./tcm_mod_builder.py -m tcm_mvsas -p SAS
> # cd ../..
> # echo m | make oldconfig
> # make prepare

FYI, you do not need to be calling make oldconfig + prepare each time to
rebuild a single fabric module like tcm_mvsas.ko

> # make M=drivers/target/tcm_mvsas modules modules_install
> # modprobe tcm_mvsas

What happened to 'modprobe target_core_mod' before loading tcm_mvsas..?

Typically if you are running 'make oldconfig' and change your .config,
you need to be running a matched set of modules, and not something that
was potentially built from a different .config.

> # rmmod tcm_mvsas
> # rmmod target_core_mod
> Segmentation fault
> 
> >From console:
> 
> <<<<<<<<<<<<<<<<<<<<<< BEGIN FABRIC API >>>>>>>>>>>>>>>>>>>>>>
> Initialized struct target_fabric_configfs: ffff880027680000 for mvsas
> <<<<<<<<<<<<<<<<<<<<<< END FABRIC API >>>>>>>>>>>>>>>>>>>>>>
> TCM_MVSAS[0] - Set fabric -> tcm_mvsas_fabric_configfs
> <<<<<<<<<<<<<<<<<<<<<< BEGIN FABRIC API >>>>>>>>>>>>>>>>>>>>>>
> Target_Core_ConfigFS: DEREGISTER -> Releasing tf: mvsas
> <<<<<<<<<<<<<<<<<<<<<< END FABRIC API >>>>>>>>>>>>>>>>>>>>>>
> TCM_MVSAS[0] - Cleared tcm_mvsas_fabric_configfs
> general protection fault: 0000 [#1] SMP
> last sysfs file:
> /sys/devices/pci0000:00/0000:00:11.0/0000:02:03.0/usb1/1-0:1.0/uevent
> CPU 0
> Modules linked in: target_core_mod(-) configfs netconsole iscsi_tcp
> libiscsi_tcp libiscsi scsi_transport_iscsi binfmt_misc psmouse
> serio_raw shpchp i2c_piix4 mptspi mptscsih mptbase scsi_transport_spi
> e1000 floppy [last unloaded: tcm_mvsas]
> 
> Pid: 2346, comm: rmmod Not tainted 2.6.38-rc2+
> RIP: 0010:[<ffffffff810946a4>]  [<ffffffff810946a4>] __lock_acquire+0x64/0x1510
> RSP: 0018:ffff8800275cdb18  EFLAGS: 00010046
> RAX: 0000000000000046 RBX: 6b6b6b6b6b6b6be3 RCX: 0000000000000000
> RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
> RBP: ffff8800275cdbe8 R08: 0000000000000001 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000002
> R13: 0000000000000000 R14: 0000000000000000 R15: ffff88002cb4a350
> FS:  00007f9238be4700(0000) GS:ffff88003d600000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> CR2: 00007f92386d1fc0 CR3: 0000000027560000 CR4: 00000000000006f0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Process rmmod (pid: 2346, threadinfo ffff8800275cc000, task ffff88002cb4a350)
> Stack:
>  0000000000000004 ffff88002cb4a350 ffffffff82033ee0 ffffffff81010dfd
>  ffff8800275cdb68 ffffffff81ed1590 ffff8800275cdb68 0000000000000000
>  32a19d8cf067a674 ffff88002cb4ab08 ffff8800275cdc48 0000000000000002
> Call Trace:
>  [<ffffffff81010dfd>] ? save_stack_trace+0x2d/0x50
>  [<ffffffff81095bf0>] lock_acquire+0xa0/0x150
>  [<ffffffffa0146f8f>] ? detach_groups+0x2f/0x120 [configfs]
>  [<ffffffff81545a04>] ? __mutex_lock_common+0x2a4/0x3e0
>  [<ffffffffa0147004>] ? detach_groups+0xa4/0x120 [configfs]
>  [<ffffffff815472b6>] _raw_spin_lock+0x36/0x70
>  [<ffffffffa0146f8f>] ? detach_groups+0x2f/0x120 [configfs]
>  [<ffffffffa0146f8f>] detach_groups+0x2f/0x120 [configfs]
>  [<ffffffffa0146f46>] configfs_detach_group+0x16/0x30 [configfs]
>  [<ffffffffa0147012>] detach_groups+0xb2/0x120 [configfs]
>  [<ffffffffa0146f46>] configfs_detach_group+0x16/0x30 [configfs]
>  [<ffffffffa0147012>] detach_groups+0xb2/0x120 [configfs]
>  [<ffffffffa0146f46>] configfs_detach_group+0x16/0x30 [configfs]
>  [<ffffffffa0147012>] detach_groups+0xb2/0x120 [configfs]
>  [<ffffffffa0146f46>] configfs_detach_group+0x16/0x30 [configfs]
>  [<ffffffffa0147012>] detach_groups+0xb2/0x120 [configfs]
>  [<ffffffffa0146f46>] configfs_detach_group+0x16/0x30 [configfs]
>  [<ffffffffa0147122>] configfs_unregister_subsystem+0xa2/0x130 [configfs]
>  [<ffffffffa014fc84>] target_core_exit_configfs+0x184/0x1c0 [target_core_mod]
>  [<ffffffff810a0a32>] sys_delete_module+0x1a2/0x280
>  [<ffffffff81547019>] ? trace_hardirqs_on_thunk+0x3a/0x3f
>  [<ffffffff81002f82>] system_call_fastpath+0x16/0x1b
> Code: 8b 05 a1 64 9a 00 4c 89 75 f0 48 89 fb 41 89 d5 4c 8b 55 10 45
> 85 c0 0f 84 4a 04 00 00 8b 3d 08 96 cd 00 85 ff 0f 84 5c 04 00 00 <48>
> 81 3b 20 15 dd 81 b8 01 00 00 00 44 0f 44 e0 83 fe 01 0f 86
> RIP  [<ffffffff810946a4>] __lock_acquire+0x64/0x1510
>  RSP <ffff8800275cdb18>
> ---[ end trace f4ddfaa61a61623b ]---
> 
> 

Ok, just to verify.  I have tried a couple varitions of the following
after generating a fresh 'tcm_mvsas' fabric skeleton on
lio-core-2.6.git/linus-38-rc2:

while [ 1 ]; do modprobe target_core_mod ; sleep 1 ; modprobe
tcm_mvsas ; rmmod tcm_mvsas ; rmmod target_core_mod; done

and nothing out of the ordinary appers with .38-rc2 target code on
x86_64 VM while this runs so far..

Did something change in your .config between the running target_core_mod
and newly built tcm_mvsas.ko that could cause a GFP like this..?

Please verify your 'rmmod tcm_mvsas' test with a single set of .config
options and rebuild + reboot with:

	make clean ; make bzImage ; make modules ; make modules_install ; make install

Thanks,

--nab


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: 2.6.38-rc2+ tcm_mvsas kernel oops
  2011-01-30 21:34 ` Nicholas A. Bellinger
@ 2011-01-31 17:21   ` Fubo Chen
  2011-01-31 20:55     ` Nicholas A. Bellinger
  0 siblings, 1 reply; 8+ messages in thread
From: Fubo Chen @ 2011-01-31 17:21 UTC (permalink / raw)
  To: Nicholas A. Bellinger; +Cc: linux-scsi

On Sun, Jan 30, 2011 at 10:34 PM, Nicholas A. Bellinger
<nab@linux-iscsi.org> wrote:
> On Sun, 2011-01-30 at 20:02 +0100, Fubo Chen wrote:
>> [ ... ]
>> # make M=drivers/target/tcm_mvsas modules modules_install
>> # modprobe tcm_mvsas
>
> What happened to 'modprobe target_core_mod' before loading tcm_mvsas..?

'modprobe tcm_mvsas' loads target_core_mod automatically as far as i know ?

> Typically if you are running 'make oldconfig' and change your .config,
> you need to be running a matched set of modules, and not something that
> was potentially built from a different .config.
>
>> # rmmod tcm_mvsas
>> # rmmod target_core_mod
>> Segmentation fault
>>
>> >From console:
>>
>> <<<<<<<<<<<<<<<<<<<<<< BEGIN FABRIC API >>>>>>>>>>>>>>>>>>>>>>
>> Initialized struct target_fabric_configfs: ffff880027680000 for mvsas
>> <<<<<<<<<<<<<<<<<<<<<< END FABRIC API >>>>>>>>>>>>>>>>>>>>>>
>> TCM_MVSAS[0] - Set fabric -> tcm_mvsas_fabric_configfs
>> <<<<<<<<<<<<<<<<<<<<<< BEGIN FABRIC API >>>>>>>>>>>>>>>>>>>>>>
>> Target_Core_ConfigFS: DEREGISTER -> Releasing tf: mvsas
>> <<<<<<<<<<<<<<<<<<<<<< END FABRIC API >>>>>>>>>>>>>>>>>>>>>>
>> TCM_MVSAS[0] - Cleared tcm_mvsas_fabric_configfs
>> general protection fault: 0000 [#1] SMP
>> last sysfs file:
>> /sys/devices/pci0000:00/0000:00:11.0/0000:02:03.0/usb1/1-0:1.0/uevent
>> CPU 0
>> Modules linked in: target_core_mod(-) configfs netconsole iscsi_tcp
>> libiscsi_tcp libiscsi scsi_transport_iscsi binfmt_misc psmouse
>> serio_raw shpchp i2c_piix4 mptspi mptscsih mptbase scsi_transport_spi
>> e1000 floppy [last unloaded: tcm_mvsas]
>>
>> Pid: 2346, comm: rmmod Not tainted 2.6.38-rc2+
>> RIP: 0010:[<ffffffff810946a4>]  [<ffffffff810946a4>] __lock_acquire+0x64/0x1510
>> RSP: 0018:ffff8800275cdb18  EFLAGS: 00010046
>> RAX: 0000000000000046 RBX: 6b6b6b6b6b6b6be3 RCX: 0000000000000000
>> RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
>> RBP: ffff8800275cdbe8 R08: 0000000000000001 R09: 0000000000000000
>> R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000002
>> R13: 0000000000000000 R14: 0000000000000000 R15: ffff88002cb4a350
>> FS:  00007f9238be4700(0000) GS:ffff88003d600000(0000) knlGS:0000000000000000
>> CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
>> CR2: 00007f92386d1fc0 CR3: 0000000027560000 CR4: 00000000000006f0
>> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
>> Process rmmod (pid: 2346, threadinfo ffff8800275cc000, task ffff88002cb4a350)
>> Stack:
>>  0000000000000004 ffff88002cb4a350 ffffffff82033ee0 ffffffff81010dfd
>>  ffff8800275cdb68 ffffffff81ed1590 ffff8800275cdb68 0000000000000000
>>  32a19d8cf067a674 ffff88002cb4ab08 ffff8800275cdc48 0000000000000002
>> Call Trace:
>>  [<ffffffff81010dfd>] ? save_stack_trace+0x2d/0x50
>>  [<ffffffff81095bf0>] lock_acquire+0xa0/0x150
>>  [<ffffffffa0146f8f>] ? detach_groups+0x2f/0x120 [configfs]
>>  [<ffffffff81545a04>] ? __mutex_lock_common+0x2a4/0x3e0
>>  [<ffffffffa0147004>] ? detach_groups+0xa4/0x120 [configfs]
>>  [<ffffffff815472b6>] _raw_spin_lock+0x36/0x70
>>  [<ffffffffa0146f8f>] ? detach_groups+0x2f/0x120 [configfs]
>>  [<ffffffffa0146f8f>] detach_groups+0x2f/0x120 [configfs]
>>  [<ffffffffa0146f46>] configfs_detach_group+0x16/0x30 [configfs]
>>  [<ffffffffa0147012>] detach_groups+0xb2/0x120 [configfs]
>>  [<ffffffffa0146f46>] configfs_detach_group+0x16/0x30 [configfs]
>>  [<ffffffffa0147012>] detach_groups+0xb2/0x120 [configfs]
>>  [<ffffffffa0146f46>] configfs_detach_group+0x16/0x30 [configfs]
>>  [<ffffffffa0147012>] detach_groups+0xb2/0x120 [configfs]
>>  [<ffffffffa0146f46>] configfs_detach_group+0x16/0x30 [configfs]
>>  [<ffffffffa0147012>] detach_groups+0xb2/0x120 [configfs]
>>  [<ffffffffa0146f46>] configfs_detach_group+0x16/0x30 [configfs]
>>  [<ffffffffa0147122>] configfs_unregister_subsystem+0xa2/0x130 [configfs]
>>  [<ffffffffa014fc84>] target_core_exit_configfs+0x184/0x1c0 [target_core_mod]
>>  [<ffffffff810a0a32>] sys_delete_module+0x1a2/0x280
>>  [<ffffffff81547019>] ? trace_hardirqs_on_thunk+0x3a/0x3f
>>  [<ffffffff81002f82>] system_call_fastpath+0x16/0x1b
>> Code: 8b 05 a1 64 9a 00 4c 89 75 f0 48 89 fb 41 89 d5 4c 8b 55 10 45
>> 85 c0 0f 84 4a 04 00 00 8b 3d 08 96 cd 00 85 ff 0f 84 5c 04 00 00 <48>
>> 81 3b 20 15 dd 81 b8 01 00 00 00 44 0f 44 e0 83 fe 01 0f 86
>> RIP  [<ffffffff810946a4>] __lock_acquire+0x64/0x1510
>>  RSP <ffff8800275cdb18>
>> ---[ end trace f4ddfaa61a61623b ]---
>>
>>
>
> Ok, just to verify.  I have tried a couple varitions of the following
> after generating a fresh 'tcm_mvsas' fabric skeleton on
> lio-core-2.6.git/linus-38-rc2:
>
> while [ 1 ]; do modprobe target_core_mod ; sleep 1 ; modprobe
> tcm_mvsas ; rmmod tcm_mvsas ; rmmod target_core_mod; done
>
> and nothing out of the ordinary appers with .38-rc2 target code on
> x86_64 VM while this runs so far..
>
> Did something change in your .config between the running target_core_mod
> and newly built tcm_mvsas.ko that could cause a GFP like this..?
>
> Please verify your 'rmmod tcm_mvsas' test with a single set of .config
> options and rebuild + reboot with:
>
>        make clean ; make bzImage ; make modules ; make modules_install ; make install

Thanks for hint. I have rebuilt kernel but unfortunately crash still
occurs. Maybe it's because I have enabled SLUB poisoning ?

Fubo.
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: 2.6.38-rc2+ tcm_mvsas kernel oops
  2011-01-31 17:21   ` Fubo Chen
@ 2011-01-31 20:55     ` Nicholas A. Bellinger
  2011-02-01 17:55       ` Fubo Chen
  0 siblings, 1 reply; 8+ messages in thread
From: Nicholas A. Bellinger @ 2011-01-31 20:55 UTC (permalink / raw)
  To: Fubo Chen; +Cc: linux-scsi

On Mon, 2011-01-31 at 18:21 +0100, Fubo Chen wrote:
> On Sun, Jan 30, 2011 at 10:34 PM, Nicholas A. Bellinger
> <nab@linux-iscsi.org> wrote:
> > On Sun, 2011-01-30 at 20:02 +0100, Fubo Chen wrote:
> >> [ ... ]
> >> # make M=drivers/target/tcm_mvsas modules modules_install
> >> # modprobe tcm_mvsas
> >
> > What happened to 'modprobe target_core_mod' before loading tcm_mvsas..?
> 
> 'modprobe tcm_mvsas' loads target_core_mod automatically as far as i know ?
> 

Hmmm, yes..  Typically after target_core_mod is initial loaded, and then
doing a:

	mkdir -p /sys/kernel/config/target/$FABRIC_MOD

will call request_module() based on known module names in
target_core_configfs.c:target_core_register_fabric() to autoload the
fabric module.

But since tcm_mvsas does not have an entry there yet, this AFAIK should
not many any difference.

> > Typically if you are running 'make oldconfig' and change your .config,
> > you need to be running a matched set of modules, and not something that
> > was potentially built from a different .config.
> >
> >> # rmmod tcm_mvsas
> >> # rmmod target_core_mod
> >> Segmentation fault
> >>
> >> >From console:
> >>
> >> <<<<<<<<<<<<<<<<<<<<<< BEGIN FABRIC API >>>>>>>>>>>>>>>>>>>>>>
> >> Initialized struct target_fabric_configfs: ffff880027680000 for mvsas
> >> <<<<<<<<<<<<<<<<<<<<<< END FABRIC API >>>>>>>>>>>>>>>>>>>>>>
> >> TCM_MVSAS[0] - Set fabric -> tcm_mvsas_fabric_configfs
> >> <<<<<<<<<<<<<<<<<<<<<< BEGIN FABRIC API >>>>>>>>>>>>>>>>>>>>>>
> >> Target_Core_ConfigFS: DEREGISTER -> Releasing tf: mvsas
> >> <<<<<<<<<<<<<<<<<<<<<< END FABRIC API >>>>>>>>>>>>>>>>>>>>>>
> >> TCM_MVSAS[0] - Cleared tcm_mvsas_fabric_configfs
> >> general protection fault: 0000 [#1] SMP
> >> last sysfs file:
> >> /sys/devices/pci0000:00/0000:00:11.0/0000:02:03.0/usb1/1-0:1.0/uevent
> >> CPU 0
> >> Modules linked in: target_core_mod(-) configfs netconsole iscsi_tcp
> >> libiscsi_tcp libiscsi scsi_transport_iscsi binfmt_misc psmouse
> >> serio_raw shpchp i2c_piix4 mptspi mptscsih mptbase scsi_transport_spi
> >> e1000 floppy [last unloaded: tcm_mvsas]
> >>
> >> Pid: 2346, comm: rmmod Not tainted 2.6.38-rc2+
> >> RIP: 0010:[<ffffffff810946a4>]  [<ffffffff810946a4>] __lock_acquire+0x64/0x1510
> >> RSP: 0018:ffff8800275cdb18  EFLAGS: 00010046
> >> RAX: 0000000000000046 RBX: 6b6b6b6b6b6b6be3 RCX: 0000000000000000
> >> RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
> >> RBP: ffff8800275cdbe8 R08: 0000000000000001 R09: 0000000000000000
> >> R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000002
> >> R13: 0000000000000000 R14: 0000000000000000 R15: ffff88002cb4a350
> >> FS:  00007f9238be4700(0000) GS:ffff88003d600000(0000) knlGS:0000000000000000
> >> CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> >> CR2: 00007f92386d1fc0 CR3: 0000000027560000 CR4: 00000000000006f0
> >> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> >> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> >> Process rmmod (pid: 2346, threadinfo ffff8800275cc000, task ffff88002cb4a350)
> >> Stack:
> >>  0000000000000004 ffff88002cb4a350 ffffffff82033ee0 ffffffff81010dfd
> >>  ffff8800275cdb68 ffffffff81ed1590 ffff8800275cdb68 0000000000000000
> >>  32a19d8cf067a674 ffff88002cb4ab08 ffff8800275cdc48 0000000000000002
> >> Call Trace:
> >>  [<ffffffff81010dfd>] ? save_stack_trace+0x2d/0x50
> >>  [<ffffffff81095bf0>] lock_acquire+0xa0/0x150
> >>  [<ffffffffa0146f8f>] ? detach_groups+0x2f/0x120 [configfs]
> >>  [<ffffffff81545a04>] ? __mutex_lock_common+0x2a4/0x3e0
> >>  [<ffffffffa0147004>] ? detach_groups+0xa4/0x120 [configfs]
> >>  [<ffffffff815472b6>] _raw_spin_lock+0x36/0x70
> >>  [<ffffffffa0146f8f>] ? detach_groups+0x2f/0x120 [configfs]
> >>  [<ffffffffa0146f8f>] detach_groups+0x2f/0x120 [configfs]
> >>  [<ffffffffa0146f46>] configfs_detach_group+0x16/0x30 [configfs]
> >>  [<ffffffffa0147012>] detach_groups+0xb2/0x120 [configfs]
> >>  [<ffffffffa0146f46>] configfs_detach_group+0x16/0x30 [configfs]
> >>  [<ffffffffa0147012>] detach_groups+0xb2/0x120 [configfs]
> >>  [<ffffffffa0146f46>] configfs_detach_group+0x16/0x30 [configfs]
> >>  [<ffffffffa0147012>] detach_groups+0xb2/0x120 [configfs]
> >>  [<ffffffffa0146f46>] configfs_detach_group+0x16/0x30 [configfs]
> >>  [<ffffffffa0147012>] detach_groups+0xb2/0x120 [configfs]
> >>  [<ffffffffa0146f46>] configfs_detach_group+0x16/0x30 [configfs]
> >>  [<ffffffffa0147122>] configfs_unregister_subsystem+0xa2/0x130 [configfs]
> >>  [<ffffffffa014fc84>] target_core_exit_configfs+0x184/0x1c0 [target_core_mod]
> >>  [<ffffffff810a0a32>] sys_delete_module+0x1a2/0x280
> >>  [<ffffffff81547019>] ? trace_hardirqs_on_thunk+0x3a/0x3f
> >>  [<ffffffff81002f82>] system_call_fastpath+0x16/0x1b
> >> Code: 8b 05 a1 64 9a 00 4c 89 75 f0 48 89 fb 41 89 d5 4c 8b 55 10 45
> >> 85 c0 0f 84 4a 04 00 00 8b 3d 08 96 cd 00 85 ff 0f 84 5c 04 00 00 <48>
> >> 81 3b 20 15 dd 81 b8 01 00 00 00 44 0f 44 e0 83 fe 01 0f 86
> >> RIP  [<ffffffff810946a4>] __lock_acquire+0x64/0x1510
> >>  RSP <ffff8800275cdb18>
> >> ---[ end trace f4ddfaa61a61623b ]---
> >>
> >>
> >
> > Ok, just to verify.  I have tried a couple varitions of the following
> > after generating a fresh 'tcm_mvsas' fabric skeleton on
> > lio-core-2.6.git/linus-38-rc2:
> >
> > while [ 1 ]; do modprobe target_core_mod ; sleep 1 ; modprobe
> > tcm_mvsas ; rmmod tcm_mvsas ; rmmod target_core_mod; done
> >
> > and nothing out of the ordinary appers with .38-rc2 target code on
> > x86_64 VM while this runs so far..
> >
> > Did something change in your .config between the running target_core_mod
> > and newly built tcm_mvsas.ko that could cause a GFP like this..?
> >
> > Please verify your 'rmmod tcm_mvsas' test with a single set of .config
> > options and rebuild + reboot with:
> >
> >        make clean ; make bzImage ; make modules ; make modules_install ; make install
> 
> Thanks for hint. I have rebuilt kernel but unfortunately crash still
> occurs. Maybe it's because I have enabled SLUB poisoning ?
> 

Hmmm, I don't see how this would make a difference, and FYI the above
test loops for 'rmmod tcm_mvsas' where running with slub_debug=FZ w/o
issue.

Well, if you are certain things are working fine on .37-FINAL, you can
try using 'git bisect' from a known working LIO .37 commit and build
+test until you locate an offending commit.

But again, this appears to be working in lio-core-2.6.git/linus-38-rc2,
please verify this is what is being tested..?

--nab



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: 2.6.38-rc2+ tcm_mvsas kernel oops
  2011-01-31 20:55     ` Nicholas A. Bellinger
@ 2011-02-01 17:55       ` Fubo Chen
  2011-02-02  3:01         ` Nicholas A. Bellinger
  0 siblings, 1 reply; 8+ messages in thread
From: Fubo Chen @ 2011-02-01 17:55 UTC (permalink / raw)
  To: Nicholas A. Bellinger; +Cc: linux-scsi

On Mon, Jan 31, 2011 at 9:55 PM, Nicholas A. Bellinger
<nab@linux-iscsi.org> wrote:
> [ ... ]
>
> Hmmm, I don't see how this would make a difference, and FYI the above
> test loops for 'rmmod tcm_mvsas' where running with slub_debug=FZ w/o
> issue.
>
> Well, if you are certain things are working fine on .37-FINAL, you can
> try using 'git bisect' from a known working LIO .37 commit and build
> +test until you locate an offending commit.
>
> But again, this appears to be working in lio-core-2.6.git/linus-38-rc2,
> please verify this is what is being tested..?

Thanks for looking at this. This is what I get with v2.6.38-rc2,
tcm_mvsas and slub poisoning:

# cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinuz-2.6.38-rc2
root=UUID=c2d91556-8ed3-4a2a-95d9-50d0203bcfcc ro quiet splash
slub_debug=FPUZ
# modprobe tcm_mvsas
# rmmod tcm_mvsas
# rmmod target_core_mod
Segmentation fault

and on the console:

<<<<<<<<<<<<<<<<<<<<<< BEGIN FABRIC API >>>>>>>>>>>>>>>>>>>>>>
Initialized struct target_fabric_configfs: ffff880025e09090 for mvsas
<<<<<<<<<<<<<<<<<<<<<< END FABRIC API >>>>>>>>>>>>>>>>>>>>>>
TCM_MVSAS[0] - Set fabric -> tcm_mvsas_fabric_configfs
<<<<<<<<<<<<<<<<<<<<<< BEGIN FABRIC API >>>>>>>>>>>>>>>>>>>>>>
Target_Core_ConfigFS: DEREGISTER -> Releasing tf: mvsas
<<<<<<<<<<<<<<<<<<<<<< END FABRIC API >>>>>>>>>>>>>>>>>>>>>>
TCM_MVSAS[0] - Cleared tcm_mvsas_fabric_configfs
general protection fault: 0000 [#1] SMP
last sysfs file:
/sys/devices/pci0000:00/0000:00:11.0/0000:02:03.0/usb1/1-0:1.0/uevent
CPU 0
Modules linked in: target_core_mod(-) configfs netconsole iscsi_tcp
libiscsi_tcp libiscsi scsi_transport_iscsi binfmt_misc psmouse
serio_raw i2c_piix4 shpchp mptspi mptscsih e1000 mptbase
scsi_transport_spi floppy [last unloaded: tcm_mvsas]

Pid: 1432, comm: rmmod Not tainted 2.6.38-rc2 #4 440BX Desktop
Reference Platform/VMware Virtual Platform
RIP: 0010:[<ffffffff81094684>]  [<ffffffff81094684>] __lock_acquire+0x64/0x1510
RSP: 0018:ffff880022697b18  EFLAGS: 00010046
RAX: 0000000000000046 RBX: 6b6b6b6b6b6b6be3 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
RBP: ffff880022697be8 R08: 0000000000000001 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000002
R13: 0000000000000000 R14: 0000000000000000 R15: ffff88002a1a2350
FS:  00007f844069c700(0000) GS:ffff88003d600000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00007f8440189fc0 CR3: 0000000025d67000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process rmmod (pid: 1432, threadinfo ffff880022696000, task ffff88002a1a2350)
Stack:
 0000000000000004 ffff88002a1a2350 ffffffff82030820 ffffffff81010dfd
 ffff880022697b68 ffffffff81ed0590 ffff880022697b68 0000000000000000
 3161938ca065261c ffff88002a1a2b08 ffff880022697c48 0000000000000002
Call Trace:
 [<ffffffff81010dfd>] ? save_stack_trace+0x2d/0x50
 [<ffffffff81095bd0>] lock_acquire+0xa0/0x150
 [<ffffffffa00e5f7f>] ? detach_groups+0x2f/0x120 [configfs]
 [<ffffffff81540f44>] ? __mutex_lock_common+0x2a4/0x3e0
 [<ffffffffa00e5ff4>] ? detach_groups+0xa4/0x120 [configfs]
 [<ffffffff815427f6>] _raw_spin_lock+0x36/0x70
 [<ffffffffa00e5f7f>] ? detach_groups+0x2f/0x120 [configfs]
 [<ffffffffa00e5f7f>] detach_groups+0x2f/0x120 [configfs]
 [<ffffffffa00e5f36>] configfs_detach_group+0x16/0x30 [configfs]
 [<ffffffffa00e6002>] detach_groups+0xb2/0x120 [configfs]
 [<ffffffffa00e5f36>] configfs_detach_group+0x16/0x30 [configfs]
 [<ffffffffa00e6002>] detach_groups+0xb2/0x120 [configfs]
 [<ffffffffa00e5f36>] configfs_detach_group+0x16/0x30 [configfs]
 [<ffffffffa00e6002>] detach_groups+0xb2/0x120 [configfs]
 [<ffffffffa00e5f36>] configfs_detach_group+0x16/0x30 [configfs]
 [<ffffffffa00e6002>] detach_groups+0xb2/0x120 [configfs]
 [<ffffffffa00e5f36>] configfs_detach_group+0x16/0x30 [configfs]
 [<ffffffffa00e6112>] configfs_unregister_subsystem+0xa2/0x130 [configfs]
 [<ffffffffa00efc84>] target_core_exit_configfs+0x184/0x1c0 [target_core_mod]
 [<ffffffff810a0a12>] sys_delete_module+0x1a2/0x280
 [<ffffffff81542559>] ? trace_hardirqs_on_thunk+0x3a/0x3f
 [<ffffffff81002f82>] system_call_fastpath+0x16/0x1b
Code: 8b 05 c1 64 9a 00 4c 89 75 f0 48 89 fb 41 89 d5 4c 8b 55 10 45
85 c0 0f 84 4a 04 00 00 8b 3d 28 86 cd 00 85 ff 0f 84 5c 04 00 00 <48>
81 3b 20 05 dd 81 b8 01 00 00 00 44 0f 44 e0 83 fe 01 0f 86
RIP  [<ffffffff81094684>] __lock_acquire+0x64/0x1510
 RSP <ffff880022697b18>
---[ end trace 4abcf014267c1c85 ]---

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: 2.6.38-rc2+ tcm_mvsas kernel oops
  2011-02-01 17:55       ` Fubo Chen
@ 2011-02-02  3:01         ` Nicholas A. Bellinger
  2011-02-02  4:46           ` Nicholas A. Bellinger
  0 siblings, 1 reply; 8+ messages in thread
From: Nicholas A. Bellinger @ 2011-02-02  3:01 UTC (permalink / raw)
  To: Fubo Chen; +Cc: linux-scsi, Joel Becker

On Tue, 2011-02-01 at 18:55 +0100, Fubo Chen wrote:
> On Mon, Jan 31, 2011 at 9:55 PM, Nicholas A. Bellinger
> <nab@linux-iscsi.org> wrote:
> > [ ... ]
> >
> > Hmmm, I don't see how this would make a difference, and FYI the above
> > test loops for 'rmmod tcm_mvsas' where running with slub_debug=FZ w/o
> > issue.
> >
> > Well, if you are certain things are working fine on .37-FINAL, you can
> > try using 'git bisect' from a known working LIO .37 commit and build
> > +test until you locate an offending commit.
> >
> > But again, this appears to be working in lio-core-2.6.git/linus-38-rc2,
> > please verify this is what is being tested..?
> 
> Thanks for looking at this. This is what I get with v2.6.38-rc2,
> tcm_mvsas and slub poisoning:
> 
> # cat /proc/cmdline
> BOOT_IMAGE=/boot/vmlinuz-2.6.38-rc2
> root=UUID=c2d91556-8ed3-4a2a-95d9-50d0203bcfcc ro quiet splash
> slub_debug=FPUZ
> # modprobe tcm_mvsas
> # rmmod tcm_mvsas
> # rmmod target_core_mod
> Segmentation fault
> 

Thanks for this info..  I am now able to reproduce w/ .38-rc2 using
slub_debug=FPUZ..  (More below)

> and on the console:
> 
> <<<<<<<<<<<<<<<<<<<<<< BEGIN FABRIC API >>>>>>>>>>>>>>>>>>>>>>
> Initialized struct target_fabric_configfs: ffff880025e09090 for mvsas
> <<<<<<<<<<<<<<<<<<<<<< END FABRIC API >>>>>>>>>>>>>>>>>>>>>>
> TCM_MVSAS[0] - Set fabric -> tcm_mvsas_fabric_configfs
> <<<<<<<<<<<<<<<<<<<<<< BEGIN FABRIC API >>>>>>>>>>>>>>>>>>>>>>
> Target_Core_ConfigFS: DEREGISTER -> Releasing tf: mvsas
> <<<<<<<<<<<<<<<<<<<<<< END FABRIC API >>>>>>>>>>>>>>>>>>>>>>
> TCM_MVSAS[0] - Cleared tcm_mvsas_fabric_configfs
> general protection fault: 0000 [#1] SMP
> last sysfs file:
> /sys/devices/pci0000:00/0000:00:11.0/0000:02:03.0/usb1/1-0:1.0/uevent
> CPU 0
> Modules linked in: target_core_mod(-) configfs netconsole iscsi_tcp
> libiscsi_tcp libiscsi scsi_transport_iscsi binfmt_misc psmouse
> serio_raw i2c_piix4 shpchp mptspi mptscsih e1000 mptbase
> scsi_transport_spi floppy [last unloaded: tcm_mvsas]
> 
> Pid: 1432, comm: rmmod Not tainted 2.6.38-rc2 #4 440BX Desktop
> Reference Platform/VMware Virtual Platform
> RIP: 0010:[<ffffffff81094684>]  [<ffffffff81094684>] __lock_acquire+0x64/0x1510
> RSP: 0018:ffff880022697b18  EFLAGS: 00010046
> RAX: 0000000000000046 RBX: 6b6b6b6b6b6b6be3 RCX: 0000000000000000
> RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
> RBP: ffff880022697be8 R08: 0000000000000001 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000002
> R13: 0000000000000000 R14: 0000000000000000 R15: ffff88002a1a2350
> FS:  00007f844069c700(0000) GS:ffff88003d600000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> CR2: 00007f8440189fc0 CR3: 0000000025d67000 CR4: 00000000000006f0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Process rmmod (pid: 1432, threadinfo ffff880022696000, task ffff88002a1a2350)
> Stack:
>  0000000000000004 ffff88002a1a2350 ffffffff82030820 ffffffff81010dfd
>  ffff880022697b68 ffffffff81ed0590 ffff880022697b68 0000000000000000
>  3161938ca065261c ffff88002a1a2b08 ffff880022697c48 0000000000000002
> Call Trace:
>  [<ffffffff81010dfd>] ? save_stack_trace+0x2d/0x50
>  [<ffffffff81095bd0>] lock_acquire+0xa0/0x150
>  [<ffffffffa00e5f7f>] ? detach_groups+0x2f/0x120 [configfs]
>  [<ffffffff81540f44>] ? __mutex_lock_common+0x2a4/0x3e0
>  [<ffffffffa00e5ff4>] ? detach_groups+0xa4/0x120 [configfs]
>  [<ffffffff815427f6>] _raw_spin_lock+0x36/0x70
>  [<ffffffffa00e5f7f>] ? detach_groups+0x2f/0x120 [configfs]
>  [<ffffffffa00e5f7f>] detach_groups+0x2f/0x120 [configfs]
>  [<ffffffffa00e5f36>] configfs_detach_group+0x16/0x30 [configfs]
>  [<ffffffffa00e6002>] detach_groups+0xb2/0x120 [configfs]
>  [<ffffffffa00e5f36>] configfs_detach_group+0x16/0x30 [configfs]
>  [<ffffffffa00e6002>] detach_groups+0xb2/0x120 [configfs]
>  [<ffffffffa00e5f36>] configfs_detach_group+0x16/0x30 [configfs]
>  [<ffffffffa00e6002>] detach_groups+0xb2/0x120 [configfs]
>  [<ffffffffa00e5f36>] configfs_detach_group+0x16/0x30 [configfs]
>  [<ffffffffa00e6002>] detach_groups+0xb2/0x120 [configfs]
>  [<ffffffffa00e5f36>] configfs_detach_group+0x16/0x30 [configfs]
>  [<ffffffffa00e6112>] configfs_unregister_subsystem+0xa2/0x130 [configfs]
>  [<ffffffffa00efc84>] target_core_exit_configfs+0x184/0x1c0 [target_core_mod]
>  [<ffffffff810a0a12>] sys_delete_module+0x1a2/0x280
>  [<ffffffff81542559>] ? trace_hardirqs_on_thunk+0x3a/0x3f
>  [<ffffffff81002f82>] system_call_fastpath+0x16/0x1b
> Code: 8b 05 c1 64 9a 00 4c 89 75 f0 48 89 fb 41 89 d5 4c 8b 55 10 45
> 85 c0 0f 84 4a 04 00 00 8b 3d 28 86 cd 00 85 ff 0f 84 5c 04 00 00 <48>
> 81 3b 20 05 dd 81 b8 01 00 00 00 44 0f 44 e0 83 fe 01 0f 86
> RIP  [<ffffffff81094684>] __lock_acquire+0x64/0x1510
>  RSP <ffff880022697b18>
> ---[ end trace 4abcf014267c1c85 ]---
> --

So this is coming from target_core_exit_configfs() ->
configfs_unregister_system() from a simple 'modprobe target_core_mo ;
rmmod target_core_mod' with slub_debug=FPUZ..

It appears to be related to the TCM top level struct
configfs_subsystem->su_group->default_groups[], which we setup in
target_core_init_configfs() and from which are released individually in
target_core_exit_configfs() before calling configfs_unregister_system().

Note that target_core_exit_configfs() is following the same logic as
default_groups for non struct configfs_subsystem backed groups, so I am
thinking this is going to be the root culprit.

After a quick test w/o the above subsys->su_group.default_groups
allocation/release (and the rest of the top level cg->default_groups[]
disabled), the GFP no longer appears.  They appear to be coming more
than a single stale struct configfs_dirent->s_children from the top
level TCM default groups attached fs/configfs/dir.c:detach_groups().
(jlbec CC'ed)

I am still looking at what is the expected way to handle multiple
default_groups (including a default_group with children) with struct
configfs_subsystem deregister() in fs/configfs/dir.c code, and will send
a followup later this evening.

Thanks again for your report,

--nab



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: 2.6.38-rc2+ tcm_mvsas kernel oops
  2011-02-02  3:01         ` Nicholas A. Bellinger
@ 2011-02-02  4:46           ` Nicholas A. Bellinger
  2011-02-02 17:53             ` Fubo Chen
  0 siblings, 1 reply; 8+ messages in thread
From: Nicholas A. Bellinger @ 2011-02-02  4:46 UTC (permalink / raw)
  To: Fubo Chen; +Cc: linux-scsi, Joel Becker

On Tue, 2011-02-01 at 19:01 -0800, Nicholas A. Bellinger wrote:
> On Tue, 2011-02-01 at 18:55 +0100, Fubo Chen wrote:
> > On Mon, Jan 31, 2011 at 9:55 PM, Nicholas A. Bellinger
> > <nab@linux-iscsi.org> wrote:
> > > [ ... ]
> > >
> > > Hmmm, I don't see how this would make a difference, and FYI the above
> > > test loops for 'rmmod tcm_mvsas' where running with slub_debug=FZ w/o
> > > issue.
> > >
> > > Well, if you are certain things are working fine on .37-FINAL, you can
> > > try using 'git bisect' from a known working LIO .37 commit and build
> > > +test until you locate an offending commit.
> > >
> > > But again, this appears to be working in lio-core-2.6.git/linus-38-rc2,
> > > please verify this is what is being tested..?
> > 
> > Thanks for looking at this. This is what I get with v2.6.38-rc2,
> > tcm_mvsas and slub poisoning:
> > 
> > # cat /proc/cmdline
> > BOOT_IMAGE=/boot/vmlinuz-2.6.38-rc2
> > root=UUID=c2d91556-8ed3-4a2a-95d9-50d0203bcfcc ro quiet splash
> > slub_debug=FPUZ
> > # modprobe tcm_mvsas
> > # rmmod tcm_mvsas
> > # rmmod target_core_mod
> > Segmentation fault
> > 
> 
> Thanks for this info..  I am now able to reproduce w/ .38-rc2 using
> slub_debug=FPUZ..  (More below)
> 
> > and on the console:
> > 
> > <<<<<<<<<<<<<<<<<<<<<< BEGIN FABRIC API >>>>>>>>>>>>>>>>>>>>>>
> > Initialized struct target_fabric_configfs: ffff880025e09090 for mvsas
> > <<<<<<<<<<<<<<<<<<<<<< END FABRIC API >>>>>>>>>>>>>>>>>>>>>>
> > TCM_MVSAS[0] - Set fabric -> tcm_mvsas_fabric_configfs
> > <<<<<<<<<<<<<<<<<<<<<< BEGIN FABRIC API >>>>>>>>>>>>>>>>>>>>>>
> > Target_Core_ConfigFS: DEREGISTER -> Releasing tf: mvsas
> > <<<<<<<<<<<<<<<<<<<<<< END FABRIC API >>>>>>>>>>>>>>>>>>>>>>
> > TCM_MVSAS[0] - Cleared tcm_mvsas_fabric_configfs
> > general protection fault: 0000 [#1] SMP
> > last sysfs file:
> > /sys/devices/pci0000:00/0000:00:11.0/0000:02:03.0/usb1/1-0:1.0/uevent
> > CPU 0
> > Modules linked in: target_core_mod(-) configfs netconsole iscsi_tcp
> > libiscsi_tcp libiscsi scsi_transport_iscsi binfmt_misc psmouse
> > serio_raw i2c_piix4 shpchp mptspi mptscsih e1000 mptbase
> > scsi_transport_spi floppy [last unloaded: tcm_mvsas]
> > 
> > Pid: 1432, comm: rmmod Not tainted 2.6.38-rc2 #4 440BX Desktop
> > Reference Platform/VMware Virtual Platform
> > RIP: 0010:[<ffffffff81094684>]  [<ffffffff81094684>] __lock_acquire+0x64/0x1510
> > RSP: 0018:ffff880022697b18  EFLAGS: 00010046
> > RAX: 0000000000000046 RBX: 6b6b6b6b6b6b6be3 RCX: 0000000000000000
> > RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
> > RBP: ffff880022697be8 R08: 0000000000000001 R09: 0000000000000000
> > R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000002
> > R13: 0000000000000000 R14: 0000000000000000 R15: ffff88002a1a2350
> > FS:  00007f844069c700(0000) GS:ffff88003d600000(0000) knlGS:0000000000000000
> > CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> > CR2: 00007f8440189fc0 CR3: 0000000025d67000 CR4: 00000000000006f0
> > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> > Process rmmod (pid: 1432, threadinfo ffff880022696000, task ffff88002a1a2350)
> > Stack:
> >  0000000000000004 ffff88002a1a2350 ffffffff82030820 ffffffff81010dfd
> >  ffff880022697b68 ffffffff81ed0590 ffff880022697b68 0000000000000000
> >  3161938ca065261c ffff88002a1a2b08 ffff880022697c48 0000000000000002
> > Call Trace:
> >  [<ffffffff81010dfd>] ? save_stack_trace+0x2d/0x50
> >  [<ffffffff81095bd0>] lock_acquire+0xa0/0x150
> >  [<ffffffffa00e5f7f>] ? detach_groups+0x2f/0x120 [configfs]
> >  [<ffffffff81540f44>] ? __mutex_lock_common+0x2a4/0x3e0
> >  [<ffffffffa00e5ff4>] ? detach_groups+0xa4/0x120 [configfs]
> >  [<ffffffff815427f6>] _raw_spin_lock+0x36/0x70
> >  [<ffffffffa00e5f7f>] ? detach_groups+0x2f/0x120 [configfs]
> >  [<ffffffffa00e5f7f>] detach_groups+0x2f/0x120 [configfs]
> >  [<ffffffffa00e5f36>] configfs_detach_group+0x16/0x30 [configfs]
> >  [<ffffffffa00e6002>] detach_groups+0xb2/0x120 [configfs]
> >  [<ffffffffa00e5f36>] configfs_detach_group+0x16/0x30 [configfs]
> >  [<ffffffffa00e6002>] detach_groups+0xb2/0x120 [configfs]
> >  [<ffffffffa00e5f36>] configfs_detach_group+0x16/0x30 [configfs]
> >  [<ffffffffa00e6002>] detach_groups+0xb2/0x120 [configfs]
> >  [<ffffffffa00e5f36>] configfs_detach_group+0x16/0x30 [configfs]
> >  [<ffffffffa00e6002>] detach_groups+0xb2/0x120 [configfs]
> >  [<ffffffffa00e5f36>] configfs_detach_group+0x16/0x30 [configfs]
> >  [<ffffffffa00e6112>] configfs_unregister_subsystem+0xa2/0x130 [configfs]
> >  [<ffffffffa00efc84>] target_core_exit_configfs+0x184/0x1c0 [target_core_mod]
> >  [<ffffffff810a0a12>] sys_delete_module+0x1a2/0x280
> >  [<ffffffff81542559>] ? trace_hardirqs_on_thunk+0x3a/0x3f
> >  [<ffffffff81002f82>] system_call_fastpath+0x16/0x1b
> > Code: 8b 05 c1 64 9a 00 4c 89 75 f0 48 89 fb 41 89 d5 4c 8b 55 10 45
> > 85 c0 0f 84 4a 04 00 00 8b 3d 28 86 cd 00 85 ff 0f 84 5c 04 00 00 <48>
> > 81 3b 20 05 dd 81 b8 01 00 00 00 44 0f 44 e0 83 fe 01 0f 86
> > RIP  [<ffffffff81094684>] __lock_acquire+0x64/0x1510
> >  RSP <ffff880022697b18>
> > ---[ end trace 4abcf014267c1c85 ]---
> > --
> 
> So this is coming from target_core_exit_configfs() ->
> configfs_unregister_system() from a simple 'modprobe target_core_mo ;
> rmmod target_core_mod' with slub_debug=FPUZ..
> 
> It appears to be related to the TCM top level struct
> configfs_subsystem->su_group->default_groups[], which we setup in
> target_core_init_configfs() and from which are released individually in
> target_core_exit_configfs() before calling configfs_unregister_system().
> 
> Note that target_core_exit_configfs() is following the same logic as
> default_groups for non struct configfs_subsystem backed groups, so I am
> thinking this is going to be the root culprit.
> 
> After a quick test w/o the above subsys->su_group.default_groups
> allocation/release (and the rest of the top level cg->default_groups[]
> disabled), the GFP no longer appears.  They appear to be coming more
> than a single stale struct configfs_dirent->s_children from the top
> level TCM default groups attached fs/configfs/dir.c:detach_groups().
> (jlbec CC'ed)
> 
> I am still looking at what is the expected way to handle multiple
> default_groups (including a default_group with children) with struct
> configfs_subsystem deregister() in fs/configfs/dir.c code, and will send
> a followup later this evening.
> 
> Thanks again for your report,
> 

Ok, after some more research and testing there appears to be two issues
in target_core_exit_configfs() wrt to default groups.  First, the call
to configfs_unregister_subsystem() is expected to drain top level struct
configfs_subsystem->su_group.default_groups[] in fs/configfs/dir.c:
configfs_unregister_subsystem() -> unlink_group(), and not directly by
the configfs consumer.

These second issue is core_alua_free_lu_gp(se_global->default_lu_gp)
releasing default_lu_gp->lun_group before lu_gp_cg->default_groups is
drained.

Here the change that is now resolving the issue on my end with .38-rc2
using slub_debug=FPUZ, and I will send out a proper patch for
lio-core-2.6.git/linus-38-rc2 shortly..  Please verify this works for
you.

Thanks!

--nab


diff --git a/drivers/target/target_core_configfs.c b/drivers/target/target_core_configfs.c
index 9ff1942..7d7dfbc 100644
--- a/drivers/target/target_core_configfs.c
+++ b/drivers/target/target_core_configfs.c
@@ -3262,8 +3262,7 @@ static void target_core_exit_configfs(void)
                config_item_put(item);
        }
        kfree(lu_gp_cg->default_groups);
-       core_alua_free_lu_gp(se_global->default_lu_gp);
-       se_global->default_lu_gp = NULL;
+       lu_gp_cg->default_groups = NULL;
 
        alua_cg = &se_global->alua_group;
        for (i = 0; alua_cg->default_groups[i]; i++) {
@@ -3272,6 +3271,7 @@ static void target_core_exit_configfs(void)
                config_item_put(item);
        }
        kfree(alua_cg->default_groups);
+       alua_cg->default_groups = NULL;
 
        hba_cg = &se_global->target_core_hbagroup;
        for (i = 0; hba_cg->default_groups[i]; i++) {
@@ -3280,15 +3280,17 @@ static void target_core_exit_configfs(void)
                config_item_put(item);
        }
        kfree(hba_cg->default_groups);
-
-       for (i = 0; subsys->su_group.default_groups[i]; i++) {
-               item = &subsys->su_group.default_groups[i]->cg_item;
-               subsys->su_group.default_groups[i] = NULL;
-               config_item_put(item);
-       }
+       hba_cg->default_groups = NULL;
+       /*
+        * We expect subsys->su_group.default_groups to be released
+        * by configfs subsystem provider logic..
+        */
+       configfs_unregister_subsystem(subsys);
        kfree(subsys->su_group.default_groups);
 
-       configfs_unregister_subsystem(subsys);
+       core_alua_free_lu_gp(se_global->default_lu_gp);
+       se_global->default_lu_gp = NULL;
+
        printk(KERN_INFO "TARGET_CORE[0]: Released ConfigFS Fabric"
                        " Infrastructure\n");


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: 2.6.38-rc2+ tcm_mvsas kernel oops
  2011-02-02  4:46           ` Nicholas A. Bellinger
@ 2011-02-02 17:53             ` Fubo Chen
  0 siblings, 0 replies; 8+ messages in thread
From: Fubo Chen @ 2011-02-02 17:53 UTC (permalink / raw)
  To: Nicholas A. Bellinger; +Cc: linux-scsi, Joel Becker

On Wed, Feb 2, 2011 at 5:46 AM, Nicholas A. Bellinger
<nab@linux-iscsi.org> wrote:
> On Tue, 2011-02-01 at 19:01 -0800, Nicholas A. Bellinger wrote:
>> On Tue, 2011-02-01 at 18:55 +0100, Fubo Chen wrote:
>> > On Mon, Jan 31, 2011 at 9:55 PM, Nicholas A. Bellinger
>> > <nab@linux-iscsi.org> wrote:
>> > > [ ... ]
>> > >
>> > > Hmmm, I don't see how this would make a difference, and FYI the above
>> > > test loops for 'rmmod tcm_mvsas' where running with slub_debug=FZ w/o
>> > > issue.
>> > >
>> > > Well, if you are certain things are working fine on .37-FINAL, you can
>> > > try using 'git bisect' from a known working LIO .37 commit and build
>> > > +test until you locate an offending commit.
>> > >
>> > > But again, this appears to be working in lio-core-2.6.git/linus-38-rc2,
>> > > please verify this is what is being tested..?
>> >
>> > Thanks for looking at this. This is what I get with v2.6.38-rc2,
>> > tcm_mvsas and slub poisoning:
>> >
>> > # cat /proc/cmdline
>> > BOOT_IMAGE=/boot/vmlinuz-2.6.38-rc2
>> > root=UUID=c2d91556-8ed3-4a2a-95d9-50d0203bcfcc ro quiet splash
>> > slub_debug=FPUZ
>> > # modprobe tcm_mvsas
>> > # rmmod tcm_mvsas
>> > # rmmod target_core_mod
>> > Segmentation fault
>> >
>>
>> Thanks for this info..  I am now able to reproduce w/ .38-rc2 using
>> slub_debug=FPUZ..  (More below)
>>
>> > and on the console:
>> >
>> > <<<<<<<<<<<<<<<<<<<<<< BEGIN FABRIC API >>>>>>>>>>>>>>>>>>>>>>
>> > Initialized struct target_fabric_configfs: ffff880025e09090 for mvsas
>> > <<<<<<<<<<<<<<<<<<<<<< END FABRIC API >>>>>>>>>>>>>>>>>>>>>>
>> > TCM_MVSAS[0] - Set fabric -> tcm_mvsas_fabric_configfs
>> > <<<<<<<<<<<<<<<<<<<<<< BEGIN FABRIC API >>>>>>>>>>>>>>>>>>>>>>
>> > Target_Core_ConfigFS: DEREGISTER -> Releasing tf: mvsas
>> > <<<<<<<<<<<<<<<<<<<<<< END FABRIC API >>>>>>>>>>>>>>>>>>>>>>
>> > TCM_MVSAS[0] - Cleared tcm_mvsas_fabric_configfs
>> > general protection fault: 0000 [#1] SMP
>> > last sysfs file:
>> > /sys/devices/pci0000:00/0000:00:11.0/0000:02:03.0/usb1/1-0:1.0/uevent
>> > CPU 0
>> > Modules linked in: target_core_mod(-) configfs netconsole iscsi_tcp
>> > libiscsi_tcp libiscsi scsi_transport_iscsi binfmt_misc psmouse
>> > serio_raw i2c_piix4 shpchp mptspi mptscsih e1000 mptbase
>> > scsi_transport_spi floppy [last unloaded: tcm_mvsas]
>> >
>> > Pid: 1432, comm: rmmod Not tainted 2.6.38-rc2 #4 440BX Desktop
>> > Reference Platform/VMware Virtual Platform
>> > RIP: 0010:[<ffffffff81094684>]  [<ffffffff81094684>] __lock_acquire+0x64/0x1510
>> > RSP: 0018:ffff880022697b18  EFLAGS: 00010046
>> > RAX: 0000000000000046 RBX: 6b6b6b6b6b6b6be3 RCX: 0000000000000000
>> > RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
>> > RBP: ffff880022697be8 R08: 0000000000000001 R09: 0000000000000000
>> > R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000002
>> > R13: 0000000000000000 R14: 0000000000000000 R15: ffff88002a1a2350
>> > FS:  00007f844069c700(0000) GS:ffff88003d600000(0000) knlGS:0000000000000000
>> > CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
>> > CR2: 00007f8440189fc0 CR3: 0000000025d67000 CR4: 00000000000006f0
>> > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> > DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
>> > Process rmmod (pid: 1432, threadinfo ffff880022696000, task ffff88002a1a2350)
>> > Stack:
>> >  0000000000000004 ffff88002a1a2350 ffffffff82030820 ffffffff81010dfd
>> >  ffff880022697b68 ffffffff81ed0590 ffff880022697b68 0000000000000000
>> >  3161938ca065261c ffff88002a1a2b08 ffff880022697c48 0000000000000002
>> > Call Trace:
>> >  [<ffffffff81010dfd>] ? save_stack_trace+0x2d/0x50
>> >  [<ffffffff81095bd0>] lock_acquire+0xa0/0x150
>> >  [<ffffffffa00e5f7f>] ? detach_groups+0x2f/0x120 [configfs]
>> >  [<ffffffff81540f44>] ? __mutex_lock_common+0x2a4/0x3e0
>> >  [<ffffffffa00e5ff4>] ? detach_groups+0xa4/0x120 [configfs]
>> >  [<ffffffff815427f6>] _raw_spin_lock+0x36/0x70
>> >  [<ffffffffa00e5f7f>] ? detach_groups+0x2f/0x120 [configfs]
>> >  [<ffffffffa00e5f7f>] detach_groups+0x2f/0x120 [configfs]
>> >  [<ffffffffa00e5f36>] configfs_detach_group+0x16/0x30 [configfs]
>> >  [<ffffffffa00e6002>] detach_groups+0xb2/0x120 [configfs]
>> >  [<ffffffffa00e5f36>] configfs_detach_group+0x16/0x30 [configfs]
>> >  [<ffffffffa00e6002>] detach_groups+0xb2/0x120 [configfs]
>> >  [<ffffffffa00e5f36>] configfs_detach_group+0x16/0x30 [configfs]
>> >  [<ffffffffa00e6002>] detach_groups+0xb2/0x120 [configfs]
>> >  [<ffffffffa00e5f36>] configfs_detach_group+0x16/0x30 [configfs]
>> >  [<ffffffffa00e6002>] detach_groups+0xb2/0x120 [configfs]
>> >  [<ffffffffa00e5f36>] configfs_detach_group+0x16/0x30 [configfs]
>> >  [<ffffffffa00e6112>] configfs_unregister_subsystem+0xa2/0x130 [configfs]
>> >  [<ffffffffa00efc84>] target_core_exit_configfs+0x184/0x1c0 [target_core_mod]
>> >  [<ffffffff810a0a12>] sys_delete_module+0x1a2/0x280
>> >  [<ffffffff81542559>] ? trace_hardirqs_on_thunk+0x3a/0x3f
>> >  [<ffffffff81002f82>] system_call_fastpath+0x16/0x1b
>> > Code: 8b 05 c1 64 9a 00 4c 89 75 f0 48 89 fb 41 89 d5 4c 8b 55 10 45
>> > 85 c0 0f 84 4a 04 00 00 8b 3d 28 86 cd 00 85 ff 0f 84 5c 04 00 00 <48>
>> > 81 3b 20 05 dd 81 b8 01 00 00 00 44 0f 44 e0 83 fe 01 0f 86
>> > RIP  [<ffffffff81094684>] __lock_acquire+0x64/0x1510
>> >  RSP <ffff880022697b18>
>> > ---[ end trace 4abcf014267c1c85 ]---
>> > --
>>
>> So this is coming from target_core_exit_configfs() ->
>> configfs_unregister_system() from a simple 'modprobe target_core_mo ;
>> rmmod target_core_mod' with slub_debug=FPUZ..
>>
>> It appears to be related to the TCM top level struct
>> configfs_subsystem->su_group->default_groups[], which we setup in
>> target_core_init_configfs() and from which are released individually in
>> target_core_exit_configfs() before calling configfs_unregister_system().
>>
>> Note that target_core_exit_configfs() is following the same logic as
>> default_groups for non struct configfs_subsystem backed groups, so I am
>> thinking this is going to be the root culprit.
>>
>> After a quick test w/o the above subsys->su_group.default_groups
>> allocation/release (and the rest of the top level cg->default_groups[]
>> disabled), the GFP no longer appears.  They appear to be coming more
>> than a single stale struct configfs_dirent->s_children from the top
>> level TCM default groups attached fs/configfs/dir.c:detach_groups().
>> (jlbec CC'ed)
>>
>> I am still looking at what is the expected way to handle multiple
>> default_groups (including a default_group with children) with struct
>> configfs_subsystem deregister() in fs/configfs/dir.c code, and will send
>> a followup later this evening.
>>
>> Thanks again for your report,
>>
>
> Ok, after some more research and testing there appears to be two issues
> in target_core_exit_configfs() wrt to default groups.  First, the call
> to configfs_unregister_subsystem() is expected to drain top level struct
> configfs_subsystem->su_group.default_groups[] in fs/configfs/dir.c:
> configfs_unregister_subsystem() -> unlink_group(), and not directly by
> the configfs consumer.
>
> These second issue is core_alua_free_lu_gp(se_global->default_lu_gp)
> releasing default_lu_gp->lun_group before lu_gp_cg->default_groups is
> drained.
>
> Here the change that is now resolving the issue on my end with .38-rc2
> using slub_debug=FPUZ, and I will send out a proper patch for
> lio-core-2.6.git/linus-38-rc2 shortly..  Please verify this works for
> you.

yes, this works forme. Thank you !

Fubo.
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2011-02-02 17:53 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-01-30 19:02 2.6.38-rc2+ tcm_mvsas kernel oops Fubo Chen
2011-01-30 21:34 ` Nicholas A. Bellinger
2011-01-31 17:21   ` Fubo Chen
2011-01-31 20:55     ` Nicholas A. Bellinger
2011-02-01 17:55       ` Fubo Chen
2011-02-02  3:01         ` Nicholas A. Bellinger
2011-02-02  4:46           ` Nicholas A. Bellinger
2011-02-02 17:53             ` Fubo Chen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox