linux-scsi.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [Bug 29442] New: target: Race between target driver unregistration and LUN removal
@ 2011-02-19 12:28 bugzilla-daemon
  2011-02-21 11:16 ` [Bug 29442] " bugzilla-daemon
                   ` (3 more replies)
  0 siblings, 4 replies; 5+ messages in thread
From: bugzilla-daemon @ 2011-02-19 12:28 UTC (permalink / raw)
  To: linux-scsi

https://bugzilla.kernel.org/show_bug.cgi?id=29442

           Summary: target: Race between target driver unregistration and
                    LUN removal
           Product: IO/Storage
           Version: 2.5
    Kernel Version: 2.6.38-rc5
          Platform: All
        OS/Version: Linux
              Tree: Mainline
            Status: NEW
          Severity: normal
          Priority: P1
         Component: SCSI
        AssignedTo: linux-scsi@vger.kernel.org
        ReportedBy: bart.vanassche@gmail.com
        Regression: No


With the ib_srpt driver from branch srpt of https://github.com/bvanassche/srpt,
the call trace included below sometimes appears in the kernel log while
removing ib_srpt-related nodes from configfs and unloading ib_srpt. IMHO this
call trace reveals a race condition in the target core between target driver
template unregistration and LUN removal.

Session 0x00000000000000000002c9030005f34f: kernel thread ib_srpt_compl (PID
7298) stopped
srpt_TPG[1] - Deleted ACL with TCQ Depth: 1 for srpt Initiator Node:
0x00000000000000000002c9030005f34f
srpt_TPG[2] - Deleted ACL with TCQ Depth: 1 for srpt Initiator Node:
0x00000000000000000002c9030003cca3
srpt_TPG[2] - Deleted ACL with TCQ Depth: 1 for srpt Initiator Node:
0x00000000000000000002c9030003cca7
srpt_TPG[2] - Deleted ACL with TCQ Depth: 1 for srpt Initiator Node:
0x00000000000000000002c9030005f34b
srpt_TPG[2] - Deleted ACL with TCQ Depth: 1 for srpt Initiator Node:
0x00000000000000000002c9030005f34f
srpt_TPG[1]_LUN[0] - Deactivated srpt Logical Unit from device object
srpt_TPG[1]_LUN[1] - Deactivated srpt Logical Unit from device object
srpt_TPG[1]_LUN[2] - Deactivated srpt Logical Unit from device object
srpt_TPG[2]_LUN[0] - Deactivated srpt Logical Unit from device object
srpt_TPG[2]_LUN[1] - Deactivated srpt Logical Unit from device object
srpt_TPG[2]_LUN[2] - Deactivated srpt Logical Unit from device object
ib_srpt drop_tpg(mlx4_0, mlx4_0)
ib_srpt drop_tpg(mlx4_1, mlx4_1)
ib_srpt drop_tport(mlx4_0)
ib_srpt drop_tport(mlx4_1)
Target_Core_ConfigFS: DEREGISTER -> Looking up srpt in tf list
Target_Core_ConfigFS: DEREGISTER -> located fabric: srpt
Target_Core_ConfigFS: DEREGISTER -> Releasing tf->tf_fabric for srpt
Target_Core_ConfigFS: DEREGISTER -> Releasing ci srpt
Target_Core_ConfigFS: Calling se_free_virtual_device() for se_dev_ptr:
ffff880063f02120
Target_Core_ConfigFS: Deallocating se_subsystem_dev_t: ffff88006a914120
Target_Core_ConfigFS: Calling se_free_virtual_device() for se_dev_ptr:
ffff880063f00848
Target_Core_ConfigFS: Deallocating se_subsystem_dev_t: ffff88006a070000
Target_Core_ConfigFS: Calling se_free_virtual_device() for se_dev_ptr:
ffff880063f018d8
CORE_RD[0] - Released device space for Ramdisk Device ID: 0, pages 2 in 1
tables total bytes 8192
Target_Core_ConfigFS: Deallocating se_subsystem_dev_t: ffff88006a075168
CORE_HBA[1] - Detached FILEIO HBA: 0 from Generic Target Core
CORE_HBA[1] - Detached HBA from Generic Target Core
CORE_HBA[3] - Detached FILEIO HBA: 1 from Generic Target Core
CORE_HBA[3] - Detached HBA from Generic Target Core
CORE_HBA[2] - Detached Ramdisk HBA: 0 from Generic Target Core
CORE_HBA[2] - Detached HBA from Generic Target Core
TARGET_CORE[Normal]: Deallocating srpt struct se_portal_group for endpoint:
0x2c9030005f34a Portal Tag 1
general protection fault: 0000 [#1] PREEMPT SMP 
last sysfs file: /sys/devices/system/cpu/cpu1/cache/index2/shared_cpu_map
CPU 0 
Modules linked in: target_core_pscsi target_core_iblock target_core_file
ib_srpt(-) target_core_mod netconsole configfs af_packet rdma_ucm rdma_cm iw_cm
ib_addr ib_ipoib ib_cm ib_sa ib_uverbs ib_umad cpufreq_conservative
cpufreq_userspace cpufreq_powersave acpi_cpufreq mlx4_ib ib_mad ib_core mperf
loop dm_mod mlx4_core sr_mod i2c_i801 pcspkr sg cdrom intel_agp intel_gtt
serio_raw i2c_core button uhci_hcd ehci_hcd usbcore rtc_cmos sd_mod crc_t10dif
rtc_core rtc_lib edd ext3 mbcache jbd fan ata_generic ata_piix pata_marvell
ahci libahci libata scsi_mod thermal processor thermal_sys hwmon [last
unloaded: target_core_mod]

Pid: 7324, comm: tcm_cl_0 Not tainted 2.6.38-rc5-dbg+ #2 System manufacturer
P5Q DELUXE/P5Q DELUXE
RIP: 0010:[<ffffffff811f4bf0>]  [<ffffffff811f4bf0>] list_del+0x30/0x110
RSP: 0018:ffff8800446b3e20  EFLAGS: 00010883
RAX: dead000000200200 RBX: ffff880044760130 RCX: 6b6b6b6b6b6b6b6b
RDX: ffff880044760130 RSI: ffff88006a1266e8 RDI: ffff880044760130
RBP: ffff8800446b3e30 R08: 0000000000000000 R09: 0000000000000001
R10: 0000000000000000 R11: 0000000000000000 R12: dead000000100100
R13: ffff88006a1266d0 R14: ffff88006a1267c0 R15: ffff88006a1266d0
FS:  0000000000000000(0000) GS:ffff88007ce00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00007faf05a335e0 CR3: 000000003bdc8000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process tcm_cl_0 (pid: 7324, threadinfo ffff8800446b2000, task
ffff88007180dfa0)
Stack:
 ffff8800447600c0 ffff88006a126670 ffff8800446b3e80 ffffffffa034a5a1
 00000000001d2180 0000000000000286 ffff8800767d9320 ffff88006a126670
 ffff8800446b3ec0 ffff88006a126670 ffffffffa034a820 0000000000000000
Call Trace:
 [<ffffffffa034a5a1>] __transport_clear_lun_from_sessions+0x61/0x2e0
[target_core_mod]
 [<ffffffffa034a820>] ? transport_clear_lun_thread+0x0/0x30 [target_core_mod]
 [<ffffffffa034a836>] transport_clear_lun_thread+0x16/0x30 [target_core_mod]
 [<ffffffffa034a820>] ? transport_clear_lun_thread+0x0/0x30 [target_core_mod]
 [<ffffffff81070256>] kthread+0xa6/0xb0
 [<ffffffff81003d54>] kernel_thread_helper+0x4/0x10
 [<ffffffff8103e170>] ? finish_task_switch+0x80/0xe0
 [<ffffffff813cb97b>] ? _raw_spin_unlock_irq+0x3b/0x60
 [<ffffffff813cbfc4>] ? restore_args+0x0/0x30
 [<ffffffff810701b0>] ? kthread+0x0/0xb0
 [<ffffffff81003d50>] ? kernel_thread_helper+0x0/0x10
Code: 54 49 bc 00 01 10 00 00 00 ad de 53 48 89 fb 4c 39 27 0f 84 9a 00 00 00
48 8b 4b 08 48 b8 00 02 20 00 00 00 ad de 48 39 c1 74 48 <4c> 8b 01 4c 39 c3 75
60 48 8b 03 4c 8b 40 08 4c 39 c3 0f 85 96 
RIP  [<ffffffff811f4bf0>] list_del+0x30/0x110
 RSP <ffff8800446b3e20>
---[ end trace a00acd2001b8e456 ]---
note: tcm_cl_0[7324] exited with preempt_count 1

-- 
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [Bug 29442] target: Race between target driver unregistration and LUN removal
  2011-02-19 12:28 [Bug 29442] New: target: Race between target driver unregistration and LUN removal bugzilla-daemon
@ 2011-02-21 11:16 ` bugzilla-daemon
  2011-02-21 18:43 ` bugzilla-daemon
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 5+ messages in thread
From: bugzilla-daemon @ 2011-02-21 11:16 UTC (permalink / raw)
  To: linux-scsi

https://bugzilla.kernel.org/show_bug.cgi?id=29442


nab <nab@linux-iscsi.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |nab@linux-iscsi.org




--- Comment #1 from nab <nab@linux-iscsi.org>  2011-02-21 11:16:57 ---
(In reply to comment #0)
> With the ib_srpt driver from branch srpt of https://github.com/bvanassche/srpt,
> the call trace included below sometimes appears in the kernel log while
> removing ib_srpt-related nodes from configfs and unloading ib_srpt. IMHO this
> call trace reveals a race condition in the target core between target driver
> template unregistration and LUN removal.
> 
> Session 0x00000000000000000002c9030005f34f: kernel thread ib_srpt_compl (PID
> 7298) stopped

Ok, the SRP Nexus for 2c9030005f34f is stopped at this point..

> srpt_TPG[1] - Deleted ACL with TCQ Depth: 1 for srpt Initiator Node:
> 0x00000000000000000002c9030005f34f
> srpt_TPG[2] - Deleted ACL with TCQ Depth: 1 for srpt Initiator Node:
> 0x00000000000000000002c9030003cca3
> srpt_TPG[2] - Deleted ACL with TCQ Depth: 1 for srpt Initiator Node:
> 0x00000000000000000002c9030003cca7
> srpt_TPG[2] - Deleted ACL with TCQ Depth: 1 for srpt Initiator Node:
> 0x00000000000000000002c9030005f34b
> srpt_TPG[2] - Deleted ACL with TCQ Depth: 1 for srpt Initiator Node:
> 0x00000000000000000002c9030005f34f

Hmmm, so this is with five explict NodeACLs including 2c9030005f34f across
TPG=1 and TPG=2 to the same $SRP_TARGET_WWN endpoint, but *not* with explict
MappedLUN into
/sys/kernel/config/target/srpt/$SRP_TARGET_WWN/acls/$SRP_INITIATOR_WWN/lun_$ID,
correct..?

> srpt_TPG[1]_LUN[0] - Deactivated srpt Logical Unit from device object
> srpt_TPG[1]_LUN[1] - Deactivated srpt Logical Unit from device object
> srpt_TPG[1]_LUN[2] - Deactivated srpt Logical Unit from device object
> srpt_TPG[2]_LUN[0] - Deactivated srpt Logical Unit from device object
> srpt_TPG[2]_LUN[1] - Deactivated srpt Logical Unit from device object
> srpt_TPG[2]_LUN[2] - Deactivated srpt Logical Unit from device object
> ib_srpt drop_tpg(mlx4_0, mlx4_0)
> ib_srpt drop_tpg(mlx4_1, mlx4_1)
> ib_srpt drop_tport(mlx4_0)
> ib_srpt drop_tport(mlx4_1)
> Target_Core_ConfigFS: DEREGISTER -> Looking up srpt in tf list
> Target_Core_ConfigFS: DEREGISTER -> located fabric: srpt
> Target_Core_ConfigFS: DEREGISTER -> Releasing tf->tf_fabric for srpt
> Target_Core_ConfigFS: DEREGISTER -> Releasing ci srpt
> Target_Core_ConfigFS: Calling se_free_virtual_device() for se_dev_ptr:
> ffff880063f02120
> Target_Core_ConfigFS: Deallocating se_subsystem_dev_t: ffff88006a914120
> Target_Core_ConfigFS: Calling se_free_virtual_device() for se_dev_ptr:
> ffff880063f00848
> Target_Core_ConfigFS: Deallocating se_subsystem_dev_t: ffff88006a070000
> Target_Core_ConfigFS: Calling se_free_virtual_device() for se_dev_ptr:
> ffff880063f018d8
> CORE_RD[0] - Released device space for Ramdisk Device ID: 0, pages 2 in 1
> tables total bytes 8192
> Target_Core_ConfigFS: Deallocating se_subsystem_dev_t: ffff88006a075168
> CORE_HBA[1] - Detached FILEIO HBA: 0 from Generic Target Core
> CORE_HBA[1] - Detached HBA from Generic Target Core
> CORE_HBA[3] - Detached FILEIO HBA: 1 from Generic Target Core
> CORE_HBA[3] - Detached HBA from Generic Target Core
> CORE_HBA[2] - Detached Ramdisk HBA: 0 from Generic Target Core
> CORE_HBA[2] - Detached HBA from Generic Target Core
> TARGET_CORE[Normal]: Deallocating srpt struct se_portal_group for endpoint:
> 0x2c9030005f34a Portal Tag 1
> general protection fault: 0000 [#1] PREEMPT SMP 
> last sysfs file: /sys/devices/system/cpu/cpu1/cache/index2/shared_cpu_map
> CPU 0 
> Modules linked in: target_core_pscsi target_core_iblock target_core_file
> ib_srpt(-) target_core_mod netconsole configfs af_packet rdma_ucm rdma_cm iw_cm
> ib_addr ib_ipoib ib_cm ib_sa ib_uverbs ib_umad cpufreq_conservative
> cpufreq_userspace cpufreq_powersave acpi_cpufreq mlx4_ib ib_mad ib_core mperf
> loop dm_mod mlx4_core sr_mod i2c_i801 pcspkr sg cdrom intel_agp intel_gtt
> serio_raw i2c_core button uhci_hcd ehci_hcd usbcore rtc_cmos sd_mod crc_t10dif
> rtc_core rtc_lib edd ext3 mbcache jbd fan ata_generic ata_piix pata_marvell
> ahci libahci libata scsi_mod thermal processor thermal_sys hwmon [last
> unloaded: target_core_mod]
> 
> Pid: 7324, comm: tcm_cl_0 Not tainted 2.6.38-rc5-dbg+ #2 System manufacturer
> P5Q DELUXE/P5Q DELUXE
> RIP: 0010:[<ffffffff811f4bf0>]  [<ffffffff811f4bf0>] list_del+0x30/0x110
> RSP: 0018:ffff8800446b3e20  EFLAGS: 00010883
> RAX: dead000000200200 RBX: ffff880044760130 RCX: 6b6b6b6b6b6b6b6b
> RDX: ffff880044760130 RSI: ffff88006a1266e8 RDI: ffff880044760130
> RBP: ffff8800446b3e30 R08: 0000000000000000 R09: 0000000000000001
> R10: 0000000000000000 R11: 0000000000000000 R12: dead000000100100
> R13: ffff88006a1266d0 R14: ffff88006a1267c0 R15: ffff88006a1266d0
> FS:  0000000000000000(0000) GS:ffff88007ce00000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> CR2: 00007faf05a335e0 CR3: 000000003bdc8000 CR4: 00000000000006f0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Process tcm_cl_0 (pid: 7324, threadinfo ffff8800446b2000, task
> ffff88007180dfa0)
> Stack:
>  ffff8800447600c0 ffff88006a126670 ffff8800446b3e80 ffffffffa034a5a1
>  00000000001d2180 0000000000000286 ffff8800767d9320 ffff88006a126670
>  ffff8800446b3ec0 ffff88006a126670 ffffffffa034a820 0000000000000000
> Call Trace:
>  [<ffffffffa034a5a1>] __transport_clear_lun_from_sessions+0x61/0x2e0
> [target_core_mod]
>  [<ffffffffa034a820>] ? transport_clear_lun_thread+0x0/0x30 [target_core_mod]
>  [<ffffffffa034a836>] transport_clear_lun_thread+0x16/0x30 [target_core_mod]
>  [<ffffffffa034a820>] ? transport_clear_lun_thread+0x0/0x30 [target_core_mod]
>  [<ffffffff81070256>] kthread+0xa6/0xb0
>  [<ffffffff81003d54>] kernel_thread_helper+0x4/0x10
>  [<ffffffff8103e170>] ? finish_task_switch+0x80/0xe0
>  [<ffffffff813cb97b>] ? _raw_spin_unlock_irq+0x3b/0x60
>  [<ffffffff813cbfc4>] ? restore_args+0x0/0x30
>  [<ffffffff810701b0>] ? kthread+0x0/0xb0
>  [<ffffffff81003d50>] ? kernel_thread_helper+0x0/0x10
> Code: 54 49 bc 00 01 10 00 00 00 ad de 53 48 89 fb 4c 39 27 0f 84 9a 00 00 00
> 48 8b 4b 08 48 b8 00 02 20 00 00 00 ad de 48 39 c1 74 48 <4c> 8b 01 4c 39 c3 75
> 60 48 8b 03 4c 8b 40 08 4c 39 c3 0f 85 96 
> RIP  [<ffffffff811f4bf0>] list_del+0x30/0x110
>  RSP <ffff8800446b3e20>
> ---[ end trace a00acd2001b8e456 ]---
> note: tcm_cl_0[7324] exited with preempt_count 1

Ok, this is currently happening with for normal idle shutdown after pushing
sustained I/O, yes..?

Also, this issue within __transport_clear_lun_from_sessions() may be related to
the incorrect usage of TFO->check_stop_free() usage in the mentioned previous
email, or related to demo-mode MappedLUN access.  Please verify your running
/sys/kernel/config/target/srpt layout and will look at reproducing..

Thanks,

--nab

-- 
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [Bug 29442] target: Race between target driver unregistration and LUN removal
  2011-02-19 12:28 [Bug 29442] New: target: Race between target driver unregistration and LUN removal bugzilla-daemon
  2011-02-21 11:16 ` [Bug 29442] " bugzilla-daemon
@ 2011-02-21 18:43 ` bugzilla-daemon
  2011-02-28 23:43 ` bugzilla-daemon
  2012-01-22 19:30 ` bugzilla-daemon
  3 siblings, 0 replies; 5+ messages in thread
From: bugzilla-daemon @ 2011-02-21 18:43 UTC (permalink / raw)
  To: linux-scsi

https://bugzilla.kernel.org/show_bug.cgi?id=29442





--- Comment #2 from Bart Van Assche <bart.vanassche@gmail.com>  2011-02-21 18:43:32 ---
Created an attachment (id=48622)
 --> (https://bugzilla.kernel.org/attachment.cgi?id=48622)
bash script to start and stop target core + ib_srpt

I have attached the script that has been used during the test to start and stop
the target core and ib_srpt. That will probably make more clear what was going
on. The kernel bug was triggered by reading data on the initiator
simultaneously with running "start-lio stop" on the target.

Note: before you can run this script the initiator IDs in the variable
"initiators" will have to be replaced by those of your InfiniBand setup.

-- 
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [Bug 29442] target: Race between target driver unregistration and LUN removal
  2011-02-19 12:28 [Bug 29442] New: target: Race between target driver unregistration and LUN removal bugzilla-daemon
  2011-02-21 11:16 ` [Bug 29442] " bugzilla-daemon
  2011-02-21 18:43 ` bugzilla-daemon
@ 2011-02-28 23:43 ` bugzilla-daemon
  2012-01-22 19:30 ` bugzilla-daemon
  3 siblings, 0 replies; 5+ messages in thread
From: bugzilla-daemon @ 2011-02-28 23:43 UTC (permalink / raw)
  To: linux-scsi

https://bugzilla.kernel.org/show_bug.cgi?id=29442





--- Comment #3 from nab <nab@linux-iscsi.org>  2011-02-28 23:43:09 ---
>From Bart's feedback the following commit in upstream lio-core-2.6.git appears
to resolve the issue using the latest lio-4.1 (v4.1.0-rc1) development code:

target: Fix t_transport_aborted handling in LUN_RESET + active I/O shutdown
http://git.kernel.org/?p=linux/kernel/git/nab/lio-core-2.6.git;a=commitdiff;h=90c9861627c2b5ab4a79fabf34ffb0c7f59fb34b

This patch has been queued for mainline .38 here:

http://git.kernel.org/?p=linux/kernel/git/jejb/scsi-rc-fixes-2.6.git;a=commitdiff;h=52208ae3fc60cbcb214c10fb8b82304199e2cc3a

-- 
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [Bug 29442] target: Race between target driver unregistration and LUN removal
  2011-02-19 12:28 [Bug 29442] New: target: Race between target driver unregistration and LUN removal bugzilla-daemon
                   ` (2 preceding siblings ...)
  2011-02-28 23:43 ` bugzilla-daemon
@ 2012-01-22 19:30 ` bugzilla-daemon
  3 siblings, 0 replies; 5+ messages in thread
From: bugzilla-daemon @ 2012-01-22 19:30 UTC (permalink / raw)
  To: linux-scsi

https://bugzilla.kernel.org/show_bug.cgi?id=29442


Bart Van Assche <bvanassche@acm.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |CODE_FIX




-- 
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2012-01-22 20:11 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-02-19 12:28 [Bug 29442] New: target: Race between target driver unregistration and LUN removal bugzilla-daemon
2011-02-21 11:16 ` [Bug 29442] " bugzilla-daemon
2011-02-21 18:43 ` bugzilla-daemon
2011-02-28 23:43 ` bugzilla-daemon
2012-01-22 19:30 ` bugzilla-daemon

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).