From: Klemens Senn <klemens.senn@ims.co.at>
To: linux-rdma@vger.kernel.org
Cc: linux-nfs@vger.kernel.org
Subject: Re: Soft lockup in unloading kernel modules
Date: Tue, 13 May 2014 18:48:49 +0200
Message-ID: <53724CF1.8030509@ims.co.at>
In-Reply-To: <536BA9DE.2060702@gmail.com>

Hi Anna,

Today I retried unloading the kernel modules with your updated kernel.
In addition, I tried the nfsd-next kernel from J. Bruce Fields and
Chuck's nfs-rdma-client kernel.

In short: none of these kernels was able to unload the modules while a
connection was active.

In detail:

With your kernel I got the following three faults:
  o BUG: soft lockup - CPU#0 stuck for 22s! [modprobe:4615]
  o BUG: unable to handle kernel NULL pointer dereference at 0000000000000003
  o BUG: unable to handle kernel paging request at 0000000000005b8c

With the nfsd-next kernel I got the following results:
  o BUG: soft lockup - CPU#0 stuck for 23s! [modprobe:4452]
  o module unloading blocks forever, dmesg shows:
    nfsd: last server has exited, flushing export cache
    waiting module removal not supported: please upgrade
  o Kernel keeps running but reports the following:
    nfsd: last server has exited, flushing export cache
    waiting module removal not supported: please upgrade
    svc_xprt_enqueue: threads and transports both waiting??
    INFO: task modprobe:4510 blocked for more than 480 seconds.
          Not tainted 3.15.0-rc1-bfields-master+ #1
    "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    modprobe        D ffff88087fc13440     0  4510   4458 0x00000000
     ffff88105bb23c58 0000000000000086 ffff88105c14e690 0000000000013440
     ffff88105bb23fd8 0000000000013440 ffffffff81a14480 ffff88105c14e690
     0000000000000037 ffff88085d7f74d8 ffff88085d7f74e0 7fffffffffffffff
    Call Trace:
     [<ffffffff815a2424>] schedule+0x24/0x70
     [<ffffffff815a18cc>] schedule_timeout+0x1ec/0x260
     [<ffffffff8159a504>] ? printk+0x5c/0x5e
     [<ffffffff815a3406>] wait_for_completion+0x96/0x100
     [<ffffffff81080c90>] ? try_to_wake_up+0x2b0/0x2b0
     [<ffffffffa0314039>] cma_remove_one+0x1a9/0x220 [rdma_cm]
     [<ffffffffa01fea86>] ib_unregister_device+0x46/0x120 [ib_core]
     [<ffffffffa02c5dc9>] mlx4_ib_remove+0x29/0x260 [mlx4_ib]
     [<ffffffffa04319d0>] mlx4_remove_device+0xa0/0xc0 [mlx4_core]
     [<ffffffffa0431a2b>] mlx4_unregister_interface+0x3b/0xa0 [mlx4_core]
     [<ffffffffa02d74cc>] mlx4_ib_cleanup+0x10/0x23 [mlx4_ib]
     [<ffffffff810bd612>] SyS_delete_module+0x152/0x220
     [<ffffffff81149684>] ? vm_munmap+0x54/0x70
     [<ffffffff815ad5a6>] system_call_fastpath+0x1a/0x1f
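
To make traces like the one above easier to compare, here is a minimal
Python sketch that splits each frame into address, symbol, offset, size
and module. The names TRACE, FRAME_RE and parse_frames are illustrative
only and do not come from any kernel tool. Frames marked with "?" are
addresses found on the stack that the unwinder could not confirm as part
of the call chain, so the sketch keeps them separate.

```python
import re

# Frames copied from the hung-task trace above
# (kernel 3.15.0-rc1-bfields-master+).
TRACE = """\
 [<ffffffff815a2424>] schedule+0x24/0x70
 [<ffffffff815a18cc>] schedule_timeout+0x1ec/0x260
 [<ffffffff8159a504>] ? printk+0x5c/0x5e
 [<ffffffff815a3406>] wait_for_completion+0x96/0x100
 [<ffffffff81080c90>] ? try_to_wake_up+0x2b0/0x2b0
 [<ffffffffa0314039>] cma_remove_one+0x1a9/0x220 [rdma_cm]
 [<ffffffffa01fea86>] ib_unregister_device+0x46/0x120 [ib_core]
 [<ffffffffa02c5dc9>] mlx4_ib_remove+0x29/0x260 [mlx4_ib]
 [<ffffffffa04319d0>] mlx4_remove_device+0xa0/0xc0 [mlx4_core]
 [<ffffffffa0431a2b>] mlx4_unregister_interface+0x3b/0xa0 [mlx4_core]
 [<ffffffffa02d74cc>] mlx4_ib_cleanup+0x10/0x23 [mlx4_ib]
 [<ffffffff810bd612>] SyS_delete_module+0x152/0x220
 [<ffffffff81149684>] ? vm_munmap+0x54/0x70
 [<ffffffff815ad5a6>] system_call_fastpath+0x1a/0x1f
"""

# One frame: [<addr>] [? ]symbol+0xoffset/0xsize [module]
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+"
    r"(?P<guess>\?\s+)?"
    r"(?P<sym>[A-Za-z_][\w.]*)\+0x(?P<off>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
    r"(?:\s+\[(?P<mod>\w+)\])?"
)


def parse_frames(text):
    """Return one dict per stack frame found in `text`."""
    frames = []
    for m in FRAME_RE.finditer(text):
        addr = int(m.group("addr"), 16)
        off = int(m.group("off"), 16)
        frames.append({
            "addr": addr,
            "sym": m.group("sym"),
            "off": off,
            "size": int(m.group("size"), 16),
            "mod": m.group("mod"),            # None for built-in code
            "reliable": m.group("guess") is None,
            "sym_start": addr - off,          # start address of the function
        })
    return frames


frames = parse_frames(TRACE)
reliable = [f for f in frames if f["reliable"]]

# First confirmed frame that lives in a loadable module.
first_mod = next(f for f in reliable if f["mod"])

for f in reliable:
    print(f"{f['sym']}+{f['off']:#x} [{f['mod'] or 'kernel'}]")
```

In this trace the first confirmed module frame is cma_remove_one in
rdma_cm, called just below wait_for_completion.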

With the nfs-rdma-client kernel I got the following results:
  o module unloading blocks forever, dmesg shows:
    nfsd: last server has exited, flushing export cache
    svc_xprt_enqueue: threads and transports both waiting??
  o BUG: unable to handle kernel paging request at 0000000000004dec
    IP: [<ffffffff815a63b5>] _raw_spin_lock_bh+0x15/0x40
    PGD 107ba9a067 PUD 105c093067 PMD 0
    Oops: 0002 [#1] SMP
    Modules linked in: nfsd nfs_acl auth_rpcgss oid_registry svcrdma
dm_mod cpuid nfs fscache lockd sunrpc af_packet 8021q garp stp llc
rdma_ucm ib_ucm rdma_cm iw_cm ib_ipoib ib_cm ib_uverbs ib_umad mlx4_en
mlx4_ib(-) ib_sa ib_mad ib_core ib_addr sr_mod cdrom usb_storage joydev
mlx4_core usbhid x86_pkg_temp_thermal coretemp kvm_intel kvm
ghash_clmulni_intel aesni_intel ablk_helper cryptd lrw gf128mul
glue_helper ehci_pci aes_x86_64 ehci_hcd isci iTCO_wdt libsas pcspkr
iTCO_vendor_support igb i2c_algo_bit sb_edac lpc_ich edac_core ioatdma
usbcore tpm_tis ptp microcode i2c_i801 sg mfd_core scsi_transport_sas
ipmi_si usb_common tpm wmi pps_core dca ipmi_msghandler acpi_cpufreq
button edd autofs4 xfs libcrc32c crc32c_intel processor thermal_sys
scsi_dh_rdac scsi_dh_hp_sw scsi_dh_emc scsi_dh_alua scsi_dh
    CPU: 14 PID: 4813 Comm: modprobe Not tainted 3.15.0-rc5-cel-nfs-rdma-client-unpatched+ #2
    Hardware name: Supermicro B9DRG-E/B9DRG-E, BIOS 3.0 09/04/2013
    task: ffff88085bf96190 ti: ffff88085d42a000 task.ti: ffff88085d42a000
    RIP: 0010:[<ffffffff815a63b5>]  [<ffffffff815a63b5>] _raw_spin_lock_bh+0x15/0x40
    RSP: 0018:ffff88085d42bd18  EFLAGS: 00010286
    RAX: 0000000000010000 RBX: 0000000000004de8 RCX: 0000000000000000
    RDX: 000000000000000b RSI: 000000000000000e RDI: 0000000000004dec
    RBP: ffff88085d42bd18 R08: ffff88087c611f38 R09: 000000000000a140
    R10: 000000000000002b R11: 0000000000000000 R12: ffff88085dcc3c00
    R13: ffff88105ca13280 R14: 0000000000004dec R15: 0000000000004df0
    FS:  00007f0e49fb5700(0000) GS:ffff88107fcc0000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 0000000000004dec CR3: 000000105b027000 CR4: 00000000000407e0
    Stack:
     ffff88085d42bd58 ffffffffa03bd9f0 0000000001328b88 ffff88085dcc3c00
     ffff88085dce8000 ffff88105ca13280 ffff88085dce8260 ffff88085dce81c8
     ffff88085d42bd78 ffffffffa0441ce9 ffff88085dce8000 ffff88105ca13240
    Call Trace:
     [<ffffffffa03bd9f0>] svc_xprt_enqueue+0x50/0x220 [sunrpc]
     [<ffffffffa0441ce9>] rdma_cma_handler+0x69/0x180 [svcrdma]
     [<ffffffffa031a086>] cma_remove_one+0x1f6/0x220 [rdma_cm]
     [<ffffffffa0261a86>] ib_unregister_device+0x46/0x120 [ib_core]
     [<ffffffffa02b9dc9>] mlx4_ib_remove+0x29/0x260 [mlx4_ib]
     [<ffffffffa02329d0>] mlx4_remove_device+0xa0/0xc0 [mlx4_core]
     [<ffffffffa0232a2b>] mlx4_unregister_interface+0x3b/0xa0 [mlx4_core]
     [<ffffffffa02cb4cc>] mlx4_ib_cleanup+0x10/0x23 [mlx4_ib]
     [<ffffffff810bd6f0>] SyS_delete_module+0x170/0x1f0
     [<ffffffff811497f4>] ? vm_munmap+0x54/0x70
     [<ffffffff815ae426>] system_call_fastpath+0x1a/0x1f
    Code: 5d c3 0f b7 17 66 39 ca 74 f6 f3 90 0f b7 17 66 39 d1 75 f6 5d
c3 55 65 81 04 25 20 b9 00 00 00 02 00 00 48 89 e5 b8 00 00 01 00 <f0>
0f c1 07 89 c2 c1 ea 10 66 39 c2 75 04 5d c3 f3 90 0f b7 07
    RIP  [<ffffffff815a63b5>] _raw_spin_lock_bh+0x15/0x40
     RSP <ffff88085d42bd18>
    CR2: 0000000000004dec
    ---[ end trace bf1fd548a33cbfc4 ]---
    Kernel panic - not syncing: Fatal exception in interrupt
    Kernel Offset: 0x0 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffff9fffffff)
    ---[ end Kernel panic - not syncing: Fatal exception in interrupt
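
As a reading aid for the register dumps, the following Python sketch
compares the faulting address with the registers in the oops above and
in the oops from the first report (quoted below). The marked bytes
"<f0> 0f c1 07" in the Code: line decode to "lock xadd %eax,(%rdi)", so
RDI holds the address of the lock being taken, which matches CR2 in both
dumps. Treating RBX as the base pointer of the structure that contains
the lock is an assumption made for illustration; the traces do not prove
it. The names OOPSES and lock_offset are hypothetical.

```python
MASK64 = (1 << 64) - 1

# Register values copied from the two oops reports in this thread.
OOPSES = {
    "3.15.0-rc2-anna-nfs-rdma+": {
        "RBX": 0xFFFFFFFFFFFFFFFF, "RDI": 0x3, "CR2": 0x3,
    },
    "3.15.0-rc5-cel-nfs-rdma-client-unpatched+": {
        "RBX": 0x4DE8, "RDI": 0x4DEC, "CR2": 0x4DEC,
    },
}


def lock_offset(regs):
    """Distance from RBX to RDI, computed modulo 2**64 like the CPU does."""
    return (regs["RDI"] - regs["RBX"]) & MASK64


for kernel, regs in OOPSES.items():
    # CR2 is the address that faulted; RDI is the operand of the xadd.
    same = regs["CR2"] == regs["RDI"]
    print(f"{kernel}: CR2 == RDI: {same}, RDI - RBX = {lock_offset(regs):#x}")
```

In both dumps RDI - RBX is 4 while RBX itself is not a valid kernel
pointer (all ones in one case, 0x4de8 in the other). Under the assumption
above, that would be consistent with the lock address being derived from
a stale or uninitialized pointer, but this is a hint, not a diagnosis.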


Regards,
Klemens


On 05/08/2014 05:59 PM, Anna Schumaker wrote:
> I haven't applied Chuck's recent (v3) patches to that kernel yet (I've been waiting to see if people have comments).  I'll try to push something out today.
> 
> On 05/08/2014 10:28 AM, Senn Klemens wrote:
>> Hi,
>>
>> I am getting a soft lockup on the NFS server on its reboot if at least
>> one client mount is established. I am using OpenSUSE 12.3 with the
>> nfs-rdma kernel from Anna Schumaker
>> (git://git.linux-nfs.org/projects/anna/nfs-rdma.git).
>>
>> The export on the server side is done with
>> /data	*(fsid=0,crossmnt,rw,mp,no_root_squash,sync,no_subtree_check,insecure)
>>
>> The following command is used to mount the NFSv4 share:
>> mount -t nfs -o port=20049,rdma,vers=4.0,timeo=900 172.16.100.19:/ /mnt
>>
>> The HCA is a Mellanox MT4099 on the server and the client.
>>
>> The soft lockup can be reproduced with the following steps:
>>   o server: Start the nfs server
>>   o client: Mount the share
>>   o client: Do a "ls" in the mounted directory
>>   o server: Stop the nfs server
>>   o server: Unload the nfs and mlx4 modules or reboot the server (I used
>> the openibd init script from the Mellanox driver without having the
>> Mellanox stack installed)
>>
>> The server reports a soft lockup
>>   BUG: soft lockup - CPU#0 stuck for 22s! [modprobe:6146]
>> most times.
>>
>> Sometimes I get the following kernel panic:
>> BUG: unable to handle kernel NULL pointer dereference at 0000000000000003
>> IP: [<ffffffff815a5c35>] _raw_spin_lock_bh+0x15/0x40
>> PGD 82a820067 PUD 857832067 PMD 0
>> Oops: 0002 [#1] SMP
>> Modules linked in: nfsd nfs_acl auth_rpcgss oid_registry nfnetlink_log
>> nfnetlink bluetooth rfkill nfsv4 svcrdma dm_mod cpuid nfs fscache lockd
>> sunrpc af_packet 8021q garp stp llc rdma_ucm ib_ucm rdma_cm iw_cm
>> ib_ipoib ib_cm ib_uverbs ib_umad mlx4_en mlx4_ib(-) ib_sa ib_mad ib_core
>> ib_addr sr_mod cdrom usb_storage joydev mlx4_core usbhid
>> x86_pkg_temp_thermal coretemp kvm_intel kvm ghash_clmulni_intel
>> aesni_intel ablk_helper cryptd iTCO_wdt lrw igb gf128mul
>> iTCO_vendor_support ehci_pci glue_helper pcspkr i2c_algo_bit isci
>> ehci_hcd aes_x86_64 ptp libsas ioatdma lpc_ich microcode sb_edac sg
>> pps_core usbcore ipmi_si tpm_tis edac_core scsi_transport_sas i2c_i801
>> mfd_core dca usb_common tpm ipmi_msghandler wmi acpi_cpufreq button edd
>> autofs4 xfs libcrc32c crc32c_intel processor thermal_sys scsi_dh_rdac
>> scsi_dh_hp_sw scsi_dh_emc scsi_dh_alua scsi_dh [last unloaded: oid_registry]
>> CPU: 0 PID: 6603 Comm: modprobe Not tainted 3.15.0-rc2-anna-nfs-rdma+ #3
>> Hardware name: Supermicro B9DRG-E/B9DRG-E, BIOS 3.0 09/04/2013
>> task: ffff88105b8c6050 ti: ffff88105d814000 task.ti: ffff88105d814000
>> RIP: 0010:[<ffffffff815a5c35>]  [<ffffffff815a5c35>]
>> _raw_spin_lock_bh+0x15/0x40
>> RSP: 0018:ffff88105d815d18  EFLAGS: 00010286
>> RAX: 0000000000010000 RBX: ffffffffffffffff RCX: 0000000000000000
>> RDX: 000000000000000b RSI: 0000000000000000 RDI: 0000000000000003
>> RBP: ffff88105d815d18 R08: ffff88087c611f38 R09: 0000000000000001
>> R10: 0000000000000000 R11: 0000000000000000 R12: ffff88087c3c9800
>> R13: ffff88107b82ab00 R14: 0000000000000003 R15: 0000000000000007
>> FS:  00007fef64612700(0000) GS:ffff88087fc00000(0000) knlGS:0000000000000000
>> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> CR2: 0000000000000003 CR3: 000000087c2c7000 CR4: 00000000000407f0
>> Stack:
>>  ffff88105d815d58 ffffffffa05199f0 ffff88105d815d88 ffff88087c3c9800
>>  ffff88087c3c9400 ffff88107b82ab00 ffff88087c3c9660 ffff88087c3c95c8
>>  ffff88105d815d78 ffffffffa0421ce9 ffff88087c3c9400 ffff88107b82aac0
>> Call Trace:
>>  [<ffffffffa05199f0>] svc_xprt_enqueue+0x50/0x220 [sunrpc]
>>  [<ffffffffa0421ce9>] rdma_cma_handler+0x69/0x180 [svcrdma]
>>  [<ffffffffa039d086>] cma_remove_one+0x1f6/0x220 [rdma_cm]
>>  [<ffffffffa01dca86>] ib_unregister_device+0x46/0x120 [ib_core]
>>  [<ffffffffa032ddc9>] mlx4_ib_remove+0x29/0x260 [mlx4_ib]
>>  [<ffffffffa02fb9d0>] mlx4_remove_device+0xa0/0xc0 [mlx4_core]
>>  [<ffffffffa02fba2b>] mlx4_unregister_interface+0x3b/0xa0 [mlx4_core]
>>  [<ffffffffa033f4cc>] mlx4_ib_cleanup+0x10/0x23 [mlx4_ib]
>>  [<ffffffff810bd6b2>] SyS_delete_module+0x152/0x220
>>  [<ffffffff811496e4>] ? vm_munmap+0x54/0x70
>>  [<ffffffff815adca6>] system_call_fastpath+0x1a/0x1f
>> Code: 5d c3 0f b7 17 66 39 ca 74 f6 f3 90 0f b7 17 66 39 d1 75 f6 5d c3
>> 55 65 81 04 25 20 b9 00 00 00 02 00 00 48 89 e5 b8 00 00 01 00 <f0> 0f
>> c1 07 89 c2 c1 ea 10 66 39 c2 75 04 5d c3 f3 90 0f b7 07
>> RIP  [<ffffffff815a5c35>] _raw_spin_lock_bh+0x15/0x40
>>  RSP <ffff88105d815d18>
>> CR2: 0000000000000003
>> ---[ end trace 18e02ff413ac4b9b ]---
>> Kernel panic - not syncing: Fatal exception in interrupt
>> Kernel Offset: 0x0 from 0xffffffff81000000 (relocation range:
>> 0xffffffff80000000-0xffffffff9fffffff)
>> ---[ end Kernel panic - not syncing: Fatal exception in interrupt
>>
>> Kind regards,
>> Klemens



