* Re: [PATCH] lockd: fix races in per-net NSM client handling
@ 2012-10-31 17:27 Paweł Sikora
  2012-10-31 17:49 ` Greg KH
  0 siblings, 1 reply; 9+ messages in thread
From: Paweł Sikora @ 2012-10-31 17:27 UTC (permalink / raw)
  To: skinsbursky; +Cc: stable, linux-kernel, baggins, arekm

Hi,

the patch mentioned in https://lkml.org/lkml/2012/10/24/175 seems to fix
the 3.6.3 oops (3.6.2 works fine) on a 16-core Opteron server.
Please queue this patch for 3.6.$next.

BR,
Paweł.

[173788.113576] ------------[ cut here ]------------
[173788.133439] hrtimer: interrupt took 11004406 ns
[173788.157195] kernel BUG at fs/lockd/mon.c:150!
[173788.179641] invalid opcode: 0000 [#1] SMP 
[173788.202033] Modules linked in: nfsv4 fuse nfsv3 nfs fscache nfsd auth_rpcgss nfs_acl lockd sunrpc ipmi_si ipmi_devintf ipmi_msghandler sch_sfq iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack iptable_filter xt_TCPMSS xt_tcpudp iptable_mangle ip_tables ip6table_filter ip6_tables x_tables quota_v2 quota_tree ext4 crc16 jbd2 raid10 raid0 dm_mod uvesafb autofs4 dummy ide_cd_mod cdrom ata_generic pata_acpi pata_atiixp sp5100_tco ide_pci_generic igb ptp pps_core psmouse k10temp mgag200 serio_raw dca pcspkr ttm powernow_k8 drm_kms_helper drm mperf freq_table kvm_amd evdev joydev i2c_piix4 kvm i2c_algo_bit hid_generic syscopyarea sysfillrect sysimgblt hwmon microcode atiixp amd64_edac_mod edac_core i2c_core ide_core processor edac_mce_amd button ext3 mbcache jbd sd_mod crc_t10dif raid1 md_mod
[173788.378811]  ahci libahci libata scsi_mod usbhid hid ohci_hcd ehci_hcd usbcore usb_common
[173788.416270] CPU 2 
[173788.416648] Pid: 1383, comm: lockd Not tainted 3.6.3 #11 Supermicro H8DGU/H8DGU
[173788.493500] RIP: 0010:[<ffffffffa04e64c0>]  [<ffffffffa04e64c0>] nsm_mon_unmon+0x90/0xa0 [lockd]
[173788.529520] RSP: 0000:ffff8808093cdd00  EFLAGS: 00010246
[173788.565141] RAX: ffff8808093cdd28 RBX: ffff880ba2353200 RCX: 0000000000000000
[173788.601765] RDX: ffff8808093cdd68 RSI: 0000000000000002 RDI: ffff880ba2353200
[173788.638672] RBP: ffff8808093cdd50 R08: 00000000000168a0 R09: 000000000000ffff
[173788.675546] R10: 0000000000000000 R11: 0000000000000000 R12: ffff880407db6c00
[173788.712500] R13: 0000000000000000 R14: ffff8808093cde28 R15: ffff8808093cde20
[173788.749767] FS:  00007f105fe73780(0000) GS:ffff88040fc80000(0000) knlGS:00000000f6663700
[173788.788015] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[173788.826367] CR2: 0000000000bce580 CR3: 000000044b252000 CR4: 00000000000007e0
[173788.865560] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[173788.904753] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[173788.943652] Process lockd (pid: 1383, threadinfo ffff8808093cc000, task ffff880808db3020)
[173788.983327] Stack:
[173789.022719]  ffff8808093cdd60 ffffffffa04ae9e4 ffff8808093cdd28 ffff8808093cdd68
[173789.063923]  0000000000000000 ffff880ba23532b1 00000003000186b5 0000000400000010
[173789.105657]  ffff880ba23532c1 000000000000008c ffff8808093cdd90 ffffffffa04e6821
[173789.148126] Call Trace:
[173789.190527]  [<ffffffffa04ae9e4>] ? sunrpc_cache_lookup+0x74/0x2f0 [sunrpc]
[173789.233864]  [<ffffffffa04e6821>] nsm_monitor+0xd1/0x1b0 [lockd]
[173789.277890]  [<ffffffffa04e8d18>] nlm4svc_retrieve_args+0xa8/0xf0 [lockd]
[173789.322014]  [<ffffffffa04e90c2>] nlm4svc_proc_lock+0x52/0xe0 [lockd]
[173789.366333]  [<ffffffffa04e86c9>] ? nlm4svc_decode_lockargs+0x49/0xc0 [lockd]
[173789.411109]  [<ffffffffa04a48d7>] svc_process+0x707/0x7a0 [sunrpc]
[173789.456179]  [<ffffffffa04e3825>] lockd+0xa5/0x1b0 [lockd]
[173789.500017]  [<ffffffffa04e3780>] ? set_grace_period+0xa0/0xa0 [lockd]
[173789.543446]  [<ffffffff810726ce>] kthread+0x8e/0xa0
[173789.585890]  [<ffffffff814af784>] kernel_thread_helper+0x4/0x10
[173789.628042]  [<ffffffff81072640>] ? kthread_freezable_should_stop+0x70/0x70
[173789.670892]  [<ffffffff814af780>] ? gs_change+0x13/0x13
[173789.713913] Code: 00 00 00 48 c1 e6 06 ba 00 04 00 00 48 29 c6 48 03 71 38 48 89 75 b8 48 8d 75 b8 e8 1b 3c fb ff 31 d2 85 c0 0f 4e d0 c9 89 d0 c3 <0f> 0b 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 fe b9 
[173789.806212] RIP  [<ffffffffa04e64c0>] nsm_mon_unmon+0x90/0xa0 [lockd]
[173789.851690]  RSP <ffff8808093cdd00>
[173789.897665] ---[ end trace c8774e11cc39ecc3 ]---


* Re: Heads-up: 3.6.2 / 3.6.3 NFS server oops: 3.6.2+ regression? (also an unrelated ext4 data loss bug)
@ 2012-10-23 19:49 Nix
  2012-10-24 10:18 ` [PATCH] lockd: fix races in per-net NSM client handling Stanislav Kinsbursky
  0 siblings, 1 reply; 9+ messages in thread
From: Nix @ 2012-10-23 19:49 UTC (permalink / raw)
  To: Myklebust, Trond
  Cc: J. Bruce Fields, Ted Ts'o, linux-kernel@vger.kernel.org,
	Schumaker, Bryan, Peng Tao, gregkh@linuxfoundation.org,
	linux-nfs@vger.kernel.org, Stanislav Kinsbursky

On 23 Oct 2012, Trond Myklebust outgrape:

> On Tue, 2012-10-23 at 13:57 -0400, Trond Myklebust wrote:
>> On Tue, 2012-10-23 at 17:44 +0000, Myklebust, Trond wrote:
>> > You can't hold a spinlock while sleeping. Both mutex_lock() and nsm_create() can definitely sleep.
>> > 
>> > The correct way to do this is to grab the spinlock and recheck the value of ln->nsm_users inside the 'if (!IS_ERR())' condition. If it is still zero, bump it and set ln->nsm_clnt, otherwise bump it, get the existing ln->nsm_clnt and call rpc_shutdown_clnt() on the redundant nsm client after dropping the spinlock.
>> > 
>> > Cheers
>> >   Trond
>> 
>> Can you please check if the following patch fixes the issue?
>> 
>> Cheers
>>   Trond
>> 
> Meh... This one gets rid of the 100% redundant mutex...

No help, I'm afraid:

[  894.005699] ------------[ cut here ]------------
[  894.005929] kernel BUG at fs/lockd/mon.c:159!
[  894.006156] invalid opcode: 0000 [#1] SMP
[  894.006451] Modules linked in: firewire_ohci firewire_core [last unloaded: microcode]
[  894.007005] CPU 1
[  894.007050] Pid: 1035, comm: lockd Not tainted 3.6.3-dirty #1 empty empty/S7010
[  894.007669] RIP: 0010:[<ffffffff8120fbbc>]  [<ffffffff8120fbbc>] nsm_mon_unmon+0x64/0x98
[  894.008126] RSP: 0018:ffff880620a23ce0  EFLAGS: 00010246
[  894.008355] RAX: ffff880620a23ce8 RBX: 0000000000000000 RCX: 0000000000000000
[  894.008591] RDX: ffff880620a23d58 RSI: 0000000000000002 RDI: ffff880620a23d30
[  894.008827] RBP: ffff880620a23d40 R08: 0000000000000000 R09: ffffea00188e4f00
[  894.009063] R10: ffffffff814d032f R11: 0000000000000020 R12: 0000000000000000
[  894.009300] R13: ffff88061f067e40 R14: ffff88061f067ee8 R15: ffff88062393dc00
[  894.009537] FS:  0000000000000000(0000) GS:ffff88063fc40000(0000) knlGS:0000000000000000
[  894.009956] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  894.010187] CR2: 00007f056a9a6ff0 CR3: 0000000001a0b000 CR4: 00000000000027e0
[  894.010422] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  894.010659] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  894.010896] Process lockd (pid: 1035, threadinfo ffff880620a22000, task ffff8806208b5900)
[  894.011310] Stack:
[  894.011528]  0000000000000010 ffff8806102d3db1 00000003000186b5 ffffffff00000010
[  894.012083]  ffff8806102d3dc1 000000000000008c 0000000000000000 ffff880620a23ce8
[  894.012637]  ffff880620a23d58 0000000000000000 ffff88061f067ee8 ffff8806102d3d00
[  894.013190] Call Trace:
[  894.013413]  [<ffffffff8120ff07>] nsm_monitor+0x123/0x17e
[  894.013645]  [<ffffffff81211b72>] nlm4svc_retrieve_args+0x62/0xd7
[  894.013879]  [<ffffffff81211f71>] nlm4svc_proc_lock+0x3c/0xb5
[  894.014112]  [<ffffffff812116a3>] ? nlm4svc_decode_lockargs+0x47/0xb2
[  894.014349]  [<ffffffff814d89fa>] svc_process+0x3bf/0x6a1
[  894.014581]  [<ffffffff8120d5f0>] lockd+0x127/0x164
[  894.014810]  [<ffffffff8120d4c9>] ? set_grace_period+0x8a/0x8a
[  894.015046]  [<ffffffff8107bcbc>] kthread+0x8b/0x93
[  894.015277]  [<ffffffff81501334>] kernel_thread_helper+0x4/0x10
[  894.015511]  [<ffffffff8107bc31>] ? kthread_worker_fn+0xe1/0xe1
[  894.015744]  [<ffffffff81501330>] ? gs_change+0xb/0xb
[  894.015972] Code: b8 10 00 00 00 48 89 45 c0 48 8d 81 8c 00 00 00 b9 08 00 00 00 48 89 45 c8 89 d8 f3 ab 48 8d 45 a8 48 89 55 e0 48 89 45 d8 75 02 <0f> 0b 89 f6 48 c7 02 00 00 00 00 4c 89 c7 48 6b f6 38 ba 00 04
[  894.018895] RIP  [<ffffffff8120fbbc>] nsm_mon_unmon+0x64/0x98
[  894.019163]  RSP <ffff880620a23ce0>
[  894.019401] ---[ end trace b8ef5cb81bec72c8 ]---

Slightly different timing, but still boom.
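
For reference, the recheck-under-lock pattern described in the quoted
suggestion above can be sketched in userspace C roughly as follows. This is
only an illustration under stated assumptions, not the patch that was posted:
a pthread mutex stands in for the kernel spinlock, and struct
lockd_net_sketch, nsm_create_stub(), rpc_shutdown_stub() and
nsm_client_get_sketch() are hypothetical names invented for this example.
The initial check before creating the client is also an assumption.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-ins; these are NOT the real lockd/sunrpc structures. */
struct rpc_clnt_sketch { int id; };

struct lockd_net_sketch {
	pthread_mutex_t nsm_clnt_lock;	/* plays the role of the spinlock */
	unsigned int nsm_users;
	struct rpc_clnt_sketch *nsm_clnt;
};

static int clients_created;
static int clients_destroyed;

/* Stands in for nsm_create(): may sleep, so never call it with the lock held. */
static struct rpc_clnt_sketch *nsm_create_stub(void)
{
	struct rpc_clnt_sketch *c = malloc(sizeof(*c));

	if (c)
		c->id = ++clients_created;
	return c;
}

/* Stands in for rpc_shutdown_client(): also called without the lock held. */
static void rpc_shutdown_stub(struct rpc_clnt_sketch *c)
{
	clients_destroyed++;
	free(c);
}

static struct rpc_clnt_sketch *nsm_client_get_sketch(struct lockd_net_sketch *ln)
{
	struct rpc_clnt_sketch *clnt, *redundant = NULL;

	/* Assumed first check: reuse the existing client if there is one. */
	pthread_mutex_lock(&ln->nsm_clnt_lock);
	if (ln->nsm_users) {
		ln->nsm_users++;
		clnt = ln->nsm_clnt;
		pthread_mutex_unlock(&ln->nsm_clnt_lock);
		return clnt;
	}
	pthread_mutex_unlock(&ln->nsm_clnt_lock);

	/* Create the client with no lock held, since creation can sleep. */
	clnt = nsm_create_stub();
	if (!clnt)
		return NULL;

	/* Recheck under the lock: another thread may have won the race. */
	pthread_mutex_lock(&ln->nsm_clnt_lock);
	if (ln->nsm_users == 0) {
		ln->nsm_clnt = clnt;		/* still zero: publish ours */
	} else {
		assert(ln->nsm_clnt != NULL);	/* users > 0 implies a client */
		redundant = clnt;		/* lost the race: use theirs */
		clnt = ln->nsm_clnt;
	}
	ln->nsm_users++;
	pthread_mutex_unlock(&ln->nsm_clnt_lock);

	/* Tear down the redundant client only after dropping the lock. */
	if (redundant)
		rpc_shutdown_stub(redundant);
	return clnt;
}
```

The point of the sketch is the ordering: nothing that can sleep runs while
the lock is held, and the user count and client pointer are only ever changed
together under the lock.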

