* [PATCH] lockd: fix races in per-net NSM client handling
2012-10-23 19:49 Heads-up: 3.6.2 / 3.6.3 NFS server oops: 3.6.2+ regression? (also an unrelated ext4 data loss bug) Nix
@ 2012-10-24 10:18 ` Stanislav Kinsbursky
0 siblings, 0 replies; 9+ messages in thread
From: Stanislav Kinsbursky @ 2012-10-24 10:18 UTC (permalink / raw)
To: Trond.Myklebust; +Cc: bfields, linux-nfs, linux-kernel, devel
This patch fixes two problems:
1) Removes races on NSM client creation.
2) Fixes a typo in NSM client destruction (the usage counter was checked
for a non-zero value instead of zero).
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
---
fs/lockd/mon.c | 35 +++++++++++++++++++++++------------
1 files changed, 23 insertions(+), 12 deletions(-)
diff --git a/fs/lockd/mon.c b/fs/lockd/mon.c
index e4fb3ba..e3e59f6 100644
--- a/fs/lockd/mon.c
+++ b/fs/lockd/mon.c
@@ -85,30 +85,41 @@ static struct rpc_clnt *nsm_create(struct net *net)
 	return rpc_create(&args);
 }
-static struct rpc_clnt *nsm_client_get(struct net *net)
+static struct rpc_clnt *nsm_get_client(struct net *net)
 {
-	static DEFINE_MUTEX(nsm_create_mutex);
-	struct rpc_clnt *clnt;
+	struct rpc_clnt *clnt = NULL;
 	struct lockd_net *ln = net_generic(net, lockd_net_id);
 	spin_lock(&ln->nsm_clnt_lock);
 	if (ln->nsm_users) {
 		ln->nsm_users++;
 		clnt = ln->nsm_clnt;
-		spin_unlock(&ln->nsm_clnt_lock);
-		goto out;
 	}
 	spin_unlock(&ln->nsm_clnt_lock);
+	return clnt;
+}
+
+static struct rpc_clnt *nsm_client_get(struct net *net)
+{
+	static DEFINE_MUTEX(nsm_create_mutex);
+	struct rpc_clnt *clnt;
+	struct lockd_net *ln = net_generic(net, lockd_net_id);
+
+	clnt = nsm_get_client(net);
+	if (clnt)
+		return clnt;
 	mutex_lock(&nsm_create_mutex);
-	clnt = nsm_create(net);
-	if (!IS_ERR(clnt)) {
-		ln->nsm_clnt = clnt;
-		smp_wmb();
-		ln->nsm_users = 1;
+	clnt = nsm_get_client(net);
+	if (clnt == NULL) {
+		clnt = nsm_create(net);
+		if (!IS_ERR(clnt)) {
+			ln->nsm_clnt = clnt;
+			smp_wmb();
+			ln->nsm_users = 1;
+		}
 	}
 	mutex_unlock(&nsm_create_mutex);
-out:
 	return clnt;
 }
@@ -120,7 +131,7 @@ static void nsm_client_put(struct net *net)
 	spin_lock(&ln->nsm_clnt_lock);
 	if (ln->nsm_users) {
-		if (--ln->nsm_users)
+		if (--ln->nsm_users == 0)
 			ln->nsm_clnt = NULL;
 		shutdown = !ln->nsm_users;
 	}
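
For readers who want to try the locking pattern outside the kernel, here is a
rough user-space sketch of the two ideas in this patch: re-checking for an
existing client after taking the creation mutex, and clearing the pointer only
when the usage counter drops to zero. It is an illustration under assumptions,
not the lockd code: every demo_* name is hypothetical, pthread mutexes stand in
for both the spinlock and the kernel mutex, malloc() stands in for
nsm_create(), and the new client is published under the lock instead of with
smp_wmb().

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct rpc_clnt. */
struct demo_client {
	int unused;
};

/* Hypothetical stand-in for the per-net lockd state. */
struct demo_state {
	pthread_mutex_t lock;        /* plays the role of ln->nsm_clnt_lock */
	pthread_mutex_t create_lock; /* plays the role of nsm_create_mutex */
	struct demo_client *clnt;
	unsigned int users;
	unsigned int creations;      /* how many clients were really created */
};

static void demo_init(struct demo_state *st)
{
	pthread_mutex_init(&st->lock, NULL);
	pthread_mutex_init(&st->create_lock, NULL);
	st->clnt = NULL;
	st->users = 0;
	st->creations = 0;
}

/* Fast path: take a reference only if a client already exists. */
static struct demo_client *demo_try_get(struct demo_state *st)
{
	struct demo_client *clnt = NULL;

	pthread_mutex_lock(&st->lock);
	if (st->users) {
		st->users++;
		clnt = st->clnt;
	}
	pthread_mutex_unlock(&st->lock);
	return clnt;
}

/* Slow path: serialize creation and re-check before creating. */
static struct demo_client *demo_get(struct demo_state *st)
{
	struct demo_client *clnt = demo_try_get(st);

	if (clnt)
		return clnt;

	pthread_mutex_lock(&st->create_lock);
	/* Another thread may have created the client while we waited. */
	clnt = demo_try_get(st);
	if (clnt == NULL) {
		clnt = malloc(sizeof(*clnt));
		if (clnt) {
			pthread_mutex_lock(&st->lock);
			st->clnt = clnt;
			st->users = 1;
			st->creations++;
			pthread_mutex_unlock(&st->lock);
		}
	}
	pthread_mutex_unlock(&st->create_lock);
	return clnt;
}

/* Drop a reference; forget and free the client only at zero users. */
static void demo_put(struct demo_state *st)
{
	struct demo_client *to_free = NULL;

	pthread_mutex_lock(&st->lock);
	if (st->users) {
		if (--st->users == 0) {
			to_free = st->clnt;
			st->clnt = NULL;
		}
	}
	pthread_mutex_unlock(&st->lock);
	free(to_free); /* free(NULL) is a no-op */
}
```

With the inverted test in demo_put() (clearing the pointer while users is
still non-zero), a later demo_try_get() would hand out NULL even though the
counter says a client exists. That matches the kind of failure the second
item in the changelog describes.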
* Re: [PATCH] lockd: fix races in per-net NSM client handling
@ 2012-10-31 17:27 Paweł Sikora
2012-10-31 17:49 ` Greg KH
0 siblings, 1 reply; 9+ messages in thread
From: Paweł Sikora @ 2012-10-31 17:27 UTC (permalink / raw)
To: skinsbursky; +Cc: stable, linux-kernel, baggins, arekm
Hi,
the patch mentioned in https://lkml.org/lkml/2012/10/24/175 seems to fix
the 3.6.3 oops (while 3.6.2 works fine) on a 16-core Opteron server.
Please queue this patch for 3.6.$next.
BR,
Paweł.
[173788.113576] ------------[ cut here ]------------
[173788.133439] hrtimer: interrupt took 11004406 ns
[173788.157195] kernel BUG at fs/lockd/mon.c:150!
[173788.179641] invalid opcode: 0000 [#1] SMP
[173788.202033] Modules linked in: nfsv4 fuse nfsv3 nfs fscache nfsd auth_rpcgss nfs_acl lockd sunrpc ipmi_si ipmi_devintf ipmi_msghandler sch_sfq iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack iptable_filter xt_TCPMSS xt_tcpudp iptable_mangle ip_tables ip6table_filter ip6_tables x_tables quota_v2 quota_tree ext4 crc16 jbd2 raid10 raid0 dm_mod uvesafb autofs4 dummy ide_cd_mod cdrom ata_generic pata_acpi pata_atiixp sp5100_tco ide_pci_generic igb ptp pps_core psmouse k10temp mgag200 serio_raw dca pcspkr ttm powernow_k8 drm_kms_helper drm mperf freq_table kvm_amd evdev joydev i2c_piix4 kvm i2c_algo_bit hid_generic syscopyarea sysfillrect sysimgblt hwmon microcode atiixp amd64_edac_mod edac_core i2c_core ide_core processor edac_mce_amd button ext3 mbcache jbd sd_mod crc_t10dif raid1 md_mod
[173788.378811] ahci libahci libata scsi_mod usbhid hid ohci_hcd ehci_hcd usbcore usb_common
[173788.416270] CPU 2
[173788.416648] Pid: 1383, comm: lockd Not tainted 3.6.3 #11 Supermicro H8DGU/H8DGU
[173788.493500] RIP: 0010:[<ffffffffa04e64c0>] [<ffffffffa04e64c0>] nsm_mon_unmon+0x90/0xa0 [lockd]
[173788.529520] RSP: 0000:ffff8808093cdd00 EFLAGS: 00010246
[173788.565141] RAX: ffff8808093cdd28 RBX: ffff880ba2353200 RCX: 0000000000000000
[173788.601765] RDX: ffff8808093cdd68 RSI: 0000000000000002 RDI: ffff880ba2353200
[173788.638672] RBP: ffff8808093cdd50 R08: 00000000000168a0 R09: 000000000000ffff
[173788.675546] R10: 0000000000000000 R11: 0000000000000000 R12: ffff880407db6c00
[173788.712500] R13: 0000000000000000 R14: ffff8808093cde28 R15: ffff8808093cde20
[173788.749767] FS: 00007f105fe73780(0000) GS:ffff88040fc80000(0000) knlGS:00000000f6663700
[173788.788015] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[173788.826367] CR2: 0000000000bce580 CR3: 000000044b252000 CR4: 00000000000007e0
[173788.865560] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[173788.904753] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[173788.943652] Process lockd (pid: 1383, threadinfo ffff8808093cc000, task ffff880808db3020)
[173788.983327] Stack:
[173789.022719] ffff8808093cdd60 ffffffffa04ae9e4 ffff8808093cdd28 ffff8808093cdd68
[173789.063923] 0000000000000000 ffff880ba23532b1 00000003000186b5 0000000400000010
[173789.105657] ffff880ba23532c1 000000000000008c ffff8808093cdd90 ffffffffa04e6821
[173789.148126] Call Trace:
[173789.190527] [<ffffffffa04ae9e4>] ? sunrpc_cache_lookup+0x74/0x2f0 [sunrpc]
[173789.233864] [<ffffffffa04e6821>] nsm_monitor+0xd1/0x1b0 [lockd]
[173789.277890] [<ffffffffa04e8d18>] nlm4svc_retrieve_args+0xa8/0xf0 [lockd]
[173789.322014] [<ffffffffa04e90c2>] nlm4svc_proc_lock+0x52/0xe0 [lockd]
[173789.366333] [<ffffffffa04e86c9>] ? nlm4svc_decode_lockargs+0x49/0xc0 [lockd]
[173789.411109] [<ffffffffa04a48d7>] svc_process+0x707/0x7a0 [sunrpc]
[173789.456179] [<ffffffffa04e3825>] lockd+0xa5/0x1b0 [lockd]
[173789.500017] [<ffffffffa04e3780>] ? set_grace_period+0xa0/0xa0 [lockd]
[173789.543446] [<ffffffff810726ce>] kthread+0x8e/0xa0
[173789.585890] [<ffffffff814af784>] kernel_thread_helper+0x4/0x10
[173789.628042] [<ffffffff81072640>] ? kthread_freezable_should_stop+0x70/0x70
[173789.670892] [<ffffffff814af780>] ? gs_change+0x13/0x13
[173789.713913] Code: 00 00 00 48 c1 e6 06 ba 00 04 00 00 48 29 c6 48 03 71 38 48 89 75 b8 48 8d 75 b8 e8 1b 3c fb ff 31 d2 85 c0 0f 4e d0 c9 89 d0 c3 <0f> 0b 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 fe b9
[173789.806212] RIP [<ffffffffa04e64c0>] nsm_mon_unmon+0x90/0xa0 [lockd]
[173789.851690] RSP <ffff8808093cdd00>
[173789.897665] ---[ end trace c8774e11cc39ecc3 ]---
* Re: [PATCH] lockd: fix races in per-net NSM client handling
2012-10-31 17:27 [PATCH] lockd: fix races in per-net NSM client handling Paweł Sikora
@ 2012-10-31 17:49 ` Greg KH
2012-10-31 18:02 ` Paweł Sikora
2012-10-31 18:05 ` Jonathan Nieder
0 siblings, 2 replies; 9+ messages in thread
From: Greg KH @ 2012-10-31 17:49 UTC (permalink / raw)
To: Paweł Sikora; +Cc: skinsbursky, stable, linux-kernel, baggins, arekm
On Wed, Oct 31, 2012 at 06:27:36PM +0100, Paweł Sikora wrote:
> Hi,
>
> the patch mentioned in https://lkml.org/lkml/2012/10/24/175 seems to fix
> the 3.6.3 oops (while 3.6.2 works fine) on a 16-core Opteron server.
> Please queue this patch for 3.6.$next.
Is it in Linus's tree already? If so, what is the git commit id?
thanks,
greg k-h
* Re: [PATCH] lockd: fix races in per-net NSM client handling
2012-10-31 17:49 ` Greg KH
@ 2012-10-31 18:02 ` Paweł Sikora
2012-10-31 18:18 ` Myklebust, Trond
2012-10-31 18:05 ` Jonathan Nieder
1 sibling, 1 reply; 9+ messages in thread
From: Paweł Sikora @ 2012-10-31 18:02 UTC (permalink / raw)
To: Greg KH; +Cc: skinsbursky, stable, linux-kernel, baggins, arekm
On Wednesday 31 of October 2012 10:49:46 Greg KH wrote:
> On Wed, Oct 31, 2012 at 06:27:36PM +0100, Paweł Sikora wrote:
> > Hi,
> >
> > the patch mentioned in https://lkml.org/lkml/2012/10/24/175 seems to fix
> > the 3.6.3 oops (while 3.6.2 works fine) on a 16-core Opteron server.
> > Please queue this patch for 3.6.$next.
>
> Is it in Linus's tree already? If so, what is the git commit id?
The mainline tree already contains some lock daemon fixes:
* e498daa LOCKD: Clear ln->nsm_clnt only when ln->nsm_users is zero
* a4ee8d9 LOCKD: fix races in nsm_client_get
but I don't know which is the right fix. Stanislav, could you shed some light on this?
BR,
Paweł.
* Re: [PATCH] lockd: fix races in per-net NSM client handling
2012-10-31 17:49 ` Greg KH
2012-10-31 18:02 ` Paweł Sikora
@ 2012-10-31 18:05 ` Jonathan Nieder
2012-10-31 18:22 ` Greg KH
1 sibling, 1 reply; 9+ messages in thread
From: Jonathan Nieder @ 2012-10-31 18:05 UTC (permalink / raw)
To: Greg KH
Cc: Paweł Sikora, skinsbursky, stable, linux-kernel, baggins,
arekm
Hi,
Greg KH wrote:
> On Wed, Oct 31, 2012 at 06:27:36PM +0100, Paweł Sikora wrote:
>> the patch mentioned in https://lkml.org/lkml/2012/10/24/175 seems to fix
>> the 3.6.3 oops (while 3.6.2 works fine) on a 16-core Opteron server.
>> Please queue this patch for 3.6.$next.
>
> Is it in Linus's tree already? If so, what is the git commit id?
One of
a4ee8d978e47 LOCKD: fix races in nsm_client_get
e498daa81295 LOCKD: Clear ln->nsm_clnt only when ln->nsm_users is zero
both of which were included in v3.6.5.
Hope that helps,
Jonathan
* RE: [PATCH] lockd: fix races in per-net NSM client handling
2012-10-31 18:02 ` Paweł Sikora
@ 2012-10-31 18:18 ` Myklebust, Trond
0 siblings, 0 replies; 9+ messages in thread
From: Myklebust, Trond @ 2012-10-31 18:18 UTC (permalink / raw)
To: Paweł Sikora, Greg KH
Cc: skinsbursky@parallels.com, stable@vger.kernel.org,
linux-kernel@vger.kernel.org, baggins@pld-linux.org,
arekm@pld-linux.org
> -----Original Message-----
> From: linux-kernel-owner@vger.kernel.org [mailto:linux-kernel-
> owner@vger.kernel.org] On Behalf Of Pawel Sikora
> Sent: Wednesday, October 31, 2012 2:03 PM
> To: Greg KH
> Cc: skinsbursky@parallels.com; stable@vger.kernel.org; linux-
> kernel@vger.kernel.org; baggins@pld-linux.org; arekm@pld-linux.org
> Subject: Re: [PATCH] lockd: fix races in per-net NSM client handling
>
> On Wednesday 31 of October 2012 10:49:46 Greg KH wrote:
> > On Wed, Oct 31, 2012 at 06:27:36PM +0100, Paweł Sikora wrote:
> > > Hi,
> > >
> > > the patch mentioned in https://lkml.org/lkml/2012/10/24/175 seems to
> > > fix the 3.6.3 oops (while 3.6.2 works fine) on a 16-core Opteron server.
> > > Please queue this patch for 3.6.$next.
> >
> > Is it in Linus's tree already? If so, what is the git commit id?
>
> The mainline tree already contains some lock daemon fixes:
>
> * e498daa LOCKD: Clear ln->nsm_clnt only when ln->nsm_users is zero
> * a4ee8d9 LOCKD: fix races in nsm_client_get
>
> but I don't know which is the right fix. Stanislav, could you shed some light on
> this?
The two patches above (which are already in 3.6.5) replace Stanislav's patch, which will not be merged upstream.
Trond
* Re: [PATCH] lockd: fix races in per-net NSM client handling
2012-10-31 18:05 ` Jonathan Nieder
@ 2012-10-31 18:22 ` Greg KH
2012-11-01 6:54 ` Paweł Sikora
0 siblings, 1 reply; 9+ messages in thread
From: Greg KH @ 2012-10-31 18:22 UTC (permalink / raw)
To: Jonathan Nieder
Cc: Paweł Sikora, skinsbursky, stable, linux-kernel, baggins,
arekm
On Wed, Oct 31, 2012 at 11:05:51AM -0700, Jonathan Nieder wrote:
> Hi,
>
> Greg KH wrote:
> > On Wed, Oct 31, 2012 at 06:27:36PM +0100, Paweł Sikora wrote:
>
> >> the patch mentioned in https://lkml.org/lkml/2012/10/24/175 seems to fix
> >> the 3.6.3 oops (while 3.6.2 works fine) on a 16-core Opteron server.
> >> Please queue this patch for 3.6.$next.
> >
> > Is it in Linus's tree already? If so, what is the git commit id?
>
> One of
>
> a4ee8d978e47 LOCKD: fix races in nsm_client_get
> e498daa81295 LOCKD: Clear ln->nsm_clnt only when ln->nsm_users is zero
>
> both of which were included in v3.6.5.
Ok, Paweł, does 3.6.5 work properly for you?
thanks,
greg k-h
* Re: [PATCH] lockd: fix races in per-net NSM client handling
2012-10-31 18:22 ` Greg KH
@ 2012-11-01 6:54 ` Paweł Sikora
2012-11-01 13:14 ` Greg KH
0 siblings, 1 reply; 9+ messages in thread
From: Paweł Sikora @ 2012-11-01 6:54 UTC (permalink / raw)
To: Greg KH; +Cc: Jonathan Nieder, skinsbursky, stable, linux-kernel, baggins,
arekm
On Wednesday 31 of October 2012 11:22:06 Greg KH wrote:
> On Wed, Oct 31, 2012 at 11:05:51AM -0700, Jonathan Nieder wrote:
> > Hi,
> >
> > Greg KH wrote:
> > > On Wed, Oct 31, 2012 at 06:27:36PM +0100, Paweł Sikora wrote:
> >
> > >> the patch mentioned in https://lkml.org/lkml/2012/10/24/175 seems to fix
> > >> the 3.6.3 oops (while 3.6.2 works fine) on a 16-core Opteron server.
> > >> Please queue this patch for 3.6.$next.
> > >
> > > Is it in Linus's tree already? If so, what is the git commit id?
> >
> > One of
> >
> > a4ee8d978e47 LOCKD: fix races in nsm_client_get
> > e498daa81295 LOCKD: Clear ln->nsm_clnt only when ln->nsm_users is zero
> >
> > both of which were included in v3.6.5.
>
> Ok, Paweł, does 3.6.5 work properly for you?
~12h uptime under full CPU/NFS load, and all servers running 3.6.5 seem stable.
* Re: [PATCH] lockd: fix races in per-net NSM client handling
2012-11-01 6:54 ` Paweł Sikora
@ 2012-11-01 13:14 ` Greg KH
0 siblings, 0 replies; 9+ messages in thread
From: Greg KH @ 2012-11-01 13:14 UTC (permalink / raw)
To: Paweł Sikora
Cc: Jonathan Nieder, skinsbursky, stable, linux-kernel, baggins,
arekm
On Thu, Nov 01, 2012 at 07:54:21AM +0100, Paweł Sikora wrote:
> On Wednesday 31 of October 2012 11:22:06 Greg KH wrote:
> > On Wed, Oct 31, 2012 at 11:05:51AM -0700, Jonathan Nieder wrote:
> > > Hi,
> > >
> > > Greg KH wrote:
> > > > On Wed, Oct 31, 2012 at 06:27:36PM +0100, Paweł Sikora wrote:
> > >
> > > >> the patch mentioned in https://lkml.org/lkml/2012/10/24/175 seems to fix
> > > >> the 3.6.3 oops (while 3.6.2 works fine) on a 16-core Opteron server.
> > > >> Please queue this patch for 3.6.$next.
> > > >
> > > > Is it in Linus's tree already? If so, what is the git commit id?
> > >
> > > One of
> > >
> > > a4ee8d978e47 LOCKD: fix races in nsm_client_get
> > > e498daa81295 LOCKD: Clear ln->nsm_clnt only when ln->nsm_users is zero
> > >
> > > both of which were included in v3.6.5.
> >
> > Ok, Paweł, does 3.6.5 work properly for you?
>
> ~12h uptime under full CPU/NFS load, and all servers running 3.6.5 seem stable.
Wonderful, thanks for testing.
greg k-h