From: Nix <nix@esperi.org.uk>
To: "Myklebust, Trond" <Trond.Myklebust@netapp.com>
Cc: "J. Bruce Fields" <bfields@fieldses.org>,
"Ted Ts'o" <tytso@mit.edu>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"Schumaker, Bryan" <Bryan.Schumaker@netapp.com>,
Peng Tao <bergwolf@gmail.com>,
"gregkh@linuxfoundation.org" <gregkh@linuxfoundation.org>,
"linux-nfs@vger.kernel.org" <linux-nfs@vger.kernel.org>,
Stanislav Kinsbursky <skinsbursky@parallels.com>
Subject: Re: Heads-up: 3.6.2 / 3.6.3 NFS server oops: 3.6.2+ regression? (also an unrelated ext4 data loss bug)
Date: Tue, 23 Oct 2012 20:49:13 +0100
Message-ID: <87fw55hsue.fsf@spindle.srvr.nix>
In-Reply-To: <4FA345DA4F4AE44899BD2B03EEEC2FA90928CF49@SACEXCMBX04-PRD.hq.netapp.com> (Trond Myklebust's message of "Tue, 23 Oct 2012 18:23:48 +0000")
On 23 Oct 2012, Trond Myklebust outgrape:
> On Tue, 2012-10-23 at 13:57 -0400, Trond Myklebust wrote:
>> On Tue, 2012-10-23 at 17:44 +0000, Myklebust, Trond wrote:
>> > You can't hold a spinlock while sleeping. Both mutex_lock() and nsm_create() can definitely sleep.
>> >
>> > The correct way to do this is to grab the spinlock and recheck the value of ln->nsm_users inside the 'if (!IS_ERR())' condition. If it is still zero, bump it and set ln->nsm_clnt, otherwise bump it, get the existing ln->nsm_clnt and call rpc_shutdown_clnt() on the redundant nsm client after dropping the spinlock.
>> >
>> > Cheers
>> > Trond
>>
>> Can you please check if the following patch fixes the issue?
>>
>> Cheers
>> Trond
>>
> Meh... This one gets rid of the 100% redundant mutex...
No help, I'm afraid:
[ 894.005699] ------------[ cut here ]------------
[ 894.005929] kernel BUG at fs/lockd/mon.c:159!
[ 894.006156] invalid opcode: 0000 [#1] SMP
[ 894.006451] Modules linked in: firewire_ohci firewire_core [last unloaded: microcode]
[ 894.007005] CPU 1
[ 894.007050] Pid: 1035, comm: lockd Not tainted 3.6.3-dirty #1 empty empty/S7010
[ 894.007669] RIP: 0010:[<ffffffff8120fbbc>] [<ffffffff8120fbbc>] nsm_mon_unmon+0x64/0x98
[ 894.008126] RSP: 0018:ffff880620a23ce0 EFLAGS: 00010246
[ 894.008355] RAX: ffff880620a23ce8 RBX: 0000000000000000 RCX: 0000000000000000
[ 894.008591] RDX: ffff880620a23d58 RSI: 0000000000000002 RDI: ffff880620a23d30
[ 894.008827] RBP: ffff880620a23d40 R08: 0000000000000000 R09: ffffea00188e4f00
[ 894.009063] R10: ffffffff814d032f R11: 0000000000000020 R12: 0000000000000000
[ 894.009300] R13: ffff88061f067e40 R14: ffff88061f067ee8 R15: ffff88062393dc00
[ 894.009537] FS: 0000000000000000(0000) GS:ffff88063fc40000(0000) knlGS:0000000000000000
[ 894.009956] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 894.010187] CR2: 00007f056a9a6ff0 CR3: 0000000001a0b000 CR4: 00000000000027e0
[ 894.010422] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 894.010659] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 894.010896] Process lockd (pid: 1035, threadinfo ffff880620a22000, task ffff8806208b5900)
[ 894.011310] Stack:
[ 894.011528] 0000000000000010 ffff8806102d3db1 00000003000186b5 ffffffff00000010
[ 894.012083] ffff8806102d3dc1 000000000000008c 0000000000000000 ffff880620a23ce8
[ 894.012637] ffff880620a23d58 0000000000000000 ffff88061f067ee8 ffff8806102d3d00
[ 894.013190] Call Trace:
[ 894.013413] [<ffffffff8120ff07>] nsm_monitor+0x123/0x17e
[ 894.013645] [<ffffffff81211b72>] nlm4svc_retrieve_args+0x62/0xd7
[ 894.013879] [<ffffffff81211f71>] nlm4svc_proc_lock+0x3c/0xb5
[ 894.014112] [<ffffffff812116a3>] ? nlm4svc_decode_lockargs+0x47/0xb2
[ 894.014349] [<ffffffff814d89fa>] svc_process+0x3bf/0x6a1
[ 894.014581] [<ffffffff8120d5f0>] lockd+0x127/0x164
[ 894.014810] [<ffffffff8120d4c9>] ? set_grace_period+0x8a/0x8a
[ 894.015046] [<ffffffff8107bcbc>] kthread+0x8b/0x93
[ 894.015277] [<ffffffff81501334>] kernel_thread_helper+0x4/0x10
[ 894.015511] [<ffffffff8107bc31>] ? kthread_worker_fn+0xe1/0xe1
[ 894.015744] [<ffffffff81501330>] ? gs_change+0xb/0xb
[ 894.015972] Code: b8 10 00 00 00 48 89 45 c0 48 8d 81 8c 00 00 00 b9 08 00 00 00 48 89 45 c8 89 d8 f3 ab 48 8d 45 a8 48 89 55 e0 48 89 45 d8 75 02 <0f> 0b 89 f6 48 c7 02 00 00 00 00 4c 89 c7 48 6b f6 38 ba 00 04
[ 894.018895] RIP [<ffffffff8120fbbc>] nsm_mon_unmon+0x64/0x98
[ 894.019163] RSP <ffff880620a23ce0>
[ 894.019401] ---[ end trace b8ef5cb81bec72c8 ]---
Slightly different timing, but still boom.
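For reference, the locking pattern Trond describes in the quoted text can be sketched roughly as follows. This is a minimal userspace sketch under stated assumptions, not the actual fs/lockd/mon.c code or either of the patches discussed in this thread: a C11 atomic_flag spinlock stands in for the kernel's spinlock_t, and struct client, client_create(), client_shutdown(), get_shared_client() and put_shared_client() are hypothetical names. Only nsm_users and nsm_clnt mirror the fields named in the mail. What it shows is that anything that may sleep (creating or shutting down a client) runs with the lock dropped; under the lock, the caller only rechecks nsm_users and either installs its new client or takes the existing one.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for the kernel's spinlock_t. */
struct spinlock {
	atomic_flag flag;
};
#define SPINLOCK_INIT { ATOMIC_FLAG_INIT }

static void spin_lock(struct spinlock *l)
{
	/* Busy-wait: nothing that may block is allowed while this is held. */
	while (atomic_flag_test_and_set_explicit(&l->flag, memory_order_acquire))
		;
}

static void spin_unlock(struct spinlock *l)
{
	atomic_flag_clear_explicit(&l->flag, memory_order_release);
}

/* Hypothetical stand-in for an RPC client handle. */
struct client {
	int unused;
};

/* Mirrors the ln->nsm_users / ln->nsm_clnt pair from the quoted mail. */
struct net_state {
	struct spinlock lock;
	unsigned int nsm_users;
	struct client *nsm_clnt;
};
#define NET_STATE_INIT { SPINLOCK_INIT, 0, NULL }

/* May block (allocation, network setup): never call it under the spinlock. */
static struct client *client_create(void)
{
	return calloc(1, sizeof(struct client));
}

/* May also block, so it is likewise only called after the lock is dropped. */
static void client_shutdown(struct client *clnt)
{
	free(clnt);
}

/* Take a reference on the shared client, creating it if needed (NULL on failure). */
static struct client *get_shared_client(struct net_state *ln)
{
	struct client *clnt, *redundant = NULL;

	/* Assumed fast path: a client already exists, so just take a reference. */
	spin_lock(&ln->lock);
	if (ln->nsm_users) {
		ln->nsm_users++;
		clnt = ln->nsm_clnt;
		spin_unlock(&ln->lock);
		return clnt;
	}
	spin_unlock(&ln->lock);

	/* Slow path: create with no lock held, since creation may sleep. */
	clnt = client_create();
	if (clnt != NULL) {
		/* Recheck nsm_users under the lock: another caller may have won. */
		spin_lock(&ln->lock);
		if (ln->nsm_users == 0) {
			ln->nsm_clnt = clnt;     /* still unset: install ours */
		} else {
			redundant = clnt;        /* lost the race: use the existing one */
			clnt = ln->nsm_clnt;
		}
		ln->nsm_users++;
		spin_unlock(&ln->lock);
	}

	/* Shut down the redundant client only after dropping the spinlock. */
	if (redundant != NULL)
		client_shutdown(redundant);
	return clnt;
}

/* Drop a reference; the last user tears the client down outside the lock. */
static void put_shared_client(struct net_state *ln)
{
	struct client *victim = NULL;

	spin_lock(&ln->lock);
	assert(ln->nsm_users > 0);
	if (--ln->nsm_users == 0) {
		victim = ln->nsm_clnt;
		ln->nsm_clnt = NULL;
	}
	spin_unlock(&ln->lock);

	if (victim != NULL)
		client_shutdown(victim);
}
```

The trade-off is that two racing callers may both create a client, with the loser discarding its copy once it has unlocked; that occasional wasted work is the price of never sleeping while holding the spinlock.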