From: David Wysochanski <dwysocha@redhat.com>
To: Trond Myklebust <trondmy@hammerspace.com>
Cc: "linux-nfs@vger.kernel.org" <linux-nfs@vger.kernel.org>
Subject: Re: NFS failure with generic/074 when lockdep is enabled - BUG: Invalid wait context
Date: Wed, 25 Nov 2020 10:19:39 -0500
Message-ID: <CALF+zOmYk6Mk19mj5rfmqRuydEBZ-EecFBrUYsznS3CH41Gbdw@mail.gmail.com>
In-Reply-To: <4e7d645996294262c738a05a3c2707aa20424f79.camel@hammerspace.com>
On Tue, Nov 24, 2020 at 8:47 PM Trond Myklebust <trondmy@hammerspace.com> wrote:
>
> On Tue, 2020-11-24 at 20:43 -0500, David Wysochanski wrote:
> > On Tue, Nov 24, 2020 at 8:33 PM Trond Myklebust
> > <trondmy@hammerspace.com> wrote:
> > >
> > > On Tue, 2020-11-24 at 20:28 -0500, David Wysochanski wrote:
> > > > > On Tue, Nov 24, 2020 at 8:07 PM Trond Myklebust <trondmy@hammerspace.com> wrote:
> > > > >
> > > > > On Tue, 2020-11-24 at 16:56 -0500, David Wysochanski wrote:
> > > > > > I've started seeing this failure since testing 5.10-rc4 -
> > > > > > this does not happen on 5.9.
> > > > > >
> > > > > >
> > > > > > f31-node1 login: [ 124.055768] FS-Cache: Netfs 'nfs' registered for caching
> > > > > > [ 125.046104] Key type dns_resolver registered
> > > > > > [ 125.770354] NFS: Registering the id_resolver key type
> > > > > > [ 125.780599] Key type id_resolver registered
> > > > > > [ 125.782440] Key type id_legacy registered
> > > > > > [ 126.563717] run fstests generic/074 at 2020-11-24 11:23:49
> > > > > > [ 178.736479]
> > > > > > [ 178.751380] =============================
> > > > > > [ 178.753249] [ BUG: Invalid wait context ]
> > > > > > [ 178.754886] 5.10.0-rc4 #127 Not tainted
> > > > > > [ 178.756423] -----------------------------
> > > > > > [ 178.758055] kworker/1:2/848 is trying to lock:
> > > > > > [ 178.759866] ffff8947fffd33d8 (&zone->lock){..-.}-{3:3}, at: get_page_from_freelist+0x897/0x2190
> > > > > > [ 178.763333] other info that might help us debug this:
> > > > > > [ 178.765354] context-{5:5}
> > > > > > [ 178.766437] 3 locks held by kworker/1:2/848:
> > > > > > [ 178.768158] #0: ffff8946ce825538 ((wq_completion)nfsiod){+.+.}-{0:0}, at: process_one_work+0x1be/0x540
> > > > > > [ 178.771871] #1: ffff9e6b408f7e58 ((work_completion)(&task->u.tk_work)#2){+.+.}-{0:0}, at: process_one_work+0x1be/0x540
> > > > > > [ 178.776562] #2: ffff8947f7c5b2b0 (krc.lock){..-.}-{2:2}, at: kvfree_call_rcu+0x69/0x230
> > > > > > [ 178.779803] stack backtrace:
> > > > > > [ 178.780996] CPU: 1 PID: 848 Comm: kworker/1:2 Kdump: loaded Not tainted 5.10.0-rc4 #127
> > > > > > [ 178.784374] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
> > > > > > [ 178.787071] Workqueue: nfsiod rpc_async_release [sunrpc]
> > > > > > [ 178.789308] Call Trace:
> > > > > > [ 178.790386] dump_stack+0x8d/0xb5
> > > > > > [ 178.791816] __lock_acquire.cold+0x20b/0x2c8
> > > > > > [ 178.793605] lock_acquire+0xca/0x380
> > > > > > [ 178.795113] ? get_page_from_freelist+0x897/0x2190
> > > > > > [ 178.797116] _raw_spin_lock+0x2c/0x40
> > > > > > [ 178.798638] ? get_page_from_freelist+0x897/0x2190
> > > > > > [ 178.800620] get_page_from_freelist+0x897/0x2190
> > > > > > [ 178.802537] __alloc_pages_nodemask+0x1b4/0x460
> > > > > > [ 178.804416] __get_free_pages+0xd/0x30
> > > > > > [ 178.805987] kvfree_call_rcu+0x168/0x230
> > > > > > [ 178.807687] nfs_free_request+0xab/0x180 [nfs]
> > > > > > [ 178.809547] nfs_page_group_destroy+0x41/0x80 [nfs]
> > > > > > [ 178.811588] nfs_read_completion+0x129/0x1f0 [nfs]
> > > > > > [ 178.813633] rpc_free_task+0x39/0x60 [sunrpc]
> > > > > > [ 178.815481] rpc_async_release+0x29/0x40 [sunrpc]
> > > > > > [ 178.817451] process_one_work+0x23e/0x540
> > > > > > [ 178.819136] worker_thread+0x50/0x3a0
> > > > > > [ 178.820657] ? process_one_work+0x540/0x540
> > > > > > [ 178.822427] kthread+0x10f/0x150
> > > > > > [ 178.823805] ? kthread_park+0x90/0x90
> > > > > > [ 178.825339] ret_from_fork+0x22/0x30
> > > > > >
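To spell out what lockdep is objecting to above: krc.lock is annotated as
a raw spinlock ({2:2}) and is still held when kvfree_call_rcu() falls back
to a page allocation, and zone->lock, which get_page_from_freelist() then
needs, is a plain spinlock_t ({3:3}). With CONFIG_PROVE_RAW_LOCK_NESTING,
lockdep treats spinlock_t as sleepable (as it is on PREEMPT_RT), so taking
it under a raw spinlock gets flagged even on a non-RT kernel. A minimal,
self-contained sketch of the pattern being flagged (names invented for
illustration, not taken from the trace):

#include <linux/spinlock.h>

/* Illustrative only: the nesting CONFIG_PROVE_RAW_LOCK_NESTING rejects. */
static DEFINE_RAW_SPINLOCK(outer_raw);   /* plays the role of krc.lock   */
static DEFINE_SPINLOCK(inner);           /* plays the role of zone->lock */

static void invalid_wait_context_sketch(void)
{
        unsigned long flags;

        raw_spin_lock_irqsave(&outer_raw, flags);
        /*
         * spinlock_t can sleep on PREEMPT_RT, so acquiring it while a
         * raw_spinlock_t is held is an invalid wait context, and
         * PROVE_RAW_LOCK_NESTING reports it even on !PREEMPT_RT kernels.
         */
        spin_lock(&inner);
        spin_unlock(&inner);
        raw_spin_unlock_irqrestore(&outer_raw, flags);
}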
> > > > >
> > > > > I can't think of any changes that might have caused this. Is
> > > > > this NFSv3, v4, or something else? I haven't been seeing any of this.
> > > > >
> > > >
> > > > It is NFSv4.1 or NFSv4.2. I am running the xfstests NFS client
> > > > against an older server, RHEL7 based (3.10.0-1127.8.2.el7.x86_64),
> > > > though I'm not sure if that matters.
> > > > My config has these:
> > > > CONFIG_LOCK_DEBUGGING_SUPPORT=y
> > > > CONFIG_PROVE_LOCKING=y
> > > > CONFIG_PROVE_RAW_LOCK_NESTING=y
> > > > CONFIG_DEBUG_SPINLOCK=y
> > > > CONFIG_DEBUG_LOCK_ALLOC=y
> > > > CONFIG_LOCKDEP=y
> > > >
> > > That helps. It means we can't blame the new READ_PLUS code, since
> > > it would be completely disabled here.
> > > Are you using any special rsize values? Also, could pNFS be
> > > involved (e.g. the pNFS block/scsi code)?
> > >
> >
> > No special rsize values, and pNFS shouldn't be involved - here's most
> > of the /proc/mounts entry:
> > /mnt/test nfs4
> > rw,context=system_u:object_r:root_t:s0,relatime,vers=4.1,rsize=524288
> > ,wsize=524288,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys
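For reference, those options correspond roughly to a mount invocation like
the one below - illustrative only, the real server name and export path on
my setup differ:

  mount -t nfs4 -o vers=4.1,rsize=524288,wsize=524288,hard,proto=tcp,timeo=600,retrans=2,sec=sys \
        server:/export /mnt/test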
> >
> > If you want, I can try against a later server, or maybe a loopback
> > mount on the same 5.10-rc4 kernel, to see if it reproduces.
> >
> > >
> >
> > FWIW, I've also seen NFSv4.1 produce something slightly different at
> > least once:
> >
> > $ cat nfs41-fail-074-different-backtrace.txt
> > [ 60.125028] run fstests generic/074 at 2020-11-24 11:57:51
> > [ 62.281576]
> > [ 62.300548] =============================
> > [ 62.302205] [ BUG: Invalid wait context ]
> > [ 62.303812] 5.10.0-rc4 #127 Not tainted
> > [ 62.305351] -----------------------------
> > [ 62.306954] fstest/2035 is trying to lock:
> > [ 62.308588] ffff8fe4bffd23d8 (&zone->lock){..-.}-{3:3}, at: get_page_from_freelist+0x897/0x2190
> > [ 62.312079] other info that might help us debug this:
> > [ 62.314105] context-{5:5}
> > [ 62.315166] 3 locks held by fstest/2035:
> > [ 62.316722] #0: ffff8fe3b3d78448 (sb_writers#16){.+.+}-{0:0}, at: do_syscall_64+0x33/0x40
> > [ 62.320040] #1: ffff8fe3d8f48488 (&sb->s_type->i_mutex_key#21){++++}-{4:4}, at: do_truncate+0x69/0xd0
> > [ 62.323706] #2: ffff8fe4b7c9b2b0 (krc.lock){..-.}-{2:2}, at: kvfree_call_rcu+0x69/0x230
> > [ 62.326974] stack backtrace:
> > [ 62.328151] CPU: 2 PID: 2035 Comm: fstest Kdump: loaded Not tainted 5.10.0-rc4 #127
> > [ 62.331172] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
> > [ 62.333449] Call Trace:
> > [ 62.334504] dump_stack+0x8d/0xb5
> > [ 62.335880] __lock_acquire.cold+0x20b/0x2c8
> > [ 62.337621] ? find_get_entries+0x2c7/0x5b0
> > [ 62.339316] lock_acquire+0xca/0x380
> > [ 62.340795] ? get_page_from_freelist+0x897/0x2190
> > [ 62.342722] _raw_spin_lock+0x2c/0x40
> > [ 62.344224] ? get_page_from_freelist+0x897/0x2190
> > [ 62.346123] get_page_from_freelist+0x897/0x2190
> > [ 62.347979] ? __lock_acquire+0x3b1/0x25d0
> > [ 62.349625] __alloc_pages_nodemask+0x1b4/0x460
> > [ 62.351437] __get_free_pages+0xd/0x30
> > [ 62.352942] kvfree_call_rcu+0x168/0x230
> > [ 62.354562] nfs4_do_setattr+0x1f6/0x4e0 [nfsv4]
> > [ 62.356455] nfs4_proc_setattr+0xb0/0x160 [nfsv4]
> > [ 62.358380] nfs_setattr+0x102/0x2c0 [nfs]
> > [ 62.360069] notify_change+0x340/0x4d0
> > [ 62.361587] ? do_truncate+0x76/0xd0
> > [ 62.363034] do_truncate+0x76/0xd0
> > [ 62.364413] do_sys_ftruncate+0x14a/0x230
> > [ 62.366037] do_syscall_64+0x33/0x40
> > [ 62.367481] entry_SYSCALL_64_after_hwframe+0x44/0xa9
> > [ 62.369489] RIP: 0033:0x7f23c70c2abb
> > [ 62.370927] Code: 77 05 c3 0f 1f 40 00 48 8b 15 c9 73 0c 00 f7 d8 64 89 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa b8 4d 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 05 c3 0f 1f 40 00 48 8b 15 99 73 0c 00 f7 d8
> > [ 62.378219] RSP: 002b:00007ffd0780e788 EFLAGS: 00000202 ORIG_RAX: 000000000000004d
> > [ 62.381208] RAX: ffffffffffffffda RBX: 0000000000002000 RCX: 00007f23c70c2abb
> > [ 62.384031] RDX: 0000000000000242 RSI: 0000000000a00000 RDI: 0000000000000003
> > [ 62.386851] RBP: 00000000017022b0 R08: 0000000000000000 R09: 000000000000001f
> > [ 62.389677] R10: 00000000000001a4 R11: 0000000000000202 R12: 0000000000000003
> > [ 62.392493] R13: 0000000000000000 R14: 0000000000a00000 R15: 0000000000000001
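Both traces share the same tail: kvfree_call_rcu() takes krc.lock and then
ends up in __get_free_pages() -> get_page_from_freelist(), which wants
zone->lock. A compressed paraphrase of that shared pattern (not the actual
kernel/rcu code; the function and variable names here are invented):

#include <linux/gfp.h>
#include <linux/spinlock.h>

/* Stand-in for the per-CPU krc.lock shown held at kvfree_call_rcu+0x69. */
static DEFINE_RAW_SPINLOCK(krc_lock_sketch);

static unsigned long grow_bulk_array_sketch(void)
{
        unsigned long page;
        unsigned long flags;

        raw_spin_lock_irqsave(&krc_lock_sketch, flags);
        /*
         * Even a GFP_NOWAIT allocation takes zone->lock (a spinlock_t)
         * inside the page allocator, which is what lockdep flags here.
         */
        page = __get_free_page(GFP_NOWAIT | __GFP_NOWARN);
        raw_spin_unlock_irqrestore(&krc_lock_sketch, flags);

        return page;
}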
> >
>
> Hmm... Both suggest a use-after-free situation. Would you be able to
> run the test with KASAN enabled? That might help finger the real
> culprit.
>
Trying to get something out of KASAN but so far no luck.
Could this be a junk report from lockdep?
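For reference, the KASAN side I'm aiming for is roughly the following
(illustrative, not a paste of my exact .config; option availability
depends on arch and compiler):

CONFIG_KASAN=y
CONFIG_KASAN_GENERIC=y
CONFIG_KASAN_OUTLINE=y
CONFIG_KASAN_VMALLOC=y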