* Re: Dcache oops [not found] ` <DA67EEEA-D769-4D0F-9C20-6145F2B0808F@linuxhacker.ru> @ 2016-06-04 0:56 ` Al Viro 2016-06-04 12:25 ` Jeff Layton 2016-06-04 16:12 ` Oleg Drokin 0 siblings, 2 replies; 5+ messages in thread From: Al Viro @ 2016-06-04 0:56 UTC (permalink / raw) To: Jeff Layton Cc: Linus Torvalds, <linux-kernel@vger.kernel.org> Mailing List, <linux-fsdevel@vger.kernel.org>, Oleg Drokin, linux-nfs On Fri, Jun 03, 2016 at 07:58:37PM -0400, Oleg Drokin wrote: > > EOPENSTALE, that is... Oleg, could you check if the following works? > > Yes, this one lasted for an hour with no crashing, so it must be good. > Thanks. > (note, I am not equipped to verify correctness of NFS operations, though). I suspect that Jeff Layton might have relevant regression tests. Incidentally, we really need a consolidated regression testsuite, including the tests you'd been running. Right now there's some stuff in xfstests, LTP and cthon; if anything, this mess shows just why we need all of that and then some in a single place. Lustre stuff has caught a 3 years old NFS bug (missing d_drop() in nfs_atomic_open()) and a year-old bug in handling of EOPENSTALE retries on the last component of a trailing non-embedded symlink. Neither is hard to trigger; it's just that relevant tests hadn't been run on NFS, period. Jeff, could you verify that the following does not cause regressions in stale fhandles treatment? I want to rip the damn retry logics out of do_last() and if the staleness had only been discovered inside of nfs4_file_open() just have the upper-level logics handle it by doing a normal LOOKUP_REVAL pass from scratch. To hell with trying to be clever; a few roundtrips it saves us in some cases is not worth the complexity and potential for bugs. I'm fairly sure that the time spent debugging this particular turd exceeds the total amount of time it has ever saved, and do_last() is in dire need of simplification. All talk about "enough eyes" isn't worth much when the readers of code in question feel like ripping their eyes out... diff --git a/fs/namei.c b/fs/namei.c index 4c4f95a..3d9511e 100644 --- a/fs/namei.c +++ b/fs/namei.c @@ -3166,9 +3166,7 @@ static int do_last(struct nameidata *nd, int acc_mode = op->acc_mode; unsigned seq; struct inode *inode; - struct path save_parent = { .dentry = NULL, .mnt = NULL }; struct path path; - bool retried = false; int error; nd->flags &= ~LOOKUP_PARENT; @@ -3211,7 +3209,6 @@ static int do_last(struct nameidata *nd, return -EISDIR; } -retry_lookup: if (open_flag & (O_CREAT | O_TRUNC | O_WRONLY | O_RDWR)) { error = mnt_want_write(nd->path.mnt); if (!error) @@ -3292,23 +3289,14 @@ finish_lookup: if (unlikely(error)) return error; - if ((nd->flags & LOOKUP_RCU) || nd->path.mnt != path.mnt) { - path_to_nameidata(&path, nd); - } else { - save_parent.dentry = nd->path.dentry; - save_parent.mnt = mntget(path.mnt); - nd->path.dentry = path.dentry; - - } + path_to_nameidata(&path, nd); nd->inode = inode; nd->seq = seq; /* Why this, you ask? _Now_ we might have grown LOOKUP_JUMPED... */ finish_open: error = complete_walk(nd); - if (error) { - path_put(&save_parent); + if (error) return error; - } audit_inode(nd->name, nd->path.dentry, 0); error = -EISDIR; if ((open_flag & O_CREAT) && d_is_dir(nd->path.dentry)) @@ -3331,13 +3319,9 @@ finish_open_created: goto out; BUG_ON(*opened & FILE_OPENED); /* once it's opened, it's opened */ error = vfs_open(&nd->path, file, current_cred()); - if (!error) { - *opened |= FILE_OPENED; - } else { - if (error == -EOPENSTALE) - goto stale_open; + if (error) goto out; - } + *opened |= FILE_OPENED; opened: error = open_check_o_direct(file); if (!error) @@ -3353,26 +3337,7 @@ out: } if (got_write) mnt_drop_write(nd->path.mnt); - path_put(&save_parent); return error; - -stale_open: - /* If no saved parent or already retried then can't retry */ - if (!save_parent.dentry || retried) - goto out; - - BUG_ON(save_parent.dentry != dir); - path_put(&nd->path); - nd->path = save_parent; - nd->inode = dir->d_inode; - save_parent.mnt = NULL; - save_parent.dentry = NULL; - if (got_write) { - mnt_drop_write(nd->path.mnt); - got_write = false; - } - retried = true; - goto retry_lookup; } static int do_tmpfile(struct nameidata *nd, unsigned flags, ^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: Dcache oops 2016-06-04 0:56 ` Dcache oops Al Viro @ 2016-06-04 12:25 ` Jeff Layton 2016-06-04 16:12 ` Oleg Drokin 1 sibling, 0 replies; 5+ messages in thread From: Jeff Layton @ 2016-06-04 12:25 UTC (permalink / raw) To: Al Viro Cc: Linus Torvalds, <linux-kernel@vger.kernel.org> Mailing List, <linux-fsdevel@vger.kernel.org>, Oleg Drokin, linux-nfs On Sat, 2016-06-04 at 01:56 +0100, Al Viro wrote: > On Fri, Jun 03, 2016 at 07:58:37PM -0400, Oleg Drokin wrote: > > > > > > > > > EOPENSTALE, that is... Oleg, could you check if the following works? > > Yes, this one lasted for an hour with no crashing, so it must be good. > > Thanks. > > (note, I am not equipped to verify correctness of NFS operations, though). > I suspect that Jeff Layton might have relevant regression tests. Incidentally, > we really need a consolidated regression testsuite, including the tests you'd > been running. Right now there's some stuff in xfstests, LTP and cthon; if > anything, this mess shows just why we need all of that and then some in > a single place. Lustre stuff has caught a 3 years old NFS bug (missing > d_drop() in nfs_atomic_open()) and a year-old bug in handling of EOPENSTALE > retries on the last component of a trailing non-embedded symlink. Neither > is hard to trigger; it's just that relevant tests hadn't been run on NFS, > period. > > Jeff, could you verify that the following does not cause regressions in > stale fhandles treatment? I want to rip the damn retry logics out of > do_last() and if the staleness had only been discovered inside of > nfs4_file_open() just have the upper-level logics handle it by doing > a normal LOOKUP_REVAL pass from scratch. To hell with trying to be clever; > a few roundtrips it saves us in some cases is not worth the complexity and > potential for bugs. I'm fairly sure that the time spent debugging this > particular turd exceeds the total amount of time it has ever saved, > and do_last() is in dire need of simplification. All talk about "enough eyes" > isn't worth much when the readers of code in question feel like ripping their > eyes out... > Agreed. I see no need to optimize an error case here. Any performance hit that we'd get here is almost certainly acceptable in this situation. The main thing is that we prevent the ESTALE from bubbling up into userland if we can avoid it by retrying. No, I didn't have the test for this anymore unfortunately. RHQA might have one though. Either way, I cooked one up that does this on the server: #!/bin/bash while true; do rm -rf foo mkdir foo echo foo > foo/bar usleep 100000 done ...and then this on the client after mounting the fs with lookupcache=none and noac. #include <sys/types.h> #include <sys/stat.h> #include <fcntl.h> #include <errno.h> #include <stdio.h> #include <unistd.h> int main(int argc, char **argv) { int fd; while(1) { fd = open(argv[1], O_RDONLY); if (fd < 0) { if (errno == ESTALE) { printf("ESTALE"); return 1; } continue; } close(fd); } return 0; } I did see some of the OPEN compounds come back with NFS4ERR_STALE on the PUTFH op but no corresponding ESTALE error in userland. So, this patch does seem to do the right thing. Reviewed-and-Tested-by: Jeff Layton <jlayton@poochiereds.net> ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: Dcache oops 2016-06-04 0:56 ` Dcache oops Al Viro 2016-06-04 12:25 ` Jeff Layton @ 2016-06-04 16:12 ` Oleg Drokin 2016-06-04 16:21 ` [PATCH] nfs4: Fix potential use after free of state in nfs4_do_reclaim green 1 sibling, 1 reply; 5+ messages in thread From: Oleg Drokin @ 2016-06-04 16:12 UTC (permalink / raw) To: Al Viro Cc: Jeff Layton, <linux-kernel@vger.kernel.org> Mailing List, <linux-fsdevel@vger.kernel.org>, linux-nfs On Jun 3, 2016, at 8:56 PM, Al Viro wrote: > On Fri, Jun 03, 2016 at 07:58:37PM -0400, Oleg Drokin wrote: > >>> EOPENSTALE, that is... Oleg, could you check if the following works? >> >> Yes, this one lasted for an hour with no crashing, so it must be good. >> Thanks. >> (note, I am not equipped to verify correctness of NFS operations, though). > > I suspect that Jeff Layton might have relevant regression tests. Incidentally, > we really need a consolidated regression testsuite, including the tests you'd > been running. Right now there's some stuff in xfstests, LTP and cthon; if > anything, this mess shows just why we need all of that and then some in > a single place. Lustre stuff has caught a 3 years old NFS bug (missing > d_drop() in nfs_atomic_open()) and a year-old bug in handling of EOPENSTALE > retries on the last component of a trailing non-embedded symlink. Neither > is hard to trigger; it's just that relevant tests hadn't been run on NFS, > period. BTW, the nets also have brought in another use after free in nfs4 state tracking code (this is the one I was trying to hunt down from the start). I'll submit a patch shortly. And also there's a mysterious ext4 data corruption that I do not really fully understand but only hit once so far. ^ permalink raw reply [flat|nested] 5+ messages in thread
* [PATCH] nfs4: Fix potential use after free of state in nfs4_do_reclaim. 2016-06-04 16:12 ` Oleg Drokin @ 2016-06-04 16:21 ` green 2016-06-04 19:57 ` Jeff Layton 0 siblings, 1 reply; 5+ messages in thread From: green @ 2016-06-04 16:21 UTC (permalink / raw) To: Trond Myklebust, Anna Schumaker Cc: Olga Kornievskaia, linux-nfs, linux-kernel, Oleg Drokin From: Oleg Drokin <green@linuxhacker.ru> Commit e8d975e73e5f ("fixing infinite OPEN loop in 4.0 stateid recovery") introduced access to state after it was just potentially freed by nfs4_put_open_state leading to a random data corruption somewhere. BUG: unable to handle kernel paging request at ffff88004941ee40 IP: [<ffffffff813baf01>] nfs4_do_reclaim+0x461/0x740 PGD 3501067 PUD 3504067 PMD 6ff37067 PTE 800000004941e060 Oops: 0002 [#1] SMP DEBUG_PAGEALLOC Modules linked in: loop rpcsec_gss_krb5 acpi_cpufreq tpm_tis joydev i2c_piix4 pcspkr tpm virtio_console nfsd ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops floppy serio_raw virtio_blk drm CPU: 6 PID: 2161 Comm: 192.168.10.253- Not tainted 4.7.0-rc1-vm-nfs+ #112 Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 task: ffff8800463dcd00 ti: ffff88003ff48000 task.ti: ffff88003ff48000 RIP: 0010:[<ffffffff813baf01>] [<ffffffff813baf01>] nfs4_do_reclaim+0x461/0x740 RSP: 0018:ffff88003ff4bd68 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffffffff81a49900 RCX: 00000000000000e8 RDX: 00000000000000e8 RSI: ffff8800418b9930 RDI: ffff880040c96c88 RBP: ffff88003ff4bdf8 R08: 0000000000000001 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: ffff880040c96c98 R13: ffff88004941ee20 R14: ffff88004941ee40 R15: ffff88004941ee00 FS: 0000000000000000(0000) GS:ffff88006d000000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffff88004941ee40 CR3: 0000000060b0b000 CR4: 00000000000006e0 Stack: ffffffff813baad5 ffff8800463dcd00 ffff880000000001 ffffffff810e6b68 ffff880043ddbc88 ffff8800418b9800 ffff8800418b98c8 ffff88004941ee48 ffff880040c96c90 ffff880040c96c00 ffff880040c96c20 ffff880040c96c40 Call Trace: [<ffffffff813baad5>] ? nfs4_do_reclaim+0x35/0x740 [<ffffffff810e6b68>] ? trace_hardirqs_on_caller+0x128/0x1b0 [<ffffffff813bb7cd>] nfs4_run_state_manager+0x5ed/0xa40 [<ffffffff813bb1e0>] ? nfs4_do_reclaim+0x740/0x740 [<ffffffff813bb1e0>] ? nfs4_do_reclaim+0x740/0x740 [<ffffffff810af0d1>] kthread+0x101/0x120 [<ffffffff810e6b68>] ? trace_hardirqs_on_caller+0x128/0x1b0 [<ffffffff818843af>] ret_from_fork+0x1f/0x40 [<ffffffff810aefd0>] ? kthread_create_on_node+0x250/0x250 Code: 65 80 4c 8b b5 78 ff ff ff e8 fc 88 4c 00 48 8b 7d 88 e8 13 67 d2 ff 49 8b 47 40 a8 02 0f 84 d3 01 00 00 4c 89 ff e8 7f f9 ff ff <f0> 41 80 26 7f 48 8b 7d c8 e8 b1 84 4c 00 e9 39 fd ff ff 3d e6 RIP [<ffffffff813baf01>] nfs4_do_reclaim+0x461/0x740 RSP <ffff88003ff4bd68> CR2: ffff88004941ee40 Signed-off-by: Oleg Drokin <green@linuxhacker.ru> --- fs/nfs/nfs4state.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c index 9679f47..834b875 100644 --- a/fs/nfs/nfs4state.c +++ b/fs/nfs/nfs4state.c @@ -1488,9 +1488,9 @@ restart: } spin_unlock(&state->state_lock); } - nfs4_put_open_state(state); clear_bit(NFS_STATE_RECLAIM_NOGRACE, &state->flags); + nfs4_put_open_state(state); spin_lock(&sp->so_lock); goto restart; } -- 2.5.5 ^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [PATCH] nfs4: Fix potential use after free of state in nfs4_do_reclaim. 2016-06-04 16:21 ` [PATCH] nfs4: Fix potential use after free of state in nfs4_do_reclaim green @ 2016-06-04 19:57 ` Jeff Layton 0 siblings, 0 replies; 5+ messages in thread From: Jeff Layton @ 2016-06-04 19:57 UTC (permalink / raw) To: green, Trond Myklebust, Anna Schumaker Cc: Olga Kornievskaia, linux-nfs, linux-kernel On Sat, 2016-06-04 at 12:21 -0400, green@linuxhacker.ru wrote: > From: Oleg Drokin <green@linuxhacker.ru> > > Commit e8d975e73e5f ("fixing infinite OPEN loop in 4.0 stateid recovery") > introduced access to state after it was just potentially freed by > nfs4_put_open_state leading to a random data corruption somewhere. > > BUG: unable to handle kernel paging request at ffff88004941ee40 > IP: [] nfs4_do_reclaim+0x461/0x740 > PGD 3501067 PUD 3504067 PMD 6ff37067 PTE 800000004941e060 > Oops: 0002 [#1] SMP DEBUG_PAGEALLOC > Modules linked in: loop rpcsec_gss_krb5 acpi_cpufreq tpm_tis joydev i2c_piix4 pcspkr tpm virtio_console nfsd ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops floppy serio_raw virtio_blk drm > CPU: 6 PID: 2161 Comm: 192.168.10.253- Not tainted 4.7.0-rc1-vm-nfs+ #112 > Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 > task: ffff8800463dcd00 ti: ffff88003ff48000 task.ti: ffff88003ff48000 > RIP: 0010:[] [] nfs4_do_reclaim+0x461/0x740 > RSP: 0018:ffff88003ff4bd68 EFLAGS: 00010246 > RAX: 0000000000000000 RBX: ffffffff81a49900 RCX: 00000000000000e8 > RDX: 00000000000000e8 RSI: ffff8800418b9930 RDI: ffff880040c96c88 > RBP: ffff88003ff4bdf8 R08: 0000000000000001 R09: 0000000000000000 > R10: 0000000000000000 R11: 0000000000000000 R12: ffff880040c96c98 > R13: ffff88004941ee20 R14: ffff88004941ee40 R15: ffff88004941ee00 > FS: 0000000000000000(0000) GS:ffff88006d000000(0000) knlGS:0000000000000000 > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > CR2: ffff88004941ee40 CR3: 0000000060b0b000 CR4: 00000000000006e0 > Stack: > ffffffff813baad5 ffff8800463dcd00 ffff880000000001 ffffffff810e6b68 > ffff880043ddbc88 ffff8800418b9800 ffff8800418b98c8 ffff88004941ee48 > ffff880040c96c90 ffff880040c96c00 ffff880040c96c20 ffff880040c96c40 > Call Trace: > [] ? nfs4_do_reclaim+0x35/0x740 > [] ? trace_hardirqs_on_caller+0x128/0x1b0 > [] nfs4_run_state_manager+0x5ed/0xa40 > [] ? nfs4_do_reclaim+0x740/0x740 > [] ? nfs4_do_reclaim+0x740/0x740 > [] kthread+0x101/0x120 > [] ? trace_hardirqs_on_caller+0x128/0x1b0 > [] ret_from_fork+0x1f/0x40 > [] ? kthread_create_on_node+0x250/0x250 > Code: 65 80 4c 8b b5 78 ff ff ff e8 fc 88 4c 00 48 8b 7d 88 e8 13 67 d2 ff 49 8b 47 40 a8 02 0f 84 d3 01 00 00 4c 89 ff e8 7f f9 ff ff 41 80 26 7f 48 8b 7d c8 e8 b1 84 4c 00 e9 39 fd ff ff 3d e6 > RIP [] nfs4_do_reclaim+0x461/0x740 > RSP > CR2: ffff88004941ee40 > > Signed-off-by: Oleg Drokin <green@linuxhacker.ru> > --- > fs/nfs/nfs4state.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c > index 9679f47..834b875 100644 > --- a/fs/nfs/nfs4state.c > +++ b/fs/nfs/nfs4state.c > @@ -1488,9 +1488,9 @@ restart: > } > spin_unlock(&state->state_lock); > } > - nfs4_put_open_state(state); > clear_bit(NFS_STATE_RECLAIM_NOGRACE, > &state->flags); > + nfs4_put_open_state(state); > spin_lock(&sp->so_lock); > goto restart; > } Nice catch. Reviewed-by: Jeff Layton <jlayton@poochiereds.net> ^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2016-06-04 19:57 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <74306F63-DBDF-4DED-85D2-5C3FB21B8A1E@linuxhacker.ru>
[not found] ` <20160603182203.GR14480@ZenIV.linux.org.uk>
[not found] ` <4285E00F-7228-485C-AD32-97552ED746F2@linuxhacker.ru>
[not found] ` <20160603200759.GS14480@ZenIV.linux.org.uk>
[not found] ` <CA+55aFznPit97evuwZHdCG55=N0QbkFSST2ydKxvExQTR-YXTg@mail.gmail.com>
[not found] ` <20160603212652.GT14480@ZenIV.linux.org.uk>
[not found] ` <CA+55aFyYkbXqoJ3PqfrnxWDEXJM_BH6Q_OdKf+UVJYEUF4gdQA@mail.gmail.com>
[not found] ` <20160603222355.GW14480@ZenIV.linux.org.uk>
[not found] ` <20160603223700.GY14480@ZenIV.linux.org.uk>
[not found] ` <DA67EEEA-D769-4D0F-9C20-6145F2B0808F@linuxhacker.ru>
2016-06-04 0:56 ` Dcache oops Al Viro
2016-06-04 12:25 ` Jeff Layton
2016-06-04 16:12 ` Oleg Drokin
2016-06-04 16:21 ` [PATCH] nfs4: Fix potential use after free of state in nfs4_do_reclaim green
2016-06-04 19:57 ` Jeff Layton
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).