From: Mike Snitzer <snitzer@kernel.org>
To: Chuck Lever III <chuck.lever@oracle.com>
Cc: Neil Brown <neilb@suse.de>,
Anna Schumaker <anna.schumaker@oracle.com>,
Linux NFS Mailing List <linux-nfs@vger.kernel.org>,
Jeff Layton <jlayton@kernel.org>,
Anna Schumaker <anna@kernel.org>,
Trond Myklebust <trondmy@hammerspace.com>,
"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>
Subject: Re: [PATCH v15 00/26] nfs/nfsd: add support for LOCALIO
Date: Fri, 13 Sep 2024 08:27:02 -0400 [thread overview]
Message-ID: <ZuQvlsAWRW9CvglZ@kernel.org> (raw)
In-Reply-To: <2A6AA1A5-9498-4783-A23C-C25500AB6D88@oracle.com>
On Thu, Sep 12, 2024 at 11:42:28PM +0000, Chuck Lever III wrote:
>
>
> > On Sep 12, 2024, at 7:31 PM, NeilBrown <neilb@suse.de> wrote:
> >
> > On Thu, 12 Sep 2024, Chuck Lever III wrote:
> >>
> >>
> >>> On Sep 10, 2024, at 8:43 PM, NeilBrown <neilb@suse.de> wrote:
> >>>
> >>> On Sat, 07 Sep 2024, Anna Schumaker wrote:
> >>>> Hi Mike,
> >>>>
> >>>> On 8/31/24 6:37 PM, Mike Snitzer wrote:
> >>>>> Hi,
> >>>>>
> >>>>> Happy Labor Day weekend (US holiday on Monday)! Seems apropos to send
> >>>>> what I hope the final LOCALIO patchset this weekend: its my birthday
> >>>>> this coming Tuesday, so _if_ LOCALIO were to get merged for 6.12
> >>>>> inclusion sometime next week: best b-day gift in a while! ;)
> >>>>>
> >>>>> Anyway, I've been busy incorporating all the review feedback from v14
> >>>>> _and_ working closely with NeilBrown to address some lingering net-ns
> >>>>> refcounting and nfsd modules refcounting issues, and more (Chnagelog
> >>>>> below):
> >>>>>
> >>>>
> >>>> I've been running tests on localio this afternoon after finishing up going through v15 of the patches (I was most of the way through when you posted v16, so I haven't updated yet!). Cthon tests passed on all NFS versions, and xfstests passed on NFS v4.x. However, I saw this crash from xfstests with NFS v3:
> >>>>
> >>>> [ 1502.440896] run fstests generic/633 at 2024-09-06 14:04:17
> >>>> [ 1502.694356] process 'vfstest' launched '/dev/fd/4/file1' with NULL argv: empty string added
> >>>> [ 1502.699514] Oops: general protection fault, probably for non-canonical address 0x6c616e69665f6140: 0000 [#1] PREEMPT SMP NOPTI
> >>>> [ 1502.700970] CPU: 3 UID: 0 PID: 513 Comm: nfsd Not tainted 6.11.0-rc6-g0c79a48cd64d-dirty+ #42323 70d41673e6cbf8e3437eb227e0a9c3c46ed3b289
> >>>> [ 1502.702506] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS unknown 2/2/2022
> >>>> [ 1502.703593] RIP: 0010:nfsd_cache_lookup+0x2b3/0x840 [nfsd]
> >>>> [ 1502.704474] Code: 8d bb 30 02 00 00 bb 01 00 00 00 eb 12 49 8d 46 10 48 8b 08 ff c3 48 85 c9 0f 84 9c 00 00 00 49 89 ce 4c 8d 61 c8 41 8b 45 00 <3b> 41 c8 75 1f 41 8b 45 04 41 3b 46 cc 74 15 8b 15 2c c6 b8 f2 be
> >>>> [ 1502.706931] RSP: 0018:ffffc27ac0a2fd18 EFLAGS: 00010206
> >>>> [ 1502.707547] RAX: 00000000b95691f7 RBX: 0000000000000002 RCX: 6c616e69665f6178
> >>>
> >>> This doesn't look like code anywhere near the changes that LOCALIO
> >>> makes.
> >>>
> >>> I dug around and the faulting instruction is
> >>> cmp -0x38(%rcx),%eax
> >>>
> >>> The -0x38 points to nfsd_cache_insert(). -0x38 is the index back
> >>> from the rbnode pointer to c_key.k_xid. So the rbtree is corrupt.
> >>> %rcx is 6c616e69665f6178 which is "xa_final". So that rbtree node has
> >>> been over-written or freed and re-used.
> >>>
> >>> It looks like
> >>>
> >>> Commit add1511c3816 ("NFSD: Streamline the rare "found" case")
> >>>
> >>> moved a call to nfsd_reply_cache_free_locked() that was inside a region
> >>> locked with ->cache_lock out of that region.
> >>
> >> My reading of the current code is that cache_lock is held
> >> during the nfsd_reply_cache_free_locked() call.
> >>
> >> add1511c3816 simply moved the call site from before a "goto"
> >> to after the label it branches to. What am I missing?
> >
> > Yes, I let myself get confused by the gotos. As you say that patch
> > didn't move the call out of the locked region. Sorry.
> >
> > I can't see anything wrong with the locking or tree management in
> > nfscache.c, yet this Oops looks a lot like a corrupted rbtree.
> > It *could* be something external stomping on the memory but I think
> > that is unlikely. I'd rather have a more direct explanation.... Not
> > today though it seems.
>
> My spidey sense (well, OK, my PTSD from when I've worked on
> the DRC code previously) is that these kind of memory overwrites
> can happen when the XDR receive buffer is unexpectedly short,
> and the DRC code ends up reading off the end of it. That code
> makes some stunning assumptions that might not hold true in the
> new LOCALIO paths.
I really don't think LOCALIO is the reason for whatever Anna saw. I
haven't ever seen anything like it during all my time with the code.
> Anna/Mike, you might try enabling KASAN to see if it will catch
> which instructions are doing the damage.
ktest runs xfstests with KASAN enabled, not seen any issues yet.
My most pressing work related to LOCALIO is fixing xfstests
generic/525. It is proving to be quite the mystery (for some reason
the final eof page isn't getting added the the pagcache and subsequent
pread is failing to find the page in either NFS or XFS
pagecache.. _only_ for this eof page).. inching closer but I'm going
on day 3 now.
Thanks,
Mike
prev parent reply other threads:[~2024-09-13 12:27 UTC|newest]
Thread overview: 79+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-08-31 22:37 [PATCH v15 00/26] nfs/nfsd: add support for LOCALIO Mike Snitzer
2024-08-31 22:37 ` [PATCH v15 01/26] nfs_common: factor out nfs_errtbl and nfs_stat_to_errno Mike Snitzer
2024-08-31 22:37 ` [PATCH v15 02/26] nfs_common: factor out nfs4_errtbl and nfs4_stat_to_errno Mike Snitzer
2024-08-31 22:37 ` [PATCH v15 03/26] nfs: factor out {encode,decode}_opaque_fixed to nfs_xdr.h Mike Snitzer
2024-08-31 22:37 ` [PATCH v15 04/26] NFSD: Handle @rqstp == NULL in check_nfsd_access() Mike Snitzer
2024-08-31 22:37 ` [PATCH v15 05/26] NFSD: Refactor nfsd_setuser_and_check_port() Mike Snitzer
2024-08-31 22:37 ` [PATCH v15 06/26] NFSD: Avoid using rqstp->rq_vers in nfsd_set_fh_dentry() Mike Snitzer
2024-08-31 22:37 ` [PATCH v15 07/26] NFSD: Short-circuit fh_verify tracepoints for LOCALIO Mike Snitzer
2024-08-31 22:37 ` [PATCH v15 08/26] nfsd: factor out __fh_verify to allow NULL rqstp to be passed Mike Snitzer
2024-08-31 22:37 ` [PATCH v15 09/26] nfsd: add nfsd_file_acquire_local() Mike Snitzer
2024-08-31 22:37 ` [PATCH v15 10/26] nfsd: add nfsd_serv_try_get and nfsd_serv_put Mike Snitzer
2024-08-31 22:37 ` [PATCH v15 11/26] SUNRPC: remove call_allocate() BUG_ONs Mike Snitzer
2024-08-31 22:37 ` [PATCH v15 12/26] SUNRPC: add svcauth_map_clnt_to_svc_cred_local Mike Snitzer
2024-08-31 22:37 ` [PATCH v15 13/26] SUNRPC: replace program list with program array Mike Snitzer
2024-08-31 22:37 ` [PATCH v15 14/26] nfs_common: add NFS LOCALIO auxiliary protocol enablement Mike Snitzer
2024-09-01 23:25 ` NeilBrown
2024-09-03 16:33 ` Mike Snitzer
2024-09-05 19:24 ` Anna Schumaker
2024-09-05 19:38 ` Mike Snitzer
2024-08-31 22:37 ` [PATCH v15 15/26] nfs_common: prepare for the NFS client to use nfsd_file for LOCALIO Mike Snitzer
2024-09-01 23:37 ` NeilBrown
2024-08-31 22:37 ` [PATCH v15 16/26] nfsd: add LOCALIO support Mike Snitzer
2024-09-01 23:46 ` NeilBrown
2024-09-03 14:34 ` Chuck Lever
2024-09-03 14:40 ` Jeff Layton
2024-09-03 15:00 ` Mike Snitzer
2024-09-03 15:19 ` Jeff Layton
2024-09-03 15:29 ` Mike Snitzer
2024-09-03 15:59 ` Chuck Lever III
2024-09-03 16:09 ` Mike Snitzer
2024-09-03 17:07 ` Chuck Lever III
2024-09-03 22:31 ` NeilBrown
2024-09-04 5:01 ` NeilBrown
2024-09-04 13:47 ` Chuck Lever
2024-09-05 14:21 ` Mike Snitzer
2024-09-05 15:41 ` Chuck Lever III
2024-09-05 23:34 ` NeilBrown
2024-09-06 15:04 ` Mike Snitzer
2024-09-06 18:07 ` Mike Snitzer
2024-09-06 21:56 ` NeilBrown
2024-09-06 22:33 ` Chuck Lever III
2024-09-06 23:14 ` NeilBrown
2024-09-07 15:17 ` Mike Snitzer
2024-09-07 16:09 ` Chuck Lever III
2024-09-07 19:08 ` Mike Snitzer
2024-09-07 21:12 ` Chuck Lever III
2024-09-08 15:05 ` Chuck Lever III
2024-09-07 15:52 ` Mike Snitzer
2024-09-04 13:54 ` Jeff Layton
2024-09-04 13:56 ` Chuck Lever III
2024-08-31 22:37 ` [PATCH v15 17/26] nfsd: implement server support for NFS_LOCALIO_PROGRAM Mike Snitzer
2024-09-03 14:11 ` Chuck Lever
2024-08-31 22:37 ` [PATCH v15 18/26] nfs: pass struct nfsd_file to nfs_init_pgio and nfs_init_commit Mike Snitzer
2024-08-31 22:37 ` [PATCH v15 19/26] nfs: add LOCALIO support Mike Snitzer
2024-08-31 22:37 ` [PATCH v15 20/26] nfs: enable localio for non-pNFS IO Mike Snitzer
2024-08-31 22:37 ` [PATCH v15 21/26] pnfs/flexfiles: enable localio support Mike Snitzer
2024-08-31 22:37 ` [PATCH v15 22/26] nfs/localio: use dedicated workqueues for filesystem read and write Mike Snitzer
2024-08-31 22:37 ` [PATCH v15 23/26] nfs: implement client support for NFS_LOCALIO_PROGRAM Mike Snitzer
2024-08-31 22:37 ` [PATCH v15 24/26] nfs: add Documentation/filesystems/nfs/localio.rst Mike Snitzer
2024-08-31 22:37 ` [PATCH v15 25/26] nfs: add FAQ section to Documentation/filesystems/nfs/localio.rst Mike Snitzer
2024-08-31 22:37 ` [PATCH v15 26/26] nfs: add "NFS Client and Server Interlock" section to localio.rst Mike Snitzer
2024-09-01 23:52 ` [PATCH v15 00/26] nfs/nfsd: add support for LOCALIO NeilBrown
2024-09-03 14:49 ` Jeff Layton
2024-09-06 19:31 ` Anna Schumaker
2024-09-06 20:34 ` Mike Snitzer
2024-09-06 21:09 ` Chuck Lever III
2024-09-10 16:45 ` Mike Snitzer
2024-09-10 19:14 ` Mike Snitzer
2024-09-10 19:24 ` Anna Schumaker
2024-09-10 20:31 ` Anna Schumaker
2024-09-10 22:11 ` Mike Snitzer
2024-09-11 17:51 ` Mike Snitzer
2024-09-11 18:48 ` Mike Snitzer
2024-09-13 18:12 ` Mike Snitzer
2024-09-11 0:43 ` NeilBrown
2024-09-11 16:03 ` Chuck Lever III
2024-09-12 23:31 ` NeilBrown
2024-09-12 23:42 ` Chuck Lever III
2024-09-13 12:27 ` Mike Snitzer [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ZuQvlsAWRW9CvglZ@kernel.org \
--to=snitzer@kernel.org \
--cc=anna.schumaker@oracle.com \
--cc=anna@kernel.org \
--cc=chuck.lever@oracle.com \
--cc=jlayton@kernel.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-nfs@vger.kernel.org \
--cc=neilb@suse.de \
--cc=trondmy@hammerspace.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).