From: Jeff Layton <jlayton@kernel.org>
To: Chuck Lever III <chuck.lever@oracle.com>
Cc: Linux NFS Mailing List <linux-nfs@vger.kernel.org>,
"dcritch@redhat.com" <dcritch@redhat.com>,
"d.lesca@solinos.it" <d.lesca@solinos.it>
Subject: Re: [PATCH 1/2] nfsd: don't replace page in rq_pages if it's a continuation of last page
Date: Fri, 17 Mar 2023 10:59:57 -0400 [thread overview]
Message-ID: <c1d4fbaf83c6e1e41e31f77d58d889adaecb6d35.camel@kernel.org> (raw)
In-Reply-To: <65C84563-6BCD-41CE-AF68-80E1869D217F@oracle.com>
On Fri, 2023-03-17 at 14:16 +0000, Chuck Lever III wrote:
>
> > On Mar 17, 2023, at 9:06 AM, Jeff Layton <jlayton@kernel.org> wrote:
> >
> > On Fri, 2023-03-17 at 06:56 -0400, Jeff Layton wrote:
> > > The splice read calls nfsd_splice_actor to put the pages containing file
> > > data into the svc_rqst->rq_pages array. It's possible however to get a
> > > splice result that only has a partial page at the end, if (e.g.) the
> > > filesystem hands back a short read that doesn't cover the whole page.
> > >
> > > nfsd_splice_actor will plop the partial page into its rq_pages array and
> > > return. Then later, when nfsd_splice_actor is called again, the
> > > remainder of the page may end up being filled out. At this point,
> > > nfsd_splice_actor will put the page into the array _again_ corrupting
> > > the reply. If this is done enough times, rq_next_page will overrun the
> > > array and corrupt the trailing fields -- the rq_respages and
> > > rq_next_page pointers themselves.
> > >
> > > If we've already added the page to the array in the last pass, don't add
> > > it to the array a second time when dealing with a splice continuation.
> > > This was originally handled properly in nfsd_splice_actor, but commit
> > > 91e23b1c3982 removed the check for it, and started universally replacing
> > > pages.
> > >
> > > Fixes: 91e23b1c3982 ("NFSD: Clean up nfsd_splice_actor()")
> > > Reported-by: Dario Lesca <d.lesca@solinos.it>
> > > Tested-by: David Critch <dcritch@redhat.com>
> > > Link: https://bugzilla.redhat.com/show_bug.cgi?id=2150630
> > > Signed-off-by: Jeff Layton <jlayton@kernel.org>
> > > ---
> > > fs/nfsd/vfs.c | 7 +++++--
> > > 1 file changed, 5 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
> > > index 502e1b7742db..3709ef57d96e 100644
> > > --- a/fs/nfsd/vfs.c
> > > +++ b/fs/nfsd/vfs.c
> > > @@ -941,8 +941,11 @@ nfsd_splice_actor(struct pipe_inode_info *pipe, struct pipe_buffer *buf,
> > > struct page *last_page;
> > >
> > > last_page = page + (offset + sd->len - 1) / PAGE_SIZE;
> > > - for (page += offset / PAGE_SIZE; page <= last_page; page++)
> > > - svc_rqst_replace_page(rqstp, page);
> > > + for (page += offset / PAGE_SIZE; page <= last_page; page++) {
> > > + /* Only replace page if we haven't already done so */
> >
> > Note that I think that this was probably the real rationale for the pp[-
> > 1] check that 91e23b1c3982 removed. Given that, maybe we should flesh
> > this comment out a bit more for posterity?
> >
> > /*
> > * When we're splicing from a pipe, it's possible that
> > * we'll get an incomplete page that may be updated on
> > * a later call. Only splice it into rq_pages once.
> > */
>
> The "real" bug here is that the API contract for pipe splicing
> isn't well defined, so I agree that it's very likely the pp[-1]
> check was because a splice can call the actor repeatedly for the
> same page. No one could remember why that check was there.
>
The whole splice API is a minefield.
> To be clear, if the passed-in page matches the current page in
> the rqst, we're "extending the current page" rather than avoiding
> replacement... maybe:
>
> /*
> * Skip page replacement when extending the contents
> * of the current page.
> */
>
Sure, sounds good.
> In the patch description, would you mention that this case
> arises if the READ request is not page-aligned?
>
Does it though? I'm not sure that page alignment has that much to do
with it. I imagine you can hit this even with aligned I/Os.
My guess is the bigger issue is when your storage is doing sub-page-size
I/Os under the hood. We end up filling up part of a page from storage
and the kernel submits what it has to the pipe and then the next bit
comes in and the page is updated for the next actor call.
> If you resend this patch, please Cc: viro@ . Thanks for chasing
> this down!
>
Will do.
>
> > > + if (page != *(rqstp->rq_next_page - 1))
> > > + svc_rqst_replace_page(rqstp, page);
> > > + }
> > > if (rqstp->rq_res.page_len == 0) // first call
> > > rqstp->rq_res.page_base = offset % PAGE_SIZE;
> > > rqstp->rq_res.page_len += sd->len;
> >
> > --
> > Jeff Layton <jlayton@kernel.org>
>
> --
> Chuck Lever
>
>
--
Jeff Layton <jlayton@kernel.org>
next prev parent reply other threads:[~2023-03-17 15:00 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-03-17 10:56 [PATCH 1/2] nfsd: don't replace page in rq_pages if it's a continuation of last page Jeff Layton
2023-03-17 10:56 ` [PATCH 2/2] sunrpc: add bounds checking to svc_rqst_replace_page Jeff Layton
2023-03-17 13:44 ` Chuck Lever III
2023-03-17 13:52 ` Jeff Layton
2023-03-17 13:54 ` Chuck Lever III
2023-03-17 13:06 ` [PATCH 1/2] nfsd: don't replace page in rq_pages if it's a continuation of last page Jeff Layton
2023-03-17 14:16 ` Chuck Lever III
2023-03-17 14:59 ` Jeff Layton [this message]
2023-03-17 15:04 ` Chuck Lever III
2023-03-17 17:23 ` Jeff Layton
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=c1d4fbaf83c6e1e41e31f77d58d889adaecb6d35.camel@kernel.org \
--to=jlayton@kernel.org \
--cc=chuck.lever@oracle.com \
--cc=d.lesca@solinos.it \
--cc=dcritch@redhat.com \
--cc=linux-nfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox