public inbox for linux-nfs@vger.kernel.org
From: malahal naineni <malahal@us.ibm.com>
To: NeilBrown <neilb@suse.de>
Cc: linux-nfs@vger.kernel.org
Subject: Re: NFSv3/v2: Fix data corruption with NFS short reads.
Date: Wed, 10 Apr 2013 22:55:06 -0500
Message-ID: <20130411035506.GA32459@us.ibm.com>
In-Reply-To: <20130411112121.74d996c5@notabene.brown>

NeilBrown [neilb@suse.de] wrote:
> On Fri, 29 Mar 2013 19:57:06 -0000 malahal naineni <malahal@us.ibm.com> wrote:
> 
> > This bug seems to be present in v2.6.37 or lower versions. The code was
> > re-organized in v2.6.38 that eliminated the bug. Current upstream code
> > doesn't have this bug. This may be applicable to some longterm releases!
> > 
> > Here are the bug details:
> > 
> > 1. In nfs_read_rpcsetup(), args.count and res.count are both set to the
> >    actual number of bytes to be read. Let us assume that the request is
> >    for 16K, so args.count = res.count = 16K
> > 2. nfs3_xdr_readres() conditionally sets res.count to the actual
> >    number of bytes read. This condition is true for the first response
> >    as res.count was set to args.count before the first request. Let us
> >    say the server returned only 4K bytes. res.count=4K
> > 3. Another read request is sent for the remaining data. Note that
> >    res.count is NOT updated. It is still set to the actual amount of
> >    bytes we got in the first response.  The client will send a READ
> >    request for the remaining 12K.
> 
> This looks like a real bug, but I think the "NOT" above is the best thing
> to fix.

No doubt, a real bug! Easily reproduced with an instrumented nfsd.

> i.e. when another read request is sent, res.count *SHOULD*BE* updated.  That
> makes it consistent with the original send, and consistency is good!

I thought about it, but resp->count is NOT really used as far as I
know. And the current upstream code unconditionally sets resp->count
in the xdr function. So I chose the upstream method! I agree, your patch
is more consistent with the existing code.

> Index: linux-2.6.32-SLE11-SP1-LTSS/fs/nfs/read.c
> ===================================================================
> --- linux-2.6.32-SLE11-SP1-LTSS.orig/fs/nfs/read.c	2013-03-20 16:24:31.426605189 +1100
> +++ linux-2.6.32-SLE11-SP1-LTSS/fs/nfs/read.c	2013-04-11 11:19:57.670724540 +1000
> @@ -368,6 +368,7 @@ static void nfs_readpage_retry(struct rp
>  	argp->offset += resp->count;
>  	argp->pgbase += resp->count;
>  	argp->count -= resp->count;
> +	resp->count = argp->count;
>  	nfs4_restart_rpc(task, NFS_SERVER(data->inode)->nfs_client);
>  	return;
>  out:

This patch should fix the bug as well.
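
To illustrate the difference the one-line change makes, here is a small
user-space sketch of the retry bookkeeping. It is not kernel code: the
structs are simplified stand-ins, and decode_reply() only assumes that the
conditional update in the old xdr code has the form "shrink res.count when
the reply is smaller". The 4K and 8K reply sizes are made up for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-ins for the NFS read argument/result structures. */
struct read_args { uint64_t offset; uint32_t pgbase; uint32_t count; };
struct read_res  { uint32_t count; };

/* Hypothetical reply decoder: it only ever shrinks res->count. This is an
 * ASSUMED form of the conditional update described in step 2. */
static void decode_reply(struct read_res *res, uint32_t received)
{
	if (received < res->count)
		res->count = received;
}

/* Retry bookkeeping modelled on the quoted nfs_readpage_retry() hunk;
 * reset_res selects whether the added line from the patch is applied. */
static void prepare_retry(struct read_args *argp, struct read_res *resp,
			  int reset_res)
{
	argp->offset += resp->count;
	argp->pgbase += resp->count;
	argp->count  -= resp->count;
	if (reset_res)
		resp->count = argp->count;
}

/* Issue a 16K read against a server that returns 4K and then 8K, and
 * report the res.count the client sees after the second reply. */
static uint32_t count_after_second_reply(int reset_res)
{
	struct read_args args = { .offset = 0, .pgbase = 0, .count = 16384 };
	struct read_res res = { .count = args.count };

	decode_reply(&res, 4096);              /* first reply: short read */
	assert(res.count == 4096);
	prepare_retry(&args, &res, reset_res); /* args.count is now 12K */
	assert(args.count == 12288);
	decode_reply(&res, 8192);              /* second reply: 8K */
	return res.count;
}
```

Under these assumptions, count_after_second_reply(1) reports the 8K that
the second reply actually carried, while count_after_second_reply(0) still
reports the stale 4K left over from the first reply.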

> This would only affect clients with servers which would sometimes
> return partial reads, and I don't think the Linux NFS server does.
> What server have you seen this against?

We came across this during Ganesha development, and I was told the same
thing: that the Linux NFS server doesn't do this. I didn't bother to post
then, but now we have seen the same thing with the Linux NFS server, so I
decided to post it. The Linux NFS server doesn't create short reads
itself, but it just calls the underlying back-end file system and sends
the result to NFS clients without a short read "check". In other words,
the short read behaviour actually depends on the back-end file system
rather than on the Linux NFS server. We saw this bug with our GPFS file
system in combination with HSM -- a request to read data on tape would
fail until the data is brought back to disk.
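
For illustration only, a short read "check" could look roughly like the
user-space sketch below: keep calling the back end until the requested
length is satisfied, EOF is hit, or an error occurs. Nothing here is nfsd
code; backend_read_fn, mem_backend and read_full are hypothetical names,
and the in-memory back end just mimics a file system that hands back at
most max_chunk bytes per call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical back-end read callback: returns bytes read, 0 at EOF,
 * or a negative value on error. */
typedef long (*backend_read_fn)(void *ctx, char *buf, size_t len,
				uint64_t off);

/* Illustrative in-memory back end that never returns more than
 * max_chunk bytes per call, i.e. it produces short reads. */
struct mem_backend { const char *data; size_t size; size_t max_chunk; };

static long mem_backend_read(void *ctx, char *buf, size_t len, uint64_t off)
{
	struct mem_backend *be = ctx;
	size_t n;

	if (off >= be->size)
		return 0;                      /* EOF */
	n = be->size - (size_t)off;
	if (n > len)
		n = len;
	if (n > be->max_chunk)
		n = be->max_chunk;             /* force a short read */
	memcpy(buf, be->data + off, n);
	return (long)n;
}

/* Loop over the back end until len bytes are read, EOF, or an error.
 * Returns the number of bytes read, or the error if nothing was read. */
static long read_full(backend_read_fn rd, void *ctx, char *buf, size_t len,
		      uint64_t off)
{
	size_t done = 0;

	while (done < len) {
		long n = rd(ctx, buf + done, len - done, off + done);

		if (n < 0)
			return done ? (long)done : n;
		if (n == 0)
			break;                 /* EOF: a genuine short read */
		assert((size_t)n <= len - done);
		done += (size_t)n;
	}
	return (long)done;
}
```

With a 10000-byte buffer and max_chunk set to 4096, a single
mem_backend_read() call returns 4096 bytes, while read_full() returns all
10000. Either way, the client still has to handle short reads correctly,
since a read that hits EOF is legitimately short.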

Regards, Malahal.



Thread overview: 3+ messages
2013-03-29 19:57 [PATCH] NFSv3/v2: Fix data corruption with NFS short reads Malahal Naineni
2013-04-11  1:21 ` NeilBrown
2013-04-11  3:55   ` malahal naineni [this message]
