linux-nfs.vger.kernel.org archive mirror
From: Boaz Harrosh <bharrosh@panasas.com>
To: <tao.peng@emc.com>, Sorin Faibish <sfaibish@emc.com>
Cc: <Trond.Myklebust@netapp.com>, <bergwolf@gmail.com>,
	<linux-nfs@vger.kernel.org>
Subject: Re: [PATCH 3/3] pnfsblock: bail out unaligned DIO
Date: Mon, 28 May 2012 14:26:10 +0300
Message-ID: <4FC360D2.4030006@panasas.com>
In-Reply-To: <F19688880B763E40B28B2B462677FBF805F0BADF36@MX09A.corp.emc.com>

On 05/28/2012 05:30 AM, tao.peng@emc.com wrote:

>> -----Original Message-----
>> From: Myklebust, Trond [mailto:Trond.Myklebust@netapp.com]

<>

>> Also, why do you consider it to be direct i/o specific? If the
>> application is using byte range locking, and the locks aren't page/block
>> aligned then you are in the same position of having to deal with partial
>> page writes even in the read/write from page cache situation.


> You are right about byte range locking + buffered IO, and it should
> be fixed in pg_test with bellow patch and it could be a stable
> candidate. 


What?? Please explain. It sounds like you are saying that there is a
very *serious* bug in the current block-layout.

From my experiments I know that lots and lots of IO is done non-page-aligned
even in buffered IO. Actually NFS goes to great lengths not to do
the usual read-modify-write per page, but keeps the byte range that
was written per page and only RPCs the exact offset-length of the modification,
because by definition NFS is byte-aligned IO, not "blocks" or "sectors".
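
To make the per-page byte-range idea concrete, here is a minimal userspace sketch. It is not the kernel's nfs_page code; the struct, the function names and the 4096-byte page size are assumptions made up for illustration. It only shows how a dirty range within one page can be tracked so that the exact file offset and length are known at RPC time.

```c
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumed page size, illustration only */

/* Hypothetical per-page record of the dirty byte range. */
struct dirty_range {
	uint64_t page_index;	/* index of the page within the file */
	unsigned int start;	/* first dirty byte within the page */
	unsigned int len;	/* number of dirty bytes, 0 means clean */
};

/*
 * Record a write of 'len' bytes at 'start' within the page.
 * Overlapping or adjacent writes grow the tracked range. A disjoint
 * write returns -1, because merging it would cover bytes that were
 * never written; a real client would have to flush first (assumption).
 */
static int mark_dirty(struct dirty_range *r, unsigned int start,
		      unsigned int len)
{
	unsigned int end = start + len;
	unsigned int old_end = r->start + r->len;

	if (len == 0 || end > SKETCH_PAGE_SIZE)
		return -1;
	if (r->len == 0) {
		r->start = start;
		r->len = len;
		return 0;
	}
	if (start > old_end || end < r->start)
		return -1;	/* disjoint from the tracked range */
	if (start < r->start)
		r->start = start;
	if (end < old_end)
		end = old_end;
	r->len = end - r->start;
	return 0;
}

/* Byte offset in the file at which the write for this page would begin. */
static uint64_t dirty_file_offset(const struct dirty_range *r)
{
	return r->page_index * SKETCH_PAGE_SIZE + r->start;
}
```

With this model, a 60-byte modification in the third page of a file goes out as one 60-byte write at its exact file offset, with no read of the rest of the page.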

Please explain what happens now. Is it data corruption? Or just
a performance slowdown?

I don't understand. Don't you do the proper read-copy-modify-write that's
mandated by the block layout RFC? Byte aligned? And what have sectors and PAGES
got to do with it? I thought all IO must be "block" aligned.

In objects-layout we have even worse alignment constraints with raid5
(stripe_size alignment). We needed to do a (very simple BTW)
read-modify-write, which involves reading not just partial pages but also full pages.
BTW we read the surrounding pages into the page-cache, so as not to read them multiple
times.
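
As a rough sketch of the alignment arithmetic behind such a read-modify-write (not the objects-layout code; the struct and function names are hypothetical, and 'align' stands for whatever unit applies, block size or stripe size): the byte range is expanded outward to the alignment boundaries, and the head and tail pieces are what must be read before the aligned write can go out.

```c
#include <stdint.h>

/* Hypothetical result of expanding a byte range to an aligned extent. */
struct rmw_extent {
	uint64_t start;		/* aligned start of the extent to write */
	uint64_t end;		/* aligned end of the extent, exclusive */
	uint64_t head_read;	/* bytes to read before the modified data */
	uint64_t tail_read;	/* bytes to read after the modified data */
};

/*
 * Expand [offset, offset + len) outward to multiples of 'align'.
 * 'align' must be non-zero. Everything inside the extent that the
 * caller did not modify has to be read in first.
 */
static struct rmw_extent rmw_expand(uint64_t offset, uint64_t len,
				    uint64_t align)
{
	struct rmw_extent e;
	uint64_t end = offset + len;

	e.start = offset - (offset % align);
	e.end = ((end + align - 1) / align) * align;
	e.head_read = offset - e.start;
	e.tail_read = e.end - end;
	return e;
}
```

For a 100-byte write at offset 5000 with 4096-byte alignment this gives the extent [4096, 8192), with 904 bytes to read in front and 3092 bytes behind. An already aligned request needs no reads at all.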

> It is different from DIO case because for DIO we have to
> be sure each page is blocksize aligned. And it can't easily be done
> in pg_test because in pg_test we only have one nfs_page to test
> against.
> 

<snip>

> diff --git a/fs/nfs/blocklayout/blocklayout.c b/fs/nfs/blocklayout/blocklayout.c
> index 7ae8a60..a84a0da 100644
> --- a/fs/nfs/blocklayout/blocklayout.c
> +++ b/fs/nfs/blocklayout/blocklayout.c
> @@ -925,6 +925,18 @@ nfs4_blk_get_deviceinfo(struct nfs_server *server, const struct nfs_fh *fh,
>  	return rv;
>  }
>  
> +static bool
> +bl_pg_test(struct nfs_pageio_descriptor *pgio, struct nfs_page *prev,
> +	   struct nfs_page *req)
> +{
> +	/* Bail out page unligned IO */
> +	if (req->wb_offset || req->wb_pgbase ||
> +	    req->wb_bytes != PAGE_CACHE_SIZE)
> +		return false;
> +


This is very serious. Not many applications will currently pass
this test. (And hence will not do direct IO)

What happens today without this patch?

> +	return pnfs_generic_pg_test(pgio, prev, req);
> +}
> +


<>

Thanks
Boaz
