From: Boaz Harrosh <bharrosh@panasas.com>
To: <tao.peng@emc.com>, Sorin Faibish <sfaibish@emc.com>
Cc: <Trond.Myklebust@netapp.com>, <bergwolf@gmail.com>,
<linux-nfs@vger.kernel.org>
Subject: Re: [PATCH 3/3] pnfsblock: bail out unaligned DIO
Date: Mon, 28 May 2012 14:26:10 +0300 [thread overview]
Message-ID: <4FC360D2.4030006@panasas.com> (raw)
In-Reply-To: <F19688880B763E40B28B2B462677FBF805F0BADF36@MX09A.corp.emc.com>
On 05/28/2012 05:30 AM, tao.peng@emc.com wrote:
>> -----Original Message-----
>> From: Myklebust, Trond [mailto:Trond.Myklebust@netapp.com]
<>
>> Also, why do you consider it to be direct i/o specific? If the
>> application is using byte range locking, and the locks aren't page/block
>> aligned then you are in the same position of having to deal with partial
>> page writes even in the read/write from page cache situation.
> You are right about byte range locking + buffered IO, and it should
> be fixed in pg_test with bellow patch and it could be a stable
> candidate.
What?? please explain. It sounds like you are saying that there is a
very *serious* bug in current block-layout.
>From my experiment I know that lots and lots of IO is done none-paged
aligned even in buffered IO. Actually NFS goes to great length not to do
the usual read-modify-write per page, but keeps the byte range that
was written per page and only RPCs the exact offset-length of the modification.
Because by definition NFS is byte aligned IO, not "blocks" or "sectors".
Please explain what happens now. Is it a data corruption? Or just
performance slowness.
I don't understand. Don't you do the proper read-copy-modify-write that's
mandated by block layout RFC? byte aligned? And what are sectors and PAGES
got to do with it? I thought all IO must be "block" aligned.
In objects-layout we have even worse alignment constrains with raid5
(stripe_size alignment). It was needed to do a (very simple BTW)
read-modify-write. Involving not just partial pages but also full pages read.
BTW we read into the page-cache the surrounding pages, so not to read them multiple
times.
> It is different from DIO case because for DIO we have to
> be sure each page is blocksize aligned. And it can't easily be done
> in pg_test because in pg_test we only have one nfs_page to test
> against.
>
<snip>
> diff --git a/fs/nfs/blocklayout/blocklayout.c b/fs/nfs/blocklayout/blocklayout.c
> index 7ae8a60..a84a0da 100644
> --- a/fs/nfs/blocklayout/blocklayout.c
> +++ b/fs/nfs/blocklayout/blocklayout.c
> @@ -925,6 +925,18 @@ nfs4_blk_get_deviceinfo(struct nfs_server *server, const struct nfs_fh *fh,
> return rv;
> }
>
> +static bool
> +bl_pg_test(struct nfs_pageio_descriptor *pgio, struct nfs_page *prev,
> + struct nfs_page *req)
> +{
> + /* Bail out page unligned IO */
> + if (req->wb_offset || req->wb_pgbase ||
> + req->wb_bytes != PAGE_CACHE_SIZE)
> + return false;
> +
This is very serious. Not many applications will currently pass
this test. (And hence will not do direct IO)
What happens today without this patch?
> + return pnfs_generic_pg_test(pgio, prev, req);
> +}
> +
<>
Thanks
Boaz
next prev parent reply other threads:[~2012-05-28 11:26 UTC|newest]
Thread overview: 23+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-05-27 5:32 [PATCH 0/3] NFS41: make block layout driver work w/ NFS DIO changes Peng Tao
2012-05-27 5:32 ` [PATCH 1/3] NFS41: add pnfs_dio_begin/dio_end Peng Tao
2012-05-27 16:29 ` Myklebust, Trond
2012-05-28 2:30 ` tao.peng
2012-05-28 3:42 ` Myklebust, Trond
2012-05-28 4:13 ` tao.peng
2012-05-28 10:44 ` Boaz Harrosh
2012-05-28 10:51 ` Boaz Harrosh
2012-05-28 17:14 ` Peng Tao
2012-05-28 17:24 ` Boaz Harrosh
2012-05-28 17:36 ` Peng Tao
2012-05-28 17:44 ` Myklebust, Trond
2012-05-27 5:32 ` [PATCH 2/3] pnfsblock: call block plug in bl_dio_begin/end Peng Tao
2012-05-27 5:33 ` [PATCH 3/3] pnfsblock: bail out unaligned DIO Peng Tao
2012-05-27 16:38 ` Myklebust, Trond
2012-05-28 2:30 ` tao.peng
2012-05-28 3:45 ` Myklebust, Trond
2012-05-28 4:26 ` tao.peng
2012-05-28 16:33 ` Myklebust, Trond
2012-05-28 17:07 ` Peng Tao
2012-05-28 11:26 ` Boaz Harrosh [this message]
2012-05-28 16:50 ` Peng Tao
2012-05-28 17:10 ` Boaz Harrosh
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4FC360D2.4030006@panasas.com \
--to=bharrosh@panasas.com \
--cc=Trond.Myklebust@netapp.com \
--cc=bergwolf@gmail.com \
--cc=linux-nfs@vger.kernel.org \
--cc=sfaibish@emc.com \
--cc=tao.peng@emc.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).