From: Boaz Harrosh <bharrosh@panasas.com>
To: <tao.peng@emc.com>, Sorin Faibish <sfaibish@emc.com>
Cc: <Trond.Myklebust@netapp.com>, <bergwolf@gmail.com>,
<linux-nfs@vger.kernel.org>
Subject: Re: [PATCH 3/3] pnfsblock: bail out unaligned DIO
Date: Mon, 28 May 2012 14:26:10 +0300 [thread overview]
Message-ID: <4FC360D2.4030006@panasas.com> (raw)
In-Reply-To: <F19688880B763E40B28B2B462677FBF805F0BADF36@MX09A.corp.emc.com>
On 05/28/2012 05:30 AM, tao.peng@emc.com wrote:
>> -----Original Message-----
>> From: Myklebust, Trond [mailto:Trond.Myklebust@netapp.com]
<>
>> Also, why do you consider it to be direct i/o specific? If the
>> application is using byte range locking, and the locks aren't page/block
>> aligned then you are in the same position of having to deal with partial
>> page writes even in the read/write from page cache situation.
> You are right about byte range locking + buffered IO, and it should
> be fixed in pg_test with bellow patch and it could be a stable
> candidate.
What?? please explain. It sounds like you are saying that there is a
very *serious* bug in current block-layout.
>From my experiment I know that lots and lots of IO is done none-paged
aligned even in buffered IO. Actually NFS goes to great length not to do
the usual read-modify-write per page, but keeps the byte range that
was written per page and only RPCs the exact offset-length of the modification.
Because by definition NFS is byte aligned IO, not "blocks" or "sectors".
Please explain what happens now. Is it a data corruption? Or just
performance slowness.
I don't understand. Don't you do the proper read-copy-modify-write that's
mandated by block layout RFC? byte aligned? And what are sectors and PAGES
got to do with it? I thought all IO must be "block" aligned.
In objects-layout we have even worse alignment constrains with raid5
(stripe_size alignment). It was needed to do a (very simple BTW)
read-modify-write. Involving not just partial pages but also full pages read.
BTW we read into the page-cache the surrounding pages, so not to read them multiple
times.
> It is different from DIO case because for DIO we have to
> be sure each page is blocksize aligned. And it can't easily be done
> in pg_test because in pg_test we only have one nfs_page to test
> against.
>
<snip>
> diff --git a/fs/nfs/blocklayout/blocklayout.c b/fs/nfs/blocklayout/blocklayout.c
> index 7ae8a60..a84a0da 100644
> --- a/fs/nfs/blocklayout/blocklayout.c
> +++ b/fs/nfs/blocklayout/blocklayout.c
> @@ -925,6 +925,18 @@ nfs4_blk_get_deviceinfo(struct nfs_server *server, const struct nfs_fh *fh,
> return rv;
> }
>
> +static bool
> +bl_pg_test(struct nfs_pageio_descriptor *pgio, struct nfs_page *prev,
> + struct nfs_page *req)
> +{
> + /* Bail out page unligned IO */
> + if (req->wb_offset || req->wb_pgbase ||
> + req->wb_bytes != PAGE_CACHE_SIZE)
> + return false;
> +
This is very serious. Not many applications will currently pass
this test. (And hence will not do direct IO)
What happens today without this patch?
> + return pnfs_generic_pg_test(pgio, prev, req);
> +}
> +
<>
Thanks
Boaz
next prev parent reply other threads:[~2012-05-28 11:26 UTC|newest]
Thread overview: 23+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-05-27 5:32 [PATCH 0/3] NFS41: make block layout driver work w/ NFS DIO changes Peng Tao
2012-05-27 5:32 ` [PATCH 1/3] NFS41: add pnfs_dio_begin/dio_end Peng Tao
2012-05-27 16:29 ` Myklebust, Trond
2012-05-28 2:30 ` tao.peng
2012-05-28 3:42 ` Myklebust, Trond
2012-05-28 4:13 ` tao.peng
2012-05-28 10:44 ` Boaz Harrosh
2012-05-28 10:51 ` Boaz Harrosh
2012-05-28 17:14 ` Peng Tao
2012-05-28 17:24 ` Boaz Harrosh
2012-05-28 17:36 ` Peng Tao
2012-05-28 17:44 ` Myklebust, Trond
2012-05-27 5:32 ` [PATCH 2/3] pnfsblock: call block plug in bl_dio_begin/end Peng Tao
2012-05-27 5:33 ` [PATCH 3/3] pnfsblock: bail out unaligned DIO Peng Tao
2012-05-27 16:38 ` Myklebust, Trond
2012-05-28 2:30 ` tao.peng
2012-05-28 3:45 ` Myklebust, Trond
2012-05-28 4:26 ` tao.peng
2012-05-28 16:33 ` Myklebust, Trond
2012-05-28 17:07 ` Peng Tao
2012-05-28 11:26 ` Boaz Harrosh [this message]
2012-05-28 16:50 ` Peng Tao
2012-05-28 17:10 ` Boaz Harrosh
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4FC360D2.4030006@panasas.com \
--to=bharrosh@panasas.com \
--cc=Trond.Myklebust@netapp.com \
--cc=bergwolf@gmail.com \
--cc=linux-nfs@vger.kernel.org \
--cc=sfaibish@emc.com \
--cc=tao.peng@emc.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.