From: Boaz Harrosh <bharrosh@panasas.com>
To: Peng Tao <bergwolf@gmail.com>
Cc: <Trond.Myklebust@netapp.com>, <linux-nfs@vger.kernel.org>,
<stable@vger.kernel.org>, Peng Tao <tao.peng@emc.com>
Subject: Re: [PATCH] pnfsblock: bail out partial page IO
Date: Tue, 29 May 2012 10:07:23 +0300 [thread overview]
Message-ID: <4FC475AB.3010607@panasas.com> (raw)
In-Reply-To: <1338228443-4883-1-git-send-email-bergwolf@gmail.com>
On 05/28/2012 09:07 PM, Peng Tao wrote:
> Current block layout driver read/write code assumes page
> aligned IO in many places. Add a checker to validate the assumption.
>
> Cc: stable@vger.kernel.org
> Signed-off-by: Peng Tao <tao.peng@emc.com>
> ---
> This is the minimal patch for buffer IO case. I will cook DIO alignment checker
> based on it, since DIO isn't needed for stable.
>
> fs/nfs/blocklayout/blocklayout.c | 39 +++++++++++++++++++++++++++++++++++--
> 1 files changed, 36 insertions(+), 3 deletions(-)
>
> diff --git a/fs/nfs/blocklayout/blocklayout.c b/fs/nfs/blocklayout/blocklayout.c
> index 7ae8a60..447f8d1 100644
> --- a/fs/nfs/blocklayout/blocklayout.c
> +++ b/fs/nfs/blocklayout/blocklayout.c
> @@ -228,6 +228,14 @@ bl_end_par_io_read(void *data, int unused)
> schedule_work(&rdata->task.u.tk_work);
> }
>
> +static bool
> +bl_nfspage_aligned(u64 offset, u32 len, u32 blkmask)
> +{
> + if ((offset & blkmask) || (len & blkmask))
> + return false;
> + return true;
> +}
> +
> static enum pnfs_try_status
> bl_read_pagelist(struct nfs_read_data *rdata)
> {
> @@ -244,6 +252,9 @@ bl_read_pagelist(struct nfs_read_data *rdata)
> dprintk("%s enter nr_pages %u offset %lld count %u\n", __func__,
> rdata->pages.npages, f_offset, (unsigned int)rdata->args.count);
>
> + if (!bl_nfspage_aligned(f_offset, rdata->args.count, PAGE_CACHE_MASK))
> + goto use_mds;
> +
Please put a dprint in the true case so we can see if this hits a lot.
What you are saying is that this should happen very rarely.
After you put the print. Does it hit? what is the test case that causes it?
> par = alloc_parallel(rdata);
> if (!par)
> goto use_mds;
> @@ -552,7 +563,7 @@ bl_write_pagelist(struct nfs_write_data *wdata, int sync)
> struct bio *bio = NULL;
> struct pnfs_block_extent *be = NULL, *cow_read = NULL;
> sector_t isect, last_isect = 0, extent_length = 0;
> - struct parallel_io *par;
> + struct parallel_io *par = NULL;
> loff_t offset = wdata->args.offset;
> size_t count = wdata->args.count;
> struct page **pages = wdata->args.pages;
> @@ -563,6 +574,10 @@ bl_write_pagelist(struct nfs_write_data *wdata, int sync)
> NFS_SERVER(header->inode)->pnfs_blksize >> PAGE_CACHE_SHIFT;
>
> dprintk("%s enter, %Zu@%lld\n", __func__, count, offset);
> + /* Check for alignment first */
> + if (!bl_nfspage_aligned(offset, count, PAGE_CACHE_MASK))
> + goto out_mds;
> +
Also here
> /* At this point, wdata->pages is a (sequential) list of nfs_pages.
> * We want to write each, and if there is an error set pnfs_error
> * to have it redone using nfs.
> @@ -996,14 +1011,32 @@ bl_clear_layoutdriver(struct nfs_server *server)
> return 0;
> }
>
> +static void
> +bl_pg_init_read(struct nfs_pageio_descriptor *pgio, struct nfs_page *req)
> +{
> + if (!bl_nfspage_aligned(req->wb_offset, req->wb_bytes, PAGE_CACHE_MASK))
> + nfs_pageio_reset_read_mds(pgio);
Also here
> + else
> + pnfs_generic_pg_init_read(pgio, req);
> +}
> +
> +static void
> +bl_pg_init_write(struct nfs_pageio_descriptor *pgio, struct nfs_page *req)
> +{
> + if (!bl_nfspage_aligned(req->wb_offset, req->wb_bytes, PAGE_CACHE_MASK))
> + nfs_pageio_reset_write_mds(pgio);
And here
> + else
> + pnfs_generic_pg_init_write(pgio, req);
> +}
> +
> static const struct nfs_pageio_ops bl_pg_read_ops = {
> - .pg_init = pnfs_generic_pg_init_read,
> + .pg_init = bl_pg_init_read,
> .pg_test = pnfs_generic_pg_test,
> .pg_doio = pnfs_generic_pg_readpages,
> };
>
> static const struct nfs_pageio_ops bl_pg_write_ops = {
> - .pg_init = pnfs_generic_pg_init_write,
> + .pg_init = bl_pg_init_write,
> .pg_test = pnfs_generic_pg_test,
> .pg_doio = pnfs_generic_pg_writepages,
> };
If the prints actually never hit except with some rare applications
than that's fine. But if they do Perhaps a fix is not that hard.
Specially the read case should not be that hard to fix. Because you
are reading into pages, and page boundaries are aligned to page boundaries
in the stream, so all you need to do is read the reminders (at both ends)
as well.
Also the write case is not that hard. You just need to call a read-for-write()
routine that does a sync read, when done continues with today's write.
Cheers
Boaz
next prev parent reply other threads:[~2012-05-29 7:07 UTC|newest]
Thread overview: 4+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-05-28 18:07 [PATCH] pnfsblock: bail out partial page IO Peng Tao
2012-05-28 19:20 ` Jim Rees
2012-05-29 7:07 ` Boaz Harrosh [this message]
2012-05-31 3:33 ` tao.peng
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4FC475AB.3010607@panasas.com \
--to=bharrosh@panasas.com \
--cc=Trond.Myklebust@netapp.com \
--cc=bergwolf@gmail.com \
--cc=linux-nfs@vger.kernel.org \
--cc=stable@vger.kernel.org \
--cc=tao.peng@emc.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.