From: Boaz Harrosh <bharrosh@panasas.com>
To: Peng Tao <bergwolf@gmail.com>
Cc: <Trond.Myklebust@netapp.com>, <linux-nfs@vger.kernel.org>,
<bhalevy@tonian.com>
Subject: Re: [PATCH 0/4] nfs41: allow layoutget at pnfs_do_multiple_writes
Date: Tue, 29 Nov 2011 13:34:28 -0800
Message-ID: <4ED54FE4.9050008@panasas.com>
In-Reply-To: <1322887965-2938-1-git-send-email-bergwolf@gmail.com>
On 12/02/2011 08:52 PM, Peng Tao wrote:
> Issuing layoutget at .pg_init will drop the IO size information and ask for 4KB
> layout every time. However, the IO size information is very valuable for MDS to
> determine how much layout it should return to client.
>
> The patchset tries to allow the LD not to send layoutget at .pg_init but instead at
> pnfs_do_multiple_writes, so that the real IO size is preserved and sent to the MDS.
>
> Tests against a server that does not aggressively pre-allocate layouts show
> that the IO size information is really useful to a block layout MDS.
>
> The generic pnfs layer changes are trivial for file layout and object layout as long as
> they still send layoutget at .pg_init.
>
I have a better solution for your problem, one that is a much smaller change and,
I think, gives you much better heuristics.

Keep the layout_get exactly where it is, but instead of sending PAGE_SIZE, send
the total size of the dirty pages you have.

If it is a linear write, you will be right on the money with a single lo_get. If
it is a heavy random write, then you might need more lo_gets and you might get
some unused segments. But heavy random writes are rare and slow anyway. As a first
approximation it's fine. (We can fix that later as well.)

.pg_init is called after the .writepages call from the VFS, when all the to-be-written
pages are already staged for writing. So there should be an easy way to extract
that information.
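A minimal userspace sketch of that heuristic, not kernel code. It assumes a
4 KiB page and one contiguous run of dirty pages, and every name in it
(lo_range, layoutget_range_for_dirty, SKETCH_PAGE_SIZE) is hypothetical, not an
existing pNFS symbol. It only shows how a layoutget range could be sized from a
dirty-page count instead of a fixed PAGE_SIZE:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed page size for this sketch; the kernel uses PAGE_SIZE. */
#define SKETCH_PAGE_SIZE UINT64_C(4096)

/* Hypothetical byte range to request in a layoutget. */
struct lo_range {
	uint64_t offset;
	uint64_t length;
};

/*
 * Size the request from the number of dirty pages staged for write-out.
 * Falling back to a single page when nothing is dirty is an assumption
 * of this sketch, chosen to mirror the fixed-size request it replaces.
 */
static struct lo_range layoutget_range_for_dirty(uint64_t first_dirty_index,
						 uint64_t nr_dirty_pages)
{
	struct lo_range r;

	/* Guard against overflow when converting pages to bytes. */
	assert(first_dirty_index <= UINT64_MAX / SKETCH_PAGE_SIZE);
	assert(nr_dirty_pages <= UINT64_MAX / SKETCH_PAGE_SIZE);

	r.offset = first_dirty_index * SKETCH_PAGE_SIZE;
	r.length = (nr_dirty_pages ? nr_dirty_pages : 1) * SKETCH_PAGE_SIZE;
	return r;
}
```

With these assumptions, 256 dirty pages starting at page index 16 would ask
for offset 65536 and length 1048576 (1 MiB) rather than 4096.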
> iozone cmd:
> ./iozone -r 1m -s 4G -w -W -c -t 10 -i 0 -F /mnt/iozone.data.1 /mnt/iozone.data.2 /mnt/iozone.data.3 /mnt/iozone.data.4 /mnt/iozone.data.5 /mnt/iozone.data.6 /mnt/iozone.data.7 /mnt/iozone.data.8 /mnt/iozone.data.9 /mnt/iozone.data.10
>
> Before patch: around 12MB/s throughput
> After patch: around 72MB/s throughput
>
Yes, yes, that stupid brain-dead server is no indication of anything. The server
should know best about optimal sizes and layouts. Please don't give me that stuff
again.
But just do the above and you'll see that it is perfect.
BTW, don't limit the lo_segment size by the max_io_size. This is why you
have .pg_test: to signal when the IO is maxed out.
- The read segments should be as big as possible (i_size long).
- The write segments should ideally be as big as the application
  wants to write. (The amount of dirty pages at the time of nfs-write-out
  is a very good first approximation.)
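Roughly, as a userspace sketch and not kernel code: segment sizing and the
per-IO size limit are kept as two separate decisions. All names here (io_mode,
wanted_segment_length, io_can_grow) are hypothetical, and io_can_grow only
models the kind of check a .pg_test-style hook could make:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum io_mode { IO_READ, IO_WRITE };

/*
 * Segment length to ask for: the whole file for reads, the dirty byte
 * count for writes. Deliberately not clamped by any maximum IO size.
 */
static uint64_t wanted_segment_length(enum io_mode mode, uint64_t i_size,
				      uint64_t dirty_bytes)
{
	return mode == IO_READ ? i_size : dirty_bytes;
}

/*
 * Separate check on the IO being built: can `add` more bytes still be
 * coalesced without exceeding max_io_size?
 */
static bool io_can_grow(uint64_t queued_bytes, uint64_t add,
			uint64_t max_io_size)
{
	assert(queued_bytes <= max_io_size);
	return add <= max_io_size - queued_bytes;
}
```

A large segment can then be served by several IOs, each one closed off when
the check reports that it is full.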
So I guess it is: I hate these patches. Too much mess, too little goodness.
Thanks
Boaz
> Peng Tao (4):
> nfsv41: export pnfs_find_alloc_layout
> nfsv41: add and export pnfs_find_get_layout_locked
> nfsv41: get lseg before issue LD IO if pgio doesn't carry lseg
> pnfsblock: do ask for layout in pg_init
>
> fs/nfs/blocklayout/blocklayout.c | 54 ++++++++++++++++++++++++++-
> fs/nfs/pnfs.c | 74 +++++++++++++++++++++++++++++++++++++-
> fs/nfs/pnfs.h | 9 +++++
> 3 files changed, 134 insertions(+), 3 deletions(-)