* readpages & writepages
@ 2003-01-22 14:42 Steven French
2003-01-23 0:37 ` Andrew Morton
0 siblings, 1 reply; 4+ messages in thread
From: Steven French @ 2003-01-22 14:42 UTC (permalink / raw)
To: linux-fsdevel
[Resending since the original message appears to have been lost]
Some questions about implementing readpages & writepages for network
filesystems ...
I didn't find many examples of readpages & writepages in 2.5 yet (so far
just the local filesystem examples that mostly use the fs/mpage.c common
code, and the nfs case) as I looked through trying to implement these
optional vfs entry points for the cifs case. Doing larger reads, each of
size 16K, turns out to be reasonably efficient against typical CIFS
servers (larger is of course also possible), and would noticeably reduce
network traffic when doing readahead. There are a few difficulties
though. There does not appear to be an obvious mapping of the list of
pages (passed in to readpages/writepages) to something that I could
memcpy my 16K network read buffer into, so the pages have to be
individually mapped and copied in 4K chunks, losing a small amount of
the benefit of having readpages/writepages in the first place. Also, the
common routines (read_cache_pages and generic_writepages/mpage_writepages
respectively) don't seem to be written for the case in which more than 4K
is copied in one call to the filesystem. And for nfs, the mpage_writepages
call seems to default to calling the vfs op writepage (since a null
get_block_t routine is passed in), which could have been done by simply
not supporting the writepages entry point - perhaps this is just an
intermediate coding step, a staging of the eventual function.
Has there been much discussion or much written down (other than a little
bit in Documentation/filesystems/Locking and the readahead function
comments in mm/readahead.c) on the suitability of readpages & writepages
for network filesystems? There obviously can be significant benefits in
reducing the number of network roundtrips if it can be made to work
efficiently ...
Steve French
Senior Software Engineer
Linux Technology Center - IBM Austin
phone: 512-838-2294
email: sfrench@us.ibm.com
* Re: readpages & writepages
2003-01-22 14:42 readpages & writepages Steven French
@ 2003-01-23 0:37 ` Andrew Morton
0 siblings, 0 replies; 4+ messages in thread
From: Andrew Morton @ 2003-01-23 0:37 UTC (permalink / raw)
To: Steven French; +Cc: linux-fsdevel
Steven French <sfrench@us.ibm.com> wrote:
>
> Some questions about implementing readpages & writepages for network
> filesystems ...
>
> I didn't find many examples of readpages & writepages in 2.5 yet (so far
> just the local filesystem examples that mostly use fs/mpage.c common code
> and the nfs case) as I looked through trying to implement these optional
> vfs entry points for the cifs case.
My approach there was to implement sufficient functionality for the kernel to
be able to assemble multipage BIOs for ext2 filesystems, and no more. I kept
it simple, clean and well documented, in the expectation that when other
filesystem developers got into it, the functionality would grow on demand.
And indeed this has happened - enhancements have been added as AFS, reiserfs3
and NFS have started to use this code.
I expect more changes will be needed to accommodate other filesystems.
That's fine.
> Doing larger reads, each of size 16K,
> turns out to be reasonably efficient to typical CIFS servers (larger is of
> course also possible), and would have the benefit of reducing the network
> traffic noticeably when doing readahead. There are a few difficulties
> though - there does not appear to be an obvious mapping of the list of
> pages (passed in to readpages/writepages), at least to something that I
> could memcpy my 16K network read buffer into - so the pages have to
> be individually mapped and copied in 4K chunks, losing a small amount of
> the benefit of having a readpages/writepages in the first place
Well if you want a single 16k physically contiguous chunk of memory then yup,
there are tons of problems around that; it won't be happening in 2.6.
The overhead of mapping and unmapping 4 pages is very small, especially when
compared with the cost of the copy itself!
You'd be better off looking into avoiding that copy altogether: feed four 4k
pages down to the network stack and get them filled in direct from the
busmastering receive.
(And bear in mind that 1 single 16k chunk could perform worse than 4x4k
chunks, if the network receive and the copy are serialised. With 4k pages,
the CPU can be copying one page _while_ the network is pulling in the next
one...)
> and the
> common routines (read_cache_pages and generic_writepages/mpage_writepages
> respectively) don't seem written for the case in which > 4K is copied in
> one call to the filesystem and for nfs the mpage_writepages call seems to
> default to calling the vfs op writepage (since a null get_block_t routine
> is passed in) which could have been done by simply not supporting the
> writepages entry point - perhaps this is just an intermediate coding step,
> a staging of the eventual function.
>
> Has there been much discussion or much written down (other than a little
> bit in documentation/filesystems/locking and the readahead function
> comments in mm/readahead.c) on the suitability of readpages & writepages
> for network filesystems? There obviously can be significant benefits in
> reducing the number of network roundtrips if it can be made to work
> efficiently ...
I don't understand how this affects the number of roundtrips? You seem to be
implying that CIFS has an fs-private receive buffer, the contents of which
are copied into the VFS's pagecache?
If so, then making that buffer be 16k should work OK?
If not, then some more details on the general data flow would be needed,
please.
* Re: readpages & writepages
@ 2003-01-23 3:35 Steven French
0 siblings, 0 replies; 4+ messages in thread
From: Steven French @ 2003-01-23 3:35 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-fsdevel
Andrew Morton wrote:
> You seem to be
>implying that CIFS has an fs-private receive buffer, the contents of which
>are copied into the VFS's pagecache?
>
>If so, then making that buffer be 16k should work OK?
Yes - since I (at least for the foreseeable future) still have a private
receive buffer for each target server (which usually negotiates to a size
of 16K plus a little extra for the protocol header), readpages can be
made to work by doing something similar to read_cache_pages from
mm/readahead.c and copying the 16K read buffer into the cache one page at
a time. I am prototyping that now. Doing scatter/gather style receives
(avoiding a copy) into the cache pages directly would be desirable
eventually, but would be non-trivial to prototype and may not even be
possible due to the SMB/CIFS protocol's variable length headers and the
multiplexed TCP traffic. At least for the writepages case, though,
something like a scatter send approach (reducing the copies) shouldn't be
too hard to code.
Steve French
Senior Software Engineer
Linux Technology Center - IBM Austin
phone: 512-838-2294
email: sfrench@us.ibm.com
* readpages & writepages
@ 2003-01-21 22:29 Steven French
0 siblings, 0 replies; 4+ messages in thread
From: Steven French @ 2003-01-21 22:29 UTC (permalink / raw)
To: linux-fsdevel
Some questions about implementing readpages & writepages for network
filesystems ...
I didn't find many examples of readpages & writepages in 2.5 yet (so far
just the local filesystem examples that mostly use the fs/mpage.c common
code, and the nfs case) as I looked through trying to implement these
optional vfs entry points for the cifs case. Doing larger reads, each of
size 16K, turns out to be reasonably efficient against typical CIFS
servers (larger is of course also possible), and would noticeably reduce
network traffic when doing readahead. There are a few difficulties
though. There does not appear to be an obvious mapping of the list of
pages (passed in to readpages/writepages) to something that I could
memcpy my 16K network read buffer into, so the pages have to be
individually mapped and copied in 4K chunks, losing a small amount of
the benefit of having readpages/writepages in the first place. Also, the
common routines (read_cache_pages and generic_writepages/mpage_writepages
respectively) don't seem to be written for the case in which more than 4K
is copied in one call to the filesystem. And for nfs, the mpage_writepages
call seems to default to calling the vfs op writepage (since a null
get_block_t routine is passed in), which could have been done by simply
not supporting the writepages entry point - perhaps this is just an
intermediate coding step, a staging of the eventual function.
Has there been much discussion or much written down (other than a little
bit in Documentation/filesystems/Locking and the readahead function
comments in mm/readahead.c) on the suitability of readpages & writepages
for network filesystems? There obviously can be significant benefits in
reducing the number of network roundtrips if it can be made to work
efficiently ...
Steve French
Senior Software Engineer
Linux Technology Center - IBM Austin
phone: 512-838-2294
email: sfrench@us.ibm.com