* Networked filesystems vs backing_dev_info
@ 2007-10-27 9:34 Peter Zijlstra
2007-10-27 15:22 ` Jan Harkes
2007-10-27 21:02 ` Steve French
0 siblings, 2 replies; 7+ messages in thread
From: Peter Zijlstra @ 2007-10-27 9:34 UTC (permalink / raw)
To: linux-kernel, linux-fsdevel
Cc: David Howells, sfrench, jaharkes, Andrew Morton, vandrove
Hi,
I had me a little look at bdi usage in networked filesystems.
NFS, CIFS, (smbfs), AFS, CODA and NCP
And of those, NFS is the only one that I could find that creates
backing_dev_info structures. The rest seems to fall back to
default_backing_dev_info.
With my recent per bdi dirty limit patches the bdi has become more
important than it has been in the past. While falling back to the
default_backing_dev_info isn't wrong per se, it isn't right either.
Could I implore the various maintainers to look into this issue for
their respective filesystems? I'll try and come up with some patches to
address this, but feel free to beat me to it.
peterz
* Re: Networked filesystems vs backing_dev_info
2007-10-27 9:34 Networked filesystems vs backing_dev_info Peter Zijlstra
@ 2007-10-27 15:22 ` Jan Harkes
2007-10-27 15:32 ` Peter Zijlstra
2007-10-27 21:02 ` Steve French
1 sibling, 1 reply; 7+ messages in thread
From: Jan Harkes @ 2007-10-27 15:22 UTC (permalink / raw)
To: Peter Zijlstra
Cc: linux-kernel, linux-fsdevel, David Howells, sfrench,
Andrew Morton, vandrove
On Sat, Oct 27, 2007 at 11:34:26AM +0200, Peter Zijlstra wrote:
> I had me a little look at bdi usage in networked filesystems.
>
> NFS, CIFS, (smbfs), AFS, CODA and NCP
>
> And of those, NFS is the only one that I could find that creates
> backing_dev_info structures. The rest seems to fall back to
> default_backing_dev_info.
While a file is open in Coda, we associate the open file handle with a
local cache file. All read and write operations are redirected to this
local file and we even redirect inode->i_mapping. Actual reads and
writes are completely handled by the underlying file system. We send the
new file contents back to the servers only after all local references
have been released (last-close semantics).
As a result, there is no need for backing_dev_info structures in Coda;
if any congestion control is needed, it will be handled by the underlying
file system where our locally cached copies are stored.
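A minimal sketch of that redirect (the helper name is made up; the real
code is in fs/coda/ and differs in detail):

#include <linux/fs.h>

/* Sketch only: point the Coda inode's address space at the local
 * container file, so all page cache I/O (and therefore all dirty-page
 * accounting) lands on the host filesystem and its BDI. */
static void coda_redirect_mapping(struct inode *coda_inode,
                                  struct inode *host_inode)
{
        if (coda_inode->i_mapping == &coda_inode->i_data)
                coda_inode->i_mapping = host_inode->i_mapping;
}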
Jan
* Re: Networked filesystems vs backing_dev_info
2007-10-27 15:22 ` Jan Harkes
@ 2007-10-27 15:32 ` Peter Zijlstra
0 siblings, 0 replies; 7+ messages in thread
From: Peter Zijlstra @ 2007-10-27 15:32 UTC (permalink / raw)
To: Jan Harkes
Cc: linux-kernel, linux-fsdevel, David Howells, sfrench,
Andrew Morton, vandrove
On Sat, 2007-10-27 at 11:22 -0400, Jan Harkes wrote:
> On Sat, Oct 27, 2007 at 11:34:26AM +0200, Peter Zijlstra wrote:
> > I had me a little look at bdi usage in networked filesystems.
> >
> > NFS, CIFS, (smbfs), AFS, CODA and NCP
> >
> > And of those, NFS is the only one that I could find that creates
> > backing_dev_info structures. The rest seems to fall back to
> > default_backing_dev_info.
>
> While a file is open in Coda, we associate the open file handle with a
> local cache file. All read and write operations are redirected to this
> local file and we even redirect inode->i_mapping. Actual reads and
> writes are completely handled by the underlying file system. We send the
> new file contents back to the servers only after all local references
> have been released (last-close semantics).
>
> As a result, there is no need for backing_dev_info structures in Coda;
> if any congestion control is needed, it will be handled by the underlying
> file system where our locally cached copies are stored.
OK, that works. Thanks for the explanation!
* Re: Networked filesystems vs backing_dev_info
2007-10-27 9:34 Networked filesystems vs backing_dev_info Peter Zijlstra
2007-10-27 15:22 ` Jan Harkes
@ 2007-10-27 21:02 ` Steve French
2007-10-27 21:30 ` Peter Zijlstra
1 sibling, 1 reply; 7+ messages in thread
From: Steve French @ 2007-10-27 21:02 UTC (permalink / raw)
To: Peter Zijlstra
Cc: linux-kernel, linux-fsdevel, David Howells, sfrench, jaharkes,
Andrew Morton, vandrove
On 10/27/07, Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
> Hi,
>
> I had me a little look at bdi usage in networked filesystems.
>
> NFS, CIFS, (smbfs), AFS, CODA and NCP
>
> And of those, NFS is the only one that I could find that creates
> backing_dev_info structures. The rest seems to fall back to
> default_backing_dev_info.
>
> With my recent per bdi dirty limit patches the bdi has become more
> important than it has been in the past. While falling back to the
> default_backing_dev_info isn't wrong per se, it isn't right either.
>
> Could I implore the various maintainers to look into this issue for
> their respective filesystems? I'll try and come up with some patches to
> address this, but feel free to beat me to it.
I would like to understand more about your patches to see what bdi
values make sense for CIFS and how to report possible congestion back
to the page manager. I had been thinking about setting bdi->ra_pages
so that we do more sensible readahead and writebehind - better
matching what is possible over the network and what the server
prefers. SMB/CIFS Servers typically allow a maximum of 50 requests
in parallel at one time from one client (although this is adjustable
for some). The CIFS client prefers to do writes 14 pages (an iovec of
56K) at a time (although many servers can efficiently handle multiple
of these 56K writes in parallel). With minor changes CIFS could
handle even larger writes (to just under 64K for Windows and just
under 128K for Samba - the current CIFS Unix Extensions allow servers
to negotiate much larger writes, but lacking a "receivepage"
equivalent Samba does not currently support larger than 128K).
Ideally, to improve large file copy utilization, I would like to see
3 to 10 writes of 56K (or larger in the future) in parallel. The
read path is harder since we only do 16K reads to Windows and Samba -
but we need to increase the number of these that are done in parallel
on the same inode. There is a large Google Summer of Code patch for
this which needs more review.
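As a rough sketch of the ra_pages tuning mentioned above (the constants
just restate the 56K/14-page figures from this mail; the helper and the
parallelism value are hypothetical, not current CIFS code):

#include <linux/backing-dev.h>

#define CIFS_WSIZE_PAGES        14      /* one 56K write = 14 x 4K pages */
#define CIFS_PARALLEL_WRITES    4       /* aim for a few writes in flight */

/* Hypothetical tuning: size readahead to one network transfer times the
 * number of requests we would like to keep in flight to the server. */
static void cifs_tune_bdi(struct backing_dev_info *bdi)
{
        bdi->ra_pages = CIFS_WSIZE_PAGES * CIFS_PARALLEL_WRITES;
}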
--
Thanks,
Steve
* Re: Networked filesystems vs backing_dev_info
2007-10-27 21:02 ` Steve French
@ 2007-10-27 21:30 ` Peter Zijlstra
2007-10-27 21:37 ` Peter Zijlstra
0 siblings, 1 reply; 7+ messages in thread
From: Peter Zijlstra @ 2007-10-27 21:30 UTC (permalink / raw)
To: Steve French
Cc: linux-kernel, linux-fsdevel, David Howells, sfrench, jaharkes,
Andrew Morton, vandrove
On Sat, 2007-10-27 at 16:02 -0500, Steve French wrote:
> On 10/27/07, Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
> > Hi,
> >
> > I had me a little look at bdi usage in networked filesystems.
> >
> > NFS, CIFS, (smbfs), AFS, CODA and NCP
> >
> > And of those, NFS is the only one that I could find that creates
> > backing_dev_info structures. The rest seems to fall back to
> > default_backing_dev_info.
> >
> > With my recent per bdi dirty limit patches the bdi has become more
> > important than it has been in the past. While falling back to the
> > default_backing_dev_info isn't wrong per se, it isn't right either.
> >
> > Could I implore the various maintainers to look into this issue for
> > their respective filesystems? I'll try and come up with some patches to
> > address this, but feel free to beat me to it.
>
> I would like to understand more about your patches to see what bdi
> values make sense for CIFS and how to report possible congestion back
> to the page manager.
So, what my recent patches do is carve up the total writeback cache
size, or dirty page limit as we call it, proportionally to a BDI's
writeout speed. So a fast device gets more than a slow device, but will
not starve it.
However, for this to work, each device, or remote backing store in the
case of networked filesystems, needs to have a BDI.
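As a toy model of that split (not the actual code in the patches, which
track a floating proportion per BDI):

/* Each BDI's share of the global dirty-page limit is proportional to
 * its share of recently completed writeback; overflow is ignored here. */
static unsigned long bdi_dirty_share(unsigned long global_limit,
                                     unsigned long bdi_writeout,
                                     unsigned long total_writeout)
{
        if (!total_writeout)
                return global_limit;
        return global_limit * bdi_writeout / total_writeout;
}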
> I had been thinking about setting bdi->ra_pages
> so that we do more sensible readahead and writebehind - better
> matching what is possible over the network and what the server
> prefers.
Well, you'd first have to create backing_dev_info instances before
setting that value :-)
> SMB/CIFS Servers typically allow a maximum of 50 requests
> in parallel at one time from one client (although this is adjustable
> for some).
That seems like a perfect point to set congestion.
So in short, stick a struct backing_dev_info into whatever represents a
client, initialize it using bdi_init(), destroy using bdi_destroy().
Mark it congested once you have 50 (or more) outstanding requests, clear
congestion when you drop below 50.
and you should be set.
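A minimal sketch, assuming a per-client structure (all names here are
illustrative) and the set_bdi_congested()/clear_bdi_congested() helpers:

#include <linux/fs.h>
#include <linux/backing-dev.h>
#include <asm/atomic.h>

#define MAX_OUTSTANDING 50      /* typical SMB/CIFS per-client window */

/* Illustrative per-server/client structure for a network filesystem. */
struct netfs_client {
        struct backing_dev_info bdi;
        atomic_t outstanding;   /* requests currently on the wire */
};

static int netfs_client_init(struct netfs_client *clnt)
{
        atomic_set(&clnt->outstanding, 0);
        return bdi_init(&clnt->bdi);    /* pairs with bdi_destroy() below */
}

static void netfs_client_destroy(struct netfs_client *clnt)
{
        bdi_destroy(&clnt->bdi);
}

/* Call when a request is sent to the server. */
static void netfs_request_sent(struct netfs_client *clnt)
{
        if (atomic_inc_return(&clnt->outstanding) >= MAX_OUTSTANDING)
                set_bdi_congested(&clnt->bdi, WRITE);
}

/* Call when the server's reply comes back. */
static void netfs_request_done(struct netfs_client *clnt)
{
        if (atomic_dec_return(&clnt->outstanding) < MAX_OUTSTANDING)
                clear_bdi_congested(&clnt->bdi, WRITE);
}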
* Re: Networked filesystems vs backing_dev_info
2007-10-27 21:30 ` Peter Zijlstra
@ 2007-10-27 21:37 ` Peter Zijlstra
2007-10-28 7:46 ` Petr Vandrovec
0 siblings, 1 reply; 7+ messages in thread
From: Peter Zijlstra @ 2007-10-27 21:37 UTC (permalink / raw)
To: Steve French
Cc: linux-kernel, linux-fsdevel, David Howells, sfrench, jaharkes,
Andrew Morton, vandrove
On Sat, 2007-10-27 at 23:30 +0200, Peter Zijlstra wrote:
> On Sat, 2007-10-27 at 16:02 -0500, Steve French wrote:
> > On 10/27/07, Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
> > > Hi,
> > >
> > > I had me a little look at bdi usage in networked filesystems.
> > >
> > > NFS, CIFS, (smbfs), AFS, CODA and NCP
> > >
> > > And of those, NFS is the only one that I could find that creates
> > > backing_dev_info structures. The rest seems to fall back to
> > > default_backing_dev_info.
> > >
> > > With my recent per bdi dirty limit patches the bdi has become more
> > > important than it has been in the past. While falling back to the
> > > default_backing_dev_info isn't wrong per se, it isn't right either.
> > >
> > > Could I implore the various maintainers to look into this issue for
> > > their respective filesystems? I'll try and come up with some patches to
> > > address this, but feel free to beat me to it.
> >
> > I would like to understand more about your patches to see what bdi
> > values make sense for CIFS and how to report possible congestion back
> > to the page manager.
>
> So, what my recent patches do is carve up the total writeback cache
> size, or dirty page limit as we call it, proportionally to a BDI's
> writeout speed. So a fast device gets more than a slow device, but will
> not starve it.
>
> However, for this to work, each device, or remote backing store in the
> case of networked filesystems, needs to have a BDI.
>
> > I had been thinking about setting bdi->ra_pages
> > so that we do more sensible readahead and writebehind - better
> > matching what is possible over the network and what the server
> > prefers.
>
> Well, you'd first have to create backing_dev_info instances before
> setting that value :-)
>
> > SMB/CIFS Servers typically allow a maximum of 50 requests
> > in parallel at one time from one client (although this is adjustable
> > for some).
>
> That seems like a perfect point to set congestion.
>
> So in short, stick a struct backing_dev_info into whatever represents a
> client, initialize it using bdi_init(), destroy using bdi_destroy().
Oh, and the most important point, make your fresh I_NEW inodes point to
this bdi struct.
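Roughly, in the inode setup path, something like (reusing the
illustrative netfs_client from the previous sketch):

/* When a fresh (I_NEW) inode is initialised, point its mapping at the
 * per-client BDI instead of default_backing_dev_info. */
static void netfs_init_inode_bdi(struct inode *inode,
                                 struct netfs_client *clnt)
{
        inode->i_mapping->backing_dev_info = &clnt->bdi;
}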
> Mark it congested once you have 50 (or more) outstanding requests, clear
> congestion when you drop below 50.
>
> and you should be set.
>
* Re: Networked filesystems vs backing_dev_info
2007-10-27 21:37 ` Peter Zijlstra
@ 2007-10-28 7:46 ` Petr Vandrovec
0 siblings, 0 replies; 7+ messages in thread
From: Petr Vandrovec @ 2007-10-28 7:46 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Steve French, linux-kernel, linux-fsdevel, David Howells, sfrench,
jaharkes, Andrew Morton
Peter Zijlstra wrote:
> On Sat, 2007-10-27 at 23:30 +0200, Peter Zijlstra wrote:
>> So in short, stick a struct backing_dev_info into whatever represents a
>> client, initialize it using bdi_init(), destroy using bdi_destroy().
>
> Oh, and the most important point, make your fresh I_NEW inodes point to
> this bdi struct.
>
>> Mark it congested once you have 50 (or more) outstanding requests, clear
>> congestion when you drop below 50.
>> and you should be set.
Thanks. Unfortunately I do not think that NCPFS will switch to
backing_dev_info - it uses the pagecache only for symlinks and directories,
and even if it did use the pagecache, most servers refuse concurrent
requests even when TCP is used as a transport, so there can be only one
request in flight...
Petr
P.S.: And if anyone wants to step in as ncpfs maintainer, feel free. I
have not seen a NetWare server for over a year now...