linux-fsdevel.vger.kernel.org archive mirror
* RE: [Linux-cachefs] Re: NFS Patch for FSCache
@ 2005-05-12 22:43 Lever, Charles
  2005-05-13 11:17 ` David Howells
  0 siblings, 1 reply; 12+ messages in thread
From: Lever, Charles @ 2005-05-12 22:43 UTC (permalink / raw)
  To: David Howells, SteveD
  Cc: linux-fsdevel, Linux filesystem caching discussion list

preface: i think this is interesting and important work.

> Steve Dickson <SteveD@redhat.com> wrote:
> 
> > But the real saving, imho, is the fact that those reads were
> > measured after the filesystem was unmounted and then remounted.
> > So system-wise, there should be some gain due to the fact that
> > NFS is not using the network....

i expect to see those gains either when the network and server are
slower than the client's local disk, or when the cached files are
significantly larger than the client's local RAM.  these conditions will
not always be the case, so i'm interested to know how performance is
affected when the system is running outside that sweet spot.

> I tested md5sum read speed also. My testbox is a dual 200MHz PPro.
> It's got 128MB of RAM. I've got a 100MB file on the NFS server for it
> to read.
> 
> 	No Cache:	~14s
> 	Cold Cache:	~15s
> 	Warm Cache:	~2s
> 
> Now these numbers are approximate because they're from memory.

to benchmark this i think you need to explore the architectural
weaknesses of your approach.  how bad will it get using cachefs with
badly designed applications or client/server setups?

for instance, what happens when the client's cache disk is much slower
than the server (high performance RAID with high speed networking)?
what happens when the client's cache disk fills up so the disk cache is
constantly turning over (which files are kicked out of your backing
cachefs to make room for new data)?  what happens with multi-threaded
I/O-bound applications when the cachefs is on a single spindle?  is
there any performance dependency on the size of the backing cachefs?
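
to make the turnover question concrete, here is a toy model of the kind
of LRU victim selection a backing cache might do when it needs room.
this is entirely my own sketch (names and policy invented), not anything
from the actual cachefs code:

/* toy model of cache turnover, not taken from cachefs: pick the least
 * recently used cache file to kick out when the backing fs is full. */
#include <stdio.h>
#include <time.h>

struct cache_file {
	const char *key;	/* e.g. an NFS filehandle rendered as a string */
	time_t last_used;	/* updated on every cache hit */
	long long bytes;	/* space this file occupies in the cache */
};

/* scan for the least recently used entry; a real cache would also weigh
 * size, pinning, and whether pages are still dirty */
static struct cache_file *pick_victim(struct cache_file *files, int n)
{
	struct cache_file *victim = NULL;
	int i;

	for (i = 0; i < n; i++)
		if (!victim || files[i].last_used < victim->last_used)
			victim = &files[i];
	return victim;
}

int main(void)
{
	struct cache_file files[] = {
		{ "fh:0xaa", 1000, 4LL << 20 },
		{ "fh:0xbb", 2000, 64LL << 20 },
		{ "fh:0xcc", 1500, 1LL << 20 },
	};
	struct cache_file *victim = pick_victim(files, 3);

	printf("evict %s to reclaim %lld bytes\n", victim->key, victim->bytes);
	return 0;
}

the policy itself is cheap; what i'm really asking about is how much
synchronous disk and journal work each eviction drags in behind it.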

do you also cache directory contents on disk?

remember that the application you designed this for (preserving cache
contents across client reboots) is only one way this will be used.  some
of us would like to use this facility to provide a high-performance
local cache larger than the client's RAM.  :^)

> Note that a cold cache is worse than no cache because CacheFS (a) has
> to check the disk before NFS goes to the server, and (b) has to
> journal the allocations of new data blocks. It may also have to wait
> whilst pages are written to disk before it can get new ones rather
> than just dropping them (100MB is big enough wrt 128MB that this will
> happen) and 100MB is sufficient to cause it to start using single- and
> double-indirection pointers to find its blocks on disk, though these
> are cached in the page cache.

synchronous file system metadata management is the bane of every cachefs
implementation i know about.  have you measured what performance impact
there is when cache files go from no indirection to single indirect
blocks, or from single to double indirection?  have you measured how
expensive it is to reuse a single cache file because the cachefs file
system is already full?  how expensive is it to invalidate the data in
the cache (say, if some other client changes a file you already have
cached in your cachefs)?
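
for a sense of scale, here is the arithmetic for when indirection kicks
in, using ext2-ish assumptions (4KB blocks, 12 direct block pointers,
4-byte block pointers); cachefs's real on-disk layout may well differ:

/* back-of-the-envelope reach of direct, single-indirect and
 * double-indirect block pointers; all parameters are assumptions. */
#include <stdio.h>

int main(void)
{
	long long block = 4096;
	long long ptrs_per_block = block / 4;			/* 1024 pointers per block */
	long long direct = 12 * block;				/* 48KB with no indirection */
	long long single = direct + ptrs_per_block * block;	/* roughly 4MB more */
	long long dbl = single + ptrs_per_block * ptrs_per_block * block; /* roughly 4GB more */

	printf("direct only:     up to %lld KB\n", direct >> 10);
	printf("single indirect: up to about %lld MB\n", single >> 20);
	printf("double indirect: up to about %lld GB\n", dbl >> 30);
	return 0;
}

under those assumptions a 100MB cache file is far past the reach of the
direct and single-indirect pointers, which matches your observation that
the 100MB test forces single and double indirection.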

what about using an extent-based file system for the backing cachefs?
that would probably not be too difficult because you have a good
prediction already of how large the file will be (just look at the file
size on the server).
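
as a sketch of what i mean (the reservation api here is invented for
illustration; a real extent-based fs would do the equivalent internally):

#include <stdio.h>

struct extent {
	unsigned long long start_block;
	unsigned long long nr_blocks;
};

/* round the server-reported size up to whole 4KB blocks; a real
 * implementation would then ask the allocator for one contiguous run
 * wherever it has the space */
static struct extent reserve_for(unsigned long long server_size,
				 unsigned long long next_free_block)
{
	struct extent e;

	e.start_block = next_free_block;
	e.nr_blocks = (server_size + 4095) / 4096;
	return e;
}

int main(void)
{
	struct extent e = reserve_for(100ULL << 20, 12345);

	printf("reserved %llu blocks starting at block %llu\n",
	       e.nr_blocks, e.start_block);
	return 0;
}

one extent per cache file keeps the block map trivial, at the cost of
fragmentation and wasted reservation when the whole file is never read.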

how about using smallish chunks, like the AFS cache manager, to avoid
indirection entirely?  would there be any performance advantage to
caching small files in memory and large files on disk, or vice versa?
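
concretely, the afs-style chunking i have in mind looks like this (the
256KB chunk size is just an illustrative number):

/* sketch of afs-style chunking: carve the cached file into fixed-size
 * chunks, each stored as its own small cache file, so no single cache
 * file ever needs indirect blocks. */
#include <stdio.h>

#define CHUNK_SIZE (256 * 1024)

struct chunk_addr {
	unsigned long chunk;	/* which chunk file holds this offset */
	unsigned long offset;	/* offset within that chunk file */
};

static struct chunk_addr map_offset(unsigned long long file_offset)
{
	struct chunk_addr a;

	a.chunk = file_offset / CHUNK_SIZE;
	a.offset = file_offset % CHUNK_SIZE;
	return a;
}

int main(void)
{
	struct chunk_addr a = map_offset(100ULL << 20);	/* 100MB into the file */

	printf("offset 100MB -> chunk %lu, offset %lu\n", a.chunk, a.offset);
	return 0;
}

with fixed-size chunks no cache file ever grows large enough to need
indirection, and a partially read large file only costs the chunks
actually touched.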

* RE: Re: NFS Patch for FSCache
@ 2005-05-18 16:32 Lever, Charles
  2005-05-18 17:49 ` David Howells
  0 siblings, 1 reply; 12+ messages in thread
From: Lever, Charles @ 2005-05-18 16:32 UTC (permalink / raw)
  To: David Howells, Linux filesystem caching discussion list
  Cc: linux-fsdevel, SteveD

> > If not, we can gain something by using an underlying FS with lazy
> > writes.
> 
> Yes, to some extent. There's still the problem of filesystem integrity
> to deal with, and lazy writes hold up journal closure. This isn't
> necessarily a problem, except when you want to delete and launder a
> block that has a write hanging over it. It's not unsolvable, just
> tricky.
> 
> Besides, what do you mean by lazy?

as i see it, you have two things to guarantee:

1.  attributes cached on the disk are either up to date, or clearly out
of date (otherwise there's no way to tell whether cached data is stale
or not), and

2.  the consistency of the backing file system must be maintained.

in fact, you don't need to maintain data coherency up to the very last
moment, since the client is pushing data to the server for permanent
storage.  cached data in the local backing FS can be out of date after a
client reboot without any harm whatsoever, so it doesn't matter a whit that
the on-disk state of the backing FS trails the page cache.
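
put another way, guarantee (1) only needs to support a check like the
following the next time the cached file is used (the structure and field
names are made up; NFS would use the change attribute, or mtime plus
size, for this):

/* sketch of the staleness test: keep the server's change information
 * alongside the cached data and trust the cache only when a fresh
 * GETATTR matches what was recorded.  all names here are invented. */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

struct cached_attrs {
	uint64_t change;	/* NFSv4 change attribute (or mtime+size for v2/v3) */
	uint64_t size;
};

static bool cache_is_usable(const struct cached_attrs *on_disk,
			    const struct cached_attrs *from_server)
{
	/* any mismatch means the cached data may be stale; refill from
	 * the server rather than risk serving old bytes */
	return on_disk->change == from_server->change &&
	       on_disk->size == from_server->size;
}

int main(void)
{
	struct cached_attrs on_disk = { .change = 7, .size = 100 << 20 };
	struct cached_attrs fresh   = { .change = 8, .size = 100 << 20 };

	printf("cache usable: %s\n",
	       cache_is_usable(&on_disk, &fresh) ? "yes" : "no");
	return 0;
}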

(of course you do need to sync up completely with the server if you
intend to use CacheFS for disconnected operation, but that can be
handled by "umount" rather than keeping strict data coherency all the
time).

it also doesn't matter if the backing FS can't keep up with the server.
the failure mode can be graceful, so that as the backing FS becomes
loaded, it passes more requests back to the server and caches less data
and fewer requests.  this is how it works when there is more data to
cache than there is space to cache it; it should work the same way if
the I/O rate is higher than the backing FS can handle.
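
i'm imagining something as simple as this on the read path (the counter
and threshold are invented for the example):

/* sketch of graceful degradation: if the backing fs already has too
 * much cache i/o queued, skip caching this read and just serve it from
 * the server. */
#include <stdbool.h>
#include <stdio.h>

#define MAX_INFLIGHT_CACHE_IO 128

static int cache_io_in_flight;	/* writes currently queued to the backing fs */

static bool should_cache_this_read(void)
{
	/* the data comes from the server either way on a miss; the only
	 * question is whether we also spend backing-fs bandwidth copying
	 * it into the cache */
	return cache_io_in_flight < MAX_INFLIGHT_CACHE_IO;
}

int main(void)
{
	cache_io_in_flight = 200;	/* pretend the backing fs is swamped */

	printf("cache this read: %s\n",
	       should_cache_this_read() ? "yes" : "no");
	return 0;
}

a miss that bypasses the cache costs nothing extra; it just doesn't make
the next read of that data any faster.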

> Actually, probably the biggest bottleneck is the disk block allocator.

in my experience with the AFS cache manager, this is exactly the
problem.  the ideal case is where the backing FS behaves a lot like swap
-- just get the bits down onto disk in any location, without any
sophisticated management of free space.  the problem is keeping track of
the data blocks during a client crash/reboot.

the real problem arises when the cache is full and you want to cache a
new file.  the cache manager must choose a file to reclaim, release all
the blocks for that file, then immediately reallocate them for the new
file.  all of this is synchronous activity.
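
spelled out as code, the sequence i'm worried about looks something like
this (every helper is a stand-in for real allocator and journal work):

/* the reclaim path described above; every step can block, which is why
 * a full cache turns over so expensively. */
#include <stdio.h>

static const char *pick_victim(void)          { return "fh:old"; }
static void wait_for_writeback(const char *f) { printf("sync pages of %s\n", f); }
static void release_blocks(const char *f)     { printf("journal: free blocks of %s\n", f); }
static void allocate_blocks(const char *f)    { printf("journal: allocate blocks to %s\n", f); }

static void make_room_for(const char *new_file)
{
	const char *victim = pick_victim();	/* choose a file to kick out */

	wait_for_writeback(victim);		/* can't free blocks still under i/o */
	release_blocks(victim);			/* journalled metadata update */
	allocate_blocks(new_file);		/* another journalled update */
	/* only now can the first page of new_file go to disk */
}

int main(void)
{
	make_room_for("fh:new");
	return 0;
}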

are there advantages to a log-structured file system for this purpose?

is there a good way to trade disk space for the performance of your
block allocator?

> Well, with infinitely fast disk and network, very little - you can
> afford to be profligate on your turnover of disk space, and this
> affects the options you might choose in designing your cache.

in fact, with an infinitely fast server and network, there would be no
need for local caching at all.  so maybe that's not such an interesting
thing to consider.

it might be more appropriate to design, configure, and measure CacheFS
with real typical network and server latency numbers in mind.

> Reading one really big file (bigger than the memory available) over
> AFS, with a cold cache it took very roughly 107% of the time it took
> with no cache; but using a warm cache, it took 14% of the time it took
> with no cache. However, this is on my particular test box, and it
> varies a lot from box to box.

david, what is the behavior when the file that needs to be cached is
larger than the backing file system?  for example, what happens when
some client application starts reading a large media file that won't fit
entirely in the cache?

* Re: NFS Patch for FSCache
  2005-05-09 21:19   ` Andrew Morton
@ 2005-05-10 18:43 Steve Dickson
  2005-05-09 10:31 ` Steve Dickson
  0 siblings, 1 reply; 12+ messages in thread
From: Steve Dickson @ 2005-05-10 18:43 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-fsdevel, linux-cachefs

Andrew Morton wrote:
> Steve Dickson <SteveD@redhat.com> wrote:
> 
>>Attached is a patch that enables NFS to use David Howells'
>>File System Caching implementation (FSCache).
> 
> 
> Do you have any performance results for this?
I haven't done any formal performance testing, but from
the functionality testing I've done, I've seen
a ~20% increase in read speed (versus otw reads),
mainly due to the fact that NFS only needs to do getattrs
and such when the data is cached. But buyer beware...
this is a very rough number, so mileage may vary. ;-)

I don't have a number for writes (maybe David does),
but I'm sure there will be a penalty to cache that
data, though it's something that can be improved over time.

But the real saving, imho, is the fact that those
reads were measured after the filesystem was
unmounted and then remounted. So system-wise, there
should be some gain due to the fact that NFS
is not using the network....

steved.
