linux-fsdevel.vger.kernel.org archive mirror
From: Nikolai Joukov <kolya@cs.sunysb.edu>
To: Al Boldi <a1426z@gawab.com>
Cc: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-raid@vger.kernel.org
Subject: Re: [ANNOUNCE] RAIF: Redundant Array of Independent Filesystems
Date: Sat, 16 Dec 2006 12:39:45 -0500 (EST)
Message-ID: <Pine.GSO.4.53.0612161118010.24531@compserv1>
In-Reply-To: <200612161635.49502.a1426z@gawab.com>

> I am looking at filling the net-pipe, and it only reaches 40-75% max, with
> some short 100% bursts, and a slow 10% start.  It seems that caching
> somewhat delays the writes, which then batch up and sync at various speeds.
> So you have the cache really hiding slow sync speeds.  To tune this, it may
> be helpful to turn off caching, which in turn could surface the actual
> bottlenecks.

Well, RAIF is an external kernel module and as such cannot change the
caching behavior much.  A notorious problem of all Linux stackable file
systems is double-caching of data.  Every stackable file system caches
the data at its own level and copies it to/from the lower file system's
cached pages when necessary.  This has some advantages for file systems
that change the data (e.g., encrypt or compress it).  However, it
effectively reduces the system's cache size by a factor of two or more.
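
A rough way to picture that factor, as a back-of-the-envelope model
rather than a measurement: assume every layer that caches a page holds
its own full copy, so the amount of distinct data that fits in memory
is the total divided by the number of copies.  The function name and
the page counts below are made up for illustration.

```python
def effective_cache_pages(total_pages: int, copies_per_page: int) -> int:
    """Distinct data pages that fit when each one is cached several times.

    Simplifying assumption: every layer keeps a full copy of every page.
    """
    if copies_per_page < 1:
        raise ValueError("a cached page has at least one copy")
    return total_pages // copies_per_page


# One stackable layer over one lower file system: 2 copies per page.
single_layer = effective_cache_pages(1000, 2)

# Hypothetical mirroring setup: 1 upper copy + 3 lower branches = 4 copies.
mirrored = effective_cache_pages(1000, 4)
```

Under these assumptions a fan-out file system that mirrors data to
several branches loses more cache than a single-branch stackable file
system, because each extra branch adds another copy of the same page.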

Among all the existing stackable file systems, Tracefs and RAIF have the
strictest requirements for low overhead.  Here are the solutions to the
double-caching problem that we tried:

1. Redirect read and write requests to the lower file systems directly.
   This avoids caching data at the RAIF level.  However, this
   optimization must be turned off as soon as a file is mmap'ed to avoid
   cache inconsistencies.  Also, even if the file is not mmap'ed, RAIF1
   still keeps the data copies for all the lower branches.  This
   optimization is implemented in RAIF but is not turned on by default.
   (We strip this and many other #ifdef'ed code fragments from the
   code releases automatically.)

2. Cache the data at the RAIF level.  When we write to the lower file
   systems we still allocate the lower pages and copy the data, but we
   also mark the lower pages with the PG_reclaim flag before calling the
   lower writepage operation.  As a result, the lower pages are released
   right after the write completes.  This works fine for mmap'ed files
   and is the default RAIF behavior now.  It solves the problem for most
   workloads that mix reads and writes.  For example, it improved
   Postmark's performance several-fold.  Unfortunately, this optimization
   does not improve performance for big sequential writes, which is the
   workload that you tried.  So essentially, you had a quarter of your
   original page cache while running your workload.

3. A known ideal solution for this problem is to share the cached pages
   between file systems.  We attempted to do this for Tracefs, but the
   resulting code is not beautiful and is potentially racy:
   <http://marc.theaimsgroup.com/?l=linux-fsdevel&m=113193082115222&w=2>
   Unfortunately, for fan-out file systems this solution requires even
   more support from the OS.  Most other OSs (including BSD and Windows)
   share cached pages this way, but unfortunately Linux does not :-(
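
To make the second approach above more concrete, here is a toy
user-space model of it.  This is not RAIF code: the class and method
names are invented for illustration, and the model assumes that each
write completes instantly.  It only counts how many page copies stay
cached when the lower copies are released after a write, compared with
keeping them.

```python
class ToyStackedCache:
    """Counts cached page copies for a stacked file system over N branches.

    Hypothetical model: 'upper' stands for the stackable file system's own
    cache, 'lower' for the caches of the file systems it is stacked on.
    """

    def __init__(self, branches: int, release_lower_after_write: bool):
        self.release_lower_after_write = release_lower_after_write
        self.upper = set()                          # page indexes cached above
        self.lower = [set() for _ in range(branches)]

    def write_page(self, index: int) -> None:
        self.upper.add(index)                       # data is cached at the upper level
        for cache in self.lower:
            cache.add(index)                        # lower page allocated, data copied
        # The lower write would be issued here; the model treats it as
        # completing immediately.
        if self.release_lower_after_write:
            for cache in self.lower:
                cache.discard(index)                # lower copy dropped once written

    def cached_copies(self) -> int:
        return len(self.upper) + sum(len(c) for c in self.lower)


def copies_after_writes(pages: int, branches: int, release: bool) -> int:
    """Write `pages` distinct pages and report how many copies stay cached."""
    fs = ToyStackedCache(branches, release)
    for i in range(pages):
        fs.write_page(i)
    return fs.cached_copies()
```

With two branches and 100 pages written, the model keeps 300 copies
when the lower pages are retained and 100 when they are released.
Because it treats every write as finishing at once, it says nothing
about the big sequential write case, where the message notes that this
optimization does not help.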

> little overhead.  So for RAIF to be viable, it needs to have low overhead,
> which doesn't seem impossible to implement, given RAIF's simple but
> beautiful approach.

Thanks a lot!

Nikolai.
---------------------
Nikolai Joukov, Ph.D.
Filesystems and Storage Laboratory
Stony Brook University
