All of lore.kernel.org
 help / color / mirror / Atom feed
From: Christian Stroetmann <stroetmann@ontolinux.com>
To: Daniel Phillips <daniel@phunq.net>
Cc: Linux Kernel <linux-kernel@vger.kernel.org>,
	Linux FS Devel <linux-fsdevel@vger.kernel.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Subject: Re: [RFC][PATCH 1/2] Add a super operation for writeback
Date: Mon, 02 Jun 2014 10:30:07 +0200	[thread overview]
Message-ID: <538C360F.7020901@ontolinux.com> (raw)
In-Reply-To: <538B9DEE.20800@phunq.net>

On the 1st of June 2014 23:41, Daniel Phillips wrote:
> Hi,
>
> This is the first of four core changes we would like for Tux3. We
> start with a hard one and suggest a simple solution.
>
> The first patch in this series adds a new super operation to write
> back multiple inodes in a single call. The second patch applies to
> our linux-tux3 repository at git.kernel.org to demonstrate how this
> interface is used, and removes about 450 lines of workaround code.
>
> Traditionally, core kernel tells each mounted filesystems which
> dirty pages of which inodes should be flushed to disk, but
> unfortunately, is blissfully ignorant of filesystem-specific
> ordering constraints. This scheme was really invented for Ext2 and
> has not evolved much recently. Tux3, with its strong ordering and
> optimized delta commit, cannot tolerate random flushing and
> therefore takes responsibility for flush ordering itself. On the
> other hand, Tux3 has no good way to know when is the right time to
> flush, but core is good at that. This proposed API harmonizes those
> two competencies so that Tux3 and core each take care of what they
> are good at, and not more.
>
> The API extension consists of a new writeback hook and two helpers
> to be uses within the hook. The hook sits about halfway down the
> fs-writeback stack, just after core has determined that some dirty
> inodes should be flushed to disk and just before it starts thinking
> about which inodes those should be. At that point, core calls Tux3
> instead of continuing on down the usual do_writepages path. Tux3
> responds by staging a new delta commit, using the new helpers to
> tell core which inodes were flushed versus being left dirty in
> cache. This is pretty much the same behavior as the traditional
> writeout path, but less convoluted, probably more efficient, and
> certainly easier to analyze.
>
> The new writeback hook looks like:
>
>      progress = sb->s_op->writeback(sb,&writeback_control,&nr_pages);
>
> This should be self-explanatory: nr_pages and progress have the
> semantics of existing usage in fs-writeback; writeback_control is
> ignored by Tux3, but that is only because Tux3 always flushes
> everything and does not require hints for now. We can safely assume
> that&wbc or equivalent is wanted here. An obvious wart is the
> overlap between "progress" and "nr_pages", but fs-writeback thinks
> that way, so it would not make much sense to improve one without
> improving the other.
>
> Tux3 necessarily keeps its own dirty inode list, which is an area
> of overlap with fs-writeback. In a perfect world, there would be
> just one dirty inode list per superblock, on which both fs-writeback
> and Tux3 would operate. That would be a deeper core change than
> seems appropriate right now.
>
> Potential races are a big issue with this API, which is no surprise.
> The fs-writeback scheme requires keeping several kinds of object in
> sync: tux3 dirty inode lists, fs-writeback dirty inode lists and
> inode dirty state. The new helpers inode_writeback_done(inode) and
> inode_writeback_touch(inode) take care of that while hiding
> internal details of the fs-writeback implementation.
>
> Tux3 calls inode_writeback_done when it has flushed an inode and
> marked it clean, or calls inode_writeback_touch if it intends to
> retain a dirty inode in cache. These have simple implementations.
> The former just removes a clean inode from any fs-writeback list.
> The latter updates the inode's dirty timestamp so that fs-writeback
> does not keep trying flush it. Both these things could be done more
> efficiently by re-engineering fs-writeback, but we prefer to work
> with the existing scheme for now.
>
> Hirofumi's masterful hack nicely avoided racy removal of inodes from
> the writeback list by taking an internal fs-writeback lock inside
> filesystem code. The new helper requires dropping i_lock inside
> filesystem code and retaking it in the helper, so inode redirty can
> race with writeback list removal. This does not seem to present a
> problem because filesystem code is able to enforce strict
> alternation of cleaning and calling the helper. As an offsetting
> advantage, writeback lock contention is reduced.
>
> Compared to Hirofumi's hack, the cost of this interface is one
> additional spinlock per inode_writeback_done,  which is
> insignificant compared to the convoluted code path that is avoided.
> Regards,
>
> Daniel
When I followed the advice of Dave Chinner:
"We're not going to merge that page forking stuff (like you were told at 
LSF 2013 more than a year ago: http://lwn.net/Articles/548091/) without 
rigorous design review and a demonstration of the solutions to all the 
hard corner cases it has"
given in his e-mail related with the presentation of the latest version 
of the Tux3 file system (see [1]) and read the linked article, I found 
in the second comments:
"Parts of this almost sound like it either a.) overlaps with or b.) 
would benefit greatly from something similar to Featherstitch [[2]]."

Could it be that we have with Featherstitch a general solution already 
that is said to be even "file system agnostic"?
Honestly, I thought that something like this would make its way into the 
Linux code base.



Have fun
Christian

[1] Dave Chinner Re: [RFC] Tux3 for review 
https://lkml.org/lkml/2014/5/18/158
[2] Featherstitch http://featherstitch.cs.ucla.edu/

  parent reply	other threads:[~2014-06-02  8:30 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-06-01 21:41 [RFC][PATCH 1/2] Add a super operation for writeback Daniel Phillips
2014-06-01 21:42 ` [RFC][PATCH 2/2] tux3: Use writeback hook to remove duplicated core code Daniel Phillips
2014-06-02  3:30   ` Dave Chinner
2014-06-02 20:07     ` Daniel Phillips
2014-06-02  3:15 ` [RFC][PATCH 1/2] Add a super operation for writeback Dave Chinner
2014-06-02 20:02   ` Daniel Phillips
2014-06-03  3:33     ` Dave Chinner
2014-06-03  7:01       ` Daniel Phillips
2014-06-03  7:26         ` Daniel Phillips
2014-06-03  7:47         ` OGAWA Hirofumi
2014-06-03  8:12           ` Dave Chinner
2014-06-03  8:57             ` OGAWA Hirofumi
2014-06-03  7:52         ` Dave Chinner
2014-06-03 14:05           ` Jan Kara
2014-06-03 14:14             ` Christoph Hellwig
2014-06-03 14:25               ` Theodore Ts'o
2014-06-03 15:21               ` Jan Kara
2014-06-03 22:37                 ` Daniel Phillips
2014-06-04 20:16                   ` Jan Kara
2014-06-02  8:30 ` Christian Stroetmann [this message]
2014-06-03  3:39   ` Dave Chinner
2014-06-03  5:30     ` Christian Stroetmann
2014-06-03 14:57       ` Theodore Ts'o
2014-06-03 16:30         ` Christian Stroetmann

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=538C360F.7020901@ontolinux.com \
    --to=stroetmann@ontolinux.com \
    --cc=akpm@linux-foundation.org \
    --cc=daniel@phunq.net \
    --cc=hirofumi@mail.parknet.co.jp \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.