From: Jan Kara <jack@suse.cz>
To: Christoph Hellwig <hch@infradead.org>
Cc: Jan Kara <jack@suse.cz>, Dave Chinner <david@fromorbit.com>,
Daniel Phillips <daniel@phunq.net>,
linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
Linus Torvalds <torvalds@linux-foundation.org>,
Andrew Morton <akpm@linux-foundation.org>,
OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>,
Jens Axboe <axboe@kernel.dk>
Subject: Re: [RFC][PATCH 1/2] Add a super operation for writeback
Date: Tue, 3 Jun 2014 17:21:55 +0200
Message-ID: <20140603152155.GD30706@quack.suse.cz>
In-Reply-To: <20140603141444.GA21273@infradead.org>
On Tue 03-06-14 07:14:44, Christoph Hellwig wrote:
> On Tue, Jun 03, 2014 at 04:05:31PM +0200, Jan Kara wrote:
> > So we currently flush inodes in first-dirtied, first-written-back order when
> > no superblock is specified in the writeback work. That completely ignores
> > which superblock an inode belongs to, but I don't see per-sb fairness
> > actually making any sense when
> > 1) flushing old data (to keep promise set in dirty_expire_centisecs)
> > 2) flushing data to reduce number of dirty pages
> > And these are really the only two cases where we don't do per-sb flushing.
> >
> > Now when filesystems want to do something more clever (and I can see
> > reasons for that e.g. when journalling metadata, even more so when
> > journalling data) I agree we need to somehow implement the above two types
> > of writeback using per-sb flushing. Type 1) is actually pretty easy - just
> > tell each sb to write back dirty data up to time T. Type 2) is more difficult
> > because that is a more open-ended task - it seems similar to what shrinkers do
> > but that would require us to track per sb amount of dirty pages / inodes
> > and I'm not sure we want to add even more page counting statistics...
> > Especially since often bdi == fs. Thoughts?
>
> Honestly I think doing per-bdi writeback has been a major mistake. As
> you said it only even matters when we have filesystems on multiple
> partitions on a single device, and even then only in a simple setup;
> as soon as we use LVM or btrfs this sort of sharing stops happening
> anyway. I don't even see much of a benefit except that we prevent
> two flushing daemons from congesting a single device in that special case
> of multiple filesystems on partitions of the same device, and that could
> be solved in other ways.

So I agree per-bdi / per-sb matters only in simple setups, but in my
experience machines with a single rotating disk, several partitions, and
no LVM aren't that rare. And I agree we went for per-bdi flushing to avoid
two threads congesting a single device, which leads to suboptimal IO
patterns during background writeback.

So currently I'm convinced we want to go for per-sb dirty tracking. That
also makes some speedups in that code noticeably simpler. I'm not convinced
about the per-sb flushing thread: if we don't regress the multiple-sb-on-
one-bdi case when we just let the threads from different superblocks
contend for IO, then that would be a natural thing to do. But once we have
to introduce some synchronization between threads to avoid regressions, I
think it might be easier to just stay with a per-bdi thread which switches
between superblocks.
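
To make the last option concrete, here is a rough userspace sketch of
per-sb dirty tracking driven by one per-bdi flusher that switches between
superblocks. It is a toy model, not kernel code: every name in it
(toy_inode, toy_sb, toy_sb_writeback, toy_bdi_flush_expired) is made up
for illustration, and "writing back" an inode just clears a flag. It
assumes each superblock keeps its dirty inodes sorted oldest first.

```c
#include <assert.h>
#include <stddef.h>

/* Toy userspace model; all types and functions here are hypothetical. */
struct toy_inode {
	unsigned long dirtied_when;	/* time the inode was first dirtied */
	int dirty;			/* 1 while the inode still needs writeback */
};

struct toy_sb {
	const char *name;
	struct toy_inode *inodes;	/* per-sb dirty list, oldest first */
	size_t nr_inodes;
	size_t next;			/* index of the oldest still-dirty inode */
};

/*
 * Write back at most 'budget' inodes of one superblock that were dirtied
 * at or before 'older_than'. Returns the number of inodes written.
 */
static size_t toy_sb_writeback(struct toy_sb *sb, unsigned long older_than,
			       size_t budget)
{
	size_t written = 0;

	while (written < budget && sb->next < sb->nr_inodes &&
	       sb->inodes[sb->next].dirtied_when <= older_than) {
		sb->inodes[sb->next].dirty = 0;	/* stands in for real IO */
		sb->next++;
		written++;
	}
	return written;
}

/*
 * Single flusher for the whole bdi: give each superblock a slice in turn
 * and stop once no superblock has expired dirty inodes left. Only one sb
 * issues IO at a time, so superblocks never contend for the device.
 */
static size_t toy_bdi_flush_expired(struct toy_sb *sbs, size_t nr_sbs,
				    unsigned long older_than, size_t slice)
{
	size_t total = 0, progress;

	assert(slice > 0);
	do {
		progress = 0;
		for (size_t i = 0; i < nr_sbs; i++)
			progress += toy_sb_writeback(&sbs[i], older_than, slice);
		total += progress;
	} while (progress > 0);
	return total;
}
```

Calling toy_bdi_flush_expired() with older_than set to "now minus the
expire interval" models the "flush old data" case; the slice size is the
knob that decides how often the thread switches between superblocks.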
Honza
--
Jan Kara <jack@suse.cz>
SUSE Labs, CR