From: tytso@mit.edu
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Jan Kara <jack@suse.cz>, Jens Axboe <jens.axboe@oracle.com>,
Linux Kernel <linux-kernel@vger.kernel.org>,
jengelh@medozas.de, stable@kernel.org, gregkh@suse.de
Subject: Re: [PATCH] writeback: Fix broken sync writeback
Date: Tue, 16 Feb 2010 23:30:09 -0500
Message-ID: <20100217043009.GZ5337@thunk.org>
In-Reply-To: <alpine.LFD.2.00.1002161848370.4141@localhost.localdomain>

On Tue, Feb 16, 2010 at 07:35:35PM -0800, Linus Torvalds wrote:
> > writeback_single_inode()
> > ...writes 1024 pages.
> > if we haven't written everything in the inode (more than 1024 dirty
> > pages) we end up doing either requeue_io() or redirty_tail(). In the
> > first case the inode is put to b_more_io list, in the second case to
> > the tail of b_dirty list. In either case it will not receive further
> > writeout until we go through all other members of current b_io list.
> >
> > So I claim we currently *do* switch to another inode after 4 MB. That
> > is a fact.
>
> Ok, I think that's the bug. I do agree that it may well be intentional,
> but considering the performance impact, I suspect it's been "intentional
> without any performance numbers".
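
A toy model of the behaviour described in the quoted text may help. This is not kernel code. The function simulate_writeback() and the constant MAX_CHUNK are hypothetical, and the model collapses the b_io / b_more_io / b_dirty bookkeeping into a single round-robin pass in which each inode gets at most 1024 pages per visit (4MB, assuming 4KB pages) before writeback moves on to the next inode.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-visit cap, mirroring the 1024-page limit discussed
 * in the thread (4MB with 4KB pages). */
#define MAX_CHUNK 1024L

/*
 * Simplified model: repeatedly walk the inodes round-robin, writing at
 * most `chunk` pages from an inode before moving on to the next one.
 * The inode index of each visit is stored in order[] (up to max_visits
 * entries); the return value is the total number of visits made.
 * dirty[] is updated in place and ends up all zero.
 */
size_t simulate_writeback(long dirty[], size_t n, long chunk,
                          size_t order[], size_t max_visits)
{
    size_t visits = 0;
    int progress = 1;

    assert(chunk > 0);
    while (progress) {
        progress = 0;
        for (size_t i = 0; i < n; i++) {
            if (dirty[i] == 0)
                continue;               /* nothing left for this inode */
            long w = dirty[i] < chunk ? dirty[i] : chunk;
            dirty[i] -= w;              /* "write" up to one chunk */
            if (visits < max_visits)
                order[visits] = i;
            visits++;
            progress = 1;
        }
    }
    return visits;
}
```

With two inodes holding 3000 and 500 dirty pages and the 1024-page cap, the visit order is 0, 1, 0, 0: the large file is interrupted after its first 1024 pages. With an effectively unlimited cap, each inode is written in a single visit.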

This behaviour is well known amongst file system developers. We've even
raised it from time to time, but apparently most people are too scared
to touch the writeback code. I proposed raising the limit some six
months ago, but got serious pushback. So I followed XFS's lead: both
XFS and ext4 now write more pages than the writeback logic requests, to
work around this bug.....

What we really want to do is measure how fast the device is. If the
device is some Piece of Sh*t USB stick, then maybe you only want to
write 4MB at a time to avoid latency problems. Heck, maybe you only
want to write 32KB at a time, if it's really slow.... But if it's some
super-fast RAID array, maybe you want to write a lot more than 4MB at
a time.
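
One way to read the proposal above, sketched under loudly stated assumptions: measure the device's write throughput, then size each chunk so that writing it takes roughly a fixed target time. The helper chunk_pages_for_device(), the target time, and the 32KB floor and 128MB ceiling are illustrative choices, not anything the kernel implements, and how the throughput would actually be measured is left out entirely.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative assumptions, not kernel constants. */
#define PAGE_SIZE_BYTES 4096ULL   /* 4KB pages */
#define MIN_CHUNK_PAGES 8ULL      /* 32KB floor for really slow devices */
#define MAX_CHUNK_PAGES 32768ULL  /* 128MB ceiling for fast arrays */

/*
 * Hypothetical helper: pick how many pages to write per inode visit so
 * that one chunk takes about target_ms on a device measured at
 * bytes_per_sec, clamped to [MIN_CHUNK_PAGES, MAX_CHUNK_PAGES].
 */
uint64_t chunk_pages_for_device(uint64_t bytes_per_sec, uint64_t target_ms)
{
    assert(target_ms > 0);

    /* Bytes the device can absorb in target_ms, converted to pages. */
    uint64_t bytes = bytes_per_sec * target_ms / 1000;
    uint64_t pages = bytes / PAGE_SIZE_BYTES;

    if (pages < MIN_CHUNK_PAGES)
        pages = MIN_CHUNK_PAGES;
    if (pages > MAX_CHUNK_PAGES)
        pages = MAX_CHUNK_PAGES;
    return pages;
}
```

At a measured 8MiB/s with a 500ms target this yields 1024 pages, the same 4MB as the current limit; a 16KiB/s device is clamped to the 32KB floor and a 1GiB/s array to the ceiling.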

We've had this logic for a long time, and given the increases in disk
density and spindle speeds since then, the 4MB limit (1024 pages of
4KB each), which might have made sense 10 years ago, probably doesn't
make sense now.
> If it's bad for synchronous syncs, then it's bad for background syncing
> too, and I'd rather get rid of the MAX_WRITEBACK_PAGES thing entirely -
> since the whole latency argument goes away if we don't always honor it
> ("Oh, we have good latency - _except_ if you do 'sync()' to synchronously
> write something out" - that's just insane).

I tried arguing for this six months ago, and got the objection that it
might cause latency problems on slow USB sticks. So I added a forced
override for ext4, which now writes 128MB at a time, with a sysfs
tuning knob that allows the old behaviour to be restored if users
really complained. No one actually did complain....
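
The arithmetic behind the ext4 override, as a hedged sketch: 128MB is 32768 pages of 4KB, which is 32 times the 1024 pages that generic writeback asks for. The helper effective_nr_to_write() is hypothetical and is not ext4's actual code, and treating a knob value of 0 as "restore the old behaviour" is an assumption made for illustration.

```c
#include <assert.h>

#define PAGE_SHIFT_4K 12   /* assumption: 4KB pages */

/*
 * Hypothetical model of a filesystem-side bump: writeback requests
 * `requested` pages, but the filesystem writes at least bump_mb
 * megabytes worth. In this sketch bump_mb == 0 means no bump, so the
 * requested count is used unchanged.
 */
long effective_nr_to_write(long requested, unsigned int bump_mb)
{
    /* Convert megabytes to 4KB pages: 1MB = 2^20 bytes = 2^8 pages. */
    long bump_pages = (long)bump_mb << (20 - PAGE_SHIFT_4K);

    assert(requested >= 0);
    return requested > bump_pages ? requested : bump_pages;
}
```

With a 1024-page request and a 128MB bump, the filesystem writes 32768 pages; requests larger than the bump pass through unchanged.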
- Ted