From: Wu Fengguang <fengguang.wu@intel.com>
To: Dave Chinner <david@fromorbit.com>
Cc: Jan Kara <jack@suse.cz>,
"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
"linux-mm@kvack.org" <linux-mm@kvack.org>,
Peter Zijlstra <a.p.zijlstra@chello.nl>,
Andrew Morton <akpm@linux-foundation.org>,
Theodore Ts'o <tytso@mit.edu>,
Chris Mason <chris.mason@oracle.com>
Subject: Re: [PATCH RFC 0/5] IO-less balance_dirty_pages() v2 (simple approach)
Date: Tue, 29 Mar 2011 10:41:20 +0800
Message-ID: <20110329024120.GA9416@localhost>
In-Reply-To: <20110329021458.GF3008@dastard>
On Tue, Mar 29, 2011 at 10:14:58AM +0800, Dave Chinner wrote:
>
> On Mon, Mar 28, 2011 at 10:44:45AM +0800, Wu Fengguang wrote:
> > On Sat, Mar 26, 2011 at 07:05:44AM +0800, Jan Kara wrote:
> > > And actually the NFS traces you pointed to originally seem to be different
> > > problem, in fact not directly related to what balance_dirty_pages() does...
> > > And with local filesystem the results seem to be reasonable (although there
> > > are some longer sleeps in your JBOD measurements I don't understand yet).
> >
> > Yeah the NFS case can be improved on the FS side (for now you may just
> > reuse my NFS patches and focus on other generic improvements).
> >
> > The JBOD issue is also beyond my understanding.
> >
> > Note that XFS will also see one big IO completion every 0.5-1 seconds
> > if we increase the write chunk size from the current 4MB to
> > near the bdi's write bandwidth, as illustrated by this graph:
> >
> > http://www.kernel.org/pub/linux/kernel/people/wfg/writeback/dirty-throttling-v6/4G/xfs-1dd-1M-8p-3927M-20%25-2.6.38-rc6-dt6+-2011-02-27-22-58/global_dirtied_written-500.png
>
> Which is _bad_.
>
> Increasing the writeback chunk size simply causes dirty queue
> starvation issues when there are lots of dirty files and lots more
> memory than there is writeback bandwidth. Think of a machine with
> 1TB of RAM (that's a 200GB dirty limit) and 1GB/s of disk
> throughput. That's 3 minutes' worth of writeback, and increasing the
> chunk size to ~1s worth of throughput means that the 200th dirty
> file won't get serviced for 3 minutes....
>
> We used to have behaviour similar to this (prior to 2.6.16, IIRC),
> and it caused all sorts of problems where people were losing 10-15
> minute old data when the system crashed because writeback didn't
> process the dirty inode list fast enough in the presence of lots of
> large files....
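The starvation estimate above can be reproduced with a back-of-the-envelope model. This is a simplified sketch, not the kernel's writeback algorithm: it assumes writeback walks the dirty inode list round-robin, writes one full chunk per inode per pass at constant bandwidth, and that every inode has at least one chunk of dirty data. The helper names are hypothetical.

```python
# Simplified model (assumption): writeback services the dirty inode list
# round-robin, writing one full chunk per inode per pass at constant bandwidth.
GB = 1 << 30
MB = 1 << 20


def drain_seconds(dirty_bytes, bandwidth):
    """Seconds needed to write back dirty_bytes at bandwidth bytes/second."""
    return dirty_bytes / bandwidth


def wait_before_nth_inode(n, chunk_bytes, bandwidth):
    """Seconds before the n-th inode (1-based) gets its first chunk written,
    assuming each of the n - 1 inodes ahead of it is given a full chunk."""
    return (n - 1) * chunk_bytes / bandwidth


if __name__ == "__main__":
    bw = 1 * GB              # 1GB/s of disk throughput
    dirty_limit = 200 * GB   # roughly 20% of 1TB of RAM
    print(drain_seconds(dirty_limit, bw))            # 200.0
    print(wait_before_nth_inode(200, 1 * GB, bw))    # 199.0 (~1s chunks)
    print(wait_before_nth_inode(200, 4 * MB, bw))    # 0.77734375 (4MB chunks)
```

Under this model a ~1s chunk leaves the 200th inode untouched for over 3 minutes, while a 4MB chunk reaches it in under a second.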
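
Yes, it is a problem, and it can best be solved by automatically lowering
the bdi dirty limit to (bdi->write_bandwidth * dirty_expire_interval/100),
where dirty_expire_interval is in centiseconds (3000, i.e. 30s, by default).
Then we can reliably bound the data lost in a crash to less than 30s worth
of writeback by default.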
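The proposed cap can be sketched as follows, assuming it is applied as a min() against whatever limit the bdi would otherwise get, with write_bandwidth in bytes per second. dirty_expire_interval mirrors the kernel variable behind vm.dirty_expire_centisecs (3000 centiseconds by default); the function name is hypothetical and is not a kernel API.

```python
# Hypothetical helper, not a kernel API: caps a bdi's dirty limit so that it
# holds at most dirty_expire_interval worth of writeback at the given bandwidth.
GB = 1 << 30
DIRTY_EXPIRE_INTERVAL = 3000  # centiseconds; default of vm.dirty_expire_centisecs (30s)


def capped_bdi_dirty_limit(bdi_limit, write_bandwidth,
                           dirty_expire_interval=DIRTY_EXPIRE_INTERVAL):
    """Return min(bdi_limit, write_bandwidth * dirty_expire_interval / 100).

    bdi_limit and the result are in bytes; write_bandwidth is in bytes/second.
    Dividing by 100 converts centiseconds to seconds.
    """
    expire_cap = write_bandwidth * dirty_expire_interval // 100
    return min(bdi_limit, expire_cap)


if __name__ == "__main__":
    bw = 1 * GB                          # 1GB/s disk
    limit = capped_bdi_dirty_limit(200 * GB, bw)
    print(limit // GB)                   # 30 (GB)
    print(limit / bw)                    # 30.0 seconds to write it all back
```

With a 1GB/s disk, a 200GB limit would be cut to 30GB for that bdi, so draining it takes about 30 seconds instead of 200.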
> A small writeback chunk size has no adverse impact on XFS as long as
> the elevator does its job of merging IOs (which in 99.9% of cases
> it does) so I'm wondering what the reason for making this change
> is.
It's explained in this changelog (is the XFS paragraph still valid?)
https://patchwork.kernel.org/patch/605151/
The larger write chunk size generally helps ext4 and RAID setups.
Thanks,
Fengguang

Thread overview: 66+ messages
2011-03-08 22:31 [PATCH RFC 0/5] IO-less balance_dirty_pages() v2 (simple approach) Jan Kara
2011-03-08 22:31 ` [PATCH 1/5] writeback: account per-bdi accumulated written pages Jan Kara
2011-03-08 22:31 ` [PATCH 2/5] mm: Properly reflect task dirty limits in dirty_exceeded logic Jan Kara
2011-03-09 21:02 ` Vivek Goyal
2011-03-14 20:44 ` Jan Kara
2011-03-15 15:21 ` Vivek Goyal
2011-03-08 22:31 ` [PATCH 3/5] mm: Implement IO-less balance_dirty_pages() Jan Kara
2011-03-10 0:07 ` Vivek Goyal
2011-03-14 20:48 ` Jan Kara
2011-03-15 15:23 ` Vivek Goyal
2011-03-16 21:26 ` Curt Wohlgemuth
2011-03-16 22:53 ` Curt Wohlgemuth
2011-03-16 16:53 ` Vivek Goyal
2011-03-16 19:10 ` Jan Kara
2011-03-16 19:31 ` Vivek Goyal
2011-03-16 19:58 ` Jan Kara
2011-03-16 20:22 ` Vivek Goyal
2011-03-08 22:31 ` [PATCH 4/5] mm: Remove low limit from sync_writeback_pages() Jan Kara
2011-03-08 22:31 ` [PATCH 5/5] mm: Autotune interval between distribution of page completions Jan Kara
2011-03-17 15:46 ` [PATCH RFC 0/5] IO-less balance_dirty_pages() v2 (simple approach) Curt Wohlgemuth
2011-03-17 15:51 ` Christoph Hellwig
2011-03-17 16:24 ` Curt Wohlgemuth
2011-03-17 16:43 ` Christoph Hellwig
2011-03-17 17:32 ` Jan Kara
2011-03-17 18:55 ` Curt Wohlgemuth
2011-03-17 22:56 ` Vivek Goyal
2011-03-18 14:30 ` Wu Fengguang
2011-03-22 21:43 ` Jan Kara
2011-03-23 4:41 ` Dave Chinner
2011-03-25 12:59 ` Wu Fengguang
2011-03-25 13:44 ` Wu Fengguang
2011-03-25 23:05 ` Jan Kara
2011-03-28 2:44 ` Wu Fengguang
2011-03-28 15:08 ` Jan Kara
2011-03-29 1:44 ` Wu Fengguang
2011-03-29 2:14 ` Dave Chinner
2011-03-29 2:41 ` Wu Fengguang [this message]
2011-03-29 5:59 ` Dave Chinner
2011-03-29 7:31 ` Wu Fengguang
2011-03-29 7:52 ` Wu Fengguang