From: Steven Whitehouse <swhiteho@redhat.com>
To: Christoph Hellwig <hch@infradead.org>
Cc: Wu Fengguang <fengguang.wu@intel.com>,
"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
Jan Kara <jack@suse.cz>, Dave Chinner <david@fromorbit.com>,
Andrew Morton <akpm@linux-foundation.org>,
LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 7/7] writeback: timestamp based bdi dirty_exceeded state
Date: Tue, 21 Jun 2011 11:00:03 +0100
Message-ID: <1308650403.2762.12.camel@menhir>
In-Reply-To: <20110620200950.GA18537@infradead.org>
Hi,
On Mon, 2011-06-20 at 16:09 -0400, Christoph Hellwig wrote:
> On Sun, Jun 19, 2011 at 11:01:15PM +0800, Wu Fengguang wrote:
> > When there are only one (or several) dirtiers, dirty_exceeded is always
> > (or mostly) off. Converting to timestamp avoids this problem. It helps
> > to use smaller write_chunk for smoother throttling.
>
> In current mainline gfs2 has grown a non-trivial reference to
> backing_dev_info.dirty_exceeded, which needs to be dealt with.
>
So let me try to explain what's going on there. The basic issue is
that writeback is done on a per-inode basis, but pages are accounted for
on a per-address-space basis.
In GFS2, glocks referring to inodes and rgrps (resource groups) both
have an address space associated with them. These address spaces contain
the metadata that would normally be in the block device address space,
but have been separated so that we can sync and/or invalidate metadata
easily on a per-inode basis. Note that we also need to be able to
track clean metadata, so the existing per-inode list of dirty metadata
doesn't work for GFS2. Due to the
lifetime rules for the glocks, and the lack of an inode for rgrps, the
mapping->host for the glock address spaces has to point at the block
device inode.
Now in the normal inode case, that isn't a problem: writeback calls
->write_inode, which can then write out the dirty metadata pages (if
any). The issue we've hit is with rgrps, in particular when the
total dirty data associated with rgrps exceeds the per-bdi dirty limit.
In that case we found that writeback was spinning without making any
progress, since it was trying to write back inodes (all clean by that
stage) and it had no way to start writeback on rgrps. So the
simplest solution was to check the dirty_exceeded flag during inode
writeback and, if it is set, to write back more data than was actually
requested, via the ail lists. The ail list contains all the dirty
metadata, so it includes the rgrps too. Due to the way in which rgrps
are used, it is impossible to dirty one without also dirtying at least
one inode, which is why checking the flag from inode writeback is
enough to reach them.
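The failure mode and the workaround above can be illustrated with a small userspace model. This is a hypothetical sketch, not GFS2 or VFS code: `struct sim_fs`, `sim_write_inode()`, `sim_balance()` and the other names are invented for illustration, the ail list is reduced to an array of buffers, an inode counts as dirty only while its own metadata is dirty, and dirty_exceeded is approximated by comparing a dirty-buffer count against a fixed limit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified model of the problem; none of these
 * names are GFS2 or VFS structures. */
#define NO_INODE (-1) /* owner of rgrp metadata: there is no inode */
#define MAX_BUFS 16

struct meta_buf {
    int owner;  /* inode number, or NO_INODE for rgrp metadata */
    bool dirty;
};

struct sim_fs {
    struct meta_buf ail[MAX_BUFS]; /* all dirty metadata, rgrps included */
    size_t nr_bufs;
    long dirty_limit;              /* stands in for the per-bdi limit */
    bool flush_ail_when_exceeded;  /* toggles the workaround */
};

static long nr_dirty(const struct sim_fs *fs)
{
    long n = 0;
    for (size_t i = 0; i < fs->nr_bufs; i++)
        n += fs->ail[i].dirty;
    return n;
}

/* An inode is "dirty" in this model if any of its own metadata is dirty. */
static bool inode_dirty(const struct sim_fs *fs, int ino)
{
    for (size_t i = 0; i < fs->nr_bufs; i++)
        if (fs->ail[i].owner == ino && fs->ail[i].dirty)
            return true;
    return false;
}

/* Stand-in for ->write_inode: write this inode's own metadata and, if the
 * limit is exceeded and the workaround is enabled, everything on the ail
 * list (the only place rgrp metadata can be reached from).
 * Returns the number of buffers written. */
static long sim_write_inode(struct sim_fs *fs, int ino)
{
    assert(ino != NO_INODE);
    bool flush_all = fs->flush_ail_when_exceeded &&
                     nr_dirty(fs) > fs->dirty_limit;
    long written = 0;

    for (size_t i = 0; i < fs->nr_bufs; i++) {
        struct meta_buf *b = &fs->ail[i];
        if (b->dirty && (b->owner == ino || flush_all)) {
            b->dirty = false;
            written++;
        }
    }
    return written;
}

/* Simplified writeback loop: while over the limit, call write_inode on each
 * dirty inode. A pass that writes nothing is where the real kernel would
 * spin; the model reports it by returning false. */
static bool sim_balance(struct sim_fs *fs, int nr_inodes)
{
    while (nr_dirty(fs) > fs->dirty_limit) {
        long progress = 0;
        for (int ino = 0; ino < nr_inodes; ino++)
            if (inode_dirty(fs, ino))
                progress += sim_write_inode(fs, ino);
        if (progress == 0)
            return false;
    }
    return true;
}

/* Example scenario: three dirty rgrp buffers plus one dirty buffer each for
 * inodes 0 and 1, against a limit of two buffers. */
static void sim_init_example(struct sim_fs *fs, bool workaround)
{
    static const struct meta_buf bufs[] = {
        { NO_INODE, true }, { NO_INODE, true }, { NO_INODE, true },
        { 0, true }, { 1, true },
    };
    fs->nr_bufs = sizeof(bufs) / sizeof(bufs[0]);
    for (size_t i = 0; i < fs->nr_bufs; i++)
        fs->ail[i] = bufs[i];
    fs->dirty_limit = 2;
    fs->flush_ail_when_exceeded = workaround;
}
```

With the workaround disabled, sim_balance() cleans both inodes on the first pass and then finds no dirty inode to call write_inode on, while the three rgrp buffers keep it over the limit. With it enabled, the first write_inode call sees the limit exceeded and flushes the whole list.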
In addition, the ordering of data blocks on the ail list is often
better (especially for workloads with lots of small files), so we get
a performance improvement by doing writeback that way too.
Having said that, I know it's not ideal, and I'm open to any suggestions
for better solutions,
Steve.