From: Wu Fengguang <fengguang.wu@intel.com>
To: Jan Kara <jack@suse.cz>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Dave Chinner <david@fromorbit.com>,
	Rik van Riel <riel@redhat.com>, Mel Gorman <mel@csn.ul.ie>,
	Christoph Hellwig <hch@infradead.org>,
	"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 06/18] writeback: sync expired inodes first in background writeback
Date: Fri, 27 May 2011 23:06:26 +0800
Message-ID: <20110527150625.GA5031@localhost>
In-Reply-To: <20110526231045.GN5123@quack.suse.cz>

On Fri, May 27, 2011 at 07:10:45AM +0800, Jan Kara wrote:
> On Wed 25-05-11 22:38:57, Wu Fengguang wrote:
> > > and I was wondering: Assume there is one continuously redirtied file and
> > > untar starts in parallel. With the new logic, background writeback will
> > > never consider inodes that are not expired in this situation (we never
> > > switch to "all dirty inodes" phase - or even if we switched, we would just
> > > queue all inodes and then return back to queueing only expired inodes). So
> > > the net effect is that for 30 seconds we will be only continuously writing
> > > pages of the continuously dirtied file instead of (possibly older) pages of
> > > other files that are written. Is this really desirable? Wasn't the old
> > > behavior simpler and not worse than the new one?
> > 
> > Good question! Yes sadly in this case the new behavior could be worse
> > than the old one.
> > 
> > In fact this patch do not improve the small files (< 4MB) case at all,
> > except for the side effect that less unexpired inodes will leave in
> > s_io when the background work quit and the later kupdate work will
> > write less unexpired inodes.
> > 
> > And for the mixed small/large files case, it actually results in worse
> > behavior on your mentioned case.
> > 
> > However the root cause here is the file being _actively_ written to,
> > somehow a livelock scheme. We could add a simple livelock prevention
> > scheme that works for the common case of file appending:
> > 
> > - save i_size when the range_cyclic writeback starts from 0, for
> >   limiting the writeback scope
>   Hmm, but for this we'd have to store additional 'unsigned long' (page
> index) for each inode. Not sure if it's really worth it.

Yeah, that may be a considerable space cost when the icache grows large.

> > - when range_cyclic writeback hits the saved i_size, quit the current
> >   inode instead of immediately restarting from 0. This will not only
> >   avoid a possible extra seek, but also redirty_tail() the inode and
> >   hence get out of possible livelock.
>   But I like the idea of doing redirty_tail() when we write out some inode
> for too long.

Then there is the hard question of "how long is long enough before we
redirty_tail()?". The interval must be long enough to avoid livelocking
the dirty pages near EOF when the large file is being written to all
over the place (random writes or multiple sequential write streams),
and yet still be short enough to be useful.

> Maybe we could just do redirty_tail() instead of requeue_io()
> whenever write_cache_pages() had to wrap the index? We could communicate
> this by setting a flag in wbc in write_cache_pages()...

That's the minimum required for the problem here. What do you think
about this old patch, which happens to have the required side effects?

        writeback: quit on wrap for .range_cyclic (ext4 part)
        http://lkml.org/lkml/2009/10/7/129
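
To make the effect of "redirty_tail() on wrap" concrete, here is a toy
userspace model in Python. It is a sketch, not kernel code: the one-second
ticks, the pick() policy (expired inodes first, else all dirty inodes,
oldest dirtied_when first), the page counts and the writer pattern are all
assumptions made for illustration. Only the names write_cache_pages,
redirty_tail, requeue_io and dirtied_when come from this thread.

```python
EXPIRE = 30       # assumed: seconds before a dirty inode counts as expired
NR_TO_WRITE = 3   # assumed: pages written per writeback pass


class Inode:
    def __init__(self, dirty, dirtied_when):
        self.dirty = set(dirty)          # dirty page indices
        self.dirtied_when = dirtied_when
        self.writeback_index = 0         # where range_cyclic resumes


def write_cache_pages(inode, nr_to_write):
    """Write dirty pages at or after writeback_index; quit on wrap.

    Returns True when the pass reached the end of the file, in which
    case the index is reset to 0 but the scan is NOT restarted.
    """
    pending = sorted(p for p in inode.dirty if p >= inode.writeback_index)
    batch = pending[:nr_to_write]
    inode.dirty.difference_update(batch)
    if len(batch) == len(pending):       # nothing dirty left ahead of us
        inode.writeback_index = 0
        return True
    inode.writeback_index = batch[-1] + 1
    return False


def pick(inodes, now):
    """Expired inodes first; if none is expired, consider all dirty ones."""
    dirty = [i for i in inodes if i.dirty]
    expired = [i for i in dirty if now - i.dirtied_when >= EXPIRE]
    return min(expired or dirty, key=lambda i: i.dirtied_when)


def simulate(redirty_on_wrap, start=30, stop=50):
    """Return the tick at which the cold inode is first written, or None."""
    hot = Inode(range(16), dirtied_when=0)    # continuously redirtied file
    cold = Inode(range(4), dirtied_when=20)   # e.g. a file from the untar
    for now in range(start, stop):
        hot.dirty.update({0, 4, 8, 12})       # writer scribbles every tick
        inode = pick([hot, cold], now)
        wrapped = write_cache_pages(inode, NR_TO_WRITE)
        if inode is cold:
            return now
        if wrapped and redirty_on_wrap:
            inode.dirtied_when = now          # stands in for redirty_tail()
        # otherwise dirtied_when stays old, standing in for requeue_io()
    return None
```

In this model simulate(True) returns 36: the hot inode wraps at tick 35,
loses its expired status, and the cold inode is written on the next pass.
simulate(False) returns None, i.e. the cold inode is not touched at all
before it expires itself at tick 50.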

> > The livelock prevention scheme may not only eliminate the undesirable
> > behavior you observed for this patch, but also prevent the "some old
> > pages may not get the chance to get written to disk in an actively
> > dirtied file" data security issue discussed in an old email. What do
> > you think?
>   So my scheme would not solve this but it does not require per-inode
> overhead...

Another possible scheme is to tag the dirty inode immediately after
requeue_io(), if it is not yet tagged. Accordingly, let
write_cache_pages() write only the tagged pages as long as
mapping_tagged(TOWRITE) holds, and always update dirtied_when when the
writeback index wraps.
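
A rough sketch of that tagging idea, again as a toy Python model rather
than kernel code. The Mapping class and the helpers tag_if_untagged() and
write_tagged() are hypothetical names invented here; the only thing taken
from the scheme above is the rule "write tagged pages only, and update
dirtied_when once the tagged set is exhausted".

```python
class Mapping:
    def __init__(self, dirty, dirtied_when=0):
        self.dirty = set(dirty)      # all dirty page indices
        self.towrite = set()         # pages tagged TOWRITE
        self.dirtied_when = dirtied_when


def tag_if_untagged(m):
    """Snapshot the currently dirty pages, unless a snapshot is pending."""
    if not m.towrite:
        m.towrite = set(m.dirty)


def write_tagged(m, nr_to_write, now):
    """Write up to nr_to_write tagged pages, lowest index first."""
    batch = sorted(m.towrite)[:nr_to_write]
    m.towrite.difference_update(batch)
    m.dirty.difference_update(batch)
    if not m.towrite:
        # The tagged sweep is complete: pages dirtied since the snapshot
        # count as freshly dirtied, so the inode stops looking expired.
        m.dirtied_when = now
    return batch


def demo():
    m = Mapping(range(6))
    tag_if_untagged(m)                  # after the first requeue_io()
    first = write_tagged(m, 4, now=31)
    m.dirty.update({6, 7})              # writer appends during writeback
    tag_if_untagged(m)                  # no-op: a snapshot is still pending
    second = write_tagged(m, 4, now=32)
    return first, second, sorted(m.dirty), m.dirtied_when
```

Here demo() returns ([0, 1, 2, 3], [4, 5], [6, 7], 32): the pages appended
during writeback are left for a later sweep, so the amount of work per
sweep is bounded by what was dirty when the inode was tagged.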

Thanks,
Fengguang
