All of lore.kernel.org
 help / color / mirror / Atom feed
From: Wu Fengguang <fengguang.wu@intel.com>
To: Jan Kara <jack@suse.cz>
Cc: "npiggin@suse.de" <npiggin@suse.de>,
	Andrew Morton <akpm@linux-foundation.org>,
	LKML <linux-kernel@vger.kernel.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"hch@infradead.org" <hch@infradead.org>,
	"chris.mason@oracle.com" <chris.mason@oracle.com>,
	Peter Zijlstra <a.p.zijlstra@chello.nl>
Subject: Re: [RFC] [PATCH] Avoid livelock for fsync
Date: Wed, 4 Nov 2009 19:32:11 +0800	[thread overview]
Message-ID: <20091104113211.GA23859@localhost> (raw)
In-Reply-To: <20091103145654.GC29873@duck.suse.cz>

Jan,

On Tue, Nov 03, 2009 at 10:56:54PM +0800, Jan Kara wrote:
[snip]
> > > inodes really does not make any sence - why should be a page in a file
> > > penalized and written later only because there are lots of other dirty
> > > pages in the file? It is enough to make sure that we don't write one file
> > > indefinitely when there are new dirty pages continuously created - and my
> > > patch achieves that.
> > 
> > This is a big policy change. Imagine dirty files A=4GB, B=C=D=1MB.
> > With current policy, it could be
> > 
> >         sync 4MB of A
> >         sync B
> >         sync C
> >         sync D
> >         sync 4MB of A
> >         sync 4MB of A
> >         ...
> > 
> > And you want to change to
> > 
> >         sync A (all 4GB)
> >         sync B
> >         sync C
> >         sync D
> > 
> > This means the writeback of B,C,D won't be able to start at 30s, but
> > delayed to 80s because of A. This is not entirely fair. IMHO writeback
> > of big files shall not delay small files too much. 
>   Yes, I'm aware of this change. It's just that I'm not sure we really
> care. There are few reasons to this: What advantage does it bring that we
> are "fair among files"? User can only tell the difference if after a crash,

I'm not all that sure, too. The perception is, big files normally
contain less valuable information per-page than small files ;)

If crashed, it's much better to lose one single big file, than to lose
all the (big and small) files.

Maybe nobody really care that - sync() has always been working file
after file (ignoring nr_to_write) and no one complained.

> files he wrote long time ago are still not on disk. But we shouldn't
> accumulate too many dirty data (like minutes of writeback) in caches
> anyway... So the difference should not be too big. Also how is the case
> "one big and a few small files" different from the case "many small files"
> where to be fair among files does not bring anything? 
>   It's just that see some substantial code complexity and also performance
> impact (because of smaller chunks of sequential IO) in trying to be fair
> among files and I don't really see adequate advantages of that approach.
> That's why I'm suggesting we should revisit the decision and possibly go in
> a different direction.

Anyway, if this is not a big concern, nr_to_write could be removed.

Note that requeue_io() (or requeue_io_wait) still cannot be removed
because sometimes we have (temporary) problems on writeback an inode.

Thanks,
Fengguang

WARNING: multiple messages have this Message-ID (diff)
From: Wu Fengguang <fengguang.wu@intel.com>
To: Jan Kara <jack@suse.cz>
Cc: "npiggin@suse.de" <npiggin@suse.de>,
	Andrew Morton <akpm@linux-foundation.org>,
	LKML <linux-kernel@vger.kernel.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"hch@infradead.org" <hch@infradead.org>,
	"chris.mason@oracle.com" <chris.mason@oracle.com>,
	Peter Zijlstra <a.p.zijlstra@chello.nl>
Subject: Re: [RFC] [PATCH] Avoid livelock for fsync
Date: Wed, 4 Nov 2009 19:32:11 +0800	[thread overview]
Message-ID: <20091104113211.GA23859@localhost> (raw)
In-Reply-To: <20091103145654.GC29873@duck.suse.cz>

Jan,

On Tue, Nov 03, 2009 at 10:56:54PM +0800, Jan Kara wrote:
[snip]
> > > inodes really does not make any sence - why should be a page in a file
> > > penalized and written later only because there are lots of other dirty
> > > pages in the file? It is enough to make sure that we don't write one file
> > > indefinitely when there are new dirty pages continuously created - and my
> > > patch achieves that.
> > 
> > This is a big policy change. Imagine dirty files A=4GB, B=C=D=1MB.
> > With current policy, it could be
> > 
> >         sync 4MB of A
> >         sync B
> >         sync C
> >         sync D
> >         sync 4MB of A
> >         sync 4MB of A
> >         ...
> > 
> > And you want to change to
> > 
> >         sync A (all 4GB)
> >         sync B
> >         sync C
> >         sync D
> > 
> > This means the writeback of B,C,D won't be able to start at 30s, but
> > delayed to 80s because of A. This is not entirely fair. IMHO writeback
> > of big files shall not delay small files too much. 
>   Yes, I'm aware of this change. It's just that I'm not sure we really
> care. There are few reasons to this: What advantage does it bring that we
> are "fair among files"? User can only tell the difference if after a crash,

I'm not all that sure, too. The perception is, big files normally
contain less valuable information per-page than small files ;)

If crashed, it's much better to lose one single big file, than to lose
all the (big and small) files.

Maybe nobody really care that - sync() has always been working file
after file (ignoring nr_to_write) and no one complained.

> files he wrote long time ago are still not on disk. But we shouldn't
> accumulate too many dirty data (like minutes of writeback) in caches
> anyway... So the difference should not be too big. Also how is the case
> "one big and a few small files" different from the case "many small files"
> where to be fair among files does not bring anything? 
>   It's just that see some substantial code complexity and also performance
> impact (because of smaller chunks of sequential IO) in trying to be fair
> among files and I don't really see adequate advantages of that approach.
> That's why I'm suggesting we should revisit the decision and possibly go in
> a different direction.

Anyway, if this is not a big concern, nr_to_write could be removed.

Note that requeue_io() (or requeue_io_wait) still cannot be removed
because sometimes we have (temporary) problems on writeback an inode.

Thanks,
Fengguang

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2009-11-04 11:32 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-10-26 18:13 [RFC] [PATCH] Avoid livelock for fsync Jan Kara
2009-10-26 18:13 ` Jan Kara
2009-10-27  3:39 ` Nick Piggin
2009-10-27  3:39   ` Nick Piggin
2009-10-27  9:17   ` Jan Kara
2009-10-27  9:17     ` Jan Kara
2009-10-27 13:56 ` Nikanth Karthikesan
2009-10-27 13:56   ` Nikanth Karthikesan
2009-10-27 15:32   ` Jan Kara
2009-10-27 15:32     ` Jan Kara
2009-10-28 21:47 ` Andrew Morton
2009-10-28 21:47   ` Andrew Morton
2009-11-02  3:34   ` Nick Piggin
2009-11-02  3:34     ` Nick Piggin
2009-11-03 13:14 ` Wu Fengguang
2009-11-03 13:14   ` Wu Fengguang
2009-11-03 14:56   ` Jan Kara
2009-11-03 14:56     ` Jan Kara
2009-11-04 11:32     ` Wu Fengguang [this message]
2009-11-04 11:32       ` Wu Fengguang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20091104113211.GA23859@localhost \
    --to=fengguang.wu@intel.com \
    --cc=a.p.zijlstra@chello.nl \
    --cc=akpm@linux-foundation.org \
    --cc=chris.mason@oracle.com \
    --cc=hch@infradead.org \
    --cc=jack@suse.cz \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=npiggin@suse.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.