Re: ext4 data=writeback performs worse than data=ordered now

From: "Darrick J. Wong" <djwong@us.ibm.com>
To: Shaohua Li <shaohua.li@intel.com>
Cc: "Ted Ts'o" <tytso@mit.edu>,
	"Wu, Fengguang" <fengguang.wu@intel.com>,
	"linux-ext4@vger.kernel.org" <linux-ext4@vger.kernel.org>,
	Jan Kara <jack@suse.cz>, LKML <linux-kernel@vger.kernel.org>,
	"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>
Subject: Re: ext4 data=writeback performs worse than data=ordered now
Date: Thu, 15 Dec 2011 10:10:44 -0800
Message-ID: <20111215181044.GD8233@tux1.beaverton.ibm.com>
In-Reply-To: <1323913345.22361.442.camel@sli10-conroe>

On Thu, Dec 15, 2011 at 09:42:25AM +0800, Shaohua Li wrote:
> On Thu, 2011-12-15 at 09:20 +0800, Darrick J. Wong wrote:
> > On Thu, Dec 15, 2011 at 09:02:57AM +0800, Shaohua Li wrote:
> > > On Wed, 2011-12-14 at 22:30 +0800, Ted Ts'o wrote:
> > > > On Wed, Dec 14, 2011 at 09:34:00PM +0800, Wu Fengguang wrote:
> > > > > Hi,
> > > > > 
> > > > > Shaohua recently found that ext4 writeback mode could perform worse
> > > > > than ordered mode in some cases. It may not be a big problem, however
> > > > > we'd like to share some information on our findings.
> > > > > 
> > > > > I tested both 3.2 and 3.1 kernels on normal SATA disks and USB key.
> > > > > The interesting thing is, data=writeback used to run a bit faster
> > > > > > than data=ordered, however the situation got inverted, presumably by the
> > > > > IO-less dirty throttling.
> > > > 
> > > > Interesting.  What sort of workloads are you using to do these
> > > > measurements?  How many writer threads; I assume you are doing
> > > > sequential writes which are extending one or more files, etc?
> > > > 
> > > > I suspect it's due to the throttling meaning that each thread is
> > > > getting to send less data to the disk, and so there is more seeking
> > > > > going on with data=writeback, whereas with data=ordered, at each
> > > > journal commit we are forcing all of the dirty pages out to disk, one
> > > > inode at a time, and this is resulting in a more efficient writeback
> > > > compared to when the writeback code is getting to make its own choices
> > > > > about how much each inode gets to write out at a time.
> > > > 
> > > > It would be interesting to see what would happen if in
> > > > ext4_da_writepages(), we completely ignore how many pages are
> > > > requested to be written back by the writeback code, and just simply
> > > > write back all of the dirty pages, and see if that brings the
> > > > performance back.
> > > > I saw the issue on a machine with an LSI 1068e HBA card and 12 disks.
> > > > There is about a 20% performance regression with data=writeback when
> > > > comparing 3.1 and 3.2-rc. With data=ordered, there is a small regression too.
> > > > Reverting the writeback changes recovers the regression in both cases.
> > > 
> > > > My investigation shows the block size written to disk hasn't changed with
> > > > data=writeback. The block size is still very big, 256k IIRC, which is
> > > > the max block size for the disks. And I have just one thread for each
> > > > disk, so seeking definitely isn't a problem in my workload.
> > > 
> > > > I found that sometimes one disk has no requests in flight, but we can't
> > > > send a request to that disk because the scsi host's resource (the queue
> > > > depth) is used up. It looks like we send too many requests to other disks
> > > > and leave some disks starved. The resource imbalance in scsi isn't a new
> > 
> > I wonder, does the patch in:
> > http://lkml.indiana.edu/hypermail/linux/kernel/1105.3/02339.html
> > help with this starvation problem?  I noticed a similar problem and sent a
> > patch, but LSI folks never responded.  Maybe two complaining users can change
> > that.  The biggest MaxQ I've seen on LSI SAS is 511, and the driver clamps the
> > value it passes to the SCSI layer to whatever the controller reports as its
> > MaxQ (in /proc/mpt/summary).
> This should recover the regression too. But I'm afraid it's just a
> workaround and will hide some issues. What if I have 120 disks instead
> of 12 disks? I observed that one disk can burst 20 requests while the total
> scsi host queue depth is 127, leaving other disks starved. I'm
> hoping to understand why there is such an imbalance.

<shrug> I didn't say it would /fix/ the imbalanced-starvation problem, but we
might as well take full advantage of the hardware.  Even if all it does is
enable the user to plug in more disks before things get whacky, I was hoping
that someone else could at least give it a spin and say "Yes, this does what
it's alleged to do, and without breaking things". :)

afaict SCSI doesn't try to balance requests heading towards the HBA; it's all
FCFS.
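
To put rough numbers on that, here is a minimal sketch of strictly
first-come-first-served slot allocation against a shared host queue. It is a
toy model only, not the SCSI midlayer's actual logic: fair_share() and
fcfs_fill() are hypothetical names, and the assumption that every disk bursts
20 requests in arrival order is made up for illustration from the figures
quoted above (host depth 127, bursts of 20, 12 disks).

```python
# Toy model of a shared host queue; NOT the real SCSI midlayer code.
# fair_share() and fcfs_fill() are hypothetical helpers for illustration.

def fair_share(host_queue_depth, num_disks):
    """Slots each disk would get if the host queue were split evenly."""
    return host_queue_depth // num_disks


def fcfs_fill(host_queue_depth, bursts):
    """Grant host queue slots strictly in arrival order.

    bursts: list of (disk_id, request_count) in arrival order.
    Returns a dict mapping disk_id to the number of slots granted.
    """
    granted = {}
    free = host_queue_depth
    for disk, count in bursts:
        take = min(count, free)  # a burst takes whatever is still free
        granted[disk] = granted.get(disk, 0) + take
        free -= take
    return granted


if __name__ == "__main__":
    # Assumed workload: 12 disks, each bursting 20 requests, host depth 127.
    bursts = [(disk, 20) for disk in range(12)]
    granted = fcfs_fill(127, bursts)
    starved = [disk for disk, slots in granted.items() if slots == 0]
    print("even share per disk at depth 127:", fair_share(127, 12))
    print("even share per disk at depth 511:", fair_share(511, 12))
    print("disks left with no slots under FCFS:", starved)
```

In this model the first six bursts consume 120 of the 127 slots, the seventh
disk gets the remaining 7, and the last five get nothing. Raising the depth to
511 would cover all twelve bursts, but with 120 disks the even share drops to
4 slots per disk, so the imbalance would presumably come back.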

--D
> 
> Thanks,
> Shaohua
>
