Re: [RFC PATCH] ext4: fix 50% disk write performance regression


From: Bill Fink <billfink@mindspring.com>
To: Ted Ts'o <tytso@mit.edu>
Cc: Bill Fink <bill@wizard.sci.gsfc.nasa.gov>,
	"adilger@sun.com" <adilger@sun.com>,
	"linux-ext4@vger.kernel.org" <linux-ext4@vger.kernel.org>,
	"Fink, William E. (GSFC-6061)" <william.e.fink@nasa.gov>
Subject: Re: [RFC PATCH] ext4: fix 50% disk write performance regression
Date: Mon, 30 Aug 2010 21:14:37 -0400
Message-ID: <20100830211437.765d117e.billfink@mindspring.com>
In-Reply-To: <20100831003710.GA4272@thunk.org>

On Mon, 30 Aug 2010, Ted Ts'o wrote:

> On Mon, Aug 30, 2010 at 04:49:58PM -0400, Bill Fink wrote:
> > > Thanks for reporting it.  I'm going to have to take a closer look at
> > > why this makes a difference.  I'm going to guess though that what's
> > > going on is that we're posting writes in such a way that they're no
> > > longer aligned or ending at the end of a RAID5 stripe, causing a
> > > read-modify-write pass.  That would easily explain the write
> > > performance regression.
> > 
> > I'm not sure I understand.  How could calling or not calling
> > ext4_num_dirty_pages() (unpatched versus patched 2.6.35 kernel)
> > affect the write alignment?
> 
> Suppose you have 8 disks, with stripe size of 16k.  Assuming that
> you're only using one parity disk (i.e., RAID 5) and no spare disks,
> that means the optimal I/O size is 7*16k == 112k.  If we do a write
> which is smaller than 112k, or which is not a multiple of 112k, then
> the RAID subsystem will need to do a read-modify-write to update the
> parity disk.  Furthermore, the write had better be aligned on a 112k
> byte boundary.  The block allocator will guarantee that block #0 is
> aligned on a 112k boundary, but writes also have to be the right size
> in order to avoid the read-modify-write.
> 
> If we end up doing very small writes, then it can end up being quite
> disastrous for write performance.
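Ted's arithmetic can be restated as a small check. This is a hypothetical helper written only for illustration (it is not code from ext4 or the RAID layer). It assumes the 8-disk, 16k-chunk, single-parity geometry of his example and models only the rule he states: a write avoids read-modify-write only if it starts on a full-stripe boundary and its length is a whole number of stripes.

```python
# Hypothetical RAID 5 geometry from the example: 8 disks, 16k chunk
# per disk, one parity disk, no spares.
NUM_DISKS = 8
PARITY_DISKS = 1
CHUNK_BYTES = 16 * 1024

# Optimal I/O size = data disks * chunk size = 7 * 16k = 112k.
STRIPE_BYTES = (NUM_DISKS - PARITY_DISKS) * CHUNK_BYTES


def needs_read_modify_write(offset: int, length: int) -> bool:
    """True if a write of `length` bytes at byte `offset` covers some
    stripe only partially, so old data/parity must be read back before
    the new parity can be written."""
    if length <= 0:
        return False  # nothing written, nothing to update
    misaligned = offset % STRIPE_BYTES != 0
    partial_length = length % STRIPE_BYTES != 0  # also catches writes < 112k
    return misaligned or partial_length


# Usage: the first two writes are full-stripe writes (False);
# the last two need a read-modify-write (True).
for off, ln in [(0, 112 * 1024), (0, 224 * 1024),
                (0, 64 * 1024), (16 * 1024, 112 * 1024)]:
    print(off, ln, needs_read_modify_write(off, ln))
```

Real controllers and software RAID may soften the penalty (write caching, different ways of recomputing parity), so treat this as the worst-case rule from the discussion rather than a model of any particular implementation.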

I understand how unaligned writes can be very bad for performance.
That makes perfect sense.  What I don't understand is how just
calling or not calling ext4_num_dirty_pages() can affect the
write alignment, and that's the only difference between the
unpatched and patched 2.6.35 kernels.  I thought the only thing
ext4_num_dirty_pages does is to count the number of ext4 dirty
pages.  How can that counting affect the write alignment?  I
guess there must be some subtle side effect of ext4_num_dirty_pages
that I'm not getting.

> > I was wondering if the locking being done in ext4_num_dirty_pages()
> > could somehow be affecting the performance.  I did notice from top
> > that in the patched 2.6.35 kernel, the I/O wait time was generally
> > in the 60-65% range, while in the unpatched 2.6.35 kernel, it was
> > at a higher 75-80% range.  However, I don't know if that's just a
> > result of the lower performance, or a possible clue to its cause.
> 
> I/O wait time would tend to imply that the raid controller is taking
> longer to do the write updates, which would tend to confirm that we're
> doing more read-modify-write cycles.  If we were hitting spinlock
> contention, this would show up as more system CPU time consumed.
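To tell the two cases apart from raw counters rather than eyeballing top, one can sample the aggregate `cpu` line of Linux's /proc/stat twice and compare the iowait and system deltas. This is a hypothetical sketch (the function names are illustrative, not from any existing tool). It assumes the field order documented in proc(5) (user, nice, system, idle, iowait, irq, softirq, steal), with values in clock ticks. Note that proc(5) itself describes iowait as only an approximate indicator.

```python
from typing import Dict

# Field order of the aggregate "cpu" line in /proc/stat, per proc(5).
# guest/guest_nice are skipped: they are already included in user/nice.
CPU_FIELDS = ("user", "nice", "system", "idle",
              "iowait", "irq", "softirq", "steal")


def parse_cpu_line(line: str) -> Dict[str, int]:
    """Parse the aggregate 'cpu ...' line into tick counters."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    values = [int(v) for v in parts[1:1 + len(CPU_FIELDS)]]
    # zip() stops early if an older kernel reports fewer fields.
    return dict(zip(CPU_FIELDS, values))


def read_cpu_counters(path: str = "/proc/stat") -> Dict[str, int]:
    """Read the counters; the first line of /proc/stat is the 'cpu' line."""
    with open(path) as f:
        return parse_cpu_line(f.readline())


def cpu_percentages(before: Dict[str, int],
                    after: Dict[str, int]) -> Dict[str, float]:
    """Percent of elapsed ticks spent in each state between two samples.
    High iowait with low system time points at slow device I/O; high
    system time points at in-kernel CPU work such as lock spinning."""
    delta = {k: after[k] - before[k] for k in before}
    total = sum(delta.values())
    if total == 0:
        return {k: 0.0 for k in delta}
    return {k: 100.0 * v / total for k, v in delta.items()}
```

On a live system, call read_cpu_counters() twice, a second or so apart, while the write test runs, and pass both results to cpu_percentages().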

OK.  There wasn't more CPU utilization.  In the bad case it was lower,
roughly in proportion to the reduced level of performance.

						-Thanks

						-Bill

Thread overview: 21+ messages
2010-08-30  3:11 [RFC PATCH] ext4: fix 50% disk write performance regression Bill Fink
2010-08-30 17:05 ` Eric Sandeen
2010-08-30 19:30   ` Bill Fink
2010-08-30 19:35     ` Eric Sandeen
2010-08-30 17:40 ` Ted Ts'o
2010-08-30 20:49   ` Bill Fink
2010-08-30 21:05     ` Eric Sandeen
     [not found]       ` <20100830194533.6d09c38b.bill@wizard.sci.gsfc.nasa.gov>
2010-08-30 23:53         ` Eric Sandeen
     [not found]           ` <20100830210541.8b248a14.billfink@mindspring.com>
     [not found]             ` <4C7C62E9.4090707@redhat.com>
2010-08-31  3:27               ` Bill Fink
2010-08-31  3:29                 ` Eric Sandeen
2010-08-31  0:37     ` Ted Ts'o
2010-08-31  0:51       ` Justin Maggard
2010-08-31  1:44         ` Bill Fink
2010-08-31  1:14       ` Bill Fink [this message]
2010-08-31  3:43 ` [PATCH] " Eric Sandeen
2010-08-31  4:26   ` Eric Sandeen
2010-08-31  4:53   ` Bill Fink
2010-08-31  5:05     ` Eric Sandeen
2010-08-31  5:31       ` Bill Fink
2010-09-09  0:23       ` Daniel Taylor
2010-09-09  3:29         ` Eric Sandeen
