Re: [RFC] ext3: per-process soft-syncing data=ordered mode

From: Chris Mason <chris.mason@oracle.com>
To: Jan Kara <jack@suse.cz>
Cc: Al Boldi <a1426z@gawab.com>, Andreas Dilger <adilger@sun.com>,
	Chris Snook <csnook@redhat.com>,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [RFC] ext3: per-process soft-syncing data=ordered mode
Date: Thu, 31 Jan 2008 12:14:54 -0500
Message-ID: <200801311214.55287.chris.mason@oracle.com>
In-Reply-To: <20080131171040.GL1461@duck.suse.cz>

On Thursday 31 January 2008, Jan Kara wrote:
> On Thu 31-01-08 11:56:01, Chris Mason wrote:
> > On Thursday 31 January 2008, Al Boldi wrote:
> > > Andreas Dilger wrote:
> > > > On Wednesday 30 January 2008, Al Boldi wrote:
> > > > > And, a quick test of successive 1sec delayed syncs shows no hangs
> > > > > until about 1 minute (~180mb) of db-writeout activity, when the
> > > > > sync abruptly hangs for minutes on end, and io-wait shows almost
> > > > > 100%.
> > > >
> > > > How large is the journal in this filesystem?  You can check via
> > > > "debugfs -R 'stat <8>' /dev/XXX".
> > >
> > > 32mb.
> > >
> > > > Is this affected by increasing
> > > > the journal size?  You can set the journal size via "mke2fs -J
> > > > size=400" at format time, or on an unmounted filesystem by running
> > > > "tune2fs -O ^has_journal /dev/XXX" then "tune2fs -J size=400
> > > > /dev/XXX".
> > >
> > > Setting size=400 doesn't help, nor does size=4.
> > >
> > > > I suspect that the stall is caused by the journal filling up, and
> > > > then waiting while the entire journal is checkpointed back to the
> > > > filesystem before the next transaction can start.
> > > >
> > > > It is possible to improve this behaviour in JBD by reducing the
> > > > amount of space that is cleared if the journal becomes "full", and
> > > > also doing journal checkpointing before it becomes full.  While that
> > > > may reduce performance a small amount, it would help avoid such huge
> > > > latency problems. I believe we have such a patch in one of the Lustre
> > > > branches already, and while I'm not sure what kernel it is for the
> > > > JBD code rarely changes much....
> > >
> > > The big difference between ordered and writeback is that once the
> > > slowdown starts, ordered goes into ~100% iowait, whereas writeback
> > > continues 100% user.
> >
> > Does data=ordered write buffers in the order they were dirtied?  This
> > might explain the extreme problems in transactional workloads.
>
>   Well, it does but we submit them to block layer all at once so elevator
> should sort the requests for us...

nr_requests is fairly small, so a long stream of random requests should still 
end up being random IO.
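
A rough way to see the effect is a toy model, not the real elevator: assume the block layer can only sort requests within a window of nr_requests entries (128 is assumed here as a typical value), and compare total head movement against an idealized full sort. The function name, the block range, and the request count are all made up for illustration.

```python
import random


def total_seek(blocks, queue_depth):
    """Total head movement when requests are sorted only within
    consecutive windows of `queue_depth` requests (toy model)."""
    head = 0
    distance = 0
    for start in range(0, len(blocks), queue_depth):
        # Each window is served in ascending block order, like one sweep.
        for block in sorted(blocks[start:start + queue_depth]):
            distance += abs(block - head)
            head = block
    return distance


rng = random.Random(0)  # fixed seed so runs are repeatable
# Hypothetical workload: 100,000 random writes over 1,000,000 blocks.
blocks = [rng.randrange(1_000_000) for _ in range(100_000)]

unsorted = total_seek(blocks, 1)                # no sorting at all
small_queue = total_seek(blocks, 128)           # assumed nr_requests
fully_sorted = total_seek(blocks, len(blocks))  # idealized single sweep

print("unsorted / fully sorted:", unsorted // fully_sorted)
print("128-deep / fully sorted:", small_queue // fully_sorted)
```

In this model a small window helps compared with no sorting, but every window still sweeps most of the disk, so the total stays far from a single sequential pass.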

Al, could you please compare the write throughput from vmstat for the 
data=ordered vs data=writeback runs?  I would guess the data=ordered one has 
a lower overall write throughput.
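
One possible way to make that comparison, sketched under assumptions: capture vmstat output for each run and average the "bo" (blocks written out) and "wa" (iowait) columns. The helper name and the sample numbers below are invented for illustration, and the first sample is skipped because vmstat's first report gives averages since boot.

```python
def mean_columns(vmstat_text, columns=("bo", "wa"), skip_first=True):
    """Average selected vmstat columns over all samples in the text."""
    header = None
    rows = []
    for line in vmstat_text.splitlines():
        fields = line.split()
        # The column-name line contains both "bi" and "bo".
        if "bi" in fields and "bo" in fields:
            header = fields
            continue
        # Skip banner lines and anything that is not a numeric sample.
        if header is None or len(fields) != len(header):
            continue
        if not all(f.isdigit() for f in fields):
            continue
        rows.append(dict(zip(header, map(int, fields))))
    if skip_first:
        rows = rows[1:]  # first report is the since-boot average
    if not rows:
        raise ValueError("no vmstat samples found")
    return {c: sum(r[c] for r in rows) / len(rows) for c in columns}


# Made-up sample, shaped like `vmstat 1` output; not measured data.
SAMPLE = """\
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 1  0      0 100000  20000 300000    0    0    10    50  100  200  5  1 90  4
 0  2      0  99000  20000 301000    0    0     0   100  150  250  2  3  5 90
 0  2      0  98000  20000 302000    0    0     0   300  160  260  1  3  2 94
"""

result = mean_columns(SAMPLE)
print(result)
```

For real runs, something like "vmstat 1 60 > ordered.txt" during each test would produce the input, with the two resulting averages compared side by side.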

-chris

