linux-fsdevel.vger.kernel.org archive mirror
From: Al Boldi <a1426z@gawab.com>
To: Jan Kara <jack@suse.cz>
Cc: Chris Mason <chris.mason@oracle.com>,
	Andreas Dilger <adilger@sun.com>, Chris Snook <csnook@redhat.com>,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [RFC] ext3: per-process soft-syncing data=ordered mode
Date: Tue, 5 Feb 2008 10:07:44 +0300
Message-ID: <200802051007.44139.a1426z@gawab.com>
In-Reply-To: <20080204175415.GH3426@duck.suse.cz>

Jan Kara wrote:
> On Sat 02-02-08 00:26:00, Al Boldi wrote:
> > Chris Mason wrote:
> > > Al, could you please compare the write throughput from vmstat for the
> > > data=ordered vs data=writeback runs?  I would guess the data=ordered
> > > one has a lower overall write throughput.
> >
> > That's what I would have guessed, but it's actually going up 4-fold for
> > mysql, from 559MB to 2135MB, while the db-size ends up at 549MB.
>
>   So you say we write 4-times as much data in ordered mode as in writeback
> mode. Hmm, probably possible because we force all the dirty data to disk
> when committing a transaction in ordered mode (and don't do this in
> writeback mode). So if the workload repeatedly dirties the whole DB, we
> are going to write the whole DB several times in ordered mode but in
> writeback mode we just keep the data in memory all the time. But this is
> what you ask for if you mount in ordered mode so I wouldn't consider it a
> bug.

Ok, maybe not a bug, but a bit inefficient.  Check out this workload:

sync;

while :; do
  dd < /dev/full > /mnt/sda2/x.dmp bs=1M count=20
  rm -f /mnt/sda2/x.dmp
  usleep 10000
done

vmstat 1 (with mount /dev/sda2 /mnt/sda2 -o data=writeback) << note io-bo >>

procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
 2  0      0 293008   5232  57436    0    0     0     0   18   206  4 80 16  0
 1  0      0 282840   5232  67620    0    0     0     0   18   238  3 81 16  0
 1  0      0 297032   5244  53364    0    0     0   152   21   211  4 79 17  0
 1  0      0 285236   5244  65224    0    0     0     0   18   232  4 80 16  0
 1  0      0 299464   5244  50880    0    0     0     0   18   222  4 80 16  0
 1  0      0 290156   5244  60176    0    0     0     0   18   236  3 80 17  0
 0  0      0 302124   5256  47788    0    0     0   152   21   213  4 80 16  0
 1  0      0 292180   5256  58248    0    0     0     0   18   239  3 81 16  0
 1  0      0 287452   5256  62444    0    0     0     0   18   202  3 80 17  0
 1  0      0 293016   5256  57392    0    0     0     0   18   250  4 80 16  0
 0  0      0 302052   5256  47788    0    0     0     0   19   194  3 81 16  0
 1  0      0 297536   5268  52928    0    0     0   152   20   233  4 79 17  0
 1  0      0 286468   5268  63872    0    0     0     0   18   212  3 81 16  0
 1  0      0 301572   5268  48812    0    0     0     0   18   267  4 79 17  0
 1  0      0 292636   5268  57776    0    0     0     0   18   208  4 80 16  0
 1  0      0 302124   5280  47788    0    0     0   152   21   237  4 80 16  0
 1  0      0 291436   5280  58976    0    0     0     0   18   205  3 81 16  0
 1  0      0 302068   5280  47788    0    0     0     0   18   234  3 81 16  0
 1  0      0 293008   5280  57388    0    0     0     0   18   221  4 79 17  0
 1  0      0 297288   5292  52532    0    0     0   156   22   233  2 81 16  1
 1  0      0 294676   5292  55724    0    0     0     0   19   199  3 81 16  0


vmstat 1 (with mount /dev/sda2 /mnt/sda2 -o data=ordered)

procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
 2  0      0 291052   5156  59016    0    0     0     0   19   223  3 82 15  0
 1  0      0 291408   5156  58704    0    0     0     0   18   218  3 81 16  0
 1  0      0 291888   5156  58276    0    0     0    20   23   229  3 80 17  0
 1  0      0 300764   5168  49472    0    0     0 12864   91   235  3 69 13 15
 1  0      0 300740   5168  49456    0    0     0     0   19   215  3 80 17  0
 1  0      0 301088   5168  49044    0    0     0     0   18   241  4 80 16  0
 1  0      0 298220   5168  51872    0    0     0     0   18   225  3 81 16  0
 0  1      0 289168   5168  60752    0    0     0 12712   45   237  3 77 15  5
 1  0      0 300260   5180  49852    0    0     0   152   68   211  4 72 15  9
 1  0      0 298616   5180  51460    0    0     0     0   18   237  3 81 16  0
 1  0      0 296988   5180  53092    0    0     0     0   18   223  3 81 16  0
 1  0      0 296608   5180  53480    0    0     0     0   18   223  3 81 16  0
 0  0      0 301640   5192  48036    0    0     0 12868   93   206  4 67 13 16
 0  0      0 301624   5192  48036    0    0     0     0   21   218  3 81 16  0
 0  0      0 301600   5192  48036    0    0     0     0   18   212  3 81 16  0
 0  0      0 301584   5192  48036    0    0     0     0   18   209  4 80 16  0
 0  0      0 301568   5192  48036    0    0     0     0   18   208  3 81 16  0
 1  0      0 285520   5204  64548    0    0     0 12864   95   216  3 69 13 15
 2  0      0 285124   5204  64924    0    0     0     0   18   222  4 80 16  0
 1  0      0 283612   5204  66392    0    0     0     0   18   231  3 81 16  0
 1  0      0 284216   5204  65736    0    0     0     0   18   218  4 80 16  0
 0  1      0 289160   5204  60752    0    0     0 12712   56   213  3 74 15  8
 1  0      0 285884   5216  64128    0    0     0   152   54   209  4 75 15  6
 1  0      0 287472   5216  62572    0    0     0     0   18   223  3 81 16  0

Do you think these redundant 12MB writeouts could be buffered?

(Note: you may need to adjust dd count and usleep to see the same effect)
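
To put a number on the difference between the two runs, the bo column can be totalled. The following is only a minimal sketch: the function name sum_bo is made up for illustration, and it assumes the default vmstat layout shown above, where bo is the tenth field and header lines do not start with a number.

```shell
# sum_bo: hypothetical helper that totals the "bo" (blocks out) column of
# vmstat output read from stdin and prints "<total> <samples>".
# Assumes the default vmstat layout, where bo is field 10.
sum_bo() {
    awk '
        # Skip the two header lines: only data rows start with a number.
        $1 ~ /^[0-9]+$/ { sum += $10; n++ }
        END { printf "%d %d\n", sum, n }
    '
}
```

Saving each sample above to a file and running sum_bo < file gives 764 blocks over 21 samples for data=writeback and 64344 blocks over 24 samples for data=ordered.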

> I still don't like your hack with per-process journal mode setting
> but we could easily do per-file journal mode setting (we already have a
> flag to do data journaling for a file) and that would help at least your
> DB workload...

Well, that depends on what kind of db you use.  mysql creates db's as a dir,
and then manages the tables and indexes as files inside that dir.  So I don't
think this flag would be feasible for that use-case.  Much easier to just say:

  echo 1 > /proc/`pidof mysqld`/soft-sync

But the per-file flag could definitely help the file-mmap case, and as such
could be a great additional feature in combination with this RFC.
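
For comparison, here is a rough sketch of what a per-file approach looks like for a db that lives in a directory of table and index files. It uses the existing data-journaling attribute (chattr +j) purely as a stand-in, since the per-file journal mode setting discussed above does not exist yet. The function name journal_flag_cmds and the example path are made up for illustration, and the function only prints the commands rather than running them.

```shell
# journal_flag_cmds: hypothetical dry-run helper. Prints one "chattr +j"
# command per regular file under the given directory, so the list can be
# reviewed before anything is changed. It does not run chattr itself.
journal_flag_cmds() {
    dir=$1
    find "$dir" -type f | sort | while IFS= read -r f; do
        printf 'chattr +j %s\n' "$f"
    done
}
```

For example, journal_flag_cmds /var/lib/mysql/somedb (path assumed) lists one command per table or index file, which shows how much per-file bookkeeping is needed compared with a single per-process switch.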


Thanks!

--
Al

