From: Al Boldi <a1426z@gawab.com>
To: Jan Kara <jack@suse.cz>
Cc: Chris Mason <chris.mason@oracle.com>,
	Andreas Dilger <adilger@sun.com>, Chris Snook <csnook@redhat.com>,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [RFC] ext3: per-process soft-syncing data=ordered mode
Date: Tue, 5 Feb 2008 10:07:44 +0300
Message-ID: <200802051007.44139.a1426z@gawab.com>
In-Reply-To: <20080204175415.GH3426@duck.suse.cz>

Jan Kara wrote:
> On Sat 02-02-08 00:26:00, Al Boldi wrote:
> > Chris Mason wrote:
> > > Al, could you please compare the write throughput from vmstat for the
> > > data=ordered vs data=writeback runs?  I would guess the data=ordered
> > > one has a lower overall write throughput.
> >
> > That's what I would have guessed, but for mysql it actually goes up 4x,
> > from 559 MB to 2135 MB, while the DB size ends up at 549 MB.
>
>   So you say we write 4-times as much data in ordered mode as in writeback
> mode. Hmm, probably possible because we force all the dirty data to disk
> when committing a transaction in ordered mode (and don't do this in
> writeback mode). So if the workload repeatedly dirties the whole DB, we
> are going to write the whole DB several times in ordered mode but in
> writeback mode we just keep the data in memory all the time. But this is
> what you ask for if you mount in ordered mode so I wouldn't consider it a
> bug.
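To put Jan's explanation into numbers: in the writeback run the amount written is about the final DB size, while in the ordered run the DB was written out almost four times. A small awk helper makes the arithmetic explicit (the function name write_ratio is only illustrative; the figures are the ones quoted above):

```shell
# Ratio of data written during a run (as measured with vmstat) to the
# final database size.  awk is used because plain sh has no floating point.
write_ratio() {
    # $1 = MB written during the run, $2 = final DB size in MB
    awk -v w="$1" -v s="$2" 'BEGIN { printf "%.2f\n", w / s }'
}

write_ratio 559 549    # data=writeback run
write_ratio 2135 549   # data=ordered run
```

This prints 1.02 for writeback and 3.89 for ordered, which is consistent with each ordered-mode commit flushing the repeatedly dirtied DB again.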

Ok, maybe not a bug, but a bit inefficient.  Check out this workload:

sync;

while :; do
  dd < /dev/full > /mnt/sda2/x.dmp bs=1M count=20
  rm -f /mnt/sda2/x.dmp
  usleep 10000
done

vmstat 1 (with mount /dev/sda2 /mnt/sda2 -o data=writeback; note the io/bo column)

procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
 2  0      0 293008   5232  57436    0    0     0     0   18   206  4 80 16  0
 1  0      0 282840   5232  67620    0    0     0     0   18   238  3 81 16  0
 1  0      0 297032   5244  53364    0    0     0   152   21   211  4 79 17  0
 1  0      0 285236   5244  65224    0    0     0     0   18   232  4 80 16  0
 1  0      0 299464   5244  50880    0    0     0     0   18   222  4 80 16  0
 1  0      0 290156   5244  60176    0    0     0     0   18   236  3 80 17  0
 0  0      0 302124   5256  47788    0    0     0   152   21   213  4 80 16  0
 1  0      0 292180   5256  58248    0    0     0     0   18   239  3 81 16  0
 1  0      0 287452   5256  62444    0    0     0     0   18   202  3 80 17  0
 1  0      0 293016   5256  57392    0    0     0     0   18   250  4 80 16  0
 0  0      0 302052   5256  47788    0    0     0     0   19   194  3 81 16  0
 1  0      0 297536   5268  52928    0    0     0   152   20   233  4 79 17  0
 1  0      0 286468   5268  63872    0    0     0     0   18   212  3 81 16  0
 1  0      0 301572   5268  48812    0    0     0     0   18   267  4 79 17  0
 1  0      0 292636   5268  57776    0    0     0     0   18   208  4 80 16  0
 1  0      0 302124   5280  47788    0    0     0   152   21   237  4 80 16  0
 1  0      0 291436   5280  58976    0    0     0     0   18   205  3 81 16  0
 1  0      0 302068   5280  47788    0    0     0     0   18   234  3 81 16  0
 1  0      0 293008   5280  57388    0    0     0     0   18   221  4 79 17  0
 1  0      0 297288   5292  52532    0    0     0   156   22   233  2 81 16  1
 1  0      0 294676   5292  55724    0    0     0     0   19   199  3 81 16  0


vmstat 1 (with mount /dev/sda2 /mnt/sda2 -o data=ordered)

procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
 2  0      0 291052   5156  59016    0    0     0     0   19   223  3 82 15  0
 1  0      0 291408   5156  58704    0    0     0     0   18   218  3 81 16  0
 1  0      0 291888   5156  58276    0    0     0    20   23   229  3 80 17  0
 1  0      0 300764   5168  49472    0    0     0 12864   91   235  3 69 13 15
 1  0      0 300740   5168  49456    0    0     0     0   19   215  3 80 17  0
 1  0      0 301088   5168  49044    0    0     0     0   18   241  4 80 16  0
 1  0      0 298220   5168  51872    0    0     0     0   18   225  3 81 16  0
 0  1      0 289168   5168  60752    0    0     0 12712   45   237  3 77 15  5
 1  0      0 300260   5180  49852    0    0     0   152   68   211  4 72 15  9
 1  0      0 298616   5180  51460    0    0     0     0   18   237  3 81 16  0
 1  0      0 296988   5180  53092    0    0     0     0   18   223  3 81 16  0
 1  0      0 296608   5180  53480    0    0     0     0   18   223  3 81 16  0
 0  0      0 301640   5192  48036    0    0     0 12868   93   206  4 67 13 16
 0  0      0 301624   5192  48036    0    0     0     0   21   218  3 81 16  0
 0  0      0 301600   5192  48036    0    0     0     0   18   212  3 81 16  0
 0  0      0 301584   5192  48036    0    0     0     0   18   209  4 80 16  0
 0  0      0 301568   5192  48036    0    0     0     0   18   208  3 81 16  0
 1  0      0 285520   5204  64548    0    0     0 12864   95   216  3 69 13 15
 2  0      0 285124   5204  64924    0    0     0     0   18   222  4 80 16  0
 1  0      0 283612   5204  66392    0    0     0     0   18   231  3 81 16  0
 1  0      0 284216   5204  65736    0    0     0     0   18   218  4 80 16  0
 0  1      0 289160   5204  60752    0    0     0 12712   56   213  3 74 15  8
 1  0      0 285884   5216  64128    0    0     0   152   54   209  4 75 15  6
 1  0      0 287472   5216  62572    0    0     0     0   18   223  3 81 16  0

Do you think these redundant ~12 MB writeouts (the bo bursts of about 12,800
blocks every 4 to 5 seconds) could be buffered?

(Note: you may need to adjust the dd count and the usleep interval to
reproduce the effect.)
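To quantify the difference between the two runs, the bo column of a saved "vmstat 1" log can be summed. vmstat reports bo in 1 KiB blocks per second, so summing a one-second log approximates the KiB written. The sketch below is only one way to do it; the function name summarise_bo and the default burst threshold of 1000 blocks are arbitrary choices, not part of any tool. Also note that the first data line of a fresh vmstat run shows averages since boot and may need to be dropped from the log.

```shell
# Summarise a captured "vmstat 1" log: total blocks written out (the bo
# column, 1 KiB blocks) and the number of one-second samples above a
# burst threshold.
# Usage: run "vmstat 1 > ordered.log" alongside the workload, then
#   summarise_bo ordered.log [threshold]
summarise_bo() {
    awk -v thr="${2:-1000}" '
        $1 ~ /^[0-9]+$/ {          # data lines only; skips both header lines
            total += $10           # bo is the 10th column
            if ($10 > thr) bursts++
        }
        END { printf "total_bo=%d bursts=%d\n", total, bursts + 0 }
    ' "$1"
}
```

Applied to the two samples above, this gives total_bo=764 bursts=0 for data=writeback and total_bo=64344 bursts=5 for data=ordered. That is roughly 63 MiB against well under 1 MiB over a similar 21 to 24 second window.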

> I still don't like your hack with per-process journal mode setting
> but we could easily do per-file journal mode setting (we already have a
> flag to do data journaling for a file) and that would help at least your
> DB workload...

Well, that depends on which DB you use.  MySQL creates each database as a
directory and then manages its tables and indexes as files inside that
directory, so I don't think a per-file flag would be feasible for that use
case.  It's much easier to just say:

  echo 1 > /proc/`pidof mysqld`/soft-sync
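The one-liner above assumes pidof returns a single PID. With several mysqld instances, the backtick expansion yields more than one word and the redirection target becomes invalid (bash reports an ambiguous redirect). Below is a slightly more defensive sketch. It assumes the /proc/<pid>/soft-sync file proposed by this RFC, which does not exist in mainline kernels. set_soft_sync and PROC_ROOT are hypothetical names; PROC_ROOT exists only so the helper can be tried against a scratch directory.

```shell
# Write a value to the RFC's proposed /proc/<pid>/soft-sync knob for each
# PID given.  PIDs whose knob is missing or not writable are skipped with a
# message on stderr (e.g. a kernel without the RFC patch).
set_soft_sync() {
    # $1 = value to write (the RFC example writes 1); remaining args = PIDs
    val=$1
    shift
    for pid in "$@"; do
        knob="${PROC_ROOT:-/proc}/$pid/soft-sync"
        if [ -w "$knob" ]; then
            echo "$val" > "$knob"
        else
            echo "skipping $pid: $knob not writable" >&2
        fi
    done
}
```

Usage, equivalent to the one-liner but tolerant of multiple instances: set_soft_sync 1 $(pidof mysqld)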

But the per-file flag could definitely help the file-mmap case, and as such
could be a great additional feature in combination with this RFC.
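For reference, the existing per-file flag Jan mentions is the ext3 data-journaling attribute, set with chattr +j. Applying any per-file flag to a MySQL database means tagging every table and index file inside the database directory, which is the inconvenience raised above. The sketch below assumes root privileges and an ext3/ext4 filesystem. journal_dir_files and DRY_RUN are hypothetical names. Note that +j requests full data journaling for the file, which is not the writeback-like behavior wanted here. The sketch only shows how a per-file setting would have to be applied, and it does not deal with files created later (e.g. new tables).

```shell
# Apply the existing per-file data-journaling attribute (chattr +j) to every
# regular file under a directory.  With DRY_RUN=1 it only prints the chattr
# commands instead of running them.
journal_dir_files() {
    # $1 = directory to process, e.g. a MySQL database directory
    if [ "${DRY_RUN:-0}" = 1 ]; then
        find "$1" -type f -exec echo chattr +j {} \;
    else
        find "$1" -type f -exec chattr +j {} +
    fi
}
```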


Thanks!

--
Al

