From: Al Boldi <a1426z@gawab.com>
To: Jan Kara <jack@suse.cz>
Cc: Chris Mason <chris.mason@oracle.com>,
Andreas Dilger <adilger@sun.com>, Chris Snook <csnook@redhat.com>,
linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [RFC] ext3: per-process soft-syncing data=ordered mode
Date: Tue, 5 Feb 2008 10:07:44 +0300
Message-ID: <200802051007.44139.a1426z@gawab.com>
In-Reply-To: <20080204175415.GH3426@duck.suse.cz>
Jan Kara wrote:
> On Sat 02-02-08 00:26:00, Al Boldi wrote:
> > Chris Mason wrote:
> > > Al, could you please compare the write throughput from vmstat for the
> > > data=ordered vs data=writeback runs? I would guess the data=ordered
> > > one has a lower overall write throughput.
> >
> > That's what I would have guessed, but it's actually going up about 4-fold
> > for mysql, from 559 MB to 2135 MB, while the db size ends up at 549 MB.
>
> So you say we write 4 times as much data in ordered mode as in writeback
> mode. Hmm, probably possible because we force all the dirty data to disk
> when committing a transaction in ordered mode (and don't do this in
> writeback mode). So if the workload repeatedly dirties the whole DB, we
> are going to write the whole DB several times in ordered mode but in
> writeback mode we just keep the data in memory all the time. But this is
> what you ask for if you mount in ordered mode so I wouldn't consider it a
> bug.
Ok, maybe not a bug, but a bit inefficient.  Check out this workload:

	sync
	while :; do
		dd < /dev/full > /mnt/sda2/x.dmp bs=1M count=20
		rm -f /mnt/sda2/x.dmp
		usleep 10000
	done

vmstat 1 (with mount /dev/sda2 /mnt/sda2 -o data=writeback) << note io-bo >>
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 2  0      0 293008   5232  57436    0    0     0     0   18  206  4 80 16  0
 1  0      0 282840   5232  67620    0    0     0     0   18  238  3 81 16  0
 1  0      0 297032   5244  53364    0    0     0   152   21  211  4 79 17  0
 1  0      0 285236   5244  65224    0    0     0     0   18  232  4 80 16  0
 1  0      0 299464   5244  50880    0    0     0     0   18  222  4 80 16  0
 1  0      0 290156   5244  60176    0    0     0     0   18  236  3 80 17  0
 0  0      0 302124   5256  47788    0    0     0   152   21  213  4 80 16  0
 1  0      0 292180   5256  58248    0    0     0     0   18  239  3 81 16  0
 1  0      0 287452   5256  62444    0    0     0     0   18  202  3 80 17  0
 1  0      0 293016   5256  57392    0    0     0     0   18  250  4 80 16  0
 0  0      0 302052   5256  47788    0    0     0     0   19  194  3 81 16  0
 1  0      0 297536   5268  52928    0    0     0   152   20  233  4 79 17  0
 1  0      0 286468   5268  63872    0    0     0     0   18  212  3 81 16  0
 1  0      0 301572   5268  48812    0    0     0     0   18  267  4 79 17  0
 1  0      0 292636   5268  57776    0    0     0     0   18  208  4 80 16  0
 1  0      0 302124   5280  47788    0    0     0   152   21  237  4 80 16  0
 1  0      0 291436   5280  58976    0    0     0     0   18  205  3 81 16  0
 1  0      0 302068   5280  47788    0    0     0     0   18  234  3 81 16  0
 1  0      0 293008   5280  57388    0    0     0     0   18  221  4 79 17  0
 1  0      0 297288   5292  52532    0    0     0   156   22  233  2 81 16  1
 1  0      0 294676   5292  55724    0    0     0     0   19  199  3 81 16  0

vmstat 1 (with mount /dev/sda2 /mnt/sda2 -o data=ordered)
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 2  0      0 291052   5156  59016    0    0     0     0   19  223  3 82 15  0
 1  0      0 291408   5156  58704    0    0     0     0   18  218  3 81 16  0
 1  0      0 291888   5156  58276    0    0     0    20   23  229  3 80 17  0
 1  0      0 300764   5168  49472    0    0     0 12864   91  235  3 69 13 15
 1  0      0 300740   5168  49456    0    0     0     0   19  215  3 80 17  0
 1  0      0 301088   5168  49044    0    0     0     0   18  241  4 80 16  0
 1  0      0 298220   5168  51872    0    0     0     0   18  225  3 81 16  0
 0  1      0 289168   5168  60752    0    0     0 12712   45  237  3 77 15  5
 1  0      0 300260   5180  49852    0    0     0   152   68  211  4 72 15  9
 1  0      0 298616   5180  51460    0    0     0     0   18  237  3 81 16  0
 1  0      0 296988   5180  53092    0    0     0     0   18  223  3 81 16  0
 1  0      0 296608   5180  53480    0    0     0     0   18  223  3 81 16  0
 0  0      0 301640   5192  48036    0    0     0 12868   93  206  4 67 13 16
 0  0      0 301624   5192  48036    0    0     0     0   21  218  3 81 16  0
 0  0      0 301600   5192  48036    0    0     0     0   18  212  3 81 16  0
 0  0      0 301584   5192  48036    0    0     0     0   18  209  4 80 16  0
 0  0      0 301568   5192  48036    0    0     0     0   18  208  3 81 16  0
 1  0      0 285520   5204  64548    0    0     0 12864   95  216  3 69 13 15
 2  0      0 285124   5204  64924    0    0     0     0   18  222  4 80 16  0
 1  0      0 283612   5204  66392    0    0     0     0   18  231  3 81 16  0
 1  0      0 284216   5204  65736    0    0     0     0   18  218  4 80 16  0
 0  1      0 289160   5204  60752    0    0     0 12712   56  213  3 74 15  8
 1  0      0 285884   5216  64128    0    0     0   152   54  209  4 75 15  6
 1  0      0 287472   5216  62572    0    0     0     0   18  223  3 81 16  0

Do you think these redundant ~12 MB writeouts could be buffered?
(Note: you may need to adjust the dd count and the usleep interval to see
the same effect.)
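
To put a number on the difference, the bo column of the two captures above
can be totalled with a small awk filter.  This is only a sketch: sum_bo is a
name made up for illustration, and it assumes the 16-column vmstat layout
shown above (bo is the 10th field) and vmstat's usual 1 KiB block unit.

```sh
#!/bin/sh
# Sum the "bo" (blocks written out) column of `vmstat 1` output on stdin.
# Assumption: classic 16-column layout, so bo is field 10.
# Header and label lines are skipped because their first field is not a number.
sum_bo() {
    awk '$1 ~ /^[0-9]+$/ && NF >= 10 { total += $10; n++ }
         END { printf "%d blocks over %d samples\n", total, n }'
}
```

Fed the two captures above, it reports 764 blocks over 21 samples for
data=writeback and 64344 blocks over 24 samples for data=ordered, i.e. well
under 1 MB versus roughly 63 MB written out, assuming 1 KiB blocks.  For a
live run, something like `vmstat 1 30 | sum_bo` should do, keeping in mind
that the first vmstat row is an average since boot.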
> I still don't like your hack with per-process journal mode setting
> but we could easily do per-file journal mode setting (we already have a
> flag to do data journaling for a file) and that would help at least your
> DB workload...
Well, that depends on what kind of db you use. mysql creates db's as a dir,
and then manages the tables and indexes as files inside that dir. So I don't
think this flag would be feasible for that use-case. Much easier to just say:
echo 1 > /proc/`pidof mysqld`/soft-sync
But the per-file flag could definitely help the file-mmap case, and as such
could be a great additional feature in combination with this RFC.
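
If pidof returns more than one PID, the one-liner above expands to an
invalid path, so a loop is safer.  A rough sketch follows; enable_soft_sync
is a hypothetical helper name, and the soft-sync file itself only exists
with the patch from this RFC applied, so missing files are simply skipped.

```sh
#!/bin/sh
# Write 1 to <root>/<pid>/soft-sync for every PID given.
# Assumption: soft-sync is the per-process file proposed in this RFC;
# it is not present on kernels without the patch.
# Usage: enable_soft_sync /proc $(pidof mysqld)
enable_soft_sync() {
    root=$1
    shift
    for pid in "$@"; do
        f="$root/$pid/soft-sync"
        if [ -w "$f" ]; then
            echo 1 > "$f"
        else
            echo "skipping $pid: $f not writable" >&2
        fi
    done
}
```

Taking the proc root as an argument is only there so the helper can be tried
against a scratch directory before pointing it at /proc.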
Thanks!
--
Al