From: Al Boldi <a1426z@gawab.com>
To: Jan Kara <jack@suse.cz>
Cc: Chris Mason <chris.mason@oracle.com>,
Andreas Dilger <adilger@sun.com>, Chris Snook <csnook@redhat.com>,
linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [RFC] ext3: per-process soft-syncing data=ordered mode
Date: Tue, 5 Feb 2008 10:07:44 +0300
Message-ID: <200802051007.44139.a1426z@gawab.com>
In-Reply-To: <20080204175415.GH3426@duck.suse.cz>

Jan Kara wrote:
> On Sat 02-02-08 00:26:00, Al Boldi wrote:
> > Chris Mason wrote:
> > > Al, could you please compare the write throughput from vmstat for the
> > > data=ordered vs data=writeback runs? I would guess the data=ordered
> > > one has a lower overall write throughput.
> >
> > That's what I would have guessed, but it's actually going up about 4-fold
> > for mysql, from 559 MB to 2135 MB, while the db size ends up at 549 MB.
>
> So you say we write 4-times as much data in ordered mode as in writeback
> mode. Hmm, probably possible because we force all the dirty data to disk
> when committing a transaction in ordered mode (and don't do this in
> writeback mode). So if the workload repeatedly dirties the whole DB, we
> are going to write the whole DB several times in ordered mode but in
> writeback mode we just keep the data in memory all the time. But this is
> what you ask for if you mount in ordered mode so I wouldn't consider it a
> bug.

Ok, maybe not a bug, but a bit inefficient. Check out this workload:
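
  sync
  while :; do
      # write 20 MB of zeros (reads from /dev/full return zeros) ...
      dd < /dev/full > /mnt/sda2/x.dmp bs=1M count=20
      # ... and delete the file again right away
      rm -f /mnt/sda2/x.dmp
      usleep 10000    # pause 10 ms; usleep is not installed everywhere
  done
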
vmstat 1 (with mount /dev/sda2 /mnt/sda2 -o data=writeback)  << note io-bo >>
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 2  0      0 293008   5232  57436    0    0     0     0   18  206  4 80 16  0
 1  0      0 282840   5232  67620    0    0     0     0   18  238  3 81 16  0
 1  0      0 297032   5244  53364    0    0     0   152   21  211  4 79 17  0
 1  0      0 285236   5244  65224    0    0     0     0   18  232  4 80 16  0
 1  0      0 299464   5244  50880    0    0     0     0   18  222  4 80 16  0
 1  0      0 290156   5244  60176    0    0     0     0   18  236  3 80 17  0
 0  0      0 302124   5256  47788    0    0     0   152   21  213  4 80 16  0
 1  0      0 292180   5256  58248    0    0     0     0   18  239  3 81 16  0
 1  0      0 287452   5256  62444    0    0     0     0   18  202  3 80 17  0
 1  0      0 293016   5256  57392    0    0     0     0   18  250  4 80 16  0
 0  0      0 302052   5256  47788    0    0     0     0   19  194  3 81 16  0
 1  0      0 297536   5268  52928    0    0     0   152   20  233  4 79 17  0
 1  0      0 286468   5268  63872    0    0     0     0   18  212  3 81 16  0
 1  0      0 301572   5268  48812    0    0     0     0   18  267  4 79 17  0
 1  0      0 292636   5268  57776    0    0     0     0   18  208  4 80 16  0
 1  0      0 302124   5280  47788    0    0     0   152   21  237  4 80 16  0
 1  0      0 291436   5280  58976    0    0     0     0   18  205  3 81 16  0
 1  0      0 302068   5280  47788    0    0     0     0   18  234  3 81 16  0
 1  0      0 293008   5280  57388    0    0     0     0   18  221  4 79 17  0
 1  0      0 297288   5292  52532    0    0     0   156   22  233  2 81 16  1
 1  0      0 294676   5292  55724    0    0     0     0   19  199  3 81 16  0

vmstat 1 (with mount /dev/sda2 /mnt/sda2 -o data=ordered)
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 2  0      0 291052   5156  59016    0    0     0     0   19  223  3 82 15  0
 1  0      0 291408   5156  58704    0    0     0     0   18  218  3 81 16  0
 1  0      0 291888   5156  58276    0    0     0    20   23  229  3 80 17  0
 1  0      0 300764   5168  49472    0    0     0 12864   91  235  3 69 13 15
 1  0      0 300740   5168  49456    0    0     0     0   19  215  3 80 17  0
 1  0      0 301088   5168  49044    0    0     0     0   18  241  4 80 16  0
 1  0      0 298220   5168  51872    0    0     0     0   18  225  3 81 16  0
 0  1      0 289168   5168  60752    0    0     0 12712   45  237  3 77 15  5
 1  0      0 300260   5180  49852    0    0     0   152   68  211  4 72 15  9
 1  0      0 298616   5180  51460    0    0     0     0   18  237  3 81 16  0
 1  0      0 296988   5180  53092    0    0     0     0   18  223  3 81 16  0
 1  0      0 296608   5180  53480    0    0     0     0   18  223  3 81 16  0
 0  0      0 301640   5192  48036    0    0     0 12868   93  206  4 67 13 16
 0  0      0 301624   5192  48036    0    0     0     0   21  218  3 81 16  0
 0  0      0 301600   5192  48036    0    0     0     0   18  212  3 81 16  0
 0  0      0 301584   5192  48036    0    0     0     0   18  209  4 80 16  0
 0  0      0 301568   5192  48036    0    0     0     0   18  208  3 81 16  0
 1  0      0 285520   5204  64548    0    0     0 12864   95  216  3 69 13 15
 2  0      0 285124   5204  64924    0    0     0     0   18  222  4 80 16  0
 1  0      0 283612   5204  66392    0    0     0     0   18  231  3 81 16  0
 1  0      0 284216   5204  65736    0    0     0     0   18  218  4 80 16  0
 0  1      0 289160   5204  60752    0    0     0 12712   56  213  3 74 15  8
 1  0      0 285884   5216  64128    0    0     0   152   54  209  4 75 15  6
 1  0      0 287472   5216  62572    0    0     0     0   18  223  3 81 16  0

Do you think these redundant 12 MB writeouts could be buffered?

(Note: you may need to adjust the dd count and the usleep interval to see
the same effect.)
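
To compare the two runs without eyeballing the bo column, the per-second
values can simply be summed. This is only a minimal sketch: it assumes awk is
available and that vmstat reports bo in 1 KiB blocks (check your vmstat man
page), and sum_bo is just an illustrative name, not an existing tool.

```shell
# Sum the "bo" column (10th field) of `vmstat 1` output read from stdin.
# The two header lines are skipped because their first field is not a number.
# Assumption: one vmstat block is 1 KiB, so blocks / 1024 gives MiB.
sum_bo() {
    awk '$1 ~ /^[0-9]+$/ { total += $10; rows++ }
         END { printf "%d blocks (~%.1f MiB) in %d samples\n",
                      total, total / 1024, rows }'
}
```

Usage would be something like `vmstat 1 25 | sum_bo`, once per mount option.
Keep in mind that the first line vmstat prints is an average since boot, so
you may want to discard it before summing.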
> I still don't like your hack with per-process journal mode setting
> but we could easily do per-file journal mode setting (we already have a
> flag to do data journaling for a file) and that would help at least your
> DB workload...

Well, that depends on what kind of db you use. mysql creates each db as a
dir, and then manages the tables and indexes as files inside that dir. So I
don't think this flag would be feasible for that use-case. Much easier to
just say:

  echo 1 > /proc/`pidof mysqld`/soft-sync

But the per-file flag could definitely help the file-mmap case, and as such
could be a great additional feature in combination with this RFC.
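
If pidof returns more than one PID, the one-liner no longer expands to a
single valid path. A rough sketch of a loop that handles that case follows.
The soft-sync file is the knob proposed in this RFC and does not exist in
mainline kernels; soft_sync_pids and PROC_ROOT are made-up names, the latter
only there so the loop can be tried against a scratch directory.

```shell
# Hypothetical helper: write 1 to the RFC's proposed soft-sync file for
# every PID given as an argument.
# PROC_ROOT is an illustration-only override of /proc for dry runs.
soft_sync_pids() {
    root=${PROC_ROOT:-/proc}
    for pid in "$@"; do
        echo 1 > "$root/$pid/soft-sync" || return 1
    done
}
```

With the RFC patch applied this would be called as
`soft_sync_pids $(pidof mysqld)`.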
Thanks!
--
Al