From: Dmitry Monakhov <dmonakhov@openvz.org>
To: ext4 development <linux-ext4@vger.kernel.org>
Cc: linux-fsdevel@vger.kernel.org, axboe@kernel.dk, Jan Kara <jack@suse.cz>
Subject: EXT4 nodelalloc => back to stone age.
Date: Mon, 01 Apr 2013 15:06:18 +0400
Message-ID: <87d2uese6t.fsf@openvz.org>
I've mounted ext4 with -o nodelalloc on my SSD (INTEL SSDSA2CW120G3, 4PC10362).
The write throughput is lower than that of an HDD produced 15 years ago:
# mount $SCRATCH_DEV $SCRATCH_MNT -o nodelalloc
# dd if=/dev/zero of=/mnt_scratch/file bs=1M count=1024 conv=fsync,notrunc
1073741824 bytes (1.1 GB) copied, 46.7948 s, 22.9 MB/s
# dd if=/dev/zero of=/mnt_scratch/file bs=1M count=1024 conv=fsync,notrunc
1073741824 bytes (1.1 GB) copied, 41.2717 s, 26.0 MB/s
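As a sanity check on the figures above, here is a minimal sketch that recomputes the rate from the byte count and elapsed time dd printed. It assumes dd's convention of 1 MB = 1,000,000 bytes; the function name is made up for this illustration.

```c
/*
 * Throughput in MB/s the way dd reports it, assuming
 * 1 MB = 1,000,000 bytes (not 2^20).
 * throughput_mb_per_s is a hypothetical helper name.
 */
double throughput_mb_per_s(unsigned long long bytes, double seconds)
{
    return (double)bytes / seconds / 1e6;
}
```

Calling it with 1073741824 bytes and 46.7948 s, then with 41.2717 s, gives values that round to the 22.9 MB/s and 26.0 MB/s shown in the two runs.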
blktrace shows horrible traces:
[-- Attachment #2: trace.log --]
[-- Type: text/plain, Size: 1644 bytes --]
253,1 0 11 0.004965203 13618 Q WS 1219360 + 8 [jbd2/dm-1-8]
253,1 0 11 0.004965203 13618 Q WS 1219360 + 8 [jbd2/dm-1-8]
253,1 0 11 0.004965203 13618 Q WS 1219360 + 8 [jbd2/dm-1-8]
253,1 0 11 0.004965203 13618 Q WS 1219360 + 8 [jbd2/dm-1-8]
253,1 1 39 0.004983642 0 C WS 1219344 + 8 [0]
253,1 1 39 0.004983642 0 C WS 1219344 + 8 [0]
253,1 1 39 0.004983642 0 C WS 1219344 + 8 [0]
253,1 1 39 0.004983642 0 C WS 1219344 + 8 [0]
253,1 1 40 0.005082898 0 C WS 1219352 + 8 [0]
253,1 1 40 0.005082898 0 C WS 1219352 + 8 [0]
253,1 1 40 0.005082898 0 C WS 1219352 + 8 [0]
253,1 1 40 0.005082898 0 C WS 1219352 + 8 [0]
253,1 3 12 0.005106049 2580 Q W 1219368 + 8 [flush-253:1]
253,1 3 12 0.005106049 2580 Q W 1219368 + 8 [flush-253:1]
253,1 3 12 0.005106049 2580 Q W 1219368 + 8 [flush-253:1]
253,1 3 12 0.005106049 2580 Q W 1219368 + 8 [flush-253:1]
253,1 2 17 0.005197143 13750 Q WS 1219376 + 8 [dd]
253,1 2 17 0.005197143 13750 Q WS 1219376 + 8 [dd]
253,1 2 17 0.005197143 13750 Q WS 1219376 + 8 [dd]
253,1 2 17 0.005197143 13750 Q WS 1219376 + 8 [dd]
253,1 1 41 0.005199871 0 C WS 1219360 + 8 [0]
253,1 1 41 0.005199871 0 C WS 1219360 + 8 [0]
253,1 1 41 0.005199871 0 C WS 1219360 + 8 [0]
253,1 1 41 0.005199871 0 C WS 1219360 + 8 [0]
As one can see, data is written from two threads, dd and jbd2, on a per-page
basis (every request is 8 sectors, i.e. one 4 KiB page), and jbd2 submits pages
with WRITE_SYNC, i.e. we write page-by-page synchronously :)
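To make the per-page observation easy to check against a longer trace, here is a minimal sketch that parses one line of the text format shown in trace.log. It assumes the default blkparse column order (device, CPU, sequence, timestamp, PID, action, RWBS, sector + count, [process]) and 512-byte sectors; the struct and function names are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* One parsed event; field names are illustrative, not from blktrace itself. */
struct blk_event {
    char action;                 /* 'Q' = queued, 'C' = completed */
    char rwbs[8];                /* "WS" = synchronous write, "W" = write */
    unsigned long long sector;   /* start sector of the request */
    unsigned int nr_sectors;     /* request length in 512-byte sectors */
    char comm[32];               /* text between the square brackets */
};

/* Returns 1 if the line matches the assumed layout, 0 otherwise. */
int parse_blk_event(const char *line, struct blk_event *ev)
{
    int n = sscanf(line,
                   "%*d,%*d %*d %*d %*f %*d %c %7s %llu + %u [%31[^]]]",
                   &ev->action, ev->rwbs, &ev->sector,
                   &ev->nr_sectors, ev->comm);
    return n == 5;
}

/* Request size in bytes, assuming 512-byte sectors. */
unsigned long long blk_event_bytes(const struct blk_event *ev)
{
    return (unsigned long long)ev->nr_sectors * 512ULL;
}
```

Feeding it the first line of the trace yields action 'Q', RWBS "WS", process "jbd2/dm-1-8" and a request size of 4096 bytes, which is a single page.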
Exact calltrace:
journal_submit_inode_data_buffers
  wbc.sync_mode = WB_SYNC_ALL
  ->generic_writepages
    ->write_cache_pages
      ->ext4_writepage
        ->ext4_bio_write_page
          ->io_submit_add_bh
            ->io_submit_init
              io->io_op = (wbc->sync_mode == WB_SYNC_ALL ? WRITE_SYNC : WRITE);
        ->ext4_io_submit(io);
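The selection made in io_submit_init can be modelled in user space. The sketch below is not kernel code: the enum values and the MODEL_ names are stand-ins invented for this illustration, and the request count assumes every 4 KiB page is submitted as its own request, as the trace suggests.

```c
/* Stand-in for the kernel's writeback sync modes (values are arbitrary). */
enum writeback_sync_modes { WB_SYNC_NONE, WB_SYNC_ALL };

/* Hypothetical stand-ins for the kernel's WRITE and WRITE_SYNC. */
enum model_io_op { MODEL_WRITE, MODEL_WRITE_SYNC };

/* Mirrors the ternary from the call trace: WB_SYNC_ALL selects sync writes. */
enum model_io_op select_io_op(enum writeback_sync_modes sync_mode)
{
    return sync_mode == WB_SYNC_ALL ? MODEL_WRITE_SYNC : MODEL_WRITE;
}

/*
 * Number of requests needed if each page becomes a separate request
 * (an assumption based on the trace, rounded up for a partial last page).
 */
unsigned long long page_sized_requests(unsigned long long bytes,
                                       unsigned int page_size)
{
    return (bytes + page_size - 1) / page_size;
}
```

Under these assumptions the 1 GiB dd run turns into 262144 separate page-sized requests, each tagged as synchronous when the caller sets WB_SYNC_ALL.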
1) Do we really have to use WRITE_SYNC in the WB_SYNC_ALL case?
Why is blk_finish_plug(&plug), which is called from generic_writepages(),
not enough? As far as I can see this code was copy-pasted from XFS.
DIO also tags bios with WRITE_SYNC, but what happens if the file
is highly fragmented (or the block device is RAID0)? We will end up doing
synchronous I/O.
2) Why don't we have writepages for the non-delalloc case?
I want to fix (2) by implementing writepages() for the non-delalloc case.
Once this is done we may add a new flag WB_SYNC_NOALLOC so that
journal_submit_inode_data_buffers will use
__filemap_fdatawrite_range(, , , WB_SYNC_ALL | WB_SYNC_NOALLOC),
which will call the optimized ->ext4_writepages()