From: Dmitry Monakhov <dmonakhov@openvz.org>
To: ext4 development <linux-ext4@vger.kernel.org>
Cc: linux-fsdevel@vger.kernel.org, axboe@kernel.dk, Jan Kara <jack@suse.cz>
Subject: EXT4 nodelalloc => back to stone age.
Date: Mon, 01 Apr 2013 15:06:18 +0400
Message-ID: <87d2uese6t.fsf@openvz.org>
[-- Attachment #1: Type: text/plain, Size: 496 bytes --]
I've mounted ext4 with -onodelalloc on my SSD (INTEL SSDSA2CW120G3, 4PC10362).
It shows numbers that are slower than an HDD produced 15 years ago:
# mount $SCRATCH_DEV $SCRATCH_MNT -onodelalloc
# dd if=/dev/zero of=/mnt_scratch/file bs=1M count=1024 conv=fsync,notrunc
1073741824 bytes (1.1 GB) copied, 46.7948 s, 22.9 MB/s
# dd if=/dev/zero of=/mnt_scratch/file bs=1M count=1024 conv=fsync,notrunc
1073741824 bytes (1.1 GB) copied, 41.2717 s, 26.0 MB/s
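A quick sanity check on these figures, as a minimal sketch. It assumes dd's convention of 1 MB = 10^6 bytes and a 4 KiB page size (an assumption, typical for x86); the function names are made up for illustration.

```c
#define PAGE_BYTES 4096.0 /* assumption: 4 KiB pages */

/* Throughput in MB/s, using dd's convention of 1 MB = 10^6 bytes. */
static double throughput_mb_s(double bytes, double seconds)
{
    return bytes / seconds / 1e6;
}

/* Implied number of page-sized writes completed per second. */
static double pages_per_second(double bytes, double seconds)
{
    return bytes / PAGE_BYTES / seconds;
}
```

For the first run, throughput_mb_s(1073741824, 46.7948) comes out at about 22.9, matching what dd printed, and pages_per_second() gives roughly 5600 page-sized writes per second.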
blktrace shows horrible traces:
[-- Attachment #2: trace.log --]
[-- Type: text/plain, Size: 1644 bytes --]
253,1 0 11 0.004965203 13618 Q WS 1219360 + 8 [jbd2/dm-1-8]
253,1 1 39 0.004983642 0 C WS 1219344 + 8 [0]
253,1 1 40 0.005082898 0 C WS 1219352 + 8 [0]
253,1 3 12 0.005106049 2580 Q W 1219368 + 8 [flush-253:1]
253,1 2 17 0.005197143 13750 Q WS 1219376 + 8 [dd]
253,1 1 41 0.005199871 0 C WS 1219360 + 8 [0]
[-- Attachment #3: Type: text/plain, Size: 1220 bytes --]
As one can see, data is written from two threads, dd and jbd2, on a per-page
basis, and jbd2 submits pages with WRITE_SYNC, i.e. we write page-by-page
synchronously :)
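To make the per-page pattern concrete: every request in the trace is "+ 8", and blktrace counts in 512-byte sectors, so each request is 4096 bytes, which is one page on a 4 KiB-page machine. Below is a minimal sketch of pulling those fields out of a trace line. The struct and function names are hypothetical, and the format string assumes blkparse's default single-line layout as it appears in trace.log.

```c
#include <stdio.h>
#include <string.h>

struct blk_event {
    char action;               /* Q = queued, C = completed */
    char rwbs[8];              /* e.g. "WS" = write + sync, "W" = write */
    unsigned long long sector; /* start sector */
    unsigned int nr_sectors;   /* request length in 512-byte sectors */
    char comm[32];             /* text inside the trailing brackets */
};

/* Returns 1 if the line matched the expected layout, 0 otherwise. */
static int parse_blk_line(const char *line, struct blk_event *ev)
{
    /* Skipped fields: major,minor  cpu  sequence  timestamp  pid */
    int n = sscanf(line, "%*d,%*d %*d %*d %*f %*d %c %7s %llu + %u [%31[^]]",
                   &ev->action, ev->rwbs, &ev->sector,
                   &ev->nr_sectors, ev->comm);
    return n == 5;
}

/* Request size in bytes; blktrace sectors are 512 bytes. */
static unsigned int request_bytes(const struct blk_event *ev)
{
    return ev->nr_sectors * 512u;
}

/* True if the RWBS field carries the sync flag. */
static int is_sync(const struct blk_event *ev)
{
    return strchr(ev->rwbs, 'S') != NULL;
}
```

Fed the jbd2 line from the trace, this reports a 4096-byte synchronous write; the flush-253:1 line comes out as a 4096-byte non-sync write.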
Exact calltrace:
journal_submit_inode_data_buffers
  wbc.sync_mode = WB_SYNC_ALL
  ->generic_writepages
    ->write_cache_pages
      ->ext4_writepage
        ->ext4_bio_write_page
          ->io_submit_add_bh
            ->io_submit_init
              io->io_op = (wbc->sync_mode == WB_SYNC_ALL ? WRITE_SYNC : WRITE);
        ->ext4_io_submit(io);
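The effect of that ternary, combined with a ->writepage-only path, can be modelled in user space. This is only a sketch: the enums are stand-ins rather than the kernel's definitions, and count_sync_submissions() is a hypothetical helper that assumes one submission per dirty page, as the trace suggests.

```c
#include <stddef.h>

/* Stand-ins for the writeback sync modes and block-layer write ops. */
enum sync_mode { SYNC_NONE, SYNC_ALL };
enum io_op { OP_WRITE, OP_WRITE_SYNC };

/* Mirrors the ternary quoted from io_submit_init(). */
static enum io_op pick_io_op(enum sync_mode mode)
{
    return mode == SYNC_ALL ? OP_WRITE_SYNC : OP_WRITE;
}

/*
 * Model of a ->writepage-only path: every page gets its own submission,
 * so under SYNC_ALL the number of synchronous submissions equals the
 * number of dirty pages.
 */
static size_t count_sync_submissions(size_t dirty_pages, enum sync_mode mode)
{
    size_t n = 0;

    for (size_t i = 0; i < dirty_pages; i++)
        if (pick_io_op(mode) == OP_WRITE_SYNC)
            n++;
    return n;
}
```

In this model, flushing 256 dirty pages (1 MiB with 4 KiB pages) under SYNC_ALL produces 256 separate synchronous submissions.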
1) Do we really have to use WRITE_SYNC in the WB_SYNC_ALL case?
Why is blk_finish_plug(&plug), which is called from generic_writepages(),
not enough? As far as I can see this code was copy-pasted from XFS, and
DIO also tags bios with WRITE_SYNC. But what happens if the file is highly
fragmented (or the block device is RAID0)? We will end up doing
synchronous I/O.
2) Why don't we have writepages() for the non-delalloc case?
I want to fix (2) by implementing writepages() for the non-delalloc case.
Once this is done we may add a new flag, WB_SYNC_NOALLOC, so that
journal_submit_inode_data_buffers will use
__filemap_fdatawrite_range(, , , WB_SYNC_ALL | WB_SYNC_NOALLOC)
which will call the optimized ->ext4_writepages().
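A toy model of what a writepages() path could buy, and of why fragmentation matters for question (1). It is not ext4's implementation: both functions are hypothetical, and the model assumes one on-disk block per page and a simple cap on pages per submission.

```c
#include <stddef.h>

/* One submission per page, as the ->writepage path does in this model. */
static size_t submissions_per_page(size_t npages)
{
    return npages;
}

/*
 * One submission per run of physically contiguous blocks, split whenever
 * a run reaches max_pages (0 disables the cap). blocks[i] is the on-disk
 * block backing page i.
 */
static size_t submissions_batched(const unsigned long long *blocks,
                                  size_t npages, size_t max_pages)
{
    size_t subs = 0, run = 0;

    for (size_t i = 0; i < npages; i++) {
        int contiguous = i > 0 && blocks[i] == blocks[i - 1] + 1;

        if (!contiguous || run == max_pages) {
            subs++;  /* start a new submission */
            run = 0;
        }
        run++;
    }
    return subs;
}
```

For eight contiguous blocks the batched path issues a single submission where the per-page path issues eight. For a fully fragmented file the two counts are equal, and that is the case where tagging every submission as synchronous hurts most.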