linux-ext4.vger.kernel.org archive mirror
From: Jiaying Zhang <jiayingz@google.com>
To: ext4 development <linux-ext4@vger.kernel.org>
Cc: Andrew Morton <akpm@google.com>,
	Michael Rubin <mrubin@google.com>,
	Manuel Benitez <rickyb@google.com>
Subject: ext4 DIO read performance issue on SSD
Date: Fri, 9 Oct 2009 16:34:08 -0700
Message-ID: <5df78e1d0910091634q22e6a372g3738b0d9e9d0e6c9@mail.gmail.com>

Hello,

We have recently been evaluating ext4 performance on a high-speed SSD.
One problem we found is that ext4 does not scale well when multiple
threads or multiple AIOs read a single file with O_DIRECT. E.g., with a
4k block size, multi-threaded DIO AIO random reads on ext4 can lose up
to 50% of the throughput we get via raw IO.
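
For anyone who wants to try a similar workload, here is a rough sketch of
the kind of test described above. It is not the benchmark we actually ran:
it uses synchronous pread() from several threads rather than AIO, and the
names dio_random_read(), next_offset() and struct reader are made up for
this illustration. It relies on fstat() for the file size, so it works on
a regular file but would need a BLKGETSIZE64 ioctl to be pointed at a raw
block device.

```c
#define _GNU_SOURCE             /* exposes O_DIRECT on Linux/glibc */
#include <assert.h>
#include <fcntl.h>
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

#define BLOCK_SIZE  4096        /* 4k reads, matching the test above */
#define MAX_THREADS 64          /* arbitrary cap for this sketch */

struct reader {                 /* hypothetical per-thread state */
    int       fd;
    off_t     nblocks;          /* file size in whole blocks */
    long      nreads;
    uint64_t  seed;
    long long bytes;            /* out: bytes read, or -1 on error */
};

/* Pick a pseudo-random, block-aligned offset inside the file (plain LCG). */
static off_t next_offset(uint64_t *state, off_t nblocks)
{
    assert(nblocks > 0);
    *state = *state * 6364136223846793005ULL + 1442695040888963407ULL;
    return (off_t)((*state >> 33) % (uint64_t)nblocks) * BLOCK_SIZE;
}

static void *reader_main(void *arg)
{
    struct reader *r = arg;
    void *buf;
    long long total = 0;

    r->bytes = -1;
    /* O_DIRECT requires a suitably aligned user buffer. */
    if (posix_memalign(&buf, BLOCK_SIZE, BLOCK_SIZE) != 0)
        return NULL;
    for (long i = 0; i < r->nreads; i++) {
        off_t off = next_offset(&r->seed, r->nblocks);
        ssize_t n = pread(r->fd, buf, BLOCK_SIZE, off);
        if (n < 0) {
            total = -1;
            break;
        }
        total += n;
    }
    free(buf);
    r->bytes = total;
    return NULL;
}

/* All threads share one fd (one file); returns total bytes read or -1. */
long long dio_random_read(const char *path, int nthreads, long nreads)
{
    struct reader readers[MAX_THREADS];
    pthread_t tids[MAX_THREADS];
    struct stat st;
    long long total = 0;
    int started = 0;

    if (nthreads < 1 || nthreads > MAX_THREADS)
        return -1;
    int fd = open(path, O_RDONLY | O_DIRECT);
    if (fd < 0)
        return -1;
    if (fstat(fd, &st) != 0 || st.st_size < BLOCK_SIZE) {
        close(fd);
        return -1;
    }
    for (int i = 0; i < nthreads; i++) {
        readers[i] = (struct reader){ fd, st.st_size / BLOCK_SIZE,
                                      nreads, (uint64_t)i + 1, 0 };
        if (pthread_create(&tids[i], NULL, reader_main, &readers[i]) != 0) {
            total = -1;
            break;
        }
        started++;
    }
    for (int i = 0; i < started; i++) {
        pthread_join(tids[i], NULL);
        if (total >= 0 && readers[i].bytes >= 0)
            total += readers[i].bytes;
        else
            total = -1;
    }
    close(fd);
    return total;
}
```

Timing dio_random_read() for the same file with 1, 2, 4, ... threads and
dividing bytes by elapsed time gives a rough idea of how throughput scales.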

After some initial analysis, we think the ext4 performance problem is caused
by holding the i_mutex lock during DIO reads. I.e., during a DIO read, we grab
the i_mutex lock in __blockdev_direct_IO because ext4 uses the default
DIO_LOCKING from the generic fs code. I did a quick test by calling
blockdev_direct_IO_no_locking() in ext4_direct_IO(), and with that change
ext4 DIO read reached 99% of raw IO performance.

As we understand it, the reason for taking the i_mutex lock during a DIO
read is to prevent the read from accessing stale data that may be exposed by
a simultaneous write. We saw that Mingming Cao has implemented a patch set
with which, when a get_block request comes from a direct write, ext4 only
allocates or splits an uninitialized extent. That uninitialized extent
is marked as initialized in the end_io callback. We are wondering
whether we can extend this idea to buffered writes as well. I.e., we always
allocate an uninitialized extent first during any write and convert it
to initialized in the end_io callback. This would eliminate the need
to hold the i_mutex lock during a direct read, because a DIO read should
never get a block marked initialized before the block has been written with
new data.
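
To make the ordering argument concrete, here is a small userspace toy model
of the idea. It is not ext4 code: struct toy_block, the EXT_* states and the
toy_*() helpers are invented for illustration, and it assumes that a read of
an uninitialized extent returns zeros. The point is only that a reader which
takes no lock sees either zeros or the new data, never the stale contents.

```c
#include <assert.h>
#include <string.h>

#define BLOCK_SIZE 4096

/* Hypothetical per-block extent state for this toy model. */
enum extent_state { EXT_HOLE, EXT_UNINIT, EXT_INIT };

struct toy_block {
    enum extent_state state;
    unsigned char disk[BLOCK_SIZE];   /* what is physically on the device */
};

/*
 * Step 1 (get_block during a write): allocate the block, but only as an
 * uninitialized extent. The device may still hold stale data from a
 * previous owner, modelled here by copying `stale` onto the "disk".
 */
void toy_allocate(struct toy_block *b, const unsigned char *stale)
{
    assert(b->state == EXT_HOLE);
    memcpy(b->disk, stale, BLOCK_SIZE);
    b->state = EXT_UNINIT;
}

/* Step 2 (IO in flight): new data reaches the device; state is unchanged. */
void toy_write_data(struct toy_block *b, const unsigned char *data)
{
    assert(b->state != EXT_HOLE);
    memcpy(b->disk, data, BLOCK_SIZE);
}

/* Step 3 (end_io callback): only now is the extent marked initialized. */
void toy_end_io(struct toy_block *b)
{
    if (b->state == EXT_UNINIT)
        b->state = EXT_INIT;
}

/*
 * Direct read with no i_mutex: holes and uninitialized extents read back
 * as zeros, so the stale on-disk contents are never returned.
 */
void toy_dio_read(const struct toy_block *b, unsigned char *out)
{
    if (b->state == EXT_INIT)
        memcpy(out, b->disk, BLOCK_SIZE);
    else
        memset(out, 0, BLOCK_SIZE);
}
```

A reader that runs between toy_allocate() and toy_end_io() gets zeros; one
that runs after toy_end_io() gets the written data. The real conversion of
course has to be made atomic with respect to the extent lookup, which this
sketch does not attempt to model.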

We haven't implemented anything yet because we want to ask here first to
see whether this proposal makes sense to you.

Regards,

Jiaying
