linux-ext4.vger.kernel.org archive mirror
From: Theodore Tso <tytso@mit.edu>
To: Curt Wohlgemuth <curtw@google.com>
Cc: ext4 development <linux-ext4@vger.kernel.org>
Subject: Re: [PATCH RFC] Insure direct IO writes do not use the page cache
Date: Wed, 29 Jul 2009 14:10:07 -0400
Message-ID: <20090729181007.GC14105@mit.edu>
In-Reply-To: <6601abe90907281728h22be79fenc68a16b578e28a91@mail.gmail.com>

On Tue, Jul 28, 2009 at 05:28:05PM -0700, Curt Wohlgemuth wrote:
> This insures that direct IO writes to fallocate'd file space do not use the
> page cache.

This isn't a full review; I haven't yet found the time to write a
complete summary of all of the issues around direct I/O.  But since
people are working on solutions, I thought I'd raise a few issues
that I noticed during a quick read-through of the patch.  Sorry,
ENOTIME.

One thing that we really need to do in handle_uninit_extent() is put
in a comment noting that it is only going to be used at the end of
direct I/O, and then add a call to ext4_handle_sync() to mark the
handle as synchronous.  That way, the DIO write won't return until the
journal transaction that commits the metadata operation is complete.
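To make the ordering concrete, here is a toy userspace model (not
kernel code).  Everything prefixed with model_ is hypothetical; the
only real names it borrows are ext4_handle_sync() and the idea of a
handle's sync flag, which, as I understand jbd2, makes stopping the
handle force a commit and wait for it.  Without a journal there is no
valid handle, so nothing waits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model of a journal handle; NOT the kernel's handle_t. */
struct model_handle {
    bool valid;   /* false models running without a journal */
    bool h_sync;  /* stopping this handle must commit and wait */
};

/* Hypothetical model journal: counts commits the caller had to wait for. */
struct model_journal {
    unsigned waited_commits;
};

/* Same idea as ext4_handle_sync(): only a real journalled handle is marked sync. */
void model_handle_sync(struct model_handle *h)
{
    if (h->valid)
        h->h_sync = true;
}

/* Stopping a synchronous handle forces the transaction to commit and waits for it. */
void model_journal_stop(struct model_journal *j, struct model_handle *h)
{
    if (h->valid && h->h_sync)
        j->waited_commits++;
}

/* Hypothetical stand-in for handle_uninit_extent(); used only at the end of a DIO write. */
void model_handle_uninit_extent(struct model_journal *j, struct model_handle *h,
                                bool *extent_initialized)
{
    assert(j != NULL && h != NULL && extent_initialized != NULL);
    *extent_initialized = true; /* uninit -> init conversion: a metadata change */
    model_handle_sync(h);       /* the DIO write must not return before it commits */
    model_journal_stop(j, h);
}
```

With a journal, one conversion costs one waited-for commit; with a
handle that isn't valid (the no-journal case), the conversion happens
but nothing waits.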

This is going to give DIO writes horrendous performance when a journal
is in use, but at least it will be correct.
(Specifically, DIO writes have the semantics that if the alignment
constraints are respected, the write is supposed to be synchronous:
if the write completes and the system crashes immediately
afterwards, the data has to be accessible after the system
reboots.  If the extent tree change isn't committed, then the written
data won't be visible to userspace, so it might as well be lost.)  I
don't think we need to do anything special in the no-journal case,
since by definition, without a journal, metadata updates aren't
guaranteed to be up to date after a crash.
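The crash argument above can be captured in a few lines.  This is a
hypothetical model of a single preallocated block, not ext4 code: it
assumes the DIO data itself is on disk when the write returns, that
reads of a still-uninitialized extent return zeros, and that a crash
discards whatever transaction has not yet committed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one preallocated (uninitialized) block; not ext4 code. */
struct model_block {
    bool data_on_disk;       /* the aligned DIO data reached the disk */
    bool init_in_running_tx; /* extent conversion sits in the running transaction */
    bool init_committed;     /* extent conversion is in a committed transaction */
};

/* A DIO write into preallocated space; sync_handle models ext4_handle_sync(). */
void model_dio_write(struct model_block *b, bool sync_handle)
{
    assert(b);
    b->data_on_disk = true;
    b->init_in_running_tx = true;
    if (sync_handle) {
        /* the write does not return until the transaction has committed */
        b->init_committed = true;
        b->init_in_running_tx = false;
    }
}

/* Crash immediately after the write returns: the uncommitted transaction is lost. */
void model_crash(struct model_block *b)
{
    assert(b);
    b->init_in_running_tx = false;
}

/* Reads of an uninitialized extent return zeros, so the data is visible
   after reboot only if the conversion was committed. */
bool model_data_visible(const struct model_block *b)
{
    return b->data_on_disk && b->init_committed;
}
```

With sync_handle set, the data survives the crash; without it, the
data blocks are on disk but unreachable, which is the "might as well
be lost" case.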

I think I've only mentioned this on the weekly ext4 concall, so let me
spell out the optimizations I have in mind that should hopefully make
DIO less painful when writing into preallocated space.

1) If the extent tree block in question isn't already part of a journalled
transaction, and there is room in the extent block so we don't have
to split the extent tree node (which would require allocating a new
block), we can update the extent block in place, bypassing the
journal.  This lets us avoid waiting for the journal commit.
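The test in (1) boils down to a small decision.  A hypothetical sketch
(the model_ names and fields are made up for illustration, not ext4's
structures), assuming the caller knows how many extra entries the
conversion adds: converting the middle of an uninitialized extent
splits it into up to three extents, i.e. up to two extra entries.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical decision model for optimization 1; names and fields are not ext4's. */
enum model_update_path {
    MODEL_UPDATE_IN_PLACE,  /* write the extent block directly, no commit wait */
    MODEL_UPDATE_JOURNALLED /* go through the journal (and, for DIO, wait for the commit) */
};

struct model_extent_block {
    bool in_journal_tx;   /* block is already part of a journalled transaction */
    unsigned entries;     /* extent entries currently used in this block */
    unsigned max_entries; /* capacity of the block */
};

/* extra_entries: how many entries the conversion adds (up to 2 for a middle split). */
enum model_update_path model_choose_path(const struct model_extent_block *eb,
                                         unsigned extra_entries)
{
    assert(eb && eb->entries <= eb->max_entries);
    if (eb->in_journal_tx)
        return MODEL_UPDATE_JOURNALLED; /* the journal already owns this block */
    if (eb->entries + extra_entries > eb->max_entries)
        return MODEL_UPDATE_JOURNALLED; /* node split: a new block must be allocated */
    return MODEL_UPDATE_IN_PLACE;
}
```

Only the in-place path skips the commit wait; anything that touches a
block the journal already has, or that needs a new block, still goes
through the journal.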

2) We can modify ext4_ext_convert_to_initialized() to be more
aggressive about initializing data blocks when we know we are doing DIO,
since zeroing an aligned run of 16 to 32 blocks and then waiting for the
journal commit once is cheaper than converting the extent one block at
a time and waiting for a journal commit after each block write.
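A toy cost model for (2), assuming sequential single-block DIO writes,
one synchronous commit per conversion, and that the first write into
an aligned chunk zeroes the rest of the chunk and converts the whole
chunk at once.  The model_ names are hypothetical; real costs depend on
the device and the journal.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MODEL_MAX_BLOCKS 1024

/* Hypothetical model of a preallocated region; not ext4 code. */
struct model_prealloc {
    bool initialized[MODEL_MAX_BLOCKS];
    unsigned commits; /* journal commits the writer waited for */
    unsigned zeroed;  /* blocks zeroed that the writer did not write */
};

void model_init(struct model_prealloc *p)
{
    memset(p, 0, sizeof(*p));
}

/* Single-block DIO write at lblk.  chunk == 1 models one-block-at-a-time
   conversion; chunk > 1 models zeroing the aligned chunk and converting it once. */
void model_dio_write_block(struct model_prealloc *p, unsigned lblk, unsigned chunk)
{
    assert(chunk > 0 && lblk < MODEL_MAX_BLOCKS);
    if (p->initialized[lblk])
        return; /* already initialized: no metadata change, no commit */
    unsigned start = lblk - lblk % chunk;
    for (unsigned b = start; b < start + chunk && b < MODEL_MAX_BLOCKS; b++) {
        if (!p->initialized[b]) {
            if (b != lblk)
                p->zeroed++;
            p->initialized[b] = true;
        }
    }
    p->commits++; /* one synchronous commit for this conversion */
}
```

Writing 64 blocks sequentially costs 64 commit waits with chunk = 1,
but only 4 with chunk = 16 (at the price of zeroing 60 blocks that are
overwritten shortly afterwards), which is the trade-off described in (2).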

Does that make sense?

- Ted


Thread overview: 22+ messages
2009-07-29  0:28 [PATCH RFC] Insure direct IO writes do not use the page cache Curt Wohlgemuth
2009-07-29 16:10 ` Curt Wohlgemuth
2009-07-29 17:18   ` Eric Sandeen
2009-07-29 17:41     ` Eric Sandeen
2009-07-29 19:48     ` Eric Sandeen
2009-07-29 22:17       ` Mingming
2009-07-29 17:47 ` Mingming
2009-07-29 18:10 ` Theodore Tso [this message]
2009-07-30 18:30   ` Jan Kara
2009-07-30 18:39     ` Eric Sandeen
2009-07-30 18:44       ` Jan Kara
2009-07-30 19:16         ` Eric Sandeen
2009-07-30 20:33     ` Theodore Tso
2009-07-31 16:10       ` Curt Wohlgemuth
2009-08-01  6:56         ` [PATCH RFC] ext4 direct IO for holes, fallocate Mingming
2009-08-03 16:47           ` Aneesh Kumar K.V
2009-08-03 23:40             ` Mingming
2009-07-31 17:58       ` [PATCH RFC] Insure direct IO writes do not use the page cache Mingming
2009-07-31 18:03         ` Michael Rubin
2009-07-31 18:03           ` Michael Rubin
2009-08-03  9:36       ` Jan Kara
2009-07-30 11:06 ` Aneesh Kumar K.V
