linux-ext4.vger.kernel.org archive mirror
From: Curt Wohlgemuth <curtw@google.com>
To: Theodore Tso <tytso@mit.edu>
Cc: Jan Kara <jack@suse.cz>, ext4 development <linux-ext4@vger.kernel.org>
Subject: Re: [PATCH RFC] Insure direct IO writes do not use the page cache
Date: Fri, 31 Jul 2009 09:10:20 -0700
Message-ID: <6601abe90907310910n694c32dfob3e2a207b07333f4@mail.gmail.com>
In-Reply-To: <20090730203351.GB6833@mit.edu>

Thanks to all for ideas and corrections for my original patch.  I'd like to
summarize the issues that I've seen raised:

1. Using blockdev_direct_IO_own_locking() as it stands, without additional
   locking in the ext4 code, is incorrect.

2. The conversion from uninit to initialized extents should be done in an IO
   completion handler.

3. When the uninit-to-init extents are converted, the handle must be marked
   as synchronous.

   However, this will hurt the performance of DIO writes to fallocated
   space when a journal is in use.

4. Ted mentioned some optimizations possible for extent conversion (when the
   extent block isn't part of a transaction, and no new block is required).
   Jan says that verifying that the extent block is not part of a
   transaction can be difficult.

   Also, we could increase the maximum extent size for which we're willing
   to zero out the data blocks.

5. Aneesh mentioned that we could use extent tracking a la Chris Mason's
   patch for data=guarded (I confess, I haven't looked at this yet).

6. Jan's other thought is to use a new ext4_get_blocks_direct() routine as
   the get_block callback to blockdev_direct_IO() -- so no use of
   _own_locking().  This would simply return blocks from uninit extents;
   extent conversion (including possible splitting) would then be done in
   ext4_direct_IO().

7. Ted's last comment is about the tradeoffs between getting the journal
   transaction correct vs aggressive zeroout of data blocks -- seeing if
   it's possible to bypass the journal in the case of preallocated DIO
   writes.

Looking through these, it seems to me that there are two major problems:

   a. How to do extent conversion correctly in the face of locking issues
      and races with other requests (e.g., AIO).

   b. How to do this extent conversion efficiently while preserving correct
      journal semantics.

Have I missed anything?

Jan's idea of a new get_block callback for DIO seems like the simplest
solution to (a) above.  As far as I can tell, no locking changes would be
needed.  Does this seem reasonable?
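
To make the flow in (6) concrete, here is a minimal userspace sketch in C.
It is a toy model under stated assumptions, not kernel code: struct
toy_extent, toy_get_block() and toy_convert() are hypothetical names that
do not exist in ext4, and the sketch ignores locking, journaling and the
on-disk extent tree entirely.  It only shows the split of responsibilities:
the lookup returns a mapping from an uninit extent without touching it, and
a separate step, run after the write, converts the written range and splits
the extent if needed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy in-memory extent; hypothetical, not the ext4 on-disk format. */
struct toy_extent {
	unsigned long start;	/* first logical block */
	unsigned long len;	/* number of blocks */
	bool uninit;		/* preallocated but not yet written */
};

/*
 * Stand-in for the proposed DIO get_block callback: look up the extent
 * covering lblk and return it as-is, even if it is uninit.  It never
 * converts or splits anything.  Returns NULL for a hole.
 */
static const struct toy_extent *
toy_get_block(const struct toy_extent *map, size_t n, unsigned long lblk)
{
	for (size_t i = 0; i < n; i++)
		if (lblk >= map[i].start && lblk < map[i].start + map[i].len)
			return &map[i];
	return NULL;
}

/*
 * Stand-in for the conversion done after the write: mark
 * [lblk, lblk + count) as initialized.  The range must lie inside one
 * uninit extent, which is split into up to three pieces.  Returns the new
 * number of extents, or 0 if no extent matches or the array is full.
 */
static size_t
toy_convert(struct toy_extent *map, size_t n, size_t cap,
	    unsigned long lblk, unsigned long count)
{
	assert(count > 0);

	for (size_t i = 0; i < n; i++) {
		struct toy_extent e = map[i];
		unsigned long end = e.start + e.len;

		if (!e.uninit || lblk < e.start || lblk + count > end)
			continue;

		struct toy_extent parts[3];
		size_t np = 0;

		if (lblk > e.start)	/* uninit head left over */
			parts[np++] = (struct toy_extent){ e.start, lblk - e.start, true };
		parts[np++] = (struct toy_extent){ lblk, count, false };
		if (lblk + count < end)	/* uninit tail left over */
			parts[np++] = (struct toy_extent){ lblk + count, end - (lblk + count), true };

		if (n + np - 1 > cap)
			return 0;

		/* Shift the following extents, then drop the pieces in place. */
		memmove(&map[i + np], &map[i + 1], (n - i - 1) * sizeof(map[0]));
		memcpy(&map[i], parts, np * sizeof(parts[0]));
		return n + np - 1;
	}
	return 0;
}
```

For example, starting from a single uninit extent covering blocks 0-99, a
write to blocks 10-29 would first look up the mapping with toy_get_block()
and, once the data is written, call toy_convert(map, 1, cap, 10, 20), which
leaves three extents: uninit 0-9, initialized 10-29, and uninit 30-99.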

Problem (b) is one that I would defer to others with more experience with
journals than I have.

Thanks,
Curt
