public inbox for linux-ext4@vger.kernel.org
 help / color / mirror / Atom feed
From: Mingming <cmm@us.ibm.com>
To: Jan Kara <jack@suse.cz>
Cc: linux-ext4@vger.kernel.org, Eric Sandeen <sandeen@redhat.com>,
	Theodore Tso <tytso@mit.edu>
Subject: Re: [PATCH 1/2 V2] Direct IO for holes and fallocate: add end_io callback
Date: Tue, 25 Aug 2009 17:49:08 -0700	[thread overview]
Message-ID: <1251247748.3930.23.camel@mingming-laptop> (raw)
In-Reply-To: <1251229270.3930.17.camel@mingming-laptop>

On Tue, 2009-08-25 at 12:41 -0700, Mingming wrote:
> On Tue, 2009-08-25 at 16:01 +0200, Jan Kara wrote:
> > On Mon 24-08-09 14:38:13, Mingming wrote:
> > > On Thu, 2009-08-20 at 13:52 +0200, Jan Kara wrote:
> > > > On Wed 19-08-09 14:26:16, Mingming wrote:
> > > > > On Wed, 2009-08-19 at 16:15 +0200, Jan Kara wrote:
> > > > > > > For direct IO write to the end of file, we now also could get rid of
> > > > > > > using orphan list to protect expose stale data out after crash, when
> > > > > > > direct write to end of file isn't complete. We now fallocate blocks for
> > > > > > > the direct IO write to the end of file as well, and convert those
> > > > > > > fallocated blocks at the end of file to written when IO is complete. If
> > > > > > > fs crashed before the IO complete, it will only seen the file tail has
> > > > > > > been fallocated but won't get the stale data out.
> > > > > >   But you still probably need orphan list to truncate blocks allocated
> > > > > > beyond file end during extending direct write. So I'd just remove this
> > > > > > paragraph...
> > > > > > 
> > > > > 
> > > > > Do we still need to truncate blocks allocated at the end of file? Those
> > > > > blocks are marked as uninitialized extents (as any block allocation from
> > > > > DIO are flagged as uninitialized extents, before IO is complete), so
> > > > > that after recover from crash, the stale data won't get exposed. 
> > > > > 
> > > > > I guess I missed the cases you concerned that we need the orphan list to
> > > > > protect, could you plain a little more?
> > > >   Well, you are right it's not a security problem since no data gets
> > > > exposed. It's just that after a crash, there will be blocks allocated to
> > > > a file and it's not trivial to it find out unless you specifically stat the
> > > > file. So it's just *not nice* kind of thing. If it brings us some
> > > > advantage to not put the inode on the orphan list, then sure it might be
> > > > a tradeoff we are willing to make. But otherwise I see no point.
> > > > 
> > > 
> > > Hmm... I see what you concerned now...user probably don't like allocated
> > > blocks at the end...
> >   Yes, it's kind of counterintuitive that blocks can get allocated but
> > don't really "show up" in the file (because they are unwritten and beyond
> > i_size).
> > 
> > > I am trying to think of the ext3 case. In ext3 case, inode will be
> > > removed from orphan list once the IO is complete, but that is before the
> > > i_disksize get updated and journal handle being closed. What if the
> > > system crash after inode removed from orphan list but before the
> > > i_disksize get updated?
> >   As you write below, until we stop the handle, the transaction cannot be
> > committed and so no change actually gets written to disk.
> > 
> 
> Then in case this, if no change actually gets written to disk, then
> i_disksize hasn't get updated on disk, why we need the orphan list?
> 

Never mind, I missed the fact that i_size was updated in ext3_direct_IO
before submit the IO, so orphan list is there to truncate i_size to
i_disksize in case of crash.

> > > But since the journal handled has not marked as stopped, even without
> > > putting the inode on the orpah list, wouldn't the open journal handle
> > > and delayed i_disksize update until IO complete time, prevent we see
> > > half DIO data on disk? Though blocks are allocated to the file, but
> > > those block mapping wouldn't show up on disk, no? Did I miss anything?
> > > 
> > > In the ext4 +patch case (non AIO case), we also close the handle when
> > > end_io completed. If the system crashed we would see the blocks are
> > > allocated but marked as unwritten.
> >   Exactly, you end up with allocated and unwritten blocks beyond i_size and
> > that's what I'd like to avoid if user did not explicitely fallocate() these
> > blocks.
> > 
> 
> Yes but same as ext3, nothing should write to disk until we stop the
> handle, so the unwritten blocks flag should also been seen as the
> i_disksize has not been updated on disk yet.
> 

Please ignore this, I see what you are saying..

I think it over today, I understand the race is with AIO DIO io has not
completed, while there are truncate or other non AIO DIO which happened
completed before the pending AIO DIO completed. What we do with
i_disksize?

Maybe I just split the patches, only deals with holes and preallocation,
with doesn't need to update i_disksize... and worry about fixing DIO
write to the end of file for AIO later.

Mingming


  reply	other threads:[~2009-08-26  0:49 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-08-12 15:54 [PATCH 1/2 V2] Direct IO for holes and fallocate: add end_io callback Mingming
2009-08-19 14:15 ` Jan Kara
2009-08-19 21:26   ` Mingming
2009-08-20 11:52     ` Jan Kara
2009-08-24 21:38       ` Mingming
2009-08-25 14:01         ` Jan Kara
2009-08-25 19:41           ` Mingming
2009-08-26  0:49             ` Mingming [this message]
2009-08-26 13:39               ` Jan Kara
2009-08-26 13:52             ` Jan Kara
2009-09-04  0:44     ` [PATCH 1/2 V3] handle unwritten extents spt for direct IO Mingming
2009-08-24 19:11   ` [PATCH 1/2 V2] Direct IO for holes and fallocate: add end_io callback Theodore Tso
2009-08-25 13:19     ` Jan Kara

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1251247748.3930.23.camel@mingming-laptop \
    --to=cmm@us.ibm.com \
    --cc=jack@suse.cz \
    --cc=linux-ext4@vger.kernel.org \
    --cc=sandeen@redhat.com \
    --cc=tytso@mit.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox