Re: [PATCH] ext4: fix ext4_evict_inode() racing against workqueue processing code

linux-ext4.vger.kernel.org archive mirror

From: Theodore Ts'o <tytso@mit.edu>
To: Eric Sandeen <sandeen@redhat.com>
Cc: Ext4 Developers List <linux-ext4@vger.kernel.org>,
	Jan Kara <jack@suse.cz>
Subject: Re: [PATCH] ext4: fix ext4_evict_inode() racing against workqueue processing code
Date: Wed, 20 Mar 2013 10:45:23 -0400
Message-ID: <20130320144523.GF12865@thunk.org>
In-Reply-To: <5149C452.3070206@redhat.com>

On Wed, Mar 20, 2013 at 09:14:42AM -0500, Eric Sandeen wrote:
> 
> As an aside, is there any reason to have "dioread_nolock" as an option
> at this point?  If it works now, would you ever *not* want it?
> 
> (granted it doesn't work with some journaling options etc, but that
> behavior could be automatic, w/o the need for special mount options).

The primary restriction is that dioread_nolock doesn't work when the fs
block size != page size.  If your proposal is that we automatically
enable dioread_nolock when we can use it safely, that's definitely
something to consider for the next merge window.
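
To make the proposal concrete, here is a minimal sketch of such a mount-time check, enabling dioread_nolock only when it can be used safely. The function name and the data_mode enum are hypothetical. Only the block size == page size condition comes from this thread; treating data=journal as one of the incompatible "journaling options" Eric mentions is an assumption.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical enum mirroring the data= mount option values. */
enum data_mode { DATA_ORDERED, DATA_WRITEBACK, DATA_JOURNAL };

/*
 * Sketch only: should dioread_nolock be turned on automatically?
 * The blocksize == page size rule comes from the thread; rejecting
 * data=journal is an assumption about "some journaling options".
 */
bool can_auto_enable_dioread_nolock(unsigned long fs_blocksize,
                                    unsigned long page_size,
                                    enum data_mode mode)
{
    assert(page_size != 0);
    if (fs_blocksize != page_size)
        return false;   /* primary restriction noted in the thread */
    if (mode == DATA_JOURNAL)
        return false;   /* assumed incompatible journaling mode */
    return true;
}
```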

My long-range plan/hope is that we will eventually be able to use the
extent status tree so that when we do allocating writes, we first (a)
allocate the blocks and mark them as in use as far as the mballoc
data structures are concerned, but do _not_ mark them as in use in
the on-disk allocation bitmaps; then (b) we write the data blocks; and
then, triggered by the block I/O completion, (c) in a single journal
transaction, we update the allocation bitmaps, update the inode's
extent tree, and update the inode's i_size field.
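
Steps (a) to (c) can be illustrated with a toy, single-threaded model. Nothing here is ext4 code: struct toy_fs and the toy_* functions are hypothetical, the "transaction" in step (c) is just three assignments standing in for one journal commit, and the extent tree is reduced to a single mapped block. The property it tries to show is that nothing on disk refers to the new block until its data has been written.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define TOY_NBLOCKS 16

/* Toy model only: these are not ext4 structures. */
struct toy_fs {
    bool mballoc_in_use[TOY_NBLOCKS]; /* in-memory allocator state */
    bool disk_bitmap[TOY_NBLOCKS];    /* on-disk allocation bitmap */
    char disk_data[TOY_NBLOCKS];      /* contents of each data block */
    int  extent_block;                /* the inode's one mapped block, or -1 */
    long i_size;                      /* on-disk inode size */
};

void toy_init(struct toy_fs *fs)
{
    memset(fs, 0, sizeof(*fs));
    fs->extent_block = -1;
}

/* (a) Reserve a block in the in-memory allocator only. */
int toy_alloc(struct toy_fs *fs)
{
    for (int b = 0; b < TOY_NBLOCKS; b++) {
        if (!fs->mballoc_in_use[b] && !fs->disk_bitmap[b]) {
            fs->mballoc_in_use[b] = true;
            return b;
        }
    }
    return -1; /* no free block */
}

/* (b) Write the data block; no metadata is touched. */
void toy_write(struct toy_fs *fs, int blk, char byte)
{
    fs->disk_data[blk] = byte;
}

/*
 * (c) Run from I/O completion: bitmap, extent and i_size change together.
 * In a real filesystem these would be one journal transaction.
 */
void toy_io_complete(struct toy_fs *fs, int blk, long new_size)
{
    assert(fs->mballoc_in_use[blk]);
    fs->disk_bitmap[blk] = true;
    fs->extent_block = blk;
    fs->i_size = new_size;
}

/* Read through the on-disk mapping; an unmapped file reads as zero. */
char toy_read(const struct toy_fs *fs)
{
    if (fs->extent_block < 0)
        return 0;
    return fs->disk_data[fs->extent_block];
}
```

In this sketch a reader, or a crash, between steps (b) and (c) still sees a hole rather than the block's previous contents, which is the data=ordered style safety property discussed in this message, obtained without a metadata update before the data write.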

This is different from the dioread_nolock approach in that we're not
initially inserting the blocks into the extent tree as uninitialized
and then converting the extent tree entries from uninit to init after
the I/O completes.
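
For contrast, a similarly hedged toy model of the dioread_nolock ordering: the extent is inserted before the write but flagged uninitialized, readers treat it as zeros, and I/O completion converts it, which is itself an extent tree modification. The struct and function names, and the separate boolean flag, are simplifications rather than ext4's on-disk format.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model: names and the separate flag are simplifications. */
struct uninit_extent {
    int  pblk;    /* physical block, or -1 if unmapped */
    bool uninit;  /* set until the data write completes */
};

struct uninit_inode {
    struct uninit_extent ext;
    char disk[8];  /* toy disk blocks */
};

/* Before the write: the extent tree is modified to map the block as uninit. */
void map_uninit(struct uninit_inode *ino, int pblk)
{
    ino->ext.pblk = pblk;
    ino->ext.uninit = true;
}

/* Readers see zeros for an uninitialized extent, hiding stale contents. */
char read_block(const struct uninit_inode *ino)
{
    if (ino->ext.pblk < 0 || ino->ext.uninit)
        return 0;
    return ino->disk[ino->ext.pblk];
}

/* After I/O completion: a second extent tree update converts uninit to init. */
void convert_to_init(struct uninit_inode *ino)
{
    assert(ino->ext.pblk >= 0 && ino->ext.uninit);
    ino->ext.uninit = false;
}
```

Note that in this ordering the extent tree is modified twice per allocating write: once before the data is written and once at completion.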

If we get to this long-term nirvana, then (1) we can eliminate the
data=writeback vs. data=ordered distinction, since we'll have the safety
benefits of data=ordered while still having the performance
characteristics of data=writeback, and (2) we can eliminate
dioread_nolock, since this approach should also obviate the need to take
the read lock on the direct I/O read path.  I also think this approach
will in the long term be simpler and faster, since we won't have to
modify the extent tree and start a journal transaction before we
write the data blocks.

					- Ted
