From: Jan Kara <jack@suse.cz>
To: Theodore Ts'o <tytso@mit.edu>
Cc: Eric Sandeen <sandeen@redhat.com>,
Ext4 Developers List <linux-ext4@vger.kernel.org>,
Jan Kara <jack@suse.cz>
Subject: Re: [PATCH] ext4: fix ext4_evict_inode() racing against workqueue processing code
Date: Wed, 20 Mar 2013 21:13:55 +0100
Message-ID: <20130320201355.GI13294@quack.suse.cz>
In-Reply-To: <20130320144523.GF12865@thunk.org>
On Wed 20-03-13 10:45:23, Ted Tso wrote:
> On Wed, Mar 20, 2013 at 09:14:42AM -0500, Eric Sandeen wrote:
> >
> > As an aside, is there any reason to have "dioread_nolock" as an option
> > at this point? If it works now, would you ever *not* want it?
> >
> > (granted it doesn't work with some journaling options etc, but that
> > behavior could be automatic, w/o the need for special mount options).
>
> The primary restriction is that dioread_nolock doesn't work when fs
> block size != page size. If your proposal is that we automatically
> enable dioread_nolock when we can use it safely, that's definitely
> something to consider for the next merge window.
>
> My long range plan/hope is that we will eventually be able to use the
> extent status tree so that when we do allocating writes, we first (a)
> allocate the blocks, and mark them as in use as far as the mballoc
> data structures are concerned, but we do _not_ mark them as in use in
> the on-disk allocation bitmaps, then (b) we write the data blocks, and
> then, triggered by the block I/O completion, (c) in a single journal
> transaction, we update the allocation bitmaps, update the inode's
> extent tree, and update the inode's i_size field.
>
> This is different from the dioread_nolock approach in that we're not
> initially inserting the blocks in the extent tree as uninitialized,
> and then converting the extent tree entries from uninit to init after
> the I/O completion.
>
> If we get to this long-term nirvana, then (1) we can eliminate the
> data=writeback vs data=ordered distinction, since we'll have the safety
> benefits of data=ordered while still having the performance
> characteristics of data=writeback, and (2) we can eliminate
> dioread_nolock, since this approach should also obviate needing to take
> the read lock on the direct I/O read path.

But this will be somewhat tricky because when a buffered write races with
a DIO read of the same block, we have to make sure that the DIO read
ignores the information in the extent status tree because the data isn't
written to the blocks yet. Umm, maybe we could just mark the extent as
unwritten in the extent status tree (without having anything on disk) and
this should make DIO read work. That sounds like a nice optimization.

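A minimal sketch of that idea, as a user-space toy model in C. It is not
ext4 code: `toy_file`, `es_status` and the `toy_*` helpers are hypothetical
names, and a plain array stands in for the extent status tree. It assumes,
as with on-disk unwritten extents, that a read of an unwritten block
returns zeros instead of touching the device.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BLOCK_SIZE 4096u
#define NR_BLOCKS  8u

/* Hypothetical in-memory status of one logical block (not ext4's types). */
enum es_status {
    ES_HOLE,      /* no block allocated */
    ES_UNWRITTEN, /* allocated in memory only, data not on disk yet */
    ES_WRITTEN    /* data I/O completed, block contents are valid */
};

struct toy_file {
    enum es_status status[NR_BLOCKS];           /* stand-in for the extent status tree */
    unsigned char  disk[NR_BLOCKS][BLOCK_SIZE]; /* stand-in for the device */
};

/* Step (a): allocate in memory only and publish the block as unwritten. */
static void toy_alloc_block(struct toy_file *f, size_t lblk)
{
    assert(lblk < NR_BLOCKS);
    f->status[lblk] = ES_UNWRITTEN;
}

/* Steps (b) and (c): the data reaches the device, then the status flips. */
static void toy_write_complete(struct toy_file *f, size_t lblk,
                               const unsigned char *data)
{
    assert(lblk < NR_BLOCKS);
    memcpy(f->disk[lblk], data, BLOCK_SIZE);
    f->status[lblk] = ES_WRITTEN;
}

/* DIO read: only written blocks are read from the device; holes and
 * unwritten blocks read back as zeros, so stale contents never leak. */
static void toy_dio_read(const struct toy_file *f, size_t lblk,
                         unsigned char *buf)
{
    assert(lblk < NR_BLOCKS);
    if (f->status[lblk] == ES_WRITTEN)
        memcpy(buf, f->disk[lblk], BLOCK_SIZE);
    else
        memset(buf, 0, BLOCK_SIZE);
}
```

In this model a DIO read that races with the write sees zeros until
`toy_write_complete()` has run, and never the stale device contents.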
> I also think this approach
> in the long term will be simpler and faster, since we don't have
> to modify the extent tree, and start a journal transaction, before we
> write the data blocks.
Yeah, it should be faster because we will need to perform some extent ops
only in memory and not on disk.
Honza
--
Jan Kara <jack@suse.cz>
SUSE Labs, CR