From: Jens Axboe <axboe@kernel.dk>
To: Tejun Heo <tj@kernel.org>
Cc: Jan Kara <jack@suse.cz>, Mikulas Patocka <mpatocka@redhat.com>,
Al Viro <viro@zeniv.linux.org.uk>,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH block/for-linus] writeback: fix a subtle race condition in I_DIRTY clearing
Date: Tue, 04 Nov 2014 10:41:19 -0700 [thread overview]
Message-ID: <54590FBF.7000303@kernel.dk> (raw)
In-Reply-To: <20141104173443.GF14459@htj.dyndns.org>
On 2014-11-04 10:34, Tejun Heo wrote:
> On Fri, Oct 24, 2014 at 03:38:21PM -0400, Tejun Heo wrote:
>> After invoking ->dirty_inode(), __mark_inode_dirty() does smp_mb() and
>> tests inode->i_state locklessly to see whether it already has all the
>> necessary I_DIRTY bits set. The comment above the barrier doesn't
>> contain any useful information - memory barriers can't ensure "changes
>> are seen by all cpus" by itself.
>>
>> And it sure enough was broken. Please consider the following
>> scenario.
>>
>> CPU 0 CPU 1
>> -------------------------------------------------------------------------------
>>
>> enters __writeback_single_inode()
>> grabs inode->i_lock
>> tests PAGECACHE_TAG_DIRTY which is clear
>> enters __set_page_dirty()
>> grabs mapping->tree_lock
>> sets PAGECACHE_TAG_DIRTY
>> releases mapping->tree_lock
>> leaves __set_page_dirty()
>>
>> enters __mark_inode_dirty()
>> smp_mb()
>> sees I_DIRTY_PAGES set
>> leaves __mark_inode_dirty()
>> clears I_DIRTY_PAGES
>> releases inode->i_lock
>>
>> Now @inode has dirty pages w/ I_DIRTY_PAGES clear. This doesn't seem
>> to lead to an immediately critical problem because requeue_inode()
>> later checks PAGECACHE_TAG_DIRTY instead of I_DIRTY_PAGES when
>> deciding whether the inode needs to be requeued for IO and there are
>> enough unintentional memory barriers inbetween, so while the inode
>> ends up with inconsistent I_DIRTY_PAGES flag, it doesn't fall off the
>> IO list.
>>
>> The lack of explicit barrier may also theoretically affect the other
>> I_DIRTY bits which deal with metadata dirtiness. There is no
>> guarantee that a strong enough barrier exists between
>> I_DIRTY_[DATA]SYNC clearing and write_inode() writing out the dirtied
>> inode. Filesystem inode writeout path likely has enough stuff which
>> can behave as full barrier but it's theoretically possible that the
>> writeout may not see all the updates from ->dirty_inode().
>>
>> Fix it by adding an explicit smp_mb() after I_DIRTY clearing. Note
>> that I_DIRTY_PAGES needs a special treatment as it always needs to be
>> cleared to be interlocked with the lockless test on
>> __mark_inode_dirty() side. It's cleared unconditionally and
>> reinstated after smp_mb() if the mapping still has dirty pages.
>>
>> Also add comments explaining how and why the barriers are paired.
>>
>> Lightly tested.
>>
>> Signed-off-by: Tejun Heo <tj@kernel.org>
>> Cc: Jan Kara <jack@suse.cz>
>> Cc: Mikulas Patocka <mpatocka@redhat.com>
>> Cc: Jens Axboe <axboe@kernel.dk>
>> Cc: Al Viro <viro@zeniv.linux.org.uk>
>> Cc: stable@vger.kernel.org
>
> Jens, can you please route this one?
I can, was going to send an ack anyway.
--
Jens Axboe
prev parent reply other threads:[~2014-11-04 17:41 UTC|newest]
Thread overview: 4+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-10-24 19:38 [PATCH block/for-linus] writeback: fix a subtle race condition in I_DIRTY clearing Tejun Heo
2014-10-29 16:37 ` Jan Kara
2014-11-04 17:34 ` Tejun Heo
2014-11-04 17:41 ` Jens Axboe [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=54590FBF.7000303@kernel.dk \
--to=axboe@kernel.dk \
--cc=jack@suse.cz \
--cc=linux-kernel@vger.kernel.org \
--cc=mpatocka@redhat.com \
--cc=tj@kernel.org \
--cc=viro@zeniv.linux.org.uk \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.