Re: [PATCH v3] vfs: fix page locking deadlocks when deduping files


From: Dave Chinner <david@fromorbit.com>
To: Matthew Wilcox <willy@infradead.org>
Cc: "Darrick J. Wong" <darrick.wong@oracle.com>,
	xfs <linux-xfs@vger.kernel.org>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>
Subject: Re: [PATCH v3] vfs: fix page locking deadlocks when deduping files
Date: Wed, 14 Aug 2019 19:54:48 +1000
Message-ID: <20190814095448.GK6129@dread.disaster.area>
In-Reply-To: <20190813154010.GD5307@bombadil.infradead.org>

On Tue, Aug 13, 2019 at 08:40:10AM -0700, Matthew Wilcox wrote:
> On Tue, Aug 13, 2019 at 08:14:34AM -0700, Darrick J. Wong wrote:
> > +		/*
> > +		 * Now that we've locked both pages, make sure they still
> > +		 * represent the data we're interested in.  If not, someone
> > +		 * is invalidating pages on us and we lose.
> > +		 */
> > +		if (src_page->mapping != src->i_mapping ||
> > +		    src_page->index != srcoff >> PAGE_SHIFT ||
> > +		    dest_page->mapping != dest->i_mapping ||
> > +		    dest_page->index != destoff >> PAGE_SHIFT) {
> > +			same = false;
> > +			goto unlock;
> > +		}
> 
> It is my understanding that you don't need to check the ->index here.
> If I'm wrong about that, I'd really appreciate being corrected, because
> the page cache locking is subtle.

Ah, when talking to Darrick about this, I didn't notice the code
took references on the page, so it probably doesn't need the index
check - the page can't be recycled out from under us here and
inserted into a new mapping until we drop the reference.
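
To make the refcount argument concrete, here is a minimal userspace model. All names are hypothetical; nothing here is the kernel's struct page or page cache API. A held reference blocks recycling of the page into a different (mapping, index), while truncation can still detach it from its mapping, so once the page lock is taken only ->mapping needs rechecking.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model of a page cache page (not struct page). */
struct model_page {
	const void *mapping;	/* owning file's mapping, NULL once truncated */
	unsigned long index;	/* page offset within that mapping */
	int refcount;		/* references held by lookups */
	bool locked;		/* stands in for the page lock */
};

/* Truncation detaches the page from its mapping even while a reference
 * is held; it does not hand the page to another file. */
static void model_truncate(struct model_page *p)
{
	p->mapping = NULL;
}

/* The allocator may only recycle a page (new mapping, new index) once
 * the last reference has been dropped. */
static bool model_recycle(struct model_page *p, const void *mapping,
			  unsigned long index)
{
	if (p->refcount > 0)
		return false;
	p->mapping = mapping;
	p->index = index;
	return true;
}

/* Lock a referenced page and revalidate it. Only ->mapping can have
 * changed under us; ->index is pinned by the reference we hold. */
static bool model_lock_and_revalidate(struct model_page *p,
				      const void *mapping)
{
	assert(p->refcount > 0);
	p->locked = true;
	if (p->mapping != mapping) {
		p->locked = false;	/* truncated: caller must give up */
		return false;
	}
	return true;
}
```

In the model, recycling fails while refcount > 0, so a locked page's ->index can't change; only truncation (mapping set to NULL) has to be caught by the recheck.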

What I was mainly concerned about here is that we only have a shared
inode lock on the src inode, so this code can be running
concurrently with both invalidation and insertion into the mapping.
e.g. direct IO writes do invalidation, buffered reads do
insertion. Hence we have to be really careful about the data in the
source page being valid and stable while we run the comparison.

And on further thought, I don't think shared locking is actually
safe here. A shared lock doesn't stop new direct IO from being
submitted, so inode_dio_wait() just drains the IO in flight at that
point in time but doesn't provide any guarantee that there isn't
concurrent DIO running.

Hence we could do the comparison here, see the data is the same,
drop the page lock, and then a DIO write invalidates the page and writes
new data while we are comparing the rest of the page(s) in the range. By
the time we've checked the whole range, the data at the start is no
longer the same, and the comparison is stale.
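
The race can be modelled single-threaded. The helper names are hypothetical, a "page" is 4 bytes, and a hook stands in for the DIO write that gets to run whenever the page locks are dropped between pages. Every per-page comparison passes, yet the ranges differ by the time the loop returns.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MODEL_PAGE_SIZE 4	/* hypothetical tiny page for the model */

/* Hypothetical hook: models a concurrent writer that runs whenever the
 * comparison loop has dropped its page locks between pages. */
typedef void (*between_pages_fn)(unsigned char *src, size_t page_done);

/* Compare page by page, "locking" only the pair being compared. */
static bool compare_range_unlocked(unsigned char *src, const unsigned char *dst,
				   size_t npages, between_pages_fn hook)
{
	for (size_t i = 0; i < npages; i++) {
		if (memcmp(src + i * MODEL_PAGE_SIZE, dst + i * MODEL_PAGE_SIZE,
			   MODEL_PAGE_SIZE) != 0)
			return false;
		/* Locks dropped: nothing stops a writer changing checked pages. */
		if (hook)
			hook(src, i);
	}
	return true;
}

/* Models a DIO write that rewrites page 0 right after it was compared. */
static void dio_write_first_page(unsigned char *src, size_t page_done)
{
	if (page_done == 0)
		memset(src, 'X', MODEL_PAGE_SIZE);
}
```

Revalidating each page under its own lock doesn't help here, because the page that went stale is one the loop has already finished with.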

And then we do the dedupe operation oblivious to the fact the data
on disk doesn't actually match anymore, and we corrupt the data in
the destination file as it gets linked to mismatched data in the
source file....

Darrick?

> You call read_mapping_page() which returns the page with an elevated
> refcount.  That means the page can't go back to the page allocator and
> be allocated again.  It can, because it's unlocked, still be truncated,
> so the check for ->mapping after locking it is needed.  But the check
> for ->index being correct was done by find_get_entry().
> 
> See pagecache_get_page() -- if we specify FGP_LOCK, then it will lock
> the page, check the ->mapping but not check ->index.  OK, it does check
> ->index, but in a VM_BUG_ON(), so it's not something that ought to be
> able to be wrong.

Yeah, we used to have to play tricks in the old XFS writeback
clustering code to do our own non-blocking page cache lookups, and
this was one of the things we needed to be careful about until
the pagevec_lookup* interfaces came along and solved all the
problems for us. Funny how the brain remembers old gotchas without
also reminding you that the problems went away almost as long
ago.....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
