dax_lock_mapping_entry was never safe

linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Matthew Wilcox <willy@infradead.org>
To: Dan Williams <dan.j.williams@intel.com>
Cc: linux-fsdevel@vger.kernel.org, linux-nvdimm@lists.01.org,
	Jan Kara <jack@suse.cz>, Dave Jiang <dave.jiang@intel.com>
Subject: dax_lock_mapping_entry was never safe
Date: Mon, 26 Nov 2018 08:12:40 -0800	[thread overview]
Message-ID: <20181126161240.GH3065@bombadil.infradead.org> (raw)


I noticed this path while I was doing the 4.19 backport of
dax: Avoid losing wakeup in dax_lock_mapping_entry

                xa_unlock_irq(&mapping->i_pages);
                revalidate = wait_fn();
                finish_wait(wq, &ewait.wait);
                xa_lock_irq(&mapping->i_pages);

It's not safe to call xa_lock_irq() if mapping can have been freed while
we slept.  We'll probably get away with it; most filesystems use a unique
slab for their inodes, so you'll likely get either a freed inode or an
inode which is now the wrong inode.  But if that page has been freed back
to the page allocator, that pointer could now be pointing at anything.

Fixing this in the current codebase is no easier than fixing it in the
4.19 codebase.  This is the best I've come up with.  Could we do better
by not using the _exclusive form of prepare_to_wait()?  I'm not familiar
with all the things that need to be considered when using this family
of interfaces.

diff --git a/fs/dax.c b/fs/dax.c
index 9bcce89ea18e..154b592b18eb 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -232,6 +232,24 @@ static void *get_unlocked_entry(struct xa_state *xas)
 	}
 }
 
+static void wait_unlocked_entry(struct xa_state *xas, void *entry)
+{
+	struct wait_exceptional_entry_queue ewait;
+	wait_queue_head_t *wq;
+
+	init_wait(&ewait.wait);
+	ewait.wait.func = wake_exceptional_entry_func;
+
+	wq = dax_entry_waitqueue(xas, entry, &ewait.key);
+	prepare_to_wait_exclusive(wq, &ewait.wait, TASK_UNINTERRUPTIBLE);
+	xas_unlock_irq(xas);
+	/* We can no longer look at xas */
+	schedule();
+	finish_wait(wq, &ewait.wait);
+	if (waitqueue_active(wq))
+		__wake_up(wq, TASK_NORMAL, 1, &ewait.key);
+}
+
 static void put_unlocked_entry(struct xa_state *xas, void *entry)
 {
 	/* If we were the only waiter woken, wake the next one */
@@ -389,9 +407,7 @@ bool dax_lock_mapping_entry(struct page *page)
 		entry = xas_load(&xas);
 		if (dax_is_locked(entry)) {
 			rcu_read_unlock();
-			entry = get_unlocked_entry(&xas);
-			xas_unlock_irq(&xas);
-			put_unlocked_entry(&xas, entry);
+			wait_unlocked_entry(&xas, entry);
 			rcu_read_lock();
 			continue;
 		}

next             reply	other threads:[~2018-11-27  3:07 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-11-26 16:12 Matthew Wilcox [this message]
2018-11-26 17:11 ` dax_lock_mapping_entry was never safe Jan Kara
2018-11-26 20:36   ` Dan Williams
2018-11-27 18:59     ` Matthew Wilcox

find likely ancestor, descendant, or conflicting patches for this message:
( dfblob:9bcce89ea18 dfblob:154b592b18e )
 OR (
bs:"dax_lock_mapping_entry was never safe" )
	(help)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20181126161240.GH3065@bombadil.infradead.org \
    --to=willy@infradead.org \
    --cc=dan.j.williams@intel.com \
    --cc=dave.jiang@intel.com \
    --cc=jack@suse.cz \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-nvdimm@lists.01.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).