From: Dave Chinner <david@fromorbit.com>
To: Andiry Xu <jix024@eng.ucsd.edu>
Cc: Linux FS Devel <linux-fsdevel@vger.kernel.org>,
"linux-nvdimm@lists.01.org" <linux-nvdimm@ml01.01.org>,
Dan Williams <dan.j.williams@intel.com>,
jack@suse.com
Subject: Re: xfstests 344 deadlock on NOVA
Date: Wed, 28 Feb 2018 08:41:44 +1100 [thread overview]
Message-ID: <20180227214144.GP7000@dastard> (raw)
In-Reply-To: <CAD4SzjvOr_akMXX8JhKCMB_A3KD66v2q0An4ULB9TF8d0Sb4PA@mail.gmail.com>
On Tue, Feb 27, 2018 at 11:15:55AM -0800, Andiry Xu wrote:
> Hi,
>
> I encounter a problem when running xfstests on NOVA. I appreciate your
> help very much.
>
> Some background: NOVA adopts a per-inode logging design. Metadata
> changes are appended to the log and persisted before returning to the
> user space. For example, a write() in NOVA works like this:
>
> Allocate new pmem blocks and fill with user data
> Append the metadata that describes this write to the end of the inode log
> Update the log tail pointer atomically to commit the write
> Update in-DRAM radix tree to point to the metadata (for fast lookup)
>
> The log appending and radix tree update are protected by inode_lock().
>
> For mmap, nova_dax_get_blocks (similar to ext2_get_blocks) needs to
> persist the metadata for new allocations. So it has to append the new
> allocation metadata to the log, and hence needs to lock the inode.
> This causes deadlock in xfstests 344 with concurrent pwrite and mmap
> write:
>
> Thread 1:
> pwrite
> -> nova_dax_file_write()
> ---> inode_lock()
> -----> invalidate_inode_pages2_range()
> -------> schedule()
Why did this thread schedule here?
> Thread 2:
> dax_fault
> -> nova_dax_get_blocks()
> ---> inode_lock() // deadlock
It's just waiting on an inode_lock() to be released by another
thread. What resource is it holding locked that the first thread
needs to make progress?
> If I remove invalidate_inode_pages2_range() in write path, xfstests
> 344 passed, but 428 will fail.
>
> It appears to me that ext2/ext4 does not have this issue because they
> don't persist metadata immediately and hence do not take inode lock.
Did you realise that you can't take the inode->i_rwsem in the page
fault path (i.e. under the mmap_sem) because that's a known deadlock
vector against the read/write IO path?
(i.e. you can use a mmap'd buffer over a range of the same file as
the data buffer for the IO, then take a page fault when trying to
copy data into/out of that buffer while holding the inode->i_rwsem)
> But nova_dax_get_blocks() has to persist the metadata and needs to
> lock the inode to access the log. Is there a way to workaround this?
> Thank you very much.
I'm pretty sure you don't want to use inode->i_rwsem for internal
metadata serialisation requirements. XFS uses xfs_inode->i_ilock for
this, ext4 uses a combination of other locks, etc, and it's done to
separate internal serialisation requirements from user data access
and VFS serialisation requirements...
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
next prev parent reply other threads:[~2018-02-27 21:41 UTC|newest]
Thread overview: 3+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-02-27 19:15 xfstests 344 deadlock on NOVA Andiry Xu
2018-02-27 21:41 ` Dave Chinner [this message]
2018-03-10 5:27 ` Andiry Xu
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20180227214144.GP7000@dastard \
--to=david@fromorbit.com \
--cc=dan.j.williams@intel.com \
--cc=jack@suse.com \
--cc=jix024@eng.ucsd.edu \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-nvdimm@ml01.01.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).