linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Matthew Wilcox <willy@infradead.org>
To: Dan Williams <dan.j.williams@intel.com>
Cc: linux-nvdimm <linux-nvdimm@lists.01.org>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	"Balcer, Piotr" <piotr.balcer@intel.com>
Subject: Re: find_get_entries_tag regression bisected
Date: Sat, 16 Feb 2019 07:35:11 -0800	[thread overview]
Message-ID: <20190216153511.GM12668@bombadil.infradead.org> (raw)
In-Reply-To: <CAPcyv4gs7nGnf81Lj-Hf04uhmMb2wxKzi42ajaNVCHty7PAsAQ@mail.gmail.com>

On Fri, Feb 15, 2019 at 06:08:15PM -0800, Dan Williams wrote:
> Hi Willy,
> 
> Piotr reports the following crash can be triggered on latest mainline:
> 
>  EXT4-fs (pmem5): recovery complete
>  EXT4-fs (pmem5): mounted filesystem with ordered data mode. Opts: dax
>  ------------[ cut here ]------------
>  kernel BUG at mm/pgtable-generic.c:127!
>  invalid opcode: 0000 [#1] SMP PTI
>  CPU: 28 PID: 1193 Comm: a.out Tainted: G        W  OE     4.19.0-rc5+ #2907
>  [..]
>  RIP: 0010:pmdp_huge_clear_flush+0x1e/0x80
>  [..]
>  Call Trace:
>   dax_writeback_mapping_range+0x473/0x8a0
>   ? print_shortest_lock_dependencies+0x40/0x40
>   ? jbd2_journal_stop+0xef/0x470
>   ? ext4_fill_super+0x3071/0x3110
>   ? __lock_is_held+0x4f/0x90
>   ? __lock_is_held+0x4f/0x90
>   ext4_dax_writepages+0xed/0x2f0
>   do_writepages+0x41/0xe0
>   __filemap_fdatawrite_range+0xbe/0xf0
>   file_write_and_wait_range+0x4c/0xa0
>   ext4_sync_file+0xa6/0x4f0
> 
> I bisected this regression to commit c1901cd33cf4 "page cache: Convert
> find_get_entries_tag to XArray". I suspect another case of pte vs pmd
> confusion.
> 
> Below is the small reproducer from Piotr, it triggers in a qemu
> environment with emulated pmem, but only with ext4 that I can see, but
> I did not dig too deep. Let me know if anything jumps out to you. I'll
> otherwise take a deeper look in the coming days.

I think this is another case of the XArray and radix tree iterators
having different behaviour with multi-slot entries.  The radix tree
iterator would rewind the index to the head index while the XArray
iterator leaves it static.

While the regression was introduced with the change to
find_get_entries_tag(), DAX now uses the xas_for_each_marked() iterator
directly, so changing the behaviour of find_get_entries_tag() would
be pointless.  I think the simplest way to fix this is for DAX to
set the index appropriately.  Maybe something like this?

diff --git a/fs/dax.c b/fs/dax.c
index 6959837cc465..43e3aad31885 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -843,7 +843,7 @@ static void dax_entry_mkclean(struct address_space *mapping, pgoff_t index,
 static int dax_writeback_one(struct xa_state *xas, struct dax_device *dax_dev,
 		struct address_space *mapping, void *entry)
 {
-	unsigned long pfn;
+	unsigned long pfn, index;
 	long ret = 0;
 	size_t size;
 
@@ -894,16 +894,17 @@ static int dax_writeback_one(struct xa_state *xas, struct dax_device *dax_dev,
 	xas_unlock_irq(xas);
 
 	/*
-	 * Even if dax_writeback_mapping_range() was given a wbc->range_start
-	 * in the middle of a PMD, the 'index' we are given will be aligned to
-	 * the start index of the PMD, as will the pfn we pull from 'entry'.
+	 * If dax_writeback_mapping_range() was given a wbc->range_start
+	 * in the middle of a PMD, the 'index' we are given needs to be
+	 * aligned to the start index of the PMD.
 	 * This allows us to flush for PMD_SIZE and not have to worry about
 	 * partial PMD writebacks.
 	 */
 	pfn = dax_to_pfn(entry);
 	size = PAGE_SIZE << dax_entry_order(entry);
+	index = xas->xa_index &~ ((1UL << dax_entry_order(entry)) - 1);
 
-	dax_entry_mkclean(mapping, xas->xa_index, pfn);
+	dax_entry_mkclean(mapping, index, pfn);
 	dax_flush(dax_dev, page_address(pfn_to_page(pfn)), size);
 	/*
 	 * After we have flushed the cache, we can clear the dirty tag. There


Another way to fix this would be to mask the address in dax_entry_mkclean(),
but I think this is cleaner.

  reply	other threads:[~2019-02-16 15:35 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-02-16  2:08 find_get_entries_tag regression bisected Dan Williams
2019-02-16 15:35 ` Matthew Wilcox [this message]
2019-02-16 17:29   ` Matthew Wilcox
2019-02-16 21:11     ` Matthew Wilcox
2019-02-26  5:03       ` Dan Williams
2019-02-26 12:08         ` Matthew Wilcox
2019-02-26 13:16           ` Matthew Wilcox
2019-02-27 18:16     ` Dan Williams

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190216153511.GM12668@bombadil.infradead.org \
    --to=willy@infradead.org \
    --cc=dan.j.williams@intel.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-nvdimm@lists.01.org \
    --cc=piotr.balcer@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).