From: Jan Kara <jack@suse.cz>
To: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Jan Kara <jack@suse.cz>,
linux-nvdimm@lists.01.org, linux-mm@kvack.org,
linux-fsdevel@vger.kernel.org,
Dan Williams <dan.j.williams@intel.com>
Subject: Re: [PATCH 1/3] dax: Make cache flushing protected by entry lock
Date: Wed, 29 Jun 2016 22:28:06 +0200 [thread overview]
Message-ID: <20160629202806.GC16831@quack2.suse.cz> (raw)
In-Reply-To: <20160624214445.GA20730@linux.intel.com>
Hi Ross,
all your comments look fine and I've incorporated them. Thanks for review!
Honza
On Fri 24-06-16 15:44:45, Ross Zwisler wrote:
> On Tue, Jun 21, 2016 at 05:45:13PM +0200, Jan Kara wrote:
> > Currently, flushing of caches for DAX mappings was ignoring entry lock.
> > So far this was ok (modulo a bug that a difference in entry lock could
> > cause cache flushing to be mistakenly skipped) but in the following
> > patches we will write-protect PTEs on cache flushing and clear dirty
> > tags. For that we will need more exclusion. So do cache flushing under
> > an entry lock. This allows us to remove one lock-unlock pair of
> > mapping->tree_lock as a bonus.
> >
> > Signed-off-by: Jan Kara <jack@suse.cz>
> > ---
> > fs/dax.c | 62 +++++++++++++++++++++++++++++++++++++++-----------------------
> > 1 file changed, 39 insertions(+), 23 deletions(-)
> >
> > diff --git a/fs/dax.c b/fs/dax.c
> > index 761495bf5eb9..5209f8cd0bee 100644
> > --- a/fs/dax.c
> > +++ b/fs/dax.c
> > @@ -669,35 +669,54 @@ static int dax_writeback_one(struct block_device *bdev,
> > struct address_space *mapping, pgoff_t index, void *entry)
> > {
> > struct radix_tree_root *page_tree = &mapping->page_tree;
> > - int type = RADIX_DAX_TYPE(entry);
> > - struct radix_tree_node *node;
> > + int type;
> > struct blk_dax_ctl dax;
> > - void **slot;
> > int ret = 0;
> > + void *entry2, **slot;
>
> Nit: Let's retain the "reverse X-mas tree" ordering of our variable
> definitions.
>
> > - spin_lock_irq(&mapping->tree_lock);
> > /*
> > - * Regular page slots are stabilized by the page lock even
> > - * without the tree itself locked. These unlocked entries
> > - * need verification under the tree lock.
> > + * A page got tagged dirty in DAX mapping? Something is seriously
> > + * wrong.
> > */
> > - if (!__radix_tree_lookup(page_tree, index, &node, &slot))
> > - goto unlock;
> > - if (*slot != entry)
> > - goto unlock;
> > -
> > - /* another fsync thread may have already written back this entry */
> > - if (!radix_tree_tag_get(page_tree, index, PAGECACHE_TAG_TOWRITE))
> > - goto unlock;
> > + if (WARN_ON(!radix_tree_exceptional_entry(entry)))
> > + return -EIO;
> >
> > + spin_lock_irq(&mapping->tree_lock);
> > + entry2 = get_unlocked_mapping_entry(mapping, index, &slot);
> > + /* Entry got punched out / reallocated? */
> > + if (!entry2 || !radix_tree_exceptional_entry(entry2))
> > + goto put_unlock;
> > + /*
> > + * Entry got reallocated elsewhere? No need to writeback. We have to
> > + * compare sectors as we must not bail out due to difference in lockbit
> > + * or entry type.
> > + */
> > + if (RADIX_DAX_SECTOR(entry2) != RADIX_DAX_SECTOR(entry))
> > + goto put_unlock;
> > + type = RADIX_DAX_TYPE(entry2);
> > if (WARN_ON_ONCE(type != RADIX_DAX_PTE && type != RADIX_DAX_PMD)) {
> > ret = -EIO;
> > - goto unlock;
> > + goto put_unlock;
> > }
> > + entry = entry2;
>
> I don't think you need to set 'entry' here - you reset it in 4 lines via
> lock_slot(), and don't use it in between.
>
> > +
> > + /* Another fsync thread may have already written back this entry */
> > + if (!radix_tree_tag_get(page_tree, index, PAGECACHE_TAG_TOWRITE))
> > + goto put_unlock;
> > + /* Lock the entry to serialize with page faults */
> > + entry = lock_slot(mapping, slot);
>
> As of this patch nobody unlocks the slot. :) A quick test of "write, fsync,
> fsync" confirms that it deadlocks.
>
> You introduce the proper calls to unlock the slot via
> put_locked_mapping_entry() in patch 3/3 - those probably need to be in this
> patch instead.
>
> > + /*
> > + * We can clear the tag now but we have to be careful so that concurrent
> > + * dax_writeback_one() calls for the same index cannot finish before we
> > + * actually flush the caches. This is achieved as the calls will look
> > + * at the entry only under tree_lock and once they do that they will
> > + * see the entry locked and wait for it to unlock.
> > + */
> > + radix_tree_tag_clear(page_tree, index, PAGECACHE_TAG_TOWRITE);
> > + spin_unlock_irq(&mapping->tree_lock);
> >
> > dax.sector = RADIX_DAX_SECTOR(entry);
> > dax.size = (type == RADIX_DAX_PMD ? PMD_SIZE : PAGE_SIZE);
> > - spin_unlock_irq(&mapping->tree_lock);
> >
> > /*
> > * We cannot hold tree_lock while calling dax_map_atomic() because it
> > @@ -713,15 +732,12 @@ static int dax_writeback_one(struct block_device *bdev,
> > }
> >
> > wb_cache_pmem(dax.addr, dax.size);
> > -
> > - spin_lock_irq(&mapping->tree_lock);
> > - radix_tree_tag_clear(page_tree, index, PAGECACHE_TAG_TOWRITE);
> > - spin_unlock_irq(&mapping->tree_lock);
> > - unmap:
> > +unmap:
> > dax_unmap_atomic(bdev, &dax);
> > return ret;
> >
> > - unlock:
> > +put_unlock:
> > + put_unlocked_mapping_entry(mapping, index, entry2);
> > spin_unlock_irq(&mapping->tree_lock);
> > return ret;
> > }
> > --
> > 2.6.6
> >
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2016-06-29 20:28 UTC|newest]
Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-06-21 15:45 [PATCH 0/3 v1] dax: Clear dirty bits after flushing caches Jan Kara
2016-06-21 15:45 ` [PATCH 1/3] dax: Make cache flushing protected by entry lock Jan Kara
2016-06-24 21:44 ` Ross Zwisler
2016-06-29 20:28 ` Jan Kara [this message]
2016-06-21 15:45 ` [PATCH 2/3] mm: Export follow_pte() Jan Kara
2016-06-24 21:55 ` Ross Zwisler
2016-06-29 20:29 ` Jan Kara
2016-06-21 15:45 ` [PATCH 3/3] dax: Clear dirty entry tags on cache flush Jan Kara
2016-06-21 17:31 ` kbuild test robot
2016-06-21 20:59 ` kbuild test robot
2016-06-23 10:47 ` Jan Kara
2016-06-28 21:38 ` Ross Zwisler
2016-06-29 20:47 ` Jan Kara
2016-06-28 21:41 ` [PATCH 0/3 v1] dax: Clear dirty bits after flushing caches Ross Zwisler
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20160629202806.GC16831@quack2.suse.cz \
--to=jack@suse.cz \
--cc=dan.j.williams@intel.com \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=linux-nvdimm@lists.01.org \
--cc=ross.zwisler@linux.intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).