linux-mm.kvack.org archive mirror
From: Ross Zwisler <ross.zwisler@linux.intel.com>
To: Jan Kara <jack@suse.cz>
Cc: linux-nvdimm@lists.01.org, linux-mm@kvack.org,
	linux-fsdevel@vger.kernel.org,
	Dan Williams <dan.j.williams@intel.com>,
	Ross Zwisler <ross.zwisler@linux.intel.com>
Subject: Re: [PATCH 1/3] dax: Make cache flushing protected by entry lock
Date: Fri, 24 Jun 2016 15:44:45 -0600
Message-ID: <20160624214445.GA20730@linux.intel.com>
In-Reply-To: <1466523915-14644-2-git-send-email-jack@suse.cz>

On Tue, Jun 21, 2016 at 05:45:13PM +0200, Jan Kara wrote:
> Currently, flushing of caches for DAX mappings was ignoring entry lock.
> So far this was ok (modulo a bug that a difference in entry lock could
> cause cache flushing to be mistakenly skipped) but in the following
> patches we will write-protect PTEs on cache flushing and clear dirty
> tags. For that we will need more exclusion. So do cache flushing under
> an entry lock. This allows us to remove one lock-unlock pair of
> mapping->tree_lock as a bonus.
> 
> Signed-off-by: Jan Kara <jack@suse.cz>
> ---
>  fs/dax.c | 62 +++++++++++++++++++++++++++++++++++++++-----------------------
>  1 file changed, 39 insertions(+), 23 deletions(-)
> 
> diff --git a/fs/dax.c b/fs/dax.c
> index 761495bf5eb9..5209f8cd0bee 100644
> --- a/fs/dax.c
> +++ b/fs/dax.c
> @@ -669,35 +669,54 @@ static int dax_writeback_one(struct block_device *bdev,
>  		struct address_space *mapping, pgoff_t index, void *entry)
>  {
>  	struct radix_tree_root *page_tree = &mapping->page_tree;
> -	int type = RADIX_DAX_TYPE(entry);
> -	struct radix_tree_node *node;
> +	int type;
>  	struct blk_dax_ctl dax;
> -	void **slot;
>  	int ret = 0;
> +	void *entry2, **slot;

Nit: Let's retain the "reverse X-mas tree" ordering (longest line first,
shortest last) of our variable declarations.
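
For illustration, the declarations from this hunk sorted longest-first might
look like the sketch below. The struct definitions are stand-ins added only so
the fragment compiles outside the kernel tree, the function name is
hypothetical, and the dummy assignments exist only to keep a userspace
compiler quiet; none of that is part of the patch.

```c
#include <stddef.h>

/* Stand-in types (assumptions) so this compiles outside the kernel. */
struct radix_tree_root { int unused; };
struct blk_dax_ctl { unsigned long sector; long size; };
struct address_space { struct radix_tree_root page_tree; };

/* Hypothetical function; only the declaration order matters here. */
static int decl_order_example(struct address_space *mapping)
{
	/* "Reverse X-mas tree": each declaration is no longer than the one above. */
	struct radix_tree_root *page_tree = &mapping->page_tree;
	struct blk_dax_ctl dax;
	void *entry2, **slot;
	int ret = 0;
	int type;

	/* Dummy assignments so a userspace compiler does not warn. */
	dax.sector = 0;
	dax.size = 0;
	entry2 = NULL;
	slot = &entry2;
	type = 0;
	(void)page_tree; (void)dax; (void)slot; (void)type;
	return ret;
}
```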

> -	spin_lock_irq(&mapping->tree_lock);
>  	/*
> -	 * Regular page slots are stabilized by the page lock even
> -	 * without the tree itself locked.  These unlocked entries
> -	 * need verification under the tree lock.
> +	 * A page got tagged dirty in DAX mapping? Something is seriously
> +	 * wrong.
>  	 */
> -	if (!__radix_tree_lookup(page_tree, index, &node, &slot))
> -		goto unlock;
> -	if (*slot != entry)
> -		goto unlock;
> -
> -	/* another fsync thread may have already written back this entry */
> -	if (!radix_tree_tag_get(page_tree, index, PAGECACHE_TAG_TOWRITE))
> -		goto unlock;
> +	if (WARN_ON(!radix_tree_exceptional_entry(entry)))
> +		return -EIO;
>  
> +	spin_lock_irq(&mapping->tree_lock);
> +	entry2 = get_unlocked_mapping_entry(mapping, index, &slot);
> +	/* Entry got punched out / reallocated? */
> +	if (!entry2 || !radix_tree_exceptional_entry(entry2))
> +		goto put_unlock;
> +	/*
> +	 * Entry got reallocated elsewhere? No need to writeback. We have to
> +	 * compare sectors as we must not bail out due to difference in lockbit
> +	 * or entry type.
> +	 */
> +	if (RADIX_DAX_SECTOR(entry2) != RADIX_DAX_SECTOR(entry))
> +		goto put_unlock;
> +	type = RADIX_DAX_TYPE(entry2);
>  	if (WARN_ON_ONCE(type != RADIX_DAX_PTE && type != RADIX_DAX_PMD)) {
>  		ret = -EIO;
> -		goto unlock;
> +		goto put_unlock;
>  	}
> +	entry = entry2;

I don't think you need to set 'entry' here - you reassign it four lines later
via lock_slot(), and don't use it in between.

> +
> +	/* Another fsync thread may have already written back this entry */
> +	if (!radix_tree_tag_get(page_tree, index, PAGECACHE_TAG_TOWRITE))
> +		goto put_unlock;
> +	/* Lock the entry to serialize with page faults */
> +	entry = lock_slot(mapping, slot);

As of this patch nobody unlocks the slot.  :)  A quick test of "write, fsync,
fsync" confirms that it deadlocks.

You introduce the proper calls to unlock the slot via
put_locked_mapping_entry() in patch 3/3 - those probably need to be in this
patch instead.
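
A minimal userspace sketch of a "write, fsync, fsync" test follows. The exact
test that was run is not shown in this mail, so the details are assumptions:
the write is done as a store through a shared mapping (on the assumption that
the fault path is what dirties a DAX radix tree entry), and the function name
and the path passed to it are hypothetical. On a kernel with only this patch
applied, the second fsync() on a DAX file would be expected to hang rather
than return.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Dirty one page of 'path' through a shared mapping, then fsync twice.
 * Returns 0 if both fsync() calls return successfully, -1 on any error.
 */
static int store_fsync_fsync(const char *path)
{
	long page = sysconf(_SC_PAGESIZE);
	int ret = -1;
	char *addr;
	int fd;

	fd = open(path, O_CREAT | O_RDWR, 0644);
	if (fd < 0) {
		perror("open");
		return -1;
	}
	if (ftruncate(fd, page) < 0) {
		perror("ftruncate");
		goto out_close;
	}
	addr = mmap(NULL, page, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (addr == MAP_FAILED) {
		perror("mmap");
		goto out_close;
	}

	addr[0] = 'x';			/* store that dirties the page */

	if (fsync(fd) < 0) {		/* first writeback of the dirty page */
		perror("first fsync");
		goto out_unmap;
	}
	if (fsync(fd) < 0) {		/* the call reported to deadlock */
		perror("second fsync");
		goto out_unmap;
	}
	ret = 0;
out_unmap:
	munmap(addr, page);
out_close:
	close(fd);
	return ret;
}
```

Calling store_fsync_fsync() on a file that lives on a DAX-mounted filesystem
should return 0 once the slot is unlocked in the same patch that locks it.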

> +	/*
> +	 * We can clear the tag now but we have to be careful so that concurrent
> +	 * dax_writeback_one() calls for the same index cannot finish before we
> +	 * actually flush the caches. This is achieved as the calls will look
> +	 * at the entry only under tree_lock and once they do that they will
> +	 * see the entry locked and wait for it to unlock.
> +	 */
> +	radix_tree_tag_clear(page_tree, index, PAGECACHE_TAG_TOWRITE);
> +	spin_unlock_irq(&mapping->tree_lock);
>  
>  	dax.sector = RADIX_DAX_SECTOR(entry);
>  	dax.size = (type == RADIX_DAX_PMD ? PMD_SIZE : PAGE_SIZE);
> -	spin_unlock_irq(&mapping->tree_lock);
>  
>  	/*
>  	 * We cannot hold tree_lock while calling dax_map_atomic() because it
> @@ -713,15 +732,12 @@ static int dax_writeback_one(struct block_device *bdev,
>  	}
>  
>  	wb_cache_pmem(dax.addr, dax.size);
> -
> -	spin_lock_irq(&mapping->tree_lock);
> -	radix_tree_tag_clear(page_tree, index, PAGECACHE_TAG_TOWRITE);
> -	spin_unlock_irq(&mapping->tree_lock);
> - unmap:
> +unmap:
>  	dax_unmap_atomic(bdev, &dax);
>  	return ret;
>  
> - unlock:
> +put_unlock:
> +	put_unlocked_mapping_entry(mapping, index, entry2);
>  	spin_unlock_irq(&mapping->tree_lock);
>  	return ret;
>  }
> -- 
> 2.6.6
> 

