All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jan Kara <jack@suse.cz>
To: Matthew Wilcox <willy@linux.intel.com>
Cc: linux-fsdevel@vger.kernel.org, "Wilcox,
	Matthew R  <matthew.r.wilcox@intel.com>, Jan Kara <jack@suse.cz>,
	NeilBrown" <neilb@suse.com>,
	linux-nvdimm@lists.01.org
Subject: Re: [PATCH 02/10] radix-tree: make 'indirect' bit available to exception entries.
Date: Tue, 22 Mar 2016 10:12:32 +0100	[thread overview]
Message-ID: <20160322091232.GD4459@quack.suse.cz> (raw)
In-Reply-To: <20160321173458.GO23727@linux.intel.com>

On Mon 21-03-16 13:34:58, Matthew Wilcox wrote:
> I have a patch currently in my tree which has the same effect, but looks
> a little neater:
> 
> diff --git a/lib/radix-tree.c b/lib/radix-tree.c
> index b77c31c..06dfed5 100644
> --- a/lib/radix-tree.c
> +++ b/lib/radix-tree.c
> @@ -70,6 +70,8 @@ struct radix_tree_preload {
>  };
>  static DEFINE_PER_CPU(struct radix_tree_preload, radix_tree_preloads) = { 0, };
>  
> +#define RADIX_TREE_RETRY       ((void *)1)
> +
>  static inline void *ptr_to_indirect(void *ptr)
>  {
>         return (void *)((unsigned long)ptr | RADIX_TREE_INDIRECT_PTR);
> @@ -934,7 +936,7 @@ restart:
>                 }
>  
>                 slot = rcu_dereference_raw(node->slots[offset]);
> -               if (slot == NULL)
> +               if ((slot == NULL) || (slot == RADIX_TREE_RETRY))
>                         goto restart;
>                 offset = follow_sibling(node, &slot, offset);
>                 if (!radix_tree_is_indirect_ptr(slot))
> @@ -1443,8 +1455,7 @@ static inline void radix_tree_shrink(struct radix_tree_root *root)
>                  * to force callers to retry.
>                  */
>                 if (!radix_tree_is_indirect_ptr(slot))
> -                       *((unsigned long *)&to_free->slots[0]) |=
> -                                               RADIX_TREE_INDIRECT_PTR;
> +                       to_free->slots[0] = RADIX_TREE_RETRY;
>  
>                 radix_tree_node_free(to_free);
>         }
> 
> What do you think to doing it this way?
> 
> It might be slightly neater to replace the first hunk with this:
> 
> #define RADIX_TREE_RETRY       ((void *)RADIX_TREE_INDIRECT_PTR)
> 
> I also considered putting that define in radix-tree.h instead of
> radix-tree.c, but on the whole I don't think it'll be useful outside
> radix-tree.h.

So after spending over and hour reading radix tree code back and forth (and
also digging into historical versions where stuff is easier to understand) I
think I can finally fully appreciate subtlety of the retry logic ;). And
actually now I think that Neil's variant is buggy because in his case
radix_tree_lookup() could return NULL if it raced with radix_tree_shrink()
for index 0, although there is valid entry at index 0.

Your variant actually doesn't make things much better. See e.g.
mm/filemap.c: find_get_entry()

	pagep = radix_tree_lookup_slot(&mapping->page_tree, offset);
	// pagep points to node that is under RCU freeing
	if (pagep) {
		page = radix_tree_deref_slot(pagep);
		if (unlikely(!page))	// False since
					// RADIX_TREE_INDIRECT_PTR is set
		if (radix_tree_exception(page))	// False - no exeptional bit
		if (!page_cache_get_speculative(page)) // oops...

What we need to do is either to make all radix_tree_deref_slot() callers
check return value immediately with something like radix_tree_deref_retry()
(but that is still prone to hard to debug bugs when someone forgets to call
radix_tree_deref_retry() in some place) or we just give up the idea of
using INDIRECT bit in exceptional entries.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR
_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

WARNING: multiple messages have this Message-ID (diff)
From: Jan Kara <jack@suse.cz>
To: Matthew Wilcox <willy@linux.intel.com>
Cc: Jan Kara <jack@suse.cz>,
	linux-fsdevel@vger.kernel.org, linux-nvdimm@lists.01.org,
	NeilBrown <neilb@suse.com>,
	"Wilcox, Matthew R" <matthew.r.wilcox@intel.com>
Subject: Re: [PATCH 02/10] radix-tree: make 'indirect' bit available to exception entries.
Date: Tue, 22 Mar 2016 10:12:32 +0100	[thread overview]
Message-ID: <20160322091232.GD4459@quack.suse.cz> (raw)
In-Reply-To: <20160321173458.GO23727@linux.intel.com>

On Mon 21-03-16 13:34:58, Matthew Wilcox wrote:
> I have a patch currently in my tree which has the same effect, but looks
> a little neater:
> 
> diff --git a/lib/radix-tree.c b/lib/radix-tree.c
> index b77c31c..06dfed5 100644
> --- a/lib/radix-tree.c
> +++ b/lib/radix-tree.c
> @@ -70,6 +70,8 @@ struct radix_tree_preload {
>  };
>  static DEFINE_PER_CPU(struct radix_tree_preload, radix_tree_preloads) = { 0, };
>  
> +#define RADIX_TREE_RETRY       ((void *)1)
> +
>  static inline void *ptr_to_indirect(void *ptr)
>  {
>         return (void *)((unsigned long)ptr | RADIX_TREE_INDIRECT_PTR);
> @@ -934,7 +936,7 @@ restart:
>                 }
>  
>                 slot = rcu_dereference_raw(node->slots[offset]);
> -               if (slot == NULL)
> +               if ((slot == NULL) || (slot == RADIX_TREE_RETRY))
>                         goto restart;
>                 offset = follow_sibling(node, &slot, offset);
>                 if (!radix_tree_is_indirect_ptr(slot))
> @@ -1443,8 +1455,7 @@ static inline void radix_tree_shrink(struct radix_tree_root *root)
>                  * to force callers to retry.
>                  */
>                 if (!radix_tree_is_indirect_ptr(slot))
> -                       *((unsigned long *)&to_free->slots[0]) |=
> -                                               RADIX_TREE_INDIRECT_PTR;
> +                       to_free->slots[0] = RADIX_TREE_RETRY;
>  
>                 radix_tree_node_free(to_free);
>         }
> 
> What do you think to doing it this way?
> 
> It might be slightly neater to replace the first hunk with this:
> 
> #define RADIX_TREE_RETRY       ((void *)RADIX_TREE_INDIRECT_PTR)
> 
> I also considered putting that define in radix-tree.h instead of
> radix-tree.c, but on the whole I don't think it'll be useful outside
> radix-tree.h.

So after spending over and hour reading radix tree code back and forth (and
also digging into historical versions where stuff is easier to understand) I
think I can finally fully appreciate subtlety of the retry logic ;). And
actually now I think that Neil's variant is buggy because in his case
radix_tree_lookup() could return NULL if it raced with radix_tree_shrink()
for index 0, although there is valid entry at index 0.

Your variant actually doesn't make things much better. See e.g.
mm/filemap.c: find_get_entry()

	pagep = radix_tree_lookup_slot(&mapping->page_tree, offset);
	// pagep points to node that is under RCU freeing
	if (pagep) {
		page = radix_tree_deref_slot(pagep);
		if (unlikely(!page))	// False since
					// RADIX_TREE_INDIRECT_PTR is set
		if (radix_tree_exception(page))	// False - no exeptional bit
		if (!page_cache_get_speculative(page)) // oops...

What we need to do is either to make all radix_tree_deref_slot() callers
check return value immediately with something like radix_tree_deref_retry()
(but that is still prone to hard to debug bugs when someone forgets to call
radix_tree_deref_retry() in some place) or we just give up the idea of
using INDIRECT bit in exceptional entries.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

  reply	other threads:[~2016-03-22  9:12 UTC|newest]

Thread overview: 86+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-03-21 13:22 [RFC v2] [PATCH 0/10] DAX page fault locking Jan Kara
2016-03-21 13:22 ` Jan Kara
2016-03-21 13:22 ` [PATCH 01/10] DAX: move RADIX_DAX_ definitions to dax.c Jan Kara
2016-03-21 13:22   ` Jan Kara
2016-03-21 17:25   ` Matthew Wilcox
2016-03-21 17:25     ` Matthew Wilcox
2016-03-21 13:22 ` [PATCH 02/10] radix-tree: make 'indirect' bit available to exception entries Jan Kara
2016-03-21 13:22   ` Jan Kara
2016-03-21 17:34   ` Matthew Wilcox
2016-03-21 17:34     ` Matthew Wilcox
2016-03-22  9:12     ` Jan Kara [this message]
2016-03-22  9:12       ` Jan Kara
2016-03-22  9:27       ` Matthew Wilcox
2016-03-22  9:27         ` Matthew Wilcox
2016-03-22 10:37         ` Jan Kara
2016-03-22 10:37           ` Jan Kara
2016-03-23 16:41           ` Ross Zwisler
2016-03-23 16:41             ` Ross Zwisler
2016-03-24 12:31             ` Jan Kara
2016-03-24 12:31               ` Jan Kara
2016-03-21 13:22 ` [PATCH 03/10] dax: Remove complete_unwritten argument Jan Kara
2016-03-21 13:22   ` Jan Kara
2016-03-23 17:12   ` Ross Zwisler
2016-03-23 17:12     ` Ross Zwisler
2016-03-24 12:32     ` Jan Kara
2016-03-24 12:32       ` Jan Kara
2016-03-21 13:22 ` [PATCH 04/10] dax: Fix data corruption for written and mmapped files Jan Kara
2016-03-21 13:22   ` Jan Kara
2016-03-23 17:39   ` Ross Zwisler
2016-03-23 17:39     ` Ross Zwisler
2016-03-24 12:51     ` Jan Kara
2016-03-24 12:51       ` Jan Kara
2016-03-29 15:17       ` Ross Zwisler
2016-03-29 15:17         ` Ross Zwisler
2016-03-21 13:22 ` [PATCH 05/10] dax: Allow DAX code to replace exceptional entries Jan Kara
2016-03-21 13:22   ` Jan Kara
2016-03-23 17:52   ` Ross Zwisler
2016-03-23 17:52     ` Ross Zwisler
2016-03-24 10:42     ` Jan Kara
2016-03-24 10:42       ` Jan Kara
2016-03-21 13:22 ` [PATCH 06/10] dax: Remove redundant inode size checks Jan Kara
2016-03-21 13:22   ` Jan Kara
2016-03-23 21:08   ` Ross Zwisler
2016-03-23 21:08     ` Ross Zwisler
2016-03-21 13:22 ` [PATCH 07/10] dax: Disable huge page handling Jan Kara
2016-03-21 13:22   ` Jan Kara
2016-03-23 20:50   ` Ross Zwisler
2016-03-23 20:50     ` Ross Zwisler
2016-03-24 12:56     ` Jan Kara
2016-03-24 12:56       ` Jan Kara
2016-03-21 13:22 ` [PATCH 08/10] dax: New fault locking Jan Kara
2016-03-21 13:22   ` Jan Kara
2016-03-29 21:57   ` Ross Zwisler
2016-03-29 21:57     ` Ross Zwisler
2016-03-31 16:27     ` Jan Kara
2016-03-31 16:27       ` Jan Kara
2016-03-21 13:22 ` [PATCH 09/10] dax: Use radix tree entry lock to protect cow faults Jan Kara
2016-03-21 13:22   ` Jan Kara
2016-03-21 19:11   ` Matthew Wilcox
2016-03-21 19:11     ` Matthew Wilcox
2016-03-22  7:03     ` Jan Kara
2016-03-22  7:03       ` Jan Kara
2016-03-29 22:18   ` Ross Zwisler
2016-03-29 22:18     ` Ross Zwisler
2016-03-21 13:22 ` [PATCH 10/10] dax: Remove i_mmap_lock protection Jan Kara
2016-03-21 13:22   ` Jan Kara
2016-03-29 22:17   ` Ross Zwisler
2016-03-29 22:17     ` Ross Zwisler
2016-03-21 17:41 ` [RFC v2] [PATCH 0/10] DAX page fault locking Matthew Wilcox
2016-03-21 17:41   ` Matthew Wilcox
2016-03-23 15:09   ` Jan Kara
2016-03-23 15:09     ` Jan Kara
2016-03-23 20:50     ` Matthew Wilcox
2016-03-23 20:50       ` Matthew Wilcox
2016-03-24 10:00     ` Matthew Wilcox
2016-03-24 10:00       ` Matthew Wilcox
2016-03-22 19:32 ` Ross Zwisler
2016-03-22 19:32   ` Ross Zwisler
2016-03-22 21:07   ` Toshi Kani
2016-03-22 21:07     ` Toshi Kani
2016-03-22 21:15     ` Dave Chinner
2016-03-22 21:15       ` Dave Chinner
2016-03-23  9:45     ` Jan Kara
2016-03-23  9:45       ` Jan Kara
2016-03-23 15:11       ` Toshi Kani
2016-03-23 15:11         ` Toshi Kani

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20160322091232.GD4459@quack.suse.cz \
    --to=jack@suse.cz \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-nvdimm@lists.01.org \
    --cc=neilb@suse.com \
    --cc=willy@linux.intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.