From: Jan Kara <jack@suse.cz>
To: Johannes Weiner <hannes@cmpxchg.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Linus Torvalds <torvalds@linux-foundation.org>,
Jan Kara <jack@suse.cz>,
"Kirill A. Shutemov" <kirill@shutemov.name>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org,
kernel-team@fb.com
Subject: Re: [PATCH 3/6] lib: radix-tree: native accounting of exceptional entries
Date: Tue, 8 Nov 2016 11:08:41 +0100 [thread overview]
Message-ID: <20161108100841.GJ32353@quack2.suse.cz> (raw)
In-Reply-To: <20161107190741.3619-4-hannes@cmpxchg.org>
On Mon 07-11-16 14:07:38, Johannes Weiner wrote:
> The way the page cache is sneaking shadow entries of evicted pages
> into the radix tree past the node entry accounting and tracking them
> manually in the upper bits of node->count is fraught with problems.
>
> These shadow entries are marked in the tree as exceptional entries,
> which are a native concept to the radix tree. Maintain an explicit
> counter of exceptional entries in the radix tree node. Subsequent
> patches will switch shadow entry tracking over to that counter.
>
> DAX and shmem are the other users of exceptional entries. Since slot
> replacements that change the entry type from regular to exceptional
> must now be accounted, introduce a __radix_tree_replace() function
> that does replacement and accounting, and switch DAX and shmem over.
>
> The increase in radix tree node size is temporary. The last patch will
> transition from the shadow count in the upper bits of node->count to
> the new exceptional entry counter and shrink node->count to one byte.
>
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
The patch looks good to me. You can add:
Reviewed-by: Jan Kara <jack@suse.cz>
Honza
> ---
> fs/dax.c | 5 +++--
> include/linux/radix-tree.h | 10 +++++++---
> lib/radix-tree.c | 50 +++++++++++++++++++++++++++++++++++++++++++---
> mm/shmem.c | 8 ++++----
> 4 files changed, 61 insertions(+), 12 deletions(-)
>
> diff --git a/fs/dax.c b/fs/dax.c
> index 014defd2e744..db78bae0dc0f 100644
> --- a/fs/dax.c
> +++ b/fs/dax.c
> @@ -643,12 +643,13 @@ static void *dax_insert_mapping_entry(struct address_space *mapping,
> }
> mapping->nrexceptional++;
> } else {
> + struct radix_tree_node *node;
> void **slot;
> void *ret;
>
> - ret = __radix_tree_lookup(page_tree, index, NULL, &slot);
> + ret = __radix_tree_lookup(page_tree, index, &node, &slot);
> WARN_ON_ONCE(ret != entry);
> - radix_tree_replace_slot(slot, new_entry);
> + __radix_tree_replace(page_tree, node, slot, new_entry);
> }
> if (vmf->flags & FAULT_FLAG_WRITE)
> radix_tree_tag_set(page_tree, index, PAGECACHE_TAG_DIRTY);
> diff --git a/include/linux/radix-tree.h b/include/linux/radix-tree.h
> index af3581b8a451..7ced8a70cc8b 100644
> --- a/include/linux/radix-tree.h
> +++ b/include/linux/radix-tree.h
> @@ -85,9 +85,10 @@ static inline bool radix_tree_is_internal_node(void *ptr)
> #define RADIX_TREE_COUNT_MASK ((1UL << RADIX_TREE_COUNT_SHIFT) - 1)
>
> struct radix_tree_node {
> - unsigned char shift; /* Bits remaining in each slot */
> - unsigned char offset; /* Slot offset in parent */
> - unsigned int count;
> + unsigned char shift; /* Bits remaining in each slot */
> + unsigned char offset; /* Slot offset in parent */
> + unsigned int count; /* Total entry count */
> + unsigned char exceptional; /* Exceptional entry count */
> union {
> struct {
> /* Used when ascending tree */
> @@ -276,6 +277,9 @@ void *__radix_tree_lookup(struct radix_tree_root *root, unsigned long index,
> struct radix_tree_node **nodep, void ***slotp);
> void *radix_tree_lookup(struct radix_tree_root *, unsigned long);
> void **radix_tree_lookup_slot(struct radix_tree_root *, unsigned long);
> +void __radix_tree_replace(struct radix_tree_root *root,
> + struct radix_tree_node *node,
> + void **slot, void *item);
> bool __radix_tree_delete_node(struct radix_tree_root *root,
> struct radix_tree_node *node);
> void *radix_tree_delete_item(struct radix_tree_root *, unsigned long, void *);
> diff --git a/lib/radix-tree.c b/lib/radix-tree.c
> index 8e6d552c40dd..ddc6facbf4da 100644
> --- a/lib/radix-tree.c
> +++ b/lib/radix-tree.c
> @@ -220,10 +220,10 @@ static void dump_node(struct radix_tree_node *node, unsigned long index)
> {
> unsigned long i;
>
> - pr_debug("radix node: %p offset %d tags %lx %lx %lx shift %d count %d parent %p\n",
> + pr_debug("radix node: %p offset %d tags %lx %lx %lx shift %d count %d exceptional %d parent %p\n",
> node, node->offset,
> node->tags[0][0], node->tags[1][0], node->tags[2][0],
> - node->shift, node->count, node->parent);
> + node->shift, node->count, node->exceptional, node->parent);
>
> for (i = 0; i < RADIX_TREE_MAP_SIZE; i++) {
> unsigned long first = index | (i << node->shift);
> @@ -522,8 +522,13 @@ static int radix_tree_extend(struct radix_tree_root *root,
> node->offset = 0;
> node->count = 1;
> node->parent = NULL;
> - if (radix_tree_is_internal_node(slot))
> + if (radix_tree_is_internal_node(slot)) {
> entry_to_node(slot)->parent = node;
> + } else {
> + /* Moving an exceptional root->rnode to a node */
> + if (radix_tree_exceptional_entry(slot))
> + node->exceptional = 1;
> + }
> node->slots[0] = slot;
> slot = node_to_entry(node);
> rcu_assign_pointer(root->rnode, slot);
> @@ -649,6 +654,8 @@ int __radix_tree_insert(struct radix_tree_root *root, unsigned long index,
> if (node) {
> unsigned offset = get_slot_offset(node, slot);
> node->count++;
> + if (radix_tree_exceptional_entry(item))
> + node->exceptional++;
> BUG_ON(tag_get(node, 0, offset));
> BUG_ON(tag_get(node, 1, offset));
> BUG_ON(tag_get(node, 2, offset));
> @@ -747,6 +754,41 @@ void *radix_tree_lookup(struct radix_tree_root *root, unsigned long index)
> EXPORT_SYMBOL(radix_tree_lookup);
>
> /**
> + * __radix_tree_replace - replace item in a slot
> + * @root: radix tree root
> + * @node: pointer to tree node
> + * @slot: pointer to slot in @node
> + * @item: new item to store in the slot.
> + *
> + * For use with __radix_tree_lookup(). Caller must hold tree write locked
> + * across slot lookup and replacement.
> + */
> +void __radix_tree_replace(struct radix_tree_root *root,
> + struct radix_tree_node *node,
> + void **slot, void *item)
> +{
> + void *old = rcu_dereference_raw(*slot);
> + int count, exceptional;
> +
> + WARN_ON_ONCE(radix_tree_is_internal_node(item));
> +
> + count = !!item - !!old;
> + exceptional = !!radix_tree_exceptional_entry(item) -
> + !!radix_tree_exceptional_entry(old);
> +
> + WARN_ONCE(!node && slot != (void **)&root->rnode &&
> + (count || exceptional),
> + "Unaccounted slot replacement: old=%p item=%p count=%d exceptional=%d\n",
> + old, item, count, exceptional);
> +
> + if (node) {
> + node->count += count;
> + node->exceptional += exceptional;
> + }
> + rcu_assign_pointer(*slot, item);
> +}
> +
> +/**
> * radix_tree_tag_set - set a tag on a radix tree node
> * @root: radix tree root
> * @index: index key
> @@ -1561,6 +1603,8 @@ void *radix_tree_delete_item(struct radix_tree_root *root,
> delete_sibling_entries(node, node_to_entry(slot), offset);
> node->slots[offset] = NULL;
> node->count--;
> + if (radix_tree_exceptional_entry(entry))
> + node->exceptional--;
>
> __radix_tree_delete_node(root, node);
>
> diff --git a/mm/shmem.c b/mm/shmem.c
> index ad7813d73ea7..7f3a08df25c9 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -300,18 +300,18 @@ void shmem_uncharge(struct inode *inode, long pages)
> static int shmem_radix_tree_replace(struct address_space *mapping,
> pgoff_t index, void *expected, void *replacement)
> {
> + struct radix_tree_node *node;
> void **pslot;
> void *item;
>
> VM_BUG_ON(!expected);
> VM_BUG_ON(!replacement);
> - pslot = radix_tree_lookup_slot(&mapping->page_tree, index);
> - if (!pslot)
> + item = __radix_tree_lookup(&mapping->page_tree, index, &node, &pslot);
> + if (!item)
> return -ENOENT;
> - item = radix_tree_deref_slot_protected(pslot, &mapping->tree_lock);
> if (item != expected)
> return -ENOENT;
> - radix_tree_replace_slot(pslot, replacement);
> + __radix_tree_replace(&mapping->page_tree, node, pslot, replacement);
> return 0;
> }
>
> --
> 2.10.1
>
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
WARNING: multiple messages have this Message-ID (diff)
From: Jan Kara <jack@suse.cz>
To: Johannes Weiner <hannes@cmpxchg.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Linus Torvalds <torvalds@linux-foundation.org>,
Jan Kara <jack@suse.cz>,
"Kirill A. Shutemov" <kirill@shutemov.name>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org,
kernel-team@fb.com
Subject: Re: [PATCH 3/6] lib: radix-tree: native accounting of exceptional entries
Date: Tue, 8 Nov 2016 11:08:41 +0100 [thread overview]
Message-ID: <20161108100841.GJ32353@quack2.suse.cz> (raw)
In-Reply-To: <20161107190741.3619-4-hannes@cmpxchg.org>
On Mon 07-11-16 14:07:38, Johannes Weiner wrote:
> The way the page cache is sneaking shadow entries of evicted pages
> into the radix tree past the node entry accounting and tracking them
> manually in the upper bits of node->count is fraught with problems.
>
> These shadow entries are marked in the tree as exceptional entries,
> which are a native concept to the radix tree. Maintain an explicit
> counter of exceptional entries in the radix tree node. Subsequent
> patches will switch shadow entry tracking over to that counter.
>
> DAX and shmem are the other users of exceptional entries. Since slot
> replacements that change the entry type from regular to exceptional
> must now be accounted, introduce a __radix_tree_replace() function
> that does replacement and accounting, and switch DAX and shmem over.
>
> The increase in radix tree node size is temporary. The last patch will
> transition from the shadow count in the upper bits of node->count to
> the new exceptional entry counter and shrink node->count to one byte.
>
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
The patch looks good to me. You can add:
Reviewed-by: Jan Kara <jack@suse.cz>
Honza
> ---
> fs/dax.c | 5 +++--
> include/linux/radix-tree.h | 10 +++++++---
> lib/radix-tree.c | 50 +++++++++++++++++++++++++++++++++++++++++++---
> mm/shmem.c | 8 ++++----
> 4 files changed, 61 insertions(+), 12 deletions(-)
>
> diff --git a/fs/dax.c b/fs/dax.c
> index 014defd2e744..db78bae0dc0f 100644
> --- a/fs/dax.c
> +++ b/fs/dax.c
> @@ -643,12 +643,13 @@ static void *dax_insert_mapping_entry(struct address_space *mapping,
> }
> mapping->nrexceptional++;
> } else {
> + struct radix_tree_node *node;
> void **slot;
> void *ret;
>
> - ret = __radix_tree_lookup(page_tree, index, NULL, &slot);
> + ret = __radix_tree_lookup(page_tree, index, &node, &slot);
> WARN_ON_ONCE(ret != entry);
> - radix_tree_replace_slot(slot, new_entry);
> + __radix_tree_replace(page_tree, node, slot, new_entry);
> }
> if (vmf->flags & FAULT_FLAG_WRITE)
> radix_tree_tag_set(page_tree, index, PAGECACHE_TAG_DIRTY);
> diff --git a/include/linux/radix-tree.h b/include/linux/radix-tree.h
> index af3581b8a451..7ced8a70cc8b 100644
> --- a/include/linux/radix-tree.h
> +++ b/include/linux/radix-tree.h
> @@ -85,9 +85,10 @@ static inline bool radix_tree_is_internal_node(void *ptr)
> #define RADIX_TREE_COUNT_MASK ((1UL << RADIX_TREE_COUNT_SHIFT) - 1)
>
> struct radix_tree_node {
> - unsigned char shift; /* Bits remaining in each slot */
> - unsigned char offset; /* Slot offset in parent */
> - unsigned int count;
> + unsigned char shift; /* Bits remaining in each slot */
> + unsigned char offset; /* Slot offset in parent */
> + unsigned int count; /* Total entry count */
> + unsigned char exceptional; /* Exceptional entry count */
> union {
> struct {
> /* Used when ascending tree */
> @@ -276,6 +277,9 @@ void *__radix_tree_lookup(struct radix_tree_root *root, unsigned long index,
> struct radix_tree_node **nodep, void ***slotp);
> void *radix_tree_lookup(struct radix_tree_root *, unsigned long);
> void **radix_tree_lookup_slot(struct radix_tree_root *, unsigned long);
> +void __radix_tree_replace(struct radix_tree_root *root,
> + struct radix_tree_node *node,
> + void **slot, void *item);
> bool __radix_tree_delete_node(struct radix_tree_root *root,
> struct radix_tree_node *node);
> void *radix_tree_delete_item(struct radix_tree_root *, unsigned long, void *);
> diff --git a/lib/radix-tree.c b/lib/radix-tree.c
> index 8e6d552c40dd..ddc6facbf4da 100644
> --- a/lib/radix-tree.c
> +++ b/lib/radix-tree.c
> @@ -220,10 +220,10 @@ static void dump_node(struct radix_tree_node *node, unsigned long index)
> {
> unsigned long i;
>
> - pr_debug("radix node: %p offset %d tags %lx %lx %lx shift %d count %d parent %p\n",
> + pr_debug("radix node: %p offset %d tags %lx %lx %lx shift %d count %d exceptional %d parent %p\n",
> node, node->offset,
> node->tags[0][0], node->tags[1][0], node->tags[2][0],
> - node->shift, node->count, node->parent);
> + node->shift, node->count, node->exceptional, node->parent);
>
> for (i = 0; i < RADIX_TREE_MAP_SIZE; i++) {
> unsigned long first = index | (i << node->shift);
> @@ -522,8 +522,13 @@ static int radix_tree_extend(struct radix_tree_root *root,
> node->offset = 0;
> node->count = 1;
> node->parent = NULL;
> - if (radix_tree_is_internal_node(slot))
> + if (radix_tree_is_internal_node(slot)) {
> entry_to_node(slot)->parent = node;
> + } else {
> + /* Moving an exceptional root->rnode to a node */
> + if (radix_tree_exceptional_entry(slot))
> + node->exceptional = 1;
> + }
> node->slots[0] = slot;
> slot = node_to_entry(node);
> rcu_assign_pointer(root->rnode, slot);
> @@ -649,6 +654,8 @@ int __radix_tree_insert(struct radix_tree_root *root, unsigned long index,
> if (node) {
> unsigned offset = get_slot_offset(node, slot);
> node->count++;
> + if (radix_tree_exceptional_entry(item))
> + node->exceptional++;
> BUG_ON(tag_get(node, 0, offset));
> BUG_ON(tag_get(node, 1, offset));
> BUG_ON(tag_get(node, 2, offset));
> @@ -747,6 +754,41 @@ void *radix_tree_lookup(struct radix_tree_root *root, unsigned long index)
> EXPORT_SYMBOL(radix_tree_lookup);
>
> /**
> + * __radix_tree_replace - replace item in a slot
> + * @root: radix tree root
> + * @node: pointer to tree node
> + * @slot: pointer to slot in @node
> + * @item: new item to store in the slot.
> + *
> + * For use with __radix_tree_lookup(). Caller must hold tree write locked
> + * across slot lookup and replacement.
> + */
> +void __radix_tree_replace(struct radix_tree_root *root,
> + struct radix_tree_node *node,
> + void **slot, void *item)
> +{
> + void *old = rcu_dereference_raw(*slot);
> + int count, exceptional;
> +
> + WARN_ON_ONCE(radix_tree_is_internal_node(item));
> +
> + count = !!item - !!old;
> + exceptional = !!radix_tree_exceptional_entry(item) -
> + !!radix_tree_exceptional_entry(old);
> +
> + WARN_ONCE(!node && slot != (void **)&root->rnode &&
> + (count || exceptional),
> + "Unaccounted slot replacement: old=%p item=%p count=%d exceptional=%d\n",
> + old, item, count, exceptional);
> +
> + if (node) {
> + node->count += count;
> + node->exceptional += exceptional;
> + }
> + rcu_assign_pointer(*slot, item);
> +}
> +
> +/**
> * radix_tree_tag_set - set a tag on a radix tree node
> * @root: radix tree root
> * @index: index key
> @@ -1561,6 +1603,8 @@ void *radix_tree_delete_item(struct radix_tree_root *root,
> delete_sibling_entries(node, node_to_entry(slot), offset);
> node->slots[offset] = NULL;
> node->count--;
> + if (radix_tree_exceptional_entry(entry))
> + node->exceptional--;
>
> __radix_tree_delete_node(root, node);
>
> diff --git a/mm/shmem.c b/mm/shmem.c
> index ad7813d73ea7..7f3a08df25c9 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -300,18 +300,18 @@ void shmem_uncharge(struct inode *inode, long pages)
> static int shmem_radix_tree_replace(struct address_space *mapping,
> pgoff_t index, void *expected, void *replacement)
> {
> + struct radix_tree_node *node;
> void **pslot;
> void *item;
>
> VM_BUG_ON(!expected);
> VM_BUG_ON(!replacement);
> - pslot = radix_tree_lookup_slot(&mapping->page_tree, index);
> - if (!pslot)
> + item = __radix_tree_lookup(&mapping->page_tree, index, &node, &pslot);
> + if (!item)
> return -ENOENT;
> - item = radix_tree_deref_slot_protected(pslot, &mapping->tree_lock);
> if (item != expected)
> return -ENOENT;
> - radix_tree_replace_slot(pslot, replacement);
> + __radix_tree_replace(&mapping->page_tree, node, pslot, replacement);
> return 0;
> }
>
> --
> 2.10.1
>
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
next prev parent reply other threads:[~2016-11-08 10:08 UTC|newest]
Thread overview: 50+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-11-07 19:07 [PATCH 0/6] mm: workingset: radix tree subtleties & single-page file refaults Johannes Weiner
2016-11-07 19:07 ` Johannes Weiner
2016-11-07 19:07 ` [PATCH 1/6] mm: khugepaged: fix radix tree node leak in shmem collapse error path Johannes Weiner
2016-11-07 19:07 ` Johannes Weiner
2016-11-08 9:53 ` Jan Kara
2016-11-08 9:53 ` Jan Kara
2016-11-08 16:12 ` Johannes Weiner
2016-11-08 16:12 ` Johannes Weiner
2016-11-09 7:41 ` Jan Kara
2016-11-09 7:41 ` Jan Kara
2016-11-11 10:59 ` Kirill A. Shutemov
2016-11-11 10:59 ` Kirill A. Shutemov
2016-11-11 12:22 ` Jan Kara
2016-11-11 12:22 ` Jan Kara
2016-11-11 16:37 ` Kirill A. Shutemov
2016-11-11 16:37 ` Kirill A. Shutemov
2016-11-14 8:07 ` Jan Kara
2016-11-14 8:07 ` Jan Kara
2016-11-14 14:29 ` Kirill A. Shutemov
2016-11-14 14:29 ` Kirill A. Shutemov
2016-11-14 15:52 ` Johannes Weiner
2016-11-14 15:52 ` Johannes Weiner
2016-11-14 16:48 ` Johannes Weiner
2016-11-14 16:48 ` Johannes Weiner
2016-11-14 19:40 ` Kirill A. Shutemov
2016-11-14 19:40 ` Kirill A. Shutemov
2016-11-15 14:00 ` Johannes Weiner
2016-11-15 14:00 ` Johannes Weiner
2016-11-07 19:07 ` [PATCH 2/6] mm: workingset: turn shadow node shrinker bugs into warnings Johannes Weiner
2016-11-07 19:07 ` Johannes Weiner
2016-11-08 9:57 ` Jan Kara
2016-11-08 9:57 ` Jan Kara
2016-11-07 19:07 ` [PATCH 3/6] lib: radix-tree: native accounting of exceptional entries Johannes Weiner
2016-11-07 19:07 ` Johannes Weiner
2016-11-08 10:08 ` Jan Kara [this message]
2016-11-08 10:08 ` Jan Kara
2016-11-07 19:07 ` [PATCH 4/6] lib: radix-tree: check accounting of existing slot replacement users Johannes Weiner
2016-11-07 19:07 ` Johannes Weiner
2016-11-08 10:12 ` Jan Kara
2016-11-08 10:12 ` Jan Kara
2016-11-07 19:07 ` [PATCH 5/6] mm: workingset: switch shadow entry tracking to radix tree exceptional counting Johannes Weiner
2016-11-07 19:07 ` Johannes Weiner
2016-11-08 10:27 ` Jan Kara
2016-11-08 10:27 ` Jan Kara
2016-11-08 19:30 ` Johannes Weiner
2016-11-08 19:30 ` Johannes Weiner
2016-11-07 19:07 ` [PATCH 6/6] mm: workingset: restore refault tracking for single-page files Johannes Weiner
2016-11-07 19:07 ` Johannes Weiner
2016-11-08 10:31 ` Jan Kara
2016-11-08 10:31 ` Jan Kara
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20161108100841.GJ32353@quack2.suse.cz \
--to=jack@suse.cz \
--cc=akpm@linux-foundation.org \
--cc=hannes@cmpxchg.org \
--cc=kernel-team@fb.com \
--cc=kirill@shutemov.name \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=torvalds@linux-foundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.