From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
To: Andrea Arcangeli <aarcange@redhat.com>,
Andrew Morton <akpm@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>,
Hugh Dickins <hughd@google.com>,
Wu Fengguang <fengguang.wu@intel.com>, Jan Kara <jack@suse.cz>,
Mel Gorman <mgorman@suse.de>,
linux-mm@kvack.org, Andi Kleen <ak@linux.intel.com>,
Matthew Wilcox <matthew.r.wilcox@intel.com>,
"Kirill A. Shutemov" <kirill@shutemov.name>,
Hillf Danton <dhillf@gmail.com>, Dave Hansen <dave@sr71.net>,
linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Subject: [PATCHv4 04/39] radix-tree: implement preload for multiple contiguous elements
Date: Sun, 12 May 2013 04:23:01 +0300 [thread overview]
Message-ID: <1368321816-17719-5-git-send-email-kirill.shutemov@linux.intel.com> (raw)
In-Reply-To: <1368321816-17719-1-git-send-email-kirill.shutemov@linux.intel.com>
From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
The radix tree is variable-height, so an insert operation not only has
to build the branch to its corresponding item, it also has to build the
branch to existing items if the size has to be increased (by
radix_tree_extend).
The worst case is a zero height tree with just a single item at index 0,
and then inserting an item at index ULONG_MAX. This requires 2 new branches
of RADIX_TREE_MAX_PATH size to be created, with only the root node shared.
The radix tree is usually protected by a spinlock, which means we want to
pre-allocate the required memory before taking the lock.
Currently radix_tree_preload() only guarantees enough nodes to insert
one element. It's a hard limit. For the transparent huge page cache we want
to insert HPAGE_PMD_NR (512 on x86-64) entries into an address_space at once.
This patch introduces radix_tree_preload_count(). It allows preallocating
enough nodes to insert a number of *contiguous* elements.
The worst case for adding N contiguous items is adding entries at indexes
(ULONG_MAX - N) to ULONG_MAX: it requires the nodes needed to insert a single
worst-case item, plus extra nodes for each boundary crossed from one leaf
node to the next.
Preload uses a per-CPU array to store nodes. The total cost of preload is
"array size" * sizeof(void*) * NR_CPUS. We want to increase the array size
to be able to handle 512 entries at once.
The size of the array depends on system bitness and on RADIX_TREE_MAP_SHIFT.
We have three possible RADIX_TREE_MAP_SHIFT:
#ifdef __KERNEL__
#define RADIX_TREE_MAP_SHIFT (CONFIG_BASE_SMALL ? 4 : 6)
#else
#define RADIX_TREE_MAP_SHIFT 3 /* For more stressful testing */
#endif
On 64-bit system:
For RADIX_TREE_MAP_SHIFT=3, old array size is 43, new is 107.
For RADIX_TREE_MAP_SHIFT=4, old array size is 31, new is 63.
For RADIX_TREE_MAP_SHIFT=6, old array size is 21, new is 29.
On 32-bit system:
For RADIX_TREE_MAP_SHIFT=3, old array size is 21, new is 85.
For RADIX_TREE_MAP_SHIFT=4, old array size is 15, new is 47.
For RADIX_TREE_MAP_SHIFT=6, old array size is 11, new is 19.
On most machines we will have RADIX_TREE_MAP_SHIFT=6.
Since only THP uses batched preload at the moment, we disable it (set the
maximum preload to 1) if !CONFIG_TRANSPARENT_HUGEPAGE_PAGECACHE. This can
be changed in the future.
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
include/linux/radix-tree.h | 11 +++++++++++
lib/radix-tree.c | 33 ++++++++++++++++++++++++++-------
2 files changed, 37 insertions(+), 7 deletions(-)
diff --git a/include/linux/radix-tree.h b/include/linux/radix-tree.h
index ffc444c..a859195 100644
--- a/include/linux/radix-tree.h
+++ b/include/linux/radix-tree.h
@@ -83,6 +83,16 @@ do { \
(root)->rnode = NULL; \
} while (0)
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE_PAGECACHE
+/*
+ * At the moment only THP uses preload for more than one item for batched
+ * pagecache manipulations.
+ */
+#define RADIX_TREE_PRELOAD_NR 512
+#else
+#define RADIX_TREE_PRELOAD_NR 1
+#endif
+
/**
* Radix-tree synchronization
*
@@ -231,6 +241,7 @@ unsigned long radix_tree_next_hole(struct radix_tree_root *root,
unsigned long radix_tree_prev_hole(struct radix_tree_root *root,
unsigned long index, unsigned long max_scan);
int radix_tree_preload(gfp_t gfp_mask);
+int radix_tree_preload_count(unsigned size, gfp_t gfp_mask);
void radix_tree_init(void);
void *radix_tree_tag_set(struct radix_tree_root *root,
unsigned long index, unsigned int tag);
diff --git a/lib/radix-tree.c b/lib/radix-tree.c
index e796429..1bc352f 100644
--- a/lib/radix-tree.c
+++ b/lib/radix-tree.c
@@ -81,16 +81,24 @@ static struct kmem_cache *radix_tree_node_cachep;
* The worst case is a zero height tree with just a single item at index 0,
* and then inserting an item at index ULONG_MAX. This requires 2 new branches
* of RADIX_TREE_MAX_PATH size to be created, with only the root node shared.
+ *
+ * Worst case for adding N contiguous items is adding entries at indexes
+ * (ULONG_MAX - N) to ULONG_MAX. It requires nodes to insert single worst-case
+ * item plus extra nodes if you cross the boundary from one node to the next.
+ *
* Hence:
*/
-#define RADIX_TREE_PRELOAD_SIZE (RADIX_TREE_MAX_PATH * 2 - 1)
+#define RADIX_TREE_PRELOAD_MIN (RADIX_TREE_MAX_PATH * 2 - 1)
+#define RADIX_TREE_PRELOAD_MAX \
+ (RADIX_TREE_PRELOAD_MIN + \
+ DIV_ROUND_UP(RADIX_TREE_PRELOAD_NR - 1, RADIX_TREE_MAP_SIZE))
/*
* Per-cpu pool of preloaded nodes
*/
struct radix_tree_preload {
int nr;
- struct radix_tree_node *nodes[RADIX_TREE_PRELOAD_SIZE];
+ struct radix_tree_node *nodes[RADIX_TREE_PRELOAD_MAX];
};
static DEFINE_PER_CPU(struct radix_tree_preload, radix_tree_preloads) = { 0, };
@@ -257,29 +265,35 @@ radix_tree_node_free(struct radix_tree_node *node)
/*
* Load up this CPU's radix_tree_node buffer with sufficient objects to
- * ensure that the addition of a single element in the tree cannot fail. On
- * success, return zero, with preemption disabled. On error, return -ENOMEM
+ * ensure that the addition of *contiguous* elements in the tree cannot fail.
+ * On success, return zero, with preemption disabled. On error, return -ENOMEM
* with preemption not disabled.
*
* To make use of this facility, the radix tree must be initialised without
* __GFP_WAIT being passed to INIT_RADIX_TREE().
*/
-int radix_tree_preload(gfp_t gfp_mask)
+int radix_tree_preload_count(unsigned size, gfp_t gfp_mask)
{
struct radix_tree_preload *rtp;
struct radix_tree_node *node;
int ret = -ENOMEM;
+ int preload_target = RADIX_TREE_PRELOAD_MIN +
+ DIV_ROUND_UP(size - 1, RADIX_TREE_MAP_SIZE);
+
+ if (WARN_ONCE(size > RADIX_TREE_PRELOAD_NR,
+ "too large preload requested"))
+ return -ENOMEM;
preempt_disable();
rtp = &__get_cpu_var(radix_tree_preloads);
- while (rtp->nr < ARRAY_SIZE(rtp->nodes)) {
+ while (rtp->nr < preload_target) {
preempt_enable();
node = kmem_cache_alloc(radix_tree_node_cachep, gfp_mask);
if (node == NULL)
goto out;
preempt_disable();
rtp = &__get_cpu_var(radix_tree_preloads);
- if (rtp->nr < ARRAY_SIZE(rtp->nodes))
+ if (rtp->nr < preload_target)
rtp->nodes[rtp->nr++] = node;
else
kmem_cache_free(radix_tree_node_cachep, node);
@@ -288,6 +302,11 @@ int radix_tree_preload(gfp_t gfp_mask)
out:
return ret;
}
+
+int radix_tree_preload(gfp_t gfp_mask)
+{
+ return radix_tree_preload_count(1, gfp_mask);
+}
EXPORT_SYMBOL(radix_tree_preload);
/*
--
1.7.10.4