From: Kuan-Wei Chiu <visitorckw@gmail.com>
To: colyli@suse.de, kent.overstreet@linux.dev, msakai@redhat.com,
corbet@lwn.net, peterz@infradead.org, mingo@redhat.com,
acme@kernel.org, namhyung@kernel.org, akpm@linux-foundation.org
Cc: mark.rutland@arm.com, alexander.shishkin@linux.intel.com,
jolsa@kernel.org, irogers@google.com, adrian.hunter@intel.com,
kan.liang@linux.intel.com, jserv@ccns.ncku.edu.tw,
linux-kernel@vger.kernel.org, linux-bcache@vger.kernel.org,
dm-devel@lists.linux.dev, linux-bcachefs@vger.kernel.org,
linux-perf-users@vger.kernel.org, linux-doc@vger.kernel.org,
Kuan-Wei Chiu <visitorckw@gmail.com>
Subject: [PATCH 2/3] lib min_heap: Optimize min heap by prescaling counters for better performance
Date: Mon, 14 Oct 2024 02:47:02 +0800 [thread overview]
Message-ID: <20241013184703.659652-3-visitorckw@gmail.com> (raw)
In-Reply-To: <20241013184703.659652-1-visitorckw@gmail.com>
Improve the efficiency of the min heap by prescaling counters,
eliminating the need to repeatedly compute 'index * element_size' when
accessing elements. By doing so, we avoid the overhead associated with
recalculating the byte offset for each heap operation.
However, with prescaling, the calculation for the parent element's
location is no longer as simple as '(i - 1) / 2'. To address this, we
copy the parent function from 'lib/sort.c', which calculates the parent
offset in a branchless manner without using any division instructions.
This optimization should result in a more efficient heap implementation
by reducing the computational overhead of finding parent and child
offsets.
Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com>
---
Tested with test_min_heap module.
include/linux/min_heap.h | 73 +++++++++++++++++++++++++++-------------
1 file changed, 49 insertions(+), 24 deletions(-)
diff --git a/include/linux/min_heap.h b/include/linux/min_heap.h
index 0abb21173979..bee28d7b6efc 100644
--- a/include/linux/min_heap.h
+++ b/include/linux/min_heap.h
@@ -73,38 +73,61 @@ bool __min_heap_full_inline(min_heap_char *heap)
#define min_heap_full_inline(_heap) \
__min_heap_full_inline((min_heap_char *)_heap)
+/**
+ * parent - given the offset of the child, find the offset of the parent.
+ * @i: the offset of the heap element whose parent is sought. Non-zero.
+ * @lsbit: a precomputed 1-bit mask, equal to "size & -size"
+ * @size: size of each element
+ *
+ * In terms of array indexes, the parent of element j = @i/@size is simply
+ * (j-1)/2. But when working in byte offsets, we can't use implicit
+ * truncation of integer divides.
+ *
+ * Fortunately, we only need one bit of the quotient, not the full divide.
+ * @size has a least significant bit. That bit will be clear if @i is
+ * an even multiple of @size, and set if it's an odd multiple.
+ *
+ * Logically, we're doing "if (i & lsbit) i -= size;", but since the
+ * branch is unpredictable, it's done with a bit of clever branch-free
+ * code instead.
+ */
+__attribute_const__ __always_inline
+static size_t parent(size_t i, unsigned int lsbit, size_t size)
+{
+ i -= size;
+ i -= size & -(i & lsbit);
+ return i / 2;
+}
+
/* Sift the element at pos down the heap. */
static __always_inline
void __min_heap_sift_down_inline(min_heap_char *heap, int pos, size_t elem_size,
const struct min_heap_callbacks *func, void *args)
{
- void *left, *right;
+ const unsigned long lsbit = elem_size & -elem_size;
void *data = heap->data;
- void *root = data + pos * elem_size;
- int i = pos, j;
+ /* pre-scale counters for performance */
+ size_t a = pos * elem_size;
+ size_t b, c, d;
+ size_t n = heap->nr * elem_size;
/* Find the sift-down path all the way to the leaves. */
- for (;;) {
- if (i * 2 + 2 >= heap->nr)
- break;
- left = data + (i * 2 + 1) * elem_size;
- right = data + (i * 2 + 2) * elem_size;
- i = func->less(left, right, args) ? i * 2 + 1 : i * 2 + 2;
- }
+ for (b = a; c = 2 * b + elem_size, (d = c + elem_size) < n;)
+ b = func->less(data + c, data + d, args) ? c : d;
/* Special case for the last leaf with no sibling. */
- if (i * 2 + 2 == heap->nr)
- i = i * 2 + 1;
+ if (d == n)
+ b = c;
/* Backtrack to the correct location. */
- while (i != pos && func->less(root, data + i * elem_size, args))
- i = (i - 1) / 2;
+ while (b != a && func->less(data + a, data + b, args))
+ b = parent(b, lsbit, elem_size);
/* Shift the element into its correct place. */
- j = i;
- while (i != pos) {
- i = (i - 1) / 2;
- func->swp(data + i * elem_size, data + j * elem_size, args);
+ c = b;
+ while (b != a) {
+ b = parent(b, lsbit, elem_size);
+ func->swp(data + b, data + c, args);
}
}
@@ -117,15 +140,17 @@ static __always_inline
void __min_heap_sift_up_inline(min_heap_char *heap, size_t elem_size, size_t idx,
const struct min_heap_callbacks *func, void *args)
{
+ const unsigned long lsbit = elem_size & -elem_size;
void *data = heap->data;
- size_t parent;
+ /* pre-scale counters for performance */
+ size_t a = idx * elem_size, b;
- while (idx) {
- parent = (idx - 1) / 2;
- if (func->less(data + parent * elem_size, data + idx * elem_size, args))
+ while (a) {
+ b = parent(a, lsbit, elem_size);
+ if (func->less(data + b, data + a, args))
break;
- func->swp(data + parent * elem_size, data + idx * elem_size, args);
- idx = parent;
+ func->swp(data + a, data + b, args);
+ a = b;
}
}
--
2.34.1
next prev parent reply other threads:[~2024-10-13 18:47 UTC|newest]
Thread overview: 18+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-10-13 18:47 [PATCH 0/3] Enhance min heap API with non-inline functions and optimizations Kuan-Wei Chiu
2024-10-13 18:47 ` [PATCH 1/3] lib/min_heap: Introduce non-inline versions of min heap API functions Kuan-Wei Chiu
2024-10-14 8:13 ` Peter Zijlstra
2024-10-14 8:20 ` Peter Zijlstra
2024-10-14 9:41 ` Kuan-Wei Chiu
2024-10-17 3:26 ` Kent Overstreet
2024-10-17 9:55 ` Peter Zijlstra
2024-10-17 10:46 ` Kent Overstreet
2024-10-20 5:15 ` Kuan-Wei Chiu
2024-10-13 18:47 ` Kuan-Wei Chiu [this message]
2024-10-13 18:47 ` [PATCH 3/3] Documentation/core-api: Add min heap API introduction Kuan-Wei Chiu
2024-10-14 8:55 ` Matthew Wilcox
2024-10-14 10:04 ` Kuan-Wei Chiu
2024-10-13 23:05 ` [PATCH 0/3] Enhance min heap API with non-inline functions and optimizations Kent Overstreet
2024-10-13 23:27 ` Kuan-Wei Chiu
2024-10-14 2:08 ` Kent Overstreet
2024-10-14 1:18 ` Coly Li
2024-10-14 1:23 ` Kent Overstreet
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20241013184703.659652-3-visitorckw@gmail.com \
--to=visitorckw@gmail.com \
--cc=acme@kernel.org \
--cc=adrian.hunter@intel.com \
--cc=akpm@linux-foundation.org \
--cc=alexander.shishkin@linux.intel.com \
--cc=colyli@suse.de \
--cc=corbet@lwn.net \
--cc=dm-devel@lists.linux.dev \
--cc=irogers@google.com \
--cc=jolsa@kernel.org \
--cc=jserv@ccns.ncku.edu.tw \
--cc=kan.liang@linux.intel.com \
--cc=kent.overstreet@linux.dev \
--cc=linux-bcache@vger.kernel.org \
--cc=linux-bcachefs@vger.kernel.org \
--cc=linux-doc@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-perf-users@vger.kernel.org \
--cc=mark.rutland@arm.com \
--cc=mingo@redhat.com \
--cc=msakai@redhat.com \
--cc=namhyung@kernel.org \
--cc=peterz@infradead.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).