From: Kuan-Wei Chiu <visitorckw@gmail.com>
To: colyli@suse.de, kent.overstreet@linux.dev
Cc: bfoster@redhat.com, jserv@ccns.ncku.edu.tw,
linux-bcache@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-bcachefs@vger.kernel.org,
Kuan-Wei Chiu <visitorckw@gmail.com>
Subject: [PATCH 3/5] bcachefs: Optimize sort_cmp_size() using bottom-up heapsort
Date: Sun, 21 Jan 2024 23:36:47 +0800 [thread overview]
Message-ID: <20240121153649.2733274-4-visitorckw@gmail.com> (raw)
In-Reply-To: <20240121153649.2733274-1-visitorckw@gmail.com>
This optimization reduces the average number of comparisons required
from 2*n*log2(n) - 3*n + o(n) to n*log2(n) + 0.37*n + o(n). When n is
sufficiently large, it results in approximately 50% fewer comparisons.
Currently, sort_cmp_size employs the textbook version of heapsort,
where during the heapify process, each level requires two comparisons
to determine the maximum among three elements. In contrast, the
bottom-up heapsort, during heapify, only compares two children at each
level until reaching a leaf node. Then, it backtracks from the leaf
node to find the correct position. Since heapify typically continues
until very close to the leaf node, the standard heapify requires about
2*log2(n) comparisons, while the bottom-up variant only needs log2(n)
comparisons.
Note: This patch depends on patch "bcachefs: Introduce parent function
for sort_cmp_size()".
The experimental data presented below is based on an array generated
by get_random_u32().
| N | comparisons(old) | comparisons(new) | time(old) | time(new) |
|-------|------------------|------------------|-----------|-----------|
| 10000 | 235498 | 136592 | 17954 us | 14834 us |
| 20000 | 510666 | 293254 | 23549 us | 18364 us |
| 30000 | 800461 | 457351 | 33361 us | 19493 us |
| 40000 | 1101317 | 626661 | 33672 us | 26810 us |
| 50000 | 1409743 | 799745 | 42634 us | 34694 us |
| 60000 | 1721165 | 974737 | 51414 us | 41367 us |
| 70000 | 2037972 | 1152111 | 63084 us | 49146 us |
| 80000 | 2362590 | 1333270 | 73802 us | 54715 us |
| 90000 | 2690155 | 1516148 | 82693 us | 63070 us |
| 100000| 3019730 | 1699757 | 88981 us | 70367 us |
Refs:
BOTTOM-UP-HEAPSORT, a new variant of HEAPSORT beating, on an average,
QUICKSORT (if n is not very small)
Ingo Wegener
Theoretical Computer Science, 118(1); Pages 81-98, 13 September 1993
https://doi.org/10.1016/0304-3975(93)90364-Y
Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com>
---
This patch has undergone unit testing and micro benchmarking using the
following code [1].
[1]:
static long long int cmp_count = 0;
static int mycmp(const void *a, const void *b, size_t size)
{
u32 _a = *(u32 *)a;
u32 _b = *(u32 *)b;
cmp_count++;
if (_a < _b)
return -1;
else if (_a > _b)
return 1;
else
return 0;
}
static int test(void)
{
size_t N, i;
ktime_t start, end;
s64 delta;
u32 *arr;
for (N = 10000; N <= 100000; N += 10000) {
arr = kmalloc_array(N, sizeof(u32), GFP_KERNEL);
cmp_count = 0;
for (i = 0; i < N; i++)
arr[i] = get_random_u32();
start = ktime_get();
sort_cmp_size(arr, N, sizeof(u32), mycmp, NULL);
end = ktime_get();
delta = ktime_us_delta(end, start);
printk(KERN_INFO "time: %lld\n", delta);
printk(KERN_INFO "comparisons: %lld\n", cmp_count);
for (i = 0; i < N - 1; i++) {
if (arr[i] > arr[i+1]) {
kfree(arr);
return -1;
}
}
kfree(arr);
}
return 0;
}
fs/bcachefs/util.c | 52 +++++++++++++++++++++++++++++++---------------
1 file changed, 35 insertions(+), 17 deletions(-)
diff --git a/fs/bcachefs/util.c b/fs/bcachefs/util.c
index f5bbf96df2ce..4dacb2b9a0a5 100644
--- a/fs/bcachefs/util.c
+++ b/fs/bcachefs/util.c
@@ -979,7 +979,8 @@ void sort_cmp_size(void *base, size_t num, size_t size,
void (*swap_func)(void *, void *, size_t size))
{
/* pre-scale counters for performance */
- int i = (num/2 - 1) * size, n = num * size, c, r;
+ int i = (num / 2 - 1) * size, n = num * size, j, k;
+ const size_t lsbit = size & -size;
if (!swap_func) {
if (size == 4 && alignment_ok(base, 4))
@@ -992,28 +993,45 @@ void sort_cmp_size(void *base, size_t num, size_t size,
/* heapify */
for ( ; i >= 0; i -= size) {
- for (r = i; r * 2 + size < n; r = c) {
- c = r * 2 + size;
- if (c < n - size &&
- cmp_func(base + c, base + c + size, size) < 0)
- c += size;
- if (cmp_func(base + r, base + c, size) >= 0)
- break;
- swap_func(base + r, base + c, size);
+ /* Find the sift-down path all the way to the leaves. */
+ for (j = i; k = j * 2 + size, k + size < n;)
+ j = cmp_func(base + k, base + k + size, size) > 0 ? k : k + size;
+
+ /* Special case for the last leaf with no sibling. */
+ if (j * 2 + size * 2 == n)
+ j = j * 2 + size;
+
+ /* Backtrack to the correct location. */
+ while (j != i && cmp_func(base + i, base + j, size) >= 0)
+ j = parent(j, lsbit, size);
+
+ /* Shift the element into its correct place. */
+ for (k = j; j != i;) {
+ j = parent(j, lsbit, size);
+ swap_func(base + j, base + k, size);
}
}
/* sort */
for (i = n - size; i > 0; i -= size) {
swap_func(base, base + i, size);
- for (r = 0; r * 2 + size < i; r = c) {
- c = r * 2 + size;
- if (c < i - size &&
- cmp_func(base + c, base + c + size, size) < 0)
- c += size;
- if (cmp_func(base + r, base + c, size) >= 0)
- break;
- swap_func(base + r, base + c, size);
+
+ /* Find the sift-down path all the way to the leaves. */
+ for (j = 0; k = j * 2 + size, k + size < i;)
+ j = cmp_func(base + k, base + k + size, size) > 0 ? k : k + size;
+
+ /* Special case for the last leaf with no sibling. */
+ if (j * 2 + size * 2 == i)
+ j = j * 2 + size;
+
+ /* Backtrack to the correct location. */
+ while (j && cmp_func(base, base + j, size) >= 0)
+ j = parent(j, lsbit, size);
+
+ /* Shift the element into its correct place. */
+ for (k = j; j;) {
+ j = parent(j, lsbit, size);
+ swap_func(base + j, base + k, size);
}
}
}
--
2.25.1
next prev parent reply other threads:[~2024-01-21 15:37 UTC|newest]
Thread overview: 15+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-01-21 15:36 [PATCH 0/5] Optimize number of comparisons for heap/heapsort implementaion Kuan-Wei Chiu
2024-01-21 15:36 ` [PATCH 1/5] bcachefs: Optimize eytzinger0_sort() using bottom-up heapsort Kuan-Wei Chiu
2024-01-21 15:36 ` [PATCH 2/5] bcachefs: Introduce parent function for sort_cmp_size() Kuan-Wei Chiu
2024-01-21 16:17 ` Kent Overstreet
2024-01-21 17:05 ` Kuan-Wei Chiu
2024-01-21 17:37 ` Kent Overstreet
2024-01-21 15:36 ` Kuan-Wei Chiu [this message]
2024-01-21 15:36 ` [PATCH 4/5] bcachefs: Optimize number of comparisons in heap_sift_down Kuan-Wei Chiu
2024-01-21 15:36 ` [PATCH 5/5] bcache: Optimize number of comparisons in heap_sift Kuan-Wei Chiu
2024-01-21 16:21 ` [PATCH 0/5] Optimize number of comparisons for heap/heapsort implementaion Kent Overstreet
2024-01-21 16:55 ` Kuan-Wei Chiu
2024-01-21 17:41 ` Kent Overstreet
2024-01-22 15:06 ` Kuan-Wei Chiu
2024-01-22 16:06 ` Kent Overstreet
2024-01-22 16:23 ` Kuan-Wei Chiu
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20240121153649.2733274-4-visitorckw@gmail.com \
--to=visitorckw@gmail.com \
--cc=bfoster@redhat.com \
--cc=colyli@suse.de \
--cc=jserv@ccns.ncku.edu.tw \
--cc=kent.overstreet@linux.dev \
--cc=linux-bcache@vger.kernel.org \
--cc=linux-bcachefs@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.