All of lore.kernel.org
 help / color / mirror / Atom feed
From: Kuan-Wei Chiu <visitorckw@gmail.com>
To: colyli@suse.de, kent.overstreet@linux.dev
Cc: bfoster@redhat.com, jserv@ccns.ncku.edu.tw,
	linux-bcache@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-bcachefs@vger.kernel.org,
	Kuan-Wei Chiu <visitorckw@gmail.com>
Subject: [PATCH 3/5] bcachefs: Optimize sort_cmp_size() using bottom-up heapsort
Date: Sun, 21 Jan 2024 23:36:47 +0800	[thread overview]
Message-ID: <20240121153649.2733274-4-visitorckw@gmail.com> (raw)
In-Reply-To: <20240121153649.2733274-1-visitorckw@gmail.com>

This optimization reduces the average number of comparisons required
from 2*n*log2(n) - 3*n + o(n) to n*log2(n) + 0.37*n + o(n). When n is
sufficiently large, it results in approximately 50% fewer comparisons.

Currently, sort_cmp_size employs the textbook version of heapsort,
where during the heapify process, each level requires two comparisons
to determine the maximum among three elements. In contrast, the
bottom-up heapsort, during heapify, only compares two children at each
level until reaching a leaf node. Then, it backtracks from the leaf
node to find the correct position. Since heapify typically continues
until very close to the leaf node, the standard heapify requires about
2*log2(n) comparisons, while the bottom-up variant only needs log2(n)
comparisons.

Note: This patch depends on patch "bcachefs: Introduce parent function
for sort_cmp_size()".

The experimental data presented below is based on an array generated
by get_random_u32().

|   N   | comparisons(old) | comparisons(new) | time(old) | time(new) |
|-------|------------------|------------------|-----------|-----------|
| 10000 |     235498       |     136592       |  17954 us |  14834 us |
| 20000 |     510666       |     293254       |  23549 us |  18364 us |
| 30000 |     800461       |     457351       |  33361 us |  19493 us |
| 40000 |    1101317       |     626661       |  33672 us |  26810 us |
| 50000 |    1409743       |     799745       |  42634 us |  34694 us |
| 60000 |    1721165       |     974737       |  51414 us |  41367 us |
| 70000 |    2037972       |    1152111       |  63084 us |  49146 us |
| 80000 |    2362590       |    1333270       |  73802 us |  54715 us |
| 90000 |    2690155       |    1516148       |  82693 us |  63070 us |
| 100000|    3019730       |    1699757       |  88981 us |  70367 us |

Refs:
  BOTTOM-UP-HEAPSORT, a new variant of HEAPSORT beating, on an average,
  QUICKSORT (if n is not very small)
  Ingo Wegener
  Theoretical Computer Science, 118(1); Pages 81-98, 13 September 1993
  https://doi.org/10.1016/0304-3975(93)90364-Y

Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com>
---
This patch has undergone unit testing and micro benchmarking using the
following code [1].

[1]:
static long long int cmp_count = 0;

static int mycmp(const void *a, const void *b, size_t size)
{
    u32 _a = *(u32 *)a;
	u32 _b = *(u32 *)b;

	cmp_count++;
	if (_a < _b)
		return -1;
	else if (_a > _b)
		return 1;
	else
		return 0;
}

static int test(void)
{
    size_t N, i;
	ktime_t start, end;
	s64 delta;
	u32 *arr;

	for (N = 10000; N <= 100000; N += 10000) {
		arr = kmalloc_array(N, sizeof(u32), GFP_KERNEL);
		cmp_count = 0;

		for (i = 0; i < N; i++)
			arr[i] = get_random_u32();
		
		start = ktime_get();
		sort_cmp_size(arr, N, sizeof(u32), mycmp, NULL);
		end = ktime_get();

		delta = ktime_us_delta(end, start);
        printk(KERN_INFO "time: %lld\n", delta);
		printk(KERN_INFO "comparisons: %lld\n", cmp_count);

		for (i = 0; i < N - 1; i++) {
			if (arr[i] > arr[i+1]) {
				kfree(arr);
				return -1;
			}
		}

		kfree(arr);
	}
    return 0;
}

 fs/bcachefs/util.c | 52 +++++++++++++++++++++++++++++++---------------
 1 file changed, 35 insertions(+), 17 deletions(-)

diff --git a/fs/bcachefs/util.c b/fs/bcachefs/util.c
index f5bbf96df2ce..4dacb2b9a0a5 100644
--- a/fs/bcachefs/util.c
+++ b/fs/bcachefs/util.c
@@ -979,7 +979,8 @@ void sort_cmp_size(void *base, size_t num, size_t size,
 	  void (*swap_func)(void *, void *, size_t size))
 {
 	/* pre-scale counters for performance */
-	int i = (num/2 - 1) * size, n = num * size, c, r;
+	int i = (num / 2 - 1) * size, n = num * size, j, k;
+	const size_t lsbit = size & -size;
 
 	if (!swap_func) {
 		if (size == 4 && alignment_ok(base, 4))
@@ -992,28 +993,45 @@ void sort_cmp_size(void *base, size_t num, size_t size,
 
 	/* heapify */
 	for ( ; i >= 0; i -= size) {
-		for (r = i; r * 2 + size < n; r  = c) {
-			c = r * 2 + size;
-			if (c < n - size &&
-			    cmp_func(base + c, base + c + size, size) < 0)
-				c += size;
-			if (cmp_func(base + r, base + c, size) >= 0)
-				break;
-			swap_func(base + r, base + c, size);
+		/* Find the sift-down path all the way to the leaves. */
+		for (j = i; k = j * 2 + size, k + size < n;)
+			j = cmp_func(base + k, base + k + size, size) > 0 ? k : k + size;
+
+		/* Special case for the last leaf with no sibling. */
+		if (j * 2 + size * 2 == n)
+			j = j * 2 + size;
+
+		/* Backtrack to the correct location. */
+		while (j != i && cmp_func(base + i, base + j, size) >= 0)
+			j = parent(j, lsbit, size);
+
+		/* Shift the element into its correct place. */
+		for (k = j; j != i;) {
+			j = parent(j, lsbit, size);
+			swap_func(base + j, base + k, size);
 		}
 	}
 
 	/* sort */
 	for (i = n - size; i > 0; i -= size) {
 		swap_func(base, base + i, size);
-		for (r = 0; r * 2 + size < i; r = c) {
-			c = r * 2 + size;
-			if (c < i - size &&
-			    cmp_func(base + c, base + c + size, size) < 0)
-				c += size;
-			if (cmp_func(base + r, base + c, size) >= 0)
-				break;
-			swap_func(base + r, base + c, size);
+
+		/* Find the sift-down path all the way to the leaves. */
+		for (j = 0; k = j * 2 + size, k + size < i;)
+			j = cmp_func(base + k, base + k + size, size) > 0 ? k : k + size;
+
+		/* Special case for the last leaf with no sibling. */
+		if (j * 2 + size * 2 == i)
+			j = j * 2 + size;
+
+		/* Backtrack to the correct location. */
+		while (j && cmp_func(base, base + j, size) >= 0)
+			j = parent(j, lsbit, size);
+
+		/* Shift the element into its correct place. */
+		for (k = j; j;) {
+			j = parent(j, lsbit, size);
+			swap_func(base + j, base + k, size);
 		}
 	}
 }
-- 
2.25.1


  parent reply	other threads:[~2024-01-21 15:37 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-01-21 15:36 [PATCH 0/5] Optimize number of comparisons for heap/heapsort implementaion Kuan-Wei Chiu
2024-01-21 15:36 ` [PATCH 1/5] bcachefs: Optimize eytzinger0_sort() using bottom-up heapsort Kuan-Wei Chiu
2024-01-21 15:36 ` [PATCH 2/5] bcachefs: Introduce parent function for sort_cmp_size() Kuan-Wei Chiu
2024-01-21 16:17   ` Kent Overstreet
2024-01-21 17:05     ` Kuan-Wei Chiu
2024-01-21 17:37       ` Kent Overstreet
2024-01-21 15:36 ` Kuan-Wei Chiu [this message]
2024-01-21 15:36 ` [PATCH 4/5] bcachefs: Optimize number of comparisons in heap_sift_down Kuan-Wei Chiu
2024-01-21 15:36 ` [PATCH 5/5] bcache: Optimize number of comparisons in heap_sift Kuan-Wei Chiu
2024-01-21 16:21 ` [PATCH 0/5] Optimize number of comparisons for heap/heapsort implementaion Kent Overstreet
2024-01-21 16:55   ` Kuan-Wei Chiu
2024-01-21 17:41     ` Kent Overstreet
2024-01-22 15:06       ` Kuan-Wei Chiu
2024-01-22 16:06         ` Kent Overstreet
2024-01-22 16:23           ` Kuan-Wei Chiu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20240121153649.2733274-4-visitorckw@gmail.com \
    --to=visitorckw@gmail.com \
    --cc=bfoster@redhat.com \
    --cc=colyli@suse.de \
    --cc=jserv@ccns.ncku.edu.tw \
    --cc=kent.overstreet@linux.dev \
    --cc=linux-bcache@vger.kernel.org \
    --cc=linux-bcachefs@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.