git.vger.kernel.org archive mirror
From: Toon Claes <toon@iotcl.com>
To: Taylor Blau <me@ttaylorr.com>
Cc: git@vger.kernel.org, Karthik Nayak <karthik.188@gmail.com>,
	Justin Tobler <jltobler@gmail.com>,
	Derrick Stolee <stolee@gmail.com>, Jeff King <peff@peff.net>
Subject: Re: [PATCH] last-modified: implement faster algorithm
Date: Tue, 21 Oct 2025 11:04:05 +0200
Message-ID: <87cy6gtym2.fsf@iotcl.com>
In-Reply-To: <87jz0tu3yh.fsf@iotcl.com>

> Taylor Blau <me@ttaylorr.com> writes:

>> Nice, I am glad to see that we are using a bitmap here rather than the
>> hacky 'char *' that we had originally written. I seem to remember that
>> there was a tiny slow-down when using bitmaps, but can't find the
>> discussion anymore. (It wasn't in the internal PR that I originally
>> opened, and I no longer can read messages that far back in history.)
>>
>> It might be worth benchmarking here to see if using a 'char *' is
>> faster. Of course, that's 8x worse in terms of memory usage, but not a
>> huge deal given both the magnitude and typical number of directory
>> elements (you'd need 1024^2 entries in a single tree to occupy even a
>> single MiB of heap).

Using ewah bitmaps is slightly faster, although the difference is almost
negligible.

    Benchmark 1: bitmap-ewah
      Time (mean ± σ):     793.1 ms ±   6.2 ms    [User: 755.1 ms, System: 35.2 ms]
      Range (min … max):   784.7 ms … 804.8 ms    10 runs

    Benchmark 2: bitmap-chars
      Time (mean ± σ):     808.9 ms ±  11.2 ms    [User: 770.8 ms, System: 35.4 ms]
      Range (min … max):   800.2 ms … 830.5 ms    10 runs

    Summary
      bitmap-ewah ran
        1.02 ± 0.02 times faster than bitmap-chars

And since ewah bitmaps are also more memory efficient, it makes more
sense to keep using those.
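
To put rough numbers on the memory argument, here is a standalone sketch
(not git code; the helper names are made up for illustration) comparing
one char per path with one bit per path packed into 64-bit words:

```c
#include <stddef.h>
#include <stdint.h>

#define BITS_PER_WORD 64 /* assumption: bits are packed into 64-bit words */

/* Bytes needed when every path gets its own char. */
static size_t char_array_bytes(size_t nr_paths)
{
	return nr_paths * sizeof(char);
}

/* Bytes needed when every path gets one bit, rounded up to whole words. */
static size_t bitmap_bytes(size_t nr_paths)
{
	size_t words = (nr_paths + BITS_PER_WORD - 1) / BITS_PER_WORD;
	return words * sizeof(uint64_t);
}
```

For the 1024^2 entries mentioned above, char_array_bytes() gives 1 MiB
and bitmap_bytes() gives 128 KiB, which is the 8x factor.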

>> Likewise, I wonder if we should have elemtype here be just 'struct
>> bitmap'. Unfortunately I don't think the EWAH code has a function like:
>>
>>     void bitmap_init(struct bitmap *);
>>
>> and only has ones that allocate for us. So we may consider adding one,
>> or creating a dummy bitmap and copying its contents, or otherwise.

I've done some testing, and to do so I've made bitmap_grow() public.
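
For readers following along, the "bitmap as value" idea boils down to
initializing a caller-owned struct in place instead of allocating the
struct itself. A toy sketch of that shape, using a stand-in struct and
hypothetical helper names rather than the real ewah API:

```c
#include <stdint.h>
#include <stdlib.h>

/* Stand-in for a bitmap; only loosely modeled on the real struct. */
struct toy_bitmap {
	uint64_t *words;
	size_t word_alloc;
};

/* In-place init: the caller owns the struct, only the words are allocated. */
static void toy_bitmap_init(struct toy_bitmap *bm, size_t nr_words)
{
	if (!nr_words)
		nr_words = 1;
	bm->words = calloc(nr_words, sizeof(*bm->words));
	bm->word_alloc = bm->words ? nr_words : 0;
}

/* Allocating variant: struct and words both live on the heap. */
static struct toy_bitmap *toy_bitmap_new(size_t nr_words)
{
	struct toy_bitmap *bm = malloc(sizeof(*bm));
	if (!bm)
		return NULL;
	toy_bitmap_init(bm, nr_words);
	return bm;
}

/* Release the words and reset the struct so it can be initialized again. */
static void toy_bitmap_release(struct toy_bitmap *bm)
{
	free(bm->words);
	bm->words = NULL;
	bm->word_alloc = 0;
}
```

With the in-place variant a slab can store the struct directly; with
the allocating variant it stores a pointer and pays one extra
indirection per lookup.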

    Benchmark 1: bitmap-as-pointers
      Time (mean ± σ):     783.7 ms ±   8.9 ms    [User: 744.1 ms, System: 37.5 ms]
      Range (min … max):   774.4 ms … 803.4 ms    10 runs

    Benchmark 2: bitmap-as-values
      Time (mean ± σ):     856.7 ms ±  10.5 ms    [User: 816.0 ms, System: 38.1 ms]
      Range (min … max):   845.7 ms … 872.5 ms    10 runs

    Summary
      bitmap-as-pointers ran
        1.09 ± 0.02 times faster than bitmap-as-values

It seems using ewah bitmaps as pointers is faster than using bitmaps as
values. I must admit I'm surprised as well, but in case you want to
double check, here's the patch:

------------------------ >8 ------------------------

diff --git a/builtin/last-modified.c b/builtin/last-modified.c
index c1316e1019..f607c47506 100644
--- a/builtin/last-modified.c
+++ b/builtin/last-modified.c
@@ -47,7 +47,7 @@ static int last_modified_entry_hashcmp(const void *unused UNUSED,
  * Hold a bitmap for each commit we're working with. Each bit represents a path
  * in `lm->all_paths`. Active bit means the path still needs to be dealt with.
  */
-define_commit_slab(commit_bitmaps, struct bitmap *);
+define_commit_slab(commit_bitmaps, struct bitmap);

 struct last_modified {
        struct hashmap paths;
@@ -65,11 +65,12 @@ struct last_modified {

 static struct bitmap *get_bitmap(struct last_modified *lm, struct commit *c)
 {
-       struct bitmap **bitmap = commit_bitmaps_at(&lm->commit_bitmaps, c);
-       if (!*bitmap)
-               *bitmap = bitmap_word_alloc(lm->all_paths_nr / BITS_IN_EWORD);
+       struct bitmap *bm = commit_bitmaps_at(&lm->commit_bitmaps, c);
+       if (!bm->word_alloc) {
+               bitmap_grow(bm, lm->all_paths_nr);
+       }

-       return *bitmap;
+       return bm;
 }

 static void last_modified_release(struct last_modified *lm)
@@ -442,7 +443,8 @@ static int last_modified_run(struct last_modified *lm)
                }

 cleanup:
-               bitmap_free(active_c);
+               free(active_c->words);
+               active_c->word_alloc = 0;
        }

        if (hashmap_get_size(&lm->paths))
diff --git a/ewah/bitmap.c b/ewah/bitmap.c
index 55928dada8..2500e3a0d7 100644
--- a/ewah/bitmap.c
+++ b/ewah/bitmap.c
@@ -42,7 +42,7 @@ struct bitmap *bitmap_dup(const struct bitmap *src)
        return dst;
 }

-static void bitmap_grow(struct bitmap *self, size_t word_alloc)
+void bitmap_grow(struct bitmap *self, size_t word_alloc)
 {
        size_t old_size = self->word_alloc;
        ALLOC_GROW(self->words, word_alloc, self->word_alloc);
diff --git a/ewah/ewok.h b/ewah/ewok.h
index c29d354236..3316807572 100644
--- a/ewah/ewok.h
+++ b/ewah/ewok.h
@@ -188,6 +188,7 @@ struct bitmap *bitmap_word_alloc(size_t word_alloc);
 struct bitmap *bitmap_dup(const struct bitmap *src);
 void bitmap_set(struct bitmap *self, size_t pos);
 void bitmap_unset(struct bitmap *self, size_t pos);
+void bitmap_grow(struct bitmap *self, size_t word_alloc);
 int bitmap_get(struct bitmap *self, size_t pos);
 void bitmap_free(struct bitmap *self);
 int bitmap_equals(struct bitmap *self, struct bitmap *other);
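
A note for anyone re-running the comparison: bitmap_grow() takes a word
count (its parameter is named word_alloc), while the pre-image passed
lm->all_paths_nr / BITS_IN_EWORD to bitmap_word_alloc() and the
post-image passes lm->all_paths_nr undivided. If that reading is right,
the by-value variant may allocate more words per commit than the pointer
variant, which could account for part of the gap. A standalone sketch of
the arithmetic, assuming 64-bit words (the helper names are made up and
none of this is git code):

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_BITS_IN_WORD 64 /* assumption: one word holds 64 path bits */

/* Words requested when the path count is divided down to words first. */
static size_t words_when_divided(size_t all_paths_nr)
{
	return all_paths_nr / TOY_BITS_IN_WORD;
}

/* Words requested when the path count is passed through as a word count. */
static size_t words_when_passed_as_is(size_t all_paths_nr)
{
	return all_paths_nr;
}

/* Heap bytes backing a given number of 64-bit words. */
static size_t bytes_for_words(size_t words)
{
	return words * sizeof(uint64_t);
}
```

For 6400 paths the first helper asks for 100 words (800 bytes) and the
second for 6400 words (51200 bytes).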


-- 
Cheers,
Toon
