From: "René Scharfe" <l.s.r@web.de>
To: Jeff King <peff@peff.net>
Cc: git@vger.kernel.org
Subject: Re: [PATCH 3/4] ls-files: use strbuf_add_uint()
Date: Tue, 12 May 2026 22:44:21 +0200 [thread overview]
Message-ID: <2f45a33b-5945-431d-97a5-7d61e271cfba@web.de> (raw)
In-Reply-To: <20260512190105.GE70851@coredump.intra.peff.net>
On 5/12/26 9:01 PM, Jeff King wrote:
> On Tue, May 12, 2026 at 01:56:02PM +0200, René Scharfe wrote:
>
>> Speed up printing of objectsize values by using the specialized function
>> strbuf_add_uint() as well as strbuf_insert() for padding instead of the
>> general-purpose function strbuf_addf(). Here are the numbers I get when
>> listing files in the Linux kernel repo:
>>
>> Benchmark 1: ./git_main -C ../linux ls-files --format='%(objectsize)'
>> Time (mean ± σ): 257.3 ms ± 0.4 ms [User: 197.4 ms, System: 56.7 ms]
>> Range (min … max): 256.7 ms … 258.1 ms 11 runs
>>
>> Benchmark 2: ./git -C ../linux ls-files --format='%(objectsize)'
>> Time (mean ± σ): 253.4 ms ± 0.3 ms [User: 193.6 ms, System: 56.6 ms]
>> Range (min … max): 253.0 ms … 253.8 ms 11 runs
>
> OK, so here the improvement is less impressive than the previous commit.
> And the code is...
>
>> {
>> + static const char padding[] = " ";
>> + size_t min_len = padded ? strlen(padding) : 0;
>> + size_t orig_len = line->len;
>> + size_t len;
>> +
>> if (type == OBJ_BLOB) {
>> unsigned long size;
>> if (odb_read_object_info(repo->objects, oid, &size) < 0)
>> die(_("could not get object info about '%s'"),
>> oid_to_hex(oid));
>> - if (padded)
>> - strbuf_addf(line, "%7"PRIuMAX, (uintmax_t)size);
>> - else
>> - strbuf_addf(line, "%"PRIuMAX, (uintmax_t)size);
>> - } else if (padded) {
>> - strbuf_addf(line, "%7s", "-");
>> + strbuf_add_uint(line, size);
>> } else {
>> strbuf_addstr(line, "-");
>> }
>> + len = line->len - orig_len;
>> + if (len < min_len)
>> + strbuf_insert(line, orig_len, padding, min_len - len);
>> }
>
> ...also less nice. We are formatting into the strbuf, and then maybe
> memmove()-ing the result to accommodate padding. I wonder how much that
> affects the timing. It's extra shuffling, but memmove() etc is often
> surprisingly fast.
I gave my objectsize and objectsize:padded numbers; the difference was
1.2 ms, albeit with 1.0 ms noise in padded case.
> I wonder how bad it would be to handle the padding ahead of time.
> Obviously strbuf_add_uint() knows the size of the result right before it
> calls memcpy(), and it could insert the padding then. But adding a
> padding length parameter (let alone the space vs "0" decision) to that
> function feels kind of gross.
>
> In the earlier patch I raised the notion of pre-computing the output
> length. If we had a helper to do that, it would be pretty easy to do:
>
> /* noop if third parameter is negative */
> strbuf_pad(line, ' ', 7 - decimal_digits(size));
> strbuf_add_uint(line, size);
I started with a struct numbuf for holding a number string and its
length, which helped avoid the memmove(3) call. It's simple and
doesn't require a lot of code, but introducing that concept felt a bit
much for just two users.
> You could also imagine a world where we had some stateful formatting
> system, and you could say:
>
> strbuf_pad_next(line, ' ', 7);
> strbuf_add_uint(line, size);
>
> but somebody has to store that state between the calls, and I don't love
> the idea of bloating strbuf with it. So probably you have some
> "formatter" struct, and it operates on a strbuf. And now we have all of
> the OO boilerplate hassles like initializing and tearing down our
> formatter object. ;) So probably not worth it for this triviality.
Terrifying!
> Having it all in one string ("%7d") is nice and concise.
Yes, except the original code includes the 7 twice (again for the
"-" fallback).
> I have often wondered how hard it would be to implement our own
> vsnprintf(), and whether we could do better than the libc ones. It would
> be nice to be able to add shorthands for common types (instead of the
> unreadable PRIuMAX mess), as well as custom ones (e.g., hex oids).
C99 has %ju for uintmax_t and %zu for size_t. Hmm, do we actually
still need to avoid them? CodingGuidelines says "the C library used
by MinGW does not" support it. 82c36fa0a9 (submodule: hash the
submodule name for the gitdir path, 2026-01-12) just added a %zu,
and there are lots of them in compat/mimalloc/ in Git for Windows.
An extensible printf-like formatter would be nice indeed. I wondered
about how something like that could be used to write structured output
like tar and zip archive entries in a terse way. The thought faded,
I guess, when I found no compelling reason for that compactness that
would justify the required complexity of the mechanism.
René
next prev parent reply other threads:[~2026-05-12 20:44 UTC|newest]
Thread overview: 18+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-05-12 11:55 [PATCH 0/4] strbuf: add and use strbuf_add_uint() René Scharfe
2026-05-12 11:56 ` [PATCH 1/4] strbuf: add strbuf_add_uint() René Scharfe
2026-05-12 18:42 ` Jeff King
2026-05-12 19:32 ` René Scharfe
2026-05-13 16:22 ` Jeff King
2026-05-13 16:47 ` Jeff King
2026-05-13 16:49 ` Jeff King
2026-05-14 11:09 ` René Scharfe
2026-05-14 11:53 ` Junio C Hamano
2026-05-15 3:53 ` Jeff King
2026-05-13 17:46 ` René Scharfe
2026-05-12 11:56 ` [PATCH 2/4] cat-file: use strbuf_add_uint() René Scharfe
2026-05-12 18:46 ` Jeff King
2026-05-12 11:56 ` [PATCH 3/4] ls-files: " René Scharfe
2026-05-12 19:01 ` Jeff King
2026-05-12 20:44 ` René Scharfe [this message]
2026-05-13 16:46 ` Jeff King
2026-05-12 11:56 ` [PATCH 4/4] ls-tree: " René Scharfe
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=2f45a33b-5945-431d-97a5-7d61e271cfba@web.de \
--to=l.s.r@web.de \
--cc=git@vger.kernel.org \
--cc=peff@peff.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox