Git development
 help / color / mirror / Atom feed
From: "René Scharfe" <l.s.r@web.de>
To: Jeff King <peff@peff.net>
Cc: git@vger.kernel.org
Subject: Re: [PATCH 3/4] ls-files: use strbuf_add_uint()
Date: Tue, 12 May 2026 22:44:21 +0200	[thread overview]
Message-ID: <2f45a33b-5945-431d-97a5-7d61e271cfba@web.de> (raw)
In-Reply-To: <20260512190105.GE70851@coredump.intra.peff.net>

On 5/12/26 9:01 PM, Jeff King wrote:
> On Tue, May 12, 2026 at 01:56:02PM +0200, René Scharfe wrote:
> 
>> Speed up printing of objectsize values by using the specialized function
>> strbuf_add_uint() as well as strbuf_insert() for padding instead of the
>> general-purpose function strbuf_addf().  Here are the numbers I get when
>> listing files in the Linux kernel repo:
>>
>> Benchmark 1: ./git_main -C ../linux ls-files --format='%(objectsize)'
>>   Time (mean ± σ):     257.3 ms ±   0.4 ms    [User: 197.4 ms, System: 56.7 ms]
>>   Range (min … max):   256.7 ms … 258.1 ms    11 runs
>>
>> Benchmark 2: ./git -C ../linux ls-files --format='%(objectsize)'
>>   Time (mean ± σ):     253.4 ms ±   0.3 ms    [User: 193.6 ms, System: 56.6 ms]
>>   Range (min … max):   253.0 ms … 253.8 ms    11 runs
> 
> OK, so here the improvement is less impressive than the previous commit.
> And the code is...
> 
>>  {
>> +	static const char padding[] = "       ";
>> +	size_t min_len = padded ? strlen(padding) : 0;
>> +	size_t orig_len = line->len;
>> +	size_t len;
>> +
>>  	if (type == OBJ_BLOB) {
>>  		unsigned long size;
>>  		if (odb_read_object_info(repo->objects, oid, &size) < 0)
>>  			die(_("could not get object info about '%s'"),
>>  			    oid_to_hex(oid));
>> -		if (padded)
>> -			strbuf_addf(line, "%7"PRIuMAX, (uintmax_t)size);
>> -		else
>> -			strbuf_addf(line, "%"PRIuMAX, (uintmax_t)size);
>> -	} else if (padded) {
>> -		strbuf_addf(line, "%7s", "-");
>> +		strbuf_add_uint(line, size);
>>  	} else {
>>  		strbuf_addstr(line, "-");
>>  	}
>> +	len = line->len - orig_len;
>> +	if (len < min_len)
>> +		strbuf_insert(line, orig_len, padding, min_len - len);
>>  }
> 
> ...also less nice. We are formatting into the strbuf, and then maybe
> memmove()-ing the result to accommodate padding. I wonder how much that
> affects the timing. It's extra shuffling, but memmove() etc is often
> surprisingly fast.

I gave my objectsize and objectsize:padded numbers; the difference was
1.2 ms, albeit with 1.0 ms noise in padded case.

> I wonder how bad it would be to handle the padding ahead of time.
> Obviously strbuf_add_uint() knows the size of the result right before it
> calls memcpy(), and it could insert the padding then. But adding a
> padding length parameter (let alone the space vs "0" decision) to that
> function feels kind of gross.
> 
> In the earlier patch I raised the notion of pre-computing the output
> length. If we had a helper to do that, it would be pretty easy to do:
> 
>   /* noop if third parameter is negative */
>   strbuf_pad(line, ' ', 7 - decimal_digits(size));
>   strbuf_add_uint(line, size);

I started with a struct numbuf for holding a number string and its
length, which helped avoid the memmove(3) call.  It's simple and
doesn't require a lot of code, but introducing that concept felt a bit
much for just two users.

> You could also imagine a world where we had some stateful formatting
> system, and you could say:
> 
>   strbuf_pad_next(line, ' ', 7);
>   strbuf_add_uint(line, size);
> 
> but somebody has to store that state between the calls, and I don't love
> the idea of bloating strbuf with it. So probably you have some
> "formatter" struct, and it operates on a strbuf. And now we have all of
> the OO boilerplate hassles like initializing and tearing down our
> formatter object. ;) So probably not worth it for this triviality.

Terrifying!

> Having it all in one string ("%7d") is nice and concise.

Yes, except the original code includes the 7 twice (again for the
"-" fallback).
> I have often wondered how hard it would be to implement our own
> vsnprintf(), and whether we could do better than the libc ones. It would
> be nice to be able to add shorthands for common types (instead of the
> unreadable PRIuMAX mess), as well as custom ones (e.g., hex oids).

C99 has %ju for uintmax_t and %zu for size_t.  Hmm, do we actually
still need to avoid them?  CodingGuidelines says "the C library used
by MinGW does not" support it.  82c36fa0a9 (submodule: hash the
submodule name for the gitdir path, 2026-01-12) just added a %zu,
and there are lots of them in compat/mimalloc/ in Git for Windows.

An extensible printf-like formatter would be nice indeed.  I wondered
about how something like that could be used to write structured output
like tar and zip archive entries in a terse way.  The thought faded,
I guess, when I found no compelling reason for that compactness that
would justify the required complexity of the mechanism.

René


  reply	other threads:[~2026-05-12 20:44 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-05-12 11:55 [PATCH 0/4] strbuf: add and use strbuf_add_uint() René Scharfe
2026-05-12 11:56 ` [PATCH 1/4] strbuf: add strbuf_add_uint() René Scharfe
2026-05-12 18:42   ` Jeff King
2026-05-12 19:32     ` René Scharfe
2026-05-13 16:22       ` Jeff King
2026-05-13 16:47         ` Jeff King
2026-05-13 16:49         ` Jeff King
2026-05-14 11:09           ` René Scharfe
2026-05-14 11:53             ` Junio C Hamano
2026-05-15  3:53             ` Jeff King
2026-05-13 17:46         ` René Scharfe
2026-05-12 11:56 ` [PATCH 2/4] cat-file: use strbuf_add_uint() René Scharfe
2026-05-12 18:46   ` Jeff King
2026-05-12 11:56 ` [PATCH 3/4] ls-files: " René Scharfe
2026-05-12 19:01   ` Jeff King
2026-05-12 20:44     ` René Scharfe [this message]
2026-05-13 16:46       ` Jeff King
2026-05-12 11:56 ` [PATCH 4/4] ls-tree: " René Scharfe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=2f45a33b-5945-431d-97a5-7d61e271cfba@web.de \
    --to=l.s.r@web.de \
    --cc=git@vger.kernel.org \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox