All of lore.kernel.org
 help / color / mirror / Atom feed
From: "René Scharfe" <l.s.r@web.de>
To: Jeff King <peff@peff.net>
Cc: git@vger.kernel.org
Subject: Re: [PATCH 3/4] ls-files: use strbuf_add_uint()
Date: Tue, 12 May 2026 22:44:21 +0200	[thread overview]
Message-ID: <2f45a33b-5945-431d-97a5-7d61e271cfba@web.de> (raw)
In-Reply-To: <20260512190105.GE70851@coredump.intra.peff.net>

On 5/12/26 9:01 PM, Jeff King wrote:
> On Tue, May 12, 2026 at 01:56:02PM +0200, René Scharfe wrote:
> 
>> Speed up printing of objectsize values by using the specialized function
>> strbuf_add_uint() as well as strbuf_insert() for padding instead of the
>> general-purpose function strbuf_addf().  Here are the numbers I get when
>> listing files in the Linux kernel repo:
>>
>> Benchmark 1: ./git_main -C ../linux ls-files --format='%(objectsize)'
>>   Time (mean ± σ):     257.3 ms ±   0.4 ms    [User: 197.4 ms, System: 56.7 ms]
>>   Range (min … max):   256.7 ms … 258.1 ms    11 runs
>>
>> Benchmark 2: ./git -C ../linux ls-files --format='%(objectsize)'
>>   Time (mean ± σ):     253.4 ms ±   0.3 ms    [User: 193.6 ms, System: 56.6 ms]
>>   Range (min … max):   253.0 ms … 253.8 ms    11 runs
> 
> OK, so here the improvement is less impressive than the previous commit.
> And the code is...
> 
>>  {
>> +	static const char padding[] = "       ";
>> +	size_t min_len = padded ? strlen(padding) : 0;
>> +	size_t orig_len = line->len;
>> +	size_t len;
>> +
>>  	if (type == OBJ_BLOB) {
>>  		unsigned long size;
>>  		if (odb_read_object_info(repo->objects, oid, &size) < 0)
>>  			die(_("could not get object info about '%s'"),
>>  			    oid_to_hex(oid));
>> -		if (padded)
>> -			strbuf_addf(line, "%7"PRIuMAX, (uintmax_t)size);
>> -		else
>> -			strbuf_addf(line, "%"PRIuMAX, (uintmax_t)size);
>> -	} else if (padded) {
>> -		strbuf_addf(line, "%7s", "-");
>> +		strbuf_add_uint(line, size);
>>  	} else {
>>  		strbuf_addstr(line, "-");
>>  	}
>> +	len = line->len - orig_len;
>> +	if (len < min_len)
>> +		strbuf_insert(line, orig_len, padding, min_len - len);
>>  }
> 
> ...also less nice. We are formatting into the strbuf, and then maybe
> memmove()-ing the result to accommodate padding. I wonder how much that
> affects the timing. It's extra shuffling, but memmove() etc is often
> surprisingly fast.

I gave my objectsize and objectsize:padded numbers; the difference was
1.2 ms, albeit with 1.0 ms noise in padded case.

> I wonder how bad it would be to handle the padding ahead of time.
> Obviously strbuf_add_uint() knows the size of the result right before it
> calls memcpy(), and it could insert the padding then. But adding a
> padding length parameter (let alone the space vs "0" decision) to that
> function feels kind of gross.
> 
> In the earlier patch I raised the notion of pre-computing the output
> length. If we had a helper to do that, it would be pretty easy to do:
> 
>   /* noop if third parameter is negative */
>   strbuf_pad(line, ' ', 7 - decimal_digits(size));
>   strbuf_add_uint(line, size);

I started with a struct numbuf for holding a number string and its
length, which helped avoid the memmove(3) call.  It's simple and
doesn't require a lot of code, but introducing that concept felt a bit
much for just two users.

> You could also imagine a world where we had some stateful formatting
> system, and you could say:
> 
>   strbuf_pad_next(line, ' ', 7);
>   strbuf_add_uint(line, size);
> 
> but somebody has to store that state between the calls, and I don't love
> the idea of bloating strbuf with it. So probably you have some
> "formatter" struct, and it operates on a strbuf. And now we have all of
> the OO boilerplate hassles like initializing and tearing down our
> formatter object. ;) So probably not worth it for this triviality.

Terrifying!

> Having it all in one string ("%7d") is nice and concise.

Yes, except the original code includes the 7 twice (again for the
"-" fallback).
> I have often wondered how hard it would be to implement our own
> vsnprintf(), and whether we could do better than the libc ones. It would
> be nice to be able to add shorthands for common types (instead of the
> unreadable PRIuMAX mess), as well as custom ones (e.g., hex oids).

C99 has %ju for uintmax_t and %zu for size_t.  Hmm, do we actually
still need to avoid them?  CodingGuidelines says "the C library used
by MinGW does not" support it.  82c36fa0a9 (submodule: hash the
submodule name for the gitdir path, 2026-01-12) just added a %zu,
and there are lots of them in compat/mimalloc/ in Git for Windows.

An extensible printf-like formatter would be nice indeed.  I wondered
about how something like that could be used to write structured output
like tar and zip archive entries in a terse way.  The thought faded,
I guess, when I found no compelling reason for that compactness that
would justify the required complexity of the mechanism.

René


  reply	other threads:[~2026-05-12 20:44 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-05-12 11:55 [PATCH 0/4] strbuf: add and use strbuf_add_uint() René Scharfe
2026-05-12 11:56 ` [PATCH 1/4] strbuf: add strbuf_add_uint() René Scharfe
2026-05-12 18:42   ` Jeff King
2026-05-12 19:32     ` René Scharfe
2026-05-13 16:22       ` Jeff King
2026-05-13 16:47         ` Jeff King
2026-05-13 16:49         ` Jeff King
2026-05-14 11:09           ` René Scharfe
2026-05-14 11:53             ` Junio C Hamano
2026-05-15  3:53             ` Jeff King
2026-05-13 17:46         ` René Scharfe
2026-05-12 11:56 ` [PATCH 2/4] cat-file: use strbuf_add_uint() René Scharfe
2026-05-12 18:46   ` Jeff King
2026-05-12 11:56 ` [PATCH 3/4] ls-files: " René Scharfe
2026-05-12 19:01   ` Jeff King
2026-05-12 20:44     ` René Scharfe [this message]
2026-05-13 16:46       ` Jeff King
2026-05-12 11:56 ` [PATCH 4/4] ls-tree: " René Scharfe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=2f45a33b-5945-431d-97a5-7d61e271cfba@web.de \
    --to=l.s.r@web.de \
    --cc=git@vger.kernel.org \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.