From: Jeff King <peff@peff.net>
To: "René Scharfe" <l.s.r@web.de>
Cc: Junio C Hamano <gitster@pobox.com>, git@vger.kernel.org
Subject: Re: [PATCH 2/2] parseopt: check for duplicate long names and numerical options
Date: Fri, 27 Feb 2026 18:08:22 -0500
Message-ID: <20260227230822.GA2965111@coredump.intra.peff.net>
In-Reply-To: <20260227225055.GC2956443@coredump.intra.peff.net>

On Fri, Feb 27, 2026 at 05:50:56PM -0500, Jeff King wrote:

> On Fri, Feb 27, 2026 at 08:27:02PM +0100, René Scharfe wrote:
> 
> > The check clearly has a cost, but I have a hard time measuring it.
> > We already do lots of (kinda cheap) checks.  Turning them on only
> > in DEVELOPER builds (and ideally demonstrating a speedup) left as
> > an exercise for interested readers (with stronger benchmark-fu)..
> 
> I agree it is probably not introducing a measurable slowdown. If we were
> to make it conditional, I'd suggest a run-time toggle (so we could turn
> it on for all test scripts, but not regular use).

Just for fun, I was going to write a script that generated a test-tool
parse-options list with 100k entries. But then I realized we already
have something like that!

If you do this:

  (
    echo usage
    echo --
    for i in $(seq 100000); do
      echo "opt$i option $i"
    done
  ) >input

then hyperfine reports (before and after your patches):

  Benchmark 1: ./git.old rev-parse --parseopt -- --opt42 <input
    Time (mean ± σ):      22.2 ms ±   0.4 ms    [User: 16.6 ms, System: 5.6 ms]
    Range (min … max):    21.5 ms …  23.9 ms    127 runs
  
  Benchmark 2: ./git.new rev-parse --parseopt -- --opt42 <input
    Time (mean ± σ):      32.5 ms ±   0.5 ms    [User: 23.8 ms, System: 8.6 ms]
    Range (min … max):    31.7 ms …  34.8 ms    89 runs
  
  Summary
    ./git.old rev-parse --parseopt -- --opt42 <input ran
      1.46 ± 0.03 times faster than ./git.new rev-parse --parseopt -- --opt42 <input

So it is measurable (even with the extra per-option costs to generate
the option structs in the first place). Looks like on the order of 10ms
for 100k options, or about 100ns per option. If you imagine that most
option lists are smaller than 100, that is around 10us or less per
invocation, probably the equivalent of 50-100 syscalls. If we are
really looking to micro-optimize startup time, I suspect there's
lower-hanging fruit of that magnitude to be found elsewhere.
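
To spell out that arithmetic, here is a rough sketch. The helper names
are made up for illustration and are not anything in git; the inputs
are just the hyperfine means quoted above.

```c
#include <assert.h>

/*
 * Hypothetical helper: extra cost per option in nanoseconds, given the
 * mean runtimes (in milliseconds) before and after the patches.
 */
static double per_option_ns(double old_ms, double new_ms, long n_options)
{
	assert(n_options > 0);
	/* 1 ms = 1e6 ns */
	return (new_ms - old_ms) * 1e6 / (double)n_options;
}

/*
 * Hypothetical helper: projected extra startup cost in microseconds
 * for an option list of the given size.
 */
static double extra_startup_us(double ns_per_option, long n_options)
{
	/* 1 us = 1e3 ns */
	return ns_per_option * (double)n_options / 1e3;
}
```

With the numbers above, per_option_ns(22.2, 32.5, 100000) comes out at
roughly 103ns, and extra_startup_us() of that for a 100-entry list is
roughly 10us. This assumes the cost scales linearly with the number of
options, which the single data point here does not prove.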

> > +		if (opts->long_name) {
> > +			if (strset_contains(&long_names, opts->long_name))
> > +				optbug(opts, "long name already used");
> > +			strset_add(&long_names, opts->long_name);
> > +		}
> 
> ...if you want to micro-optimize, note that the return value of
> strset_add() tells you whether the item was already in the set. That can
> save one hash of the string.
> 
> Probably the allocation for each element is the dominating cost, though,
> and it doesn't help with that.

Doing this:

diff --git a/parse-options.c b/parse-options.c
index 51b72eee11..f056a4471e 100644
--- a/parse-options.c
+++ b/parse-options.c
@@ -659,9 +659,8 @@ static void parse_options_check(const struct option *opts)
 				optbug(opts, "short name already used");
 		}
 		if (opts->long_name) {
-			if (strset_contains(&long_names, opts->long_name))
+			if (!strset_add(&long_names, opts->long_name))
 				optbug(opts, "long name already used");
-			strset_add(&long_names, opts->long_name);
 		}
 		if (opts->type == OPTION_NUMBER) {
 			if (saw_number_option)

seems to shave off ~1% of my benchmark. Not that exciting, but hey, it's
one line shorter to boot.
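
To illustrate why the one-call form hashes each name only once, here is
a self-contained toy. It is not git's strset: the toy_* names, the
fixed bucket count, and the hash_calls counter are all invented for
this sketch. It only mimics the convention that the add function
returns 1 when the string was newly added and 0 when it was already
present.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define TOY_BUCKETS 64

struct toy_entry {
	struct toy_entry *next;
	const char *key;		/* borrowed pointer, not copied */
};

struct toy_set {
	struct toy_entry *buckets[TOY_BUCKETS];
	unsigned long hash_calls;	/* instrumentation for the demo */
};

static unsigned toy_hash(struct toy_set *set, const char *s)
{
	unsigned h = 5381;

	set->hash_calls++;
	while (*s)
		h = h * 33 + (unsigned char)*s++;
	return h % TOY_BUCKETS;
}

static int toy_set_contains(struct toy_set *set, const char *key)
{
	struct toy_entry *e = set->buckets[toy_hash(set, key)];

	for (; e; e = e->next)
		if (!strcmp(e->key, key))
			return 1;
	return 0;
}

/* Returns 1 if key was newly added, 0 if it was already in the set. */
static int toy_set_add(struct toy_set *set, const char *key)
{
	unsigned b = toy_hash(set, key);
	struct toy_entry *e;

	for (e = set->buckets[b]; e; e = e->next)
		if (!strcmp(e->key, key))
			return 0;
	e = malloc(sizeof(*e));		/* never freed in this toy */
	assert(e);
	e->key = key;
	e->next = set->buckets[b];
	set->buckets[b] = e;
	return 1;
}

/* Duplicate check via contains + add: hashes each name twice. */
static int dups_two_step(struct toy_set *set, const char **names, int n)
{
	int i, dups = 0;

	for (i = 0; i < n; i++) {
		if (toy_set_contains(set, names[i]))
			dups++;
		toy_set_add(set, names[i]);
	}
	return dups;
}

/* Same check using only add's return value: one hash per name. */
static int dups_one_step(struct toy_set *set, const char **names, int n)
{
	int i, dups = 0;

	for (i = 0; i < n; i++)
		if (!toy_set_add(set, names[i]))
			dups++;
	return dups;
}
```

Both loops find the same duplicates, but the two-step version calls
toy_hash() twice per name. The allocation per new entry is untouched
either way, which fits with the gain being small.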

-Peff
