public inbox for git@vger.kernel.org
 help / color / mirror / Atom feed
From: Patrick Steinhardt <ps@pks.im>
To: Junio C Hamano <gitster@pobox.com>
Cc: Justin Tobler <jltobler@gmail.com>, git@vger.kernel.org
Subject: Re: [PATCH 5/5] builtin/repo: find tree with most entries
Date: Wed, 4 Feb 2026 09:28:31 +0100	[thread overview]
Message-ID: <aYMDL4m7Ceifl1Ja@pks.im> (raw)
In-Reply-To: <xmqqldh9qw5d.fsf@gitster.g>

On Tue, Feb 03, 2026 at 02:50:38PM -0800, Junio C Hamano wrote:
> Justin Tobler <jltobler@gmail.com> writes:
> 
> > The size of a tree object usually corresponds with the number of entries
> > it has. While iterating through objects in the repository for
> > git-repo-structure, identify the tree with the most entries and display
> > it in the output.
> 
> All of these "largest" and "most", it would be a lot more
> interesting if we can give not just these extreme values but
> distrubution, possibly in a graphical way for bonus points.

That would be amazing indeed! I think having the largest values is still
valuable as it allows you to detect weird outliers quite easily. But
having a histogram would of course give the bigger picture.

I guess the challenging part would be to compute the buckets of that
histogram in a streaming fashion. But I guess we could:

  1. Pick a target number of buckets.

  2. Track the maximum respective values as we stream.

  3. Merge existing buckets and create new ones in case the maximum
     value changes.

The target number of buckets may not necessarily be the same number as
the number of buckets that we will eventually print for increased
resolution.

The distributions could then be printed as an ASCII bar chart, for
example something like:

    0-50   │████████████████████████████████████████ 1,247
   50-100  │█████████████████████████████████ 812
  100-150  │█████████████ 401
  150-200  │████████ 253
  200-250  │████ 128
  250-300  │██ 67
  300+     │▏ 12
           0        250       500       750      1k     1.2k count
    bytes / count

From my point of view that would be the cherry on top of the new tool :)
I'd personally still like to learn about maximum values in the table, as
I've found that info to be useful with some customer incidents in the
past. It's not giving you a trend, but it immediately gives you some
good signal that the repo shape might be weird if you have commits with
hundreds of parents.

So maybe this is another step we can do in a subsequent patch series?

Thanks!

Patrick

  reply	other threads:[~2026-02-04  8:28 UTC|newest]

Thread overview: 50+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-02-03 22:17 [PATCH 0/5] builtin/repo: include largest object information Justin Tobler
2026-02-03 22:17 ` [PATCH 1/5] builtin/repo: update stats for each object Justin Tobler
2026-02-03 22:36   ` Junio C Hamano
2026-02-18 19:40     ` Justin Tobler
2026-02-26 19:20       ` Junio C Hamano
2026-02-26 19:29         ` Justin Tobler
2026-02-03 22:17 ` [PATCH 2/5] builtin/repo: collect largest inflated objects Justin Tobler
2026-02-03 22:45   ` Junio C Hamano
2026-02-18 20:01     ` Justin Tobler
2026-02-03 22:17 ` [PATCH 3/5] builtin/repo: add OID annotations to table output Justin Tobler
2026-02-13 13:14   ` Patrick Steinhardt
2026-02-18 20:13     ` Justin Tobler
2026-02-03 22:17 ` [PATCH 4/5] builtin/repo: find commit with most parents Justin Tobler
2026-02-03 22:48   ` Junio C Hamano
2026-02-03 23:14     ` Kristoffer Haugsbakk
2026-02-03 23:33       ` Junio C Hamano
2026-02-18 20:06       ` Justin Tobler
2026-02-03 22:17 ` [PATCH 5/5] builtin/repo: find tree with most entries Justin Tobler
2026-02-03 22:50   ` Junio C Hamano
2026-02-04  8:28     ` Patrick Steinhardt [this message]
2026-02-04 15:28       ` Junio C Hamano
2026-02-23 17:41 ` [PATCH v2 0/5] builtin/repo: include largest object information Justin Tobler
2026-02-23 17:41   ` [PATCH v2 1/5] builtin/repo: update stats for each object Justin Tobler
2026-02-23 17:41   ` [PATCH v2 2/5] builtin/repo: collect largest inflated objects Justin Tobler
2026-02-26 19:50     ` Junio C Hamano
2026-03-02 17:28       ` Justin Tobler
2026-02-28 23:36     ` Lucas Seiki Oshiro
2026-03-02 17:38       ` Justin Tobler
2026-02-23 17:41   ` [PATCH v2 3/5] builtin/repo: add OID annotations to table output Justin Tobler
2026-02-26 19:56     ` Junio C Hamano
2026-03-02 17:39       ` Justin Tobler
2026-02-23 17:41   ` [PATCH v2 4/5] builtin/repo: find commit with most parents Justin Tobler
2026-02-23 17:41   ` [PATCH v2 5/5] builtin/repo: find tree with most entries Justin Tobler
2026-02-24  9:35   ` [PATCH v2 0/5] builtin/repo: include largest object information Patrick Steinhardt
2026-02-28 23:43   ` Lucas Seiki Oshiro
2026-03-01 19:22     ` Justin Tobler
2026-03-02 21:45   ` [PATCH v3 0/6] " Justin Tobler
2026-03-02 21:45     ` [PATCH v3 1/6] builtin/repo: update stats for each object Justin Tobler
2026-03-02 21:45     ` [PATCH v3 2/6] builtin/repo: add helper for printing keyvalue output Justin Tobler
2026-03-03 13:27       ` Patrick Steinhardt
2026-03-03 17:40         ` Junio C Hamano
2026-03-03 18:08           ` Justin Tobler
2026-03-02 21:45     ` [PATCH v3 3/6] builtin/repo: collect largest inflated objects Justin Tobler
2026-03-03 13:27       ` Patrick Steinhardt
2026-03-02 21:45     ` [PATCH v3 4/6] builtin/repo: add OID annotations to table output Justin Tobler
2026-03-02 21:45     ` [PATCH v3 5/6] builtin/repo: find commit with most parents Justin Tobler
2026-03-02 21:45     ` [PATCH v3 6/6] builtin/repo: find tree with most entries Justin Tobler
2026-03-02 22:09     ` [PATCH v3 0/6] builtin/repo: include largest object information Junio C Hamano
2026-03-06 22:36       ` Junio C Hamano
2026-03-08 18:44         ` Justin Tobler

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aYMDL4m7Ceifl1Ja@pks.im \
    --to=ps@pks.im \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=jltobler@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox