public inbox for git@vger.kernel.org
From: Justin Tobler <jltobler@gmail.com>
To: Junio C Hamano <gitster@pobox.com>
Cc: git@vger.kernel.org
Subject: Re: [PATCH 2/5] builtin/repo: collect largest inflated objects
Date: Wed, 18 Feb 2026 14:01:04 -0600
Message-ID: <aZYV0o9Xp-v5IPL1@denethor>
In-Reply-To: <xmqqv7gdqwei.fsf@gitster.g>

On 26/02/03 02:45PM, Junio C Hamano wrote:
> Justin Tobler <jltobler@gmail.com> writes:
> 
> > The "structure" output for git-repo(1) shows the total inflated and disk
> > sizes of reachable objects in the repository, but doesn't show the size
> > of the largest individual objects. Since an individual object may be a
> > large contributor to the overall repository size, it is useful for users
> > to know the maximum size of individual objects.
> 
> Hmph.  It is true that a byte is worth the same amount of money no
> matter what object it is used to represent, but comparing the size
> of a commit object and the size of a blob object feels inherently
> meaningless to me.

I certainly agree that comparing max size values between the types
themselves is not particularly meaningful. I do think, though, that the
max size values by themselves provide insight into the extremes of the
repository.

> It all depends on what you are trying to learn out of the stats, but
> having many small blob objects that add up to 1GB and having medium
> number of medium sized tree objects that adds up to the same 1GB
> would give the same number in object_stats.inflated_sizes for both
> types, indicating that they are costing you about the same.  But the
> members in largest_objects for these types would be different,
> hinting (incorrectly) that one type may be costing more than the
> other.  Would that really tell us something useful, I have to
> wonder?

Yeah, from the largest objects and inflated sizes alone you cannot
really gain any insight into the distribution, but I think it is still a
good idea to showcase the extremes. If I see that the max size values
are "normal", that at least gives me some insight into the repository
usage patterns.
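
To illustrate the point with made-up numbers, here is a minimal sketch
(plain Python, not code from this series; the sizes and the summarize()
helper are hypothetical). Two sets of objects share the same total
inflated size, yet their largest members differ by a factor of 64:

```python
# Hypothetical object sizes in bytes; not taken from any real repository.
MIB = 1024 * 1024

many_small_blobs = [MIB] * 1024       # 1024 blobs of 1 MiB each
few_large_trees = [64 * MIB] * 16     # 16 trees of 64 MiB each


def summarize(sizes):
    """Return (total size, largest single size) for a list of sizes."""
    return sum(sizes), max(sizes)


blob_total, blob_max = summarize(many_small_blobs)
tree_total, tree_max = summarize(few_large_trees)

# Both totals are 1 GiB, while the maxima are 1 MiB and 64 MiB.
print(blob_total == tree_total, blob_max // MIB, tree_max // MIB)
```

The total says what a type costs overall; the maximum only says how
extreme a single object of that type gets.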

> One thing that is related to "largest" that might be useful is how
> spiky size distribution is.  Among many medium sized blobs, if there
> is only a handful of super huge blobs, that is quite a notable thing
> to know (as opposed to the case where these super huge blobs are
> not so unusual).

I agree that showing a distribution here would be quite useful. This is
something I plan to explore in a followup series. :)
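
As a rough idea of what "spiky" could mean, here is a small sketch in
plain Python. It is only an assumption about one possible approach, not
what a followup series would necessarily do: size_histogram() and
spikiness() are hypothetical helpers, and the sizes are invented.

```python
from collections import Counter
from statistics import median


def size_histogram(sizes):
    """Count objects per power-of-two bucket.

    Bucket k holds sizes in [2**k, 2**(k+1)); a size of 0 is counted
    in bucket 0.
    """
    return Counter(max(size, 1).bit_length() - 1 for size in sizes)


def spikiness(sizes):
    """Ratio of the largest size to the median size (hypothetical metric)."""
    return max(sizes) / median(sizes)


# Invented data: many 4 KiB blobs plus a handful of 1 GiB blobs.
blobs = [4096] * 1000 + [1 << 30] * 3

print(sorted(size_histogram(blobs).items()))
print(spikiness(blobs))
```

A large gap between the populated buckets, or a large max-to-median
ratio, would flag the "handful of super huge blobs" case described
above.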

-Justin

Thread overview: 50+ messages
2026-02-03 22:17 [PATCH 0/5] builtin/repo: include largest object information Justin Tobler
2026-02-03 22:17 ` [PATCH 1/5] builtin/repo: update stats for each object Justin Tobler
2026-02-03 22:36   ` Junio C Hamano
2026-02-18 19:40     ` Justin Tobler
2026-02-26 19:20       ` Junio C Hamano
2026-02-26 19:29         ` Justin Tobler
2026-02-03 22:17 ` [PATCH 2/5] builtin/repo: collect largest inflated objects Justin Tobler
2026-02-03 22:45   ` Junio C Hamano
2026-02-18 20:01     ` Justin Tobler [this message]
2026-02-03 22:17 ` [PATCH 3/5] builtin/repo: add OID annotations to table output Justin Tobler
2026-02-13 13:14   ` Patrick Steinhardt
2026-02-18 20:13     ` Justin Tobler
2026-02-03 22:17 ` [PATCH 4/5] builtin/repo: find commit with most parents Justin Tobler
2026-02-03 22:48   ` Junio C Hamano
2026-02-03 23:14     ` Kristoffer Haugsbakk
2026-02-03 23:33       ` Junio C Hamano
2026-02-18 20:06       ` Justin Tobler
2026-02-03 22:17 ` [PATCH 5/5] builtin/repo: find tree with most entries Justin Tobler
2026-02-03 22:50   ` Junio C Hamano
2026-02-04  8:28     ` Patrick Steinhardt
2026-02-04 15:28       ` Junio C Hamano
2026-02-23 17:41 ` [PATCH v2 0/5] builtin/repo: include largest object information Justin Tobler
2026-02-23 17:41   ` [PATCH v2 1/5] builtin/repo: update stats for each object Justin Tobler
2026-02-23 17:41   ` [PATCH v2 2/5] builtin/repo: collect largest inflated objects Justin Tobler
2026-02-26 19:50     ` Junio C Hamano
2026-03-02 17:28       ` Justin Tobler
2026-02-28 23:36     ` Lucas Seiki Oshiro
2026-03-02 17:38       ` Justin Tobler
2026-02-23 17:41   ` [PATCH v2 3/5] builtin/repo: add OID annotations to table output Justin Tobler
2026-02-26 19:56     ` Junio C Hamano
2026-03-02 17:39       ` Justin Tobler
2026-02-23 17:41   ` [PATCH v2 4/5] builtin/repo: find commit with most parents Justin Tobler
2026-02-23 17:41   ` [PATCH v2 5/5] builtin/repo: find tree with most entries Justin Tobler
2026-02-24  9:35   ` [PATCH v2 0/5] builtin/repo: include largest object information Patrick Steinhardt
2026-02-28 23:43   ` Lucas Seiki Oshiro
2026-03-01 19:22     ` Justin Tobler
2026-03-02 21:45   ` [PATCH v3 0/6] " Justin Tobler
2026-03-02 21:45     ` [PATCH v3 1/6] builtin/repo: update stats for each object Justin Tobler
2026-03-02 21:45     ` [PATCH v3 2/6] builtin/repo: add helper for printing keyvalue output Justin Tobler
2026-03-03 13:27       ` Patrick Steinhardt
2026-03-03 17:40         ` Junio C Hamano
2026-03-03 18:08           ` Justin Tobler
2026-03-02 21:45     ` [PATCH v3 3/6] builtin/repo: collect largest inflated objects Justin Tobler
2026-03-03 13:27       ` Patrick Steinhardt
2026-03-02 21:45     ` [PATCH v3 4/6] builtin/repo: add OID annotations to table output Justin Tobler
2026-03-02 21:45     ` [PATCH v3 5/6] builtin/repo: find commit with most parents Justin Tobler
2026-03-02 21:45     ` [PATCH v3 6/6] builtin/repo: find tree with most entries Justin Tobler
2026-03-02 22:09     ` [PATCH v3 0/6] builtin/repo: include largest object information Junio C Hamano
2026-03-06 22:36       ` Junio C Hamano
2026-03-08 18:44         ` Justin Tobler
