From: Justin Tobler <jltobler@gmail.com>
To: Junio C Hamano <gitster@pobox.com>
Cc: git@vger.kernel.org
Subject: Re: [PATCH 2/5] builtin/repo: collect largest inflated objects
Date: Wed, 18 Feb 2026 14:01:04 -0600
Message-ID: <aZYV0o9Xp-v5IPL1@denethor>
In-Reply-To: <xmqqv7gdqwei.fsf@gitster.g>

On 26/02/03 02:45PM, Junio C Hamano wrote:
> Justin Tobler <jltobler@gmail.com> writes:
> 
> > The "structure" output for git-repo(1) shows the total inflated and disk
> > sizes of reachable objects in the repository, but doesn't show the size
> > of the largest individual objects. Since an individual object may be a
> > large contributor to the overall repository size, it is useful for users
> > to know the maximum size of individual objects.
> 
> Hmph.  It is true that a byte is worth the same amount of money no
> matter what object it is used to represent, but comparing the size
> of a commit object and the size of a blob object feels inherently
> meaningless to me.

I certainly agree that comparing max size values between the types
themselves is not particularly meaningful. I do think, though, that the
max size values by themselves provide insight into the extremes of the
repository.

> It all depends on what you are trying to learn out of the stats, but
> having many small blob objects that add up to 1GB and having medium
> number of medium sized tree objects that adds up to the same 1GB
> would give the same number in object_stats.inflated_sizes for both
> types, indicating that they are costing you about the same.  But the
> members in largest_objects for these types would be different,
> hinting (incorrectly) that one type may be costing more than the
> other.  Would that really tell us something useful, I have to
> wonder?

Yeah, from the largest objects and inflated sizes alone you cannot
really gain any insight into the distribution, but I think it is still a
good idea to showcase the extremes. If I see that the max size values
are "normal", that at least gives me some insight into the repository
usage patterns.
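
To make the point above concrete, here is a minimal sketch with made-up
numbers. The size lists and the `summarize` helper are hypothetical and
are not taken from git-repo(1) or any real repository; they only show
that two object types can share the same total inflated size while
having very different maximums.

```python
# Hypothetical inflated sizes in bytes; not measured from a real repository.
many_small_blobs = [1024] * 1_048_576      # 1 KiB blobs adding up to 1 GiB
fewer_medium_trees = [1_048_576] * 1024    # 1 MiB trees adding up to 1 GiB


def summarize(sizes):
    """Return (total, largest) for a list of object sizes."""
    return sum(sizes), max(sizes)


blob_total, blob_max = summarize(many_small_blobs)
tree_total, tree_max = summarize(fewer_medium_trees)

# Same total for both types, but the largest objects differ by a factor of 1024.
print(blob_total == tree_total, blob_max, tree_max)
```

The totals say both types cost the same, while the maximums only
describe the extremes, which is all they are meant to show.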

> One thing that is related to "largest" that might be useful is how
> spiky size distribution is.  Among many medium sized blobs, if there
> is only a handful of super huge blobs, that is quite a notable thing
> to know (as opposed to the case where these super huge blobs are
> not so unusual).

I agree that showing a distribution here would be quite useful. This is
something I plan to explore in a followup series. :)
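
As a rough idea of what "spiky" could mean, one possible measure is the
ratio of the largest size to the median size. This is only a sketch
under my own assumptions: the `spikiness` helper and the sample sizes
are hypothetical, and nothing like this exists in the series.

```python
from statistics import median


def spikiness(sizes):
    """Ratio of the largest size to the median size (hypothetical metric)."""
    return max(sizes) / median(sizes)


# Hypothetical blob sizes in bytes.
mostly_medium = [4096] * 999 + [4096 * 100_000]   # one super huge blob
uniformly_huge = [4096 * 100_000] * 1000          # huge blobs are the norm

# A large ratio flags a handful of outliers; a ratio near 1 means the
# largest object is not unusual for this repository.
print(spikiness(mostly_medium), spikiness(uniformly_huge))
```

Both sets have the same largest blob, but only the first would stand
out as notable under such a measure.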

-Justin

