public inbox for git@vger.kernel.org
 help / color / mirror / Atom feed
From: Junio C Hamano <gitster@pobox.com>
To: LorenzoPegorari <lorenzo.pegorari2002@gmail.com>
Cc: git@vger.kernel.org,  Elijah Newren <newren@gmail.com>,
	 Patrick Steinhardt <ps@pks.im>,  Taylor Blau <me@ttaylorr.com>,
	 Eric Sunshine <sunshine@sunshineco.com>
Subject: Re: [GSoC PATCH v2 2/4] pack-write: add helper to fill promisor file after repack
Date: Mon, 23 Mar 2026 14:30:26 -0700	[thread overview]
Message-ID: <xmqqfr5q44jx.fsf@gitster.g> (raw)
In-Reply-To: <0bb031e7443bb53abbbb0afaa347285d6d8cf7b8.1774205661.git.lorenzo.pegorari2002@gmail.com> (LorenzoPegorari's message of "Sun, 22 Mar 2026 20:18:05 +0100")

LorenzoPegorari <lorenzo.pegorari2002@gmail.com> writes:

> Create a `copy_all_promisor_files()` helper function used to copy the
> contents of all ".promisor" files in a `repository` inside another
> ".promisor" file.
>
> This function can be used to preserve the contents of all ".promisor"
> files inside a new ".promisor" file, for example when a repack happens.
>
> This function is written in such a way so that it will read all the
> ".promisor" files inside the given `repository` line by line, and copy
> only the lines that are not already present in the destination file. This
> is done to avoid copying the same lines multiple times that may come from
> multiple (redundant) packfiles. There might be another better/cleaner way
> to achieve this.

In the previous step, we extablished that these "back then their ref
X used to point at object Y" records are there so that we can
identify which refs were fetched at the time the packfile was
downloaded to help debugging.  When repacking, losing these records
certainly would lose information.

But would concatenating all into a single file help preserve the
useful information?  Don't we need do better than that?

A NEEDSWORK comment, as was discussed in another thread or two in
the recent past, is not necessarily a well thought out fully
finished specification of an additional piece of work.  "We know
this has a problem, we may need to do something about it, like
concatenating to save the contents, perhaps?  We do not know the
answer, and we do not bother thinking it through right at this
moment.  It is left to the future developers to figure it out" is
what a NEEDSWORK comment is about.

Your first response to such a comment may be "yeah, I agree that it
is bad to lose information we added to help debugging", but the
second one should be to wonder if the "like concatenating..." is the
best approach going forward.

In other words, we should take a NEEDSWORK comment as a mere
starting point, and what NEEDS your work begins at thinking what
needs to be done about the problem raised there.

By mixing them up all into a single list, you no longer can tell
when their ref X was observed to be pointing at object Y anymore.
You may have two packs originally, with a record for "ref X pointing
at object Y" in each of them, but by deduping them, you lose the
information that you cloned at one time, and made an additional
fetch on another day, and the fact the ref X was pointing at the
same value at both times.  I am not sure if it is a good
implementation if the objective of this topic is to preserve
information that is useful for debugging.

I wonder if it helps to append to each line the file timestamp of
the .promisor file we took the record originally?  For the sake of
completeness, we could consider adding the filename as well, but we
can quickly dismiss it as not so useful ;-)

If repacking already repacked promisor packfile, the records would
already contain such a timestamp at the end, so the code to copy
existing records must be prepared to see if the records are the
<ref, oid> tuple, or <ref, oid, timestamp> tuple, and act
accordingly.

I am *not* saying that without such a "preserve timestamp" column in
the record, copying existing records to a new .promisor file is
useless.  But we do not see any explanation why the author thinks
that it is sufficient to copy existing records while silently
deduping.  We can implement only one choice backed by series of
decisions like "timestamp might help" and "original filenames would
probably not help", and the design should describe what was
considered and rejected (as opposed to "we didn't think things
through---we just did what the original NEEDSWORK comment suggested
doing").

Thanks.

  parent reply	other threads:[~2026-03-23 21:30 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-03-21 21:28 [GSoC PATCH 0/3] preserve promisor files content after repack LorenzoPegorari
2026-03-21 21:28 ` [GSoC PATCH 1/3] pack-write: add explanation to promisor file content LorenzoPegorari
2026-03-21 21:28 ` [GSoC PATCH 2/3] pack-write: add helper to fill promisor file after repack LorenzoPegorari
2026-03-22  2:04   ` Eric Sunshine
2026-03-22 18:50     ` Lorenzo Pegorari
2026-03-21 21:29 ` [GSoC PATCH 3/3] repack-promisor: preserve content of promisor files " LorenzoPegorari
2026-03-22 19:16 ` [GSoC PATCH v2 0/4] preserve promisor files content " LorenzoPegorari
2026-03-22 19:16   ` [GSoC PATCH v2 1/4] pack-write: add explanation to promisor file content LorenzoPegorari
2026-03-23 21:07     ` Junio C Hamano
2026-03-25 21:33       ` Lorenzo Pegorari
2026-03-22 19:18   ` [GSoC PATCH v2 2/4] pack-write: add helper to fill promisor file after repack LorenzoPegorari
2026-03-23 20:27     ` Eric Sunshine
2026-03-26 16:15       ` Lorenzo Pegorari
2026-03-23 21:30     ` Junio C Hamano [this message]
2026-03-26  2:01       ` Lorenzo Pegorari
2026-03-22 19:18   ` [GSoC PATCH v2 3/4] repack-promisor: preserve content of promisor files " LorenzoPegorari
2026-03-23 21:48     ` Junio C Hamano
2026-03-26  2:12       ` Lorenzo Pegorari
2026-03-22 19:18   ` [GSoC PATCH v2 4/4] t7700: test for promisor file content " LorenzoPegorari

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=xmqqfr5q44jx.fsf@gitster.g \
    --to=gitster@pobox.com \
    --cc=git@vger.kernel.org \
    --cc=lorenzo.pegorari2002@gmail.com \
    --cc=me@ttaylorr.com \
    --cc=newren@gmail.com \
    --cc=ps@pks.im \
    --cc=sunshine@sunshineco.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox