public inbox for git@vger.kernel.org
 help / color / mirror / Atom feed
From: Lorenzo Pegorari <lorenzo.pegorari2002@gmail.com>
To: Junio C Hamano <gitster@pobox.com>
Cc: git@vger.kernel.org, Elijah Newren <newren@gmail.com>,
	Patrick Steinhardt <ps@pks.im>, Taylor Blau <me@ttaylorr.com>,
	Eric Sunshine <sunshine@sunshineco.com>
Subject: Re: [GSoC PATCH v2 2/4] pack-write: add helper to fill promisor file after repack
Date: Thu, 26 Mar 2026 03:01:07 +0100	[thread overview]
Message-ID: <acSTY_i4zAueN9jD@lorenzo-VM> (raw)
In-Reply-To: <xmqqfr5q44jx.fsf@gitster.g>

On Mon, Mar 23, 2026 at 02:30:26PM -0700, Junio C Hamano wrote:
> LorenzoPegorari <lorenzo.pegorari2002@gmail.com> writes:
> 
> > Create a `copy_all_promisor_files()` helper function used to copy the
> > contents of all ".promisor" files in a `repository` inside another
> > ".promisor" file.
> >
> > This function can be used to preserve the contents of all ".promisor"
> > files inside a new ".promisor" file, for example when a repack happens.
> >
> > This function is written in such a way so that it will read all the
> > ".promisor" files inside the given `repository` line by line, and copy
> > only the lines that are not already present in the destination file. This
> > is done to avoid copying the same lines multiple times that may come from
> > multiple (redundant) packfiles. There might be another better/cleaner way
> > to achieve this.
> 
> In the previous step, we extablished that these "back then their ref
> X used to point at object Y" records are there so that we can
> identify which refs were fetched at the time the packfile was
> downloaded to help debugging.  When repacking, losing these records
> certainly would lose information.
> 
> But would concatenating all into a single file help preserve the
> useful information?  Don't we need do better than that?

Yeah, we absolutely need to! That's why I said that I was not satisfied
at all with the patch (in cover letter of v1). I really needed some
feedback, because I knew that I was doing things wrong.

> A NEEDSWORK comment, as was discussed in another thread or two in
> the recent past, is not necessarily a well thought out fully
> finished specification of an additional piece of work.  "We know
> this has a problem, we may need to do something about it, like
> concatenating to save the contents, perhaps?  We do not know the
> answer, and we do not bother thinking it through right at this
> moment.  It is left to the future developers to figure it out" is
> what a NEEDSWORK comment is about.
> 
> Your first response to such a comment may be "yeah, I agree that it
> is bad to lose information we added to help debugging", but the
> second one should be to wonder if the "like concatenating..." is the
> best approach going forward.
> 
> In other words, we should take a NEEDSWORK comment as a mere
> starting point, and what NEEDS your work begins at thinking what
> needs to be done about the problem raised there.

I fully understand this! Honestly, my biggest weakness that I've
discovered about myself as a dev (through past open-source experience,
e.g. GSoC'25) is that I get hesitant when I have to work on and submit
a patch if I don't have a lot of experience with the codebase. This
happens particularly when I have to take a decision, and not only
complete a task.

In fact, I decided to work on this specific NEEDSWORK issue to get more
experience on promisor remotes (the feature that I want to improve in my
GSoC proposal) before the GSoC coding period... if I get selected, of
course :-).

I will try my best to improve!

> By mixing them up all into a single list, you no longer can tell
> when their ref X was observed to be pointing at object Y anymore.
> You may have two packs originally, with a record for "ref X pointing
> at object Y" in each of them, but by deduping them, you lose the
> information that you cloned at one time, and made an additional
> fetch on another day, and the fact the ref X was pointing at the
> same value at both times.  I am not sure if it is a good
> implementation if the objective of this topic is to preserve
> information that is useful for debugging.

My reasoning was based on the (wrong) assumption that it was impossible
for the same record fo "ref X is pointing at object Y" to appear
multiple times. Obviously then, deduping them is the wrong solution, as
it will lose some debugging information.

> I wonder if it helps to append to each line the file timestamp of
> the .promisor file we took the record originally?  For the sake of
> completeness, we could consider adding the filename as well, but we
> can quickly dismiss it as not so useful ;-)

Related to what I said before about getting hesitant... I actually
thought about (pretty much) exactly this! My original idea was to add
a timestamp of the current time when the repack happened. I discarded it
because I didn't want to add any new information (for no particular
reason tbh) and because I didn't want ".promisor" file content to
potentially become too long if many repacks happen.

Your solution is much cleaner compared to what I originally thought of.

> If repacking already repacked promisor packfile, the records would
> already contain such a timestamp at the end, so the code to copy
> existing records must be prepared to see if the records are the
> <ref, oid> tuple, or <ref, oid, timestamp> tuple, and act
> accordingly.

Makes perfect sense.

> I am *not* saying that without such a "preserve timestamp" column in
> the record, copying existing records to a new .promisor file is
> useless.  But we do not see any explanation why the author thinks
> that it is sufficient to copy existing records while silently
> deduping.  We can implement only one choice backed by series of
> decisions like "timestamp might help" and "original filenames would
> probably not help", and the design should describe what was
> considered and rejected (as opposed to "we didn't think things
> through---we just did what the original NEEDSWORK comment suggested
> doing").

I 100% agree.

> Thanks.

Thank you Junio for your time and feedback,

Lorenzo

  reply	other threads:[~2026-03-26  2:01 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-03-21 21:28 [GSoC PATCH 0/3] preserve promisor files content after repack LorenzoPegorari
2026-03-21 21:28 ` [GSoC PATCH 1/3] pack-write: add explanation to promisor file content LorenzoPegorari
2026-03-21 21:28 ` [GSoC PATCH 2/3] pack-write: add helper to fill promisor file after repack LorenzoPegorari
2026-03-22  2:04   ` Eric Sunshine
2026-03-22 18:50     ` Lorenzo Pegorari
2026-03-21 21:29 ` [GSoC PATCH 3/3] repack-promisor: preserve content of promisor files " LorenzoPegorari
2026-03-22 19:16 ` [GSoC PATCH v2 0/4] preserve promisor files content " LorenzoPegorari
2026-03-22 19:16   ` [GSoC PATCH v2 1/4] pack-write: add explanation to promisor file content LorenzoPegorari
2026-03-23 21:07     ` Junio C Hamano
2026-03-25 21:33       ` Lorenzo Pegorari
2026-03-22 19:18   ` [GSoC PATCH v2 2/4] pack-write: add helper to fill promisor file after repack LorenzoPegorari
2026-03-23 20:27     ` Eric Sunshine
2026-03-26 16:15       ` Lorenzo Pegorari
2026-03-23 21:30     ` Junio C Hamano
2026-03-26  2:01       ` Lorenzo Pegorari [this message]
2026-03-22 19:18   ` [GSoC PATCH v2 3/4] repack-promisor: preserve content of promisor files " LorenzoPegorari
2026-03-23 21:48     ` Junio C Hamano
2026-03-26  2:12       ` Lorenzo Pegorari
2026-03-22 19:18   ` [GSoC PATCH v2 4/4] t7700: test for promisor file content " LorenzoPegorari

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=acSTY_i4zAueN9jD@lorenzo-VM \
    --to=lorenzo.pegorari2002@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=me@ttaylorr.com \
    --cc=newren@gmail.com \
    --cc=ps@pks.im \
    --cc=sunshine@sunshineco.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox