git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Taylor Blau <me@ttaylorr.com>
To: John Cai <johncai86@gmail.com>
Cc: Robert Coup <robert.coup@koordinates.com>,
	John Cai via GitGitGadget <gitgitgadget@gmail.com>,
	git@vger.kernel.org, Derrick Stolee <stolee@gmail.com>,
	Christian Couder <christian.couder@gmail.com>
Subject: Re: [PATCH v2 0/4] [RFC] repack: add --filter=
Date: Sun, 20 Feb 2022 22:11:55 -0500	[thread overview]
Message-ID: <YhMC+3FdSEZz22qX@nand.local> (raw)
In-Reply-To: <CB2ACEF7-76A9-4253-AD43-7BC842F9576D@gmail.com>

On Wed, Feb 16, 2022 at 04:07:14PM -0500, John Cai wrote:
> > I don't know whether that is just around naming (--delete-filter /
> > --drop-filter /
> > --expire-filter ?), and/or making the documentation very explicit that
> > this isn't so
> > much "omitting certain objects from a packfile" as irretrievably
> > deleting objects.
>
> Yeah, making the name very clear (I kind of like --delete-filter) would certainly help.
> Also, to have more protection we can either
>
> 1. add a config value that needs to be set to true for repack to remove
> objects (repack.allowDestroyFilter).
>
> 2. --filter is dry-run by default and prints out objects that would have been removed,
> and it has to be combined with another flag --destroy in order for it to actually remove
> objects from the odb.

I share the same concern as Robert and Stolee do. But I think this issue
goes deeper than just naming.

Even if we called this `git repack --delete-filter` and only ran it with
`--i-know-what-im-doing` flag, we would still be leaving repository
corruption on the table, just making it marginally more difficult to
achieve.

I'm not familiar enough with the proposal to comment authoritatively,
but it seems like we should be verifying that there is a promisor remote
which promises any objects that we are about to filter out of the
repository.

I think that this is basically what `pack-objects`'s
`--missing=allow-promisor` does, though I don't think that's the right
tool for this job, either. Because we pack-objects also knows the object
filter, by the time we are ready to construct a pack, we're traversing
the filtered list of objects.

So we don't even bother to call show_object (or, in this case,
builtin/pack-objects.c::show_objecT__ma_allow_promisor) on them.

So I wonder what your thoughts are on having pack-objects only allow an
object to get "filtered out" if a copy of it is promised by some
promisor remote. Alternatively, and perhaps a more straight-forward
option might be to have `git repack` look at any objects that exist in a
pack we're about to delete, but don't exist in any of the packs we are
going to leave around, and make sure that any of those objects are
either unreachable or exist on a promisor remote.

But as it stands right now, I worry that this feature is too easily
misused and could result in unintended repository corruption.

I think verifying that that any objects we're about to delete exist
somewhere should make this safer to use, though even then, I think we're
still open to a TOCTOU race whereby the promisor has the objects when
we're about to delete them (convincing Git that deleting those objects
is OK to do) but gets rid of them after objects have been deleted from
the local copy (leaving no copies of the object around).

So, I don't know exactly what the right path forward is. But I'm curious
to get your thoughts on the above.

Thanks,
Taylor

  reply	other threads:[~2022-02-21  3:11 UTC|newest]

Thread overview: 34+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-01-27  1:49 [PATCH 0/2] repack: add --filter= John Cai via GitGitGadget
2022-01-27  1:49 ` [PATCH 1/2] pack-objects: allow --filter without --stdout John Cai via GitGitGadget
2022-01-27  1:49 ` [PATCH 2/2] repack: add --filter=<filter-spec> option John Cai via GitGitGadget
2022-01-27 15:03   ` Derrick Stolee
2022-01-29 19:14     ` John Cai
2022-01-30  8:16       ` Christian Couder
2022-01-30 13:02       ` John Cai
2022-02-09  2:10 ` [PATCH v2 0/4] [RFC] repack: add --filter= John Cai via GitGitGadget
2022-02-09  2:10   ` [PATCH v2 1/4] pack-objects: allow --filter without --stdout John Cai via GitGitGadget
2022-02-09  2:10   ` [PATCH v2 2/4] repack: add --filter=<filter-spec> option John Cai via GitGitGadget
2022-02-09  2:10   ` [PATCH v2 3/4] upload-pack: allow missing promisor objects John Cai via GitGitGadget
2022-02-09  2:10   ` [PATCH v2 4/4] tests for repack --filter mode John Cai via GitGitGadget
2022-02-17 16:14     ` Robert Coup
2022-02-17 20:36       ` John Cai
2022-02-09  2:27   ` [PATCH v2 0/4] [RFC] repack: add --filter= John Cai
2022-02-16 15:39   ` Robert Coup
2022-02-16 21:07     ` John Cai
2022-02-21  3:11       ` Taylor Blau [this message]
2022-02-21 15:38         ` Robert Coup
2022-02-21 17:57           ` Taylor Blau
2022-02-21 21:10         ` Christian Couder
2022-02-21 21:42           ` Taylor Blau
2022-02-22 17:11             ` Christian Couder
2022-02-22 17:33               ` Taylor Blau
2022-02-23 15:40               ` Robert Coup
2022-02-23 19:31               ` Junio C Hamano
2022-02-26 16:01                 ` John Cai
2022-02-26 17:29                   ` Taylor Blau
2022-02-26 20:19                     ` John Cai
2022-02-26 20:30                       ` Taylor Blau
2022-02-26 21:05                         ` John Cai
2022-02-26 21:44                           ` Taylor Blau
2022-02-22 18:52             ` John Cai
2022-02-22 19:35               ` Taylor Blau

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YhMC+3FdSEZz22qX@nand.local \
    --to=me@ttaylorr.com \
    --cc=christian.couder@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=johncai86@gmail.com \
    --cc=robert.coup@koordinates.com \
    --cc=stolee@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).