* [PATCH] packfile: delete .idx files before .pack files
@ 2023-06-20 19:01 Derrick Stolee via GitGitGadget
2023-06-20 20:39 ` Junio C Hamano
2023-06-20 22:02 ` Taylor Blau
0 siblings, 2 replies; 3+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2023-06-20 19:01 UTC (permalink / raw)
To: git; +Cc: gitster, me, vdye, Derrick Stolee, Derrick Stolee
From: Derrick Stolee <derrickstolee@github.com>
When installing a packfile, we place the .pack file before the .idx
file. The intention is that Git scans for .idx files in the pack
directory and then loads the .pack files from that list.
However, when we delete packfiles, we do not do this in the reverse
order as we should. The unlink_pack_path() method deletes the .pack
followed by the .idx.
This creates a window where the process could be interrupted between
the .pack deletion and the .idx deletion, leaving the repository in a
state that looks strange, but isn't actually too problematic if we
assume the pack was safe to delete. The .idx without a .pack will cause
some overhead, but will not interrupt other Git processes.
This ordering was introduced into the 'git repack' builtin by
a1bbc6c0176 (repack: rewrite the shell script in C, 2013-09-15), though
we must be careful to track history through the code move in 8434e85d5f9
(repack: refactor pack deletion for future use, 2019-06-10) to see that.
This became more important after 73320e49add (builtin/repack.c: only
collect fully-formed packs, 2023-06-07) changed how 'git repack' scanned
for packfiles for use in the cruft pack process. It previously looked
for .pack files, but that was problematic due to the order that packs
are installed: repacks between the creation of a .pack and the creation
of its .idx would result in hard failures.
There is an independent proposal about what to do in the case of a .idx
without a .pack during this 'git repack' scenario, but this change is
focused on deleting .pack files more safely.
Modify the order to delete the .idx before the .pack. The rest of the
modifiers on the .pack should still come after the .pack so we know all
of the presumed properties of the packfile as long as it exists in the
filesystem, in case we wish to reinstate it by re-indexing the .pack
file.
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
packfile: delete .idx files before .pack files
I'm going to submit an RFC soon [1] that approaches this issue from
another angle, but this should be a straightforward fix to avoid the
problematic "an .idx exists without its .pack" case that has been seen
in the wild.
[1] https://github.com/gitgitgadget/git/pull/1546
Thanks, -Stolee
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1547%2Fderrickstolee%2Fdelete-idx-first-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1547/derrickstolee/delete-idx-first-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/1547
packfile.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/packfile.c b/packfile.c
index fd083c86e00..6b591ddcccf 100644
--- a/packfile.c
+++ b/packfile.c
@@ -381,7 +381,7 @@ void close_object_store(struct raw_object_store *o)
void unlink_pack_path(const char *pack_name, int force_delete)
{
- static const char *exts[] = {".pack", ".idx", ".rev", ".keep", ".bitmap", ".promisor", ".mtimes"};
+ static const char *exts[] = {".idx", ".pack", ".rev", ".keep", ".bitmap", ".promisor", ".mtimes"};
int i;
struct strbuf buf = STRBUF_INIT;
size_t plen;
base-commit: fe86abd7511a9a6862d5706c6fa1d9b57a63ba09
--
gitgitgadget
^ permalink raw reply related [flat|nested] 3+ messages in thread
* Re: [PATCH] packfile: delete .idx files before .pack files
2023-06-20 19:01 [PATCH] packfile: delete .idx files before .pack files Derrick Stolee via GitGitGadget
@ 2023-06-20 20:39 ` Junio C Hamano
2023-06-20 22:02 ` Taylor Blau
1 sibling, 0 replies; 3+ messages in thread
From: Junio C Hamano @ 2023-06-20 20:39 UTC (permalink / raw)
To: Derrick Stolee via GitGitGadget; +Cc: git, me, vdye, Derrick Stolee
"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
> However, when we delete packfiles, we do not do this in the reverse
> order as we should. The unlink_pack_path() method deletes the .pack
> followed by the .idx.
Makes sense.
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: [PATCH] packfile: delete .idx files before .pack files
2023-06-20 19:01 [PATCH] packfile: delete .idx files before .pack files Derrick Stolee via GitGitGadget
2023-06-20 20:39 ` Junio C Hamano
@ 2023-06-20 22:02 ` Taylor Blau
1 sibling, 0 replies; 3+ messages in thread
From: Taylor Blau @ 2023-06-20 22:02 UTC (permalink / raw)
To: Derrick Stolee via GitGitGadget; +Cc: git, gitster, vdye, Derrick Stolee
On Tue, Jun 20, 2023 at 07:01:15PM +0000, Derrick Stolee via GitGitGadget wrote:
> From: Derrick Stolee <derrickstolee@github.com>
>
> When installing a packfile, we place the .pack file before the .idx
> file. The intention is that Git scans for .idx files in the pack
> directory and then loads the .pack files from that list.
>
> However, when we delete packfiles, we do not do this in the reverse
> order as we should. The unlink_pack_path() method deletes the .pack
> followed by the .idx.
>
> This creates a window where the process could be interrupted between
> the .pack deletion and the .idx deletion, leaving the repository in a
> state that looks strange, but isn't actually too problematic if we
> assume the pack was safe to delete. The .idx without a .pack will cause
> some overhead, but will not interrupt other Git processes.
I think that this specific case wouldn't cause a problem since we were
about to delete that pack anyway, but it may be worth emphasizing that
having a missing pack with an extant .idx file is not always guaranteed
to be safe.
> This ordering was introduced into the 'git repack' builtin by
> a1bbc6c0176 (repack: rewrite the shell script in C, 2013-09-15), though
> we must be careful to track history through the code move in 8434e85d5f9
> (repack: refactor pack deletion for future use, 2019-06-10) to see that.
Thanks for calling this out, I agree that the "regression" dates back to
a1bbc6c0176, since we can see that in git-repack.sh, we do:
# Remove the "old-" files
for name in $names
do
rm -f "$PACKDIR/old-pack-$name.idx"
rm -f "$PACKDIR/old-pack-$name.pack"
done
...removing the .idx first, which is the right thing to do here.
> diff --git a/packfile.c b/packfile.c
> index fd083c86e00..6b591ddcccf 100644
> --- a/packfile.c
> +++ b/packfile.c
> @@ -381,7 +381,7 @@ void close_object_store(struct raw_object_store *o)
>
> void unlink_pack_path(const char *pack_name, int force_delete)
> {
> - static const char *exts[] = {".pack", ".idx", ".rev", ".keep", ".bitmap", ".promisor", ".mtimes"};
> + static const char *exts[] = {".idx", ".pack", ".rev", ".keep", ".bitmap", ".promisor", ".mtimes"};
> int i;
> struct strbuf buf = STRBUF_INIT;
> size_t plen;
Looks good, thanks.
Thanks,
Taylor
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2023-06-20 22:03 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-06-20 19:01 [PATCH] packfile: delete .idx files before .pack files Derrick Stolee via GitGitGadget
2023-06-20 20:39 ` Junio C Hamano
2023-06-20 22:02 ` Taylor Blau
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).