From: Junio C Hamano <gitster@pobox.com>
To: Eric Wong <e@80x24.org>
Cc: git@vger.kernel.org
Subject: Re: [PATCH] prune: recursively prune objects directory
Date: Tue, 22 Nov 2022 10:28:34 +0900
Message-ID: <xmqq7cznu69p.fsf@gitster.g>
In-Reply-To: <20221122000927.M873500@dcvr> (Eric Wong's message of "Tue, 22 Nov 2022 00:09:27 +0000")
Eric Wong <e@80x24.org> writes:
> I am unsure about duplicating ishex() from name-rev.c, however...
Yeah, I wonder why name-rev.c does not use isxdigit() in the first
place.
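
To make the difference concrete, here is a rough standalone sketch (plain C with <ctype.h>, not git's own headers) comparing the lowercase-only test from the patch with one built on isxdigit(). The helper names ishex_lower(), is_loose_prefix_lower() and is_loose_prefix_xdigit() are made up for illustration. The one behavioral difference I am sure of is that standard isxdigit() also accepts 'A'-'F', so an isxdigit()-based check would treat a directory named "AA" as a fan-out directory as well.

```c
#include <ctype.h>
#include <string.h>

/* Lowercase-only hex test, mirroring the ishex() macro in the patch. */
static int ishex_lower(unsigned char c)
{
	return isdigit(c) || (c >= 'a' && c <= 'f');
}

/* Two-character name made of lowercase hex digits only. */
static int is_loose_prefix_lower(const char *name)
{
	return strlen(name) == 2 &&
	       ishex_lower((unsigned char)name[0]) &&
	       ishex_lower((unsigned char)name[1]);
}

/* Hypothetical variant using isxdigit(); it also accepts 'A'-'F'. */
static int is_loose_prefix_xdigit(const char *name)
{
	return strlen(name) == 2 &&
	       isxdigit((unsigned char)name[0]) &&
	       isxdigit((unsigned char)name[1]);
}
```

Whether accepting uppercase names matters here is a judgment call; git itself writes the fan-out directories in lowercase.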
> ------8<-----
> Subject: [PATCH] prune: recursively prune objects directory
>
> $GIT_DIR/objects/pack may be removed to save inodes in shared
> repositories, so avoid scanning it if it does not exist. Loose
> object directories ($GIT_DIR/objects/??) may have old temporary
> files, so we now prune those, too.
>
> Recursion is limited to a single level since git doesn't use
> deeper levels. This avoids the risk of stack overflows via
> infinite recursion when pruning untrusted repos.
>
> We'll also emit the system error in case a directory cannot be
> opened to help users diagnose permissions problems or resource
> constraints.
>
> Signed-off-by: Eric Wong <e@80x24.org>
> ---
> builtin/prune.c | 28 ++++++++++++++++++++--------
> t/t5304-prune.sh | 16 ++++++++++++++++
> 2 files changed, 36 insertions(+), 8 deletions(-)
>
> diff --git a/builtin/prune.c b/builtin/prune.c
> index df376b2ed1..0f6a33690a 100644
> --- a/builtin/prune.c
> +++ b/builtin/prune.c
> @@ -114,25 +114,41 @@ static int prune_subdir(unsigned int nr, const char *path, void *data)
> return 0;
> }
>
> +/*
> + * XXX ishex is duplicated in builtin/name-rev.c, perhaps git-compat-util.h
> + * is a better home for it
> + */
> +#define ishex(x) (isdigit((x)) || ((x) >= 'a' && (x) <= 'f'))
> +static int is_loose_prefix(const char *d_name)
> +{
> + return strlen(d_name) == 2 && ishex(d_name[0]) && ishex(d_name[1]);
> +}
> +
> /*
> * Write errors (particularly out of space) can result in
> * failed temporary packs (and more rarely indexes and other
> * files beginning with "tmp_") accumulating in the object
> * and the pack directories.
> */
> -static void remove_temporary_files(const char *path)
> +static void remove_temporary_files(const char *path, int recurse)
> {
> DIR *dir;
> struct dirent *de;
>
> dir = opendir(path);
> if (!dir) {
> - fprintf(stderr, "Unable to open directory %s\n", path);
> + warning_errno(_("unable to open directory %s"), path);
> return;
> }
> while ((de = readdir(dir)) != NULL)
> - if (starts_with(de->d_name, "tmp_"))
> + if (starts_with(de->d_name, "tmp_")) {
> prune_tmp_file(mkpath("%s/%s", path, de->d_name));
> + } else if (recurse && (strcmp(de->d_name, "packs") == 0 ||
> + is_loose_prefix(de->d_name))) {
OK, the intent is to be careful and deal only with the fan-out
directories objects/[0-9a-f]{2}/ and objects/pack/, leaving cruft
in objects/info and any other unknown subdirectories alone, which
makes sense.
Two nits are:
- "packs" wants to be "pack".
- "strcmp() == 0" wants to be "!strcmp()".
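
With both nits applied, the name test could look roughly like the following. This is a standalone sketch rather than a replacement hunk: is_fanout_dir() and should_descend() are hypothetical names, and the code uses plain <ctype.h> instead of git-compat-util.h so it compiles on its own.

```c
#include <ctype.h>
#include <string.h>

/* Two lowercase hex digits, i.e. a loose-object fan-out directory name. */
static int is_fanout_dir(const char *d_name)
{
	unsigned char a, b;

	if (strlen(d_name) != 2)
		return 0;
	a = (unsigned char)d_name[0];
	b = (unsigned char)d_name[1];
	return (isdigit(a) || (a >= 'a' && a <= 'f')) &&
	       (isdigit(b) || (b >= 'a' && b <= 'f'));
}

/*
 * Hypothetical helper: descend only into "pack" (not "packs") and the
 * fan-out directories, using the !strcmp() idiom.
 */
static int should_descend(const char *d_name)
{
	return !strcmp(d_name, "pack") || is_fanout_dir(d_name);
}
```

Anything else under objects/, such as "info", is left alone by this test.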
> + char *s = mkpathdup("%s/%s", path, de->d_name);
> + remove_temporary_files(s, 0);
> + free(s);
> + }
> closedir(dir);
> }
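
For readers following along, the shape of the one-level recursion can be shown outside git with a minimal POSIX sketch. It makes several assumptions: count_tmp_files() and wants_descent() are hypothetical names, the function only counts "tmp_" entries instead of pruning them, and plain snprintf()/fprintf() stand in for mkpathdup() and warning_errno().

```c
#include <ctype.h>
#include <dirent.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical filter: "pack" or a two-character lowercase hex name. */
static int wants_descent(const char *name)
{
	unsigned char a, b;

	if (!strcmp(name, "pack"))
		return 1;
	if (strlen(name) != 2)
		return 0;
	a = (unsigned char)name[0];
	b = (unsigned char)name[1];
	return (isdigit(a) || (a >= 'a' && a <= 'f')) &&
	       (isdigit(b) || (b >= 'a' && b <= 'f'));
}

/*
 * Count entries whose names start with "tmp_" in path. When recurse is
 * non-zero, also look inside matching subdirectories, but pass 0 down so
 * the depth never exceeds one level.
 */
static int count_tmp_files(const char *path, int recurse)
{
	DIR *dir = opendir(path);
	struct dirent *de;
	int n = 0;

	if (!dir) {
		/* Report the system error, as the patch does for opendir(). */
		fprintf(stderr, "warning: unable to open directory %s: %s\n",
			path, strerror(errno));
		return 0;
	}
	while ((de = readdir(dir)) != NULL) {
		if (!strncmp(de->d_name, "tmp_", 4)) {
			n++;
		} else if (recurse && wants_descent(de->d_name)) {
			char sub[4096];
			int len = snprintf(sub, sizeof(sub), "%s/%s",
					   path, de->d_name);

			if (len > 0 && (size_t)len < sizeof(sub))
				n += count_tmp_files(sub, 0);
		}
	}
	closedir(dir);
	return n;
}
```

Because the nested call always passes 0, a maliciously deep directory tree cannot drive the recursion any further, which is the property the commit message describes.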
>
> @@ -150,7 +166,6 @@ int cmd_prune(int argc, const char **argv, const char *prefix)
> N_("limit traversal to objects outside promisor packfiles")),
> OPT_END()
> };
> - char *s;
>
> expire = TIME_MAX;
> save_commit_buffer = 0;
> @@ -186,10 +201,7 @@ int cmd_prune(int argc, const char **argv, const char *prefix)
> prune_cruft, prune_subdir, &revs);
>
> prune_packed_objects(show_only ? PRUNE_PACKED_DRY_RUN : 0);
> - remove_temporary_files(get_object_directory());
> - s = mkpathdup("%s/pack", get_object_directory());
> - remove_temporary_files(s);
> - free(s);
> + remove_temporary_files(get_object_directory(), 1);
>
> if (is_repository_shallow(the_repository)) {
> perform_reachability_traversal(&revs);
> diff --git a/t/t5304-prune.sh b/t/t5304-prune.sh
> index 8ae314af58..8c2278035e 100755
> --- a/t/t5304-prune.sh
> +++ b/t/t5304-prune.sh
> @@ -29,6 +29,22 @@ test_expect_success setup '
> git gc
> '
>
> +test_expect_success 'prune stale loose objects' '
> + mkdir .git/objects/aa &&
> + >.git/objects/aa/tmp_foo &&
> + test-tool chmtime =-86501 .git/objects/aa/tmp_foo &&
> + git prune --expire 1.day &&
> + test_path_is_missing .git/objects/aa/tmp_foo
> +'
> +
> +test_expect_success 'bare repo prune is quiet without $GIT_DIR/objects/pack' '
> + git clone -q --shared --template= --bare . bare.git &&
> + rmdir bare.git/objects/pack &&
> + git --git-dir=bare.git prune --no-progress 2>prune.err &&
> + test_must_be_empty prune.err &&
> + rm -r bare.git prune.err
> +'
Is the last "clean-up" step necessary?
> +
> test_expect_success 'prune stale packs' '
> orig_pack=$(echo .git/objects/pack/*.pack) &&
> >.git/objects/tmp_1.pack &&
Other than that, looks like a good idea.
Thanks.