* [PATCH 0/2] remote: optimize rm/prune ref deletion
@ 2014-05-20 10:34 Jens Lindström
2014-05-20 10:39 ` [PATCH 1/2] remote: defer repacking packed-refs when deleting refs Jens Lindström
` (2 more replies)
0 siblings, 3 replies; 16+ messages in thread
From: Jens Lindström @ 2014-05-20 10:34 UTC (permalink / raw)
To: git
At work, we have some shared repositories with far too many refs in
them, which causes various issues, performance and otherwise. We plan
to move most of the refs out of them, but for that to help users that
have already fetched all the refs into their local repositories, those
users should want to run 'git remote prune'.
It turns out that 'git remote prune' (and also 'git remote rm') has
its own rather severe performance issues relating to removing the
obsolete refs from the local repository, which I've addressed in the
following patches. The performance improves from many CPU minutes
(long enough that I could never be bothered to let it run to
completion) to around a few seconds, when removing ~15000 refs.
Jens Lindström (2):
remote: defer repacking packed-refs when deleting refs
remote prune: optimize "dangling symref" check/warning
builtin/remote.c | 25 +++++++++++++++++++++----
refs.c | 34 +++++++++++++++++++++++++++-------
refs.h | 4 ++++
3 files changed, 52 insertions(+), 11 deletions(-)
--
1.9.1
^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH 1/2] remote: defer repacking packed-refs when deleting refs
2014-05-20 10:34 [PATCH 0/2] remote: optimize rm/prune ref deletion Jens Lindström
@ 2014-05-20 10:39 ` Jens Lindström
2014-05-20 19:30 ` Junio C Hamano
2014-05-20 10:41 ` [PATCH 2/2] remote prune: optimize "dangling symref" check/warning Jens Lindström
2014-05-23 10:26 ` [PATCH v2 0/3] remote: optimize rm/prune ref deletion Jens Lindström
2 siblings, 1 reply; 16+ messages in thread
From: Jens Lindström @ 2014-05-20 10:39 UTC (permalink / raw)
To: git
When 'git remote rm' or 'git remote prune' were used in a repository
with many refs, and needed to delete many refs, a lot of time was spent
deleting those refs since for each deleted ref, repack_without_refs()
was called to rewrite packed-refs without just that deleted ref.
To avoid this, defer the repacking until after all refs have been
deleted (by delete_ref()), and then call repack_without_refs() once to
repack without all the deleted refs.
Signed-off-by: Jens Lindström <jl@opera.com>
---
This patch changes behavior when the operation is aborted in the
middle, so that loose refs and ref logs might have been deleted, but
not the corresponding entries in packed-refs, since packed-refs is now
only updated at the end. This is a bit unfortunate, and may
"resurrect" an obsolete packed-refs entry by deleting the loose ref
that had overridden it.
Mitigating factors would be that this only affects remote tracking
branches that we were about to delete anyway, and that in the case of
'git remote prune' were apparently not actually matching a ref in the
remote.
Also, it is a lot harder to interrupt the operation in the middle when
it takes a few seconds compared to when it takes many minutes to
complete. :-)
builtin/remote.c | 19 ++++++++++++++++---
refs.c | 15 +++++++++------
refs.h | 3 +++
3 files changed, 28 insertions(+), 9 deletions(-)
diff --git a/builtin/remote.c b/builtin/remote.c
index b3ab4cf..ce60a30 100644
--- a/builtin/remote.c
+++ b/builtin/remote.c
@@ -749,15 +749,18 @@ static int mv(int argc, const char **argv)
static int remove_branches(struct string_list *branches)
{
+ const char **branch_names = xmalloc(branches->nr * sizeof(*branch_names));
int i, result = 0;
for (i = 0; i < branches->nr; i++) {
struct string_list_item *item = branches->items + i;
- const char *refname = item->string;
+ const char *refname = branch_names[i] = item->string;
unsigned char *sha1 = item->util;
- if (delete_ref(refname, sha1, 0))
+ if (delete_ref(refname, sha1, REF_DEFERREPACK))
result |= error(_("Could not remove branch %s"), refname);
}
+ result |= repack_without_refs(branch_names, branches->nr);
+ free(branch_names);
return result;
}
@@ -1303,6 +1306,7 @@ static int prune_remote(const char *remote, int dry_run)
{
int result = 0, i;
struct ref_states states;
+ const char **delete_refs;
const char *dangling_msg = dry_run
? _(" %s will become dangling!")
: _(" %s has become dangling!");
@@ -1316,13 +1320,16 @@ static int prune_remote(const char *remote, int dry_run)
states.remote->url_nr
? states.remote->url[0]
: _("(no URL)"));
+ delete_refs = xmalloc(states.stale.nr * sizeof(*delete_refs));
}
for (i = 0; i < states.stale.nr; i++) {
const char *refname = states.stale.items[i].util;
+ delete_refs[i] = refname;
+
if (!dry_run)
- result |= delete_ref(refname, NULL, 0);
+ result |= delete_ref(refname, NULL, REF_DEFERREPACK);
if (dry_run)
printf_ln(_(" * [would prune] %s"),
@@ -1333,6 +1340,12 @@ static int prune_remote(const char *remote, int dry_run)
warn_dangling_symref(stdout, dangling_msg, refname);
}
+ if (states.stale.nr) {
+ if (!dry_run)
+ result |= repack_without_refs(delete_refs, states.stale.nr);
+ free(delete_refs);
+ }
+
free_remote_ref_states(&states);
return result;
}
diff --git a/refs.c b/refs.c
index 28d5eca..3b62aca 100644
--- a/refs.c
+++ b/refs.c
@@ -2431,7 +2431,7 @@ static int curate_packed_ref_fn(struct ref_entry *entry, void *cb_data)
return 0;
}
-static int repack_without_refs(const char **refnames, int n)
+int repack_without_refs(const char **refnames, int n)
{
struct ref_dir *packed;
struct string_list refs_to_delete = STRING_LIST_INIT_DUP;
@@ -2507,11 +2507,14 @@ int delete_ref(const char *refname, const unsigned char *sha1, int delopt)
return 1;
ret |= delete_ref_loose(lock, flag);
- /* removing the loose one could have resurrected an earlier
- * packed one. Also, if it was not loose we need to repack
- * without it.
- */
- ret |= repack_without_ref(lock->ref_name);
+ if (!(delopt & REF_DEFERREPACK))
+ {
+ /* removing the loose one could have resurrected an earlier
+ * packed one. Also, if it was not loose we need to repack
+ * without it.
+ */
+ ret |= repack_without_ref(lock->ref_name);
+ }
unlink_or_warn(git_path("logs/%s", lock->ref_name));
clear_loose_ref_cache(&ref_cache);
diff --git a/refs.h b/refs.h
index 87a1a79..0db5584 100644
--- a/refs.h
+++ b/refs.h
@@ -132,6 +132,8 @@ extern void rollback_packed_refs(void);
*/
int pack_refs(unsigned int flags);
+extern int repack_without_refs(const char **refnames, int n);
+
extern int ref_exists(const char *);
/*
@@ -149,6 +151,7 @@ extern struct ref_lock *lock_ref_sha1(const char *refname, const unsigned char *
/** Locks any ref (for 'HEAD' type refs). */
#define REF_NODEREF 0x01
+#define REF_DEFERREPACK 0x02
extern struct ref_lock *lock_any_ref_for_update(const char *refname,
const unsigned char *old_sha1,
int flags, int *type_p);
--
1.9.1
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH 2/2] remote prune: optimize "dangling symref" check/warning
2014-05-20 10:34 [PATCH 0/2] remote: optimize rm/prune ref deletion Jens Lindström
2014-05-20 10:39 ` [PATCH 1/2] remote: defer repacking packed-refs when deleting refs Jens Lindström
@ 2014-05-20 10:41 ` Jens Lindström
2014-05-23 10:26 ` [PATCH v2 0/3] remote: optimize rm/prune ref deletion Jens Lindström
2 siblings, 0 replies; 16+ messages in thread
From: Jens Lindström @ 2014-05-20 10:41 UTC (permalink / raw)
To: git
When 'git remote prune' was used to delete many refs in a repository
with many refs, a lot of time was spent checking for (now) dangling
symbolic refs pointing to the deleted ref, since warn_dangling_symref()
was once per deleted ref to check all other refs in the repository.
Avoid this using the new warn_dangling_symrefs() function which
makes one pass over all refs and checks for all the deleted refs in
one go, after they have all been deleted.
Signed-off-by: Jens Lindström <jl@opera.com>
---
builtin/remote.c | 6 +++++-
refs.c | 19 ++++++++++++++++++-
refs.h | 1 +
3 files changed, 24 insertions(+), 2 deletions(-)
diff --git a/builtin/remote.c b/builtin/remote.c
index ce60a30..5e4a8dd 100644
--- a/builtin/remote.c
+++ b/builtin/remote.c
@@ -1306,6 +1306,7 @@ static int prune_remote(const char *remote, int dry_run)
{
int result = 0, i;
struct ref_states states;
+ struct string_list delete_refs_list = STRING_LIST_INIT_NODUP;
const char **delete_refs;
const char *dangling_msg = dry_run
? _(" %s will become dangling!")
@@ -1327,6 +1328,7 @@ static int prune_remote(const char *remote, int dry_run)
const char *refname = states.stale.items[i].util;
delete_refs[i] = refname;
+ string_list_insert(&delete_refs_list, refname);
if (!dry_run)
result |= delete_ref(refname, NULL, REF_DEFERREPACK);
@@ -1337,9 +1339,11 @@ static int prune_remote(const char *remote, int dry_run)
else
printf_ln(_(" * [pruned] %s"),
abbrev_ref(refname, "refs/remotes/"));
- warn_dangling_symref(stdout, dangling_msg, refname);
}
+ warn_dangling_symrefs(stdout, dangling_msg, &delete_refs_list);
+ string_list_clear(&delete_refs_list, 0);
+
if (states.stale.nr) {
if (!dry_run)
result |= repack_without_refs(delete_refs, states.stale.nr);
diff --git a/refs.c b/refs.c
index 3b62aca..fdd8b74 100644
--- a/refs.c
+++ b/refs.c
@@ -1611,6 +1611,7 @@ int peel_ref(const char *refname, unsigned char *sha1)
struct warn_if_dangling_data {
FILE *fp;
const char *refname;
+ const struct string_list *refnames;
const char *msg_fmt;
};
@@ -1625,8 +1626,12 @@ static int warn_if_dangling_symref(const char *refname, const unsigned char *sha
return 0;
resolves_to = resolve_ref_unsafe(refname, junk, 0, NULL);
- if (!resolves_to || strcmp(resolves_to, d->refname))
+ if (!resolves_to
+ || (d->refname
+ ? strcmp(resolves_to, d->refname)
+ : !string_list_has_string(d->refnames, resolves_to))) {
return 0;
+ }
fprintf(d->fp, d->msg_fmt, refname);
fputc('\n', d->fp);
@@ -1639,6 +1644,18 @@ void warn_dangling_symref(FILE *fp, const char *msg_fmt, const char *refname)
data.fp = fp;
data.refname = refname;
+ data.refnames = NULL;
+ data.msg_fmt = msg_fmt;
+ for_each_rawref(warn_if_dangling_symref, &data);
+}
+
+void warn_dangling_symrefs(FILE *fp, const char *msg_fmt, const struct string_list *refnames)
+{
+ struct warn_if_dangling_data data;
+
+ data.fp = fp;
+ data.refname = NULL;
+ data.refnames = refnames;
data.msg_fmt = msg_fmt;
for_each_rawref(warn_if_dangling_symref, &data);
}
diff --git a/refs.h b/refs.h
index 0db5584..cd4710d 100644
--- a/refs.h
+++ b/refs.h
@@ -89,7 +89,7 @@ static inline const char *has_glob_specials(const char *pattern)
extern int for_each_rawref(each_ref_fn, void *);
extern void warn_dangling_symref(FILE *fp, const char *msg_fmt, const char *refname);
+extern void warn_dangling_symrefs(FILE *fp, const char *msg_fmt, const struct string_list* refnames);
/*
* Lock the packed-refs file for writing. Flags is passed to
--
1.9.1
^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCH 1/2] remote: defer repacking packed-refs when deleting refs
2014-05-20 10:39 ` [PATCH 1/2] remote: defer repacking packed-refs when deleting refs Jens Lindström
@ 2014-05-20 19:30 ` Junio C Hamano
2014-05-20 20:29 ` Junio C Hamano
0 siblings, 1 reply; 16+ messages in thread
From: Junio C Hamano @ 2014-05-20 19:30 UTC (permalink / raw)
To: Jens Lindström; +Cc: git
Jens Lindström <jl@opera.com> writes:
> When 'git remote rm' or 'git remote prune' were used in a repository
> with many refs, and needed to delete many refs, a lot of time was spent
> deleting those refs since for each deleted ref, repack_without_refs()
> was called to rewrite packed-refs without just that deleted ref.
>
> To avoid this, defer the repacking until after all refs have been
> deleted (by delete_ref()), and then call repack_without_refs() once to
> repack without all the deleted refs.
>
> Signed-off-by: Jens Lindström <jl@opera.com>
> ---
> This patch changes behavior when the operation is aborted in the
> middle, so that loose refs and ref logs might have been deleted, but
> not the corresponding entries in packed-refs, since packed-refs is now
> only updated at the end.
Also this makes it a bit more dangerous for processes accessing the
repository while pruning is in progress by exposing stale refs that
may be pointing at objects that are no longer present in the object
store in the packed-refs file for a longer period ("git fsck" may
fail, for example).
As long as we accept that removing a remote or pruning one are kinds
of "maintenance" operations and nobody else should be peeking into
the repository during maintenance period, it may be alright, but
aborting the operation in the middle will essentially leave the
repository in a corrupted state---the "abort corrupts" is a downside
with or without this change and is not a new problem.
A bit safer way to organize might be to first create a list of the
refs to be removed in-core, update packed-refs without these refs to
be removed, and then finally remove the loose ones, but I haven't
thought things through.
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 1/2] remote: defer repacking packed-refs when deleting refs
2014-05-20 19:30 ` Junio C Hamano
@ 2014-05-20 20:29 ` Junio C Hamano
2014-05-23 10:03 ` Jens Lindström
0 siblings, 1 reply; 16+ messages in thread
From: Junio C Hamano @ 2014-05-20 20:29 UTC (permalink / raw)
To: Jens Lindström; +Cc: git
Junio C Hamano <gitster@pobox.com> writes:
> A bit safer way to organize might be to first create a list of the
> refs to be removed in-core, update packed-refs without these refs to
> be removed, and then finally remove the loose ones, but I haven't
> thought things through.
Perhaps a removal of remote can go in this order to be resistant
against an abort-in-the-middle.
* update packed-refs without the refs that came from the remote.
- when interrupted before the new temporary file is renamed to
the final, it would be a no-op.
- when interrupted after the rename, only some refs that came
from the remote may disappear.
* remove the loose refs that came from the remote.
* finally, remove the configuration related to the remote.
This order would let you interrupt "remote rm" without leaving the
repository in a broken state. Before the final state, it may appear
that you have some but not all remote-tracking refs from the remote
in your repository, but you would not have any ref that point at an
obsolete object. Running "remote rm" again, once it finishes, will
give you the desired result.
A longer-term goal might be to have ref-transaction infrastructure
clever enough to coalesce the "repack-without-these-refs" requests
into one automatically without forcing the callers you are fixing
care about these things, though. And such a transaction semantics
may have to also cover the updating of configuration files.
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 1/2] remote: defer repacking packed-refs when deleting refs
2014-05-20 20:29 ` Junio C Hamano
@ 2014-05-23 10:03 ` Jens Lindström
2014-05-23 17:09 ` Junio C Hamano
0 siblings, 1 reply; 16+ messages in thread
From: Jens Lindström @ 2014-05-23 10:03 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Git Mailing List
On Tue, May 20, 2014 at 10:29 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Junio C Hamano <gitster@pobox.com> writes:
>
>> A bit safer way to organize might be to first create a list of the
>> refs to be removed in-core, update packed-refs without these refs to
>> be removed, and then finally remove the loose ones, but I haven't
>> thought things through.
>
> Perhaps a removal of remote can go in this order to be resistant
> against an abort-in-the-middle.
>
> * update packed-refs without the refs that came from the remote.
>
> - when interrupted before the new temporary file is renamed to
> the final, it would be a no-op.
>
> - when interrupted after the rename, only some refs that came
> from the remote may disappear.
>
> * remove the loose refs that came from the remote.
>
> * finally, remove the configuration related to the remote.
>
> This order would let you interrupt "remote rm" without leaving the
> repository in a broken state. Before the final state, it may appear
> that you have some but not all remote-tracking refs from the remote
> in your repository, but you would not have any ref that point at an
> obsolete object. Running "remote rm" again, once it finishes, will
> give you the desired result.
Removing the remote configuration (I assume you mean the
"remote.<name>" section from .git/config) last in 'remote rm' would be
a bit better I think. Especially with the very slow removal of
branches an impatient user would likely abort the command if there
were many remote-tracking branches, after which rerunning would fail
since the remote configuration was already gone, and then there would
be no obvious way to get rid of the remaining remote-tracking
branches.
Doing the repacking first and then run through and delete loose refs
and ref logs leads to a smaller and nicer patch as well; no need to
tell delete_ref() to not repack then, since repack_without_refs() will
just find that the ref isn't in packed-refs already and do nothing.
One additional change was required in
builtin/remote.c:remove_branches(). It used to pass in the expected
SHA-1 of the ref to delete_ref(), which only works if the ref exists.
If repack_without_refs() is called first, most refs won't exist, and
delete_ref() would fail. Branch removal from 'remote prune' didn't
have this problem, since it called delete_ref() with a NULL SHA-1.
> A longer-term goal might be to have ref-transaction infrastructure
> clever enough to coalesce the "repack-without-these-refs" requests
> into one automatically without forcing the callers you are fixing
> care about these things, though. And such a transaction semantics
> may have to also cover the updating of configuration files.
Yes, some kind of more advanced transaction mechanism would be great,
and would likely solve this type of performance issue by design.
/ Jens
^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH v2 0/3] remote: optimize rm/prune ref deletion
2014-05-20 10:34 [PATCH 0/2] remote: optimize rm/prune ref deletion Jens Lindström
2014-05-20 10:39 ` [PATCH 1/2] remote: defer repacking packed-refs when deleting refs Jens Lindström
2014-05-20 10:41 ` [PATCH 2/2] remote prune: optimize "dangling symref" check/warning Jens Lindström
@ 2014-05-23 10:26 ` Jens Lindström
2014-05-23 10:28 ` [PATCH v2 1/3] remote rm: delete remote configuration as the last Jens Lindström
` (2 more replies)
2 siblings, 3 replies; 16+ messages in thread
From: Jens Lindström @ 2014-05-23 10:26 UTC (permalink / raw)
To: git, Junio C Hamano
Changes since previous version:
* Additionally change the order that 'remote rm' does things so that it
removes the remote configuration as the last step and only if the
other steps succeeded.
* Change the packed-refs repacking patch to repack before deleting refs
instead of after. This makes the patch simpler, since delete_ref()
no longer needs to be told not to repack.
Jens Lindstrom (3):
remote rm: delete remote configuration as the last step
remote: repack packed-refs once when deleting multiple refs
remote prune: optimize "dangling symref" check/warning
builtin/remote.c | 37 ++++++++++++++++++++++++++++++-------
refs.c | 21 +++++++++++++++++++--
refs.h | 3 +++
3 files changed, 52 insertions(+), 9 deletions(-)
--
1.9.1
^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH v2 1/3] remote rm: delete remote configuration as the last
2014-05-23 10:26 ` [PATCH v2 0/3] remote: optimize rm/prune ref deletion Jens Lindström
@ 2014-05-23 10:28 ` Jens Lindström
2014-05-23 18:55 ` Junio C Hamano
2014-05-23 10:29 ` [PATCH v2 2/3] remote: repack packed-refs once when deleting multiple refs Jens Lindström
2014-05-23 10:30 ` [PATCH v2 3/3] remote prune: optimize "dangling symref" check/warning Jens Lindström
2 siblings, 1 reply; 16+ messages in thread
From: Jens Lindström @ 2014-05-23 10:28 UTC (permalink / raw)
To: git, Junio C Hamano
When removing a remote, delete the remote-tracking branches before
deleting the remote configuration. This way, if the operation fails or
is aborted while deleting the remote-tracking branches, the command can
be rerun to complete the operation.
Signed-off-by: Jens Lindström <jl@opera.com>
---
builtin/remote.c | 10 ++++++----
1 file changed, 6 insertions(+), 4 deletions(-)
diff --git a/builtin/remote.c b/builtin/remote.c
index b3ab4cf..84802cd 100644
--- a/builtin/remote.c
+++ b/builtin/remote.c
@@ -789,10 +789,6 @@ static int rm(int argc, const char **argv)
known_remotes.to_delete = remote;
for_each_remote(add_known_remote, &known_remotes);
- strbuf_addf(&buf, "remote.%s", remote->name);
- if (git_config_rename_section(buf.buf, NULL) < 1)
- return error(_("Could not remove config section '%s'"), buf.buf);
-
read_branches();
for (i = 0; i < branch_list.nr; i++) {
struct string_list_item *item = branch_list.items + i;
@@ -837,6 +833,12 @@ static int rm(int argc, const char **argv)
}
string_list_clear(&skipped, 0);
+ if (!result) {
+ strbuf_addf(&buf, "remote.%s", remote->name);
+ if (git_config_rename_section(buf.buf, NULL) < 1)
+ return error(_("Could not remove config section '%s'"), buf.buf);
+ }
+
return result;
}
--
1.9.1
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH v2 2/3] remote: repack packed-refs once when deleting multiple refs
2014-05-23 10:26 ` [PATCH v2 0/3] remote: optimize rm/prune ref deletion Jens Lindström
2014-05-23 10:28 ` [PATCH v2 1/3] remote rm: delete remote configuration as the last Jens Lindström
@ 2014-05-23 10:29 ` Jens Lindström
2014-05-23 19:11 ` Junio C Hamano
2014-05-23 10:30 ` [PATCH v2 3/3] remote prune: optimize "dangling symref" check/warning Jens Lindström
2 siblings, 1 reply; 16+ messages in thread
From: Jens Lindström @ 2014-05-23 10:29 UTC (permalink / raw)
To: git, Junio C Hamano
When 'git remote rm' or 'git remote prune' were used in a repository
with many refs, and needed to delete many remote-tracking refs, a lot
of time was spent deleting those refs since for each deleted ref,
repack_without_refs() was called to rewrite packed-refs without just
that deleted ref.
To avoid this, call repack_without_refs() first to repack without all
the refs that will be deleted, before calling delete_ref() to delete
each one completely. The call to repack_without_ref() in delete_ref()
then becomes a no-op, since packed-refs already won't contain any of
the deleted refs.
Signed-off-by: Jens Lindström <jl@opera.com>
---
Note: remove_branches() no longer checks that the remote-tracking
branches it deletes point at the right object before deleting them
by passing the expected SHA-1 to delete_ref(). This was a required
change since all packed refs have been deleted already by the time
we call delete_ref(), which causes delete_ref() to fail if given an
expected SHA-1 to check. 'remote prune' already behaved this way.
builtin/remote.c | 20 ++++++++++++++++++--
refs.c | 2 +-
refs.h | 2 ++
3 files changed, 21 insertions(+), 3 deletions(-)
diff --git a/builtin/remote.c b/builtin/remote.c
index 84802cd..d33abe6 100644
--- a/builtin/remote.c
+++ b/builtin/remote.c
@@ -749,15 +749,23 @@ static int mv(int argc, const char **argv)
static int remove_branches(struct string_list *branches)
{
+ const char **branch_names;
int i, result = 0;
+
+ branch_names = xmalloc(branches->nr * sizeof(*branch_names));
+ for (i = 0; i < branches->nr; i++)
+ branch_names[i] = branches->items[i].string;
+ result |= repack_without_refs(branch_names, branches->nr);
+ free(branch_names);
+
for (i = 0; i < branches->nr; i++) {
struct string_list_item *item = branches->items + i;
const char *refname = item->string;
- unsigned char *sha1 = item->util;
- if (delete_ref(refname, sha1, 0))
+ if (delete_ref(refname, NULL, 0))
result |= error(_("Could not remove branch %s"), refname);
}
+
return result;
}
@@ -1305,6 +1313,7 @@ static int prune_remote(const char *remote, int dry_run)
{
int result = 0, i;
struct ref_states states;
+ const char **delete_refs;
const char *dangling_msg = dry_run
? _(" %s will become dangling!")
: _(" %s has become dangling!");
@@ -1318,6 +1327,13 @@ static int prune_remote(const char *remote, int dry_run)
states.remote->url_nr
? states.remote->url[0]
: _("(no URL)"));
+
+ delete_refs = xmalloc(states.stale.nr * sizeof(*delete_refs));
+ for (i = 0; i < states.stale.nr; i++)
+ delete_refs[i] = states.stale.items[i].util;
+ if (!dry_run)
+ result |= repack_without_refs(delete_refs, states.stale.nr);
+ free(delete_refs);
}
for (i = 0; i < states.stale.nr; i++) {
diff --git a/refs.c b/refs.c
index 28d5eca..262c1c2 100644
--- a/refs.c
+++ b/refs.c
@@ -2431,7 +2431,7 @@ static int curate_packed_ref_fn(struct ref_entry *entry, void *cb_data)
return 0;
}
-static int repack_without_refs(const char **refnames, int n)
+int repack_without_refs(const char **refnames, int n)
{
struct ref_dir *packed;
struct string_list refs_to_delete = STRING_LIST_INIT_DUP;
diff --git a/refs.h b/refs.h
index 87a1a79..f287c7a 100644
--- a/refs.h
+++ b/refs.h
@@ -132,6 +132,8 @@ extern void rollback_packed_refs(void);
*/
int pack_refs(unsigned int flags);
+extern int repack_without_refs(const char **refnames, int n);
+
extern int ref_exists(const char *);
/*
--
1.9.1
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH v2 3/3] remote prune: optimize "dangling symref" check/warning
2014-05-23 10:26 ` [PATCH v2 0/3] remote: optimize rm/prune ref deletion Jens Lindström
2014-05-23 10:28 ` [PATCH v2 1/3] remote rm: delete remote configuration as the last Jens Lindström
2014-05-23 10:29 ` [PATCH v2 2/3] remote: repack packed-refs once when deleting multiple refs Jens Lindström
@ 2014-05-23 10:30 ` Jens Lindström
2 siblings, 0 replies; 16+ messages in thread
From: Jens Lindström @ 2014-05-23 10:30 UTC (permalink / raw)
To: git, Junio C Hamano
When 'git remote prune' was used to delete many refs in a repository
with many refs, a lot of time was spent checking for (now) dangling
symbolic refs pointing to the deleted ref, since warn_dangling_symref()
was once per deleted ref to check all other refs in the repository.
Avoid this using the new warn_dangling_symrefs() function which
makes one pass over all refs and checks for all the deleted refs in
one go, after they have all been deleted.
Signed-off-by: Jens Lindström <jl@opera.com>
---
builtin/remote.c | 7 ++++++-
refs.c | 19 ++++++++++++++++++-
refs.h | 1 +
3 files changed, 25 insertions(+), 2 deletions(-)
diff --git a/builtin/remote.c b/builtin/remote.c
index d33abe6..9b3e368 100644
--- a/builtin/remote.c
+++ b/builtin/remote.c
@@ -1313,6 +1313,7 @@ static int prune_remote(const char *remote, int dry_run)
{
int result = 0, i;
struct ref_states states;
+ struct string_list delete_refs_list = STRING_LIST_INIT_NODUP;
const char **delete_refs;
const char *dangling_msg = dry_run
? _(" %s will become dangling!")
@@ -1339,6 +1340,8 @@ static int prune_remote(const char *remote, int dry_run)
for (i = 0; i < states.stale.nr; i++) {
const char *refname = states.stale.items[i].util;
+ string_list_insert(&delete_refs_list, refname);
+
if (!dry_run)
result |= delete_ref(refname, NULL, 0);
@@ -1348,9 +1351,11 @@ static int prune_remote(const char *remote, int dry_run)
else
printf_ln(_(" * [pruned] %s"),
abbrev_ref(refname, "refs/remotes/"));
- warn_dangling_symref(stdout, dangling_msg, refname);
}
+ warn_dangling_symrefs(stdout, dangling_msg, &delete_refs_list);
+ string_list_clear(&delete_refs_list, 0);
+
free_remote_ref_states(&states);
return result;
}
diff --git a/refs.c b/refs.c
index 262c1c2..59fb700 100644
--- a/refs.c
+++ b/refs.c
@@ -1611,6 +1611,7 @@ int peel_ref(const char *refname, unsigned char *sha1)
struct warn_if_dangling_data {
FILE *fp;
const char *refname;
+ const struct string_list *refnames;
const char *msg_fmt;
};
@@ -1625,8 +1626,12 @@ static int warn_if_dangling_symref(const char *refname, const unsigned char *sha
return 0;
resolves_to = resolve_ref_unsafe(refname, junk, 0, NULL);
- if (!resolves_to || strcmp(resolves_to, d->refname))
+ if (!resolves_to
+ || (d->refname
+ ? strcmp(resolves_to, d->refname)
+ : !string_list_has_string(d->refnames, resolves_to))) {
return 0;
+ }
fprintf(d->fp, d->msg_fmt, refname);
fputc('\n', d->fp);
@@ -1639,6 +1644,18 @@ void warn_dangling_symref(FILE *fp, const char *msg_fmt, const char *refname)
data.fp = fp;
data.refname = refname;
+ data.refnames = NULL;
+ data.msg_fmt = msg_fmt;
+ for_each_rawref(warn_if_dangling_symref, &data);
+}
+
+void warn_dangling_symrefs(FILE *fp, const char *msg_fmt, const struct string_list *refnames)
+{
+ struct warn_if_dangling_data data;
+
+ data.fp = fp;
+ data.refname = NULL;
+ data.refnames = refnames;
data.msg_fmt = msg_fmt;
for_each_rawref(warn_if_dangling_symref, &data);
}
diff --git a/refs.h b/refs.h
index f287c7a..1440acc 100644
--- a/refs.h
+++ b/refs.h
@@ -89,6 +89,7 @@ static inline const char *has_glob_specials(const char *pattern)
extern int for_each_rawref(each_ref_fn, void *);
extern void warn_dangling_symref(FILE *fp, const char *msg_fmt, const char *refname);
+extern void warn_dangling_symrefs(FILE *fp, const char *msg_fmt, const struct string_list* refnames);
/*
* Lock the packed-refs file for writing. Flags is passed to
--
1.9.1
^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCH 1/2] remote: defer repacking packed-refs when deleting refs
2014-05-23 10:03 ` Jens Lindström
@ 2014-05-23 17:09 ` Junio C Hamano
2014-05-24 7:54 ` Jens Lindström
0 siblings, 1 reply; 16+ messages in thread
From: Junio C Hamano @ 2014-05-23 17:09 UTC (permalink / raw)
To: Jens Lindström; +Cc: Git Mailing List
Jens Lindström <jl@opera.com> writes:
> Removing the remote configuration (I assume you mean the
> "remote.<name>" section from .git/config) last in 'remote rm' would be
> a bit better I think. Especially ...
Yes, that is exactly why I suggested it ;-)
> Doing the repacking first and then run through and delete loose refs
> and ref logs leads to a smaller and nicer patch as well ...
Hmph, that would be a bonus, then. I suggested it primarily as a
correctness thing against an interrupted operation.
> One additional change was required in
> builtin/remote.c:remove_branches(). It used to pass in the expected
> SHA-1 of the ref to delete_ref(), which only works if the ref exists.
> If repack_without_refs() is called first, most refs won't exist,...
Why? A ref, before you start pruning or removing a remote, could be
in one of these three states.
(a) only loose refs/remotes/them/frotz exists
(b) both loose and packed refs/remotes/them/nitfol exist
(c) only packed refs/remotes/them/xyzzy exists
And then you repack packed-refs file without these three refs. When
you do so, you know that you would need to remove frotz and nitfol,
and also you know you do not want to call delete_ref for xyzzy, no?
In other words, the problem you are describing in remove_branches(),
that it wants to make sure that the ref-to-be-removed still points
at the expected object, does not sound like it is doing anything
inherently wrong---rather, it sounds like you didn't make necessary
changes to the caller to make sure that you do not call delete_ref()
on something you know you removed already.
Puzzled....
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v2 1/3] remote rm: delete remote configuration as the last
2014-05-23 10:28 ` [PATCH v2 1/3] remote rm: delete remote configuration as the last Jens Lindström
@ 2014-05-23 18:55 ` Junio C Hamano
0 siblings, 0 replies; 16+ messages in thread
From: Junio C Hamano @ 2014-05-23 18:55 UTC (permalink / raw)
To: Jens Lindström; +Cc: git
Jens Lindström <jl@opera.com> writes:
> When removing a remote, delete the remote-tracking branches before
> deleting the remote configuration. This way, if the operation fails or
> is aborted while deleting the remote-tracking branches, the command can
> be rerun to complete the operation.
>
> Signed-off-by: Jens Lindström <jl@opera.com>
I think this is a good change, regardless of the "calling delete-ref
millions of times, each time rewriting the whole packed-ref file, is
inefficient" issue.
I wonder if the new "if (!result)" block wants to have its own else
clause to let the users know that the definition of the remote was
still left intact under such an unusual condition where ref deletion
somehow fails, but it would be OK as the users have presumably seen
error messages and understand that removal did not happen.
Will queue. Thanks.
> ---
> builtin/remote.c | 10 ++++++----
> 1 file changed, 6 insertions(+), 4 deletions(-)
>
> diff --git a/builtin/remote.c b/builtin/remote.c
> index b3ab4cf..84802cd 100644
> --- a/builtin/remote.c
> +++ b/builtin/remote.c
> @@ -789,10 +789,6 @@ static int rm(int argc, const char **argv)
> known_remotes.to_delete = remote;
> for_each_remote(add_known_remote, &known_remotes);
>
> - strbuf_addf(&buf, "remote.%s", remote->name);
> - if (git_config_rename_section(buf.buf, NULL) < 1)
> - return error(_("Could not remove config section '%s'"), buf.buf);
> -
> read_branches();
> for (i = 0; i < branch_list.nr; i++) {
> struct string_list_item *item = branch_list.items + i;
> @@ -837,6 +833,12 @@ static int rm(int argc, const char **argv)
> }
> string_list_clear(&skipped, 0);
>
> + if (!result) {
> + strbuf_addf(&buf, "remote.%s", remote->name);
> + if (git_config_rename_section(buf.buf, NULL) < 1)
> + return error(_("Could not remove config section '%s'"), buf.buf);
> + }
> +
> return result;
> }
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v2 2/3] remote: repack packed-refs once when deleting multiple refs
2014-05-23 10:29 ` [PATCH v2 2/3] remote: repack packed-refs once when deleting multiple refs Jens Lindström
@ 2014-05-23 19:11 ` Junio C Hamano
2014-05-23 19:25 ` Junio C Hamano
0 siblings, 1 reply; 16+ messages in thread
From: Junio C Hamano @ 2014-05-23 19:11 UTC (permalink / raw)
To: Jens Lindström; +Cc: git
Jens Lindström <jl@opera.com> writes:
> When 'git remote rm' or 'git remote prune' were used in a repository
> with many refs, and needed to delete many remote-tracking refs, a lot
> of time was spent deleting those refs since for each deleted ref,
> repack_without_refs() was called to rewrite packed-refs without just
> that deleted ref.
>
> To avoid this, call repack_without_refs() first to repack without all
> the refs that will be deleted, before calling delete_ref() to delete
> each one completely. The call to repack_without_ref() in delete_ref()
> then becomes a no-op, since packed-refs already won't contain any of
> the deleted refs.
>
> Signed-off-by: Jens Lindström <jl@opera.com>
> ---
> Note: remove_branches() no longer checks that the remote-tracking
> branches it deletes point at the right object before deleting them
> by passing the expected SHA-1 to delete_ref(). This was a required
> change since all packed refs have been deleted already by the time
> we call delete_ref(), which causes delete_ref() to fail if given an
> expected SHA-1 to check. 'remote prune' already behaved this way.
>
> builtin/remote.c | 20 ++++++++++++++++++--
> refs.c | 2 +-
> refs.h | 2 ++
> 3 files changed, 21 insertions(+), 3 deletions(-)
>
> diff --git a/builtin/remote.c b/builtin/remote.c
> index 84802cd..d33abe6 100644
> --- a/builtin/remote.c
> +++ b/builtin/remote.c
> @@ -749,15 +749,23 @@ static int mv(int argc, const char **argv)
>
> static int remove_branches(struct string_list *branches)
> {
> + const char **branch_names;
> int i, result = 0;
> +
> + branch_names = xmalloc(branches->nr * sizeof(*branch_names));
> + for (i = 0; i < branches->nr; i++)
> + branch_names[i] = branches->items[i].string;
> + result |= repack_without_refs(branch_names, branches->nr);
> + free(branch_names);
Hmph. I wonder if you can refactor/enhance the interface to the
repack_without_refs() function before this step in the series, so
that the function lets the caller to know if each ref was actually
removed or it was not in packed-refs from the beginning, and also I
wonder if such a refactoring helps. My gut feeling is that it would
not help that much this particular series, but it would probably be
a good thing to do in the longer run. So probably it is better to do
without such a change as a part of this series.
> for (i = 0; i < branches->nr; i++) {
> struct string_list_item *item = branches->items + i;
> const char *refname = item->string;
> - unsigned char *sha1 = item->util;
Here, you can check if the refname still exists as a ref; if it no
longer exists, it would mean that the only copy of the ref was in
the packed-refs file and you have already deleted it, and you can
refrain from calling delete_ref() altogether, e.g.
if (!ref_exists(refname))
continue; /* already removed the sole copy from packed-ref */
and then still retain the safetly against racing somebody else who
created or updated the ref you wanted to remove here by passing the
object name to delete_ref().
> - if (delete_ref(refname, sha1, 0))
> + if (delete_ref(refname, NULL, 0))
> result |= error(_("Could not remove branch %s"), refname);
> }
> +
> return result;
> }
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v2 2/3] remote: repack packed-refs once when deleting multiple refs
2014-05-23 19:11 ` Junio C Hamano
@ 2014-05-23 19:25 ` Junio C Hamano
0 siblings, 0 replies; 16+ messages in thread
From: Junio C Hamano @ 2014-05-23 19:25 UTC (permalink / raw)
To: Jens Lindström; +Cc: git, Michael Haggerty, Ronnie Sahlberg
Junio C Hamano <gitster@pobox.com> writes:
>> - if (delete_ref(refname, sha1, 0))
>> + if (delete_ref(refname, NULL, 0))
>> result |= error(_("Could not remove branch %s"), refname);
By the way, how does this series interact with what Ronnie and Michael
are working on with their ref-transaction series?
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 1/2] remote: defer repacking packed-refs when deleting refs
2014-05-23 17:09 ` Junio C Hamano
@ 2014-05-24 7:54 ` Jens Lindström
2014-05-27 16:55 ` Junio C Hamano
0 siblings, 1 reply; 16+ messages in thread
From: Jens Lindström @ 2014-05-24 7:54 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Git Mailing List
On Fri, May 23, 2014 at 7:09 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Jens Lindström <jl@opera.com> writes:
>> One additional change was required in
>> builtin/remote.c:remove_branches(). It used to pass in the expected
>> SHA-1 of the ref to delete_ref(), which only works if the ref exists.
>> If repack_without_refs() is called first, most refs won't exist,...
>
> Why? A ref, before you start pruning or removing a remote, could be
> in one of these three states.
>
> (a) only loose refs/remotes/them/frotz exists
> (b) both loose and packed refs/remotes/them/nitfol exist
> (c) only packed refs/remotes/them/xyzzy exists
>
> And then you repack packed-refs file without these three refs. When
> you do so, you know that you would need to remove frotz and nitfol,
> and also you know you do not want to call delete_ref for xyzzy, no?
>
> In other words, the problem you are describing in remove_branches(),
> that it wants to make sure that the ref-to-be-removed still points
> at the expected object, does not sound like it is doing anything
> inherently wrong---rather, it sounds like you didn't make necessary
> changes to the caller to make sure that you do not call delete_ref()
> on something you know you removed already.
>
> Puzzled....
There is one reason why one would want to call delete_ref() even if
the ref itself was already fully deleted by repack_without_refs()
(because it was only packed) and that is that delete_ref() also
removes the ref log, if there is one. We could refactor the deletion
to
1) repack_without_refs() on all refs
2) delete_ref() on still existing (loose) refs
3) delete_ref_log() on all refs
to let us only call delete_ref() on existing refs, and then keep the
current value check.
Personally I don't feel that we're solving an important problem here.
Making sure not to overwrite a ref that has been updated since we read
its value is of course of great value for 'receive-pack' and 'commit'
and such, but in this case we're removing a remote and all its
remote-tracking branches and configuration. If a remote-tracking
branch is updated concurrently, the current value check would fail,
and the remote configuration and that one branch would remain. But if
the update had happened just an instant earlier, just before we
started removing the remote, we would have happily deleted that
remote-tracking branch as well, throwing away whatever information the
update stored in it.
/ Jens
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 1/2] remote: defer repacking packed-refs when deleting refs
2014-05-24 7:54 ` Jens Lindström
@ 2014-05-27 16:55 ` Junio C Hamano
0 siblings, 0 replies; 16+ messages in thread
From: Junio C Hamano @ 2014-05-27 16:55 UTC (permalink / raw)
To: Jens Lindström; +Cc: Git Mailing List
Jens Lindström <jl@opera.com> writes:
>> Puzzled....
>
> There is one reason why one would want to call delete_ref() even if
> the ref itself was already fully deleted by repack_without_refs()
> (because it was only packed) and that is that delete_ref() also
> removes the ref log, if there is one.
Ahh, ok, no longer puzzled---I completely forgot about that part.
> We could refactor the deletion to
>
> 1) repack_without_refs() on all refs
> 2) delete_ref() on still existing (loose) refs
> 3) delete_ref_log() on all refs
>
> to let us only call delete_ref() on existing refs, and then keep the
> current value check.
I tend to agree that it is sufficient for the purpose of this topic
to be loose about the check; the refactoring can come later, as part
of the ref-transaction refactoring that is going on in a separate
thread.
^ permalink raw reply [flat|nested] 16+ messages in thread
end of thread, other threads:[~2014-05-27 16:55 UTC | newest]
Thread overview: 16+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-05-20 10:34 [PATCH 0/2] remote: optimize rm/prune ref deletion Jens Lindström
2014-05-20 10:39 ` [PATCH 1/2] remote: defer repacking packed-refs when deleting refs Jens Lindström
2014-05-20 19:30 ` Junio C Hamano
2014-05-20 20:29 ` Junio C Hamano
2014-05-23 10:03 ` Jens Lindström
2014-05-23 17:09 ` Junio C Hamano
2014-05-24 7:54 ` Jens Lindström
2014-05-27 16:55 ` Junio C Hamano
2014-05-20 10:41 ` [PATCH 2/2] remote prune: optimize "dangling symref" check/warning Jens Lindström
2014-05-23 10:26 ` [PATCH v2 0/3] remote: optimize rm/prune ref deletion Jens Lindström
2014-05-23 10:28 ` [PATCH v2 1/3] remote rm: delete remote configuration as the last Jens Lindström
2014-05-23 18:55 ` Junio C Hamano
2014-05-23 10:29 ` [PATCH v2 2/3] remote: repack packed-refs once when deleting multiple refs Jens Lindström
2014-05-23 19:11 ` Junio C Hamano
2014-05-23 19:25 ` Junio C Hamano
2014-05-23 10:30 ` [PATCH v2 3/3] remote prune: optimize "dangling symref" check/warning Jens Lindström
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).