* Re: [PATCH v3 07/28] shallow.c: add remove_reachable_shallow_points() @ 2013-11-26 12:56 Duy Nguyen 0 siblings, 0 replies; 3+ messages in thread From: Duy Nguyen @ 2013-11-26 12:56 UTC (permalink / raw) To: Junio C Hamano; +Cc: Git Mailing List On Tue, Nov 26, 2013 at 4:53 AM, Junio C Hamano <gitster@pobox.com> wrote: > Nguyễn Thái Ngọc Duy <pclouds@gmail.com> writes: > >> When we receive a pack and the shallow points from another repository, >> we may want to add more shallow points to current repo to make sure no >> commits point to nowhere. However we do not want to add unnecessary >> shallow points and cut our history short because the other end is a >> shallow version of this repo. The output shallow points may or may not >> be added to .git/shallow, depending on whether they are actually >> reachable in the new pack. >> >> This function filters such shallow points out, leaving ones that might >> potentially be added. A simple has_sha1_file won't do because we may >> have incomplete object islands (i.e. not connected to any refs) and >> the shallow points are on these islands. In that case we want to keep >> the shallow points as candidates until we figure out if the new pack >> connects to such object islands. >> >> Typical cases that use remove_reachable_shallow_points() are: >> >> - fetch from a repo that has longer history: in this case all remote >> shallow roots do not exist in the client repo, this function will >> be cheap as it just does a few lookup_commit_graft and >> has_sha1_file. > > It is unclear why. If you fetched from a repository more complete > than you, you may deepen your history which may allow you to unplug > the shallow points you originally had, and remote "shallow root" (by > the way, lets find a single good phrase to represent this thing and > stick to it) may want to replace them, no? Except that deepen/shorten history is a different mode that this function is not used at all. I should have made that clear. This and the next patch are about "stick to our base and add something on top" Any suggestions about a good phase? I've been swinging between "shallow points" (from 4 months ago) and "shallow roots" (recently). >> - fetch from a repo that has exactly the same shallow root set >> (e.g. a clone from a shallow repo): this case may trigger >> in_merge_bases_many all the way to roots. An exception is made to >> avoid such costly path with a faith that .git/shallow does not >> usually points to unreachable commit islands. > > ... and when the faith is broken, you will end up with a broken > repository??? Not really broken because the new ref will be cut at the troublesome shallow root before it goes out of bound, so the user may be surprised that he got a history shorter than he wanted. It's when the root is removed that we have a problem. But commits in .git/shallow are only removed by 1) deepening history 2) the prune patch 28/28 #1 should send the missing objects and insert a new commit to .git/shallow to plug the hole, so we're good. #2 only removes commits from .git/shallow if they are not reachable from any refs, which is no longer true. >> +static int add_ref(const char *refname, >> + const unsigned char *sha1, int flags, void *cb_data) >> +{ >> + struct commit_array *ca = cb_data; >> + ALLOC_GROW(ca->commits, ca->nr + 1, ca->alloc); >> + ca->commits[ca->nr++] = lookup_commit(sha1); >> + return 0; >> +} > > Can't a ref point at a non-commit-ish? Is the code prepared to deal > with such an entry (possibly a NULL pointer) in the commit_array > struct? Eck, yes a ref can. No the code is not :P Thanks for pointing this out. We don't care about non-commit refs, so we just need to filter them out. -- Duy ^ permalink raw reply [flat|nested] 3+ messages in thread
* [PATCH v3 00/28] First class shallow clone @ 2013-11-25 3:55 Nguyễn Thái Ngọc Duy 2013-11-25 3:55 ` [PATCH v3 07/28] shallow.c: add remove_reachable_shallow_points() Nguyễn Thái Ngọc Duy 0 siblings, 1 reply; 3+ messages in thread From: Nguyễn Thái Ngọc Duy @ 2013-11-25 3:55 UTC (permalink / raw) To: git; +Cc: Nguyễn Thái Ngọc Duy Compared to v2 [1], v3 grows a bit. The biggest difference is .git/shallow is not updated by default (except when you clone from a shallow repository). When you send something, the "safe" refs that do not need new shallow roots are accepted at the receiver end, the others rejected. To accept those other refs, either use "fetch --update-shallow" or enable receive.shallowupdate on the receiver side of a push. Filtering "safe" refs requires walking through some commits, so it'll be more expensive than normal full clones. This is especially true for the receiver of a push (see 07/28 and 17/28). I envision shallow repos are used as upstream to archive old history, so this is not a good news. Commit cache (or pack v4) might help. We might even be able to move some work from receive-pack to send-pack to reduce server load.. [1] http://mid.gmane.org/1374314290-5976-1-git-send-email-pclouds@gmail.com Nguyễn Thái Ngọc Duy (28): transport.h: remove send_pack prototype, already defined in send-pack.h send-pack: forbid pushing from a shallow repository clone: prevent --reference to a shallow repository update-server-info: do not publish shallow clones This part is just cleanup. Advertise shallow graft information on the server end connect.c: teach get_remote_heads to parse "shallow" lines shallow.c: add remove_reachable_shallow_points() shallow.c: add mark_new_shallow_refs() shallow.c: extend setup_*_shallow() to accept extra shallow points fetch-pack.c: move shallow update code out of fetch_pack() fetch-pack.h: one statement per bitfield declaration clone: support remote shallow repository fetch: support fetching from a shallow repository upload-pack: make sure deepening preserves shallow roots fetch: add --update-shallow to get refs that require updating .git/shallow Basic shallow fetch/clone support on git protocol receive-pack: reorder some code in unpack() Support pushing from a shallow clone New var GIT_SHALLOW_FILE to propagate --shallow-file to subprocesses connected.c: add new variant that runs with --shallow-file receive-pack: allow pushing with new shallow roots send-pack: support pushing to a shallow clone remote-curl: pass ref SHA-1 to fetch-pack as well Push support Support fetch/clone over http receive-pack: support pushing to a shallow clone via http send-pack: support pushing from a shallow clone via http smart-http support git-clone.txt: remove shallow clone limitations clone: use git protocol for cloning shallow repo locally prune: clean .git/shallow after pruning objects miscellaneous Documentation/config.txt | 4 + Documentation/fetch-options.txt | 14 +- Documentation/git-clone.txt | 7 +- Documentation/gitremote-helpers.txt | 7 + Documentation/technical/pack-protocol.txt | 7 +- builtin/clone.c | 18 +- builtin/fetch-pack.c | 23 +- builtin/fetch.c | 15 +- builtin/gc.c | 1 + builtin/prune.c | 4 + builtin/receive-pack.c | 248 +++++++++++++++++---- builtin/send-pack.c | 5 +- cache.h | 1 + commit.h | 19 +- connect.c | 14 +- connected.c | 42 +++- connected.h | 2 + environment.c | 6 + fetch-pack.c | 132 ++++++++++-- fetch-pack.h | 29 +-- git.c | 2 +- remote-curl.c | 33 ++- remote.h | 5 +- send-pack.c | 20 +- server-info.c | 9 + shallow.c | 348 +++++++++++++++++++++++++++++- t/t5536-fetch-shallow.sh (new +x) | 193 +++++++++++++++++ t/t5537-push-shallow.sh (new +x) | 184 ++++++++++++++++ t/t5601-clone.sh | 7 + transport-helper.c | 6 + transport.c | 22 +- transport.h | 16 +- upload-pack.c | 8 +- 33 files changed, 1323 insertions(+), 128 deletions(-) create mode 100755 t/t5536-fetch-shallow.sh create mode 100755 t/t5537-push-shallow.sh -- 1.8.2.83.gc99314b ^ permalink raw reply [flat|nested] 3+ messages in thread
* [PATCH v3 07/28] shallow.c: add remove_reachable_shallow_points() 2013-11-25 3:55 [PATCH v3 00/28] First class shallow clone Nguyễn Thái Ngọc Duy @ 2013-11-25 3:55 ` Nguyễn Thái Ngọc Duy 2013-11-25 21:53 ` Junio C Hamano 0 siblings, 1 reply; 3+ messages in thread From: Nguyễn Thái Ngọc Duy @ 2013-11-25 3:55 UTC (permalink / raw) To: git; +Cc: Nguyễn Thái Ngọc Duy When we receive a pack and the shallow points from another repository, we may want to add more shallow points to current repo to make sure no commits point to nowhere. However we do not want to add unnecessary shallow points and cut our history short because the other end is a shallow version of this repo. The output shallow points may or may not be added to .git/shallow, depending on whether they are actually reachable in the new pack. This function filters such shallow points out, leaving ones that might potentially be added. A simple has_sha1_file won't do because we may have incomplete object islands (i.e. not connected to any refs) and the shallow points are on these islands. In that case we want to keep the shallow points as candidates until we figure out if the new pack connects to such object islands. Typical cases that use remove_reachable_shallow_points() are: - fetch from a repo that has longer history: in this case all remote shallow roots do not exist in the client repo, this function will be cheap as it just does a few lookup_commit_graft and has_sha1_file. - fetch from a repo that has exactly the same shallow root set (e.g. a clone from a shallow repo): this case may trigger in_merge_bases_many all the way to roots. An exception is made to avoid such costly path with a faith that .git/shallow does not usually points to unreachable commit islands. - push from a shallow repo that has shorter history than the server's: in_merge_bases_many() is unavoidable, so the longer history the client has the higher the server cost is. The cost may be reduced with the help of pack bitmaps, or commit cache, though. - push from a shallow clone that has exactly the same shallow root set: the same as the second fetch case above, .i.e. cheap by exception. The function must be called before the new pack is installed, or we won't know which objects are ours, which theirs. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> --- commit.h | 3 +++ connect.c | 2 +- remote.h | 1 + shallow.c | 45 +++++++++++++++++++++++++++++++++++++++++++++ 4 files changed, 50 insertions(+), 1 deletion(-) diff --git a/commit.h b/commit.h index a879526..98044e6 100644 --- a/commit.h +++ b/commit.h @@ -193,6 +193,7 @@ extern struct commit_list *get_octopus_merge_bases(struct commit_list *in); /* largest positive number a signed 32-bit integer can contain */ #define INFINITE_DEPTH 0x7fffffff +struct extra_have_objects; extern int register_shallow(const unsigned char *sha1); extern int unregister_shallow(const unsigned char *sha1); extern int for_each_commit_graft(each_commit_graft_fn, void *); @@ -206,6 +207,8 @@ extern void setup_alternate_shallow(struct lock_file *shallow_lock, const char **alternate_shallow_file); extern char *setup_temporary_shallow(void); extern void advertise_shallow_grafts(int); +extern void remove_reachable_shallow_points(struct extra_have_objects *out, + const struct extra_have_objects *in); int is_descendant_of(struct commit *, struct commit_list *); int in_merge_bases(struct commit *, struct commit *); diff --git a/connect.c b/connect.c index d0602b0..80e4360 100644 --- a/connect.c +++ b/connect.c @@ -45,7 +45,7 @@ int check_ref_type(const struct ref *ref, int flags) return check_ref(ref->name, strlen(ref->name), flags); } -static void add_extra_have(struct extra_have_objects *extra, unsigned char *sha1) +void add_extra_have(struct extra_have_objects *extra, unsigned char *sha1) { ALLOC_GROW(extra->array, extra->nr + 1, extra->alloc); hashcpy(&(extra->array[extra->nr][0]), sha1); diff --git a/remote.h b/remote.h index 773faa9..ff604ff 100644 --- a/remote.h +++ b/remote.h @@ -145,6 +145,7 @@ extern struct ref **get_remote_heads(int in, char *src_buf, size_t src_len, struct ref **list, unsigned int flags, struct extra_have_objects *have, struct extra_have_objects *shallow); +extern void add_extra_have(struct extra_have_objects *extra, unsigned char *sha1); int resolve_remote_symref(struct ref *ref, struct ref *list); int ref_newer(const unsigned char *new_sha1, const unsigned char *old_sha1); diff --git a/shallow.c b/shallow.c index 9552512..a974d2d 100644 --- a/shallow.c +++ b/shallow.c @@ -2,6 +2,8 @@ #include "commit.h" #include "tag.h" #include "pkt-line.h" +#include "remote.h" +#include "refs.h" static int is_shallow = -1; static struct stat shallow_stat; @@ -235,3 +237,46 @@ void advertise_shallow_grafts(int fd) return; for_each_commit_graft(advertise_shallow_grafts_cb, &fd); } + +struct commit_array { + struct commit **commits; + int nr, alloc; +}; + +static int add_ref(const char *refname, + const unsigned char *sha1, int flags, void *cb_data) +{ + struct commit_array *ca = cb_data; + ALLOC_GROW(ca->commits, ca->nr + 1, ca->alloc); + ca->commits[ca->nr++] = lookup_commit(sha1); + return 0; +} + +void remove_reachable_shallow_points(struct extra_have_objects *out, + const struct extra_have_objects *in) +{ + struct commit_array ca; + int i; + + memset(&ca, 0, sizeof(ca)); + head_ref(add_ref, &ca); + for_each_ref(add_ref, &ca); + for (i = 0; i < in->nr; i++) { + struct commit_graft *graft = lookup_commit_graft(in->array[i]); + /* + * For a clone from a shallow upstream, the clone has + * the same shallow roots as upstream and it will + * trigger in_merge_bases_many() all the way to roots. + * Avoid that costly path and assume .git/shallow is + * good most of the time. + */ + if (graft && graft->nr_parent < 0) + continue; + if (has_sha1_file(in->array[i]) && + in_merge_bases_many(lookup_commit(in->array[i]), + ca.nr, ca.commits)) + continue; + add_extra_have(out, in->array[i]); + } + free(ca.commits); +} -- 1.8.2.83.gc99314b ^ permalink raw reply related [flat|nested] 3+ messages in thread
* Re: [PATCH v3 07/28] shallow.c: add remove_reachable_shallow_points() 2013-11-25 3:55 ` [PATCH v3 07/28] shallow.c: add remove_reachable_shallow_points() Nguyễn Thái Ngọc Duy @ 2013-11-25 21:53 ` Junio C Hamano 0 siblings, 0 replies; 3+ messages in thread From: Junio C Hamano @ 2013-11-25 21:53 UTC (permalink / raw) To: Nguyễn Thái Ngọc Duy; +Cc: git Nguyễn Thái Ngọc Duy <pclouds@gmail.com> writes: > When we receive a pack and the shallow points from another repository, > we may want to add more shallow points to current repo to make sure no > commits point to nowhere. However we do not want to add unnecessary > shallow points and cut our history short because the other end is a > shallow version of this repo. The output shallow points may or may not > be added to .git/shallow, depending on whether they are actually > reachable in the new pack. > > This function filters such shallow points out, leaving ones that might > potentially be added. A simple has_sha1_file won't do because we may > have incomplete object islands (i.e. not connected to any refs) and > the shallow points are on these islands. In that case we want to keep > the shallow points as candidates until we figure out if the new pack > connects to such object islands. > > Typical cases that use remove_reachable_shallow_points() are: > > - fetch from a repo that has longer history: in this case all remote > shallow roots do not exist in the client repo, this function will > be cheap as it just does a few lookup_commit_graft and > has_sha1_file. It is unclear why. If you fetched from a repository more complete than you, you may deepen your history which may allow you to unplug the shallow points you originally had, and remote "shallow root" (by the way, lets find a single good phrase to represent this thing and stick to it) may want to replace them, no? > - fetch from a repo that has exactly the same shallow root set > (e.g. a clone from a shallow repo): this case may trigger > in_merge_bases_many all the way to roots. An exception is made to > avoid such costly path with a faith that .git/shallow does not > usually points to unreachable commit islands. ... and when the faith is broken, you will end up with a broken repository??? > - push from a shallow repo that has shorter history than the > server's: in_merge_bases_many() is unavoidable, so the longer > history the client has the higher the server cost is. The cost may > be reduced with the help of pack bitmaps, or commit cache, though. > > - push from a shallow clone that has exactly the same shallow root > set: the same as the second fetch case above, .i.e. cheap by > exception. > > The function must be called before the new pack is installed, or we > won't know which objects are ours, which theirs. > > Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> > --- > commit.h | 3 +++ > connect.c | 2 +- > remote.h | 1 + > shallow.c | 45 +++++++++++++++++++++++++++++++++++++++++++++ > 4 files changed, 50 insertions(+), 1 deletion(-) > > diff --git a/commit.h b/commit.h > index a879526..98044e6 100644 > --- a/commit.h > +++ b/commit.h > @@ -193,6 +193,7 @@ extern struct commit_list *get_octopus_merge_bases(struct commit_list *in); > /* largest positive number a signed 32-bit integer can contain */ > #define INFINITE_DEPTH 0x7fffffff > > +struct extra_have_objects; > extern int register_shallow(const unsigned char *sha1); > extern int unregister_shallow(const unsigned char *sha1); > extern int for_each_commit_graft(each_commit_graft_fn, void *); > @@ -206,6 +207,8 @@ extern void setup_alternate_shallow(struct lock_file *shallow_lock, > const char **alternate_shallow_file); > extern char *setup_temporary_shallow(void); > extern void advertise_shallow_grafts(int); > +extern void remove_reachable_shallow_points(struct extra_have_objects *out, > + const struct extra_have_objects *in); > > int is_descendant_of(struct commit *, struct commit_list *); > int in_merge_bases(struct commit *, struct commit *); > diff --git a/connect.c b/connect.c > index d0602b0..80e4360 100644 > --- a/connect.c > +++ b/connect.c > @@ -45,7 +45,7 @@ int check_ref_type(const struct ref *ref, int flags) > return check_ref(ref->name, strlen(ref->name), flags); > } > > -static void add_extra_have(struct extra_have_objects *extra, unsigned char *sha1) > +void add_extra_have(struct extra_have_objects *extra, unsigned char *sha1) > { > ALLOC_GROW(extra->array, extra->nr + 1, extra->alloc); > hashcpy(&(extra->array[extra->nr][0]), sha1); > diff --git a/remote.h b/remote.h > index 773faa9..ff604ff 100644 > --- a/remote.h > +++ b/remote.h > @@ -145,6 +145,7 @@ extern struct ref **get_remote_heads(int in, char *src_buf, size_t src_len, > struct ref **list, unsigned int flags, > struct extra_have_objects *have, > struct extra_have_objects *shallow); > +extern void add_extra_have(struct extra_have_objects *extra, unsigned char *sha1); > > int resolve_remote_symref(struct ref *ref, struct ref *list); > int ref_newer(const unsigned char *new_sha1, const unsigned char *old_sha1); > diff --git a/shallow.c b/shallow.c > index 9552512..a974d2d 100644 > --- a/shallow.c > +++ b/shallow.c > @@ -2,6 +2,8 @@ > #include "commit.h" > #include "tag.h" > #include "pkt-line.h" > +#include "remote.h" > +#include "refs.h" > > static int is_shallow = -1; > static struct stat shallow_stat; > @@ -235,3 +237,46 @@ void advertise_shallow_grafts(int fd) > return; > for_each_commit_graft(advertise_shallow_grafts_cb, &fd); > } > + > +struct commit_array { > + struct commit **commits; > + int nr, alloc; > +}; > + > +static int add_ref(const char *refname, > + const unsigned char *sha1, int flags, void *cb_data) > +{ > + struct commit_array *ca = cb_data; > + ALLOC_GROW(ca->commits, ca->nr + 1, ca->alloc); > + ca->commits[ca->nr++] = lookup_commit(sha1); > + return 0; > +} Can't a ref point at a non-commit-ish? Is the code prepared to deal with such an entry (possibly a NULL pointer) in the commit_array struct? > +void remove_reachable_shallow_points(struct extra_have_objects *out, > + const struct extra_have_objects *in) > +{ > + struct commit_array ca; > + int i; > + > + memset(&ca, 0, sizeof(ca)); > + head_ref(add_ref, &ca); > + for_each_ref(add_ref, &ca); > + for (i = 0; i < in->nr; i++) { > + struct commit_graft *graft = lookup_commit_graft(in->array[i]); > + /* > + * For a clone from a shallow upstream, the clone has > + * the same shallow roots as upstream and it will > + * trigger in_merge_bases_many() all the way to roots. > + * Avoid that costly path and assume .git/shallow is > + * good most of the time. > + */ > + if (graft && graft->nr_parent < 0) > + continue; > + if (has_sha1_file(in->array[i]) && > + in_merge_bases_many(lookup_commit(in->array[i]), > + ca.nr, ca.commits)) > + continue; > + add_extra_have(out, in->array[i]); > + } > + free(ca.commits); > +} ^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2013-11-26 12:56 UTC | newest] Thread overview: 3+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2013-11-26 12:56 [PATCH v3 07/28] shallow.c: add remove_reachable_shallow_points() Duy Nguyen -- strict thread matches above, loose matches on Subject: below -- 2013-11-25 3:55 [PATCH v3 00/28] First class shallow clone Nguyễn Thái Ngọc Duy 2013-11-25 3:55 ` [PATCH v3 07/28] shallow.c: add remove_reachable_shallow_points() Nguyễn Thái Ngọc Duy 2013-11-25 21:53 ` Junio C Hamano
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).