git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jeff King <peff@peff.net>
To: Junio C Hamano <gitster@pobox.com>
Cc: git@vger.kernel.org, Michael Haggerty <mhagger@alum.mit.edu>
Subject: git-bundle rev handling and de-duping
Date: Thu, 16 Oct 2014 20:33:56 -0400	[thread overview]
Message-ID: <20141017003356.GC7848@peff.net> (raw)
In-Reply-To: <xmqqr3y7ud5h.fsf@gitster.dls.corp.google.com>

[subject tweaked as we have veered quite far off the original, and
 this might get more attention from interested people]

On Thu, Oct 16, 2014 at 10:39:54AM -0700, Junio C Hamano wrote:

>  2.  We use object_array_remove_duplicates() to de-dup "git bundle
>      create x master master", which came from b2a6d1c6 (bundle:
>      allow the same ref to be given more than once, 2009-01-17),
>      which is still the sole caller of the function, and I think
>      this is bogus.  Comparing .name would not de-dup "git bundle
>      create x master refs/heads/master".

Hmm. We also doesn't respect the UNINTERESTING flag, so it will dedup
"foo" and "^foo" into a single entry. I _think_ this doesn't have any
bad side effects, because they are equivalent (i.e., the flag is carried
on the object, not the pending entry). I would expect this:

  git bundle create ... foo ^foo

to barf (and it does) because the bundle is empty. But with this:

  git bundle create ... another-ref foo ^foo

I would expect the resulting bundle to contain the objects from
another-ref, but still mention "foo" as an update (we didn't need to
send any objects, since presumably the other side already had them). It
doesn't, but that is not because of the dedup. It is because we generate
the list of refs to send based on the pending list, and we skip all
UNINTERESTING objects (we must, because otherwise "^foo" would make an
entry). I.e., it is the same "the flag is on the object, not the pending
entry" that saved us above now causing its own bug. Obviously the
example above is sort of silly, but if you have:

  # two branches point at the same object
  git branch foo master

  # the other side already has master. Let's send them foo.
  # this will fail because the bundle is empty. That's mildly
  # annoying because we really want to tell them "hey, update
  # your foo branch". But at least we get an error.
  git bundle create tmp.bundle foo ^master

  # now the same thing, but we're sending them some other objects.
  # This one succeeds, but silently omits foo from the bundle!
  git bundle create tmp.bundle foo another-ref ^master

I have a feeling that the list needs to be generated from revs.cmdline,
not revs.pending, and the de-duplication needs to happen there (with the
ref resolution that you mention).

I also have the feeling that fast-export had to deal with this exact
same issue. It looks like we use revs.cmdline there. I seem to recall
there was some ongoing work in that area, but I stopped paying close
attention due to some personality conflicts, and I don't know if all of
the issues were ever resolved.

> I think the right way to fix these two and a half problems is to do
> the following:
> 
>  - object_array_remove_duplicates() (and contains_name() helper it
>    uses) should be removed from object.c;
> 
>  - create_bundle() in bundle.c should implement a helper that is
>    similar to contains_name() but knows about ref dwimming and use
>    it to call object_array_filter() to replace its call to
>    object_array_remove_duplicates().

Agreed. The loop in create_bundle right after the de-dup does the
dwimming. Probably it would be simple to just skip duplicates there
using a hashmap or sorted list.

> I am not doing this myself, and I do not expect either you or
> Michael to do so, either.  I am just writing this down to point out
> a low hanging fruit to aspiring new contributors (hint, hint).

I am also not planning on working on it soon, but now we have hopefully
fed plenty of possibilities to anybody who wants to. :)

-Peff

  reply	other threads:[~2014-10-17  0:34 UTC|newest]

Thread overview: 48+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-10-15 22:32 [PATCH v2 0/25] prune-safety Jeff King
2014-10-15 22:33 ` [PATCH v2 01/25] foreach_alt_odb: propagate return value from callback Jeff King
2014-10-15 22:34 ` [PATCH v2 02/25] isxdigit: cast input to unsigned char Jeff King
2014-10-16 17:16   ` Junio C Hamano
2014-10-15 22:34 ` [PATCH v2 03/25] object_array: factor out slopbuf-freeing logic Jeff King
2014-10-16 17:39   ` Junio C Hamano
2014-10-17  0:33     ` Jeff King [this message]
2014-10-17 21:03       ` git-bundle rev handling and de-duping Philip Oakley
2014-10-17 22:41         ` Junio C Hamano
2014-10-15 22:34 ` [PATCH v2 04/25] object_array: add a "clear" function Jeff King
2014-10-15 22:35 ` [PATCH v2 05/25] clean up name allocation in prepare_revision_walk Jeff King
2014-10-15 22:37 ` [PATCH v2 06/25] reachable: use traverse_commit_list instead of custom walk Jeff King
2014-10-16 17:53   ` Junio C Hamano
2014-10-15 22:38 ` [PATCH v2 07/25] reachable: reuse revision.c "add all reflogs" code Jeff King
2014-10-15 22:38 ` [PATCH v2 08/25] prune: factor out loose-object directory traversal Jeff King
2014-10-15 22:40 ` [PATCH v2 09/25] reachable: mark index blobs as SEEN Jeff King
2014-10-15 22:40 ` [PATCH v2 10/25] prune-packed: use for_each_loose_file_in_objdir Jeff King
2014-10-15 22:40 ` [PATCH v2 11/25] count-objects: do not use xsize_t when counting object size Jeff King
2014-10-15 22:41 ` [PATCH v2 12/25] count-objects: use for_each_loose_file_in_objdir Jeff King
2014-10-15 22:41 ` [PATCH v2 13/25] sha1_file: add for_each iterators for loose and packed objects Jeff King
2014-10-15 22:41 ` [PATCH v2 14/25] prune: keep objects reachable from recent objects Jeff King
2014-10-15 22:41 ` [PATCH v2 15/25] pack-objects: refactor unpack-unreachable expiration check Jeff King
2014-10-15 22:42 ` [PATCH v2 16/25] pack-objects: match prune logic for discarding objects Jeff King
2014-10-15 22:42 ` [PATCH v2 17/25] write_sha1_file: freshen existing objects Jeff King
2014-10-15 22:42 ` [PATCH v2 18/25] make add_object_array_with_context interface more sane Jeff King
2014-10-15 22:43 ` [PATCH v2 19/25] traverse_commit_list: support pending blobs/trees with paths Jeff King
2014-10-15 22:43 ` [PATCH v2 20/25] rev-list: document --reflog option Jeff King
2014-10-15 22:44 ` [PATCH v2 21/25] rev-list: add --index-objects option Jeff King
2014-10-16 18:41   ` Junio C Hamano
2014-10-17  0:12     ` Jeff King
2014-10-17  0:43       ` Jeff King
2014-10-17  0:44         ` [PATCH v3 22/26] rev-list: add --indexed-objects option Jeff King
2014-10-17  0:44         ` [PATCH v3 23/26] reachable: use revision machinery's --indexed-objects code Jeff King
2014-10-17  0:44         ` [PATCH v3 24/26] pack-objects: use argv_array Jeff King
2014-10-17  0:44         ` [PATCH v3 25/26] repack: pack objects mentioned by the index Jeff King
2014-10-17  0:44         ` [PATCH v3 26/26] pack-objects: double-check options before discarding objects Jeff King
2014-10-15 22:44 ` [PATCH v2 22/25] reachable: use revision machinery's --index-objects code Jeff King
2014-10-15 22:45 ` [PATCH v2 23/25] pack-objects: use argv_array Jeff King
2014-10-15 22:46 ` [PATCH v2 24/25] repack: pack objects mentioned by the index Jeff King
2014-10-15 22:48 ` [PATCH v2 25/25] pack-objects: double-check options before discarding objects Jeff King
2014-10-16 21:07 ` [PATCH v2 0/25] prune-safety Junio C Hamano
2014-10-16 21:10   ` Junio C Hamano
2014-10-16 21:21   ` Jeff King
2014-10-16 21:39     ` Jeff King
2014-10-16 22:18       ` Junio C Hamano
2014-10-17  0:03         ` Jeff King
     [not found]       ` <CAPc5daX0AFv9jDrFyd_OnupW5AfZW9Je_rgzaViX_xxs3SG5zg@mail.gmail.com>
2014-10-17  4:49         ` Jeff King
2014-10-18 12:31       ` Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20141017003356.GC7848@peff.net \
    --to=peff@peff.net \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=mhagger@alum.mit.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).