From: Jeff King <peff@peff.net>
To: Junio C Hamano <gitster@pobox.com>
Cc: Nguyen Thai Ngoc Duy <pclouds@gmail.com>, git@vger.kernel.org
Subject: Re: [WIP PATCH] Manual rename correction
Date: Thu, 2 Aug 2012 18:37:33 -0400
Message-ID: <20120802223733.GA28217@sigill.intra.peff.net>
In-Reply-To: <7vipd2e00g.fsf@alter.siamese.dyndns.org>
On Wed, Aug 01, 2012 at 03:10:55PM -0700, Junio C Hamano wrote:
> Jeff King <peff@peff.net> writes:
>
> > On Tue, Jul 31, 2012 at 11:01:27PM -0700, Junio C Hamano wrote:
> > ...
> >> As we still have the pathname in this codepath, I am wondering if we
> >> would benefit from a custom "content hash" that knows the nature of
> >> the payload better than the built-in similarity estimator, driven by
> >> the attribute mechanism (if the latter is the case, that is).
> >
> > Hmm. Interesting. But I don't think that attributes are a good fit here.
> > They are pathname based, so how do I apply anything related to
> > similarity of a particular version by pathname? IOW, how does it apply
> > in one tree but not another?
>
> When you move porn/0001.jpg in the preimage to naughty/00001.jpg in
> the postimage, they both can hit "*.jpg contentid=jpeg" line in the
> top-level .gitattributes file, and the contentid driver for jpeg type
> may strip exif and hash the remainder bits in the image to come up
> with a token you can use in a similar way as object ID is used in
> the exact rename detection phase.
>
> Just thinking aloud.
Ah, I see. That still feels like way too specific a use case to me. A
much more general one would be a contentid driver which splits the file
into multiple chunks (which can be concatenated to arrive at the
original content), and marks each chunk as "OK to delta" or "not able
to delta". In other words, a content-specific version of the bup-style
splitting that people have proposed.
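
To make the idea concrete, here is a rough sketch of what such a driver
might do for JPEG. Nothing like this exists in git; the name split_jpeg
and the (data, ok_to_delta) chunk shape are made up for illustration, and
the parser assumes a well-formed file with ordinary length-prefixed
segments before the scan data:

```python
import struct
from typing import List, Tuple

Chunk = Tuple[bytes, bool]  # (data, ok_to_delta); hypothetical shape


def split_jpeg(data: bytes) -> List[Chunk]:
    """Split a JPEG into chunks that concatenate back to the original."""
    if data[:2] != b"\xff\xd8":          # no SOI marker, not a JPEG
        return [(data, True)]            # leave it to the generic machinery

    chunks: List[Chunk] = []

    def add(piece: bytes, ok: bool) -> None:
        if not piece:
            return
        if chunks and chunks[-1][1] == ok:
            # merge neighbours that carry the same flag
            chunks[-1] = (chunks[-1][0] + piece, ok)
        else:
            chunks.append((piece, ok))

    add(data[:2], False)                 # SOI
    pos = 2
    while pos + 4 <= len(data) and data[pos] == 0xFF:
        marker = data[pos + 1]
        if marker == 0xDA:               # SOS: scan data runs to the end
            break
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        end = pos + 2 + length           # length counts its own two bytes
        # APP0..APP15 and COM hold metadata such as EXIF: "OK to delta"
        is_meta = 0xE0 <= marker <= 0xEF or marker == 0xFE
        add(data[pos:end], is_meta)
        pos = end
    add(data[pos:], False)               # image bits: "not able to delta"
    return chunks
```

Joining the data of all chunks in order gives back the original bytes,
which is the property that matters for storage.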
Assuming we split a jpeg into its EXIF bits (+delta) and its image bits
(-delta), then you could do a fast rename or pack-objects comparison
between two such files (in fact, with chunked object storage,
pack-objects can avoid looking at the image parts at all).
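
As a sketch of that fast comparison (again hypothetical, not an existing
git interface): if each file is a list of (data, ok_to_delta) chunks, two
files can be matched by comparing only the ids of the "not able to delta"
chunks. The names chunk_ids and same_image are invented here, and SHA-1 is
used only because that is what object ids were at the time:

```python
import hashlib
from typing import List, Tuple

Chunk = Tuple[bytes, bool]  # (data, ok_to_delta); hypothetical shape


def chunk_ids(chunks: List[Chunk]) -> List[Tuple[str, bool]]:
    """Hash each chunk; a chunked object store would already have these."""
    return [(hashlib.sha1(data).hexdigest(), ok) for data, ok in chunks]


def same_image(a: List[Chunk], b: List[Chunk]) -> bool:
    """True if the non-delta (image) chunks of both files are identical."""
    ids_a = [h for h, ok in chunk_ids(a) if not ok]
    ids_b = [h for h, ok in chunk_ids(b) if not ok]
    return ids_a == ids_b


old = [(b"\xff\xd8", False), (b"exif-v1", True), (b"pixels", False)]
new = [(b"\xff\xd8", False), (b"exif-v2-edited", True), (b"pixels", False)]
print(same_image(old, new))
```

With the ids stored alongside the chunks, the comparison never has to
read the image bytes themselves.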
However, it may be the case that such "smart" splitting is not
necessary, as stupid and generic bup-style splitting may be enough. I
really need to start playing with the patches you wrote last year that
started in that direction.
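
For reference, the generic approach amounts to content-defined chunking:
cut wherever a rolling checksum over the last few bytes hits a chosen
pattern, so boundaries depend on nearby content rather than on absolute
offsets. The sketch below uses a plain byte sum over a small window as
the checksum. That is an assumption for illustration only; it is not
bup's actual rollsum, and the window and mask values are arbitrary:

```python
from typing import List


def split_content(data: bytes, window: int = 16, mask: int = 0x3F) -> List[bytes]:
    """Cut after any byte where the sum of the last `window` bytes,
    masked with `mask`, equals `mask`."""
    chunks: List[bytes] = []
    start = 0
    rolling = 0
    for i, byte in enumerate(data):
        rolling += byte
        if i >= window:
            rolling -= data[i - window]   # drop the byte leaving the window
        if (rolling & mask) == mask:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])       # trailing partial chunk
    return chunks
```

Because a cut point is decided only by the preceding `window` bytes,
inserting a few bytes near the start of a file changes the first chunks
but leaves the later ones byte-for-byte identical, so they can be shared
between the two versions.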
-Peff