From: Nicolas Pitre <nico@cam.org>
To: Junio C Hamano <junkio@cox.net>
Cc: David Lang <david.lang@digitalinsight.com>,
Andy Parkins <andyparkins@gmail.com>,
git@vger.kernel.org,
Linus Torvalds <torvalds@linux-foundation.org>
Subject: Re: [PATCH 2/2] Add keyword unexpansion support to convert.c
Date: Tue, 17 Apr 2007 22:39:41 -0400 (EDT) [thread overview]
Message-ID: <alpine.LFD.0.98.0704172154160.4504@xanadu.home> (raw)
In-Reply-To: <7vy7kqlj5r.fsf@assigned-by-dhcp.cox.net>
On Tue, 17 Apr 2007, Junio C Hamano wrote:
> Nicolas Pitre <nico@cam.org> writes:
>
> >> I would like to, however this doesn't currently integrate
> >> well with git. I've been told in the past that once
> >> .gitattributes is in place then the hooks for the crlf stuff
> >> can be generalized to allow for calls out to custom code to
> >> do this sort of thing.
> >
> > And I agree that this is a perfectly sensible thing to do. The facility
> > should be there for you to apply any kind of transformation with
> > external tools on data going in or out from Git. There are good and bad
> > things you can do with such a facility, but at least it becomes your
> > responsibility to screw^H^H^H^Hfilter your data and not something that
> > is enforced by Git itself.
>
> You have to be careful, though. Depending on what kind of
> transformation you implement with the external tools, you would
> end up having to slow down everything we would do.
So what?
We provide a rope with proper caveat emptor. Up to others to hang
themselves with it if they so desire. It is not our problem anymore.
> It boils down to this statement from Andy:
>
> ..., keywords (in other VCSs, and so why not in git) are
> only updated when a file is checked out. There is no need
> to touch every file. It's actually beneficial, because the
> keyword in the file is the state of the file at the time it
> was checked in - which is actually more useful than updating
> it to the latest commit every time.
>
> That means you're only ever expanding in a file that your
> changing anyway - so it's effectively free. git-checkout
> would still be immediate and instantaneous.
>
> Back up a bit and think what "when a file is checked out" means.
> His argument assumes the current behaviour of not checking out
> when the underlying blob objects before munging are the same.
And I think that should remain. If someone really wants full
transformation aka keyword expansion or whatever then he'd just need to
force a full read-tree after switching branch.
> But with keyword expansion and fancier "external tools" whose
> semantics are not well defined (iow, defined to be "do whatever
> they please"), does it still make sense to consider two blobs
> that appear in totally different context "the same" and omit
> checking out (and causing the external tools hook not getting
> run)?
I think so. At least by default. And if we really want to be kind we
could provide a special attribute just for disabling such optimization
i.e. to force a checkout of everything marked with such an attribute
everytime. But we might as well wait to see if someone actually ask
about that.
But for many other cases having such a facility would be just nice
especially for those cases that don't depend on the branch/commit but
only on the content itself. For example it was pointed out that Open
Office documents are gzipped XML files. In that case it would be
extremely advantageous to have the ability to specify an input filter as
"gzip -d" and an output filter as "gzip -c" so Git has a chance to
actually perform some kind of delta compression.
I'm sure there might be other type of filters for situation we've not
thought about yet, and that we might not want to carry as a builtin
feature. Who knows, maybe someone might want to port Git to the
System/360 and will need an EBCDIC filter on checked out text files. Or
maybe a byte swapping filter for some kind of binary files when on a
system with a different endianness.
> I already pointed out to Andy that the branch name the
> file was taken from, if it were to take part of the keyword
> expansion, would come out incorrectly in his printed svg
> drawing.
Tough.
> If you want somebody's earlier example of "giving a file with
> embedded keyword to somebody, who modifies and sends the result
> back in full, now you would want to incorporate the change by
> identifying the origin" to work, you would want "$Source$" (I am
> looking at CVS documentation, "Keyword substitution/Keyword
> List") to identify where that file came from (after all, a
> source tree could have duplicated files) so that you can tell
> which file the update is about, and this keyword would expand
> differently depending on where in the project tree the blob
> appears.
Like we don't record renames, I don't think we should record such thing
in checked out files either. Using a search for the closest match (like
we do for rename detection) is probably a better avenue than trusting
that the ID embedded in the file wasn't messed with. Linus already
provided a small script that would do that with pretty good results.
> It is not just the checkout codepath. We omit diffs when we
> know from SHA-1 that the blobs are the same before decoration.
> We even omit diffs when we know from SHA-1 that two trees are
> the same without taking possible decorations that can be applied
> differently to the blobs they contain into account. Earlier,
> Andy said he wanted to grep for the expanded text if he is
> grepping in the working tree, and I think that makes sense, but
> that means git-grep cannot do the same "borrow from working tree
> when expanding from blob object is more expensive" optimization
> we have for diff. We also need to disable that optimization
> from the diff, regardless of what the correct semantics for
> grepping in working trees should be.
>
> I suspect that you would have to play safe and say "when
> external tools are involved, we need to disable the existing
> content SHA-1 based optimization for all paths that ask for
> them" to keep your sanity.
Maybe. If that is what's really needed then so be it. People who
really want to do strange things will have the flexibility to do so, but
they'll have to pay the price in loss of performance.
Nicolas
next prev parent reply other threads:[~2007-04-18 2:40 UTC|newest]
Thread overview: 66+ messages / expand[flat|nested] mbox.gz Atom feed top
2007-04-17 9:41 [PATCH 2/2] Add keyword unexpansion support to convert.c Andy Parkins
[not found] ` <200704171803.58940.an dyparkins@gmail.com>
2007-04-17 10:09 ` Junio C Hamano
2007-04-17 11:35 ` Andy Parkins
2007-04-17 15:53 ` Linus Torvalds
2007-04-17 17:03 ` Andy Parkins
2007-04-17 18:12 ` Linus Torvalds
2007-04-17 19:12 ` Andy Parkins
[not found] ` <alpine.LFD. 0.98.0704171530220.4504@xanadu.home>
2007-04-17 19:41 ` Nicolas Pitre
2007-04-17 19:45 ` David Lang
[not found] ` <alpin e.LFD.0.98.0704171624190.4504@xanadu.home>
2007-04-17 20:29 ` Nicolas Pitre
2007-04-17 20:05 ` David Lang
2007-04-17 21:16 ` Nicolas Pitre
[not found] ` <7vy7k qlj5r.fsf@assigned-by-dhcp.cox.net>
2007-04-17 20:53 ` David Lang
2007-04-17 21:52 ` Andy Parkins
2007-04-17 22:40 ` Junio C Hamano
2007-04-18 2:39 ` Nicolas Pitre [this message]
2007-04-18 5:04 ` Junio C Hamano
2007-04-18 14:56 ` Nicolas Pitre
2007-04-18 11:14 ` Johannes Schindelin
2007-04-18 15:10 ` Nicolas Pitre
2007-04-19 8:19 ` Johannes Schindelin
2007-04-21 0:42 ` David Lang
2007-04-21 1:54 ` Junio C Hamano
2007-04-21 2:06 ` Nicolas Pitre
2007-04-21 23:31 ` David Lang
2007-04-18 6:24 ` Rogan Dawes
2007-04-18 15:02 ` Linus Torvalds
2007-04-18 15:34 ` Nicolas Pitre
2007-04-18 15:38 ` Rogan Dawes
2007-04-18 15:59 ` Nicolas Pitre
2007-04-18 16:09 ` Rogan Dawes
2007-04-18 17:58 ` Alon Ziv
2007-04-17 19:54 ` Linus Torvalds
2007-04-17 20:46 ` Andy Parkins
2007-04-17 20:52 ` [PATCH] Add keyword collapse " Andy Parkins
2007-04-17 21:10 ` [PATCH 2/2] Add keyword unexpansion " Linus Torvalds
2007-04-17 21:13 ` Linus Torvalds
2007-04-18 11:11 ` Johannes Schindelin
2007-04-20 11:32 ` Nikolai Weibull
2007-04-17 21:18 ` Martin Langhoff
2007-04-17 21:24 ` Junio C Hamano
2007-04-20 0:30 ` Jakub Narebski
2007-04-21 0:47 ` David Lang
2007-04-17 15:46 ` Linus Torvalds
2007-04-17 10:41 ` Johannes Sixt
2007-04-17 15:32 ` Linus Torvalds
2007-04-17 17:10 ` Andy Parkins
2007-04-17 17:18 ` Rogan Dawes
2007-04-17 18:23 ` Linus Torvalds
2007-04-17 20:27 ` Rogan Dawes
2007-04-17 23:56 ` Robin H. Johnson
2007-04-18 0:02 ` Junio C Hamano
2007-04-18 0:26 ` J. Bruce Fields
2007-04-18 1:19 ` Linus Torvalds
2007-04-18 1:28 ` Junio C Hamano
2007-04-18 1:33 ` Linus Torvalds
2007-04-18 1:06 ` Robin H. Johnson
2007-04-18 1:15 ` Junio C Hamano
2007-04-18 1:42 ` Junio C Hamano
2007-04-18 2:53 ` Robin H. Johnson
2007-04-18 4:15 ` Daniel Barkalow
2007-04-18 11:32 ` Johannes Schindelin
2007-04-18 2:50 ` Martin Langhoff
2007-04-18 10:06 ` David Kågedal
2007-04-18 11:08 ` Robin H. Johnson
2007-04-17 21:00 ` Matthieu Moy
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=alpine.LFD.0.98.0704172154160.4504@xanadu.home \
--to=nico@cam.org \
--cc=andyparkins@gmail.com \
--cc=david.lang@digitalinsight.com \
--cc=git@vger.kernel.org \
--cc=junkio@cox.net \
--cc=torvalds@linux-foundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).