* Does GIT has vc keywords like CVS/Subversion?
@ 2006-10-09  1:25 Dongsheng Song
  2006-10-09  2:44 ` Liu Yubao
  0 siblings, 1 reply; 12+ messages in thread
From: Dongsheng Song @ 2006-10-09  1:25 UTC (permalink / raw)
  To: git

I want to know whether there is a plan to add this feature, or GIT
doesn't require it at all.

Keywords like LastChangedDate, LastChangedRevision, LastChangedBy, Id
are useful for version control.

Dongsheng


* Re: Does GIT has vc keywords like CVS/Subversion?
  2006-10-09  1:25 Does GIT has vc keywords like CVS/Subversion? Dongsheng Song
@ 2006-10-09  2:44 ` Liu Yubao
  2006-10-09  2:59   ` Petr Baudis
  2006-10-09 16:13   ` Linus Torvalds
  0 siblings, 2 replies; 12+ messages in thread
From: Liu Yubao @ 2006-10-09  2:44 UTC (permalink / raw)
  To: Dongsheng Song; +Cc: git

Dongsheng Song wrote:
> I want to know whether there is a plan to add this feature, or GIT
> doesn't require it at all.
> 
> Keywords like LastChangedDate, LastChangedRevision, LastChangedBy, Id
> are useful for version control.
> 
I almost thought I had sent my last question twice :-). Maybe we need more
FAQs like this:
Q: Does GIT [some feature] like [some vcs] ?
A: No. Because ...

IMHO, I don't think keyword substitution is a good idea, as it will confuse
the external diff/merge tools.


* Re: Does GIT has vc keywords like CVS/Subversion?
  2006-10-09  2:44 ` Liu Yubao
@ 2006-10-09  2:59   ` Petr Baudis
  2006-10-09 16:13   ` Linus Torvalds
  1 sibling, 0 replies; 12+ messages in thread
From: Petr Baudis @ 2006-10-09  2:59 UTC (permalink / raw)
  To: Liu Yubao; +Cc: Dongsheng Song, git

Dear diary, on Mon, Oct 09, 2006 at 04:44:10AM CEST, I got a letter
where Liu Yubao <yubao.liu@gmail.com> said that...
> Dongsheng Song wrote:
> >I want to know whether there is a plan to add this feature, or GIT
> >doesn't require it at all.
> >
> >Keywords like LastChangedDate, LastChangedRevision, LastChangedBy, Id
> >are useful for version control.
> >
> I almost thought I had sent my last question twice :-). Maybe we need more
> FAQs like this:
> Q: Does GIT [some feature] like [some vcs] ?
> A: No. Because ...

I have added a direct link to the FAQ in the Git wiki to the Git homepage
header - http://git.or.cz/gitwiki/GitFaq. It's a wiki, so feel free to
add more q/a there.

> IMHO, I don't think keyword substitution is a good idea, as it will confuse
> the external diff/merge tools.

There can be valid usage scenarios for keyword substitution but I tend
to agree that it usually is not necessary to have it (and projects tend
to use it just "because we can", which is of course their right). Also,
implementing it in Git poses some challenges and has some ugly
implications (like actually having to start to worry about binary
files).

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
#!/bin/perl -sp0777i<X+d*lMLa^*lN%0]dsXx++lMlN/dsM0<j]dsj
$/=unpack('H*',$_);$_=`echo 16dio\U$k"SK$/SM$n\EsN0p[lN*1
lK[d2%Sa2/d0$^Ixp"|dc`;s/\W//g;$_=pack('H*',/((..)*)$/)


* Re: Does GIT has vc keywords like CVS/Subversion?
  2006-10-09  2:44 ` Liu Yubao
  2006-10-09  2:59   ` Petr Baudis
@ 2006-10-09 16:13   ` Linus Torvalds
  2006-10-09 21:08     ` Martin Langhoff
  1 sibling, 1 reply; 12+ messages in thread
From: Linus Torvalds @ 2006-10-09 16:13 UTC (permalink / raw)
  To: Liu Yubao; +Cc: Dongsheng Song, git



On Mon, 9 Oct 2006, Liu Yubao wrote:
> 
> IMHO, I don't think keyword substitution is a good idea, as it will confuse
> the external diff/merge tools.

There are other reasons why it's a _horrible_ idea, like the fact that it 
can mess up binary files etc (so if you do keyword substitution, you also 
need to suddenly care _deeply_ whether a file is binary or not).

The whole notion of keyword substitution is just totally idiotic. It's 
trivial to do "outside" of the actual content tracking, if you want to 
have it when doing release trees as tar-balls etc.

So:
 - inside of the SCM, keyword substitution is pointless, since you have 
   much better tools available (like "git log filename")
 - outside of the SCM, keyword substitution can make sense, but doing it 
   should be in helper scripts or something that can easily tailor it for 
   the actual need of that particular project.

For example, we actually do a certain kind of keyword substitution for the 
kernel. Look at the -git snapshots: the script that generates the snapshot 
diffs has a simple sequence in it to "keyword substitute" the Makefile for 
the EXTRAVERSION flag, so the diff will result in the Makefile having the 
knowledge of which git SHA1 version the resulting patch was, even though 
the thing isn't a git tree any more:

	...
	git-read-tree $CURCOMM
	git-checkout-index Makefile
	perl -pi -e "s/EXTRAVERSION =.*/EXTRAVERSION = $EXTRAVERSION/" Makefile
	git-diff-index -m -p $RELTREE | gzip -9 > $STAGE/patch-$CURNAME.gz
	...

So this is how to do keyword substitution in a _sane_ way.
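
The substitution step can also be looked at in isolation. The following is a minimal sketch, not the actual snapshot script: the helper name `stamp_extraversion` is hypothetical, `sed` stands in for the perl one-liner, and the Makefile is assumed to contain a line starting with `EXTRAVERSION =`.

```shell
# Rewrite the EXTRAVERSION line of a Makefile in an exported tree.
# stamp_extraversion is a hypothetical helper, not part of git.
stamp_extraversion() {
    makefile=$1
    version=$2
    tmp=$(mktemp) || return 1
    # Replace everything after "EXTRAVERSION =" with the given string.
    # Assumes $version contains no "|", "&" or backslash characters,
    # since those are special in a sed replacement.
    sed "s|^EXTRAVERSION =.*|EXTRAVERSION = $version|" "$makefile" > "$tmp" &&
        cat "$tmp" > "$makefile"
    status=$?
    rm -f "$tmp"
    return $status
}
```

It would be run on the exported copy only, for example `stamp_extraversion Makefile "-git$(git rev-parse --short HEAD)"`, so the tracked content itself is never rewritten.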

Sure, we could do something like this as a git script, and support it 
"natively", but the fact is, keyword substitution is just stupid.

		Linus


* Re: Does GIT has vc keywords like CVS/Subversion?
  2006-10-09 16:13   ` Linus Torvalds
@ 2006-10-09 21:08     ` Martin Langhoff
  2006-10-09 22:48       ` Johannes Schindelin
                         ` (3 more replies)
  0 siblings, 4 replies; 12+ messages in thread
From: Martin Langhoff @ 2006-10-09 21:08 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Liu Yubao, Dongsheng Song, git

On 10/10/06, Linus Torvalds <torvalds@osdl.org> wrote:
> So:
>  - inside of the SCM, keyword substitution is pointless, since you have
>    much better tools available (like "git log filename")
>  - outside of the SCM, keyword substitution can make sense, but doing it
>    should be in helper scripts or something that can easily tailor it for
>    the actual need of that particular project.

For the outside of the SCM case, keyword subst is useful indeed if
someone has a $version_unknown tarball, unpacks it and hacks away. It
is a pretty broken scenario, and less likely to happen nowadays with
easy access to SCM tools.

However, I don't think that scenario is hard to support and Git can
have a much better story to tell than keyword substituting SCMs.

What if we had a tool that I can pass a file or a directory tree to, and that
finds the (perfectly|closely) matching trees and related commits?

For the single file case, searching for an exact SHA1 match is easy,
as is by path. If we get a file without a path it gets a bit harder --
is there a way to scan the object store for blobs of around a given
size (as the packing code does) from Perl? Actually, if we find a
relatively close match, it'd be useful to ask git if it's deltified
and ask for other members of the delta chain.

For the directory tree case, the ideal thing would be to build a
temporary index without getting the blobs in the object store, and
then do a first pass trying to match tree SHA1s. If the user has
modified a few files in a large project, it'll be trivial to find a
good candidate commit for delta. OTOH, if the user has indulged in
wide ranging search and replace... it will be well deserved pain ;-)

cheers,



martin


* Re: Does GIT has vc keywords like CVS/Subversion?
  2006-10-09 21:08     ` Martin Langhoff
@ 2006-10-09 22:48       ` Johannes Schindelin
  2006-10-09 22:57         ` Martin Langhoff
  2006-10-09 22:55       ` Junio C Hamano
                         ` (2 subsequent siblings)
  3 siblings, 1 reply; 12+ messages in thread
From: Johannes Schindelin @ 2006-10-09 22:48 UTC (permalink / raw)
  To: Martin Langhoff; +Cc: Linus Torvalds, Liu Yubao, Dongsheng Song, git

Hi,

On Tue, 10 Oct 2006, Martin Langhoff wrote:

> On 10/10/06, Linus Torvalds <torvalds@osdl.org> wrote:
>
> >  - outside of the SCM, keyword substitution can make sense, but doing it
> >    should be in helper scripts or something that can easily tailor it for
> >    the actual need of that particular project.

... like a pre-commit hook.

> What if we had a tool that I can pass a file or a directory tree to, and
> that finds the (perfectly|closely) matching trees and related commits?
> 
> For the single file case, searching for an exact SHA1 match is easy,
> as is by path.

If you have the path, you can reuse the whole algorithm for finding the 
best delta base.

However, if you do not have the path, you might as well just give up (if 
there is no perfect match for the SHA1), since the SHA1 is _not_ similar 
for similar contents. IOW, you'd literally have to search _all_ objects in 
the repository, which usually takes a long, long time.
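
To make the point concrete, here is a minimal sketch (not from the thread) of how a blob ID is derived: it is the SHA-1 of a short header plus the file contents, so a one-byte change gives an unrelated ID. The helper name `blob_sha1` is hypothetical, and `sha1sum` from GNU coreutils is assumed to be installed; with git at hand, `git hash-object <file>` does the same job.

```shell
# Compute the ID git would assign to a file stored as a blob:
# SHA-1 over "blob <size in bytes>", a NUL byte, then the raw contents.
# Assumes a SHA-1 repository and GNU sha1sum.
blob_sha1() {
    size=$(( $(wc -c < "$1") ))
    { printf 'blob %d\0' "$size"; cat "$1"; } | sha1sum | cut -d' ' -f1
}

# Two files that differ in a single byte.
dir=$(mktemp -d)
printf 'hello\n' > "$dir/a"
printf 'hellp\n' > "$dir/b"
blob_sha1 "$dir/a"   # ce013625030ba8dba906f756967f9e9ca394464a
blob_sha1 "$dir/b"   # a different, unrelated 40-hex ID
rm -rf "$dir"
```

An exact match can therefore be looked up directly by ID, but there is no way to go from one ID to the IDs of "nearby" contents.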

Ciao,
Dscho


* Re: Does GIT has vc keywords like CVS/Subversion?
  2006-10-09 21:08     ` Martin Langhoff
  2006-10-09 22:48       ` Johannes Schindelin
@ 2006-10-09 22:55       ` Junio C Hamano
  2006-10-10  7:37       ` Rene Scharfe
  2006-10-10 16:49       ` Shawn Pearce
  3 siblings, 0 replies; 12+ messages in thread
From: Junio C Hamano @ 2006-10-09 22:55 UTC (permalink / raw)
  To: Martin Langhoff; +Cc: git


"Martin Langhoff" <martin.langhoff@gmail.com> writes:

> is there a way to scan the object store for blobs of around a given
> size (as the packing code does) from Perl?

For objects in packs, verify-pack -v comes to mind (show-index
might show the same information).  Loose objects need help from
git-cat-file -s or git-cat-file -t or both.
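
A rough sketch of the size scan, under stated assumptions: the filter name `blobs_near_size` is hypothetical, and it expects input lines in the `<sha1> <type> <size>` layout that `git cat-file --batch-check` prints. The `--batch-all-objects` option used in the usage note is from much newer git releases than this thread.

```shell
# Keep only blob lines whose size is within MARGIN bytes of TARGET.
# Reads "<sha1> <type> <size> ..." lines on stdin and prints
# "<sha1> <size>" for each match. blobs_near_size is a hypothetical name.
blobs_near_size() {
    awk -v target="$1" -v margin="$2" '
        $2 == "blob" && $3 >= target - margin && $3 <= target + margin {
            print $1, $3
        }'
}
```

For example, `git cat-file --batch-check --batch-all-objects | blobs_near_size 4096 200` would list every blob, loose or packed, between 3896 and 4296 bytes. The candidates still have to be fetched and compared one by one.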


* Re: Does GIT has vc keywords like CVS/Subversion?
  2006-10-09 22:48       ` Johannes Schindelin
@ 2006-10-09 22:57         ` Martin Langhoff
  0 siblings, 0 replies; 12+ messages in thread
From: Martin Langhoff @ 2006-10-09 22:57 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Linus Torvalds, Liu Yubao, Dongsheng Song, git

On 10/10/06, Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote:
> If you have the path, you can reuse the whole algorithm for finding
> the best delta base.

Can I do that from Perl/bash? (how?)

> However, if you do not have the path, you might as well just give up (if
> there is no perfect match for the SHA1), since the SHA1 is _not_ similar
> for similar contents. IOW, you'd literally have to search _all_ objects in
> the repository, which usually takes a long, long time.

So the delta base algorithm doesn't work without a path. I thought we
had a quick way to find blobs of similar size. If the user can't even
give us a filename (that we can use to try and build a likely path)
then they have bigger problems than the delta ;-) -- at some point we
have to provide git-paddedcell for the remaining <ahem> users.

cheers,


martin


* Re: Does GIT has vc keywords like CVS/Subversion?
  2006-10-09 21:08     ` Martin Langhoff
  2006-10-09 22:48       ` Johannes Schindelin
  2006-10-09 22:55       ` Junio C Hamano
@ 2006-10-10  7:37       ` Rene Scharfe
  2006-10-10 16:49       ` Shawn Pearce
  3 siblings, 0 replies; 12+ messages in thread
From: Rene Scharfe @ 2006-10-10  7:37 UTC (permalink / raw)
  To: Martin Langhoff; +Cc: Linus Torvalds, Liu Yubao, Dongsheng Song, git

Martin Langhoff schrieb:
> For the outside of the SCM case, keyword subst is useful indeed if
> someone has a $version_unknown tarball, unpacks it and hacks away. It
> is a pretty broken scenario, and less likely to happen nowadays with
> easy access to SCM tools.

If you still have the tar file, and if it has been created using
git-archive or git-tar-tree it may contain the commit ID in an archive
comment.  You can use git-get-tar-commit-id to extract it in that case.

This won't work with official git project tarballs btw., as commit ID
embedding has been turned off.  The reason is that older tar versions
extracted the comment as a regular file, which confused users.
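
As a small sketch of that lookup (the wrapper name `tar_commit_id` is hypothetical; it needs git on the PATH and an uncompressed tar file):

```shell
# Print the commit ID embedded in a tar archive made by `git archive`.
# Returns non-zero with a message if the archive carries no such comment.
# tar_commit_id is a hypothetical wrapper around git-get-tar-commit-id.
tar_commit_id() {
    git get-tar-commit-id < "$1" 2>/dev/null || {
        echo "no commit ID recorded in $1" >&2
        return 1
    }
}
```

Usage would be `tar_commit_id project.tar`; a gzip-compressed tarball would have to be decompressed first.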

René


* Re: Does GIT has vc keywords like CVS/Subversion?
  2006-10-09 21:08     ` Martin Langhoff
                         ` (2 preceding siblings ...)
  2006-10-10  7:37       ` Rene Scharfe
@ 2006-10-10 16:49       ` Shawn Pearce
  2006-10-10 17:14         ` Linus Torvalds
  3 siblings, 1 reply; 12+ messages in thread
From: Shawn Pearce @ 2006-10-10 16:49 UTC (permalink / raw)
  To: Martin Langhoff; +Cc: git

Martin Langhoff <martin.langhoff@gmail.com> wrote:
> However, I don't think that scenario is hard to support and Git can
> have a much better story to tell than keyword substituting SCMs.
> 
> What if we had a tool that I can pass a file or a directory tree to, and
> that finds the (perfectly|closely) matching trees and related commits?
> 
> For the single file case, searching for an exact SHA1 match is easy,
> as is by path. If we get a file without a path it gets a bit harder --
> is there a way to scan the object store for blobs of around a given
> size (as the packing code does) from Perl? Actually, if we find a
> relatively close match, it'd be useful to ask git if it's deltified
> and ask for other members of the delta chain.

git-verify-pack -v will print every SHA1, its type and its
decompressed size.  It also prints who its delta base is.  It's also
not very fast.  However if you run that on a pack file once and
cache the result then you have much of the data you are looking for.

You can find objects within a margin of error of the blob size,
then find all objects in those delta chains.  Then start fetching
those objects and comparing contents.  But this is brutal and will
take a long time due to the sheer number of objects that probably
would fall into that size bucket.

The single file case without a path is not an easy problem.  Even if
you have an exact SHA1 match (an unmodified file) it's difficult
to find what commits used that SHA1 somewhere within their trees.
You need to unpack every tree in every commit and test every
entry for a match.  That's going to take a while on any decent
sized repository.
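
The brute-force search just described could be sketched like this. It is only an illustration of the cost, not an existing git command: the function name `commits_with_blob` is hypothetical, and it expects a full 40-hex blob ID.

```shell
# List commits whose tree contains the given blob at any path.
# Walks every commit reachable from any ref and lists its whole tree,
# so the cost grows with (number of commits) x (files per tree).
commits_with_blob() {
    blob=$1
    git rev-list --all |
    while read -r commit; do
        # ls-tree -r prints "<mode> <type> <id>" then a TAB and the path;
        # cut -f1 drops the path so only the ID column is matched.
        if git ls-tree -r "$commit" | cut -f1 | grep -q " blob $blob\$"; then
            echo "$commit"
        fi
    done
}
```

It would be called as `commits_with_blob "$(git hash-object path/to/file)"` from inside the repository.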

Most maintainers would just toss the modified file back at the sender
and say "Uh, where did this file come from?!"  And rightly so.

A maintainer familiar with that section of the repository might
recognize some of the file contents and be able to guess the
filename.  So in short I don't think the single file case without
filename is doable, and I don't think it's very useful either.
 
> For the directory tree case, the ideal thing would be to build a
> temporary index without getting the blobs in the object store, and
> then do a first pass trying to match tree SHA1s. If the user has
> modified a few files in a large project, it'll be trivial to find a
> good candidate commit for delta. OTOH, if the user has indulged in
> wide ranging search and replace... it will be well deserved pain ;-)

You have a chance in the tree case.  If you have the entire tree
as a working directory and the modifications made are limited to
a handful of paths then you can load that working directory into a
set of tree objects and perform a match process by walking backwards
through the commit chains looking for trees which have a high number
of paths in common with the working directory.

Unfortunately this also has limited use (but I have one myself!).
If you got the entire working directory from a submitter then that
implies they took your entire distribution, unpacked it, hacked away,
repacked it and sent you the tar/zip file.  That's significantly
larger than a simple patch file produced by diff -R.  As a maintainer
you probably should be kicking that back at the user and saying
"Uh, please submit a patch instead, thanks."


I actually have a scenario where I'm using Git to track another
(much, much crappier) file revision storage tool that would probably
benefit from this, but the benefit is relatively low.

I'm completely unable to read that tool's version data.  The only
thing I can get from that tool is a snapshot of files as they exist
at the point in time that I am running the snapshot.  The snapshots
aren't always consistent with themselves.  Worse they take upwards
of 30 minutes to run, can only run on a Windows desktop, and consume
100% of the CPU while running.  So we cannot get them very often.

I have several users working on those files in Git through a common
shared repository.  We send changes to that file revision storage
tool on a frequent basis, say up to 3-5 times per day.  Each such
change is basically a squashed merge commit in Git terminology,
so the fine grained commits in Git aren't being preserved by that
storage tool, despite being in our shared Git repository.

Many days later most of the changes the users put into the storage
tool suddenly appear on the next snapshot we obtain from it.  I say
most because sometimes the powers that be either don't permit a
change to show up in the snapshot and delay it for a while, or
because they actually wanted to include a change but someone fat
fingered the storage controls and the change got omitted.  Yet the
powers that be *believe* the change is included, right up until testing
accuses development of not fixing the bug, despite the fix being there in
the file revision storage tool.

Now I'd like to take these snapshots every so often, load them
into Git on a special branch just for the snapshots, then generate
a merge commit on that branch which merges the real commit that
corresponds as closely as possible to this snapshot into the
snapshot branch.  Part of the reason for doing this is to look
for unexpected differences between what Git has and what the file
revision storage tool has.

But doing that is nearly impossible, so I don't.

-- 
Shawn.


* Re: Does GIT has vc keywords like CVS/Subversion?
  2006-10-10 16:49       ` Shawn Pearce
@ 2006-10-10 17:14         ` Linus Torvalds
  2006-10-10 17:41           ` Junio C Hamano
  0 siblings, 1 reply; 12+ messages in thread
From: Linus Torvalds @ 2006-10-10 17:14 UTC (permalink / raw)
  To: Shawn Pearce; +Cc: Martin Langhoff, git



On Tue, 10 Oct 2006, Shawn Pearce wrote:
> 
> Now I'd like to take these snapshots every so often, load them
> into Git on a special branch just for the snapshots, then generate
> a merge commit on that branch which merges the real commit that
> corresponds as closely as possible to this snapshot into the
> snapshot branch.  Part of the reason for doing this is to look
> for unexpected differences between what Git has and what the file
> revision storage tool has.
> 
> But doing that is nearly impossible, so I don't.

Well, it probably wouldn't be too nasty to try to have a "find nearest 
commit" kind of thing. It's not quite as simple as bisection, but you 
could probably use a bisection-like algorithm to do something like a 
binary search to try to guess which tree is the closest. 

In other words, if you just give git a "range" of commits to look at, and 
let a bisection-like thing pick a mid-way point, you can then compare the 
mid-way point and the end-points (more than two) against the target tree, 
and then pick the range that looks "closer".

I wouldn't guarantee that it finds the best candidate (since the "closer" 
choice will inevitably not guarantee a monotonic sequence), but I think 
you could probably most of the time find something that is reasonably 
close.

If you do a lot of branching, you'd have to be a lot smarter about it 
(since you'd not have _one_ commit for beginning/end), but in a 
straight-line tree it should be really trivial, and in a branchy one I 
think it should still be quite doable. 

I dunno. It might be useful even if it's just a heuristic, in a "try to 
find a commit in the range X..Y that generates the smallest diff when 
compared against this tree". If it finds something sucky, you can try to 
look at the history of one of the files that generates a big diff, and try 
to give a better range - the automation should hopefully have given you 
_some_ clues.
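
The "smallest diff" criterion can be sketched as a plain exhaustive scan, which is simpler than the bisection-like search suggested above and only practical for short ranges. Assumptions: the function name `nearest_commit` is hypothetical, it is run inside a repository whose working tree holds the snapshot, and files git does not track are ignored unless they are added to the index first.

```shell
# Print the commit in the given revision range whose tree differs least
# from the current working tree, measured as added + deleted lines.
# An exhaustive scan over the range, not a bisection.
nearest_commit() {
    git rev-list "$1" |
    while read -r commit; do
        # --numstat prints "<added> <deleted> <path>" per changed file;
        # binary files show "-" in both columns and count as 0 here.
        score=$(git diff --numstat "$commit" -- |
                awk '{ n += $1 + $2 } END { print n + 0 }')
        echo "$score $commit"
    done | sort -n | head -n 1 | cut -d' ' -f2
}
```

For example, `nearest_commit v1.0..master` (with hypothetical ref names) prints one candidate; as noted above, it is a heuristic starting point to inspect by hand, not a guaranteed best match.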

		Linus


* Re: Does GIT has vc keywords like CVS/Subversion?
  2006-10-10 17:14         ` Linus Torvalds
@ 2006-10-10 17:41           ` Junio C Hamano
  0 siblings, 0 replies; 12+ messages in thread
From: Junio C Hamano @ 2006-10-10 17:41 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git, Shawn Pearce, Martin Langhoff

Linus Torvalds <torvalds@osdl.org> writes:

> Well, it probably wouldn't be too nasty to try to have a "find nearest 
> commit" kind of thing. It's not quite as simple as bisection, but you 
> could probably use a bisection-like algorithm to do something like a 
> binary search to try to guess which tree is the closest. 

I had to do something like that in my day job once.  A customer
installation was made from a tarball of unknown vintage, and
then field patched with later fixes.

I ended up slurping the thing back and populated my index with
it.  Luckily I could guess a good initial point to find the
commit that gives minimum "git diff" output.  Then from the
remaining patches it was reasonably easy to find out which
changes were cherry-picked by hand with "git log master --
$paths".

