Git development

* Re: [ANNOUNCE] Git wiki
From: Petr Baudis @ 2006-05-05 16:40 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git
In-Reply-To: <7vejz8241m.fsf@assigned-by-dhcp.cox.net>

Dear diary, on Fri, May 05, 2006 at 11:51:01AM CEST, I got a letter
where Junio C Hamano <junkio@cox.net> said that...
> Petr Baudis <pasky@suse.cz> writes:
> 
> > But the non-obviously important part here to note is that the branch B
> > merely "corrects a typo on a comment somewhere" - the latest versions in
> > branch A and branch B are always compared for renames, therefore if
> > branch A renamed the file and branch B sums up to some larger-scale
> > changes in the file, it still won't be merged properly.
> 
> I probably am guilty of starting this misinformation, but the
> code does not compare the latest in A and B for rename
> detection; it compares (O, A) and (O, B).

Where O = LCA(A,B) (modulo recursiveness)? Yes, that is what I meant to
say but I phrased it wrong, sorry.
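To make the (O, A) / (O, B) comparison concrete, here is a minimal C sketch of similarity-based rename detection. It is an illustration under assumptions, not git's rename-detection code, which differs in detail. It scores two versions of a file by the share of lines they have in common and pairs a deleted path with an added path when the score reaches 50%, a cut-off loosely modelled on git's default `-M` threshold. The names `similarity`, `looks_like_rename`, `RENAME_THRESHOLD` and `MAX_LINES` are hypothetical. Because only O and A are compared, a large rewrite on the O-to-A side drops the score below the threshold no matter how small the O-to-B change is.

```c
#include <stddef.h>
#include <string.h>

/* Illustrative cut-off, loosely modelled on git's default -M of 50%. */
#define RENAME_THRESHOLD 50
#define MAX_LINES 256

/* Percentage of lines two versions share, relative to the longer one.
 * Each line of `b` may match at most one unused line of `a`, so
 * repeated lines are counted as a multiset. */
static int similarity(const char **a, size_t na, const char **b, size_t nb)
{
    unsigned char used[MAX_LINES] = {0};
    size_t shared = 0;
    size_t longer = na > nb ? na : nb;

    if (longer == 0)
        return 100;             /* two empty files are identical */
    if (na > MAX_LINES)
        return 0;               /* this sketch only handles small inputs */
    for (size_t j = 0; j < nb; j++) {
        for (size_t i = 0; i < na; i++) {
            if (!used[i] && strcmp(a[i], b[j]) == 0) {
                used[i] = 1;
                shared++;
                break;
            }
        }
    }
    return (int)(shared * 100 / longer);
}

/* Decide whether a path deleted between O and A and a path added
 * between O and A should be paired up as a rename. */
static int looks_like_rename(const char **o, size_t no,
                             const char **a, size_t na)
{
    return similarity(o, no, a, na) >= RENAME_THRESHOLD;
}
```

With O = {a, b, c, d}, a version that changed one line scores 75 and is paired as a rename; one that rewrote three of the four lines scores 25 and is not, however trivial the change on the other branch.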

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
Right now I am having amnesia and deja-vu at the same time.  I think
I have forgotten this before.

* Re: [ANNOUNCE] Git wiki
From: Jakub Narebski @ 2006-05-05 16:47 UTC (permalink / raw)
  To: git
In-Reply-To: <7vejz8241m.fsf@assigned-by-dhcp.cox.net>

Junio C Hamano wrote:

> Petr Baudis <pasky@suse.cz> writes:
> 
>> But the non-obviously important part here to note is that the branch B
>> merely "corrects a typo on a comment somewhere" - the latest versions in
>> branch A and branch B are always compared for renames, therefore if
>> branch A renamed the file and branch B sums up to some larger-scale
>> changes in the file, it still won't be merged properly.
> 
> I probably am guilty of starting this misinformation, but the
> code does not compare the latest in A and B for rename
> detection; it compares (O, A) and (O, B).
> 
> But the end result is the same - what you say is correct.  If a
> path (say O to A) that renamed has too big a change, then no
> matter how small the changes are on the other path (O to B),
> rename detection can be fooled.  We could perhaps alleviate it
> by following the whole commit chain.

Or perhaps by helper information about renames, entered either by git-mv
(and git-cp) or by rename detection at commit time, e.g. in the following form:

        note at <commit-sha1> was-in <pathname>
        note at <commit-sha1> was-in <pathname>

(the obvious limitation of this "note header" solution being that it wouldn't
work for file and directory names containing "\n"). I'm not sure whether
<pathname> should be just the basename or the full pathname.

-- 
Jakub Narebski
Warsaw, Poland

* Re: [PATCH] binary patch.
From: Junio C Hamano @ 2006-05-05 17:38 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: git
In-Reply-To: <Pine.LNX.4.64.0605051128100.28543@localhost.localdomain>

Nicolas Pitre <nico@cam.org> writes:

>> +	delta = NULL;
>> +	deflated = deflate_it(two->ptr, two->size, &deflate_size);
>> +	if (one->size && two->size) {
>> +		delta = diff_delta(one->ptr, one->size,
>> +				   two->ptr, two->size,
>> +				   &delta_size, deflate_size);
>
> Here you probably want to use deflate_size-1 (deflate_size can't be 0).

I am not sure the -1 is worth it here.

The delta is going to be deflated and will hopefully get a bit
smaller, so if we really care about that level of detail, it might be
worth passing (deflate_size*3/2) or something like that here, using
the delta with or without deflate, whichever is smaller, and marking
the uncompressed delta with a different tag ("uncompressed delta"?).
And for symmetry, to deal with incompressible data, we may want
to have an "uncompressed literal" as well.
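A minimal sketch of the selection described above: given the size of each candidate representation, keep the smallest and record which one was chosen, so the reader of the patch knows how to decode it. The enum names and the convention that a size of 0 means "this form is not available" are hypothetical; nothing here is taken from the actual patch.

```c
#include <stddef.h>

/* Hypothetical tags for the four encodings discussed above; the
 * names are illustrative, not part of any existing patch format. */
enum binary_encoding {
    ENC_LITERAL_RAW,
    ENC_LITERAL_DEFLATED,
    ENC_DELTA_RAW,
    ENC_DELTA_DEFLATED
};

/* Pick the smallest representation. The raw literal is always
 * available; for the other three, a size of 0 means "not produced"
 * (e.g. no delta could be computed). */
static enum binary_encoding pick_encoding(size_t literal_raw,
                                          size_t literal_deflated,
                                          size_t delta_raw,
                                          size_t delta_deflated)
{
    enum binary_encoding best = ENC_LITERAL_RAW;
    size_t best_size = literal_raw;

    if (literal_deflated && literal_deflated < best_size) {
        best = ENC_LITERAL_DEFLATED;
        best_size = literal_deflated;
    }
    if (delta_raw && delta_raw < best_size) {
        best = ENC_DELTA_RAW;
        best_size = delta_raw;
    }
    if (delta_deflated && delta_deflated < best_size) {
        best = ENC_DELTA_DEFLATED;
        best_size = delta_deflated;
    }
    return best;
}
```

The "uncompressed" variants only win when deflate makes the data larger, which is exactly the incompressible-data case the tag is meant to cover.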

>> +		orig_size = delta_size;
>> +		if (delta)
>> +			delta = deflate_it(delta, delta_size, &delta_size);
>
> Here you're leaking the original delta buffer memory.

Indeed.  Thanks.
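One way to plug the leak Nicolas points out, sketched under assumptions: `deflate_stub` is a hypothetical stand-in for `deflate_it()` that only copies its input (real code would compress it with zlib), and `deflate_delta` is an illustrative helper name. The key step is freeing the raw delta once its deflated copy exists, instead of overwriting the only pointer to it.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for deflate_it(): returns a freshly malloc'd
 * buffer and its size. It only copies the input; real code would
 * compress it with zlib. */
static void *deflate_stub(const void *data, size_t size, size_t *out_size)
{
    void *out = malloc(size ? size : 1);

    if (!out)
        return NULL;
    memcpy(out, data, size);
    *out_size = size;
    return out;
}

/* Replace a raw delta with its deflated form. The raw buffer is
 * always freed, so the caller must not use `delta` afterwards.
 * On failure NULL is returned and *delta_size is left untouched. */
static void *deflate_delta(void *delta, size_t *delta_size)
{
    size_t deflated_size = 0;
    void *deflated;

    assert(delta != NULL);
    deflated = deflate_stub(delta, *delta_size, &deflated_size);
    free(delta);                /* the fix: don't lose the raw buffer */
    if (deflated)
        *delta_size = deflated_size;
    return deflated;
}
```

In the quoted hunk, the call site would then become `delta = deflate_delta(delta, &delta_size);` inside the existing `if (delta)` check, with `orig_size` still recorded beforehand.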

* Re: [ANNOUNCE] Git wiki
From: Linus Torvalds @ 2006-05-05 17:48 UTC (permalink / raw)
  To: Petr Baudis; +Cc: linux, Git Mailing List
In-Reply-To: <20060505163629.GZ27689@pasky.or.cz>

On Fri, 5 May 2006, Petr Baudis wrote:
> 
> It's a philosophical question here, but I'd say that Git is much closer
> to Monotone than to any other version control system

Some historical background..

Before I dropped BK, I ended up being involved in trying to get Larry and 
Tridge to come to some agreement about how to solve the issues Tridge had 
with BK not being open-source. That actually went on for maybe two months 
or so, and I kept on hoping that we'd find some acceptable middle ground. 

I thought we could find something that would actually work for everybody: 
to hopefully both make BK technically better, _and_ to make the end result 
more palatable to the "free software or bust" contingency.

One of the suggestions that I tried to push as an acceptable middle ground 
was to make a "generic" BK repository export format, so that people who 
didn't want to use BK could still get all the information, and not in a 
broken format like CVS (yes, CVS makes sense as an interchange format, 
since _everybody_ speaks CVS, but it's a horrible, horrible, horrible 
format from any technical standpoint).

My example export format was really a strange mixture of patches with 
parenthood information, where the history information was described with 
hashes (MD5 rather than SHA1, but that was just an implementation thing, 
and mostly because BK used MD5 sums). Not something really useful as a 
real SCM, but it wasn't designed for that - it was just meant to be a 
useful and unambiguous interoperability format.

Now, that didn't work out, and I was a little bummed. I thought it would 
have made both sides happy, because it would actually have been a better 
format than CVS (and yes, I'm somewhat biased: in my opinion, having a 
million monkeys throwing crap at the walls and encoding the information in 
the patterns on monkey shit is a better format than CVS), so it would 
actually have improved BK, while also making it possible to interoperate 
if you didn't want to use BK itself.

But Tridge didn't believe that it would actually have exported all the 
information in a BK tree, even if both I and Larry told him it would. I'm 
not a hundred percent sure that Larry would have gone for the export 
format either, but hey, one sign of a good compromise is that neither side 
really gets what they really want. Whatever. It didn't work.

So it didn't actually resolve the deadlock, but when it became clear that 
I couldn't work with BK any more, I thought I might use something like 
that "patch + parenthood" representation as a way to maintain my tree 
while looking at other alternatives.

So in many ways, when I started looking around for distributed SCM's, I 
came into the game with the background of keeping the history around as 
chains of hashes describing it, and then just having patches to describe 
the differences between versions.

So that was really my "fallback" position: if nothing out there worked, 
I'd rather go back to lists of patches than use CVS. 

Now, if you keep track of just patches, one of the issues is that you 
can't afford to re-create the tree every time by walking patches forward 
from the beginning, so I was also planning to have a "cache" that 
maintained the current state of the tree as a separate state from the 
working tree, so that I would always have the "working tree" and the 
"result of patches up to this moment" as two separate things (so that I 
could do the "bk diff" that I was used to doing to see the difference 
between my last state and the current state of the working tree).

In other words, I was already working on the git "index" file. And I was 
planning to just have a patch-based system behind it, with a hashed 
history. Kind of "quilt with history and an index to speed things up".

The index itself would be backed-up with whole files (all hidden in the 
".dircache" directory), and the patch series would thus normally never 
actually be _used_. So the inefficiency of working with patches would 
never be much of an issue. A "commit" would create a new patch from the 
current working directory and the previous shadow tree, and update the 
shadow tree and add a new entry to the history list.

And then I found Monotone.

Now, monotone was slow. Monotone was so _horrendously_ slow that I had to 
do special hacks just to import _one_ version of Linux into it in less 
than two hours. It was something stupid like an O(N**3) algorithm in the 
number of filenames (and the kernel had 17,291 files at that time: 
v2.6.12-rc2), and it was just totally unusable for me.

I also thought (and still think) that the whole signing thing was a waste 
of time and misdesigned, and I obviously am not a huge fan of databases. 
So in many ways I disliked the monotone implementation decisions (and some 
of its design decisions). But at the same time, I immediately liked the 
SHA1 object naming concept of Monotone.

It also already matched what I had conceptually planned for the history 
and had some ideas about, but it took that whole "history hashing" idea 
all the way.

And thus git was born. 

So git really has three parents. In a very real sense, BK (or, perhaps 
more appropriately - the way I personally used BK, which is not 
necessarily how others have used it) was the biggest thing from the 
standpoint of what I wanted my _workflow_ to be like. It was simply how I 
had done things for the last few years, so a lot of my mental model for 
how things are supposed to _work_ came from BK. 

I still don't think people give Larry enough credit for actually pushing 
this whole distributed SCM thing as a _usable_ model. Very few of the 
open-source distributed SCM's are actually usable even today, and as far 
as I've been able to gather, the commercial ones aren't really any closer 
either. Larry didn't have the kind of examples of what _can_ work that I 
had.

The other parent was the stupid "series of patches" model, which was what 
really resulted in the "index" thing. I realize that people don't always 
much like the index, but it's really a pretty central part of git history, 
and one of the distinguishing marks of git. It may be trivial, and to some 
degree it's been overshadowed by all the tree operations we do (the 
combination of revision walking and tree diffing), but it was very central 
to how git came to be.

The index also ended up being central to how we did merges - even if some 
day we may end up doing more of that on a pure tree level (ie the current 
git-merge-tree model), I think the way we ended up doing merges owes a lot 
to the index as a staging area.

(Historically, the "index" was called the "cache". Exactly because it came 
from the notion of "caching" the top commit state in a patch series, and 
then working with patches either backwards or forwards from that top 
cached state. Similarly, we didn't have a ".git" directory: it was 
called ".dircache", exactly because it was all about caching the state 
of the previous commit directory layout).

And finally, Monotone for the "everything is an object named by its SHA1" 
model, which to some degree is perhaps the central - or at least the most 
obvious - part of git. It largely was designed really just to be the 
"backing store" for the "cache", and to not be _that_ important. That also 
explains why I didn't worry too much about disk usage etc. initially: the 
object store wasn't even the most important part, and I envisioned just 
moving old objects that weren't needed into some "backup storage" kind of 
thing.
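The "everything is an object named by its SHA1" model fits in a few lines of C. The sketch below computes the name git gives a file's contents (a blob): the SHA-1 of the header `blob <size>` plus a NUL byte, followed by the raw bytes. The compact SHA-1 routine is a plain reimplementation of the FIPS 180 algorithm, included only so the example is self-contained; it is not git's code, and real programs should use a vetted library. The function names are illustrative.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static uint32_t rol(uint32_t x, int n)
{
    return (x << n) | (x >> (32 - n));
}

/* Process one 64-byte block (the SHA-1 compression function). */
static void sha1_block(uint32_t h[5], const unsigned char *p)
{
    uint32_t w[80], a, b, c, d, e, f, k, t;

    for (int i = 0; i < 16; i++)
        w[i] = (uint32_t)p[4 * i] << 24 | (uint32_t)p[4 * i + 1] << 16 |
               (uint32_t)p[4 * i + 2] << 8 | (uint32_t)p[4 * i + 3];
    for (int i = 16; i < 80; i++)
        w[i] = rol(w[i - 3] ^ w[i - 8] ^ w[i - 14] ^ w[i - 16], 1);
    a = h[0]; b = h[1]; c = h[2]; d = h[3]; e = h[4];
    for (int i = 0; i < 80; i++) {
        if (i < 20)      { f = (b & c) | (~b & d);          k = 0x5A827999; }
        else if (i < 40) { f = b ^ c ^ d;                   k = 0x6ED9EBA1; }
        else if (i < 60) { f = (b & c) | (b & d) | (c & d); k = 0x8F1BBCDC; }
        else             { f = b ^ c ^ d;                   k = 0xCA62C1D6; }
        t = rol(a, 5) + f + e + k + w[i];
        e = d; d = c; c = rol(b, 30); b = a; a = t;
    }
    h[0] += a; h[1] += b; h[2] += c; h[3] += d; h[4] += e;
}

/* One-shot SHA-1 of `len` bytes; writes the 20-byte digest to `out`. */
static void sha1(const unsigned char *msg, size_t len, unsigned char out[20])
{
    uint32_t h[5] = {0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0};
    unsigned char tail[128] = {0};
    size_t full = len / 64 * 64, rest = len - full;
    size_t tail_len = rest < 56 ? 64 : 128;   /* room for 0x80 + 64-bit length */
    uint64_t bits = (uint64_t)len * 8;

    for (size_t off = 0; off < full; off += 64)
        sha1_block(h, msg + off);
    memcpy(tail, msg + full, rest);
    tail[rest] = 0x80;
    for (int i = 0; i < 8; i++)               /* big-endian bit count */
        tail[tail_len - 1 - i] = (unsigned char)(bits >> (8 * i));
    for (size_t off = 0; off < tail_len; off += 64)
        sha1_block(h, tail + off);
    for (int i = 0; i < 20; i++)
        out[i] = (unsigned char)(h[i / 4] >> (24 - 8 * (i % 4)));
}

/* Name a blob the way git does: SHA-1 over "blob <size>\0" + data.
 * Writes 40 lowercase hex digits plus a NUL into `hex`. */
static int git_blob_id(const char *data, size_t len, char hex[41])
{
    char hdr[32];
    int hlen = snprintf(hdr, sizeof hdr, "blob %zu", len) + 1; /* keep the NUL */
    unsigned char digest[20];
    unsigned char *buf = malloc((size_t)hlen + len);

    if (!buf)
        return -1;
    memcpy(buf, hdr, (size_t)hlen);
    memcpy(buf + hlen, data, len);
    sha1(buf, (size_t)hlen + len, digest);
    free(buf);
    for (int i = 0; i < 20; i++)
        sprintf(hex + 2 * i, "%02x", digest[i]);
    return 0;
}
```

Hashing an empty file this way gives e69de29bb2d1d6434b8b29ae775ad8c2e48c5391, the same name `git hash-object` reports for an empty file: identical contents always get identical names, wherever and whenever they are stored.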

			Linus

* Re: [ANNOUNCE] Git wiki
From: Petr Baudis @ 2006-05-05 18:15 UTC (permalink / raw)
  To: linux; +Cc: git
In-Reply-To: <20060505005659.9092.qmail@science.horizon.com>

Dear diary, on Fri, May 05, 2006 at 02:56:59AM CEST, I got a letter
where linux@horizon.com said that...
> But, as Linus has pointed out, this is a very partial solution which
> introduces a lot of difficulties elsewhere.  File renaming is a subset of
> the general class of code reorganizations.  Source files will be split,
> merged, and have functions moved back and forth.  You want the patch to
> find the code it applies to even if that code was moved.
> 
> And that can be done by taking a more global view of the patch.
> Identical file names is only a heuristic.  If the hunk on branch A
> can't find a place to apply on the same file in branch B, then
> you have to look a little harder, either at changes from branch B
> that introduce matching code elsewhere, or perhaps looking
> through history for a change that removed the match from the
> obvious place to see if it added a match elsewhere.

There are really two distinctions here which should be kept separate:
automatic vs. explicit movement tracking and file-level vs.
subfile-level movement tracking.

The automatic vs. explicit movement tracking is a lot more
controversial. Explicit movement tracking is pretty easy to provide for
file-level movements, it's just that the user says "I _did_ move file
A to file B" (I never understood Linus' argument that the user has no idea
- he just _performed_ the move, also explicitly, by calling *mv).

However, I guess the explicit movement tracking completely fails if you
go sub-file (without being extremely bothersome for the user) - you
would have to have control over the editor and the clipboard and even
then I'm not sure if you could reach any sensible results.

I still dislike automated movement tracking for whole files, but I'm
reconciled to it, because it is probably the only really sensible way
to implement subfile-level tracking.  It would not be hard to implement
using pickaxe (actually, I believe it was near the top of Junio's TODO
list a few weeks ago) and a similarity detector comparing the new and old
versions (if they are dissimilar enough, check whether that or a similar
hunk was added somewhere else in the same commit; well, at least the idea
sounds simple).

One obvious problem is ambiguity - several similar files are renamed
to other similar files, and now how do you decide which version to
choose? Merge the change into all the new files? Only into some? Panic?
I wonder how the current recursive strategy deals with that.
Of course, this case sounds quite artificial and rare for whole files,
but I suspect that it will be much more common once you do not deal with
files but just hunks, moving bits of code around.

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
Right now I am having amnesia and deja-vu at the same time.  I think
I have forgotten this before.

* Re: [ANNOUNCE] Git wiki
From: Petr Baudis @ 2006-05-05 18:20 UTC (permalink / raw)
  To: linux; +Cc: git
In-Reply-To: <20060505181540.GB27689@pasky.or.cz>

Dear diary, on Fri, May 05, 2006 at 08:15:41PM CEST, I got a letter
where Petr Baudis <pasky@suse.cz> said that...
> There are really two distinctions here which should be kept separate:
> automatic vs. explicit movement tracking and file-level vs.
> subfile-level movement tracking.

I should have revised this paragraph before sending the mail out; I
ended up sorting out my thoughts on the subject as I wrote the mail. The
two aspects end up so tied together that it makes sense to mingle them.
Examining them separately here hopefully still sheds some light on the
possible reasoning behind the Git design decisions.

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
Right now I am having amnesia and deja-vu at the same time.  I think
I have forgotten this before.

* Re: [ANNOUNCE] Git wiki
From: Jakub Narebski @ 2006-05-05 18:27 UTC (permalink / raw)
  To: git
In-Reply-To: <20060505181540.GB27689@pasky.or.cz>

Petr Baudis wrote:

> The automatic vs. explicit movement tracking is a lot more
> controversial. Explicit movement tracking is pretty easy to provide for
> file-level movements, it's just that the user says "I _did_ move file
> A to file B" (I never got the Linus' argument that the user has no idea
> - he just _performed_ the move, also explicitly, by calling *mv).
> 
> However, I guess the explicit movement tracking completely fails if you
> go sub-file (without being extremely bothersome for the user) - you
> would have to have control over the editor and the clipboard and even
> then I'm not sure if you could reach any sensible results.

If I remember correctly, there are some problems when explicit file-level
content movement tracking (a.k.a. file rename tracking) is done via the
equivalent of file ids, inodes, or persistent names, although it works for
many (most?) cases.

-- 
Jakub Narebski
Warsaw, Poland

* Re: [ANNOUNCE] Git wiki
From: Linus Torvalds @ 2006-05-05 18:31 UTC (permalink / raw)
  To: Petr Baudis; +Cc: linux, git
In-Reply-To: <20060505181540.GB27689@pasky.or.cz>

On Fri, 5 May 2006, Petr Baudis wrote:
> 
> The automatic vs. explicit movement tracking is a lot more
> controversial. Explicit movement tracking is pretty easy to provide for
> file-level movements, it's just that the user says "I _did_ move file
> A to file B" (I never got the Linus' argument that the user has no idea
> - he just _performed_ the move, also explicitly, by calling *mv).

THE USER DID NO SUCH THING.

Moving data around happens with a whole lot more than "mv".

It happens with patches (somebody _else_ may have done an "mv", without 
using git at all), and it happens with editors (moving data around until 
most of it exists in another file).

So doing "*mv" is just a special case.

And supporting special cases is _wrong_. If you start depending on data 
that isn't actually dependable, that's WRONG.

There's another reason why encoding movement information in the commit is 
totally broken, namely the fact that a lot of the actions DO NOT WALK THE 
COMMIT CHAIN!

Try doing

	git diff v1.3.0..

and think about what that actually _means_. Think about the fact that it 
doesn't actually walk the commit chain at all: it diffs the trees between 
v1.3.0 and the current one. What if the rename happened in a commit in the 
middle?

The "track contents, not intentions" approach avoids both these things. 
The end result is _reliable_, not a "random guess".

Adding file movement note to commits is simply WRONG.

Why does this come up every three months or so? I was right the first 
time. You'd think that as time passes, people would just notice more and 
more how right I was and am, instead of forgetting and bringing this 
idiotic idea up over and over and over again.

		Linus

* Re: [PATCH] binary patch.
From: Nicolas Pitre @ 2006-05-05 18:33 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git
In-Reply-To: <7vejz8z80p.fsf@assigned-by-dhcp.cox.net>

On Fri, 5 May 2006, Junio C Hamano wrote:

> Nicolas Pitre <nico@cam.org> writes:
> 
> >> +	delta = NULL;
> >> +	deflated = deflate_it(two->ptr, two->size, &deflate_size);
> >> +	if (one->size && two->size) {
> >> +		delta = diff_delta(one->ptr, one->size,
> >> +				   two->ptr, two->size,
> >> +				   &delta_size, deflate_size);
> >
> > Here you probably want to use deflate_size-1 (deflate_size can't be 0).
> 
> I am not sure if -1 is worth here.
> 
> The delta is going to be deflated and hopefully gets a bit
> smaller, so if we really care that level of detail, it might be
> worth to do (deflate_size*3/2) or something like that here, use
> delta with or without deflate whichever is smaller, and mark the
> uncompressed delta with a different tag ("uncompressed delta"?).
> And for symmetry, to deal with uncompressible data, we may want
> to have "uncompressed literal" as well.

Nah...  Please just forget that.  ;-)


Nicolas

* Re: [ANNOUNCE] Git wiki
From: Jakub Narebski @ 2006-05-05 18:49 UTC (permalink / raw)
  To: git
In-Reply-To: <e3fvj2$779$1@sea.gmane.org>

Jakub Narebski wrote:

> Junio C Hamano wrote:
> 
>> Petr Baudis <pasky@suse.cz> writes:
>> 
>>> But the non-obviously important part here to note is that the branch B
>>> merely "corrects a typo on a comment somewhere" - the latest versions in
>>> branch A and branch B are always compared for renames, therefore if
>>> branch A renamed the file and branch B sums up to some larger-scale
>>> changes in the file, it still won't be merged properly.
>> 
>> I probably am guilty of starting this misinformation, but the
>> code does not compare the latest in A and B for rename
>> detection; it compares (O, A) and (O, B).
>> 
>> But the end result is the same - what you say is correct.  If a
>> path (say O to A) that renamed has too big a change, then no
>> matter how small the changes are on the other path (O to B),
>> rename detection can be fooled.  We could perhaps alleviate it
>> by following the whole commit chain.
> 
> Or perhaps by helper information about renames, entered either by git-mv
> (and git-cp) or rename detection at commit, e.g. in the following form
> 
>         note at <commit-sha1> was-in <pathname>
>         note at <commit-sha1> was-in <pathname>
> 
> (with the obvious limit of this "note header" solution is that it wouldn't
> work for filenames and directory name containing "\n"). I'm not sure if
> <pathname> should be just basename, of full pathname.

Erm, I'm sorry, forget that implementation; it wouldn't work. The idea was
to accumulate rename and content-movement information, and to remember at
which commit it occurred. But its place (as _helper_ information) is
perhaps in a separate structure.

-- 
Jakub Narebski
Warsaw, Poland

* Re: [ANNOUNCE] Git wiki
From: Petr Baudis @ 2006-05-05 18:54 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux, git
In-Reply-To: <Pine.LNX.4.64.0605051123420.3622@g5.osdl.org>

Dear diary, on Fri, May 05, 2006 at 08:31:06PM CEST, I got a letter
where Linus Torvalds <torvalds@osdl.org> said that...
> Moving data around happens with a whole lot more than "mv".

Let's keep this on the per-file level - if you want to go below file
granularity, I already _DID_ say that I agree that explicit tracking is
not the way. (That is, if sub-file tracking ends up having any usable
reliability in real-world cases, which is something I do not take for
granted.)

Another thing is, the sub-file content tracking would end up being a lot
more "magic" than the simple per-file content tracking, and you stated
several times that you prefer simple merge over better but magic merge -
so why do you prefer sub-file content tracking anyway?

> It happens with patches (somebody _else_ may have done an "mv", without 
> using git at all),

_Here_ is the place for automated renames detection. Between applying
and committing the patch, the user can verify that it got the renames
right. That's impossible when guessing the renames later.

> and it happens with editors (moving data around until 
> most of it exists in another file).

I doubt this in fact happens that often (to a degree the automatic
rename detection would catch). And if it does happen, then the user has to
tell Git - I have never heard of _this_ being a problem in other
version control systems. You could make it more foolproof by running the
automatic rename detection on the diff being committed and suggesting
to the user that other, not yet recorded renames did happen.

The point is, the user stays in control and can override any stupid guess.

> So doing "*mv" is just a special case.
> 
> And supporting special cases is _wrong_. If you start depending on data 
> that isn't actually dependable, that's WRONG.

I prefer making this data dependable to having to resort to guessing
from a smaller amount of dependable data.

> There's another reason why encoding movement information in the commit is 
> totally broken, namely the fact that a lot of the actions DO NOT WALK THE 
> COMMIT CHAIN!
> 
> Try doing
> 
> 	git diff v1.3.0..
> 
> and think about what that actually _means_. Think about the fact that it 
> doesn't actually walk the commit chain at all: it diffs the trees between 
> v1.3.0 and the current one. What if the rename happened in a commit in the 
> middle?

Then the automated renames detection will miss it given that the other
accumulated differences are large enough, and the suggested workarounds
_are_ precisely walking the commit chain.

If you use persistent file ids, you never miss it _AND_ you DO NOT WALK
THE COMMIT CHAIN! You still just match file ids in the two trees.
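A sketch of the matching described above, assuming a hypothetical `file_id` carried by every tree entry. Git's trees record no such field; this models the persistent-id approach under discussion, not git. Only the two endpoint trees are consulted, so no commit in between is visited. Whether such ids survive patches and editor-driven moves is the separate point in dispute.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical tree entry for a system that stores a persistent file
 * id with every path. Git's trees carry no such field. */
struct entry {
    unsigned file_id;
    const char *path;
};

/* Find where `path` from the old tree lives in the current tree by
 * matching file ids. Only the two endpoint trees are examined.
 * Returns NULL if the path is not in the old tree or its id is gone. */
static const char *new_path_of(const struct entry *old, size_t n_old,
                               const struct entry *cur, size_t n_cur,
                               const char *path)
{
    for (size_t i = 0; i < n_old; i++) {
        if (strcmp(old[i].path, path) != 0)
            continue;
        for (size_t j = 0; j < n_cur; j++)
            if (cur[j].file_id == old[i].file_id)
                return cur[j].path;
        return NULL;
    }
    return NULL;
}
```

For instance, with a made-up old tree holding {2, "diff.c"} and a current tree holding {2, "diff-lib.c"}, `new_path_of` reports the rename however heavily the intermediate commits changed the file's contents.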

> The "track contents, not intentions" approach avoids both these things. 
> The end result is _reliable_, not a "random guess".

No, the end result is whichever some heuristic randomly guessed, and
it's not reliable either since the heuristic can change.

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
Right now I am having amnesia and deja-vu at the same time.  I think
I have forgotten this before.

* Re: [PATCH 1/3] Alphabetize the glossary.
From: Junio C Hamano @ 2006-05-05 19:02 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: git
In-Reply-To: <Pine.LNX.4.63.0605041238240.26488@wbgn013.biozentrum.uni-wuerzburg.de>

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

> The idea of having it not alphabetized, but doing it by a script, was to 
> let people actually _read_ it. There is nothing more annoying than having 
> to jump forward and backward and eventually be lost.
>
> glossary, as I started it, was topologically ordered: no Git term was used 
> before it was described (at least that was the plan).

I myself rarely read either the man or the html formatted ones.  When I
need to find something, I go straight to Documentation/
directory looking for *.txt files.  Being able to find things
from an alphabetized list is very handy.

On the other hand, we would want to make it easy for people to
read it in the logical order.  For that purpose, html formatted
version, thanks to the cross references the script creates, is a
lot easier than the plain text version.

Maybe we should do both.  We _could_ teach the sort script to
also do a topological sort, and have two sections in the
resulting formatted documentation, the top part being
"alphabetical", and the second part being "bedtime reading".

A random sort that is merely topologically correct probably is
not what we want, so it might make sense to give the sort script
a hint that says "these should come before those although they
are topologically independent".  Of course
that "hint" could be the order entries appear in the source text
(which was what you had originally), but when somebody wants to
add a new entry to the glossary, it is unambiguous where the
new entry should go if the source text is already sorted, which
I hope would make it somewhat easier to maintain.
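The "bedtime reading" section boils down to a topological sort with a tie-breaker. A minimal C sketch, assuming each term's cross-references have already been extracted into a matrix (`uses[i][j]` means the description of term i mentions term j) and that terms are indexed in source order, which plays the role of the hint. It is Kahn's algorithm, always taking the earliest-listed term whose prerequisites are done; `bedtime_order` and `MAX_TERMS` are hypothetical names, not part of the existing sort script.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_TERMS 64

/* uses[i][j] is true when the description of term i mentions term j,
 * so j must be read before i. Terms are indexed in source order, and
 * that order breaks ties between independent terms. Writes the reading
 * order into `order` and returns how many terms were placed; a result
 * smaller than n means some definitions are circular. */
static size_t bedtime_order(size_t n, bool uses[][MAX_TERMS], size_t order[])
{
    size_t pending[MAX_TERMS];          /* unplaced prerequisites per term */
    bool placed[MAX_TERMS] = {false};
    size_t count = 0;

    assert(n <= MAX_TERMS);
    for (size_t i = 0; i < n; i++) {
        pending[i] = 0;
        for (size_t j = 0; j < n; j++)
            if (i != j && uses[i][j])
                pending[i]++;
    }
    while (count < n) {
        size_t pick = n;

        /* Kahn's algorithm, taking the earliest-listed ready term. */
        for (size_t i = 0; i < n; i++) {
            if (!placed[i] && pending[i] == 0) {
                pick = i;
                break;
            }
        }
        if (pick == n)
            break;                      /* remaining terms depend on a cycle */
        placed[pick] = true;
        order[count++] = pick;
        for (size_t i = 0; i < n; i++)
            if (i != pick && uses[i][pick])
                pending[i]--;
    }
    return count;
}
```

Scanning for the lowest ready index each round keeps the whole sort quadratic in the number of terms, which is harmless for a glossary; the alphabetical section would simply be an ordinary sort of the same entries.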

* Re: [ANNOUNCE] Git wiki
From: Dave Jones @ 2006-05-05 19:04 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Petr Baudis, linux, Git Mailing List
In-Reply-To: <Pine.LNX.4.64.0605050944200.3622@g5.osdl.org>

On Fri, May 05, 2006 at 10:48:38AM -0700, Linus Torvalds wrote:

 > (and yes, I'm somewhat biased: in my opinion, having a 
 > million monkeys throwing crap at the walls and encoding the information in 
 > the patterns on monkey shit is a better format than CVS), so it would 
 > actually have improved BK, while also making it possible to interoperate 
 > if you didn't want to use BK itself.
 >  ...
 > So that was really my "fallback" position: if nothing out there worked, 
 > I'd rather go back to lists of patches than use CVS. 

I've encountered managing kernel trees in CVS both during my tenure at SuSE,
and to a more involved extent as Fedora/RHEL maintainer, and I'd just like
to echo how much it _completely sucks_ at times.

Rebasing to a newer release is a *nightmare* that usually takes
up most of an afternoon, compared to rebasing my git-based projects.

In the event I can't persuade the powers that be to switch to git at some point
for managing our packages, I'll be sure to bring up your suggestion of
a million monkeys. I believe you can pick them up fairly cheap these days.

		Dave

-- 
http://www.codemonkey.org.uk

* [PATCH] Several trivial documentation touch ups.
From: sean @ 2006-05-05 19:05 UTC (permalink / raw)
  To: git

  Move incorrect asciidoc level 2 titles back to level 1.

  Show output of git-name-rev in man page example.

  Reword sentences that begin with a period (.) in asciidoc
  numbered lists to work around conversion to man page bug.

  Mention that git-repack now calls git-prune-packed
  when the -d option is passed to it.

  [imap] section headers in the config file example need to be
  contained in a literal block.  imap.pass is the proper config
  file variable to use, not imap.password.

Signed-off-by: Sean Estabrooks <seanlkml@sympatico.ca>


---

 Documentation/git-clone.txt       |    2 +-
 Documentation/git-imap-send.txt   |    4 +++-
 Documentation/git-name-rev.txt    |    1 +
 Documentation/git-repack.txt      |    1 +
 Documentation/git-repo-config.txt |    6 +++---
 Documentation/git-reset.txt       |    2 +-
 6 files changed, 10 insertions(+), 6 deletions(-)

227b8dd1fa66a6d96a25e9fd8fc070be1ea31449
diff --git a/Documentation/git-clone.txt b/Documentation/git-clone.txt
index 131e445..b333f51 100644
--- a/Documentation/git-clone.txt
+++ b/Documentation/git-clone.txt
@@ -101,7 +101,7 @@ OPTIONS
 	is not allowed.
 
 Examples
-~~~~~~~~
+--------
 
 Clone from upstream::
 +
diff --git a/Documentation/git-imap-send.txt b/Documentation/git-imap-send.txt
index cfc0d88..eca9e9c 100644
--- a/Documentation/git-imap-send.txt
+++ b/Documentation/git-imap-send.txt
@@ -29,6 +29,7 @@ CONFIGURATION
 git-imap-send requires the following values in the repository
 configuration file (shown with examples):
 
+..........................
 [imap]
     Folder = "INBOX.Drafts"
 
@@ -38,8 +39,9 @@ configuration file (shown with examples)
 [imap]
     Host = imap.server.com
     User = bob
-    Password = pwd
+    Pass = pwd
     Port = 143
+..........................
 
 
 BUGS
diff --git a/Documentation/git-name-rev.txt b/Documentation/git-name-rev.txt
index 6870708..ffaa004 100644
--- a/Documentation/git-name-rev.txt
+++ b/Documentation/git-name-rev.txt
@@ -41,6 +41,7 @@ Enter git-name-rev:
 
 ------------
 % git name-rev 33db5f4d9027a10e477ccf054b2c1ab94f74c85a
+33db5f4d9027a10e477ccf054b2c1ab94f74c85a tags/v0.99^0~940
 ------------
 
 Now you are wiser, because you know that it happened 940 revisions before v0.99.
diff --git a/Documentation/git-repack.txt b/Documentation/git-repack.txt
index d2f9a44..9516227 100644
--- a/Documentation/git-repack.txt
+++ b/Documentation/git-repack.txt
@@ -38,6 +38,7 @@ OPTIONS
 -d::
 	After packing, if the newly created packs make some
 	existing packs redundant, remove the redundant packs.
+	Also runs gitlink:git-prune-packed[1].
 
 -l::
         Pass the `--local` option to `git pack-objects`, see
diff --git a/Documentation/git-repo-config.txt b/Documentation/git-repo-config.txt
index ddcf523..fd44f62 100644
--- a/Documentation/git-repo-config.txt
+++ b/Documentation/git-repo-config.txt
@@ -34,10 +34,10 @@ convert the value to the canonical form 
 a "true" or "false" string for bool). If no type specifier is passed,
 no checks or transformations are performed on the value.
 
-This command will fail if
+This command will fail if:
 
-. .git/config is invalid,
-. .git/config can not be written to,
+. The .git/config file is invalid,
+. Can not write to .git/config,
 . no section was provided,
 . the section or key is invalid,
 . you try to unset an option which does not exist, or
diff --git a/Documentation/git-reset.txt b/Documentation/git-reset.txt
index ebcfe5e..b27399d 100644
--- a/Documentation/git-reset.txt
+++ b/Documentation/git-reset.txt
@@ -43,7 +43,7 @@ OPTIONS
 	Commit to make the current HEAD.
 
 Examples
-~~~~~~~~
+--------
 
 Undo a commit and redo::
 +
-- 
1.3.1.g9c203

* [PATCH] Fix up docs where "--" isn't displayed correctly.
From: sean @ 2006-05-05 19:05 UTC (permalink / raw)
  To: git

A bare "--" doesn't show up in man or html pages correctly
as two individual dashes unless backslashed as \--
in the asciidoc source.  Note, no backslash is needed
inside a literal block.

Signed-off-by: Sean Estabrooks <seanlkml@sympatico.ca>


---

 Documentation/git-add.txt            |    2 +-
 Documentation/git-checkout-index.txt |    2 +-
 Documentation/git-commit.txt         |    2 +-
 Documentation/git-log.txt            |    2 +-
 Documentation/git-ls-files.txt       |    2 +-
 Documentation/git-merge-index.txt    |    4 ++--
 Documentation/git-prune.txt          |    2 +-
 Documentation/git-rm.txt             |    2 +-
 Documentation/git-update-index.txt   |    2 +-
 Documentation/git-verify-pack.txt    |    2 +-
 Documentation/git-whatchanged.txt    |    2 +-
 Documentation/gitk.txt               |    2 +-
 12 files changed, 13 insertions(+), 13 deletions(-)

32a74a984e6c1869dbebc9bc8d2fe9503e8dd624
diff --git a/Documentation/git-add.txt b/Documentation/git-add.txt
index ae24547..5e31129 100644
--- a/Documentation/git-add.txt
+++ b/Documentation/git-add.txt
@@ -26,7 +26,7 @@ OPTIONS
 -v::
         Be verbose.
 
---::
+\--::
 	This option can be used to separate command-line options from
 	the list of files, (useful when filenames might be mistaken
 	for command-line options).
diff --git a/Documentation/git-checkout-index.txt b/Documentation/git-checkout-index.txt
index 09bd6a5..765c173 100644
--- a/Documentation/git-checkout-index.txt
+++ b/Documentation/git-checkout-index.txt
@@ -63,7 +63,7 @@ OPTIONS
 	Only meaningful with `--stdin`; paths are separated with
 	NUL character instead of LF.
 
---::
+\--::
 	Do not interpret any more arguments as options.
 
 The order of the flags used to matter, but not anymore.
diff --git a/Documentation/git-commit.txt b/Documentation/git-commit.txt
index 0a7365b..38df59c 100644
--- a/Documentation/git-commit.txt
+++ b/Documentation/git-commit.txt
@@ -106,7 +106,7 @@ but can be used to amend a merge commit.
 	index and the latest commit does not match on the
 	specified paths to avoid confusion.
 
---::
+\--::
 	Do not interpret any more arguments as options.
 
 <file>...::
diff --git a/Documentation/git-log.txt b/Documentation/git-log.txt
index af378ff..c9ffff7 100644
--- a/Documentation/git-log.txt
+++ b/Documentation/git-log.txt
@@ -51,7 +51,7 @@ git log v2.6.12.. include/scsi drivers/s
 	Show all commits since version 'v2.6.12' that changed any file
 	in the include/scsi or drivers/scsi subdirectories
 
-git log --since="2 weeks ago" -- gitk::
+git log --since="2 weeks ago" \-- gitk::
 
 	Show the changes during the last two weeks to the file 'gitk'.
 	The "--" is necessary to avoid confusion with the *branch* named
diff --git a/Documentation/git-ls-files.txt b/Documentation/git-ls-files.txt
index 796d049..a29c633 100644
--- a/Documentation/git-ls-files.txt
+++ b/Documentation/git-ls-files.txt
@@ -106,7 +106,7 @@ OPTIONS
 	lines, show only handful hexdigits prefix.
 	Non default number of digits can be specified with --abbrev=<n>.
 
---::
+\--::
 	Do not interpret any more arguments as options.
 
 <file>::
diff --git a/Documentation/git-merge-index.txt b/Documentation/git-merge-index.txt
index fbc986a..332e023 100644
--- a/Documentation/git-merge-index.txt
+++ b/Documentation/git-merge-index.txt
@@ -8,7 +8,7 @@ git-merge-index - Runs a merge for files
 
 SYNOPSIS
 --------
-'git-merge-index' [-o] [-q] <merge-program> (-a | -- | <file>\*)
+'git-merge-index' [-o] [-q] <merge-program> (-a | \-- | <file>\*)
 
 DESCRIPTION
 -----------
@@ -19,7 +19,7 @@ files are passed as arguments 5, 6 and 7
 
 OPTIONS
 -------
---::
+\--::
 	Do not interpret any more arguments as options.
 
 -a::
diff --git a/Documentation/git-prune.txt b/Documentation/git-prune.txt
index f694fcb..a11e303 100644
--- a/Documentation/git-prune.txt
+++ b/Documentation/git-prune.txt
@@ -28,7 +28,7 @@ OPTIONS
 	Do not remove anything; just report what it would
 	remove.
 
---::
+\--::
 	Do not interpret any more arguments as options.
 
 <head>...::
diff --git a/Documentation/git-rm.txt b/Documentation/git-rm.txt
index c9c3088..66fc478 100644
--- a/Documentation/git-rm.txt
+++ b/Documentation/git-rm.txt
@@ -32,7 +32,7 @@ OPTIONS
 -v::
         Be verbose.
 
---::
+\--::
 	This option can be used to separate command-line options from
 	the list of files, (useful when filenames might be mistaken
 	for command-line options).
diff --git a/Documentation/git-update-index.txt b/Documentation/git-update-index.txt
index 23f2b6f..57177c7 100644
--- a/Documentation/git-update-index.txt
+++ b/Documentation/git-update-index.txt
@@ -113,7 +113,7 @@ OPTIONS
 	Only meaningful with `--stdin`; paths are separated with
 	NUL character instead of LF.
 
---::
+\--::
 	Do not interpret any more arguments as options.
 
 <file>::
diff --git a/Documentation/git-verify-pack.txt b/Documentation/git-verify-pack.txt
index 4962d69..7a6132b 100644
--- a/Documentation/git-verify-pack.txt
+++ b/Documentation/git-verify-pack.txt
@@ -25,7 +25,7 @@ OPTIONS
 -v::
 	After verifying the pack, show list of objects contained
 	in the pack.
---::
+\--::
 	Do not interpret any more arguments as options.
 
 OUTPUT FORMAT
diff --git a/Documentation/git-whatchanged.txt b/Documentation/git-whatchanged.txt
index 641cb7e..e8f21d0 100644
--- a/Documentation/git-whatchanged.txt
+++ b/Documentation/git-whatchanged.txt
@@ -58,7 +58,7 @@ git-whatchanged -p v2.6.12.. include/scs
 	Show as patches the commits since version 'v2.6.12' that changed
 	any file in the include/scsi or drivers/scsi subdirectories
 
-git-whatchanged --since="2 weeks ago" -- gitk::
+git-whatchanged --since="2 weeks ago" \-- gitk::
 
 	Show the changes during the last two weeks to the file 'gitk'.
 	The "--" is necessary to avoid confusion with the *branch* named
diff --git a/Documentation/gitk.txt b/Documentation/gitk.txt
index eb126d7..cb482bf 100644
--- a/Documentation/gitk.txt
+++ b/Documentation/gitk.txt
@@ -31,7 +31,7 @@ gitk v2.6.12.. include/scsi drivers/scsi
 	Show as the changes since version 'v2.6.12' that changed any
 	file in the include/scsi or drivers/scsi subdirectories
 
-gitk --since="2 weeks ago" -- gitk::
+gitk --since="2 weeks ago" \-- gitk::
 
 	Show the changes during the last two weeks to the file 'gitk'.
 	The "--" is necessary to avoid confusion with the *branch* named
-- 
1.3.1.g9c203

* [PATCH] Update git-unpack-objects documentation.
From: sean @ 2006-05-05 19:05 UTC (permalink / raw)
  To: git

Document that git-unpack-objects will not produce any
results when used on a pack that exists in a repository;
move it first.

Signed-off-by: Sean Estabrooks <seanlkml@sympatico.ca>


---

 Documentation/git-unpack-objects.txt |   13 ++++++++++---
 1 files changed, 10 insertions(+), 3 deletions(-)

68facf4045556d10c541534a086b3c6486a1c5fb
diff --git a/Documentation/git-unpack-objects.txt b/Documentation/git-unpack-objects.txt
index 1828062..c20b38b 100644
--- a/Documentation/git-unpack-objects.txt
+++ b/Documentation/git-unpack-objects.txt
@@ -13,9 +13,16 @@ SYNOPSIS
 
 DESCRIPTION
 -----------
-Reads a packed archive (.pack) from the standard input, and
-expands the objects contained in the pack into "one-file
-one-object" format in $GIT_OBJECT_DIRECTORY.
+Read a packed archive (.pack) from the standard input, expanding
+the objects contained within and writing them into the repository in
+"loose" (one object per file) format.
+
+Objects that already exist in the repository will *not* be unpacked
+from the pack-file.  Therefore, nothing will be unpacked if you use
+this command on a pack-file that exists within the target repository.
+
+Please see the `git-repack` documentation for options to generate
+new packs and replace existing ones.
 
 OPTIONS
 -------
-- 
1.3.1.g9c203
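
The new DESCRIPTION text says that objects already present are skipped. A sketch of that rule follows. The object naming follows git's loose-object format: SHA-1 over a `"<type> <size>\0"` header plus the content, stored zlib-deflated. The dictionary standing in for $GIT_OBJECT_DIRECTORY and the `unpack` helper are hypothetical.

```python
import hashlib
import zlib
from typing import Dict, Iterable, Tuple


def loose_form(obj_type: str, data: bytes) -> bytes:
    """Header plus content: the byte string git hashes and deflates."""
    return f"{obj_type} {len(data)}\0".encode() + data


def object_name(obj_type: str, data: bytes) -> str:
    """SHA-1 of the loose form, i.e. the object's name."""
    return hashlib.sha1(loose_form(obj_type, data)).hexdigest()


def unpack(pack_objects: Iterable[Tuple[str, bytes]],
           store: Dict[str, bytes]) -> int:
    """Write each object in deflated loose form unless it already exists.

    Returns how many objects were actually written."""
    written = 0
    for obj_type, data in pack_objects:
        name = object_name(obj_type, data)
        if name in store:
            continue  # already in the repository: nothing to unpack
        store[name] = zlib.compress(loose_form(obj_type, data))
        written += 1
    return written


store: Dict[str, bytes] = {}
pack = [("blob", b"hello\n"), ("blob", b"")]
print(unpack(pack, store))  # 2
print(unpack(pack, store))  # 0: every object is already present
```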

* [PATCH] Clarify git-cherry documentation.
From: sean @ 2006-05-05 19:06 UTC (permalink / raw)
  To: git

Signed-off-by: Sean Estabrooks <seanlkml@sympatico.ca>


---

 Documentation/git-cherry.txt |   19 ++++++++++++++-----
 1 files changed, 14 insertions(+), 5 deletions(-)

6978ad8b3935b8ce2c55da65b099c67a32ff94d0
diff --git a/Documentation/git-cherry.txt b/Documentation/git-cherry.txt
index 9a5e371..893baaa 100644
--- a/Documentation/git-cherry.txt
+++ b/Documentation/git-cherry.txt
@@ -11,11 +11,20 @@ SYNOPSIS
 
 DESCRIPTION
 -----------
-Each commit between the fork-point and <head> is examined, and compared against
-the change each commit between the fork-point and <upstream> introduces.
-Commits already included in upstream are prefixed with '-' (meaning "drop from
-my local pull"), while commits missing from upstream are prefixed with '+'
-(meaning "add to the updated upstream").
+The changeset (or "diff") of each commit between the fork-point and <head>
+is compared against each commit between the fork-point and <upstream>.
+
+Every commit with a changeset that doesn't exist in the other branch
+has its id (sha1) reported, prefixed by a symbol.  Those existing only
+in the <upstream> branch are prefixed with a minus (-) sign, and those
+that only exist in the <head> branch are prefixed with a plus (+) symbol.
+
+Because git-cherry compares the changeset rather than the commit id
+(sha1), you can use git-cherry to find out if a commit you made locally
+has been applied <upstream> under a different commit id.  For example,
+this will happen if you're feeding patches <upstream> via email rather
+than pushing or pulling commits directly.
+
 
 OPTIONS
 -------
-- 
1.3.1.g9c203
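
To compare changesets rather than commit ids, you can hash each diff with whitespace and hunk headers removed, then look the hashes up in the other branch. This minimal sketch follows the spirit of git-patch-id, not its actual normalization. Commits are hypothetical `(id, diff text)` pairs whose diff text holds only hunks. It uses the convention from the removed lines above: a <head> commit is marked '-' when <upstream> already has an equivalent change, and '+' otherwise.

```python
import hashlib
from typing import List, Tuple

Commit = Tuple[str, str]  # (commit id, diff text containing only hunks)


def changeset_id(diff_text: str) -> str:
    """Hash a diff with hunk headers dropped and all whitespace removed."""
    kept = []
    for line in diff_text.splitlines():
        if line.startswith("@@"):
            continue  # line numbers differ between branches; ignore them
        kept.append("".join(line.split()))
    return hashlib.sha1("\n".join(kept).encode()).hexdigest()


def cherry(upstream: List[Commit], head: List[Commit]) -> List[str]:
    """Mark each <head> commit '-' if <upstream> has the same change, else '+'."""
    upstream_ids = {changeset_id(diff) for _, diff in upstream}
    return [("-" if changeset_id(diff) in upstream_ids else "+") + " " + sha
            for sha, diff in head]


upstream = [("u1", "@@ -1 +1 @@\n-old\n+new\n")]
head = [("h1", "@@ -5 +5 @@\n-old\n+new\n"),
        ("h2", "@@ -9 +9 @@\n+extra\n")]
for line in cherry(upstream, head):
    print(line)
# - h1
# + h2
```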

* Re: [PATCH] binary patch.
From: Junio C Hamano @ 2006-05-05 19:23 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: git
In-Reply-To: <Pine.LNX.4.64.0605051431390.24505@localhost.localdomain>

Nicolas Pitre <nico@cam.org> writes:

> On Fri, 5 May 2006, Junio C Hamano wrote:
>
>> The delta is going to be deflated and hopefully gets a bit
>> smaller, so if we really care that level of detail, it might be
>> worth to do (deflate_size*3/2) or something like that here, use
>> delta with or without deflate whichever is smaller, and mark the
>> uncompressed delta with a different tag ("uncompressed delta"?).
>> And for symmetry, to deal with uncompressible data, we may want
>> to have "uncompressed literal" as well.
>
> Nah...  Please just forget that.  ;-)

I was serious about the above actually.

BTW, this "binary patch" opens a different can of worms.

Currently, the diff uses a heuristic borrowed from GNU diff 
(I did not look at the code when I did it, but it is described
in its documentation) to decide if a file is binary (look at the
first few bytes and find NUL).  I am sure people will want to
have a way to say "that heuristic fails but this _is_ a binary
file and please treat it as such".
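
The heuristic described here, looking for a NUL near the start of the content, fits in one line. This is a minimal sketch; the 8000-byte window is an assumption for illustration, not a figure given in this thread.

```python
def looks_binary(data: bytes, first_few_bytes: int = 8000) -> bool:
    """Guess "binary" if a NUL byte appears near the start of the content."""
    return b"\0" in data[:first_few_bytes]


print(looks_binary(b"plain text\n"))               # False
print(looks_binary(b"\x89PNG\r\n\x1a\n\0\0\0\r"))  # True
# The case where the heuristic fails: binary data whose first NUL comes too late.
print(looks_binary(b"A" * 9000 + b"\0"))           # False
```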

There are two ways to do it, both valid, I think.

 - give an option to "diff" that says "treat this path as binary
   for this invocation of the program".

 - give an attribute to blob object that says "this blob is
   binary and should be treated as such".

The latter is probably the right way to go in the longer term.

A blob being binary or not is a property of the content and does
not depend on where it sits in the history, so unlike "recording
renames as a hint in commit objects", the attribute is at the
blob level, not at the commit nor the tree that points at the
blob.

But "binaryness" affects only certain operations that extract
the data (e.g. diff and grep) and not others (e.g. fetch).
Also, it makes sense to be able to retroactively mark a blob,
which was not originally marked as such, as binary.  So I do
not think it should be recorded in the object header.

Which suggests that we may perhaps want to have notes that can
be attached to existing objects to augment them without changing
the contents of the data, and have tools notice these notes when
they are available.  Another example is to associate correct
MIME types with blobs so that gitweb _blob_ links can do sensible
things to them.

These external notes are purely for Porcelains (in the context
of this sentence "diff" and "grep" are Porcelain), but we would
also want a way to propagate them across repositories somehow.
In a sense, "grafts" information is similar to the external
notes in that it augments existing commit objects, but its
effect is a bit more intrusive; it affects the way the core
operates.
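
External notes can be illustrated with a side table keyed by blob name. Because the note lives outside the object, marking a blob later changes neither its contents nor its name. In this minimal sketch only the blob naming follows git's format; the `notes` table, `mark`, and `treat_as_binary` are hypothetical.

```python
import hashlib
from typing import Dict


def blob_name(data: bytes) -> str:
    """Name a blob as git does: SHA-1 of b"blob <size>\\0" plus the content."""
    return hashlib.sha1(f"blob {len(data)}\0".encode() + data).hexdigest()


# Hypothetical external notes: blob name -> attributes for Porcelains.
notes: Dict[str, Dict[str, str]] = {}


def mark(data: bytes, **attrs: str) -> None:
    """Attach attributes to a blob, possibly long after it was created."""
    notes.setdefault(blob_name(data), {}).update(attrs)


def treat_as_binary(data: bytes) -> bool:
    """Prefer an explicit note; otherwise fall back to the NUL heuristic."""
    note = notes.get(blob_name(data), {})
    if "binary" in note:
        return note["binary"] == "yes"
    return b"\0" in data[:8000]


data = b"%PDF-1.4 starts out looking like text\n"
before = blob_name(data)
print(treat_as_binary(data))      # False: the heuristic is fooled
mark(data, binary="yes", mime="application/pdf")
print(treat_as_binary(data))      # True: the note takes precedence
print(blob_name(data) == before)  # True: the blob itself is unchanged
```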

* Re: [ANNOUNCE] Git wiki
From: Jakub Narebski @ 2006-05-05 19:39 UTC (permalink / raw)
  To: git
In-Reply-To: <20060505185445.GD27689@pasky.or.cz>

Petr Baudis wrote:

> Dear diary, on Fri, May 05, 2006 at 08:31:06PM CEST, I got a letter
> where Linus Torvalds <torvalds@osdl.org> said that...

> I prefer making this [rename detection] data dependable to having to
> resort to guessing on dependable less amount of data.
> 
>> There's another reason why encoding movement information in the commit is
>> totally broken, namely the fact that a lot of the actions DO NOT WALK THE
>> COMMIT CHAIN!
>> 
>> Try doing
>> 
>> git diff v1.3.0..
>> 
>> and think about what that actually _means_. Think about the fact that it
>> doesn't actually walk the commit chain at all: it diffs the trees between
>> v1.3.0 and the current one. What if the rename happened in a commit in
>> the middle?
> 
> Then the automated renames detection will miss it given that the other
> accumulated differences are large enough, and the suggested workarounds
> _are_ precisely walking the commit chain.
> 
> If you use persistent file ids, you never miss it _AND_ you DO NOT WALK
> THE COMMIT CHAIN! You still just match file ids in the two trees.

Let's not jump to one of the possible solutions. Detecting and noting
renames and content movement (with user interaction) at commit time is
nice... unless one does something that cannot be interactive (like
applying a patchbomb), but even then detecting and saving the info at
commit time would be a good idea.

What we need is, for two given linked revisions (with a path between
them), to easily extract information about renames (content movement),
perhaps using an additional structure... best if we could do this
without walking the chain. The rest is details... ;-P
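
Extracting renames between two given revisions without walking the chain means comparing the two trees directly. This minimal sketch assumes trees are hypothetical `{path: content}` dictionaries. `difflib.SequenceMatcher` over lines stands in for git's own similarity scoring, and the 0.5 threshold is an assumed default. The sketch also shows the failure mode discussed above: once enough changes accumulate, the moved file falls below the threshold and shows up as a deletion plus a creation.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Optional, Tuple


def detect_renames(old: Dict[str, str], new: Dict[str, str],
                   threshold: float = 0.5) -> List[Tuple[str, str]]:
    """Pair paths that vanished from `old` with paths that appeared in `new`."""
    deleted = [p for p in old if p not in new]
    added = [p for p in new if p not in old]
    renames: List[Tuple[str, str]] = []
    for dst in added:
        best_src: Optional[str] = None
        best_score = 0.0
        for src in deleted:
            # Line-based similarity between the old and new contents.
            score = SequenceMatcher(None, old[src].splitlines(),
                                    new[dst].splitlines()).ratio()
            if score > best_score:
                best_src, best_score = src, score
        if best_src is not None and best_score >= threshold:
            renames.append((best_src, dst))
            deleted.remove(best_src)  # each source pairs with at most one target
    return renames


v1 = {"rev-list.c": "a\nb\nc\nd\n"}
light_edit = {"revision.c": "a\nb\nc\nX\n"}
heavy_edit = {"revision.c": "W\nX\nY\nZ\n"}
print(detect_renames(v1, light_edit))  # [('rev-list.c', 'revision.c')]
print(detect_renames(v1, heavy_edit))  # []: seen as a delete plus a create
```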

-- 
Jakub Narebski
Warsaw, Poland

* Re: [ANNOUNCE] Git wiki
From: Junio C Hamano @ 2006-05-05 19:49 UTC (permalink / raw)
  To: Petr Baudis; +Cc: git
In-Reply-To: <20060505185445.GD27689@pasky.or.cz>

Petr Baudis <pasky@suse.cz> writes:

> I doubt this in fact happens that often (to a degree the automatic
> rename detection would catch). And if it happens, then the user has to
> tell Git - I have never heard that _this_ would be any problem in other
> version control systems.

It does not become an issue only because users accept it as a
fact of life.  When Linus was moving most of the contents in
rev-list.c to create a new revision.c, I already had some tweaks
to rev-list.c published before he sent me a patch for the code
movement, and I am sure he needed to re-roll the patch by
merging the change I did to rev-list.c back into his revision.c
file.  No SCM may handle that automatically, and no user
accustomed to existing SCMs (including git) expects that to work
automatically.  But that does not necessarily mean a tool that
notices it and tells the user what is going on is a bad thing.

However, it is a different story to try recording "what is going on",
whether it comes from the tool's guess or directly from the user.

Having a way to affect the imprecise "guess" the tool makes when
that guesswork is needed might make sense.  If you (think you)
know arch/i386/foo.h was copied to create arch/x86-64/foo.h but
the detector does not detect it and seeing a creation patch for
arch/x86-64/foo.h frustrates you, you may want to have a way to
explicitly say "compare arch/i386/foo.h with arch/x86-64/foo.h
in that commit -- I want to examine the change needed to adjust
foo to x86-64 architecture".

But we have "git diff v2.6.14:arch/i386/foo.h v2.6.14:arch/x86-64/foo.h"
for that ;-).

> Then the automated renames detection will miss it given that the other
> accumulated differences are large enough, and the suggested workarounds
> _are_ precisely walking the commit chain.

The HEAD may _not_ have anything to do with v1.3.0, in which case
you would get nothing from walking the ancestry.

> If you use persistent file ids, you never miss it _AND_ you DO NOT WALK
> THE COMMIT CHAIN! You still just match file ids in the two trees.

It is unworkable.

Which one should inherit the persistent id of the old
rev-list.c?  New rev-list.c, or revision.c that has most of the
old contents split out?

Oh, and did you know there was a different revision.h that is
not related to the current revision.h in the history of git?
Should its persistent id have any relation with the persistent
id of the current revision.h?  When would you decide to make the
id inherited and when not to?  If I remove revision.h by mistake
in a commit and resurrect it in the next commit, should it get
the same id back?  If I forget to tell the tool that those two
"disappeared and then reappeared" are related and should get the
same persistent id when I make the resurrection commit, and keep
piling other commits on top, do I have to rewind the ancestry
chain all the way to correct the mistake?

* Re: [PATCH] binary patch.
From: Nicolas Pitre @ 2006-05-05 20:07 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git
In-Reply-To: <7vac9wxom0.fsf@assigned-by-dhcp.cox.net>

On Fri, 5 May 2006, Junio C Hamano wrote:

> Nicolas Pitre <nico@cam.org> writes:
> 
> > On Fri, 5 May 2006, Junio C Hamano wrote:
> >
> >> The delta is going to be deflated and hopefully gets a bit
> >> smaller, so if we really care that level of detail, it might be
> >> worth to do (deflate_size*3/2) or something like that here, use
> >> delta with or without deflate whichever is smaller, and mark the
> >> uncompressed delta with a different tag ("uncompressed delta"?).
> >> And for symmetry, to deal with uncompressible data, we may want
> >> to have "uncompressed literal" as well.
> >
> > Nah...  Please just forget that.  ;-)
> 
> I was serious about the above actually.

And I think this is overkill.

First, if a deflated delta is to be _larger_ than its inflated version 
this is because the delta data is really really short, most probably 
shorter than a single base85 line.  Same for literal data.

So I truly think the pretty special and rare case where not deflating
might be smaller is simply not worth the added complexity.
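
Junio's proposal and Nicolas's objection can both be seen in a small experiment: deflate the payload, keep whichever form is shorter, and tag it. In this minimal sketch the tag names are hypothetical, and zlib's default level stands in for whatever level the binary patch code uses.

```python
import zlib
from typing import Tuple


def encode_payload(data: bytes) -> Tuple[str, bytes]:
    """Return (tag, payload), deflating only when that actually saves space."""
    deflated = zlib.compress(data)
    if len(deflated) < len(data):
        return "deflated", deflated
    return "uncompressed", data


tiny = b"\x05\x03ab"              # a few bytes, like a very short delta
repetitive = b"0123456789" * 100  # 1000 bytes that compress well
print(encode_payload(tiny)[0])        # uncompressed
print(encode_payload(repetitive)[0])  # deflated
```

The raw form only wins for payloads a few bytes long. zlib's 2-byte header and 4-byte Adler-32 trailer alone add six bytes, which is consistent with the point that this case is rare.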

> BTW, this "binary patch" opens a different can of worms.
> 
> Currently, the diff uses a heuristic borrowed from GNU diff 
> (I did not look at the code when I did it, but it is described
> in its documentation) to decide if a file is binary (look at the
> first few bytes and find NUL).  I am sure people will want to
> have a way to say "that heuristic fails but this _is_ a binary
> file and please treat it as such".
> 
> There are two ways to do it, both valid, I think.
> 
>  - give an option to "diff" that says "treat this path as binary
>    for this invocation of the program".
> 
>  - give an attribute to blob object that says "this blob is
>    binary and should be treated as such".
> 
> The latter is probably the right way to go in the longer term.

I'm not sure I agree.

> A blob being binary or not is a property of the content and does
> not depend on where it sits in the history, so unlike "recording
> renames as a hint in commit objects", the attribute is at the
> blob level, not at the commit nor the tree that points at the
> blob.

Well, sort of.

> But "binaryness" affects only certain operations that extract
> the data (e.g. diff and grep) and not others (e.g. fetch).
> Also, it makes sense to being able to retroactively mark a blob,
> which was not marked as such originally, is a binary.  So I do
> not think it should be recorded in the object header.

Agreed.

> Which suggests that we may perhaps want to have notes that can
> be attached to existing objects to augment them without changing
> the contents of the data, and have tools notice these notes when
> they are available.  Another example is to associate correct
> MIME types with blobs so that gitweb _blob_ links can do sensible
> things to them.

I think blobs are the wrong level to attach such notes.  If you go that
route you'll have to add one entry for every blob that the many
revisions of the same file might have.

Instead I think it should be attached to files.  After all, being
binary or not is a file attribute regardless of its revision.  And
implementation-wise I'd do it as a .gitbin file listing all names of
files that should be considered as binaries, with path globbing and all,
just like .gitignore currently lists files that should be ignored.

And the advantage is that those .gitbin files can be distributed and 
revision-controlled just like the .gitignore files.

And in addition to those files you could have a section in the repo 
config file listing default name patterns for files that are considered 
binaries.  Or even a section, if present, that lists patterns for files
that are _not_ binaries, since that list might well be shorter.
There could be a corresponding .gittext as well.

And in the absence of any of those, the default automatic heuristic
applies.
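
The proposed lookup order can be sketched as pattern lists consulted before the automatic heuristic. This is a minimal sketch with several assumptions. `.gitbin` and `.gittext` are the proposed (hypothetical) file names, and the patterns are passed in directly instead of being read from those files or the repo config. `fnmatch` approximates .gitignore-style globbing, and letting a binary match win over a text match is an assumption.

```python
from fnmatch import fnmatch
from typing import Sequence


def matches(path: str, patterns: Sequence[str]) -> bool:
    """True if the full path or its last component matches any pattern."""
    name = path.rsplit("/", 1)[-1]
    return any(fnmatch(path, p) or fnmatch(name, p) for p in patterns)


def is_binary(path: str, content: bytes,
              bin_patterns: Sequence[str] = (),
              text_patterns: Sequence[str] = ()) -> bool:
    """Decide binaryness from path patterns first, then from the content."""
    if matches(path, bin_patterns):
        return True   # listed in a .gitbin-style file
    if matches(path, text_patterns):
        return False  # listed in a .gittext-style file
    return b"\0" in content[:8000]  # no pattern applies: fall back to guessing


print(is_binary("doc/manual.pdf", b"%PDF-1.4\n", bin_patterns=["*.pdf"]))  # True
print(is_binary("data/table.dat", b"\0\x01", text_patterns=["*.dat"]))     # False
print(is_binary("src/main.c", b"int main;\n"))                              # False
```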

Nicolas

* Re: [PATCH] binary patch.
From: Daniel Barkalow @ 2006-05-05 20:33 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Nicolas Pitre, git
In-Reply-To: <7vac9wxom0.fsf@assigned-by-dhcp.cox.net>

On Fri, 5 May 2006, Junio C Hamano wrote:

> But "binaryness" affects only certain operations that extract
> the data (e.g. diff and grep) and not others (e.g. fetch).
> Also, it makes sense to be able to retroactively mark a blob,
> which was not originally marked as such, as binary.  So I do
> not think it should be recorded in the object header.

Why do you think it makes sense to retroactively mark a blob with things 
like binariness or MIME type? To the extent that the information is not 
possible to extract from the blob contents, it seems to me to be a 
permanent aspect of the blob. And I could see having blobs with the same 
content but different type information (that one is a ZIP archive, while 
this one is an OpenDocument file), and tools may care how they were
specified, and the user would want to be able to track how they had 
historically been marked, if the system allows them to be marked at all.

Of course, there's still the issue of how this info is generated for a new 
blob; I think it should live in the index for tracked files and come from 
a .gitignore-style file for new files. (For that matter, there could be a 
.gitmetadata file, which would handle "ignore" as well as binary and 
whatever other info you want to produce about your not-previously-tracked 
files.)

	-Daniel
*This .sig left intentionally blank*

* Re: [ANNOUNCE] Git wiki
From: Olivier Galibert @ 2006-05-05 20:45 UTC (permalink / raw)
  To: linux; +Cc: git
In-Reply-To: <20060505181540.GB27689@pasky.or.cz>

On Fri, May 05, 2006 at 08:15:41PM +0200, Petr Baudis wrote:
> The automatic vs. explicit movement tracking is a lot more
> controversial. Explicit movement tracking is pretty easy to provide for
> file-level movements, it's just that the user says "I _did_ move file
> A to file B" (I never got the Linus' argument that the user has no idea
> - he just _performed_ the move, also explicitly, by calling *mv).

In one of my projects 99% of the renames are "done" when unzipping the
source release of the next version.  Explicit tracking would be
unbearable, frankly.

And once you have a good enough implicit tracking, why bother with an
explicit one?

  OG.

* Re: [PATCH] binary patch.
From: Junio C Hamano @ 2006-05-05 20:50 UTC (permalink / raw)
  To: git
In-Reply-To: <Pine.LNX.4.64.0605051605340.6713@iabervon.org>

Daniel Barkalow <barkalow@iabervon.org> writes:

> On Fri, 5 May 2006, Junio C Hamano wrote:
>
>> But "binaryness" affects only certain operations that extract
>> the data (e.g. diff and grep) and not others (e.g. fetch).
>> Also, it makes sense to be able to retroactively mark a blob,
>> which was not originally marked as such, as binary.  So I do
>> not think it should be recorded in the object header.
>
> Why do you think it makes sense to retroactively mark a blob with things 
> like binariness or MIME type? To the extent that the information is not 
> possible to extract from the blob contents, it seems to me to be a 
> permanent aspect of the blob. And I could see having blobs with the same 
> content but different type information (that one is a ZIP archive, while 
> this one is an OpenDocument file), and tools may care how they were
> specified, and the user would want to be able to track how they had 
> historically been marked, if the system allows them to be marked at all.
>
> Of course, there's still the issue of how this info is generated for a new 
> blob; I think it should live in the index for tracked files and come from 
> a .gitignore-style file for new files. (For that matter, there could be a 
> .gitmetadata file, which would handle "ignore" as well as binary and 
> whatever other info you want to produce about your not-previously-tracked 
> files.)

I think Nico's solution (compromise?) is the right and most
practical one.
