Git development
* Re: Google Summer of Code 2009: GIT
From: Johannes Schindelin @ 2009-03-19 23:42 UTC (permalink / raw)
  To: saurabh gupta; +Cc: david, Junio C Hamano, git
In-Reply-To: <ab9fa62a0903191217t5d0e6d9cn4915a425ed8084ff@mail.gmail.com>

Hi,

On Fri, 20 Mar 2009, saurabh gupta wrote:

> On Thu, Mar 19, 2009 at 4:46 AM, Johannes Schindelin
> <Johannes.Schindelin@gmx.de> wrote:
>
> > For example, if we decide that OOXML is a must (as it is a proper 
> > standard, and many people will benefit from it), we will most likely 
> > end up in having to write a merge _driver_ (to handle those .zip 
> > files), _and_ a merge _helper_, although we can avoid writing our own 
> > GUI, as we can create an OOXML that has its own version of conflict 
> > markers.
> 
> Well, for ODF type document, we can write a merge driver which will 
> change the xml file in an appropriate way that OO can understand it and 
> the user can see the merge result/conflict in a comfortable way. As 
> described by Junio, in this case, a dedicated merge helper is not needed 
> as OO can parse the markers made by merge-driver and provide the user to 
> resolve the conflict and register the changes to index.

There is also the idea that OOffice has building blocks in place to help 
resolving merge conflicts.  For a successful application, you will have to 
show that you researched that option, and describe how well/badly it fits 
with the goal of the project.

> > - knowing what data types we want to support _at the least_, and what 
> >   data  types we keep for the free skate,
> 
> As of now, how about going for XML files. For this summer, we can go for 
> XML files and latex files can be handled later.

If your goal is just XML files (without any more specific goal, like ODF 
or SVG), I am afraid that I think that project is not worth 4500 dollars 
from Google's pocket.  I mean, we are not talking peanuts here.

> > - a clear picture of the user interface we want to be able to provide,
> 
> In my opinion, we have following things to do:
> 
> => while merging an ODF document, merge-driver will merge the file at
> file level. If changes don't overlap, then it returns the result with
> a success. For example, if the file is changed only on one side, then
> the driver will simply add the new content.
> 
> => If conflicts appear, then the merge driver will put the markers in
> an appropriate manner which the end-user application (e.g. open
> office) can understand and show the user. For example, the XML file of
> that ODF document will be modified and OO can show it  to user in its
> way. We will have to study about the OO style of version marking.
> Another method is to implement the marker style in our own way. For
> example, to show any marker, the XML file is modified so that user can
> see markers like ">>>> " or "====" in openoffice....In this case, we
> will have to just change the xml content in this way.

That is correct, but I would appreciate a bit more definitive research 
_before_ the project proposal, as a sign that you are capable of working 
out the details of the project.

> > - a timeline (weekly milestones should be fine, I guess) what should 
> >   be  achieved when, and
> 
> Timeline can be decided once we reach some conclusion and the work which 
> needs to be done become clear to us.

Last year, most successful applications detailed a proposed timeline in 
their proposal...

Do not get me wrong, I want this project to succeed.

But on the other hand, I feel the obligation to be a bit demanding for the 
gracious donation of Google: we _do_ want to have something stunningly 
awesome at the end of the summer.

And that means that I have to get the impression from the student proposal 
that something like that is at least _possible_.

Ciao,
Dscho

* Re: Git Large Object Support Proposal
From: david @ 2009-03-19 23:42 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Scott Chacon, git list
In-Reply-To: <7veiwt6t6a.fsf@gitster.siamese.dyndns.org>

On Thu, 19 Mar 2009, Junio C Hamano wrote:

> Scott Chacon <schacon@gmail.com> writes:
>
>> But where Git instead stores a stub object and the large binary object
>> is pulled in via a separate mechanism. I was thinking that the client
>> could set a max file size and when a binary object larger than that is
>> staged, Git instead writes a stub blob like:
>>
>> ==
>> blob [size]\0
>> [sha of large blob]
>> ==
>
> An immediate pair of questions are, if you can solve the issue by
> delegating large media to somebody else (i.e. "media server"), and that
> somebody else can solve the issues you are having, (1) what happens if you
> lower that "large" threshold to "0 byte"?  Does that somebody else still
> work fine, and does the git that uses indirection also still work fine?
> If so why are you using git instead of that somebody else altogether?

ideally the difference between using git with 'large' set to 0 and git 
with no pack file should be an extra lookup for the indirection.

it may be that some other file manipulation may not be possible for 
'large' files, resulting in some reduced functionality.

in any case, the added efficiency of using pack files (both for local 
storage and for network transport) will make handling the 'large' files 
worse than the same size files through git (assuming that they can benefit 
from delta compression)

> and
> (2) what prevents us from stealing the trick that somebody else uses so
> that git itself can natively handle large blobs without indirection?

the key thing is that large files do not get mmaped or considered for 
inclusion in pack files (including cloning and pulling pack files)

to make them full first-class citizens you would need to make alternate 
code paths for everything that currently does mmap, making those paths 
process the file a different way. in the long run that may be the 
best thing to do, but that's a lot of change compared to the proposed 
change.

> Without thinking the ramifications through myself, this sounds pretty much
> like a band-aid and will end up hitting the same "blob is larger than we
> can handle" issue when you follow the indirection eventually, but that is
> just my gut feeling.

it depends on what you are doing with that file when you get to it. if you 
have to mmap it you may run into the same problem. but if the file is a 
streaming video, you can transport it around (with rsync, http, etc) 
without a problem, and using the file (playing the video) never keeps much 
of the file in memory, so it will be very useful on systems that would 
never have a chance of accessing the entire file through mmap.
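
As a rough illustration of handling a large file without mapping it, the 
sketch below hashes a file in fixed-size chunks. The function name 
stream_sha1 and the 1 MiB chunk size are assumptions made for this 
example, and the digest is the plain SHA-1 of the file contents, not 
git's blob id (which also covers a header).

```python
import hashlib


def stream_sha1(path, chunk_size=1 << 20):
    """Return the hex SHA-1 of a file, reading it chunk by chunk.

    Only one chunk is held in memory at a time, so the file never has
    to fit in the address space the way an mmap of it would.
    """
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        # iter() with a sentinel calls read() until it returns b"" (EOF).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Memory use is bounded by chunk_size regardless of how large the file is, 
which is the property that matters on systems that cannot map it whole.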

David Lang

* Re: Git Large Object Support Proposal
From: Junio C Hamano @ 2009-03-19 23:44 UTC (permalink / raw)
  To: Scott Chacon; +Cc: git list
In-Reply-To: <d411cc4a0903191618x503db946n62d3132eece69175@mail.gmail.com>

Scott Chacon <schacon@gmail.com> writes:

> The point is that we don't keep this data as 'blob's - we don't try to
> compress them or add the header to them, they're too big and already
> compressed, it's a waste of time and often outside the memory
> tolerance of many systems. We keep only the stub in our db and stream
> the large media content directly to and from disk.  If we do a
> 'checkout' or something that would switch it out, we could store the
> data in '.git/media' or the equivalent until it's uploaded elsewhere.

Aha, that sounds like you can just maintain a set of out-of-tree symbolic
links that you keep track of, and let other people (e.g. rsync) deal with
the complexity of managing that side of the world.

And I think you can start experimenting it without any change to the core
datastructures.  In your single-page web site in which its sole html file
embeds an mpeg movie, you keep track of these two things in git:

	porn-of-the-day.html
        porn-of-the-day.mpg -> ../media/6066f5ae75ec.mpg

and any time you want to feed a new movie, you update the symlink to a
different one that lives outside the source-controlled tree, while
arranging the link target to be updated out-of-band.

* Re: Git Large Object Support Proposal
From: david @ 2009-03-19 23:52 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Scott Chacon, git list
In-Reply-To: <7vzlfh5b7y.fsf@gitster.siamese.dyndns.org>

On Thu, 19 Mar 2009, Junio C Hamano wrote:

> Scott Chacon <schacon@gmail.com> writes:
>
>> The point is that we don't keep this data as 'blob's - we don't try to
>> compress them or add the header to them, they're too big and already
>> compressed, it's a waste of time and often outside the memory
>> tolerance of many systems. We keep only the stub in our db and stream
>> the large media content directly to and from disk.  If we do a
>> 'checkout' or something that would switch it out, we could store the
>> data in '.git/media' or the equivalent until it's uploaded elsewhere.
>
> Aha, that sounds like you can just maintain a set of out-of-tree symbolic
> links that you keep track of, and let other people (e.g. rsync) deal with
> the complexity of managing that side of the world.
>
> And I think you can start experimenting it without any change to the core
> datastructures.  In your single-page web site in which its sole html file
> embeds an mpeg movie, you keep track of these two things in git:
>
> 	porn-of-the-day.html
>        porn-of-the-day.mpg -> ../media/6066f5ae75ec.mpg
>
> and any time you want to feed a new movie, you update the symlink to a
> different one that lives outside the source-controlled tree, while
> arranging the link target to be updated out-of-band.

that would work, but the proposed change has some advantages

1. you store the sha1 of the real mpg in the 'large file' blob so you can 
detect problems

2. since it knows the sha1 of the real file, it can auto-create the real 
file as needed, without wasting space on too many copies of it.

David Lang

* Re: Push tag from shallow clone?
From: skillzero @ 2009-03-20  0:01 UTC (permalink / raw)
  To: Shawn O. Pearce; +Cc: git
In-Reply-To: <20090319180216.GT23521@spearce.org>

On Thu, Mar 19, 2009 at 11:02 AM, Shawn O. Pearce <spearce@spearce.org> wrote:
> skillzero@gmail.com wrote:
>> The documentation for git clone says that if you use --depth=1 to make
>> a shallow clone that you can't push it. But I made a shallow clone,
>> created a tag, then tried to push that tag and it worked. Am I just
>> getting lucky or is it safe to push a tag with a shallow clone?
>
> Yea, you are getting lucky.  The tag is easily identified as one
> object head of the current branch on the remote, and the client is
> able to produce the pack and send it.
>
> If the remote branch gets modified in the interim, the builder may
> not be able to deduce what it needs to send, and will attempt to
> pack a lot more data, potentially finding the missing parents from
> where it is shallow.
>
> Why not just have a central area on the build server that keeps
> full clones of everything, and use "git clone -s" or "git clone
> --reference" in order to create the new work area for the builder?

Thanks for the info. As for using --reference, one of the things that
the builder does is archive the build in its entirety so it can be
reproduced later on a different machine. I'll probably just need to
use a full clone (or do some kind of stripping after the build
succeeds and before it archives).

* Re: Google Summer of Code 2009: GIT
From: david @ 2009-03-20  0:07 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: saurabh gupta, Junio C Hamano, git
In-Reply-To: <alpine.DEB.1.00.0903200034230.10279@pacific.mpi-cbg.de>

On Fri, 20 Mar 2009, Johannes Schindelin wrote:

> Hi,
>
> On Fri, 20 Mar 2009, saurabh gupta wrote:
>
>> On Thu, Mar 19, 2009 at 4:46 AM, Johannes Schindelin
>> <Johannes.Schindelin@gmx.de> wrote:
>>
>>> For example, if we decide that OOXML is a must (as it is a proper
>>> standard, and many people will benefit from it), we will most likely
>>> end up in having to write a merge _driver_ (to handle those .zip
>>> files), _and_ a merge _helper_, although we can avoid writing our own
>>> GUI, as we can create an OOXML that has its own version of conflict
>>> markers.
>>
>> Well, for ODF type document, we can write a merge driver which will
>> change the xml file in an appropriate way that OO can understand it and
>> the user can see the merge result/conflict in a comfortable way. As
>> described by Junio, in this case, a dedicated merge helper is not needed
>> as OO can parse the markers made by merge-driver and provide the user to
>> resolve the conflict and register the changes to index.
>
> There is also the idea that OOffice has building blocks in place to help
> resolving merge conflicts.  For a successful application, you will have to
> show that you researched that option, and describe how well/badly it fits
> with the goal of the project.

true, although for the 'simple case' of an ODF text file you can use 
text strings exactly the same way you do with a text file. the difference 
is that when inserting the two versions of things into the 'conflict' 
version of the ODF file you need to make sure that you include the 
complete open/close set of tags in each version.

for example if file 1 has

<tag1 param='1'>
text
</tag1>

and file 2 has

<tag1 param='1'>
text2
</tag1>

you can do


<tag1 param='1'>
>>>>>>>>
text
========
text2
<<<<<<<<
</tag1>


but if file2 has


<tag1 param='2'>
text
</tag1>

your conflict would need to be

>>>>>>>>
<tag1 param='1'>
text
</tag1>
========
<tag1 param='2'>
text
</tag1>
<<<<<<<<

(although since < and > are special characters, they would really be &gt; 
and &lt; in the file)

if there are nicer ways to do this, supporting them would be good, but as 
long as the marker strings are configurable you can probably do so

you could change
the first string from >>>>>>>> to <conflict option='1'>
the second string from ======== to </conflict><conflict option='2'>
the third string from <<<<<<<< to </conflict>

and now instead of having to search for those special text strings, your 
ODF editor would 'magically' identify them and remind you that you hadn't 
resolved all of them.
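
A minimal sketch of the configurable-marker idea, assuming each side of 
the conflict is already a complete open/close set of tags. The names 
conflict_block, TEXT_MARKERS and TAG_MARKERS are made up for this 
example, and the <conflict> element is the hypothetical one from the 
paragraph above, not something an ODF editor is known to understand.

```python
from xml.sax.saxutils import escape

# Plain text markers (escaped before being written into the XML).
TEXT_MARKERS = (">>>>>>>>", "========", "<<<<<<<<")
# Hypothetical element-style markers, written into the XML as-is.
TAG_MARKERS = (
    "<conflict option='1'>",
    "</conflict><conflict option='2'>",
    "</conflict>",
)


def conflict_block(ours, theirs, markers=TEXT_MARKERS, escape_markers=True):
    """Join two complete XML fragments with start/middle/end markers.

    With escape_markers=True the markers are treated as text, so < and >
    become &lt; and &gt;. With escape_markers=False they are assumed to
    be well-formed markup and are inserted unchanged.
    """
    start, middle, end = markers
    if escape_markers:
        start, middle, end = (escape(m) for m in (start, middle, end))
    return "\n".join([start, ours, middle, theirs, end])
```

Calling conflict_block(ours, theirs) gives the text-marker form, while 
conflict_block(ours, theirs, TAG_MARKERS, escape_markers=False) wraps 
each side in its own element so a parser can find unresolved conflicts.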

>>> - knowing what data types we want to support _at the least_, and what
>>>   data  types we keep for the free skate,
>>
>> As of now, how about going for XML files. For this summer, we can go for
>> XML files and latex files can be handled later.
>
> If your goal is just XML files (without any more specific goal, like ODF
> or SVG), I am afraid that I think that project is not worth 4500 dollars
> from Google's pocket.  I mean, we are not talking peanuts here.

I see good support for XML being a superset of what's needed to support 
ODF or SVG, not a subset.

or another way of putting it, the gitconfig definition for ODF would be a 
shortcut for a longer XML definition with a long list of options.

to be accepted by google, they will need to feel that the work is worth 
the money, so defining what file types you are going to support is an 
important item. This can include saying 'by handling this type of tweak to 
an XML file we can then handle file type Y instead of just file type X 
with the same merge driver'

as you are considering this list, please think about the items I mentioned 
earlier in the thread that would improve the support for config files and 
maintainers files (unordered lines/paragraphs)

>>> - a timeline (weekly milestones should be fine, I guess) what should
>>>   be  achieved when, and
>>
>> Timeline can be decided once we reach some conclusion and the work which
>> needs to be done become clear to us.
>
> Last year, most successful applications detailed a proposed timeline in
> their proposal...
>
> Do not get me wrong, I want this project to succeed.
>
> But on the other hand, I feel the obligation to be a bit demanding for the
> gracious donation of Google: we _do_ want to have something stunningly
> awesome at the end of the summer.
>
> And that means that I have to get the impression from the student proposal
> that something like that is at least _possible_.

sounds reasonable.

David Lang

* Re: Git Large Object Support Proposal
From: Junio C Hamano @ 2009-03-20  0:11 UTC (permalink / raw)
  To: david; +Cc: Scott Chacon, git list
In-Reply-To: <alpine.DEB.1.10.0903191650160.16753@asgard.lang.hm>

david@lang.hm writes:

> On Thu, 19 Mar 2009, Junio C Hamano wrote:
>
>> Scott Chacon <schacon@gmail.com> writes:
>>
>>> The point is that we don't keep this data as 'blob's - we don't try to
>>> compress them or add the header to them, they're too big and already
>>> compressed, it's a waste of time and often outside the memory
>>> tolerance of many systems. We keep only the stub in our db and stream
>>> the large media content directly to and from disk.  If we do a
>>> 'checkout' or something that would switch it out, we could store the
>>> data in '.git/media' or the equivalent until it's uploaded elsewhere.
>>
>> Aha, that sounds like you can just maintain a set of out-of-tree symbolic
>> links that you keep track of, and let other people (e.g. rsync) deal with
>> the complexity of managing that side of the world.
>>
>> And I think you can start experimenting it without any change to the core
>> datastructures.  In your single-page web site in which its sole html file
>> embeds an mpeg movie, you keep track of these two things in git:
>>
>> 	porn-of-the-day.html
>>        porn-of-the-day.mpg -> ../media/6066f5ae75ec.mpg
>>
>> and any time you want to feed a new movie, you update the symlink to a
>> different one that lives outside the source-controlled tree, while
>> arranging the link target to be updated out-of-band.
>
> that would work, but the proposed change has some advantages
>
> 1. you store the sha1 of the real mpg in the 'large file' blob so you
> can detect problems

You store the unique identifier of the real mpg in the symbolic link
target which is a blob payload, so you can detect problems already.  I
deliberately said "unique identifier"; you seem to think saying SHA-1
brings something magical but I do not think it needs to be even blob's
SHA-1.  Hashing that much data costs.

In any case, you can have a script (or client-side hook) that does:

    (1) find the out-of-tree symlinks in the index (or in the work tree);

    (2) if it is dangling, and if you have definition of where to get that
        hierarchy from (e.g ../media), run rsync or wget or whatever
        external means to grab it.

and call it after "git pull" updates from some other place.  The "git
media" of Scott's message could be an alias to such a command.
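
Step (1) and the "dangling" test of step (2) above could look roughly 
like the sketch below. It scans the work tree rather than the index, the 
function name dangling_out_of_tree_symlinks is an assumption for this 
example, and the actual fetch (rsync, wget or whatever) is deliberately 
left to the caller.

```python
import os


def dangling_out_of_tree_symlinks(worktree):
    """Yield (link, target) for symlinks that point outside the work
    tree and whose target does not exist yet."""
    root = os.path.realpath(worktree)
    for dirpath, dirnames, filenames in os.walk(root):
        # Do not descend into git's own metadata directory.
        if ".git" in dirnames:
            dirnames.remove(".git")
        # Symlinks to directories are listed in dirnames, all other
        # symlinks (including dangling ones) in filenames.
        for name in dirnames + filenames:
            link = os.path.join(dirpath, name)
            if not os.path.islink(link):
                continue
            # Resolve the link text relative to the link's directory.
            target = os.path.normpath(
                os.path.join(dirpath, os.readlink(link)))
            outside = os.path.commonpath([root, target]) != root
            if outside and not os.path.exists(target):
                yield link, target
```

A hook run after "git pull" could iterate over the result and hand each 
missing target to whichever transfer tool is configured.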

Adding a new type "external-blob" would be an unwelcome pain.  Reusing
"blob" so that existing "blob" codepath now needs to notice special "0"
that is not length "0" is even bigger pain than that.

And that is a pain for unknown benefit, especially when you can start
experimenting without any changes to the existing data structure.  In the
worst case, the experiment may not pan out as well as you hoped and if
that is the end of the story, so be it.  It is not a great loss.  If it
works well enough and we can have the external large media support without
any changes to the data structure, that would be really great.  If it
sort-of works but hits limitation, we can analyze how best to overcome
that limitation, and at that time it _might_ turn out to be the best
approach to introduce a new blob type.

But I do not think we know that yet.

In the longer run, as you speculated in your message, I think the native
blob codepaths need to be updated to tolerate large, unmappable objects
better.  With that goal in mind, I think it is a huge mistake to
prematurely introduce arbitrary distinct "blob" and "large blob" types,
if in the end they need to be merged back again; it would force the future
code indefinitely to care about the historical "large blob" types that were
once supported.

> 2. since it knows the sha1 of the real file, it can auto-create the
> real file as needed, without wasting space on too many copies of it.

Hmm, since when SHA-1 is reversible?

* [PATCH] Documentation/git-filter-branch.txt: Remove unnecessary URL quoting
From: Johan Herland @ 2009-03-19 23:12 UTC (permalink / raw)
  To: Junio C Hamano, git; +Cc: Thomas Rast

Embedding the URL in '+++' causes AsciiDoc (v8.4.1) to generate invalid XML.
None of the other URLs in Git's documentation are quoted in this manner.
There's no reason to treat this URL differently.

Signed-off-by: Johan Herland <johan@herland.net>
---
 Documentation/git-filter-branch.txt |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/Documentation/git-filter-branch.txt b/Documentation/git-filter-branch.txt
index 237f85e..64b99d7 100644
--- a/Documentation/git-filter-branch.txt
+++ b/Documentation/git-filter-branch.txt
@@ -361,7 +361,7 @@ objects until you tell it to.  First make sure that:
 Then there are two ways to get a smaller repository.  A safer way is
 to clone, that keeps your original intact.
 
-* Clone it with `git clone +++file:///path/to/repo+++`.  The clone
+* Clone it with `git clone file:///path/to/repo`.  The clone
   will not have the removed objects.  See linkgit:git-clone[1].  (Note
   that cloning with a plain path just hardlinks everything!)
 
-- 
1.6.2.1.352.gae594.dirty

* Re: Git Large Object Support Proposal
From: Scott Chacon @ 2009-03-20  0:19 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: david, git list
In-Reply-To: <7vtz5p59zp.fsf@gitster.siamese.dyndns.org>

Hey,

On Thu, Mar 19, 2009 at 5:11 PM, Junio C Hamano <gitster@pobox.com> wrote:
> david@lang.hm writes:
>
>> On Thu, 19 Mar 2009, Junio C Hamano wrote:
>>
>>> Scott Chacon <schacon@gmail.com> writes:
>>>
>>>> The point is that we don't keep this data as 'blob's - we don't try to
>>>> compress them or add the header to them, they're too big and already
>>>> compressed, it's a waste of time and often outside the memory
>>>> tolerance of many systems. We keep only the stub in our db and stream
>>>> the large media content directly to and from disk.  If we do a
>>>> 'checkout' or something that would switch it out, we could store the
>>>> data in '.git/media' or the equivalent until it's uploaded elsewhere.
>>>
>>> Aha, that sounds like you can just maintain a set of out-of-tree symbolic
>>> links that you keep track of, and let other people (e.g. rsync) deal with
>>> the complexity of managing that side of the world.
>>>
>>> And I think you can start experimenting it without any change to the core
>>> datastructures.  In your single-page web site in which its sole html file
>>> embeds an mpeg movie, you keep track of these two things in git:
>>>
>>>      porn-of-the-day.html
>>>        porn-of-the-day.mpg -> ../media/6066f5ae75ec.mpg
>>>
>>> and any time you want to feed a new movie, you update the symlink to a
>>> different one that lives outside the source-controlled tree, while
>>> arranging the link target to be updated out-of-band.

It seems like the main problem here would be that most operations in
the working directory would be overwriting not the symlink but the
file it points to.  If you do a simple 'cp ~/generated_file.mpg
porn-of-the-day.mpg' (to upload your newest and bestest porn), it will
overwrite the '../media/6066f5ae75ec.mpg' file, not the symlink so
that we can generate a new symlink.  Then if we haven't uploaded the
'../media/6066f5ae75ec.mpg' file anywhere yet, it's a goner.  Right?
What you are proposing is almost exactly what I want to do, but I'm
concerned with this issue of the symlink reference not working right
for normal working directory operations.  If a file is never
overwritten, however, this is basically identical to what I wanted to
do.
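
The overwrite problem can be reproduced outside git. The sketch below 
uses Python's shutil.copyfile as a stand-in for 'cp'; like a plain cp, 
it opens the destination path for writing and so follows the symlink. 
The directory layout and the name demo_copy_through_symlink are 
assumptions made for this example.

```python
import os
import shutil
import tempfile


def demo_copy_through_symlink():
    """Copy a new file onto a symlink and report what changed."""
    base = tempfile.mkdtemp()
    media = os.path.join(base, "media")
    tree = os.path.join(base, "site")
    os.mkdir(media)
    os.mkdir(tree)

    # The out-of-tree media file that the tracked symlink points at.
    target = os.path.join(media, "6066f5ae75ec.mpg")
    with open(target, "wb") as handle:
        handle.write(b"old movie")
    link = os.path.join(tree, "movie.mpg")
    os.symlink(os.path.join("..", "media", "6066f5ae75ec.mpg"), link)

    # The freshly generated replacement.
    new = os.path.join(base, "generated_file.mpg")
    with open(new, "wb") as handle:
        handle.write(b"new movie")

    # The destination is opened through the symlink, so the old media
    # file is overwritten and the link itself stays untouched.
    shutil.copyfile(new, link)

    with open(target, "rb") as handle:
        return os.path.islink(link), handle.read()
```

The function returns (True, b"new movie"): the link survives and the old 
content is gone. Removing the link before copying would leave the old 
media file intact, but that is an extra step a user has to remember.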

Scott


>>
>> that would work, but the proposed change has some advantages
>>
>> 1. you store the sha1 of the real mpg in the 'large file' blob so you
>> can detect problems
>
> You store the unique identifier of the real mpg in the symbolic link
> target which is a blob payload, so you can detect problems already.  I
> deliberately said "unique identifier"; you seem to think saying SHA-1
> brings something magical but I do not think it needs to be even blob's
> SHA-1.  Hashing that much data costs.
>
> In any case, you can have a script (or client-side hook) that does:
>
>    (1) find the out-of-tree symlinks in the index (or in the work tree);
>
>    (2) if it is dangling, and if you have definition of where to get that
>        hierarchy from (e.g ../media), run rsync or wget or whatever
>        external means to grab it.
>
> and call it after "git pull" updates from some other place.  The "git
> media" of Scott's message could be an alias to such a command.
>
> Adding a new type "external-blob" would be an unwelcome pain.  Reusing
> "blob" so that existing "blob" codepath now needs to notice special "0"
> that is not length "0" is even bigger pain than that.
>
> And that is a pain for unknown benefit, especially when you can start
> experimenting without any changes to the existing data structure.  In the
> worst case, the experiment may not pan out as well as you hoped and if
> that is the end of the story, so be it.  It is not a great loss.  If it
> works well enough and we can have the external large media support without
> any changes to the data structure, that would be really great.  If it
> sort-of works but hits limitation, we can analyze how best to overcome
> that limitation, and at that time it _might_ turn out to be the best
> approach to introduce a new blob type.
>
> But I do not think we know that yet.
>
> In the longer run, as you speculated in your message, I think the native
> blob codepaths need to be updated to tolerate large, unmappable objects
> better.  With that goal in mind, I think it is a huge mistake to
> prematurely introduce arbitrary distinct "blob" and "large blob" types,
> if in the end they need to be merged back again; it would force the future
> code indefinitely to care about the historical "large blob" types that were
> once supported.
>
>> 2. since it knows the sha1 of the real file, it can auto-create the
>> real file as needed, without wasting space on too many copies of it.
>
> Hmm, since when SHA-1 is reversible?
>

* Re: [PATCH v2] Introduce %<branch> as shortcut to the tracked branch
From: Johannes Schindelin @ 2009-03-20  0:23 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Petr Baudis, Andreas Gruenbacher, git
In-Reply-To: <alpine.DEB.1.00.0903182343580.10279@pacific.mpi-cbg.de>

Hi,

On Wed, 18 Mar 2009, Johannes Schindelin wrote:

> On Wed, 18 Mar 2009, Junio C Hamano wrote:
> 
> > Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
> > 
> > > Suggested by Pasky.
> > >
> > > Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> > 
> > In the longer term who suggested matters much less than why such a 
> > feature is desirable, how it is used, and without it what is impossible 
> > and/or cumbersome.  What's the motivation behind this?
> > 
> > You do not have to explain it to me, but you should explain it to the 
> > history that records this commit, and to the users who read doccos.
> 
> And that's not all... Documentation updates and tests for % and %<branch> 
> are missing, too.

Darn, darn, DARN!

Just when I squeezed that half an hour from the time I have to sleep, to 
provide documentation and tests, _just_ after I finished that, I got the 
idea that '%' might not be a 'bad ref char' after all.

And of course I was correct.

Just try this:

	$ git checkout -b %helloworld

and weep... so, no v3 of that patch, even if I have it right here.

Ciao,
Dscho "who goes to bed being frustrated"

* Re: Git Large Object Support Proposal
From: david @ 2009-03-20  0:23 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Scott Chacon, git list
In-Reply-To: <7vtz5p59zp.fsf@gitster.siamese.dyndns.org>

On Thu, 19 Mar 2009, Junio C Hamano wrote:

> david@lang.hm writes:
>
>> On Thu, 19 Mar 2009, Junio C Hamano wrote:
>>
>>> Scott Chacon <schacon@gmail.com> writes:
>>>
>>>> The point is that we don't keep this data as 'blob's - we don't try to
>>>> compress them or add the header to them, they're too big and already
>>>> compressed, it's a waste of time and often outside the memory
>>>> tolerance of many systems. We keep only the stub in our db and stream
>>>> the large media content directly to and from disk.  If we do a
>>>> 'checkout' or something that would switch it out, we could store the
>>>> data in '.git/media' or the equivalent until it's uploaded elsewhere.
>>>
>>> Aha, that sounds like you can just maintain a set of out-of-tree symbolic
>>> links that you keep track of, and let other people (e.g. rsync) deal with
>>> the complexity of managing that side of the world.
>>>
>>> And I think you can start experimenting it without any change to the core
>>> datastructures.  In your single-page web site in which its sole html file
>>> embeds an mpeg movie, you keep track of these two things in git:
>>>
>>> 	porn-of-the-day.html
>>>        porn-of-the-day.mpg -> ../media/6066f5ae75ec.mpg
>>>
>>> and any time you want to feed a new movie, you update the symlink to a
>>> different one that lives outside the source-controlled tree, while
>>> arranging the link target to be updated out-of-band.
>>
>> that would work, but the proposed change has some advantages
>>
>> 1. you store the sha1 of the real mpg in the 'large file' blob so you
>> can detect problems
>
> You store the unique identifier of the real mpg in the symbolic link
> target which is a blob payload, so you can detect problems already.  I
> deliberately said "unique identifier"; you seem to think saying SHA-1
> brings something magical but I do not think it needs to be even blob's
> SHA-1.  Hashing that much data costs.

but hashing the data and using that as the unique identifier gives you 
some advantages.

1. you can detect file corruption

2. you can trivially detect duplicates (even if the duplicates come from 
different sources)

3. it's repeatable (you will always get the same hash from the same input)

> In any case, you can have a script (or client-side hook) that does:
>
>    (1) find the out-of-tree symlinks in the index (or in the work tree);
>
>    (2) if it is dangling, and if you have definition of where to get that
>        hierarchy from (e.g ../media), run rsync or wget or whatever
>        external means to grab it.
>
> and call it after "git pull" updates from some other place.  The "git
> media" of Scott's message could be an alias to such a command.
>
> Adding a new type "external-blob" would be an unwelcome pain.  Reusing
> "blob" so that existing "blob" codepath now needs to notice special "0"
> that is not length "0" is even bigger pain than that.
>
> And that is a pain for unknown benefit, especially when you can start
> experimenting without any changes to the existing data structure.  In the
> worst case, the experiment may not pan out as well as you hoped and if
> that is the end of the story, so be it.  It is not a great loss.  If it
> works well enough and we can have the external large media support without
> any changes to the data structure, that would be really great.  If it
> sort-of works but hits limitation, we can analyze how best to overcome
> that limitation, and at that time it _might_ turn out to be the best
> approach to introduce a new blob type.
>
> But I do not think we know that yet.
>
> In the longer run, as you speculated in your message, I think the native
> blob codepaths need to be updated to tolerate large, unmappable objects
> better.  With that goal in mind, I think it is a huge mistake to
> prematurely introduce arbitrary distinct "blob" and "large blob" types,
> if in the end they need to be merged back again; it would force the future
> code indefinitely to care about the historical "large blob" types that were
> once supported.

valid point.

keep in mind that what's a "large, unmappable object" on one system may be 
no problem on another.

>> 2. since it knows the sha1 of the real file, it can auto-create the
>> real file as needed, without wasting space on too many copies of it.
>
> Hmm, since when SHA-1 is reversible?

when it is processing a new, unknown file it can hash it, and look to see 
if a file with that hash exists. if so the work is done, if not it can 
create a file with that hash.
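
Roughly, as a sketch only: the function name store and the flat directory layout are assumptions for illustration, and a real tool would hash in chunks rather than hold the whole file in memory.

```python
import hashlib
import os
import tempfile


def store(media_dir: str, data: bytes) -> tuple:
    """Store data under its SHA-1 name; return (hash, created)."""
    digest = hashlib.sha1(data).hexdigest()
    path = os.path.join(media_dir, digest)
    if os.path.exists(path):
        # A file with that hash already exists, so the work is done.
        return digest, False
    with open(path, "wb") as f:
        f.write(data)
    return digest, True


media_dir = tempfile.mkdtemp()
first = store(media_dir, b"movie bytes")
second = store(media_dir, b"movie bytes")  # same content, no second copy
print(first[1], second[1], len(os.listdir(media_dir)))
```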

by far the best long-term option would be to make all the codepaths handle 
unmappable files, the question is how large a task that would be.

David Lang


* Re: Google Summer of Code 2009: GIT
From: Johannes Schindelin @ 2009-03-20  0:30 UTC (permalink / raw)
  To: david; +Cc: saurabh gupta, Junio C Hamano, git
In-Reply-To: <alpine.DEB.1.10.0903191652500.16753@asgard.lang.hm>

Hi,

On Thu, 19 Mar 2009, david@lang.hm wrote:

> I see good support for XML being a superset of what's needed to support 
> ODF or SVG, not a subset.

No, not at all.  If we can get away with the default 3-way merge of Git, 
the generic XML merge driver be damned.

I'd rather have more file types supported that are useful for the average 
user, than a generic XML merge driver that is useful to only a handful of 
people.

Ciao,
Dscho


* ref name troubles, was Re: [PATCH v2] Introduce %<branch> as shortcut to the tracked branch
From: Johannes Schindelin @ 2009-03-20  0:38 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Petr Baudis, Andreas Gruenbacher, git
In-Reply-To: <alpine.DEB.1.00.0903200121330.10279@pacific.mpi-cbg.de>

Hi,

On Fri, 20 Mar 2009, Johannes Schindelin wrote:

> Just try this:
> 
> 	$ git checkout -b %helloworld

It gets worse.  Much worse.

Try this (triggered by a comment by Ilari on IRC):

	$ git checkout -b '@{1}'

That _works_! WTH?

Ciao,
Dscho


* Re: ref name troubles, was Re: [PATCH v2] Introduce %<branch> as shortcut to the tracked branch
From: Shawn O. Pearce @ 2009-03-20  0:40 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Junio C Hamano, Petr Baudis, Andreas Gruenbacher, git
In-Reply-To: <alpine.DEB.1.00.0903200137230.10279@pacific.mpi-cbg.de>

Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote:
> On Fri, 20 Mar 2009, Johannes Schindelin wrote:
> 
> > Just try this:
> > 
> > 	$ git checkout -b %helloworld
> 
> It gets worse.  Much worse.
> 
> Try this (triggered by a comment by Ilari on IRC):
> 
> 	$ git checkout -b '@{1}'
> 
> That _works_! WTH?

'@' is not reserved.  Neither is '{' or '}'.  Neither is
the combination.

Waaaaaay back when I added reflog query syntax I tried to use only
'@', people with branch names like 'foo@bar' made a point that they
didn't want to reserve it.  We stuck the {} in as a "highly unlikely
to conflict with a branch name" and others pointed out most shells
will do fun things with those, but we kept it to avoid ambiguous
meanings of "foo@noon" when foo@noon is already a branch.

Fast-forward more than 2 years, and the "@{...}" syntax is quite
widely used, perhaps more so than "@" in a branch name.  But it's
still not reserved.

So yea, you can create a branch named "foo@{1}".

$ git branch foo@{1}
$ git branch
  cache-walk
  foo@{1}
  master
* mergebase-bug
  mw/blame
  rr/compareeditor
  transport-mirror
  worktree-api
  worktree-edit

Yay.

-- 
Shawn.


* Re: Git Large Object Support Proposal
From: Junio C Hamano @ 2009-03-20  0:41 UTC (permalink / raw)
  To: Scott Chacon; +Cc: git list
In-Reply-To: <7vzlfh5b7y.fsf@gitster.siamese.dyndns.org>

Junio C Hamano <gitster@pobox.com> writes:

> Scott Chacon <schacon@gmail.com> writes:
>
>> The point is that we don't keep this data as 'blob's - we don't try to
>> compress them or add the header to them, they're too big and already
>> compressed, it's a waste of time and often outside the memory
>> tolerance of many systems. We keep only the stub in our db and stream
>> the large media content directly to and from disk.  If we do a
>> 'checkout' or something that would switch it out, we could store the
>> data in '.git/media' or the equivalent until it's uploaded elsewhere.
>
> Aha, that sounds like you can just maintain a set of out-of-tree symbolic
> links that you keep track of, and let other people (e.g. rsync) deal with
> the complexity of managing that side of the world.
>
> And I think you can start experimenting it without any change to the core
> datastructures.  In your single-page web site in which its sole html file
> embeds an mpeg movie, you keep track of these two things in git:
>
> 	porn-of-the-day.html
>       porn-of-the-day.mpg -> ../media/6066f5ae75ec.mpg
>
> and any time you want to feed a new movie, you update the symlink to a
> different one that lives outside the source-controlled tree, while
> arranging the link target to be updated out-of-band.

I wasn't thinking clearly.

This is not really a new "huge blob" type but is just a slightly different
flavor of symbolic link.  Its link target name may resemble SHA-1 object
name, but it does not participate in the reachability computation.  It
won't be fetched nor pushed, and if you ever get one via the usual git
codepath into your object store, it will be subject to "git gc", but you
are unlikely to place it inside your object store to begin with.  You have
something like:

    100644 2222222222222222222222222222222222222222 porn-of-the-day.html
    120001 5ed22400803161de2f49331d005be424b7f6d036 porn-of-the-day.mpg

where 5ed22400803161de2f49331d005be424b7f6d036 is a blob that stores the
name of a regular blob object, 6ff87c4664981e4397625791c8ea3bbb5f2279a3,
in your tree object (and in the index), and:

 * When running "git media", you have a configuration to tell it where the
   external media files are kept (e.g. ../media in the previous example),
   and it rsyncs to ../media/6ff87c4664981e4397625791c8ea3bbb5f2279a3 in
   some unspecified way from some unspecified place;

 * When checking out porn-of-the-day.mpg, it becomes a symbolic link that
   points at ../media/6ff87c4664981e4397625791c8ea3bbb5f2279a3 (because it
   follows the same site-specific configuration);

 * When comparing the index (that records the 120001 "slightly different
   symbolic link" entry with the shell blob object) and the work tree
   (that has a symbolic link that points at ../media/6ff87c46649...), you
   do not look at the contents of the ../media/6ff87c46649... file, but
   you do look at its name, apply a reverse of the mapping "checkout"
   codepath did to arrive at 6ff87c4664981e4397625791c8ea3bbb5f2279a3
   SHA-1, compare that with what the shell blob object records.  If you
   updated the symbolic link in the work tree, "git add" would result in
   creating a new shell object (just like when you change the link target
   for a normal symbolic link) that records the external blob.
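
The checkout mapping and its reverse in the last two points could be sketched like this. MEDIA_DIR and all three function names are hypothetical; only the ../media layout and the example identifiers come from the text above.

```python
import posixpath

MEDIA_DIR = "../media"  # hypothetical site-specific configuration


def link_target(media_id: str) -> str:
    """What "checkout" would write as the symlink target."""
    return posixpath.join(MEDIA_DIR, media_id)


def media_id_from_target(target: str):
    """Reverse of link_target(); None if the link points somewhere else."""
    head, tail = posixpath.split(target)
    if head != MEDIA_DIR or not tail:
        return None
    return tail


def is_modified(recorded_id: str, worktree_target: str) -> bool:
    """Compare by name only; the media file's contents are never read."""
    return media_id_from_target(worktree_target) != recorded_id


recorded = "6ff87c4664981e4397625791c8ea3bbb5f2279a3"
print(is_modified(recorded, link_target(recorded)))
print(is_modified(recorded, "../media/6066f5ae75ec.mpg"))
```

The comparison never opens the file under ../media, which is the point: the index and the work tree agree or disagree on the name alone.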

It still is bothersome that we need to introduce a new tree nodetype
(rather, a new blob subtype similar to "regular file blob", "symlink
blob"), but it is of much less impact than what I originally
misunderstood.

Having said that, if that is what is happening, I do not see the need for
the payload to be even a blob SHA-1 name.  Any identifier that is
convenient to generate in the application domain could do.

But that is a minor detail that immediately popped at me; there may be
other minor details I may find objectionable later.  But overall, I think
your proposal makes sense.

I still think a large part of preliminary experiments to see the benefit
of this approach can and should be done without and before touching the
core part (like introduction of the slightly different symlink 120001
mode), though.
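
One such experiment, sketched under assumptions: a script that walks the work tree and reports symlinks that point outside it at something not yet present, which is the set an external fetcher (rsync, wget, whatever) would then have to fill in. The function name and the demo layout are made up for illustration.

```python
import os
import tempfile


def dangling_out_of_tree_links(worktree: str) -> list:
    """Relative paths of symlinks that leave the work tree and point at nothing."""
    root = os.path.realpath(worktree)
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        if ".git" in dirnames:
            dirnames.remove(".git")  # do not descend into the repository itself
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                continue
            target = os.path.normpath(os.path.join(dirpath, os.readlink(path)))
            outside = os.path.commonpath([root, target]) != root
            if outside and not os.path.exists(target):
                found.append(os.path.relpath(path, root))
    return sorted(found)


# Demo layout: base/site is the work tree, base/media lives outside it.
base = tempfile.mkdtemp()
tree = os.path.join(base, "site")
os.makedirs(os.path.join(base, "media"))
os.makedirs(tree)
open(os.path.join(base, "media", "have.mpg"), "wb").close()
os.symlink("../media/have.mpg", os.path.join(tree, "have.mpg"))        # already fetched
os.symlink("../media/missing.mpg", os.path.join(tree, "missing.mpg"))  # still dangling
print(dangling_out_of_tree_links(tree))
```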


* Re: ref name troubles, was Re: [PATCH v2] Introduce %<branch> as shortcut to the tracked branch
From: Shawn O. Pearce @ 2009-03-20  0:44 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Junio C Hamano, Petr Baudis, Andreas Gruenbacher, git
In-Reply-To: <20090320004029.GX23521@spearce.org>

"Shawn O. Pearce" <spearce@spearce.org> wrote:
> Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote:
> > On Fri, 20 Mar 2009, Johannes Schindelin wrote:
> > 
> > > Just try this:
> > > 
> > > 	$ git checkout -b %helloworld
> > 
> > It gets worse.  Much worse.
> > 
> > Try this (triggered by a comment by Ilari on IRC):
> > 
> > 	$ git checkout -b '@{1}'
> > 
> > That _works_! WTH?
> 
> '@' is not reserved.  Neither is '{' or '}'.  Neither is
> the combination.

In hindsight, I wish we had reserved all of the "fun" characters
like !@#$%^&*():;~'"\ and prevented them from ever appearing in a
ref name.

Instead only what check_ref_format() in refs.c ll.694 tells us
is reserved:

 671 /*
 672  * Make sure "ref" is something reasonable to have under ".git/refs/";
 673  * We do not like it if:
 674  *
 675  * - any path component of it begins with ".", or
 676  * - it has double dots "..", or
 677  * - it has ASCII control character, "~", "^", ":" or SP, anywhere, or
 678  * - it ends with a "/".
 679  */

Heh.  At least : and SP are reserved.

Use BEL for your %helloworld hack.  It'll be fun to type.  :-)
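
For illustration only, here are the four rules from that comment as a Python sketch. This is not the C function itself, the name violates_quoted_rules is made up, and treating DEL (0x7f) as a control character is an assumption:

```python
def violates_quoted_rules(ref: str) -> bool:
    """True if ref breaks one of the four rules in the quoted comment."""
    if any(part.startswith(".") for part in ref.split("/")):
        return True  # a path component begins with "."
    if ".." in ref:
        return True  # double dots
    if any(ord(ch) < 0x20 or ord(ch) == 0x7F or ch in "~^: " for ch in ref):
        return True  # ASCII control character, "~", "^", ":" or SP
    return ref.endswith("/")  # trailing slash


for name in ["%helloworld", "@{1}", "foo@{1}", "foo..bar", ".hidden", "a b", "x\x07y"]:
    print(repr(name), violates_quoted_rules(name))
```

Under these rules "%helloworld", "@{1}" and "foo@{1}" all pass, while a name containing BEL (\x07) is rejected, so BEL could never collide with an existing ref name.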

-- 
Shawn.


* git-gui: some French translation enhancements
From: Nicolas Sebrecht @ 2009-03-20  0:54 UTC (permalink / raw)
  To: Git List; +Cc: Sam Hocevar, Christian Couder, Alexandre Bourget
In-Reply-To: <20090318205410.GA900@zoy.org>


This one is built on top of Sam's previous patch.


* [PATCH 2/2] git-gui: some French translation enhancements
From: Nicolas Sebrecht @ 2009-03-20  0:54 UTC (permalink / raw)
  To: Git List; +Cc: Sam Hocevar, Christian Couder, Alexandre Bourget,
	Nicolas Sebrecht
In-Reply-To: <20090318205410.GA900@zoy.org>



Signed-off-by: Nicolas Sebrecht <nicolas.s-dev@laposte.net>
---
 po/fr.po |   21 ++++++++++-----------
 1 files changed, 10 insertions(+), 11 deletions(-)

diff --git a/po/fr.po b/po/fr.po
index eb5f68e..2f96054 100644
--- a/po/fr.po
+++ b/po/fr.po
@@ -316,11 +316,11 @@ msgstr "Indexer toutes modifications"
 
 #: git-gui.sh:2479
 msgid "Unstage From Commit"
-msgstr "Désindexer"
+msgstr "Retirer de l'index"
 
 #: git-gui.sh:2484 lib/index.tcl:410
 msgid "Revert Changes"
-msgstr "Révoquer les modifications"
+msgstr "Inverser les modifications (revert)"
 
 #: git-gui.sh:2491 git-gui.sh:3069
 msgid "Show Less Context"
@@ -485,11 +485,11 @@ msgstr "Revenir à la version de base"
 
 #: git-gui.sh:3169
 msgid "Unstage Hunk From Commit"
-msgstr "Désindexer la section"
+msgstr "Enlever la section de l'index"
 
 #: git-gui.sh:3170
 msgid "Unstage Line From Commit"
-msgstr "Désindexer la ligne"
+msgstr "Enlever la ligne de l'index"
 
 #: git-gui.sh:3172
 msgid "Stage Hunk For Commit"
@@ -1705,7 +1705,7 @@ msgstr "Déverrouiller l'index"
 #: lib/index.tcl:287
 #, tcl-format
 msgid "Unstaging %s from commit"
-msgstr "Désindexation de : %s"
+msgstr "Enlève %s de l'index"
 
 #: lib/index.tcl:326
 msgid "Ready to commit."
@@ -1719,18 +1719,17 @@ msgstr "Ajout de %s"
 #: lib/index.tcl:396
 #, tcl-format
 msgid "Revert changes in file %s?"
-msgstr "Révoquer les modifications dans le fichier %s ? "
+msgstr "Inverser les modifications dans le fichier %s ? "
 
 #: lib/index.tcl:398
 #, tcl-format
 msgid "Revert changes in these %i files?"
-msgstr "Révoquer les modifications dans ces %i fichiers ?"
+msgstr "Inverser les modifications dans ces %i fichiers ?"
 
 #: lib/index.tcl:406
 msgid "Any unstaged changes will be permanently lost by the revert."
 msgstr ""
-"Toutes les modifications non-indexées seront définitivement perdues par "
-"la révocation."
+"Toutes les modifications non-indexées seront définitivement perdues."
 
 #: lib/index.tcl:409
 msgid "Do Nothing"
@@ -1738,12 +1737,12 @@ msgstr "Ne rien faire"
 
 #: lib/index.tcl:427
 msgid "Reverting selected files"
-msgstr "Révocation en cours des fichiers selectionnés"
+msgstr "Inversion en cours des fichiers selectionnés"
 
 #: lib/index.tcl:431
 #, tcl-format
 msgid "Reverting %s"
-msgstr "Révocation en cours de %s"
+msgstr "Inversion en cours de %s"
 
 #: lib/merge.tcl:13
 msgid ""
-- 
1.6.2.169.g92418


* Re: git am from scratch
From: Andreas Gruenbacher @ 2009-03-20  1:06 UTC (permalink / raw)
  To: Jeff King; +Cc: git
In-Reply-To: <20090319210214.GA17589@coredump.intra.peff.net>

On Thursday, 19 March 2009 22:02:14 Jeff King wrote:
> Yikes. Out of curiosity, what did you use to do the CVS import?

git-cvsimport

Andreas


* How to commit changes if remote repository changed directory structure?
From: andholt @ 2009-03-20  1:17 UTC (permalink / raw)
  To: git


I have a lot of local changes to add, commit, and push. Right now our
directory structure is 1/2/3. Another developer decided to move everything
up one level, so used git mv to move 3 to 2, and removed 3, so now the
level is 1/2. However, locally, all of my changes are in 1/2/3. 

I want to commit my changes and merge them into the new directory structure.
How would I go about doing that?

Thanks!
-- 
View this message in context: http://www.nabble.com/How-to-commit-changes-if-remote-repository-changed-directory-structure--tp22612715p22612715.html
Sent from the git mailing list archive at Nabble.com.


* Re: Google Summer of Code 2009: GIT
From: david @ 2009-03-20  3:09 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: saurabh gupta, Junio C Hamano, git
In-Reply-To: <alpine.DEB.1.00.0903200128020.10279@pacific.mpi-cbg.de>

On Fri, 20 Mar 2009, Johannes Schindelin wrote:

> On Thu, 19 Mar 2009, david@lang.hm wrote:
>
>> I see good support for XML being a superset of what's needed to support
>> ODF or SVG, not a subset.
>
> No, not at all.  If we can get away with the default 3-way merge of Git,
> the generic XML merge driver be damned.

I would agree, but unless you skip auto-merging entirely and punt 
everything to the 'conflict resolution tool', the existing merge drivers 
won't work for a structured file. And if you do want to do that, it's not 
a git project, it's a project for whatever tool you are working from to be 
the GUI plus (possibly) a smidge of scripting to call that tool from git.

> I'd rather have more file types supported that are useful for the average
> user, than a generic XML merge driver that is useful to only a handful of
> people.

we are both after the same thing, the most use to the average user.

you look at SVG, ODF word, ODF spreadsheet, OOXML, etc as completely 
separate things that should have support developed separately.

I look at the same formats and am seeing a strong similarity between them. 
that being that they are all structured XML. so if you get the ability to 
handle XML in a configurable way (and define the appropriate 
configurations), you not only get the tools for these things, but many 
others as well.

I would be a little disappointed if the result of the summer only handled 
XML files (and more so if it only handled a handful of popular XML-based 
files). I think that there are a number of file types that aren't handled 
well by the current merge drivers. Saurabh has voiced the opinion that 
many of these have similar problems as the XML situations, so it may end 
up making sense to handle them in the same driver.

it would probably be a good thing to see suggestions from a bunch of 
people as to what file types they see being useful.

David Lang


* [PATCH] Documentation: Reworded example text in git-bisect.txt.
From: David J. Mellor @ 2009-03-20  3:35 UTC (permalink / raw)
  To: gitster; +Cc: git

Reworded to avoid splitting sentences across examples of command usage.

Signed-off-by: David J. Mellor <dmellor@whistlingcat.com>
---
 Documentation/git-bisect.txt |   44 ++++++++++++++++++++++-------------------
 1 files changed, 24 insertions(+), 20 deletions(-)

diff --git a/Documentation/git-bisect.txt b/Documentation/git-bisect.txt
index 1a4a527..93d9fc0 100644
--- a/Documentation/git-bisect.txt
+++ b/Documentation/git-bisect.txt
@@ -50,28 +50,29 @@ $ git bisect good v2.6.13-rc2    # v2.6.13-rc2 was the last version
 ------------------------------------------------
 
 When you have specified at least one bad and one good version, the
-command bisects the revision tree and outputs something similar to:
+command bisects the revision tree and outputs something similar to
+the following:
 
 ------------------------------------------------
 Bisecting: 675 revisions left to test after this
 ------------------------------------------------
 
-and then checks out the state in the middle. You would now compile
-that kernel and boot it. If the booted kernel works correctly, you
-would then issue the following command:
+The state in the middle of the set of revisions is then checked out.
+You would now compile that kernel and boot it. If the booted kernel
+works correctly, you would then issue the following command:
 
 ------------------------------------------------
 $ git bisect good			# this one is good
 ------------------------------------------------
 
-which would then output something similar to:
+The output of this command would be something similar to the following:
 
 ------------------------------------------------
 Bisecting: 337 revisions left to test after this
 ------------------------------------------------
 
-and you continue along, compiling that one, testing it, and depending
-on whether it is good or bad issuing the command "git bisect good"
+You keep repeating this process, compiling the tree, testing it, and
+depending on whether it is good or bad issuing the command "git bisect good"
 or "git bisect bad" to ask for the next bisection.
 
 Eventually there will be no more revisions left to bisect, and you
@@ -81,7 +82,7 @@ Bisect reset
 ~~~~~~~~~~~~
 
 To return to the original head after a bisect session, you issue the
-command:
+following command:
 
 ------------------------------------------------
 $ git bisect reset
@@ -94,14 +95,14 @@ the bisection state).
 Bisect visualize
 ~~~~~~~~~~~~~~~~
 
-During the bisection process, you issue the command:
+To see the currently remaining suspects in 'gitk', the following command
+is issued during the bisection process:
 
 ------------
 $ git bisect visualize
 ------------
 
-to see the currently remaining suspects in 'gitk'.  `view` may also
-be used as a synonym for `visualize`.
+`view` may also be used as a synonym for `visualize`.
 
 If the 'DISPLAY' environment variable is not set, 'git log' is used
 instead.  You can also give command line options such as `-p` and
@@ -114,16 +115,17 @@ $ git bisect view --stat
 Bisect log and bisect replay
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
-After having marked revisions as good or bad, then:
+After having marked revisions as good or bad, you issue the following
+command to show what has been done so far:
 
 ------------
 $ git bisect log
 ------------
 
-shows what you have done so far. If you discover that you made a mistake
-in specifying the status of a revision, you can save the output of this
-command to a file, edit it to remove the incorrect entries, and then issue
-the following commands to return to a corrected state:
+If you discover that you made a mistake in specifying the status of a
+revision, you can save the output of this command to a file, edit it to
+remove the incorrect entries, and then issue the following commands to
+return to a corrected state:
 
 ------------
 $ git bisect reset
@@ -173,8 +175,8 @@ using the "'<commit1>'..'<commit2>'" notation. For example:
 $ git bisect skip v2.5..v2.6
 ------------
 
-would mean that no commit between `v2.5` excluded and `v2.6` included
-can be tested.
+The effect of this would be that no commit between `v2.5` excluded and
+`v2.6` included could be tested.
 
 Note that if you also want to skip the first commit of the range you
 would issue the command:
@@ -183,14 +185,16 @@ would issue the command:
 $ git bisect skip v2.5 v2.5..v2.6
 ------------
 
-and the commit pointed to by `v2.5` would also be skipped.
+This would cause the commits between `v2.5` included and `v2.6` included
+to be skipped.
+
 
 Cutting down bisection by giving more parameters to bisect start
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 You can further cut down the number of trials, if you know what part of
 the tree is involved in the problem you are tracking down, by specifying
-path parameters when issuing the `bisect start` command, like this:
+path parameters when issuing the `bisect start` command:
 
 ------------
 $ git bisect start -- arch/i386 include/asm-i386
-- 
1.6.2.1


* [PATCH 2/5] git-repack.sh: don't use --kept-pack-only option to pack-objects
From: Brandon Casey @ 2009-03-20  3:47 UTC (permalink / raw)
  To: gitster; +Cc: git, drafnel
In-Reply-To: <t_s5aa51o2kq_ePRWgLTEpak5ue1ZM7YICzIF-RsnmN68psiOC0Tnz9bsH5tTxgVEU0bxG-OtJ8@cipher.nrlssc.navy.mil>

The --kept-pack-only option to pack-objects treats all kept packs as equal.
This results in objects that reside in an alternate pack that has a .keep
file, not being packed into a newly created pack when the user specifies the
-a option to repack.  Since the user may not have any control over the
alternate database, git should not refrain from repacking those objects
even though they are in a pack with a .keep file.

This fixes the 'packed obs in alternate ODB kept pack are repacked' test in
t7700.
---
 git-repack.sh     |    6 +-----
 t/t7700-repack.sh |    2 +-
 2 files changed, 2 insertions(+), 6 deletions(-)

diff --git a/git-repack.sh b/git-repack.sh
index 0144c2d..1782a23 100755
--- a/git-repack.sh
+++ b/git-repack.sh
@@ -71,11 +71,7 @@ case ",$all_into_one," in
 				existing="$existing $e"
 			fi
 		done
-		if test -n "$existing"
-		then
-			args="--kept-pack-only"
-		fi
-		if test -n "$args" -a -n "$unpack_unreachable" -a \
+		if test -n "$existing" -a -n "$unpack_unreachable" -a \
 			-n "$remove_redundant"
 		then
 			args="$args $unpack_unreachable"
diff --git a/t/t7700-repack.sh b/t/t7700-repack.sh
index e869995..1242c9d 100755
--- a/t/t7700-repack.sh
+++ b/t/t7700-repack.sh
@@ -88,7 +88,7 @@ test_expect_failure 'packed obs in alt ODB are repacked when local repo has pack
 	done
 '
 
-test_expect_failure 'packed obs in alternate ODB kept pack are repacked' '
+test_expect_success 'packed obs in alternate ODB kept pack are repacked' '
 	# swap the .keep so the commit object is in the pack with .keep
 	for p in alt_objects/pack/*.pack
 	do
-- 
1.6.2.16.geb16e


* [PATCH 3/5] pack-objects: only repack or loosen objects residing in "local" packs
From: Brandon Casey @ 2009-03-20  3:47 UTC (permalink / raw)
  To: gitster; +Cc: git, drafnel
In-Reply-To: <t_s5aa51o2kq_ePRWgLTEg6KbvKii55gDA1y-1oKgx9KP4EKyrqg8sDFaph97G5MPoLgUx_vx48@cipher.nrlssc.navy.mil>

These two features were invented for use by repack when repack will delete
the local packs that have been made redundant.  The packs accessible
through alternates are not deleted by repack, so the objects contained in
them are still accessible after the local packs are deleted.  They do not
need to be repacked into the new pack or loosened.  For the case of
loosening they would immediately be deleted by the subsequent prune-packed
that is called by repack anyway.

This fixes the test
'packed unreachable obs in alternate ODB are not loosened' in t7700.
---
 builtin-pack-objects.c |    4 ++--
 t/t7700-repack.sh      |    2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/builtin-pack-objects.c b/builtin-pack-objects.c
index 1c6d2c4..22d69ef 100644
--- a/builtin-pack-objects.c
+++ b/builtin-pack-objects.c
@@ -1966,7 +1966,7 @@ static void add_objects_in_unpacked_packs(struct rev_info *revs)
 		const unsigned char *sha1;
 		struct object *o;
 
-		if (p->pack_keep)
+		if (!p->pack_local || p->pack_keep)
 			continue;
 		if (open_pack_index(p))
 			die("cannot open pack index");
@@ -2002,7 +2002,7 @@ static void loosen_unused_packed_objects(struct rev_info *revs)
 	const unsigned char *sha1;
 
 	for (p = packed_git; p; p = p->next) {
-		if (p->pack_keep)
+		if (!p->pack_local || p->pack_keep)
 			continue;
 
 		if (open_pack_index(p))
diff --git a/t/t7700-repack.sh b/t/t7700-repack.sh
index 1242c9d..31e6d22 100755
--- a/t/t7700-repack.sh
+++ b/t/t7700-repack.sh
@@ -113,7 +113,7 @@ test_expect_success 'packed obs in alternate ODB kept pack are repacked' '
 	done
 '
 
-test_expect_failure 'packed unreachable obs in alternate ODB are not loosened' '
+test_expect_success 'packed unreachable obs in alternate ODB are not loosened' '
 	rm -f alt_objects/pack/*.keep &&
 	mv .git/objects/pack/* alt_objects/pack/ &&
 	csha1=$(git rev-parse HEAD^{commit}) &&
-- 
1.6.2.16.geb16e


* [PATCH 1/5] t7700-repack: add two new tests demonstrating repacking flaws
From: Brandon Casey @ 2009-03-20  3:47 UTC (permalink / raw)
  To: gitster; +Cc: git, drafnel
In-Reply-To: <t_s5aa51o2kq_ePRWgLTEkVg4HqH1dQa6_mVq4djPPG4Vxylm2hNqmx7fPC2W5AsfcXg83DYbGc@cipher.nrlssc.navy.mil>

  1) The new --kept-pack-only mechanism of rev-list/pack-objects has
     replaced --unpacked=.  This new mechanism does not operate solely on
     "local" packs now.  The result is that objects residing in an alternate
     pack which has a .keep file will not be repacked with repack -a.

     This flaw is only apparent when a commit object is the one residing in
     an alternate kept pack.

  2) The 'repack unpacked objects' and 'loosen unpacked objects' mechanisms
     of pack-objects, i.e. --keep-unreachable and --unpack-unreachable,
     now do not operate solely on local packs.  The --keep-unreachable
     option no longer has any callers, but --unpack-unreachable is used when
     repack is called with '-A -d' and the local repo has existing packs.
     In this case, objects residing in alternate, not-kept packs will be
     loosened, and then immediately deleted by repack's call to
     prune-packed.

     The test must manually call pack-objects to avoid the call to
     prune-packed that is made by repack when -d is used.
---
 t/t7700-repack.sh |   44 ++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 44 insertions(+), 0 deletions(-)

diff --git a/t/t7700-repack.sh b/t/t7700-repack.sh
index f5682d6..e869995 100755
--- a/t/t7700-repack.sh
+++ b/t/t7700-repack.sh
@@ -88,5 +88,49 @@ test_expect_failure 'packed obs in alt ODB are repacked when local repo has pack
 	done
 '
 
+test_expect_failure 'packed obs in alternate ODB kept pack are repacked' '
+	# swap the .keep so the commit object is in the pack with .keep
+	for p in alt_objects/pack/*.pack
+	do
+		base_name=$(basename $p .pack)
+		if test -f alt_objects/pack/$base_name.keep
+		then
+			rm alt_objects/pack/$base_name.keep
+		else
+			touch alt_objects/pack/$base_name.keep
+		fi
+	done
+	git repack -a -d &&
+	myidx=$(ls -1 .git/objects/pack/*.idx) &&
+	test -f "$myidx" &&
+	for p in alt_objects/pack/*.idx; do
+		git verify-pack -v $p | sed -n -e "/^[0-9a-f]\{40\}/p"
+	done | while read sha1 rest; do
+		if ! ( git verify-pack -v $myidx | grep "^$sha1" ); then
+			echo "Missing object in local pack: $sha1"
+			return 1
+		fi
+	done
+'
+
+test_expect_failure 'packed unreachable obs in alternate ODB are not loosened' '
+	rm -f alt_objects/pack/*.keep &&
+	mv .git/objects/pack/* alt_objects/pack/ &&
+	csha1=$(git rev-parse HEAD^{commit}) &&
+	git reset --hard HEAD^ &&
+	sleep 1 &&
+	git reflog expire --expire=now --expire-unreachable=now --all &&
+	# The pack-objects call on the next line is equivalent to
+	# git repack -A -d without the call to prune-packed
+	git pack-objects --honor-pack-keep --non-empty --all --reflog \
+	    --unpack-unreachable </dev/null pack &&
+	rm -f .git/objects/pack/* &&
+	mv pack-* .git/objects/pack/ &&
+	test 0 = $(git verify-pack -v -- .git/objects/pack/*.idx |
+		egrep "^$csha1 " | sort | uniq | wc -l) &&
+	echo > .git/objects/info/alternates &&
+	test_must_fail git show $csha1
+'
+
 test_done
 
-- 
1.6.2.16.geb16e


