Git development
* Re: Pasky problem with 'git init URL'
From: John Stoffel @ 2005-04-22 12:44 UTC (permalink / raw)
  To: Petr Baudis; +Cc: John Stoffel, Martin Schlemmer, GIT Mailing Lists
In-Reply-To: <20050421212648.GM7443@pasky.ji.cz>


Petr> Dear diary, on Thu, Apr 21, 2005 at 11:15:54PM CEST, I got a letter
Petr> where John Stoffel <john@stoffel.org> told me that...
>> >>>>> "Petr" == Petr Baudis <pasky@ucw.cz> writes:
>> 
Petr> Perhaps it would be useful to have some "command classes" (with at least
Petr> cg-*-(add|ls|rm)), like:
>> 
Petr> cg-branch-ls
Petr> cg-remote-rm
Petr> cg-tag-add
>> 
>> Does a standard like:
>> 
>> git <objecttype> <command> <args> [<obj> ...]

Petr> Isn't this basically what I was proposing? (Modulo the UI
Petr> changes related to git-pasky -> Cogito.)

I'm not quite up to speed on git; I was away on vacation when the list
started up and didn't catch it until just recently...

And it is close to what you were proposing, but instead of dashes (-)
between the command and the object, I'm proposing just a space.
Actually, I'm proposing that we decide on a grammar and its syntax,
and try to make it orthogonal and consistent.  Principle of least
surprise, etc.
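
To make the difference concrete, here is a minimal sketch (Python, purely
illustrative) that maps both spellings, the dashed cg-branch-ls form and
the space-separated "git branch ls" form, onto the same
(objecttype, command, args) triple. The function name parse_command and
its return shape are hypothetical; nothing like it exists in git or Cogito.

```python
def parse_command(argv):
    """Split a command line into (objecttype, command, args).

    Accepts either the dashed form, e.g. ["cg-branch-ls"], or the
    space-separated form, e.g. ["git", "branch", "ls"].
    Hypothetical helper, for illustration only.
    """
    if not argv:
        raise ValueError("empty command line")
    head, rest = argv[0], list(argv[1:])
    if head.startswith("cg-"):
        parts = head.split("-")
        if len(parts) != 3:
            raise ValueError("expected cg-<objecttype>-<command>")
        _, objecttype, command = parts
        return objecttype, command, rest
    if head == "git":
        if len(rest) < 2:
            raise ValueError("expected git <objecttype> <command>")
        return rest[0], rest[1], rest[2:]
    raise ValueError("unknown tool: " + head)
```

Either way the grammar is the same; only the separator differs, which is
why settling the grammar first seems like the more important decision.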

Thanks,
John


* Re: First web interface and service API draft
From: El Draper @ 2005-04-22 12:37 UTC (permalink / raw)
  To: Christian Meder; +Cc: git
In-Reply-To: <1114166517.3233.4.camel@localhost>

Christian Meder wrote:

>Comments ? Ideas ? Other feedback ?
>
>  
>

Hi guys,

New around these parts, so be gentle :-)

I would like to suggest the idea of a SOAP interface. If we are talking
about a true service-oriented API, then a way of calling a URI and
having it return a nice SOAP packet with the return data in it would be
great. If we ensured compliance with web service standards, anyone
could write a desktop client program, a web interface, or command-line
utilities (in Java, .NET, whatever they want, and for whatever
platform) that communicate with the web service and retrieve the
relevant data. You'd then have a true service interface into a Git
repository. Seeing as the idea of returning XML has already come up, I
don't think it would be a stretch to extend the web interface to return
web-service-compliant SOAP packets.

Regards,
-= El =-


* Re: First web interface and service API draft
From: Jon Seymour @ 2005-04-22 12:27 UTC (permalink / raw)
  To: Petr Baudis; +Cc: Christian Meder, git
In-Reply-To: <20050422121059.GB7173@pasky.ji.cz>

On 4/22/05, Petr Baudis <pasky@ucw.cz> wrote:
> Dear diary, on Fri, Apr 22, 2005 at 01:34:45PM CEST, I got a letter
> where Jon Seymour <jon.seymour@gmail.com> told me that...
> > On 4/22/05, Christian Meder <chris@absolutegiganten.org> wrote:
> > >
> > > Comments ? Ideas ? Other feedback ?
> > >
> >
> > I'd suggest serving XML rather than HTML and using client side XSLT to
> > transform it into HTML. ...
> 
> Why "rather than"? Why not "in addition to"?
> 
> You just append either .html or .xml, based on what you want.
> 

You are right - there is no good reason that an implementation should
not support both.

From the point of view of a specification, though, I think it would be
useful to focus on an XML content model rather than the details of one
particular HTML model - get the XML model right and you can do
whatever you like with the HTML model at any time after that.

jon.


-- 
homepage: http://www.zeta.org.au/~jon/
blog: http://orwelliantremors.blogspot.com/


* Re: First web interface and service API draft
From: Petr Baudis @ 2005-04-22 12:10 UTC (permalink / raw)
  To: Jon Seymour; +Cc: Christian Meder, git
In-Reply-To: <2cfc4032050422043419b578cd@mail.gmail.com>

Dear diary, on Fri, Apr 22, 2005 at 01:34:45PM CEST, I got a letter
where Jon Seymour <jon.seymour@gmail.com> told me that...
> On 4/22/05, Christian Meder <chris@absolutegiganten.org> wrote:
> >
> > Comments ? Ideas ? Other feedback ?
> > 
> 
> I'd suggest serving XML rather than HTML and using client side XSLT to
> transform it into HTML. Client-side XSLT works well in IE 6 and all
> versions of Firefox, so there is no question that it is a mature
> technology. Provide a fall back via server transformed HTML if need
> be, but that is trivial to do once you have the client-side XSLT
> stylesheets.
> 
> Serving XML is as easy as serving HTML and gives you a much more
> flexible outcome.

Why "rather than"? Why not "in addition to"?

You just append either .html or .xml, based on what you want.
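
A minimal sketch of that suffix convention, in Python for illustration
only. The function name split_representation is hypothetical, and the
plain-text default for a URI without a suffix is an assumption:

```python
def split_representation(path, default="text"):
    """Return (resource_path, representation) for a request path.

    A trailing .html or .xml selects the representation; anything else
    falls back to `default` (assumed here to be plain text).
    """
    for suffix, name in ((".html", "html"), (".xml", "xml")):
        if path.endswith(suffix):
            return path[: -len(suffix)], name
    return path, default
```

The resource identifier stays the same in all cases; only the suffix
decides how it is rendered.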

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor


* Re: First web interface and service API draft
From: Petr Baudis @ 2005-04-22 12:10 UTC (permalink / raw)
  To: Christian Meder; +Cc: git
In-Reply-To: <1114166517.3233.4.camel@localhost>

Dear diary, on Fri, Apr 22, 2005 at 12:41:56PM CEST, I got a letter
where Christian Meder <chris@absolutegiganten.org> told me that...
> Hi,

Hi,

> /<project>
> 
> Ok. The URI should start by stating the project name
> e.g. /linux-2.6. This does bloat the URI slightly but I don't think
> that we want to have one root namespace per git archive in the long
> run. Additionally you can always put rewriting or redirecting rules at
> the root level for additional convenience when there's an obvious
> default project.
> 
> Should provide some meta data, stats, etc. if available.

I don't think this makes much sense. I think you should just apply -p1
to all the directories, and define that there should be some / page
which should contain some metadata regarding the repository you are
accessing (probably branches, tags, and such).

> -------
> /<project>/blob/<blob-sha1>
> /<project>/commit/<commit-sha1>
> 
> These are the easy ones: the web interface should be able to spit out
> the plain text data of a blob and a commit at these URIs. Users would
> be probably scripts and other downloads.
> Open questions:
> * Blob data should be probably binary ?

What do you mean by binary?

> * Should it be commit or changeset ? Linus seems to have changed
> nomenclature in the README

We call it commit everywhere but in the README. :-)

The "changeset" name is bad anyway. It is a commit of a complete tree
state, diff against one of its parent commits is the set of changes.
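
To illustrate the distinction with a toy model (Python, illustrative
only): treat each tree as a mapping from path to blob id. A commit then
records a whole mapping, and the "set of changes" is something you compute
against a parent. The dict representation and the name changes() are
assumptions made for this sketch, not git's actual data structures.

```python
def changes(parent_tree, tree):
    """Return the set of changes that turns parent_tree into tree.

    Each tree is a dict mapping a path to a blob id. The result maps
    every changed path to an (old_id, new_id) pair, with None standing
    for "not present".
    """
    result = {}
    for path in sorted(set(parent_tree) | set(tree)):
        old, new = parent_tree.get(path), tree.get(path)
        if old != new:
            result[path] = (old, new)
    return result


parent = {"Makefile": "a1", "README": "b2"}
commit = {"Makefile": "a1", "README": "c3", "show-files.c": "d4"}
print(changes(parent, commit))
```

With two parents there are two such diffs, which is one more reason the
commit itself is better described as a tree state than as a changeset.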

> -------
> /<project>/tree/<tree-sha1>
> 
> Tree objects are served in binary form. Primary audience are scripts,
> etc. Human beings will probably get a heart attack when they
> accidentally visit this URI.

Binary form is unusable for scripts.

Anything wrong with putting ls-tree output there?


We should also have /gitobj/<sha1> for fetching the raw git objects.

> -------
> /<project>/blob/<blob-sha1>.html
> /<project>/commit/<commit-sha1>.html
> /<project>/tree/<tree-sha1>.html
> 
> A HTML version of blob, commit and tree fully linked aimed at human
> beings.

How can I imagine an "HTML version of blob"?


> -------
> /<project>/tree/<tree-sha1>/diff/<ancestor-tree-sha1>/html
> 
> Non recursive HTML view of the objects which are contained in the diff
> fully linked with the individual HTML views.

Why not .html?

> -------
> /<project>/changelog/<time-spec>

I'd personally prefer /log/, but whatever.

For consistency, I'd stay with the plaintext output by default, .html if
requested.

And I think abusing directories for this is bad. Query string seems much
more appropriate, since this is something that changes dynamically a
lot, not a permanent resource identifier.

OTOH, I'd use

	/log/<commit>

to specify what commit to start at. It just does not make sense
otherwise, you would not know where to start.

I think the <commit> should follow the same or similar rules as Cogito
id decoding. E.g. to get latest Linus' changelog, you'd do

	/log/linus

> -------
> /<project>/changelog/<time-spec>/search/<regexp>
> 
> HTML changelog for the given <time-spec> filtered by the <regexp>.
> 
> * again plain version needed ?
> 
> ------
> /<project>/changelog/<time-spec>/search/author/<regexp>
> /<project>/changelog/<time-spec>/search/committer/<regexp>
> /<project>/changelog/<time-spec>/search/signedoffby/<regexp>
> 
> convenience wrappers for generic search restricted to these fields.

Same here. just ?author=...&committer=...&signedoffby=... etc. You can
even combine several criteria.
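
A rough sketch of how such a request could be taken apart, in Python for
illustration only. It assumes the /log/<commit> form suggested above plus
the three filter names from the draft; parse_log_request is a hypothetical
name, and only the first value of each parameter is kept.

```python
from urllib.parse import urlsplit, parse_qs


def parse_log_request(uri):
    """Return (start_commit, filters) for a URI like /log/<commit>?author=..."""
    parts = urlsplit(uri)
    segments = [s for s in parts.path.split("/") if s]
    if len(segments) != 2 or segments[0] != "log":
        raise ValueError("expected /log/<commit>")
    allowed = ("author", "committer", "signedoffby")
    query = parse_qs(parts.query)
    # Keep only the known filter names; several can be combined.
    filters = {key: query[key][0] for key in allowed if key in query}
    return segments[1], filters
```

For example, /log/linus?author=pasky&signedoffby=torvalds yields the start
commit "linus" and two filters, with no extra path components needed.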

> ------
> 
> open questions:
> * how to generate and publish additional merge information ?

I don't understand....

> * how to generate and publish tree and blob history information ? This
> is probably expensive with git.

...this either.

> * how to represent branches ? should we code up the branches in the
> project id like linux-2.6-mm or whatever ?

See above.

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor


* Re: First web interface and service API draft
From: Jon Seymour @ 2005-04-22 11:34 UTC (permalink / raw)
  To: Christian Meder; +Cc: git
In-Reply-To: <1114166517.3233.4.camel@localhost>

On 4/22/05, Christian Meder <chris@absolutegiganten.org> wrote:
>
> Comments ? Ideas ? Other feedback ?
> 

I'd suggest serving XML rather than HTML and using client side XSLT to
transform it into HTML. Client-side XSLT works well in IE 6 and all
versions of Firefox, so there is no question that it is a mature
technology. Provide a fall back via server transformed HTML if need
be, but that is trivial to do once you have the client-side XSLT
stylesheets.

Serving XML is as easy as serving HTML and gives you a much more
flexible outcome.

jon.


* First web interface and service API draft
From: Christian Meder @ 2005-04-22 10:41 UTC (permalink / raw)
  To: git

Hi,

me again after a couple of hours of sleep ;-)

This probably gets a bit longer so if you are not interested in a web
service api or the web interface now is your chance to get off the
train.

I'm probably making a complete git of myself but that's not uncalled
for in this context ;-)

For those that are still with me let me start by reiterating that
I _do_ care for URIs as the primary API for web service
applications _and_ humans. I probably don't have to tell Linux people
anything about the importance of getting the API right ;-)

As it's fairly early in the web service interface cycle I'd like to change
things around a little bit and start by getting the API straight.

The following considerations should be pretty implementation agnostic
and not specific to wit. The interface should be flexible enough to be
used as a kind of web command line.

-------
/<project>

Ok. The URI should start by stating the project name
e.g. /linux-2.6. This does bloat the URI slightly but I don't think
that we want to have one root namespace per git archive in the long
run. Additionally you can always put rewriting or redirecting rules at
the root level for additional convenience when there's an obvious
default project.

Should provide some meta data, stats, etc. if available.

-------
/<project>/blob/<blob-sha1>
/<project>/commit/<commit-sha1>

These are the easy ones: the web interface should be able to spit out
the plain text data of a blob and a commit at these URIs. Users would
probably be scripts and other downloads.
Open questions:
* Blob data should probably be binary ?
* Should it be commit or changeset ? Linus seems to have changed
nomenclature in the README
* If we serve the pristine commit objects we will put the email
addresses in plain sight. If we remove or change the email addresses
it's not the original commit object anymore. Thoughts ?

-------
/<project>/tree/<tree-sha1>

Tree objects are served in binary form. Primary audience are scripts,
etc. Human beings will probably get a heart attack when they
accidentally visit this URI.

-------
/<project>/blob/<blob-sha1>.html
/<project>/commit/<commit-sha1>.html
/<project>/tree/<tree-sha1>.html

An HTML version of blob, commit and tree, fully linked, aimed at human
beings.

-------
/<project>/tree/<tree-sha1>.tar.bz2
/<project>/tree/<tree-sha1>.tar.gz
/<project>/commit/<commit-sha1>.tar.bz2
/<project>/commit/<commit-sha1>.tar.gz

Tarballs of the specified commits or trees. Note that these can be
individual subtrees too.


-------
/<project>/tree/<tree-sha1>/diff/<ancestor-tree-sha1>

Unified plain text recursive diff of the given trees. I guess the
user could specify any two tree ids but the relevance of the results
would vary greatly ;-)
* Possibly a DoS issue
* does something like /<project>/tree/<tree-sha1>/diff/ make sense
producing a full diff from scratch ?  

-------
/<project>/tree/<tree-sha1>/diff/<ancestor-tree-sha1>/html

Non recursive HTML view of the objects which are contained in the diff
fully linked with the individual HTML views.

-------
/<project>/blob/<blob-sha1>/diff/<ancestor-sha1>

Unified plain text diff of the given blobs.
* again /<project>/blob/<blob-sha1>/diff/ sensible ?

-------
/<project>/blob/<blob-sha1>/diff/<ancestor-sha1>/html

HTML view (probably colorized) of a single blob diff.

-------
/<project>/changelog/<time-spec>

HTML changelog for the given <time-spec>. I think valid values for
timespec should be number of days <nnn>d, number of entries <nnn> and
the keyword 'all'.

* perhaps additionally number of hours <nnn>h, number of months
  <nnn>m, number of years <nnn>y. Combinations shouldn't be allowed
* time ranges are probably overkill
* is a plain text version needed /<project>/changelog/<time-spec>/plain ?
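
As a sketch of the three basic <time-spec> forms (days as <nnn>d, a plain
entry count <nnn>, and the keyword 'all'), here is some illustrative
Python. The function name and the (kind, value) return shape are
hypothetical, and the optional h/m/y units are deliberately left out.

```python
import re


def parse_time_spec(spec):
    """Parse a <time-spec> into a (kind, value) pair.

    "all" -> ("all", None)
    "30d" -> ("days", 30)
    "100" -> ("entries", 100)
    Combinations such as "7d3" are rejected.
    """
    if spec == "all":
        return ("all", None)
    match = re.fullmatch(r"([0-9]+)(d?)", spec)
    if match is None:
        raise ValueError("invalid time-spec: %r" % spec)
    number, unit = int(match.group(1)), match.group(2)
    return ("days", number) if unit == "d" else ("entries", number)
```

Keeping the grammar this small makes it easy to reject anything
ambiguous before it reaches the changelog code.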

-------
/<project>/changelog/<time-spec>/search/<regexp>

HTML changelog for the given <time-spec> filtered by the <regexp>.

* again plain version needed ?

------
/<project>/changelog/<time-spec>/search/author/<regexp>
/<project>/changelog/<time-spec>/search/committer/<regexp>
/<project>/changelog/<time-spec>/search/signedoffby/<regexp>

convenience wrappers for generic search restricted to these fields.

------

open questions:
* how to generate and publish additional merge information ?
* how to generate and publish tree and blob history information ? This
is probably expensive with git.
* how to represent branches ? should we code up the branches in the
project id like linux-2.6-mm or whatever ?


Comments ? Ideas ? Other feedback ?




				Christian
  
-- 
Christian Meder, email: chris@absolutegiganten.org

The Way-Seeking Mind of a tenzo is actualized 
by rolling up your sleeves.

                (Eihei Dogen Zenji)



* Re: [ANNOUNCE] git-pasky-0.6.3 && request for testing
From: Petr Baudis @ 2005-04-22 10:37 UTC (permalink / raw)
  To: Barry K. Nathan; +Cc: git
In-Reply-To: <20050422072437.GC8467@ip68-225-251-162.oc.oc.cox.net>

Dear diary, on Fri, Apr 22, 2005 at 09:24:37AM CEST, I got a letter
where "Barry K. Nathan" <barryn@pobox.com> told me that...
> On Fri, Apr 22, 2005 at 12:16:26AM -0700, Barry K. Nathan wrote:
> > With git-pasky 0.6.3, "git log" is unusable on my Mandrake 10.1 system.
> > Basically I get a neverending flood of these until I press 'q' to quit
> > less:
> [snip sed segmentation faults which happen with 0.6.3 but not 0.6.2]
> > I'm not sure if I have time tonight (or tomorrow) to troubleshoot this
> > further, but I'll see if I can.
> 
> I had sed-4.1.1-2mdk. I downloaded sed-4.1.4-2mdk (from Mandriva 2005
> Limited Edition) and updated to that, and the problem went away.
> 
> FWIW this is the second package I've had to update to the Mandriva 2005
> LE level (the first was mktemp). I don't mind however.

Duh, segfaulting sed! Could you please check which of the sed
invocations actually segfault for you?

Thanks,

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor


* Re: "GIT_INDEX_FILE" environment variable
From: Petr Baudis @ 2005-04-22 10:35 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Linus Torvalds, Git Mailing List
In-Reply-To: <7vzmvr72j6.fsf@assigned-by-dhcp.cox.net>

Dear diary, on Fri, Apr 22, 2005 at 08:23:41AM CEST, I got a letter
where Junio C Hamano <junkio@cox.net> told me that...
> >>>>> "LT" == Linus Torvalds <torvalds@osdl.org> writes:
>  - Further admit that to support it without core layer help,
>    what Cogito layer needs to do involves quite a lot of "yuck"
>    factor.

I actually thought that I would just walk to parent directories at the
time of invocation, to find the .git directory, then save that to
$gitdir and use that to always reference to it, setting also
GIT_INDEX_FILE etc. I basically just postponed this until I have some
kind of library or something, and do all this stuff in a single common
init routine. I think it should be doable pretty well in Cogito alone,
but of course I won't mind if someone does it in git core. ;-)
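
Roughly, the parent-directory walk described above could look like the
following Python sketch (illustrative only; Cogito itself is shell, and
the function names here are hypothetical). The index and objects paths
under .git are assumptions about the default layout.

```python
import os


def find_git_dir(start):
    """Walk from `start` up through its parents; return the first .git
    directory found, or None if the filesystem root is reached."""
    current = os.path.abspath(start)
    while True:
        candidate = os.path.join(current, ".git")
        if os.path.isdir(candidate):
            return candidate
        parent = os.path.dirname(current)
        if parent == current:  # reached the filesystem root
            return None
        current = parent


def git_environment(start):
    """Build the environment a common init routine might export."""
    gitdir = find_git_dir(start)
    if gitdir is None:
        raise RuntimeError("no .git directory found above " + start)
    return {
        "GIT_INDEX_FILE": os.path.join(gitdir, "index"),
        "SHA1_FILE_DIRECTORY": os.path.join(gitdir, "objects"),
    }
```

Doing this once at startup keeps every later command free of any
knowledge about the current working directory.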

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor


* Re: proposal: delta based git archival
From: Jaime Medrano @ 2005-04-22  9:49 UTC (permalink / raw)
  To: Michel Lespinasse; +Cc: git
In-Reply-To: <20050422090341.GC22479@zoy.org>

On 4/22/05, Michel Lespinasse <walken@zoy.org> wrote:
> I noticed people on this mailing list start talking about using blob deltas
> for compression, and the basic issue that the resulting files are too small
> for efficient filesystem storage. I thought about this a little and decided
> I should send out my ideas for discussion.
> 

I've been thinking of another, simpler approach.

The main benefit of using deltas is reducing the bandwidth use in
pull/push. My idea is leaving the blob storage as it is now and
adding a new kind of object (remote) that acts as a link to an object
in another repository.

So that, when you rsync, you don't have to get all the blobs (which
can be a lot of data), but only the sha1 of the new objects created.
Then a remote object is created for each new object in the local
repository pointing to its location in the external repository.

Once the rsync is done, when git has to access any of the new objects
they can be fetched from the original location, so that only necessary
objects are transferred.

This way, the cost of a sync in terms of bandwidth is nearly zero.

I've been working on this, so if you think it to be a good idea, I can
send a patch when I get it fully working.

Regards,
Jaime Medrano.
http://jmedrano.sl-form.com


* Re: [PATCH] multi item packed files
From: Krzysztof Halasa @ 2005-04-22  9:48 UTC (permalink / raw)
  To: Chris Mason; +Cc: Linus Torvalds, git
In-Reply-To: <200504211622.48065.mason@suse.com>

Chris Mason <mason@suse.com> writes:

> Shrug, we shouldn't need help from the kernel for something like this.
> git as a database hits worst case scenarios for almost every FS.

Not sure.

> 1) subdirectories with lots of files

Correct. But git doesn't search dirs so it's not that bad.

> 2) wasted space for tiny files

... depends on block size. With 2 KB:

defiant:~$ du -s /pub/mirror/linux-2.6.git
88366   /pub/mirror/linux-2.6.git
defiant:~$ du -s --apparent-size /pub/mirror/linux-2.6.git
63400   /pub/mirror/linux-2.6.git

Not bad, is it?
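
For what it's worth, the arithmetic behind the two du figures can be
sketched like this (Python, illustrative only). The helper allocated_size
is a hypothetical name and simply models rounding each file up to whole
blocks; the two totals are copied from the du output above.

```python
def allocated_size(size, block_size):
    """Bytes actually occupied by a file of `size` bytes when space is
    handed out in whole blocks of `block_size` bytes."""
    blocks = -(-size // block_size)  # ceiling division
    return blocks * block_size


# Totals from the du output above: on-disk usage vs. apparent size.
disk_usage, apparent_size = 88366, 63400
overhead = disk_usage / apparent_size - 1
print("overhead: %.1f%%" % (overhead * 100))  # prints: overhead: 39.4%
```

So with 2 KB blocks the tiny files cost roughly 39% extra space here; a
100-byte object, for instance, still occupies a full 2048-byte block.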

> 3) files that are likely to be accessed together spread across the whole disk

... across the whole filesystem.

Well, probably it isn't best to have git and .iso archives on the same
filesystem.
-- 
Krzysztof Halasa


* Re: [PATCH] multi item packed files
From: Krzysztof Halasa @ 2005-04-22  9:40 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Chris Mason, git
In-Reply-To: <Pine.LNX.4.58.0504211301240.2344@ppc970.osdl.org>

Linus Torvalds <torvalds@osdl.org> writes:

> And dammit, if I'm the original author and likely biggest power-user, and 
> _I_ can't be bothered to use special filesystems, then who can? Nobody.

If someone is motivated enough, and if the task is quite trivial (as it
seems to be) someone may try it. I can see nothing wrong with it as long
as it doesn't affect other people.

> This is why I absolutely do not believe in arguments like "if your
> filesystem doesn't do tail packing, you shouldn't use it" or "if you
> don't have name hashing enabled in your filesystem it's broken".

Of course. But one may consider using a filesystem with, say, different
settings. Or a special filesystem for this task, such as CNFS used by
news servers (it seems news servers do much the same as git does,
except they also purge old contents, i.e., container files don't grow).

> I'm perfectly willing to optimize for the common case, but that's as far 
> as it goes. I do not want to make fundamental design decisions that depend 
> on the target filesystem having some particular feature.

The optimization would be (in) the underlying filesystem (i.e., the OS
thing, or possibly a shared preloaded library?), not git itself.
-- 
Krzysztof Halasa


* Re: "GIT_INDEX_FILE" environment variable
From: Zach Welch @ 2005-04-22  9:14 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Junio C Hamano, Git Mailing List
In-Reply-To: <Pine.LNX.4.58.0504212200400.2344@ppc970.osdl.org>

Howdy,

Linus Torvalds wrote:
> On Thu, 21 Apr 2005, Junio C Hamano wrote: 
>>I am thinking about an alternative way of doing the above by
>>some modifications to the git core.  I think the root of this
>>problem is that there is no equivalent to GIT_INDEX_FILE and
>>SHA1_FILE_DIRECTORY that tells the core git where the project
>>top directory (i.e. the root of the working tree that
>>corresponds to what $GIT_INDEX_FILE describes) is.
> 
> I'd _really_ prefer to just try to teach people to work from the "top" 
> directory instead.

Would it be okay if that were settable on a per-repository basis? :)
Or do you have a specific subset of operations you want restricted?

>> - A new environment variable GIT_WORKING_TREE points at the
>>   root of the working tree.
[snip]
> I really don't like it that much, but to some degree it obviously is
> exactly what "--prefix=" does to checkout-cache. It's basically saying 
> that all normal file operations have to be prefixed with a magic string. 

I'm going to script it one way or the other, but the environment route
allows me to set things up after a fork and before exec in Perl. This
works regardless of what git command I'm running, and should work even
with ithreads. This ease of use would not be the case with the
'--prefix' solution, as scripting the commands would require passing
arguments to those commands that need/support them at a higher level
than is desirable.

At present, I have implemented Yogi to support being able to run
commands from a different working directory than the root of the
repository, and that behavior might be per-repository settable
(someday). If I had my way, I would like to see git support the
following variables:

  GIT_WORKING_DIRECTORY   - defaults to '.'
  GIT_CACHE_DIRECTORY     - defaults to ${GIT_WORKING_DIRECTORY}/.git
  GIT_OBJECT_DIRECTORY    - defaults to ${GIT_CACHE_DIRECTORY}/objects

The reasoning is simple: One object repository can be shared among
numerous working caches, which can be shared among multiple working
directories (e.g. any directories under the project root, but maybe also
import/exports, or other magic...). There are two layers of one to many
relationships between the three classes of directories, and my scripts
want to make use of that flexibility to the hilt.

Also, do you really think git will only ever have the index file, and
not someday possibly other related bits? (You may have said that
elsewhere, but I missed it.) If that's ever the case, the directory
variable is the way to go; scripts can be forward compatible and won't
risk accidentally mingling repository data when they have only
set GIT_INDEX_FILE and not GIT_SOME_OTHER_FILE.

That said, I think GIT_INDEX_FILE would supplement the above scheme
nicely, overriding a default of ${GIT_CACHE_DIRECTORY}/index, because of
use cases you've described.
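
To show how the defaults would chain, here is a small Python sketch
(illustrative only; the real scripts are Perl, and resolve_git_paths is a
hypothetical name). It assumes exactly the variables and defaults listed
above, with GIT_INDEX_FILE overriding ${GIT_CACHE_DIRECTORY}/index.

```python
import os


def resolve_git_paths(env=None):
    """Resolve the proposed variables, each defaulting relative to the
    one before it. `env` is any mapping; os.environ is used if omitted."""
    env = os.environ if env is None else env
    working = env.get("GIT_WORKING_DIRECTORY", ".")
    cache = env.get("GIT_CACHE_DIRECTORY", os.path.join(working, ".git"))
    objects = env.get("GIT_OBJECT_DIRECTORY", os.path.join(cache, "objects"))
    index = env.get("GIT_INDEX_FILE", os.path.join(cache, "index"))
    return {"working": working, "cache": cache,
            "objects": objects, "index": index}
```

Setting only GIT_OBJECT_DIRECTORY then shares one object store among
several caches, while everything else keeps following the working
directory.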

Cheers,

Zach


* Re: proposal: delta based git archival
From: Jeffrey E. Hundstad @ 2005-04-22  9:12 UTC (permalink / raw)
  To: Michel Lespinasse; +Cc: git
In-Reply-To: <20050422090341.GC22479@zoy.org>

Michel Lespinasse wrote:

>Does this sound insane ? Too complicated maybe ?
>  
>
My vote is YES on both counts.

Simplicity and flexibility are what make git a good thing; and imho this
works against that quite aggressively.

-- 
Jeffrey Hundstad



* proposal: delta based git archival
From: Michel Lespinasse @ 2005-04-22  9:03 UTC (permalink / raw)
  To: git

I noticed people on this mailing list start talking about using blob deltas
for compression, and the basic issue that the resulting files are too small
for efficient filesystem storage. I thought about this a little and decided
I should send out my ideas for discussion.

In my proposal, the current git object storage model (one compressed object
per file) remains as the primary storage mechanism, however there would be
some kind of backup mechanism based on multiple deltas grouped in one file.

For example, suppose you're looking for an object with a hash of
eab75ce51622aa312bb0b03572d43769f420c347

First you'd look at .git/objects/ea/b75ce51622aa312bb0b03572d43769f420c347 -
if the file exists, that's your object.

If the file does not exist, you'd then look for .git/deltas/ea/b,
.git/deltas/ea/b7, .git/deltas/ea/b75, .git/deltas/ea/b75c, ...
up to some maximum search path length. You stop at the first file you can
find.
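
The lookup order just described can be sketched as follows (Python,
illustrative only). The function name lookup_candidates and the default
max_prefix_len of 8 are assumptions; the proposal only says "some maximum
search path length".

```python
def lookup_candidates(sha1, max_prefix_len=8):
    """Return the paths to try, in order, for the object named `sha1`.

    First the regular object file, then delta files with progressively
    longer hash prefixes. The caller stops at the first path that exists.
    """
    if len(sha1) != 40:
        raise ValueError("expected a 40-character hex SHA1")
    fan, rest = sha1[:2], sha1[2:]
    paths = [".git/objects/%s/%s" % (fan, rest)]
    for n in range(1, max_prefix_len + 1):
        paths.append(".git/deltas/%s/%s" % (fan, rest[:n]))
    return paths


for path in lookup_candidates("eab75ce51622aa312bb0b03572d43769f420c347", 4):
    print(path)
```

In the worst case this costs one failed lookup per prefix length, so the
maximum length bounds the overhead of a miss.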

Supposing that file is .git/deltas/ea/b7, it would contain a diff
(let's assume unified format for now, though ideally it'd be better to
have something that allows binary file deltas too) of many archived
objects with hashes starting with eab7, compared to a different object
(presumably some direct or indirect ancestor):

diff -u 8f5ba0203e31204c5c052d995a5b4449226bcfb5 eab75ce51622aa312bb0b03572d43769f420c347
--- 8f5ba0203e31204c5c052d995a5b4449226bcfb5
+++ eab75ce51622aa312bb0b03572d43769f420c347
@@ -522,7 +522,7 @@
....
diff -u 77dc2cb94930017f62b55b9706cbadda8c90f650 eab71c51dbc62797d6c903203de44cc6a734c05c
--- 77dc2cb94930017f62b55b9706cbadda8c90f650
+++ eab71c51dbc62797d6c903203de44cc6a734c05c
@@ -560,13 +563,17 @@
...

Based on this delta file, we'd then look for the object
8f5ba0203e31204c5c052d995a5b4449226bcfb5 (this process could require
recursively rebuilding that object) and try to build
eab75ce51622aa312bb0b03572d43769f420c347 by applying the delta and then
double checking the hash.

To me the strengths of this proposal would be:
* It does not muddy the git object model - it just acts independently of it,
  as a way to rebuild git objects from deltas
* Old objects can be compressed by creating a delta with a close ancestor,
  then erasing the original file storage for that object. The object delta
  can be appended to an existing delta file (which avoids the small-file
  storage issue), or if the delta file gets too big, it can be split off
  into 16 smaller files based on the hashes of the objects this file stores
  deltas for.
* The system is flexible enough to explore different delta
  strategies. For example one could decide to keep one object in every 10
  in the database and store the other 9 as deltas based on the immediate
  object ancestor, or any other tradeoff - and the system would still
  work the same (with different performance tradeoffs though).

Does this sound insane ? Too complicated maybe ?

Is there any kind of semi-standard binary-capable multiple-file diff format
that could be used for this application instead of unified diffs ?

-- 
Michel "Walken" Lespinasse
"Bill Gates is a monocle and a Persian cat away from being the villain
in a James Bond movie." -- Dennis Miller


* [RFC] Is there a need for binary bit in cache/tree entries to properly support Cygwin builds of GIT?
From: Jon Seymour @ 2005-04-22  8:53 UTC (permalink / raw)
  To: lode leroy; +Cc: git
In-Reply-To: <BAY22-F35C035C6AE6B9A45CDEF61FF2D0@phx.gbl>

On 4/22/05, lode leroy <lode_leroy@hotmail.com> wrote:
> I wonder if anyone is interested in using git on windows / cygwin.
> It almost compiles out of the box... just this one little thingy
> that's glibc-specific (struct dirent.d_type)
> 
 
I wonder if a cygwin compile of GIT should be forced to strip CR's
from text files prior to checksum calculations and blob storage.
Otherwise, spurious differences may be introduced into text files that
are somehow munged while checked out in a Windows environment.

There is an argument that this should be done external to the GIT
core, but then every external non-unix tool that interacts with GIT
has to have heuristics to distinguish text from binary and they all
have to have the same heuristics.

So, perhaps there is an argument for using one of the unused "mode"
bits to encode a binary flag and add an option to update-cache that
allows the bit to be flipped if a blob is known to be binary. A cygwin
GIT binary could then be forced to strip CR's from blobs marked as
text, but a unix binary need not change its behaviour.
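
A minimal sketch of the idea, in Python for illustration only. It reads
"strip CR's" as turning CRLF line endings into LF, which is an assumption,
and the is_binary argument stands in for the proposed mode bit;
normalize_blob is a hypothetical name.

```python
def normalize_blob(data, is_binary):
    """Return the bytes to checksum and store for a checked-out file.

    Blobs flagged as binary are left untouched; text blobs have their
    CRLF line endings converted to LF.
    """
    if is_binary:
        return data
    return data.replace(b"\r\n", b"\n")
```

With the flag stored in the cache/tree entry, every tool would make the
same text-or-binary decision instead of guessing with its own heuristics.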

Regards,

jon.


* (nearly trivial) patch to compile git on cygwin
From: lode leroy @ 2005-04-22  8:09 UTC (permalink / raw)
  To: git

I wonder if anyone is interested in using git on windows / cygwin.
It almost compiles out of the box... just this one little thingy
that's glibc-specific (struct dirent.d_type)

~/pkg $ diff -bruw git-0.6 git-0.6-cyg | grep -v ^Only
diff -bruw git-0.6/Makefile git-0.6-cyg/Makefile
--- git-0.6/Makefile    2005-04-21 19:58:47.000000000 +0200
+++ git-0.6-cyg/Makefile        2005-04-22 09:28:54.259531200 +0200
@@ -30,7 +30,7 @@
$(LIB_FILE): $(LIB_OBJS)
        $(AR) rcs $@ $(LIB_OBJS)

-LIBS= $(LIB_FILE) -lssl -lz
+LIBS= $(LIB_FILE) -lssl -lz -lcrypto

init-db: init-db.o

diff -bruw git-0.6/show-files.c git-0.6-cyg/show-files.c
--- git-0.6/show-files.c        2005-04-21 19:58:47.000000000 +0200
+++ git-0.6-cyg/show-files.c    2005-04-22 10:03:04.227240000 +0200
@@ -61,26 +61,33 @@
                                continue;
                        len = strlen(de->d_name);
                        memcpy(fullname + baselen, de->d_name, len+1);

+#ifdef DT_DIR
                        switch (de->d_type) {
+#endif
                        struct stat st;
+#ifdef DT_DIR
                        default:
                                continue;
                        case DT_UNKNOWN:
+#endif
                                if (lstat(fullname, &st))
                                        continue;
                                if (S_ISREG(st.st_mode))
                                        break;
                                if (!S_ISDIR(st.st_mode))
                                        continue;
+#ifdef DT_DIR
                                /* fallthrough */
                        case DT_DIR:
+#endif
                                memcpy(fullname + baselen + len, "/", 2);
                                read_directory(fullname, fullname, baselen + len + 1);
                                continue;
+#ifdef DT_DIR
                        case DT_REG:
                                break;
                        }
+#endif
                        add_name(fullname, baselen + len);
                }
                closedir(dir);
~/pkg $

^ permalink raw reply

* Re: [PATCH] #!/bin/sh --> #!/usr/bin/env bash
From: H. Peter Anvin @ 2005-04-22  7:37 UTC (permalink / raw)
  To: Alecs King; +Cc: git
In-Reply-To: <20050421194255.GA8479@alc.bsd.st>

Alecs King wrote:
> 
> And as for bash, only gitdiff-do and gitlog.sh 'explicitly' use bash
> instead of /bin/sh.  On most Linux distros, /bin/sh is just a symbolic
> link to bash.  But not on some others.  I found gitlsobj.sh could not
> work using a plain /bin/sh on fbsd.  To make life easier, i think it
> might be better if we all explicitly use bash for all shell scripts.
> 

How about #!/bin/bash (build from .in files if you feel it necessary to 
support systems which don't have bash in /bin) instead of doubling the 
number of execs?

	-hpa

^ permalink raw reply

* Re: Mozilla SHA1 implementation
From: Paul Mackerras @ 2005-04-22  7:35 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Edgar Toernig, Git Mailing List
In-Reply-To: <Pine.LNX.4.58.0504211238150.2344@ppc970.osdl.org>

Linus Torvalds writes:

> Interestingly, the Mozilla SHA1 code is about twice as fast as the openssl
> code on my G5, and judging by the disassembly, it's because it's much
> simpler. I think the openssl people have unrolled all the loops totally,
> which tends to be a disaster on any half-way modern CPU. But hey, it could
> be something as simple as optimization flags too.

Which gcc version are you using?

I get the opposite result on my 2GHz G5: the Mozilla version does
45MB/s, the openssl version does 135MB/s, and my version does 218MB/s.
The time for a fsck-cache on a linux-2.6 tree (cache hot) is 8.0
seconds for the Mozilla version, 5.2 seconds for the openssl version,
and 4.4 seconds for my version.

Paul.

^ permalink raw reply

* Re: [ANNOUNCE] git-pasky-0.6.3 && request for testing
From: Barry K. Nathan @ 2005-04-22  7:24 UTC (permalink / raw)
  To: Barry K. Nathan; +Cc: Petr Baudis, git
In-Reply-To: <20050422071626.GB8467@ip68-225-251-162.oc.oc.cox.net>

On Fri, Apr 22, 2005 at 12:16:26AM -0700, Barry K. Nathan wrote:
> With git-pasky 0.6.3, "git log" is unusable on my Mandrake 10.1 system.
> Basically I get a neverending flood of these until I press 'q' to quit
> less:
[snip sed segmentation faults which happen with 0.6.3 but not 0.6.2]
> I'm not sure if I have time tonight (or tomorrow) to troubleshoot this
> further, but I'll see if I can.

I had sed-4.1.1-2mdk. I downloaded sed-4.1.4-2mdk (from Mandriva 2005
Limited Edition) and updated to that, and the problem went away.

FWIW this is the second package I've had to update to the Mandriva 2005
LE level (the first was mktemp). I don't mind however.

-Barry K. Nathan <barryn@pobox.com>


^ permalink raw reply

* Re: [ANNOUNCE] git-pasky-0.6.3 && request for testing
From: Barry K. Nathan @ 2005-04-22  7:16 UTC (permalink / raw)
  To: Petr Baudis; +Cc: git
In-Reply-To: <20050422030931.GA14565@pasky.ji.cz>

With git-pasky 0.6.3, "git log" is unusable on my Mandrake 10.1 system.
Basically I get a neverending flood of these until I press 'q' to quit
less:

/home/barryn/softbag/git-pasky-0.6.3/gitlog.sh: line 73:  7598 Segmentation fault      sed -re '
                                        / *Signed-off-by.*/Is//'$colsignoff'&'$coldefault'/
                                        s/^/    /
                                '
/home/barryn/softbag/git-pasky-0.6.3/gitlog.sh: line 73:  7609 Segmentation fault      sed -re '
                                        / *Signed-off-by.*/Is//'$colsignoff'&'$coldefault'/
                                        s/^/    /
                                '
/home/barryn/softbag/git-pasky-0.6.3/gitlog.sh: line 73:  7620 Segmentation fault      sed -re '
                                        / *Signed-off-by.*/Is//'$colsignoff'&'$coldefault'/
                                        s/^/    /
                                '

git-pasky-0.6.2 works fine.

I'm not sure if I have time tonight (or tomorrow) to troubleshoot this
further, but I'll see if I can.

-Barry K. Nathan <barryn@pobox.com>


^ permalink raw reply

* Re: [ANNOUNCE] git-pasky-0.6.3 && request for testing
From: Greg KH @ 2005-04-22  6:49 UTC (permalink / raw)
  To: Petr Baudis; +Cc: git
In-Reply-To: <20050422030931.GA14565@pasky.ji.cz>

On Fri, Apr 22, 2005 at 05:09:31AM +0200, Petr Baudis wrote:
>   Hello,
> 
>   FYI, I've released git-pasky-0.6.3 earlier in the night.

Hm, fun thing to try:
	go into a kernel git tree.
	rm Makefile
	git diff

Watch it as it thinks that every Makefile in the kernel tree is now
gone...

thanks,

greg k-h

^ permalink raw reply

* Re: Mozilla SHA1 implementation
From: Paul Mackerras @ 2005-04-22  6:49 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Git Mailing List
In-Reply-To: <Pine.LNX.4.58.0504211238150.2344@ppc970.osdl.org>

Linus Torvalds writes:

> I've just integrated the Mozilla SHA1 library implementation that Edgar
> Toernig sent me into the standard git archive (but I did the integration
> differently).

Here is a new PPC SHA1 patch that integrates better with this...

> Interestingly, the Mozilla SHA1 code is about twice as fast as the openssl
> code on my G5, and judging by the disassembly, it's because it's much
> simpler. I think the openssl people have unrolled all the loops totally,
> which tends to be a disaster on any half-way modern CPU. But hey, it could
> be something as simple as optimization flags too.

Very interesting.  On my G4 powerbook (since I am at LCA), for a
fsck-cache on a linux-2.6 tree, it takes 6.6 seconds with the openssl
SHA1, 10.7 seconds with the Mozilla SHA1, and ~5.8 seconds with my
SHA1.  I'll test it on a G5 tonight, hopefully.

Paul.

diff -urN git.orig/Makefile git/Makefile
--- git.orig/Makefile	2005-04-22 16:23:44.000000000 +1000
+++ git/Makefile	2005-04-22 16:43:31.000000000 +1000
@@ -34,9 +34,14 @@
   SHA1_HEADER="mozilla-sha1/sha1.h"
   LIB_OBJS += mozilla-sha1/sha1.o
 else
+ifdef PPC_SHA1
+  SHA1_HEADER="ppc/sha1.h"
+  LIB_OBJS += ppc/sha1.o ppc/sha1ppc.o
+else
   SHA1_HEADER=<openssl/sha.h>
   LIBS += -lssl
 endif
+endif
 
 CFLAGS += '-DSHA1_HEADER=$(SHA1_HEADER)'
 
@@ -77,7 +82,7 @@
 write-tree.o: $(LIB_H)
 
 clean:
-	rm -f *.o mozilla-sha1/*.o $(PROG) $(LIB_FILE)
+	rm -f *.o mozilla-sha1/*.o ppc/*.o $(PROG) $(LIB_FILE)
 
 backup: clean
 	cd .. ; tar czvf dircache.tar.gz dir-cache
diff -urN git.orig/ppc/sha1.c git/ppc/sha1.c
--- /dev/null	2005-04-04 12:56:19.000000000 +1000
+++ git/ppc/sha1.c	2005-04-22 16:29:19.000000000 +1000
@@ -0,0 +1,72 @@
+/*
+ * SHA-1 implementation.
+ *
+ * Copyright (C) 2005 Paul Mackerras <paulus@samba.org>
+ *
+ * This version assumes we are running on a big-endian machine.
+ * It calls an external sha1_core() to process blocks of 64 bytes.
+ */
+#include <stdio.h>
+#include <string.h>
+#include "sha1.h"
+
+extern void sha1_core(uint32_t *hash, const unsigned char *p,
+		      unsigned int nblocks);
+
+int SHA1_Init(SHA_CTX *c)
+{
+	c->hash[0] = 0x67452301;
+	c->hash[1] = 0xEFCDAB89;
+	c->hash[2] = 0x98BADCFE;
+	c->hash[3] = 0x10325476;
+	c->hash[4] = 0xC3D2E1F0;
+	c->len = 0;
+	c->cnt = 0;
+	return 0;
+}
+
+int SHA1_Update(SHA_CTX *c, const void *ptr, unsigned long n)
+{
+	unsigned long nb;
+	const unsigned char *p = ptr;
+
+	c->len += n << 3;
+	while (n != 0) {
+		if (c->cnt || n < 64) {
+			nb = 64 - c->cnt;
+			if (nb > n)
+				nb = n;
+			memcpy(&c->buf.b[c->cnt], p, nb);
+			if ((c->cnt += nb) == 64) {
+				sha1_core(c->hash, c->buf.b, 1);
+				c->cnt = 0;
+			}
+		} else {
+			nb = n >> 6;
+			sha1_core(c->hash, p, nb);
+			nb <<= 6;
+		}
+		n -= nb;
+		p += nb;
+	}
+	return 0;
+}	
+
+int SHA1_Final(unsigned char *hash, SHA_CTX *c)
+{
+	unsigned int cnt = c->cnt;
+
+	c->buf.b[cnt++] = 0x80;
+	if (cnt > 56) {
+		if (cnt < 64)
+			memset(&c->buf.b[cnt], 0, 64 - cnt);
+		sha1_core(c->hash, c->buf.b, 1);
+		cnt = 0;
+	}
+	if (cnt < 56)
+		memset(&c->buf.b[cnt], 0, 56 - cnt);
+	c->buf.l[7] = c->len;
+	sha1_core(c->hash, c->buf.b, 1);
+	memcpy(hash, c->hash, 20);
+	return 0;
+}
diff -urN git.orig/ppc/sha1.h git/ppc/sha1.h
--- /dev/null	2005-04-04 12:56:19.000000000 +1000
+++ git/ppc/sha1.h	2005-04-22 16:45:28.000000000 +1000
@@ -0,0 +1,20 @@
+/*
+ * SHA-1 implementation.
+ *
+ * Copyright (C) 2005 Paul Mackerras <paulus@samba.org>
+ */
+#include <stdint.h>
+
+typedef struct sha_context {
+	uint32_t hash[5];
+	uint32_t cnt;
+	uint64_t len;
+	union {
+		unsigned char b[64];
+		uint64_t l[8];
+	} buf;
+} SHA_CTX;
+
+int SHA1_Init(SHA_CTX *c);
+int SHA1_Update(SHA_CTX *c, const void *p, unsigned long n);
+int SHA1_Final(unsigned char *hash, SHA_CTX *c);
diff -urN git.orig/ppc/sha1ppc.S git/ppc/sha1ppc.S
--- /dev/null	2005-04-04 12:56:19.000000000 +1000
+++ git/ppc/sha1ppc.S	2005-04-22 16:29:19.000000000 +1000
@@ -0,0 +1,185 @@
+/*
+ * SHA-1 implementation for PowerPC.
+ *
+ * Copyright (C) 2005 Paul Mackerras.
+ */
+#define FS	80
+
+/*
+ * We roll the registers for T, A, B, C, D, E around on each
+ * iteration; T on iteration t is A on iteration t+1, and so on.
+ * We use registers 7 - 12 for this.
+ */
+#define RT(t)	((((t)+5)%6)+7)
+#define RA(t)	((((t)+4)%6)+7)
+#define RB(t)	((((t)+3)%6)+7)
+#define RC(t)	((((t)+2)%6)+7)
+#define RD(t)	((((t)+1)%6)+7)
+#define RE(t)	((((t)+0)%6)+7)
+
+/* We use registers 16 - 31 for the W values */
+#define W(t)	(((t)%16)+16)
+
+#define STEPD0(t)				\
+	and	%r6,RB(t),RC(t);		\
+	andc	%r0,RD(t),RB(t);		\
+	rotlwi	RT(t),RA(t),5;			\
+	rotlwi	RB(t),RB(t),30;			\
+	or	%r6,%r6,%r0;			\
+	add	%r0,RE(t),%r15;			\
+	add	RT(t),RT(t),%r6;		\
+	add	%r0,%r0,W(t);			\
+	add	RT(t),RT(t),%r0
+
+#define STEPD1(t)				\
+	xor	%r6,RB(t),RC(t);		\
+	rotlwi	RT(t),RA(t),5;			\
+	rotlwi	RB(t),RB(t),30;			\
+	xor	%r6,%r6,RD(t);			\
+	add	%r0,RE(t),%r15;			\
+	add	RT(t),RT(t),%r6;		\
+	add	%r0,%r0,W(t);			\
+	add	RT(t),RT(t),%r0
+
+#define STEPD2(t)				\
+	and	%r6,RB(t),RC(t);		\
+	and	%r0,RB(t),RD(t);		\
+	rotlwi	RT(t),RA(t),5;			\
+	rotlwi	RB(t),RB(t),30;			\
+	or	%r6,%r6,%r0;			\
+	and	%r0,RC(t),RD(t);		\
+	or	%r6,%r6,%r0;			\
+	add	%r0,RE(t),%r15;			\
+	add	RT(t),RT(t),%r6;		\
+	add	%r0,%r0,W(t);			\
+	add	RT(t),RT(t),%r0
+
+#define LOADW(t)				\
+	lwz	W(t),(t)*4(%r4)
+
+#define UPDATEW(t)				\
+	xor	%r0,W((t)-3),W((t)-8);		\
+	xor	W(t),W((t)-16),W((t)-14);	\
+	xor	W(t),W(t),%r0;			\
+	rotlwi	W(t),W(t),1
+
+#define STEP0LD4(t)				\
+	STEPD0(t);   LOADW((t)+4);		\
+	STEPD0((t)+1); LOADW((t)+5);		\
+	STEPD0((t)+2); LOADW((t)+6);		\
+	STEPD0((t)+3); LOADW((t)+7)
+
+#define STEPUP4(t, fn)				\
+	STEP##fn(t);   UPDATEW((t)+4);		\
+	STEP##fn((t)+1); UPDATEW((t)+5);	\
+	STEP##fn((t)+2); UPDATEW((t)+6);	\
+	STEP##fn((t)+3); UPDATEW((t)+7)
+
+#define STEPUP20(t, fn)				\
+	STEPUP4(t, fn);				\
+	STEPUP4((t)+4, fn);			\
+	STEPUP4((t)+8, fn);			\
+	STEPUP4((t)+12, fn);			\
+	STEPUP4((t)+16, fn)
+
+	.globl	sha1_core
+sha1_core:
+	stwu	%r1,-FS(%r1)
+	stw	%r15,FS-68(%r1)
+	stw	%r16,FS-64(%r1)
+	stw	%r17,FS-60(%r1)
+	stw	%r18,FS-56(%r1)
+	stw	%r19,FS-52(%r1)
+	stw	%r20,FS-48(%r1)
+	stw	%r21,FS-44(%r1)
+	stw	%r22,FS-40(%r1)
+	stw	%r23,FS-36(%r1)
+	stw	%r24,FS-32(%r1)
+	stw	%r25,FS-28(%r1)
+	stw	%r26,FS-24(%r1)
+	stw	%r27,FS-20(%r1)
+	stw	%r28,FS-16(%r1)
+	stw	%r29,FS-12(%r1)
+	stw	%r30,FS-8(%r1)
+	stw	%r31,FS-4(%r1)
+
+	/* Load up A - E */
+	lwz	RA(0),0(%r3)	/* A */
+	lwz	RB(0),4(%r3)	/* B */
+	lwz	RC(0),8(%r3)	/* C */
+	lwz	RD(0),12(%r3)	/* D */
+	lwz	RE(0),16(%r3)	/* E */
+
+	mtctr	%r5
+
+1:	LOADW(0)
+	LOADW(1)
+	LOADW(2)
+	LOADW(3)
+
+	lis	%r15,0x5a82	/* K0-19 */
+	ori	%r15,%r15,0x7999
+	STEP0LD4(0)
+	STEP0LD4(4)
+	STEP0LD4(8)
+	STEPUP4(12, D0)
+	STEPUP4(16, D0)
+
+	lis	%r15,0x6ed9	/* K20-39 */
+	ori	%r15,%r15,0xeba1
+	STEPUP20(20, D1)
+
+	lis	%r15,0x8f1b	/* K40-59 */
+	ori	%r15,%r15,0xbcdc
+	STEPUP20(40, D2)
+
+	lis	%r15,0xca62	/* K60-79 */
+	ori	%r15,%r15,0xc1d6
+	STEPUP4(60, D1)
+	STEPUP4(64, D1)
+	STEPUP4(68, D1)
+	STEPUP4(72, D1)
+	STEPD1(76)
+	STEPD1(77)
+	STEPD1(78)
+	STEPD1(79)
+
+	lwz	%r20,16(%r3)
+	lwz	%r19,12(%r3)
+	lwz	%r18,8(%r3)
+	lwz	%r17,4(%r3)
+	lwz	%r16,0(%r3)
+	add	%r20,RE(80),%r20
+	add	RD(0),RD(80),%r19
+	add	RC(0),RC(80),%r18
+	add	RB(0),RB(80),%r17
+	add	RA(0),RA(80),%r16
+	mr	RE(0),%r20
+	stw	RA(0),0(%r3)
+	stw	RB(0),4(%r3)
+	stw	RC(0),8(%r3)
+	stw	RD(0),12(%r3)
+	stw	RE(0),16(%r3)
+
+	addi	%r4,%r4,64
+	bdnz	1b
+
+	lwz	%r15,FS-68(%r1)
+	lwz	%r16,FS-64(%r1)
+	lwz	%r17,FS-60(%r1)
+	lwz	%r18,FS-56(%r1)
+	lwz	%r19,FS-52(%r1)
+	lwz	%r20,FS-48(%r1)
+	lwz	%r21,FS-44(%r1)
+	lwz	%r22,FS-40(%r1)
+	lwz	%r23,FS-36(%r1)
+	lwz	%r24,FS-32(%r1)
+	lwz	%r25,FS-28(%r1)
+	lwz	%r26,FS-24(%r1)
+	lwz	%r27,FS-20(%r1)
+	lwz	%r28,FS-16(%r1)
+	lwz	%r29,FS-12(%r1)
+	lwz	%r30,FS-8(%r1)
+	lwz	%r31,FS-4(%r1)
+	addi	%r1,%r1,FS
+	blr

^ permalink raw reply

* Re: "GIT_INDEX_FILE" environment variable
From: Junio C Hamano @ 2005-04-22  6:23 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Git Mailing List
In-Reply-To: <Pine.LNX.4.58.0504212200400.2344@ppc970.osdl.org>

>>>>> "LT" == Linus Torvalds <torvalds@osdl.org> writes:

LT> I'd _really_ prefer to just try to teach people to work from
LT> the "top" directory instead.

I share the sentiment, but I do not think that is an option.
There are three possibilities:

 - Train people to always work from the top and never support
   working in a subdirectory at any layer.

 - Admit that people cannot be trained, and support it at Cogito
   layer.

 - Further admit that to support it without core layer help,
   what Cogito layer needs to do involves quite a lot of "yuck"
   factor.

For somebody whose primary concern is to pull the whole tree
from outside and watch out for merge conflicts, always working
from the top may be a practical option.  But you also have to
consider that the people who actually feed those whole trees to
you probably do most of their work in their subdirectories.  You
would want to make life easier for them in order for you to get
high-quality results from them.

I initially thought that the third one in the above list was the
case, and that's why I asked.  After reviewing the core layer to
see the extent of the damage the proposed change would cause, to
my surprise, it turns out that it is not all that bad.  It
probably is not surprising to you because of the way you
designed things --- doing as much as possible in the dircache,
and avoiding looking at the working tree.

The commands I would want to take paths relative to the user cwd
are quite limited; note that I just want these available to the
user and I do not care which one, the core or Cogito, groks the
cwd relative paths:

  check-files paths...
  show-diff [-R] [-q] [-s] [-z] [paths...]
  update-cache [--add] [--remove] [--refresh]
      [--cacheinfo mode blob-id] paths...

The only parameters that need $R prefixing are the "paths..."
above.  I think the wrapper layer can manage without the help
from the core layer for these small number of commands using the
workaround I outlined in my previous message.
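
To make the $R prefix concrete, here is a minimal sketch in C of how such a prefix could be derived from the project top and the user's cwd. The function name working_dir_prefix() is hypothetical, and the sketch assumes both paths are absolute and already free of "." and ".." components.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Write the cwd relative to the project top into out, with a trailing
 * slash: top "/src/git" and cwd "/src/git/ppc" give "ppc/".
 * Returns 0 on success, -1 if cwd is not under top or out is too small. */
int working_dir_prefix(const char *top, const char *cwd,
		       char *out, size_t outsz)
{
	size_t toplen, len;
	const char *rest;

	assert(top != NULL && cwd != NULL && out != NULL);
	toplen = strlen(top);
	while (toplen > 1 && top[toplen - 1] == '/')
		toplen--;		/* "/src/git/" acts like "/src/git" */
	if (toplen == 0 || outsz == 0 || strncmp(top, cwd, toplen) != 0)
		return -1;

	rest = cwd + toplen;
	if (top[toplen - 1] != '/') {	/* top is not the root directory */
		if (*rest == '/')
			rest++;
		else if (*rest != '\0')
			return -1;	/* e.g. "/src/gitk" is not under "/src/git" */
	}
	len = strlen(rest);
	while (len > 0 && rest[len - 1] == '/')
		len--;
	if (len == 0) {			/* cwd is the project top itself */
		out[0] = '\0';
		return 0;
	}
	if (len + 2 > outsz)
		return -1;
	memcpy(out, rest, len);
	out[len] = '/';
	out[len + 1] = '\0';
	return 0;
}
```

A wrapper would then prepend the resulting string to each "paths..." argument before running the core command from the project top.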

In addition, there is another one that looks at the working
tree:

  diff-cache [-z] [-r] [--cached] tree-id

But this one is even easier.  The wrapper layer needs to figure
out the project top, chdir to it and run the underlying
diff-cache there.

LT> I really don't like it that much, but to some degree it
LT> obviously is exactly what "--prefix=" does to
LT> checkout-cache. It's basically saying that all normal file
LT> operations have to be prefixed with a magic string.

More or less so.  I actually was thinking about going a bit more
than just prefix, and normalizing paths in the core layer, in
order to get something like the following operate sensibly:

  $ find . -type f | xargs update-cache
  $ cd mozilla-sha1 && show-diff ../*.h

But this may be going a bit overboard.
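
For illustration, the normalization mentioned above could look roughly like the sketch below. This is an assumption about one possible approach, not code from git: normalize_path() is a hypothetical name, the prefix is taken to be the cwd relative to the project top (for example "mozilla-sha1/"), and absolute input paths are not handled.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Join prefix and path, then resolve "", "." and ".." components so the
 * result is relative to the project top.  Returns 0 on success, -1 if
 * the path would escape the top or does not fit in out. */
int normalize_path(const char *prefix, const char *path,
		   char *out, size_t outsz)
{
	char joined[4096];
	size_t len = 0;			/* bytes written to out so far */
	const char *p;
	int n;

	assert(prefix != NULL && path != NULL && out != NULL);
	n = snprintf(joined, sizeof(joined), "%s%s", prefix, path);
	if (n < 0 || (size_t)n >= sizeof(joined) || outsz == 0)
		return -1;

	for (p = joined; *p; ) {
		size_t clen = strcspn(p, "/");	/* length of this component */

		if (clen == 0 || (clen == 1 && p[0] == '.')) {
			/* empty component or ".": nothing to add */
		} else if (clen == 2 && p[0] == '.' && p[1] == '.') {
			if (len == 0)
				return -1;	/* would leave the project top */
			while (len > 0 && out[len - 1] != '/')
				len--;		/* drop the last component */
			if (len > 0)
				len--;		/* and its separator */
		} else {
			if (len + (len ? 1 : 0) + clen + 1 > outsz)
				return -1;
			if (len)
				out[len++] = '/';
			memcpy(out + len, p, clen);
			len += clen;
		}
		p += clen;
		if (*p == '/')
			p++;
	}
	out[len] = '\0';
	return 0;
}
```

With this, running show-diff ../cache.h from inside mozilla-sha1 would resolve to cache.h at the project top, while a path that climbs above the top is rejected.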

LT> And git really doesn't do too many of those, so maybe it's
LT> ok. What would the patch look like? I don't really love the
LT> idea, but if the patch is clean enough...

Please forget this one for a bit.  I'm attacking this from both
fronts.

Core changes supporting the "project root" notion are what we are
discussing here.  As I said, I do not think it would be as huge a
change as I initially feared, but after the initial "let's get
the list of commands and analyze how they use the paths" phase,
I have backburnered this approach, at least for now.  Working
around it in the wrapper layer without core support seems to be a
viable option, especially now that I know there are not that many
commands to wrap, and that is what I've been looking at this
evening.

For your amusement, eh, rather, to test your "yuck" tolerance
;-), I've attached two scripts.  jit-find-index is a helper
script for wrappers.  It finds the project root and computes $R
prefix; the wrappers call it and eval its result.
jit-update-cache is a wrapper to run update-cache inside a
subdirectory.  This is the worst example among the four wrappers.
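
For comparison with the shell helper, the upward search for the project top can be sketched in C as below. This is only an illustration under stated assumptions: find_project_top() is a hypothetical name, the starting directory must be an absolute path, and, like the script, the search stops at "/" without probing the root itself.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Walk up from start until a directory containing ".git" is found.
 * On success the directory is left in top and 0 is returned;
 * otherwise -1 is returned. */
int find_project_top(const char *start, char *top, size_t topsz)
{
	char probe[4096];
	struct stat st;
	size_t len;

	assert(start != NULL && top != NULL);
	len = strlen(start);
	if (len == 0 || start[0] != '/' || len >= topsz)
		return -1;
	memcpy(top, start, len + 1);

	while (strcmp(top, "/") != 0) {
		char *slash;
		int n = snprintf(probe, sizeof(probe), "%s/.git", top);

		if (n < 0 || (size_t)n >= sizeof(probe))
			return -1;
		if (stat(probe, &st) == 0 && S_ISDIR(st.st_mode))
			return 0;	/* top now names the project top */

		/* Strip the last path component and try the parent. */
		slash = strrchr(top, '/');
		if (slash == top)
			top[1] = '\0';	/* parent is the root directory */
		else
			*slash = '\0';
	}
	return -1;
}
```

Unlike the script, this sketch does not look for the index file or the object database separately, and it ignores the GIT_INDEX_FILE and SHA1_FILE_DIRECTORY overrides.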

Not-Signed-off-yet-by: Junio C Hamano <junkio@cox.net>
---

 jit-find-index   |   60 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 jit-update-cache |   23 +++++++++++++++++++++
 2 files changed, 83 insertions(+)

--- /dev/null	2005-03-19 15:28:25.000000000 -0800
+++ jit-find-index	2005-04-21 22:59:55.000000000 -0700
@@ -0,0 +1,60 @@
+#!/bin/sh
+
+sq=s/\'/\''\\'\'\'/g ;# see sq-expand in show-diff.c
+
+lookfor_index=${GIT_INDEX_FILE-.git/index}
+lookfor_object=${SHA1_FILE_DIRECTORY-.git/objects}
+
+index= object= project_top=
+
+# No point in looking for something specified with an absolute path.
+case "$lookfor_index" in
+/*) index="$lookfor_index" ;;
+esac
+case "$lookfor_object" in
+/*) object="$lookfor_object" ;;
+esac
+
+# Beware of symlinks.  We need to find out what the current directory
+# is called relative to the path recorded in the dircache.
+dir=${PWD-$(pwd)} cwd="$dir" down=
+
+while 
+    case "$dir" in /) break ;; esac && # we searched all.
+    case ",$index,$object,$project_top," in
+    *,,*) ;;
+    *)    break ;; # we now have all.
+    esac
+do
+    case "$index" in
+    '') test -f "$dir/$lookfor_index" &&
+	index="$dir/$lookfor_index" ;;
+    esac
+    case "$object" in
+    '') test -d "$dir/$lookfor_object" &&
+	object="$dir/$lookfor_object" ;;
+    esac
+
+    case "$project_top" in
+    '') test -d "$dir/.git" &&
+	project_top="$dir" &&
+	working_dir="$down" ;;
+    esac
+    down="$(basename "$dir")/$down"
+    dir=$(dirname "$dir")
+done
+
+if test ! -f "$index" || test ! -d "$object" || test ! -d "$project_top"
+then
+    echo >&2 \
+      "Cannot find the project top, index file, or object database."
+    echo exit 1 ;# love this!
+else
+    # Working directory relative to the project top
+
+    echo "GIT_INDEX_FILE='$(echo "$index" | sed -e "$sq")'"
+    echo "SHA1_FILE_DIRECTORY='$(echo "$object" | sed -e "$sq")'"
+    echo "GIT_PROJECT_TOP='$(echo "$project_top" | sed -e "$sq")'"
+    echo "GIT_WORKING_DIR='$(echo "$working_dir" | sed -e "$sq")'"
+    echo export GIT_INDEX_FILE SHA1_FILE_DIRECTORY GIT_PROJECT_TOP
+fi



--- /dev/null	2005-03-19 15:28:25.000000000 -0800
+++ jit-update-cache	2005-04-21 22:59:48.000000000 -0700
@@ -0,0 +1,23 @@
+#!/bin/sh
+
+eval "$(jit-find-index)"
+sq=s/\'/\''\\'\'\'/g
+RQ=$(echo "$GIT_WORKING_DIR" | sed -e "$sq")
+args=
+while case "$#" in 0) break ;; esac
+do
+	case "$1" in
+	--add | --remove | --refresh)
+	    args="${args}$1 " ;;
+	--cacheinfo)
+	    args="${args}$1 "
+	    shift; args="${args}'$(echo "$1" | sed -e "$sq")' "
+	    shift; args="${args}'$(echo "$1" | sed -e "$sq")' " ;;
+	*)
+	    args="${args}'$RQ$(echo "$1" | sed -e "$sq")' " ;;
+	esac
+	shift
+done
+eval "set x $args; shift"
+
+cd "$GIT_PROJECT_TOP" && exec update-cache "$@"




^ permalink raw reply

* [PATCH] optimized SHA1 for powerpc
From: Paul Mackerras @ 2005-04-22  5:52 UTC (permalink / raw)
  To: torvalds; +Cc: git, anton

Linus,

Just for fun, I wrote a ppc-assembly SHA1 routine.  It appears to be
about 2.5x faster than the generic version.  It reduces the time for a
fsck-cache on a linux-2.6 tree from ~6.8 seconds to ~6.0 seconds on my
G4 powerbook.

Paul.

diff -urN git.orig/Makefile git/Makefile
--- git.orig/Makefile	2005-04-22 15:21:10.000000000 +1000
+++ git/Makefile	2005-04-22 15:11:28.000000000 +1000
@@ -25,7 +25,12 @@
 
 LIB_OBJS=read-cache.o sha1_file.o usage.o object.o commit.o tree.o blob.o
 LIB_FILE=libgit.a
-LIB_H=cache.h object.h
+LIB_H=cache.h object.h sha1.h
+
+arch := $(shell uname -m | tr -d 0-9)
+ifeq ($(arch),ppc)
+LIB_OBJS += sha1.o sha1ppc.o
+endif
 
 $(LIB_FILE): $(LIB_OBJS)
 	$(AR) rcs $@ $(LIB_OBJS)
diff -urN git.orig/cache.h git/cache.h
--- git.orig/cache.h	2005-04-22 15:21:10.000000000 +1000
+++ git/cache.h	2005-04-22 13:57:36.000000000 +1000
@@ -12,7 +12,7 @@
 #include <sys/mman.h>
 #include <netinet/in.h>
 
-#include <openssl/sha.h>
+#include "sha1.h"
 #include <zlib.h>
 
 /*
diff -urN git.orig/sha1.c git/sha1.c
--- /dev/null	2005-04-04 12:56:19.000000000 +1000
+++ git/sha1.c	2005-04-22 15:17:27.000000000 +1000
@@ -0,0 +1,72 @@
+/*
+ * SHA-1 implementation.
+ *
+ * Copyright (C) 2005 Paul Mackerras <paulus@samba.org>
+ *
+ * This version assumes we are running on a big-endian machine.
+ * It calls an external sha1_core() to process blocks of 64 bytes.
+ */
+#include <stdio.h>
+#include <string.h>
+#include "sha1.h"
+
+extern void sha1_core(uint32_t *hash, const unsigned char *p,
+		      unsigned int nblocks);
+
+int SHA1_Init(SHA_CTX *c)
+{
+	c->hash[0] = 0x67452301;
+	c->hash[1] = 0xEFCDAB89;
+	c->hash[2] = 0x98BADCFE;
+	c->hash[3] = 0x10325476;
+	c->hash[4] = 0xC3D2E1F0;
+	c->len = 0;
+	c->cnt = 0;
+	return 0;
+}
+
+int SHA1_Update(SHA_CTX *c, const void *ptr, unsigned long n)
+{
+	unsigned long nb;
+	const unsigned char *p = ptr;
+
+	c->len += n << 3;
+	while (n != 0) {
+		if (c->cnt || n < 64) {
+			nb = 64 - c->cnt;
+			if (nb > n)
+				nb = n;
+			memcpy(&c->buf.b[c->cnt], p, nb);
+			if ((c->cnt += nb) == 64) {
+				sha1_core(c->hash, c->buf.b, 1);
+				c->cnt = 0;
+			}
+		} else {
+			nb = n >> 6;
+			sha1_core(c->hash, p, nb);
+			nb <<= 6;
+		}
+		n -= nb;
+		p += nb;
+	}
+	return 0;
+}	
+
+int SHA1_Final(unsigned char *hash, SHA_CTX *c)
+{
+	unsigned int cnt = c->cnt;
+
+	c->buf.b[cnt++] = 0x80;
+	if (cnt > 56) {
+		if (cnt < 64)
+			memset(&c->buf.b[cnt], 0, 64 - cnt);
+		sha1_core(c->hash, c->buf.b, 1);
+		cnt = 0;
+	}
+	if (cnt < 56)
+		memset(&c->buf.b[cnt], 0, 56 - cnt);
+	c->buf.l[7] = c->len;
+	sha1_core(c->hash, c->buf.b, 1);
+	memcpy(hash, c->hash, 20);
+	return 0;
+}
diff -urN git.orig/sha1.h git/sha1.h
--- /dev/null	2005-04-04 12:56:19.000000000 +1000
+++ git/sha1.h	2005-04-22 15:06:53.000000000 +1000
@@ -0,0 +1,19 @@
+#ifndef __powerpc__
+#include <openssl/sha.h>
+#else
+#include <stdint.h>
+
+typedef struct sha_context {
+	uint32_t hash[5];
+	uint32_t cnt;
+	uint64_t len;
+	union {
+		unsigned char b[64];
+		uint64_t l[8];
+	} buf;
+} SHA_CTX;
+
+int SHA1_Init(SHA_CTX *c);
+int SHA1_Update(SHA_CTX *c, const void *p, unsigned long n);
+int SHA1_Final(unsigned char *hash, SHA_CTX *c);
+#endif
diff -urN git.orig/sha1ppc.S git/sha1ppc.S
--- /dev/null	2005-04-04 12:56:19.000000000 +1000
+++ git/sha1ppc.S	2005-04-22 15:18:19.000000000 +1000
@@ -0,0 +1,185 @@
+/*
+ * SHA-1 implementation for PowerPC.
+ *
+ * Copyright (C) 2005 Paul Mackerras.
+ */
+#define FS	80
+
+/*
+ * We roll the registers for T, A, B, C, D, E around on each
+ * iteration; T on iteration t is A on iteration t+1, and so on.
+ * We use registers 7 - 12 for this.
+ */
+#define RT(t)	((((t)+5)%6)+7)
+#define RA(t)	((((t)+4)%6)+7)
+#define RB(t)	((((t)+3)%6)+7)
+#define RC(t)	((((t)+2)%6)+7)
+#define RD(t)	((((t)+1)%6)+7)
+#define RE(t)	((((t)+0)%6)+7)
+
+/* We use registers 16 - 31 for the W values */
+#define W(t)	(((t)%16)+16)
+
+#define STEPD0(t)				\
+	and	%r6,RB(t),RC(t);		\
+	andc	%r0,RD(t),RB(t);		\
+	rotlwi	RT(t),RA(t),5;			\
+	rotlwi	RB(t),RB(t),30;			\
+	or	%r6,%r6,%r0;			\
+	add	%r0,RE(t),%r15;			\
+	add	RT(t),RT(t),%r6;		\
+	add	%r0,%r0,W(t);			\
+	add	RT(t),RT(t),%r0
+
+#define STEPD1(t)				\
+	xor	%r6,RB(t),RC(t);		\
+	rotlwi	RT(t),RA(t),5;			\
+	rotlwi	RB(t),RB(t),30;			\
+	xor	%r6,%r6,RD(t);			\
+	add	%r0,RE(t),%r15;			\
+	add	RT(t),RT(t),%r6;		\
+	add	%r0,%r0,W(t);			\
+	add	RT(t),RT(t),%r0
+
+#define STEPD2(t)				\
+	and	%r6,RB(t),RC(t);		\
+	and	%r0,RB(t),RD(t);		\
+	rotlwi	RT(t),RA(t),5;			\
+	rotlwi	RB(t),RB(t),30;			\
+	or	%r6,%r6,%r0;			\
+	and	%r0,RC(t),RD(t);		\
+	or	%r6,%r6,%r0;			\
+	add	%r0,RE(t),%r15;			\
+	add	RT(t),RT(t),%r6;		\
+	add	%r0,%r0,W(t);			\
+	add	RT(t),RT(t),%r0
+
+#define LOADW(t)				\
+	lwz	W(t),(t)*4(%r4)
+
+#define UPDATEW(t)				\
+	xor	%r0,W((t)-3),W((t)-8);		\
+	xor	W(t),W((t)-16),W((t)-14);	\
+	xor	W(t),W(t),%r0;			\
+	rotlwi	W(t),W(t),1
+
+#define STEP0LD4(t)				\
+	STEPD0(t);   LOADW((t)+4);		\
+	STEPD0((t)+1); LOADW((t)+5);		\
+	STEPD0((t)+2); LOADW((t)+6);		\
+	STEPD0((t)+3); LOADW((t)+7)
+
+#define STEPUP4(t, fn)				\
+	STEP##fn(t);   UPDATEW((t)+4);		\
+	STEP##fn((t)+1); UPDATEW((t)+5);	\
+	STEP##fn((t)+2); UPDATEW((t)+6);	\
+	STEP##fn((t)+3); UPDATEW((t)+7)
+
+#define STEPUP20(t, fn)				\
+	STEPUP4(t, fn);				\
+	STEPUP4((t)+4, fn);			\
+	STEPUP4((t)+8, fn);			\
+	STEPUP4((t)+12, fn);			\
+	STEPUP4((t)+16, fn)
+
+	.globl	sha1_core
+sha1_core:
+	stwu	%r1,-FS(%r1)
+	stw	%r15,FS-68(%r1)
+	stw	%r16,FS-64(%r1)
+	stw	%r17,FS-60(%r1)
+	stw	%r18,FS-56(%r1)
+	stw	%r19,FS-52(%r1)
+	stw	%r20,FS-48(%r1)
+	stw	%r21,FS-44(%r1)
+	stw	%r22,FS-40(%r1)
+	stw	%r23,FS-36(%r1)
+	stw	%r24,FS-32(%r1)
+	stw	%r25,FS-28(%r1)
+	stw	%r26,FS-24(%r1)
+	stw	%r27,FS-20(%r1)
+	stw	%r28,FS-16(%r1)
+	stw	%r29,FS-12(%r1)
+	stw	%r30,FS-8(%r1)
+	stw	%r31,FS-4(%r1)
+
+	/* Load up A - E */
+	lwz	RA(0),0(%r3)	/* A */
+	lwz	RB(0),4(%r3)	/* B */
+	lwz	RC(0),8(%r3)	/* C */
+	lwz	RD(0),12(%r3)	/* D */
+	lwz	RE(0),16(%r3)	/* E */
+
+	mtctr	%r5
+
+1:	LOADW(0)
+	LOADW(1)
+	LOADW(2)
+	LOADW(3)
+
+	lis	%r15,0x5a82	/* K0-19 */
+	ori	%r15,%r15,0x7999
+	STEP0LD4(0)
+	STEP0LD4(4)
+	STEP0LD4(8)
+	STEPUP4(12, D0)
+	STEPUP4(16, D0)
+
+	lis	%r15,0x6ed9	/* K20-39 */
+	ori	%r15,%r15,0xeba1
+	STEPUP20(20, D1)
+
+	lis	%r15,0x8f1b	/* K40-59 */
+	ori	%r15,%r15,0xbcdc
+	STEPUP20(40, D2)
+
+	lis	%r15,0xca62	/* K60-79 */
+	ori	%r15,%r15,0xc1d6
+	STEPUP4(60, D1)
+	STEPUP4(64, D1)
+	STEPUP4(68, D1)
+	STEPUP4(72, D1)
+	STEPD1(76)
+	STEPD1(77)
+	STEPD1(78)
+	STEPD1(79)
+
+	lwz	%r20,16(%r3)
+	lwz	%r19,12(%r3)
+	lwz	%r18,8(%r3)
+	lwz	%r17,4(%r3)
+	lwz	%r16,0(%r3)
+	add	%r20,RE(80),%r20
+	add	RD(0),RD(80),%r19
+	add	RC(0),RC(80),%r18
+	add	RB(0),RB(80),%r17
+	add	RA(0),RA(80),%r16
+	mr	RE(0),%r20
+	stw	RA(0),0(%r3)
+	stw	RB(0),4(%r3)
+	stw	RC(0),8(%r3)
+	stw	RD(0),12(%r3)
+	stw	RE(0),16(%r3)
+
+	addi	%r4,%r4,64
+	bdnz	1b
+
+	lwz	%r15,FS-68(%r1)
+	lwz	%r16,FS-64(%r1)
+	lwz	%r17,FS-60(%r1)
+	lwz	%r18,FS-56(%r1)
+	lwz	%r19,FS-52(%r1)
+	lwz	%r20,FS-48(%r1)
+	lwz	%r21,FS-44(%r1)
+	lwz	%r22,FS-40(%r1)
+	lwz	%r23,FS-36(%r1)
+	lwz	%r24,FS-32(%r1)
+	lwz	%r25,FS-28(%r1)
+	lwz	%r26,FS-24(%r1)
+	lwz	%r27,FS-20(%r1)
+	lwz	%r28,FS-16(%r1)
+	lwz	%r29,FS-12(%r1)
+	lwz	%r30,FS-8(%r1)
+	lwz	%r31,FS-4(%r1)
+	addi	%r1,%r1,FS
+	blr

^ permalink raw reply

