Git development

* Re: edit Author/Date metadata as part of 'git commit' $EDITOR  invocation?
From: Adam Megacz @ 2010-01-04 21:08 UTC (permalink / raw)
  To: git
In-Reply-To: <fabb9a1e1001041232h4e5827d1pb5c648b33ecfb5ce@mail.gmail.com>

Sverre Rabbelier <srabbelier@gmail.com> writes:
> On Sun, Jan 3, 2010 at 18:32, Adam Megacz <adam@megacz.com> wrote:
>>     I've been having problems lately with running git on machines where
>>     I forgot to set up my .gitconfig; I wind up with patches that have
>>     committers like root@mymachine and so forth.  Being automatically
>>     shown the committer/author when I make the commit would help me
>>     avoid these situations.
>
> At the very least it should be easy to include these fields as
> comments in the message template.

That would be great.

> But of course you would still be bitten if you used "git commit -m"
> :(.

Perhaps a preference (off by default) demanding that they be set
explicitly when "git commit -m" is used?

Some people care more than others about the metadata; this is for the
folks to whom it matters a lot.

  - a

* Re: edit Author/Date metadata as part of 'git commit' $EDITOR  invocation?
From: Sverre Rabbelier @ 2010-01-04 20:32 UTC (permalink / raw)
  To: Adam Megacz; +Cc: git
In-Reply-To: <xuu2fx6m4vdi.fsf@nowhere.com>

Heya,

On Sun, Jan 3, 2010 at 18:32, Adam Megacz <adam@megacz.com> wrote:
>     I've been having problems lately with running git on machines where
>     I forgot to set up my .gitconfig; I wind up with patches that have
>     committers like root@mymachine and so forth.  Being automatically
>     shown the committer/author when I make the commit would help me
>     avoid these situations.

At the very least it should be easy to include these fields as
comments in the message template. But of course you would still be
bitten if you used "git commit -m" :(.

-- 
Cheers,

Sverre Rabbelier

* edit Author/Date metadata as part of 'git commit' $EDITOR invocation?
From: Adam Megacz @ 2010-01-03 23:32 UTC (permalink / raw)
  To: git


Hi, folks.

From the output of 'git show', it appears that a commit has a few fields
of metadata associated with it in addition to the comment.  These fields
seem to include Author, AuthorDate, Committer, and CommitDate.

  1. Are there other fields aside from these four?

  2. When I invoke 'git commit' without the '-m' argument I'm dropped
     into the cozy $EDITOR of my choice and given the opportunity to
     edit the commit message.  Is there any way to include the metadata
     fields in this editing session?  That way I could both sanity-check
     them as I perform the commit (important) and modify them if they're
     wrong (less important).

     I've been having problems lately with running git on machines where
     I forgot to set up my .gitconfig; I wind up with patches that have
     committers like root@mymachine and so forth.  Being automatically
     shown the committer/author when I make the commit would help me
     avoid these situations.

Thanks,

  - a

* Re: A question about changing remote repo name
From: Miklos Vajna @ 2010-01-04 20:09 UTC (permalink / raw)
  To: Dongas; +Cc: git
In-Reply-To: <60ce8d251001032245n4e0267b1o1ecc796f324f8179@mail.gmail.com>

On Mon, Jan 04, 2010 at 02:45:09PM +0800, Dongas <dongas86@gmail.com> wrote:
> I'm running ubuntu 9.04 and the git coming along with it doesn't
> support git remote rename command.

It first appeared in v1.6.1, about a year ago. What does 'git version'
say?

* Re: RFC: display dirty submodule working directory in git gui and gitk
From: Jens Lehmann @ 2010-01-04 19:21 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Nguyen Thai Ngoc Duy, Johannes Schindelin, Git Mailing List,
	Shawn O. Pearce, Paul Mackerras, Heiko Voigt, Lars Hjemli
In-Reply-To: <7viqbhelmh.fsf@alter.siamese.dyndns.org>

On 04.01.2010 20:05, Junio C Hamano wrote:
> Jens Lehmann <Jens.Lehmann@web.de> writes:
> 
>> On 04.01.2010 18:51, Nguyen Thai Ngoc Duy wrote:
>>> Incidentally I was just drafting git-super.sh to see how far it goes.
>>> The goal was to implement some cross-module operations over time. "git
>>> super status", "git super commit" and others could be handy.
>>
>> Hm, I'm not sure if this will really help us. I would rather see "git
>> status" and friends do the right thing for submodules too. Maybe this
>> has to be configurable but I think the separate commands that one has
>> to use for submodules now are part of the usability problems we are
>> seeing.

> Both will be valid approaches to work toward the same goal.  A separate
> prototype implementation can be a way to easily figure out what the
> desired features are.

> For the past 12 months, you and Johan Herland were the people who had more
> than one patch with substance to git-submodule.sh and I would really
> appreciate and at the same time want to encourage experimentation by
> people like you who are heavy users with need for a better submodule
> support.

Right. It was not my intention to discourage such experimentations with
my reply. I'm sorry if my email made this impression.

* Re: submodules, was Re: RFC: display dirty submodule working  directory in git gui and gitk
From: Jens Lehmann @ 2010-01-04 19:14 UTC (permalink / raw)
  To: Avery Pennarun
  Cc: Johannes Schindelin, Heiko Voigt, Git Mailing List,
	Junio C Hamano, Shawn O. Pearce, Paul Mackerras, Lars Hjemli
In-Reply-To: <32541b131001041029t5adc535bt9681d33174042871@mail.gmail.com>

On 04.01.2010 19:29, Avery Pennarun wrote:
> For me one big problem comes down to producing accurate output for
> 'git log'.  git submodules assume that the history inside the module
> is entirely separate (you need to run multiple 'git log' instances to
> see the full history); git-subtree assumes that it's entirely
> integrated.  In that sense, git-subtree is somewhat more in line with
> the core principle of git (we track the history of "the content", not
> any particular file or subdir).  Unfortunately, it also exposes a
> problem with that core principle: taken to its extreme, "the content"
> includes all data in the universe.  And while git could branch and
> merge the universe very efficiently in about O(log n) time, 'git log'
> output gets less useful about O(n) with the size of the tree.
> 
> Neither git-subtree nor git submodules seem to help with this "log
> pollution" problem very much - but I don't know what to do that would
> be better.

I think this depends extremely on the use case and may even differ
from submodule to submodule. It might be desirable to be able to
specify which submodule logs you want to see, because only the user
knows what is important for him. But you should be able to ask "git
log" directly without forking it in every submodule you care about,
no?

There has been a thread between Junio and Heiko about group mappings
for submodules. Maybe the configuration could be extended to contain
information about what each submodule should add to the superproject's log?
http://thread.gmane.org/gmane.comp.version-control.git/130928/


> Outside of this, my major problem with submodules is they use separate
> work trees and repositories, and thus require lots of extra
> housekeeping to get anything done.  I'd be much happier if submodules
> would share the same objects/packs/.gitdir/refs/indexfile as the
> superproject, and the *only* thing special about them would be that
> the superproject's tree points at a commit object instead of a tree
> object.  In other words, I think the actual repo format is correct
> as-is, but the tools surrounding it cause a lot of confusion.

I don't care deeply where the objects live but agree about the repo
format and the confusion ;-)


> Imagine if cloning a superproject also checked out the subproject
> transparently,

That would be great (at least at checkout time, after clone you
might wanna decide which submodules to initialize first - unless
group mappings are working). Right now we use post-checkout hooks
to do that.


> and committing dirty data inside the subproject's tree
> created a new commit object for the subproject, then tacked that
> commit object into the superproject's index for a later commit
> (exactly as changing a subdir creates a new tree object that the
> parent directory can refer to).

That would be a nice feature.


> This doesn't solve some use cases, however, such as ones where people
> really don't want to check out (or even fetch) the contents of some
> submodules, even when they check out the superproject.  The current
> implementation *does* handle that situation.  I'm not sure how many
> people rely on that behaviour, though.  (And maybe the correct
> solution to *that* is proper support for sparse clone/checkout
> regardless of submodules.)

We do rely on this behavior. But sparse clone or group mappings
could replace that need.

* Re: What's cooking in git.git (Jan 2010, #01; Mon, 04)
From: Junio C Hamano @ 2010-01-04 19:10 UTC (permalink / raw)
  To: Matthieu Moy; +Cc: git
In-Reply-To: <vpqaawtyh99.fsf@bauges.imag.fr>

Matthieu Moy <Matthieu.Moy@grenoble-inp.fr> writes:

> Junio C Hamano <gitster@pobox.com> writes:
>
>> * mm/diag-path-in-treeish (2009-12-07) 1 commit
>>  - Detailed diagnosis when parsing an object name fails.
>
> This one has been there for quite some time and shouldn't be
> controversial. Do I need anything to push it into next?

Prodding like this ;-) 

I wanted to stagger and spread the merge into 'next' over a few rounds.

Thanks.

* Re: RFC: display dirty submodule working directory in git gui and gitk
From: Junio C Hamano @ 2010-01-04 19:05 UTC (permalink / raw)
  To: Jens Lehmann
  Cc: Nguyen Thai Ngoc Duy, Johannes Schindelin, Git Mailing List,
	Shawn O. Pearce, Paul Mackerras, Heiko Voigt, Lars Hjemli
In-Reply-To: <4B423633.6090603@web.de>

Jens Lehmann <Jens.Lehmann@web.de> writes:

> On 04.01.2010 18:51, Nguyen Thai Ngoc Duy wrote:
>> Incidentally I was just drafting git-super.sh to see how far it goes.
>> The goal was to implement some cross-module operations over time. "git
>> super status", "git super commit" and others could be handy.
>
> Hm, I'm not sure if this will really help us. I would rather see "git
> status" and friends do the right thing for submodules too. Maybe this
> has to be configurable but I think the separate commands that one has
> to use for submodules now are part of the usability problems we are
> seeing.
>
> IMHO putting the functionality of "git submodule summary" into "git
> diff" was a step in the right direction. This thread is about adding a
> line to the diff output when diffing against the working directory and
> a submodule has a dirty working directory too. Then you can ask "git
> diff" and it tells you anything you need to know about the submodule
> before committing or checking out in the supermodule (And IMO later on
> "git status" should give us this information too).

Both will be valid approaches to work toward the same goal.  A separate
prototype implementation can be a way to easily figure out what the
desired features are.

If "git super status" does turn out to be consistent with what "git
status" is supposed to do, you can decide to fold that into the latter at
that point.  On the other hand, information people may want from "git
super status" could be different from what people want from "git status",
in which case it might be better to either become a new option to "git
status", or become a new subcommand to "git submodule".

You start the prototype by changing "git status" and later decide that the
end result either needs to become an optional behaviour, or maybe even a
separate command.  Either way the end result will be the same---a good
feature to help people is placed at the most logical place.

For the past 12 months, you and Johan Herland were the people who had more
than one patch with substance to git-submodule.sh and I would really
appreciate and at the same time want to encourage experimentation by
people like you who are heavy users with need for a better submodule
support.

Thanks.

* "git add -i" with path gives "Argument list too long"
From: Wincent Colaiuta @ 2010-01-04 18:43 UTC (permalink / raw)
  To: git

Just ran "git add -i <path>" with "<path>" pointing to a subdirectory  
which happens to have a bunch of files in it (about 7k) and it barfed  
thusly:

   Can't exec "git": Argument list too long at /usr/local/libexec/git-core/git-add--interactive line 158.
   Died at /usr/local/libexec/git-core/git-add--interactive line 158.

I see that what it's trying to do under the hood is:

   git diff-index --cached --numstat --summary HEAD -- <7,000+ paths...>

Sure, we could divide the paths into smaller groups, run multiple  
invocations of "git diff-index", and concatenate the results. But it  
would be nicer if there was some other way that we could get at the  
same information without having to pass 7,000 paths explicitly on the  
command line; is there any which I am overlooking?
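
As a rough illustration of the "smaller groups" approach, here is a
minimal sketch in C of how a path list could be cut into batches that
each stay under a byte budget. The names next_chunk() and count_chunks()
and the budget value are hypothetical; nothing here is taken from
git-add--interactive (which is a Perl script), and a real limit would
also have to leave room for the environment and the fixed arguments.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not part of git: returns how many paths, starting
 * at paths[start], fit into `budget` bytes of argument space. Each path
 * costs its length plus one byte for the terminating NUL. While paths
 * remain, at least one is returned, so the caller always makes progress
 * even if a single path exceeds the budget.
 */
size_t next_chunk(const char **paths, size_t n, size_t start, size_t budget)
{
	size_t used = 0, count = 0;

	while (start + count < n) {
		size_t need = strlen(paths[start + count]) + 1;

		if (count && used + need > budget)
			break;
		used += need;
		count++;
	}
	return count;
}

/* Counts how many separate invocations the whole list would need. */
size_t count_chunks(const char **paths, size_t n, size_t budget)
{
	size_t start = 0, chunks = 0;

	while (start < n) {
		start += next_chunk(paths, n, start, budget);
		chunks++;
	}
	return chunks;
}
```

A caller would run one "git diff-index" per chunk and concatenate the
output, which is exactly the bookkeeping the alternatives below would
avoid.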

The enormous file list is the result of passing <path> into "git
ls-files -- <path>". Would it be worth:

- either modifying "git diff-index" to accept a list of paths over
stdin so that we could at least pipe the output from "git ls-files"
into "git diff-index"

- or, preferably, teaching "git diff-index" to recurse into directories
rather than expect a list of paths-of-blobs (possibly with a command
line switch to activate the behaviour if it were deemed a dangerous
default)

This is one piece of plumbing that I've never dabbled with, so forgive  
me if my questions are a little dumb.

Cheers,
Wincent

* Re: RFC: display dirty submodule working directory in git gui and gitk
From: Jens Lehmann @ 2010-01-04 18:40 UTC (permalink / raw)
  To: Nguyen Thai Ngoc Duy
  Cc: Johannes Schindelin, Git Mailing List, Junio C Hamano,
	Shawn O. Pearce, Paul Mackerras, Heiko Voigt, Lars Hjemli
In-Reply-To: <fcaeb9bf1001040951r3f797750o5ebd25e93c0272ea@mail.gmail.com>

On 04.01.2010 18:51, Nguyen Thai Ngoc Duy wrote:
> Incidentally I was just drafting git-super.sh to see how far it goes.
> The goal was to implement some cross-module operations over time. "git
> super status", "git super commit" and others could be handy.

Hm, I'm not sure if this will really help us. I would rather see "git
status" and friends do the right thing for submodules too. Maybe this
has to be configurable but I think the separate commands that one has
to use for submodules now are part of the usability problems we are
seeing.

IMHO putting the functionality of "git submodule summary" into "git
diff" was a step in the right direction. This thread is about adding a
line to the diff output when diffing against the working directory and
a submodule has a dirty working directory too. Then you can ask "git
diff" and it tells you anything you need to know about the submodule
before committing or checking out in the supermodule (And IMO later on
"git status" should give us this information too).

* Re: submodules, was Re: RFC: display dirty submodule working  directory in git gui and gitk
From: Avery Pennarun @ 2010-01-04 18:29 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: Heiko Voigt, Jens Lehmann, Git Mailing List, Junio C Hamano,
	Shawn O. Pearce, Paul Mackerras, Lars Hjemli
In-Reply-To: <alpine.DEB.1.00.1001041157020.3695@intel-tinevez-2-302>

On Mon, Jan 4, 2010 at 6:46 AM, Johannes Schindelin
<Johannes.Schindelin@gmx.de> wrote:
> But I think that an important precondition to come up with a better design
> of the submodules is to have suffered the current implementation in
> real-world work using submodules. (Which reminds me very much of the
> autocrlf mess.)

I suffered the current implementation, which is why I wrote
git-subtree :)  I'm still suffering, though; git-subtree works much
better for my own use cases, but after some experience with it, I'm
still not totally happy.

For me one big problem comes down to producing accurate output for
'git log'.  git submodules assume that the history inside the module
is entirely separate (you need to run multiple 'git log' instances to
see the full history); git-subtree assumes that it's entirely
integrated.  In that sense, git-subtree is somewhat more in line with
the core principle of git (we track the history of "the content", not
any particular file or subdir).  Unfortunately, it also exposes a
problem with that core principle: taken to its extreme, "the content"
includes all data in the universe.  And while git could branch and
merge the universe very efficiently in about O(log n) time, 'git log'
output gets less useful about O(n) with the size of the tree.

Neither git-subtree nor git submodules seem to help with this "log
pollution" problem very much - but I don't know what to do that would
be better.

Outside of this, my major problem with submodules is they use separate
work trees and repositories, and thus require lots of extra
housekeeping to get anything done.  I'd be much happier if submodules
would share the same objects/packs/.gitdir/refs/indexfile as the
superproject, and the *only* thing special about them would be that
the superproject's tree points at a commit object instead of a tree
object.  In other words, I think the actual repo format is correct
as-is, but the tools surrounding it cause a lot of confusion.

Imagine if cloning a superproject also checked out the subproject
transparently, and committing dirty data inside the subproject's tree
created a new commit object for the subproject, then tacked that
commit object into the superproject's index for a later commit
(exactly as changing a subdir creates a new tree object that the
parent directory can refer to).

This doesn't solve some use cases, however, such as ones where people
really don't want to check out (or even fetch) the contents of some
submodules, even when they check out the superproject.  The current
implementation *does* handle that situation.  I'm not sure how many
people rely on that behaviour, though.  (And maybe the correct
solution to *that* is proper support for sparse clone/checkout
regardless of submodules.)

Have fun,

Avery

* Re: RFC: display dirty submodule working directory in git gui and  gitk
From: Nguyen Thai Ngoc Duy @ 2010-01-04 17:51 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: Jens Lehmann, Git Mailing List, Junio C Hamano, Shawn O. Pearce,
	Paul Mackerras, Heiko Voigt, Lars Hjemli
In-Reply-To: <alpine.DEB.1.00.1001041038520.4985@pacific.mpi-cbg.de>

On 1/4/10, Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote:
> The real problem is that submodules in the current form are not very well
>  designed.  For example, a submodule being at a different commit than in
>  the superproject's index is not as fatal as the submodule having changes.
>
>  So in the long run, IMHO a proper redesign of the submodules would not
>  make only a little sense (it does not help, though, that those who
>  implemented and furthered the current approach over other discussed
>  approaches do not use submodules themselves -- not even now).
>
>  In the short run, we can paper over the shortcomings of the submodules by
>  introducing a command line option "--include-submodules" to
>  update-refresh, diff-files and diff-index, though.

Incidentally I was just drafting git-super.sh to see how far it goes.
The goal was to implement some cross-module operations over time. "git
super status", "git super commit" and others could be handy.
-- 
Duy

* Re: RFC: display dirty submodule working directory in git gui and gitk
From: Jens Lehmann @ 2010-01-04 17:04 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: Git Mailing List, Junio C Hamano, Shawn O. Pearce, Paul Mackerras,
	Heiko Voigt, Lars Hjemli
In-Reply-To: <alpine.DEB.1.00.1001041038520.4985@pacific.mpi-cbg.de>

On 04.01.2010 10:44, Johannes Schindelin wrote:
> The real problem is that submodules in the current form are not very well 
> designed.

IMVHO using the tree sha1 for a submodule seems to be the 'natural' way
to include another git repo. And it gives the reproducibility I expect
from an SCM. Or am I missing something?

It looks to me as if most shortcomings come from the fact that most git
commands tend to ignore submodules (and if they don't, like git gui and
gitk do now, they e.g. only show certain aspects of their state).

Submodules have been in heavy use in our company since last year. Virtually
every patch I submitted for submodules came from that experience and
scratched an itch I or one of my colleagues had (and the situation has
already improved noticeably with the few things we changed). We are still
convinced that using submodules was the right decision. But some work
still has to be done to be able to use them easily and to get rid of
some pitfalls.

> In the short run, we can paper over the shortcomings of the submodules by
> introducing a command line option "--include-submodules" to 
> update-refresh, diff-files and diff-index, though.

Maybe this is the way to go for now (and hopefully we can turn this
option on by default later because we did the right thing ;-).

* Re: [PATCH 7/6] t0021: use $SHELL_PATH for the filter script
From: Johannes Sixt @ 2010-01-04 16:46 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, Nanako Shiraishi, git
In-Reply-To: <20100104160317.GB9136@coredump.intra.peff.net>

Jeff King wrote:
> I converted more than that; see my 2/6. It is also the pager, the
> imap-send tunnel helper, and external merge helpers. Not the editor,
> since it already had the no-metacharacters optimization (though it, too,
> could be affected if we implement your DWIM trick instead of the
> metacharacter thing).
> 
> So I think we need to make a conscious decision that this is an
> acceptable change of behavior (and I am totally fine with the change
> happening -- I just want to be clear about the extent of what is being
> changed).

Hm, ok, I see.

- The clean and smudge filters are probably the most important cases.

- I *did* write my own merge script (to merge PNGs!), but I made sure to 
begin it with #!/bin/bash, and I don't think anybody else is crazy enough 
to write a custom merge script ;)

- imap-send on Windows is so new that I don't think anyone is already 
using it with a custom tunneling script.

- The change in pager.c is unimportant because all versions shipped so far 
(via msysgit) have the conflicting change that tried without "sh -c" first.

I think that these can be handled with an entry in the release notes.

-- Hannes

* Re: Git Server Authentication & Management
From: Shawn O. Pearce @ 2010-01-04 16:33 UTC (permalink / raw)
  To: Pedro Lemos; +Cc: Git
In-Reply-To: <1a710981001040827q23f61bdew8db1ae76d5bfb855@mail.gmail.com>

Pedro Lemos <pedrolemos454@gmail.com> wrote:
> I'm relatively new to Git.
> At the moment I'm trying to understand if it will be possible to:
> 
> 1 - configure a central server (server A) to host all my git repositories.
> 2 - also I would like to configure access to those Git repositories in
> order to use authentication:
>         - using LDAP;
>         - using MS Active Directory;

You might want to look at Gerrit Code Review [1].  It has
out-of-the-box support for integration with Active Directory.

[1] http://code.google.com/p/gerrit/

> 3 - Moreover, I would like to know if there is any administration
> interface to use within git repositories?

Gerrit Code Review uses a web based administration interface, though
with an LDAP/Active Directory configuration access controls will
most likely be managed in the directory server by user membership
to groups.

> 4 - And to close this email, I need a way to manage access permissions
> over the server repositories. Such as:
>         - read-write, read-only, or no access at all;
>         - deletes-allowed, renames-allowed, tags allowed;

Yup, Gerrit Code Review can do that.

It also can be used as a code review system.  :-) But if you don't
want to use the code review features, you can just grant out the
Push Branch +1 (or +2 or +3) permission to allow pushing to a branch.

A different, but much more popular choice is gitosis [2], but that
doesn't use LDAP for user authentication and access management.
It uses its own SSH key repository.  To be fair, Gerrit Code Review
also uses its own SSH key repository... but users can manage their
keys individually through the web interface, which is authenticated
by LDAP.

[2] http://eagain.net/gitweb/?p=gitosis.git

-- 
Shawn.

* Re: What's cooking in git.git (Jan 2010, #01; Mon, 04)
From: Johannes Sixt @ 2010-01-04 16:29 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git
In-Reply-To: <7vljgei7rs.fsf@alter.siamese.dyndns.org>

Junio C Hamano wrote:
> * jk/run-command-use-shell (2010-01-01) 8 commits
>  - t4030, t4031: work around bogus MSYS bash path conversion
>  - t0021: use $SHELL_PATH for the filter script
>  - diff: run external diff helper with shell
>  - textconv: use shell to run helper
>  - editor: use run_command's shell feature
>  - run-command: optimize out useless shell calls
>  - run-command: convert simple callsites to use_shell
>  - run-command: add "use shell" option

Two notes about this:

1. My patch "t0021:..." contains an unrelated change to t4030 (it changes 
a /bin/sh to $SHELL_PATH) that is not necessary. I included it in my first 
version of the patch, but later noticed that we already have many similar 
uses of /bin/sh instead of $SHELL_PATH in test scriptlets and decided to 
remove the change, but I only changed the commit message and forgot to 
unstage t4030.

2. If you intend to merge the early part of the topic to master early and 
hold "diff:..." and "textconv:..." in next a bit longer (as proposed by 
Jeff), then you should move "t0021:..." after "run-command: optimize out 
useless shell calls".

Thanks,
-- Hannes

* Git Server Authentication & Management
From: Pedro Lemos @ 2010-01-04 16:27 UTC (permalink / raw)
  To: Git

Hi,

I'm relatively new to Git.
At the moment I'm trying to understand if it will be possible to:

1 - configure a central server (server A) to host all my git repositories.
2 - also I would like to configure access to those Git repositories in
order to use authentication:
        - using LDAP;
        - using MS Active Directory;

3 - Moreover, I would like to know if there is any administration
interface to use within git repositories?
4 - And to close this email, I need a way to manage access permissions
over the server repositories. Such as:
        - read-write, read-only, or no access at all;
        - deletes-allowed, renames-allowed, tags allowed;

Can anyone guide me through any of the items mentioned above?
Any help appreciated!

Best Regards,
Pedro Lemos

* Re: [PATCH] grep: do not do external grep on skip-worktree entries
From: Linus Torvalds @ 2010-01-04 16:24 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, Miles Bader, Nguyen Thai Ngoc Duy, git
In-Reply-To: <alpine.LFD.2.00.1001040659150.3630@localhost.localdomain>

On Mon, 4 Jan 2010, Linus Torvalds wrote:
> 
>  - external grep:
> 
> 	[torvalds@nehalem linux]$ time git grep qwerty
> 	...
> 	real	0m0.412s
> 	user	0m0.196s
> 	sys	0m0.132s
> 
>  - NO_EXTERNAL_GREP:
> 
> 	[torvalds@nehalem linux]$ time ~/git/git grep qwerty
> 	...
> 	real	0m1.006s
> 	user	0m0.900s
> 	sys	0m0.096s
> 
> so that's not even close.

Side note: at least for me, if we did some auto-parallelization, the 
internal grep would make up for all its other suckiness. Do four or eight 
greps in parallel, and buffer the results (you still need to show them in 
the right order).

That might be an acceptable way to "fix" it. Developers pretty much all 
have at least two cores these days, some of us have four+HT. We use 
threads in other places, maybe this could be one more of them.

(Start 'n' threads, do an initial per-thread regex and 'regcomp()' to make 
it thread-safer, and the only interesting issue would be serializing the 
output. Whenever you get a result, you'd need to make sure that all files 
before have been completed, but you could do that all under a specific 
lock that protects completion information).
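
The scheme in the parenthesis above could look roughly like the
following sketch. It is an illustration under stated assumptions, not
git code: the struct and function names are made up, file contents are
kept in memory, strstr() stands in for the per-thread compiled regex,
and the ordered output is collected in a buffer rather than printed. It
assumes POSIX threads (build with -pthread where the toolchain needs it).

```c
#include <pthread.h>
#include <stdio.h>
#include <string.h>

/* One unit of work; names and contents are made up for the sketch. */
struct grep_item {
	const char *name;	/* file name to report */
	const char *text;	/* file contents, held in memory here */
	int hit;		/* result: 1 if the pattern was found */
	int done;		/* set under the lock once the search finished */
};

struct grep_state {
	pthread_mutex_t lock;	/* protects next_todo, next_out, done, out */
	struct grep_item *items;
	int nitems;
	int next_todo;		/* next item to hand to a worker */
	int next_out;		/* next item whose result may be emitted */
	const char *pattern;
	char out[256];		/* ordered output, collected for inspection */
};

static void *grep_worker(void *arg)
{
	struct grep_state *st = arg;

	for (;;) {
		int i, hit;

		pthread_mutex_lock(&st->lock);
		i = st->next_todo < st->nitems ? st->next_todo++ : -1;
		pthread_mutex_unlock(&st->lock);
		if (i < 0)
			return NULL;

		/* The search itself runs without the lock held. */
		hit = strstr(st->items[i].text, st->pattern) != NULL;

		pthread_mutex_lock(&st->lock);
		st->items[i].hit = hit;
		st->items[i].done = 1;
		/* Emit results only once all earlier items are complete. */
		while (st->next_out < st->nitems &&
		       st->items[st->next_out].done) {
			struct grep_item *it = &st->items[st->next_out++];
			size_t used = strlen(st->out);

			if (it->hit)
				snprintf(st->out + used, sizeof(st->out) - used,
					 "%s\n", it->name);
		}
		pthread_mutex_unlock(&st->lock);
	}
}

/* Runs up to 8 workers over st->items; returns 0 if any thread started. */
int parallel_grep(struct grep_state *st, int nthreads)
{
	pthread_t tid[8];
	int i, started = 0;

	if (nthreads > 8)
		nthreads = 8;
	pthread_mutex_init(&st->lock, NULL);
	st->next_todo = st->next_out = 0;
	st->out[0] = '\0';
	for (i = 0; i < nthreads; i++)
		if (!pthread_create(&tid[started], NULL, grep_worker, st))
			started++;
	for (i = 0; i < started; i++)
		pthread_join(tid[i], NULL);
	pthread_mutex_destroy(&st->lock);
	return started ? 0 : -1;
}
```

Whichever worker finishes an item flushes every consecutive completed
result starting at next_out, so the output order matches the input order
no matter which thread finishes first.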

		Linus

* Re: What's cooking in git.git (Jan 2010, #01; Mon, 04)
From: Matthieu Moy @ 2010-01-04 16:19 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git
In-Reply-To: <7vljgei7rs.fsf@alter.siamese.dyndns.org>

Hi,

Junio C Hamano <gitster@pobox.com> writes:

> * mm/diag-path-in-treeish (2009-12-07) 1 commit
>  - Detailed diagnosis when parsing an object name fails.

This one has been there for quite some time and shouldn't be
controversial. Do I need anything to push it into next?

Thanks,

-- 
Matthieu Moy
http://www-verimag.imag.fr/~moy/

* Re: [PATCH] grep: do not do external grep on skip-worktree entries
From: Miles Bader @ 2010-01-04 15:57 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Jeff King, Junio C Hamano, Nguyen Thai Ngoc Duy, git
In-Reply-To: <alpine.LFD.2.00.1001040659150.3630@localhost.localdomain>

On Tue, Jan 5, 2010 at 12:54 AM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> And "perf record" followed by "perf report" on the internal one shows
> that it's not even regexec() - we use strstr() for the trivial case:

Does strstr use e.g. boyer-moore?  I imagine grep does...

-miles

-- 
Do not taunt Happy Fun Ball.

* Re: [PATCH 7/6] t0021: use $SHELL_PATH for the filter script
From: Jeff King @ 2010-01-04 16:03 UTC (permalink / raw)
  To: Johannes Sixt; +Cc: Junio C Hamano, Nanako Shiraishi, git
In-Reply-To: <4B420E4F.1040706@kdbg.org>

On Mon, Jan 04, 2010 at 04:50:39PM +0100, Johannes Sixt wrote:

> >>On Windows, we need the shbang line to correctly invoke shell scripts via
> >>a POSIX shell, except when the script is invoked via 'sh -c' because
> >>sh (a bash) does "the right thing". Since nowadays the clean and smudge
> >>filters are not always invoked via 'sh -c' anymore, we have to mark
> >>the one in t0021-conversion with #!$SHELL_PATH.
> >
> >Hrm. This does mean we might be breaking users who have helper scripts
> >in a similar state to those in the test suite...
> 
> Not helper scripts in general, but only clean and smudge filters,
> because these have been invoked with "sh -c" so far. Everything else,
> that was not run via "sh -c", but now is, is safe.

I converted more than that; see my 2/6. It is also the pager, the
imap-send tunnel helper, and external merge helpers. Not the editor,
since it already had the no-metacharacters optimization (though it, too,
could be affected if we implement your DWIM trick instead of the
metacharacter thing).

So I think we need to make a conscious decision that this is an
acceptable change of behavior (and I am totally fine with the change
happening -- I just want to be clear about the extent of what is being
changed).

-Peff

* Re: [PATCH] grep: do not do external grep on skip-worktree entries
From: Linus Torvalds @ 2010-01-04 16:03 UTC (permalink / raw)
  To: Miles Bader; +Cc: Jeff King, Junio C Hamano, Nguyen Thai Ngoc Duy, git
In-Reply-To: <fc339e4a1001040757n31298f3h724eacfafb68c63e@mail.gmail.com>

On Tue, 5 Jan 2010, Miles Bader wrote:

> On Tue, Jan 5, 2010 at 12:54 AM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> > And "perf record" followed by "perf report" on the internal one shows
> > that it's not even regexec() - we use strstr() for the trivial case:
> 
> Does strstr use e.g. boyer-moore?  I imagine grep does...

It doesn't matter. Since we do the line-by-line thing, the input is always 
so short that DFA vs NFA vs BM vs other-clever-search doesn't matter. 
There is no scaling - the grep buffer tends to be too small for the 
algorithm to matter.

And the reason we do things line-by-line is that we need to then output 
things line-per-line.

			Linus

^ permalink raw reply

* Re: [PATCH] Use warning function instead of fprintf(stderr, "Warning: ...").
From: Johannes Sixt @ 2010-01-04 16:02 UTC (permalink / raw)
  To: Thiago Farina; +Cc: git
In-Reply-To: <1262463886-8956-1-git-send-email-tfransosi@gmail.com>

Did you actually test any of the changed warnings? You should see extra 
empty lines because warning() adds its own LF and you didn't remove any.

-- Hannes

^ permalink raw reply

* Re: [PATCH] grep: do not do external grep on skip-worktree entries
From: Linus Torvalds @ 2010-01-04 16:01 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, Miles Bader, Nguyen Thai Ngoc Duy, git
In-Reply-To: <20100104080940.GA4815@coredump.intra.peff.net>



On Mon, 4 Jan 2010, Jeff King wrote:
> 
> However, gprof reports that for the pcre dfa case, we spend more time in
> grep.c:end_of_line than we do actually running the regex. So clearly
> there are some other micro-optimizations in GNU grep that are making a
> difference, too.

Don't use gprof. You're _much_ better off using the newish Linux 'perf' 
tool. It's quite competent, and doesn't need the code to be compiled with 
-pg (which totally changes all performance characteristics).

Do something like this:

	perf record git grep qwerty

followed by

	perf report
	perf annotate grep_buffer_1

(that "perf report" gives a per-symbol overview, the "perf annotate" gives 
a disassembly with source annotations and per-instruction costs). It works 
with inlining too, so you get things like this:

	...
         :      static char *end_of_line(char *cp, unsigned long *left)
         :      {
         :              unsigned long l = *left;
         :              while (l && *cp != '\n') {
   24.76 :        476a50:       80 3b 0a                cmpb   $0xa,(%rbx)
   10.46 :        476a53:       0f 84 e7 00 00 00       je     476b40 <grep_buffer_1+0x1b0>
         :                      l--;
         :                      cp++;
   21.19 :        476a59:       48 83 c3 01             add    $0x1,%rbx
         :      }
         :
         :      static char *end_of_line(char *cp, unsigned long *left)
         :      {
         :              unsigned long l = *left;
         :              while (l && *cp != '\n') {
    0.94 :        476a5d:       49 83 ed 01             sub    $0x1,%r13
    4.85 :        476a61:       75 ed                   jne    476a50 <grep_buffer_1+0xc0>
         :
	...

and yes, it's all the per-line crap.

The perf tools are included with modern kernels in tools/perf (which also 
has a Documentation subdirectory). I can pretty much guarantee that once 
you start using it, you'll never use gprof or oprofile again.

		Linus

^ permalink raw reply

* Re: [PATCH] grep: do not do external grep on skip-worktree entries
From: Linus Torvalds @ 2010-01-04 15:54 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, Miles Bader, Nguyen Thai Ngoc Duy, git
In-Reply-To: <20100104064408.GA7785@coredump.intra.peff.net>

On Mon, 4 Jan 2010, Jeff King wrote:
> 
> I have to wonder, though...did anybody ever actually profile our
> internal grep to find out _why_ it was so much slower than GNU grep?
> Could we simply ship a better grep engine and obsolete external grep?

The internal grep is about 2.5 times slower than the external one for me. 
That's a big deal:

 - external grep:

	[torvalds@nehalem linux]$ time git grep qwerty
	...
	real	0m0.412s
	user	0m0.196s
	sys	0m0.132s

 - NO_EXTERNAL_GREP:

	[torvalds@nehalem linux]$ time ~/git/git grep qwerty
	...
	real	0m1.006s
	user	0m0.900s
	sys	0m0.096s

so that's not even close.

And "perf record" followed by "perf report" on the internal one shows 
that it's not even regexec() - we use strstr() for the trivial case:

    43.63%      git  /home/torvalds/git/git         [.] grep_buffer_1
    25.19%      git  /lib64/libc-2.11.so            [.] __strstr_sse42
     9.16%      git  /home/torvalds/git/git         [.] match_one_pattern
     4.79%      git  /lib64/libc-2.11.so            [.] __m128i_strloadu

but it seems to be all that line-per-line crud. If we got rid of that one, 
and could do the match as a _single_ regexec() instead (at least for the 
trivial cases of just one grep expression), perhaps we'd be better off.

			Linus

^ permalink raw reply
