Git development


* Re: irc usage..
From: Donnie Berkholz @ 2006-05-22  5:21 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Yann Dirson, Git Mailing List, Matthias Urlichs, Martin Langhoff,
	Johannes Schindelin
In-Reply-To: <Pine.LNX.4.64.0605212132570.3697@g5.osdl.org>


Linus Torvalds wrote:
> Did you do a "top" at any time just before this all happened? It _sounds_ 
> like it might actually be a memory leak on the CVS server side, and the 
> problem may (or may not) be due to the optimization that keeps a single 
> long-running CVS server instance for the whole process.

No. =\ I just started the thing running in a screen session and came
back a few hours later to find it like that.

Thanks,
Donnie



^ permalink raw reply

* avoid atoi, when possible; int overflow -> heap corruption
From: Jim Meyering @ 2006-05-22  6:57 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git
In-Reply-To: <7v3bf3jl15.fsf@assigned-by-dhcp.cox.net>

This is another one of those `would be nice' sort of changes.
Probably not worth much at this early stage in development, but
eventually worth changing.

There are about 20 uses of atoi, and most calls can return
a usable result in spite of an invalid input -- just because
atoi returns the same thing for "99" as "99-and-any-suffix".
It would be better not to ignore invalid inputs.
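One way to stop ignoring invalid input is sketched below. It is only an illustration, not code from git: the name parse_int_strict is hypothetical, and it assumes the caller wants a plain decimal int with nothing after the digits. Like strtol itself, it still accepts leading white space and a sign.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not part of git): parse a complete decimal int.
 * Returns 0 and stores the value in *out on success.  Returns -1 if no
 * digits were found, if anything follows the number, or if the value
 * does not fit in an int.
 */
int parse_int_strict(const char *s, int *out)
{
	char *end;
	long v;

	errno = 0;
	v = strtol(s, &end, 10);
	if (end == s || *end != '\0')
		return -1;	/* no digits, or a suffix such as "-and-any-suffix" */
	if (errno == ERANGE || v < INT_MIN || v > INT_MAX)
		return -1;	/* out of range for int */
	*out = (int)v;
	return 0;
}
```

With this, "99" parses to 99 while "99-and-any-suffix" is rejected rather than silently treated as 99.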

-------------------
Also, integer overflow in object.c can cause trouble.
When the xrealloc byte count exceeds 2^32 (for a 32-bit int),
xrealloc will happily return a buffer of the requested (small) size,
but the following memset will scribble zeroes far beyond the end
of that new buffer.

static int nr_objs;
int obj_allocs;
...
void created_object(const unsigned char *sha1, struct object *obj)
{
...
	if (obj_allocs - 1 <= nr_objs * 2) {
		int i, count = obj_allocs;
		obj_allocs = (obj_allocs < 32 ? 32 : 2 * obj_allocs);
		objs = xrealloc(objs, obj_allocs * sizeof(struct object *));
		memset(objs + count, 0, (obj_allocs - count)
				* sizeof(struct object *));

But this may be only theoretical, because the problem doesn't strike
until there are over 250M objects (assuming 32-bit int and 8-byte pointers).
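A guard against this kind of wraparound could look roughly like the sketch below. It is an assumption-laden illustration rather than a proposed patch: next_alloc is a hypothetical name, and it only computes the next element count and byte count, leaving the actual xrealloc and memset to the caller.

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: compute the next capacity (same doubling rule as
 * the object.c excerpt) and the matching byte count, refusing to wrap.
 * Returns 0 on success, -1 if either computation would overflow.
 */
int next_alloc(int cur, size_t elem_size, int *new_count, size_t *new_bytes)
{
	int n;

	if (cur < 0 || elem_size == 0)
		return -1;
	if (cur > INT_MAX / 2)
		return -1;	/* 2 * cur would overflow int */
	n = cur < 32 ? 32 : 2 * cur;
	if ((size_t)n > SIZE_MAX / elem_size)
		return -1;	/* n * elem_size would wrap size_t */
	*new_count = n;
	*new_bytes = (size_t)n * elem_size;
	return 0;
}
```

The caller would die with a diagnostic on -1 instead of passing a wrapped byte count to xrealloc.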

^ permalink raw reply

* Re: don't accept bogus N in `HEAD~N'
From: Jim Meyering @ 2006-05-22  7:38 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: git
In-Reply-To: <e4qmsn$3mv$1@sea.gmane.org>

Jakub Narebski <jnareb@gmail.com> wrote:

> Jim Meyering wrote:
>
>> In a very shallow audit, I spotted code where overflow was not detected.
>> But it's hardly critical.
>>
>> Currently,
>>
>>   git-diff HEAD HEAD
>>
>> is equivalent to this
>>
>>   git-diff HEAD HEAD~18446744073709551616   # aka 2^64
>>
>> Exercising git-rev-parse directly, currently I get this:
>>
>>   $ git-rev-parse --no-flags --sq HEAD~18446744073709551616
>>   '639ca5497279607665847f2e3a11064441a8f2a6'
>>
>> It'd be better to produce a diagnostic and fail:
>>
>>   $ ./git-rev-parse --no-flags --sq -- HEAD~18446744073709551616 /dev/null
>>   fatal: ambiguous argument 'HEAD~18446744073709551616': unknown revision or filename
>
> Wouldn't it remove ability to say "to the root commit"?
> One can do it now I guess exactly by specifying overly large N.
> Although there should probably be some limit... or not.

Do people really use HEAD~<VERY_LARGE_INTEGER> to refer to the root?
Any who do that will find it surprising that HEAD~18446744073709551616
is currently interpreted just like `HEAD~0'.
And HEAD~18446744073709551617 just like HEAD~1, etc.
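The wraparound can be reproduced with a small sketch. It assumes the count is accumulated digit by digit in an unsigned 64-bit integer; the parsing code in git itself is not shown in this thread, and both function names below are hypothetical.

```c
#include <stdint.h>

/*
 * Accumulate decimal digits the naive way.  Unsigned arithmetic wraps
 * modulo 2^64, so the result is the true value reduced mod 2^64.
 */
uint64_t parse_count_wrapping(const char *s)
{
	uint64_t n = 0;

	for (; *s >= '0' && *s <= '9'; s++)
		n = n * 10 + (uint64_t)(*s - '0');
	return n;
}

/*
 * Same loop, but fail instead of wrapping.  Returns 0 on success,
 * -1 on an empty string, a non-digit, or overflow.
 */
int parse_count_checked(const char *s, uint64_t *out)
{
	uint64_t n = 0;

	if (*s == '\0')
		return -1;
	for (; *s; s++) {
		uint64_t d;

		if (*s < '0' || *s > '9')
			return -1;
		d = (uint64_t)(*s - '0');
		if (n > (UINT64_MAX - d) / 10)
			return -1;	/* n * 10 + d would exceed UINT64_MAX */
		n = n * 10 + d;
	}
	*out = n;
	return 0;
}
```

Under that assumption the wrapping version maps 18446744073709551616 to 0 and 18446744073709551617 to 1, which matches the behavior described above, while the checked version rejects both.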

^ permalink raw reply

* Re: irc usage..
From: Martin Langhoff @ 2006-05-22  7:42 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Donnie Berkholz, Yann Dirson, Git Mailing List, Matthias Urlichs,
	Johannes Schindelin
In-Reply-To: <Pine.LNX.4.64.0605212132570.3697@g5.osdl.org>

On 5/22/06, Linus Torvalds <torvalds@osdl.org> wrote:
> Did you do a "top" at any time just before this all happened? It _sounds_
> like it might actually be a memory leak on the CVS server side, and the
> problem may (or may not) be due to the optimization that keeps a single
> long-running CVS server instance for the whole process.

Running a few tests right now. Looks like cvs (Debian/etch 1.12.9-13)
itself is not leaking any memory. The Perl (Debian/etch
5.8.7-something and now 5.8.8-4) process OTOH is visibly allocating
memory. Starts off at 4MB and gets up to ~17MB by the time it has done
6K commits.

I am trying to figure out whether the leak is in the script or in the
Perl implementation, using PadWalk, Devel::Leak and friends. If the
leak is here, I can't see it (yet).

> I wouldn't be in the least surprised if that ends up triggering a slow
> leak in CVS itself, and then CVS runs out of memory.

Or a slow leak in Perl? The 5.8.8 release notes do talk about some
leaks being fixed, but this 5.8.8 isn't making a difference.

Working on it.

martin

^ permalink raw reply

* Re: don't accept bogus N in `HEAD~N'
From: Junio C Hamano @ 2006-05-22  8:16 UTC (permalink / raw)
  To: Jim Meyering; +Cc: git
In-Reply-To: <87psi6h5kv.fsf@rho.meyering.net>

Jim Meyering <jim@meyering.net> writes:

> Jakub Narebski <jnareb@gmail.com> wrote:
>
>> Jim Meyering wrote:
>>
>>> It'd be better to produce a diagnostic and fail:

I agree with you that we are loose in integer overlaps.  Some of
them do matter, some don't.  The xrealloc one is, as you said,
borderline, I think, but more serious than this one.  This one
is worth fixing only if/because the fix is obvious and does not
hurt the code otherwise (e.g. does not decrease portability,
does not hurt usability, etc.).

>>>
>>>   $ ./git-rev-parse --no-flags --sq -- HEAD~18446744073709551616 /dev/null
>>>   fatal: ambiguous argument 'HEAD~18446744073709551616': unknown revision or filename
>>
>> Wouldn't it remove ability to say "to the root commit"?
>> One can do it now I guess exactly by specifying overly large N.
>> Although there should probably be some limit... or not.
>
> Do people really use HEAD~<VERY_LARGE_INTEGER> to refer to the root?

You shouldn't have to care about nor refer to the root commit
that often (if ever) in a real project.  It is handy to be able
to refer to it when your repository is very young and you are
toying with git more than you are working on your own project
that is managed by git.  But in such a case, finding it once and
tagging it is so easy and efficient that you would not want to
traverse the whole history every time you would want to refer to
it.

In other words, I think Jakub was just joking, and this
particular objection does not meet the "hurt usability"
criterion I mentioned above.

^ permalink raw reply

* Re: [PATCH 2/3] tutorial: expanded discussion of commit history
From: Jakub Narebski @ 2006-05-22  8:23 UTC (permalink / raw)
  To: git
In-Reply-To: <1148255528.61d5d241.1@fieldses.org>

J. Bruce Fields wrote:

> +Finally, most commands that take filenames will optionally allow you
> +to precede any filename by a commit, to specify a particular version
> +of the file:
> +
> +-------------------------------------
> +$ git diff v2.5:Makefile HEAD:Makefile.in
> +-------------------------------------

Why not mention also :<stage>:<filename>, or would <stage> be not defined in
this place of tutorial?

-- 
Jakub Narebski
Warsaw, Poland

^ permalink raw reply

* Re: don't accept bogus N in `HEAD~N'
From: Junio C Hamano @ 2006-05-22  8:25 UTC (permalink / raw)
  To: Jim Meyering; +Cc: git
In-Reply-To: <7vr72meapg.fsf@assigned-by-dhcp.cox.net>

Junio C Hamano <junkio@cox.net> writes:

> Jim Meyering <jim@meyering.net> writes:
>
>> Jakub Narebski <jnareb@gmail.com> wrote:
>>
>>> Jim Meyering wrote:
>>>
>>>> It'd be better to produce a diagnostic and fail:
>
> I agree with you that we are loose in integer overlaps.  Some of

Oops; I meant overflow or wraparound.. Late night typo/thinko X-<. 

^ permalink raw reply

* Current Issues #3
From: Junio C Hamano @ 2006-05-22  8:44 UTC (permalink / raw)
  To: git

[Third installment of the "Issues" series, but I've been half
 awake for the past week or so, and I suspect I have missed some
 topics that deserve further discussion.]

* Per branch configuration

  The [section "foo"] configuration syntax update by Linus, and
  git-parse-remote update to use remote.stuff.{url,push,pull} by
  Johannes are now both in the "master".  The stage is set to
  discuss what to actually do with per-branch configuration.

  We will use the [branch "foo"] section for configuration about
  local branch named "foo".  I do not think there is any
  disagreement about this.

  The ideas floated so far (I am forgetting many of them
  perhaps):

    1. "upstream" refers to the remote section to use when
       running "git-{fetch,pull,push}" while on that branch.

	[branch "master"]
		upstream = "origin"

	[remote "origin"]
		url = "git://git.kernel.org/.../git.git"
		fetch = refs/heads/master:refs/remotes/origin/master

    2. "url/fetch/push" directly specifies what would usually be
       taken from a remote section by "git-{fetch,pull,push}"
       while on that branch.

	[branch "foo"]
		url = "company.com.xz:myrepo"
		fetch = refs/heads/master:refs/remotes/origin/master
		push = refs/heads/master:refs/heads/origin

* reflog

  I still haven't merged this series to "next" -- I do not have
  much against what the code does, but I am unconvinced if it is
  useful.  Also, the objections raised on the list, that this could be
  replaced by making sure that a repository with hundreds of tags
  stays usable, certainly have a point.

^ permalink raw reply

* Re: [PATCH 2/3] tutorial: expanded discussion of commit history
From: Junio C Hamano @ 2006-05-22  8:45 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: git
In-Reply-To: <e4rsef$v34$1@sea.gmane.org>

Jakub Narebski <jnareb@gmail.com> writes:

> J. Bruce Fields wrote:
>
>> +Finally, most commands that take filenames will optionally allow you
>> +to precede any filename by a commit, to specify a particular version
>> +of the file:
>> +
>> +-------------------------------------
>> +$ git diff v2.5:Makefile HEAD:Makefile.in
>> +-------------------------------------
>
> Why not mention also :<stage>:<filename>, or would <stage> be not defined in
> this place of tutorial?

I do not think being able to do diff with arbitrary stage is
often used in practice.  By definition, you would want to do
diff with a stage during a conflicted merge, and most of the
time the default combined diff without any colon form should
give you the most useful results.  Also, ":<path>" to mean the
entry in the index is often equivalent to "git diff --cached".

IOW, these are obscure special purpose notation, and I do not
think tutorial is a good place to cover them.

^ permalink raw reply

* Re: [PATCH 2/3] tutorial: expanded discussion of commit history
From: Jakub Narebski @ 2006-05-22  9:01 UTC (permalink / raw)
  To: git
In-Reply-To: <7vzmhacuso.fsf@assigned-by-dhcp.cox.net>

Junio C Hamano wrote:

> Jakub Narebski <jnareb@gmail.com> writes:
> 
>> J. Bruce Fields wrote:
>>
>>> +Finally, most commands that take filenames will optionally allow you
>>> +to precede any filename by a commit, to specify a particular version
>>> +of the file:
>>> +
>>> +-------------------------------------
>>> +$ git diff v2.5:Makefile HEAD:Makefile.in
>>> +-------------------------------------
>>
>> Why not mention also :<stage>:<filename>, or would <stage> be not defined in
>> this place of tutorial?
> 
> I do not think being able to do diff with arbitrary stage is
> often used in practice.  By definition, you would want to do
> diff with a stage during a conflicted merge, and most of the
> time the default combined diff without any colon form should
> give you the most useful results.  Also, ":<path>" to mean the
> entry in the index is often equivalent to "git diff --cached".
> 
> IOW, these are obscure special purpose notation, and I do not
> think tutorial is a good place to cover them.

Hmmm... perhaps in tutorial-3.txt, covering merges and how to resolve
conflicted merge, cherry picking, reverting and rebasing. And of course
some git workflows covering usage of branches (including pull/push,
fast-forward and "union" branches like 'pu' branch in git).

Well, perhaps not tutorial, but Git Cookbook, or Git Recipes,
or Git Usage Examples,...

-- 
Jakub Narebski
Warsaw, Poland

^ permalink raw reply

* Re: irc usage..
From: Linus Torvalds @ 2006-05-22  9:13 UTC (permalink / raw)
  To: Martin Langhoff
  Cc: Donnie Berkholz, Yann Dirson, Git Mailing List, Matthias Urlichs,
	Johannes Schindelin
In-Reply-To: <46a038f90605220042v369e9ff5o3dc7841472171d02@mail.gmail.com>

On Mon, 22 May 2006, Martin Langhoff wrote:
> 
> Or a slow leak in Perl? The 5.8.8 release notes do talk about some
> leaks being fixed, but this 5.8.8 isn't making a difference.
> 
> Working on it.

Thanks. Looking at what I did convert, that horrid gentoo CVS tree is 
interesting. The resulting (partial) git history has 93413 commits and 
850,000+ objects total, all in a totally linear history.

And that's just up to April 2004, so the full tree is probably a million 
objects.

The good news is that git seems to handle that size repo no problem at 
all. The repack did indeed take a long while, but it packed it all down to 
a 189MB pack-file (and 20MB pack index).

Considering that the bzip2'd tar-file of the CVS history was 157MB, and 
the actual CVS footprint was about 1.6GB, if git stays at under a quarter 
gigabyte for the whole archive once converted (which sounds likely, 
counting indexing), git would basically cut down the disk usage for a live 
repo by a factor of 7 or so.

_And_ I can do a "git log origin > /dev/null" in about 2.4 seconds. Take 
that, CVS.

		Linus

^ permalink raw reply

* [PATCH] git help: remove whatchanged from list of common commands
From: Martin Waitz @ 2006-05-22 10:09 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: Junio C Hamano, git
In-Reply-To: <1148255528.61d5d241.0@fieldses.org>

whatchanged is replaced by git log now.

Signed-off-by: Martin Waitz

---

7da71dafe75f2a682b07cd1140a29e6fd2705583
 generate-cmdlist.sh |    1 -
 1 files changed, 0 insertions(+), 1 deletions(-)

7da71dafe75f2a682b07cd1140a29e6fd2705583
diff --git a/generate-cmdlist.sh b/generate-cmdlist.sh
index 6c59dbd..ec1eda2 100755
--- a/generate-cmdlist.sh
+++ b/generate-cmdlist.sh
@@ -37,7 +37,6 @@ show-branch
 status
 tag
 verify-tag
-whatchanged
 EOF
 while read cmd
 do
-- 
1.3.3.g288c

-- 
Martin Waitz

^ permalink raw reply related

* Re: Current Issues #3
From: Linus Torvalds @ 2006-05-22 10:18 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git
In-Reply-To: <7v8xoue9eo.fsf@assigned-by-dhcp.cox.net>

On Mon, 22 May 2006, Junio C Hamano wrote:
> 
> * Per branch configuration
> 
>   The [section "foo"] configuration syntax update by Linus, and
>   git-parse-remote update to use remote.stuff.{url,push,pull} by
>   Johannes are now both in the "master".  The stage is set to
>   discuss what to actually do with per-branch configuration.
> 
>   We will use the [branch "foo"] section for configuration about
>   local branch named "foo".  I do not think there is any
>   disagreement about this.
> 
>   The ideas floated so far (I am forgetting many of them
>   perhaps):
> 
>     1. "upstream" refers to the remote section to use when
>        running "git-{fetch,pull,push}" while on that branch.
> 
> 	[branch "master"]
> 		upstream = "origin"
> 
> 	[remote "origin"]
>         	url = "git://git.kernel.org/.../git.git"
> 		fetch = refs/heads/master:refs/remotes/origin/master
> 
>     2. "url/fetch/push" directly specifies what would usually be
>        taken from a remote section by "git-{fetch,pull,push}"
>        while on that branch.
> 
> 	[branch "foo"]
>         	url = "company.com.xz:myrepo"
> 		fetch = refs/heads/master:refs/remotes/origin/master
> 		push = refs/heads/master:refs/heads/origin

I'd _much_ prefer (1) over (2).

However, I wonder if we couldn't do even better. How about forgetting 
about the "branch" vs "remote" thing, and instead splitting it into 
_three_: "branch", "repository" and "remote branch".

Something like

	[repo "origin"]
		url = "git://git.kernel.org/.../git.git"

	[repo "gitk"]
		url = "git://git.kernel.org/.../gitk.git"

to describe two remote repositories (and NOTE! No branch descriptions 
within those. We're just describing the actual repository, so we might 
have things like "readonly" to indicate that we can't push to them, but if 
we do things like that, they would be "repo-wide" things that we 
describe for that repository),

Then, we can describe remote branches within those repositories:

	[remote "origin/master"]
		repo = origin
		branch = master

	[remote "origin/next"]
		repo = origin
		branch = next

	[remote "origin/pu"]
		repo = origin
		branch = pu

	[remote "gitk/master"]
		repo = gitk
		branch = master

now, here we're describing two things: the name of the remote is what we 
will then use for the ".git/remotes/<name>" thing to remember the last 
value, and we're describing where to get that data (which repo, and which 
branch).

NOTE! In the example above, I made the name of the remote always match the 
<repo>/<branch> format, but that would be just a convention. You could do

	[remote "linus"]
		repo = kernel
		branch = master

to describe the "linus" remote as the master branch of the "kernel" 
repository.

Finally, local branches:

	[branch "master"]
		source = origin/master

	[branch "origin"]
		readonly
		source = origin/master

	[branch "next"]
		readonly
		source = origin/next

	[branch "pu"]
		readonly
		rebase
		source = origin/pu

	[branch "gitk"]
		readonly
		source = gitk/master

This marks the things that just _track_ somebody else's branch as being 
readonly (so "master" and "origin" are really different: they're both 
branches, but one of them just tracks remotes/origin/master, while the 
other one can be committed to), and "pu" has been marked as not only being 
read-only, it also re-bases to its source.

I dunno. Does this sound too verbose and abstract?

Normally, you'd not have a lot of these. For example, for somebody who 
follows the kernel, you'd literally just have

	[branch "master"]
		source = linus

	[remote "linus"]
		repo = kernel
		branch = master

	[repo "kernel"]
		url = git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6

and you'd be done. The above would describe both the local "master" branch 
and the "remotes/linus" head, and give the relationship between them.
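The three-level lookup in that minimal example (local branch, then remote, then repo) can be sketched as follows. This is purely illustrative: the structs, the tables and resolve_pull_source are hypothetical names, and nothing here reflects how git actually stores or reads its configuration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory form of the three section kinds in the proposal. */
struct repo   { const char *name, *url; };
struct remote { const char *name, *repo, *branch; };
struct branch { const char *name, *source; };

static const struct repo repos[] = {
	{ "kernel", "git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6" },
};
static const struct remote remotes[] = {
	{ "linus", "kernel", "master" },
};
static const struct branch branches[] = {
	{ "master", "linus" },
};

/*
 * Follow branch -> remote -> repo for a local branch.  On success, store
 * the remote branch name in *remote_branch and return the repo URL;
 * return NULL if any link in the chain is missing.
 */
const char *resolve_pull_source(const char *local, const char **remote_branch)
{
	size_t b, r, p;

	for (b = 0; b < sizeof(branches) / sizeof(branches[0]); b++) {
		if (strcmp(branches[b].name, local) != 0)
			continue;
		for (r = 0; r < sizeof(remotes) / sizeof(remotes[0]); r++) {
			if (strcmp(remotes[r].name, branches[b].source) != 0)
				continue;
			for (p = 0; p < sizeof(repos) / sizeof(repos[0]); p++) {
				if (strcmp(repos[p].name, remotes[r].repo) == 0) {
					*remote_branch = remotes[r].branch;
					return repos[p].url;
				}
			}
		}
	}
	return NULL;
}
```

A plain "git pull" on "master" would, under this reading, resolve to the "master" branch of the "kernel" URL; a local branch with no [branch] section resolves to nothing.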

The git repo is actually much more complex, especially if you want to 
track all of the different branches Junio has, and if you want to also 
track the branches Paul has to gitk.

But with the above, you can fairly naturally do:

 - "git pull" 

	No arguments. fetch the remote described by the current branch, 
	and merge into current branch (we might decide to fetch all the 
	remotes associated with that repo, just because once we do this, 
	we might as well, but that's not that important to the end 
	result).

 - "git pull <repo>"

	fetch all remotes that use <repo>. IFF the current branch is 
	matched to one of those remotes, merge the changes into the 
	current branch. But if you happened to be on another unrelated 
	branch, nothing happens aside from the fetch.

 - "git pull <remote>"

	fetch just the named remote. IFF that remote is also the remote 
	for the current branch, do merge it into current. Again, we 
	_might_ decide to just do the whole repo.

 - "git pull <repo> <branchname>"

	fetch the named branch from the named repository and merge it into 
	current (no ifs, buts or maybes - now we've basically overridden 
	the default relationships, so now the <repo> is just a pure 
	shorthand for the location of the repository)

 - "git pull <repo> <src>:<dst>"

	same as now. fetch <repo> <src> into <dst>, and merge it into the 
	current branch (again, we've overridden any default relationships).

but maybe this is overdesigned. Comments?

			Linus

^ permalink raw reply

* Re: Current Issues #3
From: Martin Waitz @ 2006-05-22 10:20 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git
In-Reply-To: <7v8xoue9eo.fsf@assigned-by-dhcp.cox.net>


hoi :)

On Mon, May 22, 2006 at 01:44:15AM -0700, Junio C Hamano wrote:
>     1. "upstream" refers to the remote section to use when
>        running "git-{fetch,pull,push}" while on that branch.
> 
> 	[branch "master"]
> 		upstream = "origin"

what do you do for [branch "next"] here?

Does it make sense to regard all refs/remotes/*/<branchname> as
upstream and merge these into the current branch when pulling?

Perhaps a pull could even merge all newly fetched remote heads
into the corresponding branch, but for that we'd need to be
able to merge without using the working dir.

-- 
Martin Waitz


^ permalink raw reply

* detect write failure, even for stdout
From: Jim Meyering @ 2006-05-22 10:27 UTC (permalink / raw)
  To: git
In-Reply-To: <7v3bf3jl15.fsf@assigned-by-dhcp.cox.net>

git doesn't always detect write failures.  A write I/O error,
(e.g., hardware I/O error or simply disk full)
doesn't provoke nonzero exit status:

    $ ./git-cat-file -t HEAD > /dev/full && echo did not detect write failure
    did not detect write failure

This is perhaps more important than the other things
I've reported, since it can lead to porcelain being unable
to detect a real failure in the plumbing.

Here are two more:

    $ ./git-ls-tree HEAD > /dev/full && echo fail
    fail
    $ ./git-show > /dev/full && echo fail
    fail

If you were using gnulib, I'd suggest simply adding this line

    atexit (close_stdout);

near the beginning of each `main'.  Then you wouldn't have to
manually track down each and every place where a write to stdout
can occur -- not to mention the maintenance burden of keeping
things correct as the code evolves.
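For readers without gnulib at hand, the idea can be approximated as in the sketch below. It is not gnulib's implementation: stream_write_failed and close_stdout_sketch are hypothetical names, and the error message is made up for illustration.

```c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Return nonzero if an earlier write on f failed or flushing now fails. */
int stream_write_failed(FILE *f)
{
	int had_error = ferror(f);
	int flush_failed = (fflush(f) != 0);

	return had_error || flush_failed;
}

/*
 * Hypothetical atexit handler: flush and close stdout, and force a
 * nonzero exit status if any buffered output could not be written.
 * _exit is used because calling exit from an atexit handler is not
 * allowed.
 */
void close_stdout_sketch(void)
{
	if (stream_write_failed(stdout) || fclose(stdout) != 0) {
		fputs("fatal: write error on standard output\n", stderr);
		_exit(EXIT_FAILURE);
	}
}
```

Registering it with atexit(close_stdout_sketch) near the start of main would make a redirect to /dev/full end in a nonzero exit status, assuming nothing else closes stdout first.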

^ permalink raw reply

* Re: Current Issues #3
From: Sean @ 2006-05-22 11:19 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: junkio, git
In-Reply-To: <Pine.LNX.4.64.0605220216310.3697@g5.osdl.org>

On Mon, 22 May 2006 03:18:02 -0700 (PDT)
Linus Torvalds <torvalds@osdl.org> wrote:

> [...]
> 
> but maybe this is overdesigned. Comments?

It all looks good, especially your description of the git pull variations
which seem more natural than what exists now.

The only minor comment I'd make is that we shouldn't mix so many different
names for the same thing.  In your example you have  "remote" (singular)
sections with branch sections that contain "source" keys which map to those
remote sections, both corresponding to "refs/remotes" (plural).

There doesn't seem to be any need to stick with "source" as a key, so :

[remote "origin/master"]
	repo = origin
	branch = master

[branch "master"]
	remote = "origin/master"

Sean

^ permalink raw reply

* [PATCH] cvsimport: introduce -L<imit> option to workaround memory leaks
From: Martin Langhoff @ 2006-05-22 11:38 UTC (permalink / raw)
  To: git, junkio, Johannes.Schindelin, torvals, spyderous, smurf
  Cc: Martin Langhoff

Signed-off-by: Martin Langhoff <martin@catalyst.net.nz>

---
This is ugly, but while I work on cleaning up the leak
that seems to be somewhere in the commit() sub, we may
as well set up a workaround.

I am not 100% happy with including this in git.git. 
In any case, I hope we can revert it soon.

---

 git-cvsimport.perl |    8 ++++++--
 1 files changed, 6 insertions(+), 2 deletions(-)

64ea3c83d8cd176ee972055bd1d11f398655dad8
diff --git a/git-cvsimport.perl b/git-cvsimport.perl
index c0ae00b..c1923d1 100755
--- a/git-cvsimport.perl
+++ b/git-cvsimport.perl
@@ -29,7 +29,7 @@ use IPC::Open2;
 $SIG{'PIPE'}="IGNORE";
 $ENV{'TZ'}="UTC";
 
-our($opt_h,$opt_o,$opt_v,$opt_k,$opt_u,$opt_d,$opt_p,$opt_C,$opt_z,$opt_i,$opt_P, $opt_s,$opt_m,$opt_M,$opt_A,$opt_S);
+our($opt_h,$opt_o,$opt_v,$opt_k,$opt_u,$opt_d,$opt_p,$opt_C,$opt_z,$opt_i,$opt_P, $opt_s,$opt_m,$opt_M,$opt_A,$opt_S,$opt_L);
 my (%conv_author_name, %conv_author_email);
 
 sub usage() {
@@ -85,7 +85,7 @@ sub write_author_info($) {
 	close ($f);
 }
 
-getopts("hivmkuo:d:p:C:z:s:M:P:A:S:") or usage();
+getopts("hivmkuo:d:p:C:z:s:M:P:A:S:L:") or usage();
 usage if $opt_h;
 
 @ARGV <= 1 or usage();
@@ -716,6 +716,7 @@ my $commit = sub {
 	}
 };
 
+my $commitcount = 1;
 while(<CVS>) {
 	chomp;
 	if($state == 0 and /^-+$/) {
@@ -849,6 +850,9 @@ #	VERSION:1.96->1.96.2.1
 	} elsif($state == 9 and /^\s*$/) {
 		$state = 10;
 	} elsif(($state == 9 or $state == 10) and /^-+$/) {
+		if ($opt_L && $commitcount++ >= $opt_L) {
+			last;
+		}
 		&$commit();
 		$state = 1;
 	} elsif($state == 11 and /^-+$/) {
-- 
1.3.2.g82000

^ permalink raw reply related

* [PATCH] cvsimport: replace anonymous sub ref with a normal sub
From: Martin Langhoff @ 2006-05-22 12:45 UTC (permalink / raw)
  To: git; +Cc: Martin Langhoff

commit() does not need to be an anonymous subreference. Keep it simple.

Signed-off-by: Martin Langhoff <martin@catalyst.net.nz>


---

 git-cvsimport.perl |    6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

a0bbc1c2010ca46fc215453d5e4c4853c679f950
diff --git a/git-cvsimport.perl b/git-cvsimport.perl
index c1923d1..2ecfe14 100755
--- a/git-cvsimport.perl
+++ b/git-cvsimport.perl
@@ -563,7 +563,7 @@ my $state = 0;
 
 my($patchset,$date,$author_name,$author_email,$branch,$ancestor,$tag,$logmsg);
 my(@old,@new,@skipped);
-my $commit = sub {
+sub commit {
 	my $pid;
 	while(@old) {
 		my @o2;
@@ -853,7 +853,7 @@ #	VERSION:1.96->1.96.2.1
 		if ($opt_L && $commitcount++ >= $opt_L) {
 			last;
 		}
-		&$commit();
+		commit();
 		$state = 1;
 	} elsif($state == 11 and /^-+$/) {
 		$state = 1;
@@ -863,7 +863,7 @@ #	VERSION:1.96->1.96.2.1
 		print "* UNKNOWN LINE * $_\n";
 	}
 }
-&$commit() if $branch and $state != 11;
+commit() if $branch and $state != 11;
 
 unlink($git_index);
 
-- 
1.3.2.g82000

^ permalink raw reply related

* [PATCH] cvsimport: minor fixups
From: Martin Langhoff @ 2006-05-22 12:45 UTC (permalink / raw)
  To: git; +Cc: Martin Langhoff

Cleanup @skipped after it's used. Close a fhandle. 
Removing suspects one at a time.

Signed-off-by: Martin Langhoff <martin@catalyst.net.nz>


---

 git-cvsimport.perl |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

93bef2832d30c9a04e95ff348b9ab8ab8cabee98
diff --git a/git-cvsimport.perl b/git-cvsimport.perl
index 2ecfe14..176b787 100755
--- a/git-cvsimport.perl
+++ b/git-cvsimport.perl
@@ -650,6 +650,8 @@ sub commit {
 			"GIT_COMMITTER_DATE=".strftime("+0000 %Y-%m-%d %H:%M:%S",gmtime($date)),
 			"git-commit-tree", $tree,@par);
 		die "Cannot exec git-commit-tree: $!\n";
+		
+		close OUT;
 	}
 	$pw->writer();
 	$pr->reader();
@@ -661,6 +663,7 @@ sub commit {
 	if (@skipped) {
 	    $logmsg .= "\n\n\nSKIPPED:\n\t";
 	    $logmsg .= join("\n\t", @skipped) . "\n";
+	    @skipped = ();
 	}
 
 	print $pw "$logmsg\n"
-- 
1.3.2.g82000

^ permalink raw reply related

* Re: irc usage..
From: Martin Langhoff @ 2006-05-22 12:54 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Donnie Berkholz, Yann Dirson, Git Mailing List, Matthias Urlichs,
	Johannes Schindelin
In-Reply-To: <Pine.LNX.4.64.0605220203200.3697@g5.osdl.org>

On 5/22/06, Linus Torvalds <torvalds@osdl.org> wrote:
> On Mon, 22 May 2006, Martin Langhoff wrote:
> >
> > Or a slow leak in Perl? The 5.8.8 release notes do talk about some
> > leaks being fixed, but this 5.8.8 isn't making a difference.
> >
> > Working on it.
>
> Thanks. Looking at what I did convert, that horrid gentoo CVS tree is
> interesting. The resulting (partial) git history has 93413 commits and
> 850,000+ objects total, all in a totally linear history.

Ok, so there's 3 patches posted that should help narrow down the
problem. There's a new -L <imit> so that Donnie can get his stuff done
by running it in a while(true) loop. Not proud of it, but hey.

And there are two patches that I suspect may fix the leak. After
applying them, the cvsimport process grows up to ~13MB and then tapers
off, at least as far as my patience has gotten me. It's late on this
side of the globe so I'll look at the results tomorrow morning.

(BTW, I typo-ed Linus' address in the git-send-email invocation. Will
resend to him separately)

I'll also prep a patch as Linus suggests to do auto-repacking while
the import runs so we don't eat up the harddisk.

> git would basically cut down the disk usage for a live
> repo by a factor of 7 or so.
>
> _And_ I can do a "git log origin > /dev/null" in about 2.4 seconds. Take
> that, CVS.

Heh. Faster Gitticat, Kill Kill Kill!

martin

^ permalink raw reply

* Re: avoid atoi, when possible; int overflow -> heap corruption
From: Morten Welinder @ 2006-05-22 13:16 UTC (permalink / raw)
  To: Jim Meyering; +Cc: Junio C Hamano, git
In-Reply-To: <871wumim28.fsf_-_@rho.meyering.net>

> There are about 20 uses of atoi, and most calls can return
> a usable result in spite of an invalid input -- just because
> atoi returns the same thing for "99" as "99-and-any-suffix".
> It would be better not to ignore invalid inputs.

atoi has undefined behaviour for "99-and-any-suffix".  You might
get lucky and get back 99, but you might also get a random value
or a core dump.

Morten

^ permalink raw reply

* Re: avoid atoi, when possible; int overflow -> heap corruption
From: Jim Meyering @ 2006-05-22 13:31 UTC (permalink / raw)
  To: Morten Welinder; +Cc: Junio C Hamano, git
In-Reply-To: <118833cc0605220616t75a182b1oa404d5efe8a1f5d9@mail.gmail.com>

"Morten Welinder" <mwelinder@gmail.com> wrote:
>> There are about 20 uses of atoi, and most calls can return
>> a usable result in spite of an invalid input -- just because
>> atoi returns the same thing for "99" as "99-and-any-suffix".
>> It would be better not to ignore invalid inputs.
>
> atoi has undefined behaviour for "99-and-any-suffix".  You might
> get lucky and get back 99, but you might also get a random value
> or a core dump.

I've never heard of that.
POSIX says that atoi(str) is equivalent to:

    (int) strtol(str, (char **)NULL, 10)
    except that the handling of errors may differ.
    If the value cannot be represented, the behavior is undefined.

Since strtol works fine with such a suffix, and since 99 can be
represented, I don't see why there would be any undefined behavior.

Do you know of an implementation for which `atoi ("99-and-any-suffix")'
does anything other than return 99?

^ permalink raw reply

* Re: avoid atoi, when possible; int overflow -> heap corruption
From: Jeff King @ 2006-05-22 13:37 UTC (permalink / raw)
  To: Morten Welinder; +Cc: git
In-Reply-To: <118833cc0605220616t75a182b1oa404d5efe8a1f5d9@mail.gmail.com>

On Mon, May 22, 2006 at 09:16:50AM -0400, Morten Welinder wrote:

> atoi has undefined behaviour for "99-and-any-suffix".  You might
> get lucky and get back 99, but you might also get a random value
> or a core dump.

Where do you get that from? The standard claims that it converts "the
initial portion of the string pointed to" (7.20.1.2). Furthermore, atoi
is equivalent to strtol with a base of 10 (with the exception of range
errors). From 7.20.1.4, paragraph 2:
  The strtol [...] functions [...] decompose the input string into three
  parts: an initial, possibly empty, sequence of white-space characters
  [...], a subject sequence resembling an integer represented in some
  radix determined by the value of base, and a final string of one or
  more unrecognized characters...
If no conversion can be performed (i.e., you feed it garbage with no
number), zero is returned.

atoi does NOT handle range errors, however; the behavior is undefined in
that case. In practice, I expect most implementations do some sort of
wrapping.
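The decomposition described above can be observed through strtol's end pointer. The helper below is a hypothetical illustration (unparsed_tail is not a standard function); it relies only on documented strtol behavior.

```c
#include <stdlib.h>

/*
 * Convert the leading decimal number in s and return a pointer to the
 * part of s that strtol did not consume.  If no conversion could be
 * performed, the returned pointer equals s and *value is 0.
 */
const char *unparsed_tail(const char *s, long *value)
{
	char *end;

	*value = strtol(s, &end, 10);
	return end;
}
```

For "99-and-any-suffix" the value is 99 and the returned pointer addresses "-and-any-suffix"; for a string with no leading number the pointer is s itself and the value is 0. Range errors are a separate matter: strtol reports them through errno, which atoi gives the caller no portable way to rely on.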

-Peff

^ permalink raw reply

* Re: avoid atoi, when possible; int overflow -> heap corruption
From: Morten Welinder @ 2006-05-22 13:54 UTC (permalink / raw)
  To: Morten Welinder, git
In-Reply-To: <20060522133746.GA12302@coredump.intra.peff.net>

My copy (which is admittedly a draft because I am cheap) does not
restrict undefined behaviour to _range_ errors, but simply says
"Except for the behavior on error, they are equivalent to [the strtol call]"

M.

^ permalink raw reply

* Re: [PATCH 2/3] tutorial: expanded discussion of commit history
From: J. Bruce Fields @ 2006-05-22 14:18 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: git
In-Reply-To: <e4ruku$5uk$1@sea.gmane.org>

On Mon, May 22, 2006 at 11:01:20AM +0200, Jakub Narebski wrote:
> Junio C Hamano wrote:
> > I do not think being able to do diff with arbitrary stage is
> > often used in practice.  By definition, you would want to do
> > diff with a stage during a conflicted merge, and most of the
> > time the default combined diff without any colon form should
> > give you the most useful results.  Also, ":<path>" to mean the
> > entry in the index is often equivalent to "git diff --cached".
> > 
> > IOW, these are obscure special purpose notation, and I do not
> > think tutorial is a good place to cover them.
> 
> Hmmm... perhaps in tutorial-3.txt, covering merges and how to resolve
> conflicted merge, cherry picking, reverting and rebasing.

Even then I had the impression that stages were pretty much invisible to
users.  So that should stay in core-tutorial.txt.  Which could use some
revision (Junio had some ideas) but I'm personally more interested in
end-user documentation than developer documentation for now.

So I'd imagined future tutorial parts cannibalizing everyday.txt and the
howto's.

--b.

^ permalink raw reply
