Git development


* git packs
From: bob @ 2007-11-10  4:47 UTC
  To: git

When a repository is packed such as for a clone or fetch, is there  
just one pack file created that is used for the transfer?


* Re: git packs
From: Nicolas Pitre @ 2007-11-10  5:13 UTC
  To: bob; +Cc: git
In-Reply-To: <F6DD8DCD-416B-4DDF-B384-7213C9ED5565@mac.com>

On Fri, 9 Nov 2007, bob wrote:

> When a repository is packed such as for a clone or fetch, is there just one
> pack file created that is used for the transfer?

Yes.

And modern Git is able to handle packs larger than 4GB too, assuming it 
is compiled using a toolchain with large file support.
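
A minimal sketch of what "large file support" buys, assuming a POSIX system: with a 64-bit off_t (requested below through _FILE_OFFSET_BITS, which 32-bit glibc builds honour; OS X already uses a 64-bit off_t), an offset beyond 4GB is representable and a descriptor can be positioned there. The helper name can_seek_past_4gb() is hypothetical and not part of git.

```c
#define _FILE_OFFSET_BITS 64 /* ask 32-bit glibc builds for a 64-bit off_t */
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: nonzero if this build can position a file
 * descriptor beyond the 4GB mark. */
static int can_seek_past_4gb(int fd)
{
	off_t target;

	if (sizeof(off_t) < 8)
		return 0;	/* no large file support in this build */
	target = (off_t)5 << 30;	/* 5GB; seeking alone writes nothing */
	return lseek(fd, target, SEEK_SET) == target;
}
```

Seeking past the end of a regular file does not allocate disk space, so this check is cheap to run on a scratch file.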

Nicolas


* [PATCH] builtin-commit: fix --signoff
From: Johannes Schindelin @ 2007-11-10  5:49 UTC
  To: git, krh, gitster


The Signed-off-by: line contained a spurious timestamp.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 builtin-commit.c |   31 ++++++++++++++++++++-----------
 1 files changed, 20 insertions(+), 11 deletions(-)

diff --git a/builtin-commit.c b/builtin-commit.c
index e8bc4c4..f79ad48 100644
--- a/builtin-commit.c
+++ b/builtin-commit.c
@@ -181,21 +181,30 @@ static int prepare_log_message(const char *index_file, const char *prefix)
 		die("could not open %s\n", git_path(commit_editmsg));
 
 	stripspace(&sb, 0);
-	if (fwrite(sb.buf, 1, sb.len, fp) < sb.len)
-		die("could not write commit template: %s\n",
-		    strerror(errno));
 
 	if (signoff) {
-		const char *info, *bol;
-
-		info = git_committer_info(1);
-		strbuf_addch(&sb, '\0');
-		bol = strrchr(sb.buf + sb.len - 1, '\n');
-		if (!bol || prefixcmp(bol, sign_off_header))
-			fprintf(fp, "\n");
-		fprintf(fp, "%s%s\n", sign_off_header, git_committer_info(1));
+		struct strbuf sob;
+		const char *p;
+		int i;
+
+		strbuf_init(&sob, 0);
+		strbuf_addstr(&sob, sign_off_header);
+		strbuf_addstr(&sob, git_committer_info(1));
+		p = strrchr(sob.buf, '>');
+		if (p)
+			strbuf_setlen(&sob, p + 1 - sob.buf);
+		strbuf_addch(&sob, '\n');
+
+		for (i = sb.len - 1; i > 0 && sb.buf[i - 1] != '\n'; i--)
+			; /* do nothing */
+		if (prefixcmp(sb.buf + i, sob.buf))
+			strbuf_addbuf(&sb, &sob);
 	}
 
+	if (fwrite(sb.buf, 1, sb.len, fp) < sb.len)
+		die("could not write commit template: %s\n",
+		    strerror(errno));
+
 	strbuf_release(&sb);
 
 	if (in_merge && !no_edit)
-- 
1.5.3.5.1674.g6e7f7
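
The patch's two steps can be modelled with plain C strings: cut the committer ident after its last '>' so the trailing "<seconds> <tz>" timestamp is dropped, then append the sign-off only if the last line of the message does not already start with it. This is a hypothetical sketch using snprintf() and strncmp() rather than git's strbuf API; the helper names and the example ident are made up for illustration.

```c
#include <stdio.h>
#include <string.h>

/* Build "Signed-off-by: <ident>\n", keeping the ident only up to and
 * including its last '>' so the timestamp that follows is dropped. */
static void format_signoff(char *out, size_t outsz, const char *ident)
{
	const char *gt = strrchr(ident, '>');
	size_t len = gt ? (size_t)(gt + 1 - ident) : strlen(ident);

	snprintf(out, outsz, "Signed-off-by: %.*s\n", (int)len, ident);
}

/* Nonzero if the last line of msg already starts with sob. Like the
 * patch's loop, the scan starts at the final character (normally the
 * trailing '\n') and walks back to the beginning of that line. */
static int last_line_has_signoff(const char *msg, const char *sob)
{
	size_t len = strlen(msg);
	size_t i;

	if (len == 0)
		return 0;
	for (i = len - 1; i > 0 && msg[i - 1] != '\n'; i--)
		; /* do nothing */
	return strncmp(msg + i, sob, strlen(sob)) == 0;
}
```

Starting the scan at len - 1 rather than len matters: starting at len would stop immediately at the trailing newline and compare against an empty string.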


* Re: git packs
From: bob @ 2007-11-10  6:00 UTC
  To: git
In-Reply-To: <alpine.LFD.0.9999.0711100011150.21255@xanadu.home>

When you say toolchain, are you referring to the compiler
and associated libraries or are you referring to OS programs
such as ls, md5, cat, etc or both?

The reason that I ask is that I have been playing with different
scenarios using git 1.5.3.5 under MacOSX 10.4.10 for most of
the day, and every time that

A) 	a file approaches or exceeds 2gig on an 'add', it
	results in:

	fatal: Out of memory? mmap failed: Cannot allocate memory

B) 	the repository size less the .git subdirectory approaches
	4gig on a 'fetch' it results in:

	Resolving 3356 deltas...
	fatal: serious inflate inconsistency: -3 (unknown compression method)
	fatal: index-pack died with error code 128
	fatal: Fetch failure: ../rmwHtmlOld

	Under B, building the initial repository works fine.

(I added a patch that Linus Torvalds gave out when a previous inflate
problem was being researched.)  Also, I have been looking in the source,
in particular in builtin-add.c, builtin-pack-objects.c and associated
headers, and see int and unsigned long being used a lot, but not any
unsigned long longs.  I have been testing on my laptop, which has a
32-bit Intel Core Duo.
Also, I have run the same tests on a dual quad-core Intel processor
which is 64-bit (but I'm not sure that Apple uses the 64 bits in
10.4.10).  I get the same results as above.

The zlib is at the latest revision, 1.2.3, and gcc is at 4.0.1,
which from what I can tell supports large files, because 'off_t' is
8 bytes, which is the size used for a 'stat' file size.

I am just wondering if these size limitations exist for MacOSX
or maybe I am doing something wrong (which is probably
the case).



* Re: gitweb, updating 'last changed' column on the project page
From: Jon Smirl @ 2007-11-10  6:01 UTC
  To: Jakub Narebski; +Cc: git
In-Reply-To: <fh337a$ggp$1@ger.gmane.org>

On 11/9/07, Jakub Narebski <jnareb@gmail.com> wrote:
> Jon Smirl wrote:
>
> > At http://git.digispeaker.com/ the 'last change' column is not getting updated.
> >
> > mpc5200b.git
> >       DigiSpeaker for Freescale MPC5200B.
> >       Jon Smirl
> >       5 weeks ago
> >       summary | shortlog | log | tree
> >
> > It still says 5 weeks ago, but if I click on the project last change is today.
> >
> > What controls this? I tried running update-server-info
>
> What does
>
>   git for-each-ref --format="%(refname):%09%(committer)" --sort=-committerdate
>       refs/heads

[daedalus]$ git for-each-ref --format="%(refname):%09%(committer)" --sort=-committerdate refs/heads
refs/heads/m24: Jon Smirl <jonsmirl@gmail.com> 1191362799 -0400
refs/heads/m25: Jon Smirl <jonsmirl@gmail.com> 1191472422 -0400
refs/heads/m26: Jon Smirl <jonsmirl@gmail.com> 1194382038 -0500
refs/heads/m28: Jon Smirl <jonsmirl@gmail.com> 1194385071 -0500
refs/heads/m29: Jon Smirl <jonsmirl@gmail.com> 1194585780 -0500

>
> return? Does adding --count select proper branch, with proper update
> date?

Is it looking for master, and just picking the first branch instead?

>
> Which gitweb version is this?

<!-- git web interface version 1.5.3.5.605.g79fa-dirty, (C) 2005-2006,
Kay Sievers <kay.sievers@vrfy.org>, Christian Gierke -->
<!-- git core binaries version 1.5.3.5.605.g79fa-dirty -->



-- 
Jon Smirl
jonsmirl@gmail.com


* Re: Reducing the memory footprint
From: Jon Smirl @ 2007-11-10  6:07 UTC
  To: Brian Downing; +Cc: Git Mailing List
In-Reply-To: <9e4733910711091705i6f77d05uc5ba04f668796a73@mail.gmail.com>

On 11/9/07, Jon Smirl <jonsmirl@gmail.com> wrote:
> On 11/9/07, Brian Downing <bdowning@lavos.net> wrote:
> > On Fri, Nov 09, 2007 at 06:38:00PM -0500, Jon Smirl wrote:
> > > I'm using this config file:
> > >
> > > [pack]
> > >         windowMemory = 1M
> > >         deltaCacheSize = 1M
> > >
> > > And I have NO_MMAP compiled in.
> > >
> > > git is still using over 200MB of memory or address space, my process
> > > gets killed either way.
> >
> > I'm assuming it's dying on repacking since you included the pack
> > parameters.
> >
> > How big is your biggest object?  Even with pack.windowMemory, it still
> > keeps the last object around to try and delta against (in other words,
> > the window only shrinks to size 1), which means you have to have room
> > for it and its delta index.
>
> It's a Linux kernel repository. Git receive-pack is going over 200MB
> and getting zapped.  I don't understand why the process is so large. I
> am compiled with -DNO_MMAP.

I believe I must not have installed everything correctly with my
NO_MMAP build. After debugging for a while and fixing things I'm able
to do a push now in about 80MB of memory.

> I think I have achieved a workaround. I rsync'd in my last several
> weeks of changes. Now I can 'git push' small amounts of changes
> without getting killed.
>
> I'm begging DreamHost to simply install git. Installed commands don't
> get zapped.


-- 
Jon Smirl
jonsmirl@gmail.com
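
Brian's point, quoted above, is that pack.windowMemory shrinks the delta window but never below one entry, so the newest object (plus its delta index) is always held however large it is. A simplified, hypothetical model of that eviction rule follows; it tracks only byte counts, uses an arbitrary slot cap, and is not git's actual pack-objects code.

```c
#include <stddef.h>

#define WINDOW_SLOTS 16 /* arbitrary cap for this model */

/* Simplified model of a delta-search window: only the bytes each
 * entry holds are tracked, oldest entry first. */
struct window {
	size_t sizes[WINDOW_SLOTS];
	size_t count;
	size_t mem_usage;
};

static void window_evict_oldest(struct window *w)
{
	size_t i;

	w->mem_usage -= w->sizes[0];
	for (i = 1; i < w->count; i++)
		w->sizes[i - 1] = w->sizes[i];
	w->count--;
}

/* Add an object of `size` bytes, then drop older entries while usage
 * exceeds `limit` (0 = unlimited), but never below one entry: the
 * newest object stays no matter how big it is. */
static void window_add(struct window *w, size_t size, size_t limit)
{
	if (w->count == WINDOW_SLOTS)
		window_evict_oldest(w);
	w->sizes[w->count++] = size;
	w->mem_usage += size;
	while (limit && w->mem_usage > limit && w->count > 1)
		window_evict_oldest(w);
}
```

In this model, with a 1MB limit, adding a 300MB object leaves a one-entry window that still holds 300MB, which is why a tiny windowMemory setting alone cannot keep memory use small when individual objects are large.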


* Re: [PATCH] Make builtin-tag.c use parse_options.
From: Junio C Hamano @ 2007-11-10  6:07 UTC
  To: Carlos Rica; +Cc: git
In-Reply-To: <473463E0.7000406@gmail.com>

Carlos Rica <jasampler@gmail.com> writes:

> Also, this removes those tests ensuring that repeated
> -m options don't allocate memory more than once, because now
> this is done after parsing options, using the last one
> when more are given. The same for -F.

The reason for this change is...?  Is this because it is
cumbersome to detect and refuse multiple -m options using the
parseopt API?  If so, the API may be what needs to be fixed.
Taking the last one and discarding earlier ones feels to me an
arbitrary choice.

While I freely admit that I do not particularly find the "One -m
introduces one new line, concatenated to form the final
paragraph" handling of multiple -m options done by git-commit
nice nor useful, I suspect that it would make more sense to make
git-tag and git-commit handle multiple -m option consistently,
if you are going to change the existing semantics.  Since some
people really seem to like multiple -m handling of git-commit,
the avenue of the least resistance for better consistency would
be to accept and concatenate (with LF in between) multiple -m
options.

With multiple -F, I think erroring out would be the sensible
thing to do, but some people might prefer concatenation.  I do
not care either way as long as commit and tag behave
consistently.
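
The "accept and concatenate (with LF in between)" handling suggested above for repeated -m options amounts to the following. This is a hypothetical helper for illustration only, not the code in builtin-tag.c or builtin-commit.c.

```c
#include <stdlib.h>
#include <string.h>

/* Append one -m value to *buf (NULL when no -m has been seen yet),
 * separating it from any earlier value with a single LF.
 * Returns 0 on success, -1 if memory could not be allocated. */
static int append_message(char **buf, const char *msg)
{
	size_t old = *buf ? strlen(*buf) : 0;
	size_t add = strlen(msg);
	char *p = realloc(*buf, old + (old ? 1 : 0) + add + 1);

	if (!p)
		return -1;
	if (old)
		p[old++] = '\n';
	memcpy(p + old, msg, add + 1);	/* copies the terminating NUL too */
	*buf = p;
	return 0;
}
```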


* Re: gitweb, updating 'last changed' column on the project page
From: Jon Smirl @ 2007-11-10  6:10 UTC
  To: Jakub Narebski; +Cc: git
In-Reply-To: <9e4733910711092201n5aaeeb7cvfd0e76e43170d481@mail.gmail.com>

On 11/10/07, Jon Smirl <jonsmirl@gmail.com> wrote:
> On 11/9/07, Jakub Narebski <jnareb@gmail.com> wrote:
> > Jon Smirl wrote:
> >
> > > At http://git.digispeaker.com/ the 'last change' column is not getting updated.
> > >
> > > mpc5200b.git
> > >       DigiSpeaker for Freescale MPC5200B.
> > >       Jon Smirl
> > >       5 weeks ago
> > >       summary | shortlog | log | tree
> > >
> > > It still says 5 weeks ago, but if I click on the project last change is today.
> > >
> > > What controls this? I tried running update-server-info
> >
> > What does
> >
> >   git for-each-ref --format="%(refname):%09%(committer)" --sort=-committerdate
> >       refs/heads
>
> [daedalus]$ git for-each-ref --format="%(refname):%09%(committer)"
> --sort=-committerdate refs/heads
> refs/heads/m24: Jon Smirl <jonsmirl@gmail.com> 1191362799 -0400
> refs/heads/m25: Jon Smirl <jonsmirl@gmail.com> 1191472422 -0400
> refs/heads/m26: Jon Smirl <jonsmirl@gmail.com> 1194382038 -0500
> refs/heads/m28: Jon Smirl <jonsmirl@gmail.com> 1194385071 -0500
> refs/heads/m29: Jon Smirl <jonsmirl@gmail.com> 1194585780 -0500


It appears to be using the first head instead of the most recent date.
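
What the 'last change' column ought to reflect is the newest committer date across all heads, not the date of whichever head happens to be listed first. gitweb itself is written in Perl and relies on for-each-ref to do the sorting; this hypothetical C sketch only illustrates the intended selection, using the refs and Unix timestamps from the for-each-ref output quoted in this message.

```c
#include <stddef.h>

struct ref_date {
	const char *refname;
	long long committer_time;	/* seconds since the epoch */
};

/* Return the ref with the newest committer date, or NULL if n == 0. */
static const struct ref_date *newest_ref(const struct ref_date *refs, size_t n)
{
	const struct ref_date *best = NULL;
	size_t i;

	for (i = 0; i < n; i++)
		if (!best || refs[i].committer_time > best->committer_time)
			best = &refs[i];
	return best;
}
```

With the five heads shown, the newest is refs/heads/m29 (1194585780), whereas the first entry of that list is refs/heads/m24, the oldest.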


>
> >
> > return? Does adding --count select proper branch, with proper update
> > date?
>
> Is it looking for master, and just picking the first branch instead?
>
> >
> > Which gitweb version is this?
>
> <!-- git web interface version 1.5.3.5.605.g79fa-dirty, (C) 2005-2006,
> Kay Sievers <kay.sievers@vrfy.org>, Christian Gierke -->
> <!-- git core binaries version 1.5.3.5.605.g79fa-dirty -->
>


-- 
Jon Smirl
jonsmirl@gmail.com


* Re: git packs
From: Luke Lu @ 2007-11-10  6:36 UTC
  To: bob; +Cc: git
In-Reply-To: <FC175E4F-D9BE-42CC-B0BB-561B2EDCD941@mac.com>

On Nov 9, 2007, at 10:00 PM, bob wrote:
> When you say toolchain, are you referring to the compiler
> and associated libraries or are you referring to OS programs
> such as ls, md5, cat, etc or both?
>
> The reason that I ask is that I have been playing different
> scenarios using git 1.5.3.5 under MacOSX 10.4.10 mostly
> all day and every time that
>
> A) 	a file approaches or exceeds 2gig on an 'add', it
> 	results in:
> 	
> 	fatal: Out of memory? mmap failed: Cannot allocate memory
>
>
>
> B) 	the repository size less the .git subdirectory approaches
> 	4gig on a 'fetch' it results in:
>
> 	Resolving 3356 deltas...
> 	fatal: serious inflate inconsistency: -3 (unknown compression method)
> 	fatal: index-pack died with error code 128
> 	fatal: Fetch failure: ../rmwHtmlOld
>
> 	Under B, building the initial repository works fine.
>
> (I added a patch that Linus Torvalds gave out when a previous inflate
> problem was being researched.)  Also, I have been looking in the source,
> in particular in builtin-add.c, builtin-pack-objects.c and associated
> headers, and see int and unsigned long being used a lot, but not any
> unsigned long longs.  I have been testing on my laptop, which has a
> 32-bit Intel Core Duo.
> Also, I have run the same tests on a dual quad-core Intel processor
> which is 64-bit (but I'm not sure that Apple uses the 64 bits in
> 10.4.10).  I get the same results as above.
>
> The zlib is at the latest revision, 1.2.3, and gcc is at 4.0.1,
> which from what I can tell supports large files, because 'off_t' is
> 8 bytes, which is the size used for a 'stat' file size.

mmap(2), which git uses by default, is subject to VM limits
(typically <2GB), regardless of large file support. Running
file `which git` will probably tell you that it's a Mach-O executable
i386 instead of x86_64. In order to get 64-bit binaries on Mactel
boxes, you'll need the -m64 flag for gcc. I suspect that compiling
with the NO_MMAP option would work as well.

__Luke


* Re: git packs
From: Linus Torvalds @ 2007-11-10  6:38 UTC
  To: bob; +Cc: git
In-Reply-To: <FC175E4F-D9BE-42CC-B0BB-561B2EDCD941@mac.com>

On Sat, 10 Nov 2007, bob wrote:
> 
> The reason that I ask is that I have been playing different
> scenarios using git 1.5.3.5 under MacOSX 10.4.10 mostly
> all day and every time that
> 
> A) 	a file approaches or exceeds 2gig on an 'add', it
> 	results in:
> 		fatal: Out of memory? mmap failed: Cannot allocate memory

Git wants to handle single files as one single entity, so single big files 
really do end up being very painful. The costs of compressing them and 
generating deltas would probably get prohibitively high *anyway*, but it 
does mean that if you have gigabyte files, you do want a 64-bit VM.

I thought OS X could do 64 bits these days. Maybe not.

Anyway, that explains the "cannot allocate memory". Git simply wants to 
mmap the whole file. You don't have enough VM space for it.
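
To make "git simply wants to mmap the whole file" concrete, here is a hypothetical helper (not git's code) that maps an entire open file read-only on a POSIX system. Even with a 64-bit off_t, the mapping length is a size_t, so in a 32-bit process a file approaching 2GB can exhaust the usable address space, and mmap() then typically fails with ENOMEM, which strerror() renders as "Cannot allocate memory".

```c
#define _FILE_OFFSET_BITS 64 /* 64-bit off_t even in a 32-bit build */
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Map the whole of an open file read-only. Returns NULL on failure. */
static void *map_whole_file(int fd, size_t *len)
{
	struct stat st;
	void *p;

	if (fstat(fd, &st) < 0)
		return NULL;
	/* The file size (off_t) may exceed what size_t can describe. */
	if ((uintmax_t)st.st_size > SIZE_MAX) {
		errno = EFBIG;
		return NULL;
	}
	*len = (size_t)st.st_size;
	if (*len == 0) {
		errno = EINVAL;	/* mmap() rejects zero-length mappings */
		return NULL;
	}
	p = mmap(NULL, *len, PROT_READ, MAP_PRIVATE, fd, 0);
	if (p == MAP_FAILED) {
		/* Not enough contiguous address space shows up here. */
		fprintf(stderr, "mmap failed: %s\n", strerror(errno));
		return NULL;
	}
	return p;
}
```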

(And if you seriously want to work with multi-gigabyte files, git probably
isn't going to perform wonderfully well, even if it *should* work fine if
you just have a full 64-bit environment that allows the mmap).

> B) 	the repository size less the .git subdirectory approaches
> 	4gig on a 'fetch' it results in:
> 
> 	Resolving 3356 deltas...
> 	fatal: serious inflate inconsistency: -3 (unknown compression method)

That sounds really broken. I'm not seeing what would cause that, apart 
from some really bad data corruption and/or broken zlib implementation. 
But if the pack-file really is 2GB+ in size, I could imagine some sign 
issues cropping up.

git will generally use "unsigned long" (which is probably just 32-bit on 
your setup), but since git in those circumstances would be limited by the 
size of the VM _anyway_, that's not really much of a limitation (although 
probably broken on the crazy Windows "LLP64" model). But maybe we have 
some place where we use a signed thing, or zlib does, and I could see that 
causing breakage.

But that code-sequence really should never even come *close* to the 31-bit 
limit, as long as the individual objects themselves aren't bigger than the 
available VM space (and git currently assumes "unsigned long" is 
sufficiently big to cover the VM space, which is not technically correct, 
but should be fine on OS X too).

That said, we should use "off_t" in that function. I suspect we have a 
number of people (read: me) who have grown too used to living in a 64-bit 
world..
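
The width and sign problems suspected above are easy to demonstrate with fixed-width types: an offset past 2^31 no longer fits a signed 32-bit integer, and one at or past 2^32 silently wraps in an unsigned 32-bit one, while a 64-bit type (such as off_t with large file support) keeps it intact. The helpers below are hypothetical illustrations, not git functions.

```c
#include <stdint.h>

/* Nonzero if the offset does not fit in a signed 32-bit integer,
 * i.e. it lies beyond the 2GB boundary where sign issues start. */
static int offset_exceeds_int32(uint64_t offset)
{
	return offset > (uint64_t)INT32_MAX;
}

/* What survives when an offset is squeezed into 32 unsigned bits:
 * the value is reduced modulo 2^32. */
static uint32_t offset_as_u32(uint64_t offset)
{
	return (uint32_t)offset;
}

/* Nonzero if the offset does not survive that round trip. */
static int offset_truncates_in_u32(uint64_t offset)
{
	return (uint64_t)offset_as_u32(offset) != offset;
}
```

For example, a 5GB offset stored in 32 unsigned bits becomes 1GB, so a reader would look at entirely the wrong place in the pack.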

> I have been testing on my laptop which has a 32-bit Intel Core Duo.

Ok, so you're 32-bit limited even if there were to be some 64-bit
support for OS X.

> Also, I have run the same tests on a dual quad-core Intel processor
> which is 64 bit, (but not sure that Apple uses the 64 bits in 10.4.10).  I
> get the same results as above.

I'm pretty sure OS X defaults to a 32-bit environment, but has at least 
*some* 64-bit support. It would definitely need to be enabled explicitly 
(since they made the *insane* decision to move over to Intel laptop chips 
six months before they got 64-bit support! Somebody at Apple is a total 
idiot, and should get fired).

So it would be interesting to hear if a 64-bit build would make a 
difference.

> The zlib is at the latest revision of 1.2.3 and gcc is at 4.0.1
> which from what I can tell supports large files, because 'off_t' is 8 bytes
> which is the size used for a 'stat' file size.

See above: single files are size-limited, but with a large off_t like 
yours, you should be fine. Except we may have screwed up.

> I am just wondering if these size limitations exist for MacOSX
> or maybe I am doing something wrong (which is probably
> the case).

We *have* had issues with broken implementations of "pread()" on some 
systems.  

You could try setting NO_PREAD in the Makefile and compiling with the 
compatibility function.. That's the only thing that comes to mind as being 
worth trying in that area.
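
As a rough illustration of what a pread() replacement built from other system calls can look like, here is a hypothetical sketch (not git's actual compat source): remember the current file position, seek to the requested offset, read, then seek back. Unlike a real pread() it is not atomic, so it is only safe when nothing else uses the descriptor concurrently.

```c
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical pread() stand-in built from lseek() and read(). */
static ssize_t compat_pread(int fd, void *buf, size_t count, off_t offset)
{
	off_t saved = lseek(fd, 0, SEEK_CUR);
	size_t done = 0;
	int failed = 0;

	if (saved < 0 || lseek(fd, offset, SEEK_SET) < 0)
		return -1;
	while (done < count) {
		ssize_t n = read(fd, (char *)buf + done, count - done);
		if (n < 0) {
			failed = 1;	/* a fuller version would retry on EINTR */
			break;
		}
		if (n == 0)
			break;		/* end of file: return a short count */
		done += (size_t)n;
	}
	/* Put the file position back where the caller left it. */
	if (lseek(fd, saved, SEEK_SET) != saved)
		return -1;
	return failed ? -1 : (ssize_t)done;
}
```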

And if you have some script to generate the repository (ie you aren't 
using "live data", but are testing the limits of the system), if you can 
make that available, so that people with non-OSX environments can test, 
that would be interesting..

I certainly have some 32-bit environments too (old linux boxes), but I'm 
too lazy to write a test-case, so I was hoping you'd be using some simple 
scripts that I could just test and see if I can see the behaviour you 
describe myself.

That said, I have worked with a 3GB pack-file (one of the KDE trial 
repos). That worked fine. But git does tend to want a *lot* of memory for 
really big repositories, so I suspect that if you actually work with 2GB+ 
pack-files, you'll be wanting a 64-bit environment just because you'll be 
wanting more than 2GB of physical RAM in order to be able to access it 
efficiently.

			Linus


* Re: git packs
From: Linus Torvalds @ 2007-11-10  6:53 UTC
  To: bob; +Cc: git
In-Reply-To: <alpine.LFD.0.999.0711092211250.15101@woody.linux-foundation.org>



On Fri, 9 Nov 2007, Linus Torvalds wrote:
> 
> That said, I have worked with a 3GB pack-file (one of the KDE trial 
> repos). That worked fine. But git does tend to want a *lot* of memory for 
> really big repositories, so I suspect that if you actually work with 2GB+ 
> pack-files, you'll be wanting a 64-bit environment just because you'll be 
> wanting more than 2GB of physical RAM in order to be able to access it 
> efficiently.

Just double-checked. Yes, sirree. You definitely want 4GB+ if you are 
cloning a 3GB git pack-file. The "git-pack-objects" phase not only is 
going to walk all over the pack-file, it's going to add its own memory 
footprint on top of that just keeping track of all the objects.

So I doubt 2GB+ pack-files are all that practical on 32-bit hosts. At 
least not with the kind of performance behaviour *I* would accept.

(Of course, since git packs things pretty damn well, it would need to be a
really really big project to be a 2GB+ pack-file, or just contain a lot of
generally large non-deltable binary data files - one scenario where git
definitely doesn't work wonderfully well, although I doubt many other
SCMs do either..)

			Linus


* Re: git packs
From: Linus Torvalds @ 2007-11-10  6:58 UTC
  To: Luke Lu; +Cc: bob, git
In-Reply-To: <DF65F7E4-448A-4726-8B42-642776155A8F@vicaya.com>

On Fri, 9 Nov 2007, Luke Lu wrote:
> 
> mmap(2), which git uses by default, is subject to vm limits (typically <2GB),
> regardless of large file support. file `which git` will probably tell you that
> it's a Mach-O executable i386 instead of x86_64. In order to get 64 bit
> binaries on Mactel boxes, you'll need the -m64 flag for gcc. I suspect that
> compiling with the NO_MMAP option would work as well.

Even with NO_MMAP, git will still want to read in source files in their 
entirety (just with regular reads). So you'll still be VM size-limited: 
the mmap() will just be replaced with a malloc+read in order to avoid 
some broken Windows mmap() behaviour.

But hearing whether -m64 makes a difference would be interesting. I'm 
hoping OS X is LP64, not some insane half-way thing like Vista is.
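
The point above that NO_MMAP merely swaps mmap() for a malloc() plus read() can be sketched as follows. The helper is hypothetical, not git's compat code; it shows that the whole file still has to fit in one buffer, so the process's address space remains the limit.

```c
#define _FILE_OFFSET_BITS 64
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical NO_MMAP-style loader: one malloc()ed buffer holding the
 * entire file, filled with read(). Returns NULL on failure. */
static void *read_whole_file(int fd, size_t *len)
{
	struct stat st;
	size_t size, done = 0;
	char *buf;

	if (fstat(fd, &st) < 0)
		return NULL;
	if ((uintmax_t)st.st_size > SIZE_MAX) {
		errno = EFBIG;	/* cannot even describe the buffer size */
		return NULL;
	}
	size = (size_t)st.st_size;
	/* The whole file must fit in the address space, just as with mmap(). */
	buf = malloc(size ? size : 1);
	if (!buf)
		return NULL;
	if (lseek(fd, 0, SEEK_SET) < 0)
		goto fail;
	while (done < size) {
		ssize_t n = read(fd, buf + done, size - done);
		if (n <= 0)
			goto fail;	/* read error, or the file shrank */
		done += (size_t)n;
	}
	*len = size;
	return buf;
fail:
	free(buf);
	return NULL;
}
```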

		Linus


* Re: git packs
From: bob @ 2007-11-10  7:19 UTC
  To: git
In-Reply-To: <alpine.LFD.0.999.0711092211250.15101@woody.linux-foundation.org>

I will try a few things and see if I can get a script put together
that generates the inflate problem.  The data that I am
using is a backup of my original repository.  So, I can
play all that I want.  But it would be a lot easier if I
could just generate some files using dd or something.

I'll also try the 64-bit compile on my Mac Pro and see
if it works.

My only reason for keeping this directory under git is
that I find git so easy to work with across multiple
machines.  I own 10+ computers and I use git to
provide easy transfers/updates from machine to
machine normally without any issues and a great
deal of reliability.

This directory is a website that I use
internally to keep track of things important to me.
For instance, the one large file is a movie of the
inside of my house before they sheet-rocked it
so that later I would have an easier time finding
things in the walls.  There is some html and php
that I wrote in it which I did want versioned.

Maybe I should just drop back to using two
directories, one for the large files, which are
static anyway, and a git repo for the html and php.
I was just trying to keep everything in the
repo for simplicity.

No matter which direction I decide on, I will still
try to provide the script.  Thank you both (Linus and Luke Lu)
for the feedback.




* Re: git packs
From: David Brown @ 2007-11-10  7:59 UTC
  To: Linus Torvalds; +Cc: Luke Lu, bob, git
In-Reply-To: <alpine.LFD.0.999.0711092254130.15101@woody.linux-foundation.org>

On Fri, Nov 09, 2007 at 10:58:16PM -0800, Linus Torvalds wrote:

>But hearing whether -m64 makes a difference would be interesting. I'm 
>hoping OS X is LP64, not some insane half-way thing like Vista is.

Some casual tests with printf and sizeof make it look like it is.  At
least sizeof (void *) and sizeof (long) are both 8.
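
The sizeof check above generalises to a small classifier over the C data models mentioned in this thread: ILP32 (a default 32-bit build), LP64 (long and pointers both 64-bit, which is what the test above indicates for -m64 on OS X) and LLP64 (64-bit pointers with a 32-bit long, as on 64-bit Windows). A hypothetical sketch:

```c
/* C data models relevant to the thread. */
enum data_model { DM_ILP32, DM_LP64, DM_LLP64, DM_OTHER };

/* Hypothetical helper: classify the current build from type sizes. */
static enum data_model detect_data_model(void)
{
	if (sizeof(int) == 4 && sizeof(long) == 4 && sizeof(void *) == 4)
		return DM_ILP32;	/* 32-bit int, long and pointers */
	if (sizeof(int) == 4 && sizeof(long) == 8 && sizeof(void *) == 8)
		return DM_LP64;		/* long and pointers both 64-bit */
	if (sizeof(int) == 4 && sizeof(long) == 4 && sizeof(void *) == 8)
		return DM_LLP64;	/* 64-bit pointers, 32-bit long */
	return DM_OTHER;
}
```

Since git assumes "unsigned long" is big enough to cover the address space, DM_LLP64 is the case where that assumption breaks.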

David


* [PATCH 1/2] ce_match_stat, run_diff_files: use symbolic constants for readability
From: Junio C Hamano @ 2007-11-10  9:01 UTC
  To: Linus Torvalds; +Cc: Johannes Schindelin, Kristian Høgsberg, git
In-Reply-To: <alpine.LFD.0.999.0711091840120.15101@woody.linux-foundation.org>

ce_match_stat() can be told:

 (1) to ignore CE_VALID bit (used under "assume unchanged" mode)
     and perform the stat comparison anyway;

 (2) not to perform the contents comparison for racily clean
     entries and report mismatch of cached stat information;

using its "option" parameter.  Give them symbolic constants.

Similarly, run_diff_files() can be told not to report anything
on removed paths.  Also give it a symbolic constant for that.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---

 Linus Torvalds <torvalds@linux-foundation.org> writes:

 > On Fri, 9 Nov 2007, Junio C Hamano wrote:
 >>  
 >> -int run_diff_files(struct rev_info *revs, int silent_on_removed)
 >> +int run_diff_files(struct rev_info *revs, int option)
 >
 > Wouldn't it be much better to now 
 >  - make it "unsigned int flags"
 >  - create a few enums or #define's to make the usage be more readable?
 >
 > Because this:
 >
 >>-       run_diff_files(&rev, 0);
 >>+       run_diff_files(&rev, 2);
 >> -	    !ie_modified(istate, istate->cache[pos], &st, 1)) {
 >> +	    !ie_match_stat(istate, istate->cache[pos], &st, 3)) {
 >
 > just went from subtle to "incredibly non-obvious".

 That really is true.  Apparently I am getting much slower
 lately.  This is to just introduce the constants and change the
 types.
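
How the two option bits described for ce_match_stat() might be consulted can be shown with a simplified, hypothetical model. The flag names and values match the cache.h hunk of this patch, but the function below takes plain booleans instead of a cache entry and a struct stat, and is not the real ie_match_stat().

```c
/* Flag values as introduced in cache.h by this patch. */
#define CE_MATCH_IGNORE_VALID   01
#define CE_MATCH_RACY_IS_DIRTY  02

/* Simplified, hypothetical model: returns nonzero when the entry
 * should be reported as changed. */
static int match_stat_model(int assume_unchanged, int stat_differs,
			    int racily_clean, int contents_differ,
			    unsigned int options)
{
	if (assume_unchanged && !(options & CE_MATCH_IGNORE_VALID))
		return 0;	/* trust the CE_VALID bit */
	if (stat_differs)
		return 1;
	if (racily_clean)	/* stat looks clean but cannot be trusted */
		return (options & CE_MATCH_RACY_IS_DIRTY) ? 1 : contents_differ;
	return 0;
}
```

With named bits, a call such as match_stat_model(..., CE_MATCH_RACY_IS_DIRTY) reads as intent, which is the readability gain the quoted review asks for over passing a bare 2 or 3.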

 builtin-apply.c |    2 +-
 cache.h         |   14 ++++++++++----
 check-racy.c    |    2 +-
 diff-lib.c      |   16 +++++++++-------
 diff.h          |    4 +++-
 entry.c         |    2 +-
 read-cache.c    |   47 ++++++++++++++++++++++++++++++-----------------
 unpack-trees.c  |    4 ++--
 8 files changed, 57 insertions(+), 34 deletions(-)

diff --git a/builtin-apply.c b/builtin-apply.c
index 5cc90e6..0fff02e 100644
--- a/builtin-apply.c
+++ b/builtin-apply.c
@@ -2099,7 +2099,7 @@ static int verify_index_match(struct cache_entry *ce, struct stat *st)
 			return -1;
 		return 0;
 	}
-	return ce_match_stat(ce, st, 1);
+	return ce_match_stat(ce, st, CE_MATCH_IGNORE_VALID);
 }
 
 static int check_patch(struct patch *patch, struct patch *prev_patch)
diff --git a/cache.h b/cache.h
index fc195bc..31af16a 100644
--- a/cache.h
+++ b/cache.h
@@ -174,8 +174,8 @@ extern struct index_state the_index;
 #define remove_file_from_cache(path) remove_file_from_index(&the_index, (path))
 #define add_file_to_cache(path, verbose) add_file_to_index(&the_index, (path), (verbose))
 #define refresh_cache(flags) refresh_index(&the_index, (flags), NULL, NULL)
-#define ce_match_stat(ce, st, really) ie_match_stat(&the_index, (ce), (st), (really))
-#define ce_modified(ce, st, really) ie_modified(&the_index, (ce), (st), (really))
+#define ce_match_stat(ce, st, options) ie_match_stat(&the_index, (ce), (st), (options))
+#define ce_modified(ce, st, options) ie_modified(&the_index, (ce), (st), (options))
 #endif
 
 enum object_type {
@@ -266,8 +266,14 @@ extern int remove_file_from_index(struct index_state *, const char *path);
 extern int add_file_to_index(struct index_state *, const char *path, int verbose);
 extern struct cache_entry *make_cache_entry(unsigned int mode, const unsigned char *sha1, const char *path, int stage, int refresh);
 extern int ce_same_name(struct cache_entry *a, struct cache_entry *b);
-extern int ie_match_stat(struct index_state *, struct cache_entry *, struct stat *, int);
-extern int ie_modified(struct index_state *, struct cache_entry *, struct stat *, int);
+
+/* do stat comparison even if CE_VALID is true */
+#define CE_MATCH_IGNORE_VALID		01
+/* do not check the contents but report dirty on racily-clean entries */
+#define CE_MATCH_RACY_IS_DIRTY	02
+extern int ie_match_stat(struct index_state *, struct cache_entry *, struct stat *, unsigned int);
+extern int ie_modified(struct index_state *, struct cache_entry *, struct stat *, unsigned int);
+
 extern int ce_path_match(const struct cache_entry *ce, const char **pathspec);
 extern int index_fd(unsigned char *sha1, int fd, struct stat *st, int write_object, enum object_type type, const char *path);
 extern int read_fd(int fd, char **return_buf, unsigned long *return_size);
diff --git a/check-racy.c b/check-racy.c
index d6a08b4..00d92a1 100644
--- a/check-racy.c
+++ b/check-racy.c
@@ -18,7 +18,7 @@ int main(int ac, char **av)
 
 		if (ce_match_stat(ce, &st, 0))
 			dirty++;
-		else if (ce_match_stat(ce, &st, 2))
+		else if (ce_match_stat(ce, &st, CE_MATCH_RACY_IS_DIRTY))
 			racy++;
 		else
 			clean++;
diff --git a/diff-lib.c b/diff-lib.c
index da55713..9f8afbe 100644
--- a/diff-lib.c
+++ b/diff-lib.c
@@ -173,9 +173,10 @@ static int is_in_index(const char *path)
 }
 
 static int handle_diff_files_args(struct rev_info *revs,
-		int argc, const char **argv, int *silent)
+				  int argc, const char **argv,
+				  unsigned int *options)
 {
-	*silent = 0;
+	*options = 0;
 
 	/* revs->max_count == -2 means --no-index */
 	while (1 < argc && argv[1][0] == '-') {
@@ -192,7 +193,7 @@ static int handle_diff_files_args(struct rev_info *revs,
 			revs->diffopt.no_index = 1;
 		}
 		else if (!strcmp(argv[1], "-q"))
-			*silent = 1;
+			*options |= DIFF_SILENT_ON_REMOVED;
 		else
 			return error("invalid option: %s", argv[1]);
 		argv++; argc--;
@@ -305,9 +306,9 @@ int setup_diff_no_index(struct rev_info *revs,
 
 int run_diff_files_cmd(struct rev_info *revs, int argc, const char **argv)
 {
-	int silent_on_removed;
+	unsigned int options;
 
-	if (handle_diff_files_args(revs, argc, argv, &silent_on_removed))
+	if (handle_diff_files_args(revs, argc, argv, &options))
 		return -1;
 
 	if (revs->diffopt.no_index) {
@@ -329,13 +330,14 @@ int run_diff_files_cmd(struct rev_info *revs, int argc, const char **argv)
 		perror("read_cache");
 		return -1;
 	}
-	return run_diff_files(revs, silent_on_removed);
+	return run_diff_files(revs, options);
 }
 
-int run_diff_files(struct rev_info *revs, int silent_on_removed)
+int run_diff_files(struct rev_info *revs, unsigned int option)
 {
 	int entries, i;
 	int diff_unmerged_stage = revs->max_count;
+	int silent_on_removed = option & DIFF_SILENT_ON_REMOVED;
 
 	if (diff_unmerged_stage < 0)
 		diff_unmerged_stage = 2;
diff --git a/diff.h b/diff.h
index 4546aad..de533da 100644
--- a/diff.h
+++ b/diff.h
@@ -224,7 +224,9 @@ extern void diff_flush(struct diff_options*);
 
 extern const char *diff_unique_abbrev(const unsigned char *, int);
 
-extern int run_diff_files(struct rev_info *revs, int silent_on_removed);
+/* do not report anything on removed paths */
+#define DIFF_SILENT_ON_REMOVED 01
+extern int run_diff_files(struct rev_info *revs, unsigned int option);
 extern int setup_diff_no_index(struct rev_info *revs,
 		int argc, const char ** argv, int nongit, const char *prefix);
 extern int run_diff_files_cmd(struct rev_info *revs, int argc, const char **argv);
diff --git a/entry.c b/entry.c
index fc3a506..ef88f62 100644
--- a/entry.c
+++ b/entry.c
@@ -200,7 +200,7 @@ int checkout_entry(struct cache_entry *ce, const struct checkout *state, char *t
 	strcpy(path + len, ce->name);
 
 	if (!lstat(path, &st)) {
-		unsigned changed = ce_match_stat(ce, &st, 1);
+		unsigned changed = ce_match_stat(ce, &st, CE_MATCH_IGNORE_VALID);
 		if (!changed)
 			return 0;
 		if (!state->force) {
diff --git a/read-cache.c b/read-cache.c
index 928e8fa..9e4d4a9 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -194,11 +194,12 @@ static int ce_match_stat_basic(struct cache_entry *ce, struct stat *st)
 }
 
 int ie_match_stat(struct index_state *istate,
-		  struct cache_entry *ce, struct stat *st, int options)
+		  struct cache_entry *ce, struct stat *st,
+		  unsigned int options)
 {
 	unsigned int changed;
-	int ignore_valid = options & 01;
-	int assume_racy_is_modified = options & 02;
+	int ignore_valid = options & CE_MATCH_IGNORE_VALID;
+	int assume_racy_is_modified = options & CE_MATCH_RACY_IS_DIRTY;
 
 	/*
 	 * If it's marked as always valid in the index, it's
@@ -238,10 +239,11 @@ int ie_match_stat(struct index_state *istate,
 }
 
 int ie_modified(struct index_state *istate,
-		struct cache_entry *ce, struct stat *st, int really)
+		struct cache_entry *ce, struct stat *st, unsigned int options)
 {
 	int changed, changed_fs;
-	changed = ie_match_stat(istate, ce, st, really);
+
+	changed = ie_match_stat(istate, ce, st, options);
 	if (!changed)
 		return 0;
 	/*
@@ -420,7 +422,7 @@ int add_file_to_index(struct index_state *istate, const char *path, int verbose)
 	pos = index_name_pos(istate, ce->name, namelen);
 	if (0 <= pos &&
 	    !ce_stage(istate->cache[pos]) &&
-	    !ie_modified(istate, istate->cache[pos], &st, 1)) {
+	    !ie_modified(istate, istate->cache[pos], &st, CE_MATCH_IGNORE_VALID)) {
 		/* Nothing changed, really */
 		free(ce);
 		return 0;
@@ -782,11 +784,13 @@ int add_index_entry(struct index_state *istate, struct cache_entry *ce, int opti
  * to link up the stat cache details with the proper files.
  */
 static struct cache_entry *refresh_cache_ent(struct index_state *istate,
-					     struct cache_entry *ce, int really, int *err)
+					     struct cache_entry *ce,
+					     unsigned int options, int *err)
 {
 	struct stat st;
 	struct cache_entry *updated;
 	int changed, size;
+	int ignore_valid = options & CE_MATCH_IGNORE_VALID;
 
 	if (lstat(ce->name, &st) < 0) {
 		if (err)
@@ -794,16 +798,23 @@ static struct cache_entry *refresh_cache_ent(struct index_state *istate,
 		return NULL;
 	}
 
-	changed = ie_match_stat(istate, ce, &st, really);
+	changed = ie_match_stat(istate, ce, &st, options);
 	if (!changed) {
-		if (really && assume_unchanged &&
+		/*
+		 * The path is unchanged.  If we were told to ignore
+		 * valid bit, then we did the actual stat check and
+		 * found that the entry is unmodified.  If the entry
+		 * is not marked VALID, this is the place to mark it
+		 * valid again, under "assume unchanged" mode.
+		 */
+		if (ignore_valid && assume_unchanged &&
 		    !(ce->ce_flags & htons(CE_VALID)))
 			; /* mark this one VALID again */
 		else
 			return ce;
 	}
 
-	if (ie_modified(istate, ce, &st, really)) {
+	if (ie_modified(istate, ce, &st, options)) {
 		if (err)
 			*err = EINVAL;
 		return NULL;
@@ -814,13 +825,14 @@ static struct cache_entry *refresh_cache_ent(struct index_state *istate,
 	memcpy(updated, ce, size);
 	fill_stat_cache_info(updated, &st);
 
-	/* In this case, if really is not set, we should leave
-	 * CE_VALID bit alone.  Otherwise, paths marked with
-	 * --no-assume-unchanged (i.e. things to be edited) will
-	 * reacquire CE_VALID bit automatically, which is not
-	 * really what we want.
+	/*
+	 * If ignore_valid is not set, we should leave CE_VALID bit
+	 * alone.  Otherwise, paths marked with --no-assume-unchanged
+	 * (i.e. things to be edited) will reacquire CE_VALID bit
+	 * automatically, which is not really what we want.
 	 */
-	if (!really && assume_unchanged && !(ce->ce_flags & htons(CE_VALID)))
+	if (!ignore_valid && assume_unchanged &&
+	    !(ce->ce_flags & htons(CE_VALID)))
 		updated->ce_flags &= ~htons(CE_VALID);
 
 	return updated;
@@ -834,6 +846,7 @@ int refresh_index(struct index_state *istate, unsigned int flags, const char **p
 	int allow_unmerged = (flags & REFRESH_UNMERGED) != 0;
 	int quiet = (flags & REFRESH_QUIET) != 0;
 	int not_new = (flags & REFRESH_IGNORE_MISSING) != 0;
+	unsigned int options = really ? CE_MATCH_IGNORE_VALID : 0;
 
 	for (i = 0; i < istate->cache_nr; i++) {
 		struct cache_entry *ce, *new;
@@ -855,7 +868,7 @@ int refresh_index(struct index_state *istate, unsigned int flags, const char **p
 		if (pathspec && !match_pathspec(pathspec, ce->name, strlen(ce->name), 0, seen))
 			continue;
 
-		new = refresh_cache_ent(istate, ce, really, &cache_errno);
+		new = refresh_cache_ent(istate, ce, options, &cache_errno);
 		if (new == ce)
 			continue;
 		if (!new) {
diff --git a/unpack-trees.c b/unpack-trees.c
index ccfeb6e..9411c67 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -406,7 +406,7 @@ static void verify_uptodate(struct cache_entry *ce,
 		return;
 
 	if (!lstat(ce->name, &st)) {
-		unsigned changed = ce_match_stat(ce, &st, 1);
+		unsigned changed = ce_match_stat(ce, &st, CE_MATCH_IGNORE_VALID);
 		if (!changed)
 			return;
 		/*
@@ -927,7 +927,7 @@ int oneway_merge(struct cache_entry **src,
 		if (o->reset) {
 			struct stat st;
 			if (lstat(old->name, &st) ||
-			    ce_match_stat(old, &st, 1))
+			    ce_match_stat(old, &st, CE_MATCH_IGNORE_VALID))
 				old->ce_flags |= htons(CE_UPDATE);
 		}
 		return keep_entry(old, o);
-- 
1.5.3.5.1651.g30bf0

^ permalink raw reply related

* [PATCH 2/2] git-add: make the entry stat-clean after re-adding the same contents
From: Junio C Hamano @ 2007-11-10  9:02 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Johannes Schindelin, Kristian Høgsberg, git
In-Reply-To: <alpine.LFD.0.999.0711091840120.15101@woody.linux-foundation.org>

Earlier in commit 0781b8a9b2fe760fc4ed519a3a26e4b9bd6ccffe
(add_file_to_index: skip rehashing if the cached stat already
matches), add_file_to_index() was taught not to re-add the path
if it already matches the index.

The change meant well, but was not executed quite right.  It
used ie_modified() to see if the file on the work tree is really
different from the index, and skipped adding the contents if the
function says "not modified".

This was wrong.  There are three possible comparison results
between the index and the file in the work tree:

 - with lstat(2) we _know_ they are different.  E.g. if the
   length or the owner in the cached stat information is
   different from the length we just obtained from lstat(2), we
   can tell the file is modified without looking at the actual
   contents.

 - with lstat(2) we _know_ they are the same.  The same length,
   the same owner, the same everything (but this has a twist, as
   described below).

 - we cannot tell from lstat(2) information alone and need to go
   to the filesystem to actually compare.

The last case arises from what we call the 'racy git' situation,
which can be caused by this sequence:

    $ echo hello >file
    $ git add file
    $ echo aeiou >file ;# the same length

If the second "echo" is done within the same filesystem
timestamp granularity as the first "echo", then the timestamp
recorded by "git add" and the timestamp we get from lstat(2)
will be the same, and we can mistakenly say the file is not
modified.  Such a path is called 'racily clean'.  We need to
reliably detect when racily clean paths are in fact modified.

To solve this problem, when we write out the index, we mark every
index entry that has the same timestamp as the index file itself
(that is, the time from the point of view of the filesystem) to
tell any later code that does the lstat(2) comparison not to
trust the cached stat info, and ie_modified() then actually goes
to the filesystem to compare the contents for such a path.

That's all good, but it should not be used for this "git add"
optimization, as the goal of "git add" is to actually update the
path in the index and make it stat-clean.  With the false
optimization, we did _not_ cause any data loss (after all, what
we failed to do was only to update the cached stat information),
but it made the following sequence leave the file stat dirty:

    $ echo hello >file
    $ git add file
    $ echo hello >file ;# the same contents
    $ git add file

The solution is not to use ie_modified() which goes to the
filesystem to see if it is really clean, but instead use
ie_match_stat() with "assume racily clean paths are dirty"
option, to force re-adding of such a path.
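To make the three outcomes and the new option concrete, here is a toy model, assuming the lstat(2) comparison has already been reduced to a verdict (enum stat_verdict, toy_match_stat() and the contents_differ argument are hypothetical; the real ie_match_stat() returns a bitmask of what changed rather than 0/1). With CE_MATCH_RACY_IS_DIRTY a racily clean entry is reported dirty without reading the file, which is what forces "git add" to re-add it and refresh its stat info.

```c
#include <assert.h>

/* Flag values as defined in the cache.h hunk earlier in this thread. */
#define CE_MATCH_IGNORE_VALID   01
#define CE_MATCH_RACY_IS_DIRTY  02

/* Hypothetical summary of what the lstat(2) comparison told us. */
enum stat_verdict {
	STAT_DIFFERENT,  /* cached stat info proves the file changed */
	STAT_SAME,       /* cached stat info proves the file is unchanged */
	STAT_RACY        /* stat info matches, but the entry is racily clean */
};

/*
 * Returns 1 if the entry should be treated as changed, 0 otherwise.
 * contents_differ stands in for the content comparison git performs
 * on racily clean entries when it is not told to assume they are dirty.
 */
static int toy_match_stat(enum stat_verdict v, unsigned int options,
			  int contents_differ)
{
	switch (v) {
	case STAT_DIFFERENT:
		return 1;
	case STAT_SAME:
		return 0;
	case STAT_RACY:
		if (options & CE_MATCH_RACY_IS_DIRTY)
			return 1;  /* report dirty without reading the file */
		return contents_differ ? 1 : 0;
	}
	return 1;  /* unknown verdict: be conservative */
}
```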

There was another problem with "git add -u".  That codepath
shares the same issue when adding the paths that are found to be
modified, but in addition, it asks the run_diff_files() function
(the "git diff-files" machinery) to list the paths that are
modified.  That machinery uses the same ie_modified() logic so
that it does not report racily clean _and_ actually clean paths
as modified, which is not what we want here.

The patch allows the callers of run_diff_files() to pass the
same "assume racily clean paths are dirty" option, and makes the
"git-add -u" codepath use that option to discover and re-add
racily clean _and_ actually clean paths.
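The diff layer and the index layer keep separate flag namespaces whose bit values happen to coincide, which is why run_diff_files() translates DIFF_RACY_IS_MODIFIED into CE_MATCH_RACY_IS_DIRTY instead of passing its option word through. A minimal sketch of that translation (diff_to_ce_options() is a hypothetical name; the constants are the ones defined in this thread):

```c
#include <assert.h>

/* Values from the diff.h and cache.h hunks in this thread. */
#define DIFF_SILENT_ON_REMOVED  01
#define DIFF_RACY_IS_MODIFIED   02
#define CE_MATCH_IGNORE_VALID   01
#define CE_MATCH_RACY_IS_DIRTY  02

/* Map diff-level options onto the index-level options they imply. */
static unsigned int diff_to_ce_options(unsigned int diff_option)
{
	return (diff_option & DIFF_RACY_IS_MODIFIED) ? CE_MATCH_RACY_IS_DIRTY : 0;
}
```

Passing the word through unchanged would turn DIFF_SILENT_ON_REMOVED (01) into CE_MATCH_IGNORE_VALID (01), silently changing what "-q" means.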

We could further optimize on top of this patch to differentiate
the case where the path really needs re-adding (i.e. the content
of the racily clean entry was indeed different) and the case
where only the cached stat information needs to be refreshed
(i.e. the racily clean entry was actually clean), but I do not
think it is worth it.

This patch applies to maint and all the way up.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---

 * And this is a respin of the earlier fix, using the new
   constants.

 builtin-add.c |    2 +-
 diff-lib.c    |    4 +++-
 diff.h        |    2 ++
 read-cache.c  |    3 ++-
 4 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/builtin-add.c b/builtin-add.c
index 373f87f..e072320 100644
--- a/builtin-add.c
+++ b/builtin-add.c
@@ -123,7 +123,7 @@ static void update(int verbose, const char *prefix, const char **files)
 	rev.diffopt.format_callback_data = &verbose;
 	if (read_cache() < 0)
 		die("index file corrupt");
-	run_diff_files(&rev, 0);
+	run_diff_files(&rev, DIFF_RACY_IS_MODIFIED);
 }

 static void refresh(int verbose, const char **pathspec)
diff --git a/diff-lib.c b/diff-lib.c
index 9f8afbe..ec1b5e3 100644
--- a/diff-lib.c
+++ b/diff-lib.c
@@ -338,6 +338,8 @@ int run_diff_files(struct rev_info *revs, unsigned int option)
 	int entries, i;
 	int diff_unmerged_stage = revs->max_count;
 	int silent_on_removed = option & DIFF_SILENT_ON_REMOVED;
+	unsigned ce_option = ((option & DIFF_RACY_IS_MODIFIED)
+			      ? CE_MATCH_RACY_IS_DIRTY : 0);

 	if (diff_unmerged_stage < 0)
 		diff_unmerged_stage = 2;
@@ -443,7 +445,7 @@ int run_diff_files(struct rev_info *revs, unsigned int option)
 				       ce->sha1, ce->name, NULL);
 			continue;
 		}
-		changed = ce_match_stat(ce, &st, 0);
+		changed = ce_match_stat(ce, &st, ce_option);
 		if (!changed && !revs->diffopt.find_copies_harder)
 			continue;
 		oldmode = ntohl(ce->ce_mode);
diff --git a/diff.h b/diff.h
index de533da..efaa8f7 100644
--- a/diff.h
+++ b/diff.h
@@ -226,6 +226,8 @@ extern const char *diff_unique_abbrev(const unsigned char *, int);

 /* do not report anything on removed paths */
 #define DIFF_SILENT_ON_REMOVED 01
+/* report racily-clean paths as modified */
+#define DIFF_RACY_IS_MODIFIED 02
 extern int run_diff_files(struct rev_info *revs, unsigned int option);
 extern int setup_diff_no_index(struct rev_info *revs,
 		int argc, const char ** argv, int nongit, const char *prefix);
diff --git a/read-cache.c b/read-cache.c
index 9e4d4a9..c3dbf89 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -388,6 +388,7 @@ int add_file_to_index(struct index_state *istate, const char *path, int verbose)
 	int size, namelen, pos;
 	struct stat st;
 	struct cache_entry *ce;
+	unsigned ce_option = CE_MATCH_IGNORE_VALID|CE_MATCH_RACY_IS_DIRTY;

 	if (lstat(path, &st))
 		die("%s: unable to stat (%s)", path, strerror(errno));
@@ -422,7 +423,7 @@ int add_file_to_index(struct index_state *istate, const char *path, int verbose)
 	pos = index_name_pos(istate, ce->name, namelen);
 	if (0 <= pos &&
 	    !ce_stage(istate->cache[pos]) &&
-	    !ie_modified(istate, istate->cache[pos], &st, CE_MATCH_IGNORE_VALID)) {
+	    !ie_match_stat(istate, istate->cache[pos], &st, ce_option)) {
 		/* Nothing changed, really */
 		free(ce);
 		return 0;
-- 
1.5.3.5.1651.g30bf0

^ permalink raw reply related

* Re: [PATCH] builtin-commit: fix --signoff
From: Junio C Hamano @ 2007-11-10  9:06 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: git, krh, gitster
In-Reply-To: <Pine.LNX.4.64.0711100548071.4362@racer.site>

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

>  	if (signoff) {
> ...
> +		strbuf_init(&sob, 0);
> +		strbuf_addstr(&sob, sign_off_header);
> +		strbuf_addstr(&sob, git_committer_info(1));
> +		p = strrchr(sob.buf, '>');
> +		if (p)
> +			strbuf_setlen(&sob, p + 1 - sob.buf);
> +		strbuf_addch(&sob, '\n');
> +
> +		for (i = sb.len - 1; i > 0 && sb.buf[i - 1] != '\n'; i--)
> +			; /* do nothing */
> +		if (prefixcmp(sb.buf + i, sob.buf))
> +			strbuf_addbuf(&sb, &sob);
>  	}

At this point doesn't this leak sob.buf?
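For illustration, the sign-off logic under review can be restated with plain malloc()/free() so that the temporary line is released on every path; with git's strbuf API the equivalent would be a strbuf_release(&sob) once sob has been appended. This is a sketch, not the patch: append_signoff() and SIGN_OFF_HEADER are hypothetical, it assumes the message already ends with a newline (as after stripspace()), and it adds the blank line before the trailer that the t7501 sign-off tests in this thread expect.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define SIGN_OFF_HEADER "Signed-off-by: "

/*
 * Return a newly allocated copy of msg with a sign-off for ident
 * appended, unless the last line of msg already is that exact line.
 * A blank line is inserted first when the last line is not some other
 * sign-off.  The caller frees the result.
 */
static char *append_signoff(const char *msg, const char *ident)
{
	size_t len = strlen(msg);
	size_t sob_len = strlen(SIGN_OFF_HEADER) + strlen(ident) + 1;
	char *sob, *out;
	size_t i;

	sob = malloc(sob_len + 1);
	if (!sob)
		return NULL;
	strcpy(sob, SIGN_OFF_HEADER);
	strcat(sob, ident);
	strcat(sob, "\n");

	/* Walk back to the beginning of the last line. */
	for (i = len ? len - 1 : 0; i > 0 && msg[i - 1] != '\n'; i--)
		; /* do nothing */

	/* Room for msg, an optional blank line, the sign-off and the NUL. */
	out = malloc(len + 1 + sob_len + 1);
	if (out) {
		memcpy(out, msg, len + 1);
		if (strncmp(msg + i, sob, sob_len)) {
			if (strncmp(msg + i, SIGN_OFF_HEADER, strlen(SIGN_OFF_HEADER)))
				strcat(out, "\n");
			strcat(out, sob);
		}
	}
	free(sob);  /* the temporary is released on every path: no leak */
	return out;
}
```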

^ permalink raw reply

* Re: [PATCH] Make builtin-tag.c use parse_options.
From: Junio C Hamano @ 2007-11-10  9:26 UTC (permalink / raw)
  To: Kristian Høgsberg; +Cc: git, Carlos Rica
In-Reply-To: <7vabpmpr9y.fsf@gitster.siamese.dyndns.org>

Junio C Hamano <gitster@pobox.com> writes:

> While I freely admit that I do not particularly find the "One -m
> introduces one new line, concatenated to form the final
> paragraph" handling of multiple -m options done by git-commit
> nice nor useful, I suspect that it would make more sense to make
> git-tag and git-commit handle multiple -m option consistently,
> if you are going to change the existing semantics.  Since some
> people really seem to like multiple -m handling of git-commit,
> the avenue of the least resistance for better consistency would
> be to accept and concatenate (with LF in between) multiple -m
> options.
>
> With multiple -F, I think erroring out would be the sensible
> thing to do, but some people might prefer concatenation.  I do
> not care either way as long as commit and tag behave
> consistently.

Alas, this exposes a regression in the kh/commit series.
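The semantics the new 'multiple -m' test pins down, each -m argument becoming its own paragraph separated by a blank line, can be sketched as follows (join_m_options() is a hypothetical helper, not git-commit's code):

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Join the arguments of repeated -m options into one message, each
 * argument becoming its own paragraph.  The caller frees the result.
 */
static char *join_m_options(const char **msgs, size_t n)
{
	size_t total = 1, i;  /* 1 for the terminating NUL */
	char *out;

	for (i = 0; i < n; i++)
		total += strlen(msgs[i]) + 2;  /* text plus "\n\n" separator */
	out = malloc(total);
	if (!out)
		return NULL;
	out[0] = '\0';
	for (i = 0; i < n; i++) {
		if (i)
			strcat(out, "\n\n");
		strcat(out, msgs[i]);
	}
	return out;
}
```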

 t/t7501-commit.sh |   23 +++++++++++++++++++++++
 1 files changed, 23 insertions(+), 0 deletions(-)

diff --git a/t/t7501-commit.sh b/t/t7501-commit.sh
index 1b444d4..bf5dd86 100644
--- a/t/t7501-commit.sh
+++ b/t/t7501-commit.sh
@@ -178,4 +178,27 @@ test_expect_success 'amend commit to fix author' '
 	diff expected current
 
 '
+
+test_expect_success 'sign off' '
+
+	>positive &&
+	git add positive &&
+	git commit -s -m "thank you" &&
+	actual=$(git cat-file commit HEAD | sed -ne "s/Signed-off-by: //p") &&
+	expected=$(git var GIT_COMMITTER_IDENT | sed -e "s/>.*/>/") &&
+	test "z$actual" = "z$expected"
+
+'
+
+test_expect_success 'multiple -m' '
+
+	>negative &&
+	git add negative &&
+	git commit -m "one" -m "two" -m "three" &&
+	actual=$(git cat-file commit HEAD | sed -e "1,/^\$/d") &&
+	expected=$(echo one; echo; echo two; echo; echo three) &&
+	test "z$actual" = "z$expected"
+
+'
+
 test_done

^ permalink raw reply related

* Re: gitweb, updating 'last changed' column on the project page
From: Jakub Narebski @ 2007-11-10  9:34 UTC (permalink / raw)
  To: Jon Smirl; +Cc: git
In-Reply-To: <9e4733910711092210o65759276sd6970af5d29f2f32@mail.gmail.com>

On Saturday, 10 November 2007, Jon Smirl wrote:
> On 11/10/07, Jon Smirl <jonsmirl@gmail.com> wrote:
>> On 11/9/07, Jakub Narebski <jnareb@gmail.com> wrote:
>>> Jon Smirl wrote:
>>>
>>>> At http://git.digispeaker.com/ the 'last change' column is not getting updated.
>>>>
>>>> mpc5200b.git
>>>>       DigiSpeaker for Freescale MPC5200B.
>>>>       Jon Smirl
>>>>       5 weeks ago
>>>>       summary | shortlog | log | tree
>>>>
>>>> It still says 5 weeks ago, but if I click on the project last change is today.
>>>>
>>>> What controls this? I tried running update-server-info
>>>
>>> What does
>>>
>>>   git for-each-ref --format="%(refname):%09%(committer)" --sort=-committerdate
>>>       refs/heads
>>
>> [daedalus]$ git for-each-ref --format="%(refname):%09%(committer)" \
>> --sort=-committerdate refs/heads
>> refs/heads/m24: Jon Smirl <jonsmirl@gmail.com> 1191362799 -0400
>> refs/heads/m25: Jon Smirl <jonsmirl@gmail.com> 1191472422 -0400
>> refs/heads/m26: Jon Smirl <jonsmirl@gmail.com> 1194382038 -0500
>> refs/heads/m28: Jon Smirl <jonsmirl@gmail.com> 1194385071 -0500
>> refs/heads/m29: Jon Smirl <jonsmirl@gmail.com> 1194585780 -0500
> 
> It appears to be using the first head instead of the most recent date.

It appears to not _sort_ the output by committerdate, as it should with
'--sort=-committerdate'.

1442:[gitweb/web!git]$ git for-each-ref --format="%(refname):%09%(committer)" \
  --sort=-committerdate refs/heads
refs/heads/gitweb/web:  Jakub Narebski <jnareb@gmail.com> 1194616779 +0100
refs/heads/man: Junio C Hamano <junio@hera.kernel.org> 1194602628 +0000
refs/heads/html:        Junio C Hamano <junio@hera.kernel.org> 1194602626 +0000
refs/heads/origin:      Junio C Hamano <gitster@pobox.com> 1194602274 -0800
[...]
refs/heads/gitweb-snapshot+navbar:      Sven Verdoolaege <skimo@kotnet.org> 1134765981 +0100

1443:[gitweb/web!git]$ git --version
git version 1.5.3.5


Note that git-for-each-ref with those options returns the most recent
head first, sorting the output by committer date (the date the commit
was added to the repository).
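For reference, this is what '--sort=-committerdate --count=1' amounts to, sketched in C with a hypothetical struct ref_date and the timestamps from Jon's output above: sort newest first and keep the first entry. Sorted this way, m29 (1194585780) comes first, whereas Jon's listing has the oldest head first.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical pairing of a ref name with its committer timestamp. */
struct ref_date {
	const char *refname;
	long long committer_time;  /* seconds since the epoch */
};

/* qsort() comparator: newest first, as with --sort=-committerdate. */
static int by_date_desc(const void *a, const void *b)
{
	const struct ref_date *x = a, *y = b;

	if (x->committer_time != y->committer_time)
		return x->committer_time < y->committer_time ? 1 : -1;
	return 0;
}

/* Sort newest first and return the head that --count=1 would print. */
static const char *most_recent_ref(struct ref_date *refs, size_t n)
{
	if (!n)
		return NULL;
	qsort(refs, n, sizeof(*refs), by_date_desc);
	return refs[0].refname;
}
```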

>>>
>>> return? Does adding --count select proper branch, with proper update
>>> date?
>>
>> Is it looking for master, and just picking the first branch instead?

Gitweb should not (and I think does not) have 'master' hardcoded
anywhere. It might use HEAD in some cases you don't want it to...

>>>
>>> Which gitweb version is this?
>>
>> <!-- git web interface version 1.5.3.5.605.g79fa-dirty, (C) 2005-2006,
>> Kay Sievers <kay.sievers@vrfy.org>, Christian Gierke -->
>> <!-- git core binaries version 1.5.3.5.605.g79fa-dirty -->

Older versions of gitweb used the HEAD branch for 'last changed' info on
the projects list page. That is why I asked about the gitweb version.

But this is not the cause of your problem:
1. Your gitweb is new enough to use git-for-each-ref. It uses
   git for-each-ref --format="%(committer)" --sort=-committerdate 
                    --count=1 refs/heads
2. Looking at the 'heads' view (or the 'heads' part of the summary view),
   one can see that m29 is the current branch (HEAD), and it is the most recent.

Strange...
-- 
Jakub Narebski
Poland

^ permalink raw reply

* Re: [PATCH] Make builtin-tag.c use parse_options.
From: Junio C Hamano @ 2007-11-10  9:41 UTC (permalink / raw)
  To: Kristian Høgsberg; +Cc: git, Carlos Rica
In-Reply-To: <7vhcjuo3h9.fsf@gitster.siamese.dyndns.org>

This is an updated patch to the test script...

 t/t7501-commit.sh |   69 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 69 insertions(+), 0 deletions(-)

diff --git a/t/t7501-commit.sh b/t/t7501-commit.sh
index b151b51..4dc35bd 100644
--- a/t/t7501-commit.sh
+++ b/t/t7501-commit.sh
@@ -163,4 +163,73 @@ test_expect_success 'partial commit that involves removal (3)' '
 
 '
 
+author="The Real Author <someguy@his.email.org>"
+test_expect_success 'amend commit to fix author' '
+
+	oldtick=$GIT_AUTHOR_DATE &&
+	test_tick &&
+	git reset --hard &&
+	git cat-file -p HEAD |
+	sed -e "s/author.*/author $author $oldtick/" \
+		-e "s/^\(committer.*> \).*$/\1$GIT_COMMITTER_DATE/" > \
+		expected &&
+	git commit --amend --author="$author" &&
+	git cat-file -p HEAD > current &&
+	diff expected current
+
+'
+
+test_expect_success 'sign off (1)' '
+
+	echo 1 >positive &&
+	git add positive &&
+	git commit -s -m "thank you" &&
+	git cat-file commit HEAD | sed -e "1,/^\$/d" >actual &&
+	(
+		echo thank you
+		echo
+		git var GIT_COMMITTER_IDENT |
+		sed -e "s/>.*/>/" -e "s/^/Signed-off-by: /"
+	) >expected &&
+	diff -u expected actual
+
+'
+
+test_expect_success 'sign off (2)' '
+
+	echo 2 >positive &&
+	git add positive &&
+	existing="Signed-off-by: Watch This <watchthis@example.com>" &&
+	git commit -s -m "thank you
+
+$existing" &&
+	git cat-file commit HEAD | sed -e "1,/^\$/d" >actual &&
+	(
+		echo thank you
+		echo
+		echo $existing
+		git var GIT_COMMITTER_IDENT |
+		sed -e "s/>.*/>/" -e "s/^/Signed-off-by: /"
+	) >expected &&
+	diff -u expected actual
+
+'
+
+test_expect_success 'multiple -m' '
+
+	>negative &&
+	git add negative &&
+	git commit -m "one" -m "two" -m "three" &&
+	git cat-file commit HEAD | sed -e "1,/^\$/d" >actual &&
+	(
+		echo one
+		echo
+		echo two
+		echo
+		echo three
+	) >expected &&
+	diff -u expected actual
+
+'
+
 test_done

^ permalink raw reply related

* [PATCH v3] user-manual: add advanced topic "bisecting merges"
From: Steffen Prohaska @ 2007-11-10  9:48 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Benoit Sigoure, Andreas Ericsson, Johannes Sixt,
	Steffen Prohaska
In-Reply-To: <217E7104-312D-4D0C-BC66-C4829779C216@zib.de>

This commit adds a discussion of the challenge of bisecting
merge commits to the user manual. The original author is
Junio C Hamano <gitster@pobox.com>, who posted the text to
the mailing list:
<http://marc.info/?l=git&m=119403257315527&w=2>.
The text from the email is slightly adapted for the manual.

The discussion is added to "Exploring git history" in a
sub-section titled "Advanced topics". The discussion requires
detailed knowledge about git. It is assumed that the reader will
skip advanced topics on first reading; at least, the text suggests
doing so.

The text includes suggestions and fixes by
Ralf Wildenhues <Ralf.Wildenhues@gmx.de>,
Benoit Sigoure <tsuna@lrde.epita.fr>,
Johannes Sixt <j.sixt@viscovery.net>.

Signed-off-by: Steffen Prohaska <prohaska@zib.de>
---
 Documentation/user-manual.txt |  104 +++++++++++++++++++++++++++++++++++++++++
 1 files changed, 104 insertions(+), 0 deletions(-)

Next try.

The text now contains an introductory paragraph, proposes rebase as a
'solution', and recommends throwing away the rebased branch.

    Steffen
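Once the history is linearized as the text proposes, bisecting reduces to a binary search for the first bad commit. A minimal sketch, assuming the commits are in an array oldest first, that the first is known good and the last known bad, and that the behaviour flips from good to bad exactly once (first_bad() is hypothetical; git-bisect itself walks the commit graph rather than an array):

```c
#include <assert.h>
#include <stddef.h>

/*
 * is_bad[i] is the test result for the i-th commit, oldest first.
 * Returns the index of the first bad commit.
 */
static size_t first_bad(const int *is_bad, size_t n)
{
	size_t good = 0, bad = n - 1;

	assert(n >= 2 && !is_bad[0] && is_bad[n - 1]);
	while (bad - good > 1) {
		size_t mid = good + (bad - good) / 2;

		if (is_bad[mid])
			bad = mid;   /* culprit is at mid or earlier */
		else
			good = mid;  /* culprit is after mid */
	}
	return bad;
}
```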

diff --git a/Documentation/user-manual.txt b/Documentation/user-manual.txt
index d99adc6..2f4c314 100644
--- a/Documentation/user-manual.txt
+++ b/Documentation/user-manual.txt
@@ -934,6 +934,110 @@ Figuring out why this works is left as an exercise to the (advanced)
 student.  The gitlink:git-log[1], gitlink:git-diff-tree[1], and
 gitlink:git-hash-object[1] man pages may prove helpful.
 
+[[history-advanced-topics]]
+Advanced topics
+---------------
+This section covers advanced topics that typically require more
+knowledge about git than the manual has presented so far.
+
+You may want to skip the section on first reading, and come back
+later when you have a better understanding of git.
+
+[[bisect-merges]]
+Why bisecting merge commits can be harder than bisecting linear history
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+This section discusses how gitlink:git-bisect[1] copes
+with differently shaped histories. If you have not yet
+published a branch, you can use either gitlink:git-merge[1] or
+gitlink:git-rebase[1] to integrate changes from a second
+branch. The two approaches create differently shaped
+histories. This section discusses the implications for
+gitlink:git-bisect[1]. If the history is already published,
+temporarily rebasing can still be helpful for bisecting.
+
+The following text is based upon an email by Junio C. Hamano to
+the git mailing list
+(link:http://marc.info/?l=git&m=119403257315527&w=2[link:http://marc.info/?l=git&m=119403257315527&w=2]).
+It was slightly adapted for this manual.
+
+Using gitlink:git-bisect[1] on a history with merges can be challenging.
+Bisecting through merges is not a
+technical problem. The real problem is what to do when the
+culprit turns out to be a merge commit: how to spot what really
+is wrong, and how to fix it.  The problem is not for the
+tool but for the human, and it is real.
+
+Imagine this history.
+
+................................................
+      ---Z---o---X---...---o---A---C---D
+          \                       /
+           o---o---Y---...---o---B
+................................................
+
+Suppose that on the upper development line, the meaning of one
+of the functions that existed at Z was changed at commit X.  The
+commits from Z leading to A change both the function's
+implementation and all calling sites that existed at Z, as well
+as new calling sites they add, to be consistent.  There is no
+bug at A.
+
+Suppose that in the meantime, on the lower development line, somebody
+added a new calling site for that function at commit Y.  The
+commits from Z leading to B all assume the old semantics of that
+function and the callers and the callee are consistent with each
+other.  There is no bug at B, either.
+
+You merge to create C.  There is no textual conflict with this
+three way merge, and the result merges cleanly.  You bisect
+this, because you found D is bad and you know Z was good.  Your
+bisect will find that C (merge) is broken.  Understandably so,
+as at C, the new calling site of the function added by the lower
+branch is not converted to the new semantics, while all the
+other calling sites that already existed at Z would have been
+converted by the merge.  The new calling site needs a semantic
+adjustment, but you do not know that yet.  You need to
+find out that that is the cause of the breakage by looking at the
+merge commit C and the history leading to it.
+
+How would you do that?
+
+Both "git diff A C" and "git diff B C" would be an enormous patch.
+Each of them essentially shows the whole change on each branch
+since they diverged.  The developers may have been well behaved and
+created good commits that follow the "commit small, commit often,
+commit well contained units" mantra, and each individual commit
+leading from Z to A and from Z to B may be easy to review and
+understand, but looking at these small and easily reviewable
+steps alone would not let you spot the breakage.  You need to
+have a global picture of what the upper branch did (among
+many other things, it changed the semantics of that
+particular function) and look first at the huge "diff A C"
+(which shows the change the lower branch introduces), and see if
+that huge change is consistent with what has been done between
+Z and A.
+
+A solution is to linearize the history by rebasing the lower
+branch on top of the upper, instead of merging. There were no
+textual conflicts in the original three way merge. So there
+should not be conflicts during rebase either. Now the bug becomes
+much easier to find and understand. Your history would instead
+be:
+
+................................................................
+    ---Z---o---X--...---o---A---o---o---Y*--...---o---B*--D*
+................................................................
+
+and there is a single commit Y* between A and B* that introduced
+the new calling site that still uses the old semantics of the
+function, even though that was already modified at X. "git show
+Y*" will be a much smaller patch than "git diff A C" and it is
+much easier to deal with.
+
+Now that you know what caused the error (and how to fix it),
+throw away the rebased branch, and commit a fix on top of D.
+
+
 [[Developing-with-git]]
 Developing with git
 ===================
-- 
1.5.3.5.578.g886d

^ permalink raw reply related

* Re: linux-2.6 clone with CygWin git
From: Andreas Ericsson @ 2007-11-10  9:50 UTC (permalink / raw)
  To: Pascal Obry; +Cc: Medve Emilian, git
In-Reply-To: <473495E1.3080305@obry.net>

Pascal Obry wrote:
> David Brown a écrit :
>> It's because of the case-insensitive nature of the filesystem.  The kernel
>> contains both an xt_connmark.h and an xt_CONNMARK.h file, that are
>> different.  But only one exists on Windows, so git sees the first one it
>> wrote as having been modified.
> 
> Probably not something for the list... but if you ask me this is just
> bad practice anyway. Casing has no semantic, it is nonsense to have two
> files with the same name with different casing!
> 

You're right, it's not something for this list. Try arguing your case at
the linux kernel mailing list.

-- 
Andreas Ericsson                   andreas.ericsson@op5.se
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231
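
The collision described in the quoted message involves two paths that differ only in letter case, such as xt_connmark.h and xt_CONNMARK.h. A minimal C sketch of spotting such pairs in a list of tracked paths follows. It assumes ASCII-only case folding through POSIX `strcasecmp()` from <strings.h>, and `find_case_collision()` is a hypothetical helper, not something git provides.

```c
#define _POSIX_C_SOURCE 200809L	/* for strcasecmp() */
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <strings.h>

/*
 * Look for two distinct paths that compare equal when ASCII case is
 * ignored.  A case-insensitive filesystem can hold only one of them,
 * so whichever is written last replaces the other on disk, and the
 * path written first then looks modified.  Returns 1 and stores the
 * indices in *a and *b when such a pair exists, 0 otherwise.
 */
static int find_case_collision(const char **paths, size_t n,
			       size_t *a, size_t *b)
{
	size_t i, j;

	assert(paths != NULL || n == 0);
	for (i = 0; i < n; i++)
		for (j = i + 1; j < n; j++)
			if (strcmp(paths[i], paths[j]) != 0 &&
			    strcasecmp(paths[i], paths[j]) == 0) {
				*a = i;
				*b = j;
				return 1;
			}
	return 0;
}
```

The pairwise loop is quadratic and only meant to show the comparison; for a tree the size of the kernel, sorting the paths by their case-folded form and comparing neighbours would scale better.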

* [PATCH] Make GIT_INDEX_FILE apply to git-commit
From: Remi Vanicat @ 2007-11-10 10:00 UTC
  To: git

Currently, when committing, git-commit ignores the value of
GIT_INDEX_FILE and always uses $GIT_DIR/index. This patch
fixes it.

Signed-off-by: Rémi Vanicat <vanicat@debian.org>
---
 git-commit.sh |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/git-commit.sh b/git-commit.sh
index fcb8443..6490045 100755
--- a/git-commit.sh
+++ b/git-commit.sh
@@ -26,7 +26,7 @@ refuse_partial () {
 }

 TMP_INDEX=
-THIS_INDEX="$GIT_DIR/index"
+THIS_INDEX="${GIT_INDEX_FILE:-$GIT_DIR/index}"
 NEXT_INDEX="$GIT_DIR/next-index$$"
 rm -f "$NEXT_INDEX"
 save_index () {
--
1.5.3.5
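
The `${GIT_INDEX_FILE:-$GIT_DIR/index}` expansion in the patch uses GIT_INDEX_FILE when it is set and non-empty and falls back to $GIT_DIR/index otherwise; the `:-` form treats an empty value like an unset one. As an illustration only, the same rule in C might look like the sketch below, where `index_path()` is a hypothetical helper and the caller is assumed to pass in the value of $GIT_DIR.

```c
#define _POSIX_C_SOURCE 200809L	/* for strdup(), setenv(), unsetenv() */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper mirroring ${GIT_INDEX_FILE:-$GIT_DIR/index}.
 * Like the shell's ":-", an empty GIT_INDEX_FILE counts as unset.
 * Returns a malloc'd path the caller must free, or NULL on failure.
 */
static char *index_path(const char *git_dir)
{
	const char *env = getenv("GIT_INDEX_FILE");
	char *path;
	size_t len;

	assert(git_dir != NULL);
	if (env && *env)
		return strdup(env);

	/* sizeof("/index") already counts the terminating NUL */
	len = strlen(git_dir) + sizeof("/index");
	path = malloc(len);
	if (path)
		snprintf(path, len, "%s/index", git_dir);
	return path;
}
```

Returning a freshly allocated string keeps both branches uniform, at the cost of requiring the caller to free() the result.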

* Re: `git-send-email' doesn't specify `Content-Type'
From: Brian Swetland @ 2007-11-10 10:14 UTC
  To: Johannes Schindelin; +Cc: Ludovic Courtès, git
In-Reply-To: <Pine.LNX.4.64.0711100052290.4362@racer.site>

[Johannes Schindelin <Johannes.Schindelin@gmx.de>]
> Hi,
> 
> On Sat, 10 Nov 2007, Ludovic Courtès wrote:
> 
> > Apparently, `git-send-email' doesn't specify the email's `Content-Type',
> > notably its charset, while it should really add something like:
> > 
> >   Content-Type: text/plain; charset=UTF-8
> > 
> > Or did I miss an option or something?
> 
> Apparently.  There was a thread some days ago, about that very issue.  
> Please find and read it.

The thread I found says that git-send-email should do the right thing if
there are non-ascii characters, but this does not seem to be the case
for me.

The example I have involves a coworker's name which needs non-ascii
characters.  They are properly escaped in the From: line generated by
git-format-patch.  git-send-email puts the generated From: line at the
top of the body of the email, unescapes it (to utf-8), and proceeds to
send the email with no Content-Type specified.

This behaviour is observed in 1.5.3.5.  A sample output from
git-format-patch follows, which demonstrates the problem:


From 3440baaed3b21138f6fc8b80e03769e3903f9c11 Mon Sep 17 00:00:00 2001
From: =?utf-8?q?Arve=20Hj=C3=B8nnev=C3=A5g?= <arve@android.com>
Date: Wed, 7 Nov 2007 22:51:44 -0800
Subject: [PATCH] hrtimer: Add timer back to pending list if it was reactivated and has already expired again.

This avoids problems with timer hardware that does not respond to timers set in the past.

Signed-off-by: Brian Swetland <swetland@android.com>
---
 kernel/hrtimer.c |   10 ++++++++--
 1 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/kernel/hrtimer.c b/kernel/hrtimer.c
index 22a2514..7c60769 100644
--- a/kernel/hrtimer.c
+++ b/kernel/hrtimer.c
@@ -1149,8 +1149,14 @@ static void run_hrtimer_softirq(struct softirq_action *h)
 			 * If the timer was rearmed on another CPU, reprogram
 			 * the event device.
 			 */
-			if (timer->base->first == &timer->node)
-				hrtimer_reprogram(timer, timer->base);
+			if (timer->base->first == &timer->node) {
+				if (hrtimer_reprogram(timer, timer->base)) {
+					__remove_hrtimer(timer, timer->base,
+							 HRTIMER_STATE_PENDING, 0);
+					list_add_tail(&timer->cb_entry,
+						      &cpu_base->cb_pending);
+				}
+			}
 		}
 	}
 	spin_unlock_irq(&cpu_base->lock);
-- 
1.5.3.5
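
The From: line in the sample uses an RFC 2047 "encoded-word" (=?utf-8?q?...?=). In the Q encoding, `=XX` is a hex-escaped byte and `_` stands for a space. A minimal C sketch of decoding the encoded text follows, together with a check for any byte outside 7-bit ASCII, which is the situation in which a body would need a header such as "Content-Type: text/plain; charset=UTF-8". Both helpers, `q_decode()` and `needs_charset()`, are hypothetical illustrations and not git-send-email code (which is written in Perl); parsing the =?charset?q? wrapper itself is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hexval(int c)
{
	if (c >= '0' && c <= '9') return c - '0';
	if (c >= 'A' && c <= 'F') return c - 'A' + 10;
	if (c >= 'a' && c <= 'f') return c - 'a' + 10;
	return -1;
}

/*
 * Decode the text part of a Q-encoded word (the "..." in
 * =?utf-8?q?...?=) into out, which must hold at least strlen(in) + 1
 * bytes, since decoding never makes the text longer.  Returns the
 * number of bytes written, excluding the terminating NUL.
 */
static size_t q_decode(const char *in, char *out)
{
	size_t n = 0;

	assert(in != NULL && out != NULL);
	while (*in) {
		int hi, lo;

		if (*in == '_') {
			out[n++] = ' ';
			in++;
		} else if (*in == '=' && (hi = hexval(in[1])) >= 0 &&
			   (lo = hexval(in[2])) >= 0) {
			out[n++] = (char)(hi * 16 + lo);
			in += 3;
		} else {
			out[n++] = *in++;	/* copy anything else verbatim */
		}
	}
	out[n] = '\0';
	return n;
}

/* Nonzero if any byte is outside 7-bit ASCII, i.e. a charset is needed. */
static int needs_charset(const char *s)
{
	for (; *s; s++)
		if ((unsigned char)*s > 0x7f)
			return 1;
	return 0;
}
```

Decoding "Arve=20Hj=C3=B8nnev=C3=A5g" this way yields the UTF-8 bytes of "Arve Hjønnevåg", and needs_charset() on that result is nonzero, which matches the report that the unescaped name ends up in the body without any charset being declared.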

* Re: [PATCH v3] user-manual: add advanced topic "bisecting merges"
From: Junio C Hamano @ 2007-11-10 10:36 UTC
  To: Steffen Prohaska; +Cc: git, Benoit Sigoure, Andreas Ericsson, Johannes Sixt
In-Reply-To: <119468808499-git-send-email-prohaska@zib.de>

Steffen Prohaska <prohaska@zib.de> writes:

> ...
> +A solution is to linearize the history by rebasing the lower
> +branch on top of the upper, instead of merging. There were no

Hmm.  When I wrote it, I did not mean this as a "solution", but
as an illustration of how merge-heavy and linear histories
affect bisectability.  So it is more like...

    On the other hand, if you did not merge at C but rebased the
    history between Z and B on top of A, you would have gotten this
    linear history [illustration here].  Bisecting between Z and
    D* would hit a single culprit commit Y* instead, and it tends
    to be easier to understand why that commit is broken.

For this reason, many experienced git users, even when they are
working on an otherwise merge-heavy project, keep the histories
linear by rebasing their work on top of public upstreams before
publishing (when able).  An extreme example: merges from a few
top-level lieutenants to Linus in the kernel, e.g. David Miller,
are known to _almost always_ fast-forward for Linus.

IOW, the description is meant to mildly encourage private rebasing
to keep the job of later bisecting (potentially by others) easier.
I realize I originally wrote as if C (merge) was made by the
same person as the person who ends up bisecting, but that is
not necessarily the case.  Keeping the history free of needless
merges tends to make _other_ people's lives simpler.

And after encouraging the private rebasing, I would continue
like...

    But if you already made a merge C instead of rebasing, all
    is not lost.  In the illustrated case, you can easily rebase
    one parent branch on top of the other after the fact, just
    to understand the history and to make the history more
    easily bisectable.  Even though published history should
    not be rewound without the consent of others in the project,
    nobody gets hurt if you rebase to create an alternate history
    privately.  After understanding the breakage and coming up
    with a fix on top of D*, you can discard that rebased
    history and apply the same fix on top of D, as D* and D
    should have identical trees.
