Be more careful about updating refs

git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* Be more careful about updating refs
@ 2008-01-15 23:50 Linus Torvalds
  2008-01-16  0:02 ` Linus Torvalds
                   ` (3 more replies)
  0 siblings, 4 replies; 36+ messages in thread
From: Linus Torvalds @ 2008-01-15 23:50 UTC (permalink / raw)
  To: Git Mailing List, Junio C Hamano


This makes write_ref_sha1() more careful: it actually checks the SHA1 of 
the ref it is updating, and refuses to update a ref with an object that it 
cannot find.

Perhaps more importantly, it also refuses to update a branch head with a 
non-commit object. I don't quite know *how* the stable series maintainers 
were able to corrupt their repository to have a HEAD that pointed to a tag 
rather than a commit object, but they did. Which results in a totally 
broken repository that cannot be cloned or committed on.

So make it harder for people to shoot themselves in the foot like that.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---

I'm signing off on this, but I hope people will double-check this: I 
didn't actually test it very much.

 refs.c |   19 +++++++++++++++++++
 1 files changed, 19 insertions(+), 0 deletions(-)

diff --git a/refs.c b/refs.c
index 58f6d17..c3ffe03 100644
--- a/refs.c
+++ b/refs.c
@@ -1119,10 +1119,16 @@ static int log_ref_write(const char *ref_name, const unsigned char *old_sha1,
 	return 0;
 }
 
+static int is_branch(const char *refname)
+{
+	return !strcmp(refname, "HEAD") || !prefixcmp(refname, "refs/heads/");
+}
+
 int write_ref_sha1(struct ref_lock *lock,
 	const unsigned char *sha1, const char *logmsg)
 {
 	static char term = '\n';
+	struct object *o;
 
 	if (!lock)
 		return -1;
@@ -1130,6 +1136,19 @@ int write_ref_sha1(struct ref_lock *lock,
 		unlock_ref(lock);
 		return 0;
 	}
+	o = parse_object(sha1);
+	if (!o) {
+		error("Trying to write ref %s with nonexistant object %s",
+			lock->ref_name, sha1_to_hex(sha1));
+		unlock_ref(lock);
+		return -1;
+	}
+	if (o->type != OBJ_COMMIT && is_branch(lock->ref_name)) {
+		error("Trying to write non-commit object %s to branch %s",
+			sha1_to_hex(sha1), lock->ref_name);
+		unlock_ref(lock);
+		return -1;
+	}
 	if (write_in_full(lock->lock_fd, sha1_to_hex(sha1), 40) != 40 ||
 	    write_in_full(lock->lock_fd, &term, 1) != 1
 		|| close(lock->lock_fd) < 0) {

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* Re: Be more careful about updating refs
  2008-01-15 23:50 Be more careful about updating refs Linus Torvalds
@ 2008-01-16  0:02 ` Linus Torvalds
  2008-01-16 19:52   ` Junio C Hamano
  2008-01-16  0:29 ` Junio C Hamano
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 36+ messages in thread
From: Linus Torvalds @ 2008-01-16  0:02 UTC (permalink / raw)
  To: Git Mailing List, Junio C Hamano



On Tue, 15 Jan 2008, Linus Torvalds wrote:
> 
> This makes write_ref_sha1() more careful: it actually checks the SHA1 of 
> the ref it is updating, and refuses to update a ref with an object that it 
> cannot find.

Side note: this breaks some tests, because those tests do things like

	git update-ref refs/heads/master 1111111111111111111111111111111111111111 &&
		test 1111111111111111111111111111111111111111 = $(cat .git/refs/heads/master)

which obviously won't work. I didn't update the tests, because I'm an evil 
person who just finds it very onerous to touch tests, and I particularly 
hate tests that turn out to be wrong.

(Pet peeve on mine: people fixing assert()'s by changing the source-code, 
without ever asking themselves whether maybe the assert itself was the 
bug).

		Linus

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Be more careful about updating refs
  2008-01-15 23:50 Be more careful about updating refs Linus Torvalds
  2008-01-16  0:02 ` Linus Torvalds
@ 2008-01-16  0:29 ` Junio C Hamano
  2008-01-16  0:42   ` Linus Torvalds
  2008-01-16  1:11 ` Junio C Hamano
  2008-01-23 22:53 ` Sam Vilain
  3 siblings, 1 reply; 36+ messages in thread
From: Junio C Hamano @ 2008-01-16  0:29 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git

Linus Torvalds <torvalds@linux-foundation.org> writes:

> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
> ---
>
> I'm signing off on this, but I hope people will double-check this: I 
> didn't actually test it very much.

Does "Signed-off-by:" line mean something different for you than
for other people these days?

I thought that the rule that applies to you (and only you) on
this list was that all patches from you are DCO certified and I
am free to forge your signature on them, and the other rule that
applies to everybody including you is that Signed-off-by: is
about DCO certification and not about anything else
(e.g. author's confidence level about the patch).

I am asking because you did not have S-o-b and did not say
anything about the other patch about commit object creation.
Although both patches looked sane to me.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Be more careful about updating refs
  2008-01-16  0:29 ` Junio C Hamano
@ 2008-01-16  0:42   ` Linus Torvalds
  0 siblings, 0 replies; 36+ messages in thread
From: Linus Torvalds @ 2008-01-16  0:42 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git



On Tue, 15 Jan 2008, Junio C Hamano wrote:
> 
> Does "Signed-off-by:" line mean something different for you than
> for other people these days?

I intentionally don't tend to sign off stuff that I just send out as trial 
balloons, not meant to "really" be applied.

Of course, then I *also* forget to sign off emails that I do intend to be 
applied, so that rule-of-thumb is pretty weak ;)

		Linus

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Be more careful about updating refs
  2008-01-15 23:50 Be more careful about updating refs Linus Torvalds
  2008-01-16  0:02 ` Linus Torvalds
  2008-01-16  0:29 ` Junio C Hamano
@ 2008-01-16  1:11 ` Junio C Hamano
  2008-01-23 22:53 ` Sam Vilain
  3 siblings, 0 replies; 36+ messages in thread
From: Junio C Hamano @ 2008-01-16  1:11 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Git Mailing List

Linus Torvalds <torvalds@linux-foundation.org> writes:

> This makes write_ref_sha1() more careful: it actually checks the SHA1 of 
> the ref it is updating, and refuses to update a ref with an object that it 
> cannot find.
>
> Perhaps more importantly, it also refuses to update a branch head with a 
> non-commit object. I don't quite know *how* the stable series maintainers 
> were able to corrupt their repository to have a HEAD that pointed to a tag 
> rather than a commit object, but they did. Which results in a totally 
> broken repository that cannot be cloned or committed on.

Two questions and a comment:

 - Do we want to impose the same restriction on refs/remotes/?
   I think that is a logical thing to do.

 - What should the receive-pack and git-fetch do if they trigger
   the check in this codepath while updating the refs under the
   affected hierarchies?  Fail the push and fetch?

 - I think !strcmp(refname, "HEAD") is not quite a right check
   to do here.  In order to catch the detached head case, it
   must be checked, but at the same time if the head is not
   detached, it should look at where the symref points at
   (i.e. a symref HEAD that points outside refs/heads is an
   error, and we need to catch that).

> +static int is_branch(const char *refname)
> +{
> +	return !strcmp(refname, "HEAD") || !prefixcmp(refname, "refs/heads/");
> +}

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Be more careful about updating refs
  2008-01-16  0:02 ` Linus Torvalds
@ 2008-01-16 19:52   ` Junio C Hamano
  2008-01-17  9:15     ` Charles Bailey
  0 siblings, 1 reply; 36+ messages in thread
From: Junio C Hamano @ 2008-01-16 19:52 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Git Mailing List

Linus Torvalds <torvalds@linux-foundation.org> writes:

> On Tue, 15 Jan 2008, Linus Torvalds wrote:
>> 
>> This makes write_ref_sha1() more careful: it actually checks the SHA1 of 
>> the ref it is updating, and refuses to update a ref with an object that it 
>> cannot find.
>
> Side note: this breaks some tests, because those tests do things like
>
> 	git update-ref refs/heads/master 1111111111111111111111111111111111111111 &&
> 		test 1111111111111111111111111111111111111111 = $(cat .git/refs/heads/master)
>
> ...
> (Pet peeve on mine: people fixing assert()'s by changing the source-code, 
> without ever asking themselves whether maybe the assert itself was the 
> bug).

The rules for the plumbing used to be that refs can point at
anything that get_sha1() accepts.  We did not even require it to
be parse_object() happy let alone it being parse_commit() kosher.

You changed the world order.  I agree that the world order was
changed in a good way, but saying that the original test did not
check the right thing or it was a bug is not quite fair.  At
worst, we can say that it was very sloppily written by assuming
that the commands involved in the particular test would not care
about corrupted repositories whose refs point at nonexistant
bogus objects.

I'll squash the following to your patch.

---
 t/t1400-update-ref.sh |   19 +++++++++++++------
 1 files changed, 13 insertions(+), 6 deletions(-)

diff --git a/t/t1400-update-ref.sh b/t/t1400-update-ref.sh
index a90824b..71ab2dd 100755
--- a/t/t1400-update-ref.sh
+++ b/t/t1400-update-ref.sh
@@ -7,12 +7,19 @@ test_description='Test git update-ref and basic ref logging'
 . ./test-lib.sh
 
 Z=0000000000000000000000000000000000000000
-A=1111111111111111111111111111111111111111
-B=2222222222222222222222222222222222222222
-C=3333333333333333333333333333333333333333
-D=4444444444444444444444444444444444444444
-E=5555555555555555555555555555555555555555
-F=6666666666666666666666666666666666666666
+
+test_expect_success setup '
+
+	for name in A B C D E F
+	do
+		test_tick &&
+		T=$(git write-tree) &&
+		sha1=$(echo $name | git commit-tree $T) &&
+		eval $name=$sha1
+	done
+
+'
+
 m=refs/heads/master
 n_dir=refs/heads/gu
 n=$n_dir/fixes

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* Re: Be more careful about updating refs
  2008-01-16 19:52   ` Junio C Hamano
@ 2008-01-17  9:15     ` Charles Bailey
  2008-01-17 10:52       ` Johannes Sixt
  2008-01-17 10:56       ` Charles Bailey
  0 siblings, 2 replies; 36+ messages in thread
From: Charles Bailey @ 2008-01-17  9:15 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Linus Torvalds, Git Mailing List

On Wed, Jan 16, 2008 at 11:52:43AM -0800, Junio C Hamano wrote:
> Linus Torvalds <torvalds@linux-foundation.org> writes:
> 
> > On Tue, 15 Jan 2008, Linus Torvalds wrote:
> >> 
> >> This makes write_ref_sha1() more careful: it actually checks the SHA1 of 
> >> the ref it is updating, and refuses to update a ref with an object that it 
> >> cannot find.
> >
> > Side note: this breaks some tests, because those tests do things like
> >
> > 	git update-ref refs/heads/master 1111111111111111111111111111111111111111 &&
> > 		test 1111111111111111111111111111111111111111 = $(cat .git/refs/heads/master)
> >
> > ...
> > (Pet peeve on mine: people fixing assert()'s by changing the source-code, 
> > without ever asking themselves whether maybe the assert itself was the 
> > bug).
> 
> The rules for the plumbing used to be that refs can point at
> anything that get_sha1() accepts.  We did not even require it to
> be parse_object() happy let alone it being parse_commit() kosher.
> 
> You changed the world order.  I agree that the world order was
> changed in a good way, but saying that the original test did not
> check the right thing or it was a bug is not quite fair.  At
> worst, we can say that it was very sloppily written by assuming
> that the commands involved in the particular test would not care
> about corrupted repositories whose refs point at nonexistant
> bogus objects.
> 
> I'll squash the following to your patch.

I'm assuming that this original patch and the test update turned into
the following commit in master:

c3b0dec509fe136c5417422f31898b5a4e2d5e02 is first bad commit

I just thought I should warn you that this seems (git bisect tells me
so) to have caused a failure in t9301-fast-export.sh on my Mac OS X
10.4.11 machine although I haven't yet had the time to investigate
why.

Charles.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Be more careful about updating refs
  2008-01-17  9:15     ` Charles Bailey
@ 2008-01-17 10:52       ` Johannes Sixt
  2008-01-17 11:01         ` Charles Bailey
  2008-01-17 10:56       ` Charles Bailey
  1 sibling, 1 reply; 36+ messages in thread
From: Johannes Sixt @ 2008-01-17 10:52 UTC (permalink / raw)
  To: Charles Bailey; +Cc: Junio C Hamano, Linus Torvalds, Git Mailing List

Charles Bailey schrieb:
> I'm assuming that this original patch and the test update turned into
> the following commit in master:
> 
> c3b0dec509fe136c5417422f31898b5a4e2d5e02 is first bad commit
> 
> I just thought I should warn you that this seems (git bisect tells me
> so) to have caused a failure in t9301-fast-export.sh on my Mac OS X
> 10.4.11 machine although I haven't yet had the time to investigate
> why.

I observed the same (on Windows). The reason is that above-mentioned
commit introduces a call to parse_objects(). But by the time that
fast-import calls write_ref_sha1() (and, hence, this new parse_objects())
it has not yet written a pack file, and parse_objects() fails. I don't
have a clue how to fix this short of reverting the commit.

-- Hannes

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Be more careful about updating refs
  2008-01-17  9:15     ` Charles Bailey
  2008-01-17 10:52       ` Johannes Sixt
@ 2008-01-17 10:56       ` Charles Bailey
  1 sibling, 0 replies; 36+ messages in thread
From: Charles Bailey @ 2008-01-17 10:56 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Linus Torvalds, Git Mailing List

On Thu, Jan 17, 2008 at 09:15:58AM +0000, Charles Bailey wrote:
> 
> c3b0dec509fe136c5417422f31898b5a4e2d5e02 is first bad commit
> 
> I just thought I should warn you that this seems (git bisect tells me
> so) to have caused a failure in t9301-fast-export.sh on my Mac OS X
> 10.4.11 machine although I haven't yet had the time to investigate
> why.
> 

Further to this, the first problem is that the 'fast-export |
fast-import' test is failing.

As far as I can tell, fast-export is behaving no differently, the
generated export file looks identical in shape.  It seems that
fast-import is creating a bad temporary packfile for itself.

The error that I'm getting is:

fatal: unknown object type 0 in
.git/objects/pack/pack-6be2b92c2d7485fd5cefbb0d9a68827e4c23d548.pack
fast-import: dumping crash report to .git/fast_import_crash_12317

which seems to come from the unpack_entry function in sha1_file.c.

I haven't got much deeper into how the refs.c change has caused the
issue, or why on earth it seems to be a Mac only issue.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Be more careful about updating refs
  2008-01-17 10:52       ` Johannes Sixt
@ 2008-01-17 11:01         ` Charles Bailey
  2008-01-17 12:41           ` Johannes Sixt
  0 siblings, 1 reply; 36+ messages in thread
From: Charles Bailey @ 2008-01-17 11:01 UTC (permalink / raw)
  To: Johannes Sixt; +Cc: Junio C Hamano, Linus Torvalds, Git Mailing List

On Thu, Jan 17, 2008 at 11:52:23AM +0100, Johannes Sixt wrote:
> 
> I observed the same (on Windows). The reason is that above-mentioned
> commit introduces a call to parse_objects(). But by the time that
> fast-import calls write_ref_sha1() (and, hence, this new parse_objects())
> it has not yet written a pack file, and parse_objects() fails. I don't
> have a clue how to fix this short of reverting the commit.
> 

OK, so it's not just Mac OS X, then.  From your description and my
initial poke at the code,  I can't immediately see a good reason why
the test should succeed on Linux, though.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Be more careful about updating refs
  2008-01-17 11:01         ` Charles Bailey
@ 2008-01-17 12:41           ` Johannes Sixt
  2008-01-17 12:58             ` Johannes Schindelin
  2008-01-18  1:43             ` Junio C Hamano
  0 siblings, 2 replies; 36+ messages in thread
From: Johannes Sixt @ 2008-01-17 12:41 UTC (permalink / raw)
  To: Charles Bailey
  Cc: Junio C Hamano, Linus Torvalds, Git Mailing List, Shawn O. Pearce

Charles Bailey schrieb:
> On Thu, Jan 17, 2008 at 11:52:23AM +0100, Johannes Sixt wrote:
>> I observed the same (on Windows). The reason is that above-mentioned
>> commit introduces a call to parse_objects(). But by the time that
>> fast-import calls write_ref_sha1() (and, hence, this new parse_objects())
>> it has not yet written a pack file, and parse_objects() fails. I don't
>> have a clue how to fix this short of reverting the commit.
>>
> 
> OK, so it's not just Mac OS X, then.  From your description and my
> initial poke at the code,  I can't immediately see a good reason why
> the test should succeed on Linux, though.

My analysis is not correct. The pack file is present, but it seems to be
incomplete.

The reason is the NO_MMAP build flag. If you compile with
NO_MMAP=YesPlease on Linux, t9301 fails as well. Does this ring a bell?

-- Hannes

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Be more careful about updating refs
  2008-01-17 12:41           ` Johannes Sixt
@ 2008-01-17 12:58             ` Johannes Schindelin
  2008-01-17 13:07               ` Charles Bailey
  2008-01-18  1:43             ` Junio C Hamano
  1 sibling, 1 reply; 36+ messages in thread
From: Johannes Schindelin @ 2008-01-17 12:58 UTC (permalink / raw)
  To: Johannes Sixt
  Cc: Charles Bailey, Junio C Hamano, Linus Torvalds, Git Mailing List,
	Shawn O. Pearce

Hi,

On Thu, 17 Jan 2008, Johannes Sixt wrote:

> Charles Bailey schrieb:
> > On Thu, Jan 17, 2008 at 11:52:23AM +0100, Johannes Sixt wrote:
> >> I observed the same (on Windows). The reason is that above-mentioned
> >> commit introduces a call to parse_objects(). But by the time that
> >> fast-import calls write_ref_sha1() (and, hence, this new parse_objects())
> >> it has not yet written a pack file, and parse_objects() fails. I don't
> >> have a clue how to fix this short of reverting the commit.
> >>
> > 
> > OK, so it's not just Mac OS X, then.  From your description and my
> > initial poke at the code,  I can't immediately see a good reason why
> > the test should succeed on Linux, though.
> 
> My analysis is not correct. The pack file is present, but it seems to be
> incomplete.
> 
> The reason is the NO_MMAP build flag. If you compile with
> NO_MMAP=YesPlease on Linux, t9301 fails as well. Does this ring a bell?

Just a wild guess... Could it be that the mmap() happened before contents 
were written to that fd?  The faked mmap() is just a malloc() && pread(), 
and would thus miss out on changes written after the mmap() call.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Be more careful about updating refs
  2008-01-17 12:58             ` Johannes Schindelin
@ 2008-01-17 13:07               ` Charles Bailey
  0 siblings, 0 replies; 36+ messages in thread
From: Charles Bailey @ 2008-01-17 13:07 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: Johannes Sixt, Junio C Hamano, Linus Torvalds, Git Mailing List,
	Shawn O. Pearce

On Thu, Jan 17, 2008 at 12:58:07PM +0000, Johannes Schindelin wrote:
> 
> Just a wild guess... Could it be that the mmap() happened before contents 
> were written to that fd?  The faked mmap() is just a malloc() && pread(), 
> and would thus miss out on changes written after the mmap() call.

This would be my guess too.  As far as I can tell fast-import is
creating objects directly in a packfile.  I'm guessing here, but the
new checking code is probably being triggered to check (for example)
the parent ref for commits that fast-import is creating and the check
only works if mmap can successfully examine objects in the
still-being-written packfile.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Be more careful about updating refs
  2008-01-17 12:41           ` Johannes Sixt
  2008-01-17 12:58             ` Johannes Schindelin
@ 2008-01-18  1:43             ` Junio C Hamano
  2008-01-18  2:01               ` Junio C Hamano
  2008-01-18  2:30               ` Be more careful about updating refs Shawn O. Pearce
  1 sibling, 2 replies; 36+ messages in thread
From: Junio C Hamano @ 2008-01-18  1:43 UTC (permalink / raw)
  To: Johannes Sixt, Shawn O. Pearce
  Cc: Charles Bailey, Linus Torvalds, Git Mailing List

Johannes Sixt <j.sixt@viscovery.net> writes:

> Charles Bailey schrieb:
>> On Thu, Jan 17, 2008 at 11:52:23AM +0100, Johannes Sixt wrote:
>>> I observed the same (on Windows). The reason is that above-mentioned
>>> commit introduces a call to parse_objects(). But by the time that
>>> fast-import calls write_ref_sha1() (and, hence, this new parse_objects())
>>> it has not yet written a pack file, and parse_objects() fails. I don't
>>> have a clue how to fix this short of reverting the commit.
>>>
>> 
>> OK, so it's not just Mac OS X, then.  From your description and my
>> initial poke at the code,  I can't immediately see a good reason why
>> the test should succeed on Linux, though.
>
> My analysis is not correct. The pack file is present, but it seems to be
> incomplete.
>
> The reason is the NO_MMAP build flag. If you compile with
> NO_MMAP=YesPlease on Linux, t9301 fails as well. Does this ring a bell?

I ran strace and found that fast-import retains three windows to
the same data that was opened while the pack was still being
built (i.e. the filename is still tmp_pack_XXXXXX) when it dies:

    {next = 0x6f6d20, base = 0x6f77a0 "PACK", offset = 0, len = 907,
     last_used = 9, inuse_cnt = 0}
    {next = 0x728630, base = 0x6f7160 "PACK", offset = 0, len = 500, 
      last_used = 5, inuse_cnt = 0}
    {next = 0x0, base = 0x6f6d50 "PACK", offset = 0, len = 261, 
      last_used = 1, inuse_cnt = 0}

When it hits end_packfile() to install the built-pack to its
final destination and makes it available to the rest of git,
however, it has this clever code to reuse the still open window.

The object offset asked when it dies is at offset 887.  However,
tmp_pack was written after that last window with length 907 has
been (re)opened, and reusing the window results in reading a
stale data.

The following patch seems to fix the issue for me, but this is
primarily meant for discussion, as I do not quite understand why
the same issue does not manifest itself when NO_MMAP is not
used.

diff --git a/fast-import.c b/fast-import.c
index 3609c24..bd0ddb1 100644
--- a/fast-import.c
+++ b/fast-import.c
@@ -926,7 +926,13 @@ static void end_packfile(void)
 		new_p = add_packed_git(idx_name, strlen(idx_name), 1);
 		if (!new_p)
 			die("core git rejected index %s", idx_name);
-		new_p->windows = old_p->windows;
+		while (old_p->windows) {
+			struct pack_window *w = old_p->windows;
+			munmap(w->base, w->len);
+			old_p->windows = w->next;
+			free(w);
+		}
+		new_p->windows = NULL;
 		all_packs[pack_id] = new_p;
 		install_packed_git(new_p);
 

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* Re: Be more careful about updating refs
  2008-01-18  1:43             ` Junio C Hamano
@ 2008-01-18  2:01               ` Junio C Hamano
  2008-01-18  2:13                 ` Shawn O. Pearce
  2008-01-18  2:30               ` Be more careful about updating refs Shawn O. Pearce
  1 sibling, 1 reply; 36+ messages in thread
From: Junio C Hamano @ 2008-01-18  2:01 UTC (permalink / raw)
  To: Shawn O. Pearce, Johannes Sixt
  Cc: Charles Bailey, Linus Torvalds, Git Mailing List

Junio C Hamano <gitster@pobox.com> writes:

> The following patch seems to fix the issue for me, but this is
> primarily meant for discussion, as I do not quite understand why
> the same issue does not manifest itself when NO_MMAP is not
> used.
> ...

BTW, the lookup of the object that dies is in update_branch().
The call to write_ref_sha1() at the last tries to verify b->sha1
is available and is a commit.

        static int update_branch(struct branch *b)
        {
                static const char *msg = "fast-import";
                struct ref_lock *lock;
                unsigned char old_sha1[20];

                if (read_ref(b->name, old_sha1))
                        hashclr(old_sha1);
                lock = lock_any_ref_for_update(b->name, old_sha1, 0);
                if (!lock)
                        return error("Unable to lock %s", b->name);
                if (!force_update && !is_null_sha1(old_sha1)) {
                        struct commit *old_cmit, *new_cmit;

                        old_cmit = lookup_commit_reference_gently(old_sha1, 0);
                        new_cmit = lookup_commit_reference_gently(b->sha1, 0);
                        if (!old_cmit || !new_cmit) {
                                unlock_ref(lock);
                                return error("Branch %s is missing commits.", b->name);
                        }

                        if (!in_merge_bases(old_cmit, &new_cmit, 1)) {
                                unlock_ref(lock);
                                warning("Not updating %s"
                                        " (new tip %s does not contain %s)",
                                        b->name, sha1_to_hex(b->sha1), sha1_to_hex(old_sha1));
                                return -1;
                        }
                }
                if (write_ref_sha1(lock, b->sha1, msg) < 0)
                        return error("Unable to update %s", b->name);
                return 0;
        }

The write_ref_sha1() function previously did not even check if
the object pointed at by b->sha1 actually existed, but now it
parses the object and makes sure it is a commit, if the updated
ref is under ref/heads/ hierarchy.

What I do not quite understand is how this can be a new issue.

The codepath to allow updating an existing branch shown above
(i.e. "if it is not force and old is not NULL") uses the usual
lookup_commit_reference_gently() interface to access b->sha1,
and does not use gfi-aware gfi_unpack_entry() or anything
magical, which means it would have passed the same codepath down
to trigger the same issue.  IOW, even before this tightening of
write_ref_sha1(), we already should have had the issue of not
being to able to grab the object b->sha1 refers to out of the
newly built packfile.

Is it just nobody seriously exercised the codepath yet, or is
there a difference between these two calls that is more subtle
than that?

Quite confused...

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Be more careful about updating refs
  2008-01-18  2:01               ` Junio C Hamano
@ 2008-01-18  2:13                 ` Shawn O. Pearce
  2008-01-18  2:25                   ` Junio C Hamano
  0 siblings, 1 reply; 36+ messages in thread
From: Shawn O. Pearce @ 2008-01-18  2:13 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Johannes Sixt, Charles Bailey, Linus Torvalds, Git Mailing List

Junio C Hamano <gitster@pobox.com> wrote:
> Junio C Hamano <gitster@pobox.com> writes:
> 
> > The following patch seems to fix the issue for me, but this is
> > primarily meant for discussion, as I do not quite understand why
> > the same issue does not manifest itself when NO_MMAP is not
> > used.
> > ...
...
> What I do not quite understand is how this can be a new issue.

It isn't.  It is indeed a pretty old issue, as far as git issues go.
Its probably about as old as fast-import being accepted into your
tree is.
 
> The codepath to allow updating an existing branch shown above
> (i.e. "if it is not force and old is not NULL") uses the usual
> lookup_commit_reference_gently() interface to access b->sha1,
> and does not use gfi-aware gfi_unpack_entry() or anything
> magical, which means it would have passed the same codepath down
> to trigger the same issue.  IOW, even before this tightening of
> write_ref_sha1(), we already should have had the issue of not
> being to able to grab the object b->sha1 refers to out of the
> newly built packfile.
> 
> Is it just nobody seriously exercised the codepath yet, or is
> there a difference between these two calls that is more subtle
> than that?

I think the problem is nobody has tested fast-import updating an
existing ref while using NO_MMAP.  Or if they did, they didn't report
the problem as they didn't figure they needed fast-import that badly.

Updating an existing ref is not a common operation, but the test
suite does test for it.  So it must be the NO_MMAP configuration
is simply not being tested well enough.

-- 
Shawn.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Be more careful about updating refs
  2008-01-18  2:13                 ` Shawn O. Pearce
@ 2008-01-18  2:25                   ` Junio C Hamano
  2008-01-18  2:33                     ` Shawn O. Pearce
  0 siblings, 1 reply; 36+ messages in thread
From: Junio C Hamano @ 2008-01-18  2:25 UTC (permalink / raw)
  To: Shawn O. Pearce
  Cc: Johannes Sixt, Charles Bailey, Linus Torvalds, Git Mailing List

"Shawn O. Pearce" <spearce@spearce.org> writes:

> I think the problem is nobody has tested fast-import updating an
> existing ref while using NO_MMAP.  Or if they did, they didn't report
> the problem as they didn't figure they needed fast-import that badly.
>
> Updating an existing ref is not a common operation, but the test
> suite does test for it.  So it must be the NO_MMAP configuration
> is simply not being tested well enough.

Ok, thanks.

Now a more important question is how we would properly fix this
issue?

I suspect that fast-import is the only one that opens windows
into an unfinalized pack, and if that is the case, it would be
the only program that may be hit by the issue of mmap emulation
getting stale data.

I do not think the patch I posted was correct at all.

Especially, I am not sure if the issue only exists at the
end_packfile() boundary.  Don't we have the same issue reading
from the packfile being built, and isn't the only reason my hack
works it around is because access patterns of the testsuite
happens to not trigger it?

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Be more careful about updating refs
  2008-01-18  1:43             ` Junio C Hamano
  2008-01-18  2:01               ` Junio C Hamano
@ 2008-01-18  2:30               ` Shawn O. Pearce
  1 sibling, 0 replies; 36+ messages in thread
From: Shawn O. Pearce @ 2008-01-18  2:30 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Johannes Sixt, Charles Bailey, Linus Torvalds, Git Mailing List

Junio C Hamano <gitster@pobox.com> wrote:
> I ran strace and found that fast-import retains three windows to
> the same data that was opened while the pack was still being
> built (i.e. the filename is still tmp_pack_XXXXXX) when it dies:
> 
>     {next = 0x6f6d20, base = 0x6f77a0 "PACK", offset = 0, len = 907,
>      last_used = 9, inuse_cnt = 0}
>     {next = 0x728630, base = 0x6f7160 "PACK", offset = 0, len = 500, 
>       last_used = 5, inuse_cnt = 0}
>     {next = 0x0, base = 0x6f6d50 "PACK", offset = 0, len = 261, 
>       last_used = 1, inuse_cnt = 0}

Ouch.  Hmm, well, maybe it shouldn't keep three windows open to
the same part of the file, but with different lengths.  :)

But that's another issue.

> The following patch seems to fix the issue for me, but this is
> primarily meant for discussion,

I agree the patch should fix this particular issue and the
performance difference it will cause is minor enough to not be
worth discussing further.  But I have one minor comment (please
see it below).

> as I do not quite understand why
> the same issue does not manifest itself when NO_MMAP is not
> used.

When NO_MMAP is off (we are actually using the OS's true mmap)
the data is updated into the window when the file is written,
as the window is backed by the OS's filesystem cache.

If the data access is outside of the window (past the offset of
the longest window, here 907) we would just open a new window to
the relevant region of the file.  But when it is inside the window,
the window being backed by the filesystem cache saved us.  In the
case of NO_MMAP this doesn't work, so we get a failure later during
object access.

In particular if you look at gfi_unpack_entry() (which is where
we cause windows to be opened on the file being created) we tell
sha1_file.c it has 20 bytes available in the window *beyond* the
actual end of the file.  This is due to an assumption within the
windowing code of sha1_file.c that we always have the packfile
footer at the end of the file, so all windowing code assumes it
has at least 20 bytes from the start of a window that it can access
without needing to perform additional bounds checking.

So what's happening here is we are adding objects into the file,
some of which may have their data appearing in the trailing 20 bytes
of some prior window.  If that happens the cached window looks like
it can be used to access the data, but it really cannot be in the
NO_MMAP case as those trailing 20 bytes of the window are all \0's.

The astute reader may wonder how gfi_unpack_entry() works when
NO_MMAP is being used.  It doesn't for that last 20 bytes of
any window.  Which has me thinking that Junio's patch alone isn't
enough to make fast-import work correctly under NO_MMAP.

> diff --git a/fast-import.c b/fast-import.c
> index 3609c24..bd0ddb1 100644
> --- a/fast-import.c
> +++ b/fast-import.c
> @@ -926,7 +926,13 @@ static void end_packfile(void)
>  		new_p = add_packed_git(idx_name, strlen(idx_name), 1);
>  		if (!new_p)
>  			die("core git rejected index %s", idx_name);
> -		new_p->windows = old_p->windows;
> +		while (old_p->windows) {
> +			struct pack_window *w = old_p->windows;
> +			munmap(w->base, w->len);
> +			old_p->windows = w->next;
> +			free(w);
> +		}
> +		new_p->windows = NULL;

This assignment of NULL should not be necessary.  add_packed_git()
already does this for us on new_p so we do not need to repeat it
here in fast-import.

>  		all_packs[pack_id] = new_p;
>  		install_packed_git(new_p);
>  

-- 
Shawn.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Be more careful about updating refs
  2008-01-18  2:25                   ` Junio C Hamano
@ 2008-01-18  2:33                     ` Shawn O. Pearce
  2008-01-18  2:58                       ` Shawn O. Pearce
  0 siblings, 1 reply; 36+ messages in thread
From: Shawn O. Pearce @ 2008-01-18  2:33 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Johannes Sixt, Charles Bailey, Linus Torvalds, Git Mailing List

Junio C Hamano <gitster@pobox.com> wrote:
> "Shawn O. Pearce" <spearce@spearce.org> writes:
> 
> > I think the problem is nobody has tested fast-import updating an
> > existing ref while using NO_MMAP.  Or if they did, they didn't report
> > the problem as they didn't figure they needed fast-import that badly.
> >
> > Updating an existing ref is not a common operation, but the test
> > suite does test for it.  So it must be the NO_MMAP configuration
> > is simply not being tested well enough.
> 
> Now a more important question is how we would properly fix this
> issue?
> 
> I suspect that fast-import is the only one that opens windows
> into an unfinalized pack, and if that is the case, it would be
> the only program that may be hit by the issue of mmap emulation
> getting stale data.

Yes, it is the only program that is foolish enough to access a
partial packfile.  index-pack uses pread(), and only after it has
the entire packfile downloaded to the local system.  The only other
reader of a partial packfile is unpack-objects, and obviously that
doesn't care about seeking backwards within the packfile.

So its probably the only user who suffers from the mmap emulation.

> I do not think the patch I posted was correct at all.
> 
> Especially, I am not sure if the issue only exists at the
> end_packfile() boundary.  Don't we have the same issue reading
> from the packfile being built, and isn't the only reason my hack
> works it around is because access patterns of the testsuite
> happens to not trigger it?

Yes, that's my take on it as well (see my other email).  The
testsuite must just be really lucky that its not hitting the
boundary condition.

I almost said gfi_unpack_entry() was immune from this bug, but
I went back and read the code again and determined that it does
in fact suffer from this under NO_MMAP, and we're just really
damn lucky nobody has caused it.

I'll try to work up a patch this evening.

-- 
Shawn.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Be more careful about updating refs
  2008-01-18  2:33                     ` Shawn O. Pearce
@ 2008-01-18  2:58                       ` Shawn O. Pearce
  2008-01-18  3:18                         ` Shawn O. Pearce
  0 siblings, 1 reply; 36+ messages in thread
From: Shawn O. Pearce @ 2008-01-18  2:58 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Johannes Sixt, Charles Bailey, Linus Torvalds, Git Mailing List

"Shawn O. Pearce" <spearce@spearce.org> wrote:
> Junio C Hamano <gitster@pobox.com> wrote:
> > 
> > Especially, I am not sure if the issue only exists at the
> > end_packfile() boundary.  Don't we have the same issue reading
> > from the packfile being built, and isn't the only reason my hack
> > works it around is because access patterns of the testsuite
> > happens to not trigger it?
> 
> Yes, that's my take on it as well (see my other email).  The
> testsuite must just be really lucky that its not hitting the
> boundary condition.
> 
> I almost said gfi_unpack_entry() was immune from this bug, but
> I went back and read the code again and determined that it does
> in fact suffer from this under NO_MMAP, and we're just really
> damn lucky nobody has caused it.

I think this solves the problem.  Its based on your first patch, but
would replace it.  The trick here is we close the cached windows if
we are accessing data from the packfile we are appending into and we
have increased the file length.  This way we don't blow away windows
during high read/low write periods, like during branch cache reloads.

--8>--
diff --git a/fast-import.c b/fast-import.c
index 3609c24..8e7747c 100644
--- a/fast-import.c
+++ b/fast-import.c
@@ -907,6 +907,16 @@ static void unkeep_all_packs(void)
 	}
 }
 
+static void close_all_windows(struct packed_git *p)
+{
+	while (p->windows) {
+		struct pack_window *w = p->windows;
+		munmap(w->base, w->len);
+		p->windows = w->next;
+		free(w);
+	}
+}
+
 static void end_packfile(void)
 {
 	struct packed_git *old_p = pack_data, *new_p;
@@ -917,6 +927,7 @@ static void end_packfile(void)
 		struct branch *b;
 		struct tag *t;
 
+		close_all_windows(pack_data);
 		fixup_pack_header_footer(pack_data->pack_fd, pack_data->sha1,
 				    pack_data->pack_name, object_count);
 		close(pack_data->pack_fd);
@@ -926,7 +937,6 @@ static void end_packfile(void)
 		new_p = add_packed_git(idx_name, strlen(idx_name), 1);
 		if (!new_p)
 			die("core git rejected index %s", idx_name);
-		new_p->windows = old_p->windows;
 		all_packs[pack_id] = new_p;
 		install_packed_git(new_p);
 
@@ -1129,8 +1139,10 @@ static void *gfi_unpack_entry(
 {
 	enum object_type type;
 	struct packed_git *p = all_packs[oe->pack_id];
-	if (p == pack_data)
+	if (p == pack_data && p->pack_size < (pack_size + 20)) {
+		close_all_windows(p);
 		p->pack_size = pack_size + 20;
+	}
 	return unpack_entry(p, oe->offset, &type, sizep);
 }
 

-- 
Shawn.

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* Re: Be more careful about updating refs
  2008-01-18  2:58                       ` Shawn O. Pearce
@ 2008-01-18  3:18                         ` Shawn O. Pearce
  2008-01-18  3:22                           ` Shawn O. Pearce
  0 siblings, 1 reply; 36+ messages in thread
From: Shawn O. Pearce @ 2008-01-18  3:18 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Johannes Sixt, Charles Bailey, Linus Torvalds, Git Mailing List

"Shawn O. Pearce" <spearce@spearce.org> wrote:
> "Shawn O. Pearce" <spearce@spearce.org> wrote:
> > Junio C Hamano <gitster@pobox.com> wrote:
> > > 
> > > Especially, I am not sure if the issue only exists at the
> > > end_packfile() boundary.  Don't we have the same issue reading
> > > from the packfile being built, and isn't the only reason my hack
> > > works it around is because access patterns of the testsuite
> > > happens to not trigger it?
> > 
> > Yes, that's my take on it as well (see my other email).  The
> > testsuite must just be really lucky that its not hitting the
> > boundary condition.
> > 
> > I almost said gfi_unpack_entry() was immune from this bug, but
> > I went back and read the code again and determined that it does
> > in fact suffer from this under NO_MMAP, and we're just really
> > damn lucky nobody has caused it.
> 
> I think this solves the problem.  Its based on your first patch, but
> would replace it.  The trick here is we close the cached windows if
> we are accessing data from the packfile we are appending into and we
> have increased the file length.  This way we don't blow away windows
> during high read/low write periods, like during branch cache reloads.

Junio pointed out my first attempt at this didn't update the
memory pressure values, so we could "run out of memory" even
if we had plenty free.

Try #2...
 
--8>--
diff --git a/cache.h b/cache.h
index 24735bd..15388aa 100644
--- a/cache.h
+++ b/cache.h
@@ -561,6 +561,7 @@ extern struct packed_git *find_sha1_pack(const unsigned char *sha1,
 extern void pack_report(void);
 extern int open_pack_index(struct packed_git *);
 extern unsigned char* use_pack(struct packed_git *, struct pack_window **, off_t, unsigned int *);
+extern void close_pack_windows(struct packed_git *, int);
 extern void unuse_pack(struct pack_window **);
 extern struct packed_git *add_packed_git(const char *, int, int);
 extern const unsigned char *nth_packed_object_sha1(struct packed_git *, uint32_t);
diff --git a/fast-import.c b/fast-import.c
index 3609c24..82c82ce 100644
--- a/fast-import.c
+++ b/fast-import.c
@@ -917,6 +917,7 @@ static void end_packfile(void)
 		struct branch *b;
 		struct tag *t;
 
+		close_pack_windows(pack_data, 0);
 		fixup_pack_header_footer(pack_data->pack_fd, pack_data->sha1,
 				    pack_data->pack_name, object_count);
 		close(pack_data->pack_fd);
@@ -926,7 +927,6 @@ static void end_packfile(void)
 		new_p = add_packed_git(idx_name, strlen(idx_name), 1);
 		if (!new_p)
 			die("core git rejected index %s", idx_name);
-		new_p->windows = old_p->windows;
 		all_packs[pack_id] = new_p;
 		install_packed_git(new_p);
 
@@ -1129,8 +1129,10 @@ static void *gfi_unpack_entry(
 {
 	enum object_type type;
 	struct packed_git *p = all_packs[oe->pack_id];
-	if (p == pack_data)
+	if (p == pack_data && p->pack_size < (pack_size + 20)) {
+		close_pack_windows(p, 0);
 		p->pack_size = pack_size + 20;
+	}
 	return unpack_entry(p, oe->offset, &type, sizep);
 }
 
diff --git a/sha1_file.c b/sha1_file.c
index 6583797..50d1dea 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -611,6 +611,34 @@ void release_pack_memory(size_t need, int fd)
 		; /* nothing */
 }
 
+void close_pack_windows(struct packed_git *p, int retain_fd)
+{
+	struct pack_window **tail = NULL, *n = p->windows;
+	while (n) {
+		struct pack_window *w = p->windows;
+
+		if (w->inuse_cnt) {
+			*tail = w;
+			tail = &w->next;
+			continue;
+		}
+
+		munmap(w->base, w->len);
+		pack_mapped -= w->len;
+		pack_open_windows--;
+		n = w->next;
+		free(w);
+	}
+
+	p->windows = *tail;
+	if (p->windows)
+		warning("pack windows still in-use during close attempt");
+	else if (!retain_fd && p->pack_fd != -1) {
+		close(p->pack_fd);
+		p->pack_fd = -1;
+	}
+}
+
 void unuse_pack(struct pack_window **w_cursor)
 {
 	struct pack_window *w = *w_cursor;

-- 
Shawn.

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* Re: Be more careful about updating refs
  2008-01-18  3:18                         ` Shawn O. Pearce
@ 2008-01-18  3:22                           ` Shawn O. Pearce
       [not found]                             ` <20080118035700.GA3458@spearce.org>
  0 siblings, 1 reply; 36+ messages in thread
From: Shawn O. Pearce @ 2008-01-18  3:22 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Johannes Sixt, Charles Bailey, Linus Torvalds, Git Mailing List

"Shawn O. Pearce" <spearce@spearce.org> wrote:
> "Shawn O. Pearce" <spearce@spearce.org> wrote:
> > "Shawn O. Pearce" <spearce@spearce.org> wrote:
> > > Junio C Hamano <gitster@pobox.com> wrote:
> > > > 
> > > > Especially, I am not sure if the issue only exists at the
> > > > end_packfile() boundary.  Don't we have the same issue reading
> > > > from the packfile being built, and isn't the only reason my hack
> > > > works it around is because access patterns of the testsuite
> > > > happens to not trigger it?
> > > 
> > > Yes, that's my take on it as well (see my other email).  The
> > > testsuite must just be really lucky that its not hitting the
> > > boundary condition.
> > > 
> > > I almost said gfi_unpack_entry() was immune from this bug, but
> > > I went back and read the code again and determined that it does
> > > in fact suffer from this under NO_MMAP, and we're just really
> > > damn lucky nobody has caused it.
> > 
> > I think this solves the problem.  Its based on your first patch, but
> > would replace it.  The trick here is we close the cached windows if
> > we are accessing data from the packfile we are appending into and we
> > have increased the file length.  This way we don't blow away windows
> > during high read/low write periods, like during branch cache reloads.
> 
> Junio pointed out my first attempt at this didn't update the
> memory pressure values, so we could "run out of memory" even
> if we had plenty free.
> 
> Try #2...

OK, that was crap.  Don't even try it.

I'm holding off sending anything more until I get the test suite
to actually run without telling me the bus crashed into the wall.

-- 
Shawn.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH] Fix random fast-import errors when compiled with NO_MMAP
       [not found]                             ` <20080118035700.GA3458@spearce.org>
@ 2008-01-18  4:27                               ` Linus Torvalds
  2008-01-18  8:42                                 ` Charles Bailey
  2008-01-18  6:10                               ` Junio C Hamano
                                                 ` (2 subsequent siblings)
  3 siblings, 1 reply; 36+ messages in thread
From: Linus Torvalds @ 2008-01-18  4:27 UTC (permalink / raw)
  To: Shawn O. Pearce; +Cc: Junio C Hamano, git, Johannes Sixt, Charles Bailey

On Thu, 17 Jan 2008, Shawn O. Pearce wrote:
>
> fast-import was relying on the fact that on most systems mmap() and
> write() are synchronized by the filesystem's buffer cache.  We were
> relying on the ability to mmap() 20 bytes beyond the current end
> of the file, then later fill in those bytes with a future write()
> call, then read them through the previously obtained mmap() address.
> 
> This isn't always true with some implementations of NFS, but it is
> especially not true with our NO_MMAP=YesPlease build time option used
> on some platforms.

In fact, even with mmap(), it's not guaranteed. There are really crappy 
mmap implementations out there, partly due to bad CPU design (virtual CPU 
caches without coherency), but more often due to total crap OS.

(Yeah, Linux did count in that area at some point. Long ago. Early 90's. 
Maybe)

I think HP-UX used to have non-coherent mmap for the longest time, due to 
carrying around some totally crap memory management based on some ancient 
BSD version that everybody else (including the BSD's) had long since 
jettisoned.

That said, I suspect any unix you can run today (without calling it a 
retro setup) probably has coherent-enough mmap. The possible virtual cache 
coherency issue is unlikely to be able to trigger this (and not relevant 
on any sane hardware anyway).

				Linus

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH] Fix random fast-import errors when compiled with NO_MMAP
       [not found]                             ` <20080118035700.GA3458@spearce.org>
  2008-01-18  4:27                               ` [PATCH] Fix random fast-import errors when compiled with NO_MMAP Linus Torvalds
@ 2008-01-18  6:10                               ` Junio C Hamano
  2008-01-21  4:10                                 ` Shawn O. Pearce
  2008-01-18  7:53                               ` Johannes Sixt
  2008-01-18  9:26                               ` Charles Bailey
  3 siblings, 1 reply; 36+ messages in thread
From: Junio C Hamano @ 2008-01-18  6:10 UTC (permalink / raw)
  To: Shawn O. Pearce; +Cc: git, Johannes Sixt, Charles Bailey, Linus Torvalds

"Shawn O. Pearce" <spearce@spearce.org> writes:

> +extern void close_pack_windows(struct packed_git *, int);

Nobody seems to pass anything but true in retain_fd parameter.
Is it worth it?

> +void close_pack_windows(struct packed_git *p, int retain_fd)
> +{
> +	struct pack_window *tail_var = NULL, *n = p->windows;
> +	struct pack_window **tail = &tail_var;
> +	while (n) {
> +		struct pack_window *w = p->windows;
> +
> +		if (w->inuse_cnt) {
> +			*tail = w;
> +			tail = &w->next;
> +			continue;
> +		}
> +
> +		munmap(w->base, w->len);
> +		pack_mapped -= w->len;
> +		pack_open_windows--;
> +		n = w->next;
> +		free(w);
> +	}
> +
> +	p->windows = tail_var;
> +	if (p->windows)
> +		warning("pack windows still in-use during close attempt");
> +	else if (!retain_fd && p->pack_fd != -1) {
> +		close(p->pack_fd);
> +		p->pack_fd = -1;
> +	}
> +}

I am not sure about this inuse_cnt business.

The only caller we know that needs this function is fast-import
that wants to drop all windows into a pack while keeping the
file descriptor and it should not have an in-use windows.

It is unclear what semantics is the right one for callers that
do want to retain some windows but still want to call this
function.  If somebody is in the middle of chasing a delta chain
and still calls this function, *why* is the call being made, and
what is the right behaviour if not all the windows can be closed
because of these open windows?

What about the case the value passed in ratain_fd is 0, which
presumably means the caller is asking us to close the file
descriptor?  If we close the packfile, later accesses through
the unclosed windows will obviously barf and I understand that
is why you are ignoring retain_fd in that case, but maybe for
the caller it was of higher priority that the packfile to get
closed than the remaining windows to be still usable.  Or maybe
the caller wants to be notified of such an error, in which case
it probably is not enough to just call warning().

IOW, I think the patch is trying to be too flexible without
having a clear definition of what that flexibility is trying to
achieve.

Maybe we would need more flexible version later, but I am not
convinced if the semantics the above patch implements is the
right one.

So I'd prefer something much simpler like this one instead...

void close_pack_windows(struct packed_git *p)
{
	while (p->windows) {
		struct pack_window *w = p->windows;

		if (w->inuse_cnt)
			die("pack '%s' still has outstanding windows",
				p->pack_name)
		munmap(w->base, w->len);
		pack_mapped -= w->len;
		pack_open_windows--;
		p->windows = w->next;
		free(w);
	}
}

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH] Fix random fast-import errors when compiled with NO_MMAP
       [not found]                             ` <20080118035700.GA3458@spearce.org>
  2008-01-18  4:27                               ` [PATCH] Fix random fast-import errors when compiled with NO_MMAP Linus Torvalds
  2008-01-18  6:10                               ` Junio C Hamano
@ 2008-01-18  7:53                               ` Johannes Sixt
  2008-01-18  9:26                               ` Charles Bailey
  3 siblings, 0 replies; 36+ messages in thread
From: Johannes Sixt @ 2008-01-18  7:53 UTC (permalink / raw)
  To: Shawn O. Pearce; +Cc: Junio C Hamano, git, Charles Bailey, Linus Torvalds

Shawn O. Pearce schrieb:
>  This should do the trick.  I SOB'd the patch as I did test it with
>  NO_MMAP and I think its valid.  :-)

Thanks a lot. It works here on Windows, too.

-- Hannes

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH] Fix random fast-import errors when compiled with NO_MMAP
  2008-01-18  4:27                               ` [PATCH] Fix random fast-import errors when compiled with NO_MMAP Linus Torvalds
@ 2008-01-18  8:42                                 ` Charles Bailey
  2008-01-18 17:08                                   ` Linus Torvalds
  0 siblings, 1 reply; 36+ messages in thread
From: Charles Bailey @ 2008-01-18  8:42 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Shawn O. Pearce, Junio C Hamano, git, Johannes Sixt

On Thu, Jan 17, 2008 at 08:27:08PM -0800, Linus Torvalds wrote:
> 
> 
> On Thu, 17 Jan 2008, Shawn O. Pearce wrote:
> >
> > fast-import was relying on the fact that on most systems mmap() and
> > write() are synchronized by the filesystem's buffer cache.  We were
> > relying on the ability to mmap() 20 bytes beyond the current end
> > of the file, then later fill in those bytes with a future write()
> > call, then read them through the previously obtained mmap() address.
> > 
> > This isn't always true with some implementations of NFS, but it is
> > especially not true with our NO_MMAP=YesPlease build time option used
> > on some platforms.
> 
> In fact, even with mmap(), it's not guaranteed. There are really crappy 
> mmap implementations out there, partly due to bad CPU design (virtual CPU 
> caches without coherency), but more often due to total crap OS.
> 
> (Yeah, Linux did count in that area at some point. Long ago. Early 90's. 
> Maybe)
> 
> I think HP-UX used to have non-coherent mmap for the longest time, due to 
> carrying around some totally crap memory management based on some ancient 
> BSD version that everybody else (including the BSD's) had long since 
> jettisoned.
> 
> That said, I suspect any unix you can run today (without calling it a 
> retro setup) probably has coherent-enough mmap. The possible virtual cache 
> coherency issue is unlikely to be able to trigger this (and not relevant 
> on any sane hardware anyway).
> 
> 				Linus

I've just checked the Mac OS X build and it looks like there is a mmap
and git is indeed using it, so this is obviously an example of a
"really crappy" mmap implementation.

This adds more ammunition to the fight against the whole Mac OS X is
powered/built/based on UNIX myth.

Charles.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH] Fix random fast-import errors when compiled with NO_MMAP
       [not found]                             ` <20080118035700.GA3458@spearce.org>
                                                 ` (2 preceding siblings ...)
  2008-01-18  7:53                               ` Johannes Sixt
@ 2008-01-18  9:26                               ` Charles Bailey
  2008-01-18  9:36                                 ` Junio C Hamano
  3 siblings, 1 reply; 36+ messages in thread
From: Charles Bailey @ 2008-01-18  9:26 UTC (permalink / raw)
  To: Shawn O. Pearce; +Cc: Junio C Hamano, git, Johannes Sixt, Linus Torvalds

I can ACK this as fixing the test suite failures on Mac OS X 10.4.11
both with and without NO_MMAP=Yes on top of 1.5.4.rc3.24.gb53139 and
also on my oldish Fedora install with NO_MMAP=Yes.

Acked-by: Charles Bailey <charles@hashpling.org>

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH] Fix random fast-import errors when compiled with NO_MMAP
  2008-01-18  9:26                               ` Charles Bailey
@ 2008-01-18  9:36                                 ` Junio C Hamano
  2008-01-18  9:45                                   ` Charles Bailey
  0 siblings, 1 reply; 36+ messages in thread
From: Junio C Hamano @ 2008-01-18  9:36 UTC (permalink / raw)
  To: Charles Bailey; +Cc: Shawn O. Pearce, git, Johannes Sixt, Linus Torvalds

Charles Bailey <charles@hashpling.org> writes:

> I can ACK this as fixing the test suite failures on Mac OS X 10.4.11
> both with and without NO_MMAP=Yes on top of 1.5.4.rc3.24.gb53139 and
> also on my oldish Fedora install with NO_MMAP=Yes.
>
> Acked-by: Charles Bailey <charles@hashpling.org>

Well, I do not want to be picky with these things, but you do
not own that area, so that would be "Tested-by:"

I've already committed the simplified version I suggested to
Shawn.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH] Fix random fast-import errors when compiled with NO_MMAP
  2008-01-18  9:36                                 ` Junio C Hamano
@ 2008-01-18  9:45                                   ` Charles Bailey
  2008-01-18 10:57                                     ` Junio C Hamano
  0 siblings, 1 reply; 36+ messages in thread
From: Charles Bailey @ 2008-01-18  9:45 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Shawn O. Pearce, git, Johannes Sixt, Linus Torvalds

On Fri, Jan 18, 2008 at 01:36:55AM -0800, Junio C Hamano wrote:
> 
> Well, I do not want to be picky with these things, but you do
> not own that area, so that would be "Tested-by:"
> 
> I've already committed the simplified version I suggested to
> Shawn.
> 

Sorry, I thought that Acked-by was an acceptable tag for people not
involved in the development path of a patch.  I don't want to add a
"Tested-by" as I am not a fast-import user and I am *only* reporting
the success of the test suite, not any real-world testing.

Charles.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH] Fix random fast-import errors when compiled with NO_MMAP
  2008-01-18  9:45                                   ` Charles Bailey
@ 2008-01-18 10:57                                     ` Junio C Hamano
  0 siblings, 0 replies; 36+ messages in thread
From: Junio C Hamano @ 2008-01-18 10:57 UTC (permalink / raw)
  To: Charles Bailey; +Cc: git

Charles Bailey <charles@hashpling.org> writes:

> Sorry, I thought that Acked-by was an acceptable tag for people not
> involved in the development path of a patch.  I don't want to add a
> "Tested-by" as I am not a fast-import user and I am *only* reporting
> the success of the test suite, not any real-world testing.

It still counts as "Tested-by:".  Earlier you saw breakage and
now you see it fixed in your environment, to which neither I nor
Shawn have access to.  That's a good verification that the patch
fixed the issue for you and your feedback was very much
appreciated (your timely initial breakage report even more so).

Acked-by: is usually given by people who are the most familiar
with the code being affected (or at least by the ones more
familiar than the patch's author), in order to let me and the
world know that he thinks the contents of the patch makes sense.
IOW, the patch was checked by somebody who knows the existing
code well enough to catch any subtleties in the existing code
that might have been broken if the patch's author was not
careful.  It just felt slightly funny that anybody is giving an
Acked-by: to a patch Shawn made to fast-import.c, which is his
brainchild ;-).

Signed-off-by is about the kosherness from software license
point of view, while Acked-by and Tested-by are about the
correctness.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH] Fix random fast-import errors when compiled with NO_MMAP
  2008-01-18  8:42                                 ` Charles Bailey
@ 2008-01-18 17:08                                   ` Linus Torvalds
  2008-01-19  3:25                                     ` Junio C Hamano
  0 siblings, 1 reply; 36+ messages in thread
From: Linus Torvalds @ 2008-01-18 17:08 UTC (permalink / raw)
  To: Charles Bailey; +Cc: Shawn O. Pearce, Junio C Hamano, git, Johannes Sixt

On Fri, 18 Jan 2008, Charles Bailey wrote:
> 
> I've just checked the Mac OS X build and it looks like there is a mmap
> and git is indeed using it, so this is obviously an example of a
> "really crappy" mmap implementation.

Looking closer, this is not necessarily the case here.

Git uses MAP_PRIVATE, because that whole pack-file mapping was really 
*meant* to map an existing read-only pack-file, and fast-import seems to 
really be screwing with it.

It so happens that Linux has a particularly clean and streamlined VM, and 
if you do only reads to a MAP_PRIVATE mapping on a normal filesystem, 
you'll always be as coherent as with MAP_SHARED because Linux will simply 
map in the page cache pages directly.

But this is definitely not portable, and the git fast-import mmap window 
usage before Shawn's patch it was simply wrong.

So in this case, it really was git that was crap.

It just happened to work because the Linux mmap handling is just generally 
pretty sane. It probably also worked fine on pretty much any other modern 
UNIX (ie Solaris).

I'm not quite sure what OS X does to MAP_PRIVATE mappings, but if OS X is 
still based on Mach (with FreeBSD just as a single-server on top), I 
suspect that may be why it broke on OS X. The Mach VM is insanely complex 
and does really odd things.

But the fact is, without MAP_SHARED, you shouldn't expect things to be 
coherent, even if they often will be (especially for PROT_READ).

Btw, even with Shawn's patch, I wonder if the index_data usage is correct.

			Linus

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH] Fix random fast-import errors when compiled with NO_MMAP
  2008-01-18 17:08                                   ` Linus Torvalds
@ 2008-01-19  3:25                                     ` Junio C Hamano
  2008-01-19  3:55                                       ` Linus Torvalds
  2008-01-21  3:57                                       ` Shawn O. Pearce
  0 siblings, 2 replies; 36+ messages in thread
From: Junio C Hamano @ 2008-01-19  3:25 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Charles Bailey, Shawn O. Pearce, git, Johannes Sixt

Linus Torvalds <torvalds@linux-foundation.org> writes:

> Btw, even with Shawn's patch, I wonder if the index_data usage is correct.

Hmph.

gfi uses data in a "pack" in two quite different ways.

A new object is written to an unfinalized pack.  Such a pack
already has "struct packed_git" allocated for it and a pointer
to it is held in pack_data.  As far as the core part of git
(that is, sha1_file.c) is concerned, however, this pack does not
even exist.  It is still not part of packed_git list in
sha1_file.c, and read_sha1_file() will not see objects in it, as
no idx into the packfile exists yet.  gfi has a table of objects
in this pack and uses gfi_unpack_entry() function to retrieve
data from it.

A packfile is finalized in end_packfile().  The pack header and
footer is recomputed, an idx file is written, and the pack is
finally registered.  Before that time p->index_data is not even
used for that pack (it is initialized with NULL).

So I do not think "index_data usage" can be incorrect, as there
won't be any index_data usage with unfinalized pack, and the
core part of git would not even have any mmap(2) (nor open fd)
into its idx file before it is finalized.

By the way, I was quite puzzled how the gfi_unpack_entry()
function manages to work correctly when it has to read an object
it deltified based on another object it wrote into the same
unfinalized pack earlier.  It knows where in the unfinalized
pack it wrote the object, so it can find from its own "struct
object_entry" the offset for the object, and calls
unpack_entry() defined in the core to do the rest.

However, most of the core does not really know about the other
objects in this half-built pack.  If the object is a delta,
unpack_delta_entry() needs to find the delta base.  And it needs
to do that without having the idx.

The trick (the code really needs a bit more documentation) is
that gfi never writes anything but OFS_DELTA.  So the core, even
though it does not have the corresponding idx file, does not
have to look up the object (in fact it does not even know what
object to look up for the base, it only knows the offset).

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH] Fix random fast-import errors when compiled with NO_MMAP
  2008-01-19  3:25                                     ` Junio C Hamano
@ 2008-01-19  3:55                                       ` Linus Torvalds
  2008-01-21  3:57                                       ` Shawn O. Pearce
  1 sibling, 0 replies; 36+ messages in thread
From: Linus Torvalds @ 2008-01-19  3:55 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Charles Bailey, Shawn O. Pearce, git, Johannes Sixt



On Fri, 18 Jan 2008, Junio C Hamano wrote:
> 
> gfi uses data in a "pack" in two quite different ways.
> 
> A new object is written to an unfinalized pack.  Such a pack
> already has "struct packed_git" allocated for it and a pointer
> to it is held in pack_data.  As far as the core part of git
> (that is, sha1_file.c) is concerned, however, this pack does not
> even exist.  It is still not part of packed_git list in
> sha1_file.c, and read_sha1_file() will not see objects in it, as
> no idx into the packfile exists yet.  gfi has a table of objects
> in this pack and uses gfi_unpack_entry() function to retrieve
> data from it.
> 
> A packfile is finalized in end_packfile().  The pack header and
> footer is recomputed, an idx file is written, and the pack is
> finally registered.  Before that time p->index_data is not even
> used for that pack (it is initialized with NULL).

Oh, ok. I did try to grep for index_data, and didn't find anything that 
looked bad, but the incestuous things that fast-import.c does just made me 
worry - but I was too lazy to really follow it all. It's one of those 
files I don't think I've ever had anything what-so-ever to do with.

> So I do not think "index_data usage" can be incorrect, as there
> won't be any index_data usage with unfinalized pack, and the
> core part of git would not even have any mmap(2) (nor open fd)
> into its idx file before it is finalized.

In that case, I think Shawn fixed it all, and we're all good, and it's not 
just hidden well enough that it "just happens" to work.

				Linus

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH] Fix random fast-import errors when compiled with NO_MMAP
  2008-01-19  3:25                                     ` Junio C Hamano
  2008-01-19  3:55                                       ` Linus Torvalds
@ 2008-01-21  3:57                                       ` Shawn O. Pearce
  1 sibling, 0 replies; 36+ messages in thread
From: Shawn O. Pearce @ 2008-01-21  3:57 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Linus Torvalds, Charles Bailey, git, Johannes Sixt

Junio C Hamano <gitster@pobox.com> wrote:
> Linus Torvalds <torvalds@linux-foundation.org> writes:
> 
> > Btw, even with Shawn's patch, I wonder if the index_data usage is correct.
...
> So I do not think "index_data usage" can be incorrect, as there
> won't be any index_data usage with unfinalized pack, and the
> core part of git would not even have any mmap(2) (nor open fd)
> into its idx file before it is finalized.

Correct.
 
> By the way, I was quite puzzled how the gfi_unpack_entry()
> function manages to work correctly when it has to read an object
> it deltified based on another object it wrote into the same
> unfinalized pack earlier.  It knows where in the unfinalized
> pack it wrote the object, so it can find from its own "struct
> object_entry" the offset for the object, and calls
> unpack_entry() defined in the core to do the rest.
> 
> However, most of the core does not really know about the other
> objects in this half-built pack.  If the object is a delta,
> unpack_delta_entry() needs to find the delta base.  And it needs
> to do that without having the idx.
> 
> The trick (the code really needs a bit more documentation) is
> that gfi never writes anything but OFS_DELTA.  So the core, even
> though it does not have the corresponding idx file, does not
> have to look up the object (in fact it does not even know what
> object to look up for the base, it only knows the offset).

Yup.

Older fast-imports (pre OFS_DELTA) had to replicate a good chunk of
the unpack_entry() logic directly inside of fast-import.  But it
later occurred to me that OFS_DELTA simplifies the code and lets
me reuse more of the existing sha1_file.c implementation.

So yea, fast-import only emits OFS_DELTA, and does so only so that it
can pull this delta-base-unpack trick on the core, without actually
giving the core the corresponding idx file first.

-- 
Shawn.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH] Fix random fast-import errors when compiled with NO_MMAP
  2008-01-18  6:10                               ` Junio C Hamano
@ 2008-01-21  4:10                                 ` Shawn O. Pearce
  0 siblings, 0 replies; 36+ messages in thread
From: Shawn O. Pearce @ 2008-01-21  4:10 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Johannes Sixt, Charles Bailey, Linus Torvalds

Junio C Hamano <gitster@pobox.com> wrote:
> "Shawn O. Pearce" <spearce@spearce.org> writes:
> 
> > +extern void close_pack_windows(struct packed_git *, int);
> 
> Nobody seems to pass anything but true in retain_fd parameter.
> Is it worth it?

Hmm, well, I originally wrote something like what you fixed it
to be, then thought that at some point in the future someone may
try to invoke the method and expect different semantics than it
implemented.  :-)

In short I made the implementation I gave you way too complex.
 
> So I'd prefer something much simpler like this one instead...
> 
> void close_pack_windows(struct packed_git *p)
> {
> 	while (p->windows) {
> 		struct pack_window *w = p->windows;
> 
> 		if (w->inuse_cnt)
> 			die("pack '%s' still has outstanding windows",
> 				p->pack_name)
> 		munmap(w->base, w->len);
> 		pack_mapped -= w->len;
> 		pack_open_windows--;
> 		p->windows = w->next;
> 		free(w);
> 	}
> }

Me too.  Thank you for the sanity check, and for fixing the
code to actually be somewhat sane.

-- 
Shawn.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Be more careful about updating refs
  2008-01-15 23:50 Be more careful about updating refs Linus Torvalds
                   ` (2 preceding siblings ...)
  2008-01-16  1:11 ` Junio C Hamano
@ 2008-01-23 22:53 ` Sam Vilain
  3 siblings, 0 replies; 36+ messages in thread
From: Sam Vilain @ 2008-01-23 22:53 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Git Mailing List, Junio C Hamano

Linus Torvalds wrote:
> Perhaps more importantly, it also refuses to update a branch head with a 
> non-commit object. I don't quite know *how* the stable series maintainers 
> were able to corrupt their repository to have a HEAD that pointed to a tag 
> rather than a commit object, but they did. Which results in a totally 
> broken repository that cannot be cloned or committed on.

I actually used this for a prototype system to allow push over git://
with secure authentication via PGP; basically it used an update hook
that disallowed any ref change that wasn't a signed tag, and then in the
post-update hook, moved the tag to an audit log refspace and put the
referant commit in the refs/heads/* location.

The hooks (probably racy etc) can be seen under
http://git.utsl.gen.nz/Tangram/hooks

Sam.

^ permalink raw reply	[flat|nested] 36+ messages in thread

end of thread, other threads:[~2008-01-23 22:54 UTC | newest]

Thread overview: 36+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-01-15 23:50 Be more careful about updating refs Linus Torvalds
2008-01-16  0:02 ` Linus Torvalds
2008-01-16 19:52   ` Junio C Hamano
2008-01-17  9:15     ` Charles Bailey
2008-01-17 10:52       ` Johannes Sixt
2008-01-17 11:01         ` Charles Bailey
2008-01-17 12:41           ` Johannes Sixt
2008-01-17 12:58             ` Johannes Schindelin
2008-01-17 13:07               ` Charles Bailey
2008-01-18  1:43             ` Junio C Hamano
2008-01-18  2:01               ` Junio C Hamano
2008-01-18  2:13                 ` Shawn O. Pearce
2008-01-18  2:25                   ` Junio C Hamano
2008-01-18  2:33                     ` Shawn O. Pearce
2008-01-18  2:58                       ` Shawn O. Pearce
2008-01-18  3:18                         ` Shawn O. Pearce
2008-01-18  3:22                           ` Shawn O. Pearce
     [not found]                             ` <20080118035700.GA3458@spearce.org>
2008-01-18  4:27                               ` [PATCH] Fix random fast-import errors when compiled with NO_MMAP Linus Torvalds
2008-01-18  8:42                                 ` Charles Bailey
2008-01-18 17:08                                   ` Linus Torvalds
2008-01-19  3:25                                     ` Junio C Hamano
2008-01-19  3:55                                       ` Linus Torvalds
2008-01-21  3:57                                       ` Shawn O. Pearce
2008-01-18  6:10                               ` Junio C Hamano
2008-01-21  4:10                                 ` Shawn O. Pearce
2008-01-18  7:53                               ` Johannes Sixt
2008-01-18  9:26                               ` Charles Bailey
2008-01-18  9:36                                 ` Junio C Hamano
2008-01-18  9:45                                   ` Charles Bailey
2008-01-18 10:57                                     ` Junio C Hamano
2008-01-18  2:30               ` Be more careful about updating refs Shawn O. Pearce
2008-01-17 10:56       ` Charles Bailey
2008-01-16  0:29 ` Junio C Hamano
2008-01-16  0:42   ` Linus Torvalds
2008-01-16  1:11 ` Junio C Hamano
2008-01-23 22:53 ` Sam Vilain

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).