Git development

Git development
 help / color / mirror / Atom feed

* Re: [PATCH 1/4] hooks: Add function to check if a hook exists
From: Junio C Hamano @ 2012-12-29 16:54 UTC (permalink / raw)
  To: Aaron Schrab; +Cc: git
In-Reply-To: <20121229145032.GB3789@pug.qqx.org>

Aaron Schrab <aaron@schrab.com> writes:

> Since I'm going to be changing the interface for this hook in v2 of
> the series so that it will be more complicated than can be readily
> addressed with the run_hook() API (and will have use a fixed number of
> arguments anyway) I'll be dropping the run_hook_argv() function.

Just to make sure there is no misunderstanding (sorry for sending
the message without finishing it with this clarification at the end
in the first place).  I didn't mean that converting all of the
existing callers must come earlier than introducing a new hook
invoker.

I just wanted to make sure that we are aware that we are adding to
our technical debt, if we are adding another that is also
specialized; as the proposed interface looked sufficiently generic,
it would be the ideal one to make _other_ ones thin wrappers around
it to unify the various codepaths.

Thanks.

^ permalink raw reply

* Re: [PATCH 0/4] pre-push hook support
From: Junio C Hamano @ 2012-12-29 16:48 UTC (permalink / raw)
  To: Aaron Schrab; +Cc: git
In-Reply-To: <20121229145025.GA3789@pug.qqx.org>

Aaron Schrab <aaron@schrab.com> writes:

> At 18:01 -0800 28 Dec 2012, Junio C Hamano <gitster@pobox.com> wrote:
>>Will it be "all-or-none", or "I'll allow these but not those"?
>
> Currently it just uses the exit code to communicate that back, so it's
> all-or-none.  I think I'll keep that in the updated version as well.

Thanks; that sounds sensible.

^ permalink raw reply

* Re: Hold your fire, please
From: Adam Spiers @ 2012-12-29 15:09 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git
In-Reply-To: <7vehi9hgz3.fsf@alter.siamese.dyndns.org>

On Fri, Dec 28, 2012 at 11:52 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Junio C Hamano <gitster@pobox.com> writes:
>
>> Primarily in order to force me concentrate on the releng for the
>> upcoming release, and also to encourage contributors to focus on
>> finding and fixing any last minute regressions (rather than
>> distracting others by showing publicly scratching their itches), I
>> won't be queuing any patch that is not a regression fix, at least
>> for the next few days.  I may not even review them.
>>
>> Thanks.
>
> Just as a friendly reminder, I am reposting this to remind people.

Ah, I thought this period had elapsed already.  I assume this applies
to check-ignore, in which case how long should I hold off before
submitting v4?

>> And have a nice holiday if you are in areas where it is a holiday
>> season ;-)

You too :-)

^ permalink raw reply

* Re: [PATCH 1/4] hooks: Add function to check if a hook exists
From: Aaron Schrab @ 2012-12-29 14:50 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git
In-Reply-To: <7vwqw1fw5a.fsf@alter.siamese.dyndns.org>

At 18:08 -0800 28 Dec 2012, Junio C Hamano <gitster@pobox.com> wrote:
>Aaron Schrab <aaron@schrab.com> writes:
>
>> Create find_hook() function to determine if a given hook exists and is
>> executable.  If it is the path to the script will be returned, otherwise
>> NULL is returned.
>
>Sounds like a sensible thing to do.  To make sure the API is also
>sensible, all the existing hooks should be updated to use this API,
>no?

I'd been trying to keep the changes limited.  I'll see about modifying 
the existing places that run hooks in v2 of the series.

>> This is in support for an upcoming run_hook_argv() function which will
>> expect the full path to the hook script as the first element in the
>> argv_array.
>
>There is currently a public function called run_hook() that squats
>on the good name with a kludgy API that is too specific to using
>separate index file.  Back when it was a private helper in the
>implementation of "git commit", it was perfectly fine, but it was
>exported without giving much thought on the API.
>
>If you are introducing a new run_hook_* function, give it a generic
>enough API that lets all the existing hook callers to use it.  I
>would imagine that the API requirement may be modelled after
>run_command() API so that we can pass argv[] and tweak the hook's
>environ[], as well as feeding its stdin and possibly reading from
>its stdout.  That would be very useful.

I think the attraction of the run_hook() API is its simplicity.  It's 
currently a fairly thin wrapper around the run_command() API.  I suspect 
that if the run_hook() API were made generic enough to support all of 
the existing hook callers it would greatly complicate the existing calls 
to run_hook() while not providing much benefit to hook callers which 
can't currently use it beyond what run_command() offers.

Since I'm going to be changing the interface for this hook in v2 of the 
series so that it will be more complicated than can be readily addressed 
with the run_hook() API (and will have use a fixed number of arguments 
anyway) I'll be dropping the run_hook_argv() function.

^ permalink raw reply

* Re: [PATCH 0/4] pre-push hook support
From: Aaron Schrab @ 2012-12-29 14:50 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git
In-Reply-To: <7v1ue9hb06.fsf@alter.siamese.dyndns.org>

At 18:01 -0800 28 Dec 2012, Junio C Hamano <gitster@pobox.com> wrote:
>One lesson we learned long time ago while doing hooks is to avoid
>unbound number of command line arguments and instead feed them from
>the standard input.  I think this should do the same.

Good point.  I had been trying to keep the interface for this hook as 
close as possible to the ones for other client-side hooks on the theory 
that less development effort may go into those than for server-side 
hooks.  But thinking on that more I certainly see that this could easily 
run into limits on argument length on some systems; especially when it's 
likely that each of those arguments is likely to be over 100 bytes long.

I'll work on an updated version which sends the variable length 
information over a pipe, using the command-line arguments only to pass 
the remote name and URL.

>How does the hook communicate its decision to the calling Git?
>
>Will it be "all-or-none", or "I'll allow these but not those"?

Currently it just uses the exit code to communicate that back, so it's 
all-or-none.  I think I'll keep that in the updated version as well.

A future enhancement could modify the protocol to support reading from 
the hook's stdout the names of remote refs which are to be rejected, I 
think that just having the option for all-or-nothing is a good starting 
point.

^ permalink raw reply

* Re: [PATCH 2/3] do not write null sha1s to on-disk index
From: Jeff King @ 2012-12-29 11:05 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: git, Jens Lehmann, Junio C Hamano
In-Reply-To: <20121229103430.GG18903@elie.Belkin>

On Sat, Dec 29, 2012 at 02:34:30AM -0800, Jonathan Nieder wrote:

> >>   $ git am --abort
> >>   Unstaged changes after reset:
> >>   M       sound/usb/midi.c
> >
> > What does your index look like afterwards? Does it have a null sha1 in
> > it (check "ls-files -s")?
> 
> $ git diff-index --abbrev HEAD
> :100644 100644 eeefbce3873c... 000000000000... M	sound/usb/midi.c
> $ git ls-files --abbrev -s sound/usb/midi.c
> 100644 eeefbce3873c 0	sound/usb/midi.c

Hmm. It looks like "am --abort" overwrites the index again after the
read-tree which complains.  If I downgrade the error in write_index to a
warning, like this:

diff --git a/read-cache.c b/read-cache.c
index fda78bc..70a6d86 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1797,7 +1797,7 @@ int write_index(struct index_state *istate, int newfd)
 		if (!ce_uptodate(ce) && is_racy_timestamp(istate, ce))
 			ce_smudge_racily_clean_entry(ce);
 		if (is_null_sha1(ce->sha1))
-			return error("cache entry has null sha1: %s", ce->name);
+			warning("cache entry has null sha1: %s", ce->name);
 		if (ce_write_entry(&c, newfd, ce, previous_name) < 0)
 			return -1;
 	}

and then just run this:

  [clear state from last run]
  $ rm -rf .git/rebase-apply
  $ git reset --hard

  [apply the patch; we get a conflict]
  $ git am -3sc queue-3.2/alsa-usb-audio-fix-missing-autopm-for-midi-input.patch

  [now run just the read-tree from "am --abort"]
  $ git.compile read-tree --reset -u HEAD ORIG_HEAD
  warning: cache entry has null sha1: sound/usb/midi.c

  [and now check our index]
  $ git ls-files -s sound/usb/midi.c
  100644 0000000000000000000000000000000000000000 0 sound/usb/midi.c

  [yes, this index is bogus]
  $ git write-tree
  error: invalid object 100644 0000000000000000000000000000000000000000 for 'sound/usb/midi.c'
  fatal: git-write-tree: error building trees

So I think this check may actually be finding a real bug. I have seen
these null sha1s in the wild, but I was never able to track down the
actual cause. Maybe this will give us a clue. Now we just need to work
backwards and figure out who is putting it in the in-memory index and
why.

I'll try to work on it tomorrow, but please don't let that stop you if
you want to keep digging in the meantime.

-Peff

^ permalink raw reply related

* Re: [PATCH 2/3] do not write null sha1s to on-disk index
From: Jonathan Nieder @ 2012-12-29 10:51 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Jens Lehmann, Junio C Hamano
In-Reply-To: <20121229104247.GA30283@sigill.intra.peff.net>

Jeff King wrote:

> Hrm. But your output does not say there is a conflict. It says you have
> a local modification and it does not try the merge:

That's probably operator error on my part when gathering output to
paste into the email.

In other words, nothing to see there. :)  Sorry for the confusion.

[...]
> If I try to apply it, I get a real conflict:
[...]
> Although running "git am --abort" after that does seem to produce the
> "cache entry has null sha1" error.

Yep, that is what my report should have said.

'night,
Jonathan

^ permalink raw reply

* Re: [PATCH 2/3] do not write null sha1s to on-disk index
From: Jeff King @ 2012-12-29 10:42 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: git, Jens Lehmann, Junio C Hamano
In-Reply-To: <20121229103430.GG18903@elie.Belkin>

On Sat, Dec 29, 2012 at 02:34:30AM -0800, Jonathan Nieder wrote:

> Jeff King wrote:
> 
> > I can't reproduce here. I can checkout v3.2.35, and I guess that the
> > patch you are applying comes from f5f1654, but I don't know your
> > local modification to sound/usb/midi.c.
> 
> No local modification.  The unstaged change after "git am --abort" to
> recover from a conflicted git am is a longstanding bug (at least a
> couple of years old).
> 
> The patch creating conflicts is
> 
> 	queue-3.2/alsa-usb-audio-fix-missing-autopm-for-midi-input.patch
> 
> from git://git.kernel.org/pub/scm/linux/kernel/git/bwh/linux-3.2.y-queue.git

Hrm. But your output does not say there is a conflict. It says you have
a local modification and it does not try the merge:

> $ git am -3sc /tmp/alsa-usb-audio-fix-missing-autopm-for-midi-input.patch
> Applying: ALSA: usb-audio: Fix missing autopm for MIDI input
> Using index info to reconstruct a base tree...
> Falling back to patching base and 3-way merge...
> error: Your local changes to the following files would be overwritten by merge:
>       sound/usb/midi.c
> Please, commit your changes or stash them before you can merge.
> Aborting

If I try to apply it, I get a real conflict:

  $ git checkout v3.2.35
  $ git am -3sc linux-3.2.y-queue/queue-3.2/alsa-usb-audio-fix-missing-autopm-for-midi-input.patch
  Applying: ALSA: usb-audio: Fix missing autopm for MIDI input
  Using index info to reconstruct a base tree...
  M       sound/usb/midi.c
  Falling back to patching base and 3-way merge...
  Auto-merging sound/usb/midi.c
  CONFLICT (content): Merge conflict in sound/usb/midi.c

Although running "git am --abort" after that does seem to produce the
"cache entry has null sha1" error. So I can start investigating from
there.

-Peff

^ permalink raw reply

* Re: [PATCH 2/3] do not write null sha1s to on-disk index
From: Jonathan Nieder @ 2012-12-29 10:40 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Jens Lehmann, Junio C Hamano
In-Reply-To: <20121229102707.GA26730@sigill.intra.peff.net>

Jeff King wrote:

> Can you give more details?

$ GIT_TRACE=1 git am --abort
trace: exec: 'git-am' '--abort'
trace: run_command: 'git-am' '--abort'
trace: built-in: git 'rev-parse' '--parseopt' '--' '--abort'
trace: built-in: git 'rev-parse' '--git-dir'
trace: built-in: git 'rev-parse' '--show-prefix'
trace: built-in: git 'rev-parse' '--show-toplevel'
trace: built-in: git 'var' 'GIT_COMMITTER_IDENT'
trace: built-in: git 'rev-parse' '--verify' '-q' 'HEAD'
trace: built-in: git 'config' '--bool' '--get' 'am.keepcr'
trace: built-in: git 'rerere' 'clear'
trace: built-in: git 'rev-parse' '--verify' '-q' 'HEAD'
trace: built-in: git 'read-tree' '--reset' '-u' 'HEAD' 'ORIG_HEAD'
error: cache entry has null sha1: sound/usb/midi.c
fatal: unable to write new index file
trace: built-in: git 'reset' 'ORIG_HEAD'
Unstaged changes after reset:
M       sound/usb/midi.c

^ permalink raw reply

* Re: [PATCH 2/3] do not write null sha1s to on-disk index
From: Jonathan Nieder @ 2012-12-29 10:34 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Jens Lehmann, Junio C Hamano
In-Reply-To: <20121229102707.GA26730@sigill.intra.peff.net>

Jeff King wrote:

> I can't reproduce here. I can checkout v3.2.35, and I guess that the
> patch you are applying comes from f5f1654, but I don't know your
> local modification to sound/usb/midi.c.

No local modification.  The unstaged change after "git am --abort" to
recover from a conflicted git am is a longstanding bug (at least a
couple of years old).

The patch creating conflicts is

	queue-3.2/alsa-usb-audio-fix-missing-autopm-for-midi-input.patch

from git://git.kernel.org/pub/scm/linux/kernel/git/bwh/linux-3.2.y-queue.git

[...]
>>   $ git am --abort
>>   Unstaged changes after reset:
>>   M       sound/usb/midi.c
>
> What does your index look like afterwards? Does it have a null sha1 in
> it (check "ls-files -s")?

$ git diff-index --abbrev HEAD
:100644 100644 eeefbce3873c... 000000000000... M	sound/usb/midi.c
$ git ls-files --abbrev -s sound/usb/midi.c
100644 eeefbce3873c 0	sound/usb/midi.c

^ permalink raw reply

* Re: [PATCH 2/3] do not write null sha1s to on-disk index
From: Jeff King @ 2012-12-29 10:27 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: git, Jens Lehmann, Junio C Hamano
In-Reply-To: <20121229100130.GA31497@elie.Belkin>

On Sat, Dec 29, 2012 at 02:03:46AM -0800, Jonathan Nieder wrote:

> > --- a/read-cache.c
> > +++ b/read-cache.c
> > @@ -1800,6 +1800,8 @@ int write_index(struct index_state *istate, int newfd)
> >  			continue;
> >  		if (!ce_uptodate(ce) && is_racy_timestamp(istate, ce))
> >  			ce_smudge_racily_clean_entry(ce);
> > +		if (is_null_sha1(ce->sha1))
> > +			return error("cache entry has null sha1: %s", ce->name);
> >  		if (ce_write_entry(&c, newfd, ce, previous_name) < 0)
> >  			return -1;
> 
> Quick heads up: this is tripping for me in the "git am --abort"
> codepath:

Thanks. The intent was that this should never happen, and we are
protecting against bogus sha1s slipping into the index. So either we
have found a bug in "am --abort", or the assumption that it should never
happen was wrong. :)

>   $ cd ~/src/linux
>   $ git checkout v3.2.35
>   $ git am -3sc /tmp/alsa-usb-audio-fix-missing-autopm-for-midi-input.patch 
>   Applying: ALSA: usb-audio: Fix missing autopm for MIDI input
>   Using index info to reconstruct a base tree...
>   Falling back to patching base and 3-way merge...
>   error: Your local changes to the following files would be overwritten by merge:
> 	sound/usb/midi.c
>   Please, commit your changes or stash them before you can merge.
>   Aborting
>   Failed to merge in the changes.
>   Patch failed at 0001 ALSA: usb-audio: Fix missing autopm for MIDI input
>   When you have resolved this problem run "git am --resolved".
>   If you would prefer to skip this patch, instead run "git am --skip".
>   To restore the original branch and stop patching run "git am --abort".
>   $ git am --abort
>   error: cache entry has null sha1: sound/usb/midi.c
>   fatal: unable to write new index file
>   Unstaged changes after reset:
>   M       sound/usb/midi.c
>   $

I can't reproduce here. I can checkout v3.2.35, and I guess that the
patch you are applying comes from f5f1654, but I don't know your
local modification to sound/usb/midi.c. If I just make a fake
modification, I get this:

  $ git checkout v3.2.35
  $ echo fake >sound/usb/midi.c
  $ git format-patch --stdout -1 f5f1654 | git am -3 ~/patch
  [same errors as you]
  $ git am --abort
  Unstaged changes after reset:
  M       sound/usb/midi.c

So I wonder if there is something in the way your working tree is
modified. Can you give more details?

> Reproducible using v1.8.1-rc3 and master.  Bisects to 4337b585 (do not
> write null sha1s to on-disk index, 2012-07-28).  For comparison, older
> gits produced
> 
>   $ git am --abort
>   Unstaged changes after reset:
>   M       sound/usb/midi.c

What does your index look like afterwards? Does it have a null sha1 in
it (check "ls-files -s")?

-Peff

^ permalink raw reply

* Re: [PATCH 2/3] do not write null sha1s to on-disk index
From: Jonathan Nieder @ 2012-12-29 10:03 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Jens Lehmann, Junio C Hamano
In-Reply-To: <20120728150524.GB25269@sigill.intra.peff.net>

Hi Peff,

Jeff King wrote:

> --- a/read-cache.c
> +++ b/read-cache.c
> @@ -1800,6 +1800,8 @@ int write_index(struct index_state *istate, int newfd)
>  			continue;
>  		if (!ce_uptodate(ce) && is_racy_timestamp(istate, ce))
>  			ce_smudge_racily_clean_entry(ce);
> +		if (is_null_sha1(ce->sha1))
> +			return error("cache entry has null sha1: %s", ce->name);
>  		if (ce_write_entry(&c, newfd, ce, previous_name) < 0)
>  			return -1;

Quick heads up: this is tripping for me in the "git am --abort"
codepath:

  $ cd ~/src/linux
  $ git checkout v3.2.35
  $ git am -3sc /tmp/alsa-usb-audio-fix-missing-autopm-for-midi-input.patch 
  Applying: ALSA: usb-audio: Fix missing autopm for MIDI input
  Using index info to reconstruct a base tree...
  Falling back to patching base and 3-way merge...
  error: Your local changes to the following files would be overwritten by merge:
	sound/usb/midi.c
  Please, commit your changes or stash them before you can merge.
  Aborting
  Failed to merge in the changes.
  Patch failed at 0001 ALSA: usb-audio: Fix missing autopm for MIDI input
  When you have resolved this problem run "git am --resolved".
  If you would prefer to skip this patch, instead run "git am --skip".
  To restore the original branch and stop patching run "git am --abort".
  $ git am --abort
  error: cache entry has null sha1: sound/usb/midi.c
  fatal: unable to write new index file
  Unstaged changes after reset:
  M       sound/usb/midi.c
  $

Reproducible using v1.8.1-rc3 and master.  Bisects to 4337b585 (do not
write null sha1s to on-disk index, 2012-07-28).  For comparison, older
gits produced

  $ git am --abort
  Unstaged changes after reset:
  M       sound/usb/midi.c

which is also not great but at least didn't involve any obviously
invalid behavior.  Known problem?

Thanks,
Jonathan

^ permalink raw reply

* Re: [RFC] pack-objects: compression level for non-blobs
From: Jeff King @ 2012-12-29  9:48 UTC (permalink / raw)
  To: Nguyen Thai Ngoc Duy; +Cc: David Michael Barr, Git Mailing List
In-Reply-To: <20121229090558.GA31291@sigill.intra.peff.net>

On Sat, Dec 29, 2012 at 04:05:58AM -0500, Jeff King wrote:

> On Sat, Dec 29, 2012 at 12:27:47AM -0500, Jeff King wrote:
> 
> > > I think I tried the partial decompression for commit header and it did
> > > not help much (or I misremember it, not so sure).
> > 
> > I'll see if I can dig up the reference, as it was something I was going
> > to look at next.
> 
> I tried the simple patch below, but it actually made things slower!  I'm
> assuming it is because the streaming setup is not micro-optimized very
> well. A custom read_sha1_until_blank_line() could probably do better.

Something like the patch below, which does speed things up. But not
nearly as much as I'd hoped:

  [before]
  $ best-of-five git rev-list --count --all
  real    0m4.197s
  user    0m4.112s
  sys     0m0.072s

  [after]
  $ best-of-five git rev-list --count --all
  real    0m3.782s
  user    0m3.708s
  sys     0m0.064s

Only about a 10% speedup (versus ~75% with uncompressed commits).

---
diff --git a/cache.h b/cache.h
index 18fdd18..a494d3b 100644
--- a/cache.h
+++ b/cache.h
@@ -724,6 +724,7 @@ int offset_1st_component(const char *path);
 
 /* object replacement */
 #define READ_SHA1_FILE_REPLACE 1
+#define READ_SHA1_FILE_HEADER 2
 extern void *read_sha1_file_extended(const unsigned char *sha1, enum object_type *type, unsigned long *size, unsigned flag);
 static inline void *read_sha1_file(const unsigned char *sha1, enum object_type *type, unsigned long *size)
 {
@@ -1059,7 +1060,7 @@ extern int is_pack_valid(struct packed_git *);
 extern off_t nth_packed_object_offset(const struct packed_git *, uint32_t);
 extern off_t find_pack_entry_one(const unsigned char *, struct packed_git *);
 extern int is_pack_valid(struct packed_git *);
-extern void *unpack_entry(struct packed_git *, off_t, enum object_type *, unsigned long *);
+extern void *unpack_entry(struct packed_git *, off_t, enum object_type *, unsigned long *, int);
 extern unsigned long unpack_object_header_buffer(const unsigned char *buf, unsigned long len, enum object_type *type, unsigned long *sizep);
 extern unsigned long get_size_from_delta(struct packed_git *, struct pack_window **, off_t);
 extern int unpack_object_header(struct packed_git *, struct pack_window **, off_t *, unsigned long *);
diff --git a/commit.c b/commit.c
index e8eb0ae..0a088dc 100644
--- a/commit.c
+++ b/commit.c
@@ -312,12 +312,13 @@ int parse_commit(struct commit *item)
 	void *buffer;
 	unsigned long size;
 	int ret;
+	int flags = save_commit_buffer ? 0 : READ_SHA1_FILE_HEADER;
 
 	if (!item)
 		return -1;
 	if (item->object.parsed)
 		return 0;
-	buffer = read_sha1_file(item->object.sha1, &type, &size);
+	buffer = read_sha1_file_extended(item->object.sha1, &type, &size, flags);
 	if (!buffer)
 		return error("Could not read %s",
 			     sha1_to_hex(item->object.sha1));
diff --git a/fast-import.c b/fast-import.c
index c2a814e..a140d57 100644
--- a/fast-import.c
+++ b/fast-import.c
@@ -1303,7 +1303,7 @@ static void *gfi_unpack_entry(
 		 */
 		p->pack_size = pack_size + 20;
 	}
-	return unpack_entry(p, oe->idx.offset, &type, sizep);
+	return unpack_entry(p, oe->idx.offset, &type, sizep, 0);
 }
 
 static const char *get_mode(const char *str, uint16_t *modep)
diff --git a/pack-check.c b/pack-check.c
index 63a595c..e4a43c0 100644
--- a/pack-check.c
+++ b/pack-check.c
@@ -116,7 +116,7 @@ static int verify_packfile(struct packed_git *p,
 					    sha1_to_hex(entries[i].sha1),
 					    p->pack_name, (uintmax_t)offset);
 		}
-		data = unpack_entry(p, entries[i].offset, &type, &size);
+		data = unpack_entry(p, entries[i].offset, &type, &size, 0);
 		if (!data)
 			err = error("cannot unpack %s from %s at offset %"PRIuMAX"",
 				    sha1_to_hex(entries[i].sha1), p->pack_name,
diff --git a/sha1_file.c b/sha1_file.c
index 40b2329..1f1f31a 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -1469,16 +1469,19 @@ static void *unpack_sha1_file(void *map, unsigned long mapsize, enum object_type
 	return *hdr ? -1 : type_from_string(type);
 }
 
-static void *unpack_sha1_file(void *map, unsigned long mapsize, enum object_type *type, unsigned long *size, const unsigned char *sha1)
+static void *unpack_sha1_file(void *map, unsigned long mapsize, enum object_type *type, unsigned long *size, const unsigned char *sha1, int stop_at_blank)
 {
 	int ret;
 	git_zstream stream;
-	char hdr[8192];
+	char hdr[512];
 
 	ret = unpack_sha1_header(&stream, map, mapsize, hdr, sizeof(hdr));
 	if (ret < Z_OK || (*type = parse_sha1_header(hdr, size)) < 0)
 		return NULL;
-
+	if (stop_at_blank && strstr(hdr, "\n\n")) {
+		*size = strlen(hdr);
+		return xstrdup(hdr);
+	}
 	return unpack_sha1_rest(&stream, hdr, *size, sha1);
 }
 
@@ -1667,8 +1670,11 @@ static void *unpack_compressed_entry(struct packed_git *p,
 static void *unpack_compressed_entry(struct packed_git *p,
 				    struct pack_window **w_curs,
 				    off_t curpos,
-				    unsigned long size)
+				    unsigned long *sizep,
+				    int stop_at_blank)
 {
+	static const int chunk_size = 256;
+	unsigned long size = *sizep;
 	int st;
 	git_zstream stream;
 	unsigned char *buffer, *in;
@@ -1676,15 +1682,27 @@ static void *unpack_compressed_entry(struct packed_git *p,
 	buffer = xmallocz(size);
 	memset(&stream, 0, sizeof(stream));
 	stream.next_out = buffer;
-	stream.avail_out = size + 1;
+
+	if (stop_at_blank)
+		stream.avail_out = chunk_size;
+	else
+		stream.avail_out = size + 1;
 
 	git_inflate_init(&stream);
 	do {
 		in = use_pack(p, w_curs, curpos, &stream.avail_in);
 		stream.next_in = in;
 		st = git_inflate(&stream, Z_FINISH);
-		if (!stream.avail_out)
-			break; /* the payload is larger than it should be */
+		if (!stream.avail_out) {
+			if (!stop_at_blank)
+				break; /* the payload is larger than it should be */
+			if (memmem(buffer, chunk_size, "\n\n", 2)) {
+				git_inflate_end(&stream);
+				*sizep = chunk_size;
+				return buffer;
+			}
+			stream.avail_out = size + 1 - chunk_size;
+		}
 		curpos += stream.next_in - in;
 	} while (st == Z_OK || st == Z_BUF_ERROR);
 	git_inflate_end(&stream);
@@ -1731,7 +1749,8 @@ static void *cache_or_unpack_entry(struct packed_git *p, off_t base_offset,
 }
 
 static void *cache_or_unpack_entry(struct packed_git *p, off_t base_offset,
-	unsigned long *base_size, enum object_type *type, int keep_cache)
+	unsigned long *base_size, enum object_type *type, int keep_cache,
+	int stop_at_blank)
 {
 	void *ret;
 	unsigned long hash = pack_entry_hash(p, base_offset);
@@ -1739,9 +1758,9 @@ static void *cache_or_unpack_entry(struct packed_git *p, off_t base_offset,
 
 	ret = ent->data;
 	if (!ret || ent->p != p || ent->base_offset != base_offset)
-		return unpack_entry(p, base_offset, type, base_size);
+		return unpack_entry(p, base_offset, type, base_size, stop_at_blank);
 
-	if (!keep_cache) {
+	if (!stop_at_blank && !keep_cache) {
 		ent->data = NULL;
 		ent->lru.next->prev = ent->lru.prev;
 		ent->lru.prev->next = ent->lru.next;
@@ -1810,7 +1829,7 @@ static void *read_object(const unsigned char *sha1, enum object_type *type,
 }
 
 static void *read_object(const unsigned char *sha1, enum object_type *type,
-			 unsigned long *size);
+			 unsigned long *size, int stop_at_blank);
 
 static void *unpack_delta_entry(struct packed_git *p,
 				struct pack_window **w_curs,
@@ -1832,7 +1851,7 @@ static void *unpack_delta_entry(struct packed_git *p,
 		return NULL;
 	}
 	unuse_pack(w_curs);
-	base = cache_or_unpack_entry(p, base_offset, &base_size, type, 0);
+	base = cache_or_unpack_entry(p, base_offset, &base_size, type, 0, 0);
 	if (!base) {
 		/*
 		 * We're probably in deep shit, but let's try to fetch
@@ -1851,12 +1870,12 @@ static void *unpack_delta_entry(struct packed_git *p,
 		      sha1_to_hex(base_sha1), (uintmax_t)base_offset,
 		      p->pack_name);
 		mark_bad_packed_object(p, base_sha1);
-		base = read_object(base_sha1, type, &base_size);
+		base = read_object(base_sha1, type, &base_size, 0);
 		if (!base)
 			return NULL;
 	}
 
-	delta_data = unpack_compressed_entry(p, w_curs, curpos, delta_size);
+	delta_data = unpack_compressed_entry(p, w_curs, curpos, &delta_size, 0);
 	if (!delta_data) {
 		error("failed to unpack compressed delta "
 		      "at offset %"PRIuMAX" from %s",
@@ -1895,7 +1914,8 @@ void *unpack_entry(struct packed_git *p, off_t obj_offset,
 int do_check_packed_object_crc;
 
 void *unpack_entry(struct packed_git *p, off_t obj_offset,
-		   enum object_type *type, unsigned long *sizep)
+		   enum object_type *type, unsigned long *sizep,
+		   int stop_at_blank)
 {
 	struct pack_window *w_curs = NULL;
 	off_t curpos = obj_offset;
@@ -1929,7 +1949,8 @@ void *unpack_entry(struct packed_git *p, off_t obj_offset,
 	case OBJ_TREE:
 	case OBJ_BLOB:
 	case OBJ_TAG:
-		data = unpack_compressed_entry(p, &w_curs, curpos, *sizep);
+		data = unpack_compressed_entry(p, &w_curs, curpos, sizep,
+					       stop_at_blank);
 		break;
 	default:
 		data = NULL;
@@ -2208,14 +2229,15 @@ static void *read_packed_sha1(const unsigned char *sha1,
 }
 
 static void *read_packed_sha1(const unsigned char *sha1,
-			      enum object_type *type, unsigned long *size)
+			      enum object_type *type, unsigned long *size,
+			      int stop_at_blank)
 {
 	struct pack_entry e;
 	void *data;
 
 	if (!find_pack_entry(sha1, &e))
 		return NULL;
-	data = cache_or_unpack_entry(e.p, e.offset, size, type, 1);
+	data = cache_or_unpack_entry(e.p, e.offset, size, type, 1, stop_at_blank);
 	if (!data) {
 		/*
 		 * We're probably in deep shit, but let's try to fetch
@@ -2226,7 +2248,7 @@ static void *read_packed_sha1(const unsigned char *sha1,
 		error("failed to read object %s at offset %"PRIuMAX" from %s",
 		      sha1_to_hex(sha1), (uintmax_t)e.offset, e.p->pack_name);
 		mark_bad_packed_object(e.p, sha1);
-		data = read_object(sha1, type, size);
+		data = read_object(sha1, type, size, stop_at_blank);
 	}
 	return data;
 }
@@ -2255,7 +2277,7 @@ static void *read_object(const unsigned char *sha1, enum object_type *type,
 }
 
 static void *read_object(const unsigned char *sha1, enum object_type *type,
-			 unsigned long *size)
+			 unsigned long *size, int stop_at_blank)
 {
 	unsigned long mapsize;
 	void *map, *buf;
@@ -2268,17 +2290,18 @@ static void *read_object(const unsigned char *sha1, enum object_type *type,
 		return xmemdupz(co->buf, co->size);
 	}
 
-	buf = read_packed_sha1(sha1, type, size);
+	buf = read_packed_sha1(sha1, type, size, stop_at_blank);
 	if (buf)
 		return buf;
 	map = map_sha1_file(sha1, &mapsize);
 	if (map) {
-		buf = unpack_sha1_file(map, mapsize, type, size, sha1);
+		buf = unpack_sha1_file(map, mapsize, type, size, sha1,
+				       stop_at_blank);
 		munmap(map, mapsize);
 		return buf;
 	}
 	reprepare_packed_git();
-	return read_packed_sha1(sha1, type, size);
+	return read_packed_sha1(sha1, type, size, stop_at_blank);
 }
 
 /*
@@ -2296,9 +2319,10 @@ void *read_sha1_file_extended(const unsigned char *sha1,
 	const struct packed_git *p;
 	const unsigned char *repl = (flag & READ_SHA1_FILE_REPLACE)
 		? lookup_replace_object(sha1) : sha1;
+	int stop_at_blank = !!(flag & READ_SHA1_FILE_HEADER);
 
 	errno = 0;
-	data = read_object(repl, type, size);
+	data = read_object(repl, type, size, stop_at_blank);
 	if (data)
 		return data;
 
@@ -2597,7 +2621,7 @@ int force_object_loose(const unsigned char *sha1, time_t mtime)
 
 	if (has_loose_object(sha1))
 		return 0;
-	buf = read_packed_sha1(sha1, &type, &len);
+	buf = read_packed_sha1(sha1, &type, &len, 0);
 	if (!buf)
 		return error("cannot read sha1_file for %s", sha1_to_hex(sha1));
 	hdrlen = sprintf(hdr, "%s %lu", typename(type), len) + 1;

^ permalink raw reply related

* Re: [RFC] pack-objects: compression level for non-blobs
From: Jeff King @ 2012-12-29  9:05 UTC (permalink / raw)
  To: Nguyen Thai Ngoc Duy; +Cc: David Michael Barr, Git Mailing List
In-Reply-To: <20121229052747.GA14928@sigill.intra.peff.net>

On Sat, Dec 29, 2012 at 12:27:47AM -0500, Jeff King wrote:

> > I think I tried the partial decompression for commit header and it did
> > not help much (or I misremember it, not so sure).
> 
> I'll see if I can dig up the reference, as it was something I was going
> to look at next.

I tried the simple patch below, but it actually made things slower!  I'm
assuming it is because the streaming setup is not micro-optimized very
well. A custom read_sha1_until_blank_line() could probably do better.

diff --git a/commit.c b/commit.c
index e8eb0ae..efd6c06 100644
--- a/commit.c
+++ b/commit.c
@@ -8,6 +8,7 @@
 #include "notes.h"
 #include "gpg-interface.h"
 #include "mergesort.h"
+#include "streaming.h"
 
 static struct commit_extra_header *read_commit_extra_header_lines(const char *buf, size_t len, const char **);
 
@@ -306,6 +307,39 @@ int parse_commit_buffer(struct commit *item, const void *buffer, unsigned long s
 	return 0;
 }
 
+static void *read_commit_header(const unsigned char *sha1,
+				enum object_type *type,
+				unsigned long *size)
+{
+	static const int chunk_size = 256;
+	struct strbuf buf = STRBUF_INIT;
+	struct git_istream *st;
+
+	st = open_istream(sha1, type, size, NULL);
+	if (!st)
+		return NULL;
+	while (1) {
+		size_t start = buf.len;
+		ssize_t readlen;
+
+		strbuf_grow(&buf, chunk_size);
+		readlen = read_istream(st, buf.buf + start, chunk_size);
+		buf.buf[start + readlen + 1] = '\0';
+		buf.len += readlen;
+
+		if (readlen < 0) {
+			close_istream(st);
+			strbuf_release(&buf);
+			return NULL;
+		}
+		if (!readlen || strstr(buf.buf + start, "\n\n"))
+			break;
+	}
+
+	close_istream(st);
+	return strbuf_detach(&buf, size);
+}
+
 int parse_commit(struct commit *item)
 {
 	enum object_type type;
@@ -317,7 +351,11 @@ int parse_commit(struct commit *item)
 		return -1;
 	if (item->object.parsed)
 		return 0;
-	buffer = read_sha1_file(item->object.sha1, &type, &size);
+
+	if (!save_commit_buffer)
+		buffer = read_commit_header(item->object.sha1, &type, &size);
+	else
+		buffer = read_sha1_file(item->object.sha1, &type, &size);
 	if (!buffer)
 		return error("Could not read %s",
 			     sha1_to_hex(item->object.sha1));

^ permalink raw reply related

* Re: Lockless Refs?
From: Jeff King @ 2012-12-29  8:16 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Martin Fick, Michael Haggerty, git, Shawn Pearce
In-Reply-To: <7vhan6jdx3.fsf@alter.siamese.dyndns.org>

On Fri, Dec 28, 2012 at 09:15:52AM -0800, Junio C Hamano wrote:

> Martin Fick <mfick@codeaurora.org> writes:
> 
> > Hmm, actually I believe that with a small modification to the 
> > semantics described here it would be possible to make multi 
> > repo/branch commits work....
> >
> > Shawn talked about adding multi repo/branch transaction 
> > semantics to jgit, this might be something that git wants to 
> > support also at some point?
> 
> Shawn may have talked about it and you may have listened to it, but
> others wouldn't have any idea what kind of "multi repo/branch
> transaction" you are talking about.  Is it about "I want to push
> this ref to that repo and push this other ref to that other repo",
> in what situation will it be used/useful, what are the failure
> modes, what are failure tolerances by the expected use cases, ...?
> 
> Care to explain?

I cannot speak for Martin, but I am assuming the point is to atomically
update 2 (or more) refs on the same repo. That is, if I have a branch
"refs/heads/foo" and a ref pointing to meta-information (say, notes
about commits in foo, in "refs/notes/meta/foo"), I would want to "git
push" them, and only update them if _both_ will succeed, and otherwise
fail and update nothing.

I think Shawn mentioned this at the last GitTogether as a stumbling
block for pushing more of Gerrit's meta-information as refs over the git
protocol. But I might be mis-remembering.

-Peff

^ permalink raw reply

* Re: Lockless Refs?  (Was [PATCH] refs: do not use cached refs in repack_without_ref)
From: Jeff King @ 2012-12-29  8:12 UTC (permalink / raw)
  To: Martin Fick; +Cc: Michael Haggerty, git, Junio C Hamano, Shawn Pearce
In-Reply-To: <201212280750.14695.mfick@codeaurora.org>

On Fri, Dec 28, 2012 at 07:50:14AM -0700, Martin Fick wrote:

> Hmm, actually I believe that with a small modification to the 
> semantics described here it would be possible to make multi 
> repo/branch commits work.   Simply allow the ref filename to 
> be locked by a transaction by appending the transaction ID to 
> the filename.  So if transaction 123 wants to lock master 
> which points currently to abcde, then it will move 
> master/abcde to master/abcde_123.  If transaction 123 is 
> designed so that any process can commit/complete/abort it 
> without requiring any locks which can go stale, then this ref 
> lock will never go stale either (easy as long as it writes 
> all its proposed updates somewhere upfront and has atomic 
> semantics for starting, committing and aborting).  On commit, 
> the ref lock gets updated to its new value: master/newsha and 
> on abort it gets unlocked: master/abcde.

Hmm. I thought our goal was to avoid locks? Isn't this just locking by
another name?

I guess your point is to have no locks in the "normal" case, and have
locked transactions as an optional add-on?

-Peff

^ permalink raw reply

* Re: Lockless Refs?  (Was [PATCH] refs: do not use cached refs in repack_without_ref)
From: Jeff King @ 2012-12-29  8:10 UTC (permalink / raw)
  To: Martin Fick; +Cc: Michael Haggerty, git, Junio C Hamano
In-Reply-To: <201212271611.52203.mfick@codeaurora.org>

On Thu, Dec 27, 2012 at 04:11:51PM -0700, Martin Fick wrote:

> For a single user repo this is not a big deal, the lock can 
> always be cleaned up manually (and it is a rare occurrence).  
> However, in a multi user server environment, possibly even 
> from multiple hosts over a shared filesystem such as NFS, 
> stale locks could lead to serious downtime and risky recovery 
> (since it is currently hard to figure out if a lock really is 
> stale).  Even though stale locks are probably rare even today 
> in the larger shared repo case, as git scales to even larger 
> shared repositories, this will eventually become more of a 
> problem *1.  Naturally, this has me thinking that git should 
> possibly consider moving towards a lockless design for refs 
> in the long term.

FWIW, I am involved in cleaning up stale locks for a very large git
hosting site. It actually happens surprisingly little. I think it is
mostly because git holds actual locks for a very short period of time
(just enough to check that the value is unchanged from when we started a
lengthy operation, and then atomically write the new value).

So I agree it would be cool (and maybe open up new realms of
scalability) for git to be lockless, but in my experience, this isn't
that pressing a problem (and any solutions are not going to be backwards
compatible, so there is going to be a high deployment cost).

> My idea is based on using filenames to store sha1s instead of 
> file contents.  To do this, the sha1 one of a ref would be 
> stored in a file in a directory named after the loose ref.  I 
> believe this would then make it possible to have lockless 
> atomic ref updates by renaming the file.
> 
> To more fully illustrate the idea, imagine that any file 
> (except for the null file) in the directory will represent the 
> value of the ref with its name, then the following 
> transitions can represent atomic state changes to a refs 
> value and existence:

Hmm. So basically you are relying on atomic rename() to move the value
around within a directory, rather than using write to move it around
within a file. Atomic rename is usually something we have on local
filesystems (and I think we rely on it elsewhere). Though I would not be
surprised if it is not atomic on all networked filesystems (though it is
on NFS, at least).

> 1) To update the value from a known value to a new value 
> atomically, simply rename the file to the new value.  This 
> operation should only succeed if the file exists and is still 
> named old value before the rename.  This should even be 
> faster than today's approach, especially on remote filesystems 
> since it would require only 1 round trip in the success case 
> instead of 3!

OK. Makes sense.

> 2) To delete the ref, simply delete the filename representing 
> the current value of the ref.  This ensures that you are 
> deleting the ref from a specific value.  I am not sure if git 
> needs to be able to delete refs without knowing their values?  
> If so, this would require reading the value and looping until 
> the delete succeeds, this may be a bit slow for a constantly 
> updated ref, but likely a rare situation (and not likely 
> worse than trying to acquire the ref-lock today).  Overall, 
> this again would likely be faster than today's approach.

We do sometimes delete without knowing the value. In most cases we would
not want to do this, but for some "force"-type commands, we do. You
would actually have the same problem with updating above, as we
sometimes update with the intent to overwrite whatever is there.

> 3) To create a ref, it must be renamed from the null file (sha 
> 0000...) to the new value just as if it were being updated 
> from any other value, but there is one extra condition: 
> before renaming the null file, a full directory scan must be 
> done to ensure that the null file is the only file in the 
> directory (this condition exists because creating the 
> directory and null file cannot be atomic unless the filesystem 
> supports atomic directory renames, an expectation git does 
> not currently make).  I am not sure how this compares to 
> today's approach, but including the setup costs (described 
> below), I suspect it is slower.

Hmm. mkdir is atomic. So wouldn't it be sufficient to just mkdir and
create the correct sha1 file?  A simultaneous creator would fail on the
mkdir and abort. A simultaneous reader might see the directory, but it
would either see it as empty, or with the correct file. In the former
case, it would treat that the same as if the directory did not exist.

Speaking of which, you did not cover reading at all, but it would have
to be:

  dh = opendir(ref);
  if (!dh) {
          if (errno == ENOENT)
                  return 0; /* no such ref */
          else
                  return error("couldn't read ref");
  }

  while ((ent = readdir(dh)) {
          if (ent->d_name[0] == '.')
                  /*
                   * skip "." and "..", and leave room for annotating 
                   * refs via dot-files
                   */
                   continue;
          /* otherwise, we found it */
          if (get_sha1_hex(ent->d_name, sha1) < 0)
                  return error("weird junk in ref dir?");
          return 1; /* found it */
  }
  return 0; /* did not contain an entry; ref being created? Retry? */

Is readdir actually atomic with respect to directory updates? That is,
if I am calling readdir() and somebody else is renaming, what do I get?
POSIX says:

   If a file is removed from or added to the directory after the most
   recent call to opendir() or rewinddir(), whether a subsequent call to
   readdir() returns an entry for that file is unspecified.

If I get one or the other file (that is, the old name or the new one),
it is OK. It does not matter which, as it is a race whether I see the
old value or the new one during an update. But according to POSIX, it is
possible that I may see neither.

I suppose we could rewinddir() and retry. We might hit the race again
(if somebody else is updating quickly), but realistically, this will
happen very infrequently, and we can just keep trying until we win the
race and get a valid read.

> I don't know how this new scheme could be made to work with 
> the current scheme, it seems like perhaps new git releases 
> could be made to understand both the old and the new, and a 
> config option could be used to tell it which method to write 
> new refs with.  Since in this new scheme ref directory names 
> would conflict with old ref filenames, this would likely 
> prevent both schemes from erroneously being used 
> simultaneously (so they shouldn't corrupt each other), except 
> for the fact that refs can be nested in directories which 
> confuses things a bit.  I am not sure what a good solution to 
> this is?

I think you would need to bump core.repositoryformatversion, and just
never let old versions of git access the repository directly. Not the
end of the world, but it certainly increases deployment effort. If we
were going to do that, it would probably make sense to think about
solving the D/F conflict issues at the same time (i.e., start calling
"refs/heads/foo" in the filesystem "refs.d/heads.d/foo.ref" so that it
cannot conflict with "refs.d/heads.d/foo.d/bar.ref").

-Peff

^ permalink raw reply

* Re: [PATCH v2] wt-status: Show ignored files in untracked dirs
From: Jeff King @ 2012-12-29  7:22 UTC (permalink / raw)
  To: Antoine Pelisse; +Cc: Junio C Hamano, git
In-Reply-To: <CALWbr2yNR=K3MqBVe=sfSxPaJ+-A8utHBqoiHPHPLxr_9e9SVQ@mail.gmail.com>

On Fri, Dec 28, 2012 at 03:05:30PM +0100, Antoine Pelisse wrote:

> Using the example from Michael's mail, I end up having this:
> $ git status --porcelain --ignored
> ?? .gitignore
> ?? x
> ?? y/
> !! x.ignore-me
> !! y/
> 
> y/ is referred as untracked, because it contains untracked files, and
> then as ignored because it
> contains ignored files.
> 
> Showing it twice doesn't feel right though, so I guess we should only
> show "?? y/" with untracked=normal,
> and "!! y/foo.ignore-me" when using untracked=all
> 
> What do you think ?

Good catch. I agree that showing just "?? y/" in the untracked=normal
case makes sense. It makes the definition of "!!" to mean "all untracked
files in this path are ignored". IOW, showing "??" means there are one
or more untracked, unignored files. There may _also_ be ignored files,
but we do not say (nor we even necessarily need to bother checking).

In retrospect, I think it might have made more sense to use the second
character of an untracked line to represent "ignored". That is, the
output:

  ?? .gitignore
  ?? x
  ?! y/
  !! x.ignore-me

would make sense to me. But that would be a backwards-incompatible
change at this point, and I don't think it's worth it.

-Peff

^ permalink raw reply

* Re: [PATCH] refs: do not use cached refs in repack_without_ref
From: Jeff King @ 2012-12-29  7:16 UTC (permalink / raw)
  To: Michael Haggerty; +Cc: git, Junio C Hamano
In-Reply-To: <50DAB447.8000101@alum.mit.edu>

On Wed, Dec 26, 2012 at 09:24:39AM +0100, Michael Haggerty wrote:

> I'm sorry to take so long to respond to this patch.  Thank you for
> tracking down this bug and for your careful analysis.
> 
> I think your patch is correct and should fix the first race condition
> that you described.

Thanks for checking it over. I almost cc'd you, as I know you have been
working on ref caching. But I think that the problem is much older, as
we always cached the packed-refs list in memory.

> But I think the continued existence of the other race conditions is an
> indication of a fundamental problem with the reference locking
> policy--independent of the in-RAM reference cache.
> 
> The tacit assumption of the current locking policy is that changes to
> the packed-refs file are not critical for correctness, because loose
> references take precedence over it anyway.  This is true for adding and
> modifying references.  But it is not true for deleting references,
> because there is no way for a deletion to be represented as a loose
> reference in a way that takes precedence over the packed-refs file
> (i.e., there is no way for a loose reference to say "I am deleted,
> regardless of what packed-refs says").  Thus the race conditions for
> deleting references, whether via delete_ref() or via pack_refs() with
> --prune.

Yeah. It would be much nicer if we could just write the null sha1 or a
similar sentinel into the loose ref for "I am deleted". The problem
(besides backwards compatibility) is the usual D/F conflict. I cannot
delete "refs/heads/foo" and then create "refs/heads/foo/bar" if the old
ref file is in the way.

> There is a problem if two processes try to delete a reference at the
> same time, or if one process tries to delete a reference at the same
> time as another process is trying to pack the references.  The reason is
> that there is no "transaction" that spans both the rewriting of the
> packed-refs file and also the deletion of the loose-ref files, and
> therefore it is possible for conflicting changes to be made in the two
> locations.

Just to be clear, are you talking about the races I wrote about in my
other email? Or are there other races? I didn't (and still don't) see
any actual on-disk data loss races. Just ones where a reader may get an
old, packed value (which is still bad, but is less bad than one where a
write is lost).

> I think that all of the problems would be fixed if a lock would be held
> on the packed-refs file during the whole process of deleting any
> reference; i.e., change the algorithms to:

Yeah, I looked at that, too. In fact, before I had correctly analyzed
the problem, I thought that doing so would solve the problem I was
seeing (which I noticed was wrong when it did not pass my tests :) ).

>From your description, I think you realize this, but I want to point out
for other readers: just updating the pack_refs side to call prune_refs
under the lock would create a deadlock with a simultaneous delete (which
takes the individual ref lock first, then the packed-refs lock). Of
course, I do not think git is capable of deadlock, as we typically just
die() instead of blocking on a lock. So maybe it does not matter so
much. :)

> * Delete reference foo:
> 
>   1. Acquire the lock $GIT_DIR/packed-refs.lock (regardless of whether
>      "foo" is a packed ref)

This suffers from the same problem I mentioned in my earlier email: we
create lock contention on packed-refs.lock when there are two
simultaneous deletes, even when the refs aren't packed. That might be an
acceptable tradeoff, though. It's only for deletion, not for update,
which is presumably rare. And it has to happen anyway when both refs are
packed.

>   2. Write to $GIT_DIR/packed-refs.new a version of the packed-refs
>      file that omits "foo"
> [...]

All of the further steps make sense. By deleting from packed-refs first,
we eliminate the read race-condition I mentioned in my earlier email.
The only downside is the possible increased lock contention on
packed-refs.lock.  I'm very tempted to go this route. It's better to be
correct and sometimes die on lock contention than to sometimes give the
wrong answer.

> I would appreciate a critique of my analysis.  Even if you agree, I
> think it would be OK to apply Peff's patch to fix up the most pressing
> problem, then implement the more complete solution later.

I think your analysis is correct, modulo the issue I mentioned. And I
agree that my patch is a good interim, as it fixes the worst case
(losing writes).

-Peff

^ permalink raw reply

* Re: [RFC] pack-objects: compression level for non-blobs
From: Jeff King @ 2012-12-29  5:27 UTC (permalink / raw)
  To: Nguyen Thai Ngoc Duy; +Cc: David Michael Barr, Git Mailing List
In-Reply-To: <CACsJy8AN3y_4wcZ_w0zz+ZAaDasRT-+h8vA_fp2j4+FL00dbLw@mail.gmail.com>

On Sat, Dec 29, 2012 at 12:25:04PM +0700, Nguyen Thai Ngoc Duy wrote:

> > But just dropping the compression (or doing partial
> > decompression when we only care about the beginning part) is way less
> > code and complexity.
> 
> I think I tried the partial decompression for commit header and it did
> not help much (or I misremember it, not so sure).

I'll see if I can dig up the reference, as it was something I was going
to look at next.

> > There's no cache to manage.
> 
> If reachability bitmap is implemented, we'll have per-pack cache
> infrastructure ready, so less work there for commit cache.

True. I don't want to dissuade you from doing any commit cache work. I
only wanted to point out that this alternative may have merit because of
its simplicity (so we can use it until a caching solution exists, or
even after, if managing the cache has downsides).

-Peff

^ permalink raw reply

* Re: [RFC] pack-objects: compression level for non-blobs
From: Nguyen Thai Ngoc Duy @ 2012-12-29  5:25 UTC (permalink / raw)
  To: Jeff King; +Cc: David Michael Barr, Git Mailing List
In-Reply-To: <20121229050707.GA14475@sigill.intra.peff.net>

On Sat, Dec 29, 2012 at 12:07 PM, Jeff King <peff@peff.net> wrote:
> On Sat, Dec 29, 2012 at 11:34:09AM +0700, Nguyen Thai Ngoc Duy wrote:
>
>> On Sat, Dec 29, 2012 at 7:41 AM, Jeff King <peff@peff.net> wrote:
>> > I wonder if we could do even better, though. For a traversal, we only
>> > need to look at the commit header. We could potentially do a progressive
>> > inflate and stop before getting to the commit message (which is the bulk
>> > of the data, and the part that is most likely to benefit from
>> > compression).
>>
>> Commit cache should solve this efficiently as it also eliminates
>> parsing cost. We discussed this last time as a side topic of the
>> reachability bitmap feature.
>
> I agree that a commit cache would solve this (though it can not help the
> tree traversal).

Yeah, caching trees efficiently is not easy.

> But just dropping the compression (or doing partial
> decompression when we only care about the beginning part) is way less
> code and complexity.

I think I tried the partial decompression for commit header and it did
not help much (or I misremember it, not so sure).

> There's no cache to manage.

If reachability bitmap is implemented, we'll have per-pack cache
infrastructure ready, so less work there for commit cache.
-- 
Duy

^ permalink raw reply

* Re: [RFC] pack-objects: compression level for non-blobs
From: Jeff King @ 2012-12-29  5:07 UTC (permalink / raw)
  To: Nguyen Thai Ngoc Duy; +Cc: David Michael Barr, Git Mailing List
In-Reply-To: <CACsJy8D_E0shqJAvZH7xqij6F4a6qUxkUPNcZL=0yX5w9bLd_g@mail.gmail.com>

On Sat, Dec 29, 2012 at 11:34:09AM +0700, Nguyen Thai Ngoc Duy wrote:

> On Sat, Dec 29, 2012 at 7:41 AM, Jeff King <peff@peff.net> wrote:
> > I wonder if we could do even better, though. For a traversal, we only
> > need to look at the commit header. We could potentially do a progressive
> > inflate and stop before getting to the commit message (which is the bulk
> > of the data, and the part that is most likely to benefit from
> > compression).
> 
> Commit cache should solve this efficiently as it also eliminates
> parsing cost. We discussed this last time as a side topic of the
> reachability bitmap feature.

I agree that a commit cache would solve this (though it can not help the
tree traversal). But just dropping the compression (or doing partial
decompression when we only care about the beginning part) is way less
code and complexity. There's no cache to manage.

-Peff

^ permalink raw reply

* Re: [PATCH] Remove the suggestion to use parsecvs, which is currently broken.
From: Eric S. Raymond @ 2012-12-29  4:42 UTC (permalink / raw)
  To: Heiko Voigt; +Cc: git
In-Reply-To: <20121228230149.GA3575@book-mint>

Heiko Voigt <hvoigt@hvoigt.net>:
> Maybe you could add that information to the parsecvs compile
> instructions? I think just because it takes some effort to compile does
> not justify to remove this useful pointer here. When I was converting a
> legacy cvs repository this pointer would have helped me a lot.

I'm parsecvs's maintainer now.  It's not in good shape; there is at
least one other known showstopper besides the build issue.  I would
strongly prefer to direct peoples' attention away from it until I
have time to fix it and cut a release.  This is not a distant 
prospect - two or three weeks out, maybe.

The priority that is between me and fixing parsecvs is getting (a)
cvsps and git-cvsimport to a non-broken state, and (b) having a sound
test suite in place so I *know* it's in a non-broken state. As previously
discussed, I will then apply that test suite to parsecvs.

Heiko, you can speed up the process by (a) adapting your tests for
the new cvsps test code, and (b) merging the fix you wrote so cvsps
would pass the t9603 test.  

The sooner I can get that out of the way, the sooner I will be avble
to pay serious attention to parsecvs.
-- 
		<a href="http://www.catb.org/~esr/">Eric S. Raymond</a>

^ permalink raw reply

* Re: [RFC] pack-objects: compression level for non-blobs
From: Nguyen Thai Ngoc Duy @ 2012-12-29  4:34 UTC (permalink / raw)
  To: Jeff King; +Cc: David Michael Barr, Git Mailing List
In-Reply-To: <20121229004104.GA24828@sigill.intra.peff.net>

On Sat, Dec 29, 2012 at 7:41 AM, Jeff King <peff@peff.net> wrote:
> I wonder if we could do even better, though. For a traversal, we only
> need to look at the commit header. We could potentially do a progressive
> inflate and stop before getting to the commit message (which is the bulk
> of the data, and the part that is most likely to benefit from
> compression).

Commit cache should solve this efficiently as it also eliminates
parsing cost. We discussed this last time as a side topic of the
reachability bitmap feature.
-- 
Duy

^ permalink raw reply

* Re: [PATCH] Use longer alias names in subdirectory tests
From: Junio C Hamano @ 2012-12-29  3:42 UTC (permalink / raw)
  To: Aaron Schrab; +Cc: git
In-Reply-To: <1356735786-24001-1-git-send-email-aaron@schrab.com>

Aaron Schrab <aaron@schrab.com> writes:

> When testing aliases in t/t1020-subdirectory.sh use longer names so that
> they're less likely to conflict with a git-* command somewhere in the
> $PATH.

Thanks.

In the longer term we might want to rethink the way we run the tests
so that random $PATH the user has has less chance of interacting
with our tests (we had a similar topic around completion output
recently), but until that happens, I think this is a good change.

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox