Git development
 help / color / mirror / Atom feed
* Re: [PATCH v2 28/51] refs.c: rename ref_array -> ref_dir
From: Junio C Hamano @ 2011-12-13  6:37 UTC (permalink / raw)
  To: Michael Haggerty
  Cc: git, Jeff King, Drew Northup, Jakub Narebski, Heiko Voigt,
	Johan Herland, Julian Phillips
In-Reply-To: <4EE6E61F.8080405@alum.mit.edu>

Michael Haggerty <mhagger@alum.mit.edu> writes:

> Apropos testing, it is unsettling that our test suite doesn't show any
> failures after my changes.  The dir entries in extra_refs are now always
> sorted and de-duped when do_for_each_ref() is called.  Could it be that
> duplicate ".have" references never come up in the test suite?  It sounds
> like an important code path is not being tested.  In particular, I won't
> be able to test whether my fix works :-/

I doubt anybody thought of using more than one --reference while cloning
or having more than one entry in $GIT_DIR/objects/info/alternates in a
repository that is being pushed into, even though we might have tests for
simpler single --reference and single alternate cases.

As to the order of the changes, my gut feeling is that it would make it
harder to debug your series to delay the removal of "extra_ref" hackery,
as it would be a corner case that your "hierarchical" structure never has
to worry about in the end result.

Another possibility is to keep the extra_ref interface in refs.c, without
any of your changes (i.e. keep it just as a flat list, with the original
interface to append to it without any dedup and other fancy features) and
also keep the special casing of it in for_each_ref(). AFAIK, that is the
only iterator that should care about the extra refs. Thinking about it a
bit more, removal of add_extra_ref() API may be unnecessarily complex with
no real gain. For example, extra refs added in builtin/clone.c is used by
builtin/fetch-pack.c, meaning that the codepath that discovers and
remembers these extra history anchor points and the codepath that uses
them while walking the history are not localized and we would need some
sort of "extra ref API" anyway. I am leaning towards this option.

^ permalink raw reply

* Re: [PATCH v2] Update documentation for stripspace
From: Frans Klaver @ 2011-12-13  6:28 UTC (permalink / raw)
  To: Junio C Hamano, Conrad Irwin; +Cc: git
In-Reply-To: <1323728909-7847-1-git-send-email-conrad.irwin@gmail.com>

On Mon, 12 Dec 2011 23:28:29 +0100, Conrad Irwin <conrad.irwin@gmail.com>  
wrote:

>> an incomplete line, i.e. "ensures the output does not end with an
>> incomplete line by adding '\n' at the end if needed".
>
> Hmm, I'm not sure that's the best way of describing it — I've gone with:
> "add a missing '\n' to the last line if necessary.".

In most editors/IDE's I know and that support this, this is called "ensure  
new-line at end of file". I find this wording clearer than the above two  
options.

Cheers,
Frans

^ permalink raw reply

* Re: Question about commit message wrapping
From: Frans Klaver @ 2011-12-13  6:16 UTC (permalink / raw)
  To: Michael Haggerty
  Cc: Holger Hellmuth, Andrew Ardill, Jakub Narebski,
	Sidney San Martín, git
In-Reply-To: <4EE6C31C.60909@alum.mit.edu>

On Tue, 13 Dec 2011 04:14:36 +0100, Michael Haggerty  
<mhagger@alum.mit.edu> wrote:

> On 12/12/2011 11:16 PM, Frans Klaver wrote:
>> Wrapped code as in auto-wrapped? Or as in manually wrapped? Python
>> programmers have significant white space, but you can still hard wrap
>> stuff, as long as the next statement is properly indented.
>
> This is incorrect.  Python statements can only be broken across lines
> within unbalanced parenthesis (or using '\' or within a multiline
> string).  For example,
>
>     x =
>         1
>
> is a syntax error, while
>
>     y = (
>         1
>         )
>
> or
>
>     f(1,
>           2,
>       3,
>       4)
>
> are both valid.

Hm yes, my statement was quite incomplete. What you describe here is what  
I meant, and I should have taken the time to word it down properly. Thanks  
for taking the time to do so.



>
> FWIW I think automatic wrapping of commit messages is a bad idea.  I
> wrap my commit messages deliberately to make them look the way I want
> them to look.  The assumption of an 80-character display has historical
> reasons, but it is also a relatively comfortable line-width to read
> (even on wider displays).  And given that commit messages sometimes
> contain "flowable" paragraph text, sometimes code snippets, sometimes
> ASCII art, etc, no automatic wrapping will work correctly unless
> everybody agrees that commit messages must be written in some specific
> form of markup (or lightweight markup).  And I can't imagine such a
> thing ever happening.
>
> As for "future-proofing", do you really think there will be a lot of
> programming happening on mobile phones with less than 80-character-wide
> displays?  (And even my little HTC can easily fit 80 characters if I
> rotate the phone to "landscape" mode.)

Makes sense.

^ permalink raw reply

* Re: Git blame only current branch
From: Junio C Hamano @ 2011-12-13  5:47 UTC (permalink / raw)
  To: Vijay Lakshminarayanan; +Cc: Jeff King, Stephen Bash, git discussion list
In-Reply-To: <8739cpteat.fsf@gmail.com>

Vijay Lakshminarayanan <laksvij@gmail.com> writes:

> The code reads fine when there's no numeral 1 around but now it doesn't
> read well.  I think refactoring
>
>     struct commit_list *l
>
> to 
>
>     struct commit_list *lst
>
> is justified.  Thoughts?

Not justified at all.

What is "lst" and why is it not spelled "list"?  It is a disease to drop
vowels when you do not have to.

If I were to name a new variable that points at one element of a linked
list and is used to walk the list (surprise!) "element" or perhaps "elem"
for short, but in the context of that short function I honestly do not see
much need for such a naming. The variable is extremely short-lived and
there is no room for confusion.

^ permalink raw reply

* Re: [PATCH v2 28/51] refs.c: rename ref_array -> ref_dir
From: Michael Haggerty @ 2011-12-13  5:43 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git, Jeff King, Drew Northup, Jakub Narebski, Heiko Voigt,
	Johan Herland, Julian Phillips
In-Reply-To: <7v7h21xps9.fsf@alter.siamese.dyndns.org>

[-- Attachment #1: Type: text/plain, Size: 3838 bytes --]

On 12/13/2011 01:45 AM, Junio C Hamano wrote:
> mhagger@alum.mit.edu writes:
> 
>> From: Michael Haggerty <mhagger@alum.mit.edu>
>>
>> This purely textual change is in preparation for storing references
>> hierarchically, when the old ref_array structure will represent one
>> "directory" of references.  Rename functions that deal with this
>> structure analogously, and also rename the structure's "refs" member
>> to "entries".
>>
>> Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
>> ---
>>  refs.c |  166 ++++++++++++++++++++++++++++++++--------------------------------
>>  1 files changed, 83 insertions(+), 83 deletions(-)
>>
>> diff --git a/refs.c b/refs.c
>> index fe6d657..b74ef80 100644
>> --- a/refs.c
>> +++ b/refs.c
>> @@ -106,9 +106,9 @@ struct ref_value {
>>  	unsigned char peeled[20];
>>  };
>>  
>> -struct ref_array {
>> +struct ref_dir {
>>  	int nr, alloc;
>> -	struct ref_entry **refs;
>> +	struct ref_entry **entries;
>>  };
> 
> The s/refs/entries/ renaming is a sane thing to do; on the other hand, I
> somehow find the s/ref_array/ref_dir/ renaming is a short-sighted change
> and undesirable, as you are essentially declaring that "if you use this
> structure, the contents you store there are expected to be named
> hierarchically", forbidding users that want to use a simple flat array.

This data structure is not exposed outside of the module, so it only
describes the de facto current storage scheme rather than making any
promises to the outside world.

> BUT. That was an observation before I continued reading the remainder of
> the series.

> I think the above observation primarily come from my worries around the
> extra ref stuff, which by nature does not fit well with the hierarchical
> naming (they do not even have any meaningful names). Sorting or requiring
> uniqueness among them do not make any sense, let alone cutting their names
> in hierarchical boundaries.
> 
> As an alternative, it _might_ make sense to get rid of "add_extra_ref()"
> API from refs.c and make it the responsibility for its users to add their
> extra anchor points where they use for_each_ref() to find out all anchor
> points in the history from the refs.c subsystem. If we go that route, I
> fully agree that "s/ref_array/ref_dir/" renaming is the right thing to do,
> as refs.c subsystem will _only_ handle the hierarchical ref namespace and
> nothing else after such a change.

I absolutely agree; the fact that extra refs are part of the refs module
has the foul smell of expedience.  And your suggestion for changing the
situation makes sense to me as far as I can follow it.  But I don't
think I have the gumption to attack another big part of the code base
before I've even finished the changes that I still have planned for the
refs API.

If somebody else wants to volunteer to make extra_refs redundant, I
would be happy to support that person on the refs side.

Otherwise, I propose to just avoid de-duping extra refs so that my patch
series doesn't make things worse.  If it is acceptable, I would prefer
to add the fix as a patch at the end of the series, because after

    struct ref_dir: store a reference to the enclosing ref_cache

ref_dir carries around information that can be used to distinguish
between the extra_refs and other ref caches.  I think it is as easy as
the attached patch (untested).

Apropos testing, it is unsettling that our test suite doesn't show any
failures after my changes.  The dir entries in extra_refs are now always
sorted and de-duped when do_for_each_ref() is called.  Could it be that
duplicate ".have" references never come up in the test suite?  It sounds
like an important code path is not being tested.  In particular, I won't
be able to test whether my fix works :-/

Michael

-- 
Michael Haggerty
mhagger@alum.mit.edu
http://softwareswirl.blogspot.com/

[-- Attachment #2: fix-dedup.diff --]
[-- Type: text/x-patch, Size: 1358 bytes --]

diff --git a/refs.c b/refs.c
index 2d5c827..2fd3db2 100644
--- a/refs.c
+++ b/refs.c
@@ -457,7 +457,6 @@ static int is_dup_ref(const struct ref_entry *ref1, const struct ref_entry *ref2
  */
 static void sort_ref_dir(struct ref_entry *direntry)
 {
-	int i, j;
 	struct ref_entry *last = NULL;
 	struct ref_dir *dir;
 	assert(direntry->flag & REF_DIR);
@@ -468,19 +467,24 @@ static void sort_ref_dir(struct ref_entry *direntry)
 
 	qsort(dir->entries, dir->nr, sizeof(*dir->entries), ref_entry_cmp);
 
-	/* Remove any duplicates: */
-	for (i = 0, j = 0; j < dir->nr; j++) {
-		struct ref_entry *entry = dir->entries[j];
-		if (last && is_dup_ref(last, entry)) {
-			free_ref_entry(entry);
-		} else if (entry->flag & REF_DIR) {
-			dir->entries[i++] = entry;
-			last = NULL;
-		} else {
-			last = dir->entries[i++] = entry;
+	/* Remove any duplicates, but not for extra_refs: */
+	if (dir->ref_cache != NULL) {
+		int i, j;
+		for (i = 0, j = 0; j < dir->nr; j++) {
+			struct ref_entry *entry = dir->entries[j];
+			if (last && is_dup_ref(last, entry)) {
+				free_ref_entry(entry);
+			} else if (entry->flag & REF_DIR) {
+				dir->entries[i++] = entry;
+				last = NULL;
+			} else {
+				last = dir->entries[i++] = entry;
+			}
 		}
+		dir->nr = i;
 	}
-	dir->sorted = dir->nr = i;
+
+	dir->sorted = dir->nr;
 }
 
 #define DO_FOR_EACH_INCLUDE_BROKEN 01

^ permalink raw reply related

* Re: [PATCH 2/3] test-terminal: set output terminals to raw mode
From: Jonathan Nieder @ 2011-12-13  5:09 UTC (permalink / raw)
  To: Thomas Rast; +Cc: git, Junio C Hamano, Jeff King, Michael Haggerty
In-Reply-To: <201112122001.36303.trast@student.ethz.ch>

Thomas Rast wrote:

> I tested this tweak of the script:
[...]
> That's over ssh on
>
>   $ uname -a
>   Darwin mackeller.inf.ethz.ch 11.1.0 Darwin Kernel Version 11.1.0: Tue Jul 26 16:07:11 PDT 2011; root:xnu-1699.22.81~1/RELEASE_X86_64 x86_64

Thanks.  I've passed this information on to Apple (rdar://9046540),
though I have no reason to believe anyone reads the reports there. :)

^ permalink raw reply

* Re: [PATCH v2 08/51] is_dup_ref(): extract function from sort_ref_array()
From: Michael Haggerty @ 2011-12-13  5:00 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git, Jeff King, Drew Northup, Jakub Narebski, Heiko Voigt,
	Johan Herland, Julian Phillips
In-Reply-To: <4EE6D61A.2030300@alum.mit.edu>

On 12/13/2011 05:35 AM, Michael Haggerty wrote:
> Is it *really* possible to have multiple extra_refs with the same name,
> or is this more of a hypothetical problem that requires more careful
> analysis?

I see that you already answered this question downthread.  I will reply
to your analysis there.

Michael

-- 
Michael Haggerty
mhagger@alum.mit.edu
http://softwareswirl.blogspot.com/

^ permalink raw reply

* Re: [PATCH v2 08/51] is_dup_ref(): extract function from sort_ref_array()
From: Michael Haggerty @ 2011-12-13  4:35 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git, Jeff King, Drew Northup, Jakub Narebski, Heiko Voigt,
	Johan Herland, Julian Phillips
In-Reply-To: <7vvcplxq4r.fsf@alter.siamese.dyndns.org>

On 12/12/2011 11:33 PM, Junio C Hamano wrote:
> mhagger@alum.mit.edu writes:
> 
>> +/*
>> + * Emit a warning and return true iff ref1 and ref2 have the same name
>> + * and the same sha1.  Die if they have the same name but different
>> + * sha1s.
>> + */
>> +static int is_dup_ref(const struct ref_entry *ref1, const struct ref_entry *ref2)
>> +{
>> +	if (!strcmp(ref1->name, ref2->name)) {
>> +		/* Duplicate name; make sure that the SHA1s match: */
>> +		if (hashcmp(ref1->sha1, ref2->sha1))
>> +			die("Duplicated ref, and SHA1s don't match: %s",
>> +			    ref1->name);
>> +		warning("Duplicated ref: %s", ref1->name);
>> +		return 1;
>> +	} else {
>> +		return 0;
>> +	}
>> +}
> 
> At this step, is_dup_ref() is only called from sort_ref_array() which in
> turn is only called on either a collection of loose or packed refs, so
> warning is warranted. IOW I do not see anything wrong with _this_ patch.

I agree.

> Later in the series, however, the collection of extra refs seems to be
> shoved into a phoney "direntry" and made to go through the same add_ref()
> machinery that uses find_containing_direntry() which in turn calls
> search_ref_dir() that smells like a definite no-no.

Correct.

I had remembered your explanation of the purpose of extra refs and also
that they have unusual names, but I hadn't allowed for the possibility
that multiple extra_refs can have the same name.  The old code never
sorted the extra refs and therefore never checked or eliminated
duplicates.  The new code does, as sorting is an intrinsic part of
looking up references and reference subdirectories in the hierarchical
cache.  It is introduced along with the meat of the change in

    [PATCH v2 29/51] refs: store references hierarchically

So...the new code would die if two extra refs have the same name but
different SHA1s, and emit a warning if they have the same name and the
same SHA1s.  (However, it is no problem for an extra ref to have the
same name as a packed or loose ref.)

Is it *really* possible to have multiple extra_refs with the same name,
or is this more of a hypothetical problem that requires more careful
analysis?

If the former, then I will have to do some work.  My approach would
probably be to allow sorting without de-duplication; that way extra_refs
can continue to share most of the code used for the other ref caches.
It would be relatively easy to add such a fix on top of the patch
series, where every ref_entry knows the ref_cache with which it is
associated.  It would be quite a bit more painful to massage such a fix
through the whole patch series.

Michael

-- 
Michael Haggerty
mhagger@alum.mit.edu
http://softwareswirl.blogspot.com/

^ permalink raw reply

* Re: [PATCH 4/4] connect.c: drop path_match function
From: Michael Haggerty @ 2011-12-13  3:23 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Kevin Sawicki
In-Reply-To: <20111213004959.GD3699@sigill.intra.peff.net>

On 12/13/2011 01:49 AM, Jeff King wrote:
> I just did the exact-string match inline in the previous patch. I could
> also have modified path_match to do it. But really, I can't think of a
> worse name for a global function in a system which is all about
> managing content in paths. Unless, you know, it actually matched paths.
> Which it doesn't.

The tacit assumption that a reference is always a file is all over the
code and is often confusing.  My

    [PATCH v2 02/51] refs: rename "refname" variables

is a step towards making the distinction more explicit, at least in the
refs code.

Michael

-- 
Michael Haggerty
mhagger@alum.mit.edu
http://softwareswirl.blogspot.com/

^ permalink raw reply

* Re: Question about commit message wrapping
From: Michael Haggerty @ 2011-12-13  3:14 UTC (permalink / raw)
  To: Frans Klaver
  Cc: Holger Hellmuth, Andrew Ardill, Jakub Narebski,
	Sidney San Martín, git
In-Reply-To: <op.v6edibfz0aolir@keputer>

On 12/12/2011 11:16 PM, Frans Klaver wrote:
> Wrapped code as in auto-wrapped? Or as in manually wrapped? Python
> programmers have significant white space, but you can still hard wrap
> stuff, as long as the next statement is properly indented.

This is incorrect.  Python statements can only be broken across lines
within unbalanced parenthesis (or using '\' or within a multiline
string).  For example,

    x =
        1

is a syntax error, while

    y = (
        1
        )

or

    f(1,
          2,
      3,
      4)

are both valid.

FWIW I think automatic wrapping of commit messages is a bad idea.  I
wrap my commit messages deliberately to make them look the way I want
them to look.  The assumption of an 80-character display has historical
reasons, but it is also a relatively comfortable line-width to read
(even on wider displays).  And given that commit messages sometimes
contain "flowable" paragraph text, sometimes code snippets, sometimes
ASCII art, etc, no automatic wrapping will work correctly unless
everybody agrees that commit messages must be written in some specific
form of markup (or lightweight markup).  And I can't imagine such a
thing ever happening.

As for "future-proofing", do you really think there will be a lot of
programming happening on mobile phones with less than 80-character-wide
displays?  (And even my little HTC can easily fit 80 characters if I
rotate the phone to "landscape" mode.)

Michael

-- 
Michael Haggerty
mhagger@alum.mit.edu
http://softwareswirl.blogspot.com/

^ permalink raw reply

* Re: Git blame only current branch
From: Jeff King @ 2011-12-13  2:14 UTC (permalink / raw)
  To: Vijay Lakshminarayanan; +Cc: Stephen Bash, git discussion list
In-Reply-To: <8739cpteat.fsf@gmail.com>

On Tue, Dec 13, 2011 at 07:37:22AM +0530, Vijay Lakshminarayanan wrote:

> > diff --git a/builtin/blame.c b/builtin/blame.c
> > index 80febbe..c19a8cd 100644
> > --- a/builtin/blame.c
> > +++ b/builtin/blame.c
> > @@ -1191,6 +1191,8 @@ static int num_scapegoats(struct rev_info *revs, struct commit *commit)
> >  {
> >  	int cnt;
> >  	struct commit_list *l = first_scapegoat(revs, commit);
> > +	if (revs->first_parent_only)
> > +		return l ? 1 : 0;
> >  	for (cnt = 0; l; l = l->next)
> >  		cnt++;
> >  	return cnt;
> 
> I just spent 30s staring at this wondering why you needed to do 
> 
>     return 1 ? 1 : 0;
> 
> which always returns 1 anyway before I realized it was a lowercase L.
> 
> The code reads fine when there's no numeral 1 around but now it doesn't
> read well.  I think refactoring
> 
>     struct commit_list *l
> 
> to 
> 
>     struct commit_list *lst
> 
> is justified.  Thoughts?

Sure, that would help. I wasn't planning to push this forward as a
"real" patch, but if somebody wants to do some testing and, more
importantly read through the code to make sure I am not violating some
assumptions, then it might be worth including upstream.

-Peff

^ permalink raw reply

* Re: Git blame only current branch
From: Vijay Lakshminarayanan @ 2011-12-13  2:07 UTC (permalink / raw)
  To: Jeff King; +Cc: Stephen Bash, git discussion list
In-Reply-To: <20111212165542.GA4802@sigill.intra.peff.net>

Jeff King <peff@peff.net> writes:

[snip]

> diff --git a/builtin/blame.c b/builtin/blame.c
> index 80febbe..c19a8cd 100644
> --- a/builtin/blame.c
> +++ b/builtin/blame.c
> @@ -1191,6 +1191,8 @@ static int num_scapegoats(struct rev_info *revs, struct commit *commit)
>  {
>  	int cnt;
>  	struct commit_list *l = first_scapegoat(revs, commit);
> +	if (revs->first_parent_only)
> +		return l ? 1 : 0;
>  	for (cnt = 0; l; l = l->next)
>  		cnt++;
>  	return cnt;

I just spent 30s staring at this wondering why you needed to do 

    return 1 ? 1 : 0;

which always returns 1 anyway before I realized it was a lowercase L.

The code reads fine when there's no numeral 1 around but now it doesn't
read well.  I think refactoring

    struct commit_list *l

to 

    struct commit_list *lst

is justified.  Thoughts?

> -Peff

-- 
Cheers
~vijay

Gnus should be more complicated.

^ permalink raw reply

* Re: [PATCH 2/2] Makefile: optionally exclude code that needs Unix sockets
From: Jeff King @ 2011-12-13  0:58 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Johannes Sixt, git
In-Reply-To: <7v1us91i5b.fsf@alter.siamese.dyndns.org>

On Mon, Dec 12, 2011 at 03:31:44PM -0800, Junio C Hamano wrote:

> > In theory we should also disable the documentation for credential-cache.
> > But that means surgery not only to Documentation/Makefile, but figuring
> > out how to pass the flags down to the actual asciidoc process (since
> > gitcredentials(7) mentions it in the text). Certainly possible, but I
> > don't know if it's worth the effort or not.
> 
> I do not think it matters that much. We've been shipping documentation for
> stuff like remote archiver and daemon without conditional compilation, no?

True. Let's not worry about it, then.

> I'll queue a single patch that is a squash between 2/2 and Peff's test
> updates between "credentials: add "cache" helper" and "strbuf: add
> strbuf_add*_urlencode" in the series.

That's perfect. Thanks.

-Peff

^ permalink raw reply

* Re: [PATCH 3/4] fetch-pack: match refs exactly
From: Jeff King @ 2011-12-13  0:54 UTC (permalink / raw)
  To: git; +Cc: Kevin Sawicki
In-Reply-To: <20111213004808.GC3699@sigill.intra.peff.net>

On Mon, Dec 12, 2011 at 07:48:08PM -0500, Jeff King wrote:

> diff --git a/builtin/fetch-pack.c b/builtin/fetch-pack.c
> index 46688dc..6207ecd 100644
> --- a/builtin/fetch-pack.c
> +++ b/builtin/fetch-pack.c
> @@ -556,11 +556,16 @@ static void filter_refs(struct ref **refs, int nr_match, char **match)
>  			continue;
>  		}
>  		else {
> -			int order = path_match(ref->name, nr_match, match);
> -			if (order) {
> -				return_refs[order-1] = ref;
> -				continue; /* we will link it later */
> +			int i;
> +			for (i = 0; i < nr_match; i++) {
> +				if (!strcmp(ref->name, match[i])) {
> +					match[i][0] = '\0';
> +					return_refs[i] = ref;
> +					break;
> +				}
>  			}
> +			if (i < nr_match)
> +				continue; /* we will link it later */

Astute readers may notice that our matching scheme is now something
like:

  for ref in refs
    for match in matches
      strcmp(ref, match)

If you are fetching everything, that's O(n^2) strcmps (where n is the
number of remote refs). If we sorted the match list (the refs list is
already sorted), we could turn it into an O(n lg n) sort+merge.  This is
an optimization we couldn't do with the old suffix-matching rule.

I doubt it matters in any practical case. Even for our crazy 100K ref
cases at GitHub, we don't tend to actually fetch all of those at one
time. So it is more like O(n * m), where m is probably in the dozens at
most.

-Peff

^ permalink raw reply

* [PATCH 4/4] connect.c: drop path_match function
From: Jeff King @ 2011-12-13  0:49 UTC (permalink / raw)
  To: git; +Cc: Kevin Sawicki
In-Reply-To: <20111213003925.GA28403@sigill.intra.peff.net>

This function was used for comparing local and remote ref
names during fetch (which makes it a candidate for "most
confusingly named function of the year").

It no longer has any callers, so let's get rid of it.

Signed-off-by: Jeff King <peff@peff.net>
---
I just did the exact-string match inline in the previous patch. I could
also have modified path_match to do it. But really, I can't think of a
worse name for a global function in a system which is all about
managing content in paths. Unless, you know, it actually matched paths.
Which it doesn't.

 cache.h   |    1 -
 connect.c |   21 ---------------------
 2 files changed, 0 insertions(+), 22 deletions(-)

diff --git a/cache.h b/cache.h
index 408e880..2ad063f 100644
--- a/cache.h
+++ b/cache.h
@@ -1029,7 +1029,6 @@ struct ref {
 extern struct child_process *git_connect(int fd[2], const char *url, const char *prog, int flags);
 extern int finish_connect(struct child_process *conn);
 extern int git_connection_is_socket(struct child_process *conn);
-extern int path_match(const char *path, int nr, char **match);
 struct extra_have_objects {
 	int nr, alloc;
 	unsigned char (*array)[20];
diff --git a/connect.c b/connect.c
index 48df90b..2a0a040 100644
--- a/connect.c
+++ b/connect.c
@@ -105,27 +105,6 @@ int server_supports(const char *feature)
 		strstr(server_capabilities, feature) != NULL;
 }
 
-int path_match(const char *path, int nr, char **match)
-{
-	int i;
-	int pathlen = strlen(path);
-
-	for (i = 0; i < nr; i++) {
-		char *s = match[i];
-		int len = strlen(s);
-
-		if (!len || len > pathlen)
-			continue;
-		if (memcmp(path + pathlen - len, s, len))
-			continue;
-		if (pathlen > len && path[pathlen - len - 1] != '/')
-			continue;
-		*s = 0;
-		return (i + 1);
-	}
-	return 0;
-}
-
 enum protocol {
 	PROTO_LOCAL = 1,
 	PROTO_SSH,
-- 
1.7.8.13.g74677

^ permalink raw reply related

* [PATCH 3/4] fetch-pack: match refs exactly
From: Jeff King @ 2011-12-13  0:48 UTC (permalink / raw)
  To: git; +Cc: Kevin Sawicki
In-Reply-To: <20111213003925.GA28403@sigill.intra.peff.net>

When we are determining the list of refs to fetch via
fetch-pack, we have two sets of refs to compare: those on
the remote side, and a "match" list of things we want to
fetch. We iterate through the remote refs alphabetically,
seeing if each one is wanted by the "match" list.

Since def88e9 (Commit first cut at "git-fetch-pack",
2005-07-04), we have used the "path_match" function to do a
suffix match, where a remote ref is considered wanted if
any of the "match" elements is a suffix of the remote
refname.

This enables callers of fetch-pack to specify unqualified
refs and have them matched up with remote refs (e.g., ask
for "A" and get remote's "refs/heads/A"). However, if you
provide a fully qualified ref, then there are corner cases
where we provide the wrong answer. For example, given a
remote with two refs:

   refs/foo/refs/heads/master
   refs/heads/master

asking for "refs/heads/master" will first match
"refs/foo/refs/heads/master" by the suffix rule, and we will
erroneously fetch it instead of refs/heads/master.

As it turns out, all callers of fetch_pack do provide
fully-qualified refs for the match list. There are two ways
fetch_pack can get match lists:

  1. Through the transport code (i.e., via git-fetch)

  2. On the command-line of git-fetch-pack

In the first case, we will always be providing the names of
fully-qualified refs from "struct ref" objects. We will have
pre-matched those ref objects already (since we have to
handle more advanced matching, like wildcard refspecs), and
are just providing a list of the refs whose objects we need.

In the second case, users could in theory be providing
non-qualified refs on the command-line. However, the
fetch-pack documentation claims that refs should be fully
qualified (and has always done so since it was written in
2005).

Let's change this path_match call to simply check for string
equality, matching what the callers of fetch_pack are
expecting.

Signed-off-by: Jeff King <peff@peff.net>
---
This is obviously the one that can break existing fetch-pack users. I
doubt they exist. If they do, there are a few alternatives:

  1. Come up with some more sane rules for path_match (e.g., try full
     strings first, use full-string matching for things starting with
     "refs/", etc).

  2. Leave the matching in-place for git-fetch-pack, but use exact
     matching for internal users that will always provide qualified refs
     (i.e., "git fetch").

I don't personally think it's worth the trouble.

 builtin/fetch-pack.c      |   13 +++++++++----
 t/t5527-fetch-odd-refs.sh |   29 +++++++++++++++++++++++++++++
 2 files changed, 38 insertions(+), 4 deletions(-)
 create mode 100755 t/t5527-fetch-odd-refs.sh

diff --git a/builtin/fetch-pack.c b/builtin/fetch-pack.c
index 46688dc..6207ecd 100644
--- a/builtin/fetch-pack.c
+++ b/builtin/fetch-pack.c
@@ -556,11 +556,16 @@ static void filter_refs(struct ref **refs, int nr_match, char **match)
 			continue;
 		}
 		else {
-			int order = path_match(ref->name, nr_match, match);
-			if (order) {
-				return_refs[order-1] = ref;
-				continue; /* we will link it later */
+			int i;
+			for (i = 0; i < nr_match; i++) {
+				if (!strcmp(ref->name, match[i])) {
+					match[i][0] = '\0';
+					return_refs[i] = ref;
+					break;
+				}
 			}
+			if (i < nr_match)
+				continue; /* we will link it later */
 		}
 		free(ref);
 	}
diff --git a/t/t5527-fetch-odd-refs.sh b/t/t5527-fetch-odd-refs.sh
new file mode 100755
index 0000000..edea9f9
--- /dev/null
+++ b/t/t5527-fetch-odd-refs.sh
@@ -0,0 +1,29 @@
+#!/bin/sh
+
+test_description='test fetching of oddly-named refs'
+. ./test-lib.sh
+
+# afterwards we will have:
+#  HEAD - two
+#  refs/for/refs/heads/master - one
+#  refs/heads/master - three
+test_expect_success 'setup repo with odd suffix ref' '
+	echo content >file &&
+	git add . &&
+	git commit -m one &&
+	git update-ref refs/for/refs/heads/master HEAD &&
+	echo content >>file &&
+	git commit -a -m two &&
+	echo content >>file &&
+	git commit -a -m three &&
+	git checkout HEAD^
+'
+
+test_expect_success 'suffix ref is ignored during fetch' '
+	git clone --bare file://"$PWD" suffix &&
+	echo three >expect &&
+	git --git-dir=suffix log -1 --format=%s refs/heads/master >actual &&
+	test_cmp expect actual
+'
+
+test_done
-- 
1.7.8.13.g74677

^ permalink raw reply related

* Re: [PATCH 2/2] Makefile: optionally exclude code that needs Unix sockets
From: Junio C Hamano @ 2011-12-13  0:45 UTC (permalink / raw)
  To: Johannes Sixt, Jeff King; +Cc: git
In-Reply-To: <20111212213951.GB9754@sigill.intra.peff.net>

I'll queue a single patch that is a squash between 2/2 and Peff's test
updates between "credentials: add "cache" helper" and "strbuf: add
strbuf_add*_urlencode" in the series.

Thanks.

^ permalink raw reply

* Re: [PATCH v2 28/51] refs.c: rename ref_array -> ref_dir
From: Junio C Hamano @ 2011-12-13  0:45 UTC (permalink / raw)
  To: mhagger
  Cc: git, Jeff King, Drew Northup, Jakub Narebski, Heiko Voigt,
	Johan Herland, Julian Phillips
In-Reply-To: <1323668338-1764-29-git-send-email-mhagger@alum.mit.edu>

mhagger@alum.mit.edu writes:

> From: Michael Haggerty <mhagger@alum.mit.edu>
>
> This purely textual change is in preparation for storing references
> hierarchically, when the old ref_array structure will represent one
> "directory" of references.  Rename functions that deal with this
> structure analogously, and also rename the structure's "refs" member
> to "entries".
>
> Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
> ---
>  refs.c |  166 ++++++++++++++++++++++++++++++++--------------------------------
>  1 files changed, 83 insertions(+), 83 deletions(-)
>
> diff --git a/refs.c b/refs.c
> index fe6d657..b74ef80 100644
> --- a/refs.c
> +++ b/refs.c
> @@ -106,9 +106,9 @@ struct ref_value {
>  	unsigned char peeled[20];
>  };
>  
> -struct ref_array {
> +struct ref_dir {
>  	int nr, alloc;
> -	struct ref_entry **refs;
> +	struct ref_entry **entries;
>  };

The s/refs/entries/ renaming is a sane thing to do; on the other hand, I
somehow find the s/ref_array/ref_dir/ renaming is a short-sighted change
and undesirable, as you are essentially declaring that "if you use this
structure, the contents you store there are expected to be named
hierarchically", forbidding users that want to use a simple flat array.

BUT. That was an observation before I continued reading the remainder of
the series.

I think the above observation primarily come from my worries around the
extra ref stuff, which by nature does not fit well with the hierarchical
naming (they do not even have any meaningful names). Sorting or requiring
uniqueness among them do not make any sense, let alone cutting their names
in hierarchical boundaries.

As an alternative, it _might_ make sense to get rid of "add_extra_ref()"
API from refs.c and make it the responsibility for its users to add their
extra anchor points where they use for_each_ref() to find out all anchor
points in the history from the refs.c subsystem. If we go that route, I
fully agree that "s/ref_array/ref_dir/" renaming is the right thing to do,
as refs.c subsystem will _only_ handle the hierarchical ref namespace and
nothing else after such a change.

A bit of background refresher.

"git clone --reference=../another-repo.git $origin" is a typical user of
the add_extra_ref() API. It runs an equivalent of ls-remote against the
reference repository, and uses add_extra_ref() to cause its later call to
for_each_ref() to return the names of the objects at the tips of refs in
that reference repository.

During the object transfer (i.e. the equivalent of "git fetch") from the
origin, "clone" and "upload-pack" negotiate what objects need to be sent,
and this is done by the receiving end telling what it has to the sending
end, and the sending end uses this information to subtract what the
receiving end claims to already have from what is going to be sent.

And the enumeration of what the receiving end, especially when there is no
"reference", is done using for_each_ref(), as object transfer considers
everything reachable from the refs complete. add_extra_ref() is a _hack_
to cause for_each_ref() to include objects that actually do _not_ sit at
the tip of any of our refs.

When cloning, we do not have any ref, and after clone is done, we do not
want the refs the repository we are borrowing its objects from to be our
ref (after all, we may be using a local copy of linux-3.0 repository as
our reference but cloning from linux-2.4 history), and that is why this
hack was invented (in the old scripted version, we tentatively created
real refs in our ref namespace and removed them after clone finished).

In the case of "clone", these extra refs are registered with names taken
from the repository we are borrowing from, so they may be unique and
without conflict among them if you are borrowing from one repository, but
if you borrow from more than one repositories, it is likely that all of
them have "refs/heads/master" and there is no reason to expect that the
extra refs added with add_extra_ref() API have unique names. Worse, in
"receive-pack" that runs on the receiving end when you "git push" your
history, "refs" that exist in repositories that the receiving repository
is borrowing from (i.e. it has $GIT_DIR/objects/info/alternates) are
advertised with ".have" in their names (not using anything refs/* is to
avoid the pusher from mistakenly think there is such a ref that is
eligible for "matching push" logic), and these ".have" entries obviously
are not unique. The code also uses the same add_extra_ref() hack to cause
for_each_ref() to report them.

The removal of this hack, taking receive-pack as an example, would be to
stop using add_extra_ref() but instead to keep a local list of object
names that the code currently uses add_extra_ref() to keep track of, and
then modify write_head_info() to feed show_ref_cb() with these object
names in addition to the current for_each_ref() callback.

It may make the resulting code cleaner if we go that route, and if we were
to do so, I think the right place to do so in the series would be either
at the very beginning of the series (as part of the preliminary clean-up),
or immediately before "do_for_each_ref: correctly terminate while
processing extra_refs" (making that commit unnecessary, as after such a
fix, refs.c API won't have to worry about the extra_refs hack anymore).

What do you think?

^ permalink raw reply

* [PATCH 2/4] t5500: give fully-qualified refs to fetch-pack
From: Jeff King @ 2011-12-13  0:44 UTC (permalink / raw)
  To: git; +Cc: Kevin Sawicki
In-Reply-To: <20111213003925.GA28403@sigill.intra.peff.net>

The fetch-pack documentation is very clear that refs given
on the command line are to be full refs:

  <refs>...::
          The remote heads to update from. This is relative to
          $GIT_DIR (e.g. "HEAD", "refs/heads/master").  When
          unspecified, update from all heads the remote side has.

and this has been the case since fetch-pack was originally documented in
8b3d9dc ([PATCH] Documentation: clone/fetch/upload., 2005-07-14).

Let's follow our own documentation to set a good example,
and to avoid breaking when this restriction is enforced in
the next patch.

Signed-off-by: Jeff King <peff@peff.net>
---
This patch by itself isn't a big deal for backwards compatibility But
if we are violating the constraints in the documentation, then I worry
that others are, too. On the other hand, I seriously doubt anyone is
actually using fetch-pack directly anymore at all.

 t/t5500-fetch-pack.sh |    6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/t/t5500-fetch-pack.sh b/t/t5500-fetch-pack.sh
index bafcca7..9bf69e9 100755
--- a/t/t5500-fetch-pack.sh
+++ b/t/t5500-fetch-pack.sh
@@ -97,7 +97,7 @@ test_expect_success 'setup' '
 	git symbolic-ref HEAD refs/heads/B
 '
 
-pull_to_client 1st "B A" $((11*3))
+pull_to_client 1st "refs/heads/B refs/heads/A" $((11*3))
 
 test_expect_success 'post 1st pull setup' '
 	add A11 $A10 &&
@@ -110,9 +110,9 @@ test_expect_success 'post 1st pull setup' '
 	done
 '
 
-pull_to_client 2nd "B" $((64*3))
+pull_to_client 2nd "refs/heads/B" $((64*3))
 
-pull_to_client 3rd "A" $((1*3))
+pull_to_client 3rd "refs/heads/A" $((1*3))
 
 test_expect_success 'clone shallow' '
 	git clone --depth 2 "file://$(pwd)/." shallow
-- 
1.7.8.13.g74677

^ permalink raw reply related

* [PATCH 1/4] drop "match" parameter from get_remote_heads
From: Jeff King @ 2011-12-13  0:41 UTC (permalink / raw)
  To: git; +Cc: Kevin Sawicki
In-Reply-To: <20111213003925.GA28403@sigill.intra.peff.net>

The get_remote_heads function reads the list of remote refs
during git protocol session. It dates all the way back to
def88e9 (Commit first cut at "git-fetch-pack", 2005-07-04).
At that time, the idea was to come up with a list of refs we
were interested in, and then filter the list as we got it
from the remote side.

Later, 1baaae5 (Make maximal use of the remote refs,
2005-10-28) stopped filtering at the get_remote_heads layer,
letting us use the non-matching refs to find common history.

As a result, all callers now simply pass an empty match
list (and any future callers will want to do the same). So
let's drop these now-useless parameters.

Signed-off-by: Jeff King <peff@peff.net>
---
This one isn't necessary for the bugfix, but since it is the only other
caller of path_match besides fetch-pack, it gives us freedom to modify
or get rid of path_match later.

 builtin/fetch-pack.c |    2 +-
 builtin/send-pack.c  |    3 +--
 cache.h              |    2 +-
 connect.c            |    3 ---
 remote-curl.c        |    2 +-
 transport.c          |    7 +++----
 6 files changed, 7 insertions(+), 12 deletions(-)

diff --git a/builtin/fetch-pack.c b/builtin/fetch-pack.c
index c6bc8eb..46688dc 100644
--- a/builtin/fetch-pack.c
+++ b/builtin/fetch-pack.c
@@ -976,7 +976,7 @@ int cmd_fetch_pack(int argc, const char **argv, const char *prefix)
 				   args.verbose ? CONNECT_VERBOSE : 0);
 	}
 
-	get_remote_heads(fd[0], &ref, 0, NULL, 0, NULL);
+	get_remote_heads(fd[0], &ref, 0, NULL);
 
 	ref = fetch_pack(&args, fd, conn, ref, dest,
 		nr_heads, heads, pack_lockfile_ptr);
diff --git a/builtin/send-pack.c b/builtin/send-pack.c
index e0b8030..cd1115f 100644
--- a/builtin/send-pack.c
+++ b/builtin/send-pack.c
@@ -494,8 +494,7 @@ int cmd_send_pack(int argc, const char **argv, const char *prefix)
 
 	memset(&extra_have, 0, sizeof(extra_have));
 
-	get_remote_heads(fd[0], &remote_refs, 0, NULL, REF_NORMAL,
-			 &extra_have);
+	get_remote_heads(fd[0], &remote_refs, REF_NORMAL, &extra_have);
 
 	transport_verify_remote_names(nr_refspecs, refspecs);
 
diff --git a/cache.h b/cache.h
index 8c98d05..408e880 100644
--- a/cache.h
+++ b/cache.h
@@ -1034,7 +1034,7 @@ struct extra_have_objects {
 	int nr, alloc;
 	unsigned char (*array)[20];
 };
-extern struct ref **get_remote_heads(int in, struct ref **list, int nr_match, char **match, unsigned int flags, struct extra_have_objects *);
+extern struct ref **get_remote_heads(int in, struct ref **list, unsigned int flags, struct extra_have_objects *);
 extern int server_supports(const char *feature);
 
 extern struct packed_git *parse_pack_index(unsigned char *sha1, const char *idx_path);
diff --git a/connect.c b/connect.c
index 51990fa..48df90b 100644
--- a/connect.c
+++ b/connect.c
@@ -53,7 +53,6 @@ static void add_extra_have(struct extra_have_objects *extra, unsigned char *sha1
  * Read all the refs from the other end
  */
 struct ref **get_remote_heads(int in, struct ref **list,
-			      int nr_match, char **match,
 			      unsigned int flags,
 			      struct extra_have_objects *extra_have)
 {
@@ -92,8 +91,6 @@ struct ref **get_remote_heads(int in, struct ref **list,
 
 		if (!check_ref(name, name_len, flags))
 			continue;
-		if (nr_match && !path_match(name, nr_match, match))
-			continue;
 		ref = alloc_ref(buffer + 41);
 		hashcpy(ref->old_sha1, old_sha1);
 		*list = ref;
diff --git a/remote-curl.c b/remote-curl.c
index 0e720ee..94dc488 100644
--- a/remote-curl.c
+++ b/remote-curl.c
@@ -200,7 +200,7 @@ static struct ref *parse_git_refs(struct discovery *heads)
 
 	if (start_async(&async))
 		die("cannot start thread to parse advertised refs");
-	get_remote_heads(async.out, &list, 0, NULL, 0, NULL);
+	get_remote_heads(async.out, &list, 0, NULL);
 	close(async.out);
 	if (finish_async(&async))
 		die("ref parsing thread failed");
diff --git a/transport.c b/transport.c
index 51814b5..c2245d4 100644
--- a/transport.c
+++ b/transport.c
@@ -502,7 +502,7 @@ static struct ref *get_refs_via_connect(struct transport *transport, int for_pus
 	struct ref *refs;
 
 	connect_setup(transport, for_push, 0);
-	get_remote_heads(data->fd[0], &refs, 0, NULL,
+	get_remote_heads(data->fd[0], &refs,
 			 for_push ? REF_NORMAL : 0, &data->extra_have);
 	data->got_remote_heads = 1;
 
@@ -537,7 +537,7 @@ static int fetch_refs_via_pack(struct transport *transport,
 
 	if (!data->got_remote_heads) {
 		connect_setup(transport, 0, 0);
-		get_remote_heads(data->fd[0], &refs_tmp, 0, NULL, 0, NULL);
+		get_remote_heads(data->fd[0], &refs_tmp, 0, NULL);
 		data->got_remote_heads = 1;
 	}
 
@@ -772,8 +772,7 @@ static int git_transport_push(struct transport *transport, struct ref *remote_re
 		struct ref *tmp_refs;
 		connect_setup(transport, 1, 0);
 
-		get_remote_heads(data->fd[0], &tmp_refs, 0, NULL, REF_NORMAL,
-				 NULL);
+		get_remote_heads(data->fd[0], &tmp_refs, REF_NORMAL, NULL);
 		data->got_remote_heads = 1;
 	}
 
-- 
1.7.8.13.g74677

^ permalink raw reply related

* [PATCH 0/4] exact ref-matching for fetch-pack
From: Jeff King @ 2011-12-13  0:39 UTC (permalink / raw)
  To: git; +Cc: Kevin Sawicki

I was passed a bug report of clone failing on a repo with an oddly named
ref in it: "refs/for/refs/heads/master". This is probably an error
somebody made while pushing and should simply be deleted. However, it's
curious that "clone" would fail on this, since it should be ignoring
everything outside the "refs/heads/*" refspec.

It turns out that during the fetch-pack phase, we accidentally ask for
the sha1 of "refs/for/refs/heads/master" instead of that of
"refs/heads/master". This can cause "clone" to fail, as we may not have
asked for the object pointed to by "refs/heads/master" at all. This
behavior is due to some questionable suffix-matching deep inside
fetch-pack. The code in question dates back to the beginning of git; I
think it simply hasn't triggered much because the refname you need to
have is so specific (you must be asking to fetch a ref "refs/X", have
"refs/Y/X" on the remote, and "Y" must come alphabetically before "X",
since we match refs in alphabetical order).

As I said, this is probably just a silly one-off error. But I could
imagine this happening legitimately. Here are two possible scenarios:

  1. You are mirroring some meta-information about your refs in a
     parallel namespace. E.g., "refs/descriptions/refs/heads/master"
     contains information about "refs/heads/master".

  2. One of the things we've discussed for future git is mirroring the
     remote ref namespaces more literally. E.g., having something like
     "refs/remotes/origin/refs/heads/master". That won't actually
     trigger this bug because "heads" is alphabetically before
     "remotes". But "tags", for example, is not.

This could possibly be considered a behavior regression for users of the
fetch-pack command line. See patches 2 and 3 for details.

  [1/4]: drop "match" parameter from get_remote_heads
  [2/4]: t5500: give fully-qualified refs to fetch-pack
  [3/4]: fetch-pack: match refs exactly
  [4/4]: connect.c: drop path_match function

-Peff

^ permalink raw reply

* Re: [PATCH v2 30/51] sort_ref_dir(): do not sort if already sorted
From: Junio C Hamano @ 2011-12-12 23:26 UTC (permalink / raw)
  To: mhagger
  Cc: git, Jeff King, Drew Northup, Jakub Narebski, Heiko Voigt,
	Johan Herland, Julian Phillips
In-Reply-To: <1323668338-1764-31-git-send-email-mhagger@alum.mit.edu>

mhagger@alum.mit.edu writes:

> From: Michael Haggerty <mhagger@alum.mit.edu>
>
> Keep track of how many entries in a ref_dir are already sorted.  In
> sort_ref_dir(), only call qsort() if the dir contains unsorted
> entries.
>
> We could store a binary "sorted" value instead of an integer, but
> storing the number of sorted entries leaves the way open for a couple
> of possible future optimizations:
>
> * In sort_ref_dir(), sort *only* the unsorted entries, then merge them
>   with the sorted entries.  This should be faster if most of the
>   entries are already sorted.
>
> * Teach search_ref_dir() to do a binary search of any sorted entries,
>   and if unsuccessful do a linear search of any unsorted entries.
>   This would avoid the need to sort the list every time that
>   search_ref_dir() is called, and (given some intelligence about how
>   often to sort) could significantly improve the speed in certain
>   hypothetical usage patterns.

The elegance (which I think is the right word for this case as it is both
simple and clever) of the above strategy is not fully appreciated unless
you explain that your plan for "lazily add new entries and keep the
directory unsorted" case is to append them at the end (hence preserving
the property that entries[0..sorted] is always sorted).

If you reword the comment "How many of the entries..." to something like
"entries at 0 up to this index in the array are sorted", you could express
that idea without changing the log message that much, I guess.

> Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
> ---
>  refs.c |   29 ++++++++++++++++++++++++-----
>  1 files changed, 24 insertions(+), 5 deletions(-)
>
> diff --git a/refs.c b/refs.c
> index ccd2806..ce141ea 100644
> --- a/refs.c
> +++ b/refs.c
> @@ -108,6 +108,10 @@ struct ref_value {
>  
>  struct ref_dir {
>  	int nr, alloc;
> +
> +	/* How many of the entries in this directory are sorted? */
> +	int sorted;
> +
>  	struct ref_entry **entries;
>  };

^ permalink raw reply

* Re: [PATCH v2 20/51] repack_without_ref(): reimplement using do_for_each_ref_in_array()
From: Junio C Hamano @ 2011-12-12 22:44 UTC (permalink / raw)
  To: mhagger
  Cc: git, Jeff King, Drew Northup, Jakub Narebski, Heiko Voigt,
	Johan Herland, Julian Phillips
In-Reply-To: <1323668338-1764-21-git-send-email-mhagger@alum.mit.edu>

mhagger@alum.mit.edu writes:

> +static int repack_without_ref_fn(const char *refname, const unsigned char *sha1,
> +				 int flags, void *cb_data)
> +{
> +	struct repack_without_ref_sb *data = cb_data;
> +	char line[PATH_MAX + 100];
> +	int len;
> +
> +	if (!strcmp(data->refname, refname))
> +		return 0;
> +	len = snprintf(line, sizeof(line), "%s %s\n",
> +		       sha1_to_hex(sha1), refname);
> +	/* this should not happen but just being defensive */
> +	if (len > sizeof(line))
> +		die("too long a refname '%s'", refname);

Perhaps strbuf would be easier to use for things like this.

^ permalink raw reply

* Re: [PATCH v2 17/51] do_for_each_ref(): correctly terminate while processesing extra_refs
From: Junio C Hamano @ 2011-12-12 22:41 UTC (permalink / raw)
  To: mhagger
  Cc: git, Jeff King, Drew Northup, Jakub Narebski, Heiko Voigt,
	Johan Herland, Julian Phillips
In-Reply-To: <1323668338-1764-18-git-send-email-mhagger@alum.mit.edu>

mhagger@alum.mit.edu writes:

> From: Michael Haggerty <mhagger@alum.mit.edu>
>
> If the user-supplied function returns a nonzero value while processing
> extra_refs, terminate without processing the rest of the list.
>
> This probably has no practical importance, but makes the handling of
> extra_refs a little bit more consistent with the handling of other
> refs.
>
> Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
> ---
>  refs.c |    5 ++++-
>  1 files changed, 4 insertions(+), 1 deletions(-)
>
> diff --git a/refs.c b/refs.c
> index 579e4c3..fb6fe84 100644
> --- a/refs.c
> +++ b/refs.c
> @@ -711,8 +711,11 @@ static int do_for_each_ref(const char *submodule, const char *base, each_ref_fn
>  
>  	struct ref_array *extra = &extra_refs;
>  
> -	for (i = 0; i < extra->nr; i++)
> +	for (i = 0; i < extra->nr; i++) {
>  		retval = do_one_ref(base, fn, trim, flags, cb_data, extra->refs[i]);
> +		if (retval)
> +			goto end_each;
> +	}

Makes sense and everything up to this point looks sane.

Thanks.

^ permalink raw reply

* Re: [PATCH v2 08/51] is_dup_ref(): extract function from sort_ref_array()
From: Junio C Hamano @ 2011-12-12 22:33 UTC (permalink / raw)
  To: mhagger
  Cc: git, Jeff King, Drew Northup, Jakub Narebski, Heiko Voigt,
	Johan Herland, Julian Phillips
In-Reply-To: <1323668338-1764-9-git-send-email-mhagger@alum.mit.edu>

mhagger@alum.mit.edu writes:

> +/*
> + * Emit a warning and return true iff ref1 and ref2 have the same name
> + * and the same sha1.  Die if they have the same name but different
> + * sha1s.
> + */
> +static int is_dup_ref(const struct ref_entry *ref1, const struct ref_entry *ref2)
> +{
> +	if (!strcmp(ref1->name, ref2->name)) {
> +		/* Duplicate name; make sure that the SHA1s match: */
> +		if (hashcmp(ref1->sha1, ref2->sha1))
> +			die("Duplicated ref, and SHA1s don't match: %s",
> +			    ref1->name);
> +		warning("Duplicated ref: %s", ref1->name);
> +		return 1;
> +	} else {
> +		return 0;
> +	}
> +}

At this step, is_dup_ref() is only called from sort_ref_array() which in
turn is only called on either a collection of loose or packed refs, so
warning is warranted. IOW I do not see anything wrong with _this_ patch.

Later in the series, however, the collection of extra refs seems to be
shoved into a phoney "direntry" and made to go through the same add_ref()
machinery that uses find_containing_direntry() which in turn calls
search_ref_dir() that smells like a definite no-no.

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox