Git development
 help / color / mirror / Atom feed
* Re: [PATCH v2] sha1-file: remove OBJECT_INFO_SKIP_CACHED
From: Junio C Hamano @ 2020-01-02 21:41 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git, jrnieder
In-Reply-To: <20200102201630.180969-1-jonathantanmy@google.com>

Jonathan Tan <jonathantanmy@google.com> writes:

> As a historical note, the function now known as repo_read_object_file()
> was taught the empty tree in 346245a1bb ("hard-code the empty tree
> object", 2008-02-13), and the function now known as oid_object_info()
> was taught the empty tree in c4d9986f5f ("sha1_object_info: examine
> cached_object store too", 2011-02-07). repo_has_object_file() was never
> updated, perhaps due to oversight. The flag OBJECT_INFO_SKIP_CACHED,
> introduced later in dfdd4afcf9 ("sha1_file: teach
> sha1_object_info_extended more flags", 2017-06-26) and used in
> e83e71c5e1 ("sha1_file: refactor has_sha1_file_with_flags", 2017-06-26),
> was introduced to preserve this difference in empty-tree handling, but
> now it can be removed.

I am not 100% sure if the implication of this change is safe to
allow us to say "now it can be".

The has_object_file() helper wanted to say "no" when given a
non-existing object registered via the pretend_object_file(),
presumably because we wanted to allow a use pattern like:

 - prepare an in-core representation of an object we tentatively
   expect, but not absolutely sure, to be necessary.

 - perform operations, using the object data obtained via
   read_object() API, which is capable of yielding data even for
   such "pretend" objects (perhaps we are creating a tentative merge
   parents during a recursive merge).

 - write out final set of objects by enumerating those that do not
   really exist yet (via has_object_file() API).

Teaching about the empty tree to has_object_file() is a good thing
(especially because we do not necessarily write an empty tree object
to our repositories), but as a collateral damage of doing so, we
make such use pattern impossible.  

It is not a large loss---the third bullet in the above list can just
be made to unconditionally call write_object_file() without
filtering with has_object_file() and write_object_file() will apply
the right optimization anyway, so it probably is OK.

Will queue.

Thanks.

^ permalink raw reply

* Re: [PATCH 1/1] merge-recursive: remove unnecessary oid_eq function
From: Johannes Schindelin @ 2020-01-02 20:52 UTC (permalink / raw)
  To: Elijah Newren via GitGitGadget; +Cc: git, Junio C Hamano, Elijah Newren
In-Reply-To: <c653a9b8d3863b3484eff224bbfbde65c250eaf0.1577856057.git.gitgitgadget@gmail.com>

Hi Elijah,

On Wed, 1 Jan 2020, Elijah Newren via GitGitGadget wrote:

> From: Elijah Newren <newren@gmail.com>
>
> Back when merge-recursive was first introduced in commit 6d297f8137
> (Status update on merge-recursive in C, 2006-07-08), it created a
> sha_eq() function.  This function pre-dated the introduction of
> hashcmp() to cache.h by about a month, but was switched over to using
> hashcmp() as part of commit 9047ebbc22 (Split out merge_recursive() to
> merge-recursive.c, 2008-08-12).  In commit b4da9d62f9 (merge-recursive:
> convert leaf functions to use struct object_id, 2016-06-24), sha_eq() was
> renamed to oid_eq() and its hashcmp() call was switched to oideq().
>
> oid_eq() is basically just a wrapper around oideq() that has some extra
> checks to protect against NULL arguments or to allow short-circuiting if
> one of the arguments is NULL.  I don't know if any caller ever tried to
> call with NULL arguments, but certainly none do now which means the
> extra checks serve no purpose.  (Also, if these checks were genuinely
> useful, then they probably should be added to the main oideq() so all
> callers could benefit from them.)
>
> Reduce the cognitive overhead of having both oid_eq() and oideq(), by
> getting rid of merge-recursive's special oid_eq() wrapper.
>
> Signed-off-by: Elijah Newren <newren@gmail.com>
> ---

The patch and the commit message look good to me!

Thanks,
Dscho

>  merge-recursive.c | 33 +++++++++++----------------------
>  1 file changed, 11 insertions(+), 22 deletions(-)
>
> diff --git a/merge-recursive.c b/merge-recursive.c
> index 11869ad81c..10dca5644b 100644
> --- a/merge-recursive.c
> +++ b/merge-recursive.c
> @@ -224,17 +224,6 @@ static struct commit *make_virtual_commit(struct repository *repo,
>  	return commit;
>  }
>
> -/*
> - * Since we use get_tree_entry(), which does not put the read object into
> - * the object pool, we cannot rely on a == b.
> - */
> -static int oid_eq(const struct object_id *a, const struct object_id *b)
> -{
> -	if (!a && !b)
> -		return 2;
> -	return a && b && oideq(a, b);
> -}
> -
>  enum rename_type {
>  	RENAME_NORMAL = 0,
>  	RENAME_VIA_DIR,
> @@ -805,7 +794,7 @@ static int was_tracked_and_matches(struct merge_options *opt, const char *path,
>
>  	/* See if the file we were tracking before matches */
>  	ce = opt->priv->orig_index.cache[pos];
> -	return (oid_eq(&ce->oid, &blob->oid) && ce->ce_mode == blob->mode);
> +	return (oideq(&ce->oid, &blob->oid) && ce->ce_mode == blob->mode);
>  }
>
>  /*
> @@ -1317,7 +1306,7 @@ static int merge_mode_and_contents(struct merge_options *opt,
>  			oidcpy(&result->blob.oid, &b->oid);
>  		}
>  	} else {
> -		if (!oid_eq(&a->oid, &o->oid) && !oid_eq(&b->oid, &o->oid))
> +		if (!oideq(&a->oid, &o->oid) && !oideq(&b->oid, &o->oid))
>  			result->merge = 1;
>
>  		/*
> @@ -1333,9 +1322,9 @@ static int merge_mode_and_contents(struct merge_options *opt,
>  			}
>  		}
>
> -		if (oid_eq(&a->oid, &b->oid) || oid_eq(&a->oid, &o->oid))
> +		if (oideq(&a->oid, &b->oid) || oideq(&a->oid, &o->oid))
>  			oidcpy(&result->blob.oid, &b->oid);
> -		else if (oid_eq(&b->oid, &o->oid))
> +		else if (oideq(&b->oid, &o->oid))
>  			oidcpy(&result->blob.oid, &a->oid);
>  		else if (S_ISREG(a->mode)) {
>  			mmbuffer_t result_buf;
> @@ -1368,7 +1357,7 @@ static int merge_mode_and_contents(struct merge_options *opt,
>  			switch (opt->recursive_variant) {
>  			case MERGE_VARIANT_NORMAL:
>  				oidcpy(&result->blob.oid, &a->oid);
> -				if (!oid_eq(&a->oid, &b->oid))
> +				if (!oideq(&a->oid, &b->oid))
>  					result->clean = 0;
>  				break;
>  			case MERGE_VARIANT_OURS:
> @@ -2836,15 +2825,15 @@ static int process_renames(struct merge_options *opt,
>  			dst_other.mode = ren1->dst_entry->stages[other_stage].mode;
>  			try_merge = 0;
>
> -			if (oid_eq(&src_other.oid, &null_oid) &&
> +			if (oideq(&src_other.oid, &null_oid) &&
>  			    ren1->dir_rename_original_type == 'A') {
>  				setup_rename_conflict_info(RENAME_VIA_DIR,
>  							   opt, ren1, NULL);
> -			} else if (oid_eq(&src_other.oid, &null_oid)) {
> +			} else if (oideq(&src_other.oid, &null_oid)) {
>  				setup_rename_conflict_info(RENAME_DELETE,
>  							   opt, ren1, NULL);
>  			} else if ((dst_other.mode == ren1->pair->two->mode) &&
> -				   oid_eq(&dst_other.oid, &ren1->pair->two->oid)) {
> +				   oideq(&dst_other.oid, &ren1->pair->two->oid)) {
>  				/*
>  				 * Added file on the other side identical to
>  				 * the file being renamed: clean merge.
> @@ -2859,7 +2848,7 @@ static int process_renames(struct merge_options *opt,
>  						      1, /* update_cache */
>  						      0  /* update_wd    */))
>  					clean_merge = -1;
> -			} else if (!oid_eq(&dst_other.oid, &null_oid)) {
> +			} else if (!oideq(&dst_other.oid, &null_oid)) {
>  				/*
>  				 * Probably not a clean merge, but it's
>  				 * premature to set clean_merge to 0 here,
> @@ -3037,7 +3026,7 @@ static int blob_unchanged(struct merge_options *opt,
>
>  	if (a->mode != o->mode)
>  		return 0;
> -	if (oid_eq(&o->oid, &a->oid))
> +	if (oideq(&o->oid, &a->oid))
>  		return 1;
>  	if (!renormalize)
>  		return 0;
> @@ -3478,7 +3467,7 @@ static int merge_trees_internal(struct merge_options *opt,
>  					       opt->subtree_shift);
>  	}
>
> -	if (oid_eq(&merge_base->object.oid, &merge->object.oid)) {
> +	if (oideq(&merge_base->object.oid, &merge->object.oid)) {
>  		output(opt, 0, _("Already up to date!"));
>  		*result = head;
>  		return 1;
> --
> gitgitgadget
>

^ permalink raw reply

* Re: [PATCH v2] revision: allow missing promisor objects on CLI
From: Jonathan Tan @ 2020-01-02 20:49 UTC (permalink / raw)
  To: jrnieder; +Cc: jonathantanmy, git, gitster
In-Reply-To: <20191231000957.GB13606@google.com>

> > However, in the case that --ignore-missing is not set but
> > --exclude-promisor-objects is set, there is still no distinction between
> > the case wherein the missing object is a promisor object and the case
> > wherein it is not. This is unnecessarily restrictive, since if a missing
> > promisor object is encountered in traversal, it is ignored; likewise it
> > should be ignored if provided through the CLI. Therefore, distinguish
> > between these 2 cases. (As a bonus, the code is now simpler.)
> 
> nit about tenses, not worth a reroll on its own: "As a bonus, this
> makes the code simpler" (since commit messages describe the status quo
> before the patch in the present tense).

OK.

> > --- a/revision.c
> > +++ b/revision.c
> > @@ -370,8 +370,18 @@ static struct object *get_reference(struct rev_info *revs, const char *name,
> >  		if (!repo_parse_commit(revs->repo, c))
> >  			object = (struct object *) c;
> >  		else
> > +			/*
> > +			 * There is something wrong with the commit.
> > +			 * repo_parse_commit() will have already printed an
> > +			 * error message. For our purposes, treat as missing.
> > +			 */
> >  			object = NULL;
> >  	} else {
> > +		/*
> > +		 * There is something wrong with the object. parse_object()
> 
> If we got here, we are in the !commit case, which is not inherently wrong,
> right?

[snip]

Ah, good catch. It should be "If parse_object() returns NULL, ..."

> By the way, why doesn't parse_object itself check the commit graph for
> commit objects?

Because that's how I coded it in ec0c5798ee ("revision: use commit graph
in get_reference()", 2018-12-28). In the commit message, I said:

> Another way to accomplish this effect would be to modify parse_object()
> to use the commit graph if possible; however, I did not want to change
> parse_object()'s current behavior of always checking the object
> signature of the returned object.

But that doesn't mean that we cannot change it.

> By "and treats it as missing" does this mean "and we should treat it
> as missing"?  (Forgive my ignorance.)

I don't fully know if we should, hence my specific wording :-P but see
my answer to your next question.

> Why do we treat corrupt objects as missing?  Is this for graceful
> degredation when trying to recover data from a corrupt repository?

This (at least, treating wrong-hash objects the same as missing) has
been true since acdeec62cb ("Don't ever return corrupt objects from
"parse_object()"", 2007-03-20) (and that commit message has no
explanation). I think this is best - I consider corrupt objects to be
similar to missing objects with respect to repository corruption, so for
me it is logical to treat them the same way.

^ permalink raw reply

* Re: BUG: sendemail-validate hook is run too early
From: Junio C Hamano @ 2020-01-02 20:26 UTC (permalink / raw)
  To: Jani Nikula; +Cc: git
In-Reply-To: <875zhut5yd.fsf@intel.com>

Jani Nikula <jani.nikula@intel.com> writes:

> I'm trying to use the sendemail-validate hook to validate the recipients
> of the patch email, among other things. Turns out the hook gets run
> immediately on the input patches, *not* on the "e-mail to be sent" as
> claimed by githooks(5).

I will make two suggestions, so please do not react before reading
both ;-)

The purpose of the validate hook, at least as it was originally
designed, was to vet the log message and patch contents, so what you
reported is not at all surprising.  After all, the sub that uses the
hook is called "validate_patch" ;-).

A low-hanging documentation fix (this is one suggestion) is to
phrase "e-mail to be sent" as "e-mail that has been submitted (to
git-send-email)" to avoid the confusion.

You do not want to use the sendemail-validate hook for checking for
the recipients, because the e-mail message is not a good source of
that information.

When a recipient is added, two things happen:

 * The recipient is added to the (internal) list of recipients on
   the underlying sendmail command line arguments.  This is the list
   of addresses that actually matter where the piece of email goes.

 * The recipient is added to the text of the message being sent, if
   s/he is being added to either To: or Cc: (this is purely for
   human consumption and does not affect where the piece of email
   goes).  A blind-carbon-copy recipient would not be added for
   obvious reasons.

If you truly want to validate where the message goes, you'd need to
vet the former list, not the latter one.  Otherwise, you'll miss BCC
recipients.

So the other suggestion is to have a separate hook to validate the
list of recipients.  This might be a bit more involved if we want to
execute cleanly, but should not be rocket science.  

The send_message() sub prepares @recipients list to form quite a bit
of processing at the beginning, and the uses the resulting list to
drive the sendmail by adding it to @sendmail_parameters().  The
contents of this @recipients list is what you want to vet before the
code talks to the sendmail program or daemon later in the function.

^ permalink raw reply

* Re: Comparing rebase --am with --interactive via p3400
From: Johannes Schindelin @ 2020-01-02 20:17 UTC (permalink / raw)
  To: Alban Gruin; +Cc: Elijah Newren, Git Mailing List
In-Reply-To: <16add63e-a631-5ebf-bbbe-17823d942ee9@gmail.com>

[-- Attachment #1: Type: text/plain, Size: 6575 bytes --]

Hi Alban & Elijah,

On Sun, 29 Dec 2019, Alban Gruin wrote:

> Hi Elijah,
>
> Le 27/12/2019 à 23:45, Elijah Newren a écrit :
> > Hi Alban,
> >
> > On Fri, Dec 27, 2019 at 1:11 PM Alban Gruin <alban.gruin@gmail.com> wrote:
> >>
> >> Hi Johannes & Elijah,
> >>
> >> Le 01/02/2019 à 07:04, Johannes Schindelin a écrit :
> >>> Hi Elijah,
> >>>
> >>> as discussed at the Contributors' Summit, I ran p3400 as-is (i.e. with the
> >>> --am backend) and then with --keep-empty to force the interactive backend
> >>> to be used. Here are the best of 10, on my relatively powerful Windows 10
> >>> laptop, with current `master`.
> >>>
> >>> With regular rebase --am:
> >>>
> >>> 3400.2: rebase on top of a lot of unrelated changes             5.32(0.06+0.15)
> >>> 3400.4: rebase a lot of unrelated changes without split-index   33.08(0.04+0.18)
> >>> 3400.6: rebase a lot of unrelated changes with split-index      30.29(0.03+0.18)
> >>>
> >>> with --keep-empty to force the interactive backend:
> >>>
> >>> 3400.2: rebase on top of a lot of unrelated changes             3.92(0.03+0.18)
> >>> 3400.4: rebase a lot of unrelated changes without split-index   33.92(0.03+0.22)
> >>> 3400.6: rebase a lot of unrelated changes with split-index      38.82(0.03+0.16)
> >>>
> >>> I then changed it to -m to test the current scripted version, trying to
> >>> let it run overnight, but my laptop eventually went to sleep and the tests
> >>> were not even done. I'll let them continue and report back.
> >>>
> >>> My conclusion after seeing these numbers is: the interactive rebase is
> >>> really close to the performance of the --am backend. So to me, it makes a
> >>> total lot of sense to switch --merge over to it, and to make --merge the
> >>> default. We still should investigate why the split-index performance is so
> >>> significantly worse, though.
> >>>
> >>> Ciao,
> >>> Dscho
> >>>
> >>
> >> I investigated a bit on this.  From a quick glance at a callgrind trace,
> >> I can see that ce_write_entry() is called 20 601[1] times with `git am',
> >> but 739 802 times with the sequencer when the split-index is enabled.
> >
> > Sweet, thanks for digging in and analyzing this.
> >
> >> For reference, here are the timings, measured on my Linux machine, on a
> >> tmpfs, with git.git as the repo:
> >>
> >> `rebase --am':
> >>> 3400.2: rebase on top of a lot of unrelated changes             0.29(0.24+0.03)
> >>> 3400.4: rebase a lot of unrelated changes without split-index   6.77(6.51+0.22)
> >>> 3400.6: rebase a lot of unrelated changes with split-index      4.43(4.29+0.13)
> >> `rebase --quiet':
> >
> > --quiet?  Isn't that flag supposed to work with both backends and not
> > imply either one?  We previously used --keep-empty, though there's a
> > chance that flag means we're not doing a fair comparison (since 'am'
> > will drop empty commits and thus have less work to do).  Is there any
> > chance you actually ran a different command, but when you went to
> > summarize just typed the wrong flag name?  Anyway, the best would
> > probably be to use --merge here (at the time Johannes and I were
> > testing, that wouldn't have triggered the sequencer, but it does now),
> > after first applying the en/rebase-backend series just to make sure
> > we're doing an apples to apples comparison.  However, I suspect that
> > empty commits probably weren't much of a factor and you did find some
> > interesting things...
> >
>
> Yes, I did use `--keep-empty' but misremembered it when writing this email…
>
> >>> 3400.2: rebase on top of a lot of unrelated changes             0.24(0.21+0.02)
> >>> 3400.4: rebase a lot of unrelated changes without split-index   5.60(5.32+0.27)
> >>> 3400.6: rebase a lot of unrelated changes with split-index      5.67(5.40+0.26)
> >>
> >> This comes from two things:
> >>
> >> 1. There is not enough shared entries in the index with the sequencer.
> >>
> >> do_write_index() is called only by do_write_locked_index() with `--am',
> >> but is also called by write_shared_index() with the sequencer once for
> >> every other commit.  As the latter is only called by
> >> write_locked_index(), which means that too_many_not_shared_entries()
> >> returns true for the sequencer, but never for `--am'.
> >>
> >> Removing the call to discard_index() in do_pick_commit() (as in the
> >> first attached patch) solve this particular issue, but this would
> >> require a more thorough analysis to see if it is actually safe to do.
> >
> > I'm actually surprised the sequencer would call discard_index(); I
> > would have thought it would have relied on merge_recursive() to do the
> > necessary index changes and updates other than writing the new index
> > out.  But I'm not quite as familar with the sequencer so perhaps
> > there's some reason I'm unaware of.  (Any chance this is a left-over
> > from when sequencer invoked external scripts to do the work, and thus
> > the index was updated in another processes' memory and on disk, and it
> > had to discard and re-read to get its own process updated?)
> >
>
> The sequencer re-reads the index after invoking an external command
> (either `git checkout', `git merge' or an `exec' command from the todo
> list), which makes sense.  But this one seems to come from 6eb1b437933
> ("cherry-pick/revert: make direct internal call to merge_tree()",
> 2008-09-02).  So, yes, quite old, and perhaps no longer justified.

Right. This commit also moved the `discard_cache()` call outside from the
`else` clause of the `if (no_commit)`.

That `else` clause goes all the way back to 9509af686bf (Make git-revert &
git-cherry-pick a builtin, 2007-03-01), and I admit freely that my memory
is no longer fresh on the specifics of this patch.

Looking at that patch, I think I simply discarded the index because a
subsequent code path would spawn the `git merge-recursive` process, which
would have changed the index externally.

> I know I had to add another discard_cache() in rebase--interactive.c
> because it broke something with the submodules, but it does not seems
> all that useful now that rebase.c no longer has to fork to use the
> sequencer.

FWIW I agree. The code is still quite complex at this point, but
infinitely more readable (thank you Elijah for taking point on simplifying
merge-recursive.c!). So I think that it might be the right point in time
to make sure that the index is not re-read and re-discarded over and over
again.

Thanks,
Dscho

^ permalink raw reply

* [PATCH v2] sha1-file: remove OBJECT_INFO_SKIP_CACHED
From: Jonathan Tan @ 2020-01-02 20:16 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, gitster, jrnieder
In-Reply-To: <20191230211027.37002-1-jonathantanmy@google.com>

In a partial clone, if a user provides the hash of the empty tree ("git
mktree </dev/null" - for SHA-1, this is 4b825d...) to a command which
requires that that object be parsed, for example:

  git diff-tree 4b825d <a non-empty tree>

then Git will lazily fetch the empty tree, unnecessarily, because
parsing of that object invokes repo_has_object_file(), which does not
special-case the empty tree.

Instead, teach repo_has_object_file() to consult find_cached_object()
(which handles the empty tree), thus bringing it in line with the rest
of the object-store-accessing functions. A cost is that
repo_has_object_file() will now need to oideq upon each invocation, but
that is trivial compared to the filesystem lookup or the pack index
search required anyway. (And if find_cached_object() needs to do more
because of previous invocations to pretend_object_file(), all the more
reason to be consistent in whether we present cached objects.)

As a historical note, the function now known as repo_read_object_file()
was taught the empty tree in 346245a1bb ("hard-code the empty tree
object", 2008-02-13), and the function now known as oid_object_info()
was taught the empty tree in c4d9986f5f ("sha1_object_info: examine
cached_object store too", 2011-02-07). repo_has_object_file() was never
updated, perhaps due to oversight. The flag OBJECT_INFO_SKIP_CACHED,
introduced later in dfdd4afcf9 ("sha1_file: teach
sha1_object_info_extended more flags", 2017-06-26) and used in
e83e71c5e1 ("sha1_file: refactor has_sha1_file_with_flags", 2017-06-26),
was introduced to preserve this difference in empty-tree handling, but
now it can be removed.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
Forgot to add v2 to the other email, so resending it with the correct
email subject.

Difference from v1: updated commit message in response to Jonathan
Nieder's feedback. Hopefully I didn't remove too much.
---
 object-store.h |  2 --
 sha1-file.c    | 38 ++++++++++++++++++--------------------
 2 files changed, 18 insertions(+), 22 deletions(-)

diff --git a/object-store.h b/object-store.h
index 55ee639350..61b8b13e3b 100644
--- a/object-store.h
+++ b/object-store.h
@@ -292,8 +292,6 @@ struct object_info {
 #define OBJECT_INFO_LOOKUP_REPLACE 1
 /* Allow reading from a loose object file of unknown/bogus type */
 #define OBJECT_INFO_ALLOW_UNKNOWN_TYPE 2
-/* Do not check cached storage */
-#define OBJECT_INFO_SKIP_CACHED 4
 /* Do not retry packed storage after checking packed and loose storage */
 #define OBJECT_INFO_QUICK 8
 /* Do not check loose object */
diff --git a/sha1-file.c b/sha1-file.c
index 188de57634..03ae9ae93a 100644
--- a/sha1-file.c
+++ b/sha1-file.c
@@ -1417,6 +1417,7 @@ int oid_object_info_extended(struct repository *r, const struct object_id *oid,
 			     struct object_info *oi, unsigned flags)
 {
 	static struct object_info blank_oi = OBJECT_INFO_INIT;
+	struct cached_object *co;
 	struct pack_entry e;
 	int rtype;
 	const struct object_id *real = oid;
@@ -1431,24 +1432,22 @@ int oid_object_info_extended(struct repository *r, const struct object_id *oid,
 	if (!oi)
 		oi = &blank_oi;
 
-	if (!(flags & OBJECT_INFO_SKIP_CACHED)) {
-		struct cached_object *co = find_cached_object(real);
-		if (co) {
-			if (oi->typep)
-				*(oi->typep) = co->type;
-			if (oi->sizep)
-				*(oi->sizep) = co->size;
-			if (oi->disk_sizep)
-				*(oi->disk_sizep) = 0;
-			if (oi->delta_base_sha1)
-				hashclr(oi->delta_base_sha1);
-			if (oi->type_name)
-				strbuf_addstr(oi->type_name, type_name(co->type));
-			if (oi->contentp)
-				*oi->contentp = xmemdupz(co->buf, co->size);
-			oi->whence = OI_CACHED;
-			return 0;
-		}
+	co = find_cached_object(real);
+	if (co) {
+		if (oi->typep)
+			*(oi->typep) = co->type;
+		if (oi->sizep)
+			*(oi->sizep) = co->size;
+		if (oi->disk_sizep)
+			*(oi->disk_sizep) = 0;
+		if (oi->delta_base_sha1)
+			hashclr(oi->delta_base_sha1);
+		if (oi->type_name)
+			strbuf_addstr(oi->type_name, type_name(co->type));
+		if (oi->contentp)
+			*oi->contentp = xmemdupz(co->buf, co->size);
+		oi->whence = OI_CACHED;
+		return 0;
 	}
 
 	while (1) {
@@ -1932,8 +1931,7 @@ int repo_has_object_file_with_flags(struct repository *r,
 {
 	if (!startup_info->have_repository)
 		return 0;
-	return oid_object_info_extended(r, oid, NULL,
-					flags | OBJECT_INFO_SKIP_CACHED) >= 0;
+	return oid_object_info_extended(r, oid, NULL, flags) >= 0;
 }
 
 int repo_has_object_file(struct repository *r,
-- 
2.24.1.735.g03f4e72817-goog


^ permalink raw reply related

* [PATCH] sha1-file: remove OBJECT_INFO_SKIP_CACHED
From: Jonathan Tan @ 2020-01-02 20:15 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, gitster, jrnieder
In-Reply-To: <20191230211027.37002-1-jonathantanmy@google.com>

In a partial clone, if a user provides the hash of the empty tree ("git
mktree </dev/null" - for SHA-1, this is 4b825d...) to a command which
requires that that object be parsed, for example:

  git diff-tree 4b825d <a non-empty tree>

then Git will lazily fetch the empty tree, unnecessarily, because
parsing of that object invokes repo_has_object_file(), which does not
special-case the empty tree.

Instead, teach repo_has_object_file() to consult find_cached_object()
(which handles the empty tree), thus bringing it in line with the rest
of the object-store-accessing functions. A cost is that
repo_has_object_file() will now need to oideq upon each invocation, but
that is trivial compared to the filesystem lookup or the pack index
search required anyway. (And if find_cached_object() needs to do more
because of previous invocations to pretend_object_file(), all the more
reason to be consistent in whether we present cached objects.)

As a historical note, the function now known as repo_read_object_file()
was taught the empty tree in 346245a1bb ("hard-code the empty tree
object", 2008-02-13), and the function now known as oid_object_info()
was taught the empty tree in c4d9986f5f ("sha1_object_info: examine
cached_object store too", 2011-02-07). repo_has_object_file() was never
updated, perhaps due to oversight. The flag OBJECT_INFO_SKIP_CACHED,
introduced later in dfdd4afcf9 ("sha1_file: teach
sha1_object_info_extended more flags", 2017-06-26) and used in
e83e71c5e1 ("sha1_file: refactor has_sha1_file_with_flags", 2017-06-26),
was introduced to preserve this difference in empty-tree handling, but
now it can be removed.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
Difference from v1: updated commit message in response to Jonathan
Nieder's feedback. Hopefully I didn't remove too much.
---
 object-store.h |  2 --
 sha1-file.c    | 38 ++++++++++++++++++--------------------
 2 files changed, 18 insertions(+), 22 deletions(-)

diff --git a/object-store.h b/object-store.h
index 55ee639350..61b8b13e3b 100644
--- a/object-store.h
+++ b/object-store.h
@@ -292,8 +292,6 @@ struct object_info {
 #define OBJECT_INFO_LOOKUP_REPLACE 1
 /* Allow reading from a loose object file of unknown/bogus type */
 #define OBJECT_INFO_ALLOW_UNKNOWN_TYPE 2
-/* Do not check cached storage */
-#define OBJECT_INFO_SKIP_CACHED 4
 /* Do not retry packed storage after checking packed and loose storage */
 #define OBJECT_INFO_QUICK 8
 /* Do not check loose object */
diff --git a/sha1-file.c b/sha1-file.c
index 188de57634..03ae9ae93a 100644
--- a/sha1-file.c
+++ b/sha1-file.c
@@ -1417,6 +1417,7 @@ int oid_object_info_extended(struct repository *r, const struct object_id *oid,
 			     struct object_info *oi, unsigned flags)
 {
 	static struct object_info blank_oi = OBJECT_INFO_INIT;
+	struct cached_object *co;
 	struct pack_entry e;
 	int rtype;
 	const struct object_id *real = oid;
@@ -1431,24 +1432,22 @@ int oid_object_info_extended(struct repository *r, const struct object_id *oid,
 	if (!oi)
 		oi = &blank_oi;
 
-	if (!(flags & OBJECT_INFO_SKIP_CACHED)) {
-		struct cached_object *co = find_cached_object(real);
-		if (co) {
-			if (oi->typep)
-				*(oi->typep) = co->type;
-			if (oi->sizep)
-				*(oi->sizep) = co->size;
-			if (oi->disk_sizep)
-				*(oi->disk_sizep) = 0;
-			if (oi->delta_base_sha1)
-				hashclr(oi->delta_base_sha1);
-			if (oi->type_name)
-				strbuf_addstr(oi->type_name, type_name(co->type));
-			if (oi->contentp)
-				*oi->contentp = xmemdupz(co->buf, co->size);
-			oi->whence = OI_CACHED;
-			return 0;
-		}
+	co = find_cached_object(real);
+	if (co) {
+		if (oi->typep)
+			*(oi->typep) = co->type;
+		if (oi->sizep)
+			*(oi->sizep) = co->size;
+		if (oi->disk_sizep)
+			*(oi->disk_sizep) = 0;
+		if (oi->delta_base_sha1)
+			hashclr(oi->delta_base_sha1);
+		if (oi->type_name)
+			strbuf_addstr(oi->type_name, type_name(co->type));
+		if (oi->contentp)
+			*oi->contentp = xmemdupz(co->buf, co->size);
+		oi->whence = OI_CACHED;
+		return 0;
 	}
 
 	while (1) {
@@ -1932,8 +1931,7 @@ int repo_has_object_file_with_flags(struct repository *r,
 {
 	if (!startup_info->have_repository)
 		return 0;
-	return oid_object_info_extended(r, oid, NULL,
-					flags | OBJECT_INFO_SKIP_CACHED) >= 0;
+	return oid_object_info_extended(r, oid, NULL, flags) >= 0;
 }
 
 int repo_has_object_file(struct repository *r,
-- 
2.24.1.735.g03f4e72817-goog


^ permalink raw reply related

* Re: [PATCH 1/1] mingw: only test index entries for backslashes, not tree entries
From: Johannes Schindelin @ 2020-01-02 19:58 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: Johannes Schindelin via GitGitGadget, git, Junio C Hamano
In-Reply-To: <20191226214245.GA186931@google.com>

Hi Jonathan,

On Thu, 26 Dec 2019, Jonathan Nieder wrote:

> Johannes Schindelin wrote:
> > On Thu, 26 Dec 2019, Jonathan Nieder wrote:
> >> Johannes Schindelin wrote:
>
> >>> Arguably, this is the wrong layer for that error, anyway: As long as the
> >>> user never checks out the files whose names contain backslashes, there
> >>> should not be any problem in the first place.
> [...]
> >>          between the lines of this commit messages I sense that there
> >> *are* repositories in the wild using these kinds of filenames.
> >>
> >> Can you say more about that?  What repositories are affected?  Do they
> >> contain such filenames at HEAD or only in their history?  If someone
> >> wants to check out a revision with such filenames, what should happen?
> >
> > Yes: https://github.com/zephyrproject-rtos/civetweb contains history where
> > at some stage, by mistake, a directory was called `\`. It has been fixed a
> > long time ago, but users obviously still want to be able to clone it ;-)
>
> Thanks.
>
> What should and does happen if someone tries to check out an offending
> revision?

To answer this question, I added this paragraph to the commit message:

    The idea is to prevent Git from even trying to write files with
    backslashes in their file names: while these characters are valid in
    file names on other platforms, on Windows it is interpreted as directory
    separator (which would obviously lead to ambiguities, e.g. when there is
    a file `a\b` and there is also a file `a/b`).

> [...]
> > I added this paragraph to the commit message:
> >
> >     Note: just as before, the check is guarded by `core.protectNTFS` (to
> >     allow overriding the check by toggling that config setting), and it
> >     is _only_ performed on Windows, as the backslash is not a directory
> >     separator elsewhere, even when writing to NTFS-formatted volumes.
> >
> > Does that clarify the issue enough?
>
> It helps: that tells me why the check is only performed on Windows.
>
> Since this only affects Windows, please only take this review as data
> about how someone read the patch.  The patch doesn't make non Windows
> specific code any *less* maintainable, and as a general presumption I
> assume that the Git for Windows maintainer knows best about what is
> needed for maintainability of Windows-specific code.
>
> But the commit message and docs still don't describe what the desired
> behavior is.  For example, should I be able to check out a revision
> with a backslash in it (e.g. via Git skipping the offending entry), or
> is the intended behavior for it to error out and prevent checking out
> such versions of code?

As mentioned above, the idea is to prevent Git from attempting to create
files with illegal file name characters.

> Is there anything we can or should do to prevent people checking in
> new examples of paths with backslash in them (on all platforms)?

As mentioned in my reply to Junio, I don't think that it would be wise to
even try to warn about backslashes in the file names. There are _so_ many
Git users out there, I am convinced that at least some of them have valid
use cases for file names with backslashes in them.

Ciao,
Dscho

^ permalink raw reply

* Re: [PATCH v2 1/1] commit: display advice hints when commit fails
From: Jonathan Tan @ 2020-01-02 19:56 UTC (permalink / raw)
  To: gitster; +Cc: jonathantanmy, gitgitgadget, git, heba.waly
In-Reply-To: <xmqq7e2cfh68.fsf@gitster-ct.c.googlers.com>

> > Junio, what are your plans over what you have in your tree? If you'd
> > like to hear Heba's opinion on it, then she can chime in; if you'd like
> > a review, then I think it's good to go in.
> 
> On hold until anything like those happens ;-) 
> 
> A random reviewer mentioning something on a patch (either in a
> line-by-line critique form or "how about doing it this way instead"
> counterproposal form) without getting followed up by others
> (including the original author) is a stall review thread, and it
> does not change the equation if the random reviewer happens to be me.

OK :-)

> > And I think the answer to that is "s" is used throughout the function in
> > various ways (in particular, used to print statuses both to stdout and
> > to the message template) so any wrapping or corralling of scope would
> > just make things more complicated. In particular, the way Heba did it in
> > v2 is more unclear - at the time of setting s->hints = 0, it's done
> 
> You mean "less clear" (just double checking if I got the negation right)?

Yes, less clear - v2 is less clear than v1.

> I think I've merged it to 'next' yesterday, but it does not mean
> that much as we are in -rc and it is not such an urgent "oops we
> broke it in this cycle, let's fix it" issue.  If we see a v3 that
> improves it, I do not mind at all reverting what I merged to 'next'
> and use the updated one instead (either way, it will be in 'master'
> during the next cycle at the earliest).

Sounds good - thanks.

^ permalink raw reply

* Re: [PATCH 1/1] add: use advise function to display hints
From: Junio C Hamano @ 2020-01-02 19:54 UTC (permalink / raw)
  To: Heba Waly via GitGitGadget; +Cc: git, Heba Waly
In-Reply-To: <90608636bf184de76f91e4e04d9e796a021775a0.1577934241.git.gitgitgadget@gmail.com>

"Heba Waly via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Heba Waly <heba.waly@gmail.com>
>
> Use the advise function in advice.c to display hints to the users, as
> it provides a neat and a standard format for hint messages, i.e: the
> text is colored in yellow and the line starts by the word "hint:".

Use of advise() function is good for giving hints not just due to
its yellow coloring (which by the way I find not very readable,
perhaps because I use black ink on white paper).  One good thing in
using the advise() API is that the messages can also be squelched
with advice.* configuration variables.

And these two hints in "git add" are good chandidates to make
customizable (perhaps with "advice.addNothing"), so I tend to agree
with you that it makes sense to move these two messages to advise().
Unfortunately this patch goes only halfway and stops (see below).

If there are many other places that calls to advise() are made
without getting guarded by the toggles defined in advice.c, we
should fix them, I think.

>
> Signed-off-by: Heba Waly <heba.waly@gmail.com>
> ---
>  builtin/add.c  | 4 ++--
>  t/t3700-add.sh | 2 +-
>  2 files changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/builtin/add.c b/builtin/add.c
> index 4c38aff419..eebf8d772b 100644
> --- a/builtin/add.c
> +++ b/builtin/add.c
> @@ -390,7 +390,7 @@ static int add_files(struct dir_struct *dir, int flags)
>  		fprintf(stderr, _(ignore_error));
>  		for (i = 0; i < dir->ignored_nr; i++)
>  			fprintf(stderr, "%s\n", dir->ignored[i]->name);
> -		fprintf(stderr, _("Use -f if you really want to add them.\n"));
> +		advise(_("Use -f if you really want to add them.\n"));
>  		exit_status = 1;
>  	}
>  
> @@ -480,7 +480,7 @@ int cmd_add(int argc, const char **argv, const char *prefix)
>  
>  	if (require_pathspec && pathspec.nr == 0) {
>  		fprintf(stderr, _("Nothing specified, nothing added.\n"));
> -		fprintf(stderr, _("Maybe you wanted to say 'git add .'?\n"));
> +		advise( _("Maybe you wanted to say 'git add .'?\n"));
>  		return 0;
>  	}

The final code for the above part would look like:

		if (advice_add_nothing)
			advise(_("Use -f if you really want to add them."));
		...
		if (advice_add_nothing)
			advise( _("Maybe you wanted to say 'git add .'?"));

and then you would

 * add defn of advice_add_nothing to advice.h
 * add decl of the same, initialized to 1(true), to advice.c
 * map "addNothing" to &advice_add_nothing in advice.c::advice_config[]

to complete the other half of this patch, if the config we choose to
use is named "advice.addNothing".

By the way, notice that the single-liner advise() messages do not
end with LF?  This is another difference between printf() family and
advise().  advise() cuts its message at LF and prefixes each piece
with "hint:" but after the final LF there is nothing but NUL, which
means the final LF is optional.

The warning()/error()/die() family is different from advise() in
that they do not chop the incoming message at LF.  This behaviour is
less i18n friendly, and it would be nice to eventually change them
to behave similarly to advise().

Thanks.

 

^ permalink raw reply

* Re: [PATCH 1/1] mingw: only test index entries for backslashes, not tree entries
From: Johannes Schindelin @ 2020-01-02 19:53 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Johannes Schindelin via GitGitGadget, git
In-Reply-To: <xmqqk16dfpcy.fsf@gitster-ct.c.googlers.com>

Hi Junio,

On Mon, 30 Dec 2019, Junio C Hamano wrote:

> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
>
> > Turns out that this inconsistency is only in Git for Windows v2.24.1(2)
> > but not in current `master` of Git, so I simply struck that part from the
> > commit message.
> > ...
> > I rephrased it to:
> >
> >     So let's loosen the requirements: we now leave tree entries with
> >     backslashes in their file names alone, but we do require any entries
> >     that are added to the Git index to contain no backslashes on Windows.
> > ...
>
> We are in -rc so there is no real rush,

My intention was actually to integrate this patch into Git for Windows
v2.25.0-rc1 already, that's why I sent it out for review even this late in
the -rc phase.

> but I take these to mean that I should just leave this loose end hanging
> untied, and wait for an updated version to replace it sometime early
> next year.

I sent out a new iteration of the patch *just* before the new year arrived
over here ;-)

> Thanks and happy new year.

The same to you!
Dscho

^ permalink raw reply

* Re: [PATCH 1/1] merge-recursive: remove unnecessary oid_eq function
From: Junio C Hamano @ 2020-01-02 18:35 UTC (permalink / raw)
  To: Elijah Newren via GitGitGadget; +Cc: git, Elijah Newren
In-Reply-To: <c653a9b8d3863b3484eff224bbfbde65c250eaf0.1577856057.git.gitgitgadget@gmail.com>

"Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Elijah Newren <newren@gmail.com>
>
> Back when merge-recursive was first introduced in commit 6d297f8137
> (Status update on merge-recursive in C, 2006-07-08), it created a
> sha_eq() function.  This function pre-dated the introduction of
> hashcmp() to cache.h by about a month, but was switched over to using
> hashcmp() as part of commit 9047ebbc22 (Split out merge_recursive() to
> merge-recursive.c, 2008-08-12).  In commit b4da9d62f9 (merge-recursive:
> convert leaf functions to use struct object_id, 2016-06-24), sha_eq() was
> renamed to oid_eq() and its hashcmp() call was switched to oideq().
>
> oid_eq() is basically just a wrapper around oideq() that has some extra
> checks to protect against NULL arguments or to allow short-circuiting if
> one of the arguments is NULL.  I don't know if any caller ever tried to
> call with NULL arguments, but certainly none do now which means the
> extra checks serve no purpose.  (Also, if these checks were genuinely
> useful, then they probably should be added to the main oideq() so all
> callers could benefit from them.)

Just for some historical yuck values ^W^W entertainment,
6d297f81373:merge-recursive.c shows how the function was called and
needed to prepare for NULL inputs.

I agree that today's code won't need the "two NULLs are equal, and NULL
is never equal to anything" hack.  We've come a long way ;-)


^ permalink raw reply

* Re: [PATCH 05/16] t2018: don't lose return code of git commands
From: Junio C Hamano @ 2020-01-02 18:26 UTC (permalink / raw)
  To: Denton Liu; +Cc: Eric Sunshine, Git Mailing List
In-Reply-To: <20200101084840.GC5152@generichostname>

Denton Liu <liu.denton@gmail.com> writes:

>> Try writing your commit messages in imperative mood:
>> 
>>     Fix invocations of Git commands so their exit codes are not lost
>>     within command substitutions...
>> 
>> or something. Same comment about several other commit messages in this series.
>
> I thought that the preferred form of commit messages is to introduce the
> problem that I'm trying to solve first ("We had some git commands losing
> return codes") then, after that, describe the changes I made in
> imperative mood ("Rewrite these instances..."). From what I can tell,
> all of my commit messages conform to this template.
>
> I'd prefer to keep the "problem statement" introduction in my commit
> messages as it primes readers so they know "why" before "what" but I
> can't think of any way to phrase the "problem statement" part in a way
> that sounds good without resorting to past tense. Any suggestions?

I find Eric's slightly easier to follow in this particular case,
primarily because the problem being solved is so obvious (i.e. "so
that their exit codes are not lost" is sufficient to convey that the
current code lose exit codes and that it is bad to lose exit codes).

When the problem is deeper or needs longer backstory to understand
where it came from, I do prefer a separate description of the
current status and reason why the current status is undesirable
(both in the present tense), followed by commands to the code to
become better.

Thanks.



>> 
>> More below...
>> 
>> > Signed-off-by: Denton Liu <liu.denton@gmail.com>
>> > ---
>> > diff --git a/t/t2018-checkout-branch.sh b/t/t2018-checkout-branch.sh
>> > @@ -23,7 +23,8 @@ do_checkout () {
>> >         # if <sha> is not specified, use HEAD.
>> > -       exp_sha=${2:-$(git rev-parse --verify HEAD)} &&
>> > +       head=$(git rev-parse --verify HEAD) &&
>> > +       exp_sha=${2:-$head} &&
>> 
>> Are you sure this transformation is needed? In my tests, the exit code
>> of the Git command is not lost.
>
> Thanks for double checking, it's not needed. I'll drop this in my next
> reroll.

^ permalink raw reply

* Re: [PATCH 1/1] sparse-checkout: use extern for global variables
From: Junio C Hamano @ 2020-01-02 18:17 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget; +Cc: git, Derrick Stolee
In-Reply-To: <c704cb554e1897278913520f266d95a4b04f9639.1577798268.git.gitgitgadget@gmail.com>

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Derrick Stolee <dstolee@microsoft.com>
>
> When the core.sparseCheckoutCone config setting was added in
> 879321eb0b ("sparse-checkout: add 'cone' mode" 2019-11-21), the
> variables storing the config values for core.sparseCheckout and
> core.sparseCheckoutCone were rearranged in cache.h, but in doing
> so the "extern" keyword was dropped.
>
> While we are tending to drop the "extern" keyword for function
> declarations, it is still necessary for global variables used
> across multiple *.c files. The impact of not having the extern
> keyword may be unpredictable.

"May be unpredictable" might be a bit too strong, but I agree that
it is better not to rely on the "common extension" these days, and
instead make sure variable decls have "extern" in front.

Will queue.  Thanks.

> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  cache.h | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/cache.h b/cache.h
> index 1554488d66..cbfaead23a 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -958,8 +958,8 @@ extern int protect_hfs;
>  extern int protect_ntfs;
>  extern const char *core_fsmonitor;
>  
> -int core_apply_sparse_checkout;
> -int core_sparse_checkout_cone;
> +extern int core_apply_sparse_checkout;
> +extern int core_sparse_checkout_cone;
>  
>  /*
>   * Include broken refs in all ref iterations, which will

^ permalink raw reply

* Re: [PATCH 1/1] fetch: set size_multiple in split_commit_graph_opts
From: Derrick Stolee @ 2020-01-02 16:20 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget, git
  Cc: peff, me, szeder.dev, Derrick Stolee, Junio C Hamano
In-Reply-To: <91d89356a20625d04af74d458c28b32445e760c1.1577981654.git.gitgitgadget@gmail.com>

As I was writing this commit message, I changed the plan for how
to fix this. Originally, I was going to set size_multiple = 2 in
builtin/fetch.c, which is how the subject line was created. I forgot
to change that to something more like:

	"commit-graph: prefer default size_mult when given zero"

On 1/2/2020 11:14 AM, Derrick Stolee via GitGitGadget wrote:
> From: Derrick Stolee <dstolee@microsoft.com>
> 
> In 50f26bd ("fetch: add fetch.writeCommitGraph config setting",
> 2019-09-02), the fetch builtin added the capability to write a
> commit-graph using the "--split" feature. This feature creates
> multiple commit-graph files, and those can merge based on a set
> of "split options" including a size multiple. The default size
> multiple is 2, which intends to provide a log_2 N depth of the
> commit-graph chain where N is the number of commits.
> 
> However, I noticed during dogfooding that my commit-graph chains
> were becoming quite large when left only to builds by 'git fetch'.
> It turns out that in split_graph_merge_strategy(), we default the
> size_mult variable to 2 except we override it with the context's
> split_opts if they exist. In builtin/fetch.c, we create such a
> split_opts, but do not populate it with values.
> 
> This problem is due to two failures:
> 
>  1. It is unclear that we can add the flag COMMIT_GRAPH_WRITE_SPLIT
>     with a NULL split_opts.
>  2. If we have a non-NULL split_opts, then we override the default
>     values even if a zero value is given.
> 
> Correct both of these issues. First, do not override size_mult when
> the options provide a zero value. Second, stop creating a split_opts
> in the fetch builtin.

This is a correct description of the actual patch.

Thanks,
-Stolee



^ permalink raw reply

* [PATCH 1/1] fetch: set size_multiple in split_commit_graph_opts
From: Derrick Stolee via GitGitGadget @ 2020-01-02 16:14 UTC (permalink / raw)
  To: git; +Cc: peff, me, szeder.dev, Derrick Stolee, Junio C Hamano,
	Derrick Stolee
In-Reply-To: <pull.509.git.1577981654.gitgitgadget@gmail.com>

From: Derrick Stolee <dstolee@microsoft.com>

In 50f26bd ("fetch: add fetch.writeCommitGraph config setting",
2019-09-02), the fetch builtin added the capability to write a
commit-graph using the "--split" feature. This feature creates
multiple commit-graph files, and those can merge based on a set
of "split options" including a size multiple. The default size
multiple is 2, which intends to provide a log_2 N depth of the
commit-graph chain where N is the number of commits.

However, I noticed during dogfooding that my commit-graph chains
were becoming quite large when left only to builds by 'git fetch'.
It turns out that in split_graph_merge_strategy(), we default the
size_mult variable to 2 except we override it with the context's
split_opts if they exist. In builtin/fetch.c, we create such a
split_opts, but do not populate it with values.

This problem is due to two failures:

 1. It is unclear that we can add the flag COMMIT_GRAPH_WRITE_SPLIT
    with a NULL split_opts.
 2. If we have a non-NULL split_opts, then we override the default
    values even if a zero value is given.

Correct both of these issues. First, do not override size_mult when
the options provide a zero value. Second, stop creating a split_opts
in the fetch builtin.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 builtin/fetch.c | 4 +---
 commit-graph.c  | 4 +++-
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/builtin/fetch.c b/builtin/fetch.c
index f8765b385b..b4c6d921d0 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -1866,15 +1866,13 @@ int cmd_fetch(int argc, const char **argv, const char *prefix)
 	    (fetch_write_commit_graph < 0 &&
 	     the_repository->settings.fetch_write_commit_graph)) {
 		int commit_graph_flags = COMMIT_GRAPH_WRITE_SPLIT;
-		struct split_commit_graph_opts split_opts;
-		memset(&split_opts, 0, sizeof(struct split_commit_graph_opts));
 
 		if (progress)
 			commit_graph_flags |= COMMIT_GRAPH_WRITE_PROGRESS;
 
 		write_commit_graph_reachable(get_object_directory(),
 					     commit_graph_flags,
-					     &split_opts);
+					     NULL);
 	}
 
 	close_object_store(the_repository->objects);
diff --git a/commit-graph.c b/commit-graph.c
index e771394aff..b205e65ed1 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -1542,7 +1542,9 @@ static void split_graph_merge_strategy(struct write_commit_graph_context *ctx)
 
 	if (ctx->split_opts) {
 		max_commits = ctx->split_opts->max_commits;
-		size_mult = ctx->split_opts->size_multiple;
+
+		if (ctx->split_opts->size_multiple)
+			size_mult = ctx->split_opts->size_multiple;
 	}
 
 	g = ctx->r->objects->commit_graph;
-- 
gitgitgadget

^ permalink raw reply related

* [PATCH 0/1] [PERF BUG] Fix size_mult option in fetch.writeCommitGraph
From: Derrick Stolee via GitGitGadget @ 2020-01-02 16:14 UTC (permalink / raw)
  To: git; +Cc: peff, me, szeder.dev, Derrick Stolee, Junio C Hamano

I found this while doing some digging into fetch behavior and split commit
graphs. I had been running fetch.writeCommitGraph=true on my local repos for
a while and noticed that the commit-graph chains were much longer than
expected.

The reason is silly, and the commit message includes all the details.

This behavior exists since v2.24.0, so I'm not sure if it makes the bar for
v2.25.0 this late in the release cycle. At minimum, the change is very small
and unlikely to cause more pain.

This is only a performance bug, and the effect is relatively small. A large
list of commit-graph files slows down the commit lookup time as we need to
perform a linear number of binary searches. This only affects finding the
first commit(s) in a commit walk, as after that we can navigate quickly to
the correct position using graph_pos. When a user runs gc (with 
gc.writeCommitGraph=true, on by default), the chain collapses to a single
level, fixing the performance problem.

Thanks, -Stolee

Derrick Stolee (1):
  fetch: set size_multiple in split_commit_graph_opts

 builtin/fetch.c | 4 +---
 commit-graph.c  | 4 +++-
 2 files changed, 4 insertions(+), 4 deletions(-)


base-commit: 99c33bed562b41de6ce9bd3fd561303d39645048
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-509%2Fderrickstolee%2Ffetch-write-commit-graph-split-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-509/derrickstolee/fetch-write-commit-graph-split-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/509
-- 
gitgitgadget

^ permalink raw reply

* Re: [PATCH v5 0/4] git-p4: Usability enhancements
From: Ben Keene @ 2020-01-02 13:50 UTC (permalink / raw)
  To: Junio C Hamano, Luke Diamand; +Cc: Git Users, Ben Keene via GitGitGadget
In-Reply-To: <xmqqh81omd5m.fsf@gitster-ct.c.googlers.com>


On 12/25/2019 2:13 PM, Junio C Hamano wrote:
> Luke Diamand <luke@diamand.org> writes:
>
>> $ git log --reverse --oneline --abbrev-commit
>> origin/maint..origin/bk/p4-misc-usability
>> e2aed5fd5b git-p4: yes/no prompts should sanitize user text
>>     - looks good to me
>>
>> 608e380502 git-p4: show detailed help when parsing options fail
>>     - also looks good to me
>>
>> c4dc935311 git-p4: wrap patchRCSKeywords test to revert changes on failure
>>     - why not just catch the exception, and then drop out of the "if-"
>> condition and fall into the cleanup section at the bottom of that
>> function (line 1976)? As it stands, this is duplicating the cleanup
>> code now.
>>
>> 89c88c0ecf (origin/bk/p4-misc-usability) git-p4: failure because of
>> RCS keywords should show help
>>    - strictly speaking, the code does not actually check if there *are*
>> any RCS keywords, it just checks if the filetype means that RCS kws
>> *would* be expanded *if* they were present. The conflict might be just
>> because....there's a conflict. As it stands this will be giving
>> misleading advice. I would get it to check to see if there really are
>> any RCS keywords in the file.
> Thanks.  Ben, let's keep the first two and discard the rest for now,
> which can later be replaced with updated ones.
That works for me.  So, are there any changes that I should make at
this time, or just let the rest die off?

^ permalink raw reply

* BUG: sendemail-validate hook is run too early
From: Jani Nikula @ 2020-01-02 12:10 UTC (permalink / raw)
  To: git


I'm trying to use the sendemail-validate hook to validate the recipients
of the patch email, among other things. Turns out the hook gets run
immediately on the input patches, *not* on the "e-mail to be sent" as
claimed by githooks(5).

This means the recipients added by git send-email automatically or on
the git send-email command-line, or any changes done by the user with
--annotate will not be validated.

This is easy to demonstrate in a git repo with e.g.

$ ln -s /bin/cat .git/hooks/sendemail-validate
$ git send-email --dry-run -1 --to bypass-validation@example.com

The file passed to the validate hook does not have the address.

If changing the location of the current validation hook seems too risky,
as apparently it's been like this for more than a decade, I suggest
adding another hook on the actual email to be sent.


BR,
Jani.

-- 
Jani Nikula, Intel Open Source Graphics Center

^ permalink raw reply

* Re: [PATCH 1/1] branch: advise the user to checkout a different branch before deleting
From: Eric Sunshine @ 2020-01-02  8:18 UTC (permalink / raw)
  To: Heba Waly via GitGitGadget; +Cc: Git List, Heba Waly, Junio C Hamano
In-Reply-To: <82bf24ce537ca9333d72c2b4698864817801f10f.1577933387.git.gitgitgadget@gmail.com>

On Wed, Jan 1, 2020 at 9:50 PM Heba Waly via GitGitGadget
<gitgitgadget@gmail.com> wrote:
> Display a hint to the user when attempting to delete a checked out
> branch saying "Checkout another branch before deleting this one:
> git checkout <branch_name>".
>
> Currently the user gets an error message saying: "error: Cannot delete
> branch <branch_name> checked out at <path>". The hint will be displayed
> after the error message.
>
> Signed-off-by: Heba Waly <heba.waly@gmail.com>
> ---
> diff --git a/builtin/branch.c b/builtin/branch.c
> @@ -240,6 +240,8 @@ static int delete_branches(int argc, const char **argv, int force, int kinds,
>                                 error(_("Cannot delete branch '%s' "
>                                         "checked out at '%s'"),
>                                       bname.buf, wt->path);
> +                               advise(_("Checkout another branch before deleting this "
> +                                                "one: git checkout <branch_name>"));

s/another/a different/ would make the meaning clearer.

Let's try to avoid underscores in placeholders. <branch-name> would be
better, however, git-checkout documentation just calls this <branch>,
so that's probably a good choice.

However, these days, I think we're promoting git-switch rather than
git-checkout, so perhaps this advice should follow suit.

Finally, is this advice sufficient for newcomers when the branch the
user is trying to delete is in fact checked out in a worktree other
than the worktree in which the git-branch command is being invoked?
That is:

    $ pwd
    /home/me/foo
    $ git branch -D bip
    Cannot delete  branch 'bip' checked out at '/home/me/bar'
    hint: Checkout another branch before deleting this one:
    hint: git checkout <branch>
    $ git checkout master # user follows advice
    $ git branch -D bip
    Cannot delete  branch 'bip' checked out at '/home/me/foo'
    hint: Checkout another branch before deleting this one:
    hint: git checkout <branch>
    $

And the user is left scratching his or her head wondering why
git-branch is still showing the error despite following the
instructions in the hint.

> diff --git a/t/t3200-branch.sh b/t/t3200-branch.sh
> @@ -808,7 +808,8 @@ test_expect_success 'test deleting branch without config' '
>  test_expect_success 'deleting currently checked out branch fails' '
>         git worktree add -b my7 my7 &&
>         test_must_fail git -C my7 branch -d my7 &&
> -       test_must_fail git branch -d my7 &&
> +       test_must_fail git branch -d my7 >actual.out 2>actual.err &&
> +       test_i18ngrep "hint: Checkout another branch" actual.err &&

Why does this capture standard output into 'actual.out' if that file
is never consulted?

>         rm -r my7 &&
>         git worktree prune
>  '

^ permalink raw reply

* [PATCH 1/1] add: use advise function to display hints
From: Heba Waly via GitGitGadget @ 2020-01-02  3:04 UTC (permalink / raw)
  To: git; +Cc: Heba Waly, Junio C Hamano, Heba Waly
In-Reply-To: <pull.508.git.1577934241.gitgitgadget@gmail.com>

From: Heba Waly <heba.waly@gmail.com>

Use the advise function in advice.c to display hints to the users, as
it provides a neat and a standard format for hint messages, i.e: the
text is colored in yellow and the line starts by the word "hint:".

Signed-off-by: Heba Waly <heba.waly@gmail.com>
---
 builtin/add.c  | 4 ++--
 t/t3700-add.sh | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/builtin/add.c b/builtin/add.c
index 4c38aff419..eebf8d772b 100644
--- a/builtin/add.c
+++ b/builtin/add.c
@@ -390,7 +390,7 @@ static int add_files(struct dir_struct *dir, int flags)
 		fprintf(stderr, _(ignore_error));
 		for (i = 0; i < dir->ignored_nr; i++)
 			fprintf(stderr, "%s\n", dir->ignored[i]->name);
-		fprintf(stderr, _("Use -f if you really want to add them.\n"));
+		advise(_("Use -f if you really want to add them.\n"));
 		exit_status = 1;
 	}
 
@@ -480,7 +480,7 @@ int cmd_add(int argc, const char **argv, const char *prefix)
 
 	if (require_pathspec && pathspec.nr == 0) {
 		fprintf(stderr, _("Nothing specified, nothing added.\n"));
-		fprintf(stderr, _("Maybe you wanted to say 'git add .'?\n"));
+		advise( _("Maybe you wanted to say 'git add .'?\n"));
 		return 0;
 	}
 
diff --git a/t/t3700-add.sh b/t/t3700-add.sh
index c325167b90..a649805369 100755
--- a/t/t3700-add.sh
+++ b/t/t3700-add.sh
@@ -326,7 +326,7 @@ test_expect_success 'git add --dry-run of an existing file output' "
 cat >expect.err <<\EOF
 The following paths are ignored by one of your .gitignore files:
 ignored-file
-Use -f if you really want to add them.
+hint: Use -f if you really want to add them.
 EOF
 cat >expect.out <<\EOF
 add 'track-this'
-- 
gitgitgadget

^ permalink raw reply related

* [PATCH 0/1] [Outreachy] [RFC] add: use advise function to display hints
From: Heba Waly via GitGitGadget @ 2020-01-02  3:04 UTC (permalink / raw)
  To: git; +Cc: Heba Waly, Junio C Hamano

The advise function in advice.c provides a neat and a standard format for
hint messages, i.e: the text is colored in yellow and the line starts by the
word "hint:".

This patch suggests using this advise function whenever displaying hints to
improve the user experience, as the user's eyes will get used to the format
and will scan the screen for the yellow hints whenever confused instead of
reading all the output lines looking for advice.

Heba Waly (1):
  add: use advise function to display hints

 builtin/add.c  | 4 ++--
 t/t3700-add.sh | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)


base-commit: 0a76bd7381ec0dbb7c43776eb6d1ac906bca29e6
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-508%2FHebaWaly%2Fformatting_hints-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-508/HebaWaly/formatting_hints-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/508
-- 
gitgitgadget

^ permalink raw reply

* [PATCH 1/1] branch: advise the user to checkout a different branch before deleting
From: Heba Waly via GitGitGadget @ 2020-01-02  2:49 UTC (permalink / raw)
  To: git; +Cc: Heba Waly, Junio C Hamano, Heba Waly
In-Reply-To: <pull.507.git.1577933387.gitgitgadget@gmail.com>

From: Heba Waly <heba.waly@gmail.com>

Display a hint to the user when attempting to delete a checked out
branch saying "Checkout another branch before deleting this one:
git checkout <branch_name>".

Currently the user gets an error message saying: "error: Cannot delete
branch <branch_name> checked out at <path>". The hint will be displayed
after the error message.

Signed-off-by: Heba Waly <heba.waly@gmail.com>
---
 builtin/branch.c  | 2 ++
 t/t3200-branch.sh | 3 ++-
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/builtin/branch.c b/builtin/branch.c
index d8297f80ff..799e967008 100644
--- a/builtin/branch.c
+++ b/builtin/branch.c
@@ -240,6 +240,8 @@ static int delete_branches(int argc, const char **argv, int force, int kinds,
 				error(_("Cannot delete branch '%s' "
 					"checked out at '%s'"),
 				      bname.buf, wt->path);
+				advise(_("Checkout another branch before deleting this "
+						 "one: git checkout <branch_name>"));
 				ret = 1;
 				continue;
 			}
diff --git a/t/t3200-branch.sh b/t/t3200-branch.sh
index 411a70b0ce..3b2812a8f4 100755
--- a/t/t3200-branch.sh
+++ b/t/t3200-branch.sh
@@ -808,7 +808,8 @@ test_expect_success 'test deleting branch without config' '
 test_expect_success 'deleting currently checked out branch fails' '
 	git worktree add -b my7 my7 &&
 	test_must_fail git -C my7 branch -d my7 &&
-	test_must_fail git branch -d my7 &&
+	test_must_fail git branch -d my7 >actual.out 2>actual.err &&
+	test_i18ngrep "hint: Checkout another branch" actual.err &&
 	rm -r my7 &&
 	git worktree prune
 '
-- 
gitgitgadget

^ permalink raw reply related

* [PATCH 0/1] [Outreachy] [RFC] branch: advise the user to checkout a different branch before deleting
From: Heba Waly via GitGitGadget @ 2020-01-02  2:49 UTC (permalink / raw)
  To: git; +Cc: Heba Waly, Junio C Hamano

When a user attempts to delete a checked out branch, an error message is
displayed saying: "error: Cannot delete branch checked out at ". This patch
suggests displaying a hint after the error message advising the user to
checkout another branch first using "git checkout ".

Heba Waly (1):
  branch: advise the user to checkout a different branch before deleting

 builtin/branch.c  | 2 ++
 t/t3200-branch.sh | 3 ++-
 2 files changed, 4 insertions(+), 1 deletion(-)


base-commit: 0a76bd7381ec0dbb7c43776eb6d1ac906bca29e6
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-507%2FHebaWaly%2Fdelete_branch_hint-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-507/HebaWaly/delete_branch_hint-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/507
-- 
gitgitgadget

^ permalink raw reply

* Re: [PATCH] t3008: find test-tool through path lookup
From: Johannes Schindelin @ 2020-01-01 22:19 UTC (permalink / raw)
  To: Johannes Sixt; +Cc: Git Mailing List
In-Reply-To: <2dc0b0f3-19ce-62ab-5af9-fdb2d05b00bd@kdbg.org>

Hi Hannes,

On Sun, 22 Dec 2019, Johannes Sixt wrote:

> Do not use $GIT_BUILD_DIR without quotes; it may contain spaces and be
> split into fields. But it is not necessary to access test-tool with an
> absolute path in the first place as it can be found via path lookup.
> Remove the explicit path.
>
> Signed-off-by: Johannes Sixt <j6t@kdbg.org>
> ---
>  Found this today coincidentally when I happened to look at the terminal
>  with test output at the right time and saw a suspicous error message
>  that included just one half of the build directory path.

This patch looks good to me.

Thanks,
Dscho

>
>  -- Hannes
>
>  t/t3008-ls-files-lazy-init-name-hash.sh | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/t/t3008-ls-files-lazy-init-name-hash.sh b/t/t3008-ls-files-lazy-init-name-hash.sh
> index 64f047332b..85f3704958 100755
> --- a/t/t3008-ls-files-lazy-init-name-hash.sh
> +++ b/t/t3008-ls-files-lazy-init-name-hash.sh
> @@ -4,7 +4,7 @@ test_description='Test the lazy init name hash with various folder structures'
>
>  . ./test-lib.sh
>
> -if test 1 -eq $($GIT_BUILD_DIR/t/helper/test-tool online-cpus)
> +if test 1 -eq $(test-tool online-cpus)
>  then
>  	skip_all='skipping lazy-init tests, single cpu'
>  	test_done
> --
> 2.24.1.524.gae569673ff
>
>

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox