Git development
* Re: [PATCH 2/2] Use 'env' to find perl instead of fixed path
From: Jeff King @ 2017-01-14  7:54 UTC (permalink / raw)
  To: Eric Wong
  Cc: Junio C Hamano, Pat Pannuto, Johannes Schindelin, Johannes Sixt,
	git
In-Reply-To: <20170113185246.GA17441@starla>

On Fri, Jan 13, 2017 at 06:52:46PM +0000, Eric Wong wrote:

> > If something we _use_ from a third-party is not warnings-clean,
> > there is no easy way to squelch them if we use "-w", which is a
> > potential downside, isn't it?  I do not know how serious a problem
> > it is in practice.  I suspect that the core packages we use from the
> > perl distribution are supposed to be warnings-clean, but we use a handful
> > of things from outside the core and I do not know what state they
> > are in.
> 
> Yes, "-w" will trigger warnings in third party packages.
> Existing uses we have should be fine, and I think most Perl
> modules we use or would use are vigilant about being
> warnings-clean.  If we have to leave off a "-w", there should
> probably be a comment at the top stating the reason:
> 
> #!/usr/bin/perl
> # Not using "perl -w" since Foo::Bar <= X.Y.Y is not warnings-clean
> use strict;
> use warnings;
> use Foo::Bar;
> ...

Just as a devil's advocate, why do we care about warnings in third-party
modules? Or more specifically, why do _users_ who are running Git care
about them? We cannot fix them in Git. A user may report the error to
the module author, but the module author may not be responsive, or even
may not be inclined to fix the problem (because they have a particular
opinion on that warning).

In the meantime, the user is stuck with an annoying warning message
until Git is updated as you showed above. Why not just start there
preemptively, and let module authors worry about their own warnings?

-Peff


* merge maintaining history
From: David J. Bakeman @ 2017-01-14  2:01 UTC (permalink / raw)
  To: git


History

I cloned a remote repository and made many changes, pushing them all to
that repository over many months.

The powers that be then required me to move the project to a new
repository server.  I did so by pushing my local version to the new
remote, which preserved all history.

Now I have to merge back into the original repository (which has
undergone many changes since I split off), but how do I do that without
losing the history of all the commits made since the move?  Note that I
need to push changes to files that already exist.  I found on the web a
bunch of ways to insert a whole new directory structure into an existing
repository, but as I said, I need to do it on top of existing files.  Of
course I can copy all the files from my local working repository to the
cloned remote repository and commit any changes, but I lose all the
history that way.

Thanks.




* Re: [PATCH] submodule update: run custom update script for initial populating as well
From: Stefan Beller @ 2017-01-14  0:00 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Brandon Williams, Chris Packham, Spencer Olson,
	git@vger.kernel.org
In-Reply-To: <xmqq7f5yeclw.fsf@gitster.mtv.corp.google.com>

On Fri, Jan 13, 2017 at 3:58 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Stefan Beller <sbeller@google.com> writes:
>
>> +                     if test "$update_module" = "merge" ||
>> +                        test "$update_module" = "rebase" ||
>> +                        test "$update_module" = "none"
>> +                     then
>> +                             update_module=checkout
>> +                     fi
>
>         case "$update_module" in
>         merge | rebase | none)
>                 update_module=checkout ;;
>         esac
>
> Shorter and probably easier to update.

agreed, want me to reroll or squash locally?

Thanks,
Stefan


* Re: [PATCH] submodule update: run custom update script for initial populating as well
From: Junio C Hamano @ 2017-01-13 23:58 UTC (permalink / raw)
  To: Stefan Beller; +Cc: bmwill, judge.packham, olsonse, git
In-Reply-To: <20170113194326.13950-1-sbeller@google.com>

Stefan Beller <sbeller@google.com> writes:

> +			if test "$update_module" = "merge" ||
> +			   test "$update_module" = "rebase" ||
> +			   test "$update_module" = "none"
> +			then
> +				update_module=checkout
> +			fi

	case "$update_module" in
	merge | rebase | none)
		update_module=checkout ;;
	esac

Shorter and probably easier to update.
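
As a minimal, standalone sketch of the case-based form: the helper name
initial_update_mode is hypothetical and does not exist in
git-submodule.sh; it only isolates the decision made for a freshly
cloned submodule, assuming custom commands are spelled with a leading
"!" as in the thread.

```shell
# Hypothetical helper (not part of git-submodule.sh): choose the update
# mode to use when a submodule has just been cloned.
initial_update_mode () {
	update_module=$1
	case "$update_module" in
	merge | rebase | none)
		# Nothing to merge or rebase onto yet; fall back to checkout.
		update_module=checkout ;;
	esac
	printf '%s\n' "$update_module"
}

for mode in merge rebase none checkout '!echo-script.sh'
do
	printf '%s -> %s\n' "$mode" "$(initial_update_mode "$mode")"
done
```

The first three modes map to checkout, while checkout and the custom
"!" command pass through unchanged.  Adding another mode to the
fallback set is a one-word change to the pattern list.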


* Re: [PATCH 2/3] xdiff: -W: include immediately preceding non-empty lines in context
From: Junio C Hamano @ 2017-01-13 23:56 UTC (permalink / raw)
  To: Vegard Nossum; +Cc: René Scharfe, git
In-Reply-To: <c74c260d-1a4d-39f6-a644-4f90a67d6d82@oracle.com>

Vegard Nossum <vegard.nossum@oracle.com> writes:

> The patch will work as intended and as expected for 95% of the users out
> there (javadoc, Doxygen, kerneldoc, etc. all have the comment
> immediately preceding the function) and fixes a very real problem for me
> (and I expect many others) _today_; for the remaining 5% (who put a
> blank line between their comment and the start of the function) it will
> revert back to the current behaviour, so there should be no regression
> for them.

I notice your 95% are all programming languages, but I am more
worried about the contents written in non-programming languages
(René gave HTML as an example--there may be other types of contents
that we programmer types do not deal with every day, but Git users
depend on).

I am also more focused on keeping the codebase maintainable in good
health by making sure that we made an effort to find a solution that
is general-enough before solving a single specific problem you have
today.  We may end up deciding that a blank-line heuristic gives us a
good enough tradeoff, but I do not want us to make a decision before
thinking.

>> The way "diff -W" codepath used it as if it were always the very
>> first line of a function was bound to invite a patch like this, and
>> if we want to be extra elaborate, I agree that an extra mechanism to
>> say "the line the funcline regexp matches is not the beginning of a
>> function, but the beginning is a line that matches this other regexp
>> before that line" may help.
>>
>> Do we really want to be that elaborate, though?  I dunno.
>
> Adding a regex instead of the simple "blank line" test doesn't seem very
> difficult to do, but I am doubtful that it will make any difference in
> practice. But that can be done incrementally as well by the people who
> would actually need it (who I strongly suspect do not exist in the first
> place).

At least, the damage can be limited to the cases we know would work
well if we go that way.


* [PATCH] transport submodules: correct error message
From: Stefan Beller @ 2017-01-13 23:54 UTC (permalink / raw)
  To: gitster; +Cc: git, hvoigt, dborowitz, Stefan Beller

When push.recurseSubmodules is set to "check" or "on-demand", the transport
layer tries to determine if a submodule needs pushing. This check is done
by walking all remote refs that are known.

For remotes we only store the refs/heads/* (and tags), which doesn't
include all commits. In e.g. Gerrit commits often end up at refs/changes/*
(that we do not store) when pushing to refs/for/master (which we also do
not store). So a workflow such as the following still fails:

    $ git -C <submodule> push origin HEAD:refs/for/master
    $ git push origin HEAD:refs/for/master
    The following submodule paths contain changes that can
    not be found on any remote:
      submodule

    Please try

        git push --recurse-submodules=on-demand

    or cd to the path and use

        git push

    to push them to a remote.

Trying to push with --recurse-submodules=on-demand would run into
the same problem. To fix this issue
    1) specifically mention that we looked for branches on the remote.
    2) advertise pushing without recursing into submodules. ("Use this
       command to make the error message go away")

While at it, remove some empty lines, as they blow up the error message.

Reported-by: Dave Borowitz <dborowitz@google.com>
Signed-off-by: Stefan Beller <sbeller@google.com>
---
 transport.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/transport.c b/transport.c
index 3e8799a611..2445bf0dca 100644
--- a/transport.c
+++ b/transport.c
@@ -883,14 +883,14 @@ static void die_with_unpushed_submodules(struct string_list *needs_pushing)
 	int i;
 
 	fprintf(stderr, _("The following submodule paths contain changes that can\n"
-			"not be found on any remote:\n"));
+			"not be found on any remote branch:\n"));
 	for (i = 0; i < needs_pushing->nr; i++)
 		fprintf(stderr, "  %s\n", needs_pushing->items[i].string);
-	fprintf(stderr, _("\nPlease try\n\n"
-			  "	git push --recurse-submodules=on-demand\n\n"
-			  "or cd to the path and use\n\n"
-			  "	git push\n\n"
-			  "to push them to a remote.\n\n"));
+	fprintf(stderr, _("\nSuppress submodule checks via\n"
+			  "	git push --no-recurse-submodules\n"
+			  "or cd to the path and use\n"
+			  "	git push\n"
+			  "to push them to a remote.\n"));
 
 	string_list_clear(needs_pushing, 0);
 
-- 
2.11.0.297.g298debce27



* Re: [PATCH] submodule update: run custom update script for initial populating as well
From: Stefan Beller @ 2017-01-13 23:52 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Brandon Williams, Chris Packham, Spencer Olson,
	git@vger.kernel.org
In-Reply-To: <xmqqfukmedca.fsf@gitster.mtv.corp.google.com>

On Fri, Jan 13, 2017 at 3:42 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Stefan Beller <sbeller@google.com> writes:
>
>> In 1b4735d9f3 (submodule: no [--merge|--rebase] when newly cloned,
>> 2011-02-17), all actions were defaulted to checkout for populating
>> a submodule initially, because merging or rebasing makes no sense
>> in that situation.
>>
>> Other commands however do make sense, such as the custom command
>> that was added later (6cb5728c43, submodule update: allow custom
>> command to update submodule working tree, 2013-07-03).
>
> Makes sense.
>
>> I am unsure about the "none" command, as I can see an initial
>> checkout there as a useful thing. On the other hand going strictly
>> by our own documentation, we should do nothing in case of "none"
>> as well, because the user asked for it.
>
> I think "none" is "I'll decide which revision of the submodule
> should be there---do not decide it for me".  If the user is
> explicitly saying with "git submodule init" to have "some" version,
> and if the user did not have any (because the user didn't show
> interest in any checkout of the submodule before), then I think it
> probably makes more sense to checkout the version bound to the
> superproject, than leaving the directory empty.
>
>> Reported-by: Han-Wen Nienhuys <hanwen@google.com>
>> Signed-off-by: Stefan Beller <sbeller@google.com>
>> ---
>>  git-submodule.sh            |  7 ++++++-
>>  t/t7406-submodule-update.sh | 15 +++++++++++++++
>>  2 files changed, 21 insertions(+), 1 deletion(-)
>>
>> diff --git a/git-submodule.sh b/git-submodule.sh
>> index 554bd1c494..aeb721ab7e 100755
>> --- a/git-submodule.sh
>> +++ b/git-submodule.sh
>> @@ -606,7 +606,12 @@ cmd_update()
>>               if test $just_cloned -eq 1
>>               then
>>                       subsha1=
>> -                     update_module=checkout
>> +                     if test "$update_module" = "merge" ||
>> +                        test "$update_module" = "rebase" ||
>> +                        test "$update_module" = "none"
>> +                     then
>> +                             update_module=checkout
>> +                     fi
>
> ... which seems to be what you did.  Do we need a documentation
> update, or does this just make the behaviour of this corner case
> consistent with what is already documented?

I think we do not need to update the documentation, because the
documentation doesn't call out the first/initial call to update to be special.
So for a non existing submodule we can do:

    git submodule update --init --[rebase|merge]

and that falls back to checkout, which *looks* like it was a rebase/merge.
The original bug report was that

    $ git config submodule.<name>.update !echo-script.sh
    $ git submodule update <submodule>
    Submodule path '<submodule>': 'echo-script.sh'
    $ rm -rf <submodule>
    $ git submodule update <submodule>
    .. checked out ..

So while I usually think more verbose documentation is a good idea,
this time it's different, as it merely aligns current documented
behavior with reality.

Thanks,
Stefan


* Re: [PATCH] submodule update: run custom update script for initial populating as well
From: Junio C Hamano @ 2017-01-13 23:42 UTC (permalink / raw)
  To: Stefan Beller; +Cc: bmwill, judge.packham, olsonse, git
In-Reply-To: <20170113194326.13950-1-sbeller@google.com>

Stefan Beller <sbeller@google.com> writes:

> In 1b4735d9f3 (submodule: no [--merge|--rebase] when newly cloned,
> 2011-02-17), all actions were defaulted to checkout for populating
> a submodule initially, because merging or rebasing makes no sense
> in that situation.
>
> Other commands however do make sense, such as the custom command
> that was added later (6cb5728c43, submodule update: allow custom
> command to update submodule working tree, 2013-07-03).

Makes sense.

> I am unsure about the "none" command, as I can see an initial
> checkout there as a useful thing. On the other hand going strictly
> by our own documentation, we should do nothing in case of "none"
> as well, because the user asked for it.

I think "none" is "I'll decide which revision of the submodule
should be there---do not decide it for me".  If the user is
explicitly saying with "git submodule init" to have "some" version,
and if the user did not have any (because the user didn't show
interest in any checkout of the submodule before), then I think it
probably makes more sense to checkout the version bound to the
superproject, than leaving the directory empty.

> Reported-by: Han-Wen Nienhuys <hanwen@google.com>
> Signed-off-by: Stefan Beller <sbeller@google.com>
> ---
>  git-submodule.sh            |  7 ++++++-
>  t/t7406-submodule-update.sh | 15 +++++++++++++++
>  2 files changed, 21 insertions(+), 1 deletion(-)
>
> diff --git a/git-submodule.sh b/git-submodule.sh
> index 554bd1c494..aeb721ab7e 100755
> --- a/git-submodule.sh
> +++ b/git-submodule.sh
> @@ -606,7 +606,12 @@ cmd_update()
>  		if test $just_cloned -eq 1
>  		then
>  			subsha1=
> -			update_module=checkout
> +			if test "$update_module" = "merge" ||
> +			   test "$update_module" = "rebase" ||
> +			   test "$update_module" = "none"
> +			then
> +				update_module=checkout
> +			fi

... which seems to be what you did.  Do we need a documentation
update, or does this just make the behaviour of this corner case
consistent with what is already documented?

Thanks.


* Re: [PATCH v2 1/2] diff --no-index: follow symlinks
From: Junio C Hamano @ 2017-01-13 23:37 UTC (permalink / raw)
  To: Dennis Kaarsemaker; +Cc: git
In-Reply-To: <20170113102021.6054-2-dennis@kaarsemaker.net>

Dennis Kaarsemaker <dennis@kaarsemaker.net> writes:

> Git's diff machinery does not follow symlinks, which makes sense as git
> itself also does not, but stores the symlink destination.
>
> In --no-index mode however, it is useful for diff to follow symlinks,
> matching the behaviour of ordinary diff. A new --no-dereference (name
> copied from diff) option has been added to disable this behaviour.

If you add a --no-dereference option, --dereference option should
also be there, so that "--no-dereference" earlier on the command
line (perhaps coming from a configured alias) can be countermanded.

While I am not opposed to giving an optional feature to treat a
symlink as if it is a regular file with the contents of its link
target, I am not enthused that this patch tries to make that the
default behaviour.  We are not matching the behaviour of ordinary
diff anyway [*1*].

It probably makes more sense for our first step to introduce this
feature that is only enabled when "--dereference" option is given.
Making it the default for "--no-index" case should be discussed as
a separate step.

[Footnote]

*1* E.g. "git diff --no-index dirA/ dirB/" does not say "Only in
dirA: file".  It also recurses into subdirectories of dirA/ and
dirB/ without the --recursive option.
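
To make the two behaviours under discussion concrete, here is a small
sketch that does not involve git at all.  It assumes cmp and readlink
are available, and the file names are invented: comparing through the
links looks at the target contents, while comparing the links
themselves looks at the stored target names.

```shell
# Sketch: "following" a symlink compares file contents, "not following"
# compares the link targets themselves.  File names are illustrative.
tmp=$(mktemp -d) || exit 1
trap 'rm -rf "$tmp"' EXIT

printf 'hello\n' >"$tmp/one"
printf 'hello\n' >"$tmp/two"
ln -s one "$tmp/linkA"
ln -s two "$tmp/linkB"

# Dereferencing: both links lead to identical contents.
if cmp -s "$tmp/linkA" "$tmp/linkB"
then
	deref=same
else
	deref=differ
fi

# Not dereferencing: the stored targets ("one" vs "two") differ.
if test "$(readlink "$tmp/linkA")" = "$(readlink "$tmp/linkB")"
then
	noderef=same
else
	noderef=differ
fi

echo "dereference: $deref, no-dereference: $noderef"
```

The same pair of links is reported as identical or different depending
only on whether the links are followed, which is why the choice of
default matters.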


* Re: [PATCH v2 2/2] diff --no-index: support reading from pipes
From: Junio C Hamano @ 2017-01-13 23:24 UTC (permalink / raw)
  To: Dennis Kaarsemaker; +Cc: git
In-Reply-To: <20170113102021.6054-3-dennis@kaarsemaker.net>

Dennis Kaarsemaker <dennis@kaarsemaker.net> writes:

> +	/*
> +	 * In --no-index mode, we support reading from pipes. canon_mode, called by
> +	 * fill_filespec, gets confused by this and thinks we now have subprojects.
> +	 * Detect this and tell the rest of the diff machinery to treat pipes as
> +	 * normal files.
> +	 */
> +	if (S_ISGITLINK(s->mode))
> +		s->mode = S_IFREG | ce_permissions(mode);

Hmph.  Pipes on your system may satisfy S_ISGITLINK() and confuse
later code, and this hack may work it around.  But a proper gitlink
that was thrown at this codepath (probably by mistake) will also be
caught and pretend as if it were a regular file.  Do we know for
certain that pipes everywhere will be munged to appear as
S_ISGITLINK()?  Is it possible to do the "are we looking at an end
of a pipe?" check _before_ canon_mode() munges and stores the result
in s->mode in diff-no-index.c somewhere, perhaps inside get_mode()?

> diff --git a/diff.c b/diff.c
> index 2fc0226338..bb04eab331 100644
> --- a/diff.c
> +++ b/diff.c
> @@ -2839,9 +2839,18 @@ int diff_populate_filespec(struct diff_filespec *s, unsigned int flags)
>  		fd = open(s->path, O_RDONLY);
>  		if (fd < 0)
>  			goto err_empty;
> -		s->data = xmmap(NULL, s->size, PROT_READ, MAP_PRIVATE, fd, 0);
> +		if (!S_ISREG(st.st_mode)) {
> +			struct strbuf sb = STRBUF_INIT;
> +			strbuf_read(&sb, fd, 0);
> +			s->size = sb.len;
> +			s->data = strbuf_detach(&sb, NULL);
> +			s->should_free = 1;
> +		}
> +		else {
> +			s->data = xmmap(NULL, s->size, PROT_READ, MAP_PRIVATE, fd, 0);
> +			s->should_munmap = 1;
> +		}
>  		close(fd);
> -		s->should_munmap = 1;

I like the fact that, by extending the !S_ISREG() check this patch
introduces, we can later use the new "do not mmap but allocate to
read" codepath for small files, which may be more efficient.  We may
want to have a small helper

	static int should_mmap_file_contents(struct stat *st)
	{
		return S_ISREG(st->st_mode);
	}

so that we can do such an enhancement later more easily.
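
For readers who want to poke at the regular-file-versus-pipe
distinction outside of C, a rough shell analogue follows.  It is only an
illustration: "test -f" stands in for S_ISREG(), the function reuses the
helper's name for readability, and the paths are made up.

```shell
# Shell analogue of the suggested helper; "test -f" plays the role of
# S_ISREG() here.  Paths are illustrative.
should_mmap_file_contents () {
	test -f "$1"
}

tmp=$(mktemp -d) || exit 1
trap 'rm -rf "$tmp"' EXIT
: >"$tmp/regular"
mkfifo "$tmp/pipe"

for f in "$tmp/regular" "$tmp/pipe"
do
	if should_mmap_file_contents "$f"
	then
		echo "${f##*/}: mmap"
	else
		echo "${f##*/}: read into a buffer"
	fi
done
```

The FIFO is only inspected, never opened, so the check cannot block.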

So, I am skeptical with the "do we have pipe" check in the earlier
hunk, but otherwise I think what this patch wanted to solve is a
reasonable problem to tackle.

Thanks.


* Re: [PATCH 2/2] Use 'env' to find perl instead of fixed path
From: Eric Wong @ 2017-01-13 21:39 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Pat Pannuto, Johannes Schindelin, Johannes Sixt, git
In-Reply-To: <xmqq37gmhgpn.fsf@gitster.mtv.corp.google.com>

Junio C Hamano <gitster@pobox.com> wrote:
> Eric Wong <e@80x24.org> writes:
> > Junio C Hamano <gitster@pobox.com> wrote:
> >> Eric Wong <e@80x24.org> writes:
> >> > Pat Pannuto <pat.pannuto@gmail.com> wrote:
> >> >> You may still want the 1/2 patch in this series, just to make things
> >> >> internally consistent with "-w" vs "use warnings;" inside git's perl
> >> >> scripts.
> >> >
> >> > No, that is a step back.  "-w" affects the entire process, so it
> >> > spots more potential problems.  The "warnings" pragma is scoped
> >> > to the enclosing block, so it won't span across files.
> >> 
> >> OK, so with "-w", we do not have to write "use warnings" in each of
> >> our files to get them checked.  It is handy when we ship our own
> >> libs (e.g. Git.pm) that are used by our programs.
> >
> > Yes.  "use warnings" should be in our own libs in case other
> > people run without "-w"
> 
> Would it mean that we need both anyway?  That is, add missing "use
> warnings" without removing "-w" from she-bang line?

Yes, we keep "use warnings" at least in the libs other people may use.
No harm in keeping that in top-level scripts, I guess.

> Speaking of Perl, I recall that somebody complained that we ship
> with and do use a stale copy of Error.pm that has been deprecated.
> I am not asking you to do so, but we may want to see somebody look
> into it (i.e. assessing the current situation, and if it indeed is
> desirable for us to wean ourselves away from Error.pm, update our
> codepaths that use it).

Agreed, I'd definitely prefer to move towards the basic eval/die
construct without relying on a bundled 3rd-party mechanism.
But we might need a migration path for out-of-tree users of
Git.pm (if any)...

I'm sure I've agreed in the past that this was a path we should be
taking, but never did anything about it myself.  So yeah, maybe Pat or
somebody else interested can take care of this :)

Thanks.


* Re: [PATCH 5/5] describe: teach describe negative pattern matches
From: Johannes Sixt @ 2017-01-13 21:31 UTC (permalink / raw)
  To: Jacob Keller; +Cc: Jacob Keller, Git mailing list
In-Reply-To: <CA+P7+xq1LMkRG_aSyamrsPUQE+rDv4A9Qd19tDMgx-_a5OHsqQ@mail.gmail.com>

Am 13.01.2017 um 07:57 schrieb Jacob Keller:
> On Thu, Jan 12, 2017 at 10:43 PM, Johannes Sixt <j6t@kdbg.org> wrote:
>>  When you write
>>
>>   git log --branches --exclude=origin/* --remotes
>>
>> --exclude=origin/* applies only to --remotes, but not to --branches.
>
> Well for describe I don't think the order matters.

That is certainly true today. But I would value consistency more. We 
would lose it if some time in the future 'describe' accepts --branches 
and --remotes in addition to --tags and --all.
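
The order dependence can be modelled without git.  The following is a
toy model only, not git's implementation: the function names and ref
names are invented, and it assumes the documented rule that an
--exclude pattern applies to the next ref-selecting option and is then
reset.

```shell
# Toy model of how --exclude patterns are consumed: a pattern applies to
# the next ref-selecting option only, then the list is reset.
# All ref names are made up for the example.
set -f	# keep the shell from expanding patterns such as origin/*

branches="master topic origin/weird"
remotes="origin/master origin/next upstream/master"

# Print every ref of the given kind that no pending pattern excludes.
emit () {
	kind=$1
	shift
	for ref
	do
		skip=
		for pat in $excludes
		do
			case "$ref" in
			$pat) skip=1 ;;
			esac
		done
		test -n "$skip" || echo "$kind/$ref"
	done
}

select_refs () {
	excludes=
	for arg
	do
		case "$arg" in
		--exclude=*)
			excludes="$excludes ${arg#--exclude=}" ;;
		--branches)
			emit branches $branches
			excludes= ;;
		--remotes)
			emit remotes $remotes
			excludes= ;;
		esac
	done
}

select_refs --branches --exclude='origin/*' --remotes
```

With the options in this order the local branch origin/weird is kept
and only the remote-tracking origin/* refs are dropped.  Moving
--exclude in front of --branches drops the branch instead and keeps all
remotes.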

-- Hannes



* Re: [PATCH 25/27] attr: store attribute stacks in hashmap
From: Junio C Hamano @ 2017-01-13 21:20 UTC (permalink / raw)
  To: Brandon Williams; +Cc: git, pclouds, sbeller
In-Reply-To: <20170112235354.153403-26-bmwill@google.com>

Brandon Williams <bmwill@google.com> writes:

> The last big hurdle towards a thread-safe API for the attribute system
> is the reliance on a global attribute stack that is modified during each
> call into the attribute system.
>
> This patch removes this global stack and instead a stack is retrieved or
> constructed locally.  Since each of these stacks is only used as a
> read-only structure once constructed, they can be stored in a hashmap
> and shared between threads.

Very good.

The reason why the original code used a stack was because it wanted
to keep only the info read from relevant files in-core, discarding
ones from files no longer relevant (because the traversal switched
to another subdirectory of the same parent directory), to keep
memory consumption from growing unbounded.  It probably was a premature
"optimization" that we can do without, so keeping everything we have
read so far in a hashmap (which is my understanding of what is going
on in this patch) is probably OK.

I suspect that this hashmap may eventually need to become per
attr_check if we want to follow through the optimization envisioned
by patch 15/27.

Inside fill(), path_matches() is called once per match_attr in the
entire attribute stack, but it is wasteful to check if the path
matches a.u.pat if none of the a.state[] entries talk about
attributes and macros that eventually get used by the caller of
check_attr().  By introducing a wrapping structure, 15/27
wanted to make sure that we have a place to store a "reduced"
attribute stack that is kept per attr_check that has only entries
from the files that talk about the attributes the particular
attr_check wants to learn about.

I need to think about this a bit more, but offhand I do not think
that moving from a global stack to a global hashmap makes such a
future per-check enhancement harder, i.e. the above is not an
objection to this step.

> One caveat with storing and sharing the stack frames like this is that
> the info stack needs to be treated separately from the rest of the
> attribute stack.  This is because each stack frame holds a pointer to
> the stack that comes before it and if it was placed on top of the rest
> of the attribute stack then this pointer would be different for each
> attribute stack and wouldn't be able to be shared between threads.  In
> order to allow for sharing the info stack frame it needs to be its own
> isolated frame and can simply be processed first to have the same effect
> of being at the top of the stack.

Good.

Thanks.


* Re: [RFC] Add support for downloading blobs on demand
From: Shawn Pearce @ 2017-01-13 21:07 UTC (permalink / raw)
  To: Ben Peart; +Cc: git, benpeart
In-Reply-To: <20170113155253.1644-1-benpeart@microsoft.com>

On Fri, Jan 13, 2017 at 7:52 AM, Ben Peart <peartben@gmail.com> wrote:
>
> Goal
> ~~~~
>
> To be able to better handle repos with many files that any individual
> developer doesn’t need it would be nice if clone/fetch only brought down
> those files that were actually needed.
>
> To enable that, we are proposing adding a flag to clone/fetch that will
> instruct the server to limit the objects it sends to commits and trees
> and to not send any blobs.
>
> When git performs an operation that requires a blob that isn’t currently
> available locally, it will download the missing blob and add it to the
> local object store.

Interesting. This is also an area I want to work on with my team at
$DAY_JOB. Repositories are growing along multiple dimensions, and
developers or editors don't always need all blobs for all time
available locally to successfully perform their work.

> Design
> ~~~~~~
>
> Clone and fetch will pass a “--lazy-clone” flag (open to a better name
> here) similar to “--depth” that instructs the server to only return
> commits and trees and to ignore blobs.

My group at $DAY_JOB hasn't talked about it yet, but I want to add a
protocol capability that lets clone/fetch ask only for blobs smaller
than a specified byte count. This could be set to a reasonable text
file size (e.g. <= 5 MiB) to predominately download only source files
and text documentation, omitting larger binaries.

If the limit was set to 0, it's the same as your idea to ignore all blobs.

> Later during git operations like checkout, when a blob cannot be found
> after checking all the regular places (loose, pack, alternates, etc),
> git will download the missing object and place it into the local object
> store (currently as a loose object) then resume the operation.

Right. I'd like to have this object retrieval be inside the native Git
wire protocol, reusing the remote configuration and authentication
setup. That requires expanding the server side of the protocol
implementation slightly allowing any reachable object to be retrieved
by SHA-1 alone. Bitmap indexes can significantly reduce the
computational complexity for the server.

> To prevent git from accidentally downloading all missing blobs, some git
> operations are updated to be aware of the potential for missing blobs.
> The most obvious being check_connected which will return success as if
> everything in the requested commits is available locally.

This ... sounds risky for the developer, as the repository may be
corrupt due to a missing object, and the user cannot determine it.

Would it be reasonable for the server to return a list of SHA-1s it
knows should exist, but has omitted due to the blob threshold (above),
and the local repository store this in a binary searchable file?
During connectivity checking it's assumed OK if an object is not
present in the object store, but is listed in this omitted objects
file.
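
A rough sketch of that rule, with everything invented for
illustration: the object IDs are fake, the two files stand in for the
object store and the omitted-objects list, and a plain grep replaces
the binary search a real implementation would presumably use.

```shell
# Sketch of the proposed connectivity rule.  The object IDs and file
# layout are invented; a real implementation would live inside git.
tmp=$(mktemp -d) || exit 1
trap 'rm -rf "$tmp"' EXIT

# Objects actually present locally, and objects the server said it omitted.
printf '%s\n' aaa111 bbb222 | sort >"$tmp/present"
printf '%s\n' ccc333 ddd444 | sort >"$tmp/omitted"

check_object () {
	if grep -Fxq -e "$1" "$tmp/present"
	then
		echo present
	elif grep -Fxq -e "$1" "$tmp/omitted"
	then
		echo omitted
	else
		echo missing
	fi
}

for id in aaa111 ccc333 eee555
do
	echo "$id: $(check_object "$id")"
done
```

Only the last case, an object that is neither stored nor listed as
omitted, would count as corruption under this scheme.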

> To minimize the impact on the server, the existing dumb HTTP protocol
> endpoint “objects/<sha>” can be used to retrieve the individual missing
> blobs when needed.

I'd prefer this to be in the native wire protocol, where the objects
are in pack format (which unfortunately differs from loose format). I
assume servers would combine many objects into pack files, potentially
isolating large incompressible binaries into their own packs, stored
separately from commits/trees/small-text-blobs.

I get the value of this being in HTTP, where HTTP caching inside
proxies can be leveraged to reduce master server load. I wonder if the
native wire protocol could be taught to use a variation of an HTTP GET
that includes the object SHA-1 in the URL line, to retrieve a
one-object pack file.

> Performance considerations
> ~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> We found that downloading commits and trees on demand had a significant
> negative performance impact.  In addition, many git commands assume all
> commits and trees are available locally so they quickly got pulled down
> anyway.  Even in very large repos the commits and trees are relatively
> small so bringing them down with the initial commit and subsequent fetch
> commands was reasonable.
>
> After cloning, the developer can use sparse-checkout to limit the set of
> files to the subset they need (typically only 1-10% in these large
> repos).  This allows the initial checkout to only download the set of
> files actually needed to complete their task.  At any point, the
> sparse-checkout file can be updated to include additional files which
> will be fetched transparently on demand.
>
> Typical source files are relatively small so the overhead of connecting
> and authenticating to the server for a single file at a time is
> substantial.  As a result, having a long running process that is started
> with the first request and can cache connection information between
> requests is a significant performance win.

Junio and I talked years ago (offline, sorry no mailing list archive)
about "narrow checkout", which is the idea of the client being able to
ask for a pack file from the server that only includes objects along
specific path names. This would allow a client to amortize the setup
costs, and even delta compress source files against each other (e.g.
boilerplate across Makefiles or license headers).

If the paths of interest can be determined as a batch before starting
the connection, this may be easier than maintaining a cross platform
connection cache in a separate process.

> Now some numbers
> ~~~~~~~~~~~~~~~~
>
> One repo has 3+ million files at tip across 500K folders with 5-6K
> active developers.  They have done a lot of work to remove large files
> from the repo so it is down to < 100GB.
>
> Before changes: clone took hours to transfer the 87GB .pack + 119MB .idx
>
> After changes: clone took 4 minutes to transfer 305MB .pack + 37MB .idx
>
> After hydrating 35K files (the typical number any individual developer
> needs to do their work), there was an additional 460 MB of loose files
> downloaded.
>
> Total savings: 86.24 GB * 6000 developers = 517 Terabytes saved!
>
> We have another repo (3.1 M files, 618 GB at tip with no history with
> 3K+ active developers) where the savings are even greater.

This is quite impressive, and shows this strategy has a lot of promise.
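
The quoted total can be reproduced from the quoted sizes.  This is a
sketch under two assumptions of mine: decimal units (1 GB = 1000 MB),
and that the 86.24 GB figure counts only the pack and the hydrated
loose files, leaving the .idx sizes out, since that is what matches the
number given.

```shell
# Reproduce the savings estimate from the figures quoted above,
# assuming 1 GB = 1000 MB and ignoring the .idx files.
before_mb=$((87 * 1000))	# full pack
after_mb=$((305 + 460))		# blob-less pack + hydrated loose files
saved_mb=$((before_mb - after_mb))
developers=6000
total_mb=$((saved_mb * developers))

echo "saved per developer: $saved_mb MB"
echo "saved in total: $((total_mb / 1000000)) TB (rounded down)"
```

That gives 86235 MB per developer, i.e. the 86.24 GB above after
rounding, and 517 TB across 6000 developers.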


> Future Work
> ~~~~~~~~~~~
>
> The current prototype calls a new hook proc in sha1_object_info_extended
> and read_object, to download each missing blob.  A better solution would
> be to implement this via a long running process that is spawned on the
> first download and listens for requests to download additional objects
> until it terminates when the parent git operation exits (similar to the
> recent long running smudge and clean filter work).

Or batching these up in advance. checkout should be able to determine
which path entries from the index it wants to write to the working
tree. Once it has that set of paths it wants to write, it should be
fast to construct a subset of paths for which the blobs are not
present locally, and then pass the entire group off for download.
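
A rough sketch of that batching step, under some assumptions: the helper
name missing_oids is invented for illustration, and its input is assumed to
be in the format printed by "git cat-file --batch-check", where an absent
object shows up as "<oid> missing".

```sh
# missing_oids: read "git cat-file --batch-check" style lines on stdin
# and print only the object IDs reported as missing.
# (Hypothetical helper name; not part of Git.)
missing_oids () {
	awk 'NF == 2 && $2 == "missing" { print $1 }'
}
```

One possible way to feed it, assuming the presence check itself does not
trigger an on-demand download, is

    git ls-files -s | awk '$1 != "160000" { print $2 }' |
    git cat-file --batch-check | missing_oids

which skips gitlink entries and leaves a list of blob IDs that could be
handed to the server as one request.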

> Need to do more investigation into possible code paths that can trigger
> unnecessary blobs to be downloaded.  For example, we have determined
> that the rename detection logic in status can also trigger unnecessary
> blobs to be downloaded making status slow.

There isn't much of a workaround here. Only options I can see are
disabling rename detection when objects are above a certain size, or
removing entries from the rename table when the blob isn't already
local, which may yield different results than if the blob(s) were
local.

Another is to try to have actual source files always be local, and
thus we only punt on rename detection for bigger files that are more
likely to be binary, and thus less likely to match for rename[1]
unless it was SHA-1 identity match, which can be done without the
blob(s) present.


[1] I assume most really big files are some sort of media asset (e.g.
JPEG), where a change inside the source data may result in large
difference in bytes due to the compression applied by the media file
format.

> Need to investigate an alternate batching scheme where we can make a
> single request for a set of "related" blobs and receive a single
> packfile (especially during checkout).

Heh, what I just said above. Glad to see you already thought of it.

> Need to investigate adding a new endpoint in the smart protocol that can
> download both individual blobs as well as a batch of blobs.

Agreed, I said as much above. Again, glad to see you have similar ideas. :)

^ permalink raw reply

* Re: [PATCH 0/5] extend git-describe pattern matching
From: Jacob Keller @ 2017-01-13 20:41 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jacob Keller, Git mailing list
In-Reply-To: <xmqqpojqhk3i.fsf@gitster.mtv.corp.google.com>

On Fri, Jan 13, 2017 at 10:48 AM, Junio C Hamano <gitster@pobox.com> wrote:
> Jacob Keller <jacob.e.keller@intel.com> writes:
>
>> From: Jacob Keller <jacob.keller@gmail.com>
>>
>> Teach git describe and git name-rev the ability to match multiple
>> patterns inclusively. Additionally, teach these commands to also accept
>> negative patterns to discard any refs which match.
>
> You made quick responses to reviews with "will change", so I am not
> queuing this round to my tree; please don't mistake that as my
> indifference or opposition to the topic.  This sounds like a good
> thing.

Perfect. I will probably take a few days till I am back at a computer
and can do this, but I will be submitting with the suggested changes
soon.

>
> As to the semantics of mixing positives and negatives, I would
> recommend this to follow how positive and negative pathspecs mix.
> IIRC we chose to use the most simple and easy to explain option,
> i.e. a thing must match at least one of the positives and must not
> match any of the negatives to be considered a match.
>
>

That is the current implementation, so I will stick with it. It's the
simplest, and easiest to implement.
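
To spell the rule out, here is a minimal sketch in shell. It is not the
code from the series: the function name ref_matches and its argument
layout are made up, patterns are plain shell globs, and treating an empty
positive list as "everything passes" is an assumption.

```sh
# ref_matches NAME POSITIVES NEGATIVES
# POSITIVES and NEGATIVES are space-separated glob lists (hypothetical
# interface).  Returns 0 when NAME matches at least one positive pattern
# and no negative pattern.
ref_matches () {
	name=$1
	positives=$2
	negatives=$3
	matched=1
	# Assumption: with no positive patterns, every name passes this half.
	if [ -z "$positives" ]
	then
		matched=0
	fi
	set -f	# keep the globs from expanding against files while splitting
	for pat in $negatives
	do
		case "$name" in
		$pat) set +f; return 1 ;;	# any negative match rejects
		esac
	done
	for pat in $positives
	do
		case "$name" in
		$pat) matched=0 ;;		# one positive match is enough
		esac
	done
	set +f
	return $matched
}
```

For example, ref_matches v1.0 "v*" "*-rc*" succeeds, while
ref_matches v1.0-rc1 "v*" "*-rc*" fails because of the negative pattern.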

Thanks,
Jake

^ permalink raw reply

* Re: [PATCH 2/3] xdiff: -W: include immediately preceding non-empty lines in context
From: Vegard Nossum @ 2017-01-13 20:20 UTC (permalink / raw)
  To: Junio C Hamano, René Scharfe; +Cc: git
In-Reply-To: <xmqqeg06hh6z.fsf@gitster.mtv.corp.google.com>

On 13/01/2017 20:51, Junio C Hamano wrote:
> René Scharfe <l.s.r@web.de> writes:
>> That's true, but I'm not sure "non-empty line before function line" is
>> good enough a definition for desirable lines.  It wouldn't work for
>> people who don't believe in empty lines.  Or for those that put a
>> blank line between comment and function.  (I have an opinion on such
>> habits, but git diff should probably stay neutral.)  And that's just
>> for C code; I have no idea how this heuristic would hold up for other
>> file types like HTML.
>
> As you are, I am fairly negative on the heuristic based on the
> "non-blank" thing.  We tried once with compaction-heuristics already
> and it did not quite perform well.  Let's not hardcode another one.

The patch will work as intended and as expected for 95% of the users out
there (javadoc, Doxygen, kerneldoc, etc. all have the comment
immediately preceding the function) and fixes a very real problem for me
(and I expect many others) _today_; for the remaining 5% (who put a
blank line between their comment and the start of the function) it will
revert back to the current behaviour, so there should be no regression
for them.

For the 0% who don't put even a single blank line between their
functions, it will probably not work as expected, but then again I have
never seen such a coding style in any language, so I am doubtful if this
is something that needs to be taken into account in the first place.
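
To make the rule concrete, here is a small sketch of the idea outside of
xdiff. It is not the patch itself: the helper name context_start is
invented, and treating whitespace-only lines as blank is an assumption.

```sh
# context_start FILE FUNCLINE
# Print the first line of the run of non-blank lines that ends at
# FUNCLINE (1-based), i.e. where -W context would start under the
# "immediately preceding non-empty lines" rule.
# (Hypothetical helper; whitespace-only lines count as blank.)
context_start () {
	awk -v fl="$2" '
		BEGIN { start = 1 }
		NR >= fl { exit }		# stop once the function line is reached
		/^[ \t]*$/ { start = NR + 1 }	# a blank line resets the run
		END { print start }
	' "$1"
}
```

When a blank line sits directly above the function line, the result is
the function line itself, which corresponds to the "revert back to the
current behaviour" case above.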

>> We can identify function lines with arbitrary precision (with a
>> xfuncname regex, if needed), but there is no accurate way to classify
>> lines as comments, or as the end of functions.  Adding optional
>> regexes for single- and multi-line comments would help, at least for
>> C.
>
> The funcline regexp is used for two related but different purposes.
> It identifies a single line to be placed on @@ ... @@ line before a
> diff hunk.  This line however does not have to be at the beginning
> of a function.  It has to be the line that conveys the most
> significant information (e.g. the name of the function).
>
> The way "diff -W" codepath used it as if it were always the very
> first line of a function was bound to invite a patch like this, and
> if we want to be extra elaborate, I agree that an extra mechanism to
> say "the line the funcline regexp matches is not the beginning of a
> function, but the beginning is a line that matches this other regexp
> before that line" may help.
>
> Do we really want to be that elaborate, though?  I dunno.

Adding a regex instead of the simple "blank line" test doesn't seem very
difficult to do, but I am doubtful that it will make any difference in
practice. But that can be done incrementally as well by the people who
would actually need it (who I strongly suspect do not exist in the first
place).

> I wonder if it would be sufficient to make -W take an optional
> number, e.g. "git show -W4", to add extra context lines before the
> funcline.
>

I don't like specifying a fixed number, that negates almost all the
reason for using -W in the first place; I would much prefer adding
a config variable to control the -W behaviour (or a new diff flag).


Vegard

^ permalink raw reply

* Re: [PATCH] lib-submodule-update.sh: drop unneeded shell
From: Junio C Hamano @ 2017-01-13 20:19 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git@vger.kernel.org
In-Reply-To: <CAGZ79kayLoH7EURQ9aKGh+FzDz_BegJRjB2175qo53beLZDYog@mail.gmail.com>

Stefan Beller <sbeller@google.com> writes:

> On Fri, Jan 13, 2017 at 11:55 AM, Junio C Hamano <gitster@pobox.com> wrote:
>> Junio C Hamano <gitster@pobox.com> writes:
>>
>>> Stefan Beller <sbeller@google.com> writes:
>>>
>>>> In modern Git we prefer "git -C <cmd" over "(cd <somewhere && git <cmd>)"
>>>> as it doesn't need an extra shell.
>>>
>>> There is a matching '>' missing.  The description is correct (I am
>>> not sure if there actually is "preference", though), but I found the
>>> title a bit misleading....
>>
>> It turns out that there were two missing '>' ;-)  It tentatively has
>> become like this in my tree.
>
> Thanks for fixing up locally. I had resent as
> "[PATCH] lib-submodule-update.sh: reduce use of subshell by using git -C <dir>"
> but you can ignore that now.

Yeah, apparently our mails crossed.  Yours still has "git -C <cmd>"
that should have been "git -C <dir> <cmd>", so I'll keep the locally
munged one.
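
For anybody following along, here is a small sketch of why the subshell
was there in the first place. It uses "false" as a stand-in for a failing
git command, and the scratch directory layout is made up for the example.

```sh
# Scratch area with one subdirectory (illustrative names only).
tmp=$(mktemp -d) && mkdir "$tmp/sub" && cd "$tmp" || exit 1
top=$(pwd)

# Chained form: when the middle command fails, "cd .." never runs,
# so the calling shell is left inside sub/.
cd sub && false && cd ..
leaked=$(pwd)

cd "$top" || exit 1

# Subshell form: the cd is confined to the subshell.
(cd sub && false) || :
safe=$(pwd)

# With "git -C sub <cmd>" the calling shell never changes directory at
# all, so there is nothing to leak and no subshell to spawn.
echo "chained: $leaked"
echo "subshell: $safe"
```

The first echo reports the sub/ directory and the second the top-level
one, which is the breakage the "(cd <dir> && git <cmd>)" idiom guards
against.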

Thanks.



^ permalink raw reply

* Re: [PATCH] lib-submodule-update.sh: drop unneeded shell
From: Stefan Beller @ 2017-01-13 20:01 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git@vger.kernel.org
In-Reply-To: <xmqqa8auhgzt.fsf@gitster.mtv.corp.google.com>

On Fri, Jan 13, 2017 at 11:55 AM, Junio C Hamano <gitster@pobox.com> wrote:
> Junio C Hamano <gitster@pobox.com> writes:
>
>> Stefan Beller <sbeller@google.com> writes:
>>
>>> In modern Git we prefer "git -C <cmd" over "(cd <somewhere && git <cmd>)"
>>> as it doesn't need an extra shell.
>>
>> There is a matching '>' missing.  The description is correct (I am
>> not sure if there actually is "preference", though), but I found the
>> title a bit misleading....
>
> It turns out that there were two missing '>' ;-)  It tentatively has
> become like this in my tree.

Thanks for fixing up locally. I had resent as
"[PATCH] lib-submodule-update.sh: reduce use of subshell by using git -C <dir>"
but you can ignore that now.

Thanks,
Stefan

^ permalink raw reply

* Re: [PATCH 2/2] Use 'env' to find perl instead of fixed path
From: Junio C Hamano @ 2017-01-13 20:01 UTC (permalink / raw)
  To: Eric Wong; +Cc: Pat Pannuto, Johannes Schindelin, Johannes Sixt, git
In-Reply-To: <20170113185246.GA17441@starla>

Eric Wong <e@80x24.org> writes:

> Junio C Hamano <gitster@pobox.com> wrote:
>> Eric Wong <e@80x24.org> writes:
>> > Pat Pannuto <pat.pannuto@gmail.com> wrote:
>> >> You may still want the 1/2 patch in this series, just to make things
>> >> internally consistent with "-w" vs "use warnings;" inside git's perl
>> >> scripts.
>> >
>> > No, that is a step back.  "-w" affects the entire process, so it
>> > spots more potential problems.  The "warnings" pragma is scoped
>> > to the enclosing block, so it won't span across files.
>> 
>> OK, so with "-w", we do not have to write "use warnings" in each of
>> our files to get them checked.  It is handy when we ship our own
>> libs (e.g. Git.pm) that are used by our programs.
>
> Yes.  "use warnings" should be in our own libs in case other
> people run without "-w"

Would it mean that we need both anyway?  That is, add missing "use
warnings" without removing "-w" from she-bang line?

> Yes, "-w" will trigger warnings in third party packages.
> Existing uses we have should be fine, and I think most Perl
> modules we use or would use are vigilant about being
> warnings-clean.  If we have to leave off a "-w", there should
> probably be a comment at the top stating the reason:
>
> #!/usr/bin/perl
> # Not using "perl -w" since Foo::Bar <= X.Y.Y is not warnings-clean
> use strict;
> use warnings;
> use Foo::Bar;
> ...

Good.

Speaking of Perl, I recall that somebody complained that we ship
with and do use a stale copy of Error.pm that has been deprecated.
I am not asking you to do so, but we may want to see somebody look
into it (i.e. assessing the current situation, and if it indeed is
desirable for us to wean ourselves away from Error.pm, update our
codepaths that use it).

Thanks.

^ permalink raw reply

* Re: [PATCH 0/3] updates to gitk & git-gui doc now gitview has gone
From: Junio C Hamano @ 2017-01-13 20:05 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Philip Oakley, GitList
In-Reply-To: <alpine.DEB.2.20.1701131622510.3469@virtualbox>

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

>> Let's remove the reference in the gitk man page, and do some other
>> fixups while in the area.
>
> These changes all look sensible to me.

To me, too.  Thanks, both.

^ permalink raw reply

* Re: [PATCH] lib-submodule-update.sh: drop unneeded shell
From: Junio C Hamano @ 2017-01-13 19:55 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git
In-Reply-To: <xmqqtw92hkgc.fsf@gitster.mtv.corp.google.com>

Junio C Hamano <gitster@pobox.com> writes:

> Stefan Beller <sbeller@google.com> writes:
>
>> In modern Git we prefer "git -C <cmd" over "(cd <somewhere && git <cmd>)"
>> as it doesn't need an extra shell.
>
> There is a matching '>' missing.  The description is correct (I am
> not sure if there actually is "preference", though), but I found the
> title a bit misleading....

It turns out that there were two missing '>' ;-)  It tentatively has
become like this in my tree.

-- >8 --
From: Stefan Beller <sbeller@google.com>
Date: Wed, 11 Jan 2017 10:47:32 -0800
Subject: [PATCH] lib-submodule-update.sh: reduce use of subshell by using "git -C"

We write

    (cd <dir> && git <cmd>)

to avoid

    cd <dir> && git <cmd> && cd ..

that allows a breakage in one part of the test script to leave the
entire test process in an unexpected place.  We can do this more
concisely with "git -C <dir> <cmd>" with modern Git.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

diff --git a/t/lib-submodule-update.sh b/t/lib-submodule-update.sh
index 79cdd34a54..915eb4a7c6 100755
--- a/t/lib-submodule-update.sh
+++ b/t/lib-submodule-update.sh
@@ -69,10 +69,7 @@ create_lib_submodule_repo () {
 
 		git checkout -b "replace_sub1_with_directory" "add_sub1" &&
 		git submodule update &&
-		(
-			cd sub1 &&
-			git checkout modifications
-		) &&
+		git -C sub1 checkout modifications &&
 		git rm --cached sub1 &&
 		rm sub1/.git* &&
 		git config -f .gitmodules --remove-section "submodule.sub1" &&

^ permalink raw reply related

* Re: [PATCH 2/3] xdiff: -W: include immediately preceding non-empty lines in context
From: Junio C Hamano @ 2017-01-13 19:51 UTC (permalink / raw)
  To: René Scharfe; +Cc: Vegard Nossum, git
In-Reply-To: <e55dc4dd-768b-8c9b-e3b2-e850d5d521f5@web.de>

René Scharfe <l.s.r@web.de> writes:

> Am 13.01.2017 um 17:15 schrieb Vegard Nossum:
>> When using -W to include the whole function in the diff context, you
>> are typically doing this to be able to review the change in its entirety
>> within the context of the function. It is therefore almost always
>> desirable to include any comments that immediately precede the function.
>>
>> This also fixes the case for C where the declaration is split across
>> multiple lines (where the first line of the declaration would not be
>> included in the output), e.g.:
>>
>> 	void
>> 	dummy(void)
>> 	{
>> 		...
>> 	}
>>
>
> That's true, but I'm not sure "non-empty line before function line" is
> good enough a definition for desirable lines.  It wouldn't work for
> people who don't believe in empty lines.  Or for those that put a
> blank line between comment and function.  (I have an opinion on such
> habits, but git diff should probably stay neutral.)  And that's just
> for C code; I have no idea how this heuristic would hold up for other
> file types like HTML.

As you are, I am fairly negative on the heuristic based on the
"non-blank" thing.  We tried once with compaction-heuristics already
and it did not quite perform well.  Let's not hardcode another one.

> We can identify function lines with arbitrary precision (with a
> xfuncname regex, if needed), but there is no accurate way to classify
> lines as comments, or as the end of functions.  Adding optional
> regexes for single- and multi-line comments would help, at least for
> C.

The funcline regexp is used for two related but different purposes.
It identifies a single line to be placed on @@ ... @@ line before a
diff hunk.  This line however does not have to be at the beginning
of a function.  It has to be the line that conveys the most
significant information (e.g. the name of the function).

The way "diff -W" codepath used it as if it were always the very
first line of a function was bound to invite a patch like this, and
if we want to be extra elaborate, I agree that an extra mechanism to
say "the line the funcline regexp matches is not the beginning of a
function, but the beginning is a line that matches this other regexp
before that line" may help.

Do we really want to be that elaborate, though?  I dunno.

I wonder if it would be sufficient to make -W take an optional
number, e.g. "git show -W4", to add extra context lines before the
funcline.

^ permalink raw reply

* [PATCH] submodule update: run custom update script for initial populating as well
From: Stefan Beller @ 2017-01-13 19:43 UTC (permalink / raw)
  To: bmwill, gitster, judge.packham, olsonse; +Cc: git, Stefan Beller

In 1b4735d9f3 (submodule: no [--merge|--rebase] when newly cloned,
2011-02-17), all actions were defaulted to checkout for populating
a submodule initially, because merging or rebasing makes no sense
in that situation.

Other commands however do make sense, such as the custom command
that was added later (6cb5728c43, submodule update: allow custom
command to update submodule working tree, 2013-07-03).

I am unsure about the "none" command, as I can see an initial
checkout there as a useful thing. On the other hand going strictly
by our own documentation, we should do nothing in case of "none"
as well, because the user asked for it.

Reported-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Stefan Beller <sbeller@google.com>
---
 git-submodule.sh            |  7 ++++++-
 t/t7406-submodule-update.sh | 15 +++++++++++++++
 2 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/git-submodule.sh b/git-submodule.sh
index 554bd1c494..aeb721ab7e 100755
--- a/git-submodule.sh
+++ b/git-submodule.sh
@@ -606,7 +606,12 @@ cmd_update()
 		if test $just_cloned -eq 1
 		then
 			subsha1=
-			update_module=checkout
+			if test "$update_module" = "merge" ||
+			   test "$update_module" = "rebase" ||
+			   test "$update_module" = "none"
+			then
+				update_module=checkout
+			fi
 		else
 			subsha1=$(sanitize_submodule_env; cd "$sm_path" &&
 				git rev-parse --verify HEAD) ||
diff --git a/t/t7406-submodule-update.sh b/t/t7406-submodule-update.sh
index 64f322c4cc..1407fa6098 100755
--- a/t/t7406-submodule-update.sh
+++ b/t/t7406-submodule-update.sh
@@ -424,6 +424,19 @@ test_expect_success 'submodule update - command in .git/config catches failure -
 	test_i18ncmp actual expect
 '
 
+test_expect_success 'submodule update - command run for initial population of submodule' '
+	cat <<-EOF >expect &&
+	Execution of '\''false $submodulesha1'\'' failed in submodule path '\''submodule'\''
+	EOF
+	(
+		cd super &&
+		rm -rf submodule &&
+		test_must_fail git submodule update 2>../actual
+	) &&
+	test_cmp expect actual &&
+	git -C super submodule update --checkout
+'
+
 cat << EOF >expect
 Execution of 'false $submodulesha1' failed in submodule path '../super/submodule'
 Failed to recurse into submodule path '../super'
@@ -476,6 +489,7 @@ test_expect_success 'submodule init picks up merge' '
 '
 
 test_expect_success 'submodule update --merge  - ignores --merge  for new submodules' '
+	test_config -C super submodule.submodule.update checkout &&
 	(cd super &&
 	 rm -rf submodule &&
 	 git submodule update submodule &&
@@ -488,6 +502,7 @@ test_expect_success 'submodule update --merge  - ignores --merge  for new submod
 '
 
 test_expect_success 'submodule update --rebase - ignores --rebase for new submodules' '
+	test_config -C super submodule.submodule.update checkout &&
 	(cd super &&
 	 rm -rf submodule &&
 	 git submodule update submodule &&
-- 
2.11.0.300.g08194d1431.dirty


^ permalink raw reply related

* Re: [PATCH] Documentation/bisect: improve on (bad|new) and (good|bad)
From: Junio C Hamano @ 2017-01-13 19:14 UTC (permalink / raw)
  To: Christian Couder; +Cc: git, Manuel Ullmann, Matthieu Moy, Christian Couder
In-Reply-To: <20170113144405.3963-1-chriscool@tuxfamily.org>

Christian Couder <christian.couder@gmail.com> writes:

> The following part of the description:
>
> git bisect (bad|new) [<rev>]
> git bisect (good|old) [<rev>...]
>
> may be a bit confusing, as a reader may wonder if instead it should be:
>
> git bisect (bad|good) [<rev>]
> git bisect (old|new) [<rev>...]
>
> Of course the difference between "[<rev>]" and "[<rev>...]" should hint
> that there is a good reason for the way it is.
>
> But we can further clarify and complete the description by adding
> "<term-new>" and "<term-old>" to the "bad|new" and "good|old"
> alternatives.
>
> Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
> ---
>  Documentation/git-bisect.txt | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)

Thanks.  The patch looks good.

A related tangent.  

Last night, I was trying to think if there is a fundamental reason
why "bad/new/term-new" cannot take more than one <rev>s on the newer
side of the bisection, and couldn't quite think of any before
falling asleep.

Currently we keep track of a single bisect/bad, while marking all the
revs given as good previously as bisect/good-<SHA-1>.

Because the next "bad" is typically chosen from the region of the
commit DAG that is bounded by bad and good commits, i.e. "rev-list
bisect/bad --not bisect/good-*", the current bisect/bad will always
be an ancestor of all bad commits that used to be bisect/bad, and
keeping previous bisect/bad as bisect/bad-<SHA-1> won't change the
region of the commit DAG yet to be explored.

As a reason why we need to use only a single bisect/bad, the above
description is understandable.  But as a reason why we cannot have
more than one, it is tautological.  It merely says "if we start from
only one and dig history to find older culprit, we need only one
bad".

I fell asleep last night without thinking further than that.

I think the answer to the question "why do we think we need a single
bisect/bad?" is "because bisection is about assuming that there is
only one commit that flips the tree state from 'old' to 'new' and
finding that single commit".  That would mean that even if we had
bisect/bad-A and bisect/bad-B, e.g.

                          o---o---o---bad-A
                         /
    -----Good---o---o---o
                         \
                          o---o---o---bad-B


where 'o' are all commits whose goodness is not yet known, because
bisection is valid only when we are hunting for a single commit that
flips the state from good to bad, that commit MUST be at or before
the merge base of bad-A and bad-B.  So even if we allowed

	$ git bisect bad bad-A bad-B

on the command line, we won't have to set bisect/bad-A and
bisect/bad-B.  We only need a single bisect/bad that points at the
merge base of these two.
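
A quick way to convince oneself is to rebuild the graph above in a
scratch repository. The sketch below does that; the function name
bisect_demo, the helper names g and c, the branch names and the "fork"
tag are all made up for the illustration.

```sh
# Recreate the first graph in a throwaway repository.
# Sets $repo and $merge_base, and prints the merge base of the bad tips.
bisect_demo () {
	repo=$(mktemp -d) || return 1
	# g: run git inside the scratch repo with a throwaway identity
	g () { git -C "$repo" -c user.name=demo -c user.email=demo@example.com "$@"; }
	# c: create an empty commit with the given message
	c () { g commit -q --allow-empty -m "$1"; }

	g init -q &&
	c good && g tag Good &&
	c o1 && c o2 && c o3 && g tag fork &&
	g checkout -q -b side-a &&
	c a1 && c a2 && c a3 && g tag bad-A &&
	g checkout -q -b side-b fork &&
	c b1 && c b2 && c b3 && g tag bad-B &&
	merge_base=$(g merge-base bad-A bad-B) &&
	printf '%s\n' "$merge_base"
}
```

Calling bisect_demo prints the commit tagged "fork", i.e. the single
point that would serve as bisect/bad.  "git merge-base --all bad-A bad-B"
lists every merge base, which is the interesting command once the
history is criss-crossed.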

But what if bad-A and bad-B have more than one merge base?  We
won't know which side the badness came from.

                          o---o---o---bad-A
                         /     \ / 
    -----Good---o---o---o       / 
                         \     / \
                          o---o---o---bad-B

Being able to bisect the region of DAG bound by "^Good bad-A bad-B"
may have value in such a case.  I dunno.


^ permalink raw reply

* [PATCH] lib-submodule-update.sh: reduce use of subshell by using git -C <dir>
From: Stefan Beller @ 2017-01-13 19:03 UTC (permalink / raw)
  To: gitster; +Cc: git, Stefan Beller
In-Reply-To: <xmqqtw92hkgc.fsf@gitster.mtv.corp.google.com>

In modern Git we prefer
    "git -C <cmd>"
over
    "(cd <somewhere && git <cmd>)"
as it doesn't need an extra shell.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 t/lib-submodule-update.sh | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/t/lib-submodule-update.sh b/t/lib-submodule-update.sh
index 79cdd34a54..915eb4a7c6 100755
--- a/t/lib-submodule-update.sh
+++ b/t/lib-submodule-update.sh
@@ -69,10 +69,7 @@ create_lib_submodule_repo () {
 
 		git checkout -b "replace_sub1_with_directory" "add_sub1" &&
 		git submodule update &&
-		(
-			cd sub1 &&
-			git checkout modifications
-		) &&
+		git -C sub1 checkout modifications &&
 		git rm --cached sub1 &&
 		rm sub1/.git* &&
 		git config -f .gitmodules --remove-section "submodule.sub1" &&
-- 
2.11.0.300.g08194d1431


^ permalink raw reply related

