Git development

Git development
 help / color / mirror / Atom feed

* Re: [PATCH v3 07/14] checkout: split into switch-branch and restore-files
From: Junio C Hamano @ 2018-12-05  2:25 UTC (permalink / raw)
  To: Duy Nguyen
  Cc: Elijah Newren, Ævar Arnfjörð Bjarmason,
	Git Mailing List, Stefan Beller, Thomas Gummerer, Stefan Xenos
In-Reply-To: <CACsJy8D9Rgsf-E6yweQxpopFaOVZ1bgihEbg200yS1gup+Gt7Q@mail.gmail.com>

Duy Nguyen <pclouds@gmail.com> writes:

> On Tue, Dec 4, 2018 at 6:43 PM Elijah Newren <newren@gmail.com> wrote:
>> > > > +--ours::
>> > > > +--theirs::
>> ...
>> go away.  Maybe it can still be fixed (I haven't dug too deeply into
>> it), but if so, the only fix needed here would be to remove this long
>> explanation about why the tool gets things totally backward.
>
> Aha. I' not really deep in this merge business to know if stages 2 and
> 3 can be swapped. This is right up your alley. I'll just leave it to
> you.

Please don't show stage#2 and stage#3 swapped to the end user,
unless that is protected behind an option (not per-repo config).
It is pretty much ingrained that stage#2 is what came from the
commit that will become the first parent of the commit being
prepared, and changing it without an explicit command line option
will break tools.

> I'm actually still not sure how to move it here (I guess 'here' is
> restore-files since we won't move HEAD). All the --mixed, --merge and
> --hard are confusing. But maybe we could just make 'git restore-files
> --from HEAD -f :/" behave just like "git reset --hard HEAD" (but with
> some safety net) But we can leave it for discussion in the next round.

Perhaps you two should pay a bit closer attention to what Thomas
Gummerer is working on.  I've touched above in my earlier comments,
too, e.g.  <xmqqefb3mhrs.fsf@gitster-ct.c.googlers.com>

^ permalink raw reply

* Re: [ANNOUNCE] Git v2.20.0-rc2
From: Stefan Beller @ 2018-12-05  2:48 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Junio C Hamano, git, git-packagers, Jeff King
In-Reply-To: <87in09ytd2.fsf@evledraar.gmail.com>

-cc linux list


> Perhaps we should note this more prominently, and since Brandon isn't at
> Google anymore can some of you working there edit this post? It's the
> first Google result for "git protocol v2", so it's going to be quite
> confusing for people if after 2.20 the instructions in it no longer
> work.

Thanks for pointing this out, we missed the implications when that patch was
discussed. Looking into it.
>
> 1. https://opensource.googleblog.com/2018/05/introducing-git-protocol-version-2.html

^ permalink raw reply

* Re: [PATCHv2 0/9] Resending sb/submodule-recursive-fetch-gets-the-tip
From: Junio C Hamano @ 2018-12-05  3:10 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git, jonathantanmy
In-Reply-To: <20181129002756.167615-1-sbeller@google.com>

Stefan Beller <sbeller@google.com> writes:

> This is a resend of sb/submodule-recursive-fetch-gets-the-tip,
> with all feedback addressed. As it took some time, I'll send it
> without range-diff, but would ask for full review.

Is that a "resend" or reroll/update (or whatever word that does not
imply "just sending the same thing again")?

FWIW, here is the range diff between 104f939f27..@{-1} and master..
after replacing the topic with this round.


 3:  304b2dab29 !  3:  08a297bd49 submodule.c: sort changed_submodule_names before searching it
    @@ -28,7 +28,7 @@
     @@
      	/* default value, "--submodule-prefix" and its value are added later */
      
    - 	calculate_changed_submodule_paths();
    + 	calculate_changed_submodule_paths(r);
     +	string_list_sort(&changed_submodule_names);
      	run_processes_parallel(max_parallel_jobs,
      			       get_next_submodule,

Just the call nearby in the context has become repository-aware; no
change in this series.

 4:  f7345dad6d !  4:  16dd6fe133 submodule.c: tighten scope of changed_submodule_names struct
...
    ++	calculate_changed_submodule_paths(r, &spf.changed_submodule_names);
     +	string_list_sort(&spf.changed_submodule_names);
      	run_processes_parallel(max_parallel_jobs,
      			       get_next_submodule,

I do recall having to do these adjustments while merging, so not
having to do so anymore with rebasing is a welcome change ;-)

 5:  5613d81d1e !  5:  bcd7337243 submodule: store OIDs in changed_submodule_names
...
Likewise.

 7:  e2419f7e30 !  7:  26f80ccfc1 submodule: migrate get_next_submodule to use repository structs
    @@ -4,7 +4,8 @@
     
         We used to recurse into submodules, even if they were broken having
         only an objects directory. The child process executed in the submodule
    -    would fail though if the submodule was broken.
    +    would fail though if the submodule was broken. This is tested via
    +    "fetching submodule into a broken repository" in t5526.
     
         This patch tightens the check upfront, such that we do not need
         to spawn a child process to find out if the submodule is broken.
    @@ -34,6 +35,7 @@
     +		strbuf_repo_worktree_path(&gitdir, r, "%s/.git", sub->path);
     +		if (repo_init(ret, gitdir.buf, NULL)) {
     +			strbuf_release(&gitdir);
    ++			free(ret);
     +			return NULL;

Leakfix?  Good.

     +		}
     +		strbuf_release(&gitdir);
    @@ -75,11 +77,10 @@
     +		if (repo) {
      			child_process_init(cp);
     -			cp->dir = strbuf_detach(&submodule_path, NULL);
    - 			prepare_submodule_repo_env(&cp->env_array);
     +			cp->dir = xstrdup(repo->worktree);
    + 			prepare_submodule_repo_env(&cp->env_array);

Hmph, I offhand do not see there would be any difference if you
assigned to cp->dir before or after preparing the repo env, but is
there a reason these two must be done in this updated order that I
am missing?  Very similar changes appear multiple times in this
range-diff.

      			cp->git_cmd = 1;
      			if (!spf->quiet)
    - 				strbuf_addf(err, "Fetching submodule %s%s\n",
     @@
      			argv_array_push(&cp->args, default_argv);
      			argv_array_push(&cp->args, "--submodule-prefix");
    @@ -94,8 +95,12 @@
     +			 * the submodule is not initialized
     +			 */
     +			if (S_ISGITLINK(ce->ce_mode) &&
    -+			    !is_empty_dir(ce->name))
    -+				die(_("Could not access submodule '%s'"), ce->name);
    ++			    !is_empty_dir(ce->name)) {
    ++				spf->result = 1;
    ++				strbuf_addf(err,
    ++					    _("Could not access submodule '%s'"),
    ++					    ce->name);
    ++			}

OK, not dying but returning to the caller to handle the error.

 9:  7454fe5cb6 !  9:  04eb06607b fetch: try fetching submodules if needed objects were not fetched
    @@ -17,11 +17,6 @@
         Try fetching a submodule by object id if the object id that the
         superproject points to, cannot be found.
     
    -    The try does not happen when the "git fetch" done at the
    -    superproject is not storing the fetched results in remote
    -    tracking branches (i.e. instead just recording them to
    -    FETCH_HEAD) in this step. A later patch will fix this.
    -
         builtin/fetch used to only inspect submodules when they were fetched
         "on-demand", as in either on/off case it was clear whether the submodule
         needs to be fetched. However to know whether we need to try fetching the
    @@ -29,6 +24,10 @@
         function check_for_new_submodule_commits(), so we'll also run that code
         in case the submodule recursion is set to "on".
     
    +    The submodule checks were done only when a ref in the superproject
    +    changed, these checks were extended to also be performed when fetching
    +    into FETCH_HEAD for completeness, and add a test for that too.
    +

OK.

         Signed-off-by: Stefan Beller <sbeller@google.com>
         Signed-off-by: Junio C Hamano <gitster@pobox.com>
     
    @@ -41,30 +40,39 @@
      
...

The range-diff output for this step is unreadble for me, but the
code around this area does not seem to appear in the comparison
between the result of applying these directly to master and the
result of merging the previous round to master, so perhaps this is
just an indication that later follow-up fix has been squashed into
this step or something, which I shouldn't have to worry about.

      diff --git a/submodule.c b/submodule.c
      --- a/submodule.c
    @@ -73,8 +81,10 @@
      	int result;
      
      	struct string_list changed_submodule_names;
    -+	struct get_next_submodule_task **fetch_specific_oids;
    -+	int fetch_specific_oids_nr, fetch_specific_oids_alloc;
    ++
    ++	/* The submodules to fetch in */
    ++	struct fetch_task **oid_fetch_tasks;
    ++	int oid_fetch_tasks_nr, oid_fetch_tasks_alloc;

OK.  The task struct has been renamed and the new name makes more
sense ("getting the next submodule" is less important than "what we
are going to do to that submodule").

...      
    -+struct get_next_submodule_task {
    ++struct fetch_task {
     +	struct repository *repo;
     +	const struct submodule *sub;
     +	unsigned free_sub : 1; /* Do we need to free the submodule? */
...
     +	return (const struct submodule *) ret;
     +}
     +
    -+static struct get_next_submodule_task *get_next_submodule_task_create(
    -+	struct repository *r, const char *path)
    ++static struct fetch_task *fetch_task_create(struct repository *r,
    ++					    const char *path)
     +{
    -+	struct get_next_submodule_task *task = xmalloc(sizeof(*task));
    ++	struct fetch_task *task = xmalloc(sizeof(*task));
     +	memset(task, 0, sizeof(*task));
     +
     +	task->sub = submodule_from_path(r, &null_oid, path);
     +	if (!task->sub) {
    -+		task->sub = get_default_submodule(path);
    ++		/*
    ++		 * No entry in .gitmodules? Technically not a submodule,
    ++		 * but historically we supported repositories that happen to be
    ++		 * in-place where a gitlink is. Keep supporting them.
    ++		 */
    ++		task->sub = get_non_gitmodules_submodule(path);
    ++		if (!task->sub) {
    ++			free(task);
    ++			return NULL;
    ++		}
    ++
     +		task->free_sub = 1;

OK.

    -+		if (!task->sub) {
    -+			free(task);
    +-		}
    ++		task = fetch_task_create(spf->r, ce->name);
    ++		if (!task)
     +			continue;
    - 		}

OK, so the code used to signal the need to work with the presense of
task->sub but now task's NULLness is used, so no need to free.

    @@ -231,24 +253,26 @@
     +			return 1;
      		} else {
     +
    -+			get_next_submodule_task_release(task);
    ++			fetch_task_release(task);
     +			free(task);
     +

OK.

    -+		/* NEEDSWORK: have get_default_remote from s--h */
    ++		/* NEEDSWORK: have get_default_remote from submodule--helper */

;-)


^ permalink raw reply

* Re: How de-duplicate similar repositories with alternates
From: Junio C Hamano @ 2018-12-05  3:30 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Jeff King, git, Git for human beings, Christian Couder
In-Reply-To: <87tvjtvah0.fsf@evledraar.gmail.com>

Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

> I think a warning (or even error) like this would be more useful:
>
>     test ! -d $objdir && error... # current behavior
>     test -d $objdir/objects && error "Did you mean $objdir/objects, silly?" # new error

If it is an error common enough, perhaps we could even DWIM it, I
guess, that is...

	if test ! -d $objdir
	then
		error
	elif test -d $objdir/objects/pack
	then
		possibly warn
		objdir=$objdir/objects
	fi

> I.e. I suspect I'm not the only one who's not read the documentation
> carefully enough and thought it was a path to the root of the repo and
> wondered why it silently didn't work.

^ permalink raw reply

* Re: [PATCH 2/3] sha1-file: emit error if an alternate looks like a repository
From: Junio C Hamano @ 2018-12-05  3:35 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Jeff King, Derrick Stolee, Christian Couder
In-Reply-To: <20181204132716.19208-3-avarab@gmail.com>

Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:

> Since 26125f6b9b ("detect broken alternates.", 2006-02-22) we've
> emitted an error if the alternates directory doesn't exist, but not
> for the common misstep of adding a path to another git repository as
> an alternate, as opposed to its "objects" directory.
>
> Let's check for this, i.e. whether X/objects or X/.git/objects exists
> if the user supplies X and print an error (which as a commit leading
> up to this one shows doesn't change the exit code, just "warns").

I agree that "Let's check for this" is a good idea, but do not
necessarily agree with "i.e.".  Don't we have a helper that takes
the path to an existing directory and answers "Yup, it does look
like a Git repository"?  Using that is a lot more in line with what
you claimed to do in the title for this patch.

I haven't read 3/3 yet, but as I said, I suspect it is reasonable to
DWIM and use the object store associated with the directory we found
to be a repository.

^ permalink raw reply

* Re: [PATCH 3/3] sha1-file: change alternate "error:" message to "warning:"
From: Junio C Hamano @ 2018-12-05  3:37 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Jeff King, Derrick Stolee, Christian Couder
In-Reply-To: <20181204132716.19208-4-avarab@gmail.com>

Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:

> Change the "error" message emitted by alt_odb_usable() to be a
> "warning" instead. As noted in commits leading up to this one this has
> never impacted the exit code ever since the check was initially added
> in 26125f6b9b ("detect broken alternates.", 2006-02-22).
>
> It's confusing to emit an "error" when e.g. "git fsck" will exit with
> 0, so let's emit a "warning:" instead.

OK, that sounds sensible.  An alternative that may be more sensible
is to actually diagnose this as an error, but the purpose of this
message is to help users diagnose a possible misconfiguration and
keeping the repository "working" with the remaining set of object
stores by leaving it as a mere warning, like this patch does, is
probably a better approach.

^ permalink raw reply

* Re: [PATCH] rebase docs: fix incorrect format of the section Behavioral Differences
From: Junio C Hamano @ 2018-12-05  3:54 UTC (permalink / raw)
  To: Elijah Newren; +Cc: git, j6t
In-Reply-To: <20181204231709.13824-1-newren@gmail.com>

Elijah Newren <newren@gmail.com> writes:

> Gah, when I was rebasing on your patch I adopted your sentence rewrite
> but forgot to remove the "sometimes".  Thanks for catching; correction:

>
> -- 8< --
> Subject: [PATCH v2] git-rebase.txt: update note about directory rename
>  detection and am
>
> In commit 6aba117d5cf7 ("am: avoid directory rename detection when
> calling recursive merge machinery", 2018-08-29), the git-rebase manpage
> probably should have also been updated to note the stronger
> incompatibility between git-am and directory rename detection.  Update
> it now.
>
> Signed-off-by: Elijah Newren <newren@gmail.com>
> ---
>  Documentation/git-rebase.txt | 8 ++++++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/Documentation/git-rebase.txt b/Documentation/git-rebase.txt
> index 41631df6e4..ef76cccf3f 100644
> --- a/Documentation/git-rebase.txt
> +++ b/Documentation/git-rebase.txt
> @@ -569,8 +569,12 @@ it to keep commits that started empty.
>  Directory rename detection
>  ~~~~~~~~~~~~~~~~~~~~~~~~~~
>  
> -The merge and interactive backends work fine with
> -directory rename detection.  The am backend sometimes does not.
> +The merge and interactive backends work fine with directory rename

I am not sure "work fine" a fair and correct label, as rename is
always heuristic.  

    The "directory rename detection" heuristic in "merge" and the
    "interactive" backends can take what happened to paths in the
    same directory into account when deciding if a disappeared path
    was "renamed" and to which other path.  The heuristic produces
    incorrect result when the information given is only about
    changed paths, which is why it is disabled when using the "am"
    backend.

perhaps.

^ permalink raw reply

* Re: [ANNOUNCE] Git v2.20.0-rc2
From: Jeff King @ 2018-12-05  4:05 UTC (permalink / raw)
  To: Stefan Beller
  Cc: Ævar Arnfjörð Bjarmason, Junio C Hamano, git,
	git-packagers
In-Reply-To: <CAGZ79kZPXoST3Jfmd06ALV3BGX+yd5rYKUhVkWpHmj94Kit-wQ@mail.gmail.com>

On Tue, Dec 04, 2018 at 06:48:22PM -0800, Stefan Beller wrote:

> > Perhaps we should note this more prominently, and since Brandon isn't at
> > Google anymore can some of you working there edit this post? It's the
> > first Google result for "git protocol v2", so it's going to be quite
> > confusing for people if after 2.20 the instructions in it no longer
> > work.
> 
> Thanks for pointing this out, we missed the implications when that patch was
> discussed. Looking into it.

I think if it just does "ls-remote --heads", that would still prove the
point (it would omit all of the tags).

-Peff

^ permalink raw reply

* Re: [WIP RFC 1/5] Documentation: order protocol v2 sections
From: Junio C Hamano @ 2018-12-05  4:10 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git
In-Reply-To: <061dacf51763bd2f72b95c382e08f44655e247a7.1543879256.git.jonathantanmy@google.com>

Jonathan Tan <jonathantanmy@google.com> writes:

> The git command line expects Git servers to follow a specific order of

"Command line"?  It sounds like you are talking about the order of
command line arguments and options, but apparently that is not what
you are doing.  Is it "The git over-the-wire protocol"?

> +    output = acknowledgements flush-pkt |
> +	     [acknowledgments delim-pkt] [shallow-info delim-pkt]
> +	     [wanted-refs delim-pkt] packfile flush-pkt

So the output can be either 

 - acks followed by flush (and nothing else) or

 - (possibly) acks, followed by (possibly) shallow, followed by
   (possibly) wanted-refs, followed by the pack stream and flush at
   the end.

> @@ -335,9 +335,10 @@ header.
>  	       *PKT-LINE(%x01-03 *%x00-ff)
>  
>      acknowledgments section
> -	* If the client determines that it is finished with negotiations
> -	  by sending a "done" line, the acknowledgments sections MUST be
> -	  omitted from the server's response.
> +	* If the client determines that it is finished with negotiations by
> +	  sending a "done" line (thus requiring the server to send a packfile),
> +	  the acknowledgments sections MUST be omitted from the server's
> +	  response.

OK.  

>  	* Always begins with the section header "acknowledgments"
>  
> @@ -388,9 +389,6 @@ header.
>  	  which the client has not indicated was shallow as a part of
>  	  its request.
>  
> -	* This section is only included if a packfile section is also
> -	  included in the response.
> -

Earlier, we said that shallow-info is not given when packfile is not
there.  That is captured in the updated EBNF above.  We don't have a
corresponding removal of a bullet point for wanted-refs section below
but probably that is because the original did not have corresponding
bullet point to begin with.

>      wanted-refs section
>  	* This section is only included if the client has requested a
>  	  ref using a 'want-ref' line and if a packfile section is also

^ permalink raw reply

* Re: [PATCH v3 07/14] checkout: split into switch-branch and restore-files
From: Elijah Newren @ 2018-12-05  4:22 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Nguyễn Thái Ngọc, Ævar Arnfjörð,
	Git Mailing List, Stefan Beller, Thomas Gummerer, sxenos
In-Reply-To: <xmqqtvjsen3r.fsf@gitster-ct.c.googlers.com>

On Tue, Dec 4, 2018 at 6:14 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Duy Nguyen <pclouds@gmail.com> writes:
>
> >> My single biggest worry about this whole series is that I'm worried
> >> you're perpetuating and even further ingraining one of the biggest
> >> usability problems with checkout: people suggest and use it for
> >> reverting/restoring paths to a previous version, but it doesn't do
> >> that:
> >
> > ...
> >
> >  git restore-files --from=master~10 Documentation/
>
> The "single biggest worry" could be due to Elijah not being aware of
> other recent discussions.  My understanding of the plan is
>
>  - "git checkout" will learn a new "--[no-]overlay" option, where
>    the current behaviour, i.e. "take paths in master~10 that match
>    pathspec Documentation/, and overlay them on top of what is in
>    the index and the working tree", is explained as "the overlay
>    mode" and stays to be the default.  With "checkout --no-overlay
>    master~10 Documentation/", the command will become "replace paths
>    in the current index and the working tree that match the pathspec
>    Documentation/ with paths in master~10 that match pathspec
>    Documentation/".
>
>  - "git restore-files --from=<tree> <pathspec>" by default will use
>    "--no-overlay" semantics, but the users can still use "--overlay"
>    from the command line as an option.
>
> So "restore-files" would become truly "restore the state of
> Documentation/ to match that of master~10", I would think.

Oh, sweet, that's awesome.  I saw some of the other emails as I was
scanning through and looked at the --overlay stuff since it sounded
relevant but something in the explanation made me think it was about
whether the command would write to the index or the working tree,
rather than being about adding+overwriting vs. just overwriting.

> >> Also, the fact that we're trying to make a simpler command makes me
> >> think that removing the auto-vivify behavior from the default and
> >> adding a simple flag which users can pass to request will allow this
> >> part of the documentation to be hidden behind the appropriate flag,
> >> which may make it easier for users to compartmentalize the command and
> >> it's options, enabling them to learn as they go.
> >
> > Sounds good. I don't know a good name for this new option though so
> > unless anybody comes up with some suggestion, I'll just disable
> > checkout.defaultRemote in switch-branch. If it comes back as a new
> > option, it can always be added later.
>
> Are you two discussing the "checkout --guess" option?  I am somewhat
> lost here.

Generally what was being discussed was just that this manpage was
rather complicated for the standard base-case due to the need to
explain all the behavior associated with --guess since that option is
on by default in checkout.  And since --guess is controlled by
checkout.defaultRemote, that was part of the extra complexity that had
to be learned in the basic explanation, rather than letting users
learn it when they learn a new flag.

The critical part you were missing was part of the original text just
before the quoted part was:

>>> So switch-branch will be controlled by checkout.* config variables?
>>> That probably makes the most sense, but it does dilute the advantage
>>> of adding these simpler commands.

Duy is responding to that even if it wasn't included in his quoting.

> >> > +-f::
> >> > +--force::
> >> > +       Proceed even if the index or the working tree differs from
> >> > +       HEAD.  This is used to throw away local changes.
> >>
> >> Haven't thought through this thoroughly, but do we really need an
> >> option for that instead of telling users to 'git reset --hard HEAD'
> >> before switching branches if they want their stuff thrown away?
> >
> > For me it's just a bit more convenient. Hit an error when switching
> > branch? Recall the command from bash history, stick -f in it and run.
> > Elsewhere I think both Junio and Thomas (or maybe only Junio) suggests
> > moving the "git reset" functionality without moving HEAD to one of
> > these commands, which goes the opposite direction...
>
> Isn't there a huge difference?  "checkout --force <other-branch>"
> needs to clobber only the changes that are involved in the switch,
> i.e. if your README.txt is the same between master and maint while
> Makefile is different, after editing both files while on master, you
> can not "switch-branch" to maint without doing something to Makefile
> (i.e. either discard your local change or wiggle your local change
> to the context of 'maint' with "checkout -m").  But you can carry
> the changes to README.txt while checking out 'maint' branch.
> Running "git reset --hard HEAD" would mean that you will lose the
> changes to README.txt as well.

Ah, indeed.  Thanks for pointing out what I missed here.

> >> > +--orphan <new_branch>::
> >> > +       Create a new 'orphan' branch, named <new_branch>, started from
> >> > +       <start_point> and switch to it.  The first commit made on this
> >>
> >> What??  started from <start_point>?  The whole point of --orphan is
> >> you have no parent, i.e. no start point.  Also, why does the
> >> explanation reference an argument that wasn't in the immediately
> >> preceding synopsis?
> >
> > I guess bad phrasing. It should be "switch to <start_point> first,
> > then prepare the worktree so that the first commit will have no
> > parent". Or something along that line.
>
> It should be a <tree-ish>, no?  It is not a "point" in history, but
> is "start with this tree".

Are you saying that referring to it as a tree will lessen the
confusion users face when they think it's a commit that serves as a
parent and get confused with the fact that this option is named
"--orphan"?  Or are you making some other orthogonal point here?

> I may have more comments on this message but that's it from me for
> now.

Fair enough.  Sorry if I've distracted from RC stuff with all my responses.

^ permalink raw reply

* Re: [PATCH v3 07/14] checkout: split into switch-branch and restore-files
From: Elijah Newren @ 2018-12-05  4:45 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Nguyễn Thái Ngọc, Ævar Arnfjörð,
	Git Mailing List, Stefan Beller, Thomas Gummerer, sxenos
In-Reply-To: <xmqqpnugemks.fsf@gitster-ct.c.googlers.com>

On Tue, Dec 4, 2018 at 6:26 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Duy Nguyen <pclouds@gmail.com> writes:
>
> > On Tue, Dec 4, 2018 at 6:43 PM Elijah Newren <newren@gmail.com> wrote:
> >> > > > +--ours::
> >> > > > +--theirs::
> >> ...
> >> go away.  Maybe it can still be fixed (I haven't dug too deeply into
> >> it), but if so, the only fix needed here would be to remove this long
> >> explanation about why the tool gets things totally backward.
> >
> > Aha. I' not really deep in this merge business to know if stages 2 and
> > 3 can be swapped. This is right up your alley. I'll just leave it to
> > you.
>
> Please don't show stage#2 and stage#3 swapped to the end user,
> unless that is protected behind an option (not per-repo config).
> It is pretty much ingrained that stage#2 is what came from the
> commit that will become the first parent of the commit being
> prepared, and changing it without an explicit command line option
> will break tools.

What depends on stage#2 coming from the commit that will become the
first parent?  I wasn't thinking in terms of modifying
checkout/restore-files/diff/etc in a way that would make them show
things different than what was recorded in the index, I was rather
musing on whether it was feasible to have rebase tell the merge
machinery to treat HEAD as stage #3 and the other commit as stage #2
so that it was swapping what was actually recorded in the index.

I know the merge machinery implicitly assumes HEAD == stage #2 in
multiple places, and it'd obviously need a fair amount of fixing to
handle this.  I wasn't immediately aware of other things that would
break.  If you know of some, I'm happy to hear.  Otherwise, I might go
and learn the hard way (after I get around to the merge rewrite) why
my idea is crazy.  :-)

> > I'm actually still not sure how to move it here (I guess 'here' is
> > restore-files since we won't move HEAD). All the --mixed, --merge and
> > --hard are confusing. But maybe we could just make 'git restore-files
> > --from HEAD -f :/" behave just like "git reset --hard HEAD" (but with
> > some safety net) But we can leave it for discussion in the next round.
>
> Perhaps you two should pay a bit closer attention to what Thomas
> Gummerer is working on.  I've touched above in my earlier comments,
> too, e.g.  <xmqqefb3mhrs.fsf@gitster-ct.c.googlers.com>

Indeed, I'm excited about his changes now; I'll keep an eye out.

^ permalink raw reply

* Re: [PATCH 8/9] sha1-file: use loose object cache for quick existence check
From: Jeff King @ 2018-12-05  4:46 UTC (permalink / raw)
  To: René Scharfe
  Cc: Ævar Arnfjörð Bjarmason, Geert Jansen,
	Junio C Hamano, git@vger.kernel.org, Takuto Ikuta
In-Reply-To: <fe388ba5-765e-ff83-e610-d40964a76a0c@web.de>

On Tue, Dec 04, 2018 at 10:45:13PM +0100, René Scharfe wrote:

> > The comment in the context there is warning callers to remember to load
> > the cache first. Now that we have individual caches, might it make sense
> > to change the interface a bit, and make these members private. I.e.,
> > something like:
> > 
> >   struct oid_array *odb_loose_cache(struct object_directory *odb,
> >                                     int subdir_nr) 
> >   {
> > 	if (!loose_objects_subdir_seen[subdir_nr])
> > 		odb_load_loose_cache(odb, subdir_nr); /* or just inline it here */
> > 
> > 	return &odb->loose_objects_cache[subdir_nr];
> >   }
> 
> Sure.  And it should take an object_id pointer instead of a subdir_nr --
> less duplication, nicer interface (patch below).

I had considered that initially, but it's a little less flexible if a
caller doesn't actually have an oid. Though both of the proposed callers
do, so it's probably OK to worry about that if and when it ever comes up
(the most plausible thing in my mind is doing some abbrev-like analysis
without looking to abbreviate a _particular_ oid).

> It would be nice if it could return a const pointer to discourage
> messing up the cache, but that's not compatible with oid_array_lookup().

Yeah.

> And quick_has_loose() should be converted to object_id as well -- adding
> a function that takes a SHA-1 is a regression. :D

I actually wrote it that way initially, but doing the hashcpy() in the
caller is a bit more awkward. My thought was to punt on that until the
rest of the surrounding code starts handling oids.

> ---
>  object-store.h |  8 ++++----
>  sha1-file.c    | 19 ++++++++-----------
>  sha1-name.c    |  4 +---
>  3 files changed, 13 insertions(+), 18 deletions(-)

This patch looks sane. How do you want to handle it? I'd assumed your
earlier one would be for applying on top, but this one doesn't have a
commit message. Did you want me to squash down the individual hunks?

-Peff

^ permalink raw reply

* Re: [PATCH on sb/more-repo-in-api] revision: use commit graph in get_reference()
From: Jeff King @ 2018-12-05  4:54 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git
In-Reply-To: <20181204224238.50966-1-jonathantanmy@google.com>

On Tue, Dec 04, 2018 at 02:42:38PM -0800, Jonathan Tan wrote:

> diff --git a/revision.c b/revision.c
> index b5108b75ab..e7da2c57ab 100644
> --- a/revision.c
> +++ b/revision.c
> @@ -212,7 +212,20 @@ static struct object *get_reference(struct rev_info *revs, const char *name,
>  {
>  	struct object *object;
>  
> -	object = parse_object(revs->repo, oid);
> +	/*
> +	 * If the repository has commit graphs, repo_parse_commit() avoids
> +	 * reading the object buffer, so use it whenever possible.
> +	 */
> +	if (oid_object_info(revs->repo, oid, NULL) == OBJ_COMMIT) {
> +		struct commit *c = lookup_commit(revs->repo, oid);
> +		if (!repo_parse_commit(revs->repo, c))
> +			object = (struct object *) c;
> +		else
> +			object = NULL;
> +	} else {
> +		object = parse_object(revs->repo, oid);
> +	}

This seems like a reasonable thing to do, but I have sort of a
meta-comment. In several places we've started doing this kind of "if
it's this type of object, do X, otherwise do Y" optimization (e.g.,
handling large blobs for streaming).

And in the many cases we end up doubling the effort to do object
lookups: here we do one lookup to get the type, and then if it's not a
commit (or if we don't have a commit graph) we end up parsing it anyway.

I wonder if we could do better. In this instance, it might make sense
to first see if we actually have a commit graph available (it might not
have this object, of course, but at least we'd expect it to have most
commits). In general, it would be nice if we had a more incremental API
for accessing objects: open, get metadata, then read the data. That
would make these kinds of optimizations "free".

I don't have numbers for how much the extra lookups cost. The lookups
are probably dwarfed by parse_object() in general, so even if we save
only a few full object loads, it may be a win. It just seems a shame
that we may be making the "slow" paths (when our type-specific check
doesn't match) even slower.

-Peff

^ permalink raw reply

* Re: [PATCH 1/3] test-lib: consolidate naming of test-results paths
From: Jeff King @ 2018-12-05  4:57 UTC (permalink / raw)
  To: SZEDER Gábor; +Cc: git
In-Reply-To: <20181204163457.15717-2-szeder.dev@gmail.com>

On Tue, Dec 04, 2018 at 05:34:55PM +0100, SZEDER Gábor wrote:

> There are two places where we strip off any leading path components
> and the '.sh' suffix from the test script's pathname, and there are
> two places where we construct the filename of test output files in
> 't/test-results/'.  The last patch in this series will add even more.
> 
> Factor these out into helper variables to avoid repeating ourselves.

Makes sense.

> +TEST_NAME="$(basename "$0" .sh)"
> +TEST_RESULTS_BASE="$TEST_OUTPUT_DIRECTORY/test-results/$TEST_NAME"

Hmm, since we are building up this BASE variable anyway, why not:

  TEST_RESULTS_DIR=$TEST_OUTPUT_DIRECTORY/test-results
  TEST_RESULTS_BASE=$TEST_RESULTS_DIR/$TEST_NAME

? That saves having to run `dirname` on it later.

I guess one could argue that saying "the directory name of the file I'm
writing" is more readable. I just generally try to avoid extra
manipulation of the strings when possible (especially in shell).

-Peff

^ permalink raw reply

* Re: [WIP RFC 2/5] Documentation: add Packfile URIs design doc
From: Junio C Hamano @ 2018-12-05  5:02 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git
In-Reply-To: <0461b362569362c6d0e73951469c547a03a1b59d.1543879256.git.jonathantanmy@google.com>

Jonathan Tan <jonathantanmy@google.com> writes:

> +This feature allows servers to serve part of their packfile response as URIs.
> +This allows server designs that improve scalability in bandwidth and CPU usage
> +(for example, by serving some data through a CDN), and (in the future) provides
> +some measure of resumability to clients.

Without reading the remainder, this makes readers anticipate a few
good things ;-)

 - "part of", so pre-generated constant material can be given from
   CDN and then followed-up by "filling the gaps" small packfile,
   perhaps?

 - The "part of" transmission may not bring the repository up to
   date wrt to the "want" objects; would this feature involve "you
   asked history up to these commits, but with this pack-uri, you'll
   be getting history up to these (somewhat stale) commits"?

Anyway, let's read on.

> +This feature is available only in protocol version 2.
> +
> +Protocol
> +--------
> +
> +The server advertises `packfile-uris`.
> +
> +If the client replies with the following arguments:
> +
> + * packfile-uris
> + * thin-pack
> + * ofs-delta

"with the following" meaning "with all of the following", or "with
any of the following"?  Is there a reason why the server side must
require that the client understands and is willing to accept a
thin-pack when wanting to use packfile-uris?  The same question for
the ofs-delta.

When the pregenerated constant material the server plans to hand out
the uris for was prepared by using ofs-delta encoding, the server
cannot give the uri to it when the client does not want ofs-delta
encoded packfile, but it feels somewhat strange that we require the
most capable client at the protocol level.  After all, the server
side could prepare one with ofs-delta and another without ofs-delta
and depending on what the client is capable of, hand out different
URIs, if it wanted to.

The reason why I care is because thin and ofs will *NOT* stay
forever be the only optional features of the pack format.  We may
invent yet another such optional 'frotz' feature, which may greatly
help the efficiency of the packfile encoding, hence it may be
preferrable to always generate a CDN packfile with that feature, in
addition to thin and ofs.  Would we add 'frotz' to the above list in
the documentation, then?  What would happen to existing servers and
clients written before that time then?

My recommendation is to drop the mention of "thin" and "ofs" from
the above list, and also from the following paragraph.  The "it MAY
send" will serve as a practical escape clause to allow a server/CDN
implementation that *ALWAYS* prepares pregenerated material that can
only be digested by clients that supports thin and ofs.  Such a server
can send packfile-URIs only when all of the three are given by the
client and be compliant.  And such an update to the proposed document
would allow a more diskful server to prepare both thin and non-thin
pregenerated packs and choose which one to give to the client depending
on the capability.

> +when the server sends the packfile, it MAY send a `packfile-uris` section
> +directly before the `packfile` section (right after `wanted-refs` if it is
> +sent) containing HTTP(S) URIs. See protocol-v2.txt for the documentation of
> +this section.

So, this is OK, but

> +Clients then should understand that the returned packfile could be incomplete,
> +and that it needs to download all the given URIs before the fetch or clone is
> +complete. Each URI should point to a Git packfile (which may be a thin pack and
> +which may contain offset deltas).

weaken or remove the (parenthetical comment) in the last sentence,
and replace the beginning of the section with something like

	If the client replies with 'packfile-uris', when the server
	sends the packfile, it MAY send a `packfile-uris` section...

You may steal what I wrote in the above response to help the
server-side folks to decide how to actually implement the "it MAY
send a packfile-uris" part in the document.

> +Server design
> +-------------
> +
> +The server can be trivially made compatible with the proposed protocol by
> +having it advertise `packfile-uris`, tolerating the client sending
> +`packfile-uris`, and never sending any `packfile-uris` section. But we should
> +include some sort of non-trivial implementation in the Minimum Viable Product,
> +at least so that we can test the client.
> +
> +This is the implementation: a feature, marked experimental, that allows the
> +server to be configured by one or more `uploadpack.blobPackfileUri=<sha1>
> +<uri>` entries. Whenever the list of objects to be sent is assembled, a blob
> +with the given sha1 can be replaced by the given URI. This allows, for example,
> +servers to delegate serving of large blobs to CDNs.

;-)

> +Client design
> +-------------
> +
> +While fetching, the client needs to remember the list of URIs and cannot
> +declare that the fetch is complete until all URIs have been downloaded as
> +packfiles.
> +
> +The division of work (initial fetch + additional URIs) introduces convenient
> +points for resumption of an interrupted clone - such resumption can be done
> +after the Minimum Viable Product (see "Future work").
> +
> +The client can inhibit this feature (i.e. refrain from sending the
> +`packfile-urls` parameter) by passing --no-packfile-urls to `git fetch`.

OK, this comes back to what I alluded to at the beginning.  We could
respond to a full-clone request by feeding a series of packfile-uris
and some ref information, perhaps like this:

	* Grab this packfile and update your remote-tracking refs
          and tags to these values; you'd be as if you cloned the
          project when it was at v1.0.

	* When you are done with the above, grab this packfile and
          update your remote-tracking refs and tags to these values;
          you'd be as if you cloned the project when it was at v2.0.

	* When you are done with the above, grab this packfile and
          update your remote-tracking refs and tags to these values;
          you'd be as if you cloned the project when it was at v3.0.

	...

	* When you are done with the above, here is the remaining
          packdata to bring you fully up to date with your original
          "want"s.

and before fully reading the proposal, I anticipated that it was
what you were going to describe.  The major difference is "up to the
packdata given to you so far, you'd be as if you fetched these" ref
information, which would allow you to be interrupted and then simply
resume, without having to remember the set of packfile-uris yet to
be processed across a fetch/clone failure.  If you sucessfully fetch
packfile for ..v1.0, you can update the remote-tracking refs to
match as if you fetched back when that was the most recent state of
the project, and then if you failed while transferring packfile for
v1.0..v2.0, the resuming would just reissue "git fetch" internally.

I think what you proposed, i.e. without the "with the data up to
this packfile, you have history to these objects", would also work,
even though it requires us to remember more of what we learned
during the initial attempt throughout retrying failed transfers.

> +Future work
> +-----------
> +
> +The protocol design allows some evolution of the server and client without any
> +need for protocol changes, so only a small-scoped design is included here to
> +form the MVP. For example, the following can be done:
> +
> + * On the server, a long-running process that takes in entire requests and
> +   outputs a list of URIs and the corresponding inclusion and exclusion sets of
> +   objects. This allows, e.g., signed URIs to be used and packfiles for common
> +   requests to be cached.
> + * On the client, resumption of clone. If a clone is interrupted, information
> +   could be recorded in the repository's config and a "clone-resume" command
> +   can resume the clone in progress. (Resumption of subsequent fetches is more
> +   difficult because that must deal with the user wanting to use the repository
> +   even after the fetch was interrupted.)
> +
> +There are some possible features that will require a change in protocol:
> +
> + * Additional HTTP headers (e.g. authentication)
> + * Byte range support
> + * Different file formats referenced by URIs (e.g. raw object)
> +
> diff --git a/Documentation/technical/protocol-v2.txt b/Documentation/technical/protocol-v2.txt
> index 345c00e08c..2cb1c41742 100644
> --- a/Documentation/technical/protocol-v2.txt
> +++ b/Documentation/technical/protocol-v2.txt
> @@ -313,7 +313,8 @@ header. Most sections are sent only when the packfile is sent.
>  
>      output = acknowledgements flush-pkt |
>  	     [acknowledgments delim-pkt] [shallow-info delim-pkt]
> -	     [wanted-refs delim-pkt] packfile flush-pkt
> +	     [wanted-refs delim-pkt] [packfile-uris delim-pkt]
> +	     packfile flush-pkt
>  
>      acknowledgments = PKT-LINE("acknowledgments" LF)
>  		      (nak | *ack)
> @@ -331,6 +332,9 @@ header. Most sections are sent only when the packfile is sent.
>  		  *PKT-LINE(wanted-ref LF)
>      wanted-ref = obj-id SP refname
>  
> +    packfile-uris = PKT-LINE("packfile-uris" LF) *packfile-uri
> +    packfile-uri = PKT-LINE("uri" SP *%x20-ff LF)
> +
>      packfile = PKT-LINE("packfile" LF)
>  	       *PKT-LINE(%x01-03 *%x00-ff)

^ permalink raw reply

* Re: [PATCH 2/3] test-lib-functions: introduce the 'test_set_port' helper function
From: Jeff King @ 2018-12-05  5:17 UTC (permalink / raw)
  To: SZEDER Gábor; +Cc: git
In-Reply-To: <20181204163457.15717-3-szeder.dev@gmail.com>

On Tue, Dec 04, 2018 at 05:34:56PM +0100, SZEDER Gábor wrote:

> Several test scripts run daemons like 'git-daemon' or Apache, and
> communicate with them through TCP sockets.  To have unique ports where
> these daemons are accessible, the ports are usually the number of the
> corresponding test scripts, unless the user overrides them via
> environment variables, and thus all those tests and test libs contain
> more or less the same bit of one-liner boilerplate code to find out
> the port.  The last patch in this series will make this a bit more
> complicated.
> 
> Factor out finding the port for a daemon into the common helper
> function 'test_set_port' to avoid repeating ourselves.

OK. Looks like this should keep the same port numbers for all of the
existing tests, which I think is good. As nice as choosing random high
port numbers might be for some cases, it can also be confusing when
there are random conflicts.

I've also run into non-random conflicts, but at least once you figure
them out they're predictable (the most confusing I've seen is that adb,
the android debugger, sometimes but not always leaves a daemon hanging
around on port 5561).

> Take special care of test scripts with "low" numbers:
> 
>   - Test numbers below 1024 would result in a port that's only usable
>     as root, so set their port to '10000 + test-nr' to make sure it
>     doesn't interfere with other tests in the test suite.  This makes
>     the hardcoded port number in 't0410-partial-clone.sh' unnecessary,
>     remove it.

This part is what made me think a bit more about conflicting with
dynamically allocated ports. Arguably the http parts of t0410 ought to
be in a much higher-numbered script, but I don't know that we've held
over the years very well to the idea that scripts should only depend on
the bits from lower numbered scripts.

> ---
>  t/lib-git-daemon.sh      |  2 +-
>  t/lib-git-p4.sh          |  9 +--------
>  t/lib-git-svn.sh         |  2 +-
>  t/lib-httpd.sh           |  2 +-
>  t/t0410-partial-clone.sh |  1 -
>  t/t5512-ls-remote.sh     |  2 +-
>  t/test-lib-functions.sh  | 36 ++++++++++++++++++++++++++++++++++++
>  7 files changed, 41 insertions(+), 13 deletions(-)

The patch itself looks good to me.

> +	eval port=\"\${$var}\"
> +	case "$port" in

The quotes in the eval'd variable assignment aren't necessary. Usually I
don't mind them in a simple:

  FOO="$bar"

even though they're redundant (and there were a few instances in the
prior patch, I think). But here the escaping makes it harder to read,
compared to:

  eval port=\$$var

It might be worth simplifying, but I don't feel strongly about it (we'd
run into problems if $var contained spaces with either variant, but I
don't think that's worth caring about).

-Peff

^ permalink raw reply

* Re: [RFC PATCH 3/3] test-lib: add the '--stress' option to run a test repeatedly under load
From: Jeff King @ 2018-12-05  5:44 UTC (permalink / raw)
  To: SZEDER Gábor; +Cc: git
In-Reply-To: <20181204163457.15717-4-szeder.dev@gmail.com>

On Tue, Dec 04, 2018 at 05:34:57PM +0100, SZEDER Gábor wrote:

> To prevent the several parallel invocations of the same test from
> interfering with each other:
> 
>   - Include the parallel job's number in the name of the trash
>     directory and the various output files under 't/test-results/' as
>     a '.stress-<Nr>' suffix.

Makes sense.

>   - Add the parallel job's number to the port number specified by the
>     user or to the test number, so even tests involving daemons
>     listening on a TCP socket can be stressed.

In my script I just use an arbitrary sequence of ports. That means two
stress runs may stomp on each other, but stress runs tend not to
interfere with regular runs (whereas with your scheme, a stress run of
t5561 is going to stomp on t5562). It probably doesn't matter much
either way, as I tend not to do too much with the machine during a
stress run.

>   - Make '--stress' imply '--verbose-log' and discard the test's
>     standard ouput and error; dumping the output of several parallel
>     tests to the terminal would create a big ugly mess.

Makes sense. My script just redirects the output to a file, but it
predates --verbose-log.

My script always runs with "-x". I guess it's not that hard to add it if
you want, but it is annoying to finally get a failure after several
minutes only to realize that your are missing some important
information. ;)

Ditto for "-i", which leaves more readable output (you know the
interesting part is at the end of the log), and leaves a trash directory
state that is more likely to be useful for inspecting.

My script also implies "--root". That's not strictly necessary, though I
suspect it is much more likely to catch races when it's run on a ram
disk (simply because it leaves the CPU as the main bottleneck).

> 'wait' for all parallel jobs before exiting (either because a failure
> was found or because the user lost patience and aborted the stress
> test), allowing the still running tests to finish.  Otherwise the "OK
> X.Y" progress output from the last iteration would likely arrive after
> the user got back the shell prompt, interfering with typing in the
> next command.  OTOH, this waiting might induce a considerable delay
> between hitting ctrl-C and the test actually exiting; I'm not sure
> this is the right tradeoff.

If there is a delay, it's actually quite annoying to _not_ wait; you
start doing something in the terminal, and then a bunch of extra test
statuses print at a random time.

> +	job_nr=0
> +	while test $job_nr -lt "$job_count"
> +	do
> +		(
> +			GIT_TEST_STRESS_STARTED=done
> +			GIT_TEST_STRESS_JOB_NR=$job_nr
> +			export GIT_TEST_STRESS_STARTED GIT_TEST_STRESS_JOB_NR
> +
> +			cnt=0
> +			while ! test -e "$stressfail"
> +			do
> +				if $TEST_SHELL_PATH "$0" "$@" >/dev/null 2>&1
> +				then
> +					printf >&2 "OK   %2d.%d\n" $GIT_TEST_STRESS_JOB_NR $cnt

OK, this adds a counter for each job number (compared to my script).
Seems helpful.

> +				elif test -f "$stressfail" &&
> +				     test "$(cat "$stressfail")" = "aborted"
> +				then
> +					printf >&2 "ABORTED %2d.%d\n" $GIT_TEST_STRESS_JOB_NR $cnt
> +				else
> +					printf >&2 "FAIL %2d.%d\n" $GIT_TEST_STRESS_JOB_NR $cnt
> +					echo $GIT_TEST_STRESS_JOB_NR >>"$stressfail"
> +				fi

Hmm. What happens if we see a failure _and_ there's an abort? That's
actually pretty plausible if you see a failure whiz by, and you want to
abort the remaining scripts because they take a long time to run.

If the abort happens first, then the goal is to assume any errors are
due to the abort. But there's a race between the individual jobs reading
$stressfail and the parent signal handler writing it. So you may end up
with "aborted\n5" or similar (after which any other jobs would fail to
match "aborted" and declare themselves failures, as well).  I think my
script probably also has this race, too (it doesn't check the abort case
explicitly, but it would race in checking "test -e fail").

If the fail happens first (which I think is the more likely case), then
the abort handler will overwrite the file with "aborted", and we'll
forget that there was a real failure. This works in my script because of
the symlinking (if a real failure symlinked $fail to a directory, then
the attempt to write "aborted" will just be a noop).

> +	job_nr=0
> +	while test $job_nr -lt "$job_count"
> +	do
> +		wait
> +		job_nr=$(($job_nr + 1))
> +	done

Do we need to loop? Calling "wait" with no arguments should wait for all
children.

> +	if test -f "$stressfail" && test "$(cat "$stressfail")" != "aborted"
> +	then
> +		echo "Log(s) of failed test run(s) be found in:"
> +		for f in $(cat "$stressfail")
> +		do
> +			echo "  $TEST_RESULTS_BASE.stress-$f.out"
> +		done
> +	fi

In my experience, outputting the failed log saves a step (especially
with "-i"), since seeing the failed test and its output is often
sufficient.

I'm also sad to lose the symlink to the failed trash directory, which
saves cutting and pasting when you have to inspect it. But I guess we
can't rely on symlinks everywhere.

-Peff

^ permalink raw reply

* Re: [RFC PATCH 3/3] test-lib: add the '--stress' option to run a test repeatedly under load
From: Jeff King @ 2018-12-05  5:46 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: SZEDER Gábor, git
In-Reply-To: <87o9a1z0j5.fsf@evledraar.gmail.com>

On Tue, Dec 04, 2018 at 06:04:14PM +0100, Ævar Arnfjörð Bjarmason wrote:

> 
> On Tue, Dec 04 2018, SZEDER Gábor wrote:
> 
> > The number of parallel invocations is determined by, in order of
> > precedence: the number specified as '--stress=<N>', or the value of
> > the GIT_TEST_STRESS_LOAD environment variable, or twice the number of
> > available processors in '/proc/cpuinfo', or 8.
> 
> With this series we have at least 3 ways to get this number. First
> online_cpus() in the C code, then Peff's recent `getconf
> _NPROCESSORS_ONLN` in doc-diff, and now this /proc/cpuinfo parsing.

To be fair, I only added the "getconf" thing because somebody complained
that I was parsing /proc/cpuinfo, and suggested it instead. ;)

I don't think it's especially portable, but it should work on Linux and
modern BSD/macOS, which may be enough (unlike doc-diff, I'd be a little
more inclined to ask somebody on a random platform to stress test if
they're hitting a bug).

> Perhaps it makes sense to split online_cpus() into a helper to use from
> the shellscripts instead?

I think somebody proposed that in a recent thread for other reasons,
too. I'd be OK with that, but probably just using getconf is reasonable
in the meantime.

-Peff

^ permalink raw reply

* Re: [RFC PATCH 3/3] test-lib: add the '--stress' option to run a test repeatedly under load
From: Jeff King @ 2018-12-05  5:50 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: SZEDER Gábor, git
In-Reply-To: <87muplyxfn.fsf@evledraar.gmail.com>

On Tue, Dec 04, 2018 at 07:11:08PM +0100, Ævar Arnfjörð Bjarmason wrote:

> It's a frequent annoyance of mine in the test suite that I'm
> e.g. running t*.sh with some parallel "prove" in one screen, and then I
> run tABCD*.sh manually, and get unlucky because they use the same trash
> dir, and both tests go boom.
> 
> You can fix that with --root, which is much of what this patch does. My
> one-liner for doing --stress has been something like:
> 
>     perl -E 'say ++$_ while 1' | parallel --jobs=100% --halt-on-error soon,fail=1 './t0000-basic.sh --root=trash-{} -v'
> 
> But it would be great if I didn't have to worry about that and could
> just run two concurrent:
> 
>     ./t0000-basic.sh
> 
> So I think we could just set some env variable where instead of having
> the predictable trash directory we have a $TRASHDIR.$N as this patch
> does, except we pick $N by locking some test-runs/tABCD.Nth file with
> flock() during the run.

That actually sounds kind of annoying when doing a single run, since now
you have this extra ".N". I guess it would at least be predictable, and
I tend to tab-complete the trash dirs anyway.

I accomplish the same thing by doing my "big" runs using --root
specified in config.mak (which points to a RAM disk -- if you're not
using one to run the tests, you really should look into it, as it's way
faster). And then for one-offs, investigating failures, etc, I do "cd t
&& ./t0000-basic.sh -v -i", which naturally runs in the local directory.

-Peff

^ permalink raw reply

* Re: [PATCH 3/3] sha1-file: change alternate "error:" message to "warning:"
From: Jeff King @ 2018-12-05  5:54 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Ævar Arnfjörð Bjarmason, git, Derrick Stolee,
	Christian Couder
In-Reply-To: <xmqqva48d4o9.fsf@gitster-ct.c.googlers.com>

On Wed, Dec 05, 2018 at 12:37:58PM +0900, Junio C Hamano wrote:

> Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:
> 
> > Change the "error" message emitted by alt_odb_usable() to be a
> > "warning" instead. As noted in commits leading up to this one this has
> > never impacted the exit code ever since the check was initially added
> > in 26125f6b9b ("detect broken alternates.", 2006-02-22).
> >
> > It's confusing to emit an "error" when e.g. "git fsck" will exit with
> > 0, so let's emit a "warning:" instead.
> 
> OK, that sounds sensible.  An alternative that may be more sensible
> is to actually diagnose this as an error, but the purpose of this
> message is to help users diagnose a possible misconfiguration and
> keeping the repository "working" with the remaining set of object
> stores by leaving it as a mere warning, like this patch does, is
> probably a better approach.

Yeah, I think it's better to keep it as a warning. It's actually
reasonably likely to be benign (e.g., you did a "git repack -ad && rm
/path/to/alternate" to remove the dependency, but forgot to clean up the
alternates). And when it _is_ a problem, the object-reading code paths
will definitely let you know.

-Peff

^ permalink raw reply

* Re: [WIP RFC 2/5] Documentation: add Packfile URIs design doc
From: Junio C Hamano @ 2018-12-05  5:55 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git
In-Reply-To: <xmqqbm60d0s1.fsf@gitster-ct.c.googlers.com>

Junio C Hamano <gitster@pobox.com> writes:

> So, this is OK, but
>
>> +Clients then should understand that the returned packfile could be incomplete,
>> +and that it needs to download all the given URIs before the fetch or clone is
>> +complete. Each URI should point to a Git packfile (which may be a thin pack and
>> +which may contain offset deltas).
>
> weaken or remove the (parenthetical comment) in the last sentence,
> and replace the beginning of the section with something like
>
> 	If the client replies with 'packfile-uris', when the server
> 	sends the packfile, it MAY send a `packfile-uris` section...
>
> You may steal what I wrote in the above response to help the
> server-side folks to decide how to actually implement the "it MAY
> send a packfile-uris" part in the document.

By the way, I do agree with the practical consideration the design
you described makes.  For a pregenerated pack that brings you from
v1.0 to v2.0, "thin" would roughly save the transfer of one full
checkout (compressed, of course), and "ofs" would also save several
bytes per object.  Compared to a single pack that delivers everything
the fetcher wants, concatenation of packs without "thin" to transfer
the same set of objects would cost quite a lot more.

And I do not think we should care too much about fetchers that lack
either thin or ofs, so I'd imagine that any client that ask for
packfile-uris would also advertise thin and ofs as well, so in
practice, a request with packfile-uris that lack thin or ofs would
be pretty rare and requiring all three and requiring only one would
not make much practical difference.  It's just that I think singling
out these two capabilities as hard requirements at the protocol
level is wrong.

^ permalink raw reply

* Re: [PATCH 8/9] sha1-file: use loose object cache for quick existence check
From: René Scharfe @ 2018-12-05  6:02 UTC (permalink / raw)
  To: Jeff King, Ævar Arnfjörð Bjarmason
  Cc: Geert Jansen, Junio C Hamano, git@vger.kernel.org, Takuto Ikuta
In-Reply-To: <20181205044645.GA12284@sigill.intra.peff.net>

Am 05.12.2018 um 05:46 schrieb Jeff King:
> On Tue, Dec 04, 2018 at 10:45:13PM +0100, René Scharfe wrote:
> 
>>> The comment in the context there is warning callers to remember to load
>>> the cache first. Now that we have individual caches, might it make sense
>>> to change the interface a bit, and make these members private. I.e.,
>>> something like:
>>>
>>>   struct oid_array *odb_loose_cache(struct object_directory *odb,
>>>                                     int subdir_nr) 
>>>   {
>>> 	if (!loose_objects_subdir_seen[subdir_nr])
>>> 		odb_load_loose_cache(odb, subdir_nr); /* or just inline it here */
>>>
>>> 	return &odb->loose_objects_cache[subdir_nr];
>>>   }
>>
>> Sure.  And it should take an object_id pointer instead of a subdir_nr --
>> less duplication, nicer interface (patch below).
> 
> I had considered that initially, but it's a little less flexible if a
> caller doesn't actually have an oid. Though both of the proposed callers
> do, so it's probably OK to worry about that if and when it ever comes up
> (the most plausible thing in my mind is doing some abbrev-like analysis
> without looking to abbreviate a _particular_ oid).

Right, let's focus on concrete requirements of current callers.  YAGNI..
:)

>> And quick_has_loose() should be converted to object_id as well -- adding
>> a function that takes a SHA-1 is a regression. :D
> 
> I actually wrote it that way initially, but doing the hashcpy() in the
> caller is a bit more awkward. My thought was to punt on that until the
> rest of the surrounding code starts handling oids.

The level of awkwardness is the same to me, but sha1_loose_object_info()
is much longer already, so it might feel worse to add it there.  This
function is easily converted to struct object_id, though, as its single
caller can pass one on -- this makes the copy unnecessary.

> This patch looks sane. How do you want to handle it? I'd assumed your
> earlier one would be for applying on top, but this one doesn't have a
> commit message. Did you want me to squash down the individual hunks?

I'm waiting for the first one (object-store: use one oid_array per
subdirectory for loose cache) to be accepted, as it aims to solve a
user-visible performance regression, i.e. that's where the meat is.
I'm particularly interested in performance numbers from Ævar for it.

I can send the last one properly later, and add patches for converting
quick_has_loose() to struct object_id.  Those are just cosmetic..

René

^ permalink raw reply

* Re: [PATCH 2/3] sha1-file: emit error if an alternate looks like a repository
From: Jeff King @ 2018-12-05  6:10 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Ævar Arnfjörð Bjarmason, git, Derrick Stolee,
	Christian Couder
In-Reply-To: <xmqqzhtkd4sz.fsf@gitster-ct.c.googlers.com>

On Wed, Dec 05, 2018 at 12:35:08PM +0900, Junio C Hamano wrote:

> Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:
> 
> > Since 26125f6b9b ("detect broken alternates.", 2006-02-22) we've
> > emitted an error if the alternates directory doesn't exist, but not
> > for the common misstep of adding a path to another git repository as
> > an alternate, as opposed to its "objects" directory.
> >
> > Let's check for this, i.e. whether X/objects or X/.git/objects exists
> > if the user supplies X and print an error (which as a commit leading
> > up to this one shows doesn't change the exit code, just "warns").
> 
> I agree that "Let's check for this" is a good idea, but do not
> necessarily agree with "i.e.".  Don't we have a helper that takes
> the path to an existing directory and answers "Yup, it does look
> like a Git repository"?  Using that is a lot more in line with what
> you claimed to do in the title for this patch.

Hmm. Yeah, one case this does not handle is when ".git" is a git-file
pointing elsewhere, which should trigger the condition, too.

I think we can afford to be a bit loose with this check if it's just
generating a warning for a case that would otherwise not work (and the
worst is that we might fail to correctly diagnose a broken setup). But
that I think points to another issue: this kicks in even if the path is
otherwise usable.

So if had, say, a git repository whose worktree was full of objects and
packfiles, it currently works for me to point to that as an alternate.
But after this patch, we'd complain "wait, this looks like a git repo!".

So I'd much rather see the logic check first for something usable, and
only when we fail to find it, start doing a loose diagnosis. Something
like:

  if !is_directory($path)
	complain that it does not exist, as now
  else if !is_directory($path/pack)
	/*
	 * it doesn't look like an object directory; technically it
	 * _could_ just have loose objects, and maybe we ought to check
	 * for directories matching [0-9a-f]{2}, though it seems
	 * somewhat unlikely these days.
	 */
	if is_directory($path/objects) || exists($path/.git)
		complain that it looks like a git dir
	else
		complain that it doesn't look like an object dir
  fi

Hmm. I managed to write a gross mix of C and shell for my pseudocode,
but hopefully you can read it. ;)

> I haven't read 3/3 yet, but as I said, I suspect it is reasonable to
> DWIM and use the object store associated with the directory we found
> to be a repository.

Yeah, I'm tempted by that, too, though I worry about corner cases. How
much effort should we put into discovery? I guess the rules from
enter_repo() would make sense, though the logic in that function would
need some refactoring to be reused elsewhere.

-Peff

^ permalink raw reply

* Re: [PATCH] rebase docs: fix incorrect format of the section Behavioral Differences
From: Elijah Newren @ 2018-12-05  6:20 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Git Mailing List, Johannes Sixt
In-Reply-To: <xmqqo9a0d3w6.fsf@gitster-ct.c.googlers.com>

On Tue, Dec 4, 2018 at 7:54 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Elijah Newren <newren@gmail.com> writes:
>
> > Gah, when I was rebasing on your patch I adopted your sentence rewrite
> > but forgot to remove the "sometimes".  Thanks for catching; correction:
>
> >
> > -- 8< --
> > Subject: [PATCH v2] git-rebase.txt: update note about directory rename
> >  detection and am
> >
> > In commit 6aba117d5cf7 ("am: avoid directory rename detection when
> > calling recursive merge machinery", 2018-08-29), the git-rebase manpage
> > probably should have also been updated to note the stronger
> > incompatibility between git-am and directory rename detection.  Update
> > it now.
> >
> > Signed-off-by: Elijah Newren <newren@gmail.com>
> > ---
> >  Documentation/git-rebase.txt | 8 ++++++--
> >  1 file changed, 6 insertions(+), 2 deletions(-)
> >
> > diff --git a/Documentation/git-rebase.txt b/Documentation/git-rebase.txt
> > index 41631df6e4..ef76cccf3f 100644
> > --- a/Documentation/git-rebase.txt
> > +++ b/Documentation/git-rebase.txt
> > @@ -569,8 +569,12 @@ it to keep commits that started empty.
> >  Directory rename detection
> >  ~~~~~~~~~~~~~~~~~~~~~~~~~~
> >
> > -The merge and interactive backends work fine with
> > -directory rename detection.  The am backend sometimes does not.
> > +The merge and interactive backends work fine with directory rename
>
> I am not sure "work fine" a fair and correct label, as rename is
> always heuristic.
>
>     The "directory rename detection" heuristic in "merge" and the
>     "interactive" backends can take what happened to paths in the
>     same directory into account when deciding if a disappeared path
>     was "renamed" and to which other path.  The heuristic produces
>     incorrect result when the information given is only about
>     changed paths, which is why it is disabled when using the "am"
>     backend.
>
> perhaps.

The general idea sounds good.  Does adding a few more details help
with understanding, or is it more of an information overload?  I'm
thinking of something like:

     The "directory rename detection" heuristic in the "merge" and
     "interactive" backends can take what happened to paths in the
     same directory on the other side of history into account when
     deciding whether a new path in that directory should instead be
     moved elsewhere.  The heuristic produces incorrect results when
     the only information available is about files which were changed
     on the side of history being rebased, which is why directory
     rename detection is disabled when using the "am" backend.

^ permalink raw reply

* Re: [WIP RFC 3/5] upload-pack: refactor reading of pack-objects out
From: Junio C Hamano @ 2018-12-05  6:30 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git
In-Reply-To: <e19f294df9ff999d30a47339a7848c7104bfae7d.1543879256.git.jonathantanmy@google.com>

Jonathan Tan <jonathantanmy@google.com> writes:

> +struct output_state {
> +	char buffer[8193];
> +	int used;
> +};
> +
> +static int read_pack_objects_stdout(int outfd, struct output_state *os)
> +{

The naming feels quite odd.  We are downstream of pack-objects and
reading its standard output stream, so "read stdout-of-pack-objects"
is not wrong per-se, but it may be just me.  Stepping back and what
the function is really about often helps us to understand what we
are doing.

I think the function you extracted from the existing logic is
responsible for relaying the data read from pack-objects to the
fetch-pack client on the other side of the wire.  So from that point
of view, singling out "read" as if it is a primary thing the
function does is already suboptimal.  Perhaps

	static int relay_pack_data(int in, struct output_state *os)

because the fd is what we "read" from (hence, 'in', not 'out'), and
relaying is what we do and reading is only one half of doing so?

> +	/* Data ready; we keep the last byte to ourselves
> +	 * in case we detect broken rev-list, so that we
> +	 * can leave the stream corrupted.  This is
> +	 * unfortunate -- unpack-objects would happily
> +	 * accept a valid packdata with trailing garbage,
> +	 * so appending garbage after we pass all the
> +	 * pack data is not good enough to signal
> +	 * breakage to downstream.
> +	 */

Yeah, I recall writing this funky and unfortunate logic in 2006.
Perhaps we want to update the comment style to make it a bit more
modern?

The "Data ready;" part of the comment applies more to what the
caller of this logic does (i.e. poll returned and revents indicates
we can read, and that is why we are calling into this logic).  The
remainder is the comment that is relevat to this logic.

> +	ssize_t readsz;
> +
> +	readsz = xread(outfd, os->buffer + os->used,
> +		       sizeof(os->buffer) - os->used);

So we read what we could read in the remaining buffer.

I notice that you alloated 8k+1 bytes; would this code end up
reading that 8k+1 bytes in full under the right condition?  I am
mainly wondering where the need for +1 comes from.

> +	if (readsz < 0) {
> +		return readsz;
> +	}

OK, report an error to the caller by returning negative.  I am
guessing that you'd have more code in this block that currently have
only one statement in the future steps, perhaps.

> +	os->used += readsz;
> +
> +	if (os->used > 1) {
> +		send_client_data(1, os->buffer, os->used - 1);
> +		os->buffer[0] = os->buffer[os->used - 1];

OK, this corresponds to the "*cp++ = buffered"  in the original just
before xread().

> +		os->used = 1;
> +	} else {
> +		send_client_data(1, os->buffer, os->used);
> +		os->used = 0;

I am not sure if the code is correct when os->used happens to be 1
(shouldn't we hold the byte, skip the call to send_client_data(),
and go back to poll() to expect more data?), but this is a faithful
code movement and rewrite of the original.

I've read the caller (below, snipped) and the conversion looked
sensible there as well.  The final flushing of the byte(s) held back
in *os also looks good.

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox