Git development
 help / color / mirror / Atom feed
* Re: [RFC PATCH 3/3] test-lib: add the '--stress' option to run a test repeatedly under load
From: SZEDER Gábor @ 2018-12-04 17:37 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: git, Jeff King
In-Reply-To: <87o9a1z0j5.fsf@evledraar.gmail.com>

On Tue, Dec 04, 2018 at 06:04:14PM +0100, Ævar Arnfjörð Bjarmason wrote:
> 
> On Tue, Dec 04 2018, SZEDER Gábor wrote:
> 
> > The number of parallel invocations is determined by, in order of
> > precedence: the number specified as '--stress=<N>', or the value of
> > the GIT_TEST_STRESS_LOAD environment variable, or twice the number of
> > available processors in '/proc/cpuinfo', or 8.
> 
> With this series we have at least 3 ways to get this number. First
> online_cpus() in the C code, then Peff's recent `getconf
> _NPROCESSORS_ONLN` in doc-diff, and now this /proc/cpuinfo parsing.
> 
> Perhaps it makes sense to split online_cpus() into a helper to use from
> the shellscripts instead?

I don't think so, but I will update this patch to use 'getconf'
instead.


^ permalink raw reply

* Re: [PATCH v3 07/14] checkout: split into switch-branch and restore-files
From: Elijah Newren @ 2018-12-04 17:43 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc
  Cc: Ævar Arnfjörð, Git Mailing List, Junio C Hamano,
	Stefan Beller, Thomas Gummerer, Stefan Xenos
In-Reply-To: <CACsJy8BTs+WKzTTEF2XVTT-LVJk_exYCz_hN+hXU1Dw+oquBpA@mail.gmail.com>

On Tue, Dec 4, 2018 at 8:22 AM Duy Nguyen <pclouds@gmail.com> wrote:
>
> Thanks for all the comments! There are still some I haven't replied
> (either I'll agree and do it anyway, or I'll need some more time to
> digest)
>
> On Tue, Dec 4, 2018 at 1:45 AM Elijah Newren <newren@gmail.com> wrote:
> > > +'git restore-files' [--from=<tree-ish>] <pathspec>...
> >
> > Isn't this already inferred by the previous line?  Or was the
> > inclusion of --from on the previous line in error?  Looking at the
> > git-checkout manpage, it looks like you may have just been copying an
> > existing weirdness, but it needs to be fixed.  ;-)
>
> Hehe.
>
> > > +'git restore-files' (-p|--patch) [--from=<tree-ish>] [<pathspec>...]
> > > +
> > > +DESCRIPTION
> > > +-----------
> > > +Updates files in the working tree to match the version in the index
> > > +or the specified tree.
> > > +
> > > +'git restore-files' [--from=<tree-ish>] <pathspec>...::
> >
> > <tree-ish> and <pathspec>?  I understand <commit-ish> and <pathspec>,
> > or <tree-ish> but have no clue why it'd be okay to specify <tree-ish>
> > and <pathspec> together.  What does that even mean?
> >
> > Also, rather than fixing from <tree-ish> to <commit-ish> or <commit>,
> > perhaps we should just use <revision> here?  (I'm thinking of git
> > rev-parse's "Specifying revisions", which suggests "revisions" as a
> > good substitute for "commit-ish" that isn't quite so weird for new
> > users.)
>
> tree-ish is technically more accurate. But I'm ok with just
> <revision>. If you give it a blob oid then you should get a nice
> explanation what you're doing wrong anyway.

Documenting as <revision> but having it be more general under the hood
and actually accept <tree-ish> sounds good to me.  I just think the
pain of trying to explain <tree-ish> is too much of a hurdle for
users, especially as I expect it to be very unlikely that users will
ever take advantage of it.

> > > +       Overwrite paths in the working tree by replacing with the
> > > +       contents in the index or in the <tree-ish> (most often a
> > > +       commit).  When a <tree-ish> is given, the paths that
> > > +       match the <pathspec> are updated both in the index and in
> > > +       the working tree.
> >
> > Is that the default we really want for this command?  Why do we
> > automatically assume these files are ready for commit?  I understand
> > that it's what checkout did, but I'd find it more natural to only
> > affect the working tree by default.  We can give it an option for
> > affecting the index instead (or perhaps in addition).
>
> Yeah, that behavior of updating the index always bothers me when I use
> it but I seemed to forget when working on this.
>
> > > +--ours::
> > > +--theirs::
> > > +       Check out stage #2 ('ours') or #3 ('theirs') for unmerged
> > > +       paths.
> > > ++
> > > +Note that during `git rebase` and `git pull --rebase`, 'ours' and
> > > +'theirs' may appear swapped; `--ours` gives the version from the
> > > +branch the changes are rebased onto, while `--theirs` gives the
> > > +version from the branch that holds your work that is being rebased.
> > > ++
> > > +This is because `rebase` is used in a workflow that treats the
> > > +history at the remote as the shared canonical one, and treats the
> > > +work done on the branch you are rebasing as the third-party work to
> > > +be integrated, and you are temporarily assuming the role of the
> > > +keeper of the canonical history during the rebase.  As the keeper of
> > > +the canonical history, you need to view the history from the remote
> > > +as `ours` (i.e. "our shared canonical history"), while what you did
> > > +on your side branch as `theirs` (i.e. "one contributor's work on top
> > > +of it").
> >
> > Total aside because I'm not sure what you could change here, but man
> > do I hate this.
>
> Uh it's actually documented? I'm always confused by this too. --ours
> and --theirs at this point are pretty much tied to stage 2 and 3.
> Nothing I can do about it. But if you could come up with some other
> option names, then we could make "new ours" to be stage 3 during
> rebase, for example.

I don't think it's a naming issue, personally.  Years ago we could
have defined --ours and --theirs differently based on which kind of
operation we were in the middle of, but you are probably right that
they are now tied to stage 2 and 3.  But there's another place that we
might still be able to address this; I think the brain-damage here may
have just been due to the fact that the recursive merge machinery was
rather inflexible and required HEAD to be stage 2.  If it were a
little more flexible, then we might just be able to make this problem
go away.  Maybe it can still be fixed (I haven't dug too deeply into
it), but if so, the only fix needed here would be to remove this long
explanation about why the tool gets things totally backward.

> > > +Part of the linkgit:git[1] suite
> >
> >
> > My single biggest worry about this whole series is that I'm worried
> > you're perpetuating and even further ingraining one of the biggest
> > usability problems with checkout: people suggest and use it for
> > reverting/restoring paths to a previous version, but it doesn't do
> > that:
> >
> > git restore-files --from master~10 Documentation/
> > <edit some non-documentation files>
> > git add -u
> > git commit -m "Rationale for changing files including reverting Documentation/"
> >
> > In particular, now you have a mixture of files in Documentation/ from
> > master~10 (er, now likely master~11) and HEAD~1; any files and
> > sub-directories that existed in HEAD~1 still remain and are mixed with
> > all other files in Documentation/ from the older commit.
> >
> > You may think this is a special case, but this particular issue
> > results in some pretty bad surprises.  Also, it was pretty surprising
> > to me just how difficult it was to implement an svn-like revert in
> > EasyGit, in large part because of this 'oversight' in git.  git
> > checkout -- <paths> to me has always been fundamentally wrong, but I
> > just wasn't sure if I wanted to fight the backward compatibility
> > battle and suggest changing it.  With a new command, we definitely
> > shouldn't be reinforcing this error.  (And maybe we should consider
> > taking the time to fix git checkout too.)
>
> What would be the right behavior for
>
>  git restore-files --from=master~10 Documentation/
>
> then? Consider it an error? I often use "git checkout HEAD" and "git
> checkout HEAD^" (usually with -p) but not very far back like
> master~10.

Well, when you use a file rather than a directory:
  git restore-files --from=master~10 foo.c
then you expect
  git diff master~10 foo.c
to come back empty afterward.  I expect the same if I give a directory
rather than a file.  (Even if it does make 'restore-files' feel
slightly mis-named.)

> > > +If the branch exists in multiple remotes and one of them is named by
> > > +the `checkout.defaultRemote` configuration variable, we'll use that
> > > +one for the purposes of disambiguation, even if the `<branch>` isn't
> > > +unique across all remotes. Set it to
> > > +e.g. `checkout.defaultRemote=origin` to always checkout remote
> > > +branches from there if `<branch>` is ambiguous but exists on the
> > > +'origin' remote. See also `checkout.defaultRemote` in
> > > +linkgit:git-config[1].
> >
> > So switch-branch will be controlled by checkout.* config variables?
> > That probably makes the most sense, but it does dilute the advantage
> > of adding these simpler commands.
> >
> > Also, the fact that we're trying to make a simpler command makes me
> > think that removing the auto-vivify behavior from the default and
> > adding a simple flag which users can pass to request will allow this
> > part of the documentation to be hidden behind the appropriate flag,
> > which may make it easier for users to compartmentalize the command and
> > it's options, enabling them to learn as they go.
>
> Sounds good. I don't know a good name for this new option though so
> unless anybody comes up with some suggestion, I'll just disable
> checkout.defaultRemote in switch-branch. If it comes back as a new
> option, it can always be added later.
>
> > > +'git switch-branch' -c|-C <new_branch> [<start_point>]::
> > > +
> > > +       Specifying `-c` causes a new branch to be created as if
> > > +       linkgit:git-branch[1] were called and then switched to. In
> > > +       this case you can use the `--track` or `--no-track` options,
> > > +       which will be passed to 'git branch'.  As a convenience,
> > > +       `--track` without `-c` implies branch creation; see the
> > > +       description of `--track` below.
> >
> > Can we get rid of --track/--no-track and just provide a flag (which
> > takes no arguments) for the user to use?  Problems with --track:
> >   * it's not even in your synopsis
> >   * user has to repeat themselves (e.g. 'next' in two places from '-c
> > next --track origin/next'); this repetition is BOTH laborious AND
> > error-prone
> >   * it's rather inconsistent: --track is the default due to
> > auto-vivify when the user specifies nothing but a branch name that
> > doesn't exist yet, but when the user realizes the branch doesn't exist
> > yet and asks to have it created then suddenly tracking is not the
> > default??
>
> I don't think --track is default anymore (maybe I haven't updated the
> man page correctly). The dwim behavior is only activated in
> switch-branch when you specify --guess to reduce the amount of magic
> we throw at the user. With that in mind, do we still hide
> --track/--no-track from switch-branch?

Ooh, you're adding --guess?  Cool, that addresses my concerns, just in
a different manner.

Personally, I'd leave --track/--no-track out.  It's extra mental
overhead, git branch has options for setting those if they need some
special non-default setup, and if there is enough demand for it we can
add it later.  Removing options once published is much harder.

> > I'm not sure what's best, but here's some food for thought:
> >
> >
> >    git switch-branch <branch>
> > switches to <branch>, if it exists.  Error cases:
> >   * If <branch> isn't actually a branch but a <tag> or
> > <remote-tracking-branch> or <revision>, error out and suggest using
> > --detach.
> >   * If <branch> isn't actually a branch but there is a similarly named
> > <remote-tracking-branch> (e.g. origin/<branch>), then suggest using
> > -c.
>
> I would make these advice so I can hide them. Or if I manage to make
> all these hints one line then I'll make it unconditional.
>
> >   git switch-branch -c <branch>
> > creates <branch> and, if a relevant-remote-tracking branch exists,
> > base the branch on that revision and set the new branch up to track
>
> Hmm.. this is a bit magical and could be surprising. If I create (and
> switch to) a new branch foo, I don't necessarily mean tracking
> origin/foo (I may not even think about origin/foo when I type the
> command). So tentatively no.

Yeah, if you're adding --guess then I'm happy.  I do think, though,
that if the user runs switch-branch to a branch that doesn't exist, we
should check if there is an associated remote-tracking branch so that
we can provide a better error message and help users learn about
--guess.  (Also, will there be a short -g form?)

>
> > > +If `-C` is given, <new_branch> is created if it doesn't exist;
> > > +otherwise, it is reset. This is the transactional equivalent of
> > > ++
> > > +------------
> > > +$ git branch -f <branch> [<start_point>]
> > > +$ git switch-branch <branch>
> > > +------------
> > > ++
> > > +that is to say, the branch is not reset/created unless "git
> > > +switch-branch" is successful.
> >
> > ...and when exactly would it fail?  Reading this, it looks like the
> > only possible error condition was removed due saying we'll reset the
> > branch if it already exists, so it's rather confusing.
>
> Yeah probably just scrape it. The atomic nature is not worth highlighting.
>
>
> > > +'git switch-branch' --detach [<commit>]::
> > > +
> > > +       Prepare to work on a unnamed branch on top of <commit> (see
> > > +       "DETACHED HEAD" section), and updating the index and the files
> > > +       in the working tree.  Local modifications to the files in the
> > > +       working tree are kept, so that the resulting working tree will
> > > +       be the state recorded in the commit plus the local
> > > +       modifications.
> > > ++
> > > +When the <commit> argument is a branch name, the `--detach` option can
> > > +be used to detach HEAD at the tip of the branch (`git switch-branch
> > > +<branch>` would check out that branch without detaching HEAD).
> > > ++
> > > +Omitting <commit> detaches HEAD at the tip of the current branch.
> > > +
> > > +OPTIONS
> > > +-------
> > > +-q::
> > > +--quiet::
> > > +       Quiet, suppress feedback messages.
> > > +
> > > +--[no-]progress::
> > > +       Progress status is reported on the standard error stream
> > > +       by default when it is attached to a terminal, unless `--quiet`
> > > +       is specified. This flag enables progress reporting even if not
> > > +       attached to a terminal, regardless of `--quiet`.
> > > +
> > > +-f::
> > > +--force::
> > > +       Proceed even if the index or the working tree differs from
> > > +       HEAD.  This is used to throw away local changes.
> >
> > Haven't thought through this thoroughly, but do we really need an
> > option for that instead of telling users to 'git reset --hard HEAD'
> > before switching branches if they want their stuff thrown away?
>
> For me it's just a bit more convenient. Hit an error when switching
> branch? Recall the command from bash history, stick -f in it and run.
> Elsewhere I think both Junio and Thomas (or maybe only Junio) suggests
> moving the "git reset" functionality without moving HEAD to one of
> these commands, which goes the opposite direction...

Fair enough.

> > > +-c <new_branch>::
> > > +--create <new_branch>::
> > > +       Create a new branch named <new_branch> and start it at
> > > +       <start_point>; see linkgit:git-branch[1] for details.
> > > +
> > > +-C <new_branch>::
> > > +--force-create <new_branch>::
> > > +       Creates the branch <new_branch> and start it at <start_point>;
> > > +       if it already exists, then reset it to <start_point>. This is
> > > +       equivalent to running "git branch" with "-f"; see
> > > +       linkgit:git-branch[1] for details.
> >
> > Makes sense, but let's get the -b/-B vs. -c/-C consistent.
>
> Another option I'm considering is -n/-N (for _new_ branch). Maybe
> -c/-C is good enough.

Actually, -n/-N seems like a good idea, especially since --guess also
creates a branch.  If we do stick with -c/-C, then we may need to
document --guess as implying -c to avoid confusion (and then make sure
we get the synopsis correct to show which flags can be used together).

> > > +-l::
> > > +       Create the new branch's reflog; see linkgit:git-branch[1] for
> > > +       details.
> >
> > ??  Jettison this.
>
> Yep. It looks weird to me too. reflog is just behind the scene these
> days. Nobody need to explicitly ask for reflog anymore.
>
> > > +--orphan <new_branch>::
> > > +       Create a new 'orphan' branch, named <new_branch>, started from
> > > +       <start_point> and switch to it.  The first commit made on this
> >
> > What??  started from <start_point>?  The whole point of --orphan is
> > you have no parent, i.e. no start point.  Also, why does the
> > explanation reference an argument that wasn't in the immediately
> > preceding synopsis?
>
> I guess bad phrasing. It should be "switch to <start_point> first,
> then prepare the worktree so that the first commit will have no
> parent". Or something along that line.
>
> You should really review git-checkout.txt btw ;-)

I did after writing several of these comments, and yeah, it really
needs a clean up.  Seems like something someone would do when writing
a (partial) replacement or simplified alternative.  ;-)

To be fair though, I suspect anyone familiar enough with git that
looks at git-checkout.txt again is probably going to miss at least one
thing that is bad for new users no matter how closely they look, just
because they've grown accustomed to the documentation as it is.  I was
trying to help point out possible issues that I spotted, but I suspect
others may be able to point out more.

> > > +       new branch will have no parents and it will be the root of a new
> > > +       history totally disconnected from all the other branches and
> > > +       commits.
> > > ++
> > > +The index and the working tree are adjusted as if you had previously run
> > > +"git checkout <start_point>".  This allows you to start a new history
> > > +that records a set of paths similar to <start_point> by easily running
> > > +"git commit -a" to make the root commit.
> > > ++
> > > +This can be useful when you want to publish the tree from a commit
> > > +without exposing its full history. You might want to do this to publish
> > > +an open source branch of a project whose current tree is "clean", but
> > > +whose full history contains proprietary or otherwise encumbered bits of
> > > +code.
> > > ++
> > > +If you want to start a disconnected history that records a set of paths
> > > +that is totally different from the one of <start_point>, then you should
> > > +clear the index and the working tree right after creating the orphan
> > > +branch by running "git rm -rf ." from the top level of the working tree.
> > > +Afterwards you will be ready to prepare your new files, repopulating the
> > > +working tree, by copying them from elsewhere, extracting a tarball, etc.
> >
> > Ick.  Seems overly complex.  I'd rather that --orphan defaulted to
> > clearing the index and working tree, and that one would need to pass
> > HEAD for <start_point> if you wanted to start out with all those other
> > files.  That would certainly make the explanation a little clearer to
> > users, and more natural when they start experimenting with it.
> >
> > However, --orphan is pretty special case.  Do we perhaps want to leave
> > it out of this new command and only include it in checkout?
>
> I started this by simply splitting git-checkout in two commands that,
> combined, can do everything git-checkout can. Then suggestions to have
> better default came in and I think we started to drift further to
> _removing_ options and falling back to git-checkout.
>
> I think we could still keep "complicated" options as long as they are
> clearly described and don't surprise users until they figure them out.
> That way I don't have to go back to git-checkout and deal with all the
> ambiguation it creates.

Fair enough...though I think it may make sense to also review the
complicated options and determine if they are overly complicated.  I
think --orphan qualifies (I stumbled with it a bit for years the
occasional time I needed to use it), and my small suggestion above
would simplify both it and its description.  We should probably also
consider just removing <start_point> as an acceptable argument to
--orphan; if people want files from some revision after creating an
orphan branch that's a simple extra command.

> > > +-m::
> > > +--merge::
> > > +       If you have local modifications to one or more files that are
> > > +       different between the current branch and the branch to which
> > > +       you are switching, the command refuses to switch branches in
> > > +       order to preserve your modifications in context.  However,
> > > +       with this option, a three-way merge between the current
> > > +       branch, your working tree contents, and the new branch is
> > > +       done, and you will be on the new branch.
> > > ++
> > > +When a merge conflict happens, the index entries for conflicting
> > > +paths are left unmerged, and you need to resolve the conflicts
> > > +and mark the resolved paths with `git add` (or `git rm` if the merge
> > > +should result in deletion of the path).
> > > +
> > > +--conflict=<style>::
> > > +       The same as --merge option above, but changes the way the
> > > +       conflicting hunks are presented, overriding the
> > > +       merge.conflictStyle configuration variable.  Possible values are
> > > +       "merge" (default) and "diff3" (in addition to what is shown by
> > > +       "merge" style, shows the original contents).
> > > +
> > > +--ignore-other-worktrees::
> > > +       `git switch-branch` refuses when the wanted ref is already
> > > +       checked out by another worktree. This option makes it check
> > > +       the ref out anyway. In other words, the ref can be held by
> > > +       more than one worktree.
> >
> > seems rather dangerous...is the goal to be an easier-to-use suggestion
> > for new users while checkout continues to exist, or is this command
> > meant to handle all branch switching functionality that checkout has?
>
> As explained above. I'm still thinking the latter, but with fewer
> surprises and confusion. Though I guess I could be convinced to go
> with the former (the problem with the former is, even as a
> no-longer-new user, I still find git-checkout not that pleasant to use
> and want a better replacement)
>
> > > +<branch>::
> > > +       Branch to checkout; if it refers to a branch (i.e., a name that,
> > > +       when prepended with "refs/heads/", is a valid ref), then that
> > > +       branch is checked out. Otherwise, if it refers to a valid
> > > +       commit, your HEAD becomes "detached" and you are no longer on
> > > +       any branch (see below for details).
> >
> > I thought we requiring --detach in order to detach.  Does this
> > paragraph need updating?  Also, if we require --detach when we'll be
> > detaching HEAD, then this paragraph gets a LOT simpler.
>
> Yep. I really need to read through the document and update all of it.
>
> > > +You can use the `"@{-N}"` syntax to refer to the N-th last
> > > +branch/commit checked out using "git checkout" operation. You may
> > > +also specify `-` which is synonymous to `"@{-1}`.
> > > ++
> > > +As a special case, you may use `"A...B"` as a shortcut for the
> > > +merge base of `A` and `B` if there is exactly one merge base. You can
> > > +leave out at most one of `A` and `B`, in which case it defaults to `HEAD`.
> >
> > I actually didn't know about the A...B special case for checkout.
> > Interesting...but I'm starting to wonder if this is too much info for
> > a "simplified command".
>
> I could just hint about A...B and send the user to git-checkout.txt if
> they need to know more. They can learn about git-checkout that way
> too.

It may be fine to stay.  Lots of my comments were just "let's try to
note any weirdness or complication that might seem excessive so we at
least have a conversation about it" (because it's easy to gloss over
since we've looked at these documents so many times) rather than a
"this definitely needs to go".

^ permalink raw reply

* Re: [PATCH/RFC v3 00/14] Introduce new commands switch-branch and restore-files
From: Elijah Newren @ 2018-12-04 17:45 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc
  Cc: Ævar Arnfjörð, Git Mailing List, Junio C Hamano,
	Stefan Beller, Thomas Gummerer, Stefan Xenos
In-Reply-To: <CACsJy8DEMHFTnL2QJu5Csb1jUQeu0HiT3rTDii4krrEJcoh=Qw@mail.gmail.com>

On Tue, Dec 4, 2018 at 8:28 AM Duy Nguyen <pclouds@gmail.com> wrote:
>
> On Tue, Dec 4, 2018 at 2:29 AM Elijah Newren <newren@gmail.com> wrote:
> >
> > On Thu, Nov 29, 2018 at 2:01 PM Nguyễn Thái Ngọc Duy <pclouds@gmail.com> wrote:
> > >
> > > v3 sees switch-branch go back to switch-branch (in v2 it was
> > > checkout-branch). checkout-files is also renamed restore-files (v1 was
> > > restore-paths). Hopefully we won't see another rename.
> >
> > I started reading through the patches.  I also tried to apply them
> > locally, but they had conflicts or missing base file version on both
> > master and next.  What version did you base it on?
>
> I think nd/checkout-dwim-fix because of a non-trivial conflict there
> (but I don't remember when I noticed it and rebased on that). Anyway
> you can get the whole series at
>
> https://gitlab.com/pclouds/git/tree/switch-branch-and-checkout-files
>
> It fixes some of your comments already, a couple of bug fixes here and
> there and in a good-enough shape that I start actually using it.

Cool.

> > > - Two more fancy features (the "git checkout --index" being the
> > >   default mode and the backup log for accidental overwrites) are of
> > >   course still missing. But they are coming.
> > >
> > > I did not go replace "detached HEAD" with "unnamed branch" (or "no
> > > branch") everywhere because I think a unique term is still good to
> > > refer to this concept. Or maybe "no branch" is good enough. I dunno.
> >
> > I personally like "unnamed branch", but "no branch" would still be
> > better than "detached HEAD".
>
> Haven't really worked on killing the term "detached HEAD" yet. But I
> noticed the other day that git-branch reports
>
> * (HEAD detached from 703266f6e4)
>
> and I didn't know how to rephrase that. I guess "unnamed branch from
> 703266f6e4" is probably good enough but my old-timer brain screams no.

Perhaps "* (On an unnamed branch, at 703266f6e4)"?

^ permalink raw reply

* Re: [RFC PATCH 3/3] test-lib: add the '--stress' option to run a test repeatedly under load
From: Ævar Arnfjörð Bjarmason @ 2018-12-04 18:11 UTC (permalink / raw)
  To: SZEDER Gábor; +Cc: git, Jeff King
In-Reply-To: <20181204163457.15717-4-szeder.dev@gmail.com>


On Tue, Dec 04 2018, SZEDER Gábor wrote:

> Unfortunately, we have a few flaky tests, whose failures tend to be
> hard to reproduce.  We've found that the best we can do to reproduce
> such a failure is to run the test repeatedly while the machine is
> under load, and wait in the hope that the load creates enough variance
> in the timing of the test's commands that a failure is evenually
> triggered.  I have a command to do that, and I noticed that two other
> contributors have rolled their own scripts to do the same, all
> choosing slightly different approaches.
>
> To help reproduce failures in flaky tests, introduce the '--stress'
> option to run a test script repeatedly in multiple parallel
> invocations until one of them fails, thereby using the test script
> itself to increase the load on the machine.
>
> The number of parallel invocations is determined by, in order of
> precedence: the number specified as '--stress=<N>', or the value of
> the GIT_TEST_STRESS_LOAD environment variable, or twice the number of
> available processors in '/proc/cpuinfo', or 8.
>
> To prevent the several parallel invocations of the same test from
> interfering with each other:
>
>   - Include the parallel job's number in the name of the trash
>     directory and the various output files under 't/test-results/' as
>     a '.stress-<Nr>' suffix.
>
>   - Add the parallel job's number to the port number specified by the
>     user or to the test number, so even tests involving daemons
>     listening on a TCP socket can be stressed.
>
>   - Make '--stress' imply '--verbose-log' and discard the test's
>     standard ouput and error; dumping the output of several parallel
>     tests to the terminal would create a big ugly mess.
>
> 'wait' for all parallel jobs before exiting (either because a failure
> was found or because the user lost patience and aborted the stress
> test), allowing the still running tests to finish.  Otherwise the "OK
> X.Y" progress output from the last iteration would likely arrive after
> the user got back the shell prompt, interfering with typing in the
> next command.  OTOH, this waiting might induce a considerable delay
> between hitting ctrl-C and the test actually exiting; I'm not sure
> this is the right tradeoff.

I think it makes sense to generalize this and split it up into two
features.

It's a frequent annoyance of mine in the test suite that I'm
e.g. running t*.sh with some parallel "prove" in one screen, and then I
run tABCD*.sh manually, and get unlucky because they use the same trash
dir, and both tests go boom.

You can fix that with --root, which is much of what this patch does. My
one-liner for doing --stress has been something like:

    perl -E 'say ++$_ while 1' | parallel --jobs=100% --halt-on-error soon,fail=1 './t0000-basic.sh --root=trash-{} -v'

But it would be great if I didn't have to worry about that and could
just run two concurrent:

    ./t0000-basic.sh

So I think we could just set some env variable where instead of having
the predictable trash directory we have a $TRASHDIR.$N as this patch
does, except we pick $N by locking some test-runs/tABCD.Nth file with
flock() during the run.

Then a stress mode like this would just be:

    GIT_TEST_FLOCKED_TRASH_DIRS=1 perl -E 'say ++$_ while 1' | parallel --jobs=100% --halt-on-error soon,fail=1 './t0000-basic.sh'

And sure, we could ship a --stress option too, but it wouldn't be
magical in any way, just another "spawn N in a loop" implementation, but
you could also e.g. use GNU parallel to drive it, and without needing to
decide to stress test in advance, since we'd either flock() the trash
dir, or just mktemp(1)-it.

^ permalink raw reply

* Re: [PATCH v3 07/14] checkout: split into switch-branch and restore-files
From: Duy Nguyen @ 2018-12-04 18:17 UTC (permalink / raw)
  To: Elijah Newren
  Cc: Ævar Arnfjörð Bjarmason, Git Mailing List,
	Junio C Hamano, Stefan Beller, Thomas Gummerer, Stefan Xenos
In-Reply-To: <CABPp-BGRcaiiD-aks1kaLr7ATLQ_oGSyooQBDD+2acgerA+Phg@mail.gmail.com>

On Tue, Dec 4, 2018 at 6:43 PM Elijah Newren <newren@gmail.com> wrote:
> > > > +--ours::
> > > > +--theirs::
> > > > +       Check out stage #2 ('ours') or #3 ('theirs') for unmerged
> > > > +       paths.
> > > > ++
> > > > +Note that during `git rebase` and `git pull --rebase`, 'ours' and
> > > > +'theirs' may appear swapped; `--ours` gives the version from the
> > > > +branch the changes are rebased onto, while `--theirs` gives the
> > > > +version from the branch that holds your work that is being rebased.
> > > > ++
> > > > +This is because `rebase` is used in a workflow that treats the
> > > > +history at the remote as the shared canonical one, and treats the
> > > > +work done on the branch you are rebasing as the third-party work to
> > > > +be integrated, and you are temporarily assuming the role of the
> > > > +keeper of the canonical history during the rebase.  As the keeper of
> > > > +the canonical history, you need to view the history from the remote
> > > > +as `ours` (i.e. "our shared canonical history"), while what you did
> > > > +on your side branch as `theirs` (i.e. "one contributor's work on top
> > > > +of it").
> > >
> > > Total aside because I'm not sure what you could change here, but man
> > > do I hate this.
> >
> > Uh it's actually documented? I'm always confused by this too. --ours
> > and --theirs at this point are pretty much tied to stage 2 and 3.
> > Nothing I can do about it. But if you could come up with some other
> > option names, then we could make "new ours" to be stage 3 during
> > rebase, for example.
>
> I don't think it's a naming issue, personally.  Years ago we could
> have defined --ours and --theirs differently based on which kind of
> operation we were in the middle of, but you are probably right that
> they are now tied to stage 2 and 3.  But there's another place that we
> might still be able to address this; I think the brain-damage here may
> have just been due to the fact that the recursive merge machinery was
> rather inflexible and required HEAD to be stage 2.  If it were a
> little more flexible, then we might just be able to make this problem
> go away.  Maybe it can still be fixed (I haven't dug too deeply into
> it), but if so, the only fix needed here would be to remove this long
> explanation about why the tool gets things totally backward.

Aha. I' not really deep in this merge business to know if stages 2 and
3 can be swapped. This is right up your alley. I'll just leave it to
you.

> > > > +'git switch-branch' -c|-C <new_branch> [<start_point>]::
> > > > +
> > > > +       Specifying `-c` causes a new branch to be created as if
> > > > +       linkgit:git-branch[1] were called and then switched to. In
> > > > +       this case you can use the `--track` or `--no-track` options,
> > > > +       which will be passed to 'git branch'.  As a convenience,
> > > > +       `--track` without `-c` implies branch creation; see the
> > > > +       description of `--track` below.
> > >
> > > Can we get rid of --track/--no-track and just provide a flag (which
> > > takes no arguments) for the user to use?  Problems with --track:
> > >   * it's not even in your synopsis
> > >   * user has to repeat themselves (e.g. 'next' in two places from '-c
> > > next --track origin/next'); this repetition is BOTH laborious AND
> > > error-prone
> > >   * it's rather inconsistent: --track is the default due to
> > > auto-vivify when the user specifies nothing but a branch name that
> > > doesn't exist yet, but when the user realizes the branch doesn't exist
> > > yet and asks to have it created then suddenly tracking is not the
> > > default??
> >
> > I don't think --track is default anymore (maybe I haven't updated the
> > man page correctly). The dwim behavior is only activated in
> > switch-branch when you specify --guess to reduce the amount of magic
> > we throw at the user. With that in mind, do we still hide
> > --track/--no-track from switch-branch?
>
> Ooh, you're adding --guess?  Cool, that addresses my concerns, just in
> a different manner.

No it's always there even in git-checkout, just hidden.

> Personally, I'd leave --track/--no-track out.  It's extra mental
> overhead, git branch has options for setting those if they need some
> special non-default setup, and if there is enough demand for it we can
> add it later.  Removing options once published is much harder.

Slightly less convenient (you would need a combination of git-branch
and git-switch-branch, if you avoid git-checkout). But since I don't
think I have ever used this option, I'm fine with leaving it out and
considering adding it back later.

> > > I'm not sure what's best, but here's some food for thought:
> > >
> > >
> > >    git switch-branch <branch>
> > > switches to <branch>, if it exists.  Error cases:
> > >   * If <branch> isn't actually a branch but a <tag> or
> > > <remote-tracking-branch> or <revision>, error out and suggest using
> > > --detach.
> > >   * If <branch> isn't actually a branch but there is a similarly named
> > > <remote-tracking-branch> (e.g. origin/<branch>), then suggest using
> > > -c.
> >
> > I would make these advice so I can hide them. Or if I manage to make
> > all these hints one line then I'll make it unconditional.
> >
> > >   git switch-branch -c <branch>
> > > creates <branch> and, if a relevant-remote-tracking branch exists,
> > > base the branch on that revision and set the new branch up to track
> >
> > Hmm.. this is a bit magical and could be surprising. If I create (and
> > switch to) a new branch foo, I don't necessarily mean tracking
> > origin/foo (I may not even think about origin/foo when I type the
> > command). So tentatively no.
>
> Yeah, if you're adding --guess then I'm happy.  I do think, though,
> that if the user runs switch-branch to a branch that doesn't exist, we
> should check if there is an associated remote-tracking branch so that
> we can provide a better error message and help users learn about
> --guess.  (Also, will there be a short -g form?)

Yeah better error and suggestion is a good idea. And yes the short
form -g is already added (I did try to use it and find --guess too
time consuming even with bash completion support).

> > > > +-f::
> > > > +--force::
> > > > +       Proceed even if the index or the working tree differs from
> > > > +       HEAD.  This is used to throw away local changes.
> > >
> > > Haven't thought through this thoroughly, but do we really need an
> > > option for that instead of telling users to 'git reset --hard HEAD'
> > > before switching branches if they want their stuff thrown away?
> >
> > For me it's just a bit more convenient. Hit an error when switching
> > branch? Recall the command from bash history, stick -f in it and run.
> > Elsewhere I think both Junio and Thomas (or maybe only Junio) suggests
> > moving the "git reset" functionality without moving HEAD to one of
> > these commands, which goes the opposite direction...
>
> Fair enough.

I'm actually still not sure how to move it here (I guess 'here' is
restore-files since we won't move HEAD). All the --mixed, --merge and
--hard are confusing. But maybe we could just make 'git restore-files
--from HEAD -f :/" behave just like "git reset --hard HEAD" (but with
some safety net) But we can leave it for discussion in the next round.

> > > > +--orphan <new_branch>::
> > > > +       Create a new 'orphan' branch, named <new_branch>, started from
> > > > +       <start_point> and switch to it.  The first commit made on this
> > >
> > > What??  started from <start_point>?  The whole point of --orphan is
> > > you have no parent, i.e. no start point.  Also, why does the
> > > explanation reference an argument that wasn't in the immediately
> > > preceding synopsis?
> >
> > I guess bad phrasing. It should be "switch to <start_point> first,
> > then prepare the worktree so that the first commit will have no
> > parent". Or something along that line.
> >
> > You should really review git-checkout.txt btw ;-)
>
> I did after writing several of these comments, and yeah, it really
> needs a clean up.  Seems like something someone would do when writing
> a (partial) replacement or simplified alternative.  ;-)

Heh ;-) Fine I'll do it. I have to read and re-read git-checkout.txt
anyway and already queued up a couple small fixes.

> > > > +       new branch will have no parents and it will be the root of a new
> > > > +       history totally disconnected from all the other branches and
> > > > +       commits.
> > > > ++
> > > > +The index and the working tree are adjusted as if you had previously run
> > > > +"git checkout <start_point>".  This allows you to start a new history
> > > > +that records a set of paths similar to <start_point> by easily running
> > > > +"git commit -a" to make the root commit.
> > > > ++
> > > > +This can be useful when you want to publish the tree from a commit
> > > > +without exposing its full history. You might want to do this to publish
> > > > +an open source branch of a project whose current tree is "clean", but
> > > > +whose full history contains proprietary or otherwise encumbered bits of
> > > > +code.
> > > > ++
> > > > +If you want to start a disconnected history that records a set of paths
> > > > +that is totally different from the one of <start_point>, then you should
> > > > +clear the index and the working tree right after creating the orphan
> > > > +branch by running "git rm -rf ." from the top level of the working tree.
> > > > +Afterwards you will be ready to prepare your new files, repopulating the
> > > > +working tree, by copying them from elsewhere, extracting a tarball, etc.
> > >
> > > Ick.  Seems overly complex.  I'd rather that --orphan defaulted to
> > > clearing the index and working tree, and that one would need to pass
> > > HEAD for <start_point> if you wanted to start out with all those other
> > > files.  That would certainly make the explanation a little clearer to
> > > users, and more natural when they start experimenting with it.
> > >
> > > However, --orphan is pretty special case.  Do we perhaps want to leave
> > > it out of this new command and only include it in checkout?
> >
> > I started this by simply splitting git-checkout in two commands that,
> > combined, can do everything git-checkout can. Then suggestions to have
> > better default came in and I think we started to drift further to
> > _removing_ options and falling back to git-checkout.
> >
> > I think we could still keep "complicated" options as long as they are
> > clearly described and don't surprise users until they figure them out.
> > That way I don't have to go back to git-checkout and deal with all the
> > ambiguation it creates.
>
> Fair enough...though I think it may make sense to also review the
> complicated options and determine if they are overly complicated.  I
> think --orphan qualifies (I stumbled with it a bit for years the
> occasional time I needed to use it), and my small suggestion above
> would simplify both it and its description.  We should probably also
> consider just removing <start_point> as an acceptable argument to
> --orphan; if people want files from some revision after creating an
> orphan branch that's a simple extra command.

It is good that you pointed it out though. I still have to digest this
option before I make more comments, but generally if there's a simpler
(even if new) way to achieve the same thing, I'm all for it.
-- 
Duy

^ permalink raw reply

* Re: [PATCH/RFC v3 00/14] Introduce new commands switch-branch and restore-files
From: Duy Nguyen @ 2018-12-04 18:22 UTC (permalink / raw)
  To: Elijah Newren
  Cc: Ævar Arnfjörð Bjarmason, Git Mailing List,
	Junio C Hamano, Stefan Beller, Thomas Gummerer, Stefan Xenos
In-Reply-To: <CABPp-BH=rsLqq4ZRMSUv6n0n5p=aMZs-+VkVT=7P8n4=iUk=-Q@mail.gmail.com>

On Tue, Dec 4, 2018 at 6:45 PM Elijah Newren <newren@gmail.com> wrote:
> > > > - Two more fancy features (the "git checkout --index" being the
> > > >   default mode and the backup log for accidental overwrites) are of
> > > >   course still missing. But they are coming.
> > > >
> > > > I did not go replace "detached HEAD" with "unnamed branch" (or "no
> > > > branch") everywhere because I think a unique term is still good to
> > > > refer to this concept. Or maybe "no branch" is good enough. I dunno.
> > >
> > > I personally like "unnamed branch", but "no branch" would still be
> > > better than "detached HEAD".
> >
> > Haven't really worked on killing the term "detached HEAD" yet. But I
> > noticed the other day that git-branch reports
> >
> > * (HEAD detached from 703266f6e4)
> >
> > and I didn't know how to rephrase that. I guess "unnamed branch from
> > 703266f6e4" is probably good enough but my old-timer brain screams no.
>
> Perhaps "* (On an unnamed branch, at 703266f6e4)"?

This 703266f6e4 is the fork point. Once you start adding more commits
on top of this unnamed branch, I find it hard to define it "at"
703266f6e4 anymore. "forked from 703266f6e4" (or even starting/growing
from...) is probably clearest but also a bit longer.
-- 
Duy

^ permalink raw reply

* Re: [PATCH/RFC v3 00/14] Introduce new commands switch-branch and restore-files
From: Elijah Newren @ 2018-12-04 18:31 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc
  Cc: Ævar Arnfjörð, Git Mailing List, Junio C Hamano,
	Stefan Beller, Thomas Gummerer, Stefan Xenos
In-Reply-To: <CACsJy8BSm945_hqwT3MSW2H_1so1KwrW_p1zz3V-fObwyGNUjw@mail.gmail.com>

On Tue, Dec 4, 2018 at 10:22 AM Duy Nguyen <pclouds@gmail.com> wrote:
>
> On Tue, Dec 4, 2018 at 6:45 PM Elijah Newren <newren@gmail.com> wrote:
> > > > > - Two more fancy features (the "git checkout --index" being the
> > > > >   default mode and the backup log for accidental overwrites) are of
> > > > >   course still missing. But they are coming.
> > > > >
> > > > > I did not go replace "detached HEAD" with "unnamed branch" (or "no
> > > > > branch") everywhere because I think a unique term is still good to
> > > > > refer to this concept. Or maybe "no branch" is good enough. I dunno.
> > > >
> > > > I personally like "unnamed branch", but "no branch" would still be
> > > > better than "detached HEAD".
> > >
> > > Haven't really worked on killing the term "detached HEAD" yet. But I
> > > noticed the other day that git-branch reports
> > >
> > > * (HEAD detached from 703266f6e4)
> > >
> > > and I didn't know how to rephrase that. I guess "unnamed branch from
> > > 703266f6e4" is probably good enough but my old-timer brain screams no.
> >
> > Perhaps "* (On an unnamed branch, at 703266f6e4)"?
>
> This 703266f6e4 is the fork point. Once you start adding more commits
> on top of this unnamed branch, I find it hard to define it "at"
> 703266f6e4 anymore. "forked from 703266f6e4" (or even starting/growing
> from...) is probably clearest but also a bit longer.

It reports the fork point rather than the commit HEAD points to?  Ah,
I guess I never payed that close of attention before.  I actually
think "on an unnamed branch" is good enough, but if others gain value
from the extra info, then I understand the conundrum.  I'm not sure
what the use or rationale is for the fork point, though, so I feel
slightly at a loss to try to describe this extra piece of info.

^ permalink raw reply

* Re: [PATCH/RFC v3 00/14] Introduce new commands switch-branch and restore-files
From: Duy Nguyen @ 2018-12-04 18:39 UTC (permalink / raw)
  To: Elijah Newren
  Cc: Ævar Arnfjörð Bjarmason, Git Mailing List,
	Junio C Hamano, Stefan Beller, Thomas Gummerer, Stefan Xenos
In-Reply-To: <CABPp-BFk8XXv=6bu0XPFfiDrNWE4HP9qF=5E+QFx3Q-brj=BBw@mail.gmail.com>

On Tue, Dec 4, 2018 at 7:31 PM Elijah Newren <newren@gmail.com> wrote:
>
> On Tue, Dec 4, 2018 at 10:22 AM Duy Nguyen <pclouds@gmail.com> wrote:
> >
> > On Tue, Dec 4, 2018 at 6:45 PM Elijah Newren <newren@gmail.com> wrote:
> > > > > > - Two more fancy features (the "git checkout --index" being the
> > > > > >   default mode and the backup log for accidental overwrites) are of
> > > > > >   course still missing. But they are coming.
> > > > > >
> > > > > > I did not go replace "detached HEAD" with "unnamed branch" (or "no
> > > > > > branch") everywhere because I think a unique term is still good to
> > > > > > refer to this concept. Or maybe "no branch" is good enough. I dunno.
> > > > >
> > > > > I personally like "unnamed branch", but "no branch" would still be
> > > > > better than "detached HEAD".
> > > >
> > > > Haven't really worked on killing the term "detached HEAD" yet. But I
> > > > noticed the other day that git-branch reports
> > > >
> > > > * (HEAD detached from 703266f6e4)
> > > >
> > > > and I didn't know how to rephrase that. I guess "unnamed branch from
> > > > 703266f6e4" is probably good enough but my old-timer brain screams no.
> > >
> > > Perhaps "* (On an unnamed branch, at 703266f6e4)"?
> >
> > This 703266f6e4 is the fork point. Once you start adding more commits
> > on top of this unnamed branch, I find it hard to define it "at"
> > 703266f6e4 anymore. "forked from 703266f6e4" (or even starting/growing
> > from...) is probably clearest but also a bit longer.
>
> It reports the fork point rather than the commit HEAD points to?  Ah,
> I guess I never payed that close of attention before.  I actually
> think "on an unnamed branch" is good enough, but if others gain value
> from the extra info, then I understand the conundrum.  I'm not sure
> what the use or rationale is for the fork point, though, so I feel
> slightly at a loss to try to describe this extra piece of info.

It's probably a corner case. This is a better example

* (HEAD detached at pclouds/backup-log)

It does help see i'm working on top of some branch (or tag)
-- 
Duy

^ permalink raw reply

* Re: Simple git push --tags deleted all branches
From: Mateusz Loskot @ 2018-12-04 19:00 UTC (permalink / raw)
  To: git
In-Reply-To: <87va4fyjtv.fsf@evledraar.gmail.com>

On Thu, 29 Nov 2018 at 16:39, Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote:
> On Thu, Nov 29 2018, Mateusz Loskot wrote:
> > On Thu, 29 Nov 2018 at 16:03, Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote:
> >> On Wed, Nov 28 2018, Mateusz Loskot wrote:
> >> >
> >> > (using git version 2.19.2.windows.1)
> >> >
> >> > I've just encountered one of those WTH moments.
> >> >
> >> > I have a bare repository
> >> >
> >> > core.git (BARE:master) $ git branch
> >> >   1.0
> >> >   2.0
> >> > * master
> >> >
> >> > core.git (BARE:master) $ git tag
> >> > 1.0.1651
> >> > 1.0.766
> >> > 2.0.1103
> >> > 2.0.1200
> >> >
> >> > I published the repo using: git push --all --follow-tags
> >> >
> >> > This succeeded, but there seem to be no tags pushed, just branches.
> >> > So, I followed with
> >> >
> >> > core.git (BARE:master) $ git push --tags
> >> > To XXX
> >> >  - [deleted]               1.0
> >> >  - [deleted]               2.0
> >> >  ! [remote rejected]       master (refusing to delete the current
> >> > branch: refs/heads/master)
> >> > error: failed to push some refs to 'XXX'
> >> >
> >> > And, I've found out that all branches and tags have been
> >> > wiped in both, local repo and remote :)
> >> >
> >> > I restored the repo and tried out
> >> >
> >> > git push origin 1.0
> >> > git push origin --tags
> >> >
> >> > and this time both succeeded, without wiping out any refs.
> >> >
> >> > Could anyone help me to understand why remote-less
> >> >
> >> > git push --tags
> >> >
> >> > is/was so dangerous and unforgiving?!
> >>
> >> Since nobody's replied yet, I can't see what's going on here from the
> >> info you've provided. My guess is that you have something "mirror" set
> >> on the remote.
> >
> > Thank you for responding.
> >
> > The git push --tags hugely surprised me, and here is genuine screenshot
> > https://twitter.com/mloskot/status/1068072285846859776
> >
> >> It seems you can't share the repo or its URL, but could you share the
> >> scrubbed output of 'git config -l --show-origin' when run inside this
> >> repository?
> >
> > Here is complete output. I have not stripped the basics like aliases,
> > just in case.
>
> Right, it's because you used --mirror, the important bit:
>
> > file:config     remote.origin.url=https://xxx.com/core-external-metadata.git
> > file:config     remote.origin.fetch=+refs/*:refs/*
> > file:config     remote.origin.mirror=true
> > file:config
>
> I.e. you have cloned with the --mirror flag, this is what it's supposed
> to do: https://git-scm.com/docs/git-clone#git-clone---mirror
> https://git-scm.com/docs/git-fetch#git-fetch---prune
>
> I.e. you push and git tries to mirror the refs you have locally to the
> remote, including pruning stuff in the remote.

Thank you very much for diagnosing my issue.
I was not aware about how --mirror affects the workflow.
It all makes perfect sense now.

Best regards,
-- 
Mateusz Loskot, http://mateusz.loskot.net

^ permalink raw reply

* Re: [WIP RFC 2/5] Documentation: add Packfile URIs design doc
From: Jonathan Tan @ 2018-12-04 19:29 UTC (permalink / raw)
  To: sandals; +Cc: jonathantanmy, git
In-Reply-To: <20181204015446.GX890086@genre.crustytoothpaste.net>

> Some thoughts here:
> 
> First, I'd like to see a section (and a bit in the implementation)
> requiring HTTPS if the original protocol is secure (SSH or HTTPS).
> Allowing the server to downgrade to HTTP, even by accident, would be a
> security problem.
> 
> Second, this feature likely should be opt-in for SSH. One issue I've
> seen repeatedly is that people don't want to use HTTPS to fetch things
> when they're using SSH for Git. Many people in corporate environments
> have proxies that break HTTP for non-browser use cases[0], and using SSH
> is the only way that they can make a functional Git connection.

Good points about SSH support and the client needing to control which
protocols the server will send URIs for. I'll include a line in the
client request in which the client can specify which protocols it is OK
with.

> Third, I think the server needs to be required to both support Range
> headers and never change the content of a URI, so that we can have
> resumable clone implicit in this design. There are some places in the
> world where connections are poor and fetching even the initial packfile
> at once might be a problem. (I've seen such questions on Stack
> Overflow, for example.)

Good points. I'll add these in the next revision.

> Having said that, I think overall this is a good idea and I'm glad to
> see a proposal for it.

Thanks, and thanks for your comments too.

^ permalink raw reply

* Re: [ANNOUNCE] Git v2.20.0-rc2
From: Ævar Arnfjörð Bjarmason @ 2018-12-04 19:39 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Linux Kernel, git-packagers, Stefan Beller, Jeff King
In-Reply-To: <xmqq36rhjnts.fsf@gitster-ct.c.googlers.com>


On Sat, Dec 01 2018, Junio C Hamano wrote:

>  * "git ls-remote $there foo" was broken by recent update for the
>    protocol v2 and stopped showing refs that match 'foo' that are not
>    refs/{heads,tags}/foo, which has been fixed.
>    (merge 6a139cdd74 jk/proto-v2-ref-prefix-fix later to maint).

I started bisecting this thinking it was a regression, but found that
the reason the instructions in the protocol v2 announcement[1] don't
work anymore is because of it was due to 631f0f8c4b ("ls-remote: do not
send ref prefixes for patterns", 2018-10-31).

Perhaps we should note this more prominently, and since Brandon isn't at
Google anymore can some of you working there edit this post? It's the
first Google result for "git protocol v2", so it's going to be quite
confusing for people if after 2.20 the instructions in it no longer
work.

1. https://opensource.googleblog.com/2018/05/introducing-git-protocol-version-2.html

^ permalink raw reply

* Re: [WIP RFC 5/5] upload-pack: send part of packfile response as uri
From: Stefan Beller @ 2018-12-04 20:09 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git
In-Reply-To: <707a8181a0e8cd2e21989fa6da59cee38d9fadde.1543879256.git.jonathantanmy@google.com>

On Mon, Dec 3, 2018 at 3:38 PM Jonathan Tan <jonathantanmy@google.com> wrote:
>
> This is a partial implementation of upload-pack sending part of its
> packfile response as URIs.

It does so by implementing a new flag `--exclude-configured-blobs`
in pack-objects, which would change the output of pack-objects to
output a list of URLs (of the excluded blobs) followed by the
pack to be asked for.

This design seems easy to implement as then upload-pack
can just parse the output and only needs to insert
"packfile" and "packfile-uris\n" at the appropriate places
of the stream, otherwise it just passes through the information
obtained from pack-objects.

The design as-is would make for hard documentation of
pack-objects (its output is not just a pack anymore when that
flag is given, but a highly optimized byte stream).

Initially I did not anticipate this to be one of the major design problems
as I assumed we'd want to use this pack feature over broadly (e.g.
eventually by offloading most of the objects into a base pack that
is just always included as the likelihood for any object in there is
very high on initial clone), but it makes total sense to only
output the URIs that we actually need.

An alternative that comes very close to the current situation
would be to either pass a file path or file descriptor (that upload-pack
listens to in parallel) to pack-objects as an argument of the new flag.
Then we would not need to splice the protocol sections into the single
output stream, but we could announce the sections, then flush
the URIs and then flush the pack afterwards.

I looked at this quickly, but that would need either extensions in
run-command.c for setting up the new fd for us, as there we already
have OS specific code for these setups, or we'd have to duplicate
some of the logic here, which doesn't enthuse me either.

So maybe we'd create a temp file via mkstemp and pass
the file name to pack-objects for writing the URIs and then
we'd just need to stream that file?

> +       # NEEDSWORK: "git clone" fails here because it ignores the URI provided
> +       # instead of fetching it.
> +       test_must_fail env GIT_TRACE_PACKET="$(pwd)/log" \
> +               git -c protocol.version=2 clone \
> +               "$HTTPD_URL/smart/http_parent" http_child 2>err &&
> +       # Although "git clone" fails, we can still check that the server
> +       # provided the URI we requested and that the error message pinpoints
> +       # the object that is missing.
> +       grep "clone< uri https://example.com/a-uri" log &&
> +       test_i18ngrep "did not receive expected object $(cat h)" err

That is a nice test!

^ permalink raw reply

* Re: [PATCH/RFC v3 00/14] Introduce new commands switch-branch and restore-files
From: Eric Sunshine @ 2018-12-04 21:18 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy
  Cc: Elijah Newren, Ævar Arnfjörð Bjarmason, Git List,
	Junio C Hamano, Stefan Beller, Thomas Gummerer, sxenos
In-Reply-To: <CACsJy8DEMHFTnL2QJu5Csb1jUQeu0HiT3rTDii4krrEJcoh=Qw@mail.gmail.com>

On Tue, Dec 4, 2018 at 11:28 AM Duy Nguyen <pclouds@gmail.com> wrote:
> Haven't really worked on killing the term "detached HEAD" yet. But I
> noticed the other day that git-branch reports
>
> * (HEAD detached from 703266f6e4)
>
> and I didn't know how to rephrase that. I guess "unnamed branch from
> 703266f6e4" is probably good enough but my old-timer brain screams no.

"git worktree add" and "git worktree show" also report similar messages.

^ permalink raw reply

* Re: [PATCH] rebase docs: fix incorrect format of the section Behavioral Differences
From: Elijah Newren @ 2018-12-04 21:24 UTC (permalink / raw)
  To: git; +Cc: j6t, gitster, Elijah Newren
In-Reply-To: <f26b53e3-e7d1-f0fe-cdd3-dd734beb1628@kdbg.org>

On Mon, 3 Dec 2018, Johannes Sixt wrote:

> The text body of section Behavioral Differences is typeset as code,
> but should be regular text. Remove the indentation to achieve that.
> 
> While here, prettify the language:
> 
> - use "the x backend" instead of "x-based rebase";
> - use present tense instead of future tense;
> 
> and use subsections instead of a list.
> 
> Signed-off-by: Johannes Sixt <j6t@kdbg.org>
> ---
> Cf. https://git-scm.com/docs/git-rebase#_behavioral_differences
> 
> I actually did not test the result, because I don't have the
> infrastructure.
> 
> The sentence "The am backend sometimes does not" is not very useful,
> but is not my fault ;) It would be great if it could be made more
> specific, but I do not know the details.

That sentence is my fault.  I've been sitting on a patch for about a
week that fixes it which I was going to submit after 2.20.0, but since
you're fixing this text up right now, I guess putting these two patches
together makes sense.  I've rebased it on top of your commit and provided
it below.

-- 8< --
Subject: [PATCH] git-rebase.txt: update note about directory rename detection
 and am

In commit 6aba117d5cf7 ("am: avoid directory rename detection when
calling recursive merge machinery", 2018-08-29), the git-rebase manpage
probably should have also been updated to note the stronger
incompatibility between git-am and directory rename detection.  Update
it now.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 Documentation/git-rebase.txt | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/Documentation/git-rebase.txt b/Documentation/git-rebase.txt
index 41631df6e4..b220b8b2b6 100644
--- a/Documentation/git-rebase.txt
+++ b/Documentation/git-rebase.txt
@@ -569,8 +569,12 @@ it to keep commits that started empty.
 Directory rename detection
 ~~~~~~~~~~~~~~~~~~~~~~~~~~
 
-The merge and interactive backends work fine with
-directory rename detection.  The am backend sometimes does not.
+The merge and interactive backends work fine with directory rename
+detection.  The am backend sometimes does not because it operates on
+"fake ancestors" that involve trees containing only the files modified
+in the patch.  Without accurate tree information, directory rename
+detection cannot safely operate and is thus turned off in the am
+backend.
 
 include::merge-strategies.txt[]
 
-- 
2.20.0.rc1.2.gca0e531e87


^ permalink raw reply related

* Re: [PATCH 8/9] sha1-file: use loose object cache for quick existence check
From: René Scharfe @ 2018-12-04 21:45 UTC (permalink / raw)
  To: Jeff King
  Cc: Ævar Arnfjörð Bjarmason, Geert Jansen,
	Junio C Hamano, git@vger.kernel.org, Takuto Ikuta
In-Reply-To: <20181203220424.GA11883@sigill.intra.peff.net>

Am 03.12.2018 um 23:04 schrieb Jeff King:
> On Sun, Dec 02, 2018 at 11:52:50AM +0100, René Scharfe wrote:
> 
>>> And for mu.git, a ~20k object repo:
>>>
>>>     Test                                             origin/master     peff/jk/loose-cache       avar/check-collisions-config
>>>     -------------------------------------------------------------------------------------------------------------------------
>>>     0008.2: index-pack with 256*1 loose objects      0.59(0.91+0.06)   0.58(0.93+0.03) -1.7%     0.57(0.89+0.04) -3.4%
>>>     0008.3: index-pack with 256*10 loose objects     0.59(0.91+0.07)   0.59(0.92+0.03) +0.0%     0.57(0.89+0.03) -3.4%
>>>     0008.4: index-pack with 256*100 loose objects    0.59(0.91+0.05)   0.81(1.13+0.04) +37.3%    0.58(0.91+0.04) -1.7%
>>>     0008.5: index-pack with 256*250 loose objects    0.59(0.91+0.05)   1.23(1.51+0.08) +108.5%   0.58(0.91+0.04) -1.7%
>>>     0008.6: index-pack with 256*500 loose objects    0.59(0.90+0.06)   1.96(2.20+0.12) +232.2%   0.58(0.91+0.04) -1.7%
>>>     0008.7: index-pack with 256*750 loose objects    0.59(0.92+0.05)   2.72(2.92+0.17) +361.0%   0.58(0.90+0.04) -1.7%
>>>     0008.8: index-pack with 256*1000 loose objects   0.59(0.90+0.06)   3.50(3.67+0.21) +493.2%   0.57(0.90+0.04) -3.4%
>>
>> OK, here's another theory: The cache scales badly with increasing
>> numbers of loose objects because it sorts the array 256 times as it is
>> filled.  Loading it fully and sorting once would help, as would using
>> one array per subdirectory.
> 
> Yeah, that makes sense. This was actually how I had planned to do it
> originally, but then I ended up just reusing the existing single-array
> approach from the abbrev code.
> 
> I hadn't actually thought about the repeated sortings (but that
> definitely makes sense that they would hurt in these pathological
> cases), but more just that we get a 256x reduction in N for our binary
> search (in fact we already do this first-byte lookup-table trick for
> pack index lookups).

Skipping eight steps in a binary search is something, but it's faster
even without that.

Just realized that the demo code can use "lookup" instead of the much
more expensive "for_each_unique" to sort.  D'oh!  With that change:

  for command in '
      foreach (0..255) {
        $subdir = sprintf("%02x", $_);
        foreach (1..$ARGV[0]) {
          printf("append %s%038d\n", $subdir, $_);
        }
        # intermediate sort
        print "lookup " . "0" x 40 . "\n";
      }
    ' '
      foreach (0..255) {
        $subdir = sprintf("%02x", $_);
        foreach (1..$ARGV[0]) {
          printf("append %s%038d\n", $subdir, $_);
        }
      }
      # sort once at the end
      print "lookup " . "0" x 40 . "\n";
    ' '
      foreach (0..255) {
        $subdir = sprintf("%02x", $_);
        foreach (1..$ARGV[0]) {
          printf("append %s%038d\n", $subdir, $_);
        }
        # sort each subdirectory separately
        print "lookup " . "0" x 40 . "\n";
        print "clear\n";
      }
    '
  do
    time perl -e "$command" 1000 | t/helper/test-tool sha1-array | wc -l
  done

And the results make the scale of the improvement more obvious:

  256

  real    0m3.476s
  user    0m3.466s
  sys     0m0.099s
  1

  real    0m0.157s
  user    0m0.148s
  sys     0m0.046s
  256

  real    0m0.117s
  user    0m0.116s
  sys     0m0.051s

> Your patch looks good to me. We may want to do one thing on top:
> 
>> diff --git a/object-store.h b/object-store.h
>> index 8dceed0f31..ee67a50980 100644
>> --- a/object-store.h
>> +++ b/object-store.h
>> @@ -20,7 +20,7 @@ struct object_directory {
>>  	 * Be sure to call odb_load_loose_cache() before using.
>>  	 */
>>  	char loose_objects_subdir_seen[256];
>> -	struct oid_array loose_objects_cache;
>> +	struct oid_array loose_objects_cache[256];
> 
> The comment in the context there is warning callers to remember to load
> the cache first. Now that we have individual caches, might it make sense
> to change the interface a bit, and make these members private. I.e.,
> something like:
> 
>   struct oid_array *odb_loose_cache(struct object_directory *odb,
>                                     int subdir_nr) 
>   {
> 	if (!loose_objects_subdir_seen[subdir_nr])
> 		odb_load_loose_cache(odb, subdir_nr); /* or just inline it here */
> 
> 	return &odb->loose_objects_cache[subdir_nr];
>   }

Sure.  And it should take an object_id pointer instead of a subdir_nr --
less duplication, nicer interface (patch below).

It would be nice if it could return a const pointer to discourage
messing up the cache, but that's not compatible with oid_array_lookup().

And quick_has_loose() should be converted to object_id as well -- adding
a function that takes a SHA-1 is a regression. :D

René

---
 object-store.h |  8 ++++----
 sha1-file.c    | 19 ++++++++-----------
 sha1-name.c    |  4 +---
 3 files changed, 13 insertions(+), 18 deletions(-)

diff --git a/object-store.h b/object-store.h
index ee67a50980..dd9efdd276 100644
--- a/object-store.h
+++ b/object-store.h
@@ -48,11 +48,11 @@ void add_to_alternates_file(const char *dir);
 void add_to_alternates_memory(const char *dir);
 
 /*
- * Populate an odb's loose object cache for one particular subdirectory (i.e.,
- * the one that corresponds to the first byte of objects you're interested in,
- * from 0 to 255 inclusive).
+ * Populate and return the loose object cache array corresponding to the
+ * given object ID.
  */
-void odb_load_loose_cache(struct object_directory *odb, int subdir_nr);
+struct oid_array *odb_loose_cache(struct object_directory *odb,
+				  const struct object_id *oid);
 
 struct packed_git {
 	struct packed_git *next;
diff --git a/sha1-file.c b/sha1-file.c
index d2f5e65865..38af6d5d0b 100644
--- a/sha1-file.c
+++ b/sha1-file.c
@@ -924,7 +924,6 @@ static int open_sha1_file(struct repository *r,
 static int quick_has_loose(struct repository *r,
 			   const unsigned char *sha1)
 {
-	int subdir_nr = sha1[0];
 	struct object_id oid;
 	struct object_directory *odb;
 
@@ -932,9 +931,7 @@ static int quick_has_loose(struct repository *r,
 
 	prepare_alt_odb(r);
 	for (odb = r->objects->odb; odb; odb = odb->next) {
-		odb_load_loose_cache(odb, subdir_nr);
-		if (oid_array_lookup(&odb->loose_objects_cache[subdir_nr],
-				     &oid) >= 0)
+		if (oid_array_lookup(odb_loose_cache(odb, &oid), &oid) >= 0)
 			return 1;
 	}
 	return 0;
@@ -2159,24 +2156,24 @@ static int append_loose_object(const struct object_id *oid, const char *path,
 	return 0;
 }
 
-void odb_load_loose_cache(struct object_directory *odb, int subdir_nr)
+struct oid_array *odb_loose_cache(struct object_directory *odb,
+				  const struct object_id *oid)
 {
+	int subdir_nr = oid->hash[0];
+	struct oid_array *subdir_array = &odb->loose_objects_cache[subdir_nr];
 	struct strbuf buf = STRBUF_INIT;
 
-	if (subdir_nr < 0 ||
-	    subdir_nr >= ARRAY_SIZE(odb->loose_objects_subdir_seen))
-		BUG("subdir_nr out of range");
-
 	if (odb->loose_objects_subdir_seen[subdir_nr])
-		return;
+		return subdir_array;
 
 	strbuf_addstr(&buf, odb->path);
 	for_each_file_in_obj_subdir(subdir_nr, &buf,
 				    append_loose_object,
 				    NULL, NULL,
-				    &odb->loose_objects_cache[subdir_nr]);
+				    subdir_array);
 	odb->loose_objects_subdir_seen[subdir_nr] = 1;
 	strbuf_release(&buf);
+	return subdir_array;
 }
 
 static int check_stream_sha1(git_zstream *stream,
diff --git a/sha1-name.c b/sha1-name.c
index fdb22147b2..4fc6368ce5 100644
--- a/sha1-name.c
+++ b/sha1-name.c
@@ -87,7 +87,6 @@ static int match_sha(unsigned, const unsigned char *, const unsigned char *);
 
 static void find_short_object_filename(struct disambiguate_state *ds)
 {
-	int subdir_nr = ds->bin_pfx.hash[0];
 	struct object_directory *odb;
 
 	for (odb = the_repository->objects->odb;
@@ -96,8 +95,7 @@ static void find_short_object_filename(struct disambiguate_state *ds)
 		int pos;
 		struct oid_array *loose_subdir_objects;
 
-		odb_load_loose_cache(odb, subdir_nr);
-		loose_subdir_objects = &odb->loose_objects_cache[subdir_nr];
+		loose_subdir_objects = odb_loose_cache(odb, &ds->bin_pfx);
 		pos = oid_array_lookup(loose_subdir_objects, &ds->bin_pfx);
 		if (pos < 0)
 			pos = -1 - pos;
-- 
2.19.2

^ permalink raw reply related

* Re: [PATCH] rebase docs: fix incorrect format of the section Behavioral Differences
From: Johannes Sixt @ 2018-12-04 21:53 UTC (permalink / raw)
  To: Elijah Newren; +Cc: git, gitster
In-Reply-To: <20181204212411.11572-1-newren@gmail.com>

Am 04.12.18 um 22:24 schrieb Elijah Newren:
> +....  The am backend sometimes does not because it operates on
> +"fake ancestors" that involve trees containing only the files modified
> +in the patch.  Without accurate tree information, directory rename
> +detection cannot safely operate and is thus turned off in the am
> +backend.

Directory rename detection does not work sometimes. That is, it works 
most of the time. But how can that be the case when it is turned off?

-- Hannes

^ permalink raw reply

* [PATCH on sb/more-repo-in-api] revision: use commit graph in get_reference()
From: Jonathan Tan @ 2018-12-04 22:42 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan

When fetching into a repository, a connectivity check is first made by
check_exist_and_connected() in builtin/fetch.c that runs:

  git rev-list --objects --stdin --not --all --quiet <(list of objects)

If the client repository has many refs, this command can be slow,
regardless of the nature of the server repository or what is being
fetched. A profiler reveals that most of the time is spent in
setup_revisions() (approx. 60/63), and of the time spent in
setup_revisions(), most of it is spent in parse_object() (approx.
49/60). This is because setup_revisions() parses the target of every ref
(from "--all"), and parse_object() reads the buffer of the object.

Reading the buffer is unnecessary if the repository has a commit graph
and if the ref points to a commit (which is typically the case). This
patch uses the commit graph wherever possible; on my computer, when I
run the above command with a list of 1 object on a many-ref repository,
I get a speedup from 1.8s to 1.0s.

Another way to accomplish this effect would be to modify parse_object()
to use the commit graph if possible; however, I did not want to change
parse_object()'s current behavior of always checking the object
signature of the returned object.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
This is on sb/more-repo-in-api because I'm using the repo_parse_commit()
function.

A colleague noticed this issue when handling a mirror clone.

Looking at the bigger picture, the speed of the connectivity check
during a fetch might be further improved by passing only the negotiation
tips (obtained through --negotiation-tip) instead of "--all". This patch
just handles the low-hanging fruit first.
---
 revision.c | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/revision.c b/revision.c
index b5108b75ab..e7da2c57ab 100644
--- a/revision.c
+++ b/revision.c
@@ -212,7 +212,20 @@ static struct object *get_reference(struct rev_info *revs, const char *name,
 {
 	struct object *object;
 
-	object = parse_object(revs->repo, oid);
+	/*
+	 * If the repository has commit graphs, repo_parse_commit() avoids
+	 * reading the object buffer, so use it whenever possible.
+	 */
+	if (oid_object_info(revs->repo, oid, NULL) == OBJ_COMMIT) {
+		struct commit *c = lookup_commit(revs->repo, oid);
+		if (!repo_parse_commit(revs->repo, c))
+			object = (struct object *) c;
+		else
+			object = NULL;
+	} else {
+		object = parse_object(revs->repo, oid);
+	}
+
 	if (!object) {
 		if (revs->ignore_missing)
 			return object;
-- 
2.19.0.271.gfe8321ec05.dirty


^ permalink raw reply related

* Re: [PATCH on sb/more-repo-in-api] revision: use commit graph in get_reference()
From: Stefan Beller @ 2018-12-04 23:12 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git
In-Reply-To: <20181204224238.50966-1-jonathantanmy@google.com>

On Tue, Dec 4, 2018 at 2:42 PM Jonathan Tan <jonathantanmy@google.com> wrote:
>
> When fetching into a repository, a connectivity check is first made by
> check_exist_and_connected() in builtin/fetch.c that runs:
>
>   git rev-list --objects --stdin --not --all --quiet <(list of objects)
>
> If the client repository has many refs, this command can be slow,
> regardless of the nature of the server repository or what is being
> fetched. A profiler reveals that most of the time is spent in
> setup_revisions() (approx. 60/63), and of the time spent in
> setup_revisions(), most of it is spent in parse_object() (approx.
> 49/60). This is because setup_revisions() parses the target of every ref
> (from "--all"), and parse_object() reads the buffer of the object.
>
> Reading the buffer is unnecessary if the repository has a commit graph
> and if the ref points to a commit (which is typically the case). This
> patch uses the commit graph wherever possible; on my computer, when I
> run the above command with a list of 1 object on a many-ref repository,
> I get a speedup from 1.8s to 1.0s.
>
> Another way to accomplish this effect would be to modify parse_object()
> to use the commit graph if possible; however, I did not want to change
> parse_object()'s current behavior of always checking the object
> signature of the returned object.
>
> Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
> ---
> This is on sb/more-repo-in-api because I'm using the repo_parse_commit()
> function.

This is a mere nicety, not strictly required.
Before we had parse_commit(struct commit *) which would accomplish the
same, (and we'd still have that afterwards as a #define falling back onto
the_repository). As the function get_reference() is not the_repository safe
as it contains a call to is_promisor_object() that is repository
agnostic, I think
it would be fair game to not depend on that series. I am not
complaining, though.

> A colleague noticed this issue when handling a mirror clone.
>
> Looking at the bigger picture, the speed of the connectivity check
> during a fetch might be further improved by passing only the negotiation
> tips (obtained through --negotiation-tip) instead of "--all". This patch
> just handles the low-hanging fruit first.
> ---
>  revision.c | 15 ++++++++++++++-
>  1 file changed, 14 insertions(+), 1 deletion(-)
>
> diff --git a/revision.c b/revision.c
> index b5108b75ab..e7da2c57ab 100644
> --- a/revision.c
> +++ b/revision.c
> @@ -212,7 +212,20 @@ static struct object *get_reference(struct rev_info *revs, const char *name,
>  {
>         struct object *object;
>
> -       object = parse_object(revs->repo, oid);
> +       /*
> +        * If the repository has commit graphs, repo_parse_commit() avoids
> +        * reading the object buffer, so use it whenever possible.
> +        */
> +       if (oid_object_info(revs->repo, oid, NULL) == OBJ_COMMIT) {
> +               struct commit *c = lookup_commit(revs->repo, oid);
> +               if (!repo_parse_commit(revs->repo, c))
> +                       object = (struct object *) c;
> +               else
> +                       object = NULL;

Would it make sense in this case to rely on parse_object below
instead of assigning NULL? The reason for that would be that
when lookup_commit returns NULL, we would try more broadly.

AFAICT oid_object_info doesn't take advantage of the commit graph,
but just looks up the object header, which is still less than completely
parsing it. Then lookup_commit is overly strict, as it may return
NULL as when there still is a type mismatch (I don't think a mismatch
could happen here, as both rely on just the object store, and not the
commit graph.), so this would be just defensive programming for
the sake of it. I dunno.

    struct commit *c;

    if (oid_object_info(revs->repo, oid, NULL) == OBJ_COMMIT &&
        (c = lookup_commit(revs->repo, oid)) &&
        !repo_parse_commit(revs->repo, c))
            object = (struct object *) c;
    else
        object = parse_object(revs->repo, oid);


So with all that said, I still think this is a good patch.

Thanks,
Stefan

> +       } else {
> +               object = parse_object(revs->repo, oid);
> +       }
> +
>         if (!object) {
>                 if (revs->ignore_missing)
>                         return object;
> --
> 2.19.0.271.gfe8321ec05.dirty
>

^ permalink raw reply

* Re: [PATCH] rebase docs: fix incorrect format of the section Behavioral Differences
From: Elijah Newren @ 2018-12-04 23:17 UTC (permalink / raw)
  To: git; +Cc: j6t, gitster, Elijah Newren
In-Reply-To: <f2ed3730-03f3-ae92-234c-e7500eaa5c33@kdbg.org>


On Tue, Dec 4, 2018 at 1:53 PM Johannes Sixt <j6t@kdbg.org> wrote:
>
> Am 04.12.18 um 22:24 schrieb Elijah Newren:
> > +....  The am backend sometimes does not because it operates on
> > +"fake ancestors" that involve trees containing only the files modified
> > +in the patch.  Without accurate tree information, directory rename
> > +detection cannot safely operate and is thus turned off in the am
> > +backend.
>
> Directory rename detection does not work sometimes. That is, it works
> most of the time. But how can that be the case when it is turned off?

Gah, when I was rebasing on your patch I adopted your sentence rewrite
but forgot to remove the "sometimes".  Thanks for catching; correction:

-- 8< --
Subject: [PATCH v2] git-rebase.txt: update note about directory rename
 detection and am

In commit 6aba117d5cf7 ("am: avoid directory rename detection when
calling recursive merge machinery", 2018-08-29), the git-rebase manpage
probably should have also been updated to note the stronger
incompatibility between git-am and directory rename detection.  Update
it now.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 Documentation/git-rebase.txt | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/Documentation/git-rebase.txt b/Documentation/git-rebase.txt
index 41631df6e4..ef76cccf3f 100644
--- a/Documentation/git-rebase.txt
+++ b/Documentation/git-rebase.txt
@@ -569,8 +569,12 @@ it to keep commits that started empty.
 Directory rename detection
 ~~~~~~~~~~~~~~~~~~~~~~~~~~
 
-The merge and interactive backends work fine with
-directory rename detection.  The am backend sometimes does not.
+The merge and interactive backends work fine with directory rename
+detection.  The am backend does not because it operates on "fake
+ancestors" that involve trees containing only the files modified in
+the patch.  Without accurate tree information, directory rename
+detection cannot safely operate and is thus turned off in the am
+backend.
 
 include::merge-strategies.txt[]
 
-- 
2.20.0.rc1.2.gca0e531e87

 

^ permalink raw reply related

* Re: [PATCH 3/9] submodule.c: sort changed_submodule_names before searching it
From: Jonathan Tan @ 2018-12-05  0:12 UTC (permalink / raw)
  To: sbeller; +Cc: git, jonathantanmy, gitster
In-Reply-To: <20181129002756.167615-4-sbeller@google.com>

> We can string_list_insert() to maintain sorted-ness of the
> list as we find new items, or we can string_list_append() to
> build an unsorted list and sort it at the end just once.
> 
> As we do not rely on the sortedness while building the
> list, we pick the "append and sort at the end" as it
> has better worst case execution times.

I would write this entire commit message as:

  Instead of using unsorted_string_list_lookup(), sort
  changed_submodule_names before performing any lookups so that we can
  use the faster string_list_lookup() instead.

The code in this patch is fine, and patches 1-2 are fine too.

^ permalink raw reply

* Re: [PATCH 7/9] submodule: migrate get_next_submodule to use repository structs
From: Jonathan Tan @ 2018-12-05  0:17 UTC (permalink / raw)
  To: sbeller; +Cc: git, jonathantanmy
In-Reply-To: <20181129002756.167615-8-sbeller@google.com>

> We used to recurse into submodules, even if they were broken having
> only an objects directory. The child process executed in the submodule
> would fail though if the submodule was broken. This is tested via
> "fetching submodule into a broken repository" in t5526.
> 
> This patch tightens the check upfront, such that we do not need
> to spawn a child process to find out if the submodule is broken.

Thanks, patches 4-7 look good to me - I see that you have addressed all
my comments. Not sending one email each for patches 4, 5, and 6 -
although I have commented on all of them, my comments were minor.

My more in-depth review was done on a previous version [1], and I see
that my comments have been addressed. Also, Stefan says [2] (and implements
in this patch):

> > > If the working tree directory is empty for that submodule, it means
> > > it is likely not initialized. But why would we use that as a signal to
> > > skip the submodule?
> >
> > What I meant was: if empty, skip it completely. Otherwise, do the
> > repo_submodule_init() and repo_init() thing, and if they both fail, set
> > spf->result to 1, preserving existing behavior.
> 
> I did it the other way round:
> 
> If repo_[submodule_]init fails, see if we have a gitlink in tree and
> an empty dir in the FS, to decide if we need to signal failure.

This works too.

[1] https://public-inbox.org/git/20181017225811.66554-1-jonathantanmy@google.com/
[2] https://public-inbox.org/git/CAGZ79kbNXD35ZwevjLZcrGsT=2hNcUPmVUWvP1RjsKSH0Gd3ww@mail.gmail.com/

^ permalink raw reply

* Re: [PATCH 8/9] submodule.c: fetch in submodules git directory instead of in worktree
From: Jonathan Tan @ 2018-12-05  0:38 UTC (permalink / raw)
  To: sbeller; +Cc: git, jonathantanmy
In-Reply-To: <20181129002756.167615-9-sbeller@google.com>

> Keep the properties introduced in 10f5c52656 (submodule: avoid
> auto-discovery in prepare_submodule_repo_env(), 2016-09-01), by fixating
> the git directory of the submodule.

This is to avoid the autodetection of the Git repository, making it less
error-prone; looks good to me.

^ permalink raw reply

* Re: [PATCH 9/9] fetch: try fetching submodules if needed objects were not fetched
From: Jonathan Tan @ 2018-12-05  1:07 UTC (permalink / raw)
  To: sbeller; +Cc: git, jonathantanmy
In-Reply-To: <20181129002756.167615-10-sbeller@google.com>

> Try fetching a submodule by object id if the object id that the
> superproject points to, cannot be found.

Mention here the consequences of what happens when this attempt to fetch
fails. Also, this seems to be a case of "do or do not, there is no try"
- maybe it's better to say "Fetch commits by ID from the submodule's
origin if the submodule doesn't already contain the commit that the
superproject references" (note that there is no "Try" word, since the
consequence is to fail the entire command).

Also mention that this fetch is always from the origin.

> builtin/fetch used to only inspect submodules when they were fetched
> "on-demand", as in either on/off case it was clear whether the submodule
> needs to be fetched. However to know whether we need to try fetching the
> object ids, we need to identify the object names, which is done in this
> function check_for_new_submodule_commits(), so we'll also run that code
> in case the submodule recursion is set to "on".
> 
> The submodule checks were done only when a ref in the superproject
> changed, these checks were extended to also be performed when fetching
> into FETCH_HEAD for completeness, and add a test for that too.

"inspect submodules" and "submodule checks" are unnecessarily vague to
me - might be better to just say "A list of new submodule commits are
already generated in certain conditions (by
check_for_new_submodule_commits()); this new feature invokes that
function in more situations".

> -		if ((recurse_submodules != RECURSE_SUBMODULES_OFF) &&
> -		    (recurse_submodules != RECURSE_SUBMODULES_ON))
> -			check_for_new_submodule_commits(&ref->new_oid);
>  		r = s_update_ref(msg, ref, 0);
>  		format_display(display, r ? '!' : '*', what,
>  			       r ? _("unable to update local ref") : NULL,
> @@ -779,9 +776,6 @@ static int update_local_ref(struct ref *ref,
>  		strbuf_add_unique_abbrev(&quickref, &current->object.oid, DEFAULT_ABBREV);
>  		strbuf_addstr(&quickref, "..");
>  		strbuf_add_unique_abbrev(&quickref, &ref->new_oid, DEFAULT_ABBREV);
> -		if ((recurse_submodules != RECURSE_SUBMODULES_OFF) &&
> -		    (recurse_submodules != RECURSE_SUBMODULES_ON))
> -			check_for_new_submodule_commits(&ref->new_oid);
>  		r = s_update_ref("fast-forward", ref, 1);
>  		format_display(display, r ? '!' : ' ', quickref.buf,
>  			       r ? _("unable to update local ref") : NULL,
> @@ -794,9 +788,6 @@ static int update_local_ref(struct ref *ref,
>  		strbuf_add_unique_abbrev(&quickref, &current->object.oid, DEFAULT_ABBREV);
>  		strbuf_addstr(&quickref, "...");
>  		strbuf_add_unique_abbrev(&quickref, &ref->new_oid, DEFAULT_ABBREV);
> -		if ((recurse_submodules != RECURSE_SUBMODULES_OFF) &&
> -		    (recurse_submodules != RECURSE_SUBMODULES_ON))
> -			check_for_new_submodule_commits(&ref->new_oid);
>  		r = s_update_ref("forced-update", ref, 1);
>  		format_display(display, r ? '!' : '+', quickref.buf,
>  			       r ? _("unable to update local ref") : _("forced update"),
> @@ -892,6 +883,8 @@ static int store_updated_refs(const char *raw_url, const char *remote_name,
>  				ref->force = rm->peer_ref->force;
>  			}
>  
> +			if (recurse_submodules != RECURSE_SUBMODULES_OFF)
> +				check_for_new_submodule_commits(&rm->old_oid);

As discussed above, indeed, check_for_new_submodule_commits() is now
invoked in more situations (not just when there is a local ref, and also
when recurse_submodules is _ON).

> @@ -1231,8 +1231,14 @@ struct submodule_parallel_fetch {
>  	int result;
>  
>  	struct string_list changed_submodule_names;
> +
> +	/* The submodules to fetch in */
> +	struct fetch_task **oid_fetch_tasks;
> +	int oid_fetch_tasks_nr, oid_fetch_tasks_alloc;
>  };

Better to document as "Pending fetches by OIDs", I think. (These are not
fetches by default refspec, and are not already in progress.)

> +struct fetch_task {
> +	struct repository *repo;
> +	const struct submodule *sub;
> +	unsigned free_sub : 1; /* Do we need to free the submodule? */
> +
> +	/* fetch specific oids if set, otherwise fetch default refspec */
> +	struct oid_array *commits;
> +};

I would document this as "Fetch in progress (if callback data) or
pending (if in oid_fetch_tasks in struct submodule_parallel_fetch)".
This potential confusion is why I wanted 2 separate types, as I wrote in
[1].

[1] https://public-inbox.org/git/20181026204106.132296-1-jonathantanmy@google.com/

> +/**
> + * When a submodule is not defined in .gitmodules, we cannot access it
> + * via the regular submodule-config. Create a fake submodule, which we can
> + * work on.
> + */
> +static const struct submodule *get_non_gitmodules_submodule(const char *path)

Thanks, this is a good explanation.

>  static int get_next_submodule(struct child_process *cp,
>  			      struct strbuf *err, void *data, void **task_cb)
>  {
> -	int ret = 0;
>  	struct submodule_parallel_fetch *spf = data;
>  
>  	for (; spf->count < spf->r->index->cache_nr; spf->count++) {
> -		struct strbuf submodule_prefix = STRBUF_INIT;
> +		int recurse_config;

Unnecessary local variable recurse_config, but that's not a big deal.

[snip the part where we use a heap-allocated task instead of a few
variables on the stack]

> -			repo_clear(repo);
> -			free(repo);
> -			ret = 1;
> +			spf->count++;
> +			*task_cb = task;

And here we see why we need the heap-allocated task - it needs to go
into the callback data.

> +	if (spf->oid_fetch_tasks_nr) {
> +		struct fetch_task *task =
> +			spf->oid_fetch_tasks[spf->oid_fetch_tasks_nr - 1];
> +		struct strbuf submodule_prefix = STRBUF_INIT;
> +		spf->oid_fetch_tasks_nr--;
> +
> +		strbuf_addf(&submodule_prefix, "%s%s/",
> +			    spf->prefix, task->sub->path);
> +
> +		child_process_init(cp);
> +		prepare_submodule_repo_env_in_gitdir(&cp->env_array);
> +		cp->git_cmd = 1;
> +		cp->dir = task->repo->gitdir;
> +
> +		argv_array_init(&cp->args);
> +		argv_array_pushv(&cp->args, spf->args.argv);
> +		argv_array_push(&cp->args, "on-demand");
> +		argv_array_push(&cp->args, "--submodule-prefix");
> +		argv_array_push(&cp->args, submodule_prefix.buf);
> +
> +		/* NEEDSWORK: have get_default_remote from submodule--helper */
> +		argv_array_push(&cp->args, "origin");
> +		oid_array_for_each_unique(task->commits,
> +					  append_oid_to_argv, &cp->args);
> +
> +		*task_cb = task;
>  		strbuf_release(&submodule_prefix);
> -		if (ret) {
> -			spf->count++;
> -			return 1;
> -		}
> +		return 1;
>  	}

And if we ran out of submodules but have pending fetch-by-OID tasks, we
execute them.

> +static int commit_exists_in_sub(const struct object_id *oid, void *data)
> +{
> +	struct repository *subrepo = data;
> +
> +	enum object_type type = oid_object_info(subrepo, oid, NULL);
> +
> +	return type != OBJ_COMMIT;
> +}

Should be commit_missing_in_sub.

> +	/* Is this the second time we process this submodule? */
> +	if (task->commits)
> +		return 0;

Should be goto out, to clean up properly?

> +test_expect_success "fetch new submodule commits on-demand outside standard refspec" '
> +	# add a second submodule and ensure it is around in downstream first
> +	git clone submodule sub1 &&
> +	git submodule add ./sub1 &&
> +	git commit -m "adding a second submodule" &&
> +	git -C downstream pull &&
> +	git -C downstream submodule update --init --recursive &&
> +
> +	git checkout --detach &&
> +
> +	C=$(git -C submodule commit-tree -m "new change outside refs/heads" HEAD^{tree}) &&
> +	git -C submodule update-ref refs/changes/1 $C &&
> +	git update-index --cacheinfo 160000 $C submodule &&
> +	test_tick &&
> +
> +	D=$(git -C sub1 commit-tree -m "new change outside refs/heads" HEAD^{tree}) &&
> +	git -C sub1 update-ref refs/changes/2 $D &&
> +	git update-index --cacheinfo 160000 $D sub1 &&
> +
> +	git commit -m "updated submodules outside of refs/heads" &&
> +	E=$(git rev-parse HEAD) &&
> +	git update-ref refs/changes/3 $E &&
> +	(
> +		cd downstream &&
> +		git fetch --recurse-submodules origin refs/changes/3:refs/heads/my_branch &&
> +		git -C submodule cat-file -t $C &&
> +		git -C sub1 cat-file -t $D &&
> +		git checkout --recurse-submodules FETCH_HEAD
> +	)
> +'

It would be nicer if all these tests started from scratch (that is, an
empty repository), but they are understandable - thanks.

In addition to the tests here, I would like a test that checks that
submodule commits pointed to by interior superproject commits also work.
For example:

  submodule:
  B   C
   \ /
    A

  superproject:

  current HEAD -> a commit -> the ref we're fetching
  gitlink=A       gitlink=B   gitlink=C

When fetching recursively in the superproject, we should make sure that
both B and C are fetched in the submodule.

^ permalink raw reply

* Re: [PATCH v3 07/14] checkout: split into switch-branch and restore-files
From: Junio C Hamano @ 2018-12-05  2:14 UTC (permalink / raw)
  To: Duy Nguyen
  Cc: Elijah Newren, Ævar Arnfjörð Bjarmason,
	Git Mailing List, Stefan Beller, Thomas Gummerer, Stefan Xenos
In-Reply-To: <CACsJy8BTs+WKzTTEF2XVTT-LVJk_exYCz_hN+hXU1Dw+oquBpA@mail.gmail.com>

Duy Nguyen <pclouds@gmail.com> writes:

>> My single biggest worry about this whole series is that I'm worried
>> you're perpetuating and even further ingraining one of the biggest
>> usability problems with checkout: people suggest and use it for
>> reverting/restoring paths to a previous version, but it doesn't do
>> that:
>
> ...
>
>  git restore-files --from=master~10 Documentation/

The "single biggest worry" could be due to Elijah not being aware of
other recent discussions.  My understanding of the plan is

 - "git checkout" will learn a new "--[no-]overlay" option, where
   the current behaviour, i.e. "take paths in master~10 that match
   pathspec Documentation/, and overlay them on top of what is in
   the index and the working tree", is explained as "the overlay
   mode" and stays to be the default.  With "checkout --no-overlay
   master~10 Documentation/", the command will become "replace paths
   in the current index and the working tree that match the pathspec
   Documentation/ with paths in master~10 that match pathspec
   Documentation/".

 - "git restore-files --from=<tree> <pathspec>" by default will use
   "--no-overlay" semantics, but the users can still use "--overlay"
   from the command line as an option.

So "restore-files" would become truly "restore the state of
Documentation/ to match that of master~10", I would think.

>> Also, the fact that we're trying to make a simpler command makes me
>> think that removing the auto-vivify behavior from the default and
>> adding a simple flag which users can pass to request will allow this
>> part of the documentation to be hidden behind the appropriate flag,
>> which may make it easier for users to compartmentalize the command and
>> it's options, enabling them to learn as they go.
>
> Sounds good. I don't know a good name for this new option though so
> unless anybody comes up with some suggestion, I'll just disable
> checkout.defaultRemote in switch-branch. If it comes back as a new
> option, it can always be added later.

Are you two discussing the "checkout --guess" option?  I am somewhat
lost here.

>> > +-f::
>> > +--force::
>> > +       Proceed even if the index or the working tree differs from
>> > +       HEAD.  This is used to throw away local changes.
>>
>> Haven't thought through this thoroughly, but do we really need an
>> option for that instead of telling users to 'git reset --hard HEAD'
>> before switching branches if they want their stuff thrown away?
>
> For me it's just a bit more convenient. Hit an error when switching
> branch? Recall the command from bash history, stick -f in it and run.
> Elsewhere I think both Junio and Thomas (or maybe only Junio) suggests
> moving the "git reset" functionality without moving HEAD to one of
> these commands, which goes the opposite direction...

Isn't there a huge difference?  "checkout --force <other-branch>"
needs to clobber only the changes that are involved in the switch,
i.e. if your README.txt is the same between master and maint while
Makefile is different, after editing both files while on master, you
can not "switch-branch" to maint without doing something to Makefile
(i.e. either discard your local change or wiggle your local change
to the context of 'maint' with "checkout -m").  But you can carry
the changes to README.txt while checking out 'maint' branch.
Running "git reset --hard HEAD" would mean that you will lose the
changes to README.txt as well.

>> > +--orphan <new_branch>::
>> > +       Create a new 'orphan' branch, named <new_branch>, started from
>> > +       <start_point> and switch to it.  The first commit made on this
>>
>> What??  started from <start_point>?  The whole point of --orphan is
>> you have no parent, i.e. no start point.  Also, why does the
>> explanation reference an argument that wasn't in the immediately
>> preceding synopsis?
>
> I guess bad phrasing. It should be "switch to <start_point> first,
> then prepare the worktree so that the first commit will have no
> parent". Or something along that line.

It should be a <tree-ish>, no?  It is not a "point" in history, but
is "start with this tree".

I may have more comments on this message but that's it from me for
now.

^ permalink raw reply

* Re: [PATCH v3 07/14] checkout: split into switch-branch and restore-files
From: Junio C Hamano @ 2018-12-05  2:25 UTC (permalink / raw)
  To: Duy Nguyen
  Cc: Elijah Newren, Ævar Arnfjörð Bjarmason,
	Git Mailing List, Stefan Beller, Thomas Gummerer, Stefan Xenos
In-Reply-To: <CACsJy8D9Rgsf-E6yweQxpopFaOVZ1bgihEbg200yS1gup+Gt7Q@mail.gmail.com>

Duy Nguyen <pclouds@gmail.com> writes:

> On Tue, Dec 4, 2018 at 6:43 PM Elijah Newren <newren@gmail.com> wrote:
>> > > > +--ours::
>> > > > +--theirs::
>> ...
>> go away.  Maybe it can still be fixed (I haven't dug too deeply into
>> it), but if so, the only fix needed here would be to remove this long
>> explanation about why the tool gets things totally backward.
>
> Aha. I' not really deep in this merge business to know if stages 2 and
> 3 can be swapped. This is right up your alley. I'll just leave it to
> you.

Please don't show stage#2 and stage#3 swapped to the end user,
unless that is protected behind an option (not per-repo config).
It is pretty much ingrained that stage#2 is what came from the
commit that will become the first parent of the commit being
prepared, and changing it without an explicit command line option
will break tools.

> I'm actually still not sure how to move it here (I guess 'here' is
> restore-files since we won't move HEAD). All the --mixed, --merge and
> --hard are confusing. But maybe we could just make 'git restore-files
> --from HEAD -f :/" behave just like "git reset --hard HEAD" (but with
> some safety net) But we can leave it for discussion in the next round.

Perhaps you two should pay a bit closer attention to what Thomas
Gummerer is working on.  I've touched above in my earlier comments,
too, e.g.  <xmqqefb3mhrs.fsf@gitster-ct.c.googlers.com>

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox