Git development
 help / color / mirror / Atom feed
* Re: [WIP RFC 2/5] Documentation: add Packfile URIs design doc
From: brian m. carlson @ 2018-12-04  1:54 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git
In-Reply-To: <0461b362569362c6d0e73951469c547a03a1b59d.1543879256.git.jonathantanmy@google.com>

[-- Attachment #1: Type: text/plain, Size: 3058 bytes --]

On Mon, Dec 03, 2018 at 03:37:35PM -0800, Jonathan Tan wrote:
> Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
> ---
>  Documentation/technical/packfile-uri.txt | 83 ++++++++++++++++++++++++
>  Documentation/technical/protocol-v2.txt  |  6 +-
>  2 files changed, 88 insertions(+), 1 deletion(-)
>  create mode 100644 Documentation/technical/packfile-uri.txt
> 
> diff --git a/Documentation/technical/packfile-uri.txt b/Documentation/technical/packfile-uri.txt
> new file mode 100644
> index 0000000000..6535801486
> --- /dev/null
> +++ b/Documentation/technical/packfile-uri.txt
> @@ -0,0 +1,83 @@
> +Packfile URIs
> +=============
> +
> +This feature allows servers to serve part of their packfile response as URIs.
> +This allows server designs that improve scalability in bandwidth and CPU usage
> +(for example, by serving some data through a CDN), and (in the future) provides
> +some measure of resumability to clients.
> +
> +This feature is available only in protocol version 2.
> +
> +Protocol
> +--------
> +
> +The server advertises `packfile-uris`.
> +
> +If the client replies with the following arguments:
> +
> + * packfile-uris
> + * thin-pack
> + * ofs-delta
> +
> +when the server sends the packfile, it MAY send a `packfile-uris` section
> +directly before the `packfile` section (right after `wanted-refs` if it is
> +sent) containing HTTP(S) URIs. See protocol-v2.txt for the documentation of
> +this section.
> +
> +Clients then should understand that the returned packfile could be incomplete,
> +and that it needs to download all the given URIs before the fetch or clone is
> +complete. Each URI should point to a Git packfile (which may be a thin pack and
> +which may contain offset deltas).


Some thoughts here:

First, I'd like to see a section (and a bit in the implementation)
requiring HTTPS if the original protocol is secure (SSH or HTTPS).
Allowing the server to downgrade to HTTP, even by accident, would be a
security problem.

Second, this feature likely should be opt-in for SSH. One issue I've
seen repeatedly is that people don't want to use HTTPS to fetch things
when they're using SSH for Git. Many people in corporate environments
have proxies that break HTTP for non-browser use cases[0], and using SSH
is the only way that they can make a functional Git connection.

Third, I think the server needs to be required to both support Range
headers and never change the content of a URI, so that we can have
resumable clone implicit in this design. There are some places in the
world where connections are poor and fetching even the initial packfile
at once might be a problem. (I've seen such questions on Stack
Overflow, for example.)

Having said that, I think overall this is a good idea and I'm glad to
see a proposal for it.

[0] For example, a naughty-word filter may corrupt or block certain byte
sequences that occur incidentally in the pack stream.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 868 bytes --]

^ permalink raw reply

* Re: [PATCH v3] range-diff: always pass at least minimal diff options
From: Junio C Hamano @ 2018-12-04  1:35 UTC (permalink / raw)
  To: Eric Sunshine
  Cc: git, Johannes Schindelin, Ævar Arnfjörð Bjarmason,
	Martin Ågren
In-Reply-To: <20181203212131.11299-1-sunshine@sunshineco.com>

Eric Sunshine <sunshine@sunshineco.com> writes:

> This is a re-roll of Martin's v2[1]. The only difference from v2 is that
> it retains coloring when emitting to the terminal (plus an in-code
> comment was simplified).
>
> The regression introduced by d8981c3f88, in which the range-diff only
> ever gets emitted to the terminal, and never to the cover letter or
> commentary section of a standalone patch, makes the --range-diff option
> rather useless, so this fix probably ought to be fast-tracked.

Yup.  Thanks.  The only thing that makes me wonder is why any of the
existing tests (among which I think I saw range-diff driven from
format-patch) did not catch this rather obvious glitch.

> diff --git a/t/t3206-range-diff.sh b/t/t3206-range-diff.sh
> index e497c1358f..048feaf6dd 100755
> --- a/t/t3206-range-diff.sh
> +++ b/t/t3206-range-diff.sh
> @@ -248,18 +248,24 @@ test_expect_success 'dual-coloring' '
>  for prev in topic master..topic
>  do
>  	test_expect_success "format-patch --range-diff=$prev" '
> -		git format-patch --stdout --cover-letter --range-diff=$prev \
> +		git format-patch --cover-letter --range-diff=$prev \
>  			master..unmodified >actual &&

Ah, of course.  Then the "actual" file gets the names of the output
files; we expect to see 5 of them.

But now we have lost all the range-diff tests run under "--stdout",
so next time some other change regresses only that codepath, we will
not notice (which is fine for now but would want to be addressed
before the end of the year around which time we certainly will all
forget).

Thanks.

> -		grep "= 1: .* s/5/A" actual &&
> -		grep "= 2: .* s/4/A" actual &&
> -		grep "= 3: .* s/11/B" actual &&
> -		grep "= 4: .* s/12/B" actual
> +		test_when_finished "rm 000?-*" &&
> +		test_line_count = 5 actual &&
> +		test_i18ngrep "^Range-diff:$" 0000-* &&
> +		grep "= 1: .* s/5/A" 0000-* &&
> +		grep "= 2: .* s/4/A" 0000-* &&
> +		grep "= 3: .* s/11/B" 0000-* &&
> +		grep "= 4: .* s/12/B" 0000-*
>  	'
>  done
>  
>  test_expect_success 'format-patch --range-diff as commentary' '
> -	git format-patch --stdout --range-diff=HEAD~1 HEAD~1 >actual &&
> -	test_i18ngrep "^Range-diff:$" actual
> +	git format-patch --range-diff=HEAD~1 HEAD~1 >actual &&
> +	test_when_finished "rm 0001-*" &&
> +	test_line_count = 1 actual &&
> +	test_i18ngrep "^Range-diff:$" 0001-* &&
> +	grep "> 1: .* new message" 0001-*
>  '
>  
>  test_done

^ permalink raw reply

* Re: [PATCH/RFC v3 00/14] Introduce new commands switch-branch and restore-files
From: Elijah Newren @ 2018-12-04  1:28 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc
  Cc: Ævar Arnfjörð, Git Mailing List, Junio C Hamano,
	Stefan Beller, Thomas Gummerer, sxenos
In-Reply-To: <20181129215850.7278-1-pclouds@gmail.com>

On Thu, Nov 29, 2018 at 2:01 PM Nguyễn Thái Ngọc Duy <pclouds@gmail.com> wrote:
>
> v3 sees switch-branch go back to switch-branch (in v2 it was
> checkout-branch). checkout-files is also renamed restore-files (v1 was
> restore-paths). Hopefully we won't see another rename.

I started reading through the patches.  I also tried to apply them
locally, but they had conflicts or missing base file version on both
master and next.  What version did you base it on?

I stopped at 07/14, and dropped my comments all there.  I didn't read
any further yet, and may wait for your post-2.20 reroll.

> I'll try to summarize the differences between the new commands and
> 'git checkout' down here, but you're welcome to just head to 07/14 and
> read the new man pages.
>
> 'git switch-branch'
>
> - does not "do nothing", you have to either switch branch, create a
>   new branch, or detach. "git switch-branch" with no arguments is
>   rejected.
>
> - implicit detaching is rejected. If you need to detach, you need to
>   give --detach. Or stick to 'git checkout'.
>
> - -b/-B is renamed to -c/-C with long option names
>
> - of course does not accept pathspec
>
> 'git restore-files'
>
> - takes a ref from --from argument, not as a free ref. As a result,
>   '--' is no longer needed. All non-option arguments are pathspec
>
> - pathspec is mandatory, you can't do "git restore-files" without any
>   pathspec.
>
> - I just remember -p which is allowed to take no pathspec :( I'll fix
>   it later.

This all looks good.  I commented elsewhere but please remember that
pathspec implies directories as a possibility and we really need to
fix the broken behavior of checkout when given a directory.

> - Two more fancy features (the "git checkout --index" being the
>   default mode and the backup log for accidental overwrites) are of
>   course still missing. But they are coming.
>
> I did not go replace "detached HEAD" with "unnamed branch" (or "no
> branch") everywhere because I think a unique term is still good to
> refer to this concept. Or maybe "no branch" is good enough. I dunno.

I personally like "unnamed branch", but "no branch" would still be
better than "detached HEAD".

^ permalink raw reply

* Re: [PATCH v3 07/14] checkout: split into switch-branch and restore-files
From: Elijah Newren @ 2018-12-04  0:45 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc
  Cc: Ævar Arnfjörð, Git Mailing List, Junio C Hamano,
	Stefan Beller, Thomas Gummerer, sxenos
In-Reply-To: <20181129215850.7278-8-pclouds@gmail.com>

On Thu, Nov 29, 2018 at 2:03 PM Nguyễn Thái Ngọc Duy <pclouds@gmail.com> wrote:
>
> "git checkout" doing too many things is a source of confusion for many
> users (and it even bites old timers sometimes). To rememdy that, the
> command is now split in two: switch-branch and checkout-files. The

"checkout-files" here....(will comment more on this below)

> good old "git checkout" command is still here and will be until all
> (or most of users) are sick of it.
>
> See the new man pages for the final design of these commands. The
> actual implementation though is still pretty much the same as "git
> checkout". Following patches will adjust their behavior to match the
> man pages.
>
> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
> ---
>  .gitignore                          |   2 +
>  Documentation/git-checkout.txt      |   5 +
>  Documentation/git-restore-files.txt | 167 ++++++++++++++++
>  Documentation/git-switch-branch.txt | 289 ++++++++++++++++++++++++++++
>  Makefile                            |   2 +
>  builtin.h                           |   2 +
>  builtin/checkout.c                  |  84 ++++++--
>  command-list.txt                    |   2 +
>  git.c                               |   2 +
>  9 files changed, 543 insertions(+), 12 deletions(-)
>  create mode 100644 Documentation/git-restore-files.txt
>  create mode 100644 Documentation/git-switch-branch.txt
>
> diff --git a/.gitignore b/.gitignore
> index 0d77ea5894..c63dcb1427 100644
> --- a/.gitignore
> +++ b/.gitignore
> @@ -143,6 +143,7 @@
>  /git-request-pull
>  /git-rerere
>  /git-reset
> +/git-restore-files

...and "restore-files" here.  Should be consistent with whatever name you pick.

>  /git-rev-list
>  /git-rev-parse
>  /git-revert
> @@ -167,6 +168,7 @@
>  /git-submodule
>  /git-submodule--helper
>  /git-svn
> +/git-switch-branch
>  /git-symbolic-ref
>  /git-tag
>  /git-unpack-file
> diff --git a/Documentation/git-checkout.txt b/Documentation/git-checkout.txt
> index 25887a6087..25ec7f508f 100644
> --- a/Documentation/git-checkout.txt
> +++ b/Documentation/git-checkout.txt
> @@ -406,6 +406,11 @@ $ edit frotz
>  $ git add frotz
>  ------------
>
> +SEE ALSO
> +--------
> +linkgit:git-switch-branch[1]
> +linkgit:git-restore-files[1]
> +
>  GIT
>  ---
>  Part of the linkgit:git[1] suite
> diff --git a/Documentation/git-restore-files.txt b/Documentation/git-restore-files.txt
> new file mode 100644
> index 0000000000..03c1250ad0
> --- /dev/null
> +++ b/Documentation/git-restore-files.txt
> @@ -0,0 +1,167 @@
> +git-restore-files(1)
> +====================
> +
> +NAME
> +----
> +git-restore-files - Restore working tree files
> +
> +SYNOPSIS
> +--------
> +[verse]
> +'git restore-files' [-f|--ours|--theirs|-m|--conflict=<style>] [--from=<tree-ish>] <pathspec>...

Suggesting that you can use both --ours and --from?  Or -f and --from?
 That seems bad; see below for more on this...

> +'git restore-files' [--from=<tree-ish>] <pathspec>...

Isn't this already inferred by the previous line?  Or was the
inclusion of --from on the previous line in error?  Looking at the
git-checkout manpage, it looks like you may have just been copying an
existing weirdness, but it needs to be fixed.  ;-)

> +'git restore-files' (-p|--patch) [--from=<tree-ish>] [<pathspec>...]
> +
> +DESCRIPTION
> +-----------
> +Updates files in the working tree to match the version in the index
> +or the specified tree.
> +
> +'git restore-files' [--from=<tree-ish>] <pathspec>...::

<tree-ish> and <pathspec>?  I understand <commit-ish> and <pathspec>,
or <tree-ish> but have no clue why it'd be okay to specify <tree-ish>
and <pathspec> together.  What does that even mean?

Also, rather than fixing from <tree-ish> to <commit-ish> or <commit>,
perhaps we should just use <revision> here?  (I'm thinking of git
rev-parse's "Specifying revisions", which suggests "revisions" as a
good substitute for "commit-ish" that isn't quite so weird for new
users.)

> +
> +       Overwrite paths in the working tree by replacing with the
> +       contents in the index or in the <tree-ish> (most often a
> +       commit).  When a <tree-ish> is given, the paths that
> +       match the <pathspec> are updated both in the index and in
> +       the working tree.

Is that the default we really want for this command?  Why do we
automatically assume these files are ready for commit?  I understand
that it's what checkout did, but I'd find it more natural to only
affect the working tree by default.  We can give it an option for
affecting the index instead (or perhaps in addition).

> ++
> +The index may contain unmerged entries because of a previous failed merge.
> +By default, if you try to check out such an entry from the index, the
> +checkout operation will fail and nothing will be checked out.
> +Using `-f` will ignore these unmerged entries.  The contents from a
> +specific side of the merge can be checked out of the index by
> +using `--ours` or `--theirs`.  With `-m`, changes made to the working tree
> +file can be discarded to re-create the original conflicted merge result.
> +
> +'git restore-files' (-p|--patch) [--from=<tree-ish>] [<pathspec>...]::
> +       This is similar to the "check out paths to the working tree
> +       from either the index or from a tree-ish" mode described
> +       above, but lets you use the interactive interface to show
> +       the "diff" output and choose which hunks to use in the
> +       result.  See below for the description of `--patch` option.
> +
> +OPTIONS
> +-------
> +-q::
> +--quiet::
> +       Quiet, suppress feedback messages.
> +
> +--[no-]progress::
> +       Progress status is reported on the standard error stream
> +       by default when it is attached to a terminal, unless `--quiet`
> +       is specified. This flag enables progress reporting even if not
> +       attached to a terminal, regardless of `--quiet`.
> +
> +-f::
> +--force::
> +       Do not fail upon unmerged entries; instead, unmerged entries
> +       are ignored.

You just copied this from the checkout manpage, but this is ambiguous
and/or wrong; using git-checkout (since your patch-series doesn't
apply cleanly to either master or next for me):

$ sha1sum counting
c0ed0e34b0fbef4274ef59480e0a0a1cb2776870  counting
$ git checkout counting; echo $?
error: path 'counting' is unmerged
1
$ git checkout -f counting; echo $?
warning: path 'counting' is unmerged
0
$ sha1sum counting
c0ed0e34b0fbef4274ef59480e0a0a1cb2776870  counting

Maybe printing a warning counts as "ignored", though it doesn't seem
like it.  However, even worse is:

$ git checkout -f HEAD~1 counting
$ sha1sum counting
612ca68d0305c821750a452e9d5bf050e915824f  counting

Now the unmerged entry wasn't ignored; it was updated in the working
tree and overwritten in the index.


Perhaps -f and --from should be incompatible and throw an error if
both are specified?  Also does "printed a warning message for"
actually count as "ignored" or should the documentation for this
option be updated?

> +--ours::
> +--theirs::
> +       Check out stage #2 ('ours') or #3 ('theirs') for unmerged
> +       paths.
> ++
> +Note that during `git rebase` and `git pull --rebase`, 'ours' and
> +'theirs' may appear swapped; `--ours` gives the version from the
> +branch the changes are rebased onto, while `--theirs` gives the
> +version from the branch that holds your work that is being rebased.
> ++
> +This is because `rebase` is used in a workflow that treats the
> +history at the remote as the shared canonical one, and treats the
> +work done on the branch you are rebasing as the third-party work to
> +be integrated, and you are temporarily assuming the role of the
> +keeper of the canonical history during the rebase.  As the keeper of
> +the canonical history, you need to view the history from the remote
> +as `ours` (i.e. "our shared canonical history"), while what you did
> +on your side branch as `theirs` (i.e. "one contributor's work on top
> +of it").

Total aside because I'm not sure what you could change here, but man
do I hate this.

> +
> +--ignore-skip-worktree-bits::
> +       In sparse checkout mode, update only entries matched by
> +       <paths> and sparse patterns in
> +       $GIT_DIR/info/sparse-checkout. This option ignores the sparse
> +       patterns and adds back any files in <paths>.

This doesn't make any sense now that you've removed the "`git checkout
-- <paths>` would" from the original.

> +
> +-m::
> +--merge::
> +       When checking out paths from the index, this option lets you
> +       recreate the conflicted merge in the specified paths.
> +
> +--conflict=<style>::
> +       The same as --merge option above, but changes the way the
> +       conflicting hunks are presented, overriding the
> +       merge.conflictStyle configuration variable.  Possible values are
> +       "merge" (default) and "diff3" (in addition to what is shown by
> +       "merge" style, shows the original contents).
> +
> +-p::
> +--patch::
> +       Interactively select hunks in the difference between the
> +       <tree-ish> (or the index, if unspecified) and the working
> +       tree.  The chosen hunks are then applied in reverse to the
> +       working tree (and if a <tree-ish> was specified, the index).
> ++
> +This means that you can use `git restore-files -p` to selectively
> +discard edits from your current working tree. See the ``Interactive
> +Mode'' section of linkgit:git-add[1] to learn how to operate the
> +`--patch` mode.
> +
> +--[no-]recurse-submodules::
> +       Using --recurse-submodules will update the content of all initialized
> +       submodules according to the commit recorded in the superproject. If
> +       local modifications in a submodule would be overwritten the checkout
> +       will fail unless `-f` is used. If nothing (or --no-recurse-submodules)
> +       is used, the work trees of submodules will not be updated.
> +       Just like linkgit:git-submodule[1], this will detach the
> +       submodules HEAD.
> +
> +<tree-ish>::
> +       Tree to checkout from (when paths are given). If not specified,
> +       the index will be used.

Again, I'd really rather use <revision> here.

> +
> +EXAMPLES
> +--------
> +
> +. The following sequence checks out the `master` branch, reverts
> +the `Makefile` to two revisions back, deletes hello.c by
> +mistake, and gets it back from the index.
> ++
> +------------
> +$ git switch-branch master                    <1>
> +$ git restore-files --from master~2 Makefile  <2>
> +$ rm -f hello.c
> +$ git restore-files hello.c                   <3>
> +------------
> ++
> +<1> switch branch
> +<2> take a file out of another commit
> +<3> restore hello.c from the index
> ++

Why is the switch-branch command labelled but not the rm command?  It
made sense in the original checkout manpage to label all checkout
commands, but here since only restore-files is being discussed it
seems the switch-branch should lose its label.

> +If you want to check out _all_ C source files out of the index,
> +you can say
> ++
> +------------
> +$ git restore-files '*.c'
> +------------
> ++
> +Note the quotes around `*.c`.  The file `hello.c` will also be
> +checked out, even though it is no longer in the working tree,
> +because the file globbing is used to match entries in the index
> +(not in the working tree by the shell).

Sounds good.

> ++
> +If you have an unfortunate branch that is named `hello.c`, this
> +step would be confused as an instruction to switch to that branch.
> +You should instead write:
> ++
> +------------
> +$ git restore-files hello.c
> +------------

Isn't the point of this command to allow us to remove paragraphs like
this last one?

> +
> +SEE ALSO
> +--------
> +linkgit:git-checkout[1]
> +
> +GIT
> +---
> +Part of the linkgit:git[1] suite


My single biggest worry about this whole series is that I'm worried
you're perpetuating and even further ingraining one of the biggest
usability problems with checkout: people suggest and use it for
reverting/restoring paths to a previous version, but it doesn't do
that:

git restore-files --from master~10 Documentation/
<edit some non-documentation files>
git add -u
git commit -m "Rationale for changing files including reverting Documentation/"

In particular, now you have a mixture of files in Documentation/ from
master~10 (er, now likely master~11) and HEAD~1; any files and
sub-directories that existed in HEAD~1 still remain and are mixed with
all other files in Documentation/ from the older commit.

You may think this is a special case, but this particular issue
results in some pretty bad surprises.  Also, it was pretty surprising
to me just how difficult it was to implement an svn-like revert in
EasyGit, in large part because of this 'oversight' in git.  git
checkout -- <paths> to me has always been fundamentally wrong, but I
just wasn't sure if I wanted to fight the backward compatibility
battle and suggest changing it.  With a new command, we definitely
shouldn't be reinforcing this error.  (And maybe we should consider
taking the time to fix git checkout too.)


> diff --git a/Documentation/git-switch-branch.txt b/Documentation/git-switch-branch.txt
> new file mode 100644
> index 0000000000..d5bf5cb37d
> --- /dev/null
> +++ b/Documentation/git-switch-branch.txt
> @@ -0,0 +1,289 @@
> +git-switch-branch(1)
> +====================
> +
> +NAME
> +----
> +git-switch-branch - Switch branches
> +
> +SYNOPSIS
> +--------
> +[verse]
> +'git switch-branch' [-q] [-f] [-m] <branch>
> +'git switch-branch' [-q] [-f] [-m] --detach [<commit>]
> +'git switch-branch' [-q] [-f] [-m] [[-b|-B|--orphan] <new_branch>] [<start_point>]

You label the options as -b/-B here, but -c/-C below.  Should be consistent.

> +
> +DESCRIPTION
> +-----------
> +Switch to a specified branch and update files in the working tree to
> +match it.
> +
> +'git switch-branch' <branch>::
> +       To prepare for working on <branch>, switch to it by updating
> +       the index and the files in the working tree. Local
> +       modifications to the files in the working tree are kept, so
> +       that they can be committed to the <branch>.
> ++
> +If <branch> is not found but there does exist a tracking branch in
> +exactly one remote (call it <remote>) with a matching name, treat as
> +equivalent to
> ++
> +------------
> +$ git switch-branch -b <branch> --track <remote>/<branch>
> +------------

If we're making --detach explicit (which I think is good), shouldn't
this also be made explicit?  I think I saw Junio arguing for this in
another thread.

Also, this is another case where you used -b instead of -c.  Finally,
--track wasn't mentioned in the synopsis but it is shown the first
time -b or -c is used?

> ++
> +If the branch exists in multiple remotes and one of them is named by
> +the `checkout.defaultRemote` configuration variable, we'll use that
> +one for the purposes of disambiguation, even if the `<branch>` isn't
> +unique across all remotes. Set it to
> +e.g. `checkout.defaultRemote=origin` to always checkout remote
> +branches from there if `<branch>` is ambiguous but exists on the
> +'origin' remote. See also `checkout.defaultRemote` in
> +linkgit:git-config[1].

So switch-branch will be controlled by checkout.* config variables?
That probably makes the most sense, but it does dilute the advantage
of adding these simpler commands.

Also, the fact that we're trying to make a simpler command makes me
think that removing the auto-vivify behavior from the default and
adding a simple flag which users can pass to request will allow this
part of the documentation to be hidden behind the appropriate flag,
which may make it easier for users to compartmentalize the command and
it's options, enabling them to learn as they go.

> +
> +'git switch-branch' -c|-C <new_branch> [<start_point>]::
> +
> +       Specifying `-c` causes a new branch to be created as if
> +       linkgit:git-branch[1] were called and then switched to. In
> +       this case you can use the `--track` or `--no-track` options,
> +       which will be passed to 'git branch'.  As a convenience,
> +       `--track` without `-c` implies branch creation; see the
> +       description of `--track` below.

Can we get rid of --track/--no-track and just provide a flag (which
takes no arguments) for the user to use?  Problems with --track:
  * it's not even in your synopsis
  * user has to repeat themselves (e.g. 'next' in two places from '-c
next --track origin/next'); this repetition is BOTH laborious AND
error-prone
  * it's rather inconsistent: --track is the default due to
auto-vivify when the user specifies nothing but a branch name that
doesn't exist yet, but when the user realizes the branch doesn't exist
yet and asks to have it created then suddenly tracking is not the
default??


I'm not sure what's best, but here's some food for thought:


   git switch-branch <branch>
switches to <branch>, if it exists.  Error cases:
  * If <branch> isn't actually a branch but a <tag> or
<remote-tracking-branch> or <revision>, error out and suggest using
--detach.
  * If <branch> isn't actually a branch but there is a similarly named
<remote-tracking-branch> (e.g. origin/<branch>), then suggest using
-c.

  git switch-branch -c <branch>
creates <branch> and, if a relevant-remote-tracking branch exists,
base the branch on that revision and set the new branch up to track
it.  Error cases:
  * If <branch> already exists, error out, suggesting -C or using a
non-conflicting name instead.

Other cases:
  * user wants a branch named 'master' that tracks 'origin/next'?  Use
git branch instead, don't support that in switch-branch.
  * user wants a branch named 'master' that doesn't track
'origin/master' despite 'origin/master' existing?  Use git branch
instead; don't support that in switch-branch.

> ++
> +If `-C` is given, <new_branch> is created if it doesn't exist;
> +otherwise, it is reset. This is the transactional equivalent of
> ++
> +------------
> +$ git branch -f <branch> [<start_point>]
> +$ git switch-branch <branch>
> +------------
> ++
> +that is to say, the branch is not reset/created unless "git
> +switch-branch" is successful.

...and when exactly would it fail?  Reading this, it looks like the
only possible error condition was removed due saying we'll reset the
branch if it already exists, so it's rather confusing.

> +
> +'git switch-branch' --detach [<commit>]::
> +
> +       Prepare to work on a unnamed branch on top of <commit> (see
> +       "DETACHED HEAD" section), and updating the index and the files
> +       in the working tree.  Local modifications to the files in the
> +       working tree are kept, so that the resulting working tree will
> +       be the state recorded in the commit plus the local
> +       modifications.
> ++
> +When the <commit> argument is a branch name, the `--detach` option can
> +be used to detach HEAD at the tip of the branch (`git switch-branch
> +<branch>` would check out that branch without detaching HEAD).
> ++
> +Omitting <commit> detaches HEAD at the tip of the current branch.
> +
> +OPTIONS
> +-------
> +-q::
> +--quiet::
> +       Quiet, suppress feedback messages.
> +
> +--[no-]progress::
> +       Progress status is reported on the standard error stream
> +       by default when it is attached to a terminal, unless `--quiet`
> +       is specified. This flag enables progress reporting even if not
> +       attached to a terminal, regardless of `--quiet`.
> +
> +-f::
> +--force::
> +       Proceed even if the index or the working tree differs from
> +       HEAD.  This is used to throw away local changes.

Haven't thought through this thoroughly, but do we really need an
option for that instead of telling users to 'git reset --hard HEAD'
before switching branches if they want their stuff thrown away?

> +-c <new_branch>::
> +--create <new_branch>::
> +       Create a new branch named <new_branch> and start it at
> +       <start_point>; see linkgit:git-branch[1] for details.
> +
> +-C <new_branch>::
> +--force-create <new_branch>::
> +       Creates the branch <new_branch> and start it at <start_point>;
> +       if it already exists, then reset it to <start_point>. This is
> +       equivalent to running "git branch" with "-f"; see
> +       linkgit:git-branch[1] for details.

Makes sense, but let's get the -b/-B vs. -c/-C consistent.

> +
> +-t::
> +--track::
> +       When creating a new branch, set up "upstream" configuration. See
> +       "--track" in linkgit:git-branch[1] for details.
> ++
> +If no `-c` option is given, the name of the new branch will be derived
> +from the remote-tracking branch, by looking at the local part of the
> +refspec configured for the corresponding remote, and then stripping
> +the initial part up to the "*".
> +This would tell us to use "hack" as the local branch when branching
> +off of "origin/hack" (or "remotes/origin/hack", or even
> +"refs/remotes/origin/hack").  If the given name has no slash, or the above
> +guessing results in an empty name, the guessing is aborted.  You can
> +explicitly give a name with `-c` in such a case.
> +
> +--no-track::
> +       Do not set up "upstream" configuration, even if the
> +       branch.autoSetupMerge configuration variable is true.

These two options and the intervening paragraph, while they make sense
to me, seem like the kind of stuff we'd want to throw out -- or at
least rework -- when trying to introduce a new command to simplify.
But I already discussed that above.

> +-l::
> +       Create the new branch's reflog; see linkgit:git-branch[1] for
> +       details.

??  Jettison this.

> +
> +--detach::
> +       Rather than checking out a branch to work on it, check out a
> +       commit for inspection and discardable experiments.
> +       This is the default behavior of "git checkout <commit>" when
> +       <commit> is not a branch name.  See the "DETACHED HEAD" section
> +       below for details.
> +
> +--orphan <new_branch>::
> +       Create a new 'orphan' branch, named <new_branch>, started from
> +       <start_point> and switch to it.  The first commit made on this

What??  started from <start_point>?  The whole point of --orphan is
you have no parent, i.e. no start point.  Also, why does the
explanation reference an argument that wasn't in the immediately
preceding synopsis?

> +       new branch will have no parents and it will be the root of a new
> +       history totally disconnected from all the other branches and
> +       commits.
> ++
> +The index and the working tree are adjusted as if you had previously run
> +"git checkout <start_point>".  This allows you to start a new history
> +that records a set of paths similar to <start_point> by easily running
> +"git commit -a" to make the root commit.
> ++
> +This can be useful when you want to publish the tree from a commit
> +without exposing its full history. You might want to do this to publish
> +an open source branch of a project whose current tree is "clean", but
> +whose full history contains proprietary or otherwise encumbered bits of
> +code.
> ++
> +If you want to start a disconnected history that records a set of paths
> +that is totally different from the one of <start_point>, then you should
> +clear the index and the working tree right after creating the orphan
> +branch by running "git rm -rf ." from the top level of the working tree.
> +Afterwards you will be ready to prepare your new files, repopulating the
> +working tree, by copying them from elsewhere, extracting a tarball, etc.

Ick.  Seems overly complex.  I'd rather that --orphan defaulted to
clearing the index and working tree, and that one would need to pass
HEAD for <start_point> if you wanted to start out with all those other
files.  That would certainly make the explanation a little clearer to
users, and more natural when they start experimenting with it.

However, --orphan is pretty special case.  Do we perhaps want to leave
it out of this new command and only include it in checkout?

> +-m::
> +--merge::
> +       If you have local modifications to one or more files that are
> +       different between the current branch and the branch to which
> +       you are switching, the command refuses to switch branches in
> +       order to preserve your modifications in context.  However,
> +       with this option, a three-way merge between the current
> +       branch, your working tree contents, and the new branch is
> +       done, and you will be on the new branch.
> ++
> +When a merge conflict happens, the index entries for conflicting
> +paths are left unmerged, and you need to resolve the conflicts
> +and mark the resolved paths with `git add` (or `git rm` if the merge
> +should result in deletion of the path).
> +
> +--conflict=<style>::
> +       The same as --merge option above, but changes the way the
> +       conflicting hunks are presented, overriding the
> +       merge.conflictStyle configuration variable.  Possible values are
> +       "merge" (default) and "diff3" (in addition to what is shown by
> +       "merge" style, shows the original contents).
> +
> +--ignore-other-worktrees::
> +       `git switch-branch` refuses when the wanted ref is already
> +       checked out by another worktree. This option makes it check
> +       the ref out anyway. In other words, the ref can be held by
> +       more than one worktree.

seems rather dangerous...is the goal to be an easier-to-use suggestion
for new users while checkout continues to exist, or is this command
meant to handle all branch switching functionality that checkout has?

> +
> +--[no-]recurse-submodules::
> +       Using --recurse-submodules will update the content of all initialized
> +       submodules according to the commit recorded in the superproject. If
> +       local modifications in a submodule would be overwritten the checkout
> +       will fail unless `-f` is used. If nothing (or --no-recurse-submodules)
> +       is used, the work trees of submodules will not be updated.
> +       Just like linkgit:git-submodule[1], this will detach the
> +       submodules HEAD.
> +
> +<branch>::
> +       Branch to checkout; if it refers to a branch (i.e., a name that,
> +       when prepended with "refs/heads/", is a valid ref), then that
> +       branch is checked out. Otherwise, if it refers to a valid
> +       commit, your HEAD becomes "detached" and you are no longer on
> +       any branch (see below for details).

I thought we requiring --detach in order to detach.  Does this
paragraph need updating?  Also, if we require --detach when we'll be
detaching HEAD, then this paragraph gets a LOT simpler.

> ++
> +You can use the `"@{-N}"` syntax to refer to the N-th last
> +branch/commit checked out using "git checkout" operation. You may
> +also specify `-` which is synonymous to `"@{-1}`.
> ++
> +As a special case, you may use `"A...B"` as a shortcut for the
> +merge base of `A` and `B` if there is exactly one merge base. You can
> +leave out at most one of `A` and `B`, in which case it defaults to `HEAD`.

I actually didn't know about the A...B special case for checkout.
Interesting...but I'm starting to wonder if this is too much info for
a "simplified command".

> +
> +<new_branch>::
> +       Name for the new branch.
> +
> +<start_point>::
> +       The name of a commit at which to start the new branch; see
> +       linkgit:git-branch[1] for details. Defaults to HEAD.

Erm, so if <start_point> is given, and there is an associated remote
tracking branch for the given branch name, perhaps we don't set up the
automatic tracking in contrast to what I mentioned above?  Hmm...

> +
> +DETACHED HEAD
> +-------------
> +include::detach-head.txt[]
> +
> +EXAMPLES
> +--------
> +
> +. The following sequence checks out the `master` branch.
> ++
> +------------
> +$ git switch-branch master
> +------------
> ++
> +
> +. After working in the wrong branch, switching to the correct
> +branch would be done using:
> ++
> +------------
> +$ git switch-branch mytopic
> +------------
> ++
> +However, your "wrong" branch and correct "mytopic" branch may
> +differ in files that you have modified locally, in which case
> +the above checkout would fail like this:
> ++
> +------------
> +$ git switch-branch mytopic
> +error: You have local changes to 'frotz'; not switching branches.
> +------------
> ++
> +You can give the `-m` flag to the command, which would try a
> +three-way merge:
> ++
> +------------
> +$ git switch-branch -m mytopic
> +Auto-merging frotz
> +------------
> ++
> +After this three-way merge, the local modifications are _not_
> +registered in your index file, so `git diff` would show you what
> +changes you made since the tip of the new branch.
> +
> +. When a merge conflict happens during switching branches with
> +the `-m` option, you would see something like this:
> ++
> +------------
> +$ git switch-branch -m mytopic
> +Auto-merging frotz
> +ERROR: Merge conflict in frotz
> +fatal: merge program failed
> +------------
> ++
> +At this point, `git diff` shows the changes cleanly merged as in
> +the previous example, as well as the changes in the conflicted
> +files.  Edit and resolve the conflict and mark it resolved with
> +`git add` as usual:
> ++
> +------------
> +$ edit frotz
> +$ git add frotz
> +------------
> +
> +SEE ALSO
> +--------
> +linkgit:git-checkout[1]
> +
> +GIT
> +---
> +Part of the linkgit:git[1] suite
> diff --git a/Makefile b/Makefile
> index 1a44c811aa..f035dbab9e 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -777,9 +777,11 @@ BUILT_INS += git-format-patch$X
>  BUILT_INS += git-fsck-objects$X
>  BUILT_INS += git-init$X
>  BUILT_INS += git-merge-subtree$X
> +BUILT_INS += git-restore-files$X
>  BUILT_INS += git-show$X
>  BUILT_INS += git-stage$X
>  BUILT_INS += git-status$X
> +BUILT_INS += git-switch-branch$X
>  BUILT_INS += git-whatchanged$X
>
>  # what 'all' will build and 'install' will install in gitexecdir,
> diff --git a/builtin.h b/builtin.h
> index 6538932e99..01ed43ea69 100644
> --- a/builtin.h
> +++ b/builtin.h
> @@ -214,6 +214,7 @@ extern int cmd_remote_fd(int argc, const char **argv, const char *prefix);
>  extern int cmd_repack(int argc, const char **argv, const char *prefix);
>  extern int cmd_rerere(int argc, const char **argv, const char *prefix);
>  extern int cmd_reset(int argc, const char **argv, const char *prefix);
> +extern int cmd_restore_files(int argc, const char **argv, const char *prefix);
>  extern int cmd_rev_list(int argc, const char **argv, const char *prefix);
>  extern int cmd_rev_parse(int argc, const char **argv, const char *prefix);
>  extern int cmd_revert(int argc, const char **argv, const char *prefix);
> @@ -227,6 +228,7 @@ extern int cmd_show_index(int argc, const char **argv, const char *prefix);
>  extern int cmd_status(int argc, const char **argv, const char *prefix);
>  extern int cmd_stripspace(int argc, const char **argv, const char *prefix);
>  extern int cmd_submodule__helper(int argc, const char **argv, const char *prefix);
> +extern int cmd_switch_branch(int argc, const char **argv, const char *prefix);
>  extern int cmd_symbolic_ref(int argc, const char **argv, const char *prefix);
>  extern int cmd_tag(int argc, const char **argv, const char *prefix);
>  extern int cmd_tar_tree(int argc, const char **argv, const char *prefix);
> diff --git a/builtin/checkout.c b/builtin/checkout.c
> index 764e1a83a1..7dc0f4d3f3 100644
> --- a/builtin/checkout.c
> +++ b/builtin/checkout.c
> @@ -33,6 +33,16 @@ static const char * const checkout_usage[] = {
>         NULL,
>  };
>
> +static const char * const switch_branch_usage[] = {
> +       N_("git switch-branch [<options>] [<branch>]"),
> +       NULL,
> +};
> +
> +static const char * const restore_files_usage[] = {
> +       N_("git restore-files [<options>] [<branch>] -- <file>..."),
> +       NULL,
> +};
> +
>  struct checkout_opts {
>         int patch_mode;
>         int quiet;
> @@ -1302,31 +1312,23 @@ static struct option *add_checkout_path_options(struct checkout_opts *opts,
>         return newopts;
>  }
>
> -int cmd_checkout(int argc, const char **argv, const char *prefix)
> +static int checkout_main(int argc, const char **argv, const char *prefix,
> +                        struct checkout_opts *opts, struct option *options,
> +                        const char * const usagestr[])
>  {
> -       struct checkout_opts real_opts;
> -       struct checkout_opts *opts = &real_opts;
>         struct branch_info new_branch_info;
>         int dwim_remotes_matched = 0;
> -       struct option *options = NULL;
>
> -       memset(opts, 0, sizeof(*opts));
>         memset(&new_branch_info, 0, sizeof(new_branch_info));
>         opts->overwrite_ignore = 1;
>         opts->prefix = prefix;
>         opts->show_progress = -1;
> -       opts->dwim_new_local_branch = 1;
>
>         git_config(git_checkout_config, opts);
>
>         opts->track = BRANCH_TRACK_UNSPECIFIED;
>
> -       options = parse_options_dup(options);
> -       options = add_common_options(opts, options);
> -       options = add_switch_branch_options(opts, options);
> -       options = add_checkout_path_options(opts, options);
> -
> -       argc = parse_options(argc, argv, prefix, options, checkout_usage,
> +       argc = parse_options(argc, argv, prefix, options, usagestr,
>                              PARSE_OPT_KEEP_DASHDASH);
>
>         if (opts->show_progress < 0) {
> @@ -1455,3 +1457,61 @@ int cmd_checkout(int argc, const char **argv, const char *prefix)
>                 return checkout_branch(opts, &new_branch_info);
>         }
>  }
> +
> +int cmd_checkout(int argc, const char **argv, const char *prefix)
> +{
> +       struct checkout_opts opts;
> +       struct option *options = NULL;
> +       int ret;
> +
> +       memset(&opts, 0, sizeof(opts));
> +       opts.dwim_new_local_branch = 1;
> +
> +       options = parse_options_dup(options);
> +       options = add_common_options(&opts, options);
> +       options = add_switch_branch_options(&opts, options);
> +       options = add_checkout_path_options(&opts, options);
> +
> +       ret = checkout_main(argc, argv, prefix, &opts,
> +                           options, checkout_usage);
> +       FREE_AND_NULL(options);
> +       return ret;
> +}
> +
> +int cmd_switch_branch(int argc, const char **argv, const char *prefix)
> +{
> +       struct checkout_opts opts;
> +       struct option *options = NULL;
> +       int ret;
> +
> +       memset(&opts, 0, sizeof(opts));
> +       opts.dwim_new_local_branch = 1;
> +
> +       options = parse_options_dup(options);
> +       options = add_common_options(&opts, options);
> +       options = add_switch_branch_options(&opts, options);
> +
> +       ret = checkout_main(argc, argv, prefix, &opts,
> +                           options, switch_branch_usage);
> +       FREE_AND_NULL(options);
> +       return ret;
> +}
> +
> +int cmd_restore_files(int argc, const char **argv, const char *prefix)
> +{
> +       struct checkout_opts opts;
> +       struct option *options = NULL;
> +       int ret;
> +
> +       memset(&opts, 0, sizeof(opts));
> +       opts.dwim_new_local_branch = 1;
> +
> +       options = parse_options_dup(options);
> +       options = add_common_options(&opts, options);
> +       options = add_checkout_path_options(&opts, options);
> +
> +       ret = checkout_main(argc, argv, prefix, &opts,
> +                           options, restore_files_usage);
> +       FREE_AND_NULL(options);
> +       return ret;
> +}
> diff --git a/command-list.txt b/command-list.txt
> index 3a9af104b5..4638802754 100644
> --- a/command-list.txt
> +++ b/command-list.txt
> @@ -151,6 +151,7 @@ git-replace                             ancillarymanipulators           complete
>  git-request-pull                        foreignscminterface             complete
>  git-rerere                              ancillaryinterrogators
>  git-reset                               mainporcelain           worktree
> +git-restore-files                       mainporcelain           worktree
>  git-revert                              mainporcelain
>  git-rev-list                            plumbinginterrogators
>  git-rev-parse                           plumbinginterrogators
> @@ -171,6 +172,7 @@ git-status                              mainporcelain           info
>  git-stripspace                          purehelpers
>  git-submodule                           mainporcelain
>  git-svn                                 foreignscminterface
> +git-switch-branch                       mainporcelain           history
>  git-symbolic-ref                        plumbingmanipulators
>  git-tag                                 mainporcelain           history
>  git-unpack-file                         plumbinginterrogators
> diff --git a/git.c b/git.c
> index 2f604a41ea..a2be6c3eb5 100644
> --- a/git.c
> +++ b/git.c
> @@ -542,6 +542,7 @@ static struct cmd_struct commands[] = {
>         { "replace", cmd_replace, RUN_SETUP },
>         { "rerere", cmd_rerere, RUN_SETUP },
>         { "reset", cmd_reset, RUN_SETUP },
> +       { "restore-files", cmd_restore_files, RUN_SETUP | NEED_WORK_TREE },
>         { "rev-list", cmd_rev_list, RUN_SETUP | NO_PARSEOPT },
>         { "rev-parse", cmd_rev_parse, NO_PARSEOPT },
>         { "revert", cmd_revert, RUN_SETUP | NEED_WORK_TREE },
> @@ -557,6 +558,7 @@ static struct cmd_struct commands[] = {
>         { "status", cmd_status, RUN_SETUP | NEED_WORK_TREE },
>         { "stripspace", cmd_stripspace },
>         { "submodule--helper", cmd_submodule__helper, RUN_SETUP | SUPPORT_SUPER_PREFIX | NO_PARSEOPT },
> +       { "switch-branch", cmd_switch_branch, RUN_SETUP | NEED_WORK_TREE },
>         { "symbolic-ref", cmd_symbolic_ref, RUN_SETUP },
>         { "tag", cmd_tag, RUN_SETUP | DELAY_PAGER_CONFIG },
>         { "unpack-file", cmd_unpack_file, RUN_SETUP | NO_PARSEOPT },
> --
> 2.20.0.rc1.380.g3eb999425c.dirty
>

^ permalink raw reply

* Re: [WIP RFC 3/5] upload-pack: refactor reading of pack-objects out
From: Stefan Beller @ 2018-12-04  0:30 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git
In-Reply-To: <e19f294df9ff999d30a47339a7848c7104bfae7d.1543879256.git.jonathantanmy@google.com>

On Mon, Dec 3, 2018 at 3:37 PM Jonathan Tan <jonathantanmy@google.com> wrote:
>
> Subsequent patches will change how the output of pack-objects is
> processed, so extract that processing into its own function.
>
> Currently, at most 1 character can be buffered (in the "buffered" local
> variable). One of those patches will require a larger buffer, so replace
> that "buffered" local variable with a buffer array.

This buffering sounds oddly similar to the pkt reader which can buffer
at most one pkt, the difference being that we'd buffer bytes
instead of pkts.

^ permalink raw reply

* Re: [WIP RFC 2/5] Documentation: add Packfile URIs design doc
From: Stefan Beller @ 2018-12-04  0:21 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git
In-Reply-To: <0461b362569362c6d0e73951469c547a03a1b59d.1543879256.git.jonathantanmy@google.com>

Thanks for bringing this design to the list!

> diff --git a/Documentation/technical/protocol-v2.txt b/Documentation/technical/protocol-v2.txt
> index 345c00e08c..2cb1c41742 100644
> --- a/Documentation/technical/protocol-v2.txt
> +++ b/Documentation/technical/protocol-v2.txt
> @@ -313,7 +313,8 @@ header. Most sections are sent only when the packfile is sent.
>
>      output = acknowledgements flush-pkt |
>              [acknowledgments delim-pkt] [shallow-info delim-pkt]
> -            [wanted-refs delim-pkt] packfile flush-pkt
> +            [wanted-refs delim-pkt] [packfile-uris delim-pkt]
> +            packfile flush-pkt

While this is an RFC and incomplete, we'd need to remember to
add packfile-uris to the capabilities list above, stating that it requires
thin-pack and ofs-delta to be sent, and what to expect from it.

The mention of  --no-packfile-urls in the Client design above
seems to imply we'd want to turn it on by default, which I thought
was not the usual stance how we introduce new things.

An odd way of disabling it would be --no-thin-pack, hoping the
client side implementation abides by the implied requirements.

>      acknowledgments = PKT-LINE("acknowledgments" LF)
>                       (nak | *ack)
> @@ -331,6 +332,9 @@ header. Most sections are sent only when the packfile is sent.
>                   *PKT-LINE(wanted-ref LF)
>      wanted-ref = obj-id SP refname
>
> +    packfile-uris = PKT-LINE("packfile-uris" LF) *packfile-uri
> +    packfile-uri = PKT-LINE("uri" SP *%x20-ff LF)

Is the *%x20-ff a fancy way of saying obj-id?

While the server is configured with pairs of (oid URL),
we would not need to send the exact oid to the client
as that is what the client can figure out on its own by reading
the downloaded pack.

Instead we could send an integrity hash (i.e. the packfile
downloaded from "uri" is expected to hash to $oid here)

Thanks,
Stefan

^ permalink raw reply

* Re: [WIP RFC 0/5] Design for offloading part of packfile response to CDN
From: Stefan Beller @ 2018-12-04  0:01 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git
In-Reply-To: <cover.1543879256.git.jonathantanmy@google.com>

On Mon, Dec 3, 2018 at 3:37 PM Jonathan Tan <jonathantanmy@google.com> wrote:

>
> There is a potential issue: a server which produces both the URIs and
> the packfile at roughly the same time (like the implementation in this
> patch set) will not have sideband access until it has concluded sending
> the URIs. Among other things, this means that the server cannot send
> keepalive packets until quite late in the response. One solution to this
> might be to add a feature that allows the server to use a sideband
> throughout the whole response - and this has other benefits too like
> allowing servers to inform the client throughout the whole fetch, not
> just at the end.

While side band sounds like the right thing to do, we could also
sending (NULL)-URIs within this feature.

^ permalink raw reply

* Re: [PATCH] pack-protocol.txt: accept error packets in any context
From: Stefan Beller @ 2018-12-03 23:59 UTC (permalink / raw)
  To: Masaya Suzuki; +Cc: git
In-Reply-To: <20181127045301.103807-1-masayasuzuki@google.com>

> diff --git a/pkt-line.c b/pkt-line.c
> index 04d10bbd0..ce9e42d10 100644
> --- a/pkt-line.c
> +++ b/pkt-line.c
> @@ -346,6 +346,10 @@ enum packet_read_status packet_read_with_status(int fd, char **src_buffer,
>                 return PACKET_READ_EOF;
>         }
>
> +       if (starts_with(buffer, "ERR ")) {
> +               die(_("remote error: %s"), buffer + 4);
> +       }
> +

Handling any ERR line in the pkt reader is okay, as
* we do not pkt-ize pack files and
* we do not have any other parts of the protocol where user data is in
  the first four bytes, which could randomly match this pattern and
* the rest of the pkt-ized part of the protocol never sends
  ERR lines.

Makes sense.

^ permalink raw reply

* Re: [PATCH] sideband: color lines with keyword only
From: Jonathan Nieder @ 2018-12-03 23:42 UTC (permalink / raw)
  To: Stefan Beller; +Cc: Junio C Hamano, git, Han-Wen Nienhuys
In-Reply-To: <CAGZ79kY0w7Zt0Z4KNu7qL4Lz8fFpv2p51D-w_MgZBYPqPFbZKw@mail.gmail.com>

Stefan Beller wrote:
> On Mon, Dec 3, 2018 at 3:23 PM Jonathan Nieder <jrnieder@gmail.com> wrote:

>> I was curious about what versions of Gerrit this is designed to
>> support (or in other words whether it's a bug fix or a feature).
>> Looking at examples like [1], it seems that Gerrit historically always
>> used "ERROR:" so the 59a255aef0 logic would work for it.  More
>> recently, [2] (ReceiveCommits: add a "SUCCESS" marker for successful
>> change updates, 2018-08-21) put SUCCESS on a line of its own.  That
>> puts this squarely in the new-feature category.
>
> Ooops. From the internal bug, I assumed this to be long standing Gerrit
> behavior, which is why I sent it out in -rc to begin with.

No worries.  Can't hurt for Junio to have a few patches to apply to
"pu" or "next" to practice using the release candidates. :)

[...]
>> In the old code, we would escape early if 'n == len', but we didn't
>> need to.  If 'n == len', then
>>
>>         src[len] == '\0'
>
> src[len] could also be one of "\n\r", see the caller
> recv_sideband for sidebase case 2.

Yes, I noticed too late[*].  Sorry for the noise.

The patch still looks good.

Jonathan

[*] https://public-inbox.org/git/20181203233439.GB157301@google.com/

^ permalink raw reply

* [WIP RFC 5/5] upload-pack: send part of packfile response as uri
From: Jonathan Tan @ 2018-12-03 23:37 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan
In-Reply-To: <cover.1543879256.git.jonathantanmy@google.com>

This is a partial implementation of upload-pack sending part of its
packfile response as URIs.

The client is not fully implemented - it knows to ignore the
"packfile-uris" section, but because it does not actually fetch those
URIs, the returned packfile is incomplete. A test is included to show
that the appropriate URI is indeed transmitted, and that the returned
packfile is lacking exactly the expected object.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
 builtin/pack-objects.c | 48 ++++++++++++++++++++++++++++++++++++++++++
 fetch-pack.c           |  9 ++++++++
 t/t5702-protocol-v2.sh | 25 ++++++++++++++++++++++
 upload-pack.c          | 37 ++++++++++++++++++++++++++++----
 4 files changed, 115 insertions(+), 4 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index e7ea206c08..2abbddd3cb 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -117,6 +117,15 @@ enum missing_action {
 static enum missing_action arg_missing_action;
 static show_object_fn fn_show_object;
 
+struct configured_exclusion {
+	struct oidmap_entry e;
+	char *uri;
+};
+static struct oidmap configured_exclusions;
+
+static int exclude_configured_blobs;
+static struct oidset excluded_by_config;
+
 /*
  * stats
  */
@@ -831,6 +840,23 @@ static off_t write_reused_pack(struct hashfile *f)
 	return reuse_packfile_offset - sizeof(struct pack_header);
 }
 
+static void write_excluded_by_configs(void)
+{
+	struct oidset_iter iter;
+	const struct object_id *oid;
+
+	oidset_iter_init(&excluded_by_config, &iter);
+	while ((oid = oidset_iter_next(&iter))) {
+		struct configured_exclusion *ex =
+			oidmap_get(&configured_exclusions, oid);
+
+		if (!ex)
+			BUG("configured exclusion wasn't configured");
+		write_in_full(1, ex->uri, strlen(ex->uri));
+		write_in_full(1, "\n", 1);
+	}
+}
+
 static const char no_split_warning[] = N_(
 "disabling bitmap writing, packs are split due to pack.packSizeLimit"
 );
@@ -1124,6 +1150,12 @@ static int want_object_in_pack(const struct object_id *oid,
 		}
 	}
 
+	if (exclude_configured_blobs &&
+	    oidmap_get(&configured_exclusions, oid)) {
+		oidset_insert(&excluded_by_config, oid);
+		return 0;
+	}
+
 	return 1;
 }
 
@@ -2728,6 +2760,19 @@ static int git_pack_config(const char *k, const char *v, void *cb)
 			    pack_idx_opts.version);
 		return 0;
 	}
+	if (!strcmp(k, "uploadpack.blobpackfileuri")) {
+		struct configured_exclusion *ex = xmalloc(sizeof(*ex));
+		const char *end;
+
+		if (parse_oid_hex(v, &ex->e.oid, &end) || *end != ' ')
+			die(_("value of uploadpack.blobpackfileuri must be "
+			      "of the form '<sha-1> <uri>' (got '%s')"), v);
+		if (oidmap_get(&configured_exclusions, &ex->e.oid))
+			die(_("object already configured in another "
+			      "uploadpack.blobpackfileuri (got '%s')"), v);
+		ex->uri = xstrdup(end + 1);
+		oidmap_put(&configured_exclusions, ex);
+	}
 	return git_default_config(k, v, cb);
 }
 
@@ -3314,6 +3359,8 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
 			 N_("do not pack objects in promisor packfiles")),
 		OPT_BOOL(0, "delta-islands", &use_delta_islands,
 			 N_("respect islands during delta compression")),
+		OPT_BOOL(0, "exclude-configured-blobs", &exclude_configured_blobs,
+			 N_("respect uploadpack.blobpackfileuri")),
 		OPT_END(),
 	};
 
@@ -3487,6 +3534,7 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
 		return 0;
 	if (nr_result)
 		prepare_pack(window, depth);
+	write_excluded_by_configs();
 	write_pack_file();
 	if (progress)
 		fprintf_ln(stderr,
diff --git a/fetch-pack.c b/fetch-pack.c
index 9691046e64..6e1985ab55 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -1413,6 +1413,15 @@ static struct ref *do_fetch_pack_v2(struct fetch_pack_args *args,
 				receive_wanted_refs(&reader, sought, nr_sought);
 
 			/* get the pack */
+			if (process_section_header(&reader, "packfile-uris", 1)) {
+				/* skip the whole section */
+				process_section_header(&reader, "packfile-uris", 0);
+				while (packet_reader_read(&reader) == PACKET_READ_NORMAL) {
+					/* do nothing */
+				}
+				if (reader.status != PACKET_READ_DELIM)
+					die("expected DELIM");
+			}
 			process_section_header(&reader, "packfile", 0);
 			if (get_pack(args, fd, pack_lockfile))
 				die(_("git fetch-pack: fetch failed."));
diff --git a/t/t5702-protocol-v2.sh b/t/t5702-protocol-v2.sh
index 0f2b09ebb8..ccb1fc510e 100755
--- a/t/t5702-protocol-v2.sh
+++ b/t/t5702-protocol-v2.sh
@@ -588,6 +588,31 @@ test_expect_success 'when server does not send "ready", expect FLUSH' '
 	test_i18ngrep "expected no other sections to be sent after no .ready." err
 '
 
+test_expect_success 'part of packfile response provided as URI' '
+	rm -rf "$HTTPD_DOCUMENT_ROOT_PATH/http_parent" http_child log &&
+
+	git init "$HTTPD_DOCUMENT_ROOT_PATH/http_parent" &&
+	echo my-blob >"$HTTPD_DOCUMENT_ROOT_PATH/http_parent/my-blob" &&
+	git -C "$HTTPD_DOCUMENT_ROOT_PATH/http_parent/" add my-blob &&
+	git -C "$HTTPD_DOCUMENT_ROOT_PATH/http_parent/" commit -m x &&
+
+	git -C "$HTTPD_DOCUMENT_ROOT_PATH/http_parent/" hash-object my-blob >h &&
+	git -C "$HTTPD_DOCUMENT_ROOT_PATH/http_parent/" config \
+		"uploadpack.blobpackfileuri" \
+		"$(cat h) https://example.com/a-uri" &&
+
+	# NEEDSWORK: "git clone" fails here because it ignores the URI provided
+	# instead of fetching it.
+	test_must_fail env GIT_TRACE_PACKET="$(pwd)/log" \
+		git -c protocol.version=2 clone \
+		"$HTTPD_URL/smart/http_parent" http_child 2>err &&
+	# Although "git clone" fails, we can still check that the server
+	# provided the URI we requested and that the error message pinpoints
+	# the object that is missing.
+	grep "clone< uri https://example.com/a-uri" log &&
+	test_i18ngrep "did not receive expected object $(cat h)" err
+'
+
 stop_httpd
 
 test_done
diff --git a/upload-pack.c b/upload-pack.c
index aa2589b858..a8eef697ec 100644
--- a/upload-pack.c
+++ b/upload-pack.c
@@ -104,6 +104,7 @@ static int write_one_shallow(const struct commit_graft *graft, void *cb_data)
 struct output_state {
 	char buffer[8193];
 	int used;
+	unsigned packfile_uris_started : 1;
 	unsigned packfile_started : 1;
 	struct strbuf progress_buf;
 };
@@ -129,10 +130,35 @@ static int read_pack_objects_stdout(int outfd, struct output_state *os,
 	}
 	os->used += readsz;
 
-	if (!os->packfile_started) {
-		os->packfile_started = 1;
-		if (use_protocol_v2)
-			packet_write_fmt(1, "packfile\n");
+	while (!os->packfile_started) {
+		char *p;
+		if (os->used >= 4 && !memcmp(os->buffer, "PACK", 4)) {
+			os->packfile_started = 1;
+			if (use_protocol_v2) {
+				if (os->packfile_uris_started)
+					packet_delim(1);
+				packet_write_fmt(1, "packfile\n");
+			}
+			break;
+		}
+		if ((p = memchr(os->buffer, '\n', os->used))) {
+			if (!os->packfile_uris_started) {
+				os->packfile_uris_started = 1;
+				if (!use_protocol_v2)
+					BUG("packfile_uris requires protocol v2");
+				packet_write_fmt(1, "packfile-uris\n");
+			}
+			*p = '\0';
+			packet_write_fmt(1, "uri %s\n", os->buffer);
+
+			os->used -= p - os->buffer + 1;
+			memmove(os->buffer, p, os->used);
+		} else {
+			/*
+			 * Incomplete line.
+			 */
+			return readsz;
+		}
 	}
 
 	if (os->used > 1) {
@@ -205,6 +231,9 @@ static void create_pack_file(const struct object_array *have_obj,
 					 filter_options.filter_spec);
 		}
 	}
+	if (use_protocol_v2)
+		argv_array_push(&pack_objects.args,
+				"--exclude-configured-blobs");
 
 	pack_objects.in = -1;
 	pack_objects.out = -1;
-- 
2.19.0.271.gfe8321ec05.dirty


^ permalink raw reply related

* [WIP RFC 4/5] upload-pack: refactor writing of "packfile" line
From: Jonathan Tan @ 2018-12-03 23:37 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan
In-Reply-To: <cover.1543879256.git.jonathantanmy@google.com>

A subsequent patch allows pack-objects to output additional information
(in addition to the packfile that it currently outputs). This means that
we must hold off on writing the "packfile" section header to the client
before we process the output of pack-objects, so move the writing of
the "packfile" section header to read_pack_objects_stdout().

Unfortunately, this also means that we cannot send keepalive packets
until pack-objects starts sending out the packfile, since the sideband
is only established when the "packfile" section header is sent.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
 upload-pack.c | 47 ++++++++++++++++++++++++++++++++++++-----------
 1 file changed, 36 insertions(+), 11 deletions(-)

diff --git a/upload-pack.c b/upload-pack.c
index cec43e0a46..aa2589b858 100644
--- a/upload-pack.c
+++ b/upload-pack.c
@@ -104,9 +104,12 @@ static int write_one_shallow(const struct commit_graft *graft, void *cb_data)
 struct output_state {
 	char buffer[8193];
 	int used;
+	unsigned packfile_started : 1;
+	struct strbuf progress_buf;
 };
 
-static int read_pack_objects_stdout(int outfd, struct output_state *os)
+static int read_pack_objects_stdout(int outfd, struct output_state *os,
+				    int use_protocol_v2)
 {
 	/* Data ready; we keep the last byte to ourselves
 	 * in case we detect broken rev-list, so that we
@@ -126,6 +129,12 @@ static int read_pack_objects_stdout(int outfd, struct output_state *os)
 	}
 	os->used += readsz;
 
+	if (!os->packfile_started) {
+		os->packfile_started = 1;
+		if (use_protocol_v2)
+			packet_write_fmt(1, "packfile\n");
+	}
+
 	if (os->used > 1) {
 		send_client_data(1, os->buffer, os->used - 1);
 		os->buffer[0] = os->buffer[os->used - 1];
@@ -138,8 +147,17 @@ static int read_pack_objects_stdout(int outfd, struct output_state *os)
 	return readsz;
 }
 
+static void flush_progress_buf(struct strbuf *progress_buf)
+{
+	if (!progress_buf->len)
+		return;
+	send_client_data(2, progress_buf->buf, progress_buf->len);
+	strbuf_reset(progress_buf);
+}
+
 static void create_pack_file(const struct object_array *have_obj,
-			     const struct object_array *want_obj)
+			     const struct object_array *want_obj,
+			     int use_protocol_v2)
 {
 	struct child_process pack_objects = CHILD_PROCESS_INIT;
 	struct output_state output_state = {0};
@@ -260,9 +278,13 @@ static void create_pack_file(const struct object_array *have_obj,
 			 */
 			sz = xread(pack_objects.err, progress,
 				  sizeof(progress));
-			if (0 < sz)
-				send_client_data(2, progress, sz);
-			else if (sz == 0) {
+			if (0 < sz) {
+				if (output_state.packfile_started)
+					send_client_data(2, progress, sz);
+				else
+					strbuf_add(&output_state.progress_buf,
+						   progress, sz);
+			} else if (sz == 0) {
 				close(pack_objects.err);
 				pack_objects.err = -1;
 			}
@@ -273,8 +295,10 @@ static void create_pack_file(const struct object_array *have_obj,
 		}
 		if (0 <= pu && (pfd[pu].revents & (POLLIN|POLLHUP))) {
 			int result = read_pack_objects_stdout(pack_objects.out,
-							      &output_state);
-
+							      &output_state,
+							      use_protocol_v2);
+			if (output_state.packfile_started)
+				flush_progress_buf(&output_state.progress_buf);
 			if (result == 0) {
 				close(pack_objects.out);
 				pack_objects.out = -1;
@@ -293,12 +317,14 @@ static void create_pack_file(const struct object_array *have_obj,
 		 * protocol to say anything, so those clients are just out of
 		 * luck.
 		 */
-		if (!ret && use_sideband) {
+		if (!ret && use_sideband && output_state.packfile_started) {
 			static const char buf[] = "0005\1";
 			write_or_die(1, buf, 5);
 		}
 	}
 
+	flush_progress_buf(&output_state.progress_buf);
+
 	if (finish_command(&pack_objects)) {
 		error("git upload-pack: git-pack-objects died with error.");
 		goto fail;
@@ -1094,7 +1120,7 @@ void upload_pack(struct upload_pack_options *options)
 	if (want_obj.nr) {
 		struct object_array have_obj = OBJECT_ARRAY_INIT;
 		get_common_commits(&have_obj, &want_obj);
-		create_pack_file(&have_obj, &want_obj);
+		create_pack_file(&have_obj, &want_obj, 0);
 	}
 }
 
@@ -1475,8 +1501,7 @@ int upload_pack_v2(struct repository *r, struct argv_array *keys,
 			send_wanted_ref_info(&data);
 			send_shallow_info(&data, &want_obj);
 
-			packet_write_fmt(1, "packfile\n");
-			create_pack_file(&have_obj, &want_obj);
+			create_pack_file(&have_obj, &want_obj, 1);
 			state = FETCH_DONE;
 			break;
 		case FETCH_DONE:
-- 
2.19.0.271.gfe8321ec05.dirty


^ permalink raw reply related

* [WIP RFC 3/5] upload-pack: refactor reading of pack-objects out
From: Jonathan Tan @ 2018-12-03 23:37 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan
In-Reply-To: <cover.1543879256.git.jonathantanmy@google.com>

Subsequent patches will change how the output of pack-objects is
processed, so extract that processing into its own function.

Currently, at most 1 character can be buffered (in the "buffered" local
variable). One of those patches will require a larger buffer, so replace
that "buffered" local variable with a buffer array.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
 upload-pack.c | 80 +++++++++++++++++++++++++++++----------------------
 1 file changed, 46 insertions(+), 34 deletions(-)

diff --git a/upload-pack.c b/upload-pack.c
index 5e81f1ff24..cec43e0a46 100644
--- a/upload-pack.c
+++ b/upload-pack.c
@@ -101,14 +101,51 @@ static int write_one_shallow(const struct commit_graft *graft, void *cb_data)
 	return 0;
 }
 
+struct output_state {
+	char buffer[8193];
+	int used;
+};
+
+static int read_pack_objects_stdout(int outfd, struct output_state *os)
+{
+	/* Data ready; we keep the last byte to ourselves
+	 * in case we detect broken rev-list, so that we
+	 * can leave the stream corrupted.  This is
+	 * unfortunate -- unpack-objects would happily
+	 * accept a valid packdata with trailing garbage,
+	 * so appending garbage after we pass all the
+	 * pack data is not good enough to signal
+	 * breakage to downstream.
+	 */
+	ssize_t readsz;
+
+	readsz = xread(outfd, os->buffer + os->used,
+		       sizeof(os->buffer) - os->used);
+	if (readsz < 0) {
+		return readsz;
+	}
+	os->used += readsz;
+
+	if (os->used > 1) {
+		send_client_data(1, os->buffer, os->used - 1);
+		os->buffer[0] = os->buffer[os->used - 1];
+		os->used = 1;
+	} else {
+		send_client_data(1, os->buffer, os->used);
+		os->used = 0;
+	}
+
+	return readsz;
+}
+
 static void create_pack_file(const struct object_array *have_obj,
 			     const struct object_array *want_obj)
 {
 	struct child_process pack_objects = CHILD_PROCESS_INIT;
-	char data[8193], progress[128];
+	struct output_state output_state = {0};
+	char progress[128];
 	char abort_msg[] = "aborting due to possible repository "
 		"corruption on the remote side.";
-	int buffered = -1;
 	ssize_t sz;
 	int i;
 	FILE *pipe_fd;
@@ -235,39 +272,15 @@ static void create_pack_file(const struct object_array *have_obj,
 			continue;
 		}
 		if (0 <= pu && (pfd[pu].revents & (POLLIN|POLLHUP))) {
-			/* Data ready; we keep the last byte to ourselves
-			 * in case we detect broken rev-list, so that we
-			 * can leave the stream corrupted.  This is
-			 * unfortunate -- unpack-objects would happily
-			 * accept a valid packdata with trailing garbage,
-			 * so appending garbage after we pass all the
-			 * pack data is not good enough to signal
-			 * breakage to downstream.
-			 */
-			char *cp = data;
-			ssize_t outsz = 0;
-			if (0 <= buffered) {
-				*cp++ = buffered;
-				outsz++;
-			}
-			sz = xread(pack_objects.out, cp,
-				  sizeof(data) - outsz);
-			if (0 < sz)
-				;
-			else if (sz == 0) {
+			int result = read_pack_objects_stdout(pack_objects.out,
+							      &output_state);
+
+			if (result == 0) {
 				close(pack_objects.out);
 				pack_objects.out = -1;
-			}
-			else
+			} else if (result < 0) {
 				goto fail;
-			sz += outsz;
-			if (1 < sz) {
-				buffered = data[sz-1] & 0xFF;
-				sz--;
 			}
-			else
-				buffered = -1;
-			send_client_data(1, data, sz);
 		}
 
 		/*
@@ -292,9 +305,8 @@ static void create_pack_file(const struct object_array *have_obj,
 	}
 
 	/* flush the data */
-	if (0 <= buffered) {
-		data[0] = buffered;
-		send_client_data(1, data, 1);
+	if (output_state.used > 0) {
+		send_client_data(1, output_state.buffer, output_state.used);
 		fprintf(stderr, "flushed.\n");
 	}
 	if (use_sideband)
-- 
2.19.0.271.gfe8321ec05.dirty


^ permalink raw reply related

* [WIP RFC 2/5] Documentation: add Packfile URIs design doc
From: Jonathan Tan @ 2018-12-03 23:37 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan
In-Reply-To: <cover.1543879256.git.jonathantanmy@google.com>

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
 Documentation/technical/packfile-uri.txt | 83 ++++++++++++++++++++++++
 Documentation/technical/protocol-v2.txt  |  6 +-
 2 files changed, 88 insertions(+), 1 deletion(-)
 create mode 100644 Documentation/technical/packfile-uri.txt

diff --git a/Documentation/technical/packfile-uri.txt b/Documentation/technical/packfile-uri.txt
new file mode 100644
index 0000000000..6535801486
--- /dev/null
+++ b/Documentation/technical/packfile-uri.txt
@@ -0,0 +1,83 @@
+Packfile URIs
+=============
+
+This feature allows servers to serve part of their packfile response as URIs.
+This allows server designs that improve scalability in bandwidth and CPU usage
+(for example, by serving some data through a CDN), and (in the future) provides
+some measure of resumability to clients.
+
+This feature is available only in protocol version 2.
+
+Protocol
+--------
+
+The server advertises `packfile-uris`.
+
+If the client replies with the following arguments:
+
+ * packfile-uris
+ * thin-pack
+ * ofs-delta
+
+when the server sends the packfile, it MAY send a `packfile-uris` section
+directly before the `packfile` section (right after `wanted-refs` if it is
+sent) containing HTTP(S) URIs. See protocol-v2.txt for the documentation of
+this section.
+
+Clients then should understand that the returned packfile could be incomplete,
+and that it needs to download all the given URIs before the fetch or clone is
+complete. Each URI should point to a Git packfile (which may be a thin pack and
+which may contain offset deltas).
+
+Server design
+-------------
+
+The server can be trivially made compatible with the proposed protocol by
+having it advertise `packfile-uris`, tolerating the client sending
+`packfile-uris`, and never sending any `packfile-uris` section. But we should
+include some sort of non-trivial implementation in the Minimum Viable Product,
+at least so that we can test the client.
+
+This is the implementation: a feature, marked experimental, that allows the
+server to be configured by one or more `uploadpack.blobPackfileUri=<sha1>
+<uri>` entries. Whenever the list of objects to be sent is assembled, a blob
+with the given sha1 can be replaced by the given URI. This allows, for example,
+servers to delegate serving of large blobs to CDNs.
+
+Client design
+-------------
+
+While fetching, the client needs to remember the list of URIs and cannot
+declare that the fetch is complete until all URIs have been downloaded as
+packfiles.
+
+The division of work (initial fetch + additional URIs) introduces convenient
+points for resumption of an interrupted clone - such resumption can be done
+after the Minimum Viable Product (see "Future work").
+
+The client can inhibit this feature (i.e. refrain from sending the
+`packfile-urls` parameter) by passing --no-packfile-urls to `git fetch`.
+
+Future work
+-----------
+
+The protocol design allows some evolution of the server and client without any
+need for protocol changes, so only a small-scoped design is included here to
+form the MVP. For example, the following can be done:
+
+ * On the server, a long-running process that takes in entire requests and
+   outputs a list of URIs and the corresponding inclusion and exclusion sets of
+   objects. This allows, e.g., signed URIs to be used and packfiles for common
+   requests to be cached.
+ * On the client, resumption of clone. If a clone is interrupted, information
+   could be recorded in the repository's config and a "clone-resume" command
+   can resume the clone in progress. (Resumption of subsequent fetches is more
+   difficult because that must deal with the user wanting to use the repository
+   even after the fetch was interrupted.)
+
+There are some possible features that will require a change in protocol:
+
+ * Additional HTTP headers (e.g. authentication)
+ * Byte range support
+ * Different file formats referenced by URIs (e.g. raw object)
+
diff --git a/Documentation/technical/protocol-v2.txt b/Documentation/technical/protocol-v2.txt
index 345c00e08c..2cb1c41742 100644
--- a/Documentation/technical/protocol-v2.txt
+++ b/Documentation/technical/protocol-v2.txt
@@ -313,7 +313,8 @@ header. Most sections are sent only when the packfile is sent.
 
     output = acknowledgements flush-pkt |
 	     [acknowledgments delim-pkt] [shallow-info delim-pkt]
-	     [wanted-refs delim-pkt] packfile flush-pkt
+	     [wanted-refs delim-pkt] [packfile-uris delim-pkt]
+	     packfile flush-pkt
 
     acknowledgments = PKT-LINE("acknowledgments" LF)
 		      (nak | *ack)
@@ -331,6 +332,9 @@ header. Most sections are sent only when the packfile is sent.
 		  *PKT-LINE(wanted-ref LF)
     wanted-ref = obj-id SP refname
 
+    packfile-uris = PKT-LINE("packfile-uris" LF) *packfile-uri
+    packfile-uri = PKT-LINE("uri" SP *%x20-ff LF)
+
     packfile = PKT-LINE("packfile" LF)
 	       *PKT-LINE(%x01-03 *%x00-ff)
 
-- 
2.19.0.271.gfe8321ec05.dirty


^ permalink raw reply related

* [WIP RFC 1/5] Documentation: order protocol v2 sections
From: Jonathan Tan @ 2018-12-03 23:37 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan
In-Reply-To: <cover.1543879256.git.jonathantanmy@google.com>

The git command line expects Git servers to follow a specific order of
sections when transmitting protocol v2 responses, but this is not
explicit in the documentation. Make the order explicit.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
 Documentation/technical/protocol-v2.txt | 18 ++++++++----------
 1 file changed, 8 insertions(+), 10 deletions(-)

diff --git a/Documentation/technical/protocol-v2.txt b/Documentation/technical/protocol-v2.txt
index 09e4e0273f..345c00e08c 100644
--- a/Documentation/technical/protocol-v2.txt
+++ b/Documentation/technical/protocol-v2.txt
@@ -309,11 +309,11 @@ the 'wanted-refs' section in the server's response as explained below.
 
 The response of `fetch` is broken into a number of sections separated by
 delimiter packets (0001), with each section beginning with its section
-header.
+header. Most sections are sent only when the packfile is sent.
 
-    output = *section
-    section = (acknowledgments | shallow-info | wanted-refs | packfile)
-	      (flush-pkt | delim-pkt)
+    output = acknowledgements flush-pkt |
+	     [acknowledgments delim-pkt] [shallow-info delim-pkt]
+	     [wanted-refs delim-pkt] packfile flush-pkt
 
     acknowledgments = PKT-LINE("acknowledgments" LF)
 		      (nak | *ack)
@@ -335,9 +335,10 @@ header.
 	       *PKT-LINE(%x01-03 *%x00-ff)
 
     acknowledgments section
-	* If the client determines that it is finished with negotiations
-	  by sending a "done" line, the acknowledgments sections MUST be
-	  omitted from the server's response.
+	* If the client determines that it is finished with negotiations by
+	  sending a "done" line (thus requiring the server to send a packfile),
+	  the acknowledgments sections MUST be omitted from the server's
+	  response.
 
 	* Always begins with the section header "acknowledgments"
 
@@ -388,9 +389,6 @@ header.
 	  which the client has not indicated was shallow as a part of
 	  its request.
 
-	* This section is only included if a packfile section is also
-	  included in the response.
-
     wanted-refs section
 	* This section is only included if the client has requested a
 	  ref using a 'want-ref' line and if a packfile section is also
-- 
2.19.0.271.gfe8321ec05.dirty


^ permalink raw reply related

* [WIP RFC 0/5] Design for offloading part of packfile response to CDN
From: Jonathan Tan @ 2018-12-03 23:37 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan

Some of us have been working on a design to improve the scalability of
Git servers by allowing them to offload part of the packfile response to
CDNs in this way: returning HTTP(S) URIs in fetch responses in addition
to packfiles.

This can reduce the load on individual Git servers and improves
proximity (by having data served from closer to the user).

I have included here a design document (patch 2) and a rough
implementation of the server (patch 5). Currently, the implementation
only allows replacing single blobs with URIs, but the protocol
improvement is designed in such a way as to allow independent
improvement of Git server implementations.

There is a potential issue: a server which produces both the URIs and
the packfile at roughly the same time (like the implementation in this
patch set) will not have sideband access until it has concluded sending
the URIs. Among other things, this means that the server cannot send
keepalive packets until quite late in the response. One solution to this
might be to add a feature that allows the server to use a sideband
throughout the whole response - and this has other benefits too like
allowing servers to inform the client throughout the whole fetch, not
just at the end.

Jonathan Tan (5):
  Documentation: order protocol v2 sections
  Documentation: add Packfile URIs design doc
  upload-pack: refactor reading of pack-objects out
  upload-pack: refactor writing of "packfile" line
  upload-pack: send part of packfile response as uri

 Documentation/technical/packfile-uri.txt |  83 +++++++++++++
 Documentation/technical/protocol-v2.txt  |  22 ++--
 builtin/pack-objects.c                   |  48 ++++++++
 fetch-pack.c                             |   9 ++
 t/t5702-protocol-v2.sh                   |  25 ++++
 upload-pack.c                            | 150 ++++++++++++++++-------
 6 files changed, 285 insertions(+), 52 deletions(-)
 create mode 100644 Documentation/technical/packfile-uri.txt

-- 
2.19.0.271.gfe8321ec05.dirty


^ permalink raw reply

* Re: [PATCH] sideband: color lines with keyword only
From: Stefan Beller @ 2018-12-03 23:35 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: Junio C Hamano, git, Han-Wen Nienhuys
In-Reply-To: <20181203232353.GA157301@google.com>

On Mon, Dec 3, 2018 at 3:23 PM Jonathan Nieder <jrnieder@gmail.com> wrote:

> I was curious about what versions of Gerrit this is designed to
> support (or in other words whether it's a bug fix or a feature).
> Looking at examples like [1], it seems that Gerrit historically always
> used "ERROR:" so the 59a255aef0 logic would work for it.  More
> recently, [2] (ReceiveCommits: add a "SUCCESS" marker for successful
> change updates, 2018-08-21) put SUCCESS on a line of its own.  That
> puts this squarely in the new-feature category.

Ooops. From the internal bug, I assumed this to be long standing Gerrit
behavior, which is why I sent it out in -rc to begin with.

> > --- a/sideband.c
> > +++ b/sideband.c
> > @@ -87,7 +87,7 @@ static void maybe_colorize_sideband(struct strbuf *dest, const char *src, int n)
> >               struct keyword_entry *p = keywords + i;
> >               int len = strlen(p->keyword);
> >
> > -             if (n <= len)
> > +             if (n < len)
> >                       continue;
>
> In the old code, we would escape early if 'n == len', but we didn't
> need to.  If 'n == len', then
>
>         src[len] == '\0'

src[len] could also be one of "\n\r", see the caller
recv_sideband for sidebase case 2.

>         src .. &src[len-1] is a valid buffer to read from
>
> so the strncasecmp and strbuf_add operations used in this function are
> valid.  Good.

Yes, they are all valid...

> > -             if (!strncasecmp(p->keyword, src, len) && !isalnum(src[len])) {
> > +             if (!strncasecmp(p->keyword, src, len) &&
> > +                 (len == n || !isalnum(src[len]))) {
>
> Our custom isalnum treats '\0' as not alphanumeric (sane_ctype[0] ==
> GIT_CNTRL) so this part of the patch is unnecessary.  That said, it's
> good for clarity and defensive programming.

... but here we need to check for src[len] for validity.

I made no assumptions about isalnum, but rather needed to shortcut
the condition, as accessing src[len] would be out of bounds, no?

>
> >                       strbuf_addstr(dest, p->color);
> >                       strbuf_add(dest, src, len);

unlike here (or the rest of the block), where len is used correctly.

^ permalink raw reply

* Re: [PATCH] sideband: color lines with keyword only
From: Jonathan Nieder @ 2018-12-03 23:34 UTC (permalink / raw)
  To: Stefan Beller; +Cc: gitster, git, Han-Wen Nienhuys
In-Reply-To: <20181203232353.GA157301@google.com>

Jonathan Nieder wrote:
> Stefan Beller wrote:

>>  		/*
>>  		 * Match case insensitively, so we colorize output from existing
>> @@ -95,7 +95,8 @@ static void maybe_colorize_sideband(struct strbuf *dest, const char *src, int n)
>>  		 * messages. We only highlight the word precisely, so
>>  		 * "successful" stays uncolored.
>>  		 */
>> -		if (!strncasecmp(p->keyword, src, len) && !isalnum(src[len])) {
>> +		if (!strncasecmp(p->keyword, src, len) &&
>> +		    (len == n || !isalnum(src[len]))) {
>
> Our custom isalnum treats '\0' as not alphanumeric (sane_ctype[0] ==
> GIT_CNTRL) so this part of the patch is unnecessary.  That said, it's
> good for clarity and defensive programming.

Correction: I am being silly here.  src[len] can be '\0', '\n', or
'\r' --- it's not always '\0'.  And the contract of this function is
that src[len] could be anything.  Thanks for having handled it
correctly. :)

Jonathan

^ permalink raw reply

* Re: [PATCH] sideband: color lines with keyword only
From: Jonathan Nieder @ 2018-12-03 23:23 UTC (permalink / raw)
  To: Stefan Beller; +Cc: gitster, git, Han-Wen Nienhuys
In-Reply-To: <20181203223713.158394-1-sbeller@google.com>

Hi,

Stefan Beller wrote:

> When bf1a11f0a1 (sideband: highlight keywords in remote sideband output,
> 2018-08-07) was introduced, it was carefully considered which strings
> would be highlighted. However 59a255aef0 (sideband: do not read beyond
> the end of input, 2018-08-18) brought in a regression that the original
> did not test for. A line containing only the keyword and nothing else
> ("SUCCESS") should still be colored.
>
> Signed-off-by: Stefan Beller <sbeller@google.com>
> ---
>  sideband.c                          | 5 +++--
>  t/t5409-colorize-remote-messages.sh | 2 ++
>  2 files changed, 5 insertions(+), 2 deletions(-)

Thanks for writing this.

I was curious about what versions of Gerrit this is designed to
support (or in other words whether it's a bug fix or a feature).
Looking at examples like [1], it seems that Gerrit historically always
used "ERROR:" so the 59a255aef0 logic would work for it.  More
recently, [2] (ReceiveCommits: add a "SUCCESS" marker for successful
change updates, 2018-08-21) put SUCCESS on a line of its own.  That
puts this squarely in the new-feature category.

"success" on its own line is even less likely to be a false positive
than "success" followed by punctuation (for example a period marking
the end of a sentence).  So I like this change.

[1] https://gerrit-review.googlesource.com/c/gerrit/+/22361
[2] https://gerrit-review.googlesource.com/c/gerrit/+/193570

> diff --git a/sideband.c b/sideband.c
> index 368647acf8..7c3d33d3f8 100644
> --- a/sideband.c
> +++ b/sideband.c
> @@ -87,7 +87,7 @@ static void maybe_colorize_sideband(struct strbuf *dest, const char *src, int n)
>  		struct keyword_entry *p = keywords + i;
>  		int len = strlen(p->keyword);
>  
> -		if (n <= len)
> +		if (n < len)
>  			continue;

In the old code, we would escape early if 'n == len', but we didn't
need to.  If 'n == len', then

	src[len] == '\0'
	src .. &src[len-1] is a valid buffer to read from

so the strncasecmp and strbuf_add operations used in this function are
valid.  Good.

>  		/*
>  		 * Match case insensitively, so we colorize output from existing
> @@ -95,7 +95,8 @@ static void maybe_colorize_sideband(struct strbuf *dest, const char *src, int n)
>  		 * messages. We only highlight the word precisely, so
>  		 * "successful" stays uncolored.
>  		 */
> -		if (!strncasecmp(p->keyword, src, len) && !isalnum(src[len])) {
> +		if (!strncasecmp(p->keyword, src, len) &&
> +		    (len == n || !isalnum(src[len]))) {

Our custom isalnum treats '\0' as not alphanumeric (sane_ctype[0] ==
GIT_CNTRL) so this part of the patch is unnecessary.  That said, it's
good for clarity and defensive programming.

>  			strbuf_addstr(dest, p->color);
>  			strbuf_add(dest, src, len);
>  			strbuf_addstr(dest, GIT_COLOR_RESET);
> diff --git a/t/t5409-colorize-remote-messages.sh b/t/t5409-colorize-remote-messages.sh
> index f81b6813c0..2a8c449661 100755
> --- a/t/t5409-colorize-remote-messages.sh
> +++ b/t/t5409-colorize-remote-messages.sh
> @@ -17,6 +17,7 @@ test_expect_success 'setup' '
>  	echo " " "error: leading space"
>  	echo "    "
>  	echo Err
> +	echo SUCCESS
>  	exit 0
>  	EOF
>  	echo 1 >file &&
> @@ -35,6 +36,7 @@ test_expect_success 'keywords' '
>  	grep "<BOLD;RED>error<RESET>: error" decoded &&
>  	grep "<YELLOW>hint<RESET>:" decoded &&
>  	grep "<BOLD;GREEN>success<RESET>:" decoded &&
> +	grep "<BOLD;GREEN>SUCCESS<RESET>" decoded &&
>  	grep "<BOLD;YELLOW>warning<RESET>:" decoded
>  '

Nice tests.

The "hinting: not highlighted" example shows that we aren't
introducing false positives here, so the coverage seems sufficient.
It might be nice to include a line

	echo ERROR:

as well to match another idiom that Gerrit sometimes uses.

Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>

Thanks again for a pleasant read.

^ permalink raw reply

* Re: easy way to demonstrate length of colliding SHA-1 prefixes?
From: Jeff King @ 2018-12-03 22:57 UTC (permalink / raw)
  To: Matthew DeVore
  Cc: Ævar Arnfjörð Bjarmason, Robert P. J. Day,
	Git Mailing list
In-Reply-To: <5d63905a-4a52-0724-90f6-a2b0a7ab0f62@comcast.net>

On Mon, Dec 03, 2018 at 02:30:44PM -0800, Matthew DeVore wrote:

> Here is a one-liner to do it. It is Perl line noise, so it's not very cute,
> thought that is subjective. The output shown below is for the Git project
> (not Linux) repository as I've currently synced it:
> 
> $ git rev-list --objects HEAD | sort | perl -anE 'BEGIN { $prev = ""; $long
> = "" } $n = $F[0]; for my $i (reverse 1..40) {last if $i < length($long); if
> (substr($prev, 0, $i) eq substr($n, 0, $i)) {$long = substr($prev, 0, $i);
> last} } $prev = $n; END {say $long}'

Ooh, object-collision golf.

Try:

  git cat-file --batch-all-objects --batch-check='%(objectname)'

instead of "rev-list | sort". It's _much_ faster, because it doesn't
have to actually open the objects and walk the graph.

Some versions of uniq have "-w" (including GNU, but it's definitely not
in POSIX), which lets you do:

  git cat-file --batch-all-objects --batch-check='%(objectname)' |
  uniq -cdw 7

to list all collisions of length 7 (it will show just the first item
from each group, but you can use -D to see them all).

> > You'll always need to list them all. It's inherently an operation where
> > for each SHA-1 you need to search for other ones with that prefix up to
> > a given length.
> > 
> > Perhaps you've missed that you can use --abbrev=N for this, and just
> > grep for things that are loger than that N, e.g. for linux.git:
> > 
> >      git log --oneline --abbrev=10 --pretty=format:%h |
> >      grep -E -v '^.{10}$' |
> >      perl -pe 's/^(.{10}).*/$1/'
> 
> I think the goal was to search all object hashes, not just commits. And git
> rev-list --objects will do that.

You can add "-t --raw" to see the abbreviated tree and blob names,
though it gets tricky around handling merges.

-Peff

^ permalink raw reply

* [PATCH] sideband: color lines with keyword only
From: Stefan Beller @ 2018-12-03 22:37 UTC (permalink / raw)
  To: jrnieder, gitster; +Cc: git, Stefan Beller

When bf1a11f0a1 (sideband: highlight keywords in remote sideband output,
2018-08-07) was introduced, it was carefully considered which strings
would be highlighted. However 59a255aef0 (sideband: do not read beyond
the end of input, 2018-08-18) brought in a regression that the original
did not test for. A line containing only the keyword and nothing else
("SUCCESS") should still be colored.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 sideband.c                          | 5 +++--
 t/t5409-colorize-remote-messages.sh | 2 ++
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/sideband.c b/sideband.c
index 368647acf8..7c3d33d3f8 100644
--- a/sideband.c
+++ b/sideband.c
@@ -87,7 +87,7 @@ static void maybe_colorize_sideband(struct strbuf *dest, const char *src, int n)
 		struct keyword_entry *p = keywords + i;
 		int len = strlen(p->keyword);
 
-		if (n <= len)
+		if (n < len)
 			continue;
 		/*
 		 * Match case insensitively, so we colorize output from existing
@@ -95,7 +95,8 @@ static void maybe_colorize_sideband(struct strbuf *dest, const char *src, int n)
 		 * messages. We only highlight the word precisely, so
 		 * "successful" stays uncolored.
 		 */
-		if (!strncasecmp(p->keyword, src, len) && !isalnum(src[len])) {
+		if (!strncasecmp(p->keyword, src, len) &&
+		    (len == n || !isalnum(src[len]))) {
 			strbuf_addstr(dest, p->color);
 			strbuf_add(dest, src, len);
 			strbuf_addstr(dest, GIT_COLOR_RESET);
diff --git a/t/t5409-colorize-remote-messages.sh b/t/t5409-colorize-remote-messages.sh
index f81b6813c0..2a8c449661 100755
--- a/t/t5409-colorize-remote-messages.sh
+++ b/t/t5409-colorize-remote-messages.sh
@@ -17,6 +17,7 @@ test_expect_success 'setup' '
 	echo " " "error: leading space"
 	echo "    "
 	echo Err
+	echo SUCCESS
 	exit 0
 	EOF
 	echo 1 >file &&
@@ -35,6 +36,7 @@ test_expect_success 'keywords' '
 	grep "<BOLD;RED>error<RESET>: error" decoded &&
 	grep "<YELLOW>hint<RESET>:" decoded &&
 	grep "<BOLD;GREEN>success<RESET>:" decoded &&
+	grep "<BOLD;GREEN>SUCCESS<RESET>" decoded &&
 	grep "<BOLD;YELLOW>warning<RESET>:" decoded
 '
 
-- 
2.20.0.rc2.403.gdbc3b29805-goog


^ permalink raw reply related

* Re: easy way to demonstrate length of colliding SHA-1 prefixes?
From: Matthew DeVore @ 2018-12-03 22:30 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason, Robert P. J. Day; +Cc: Git Mailing list
In-Reply-To: <87y398uknn.fsf@evledraar.gmail.com>



On 12/02/2018 05:23 AM, Ævar Arnfjörð Bjarmason wrote:
> 
> On Sun, Dec 02 2018, Robert P. J. Day wrote:
> 
>>    as part of an upcoming git class i'm delivering, i thought it would
>> be amusing to demonstrate the maximum length of colliding SHA-1
>> prefixes in a repository (in my case, i use the linux kernel git repo
>> for most of my examples).
>>
>>    is there a way to display the objects in the object database that
>> clash in the longest object name SHA-1 prefix; i mean, short of
>> manually listing all object names, running that through cut and sort
>> and uniq and ... you get the idea.
>>
>>    is there a cute way to do that? thanks.
> 

Here is a one-liner to do it. It is Perl line noise, so it's not very 
cute, thought that is subjective. The output shown below is for the Git 
project (not Linux) repository as I've currently synced it:

$ git rev-list --objects HEAD | sort | perl -anE 'BEGIN { $prev = ""; 
$long = "" } $n = $F[0]; for my $i (reverse 1..40) {last if $i < 
length($long); if (substr($prev, 0, $i) eq substr($n, 0, $i)) {$long = 
substr($prev, 0, $i); last} } $prev = $n; END {say $long}'

c68038ef

$ git cat-file -t c68038ef

error: short SHA1 c68038ef is ambiguous
hint: The candidates are:
hint:   c68038effe commit 2012-06-01 - vcs-svn: suppress a 
signed/unsigned comparison warning
hint:   c68038ef00 blob
fatal: Not a valid object name c68038ef


> You'll always need to list them all. It's inherently an operation where
> for each SHA-1 you need to search for other ones with that prefix up to
> a given length.
> 
> Perhaps you've missed that you can use --abbrev=N for this, and just
> grep for things that are loger than that N, e.g. for linux.git:
> 
>      git log --oneline --abbrev=10 --pretty=format:%h |
>      grep -E -v '^.{10}$' |
>      perl -pe 's/^(.{10}).*/$1/'

I think the goal was to search all object hashes, not just commits. And 
git rev-list --objects will do that.

^ permalink raw reply

* Re: [PATCH v2] revisions.c: put promisor option in specialized struct
From: Jeff King @ 2018-12-03 22:17 UTC (permalink / raw)
  To: Matthew DeVore; +Cc: git, gitster, pclouds, jonathantanmy, jeffhost
In-Reply-To: <20181203221019.237034-1-matvore@google.com>

On Mon, Dec 03, 2018 at 02:10:19PM -0800, Matthew DeVore wrote:

> Put the allow_exclude_promisor_objects flag in setup_revision_opt. When
> it was in rev_info, it was unclear when it was used, since rev_info is
> passed to functions that don't use the flag. This resulted in
> unnecessary setting of the flag in prune.c, so fix that as well.
> 
> Signed-off-by: Matthew DeVore <matvore@google.com>
> ---
>  builtin/pack-objects.c |  6 ++++--
>  builtin/prune.c        |  1 -
>  builtin/rev-list.c     |  6 ++++--
>  revision.c             | 10 ++++++----
>  revision.h             |  4 ++--
>  5 files changed, 16 insertions(+), 11 deletions(-)

Thanks, this version looks good to me.

One style nit that I don't think is worth a re-roll, but that Junio
might want to tweak while applying:

> diff --git a/revision.c b/revision.c
> index 13e0519c02..f6b32e6a42 100644
> --- a/revision.c
> +++ b/revision.c
> @@ -1791,7 +1791,8 @@ static void add_message_grep(struct rev_info *revs, const char *pattern)
>  }
>  
>  static int handle_revision_opt(struct rev_info *revs, int argc, const char **argv,
> -			       int *unkc, const char **unkv)
> +			       int *unkc, const char **unkv,
> +			       const struct setup_revision_opt* opt)

We keep the "*" with the variable name, not the type.

-Peff

^ permalink raw reply

* Re: Confusing inconsistent option syntax
From: Jeff King @ 2018-12-03 22:14 UTC (permalink / raw)
  To: Robert White; +Cc: git
In-Reply-To: <20181202100747.GA5019@laptop>

On Sun, Dec 02, 2018 at 09:07:47PM +1100, Robert White wrote:

> `git log --pretty short` gives the error message "ambiguous argument
> 'short'". To get the expected result, you need to use `git log
> --pretty=short`. However, `git log --since yesterday` and `git log
> --since=yesterday` both work as expected.
> 
> When is an = needed? What is the reason for these inconsistencies?

As Duy mentioned, this has to do with optional arguments for some flags.
This is discussed in more detail in "git help cli". Notably (and as
advised in that manpage), you should always use the "stuck" form (with
the "=") in scripts, in case a flag grows an optional value later.

-Peff

^ permalink raw reply

* [PATCH v2] revisions.c: put promisor option in specialized struct
From: Matthew DeVore @ 2018-12-03 22:10 UTC (permalink / raw)
  To: peff, git, gitster; +Cc: Matthew DeVore, pclouds, jonathantanmy, jeffhost

Put the allow_exclude_promisor_objects flag in setup_revision_opt. When
it was in rev_info, it was unclear when it was used, since rev_info is
passed to functions that don't use the flag. This resulted in
unnecessary setting of the flag in prune.c, so fix that as well.

Signed-off-by: Matthew DeVore <matvore@google.com>
---
 builtin/pack-objects.c |  6 ++++--
 builtin/prune.c        |  1 -
 builtin/rev-list.c     |  6 ++++--
 revision.c             | 10 ++++++----
 revision.h             |  4 ++--
 5 files changed, 16 insertions(+), 11 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 24bba8147f..889df2c755 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -3084,14 +3084,16 @@ static void record_recent_commit(struct commit *commit, void *data)
 static void get_object_list(int ac, const char **av)
 {
 	struct rev_info revs;
+	struct setup_revision_opt s_r_opt = {
+		.allow_exclude_promisor_objects = 1,
+	};
 	char line[1000];
 	int flags = 0;
 	int save_warning;
 
 	repo_init_revisions(the_repository, &revs, NULL);
 	save_commit_buffer = 0;
-	revs.allow_exclude_promisor_objects_opt = 1;
-	setup_revisions(ac, av, &revs, NULL);
+	setup_revisions(ac, av, &revs, &s_r_opt);
 
 	/* make sure shallows are read */
 	is_repository_shallow(the_repository);
diff --git a/builtin/prune.c b/builtin/prune.c
index e42653b99c..1ec9ddd751 100644
--- a/builtin/prune.c
+++ b/builtin/prune.c
@@ -120,7 +120,6 @@ int cmd_prune(int argc, const char **argv, const char *prefix)
 	save_commit_buffer = 0;
 	read_replace_refs = 0;
 	ref_paranoia = 1;
-	revs.allow_exclude_promisor_objects_opt = 1;
 	repo_init_revisions(the_repository, &revs, prefix);
 
 	argc = parse_options(argc, argv, prefix, options, prune_usage, 0);
diff --git a/builtin/rev-list.c b/builtin/rev-list.c
index 3a2c0c23b6..14ef659c12 100644
--- a/builtin/rev-list.c
+++ b/builtin/rev-list.c
@@ -362,6 +362,9 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix)
 {
 	struct rev_info revs;
 	struct rev_list_info info;
+	struct setup_revision_opt s_r_opt = {
+		.allow_exclude_promisor_objects = 1,
+	};
 	int i;
 	int bisect_list = 0;
 	int bisect_show_vars = 0;
@@ -375,7 +378,6 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix)
 	git_config(git_default_config, NULL);
 	repo_init_revisions(the_repository, &revs, prefix);
 	revs.abbrev = DEFAULT_ABBREV;
-	revs.allow_exclude_promisor_objects_opt = 1;
 	revs.commit_format = CMIT_FMT_UNSPECIFIED;
 	revs.do_not_die_on_missing_tree = 1;
 
@@ -407,7 +409,7 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix)
 		}
 	}
 
-	argc = setup_revisions(argc, argv, &revs, NULL);
+	argc = setup_revisions(argc, argv, &revs, &s_r_opt);
 
 	memset(&info, 0, sizeof(info));
 	info.revs = &revs;
diff --git a/revision.c b/revision.c
index 13e0519c02..f6b32e6a42 100644
--- a/revision.c
+++ b/revision.c
@@ -1791,7 +1791,8 @@ static void add_message_grep(struct rev_info *revs, const char *pattern)
 }
 
 static int handle_revision_opt(struct rev_info *revs, int argc, const char **argv,
-			       int *unkc, const char **unkv)
+			       int *unkc, const char **unkv,
+			       const struct setup_revision_opt* opt)
 {
 	const char *arg = argv[0];
 	const char *optarg;
@@ -2151,7 +2152,7 @@ static int handle_revision_opt(struct rev_info *revs, int argc, const char **arg
 		revs->limited = 1;
 	} else if (!strcmp(arg, "--ignore-missing")) {
 		revs->ignore_missing = 1;
-	} else if (revs->allow_exclude_promisor_objects_opt &&
+	} else if (opt && opt->allow_exclude_promisor_objects &&
 		   !strcmp(arg, "--exclude-promisor-objects")) {
 		if (fetch_if_missing)
 			BUG("exclude_promisor_objects can only be used when fetch_if_missing is 0");
@@ -2173,7 +2174,7 @@ void parse_revision_opt(struct rev_info *revs, struct parse_opt_ctx_t *ctx,
 			const char * const usagestr[])
 {
 	int n = handle_revision_opt(revs, ctx->argc, ctx->argv,
-				    &ctx->cpidx, ctx->out);
+				    &ctx->cpidx, ctx->out, NULL);
 	if (n <= 0) {
 		error("unknown option `%s'", ctx->argv[0]);
 		usage_with_options(usagestr, options);
@@ -2391,7 +2392,8 @@ int setup_revisions(int argc, const char **argv, struct rev_info *revs, struct s
 				continue;
 			}
 
-			opts = handle_revision_opt(revs, argc - i, argv + i, &left, argv);
+			opts = handle_revision_opt(revs, argc - i, argv + i,
+						   &left, argv, opt);
 			if (opts > 0) {
 				i += opts - 1;
 				continue;
diff --git a/revision.h b/revision.h
index 7987bfcd2e..52e5a88ff5 100644
--- a/revision.h
+++ b/revision.h
@@ -161,7 +161,6 @@ struct rev_info {
 			do_not_die_on_missing_tree:1,
 
 			/* for internal use only */
-			allow_exclude_promisor_objects_opt:1,
 			exclude_promisor_objects:1;
 
 	/* Diff flags */
@@ -297,7 +296,8 @@ struct setup_revision_opt {
 	const char *def;
 	void (*tweak)(struct rev_info *, struct setup_revision_opt *);
 	const char *submodule;	/* TODO: drop this and use rev_info->repo */
-	int assume_dashdash;
+	unsigned int	assume_dashdash:1,
+			allow_exclude_promisor_objects:1;
 	unsigned revarg_opt;
 };
 
-- 
2.20.0.rc1.387.gf8505762e3-goog


^ permalink raw reply related

* Re: [PATCH] revisions.c: put promisor option in specialized struct
From: Matthew DeVore @ 2018-12-03 22:01 UTC (permalink / raw)
  To: Jeff King, Matthew DeVore; +Cc: git, gitster, pclouds, jonathantanmy, jeffhost
In-Reply-To: <20181203212431.GB8700@sigill.intra.peff.net>

On 12/03/2018 01:24 PM, Jeff King wrote:
>> @@ -297,7 +296,8 @@ struct setup_revision_opt {
>>   	const char *def;
>>   	void (*tweak)(struct rev_info *, struct setup_revision_opt *);
>>   	const char *submodule;	/* TODO: drop this and use rev_info->repo */
>> -	int assume_dashdash;
>> +	int assume_dashdash : 1;
>> +	int allow_exclude_promisor_objects : 1;
>>   	unsigned revarg_opt;
>>   };
> 
> I don't know that we need to penny-pinch bytes in this struct, but in
> general it shouldn't hurt either awy. However, a signed bit-field with 1
> bit is funny. I'm not even sure what the standard has to say, but in
> twos-complement that would store "-1" and "0" (gcc -Wpedantic also
> complains about overflow in assigning "1" to it).
Interesting. I hadn't suspected this. But I confirmed it with this:

#include <stdio.h>

struct x {
   int y : 1;
   int z : 1;
};

int main() {
   struct x x;
   x.y = 1;
   x.z = 1;
   printf("%d %d\n", (int) x.y, (int) x.z);
   return 0;
}

-- Output --
-1 -1

> 
> So this probably ought to be "unsigned".


Earlier in this file we define bit fields this way:
	/* Traversal flags */
	unsigned int	dense:1,
			prune:1,

... using \t to align the field names, so I'll mimic that style.

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox