Git development


* Re: [PATCH 2/2] t7800: replace "wc -l" with test_line_count
From: Johannes Schindelin @ 2017-02-07 12:02 UTC (permalink / raw)
  To: David Aguilar; +Cc: Junio C Hamano, Git ML
In-Reply-To: <20170207091700.20156-2-davvid@gmail.com>

Hi David,

On Tue, 7 Feb 2017, David Aguilar wrote:

> Make t7800 easier to debug by capturing output into temporary files and
> using test_line_count to make assertions on those files.
> 
> Signed-off-by: David Aguilar <davvid@gmail.com>

Both patches look like an obvious improvement with no obvious bugs to me.

In this case, I allowed myself to forego the more thorough code review in
favor of merely glancing over the diffs, seeing as the changes do not
really need a lot of context.

Thank you,
Johannes

* Re: Git clonebundles
From: Johannes Schindelin @ 2017-02-07 12:04 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Christian Couder, Shawn Pearce, Stefan Saasen, Git Mailing List
In-Reply-To: <xmqq4m070xua.fsf@gitster.mtv.corp.google.com>

Hi Junio,

On Mon, 6 Feb 2017, Junio C Hamano wrote:

> Christian Couder <christian.couder@gmail.com> writes:
> 
> > There is also Junio's work on Bundle v3 that was unfortunately
> > recently discarded.  Look for "jc/bundle" in:
> >
> > http://public-inbox.org/git/xmqq4m0cry60.fsf@gitster.mtv.corp.google.com/
> >
> > and previous "What's cooking in git.git" emails.
> 
> If people think it might be useful to have it around to experiment, I
> can resurrect and keep that in 'pu' (or rather 'jch'), as long as it
> does not overlap and conflict with other topics in flight.  Let me try
> that in today's integration cycle.

I would like to remind you of my suggestion to make this more publicly
visible and substantially easier to play with, by adding it as an
experimental feature (possibly guarded via an explicit opt-in config
setting).

Ciao,
Johannes

* [PATCH] rev-list-options.txt: update --all about detached HEAD
From: Nguyễn Thái Ngọc Duy @ 2017-02-07 13:38 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Nguyễn Thái Ngọc Duy

This is the document patch for f0298cf1c6 (revision walker: include a
detached HEAD in --all - 2009-01-16)

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 Documentation/rev-list-options.txt | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/Documentation/rev-list-options.txt b/Documentation/rev-list-options.txt
index 5da7cf5a8d..72212ac6ec 100644
--- a/Documentation/rev-list-options.txt
+++ b/Documentation/rev-list-options.txt
@@ -133,8 +133,8 @@ parents) and `--max-parents=-1` (negative numbers denote no upper limit).
 	for all following revision specifiers, up to the next `--not`.
 
 --all::
-	Pretend as if all the refs in `refs/` are listed on the
-	command line as '<commit>'.
+	Pretend as if all the refs in `refs/` (and HEAD if detached)
+	are listed on the command line as '<commit>'.
 
 --branches[=<pattern>]::
 	Pretend as if all the refs in `refs/heads` are listed
-- 
2.11.0.157.gd943d85


* ``git clean -xdf'' and ``make clean''
From: Hongyi Zhao @ 2017-02-07 14:17 UTC (permalink / raw)
  To: git

Hi all,

In order to delete all of the last build stuff, are the following two
methods equivalent or not?

``git clean -xdf'' and ``make clean''

Regards
-- 
Hongsheng Zhao <hongyi.zhao@gmail.com>
Institute of Semiconductors, Chinese Academy of Sciences
GnuPG DSA: 0xD108493

* Re: Request re git status
From: Samuel Lijin @ 2017-02-07 14:54 UTC (permalink / raw)
  To: Phil Hord; +Cc: Ron Pero, Git
In-Reply-To: <CABURp0qbKMfngfsC5pQeO+qyRPxa21vi090hMWDtLd+BBH_3Jg@mail.gmail.com>

On Mon, Feb 6, 2017 at 6:45 PM, Phil Hord <phil.hord@gmail.com> wrote:
> On Mon, Feb 6, 2017 at 3:36 PM Ron Pero <rpero@magnadev.com> wrote:
>> I almost got bit by git: I knew there were changes on the remote
>> server, but git status said I was up to date with the remote.
>>
>
> Do you mean you almost pushed some changed history with "--force"
> which would have lost others' changes?  Use of this option is
> discouraged on shared branches for this very reason.  But if you do
> use it, the remote will tell you the hash of the old branch so you can
> undo the damage.
>
> But if you did not use --force, then you were not in danger of being
> bit.  Git would have prevented the push in that case.
>
>
>> Why ... not design it to [optionally] DO a fetch and THEN declare
>> whether it is up to date?
>
> It's because `git status` does not talk to the remote server, by
> design.  The only Git commands that do talk to the remote are push,
> pull and fetch.  All the rest work off-line and they do so
> consistently.
>
> Imagine `git status` did what you requested; that is, it first did a
> fetch and then reported the status.  Suppose someone pushed a commit
> to the remote immediately after your fetch completed.  Now git will
> still report "up to date" but it will be wrong as soon as the remote
> finishes adding the new push.  Yet the "up to date" message will
> remain on your console, lying to you.  If you leave and come back in
> two days, the message will remain there even if it is no longer
> correct.
>
> So you should accept that `git status` tells you the status with
> respect to your most recent fetch, and that you are responsible for
> the timing of the most recent fetch.  To have git try to do otherwise
> would be misleading.

This argument doesn't work for me. Race conditions are inevitable in
*any* asynchronous workflow, and I can't imagine them being common for
commits, particularly to a shared branch. It's like saying that because
there's lag between the remote's response and the output on the local
side, `git fetch` shouldn't bother saying that the local
remote-tracking branch has been updated.

It wouldn't be hard, though, to define an alias that fetches the
remote-tracking branch and then reports the status.
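
A rough sketch of such a wrapper, written in Python rather than as a git
alias so the ordering is explicit. The function names and the default
remote `origin` are assumptions for illustration, not anything git ships:

```python
import subprocess


def fetch_then_status_commands(remote="origin"):
    """Return the two git invocations in order: fetch first, then status."""
    return [
        ["git", "fetch", remote],
        ["git", "status"],
    ]


def fetch_then_status(remote="origin", run=subprocess.run):
    """Run the commands in order; check=True stops before status if fetch fails.

    `run` is injectable so the sequencing can be exercised without a
    repository or network access.
    """
    for argv in fetch_then_status_commands(remote):
        run(argv, check=True)
```

As a plain alias the same idea would be something like
`git config alias.fstatus '!git fetch && git status'` (the alias name
`fstatus` is made up here).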

Nevertheless, this is one of those cases where I think Git suffers
from a poor UI/UX - it's letting the underlying model define the
behavior, rather than using the underlying model to drive the
behavior.

>> Or change the message to tell what it really
>> did, e.g. "Your branch was up-to-date with 'origin/master' when last
>> checked at {timestamp}"? Or even just say, "Do a fetch to find out
>> whether your branch is up to date"?
>
> These are reasonable suggestions, but I don't think the extra wording
> adds anything for most users.  Adding a timestamp seems generally
> useful, but it could get us into other trouble since we have to depend
> on outside sources for timestamps.  :-\

* Re: subtree merging fails
From: Samuel Lijin @ 2017-02-07 14:59 UTC (permalink / raw)
  To: Stavros Liaskos; +Cc: git@vger.kernel.org
In-Reply-To: <CAEXhnECi3LvSA92dSjL5PZ1Lx9p1PWELS04nmfJW=8K9o4T-0Q@mail.gmail.com>

Have you tried using (without -s subtree) -X subtree=path/to/add/subtree/at?

From the man page:

          subtree[=<path>]
               This option is a more advanced form of subtree strategy,
               where the strategy makes a guess on how two trees must be
               shifted to match with each other when merging. Instead, the
               specified path is prefixed (or stripped from the beginning)
               to make the shape of two trees to match.
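
To make the "prefixed (or stripped from the beginning)" wording concrete,
here is a small model of the path shifting only, not of the merge itself.
The helper names and the `rack` directory are assumptions for illustration:

```python
def shift_into_subtree(paths, prefix):
    """Prefix every path of the merged-in tree with the subtree directory."""
    prefix = prefix.rstrip("/")
    return [f"{prefix}/{p}" for p in paths]


def strip_subtree(paths, prefix):
    """Inverse direction: keep paths under the prefix and drop the prefix."""
    prefix = prefix.rstrip("/") + "/"
    return [p[len(prefix):] for p in paths if p.startswith(prefix)]


# Paths as they appear in the other project's own tree (hypothetical).
upstream = ["README", "lib/rack.rb"]
shifted = shift_into_subtree(upstream, "rack")
print(shifted)
```

With an explicit path there is no guessing step, which is the difference
from plain `-s subtree` that the excerpt describes.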

On Tue, Feb 7, 2017 at 2:16 AM, Stavros Liaskos <st.liaskos@gmail.com> wrote:
> Following the instructions here:
> https://git-scm.com/book/en/v1/Git-Tools-Subtree-Merging
> will lead to an error.
>
> In particular, if the subtree is merged and then updated, this command
> that is supposed to update the local subtree fails with a fatal:
> refusing to merge unrelated histories error.
>
> $ git merge --squash -s subtree --no-commit rack_branch
>
> A workaround could be using the --allow-unrelated-histories option
>
> $ git merge --squash --allow-unrelated-histories -s subtree
> --no-commit rack_branch
>
> But this completely destroys my project by pushing the subtree
> contents into a completely irrelevant directory in my project (not in
> the subtree).
>
> Any ideas??
>
> https://github.com/git/git-scm.com/issues/896#issuecomment-277587626

* Re: [PATCH v2 2/2] grep: use '/' delimiter for paths
From: Stefan Hajnoczi @ 2017-02-07 15:04 UTC (permalink / raw)
  To: Brandon Williams; +Cc: Junio C Hamano, git, Jeff King
In-Reply-To: <20170120235133.GA146274@google.com>

On Fri, Jan 20, 2017 at 03:51:33PM -0800, Brandon Williams wrote:
> On 01/20, Junio C Hamano wrote:
> > Stefan Hajnoczi <stefanha@redhat.com> writes:
> > 
> > > If the tree contains a sub-directory then git-grep(1) output contains a
> > > colon character instead of a path separator:
> > >
> > >   $ git grep malloc v2.9.3:t
> > >   v2.9.3:t:test-lib.sh:	setup_malloc_check () {
> > >   $ git show v2.9.3:t:test-lib.sh
> > >   fatal: Path 't:test-lib.sh' does not exist in 'v2.9.3'
> > >
> > > This patch attempts to use the correct delimiter:
> > >
> > >   $ git grep malloc v2.9.3:t
> > >   v2.9.3:t/test-lib.sh:	setup_malloc_check () {
> > >   $ git show v2.9.3:t/test-lib.sh
> > >   (success)
> > >
> > > Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> > > ---
> > >  builtin/grep.c  | 4 +++-
> > >  t/t7810-grep.sh | 5 +++++
> > >  2 files changed, 8 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/builtin/grep.c b/builtin/grep.c
> > > index 90a4f3d..7a7aab9 100644
> > > --- a/builtin/grep.c
> > > +++ b/builtin/grep.c
> > > @@ -494,7 +494,9 @@ static int grep_object(struct grep_opt *opt, const struct pathspec *pathspec,
> > >  
> > >  			/* Add a delimiter if there isn't one already */
> > >  			if (name[len - 1] != '/' && name[len - 1] != ':') {
> > > -				strbuf_addch(&base, ':');
> > > +				/* rev: or rev:path/ */
> > > +				char delim = obj->type == OBJ_COMMIT ? ':' : '/';
> > 
> > Why check the equality with commit, rather than un-equality with
> > tree?  Wouldn't you want to treat $commit:path and $tag:path the
> > same way?
> 
> I assume Stefan just grabbed my naive suggestion hence why it checks
> equality with a commit.  So that's my fault :)  Either of these may
> not be enough though, since if you do 'git grep malloc v2.9.3^{tree}'
> with this change the output prefix is 'v2.9.3^{tree}/' instead of the
> correct prefix 'v2.9.3^{tree}:'

I revisited this series again today and am coming to the conclusion that
forming output based on the user's rev is really hard to get right in
all cases.  I don't have a good solution to the v2.9.3^{tree} problem.

Perhaps it's better to leave this than to merge code that doesn't work
correctly 100% of the time.
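
For reference, a minimal Python model of the rule in the quoted hunk,
showing one case it gets right and the `^{tree}` case it gets wrong.
Passing the object type as a string is an assumption of this sketch; the
real code compares `obj->type` against `OBJ_COMMIT`:

```python
def naive_delimiter(object_type):
    """Rule from the quoted patch: ':' after a commit, '/' otherwise."""
    return ":" if object_type == "commit" else "/"


def output_prefix(name, object_type):
    """Build the prefix printed before each matching path."""
    if name.endswith(("/", ":")):
        return name  # a delimiter is already present
    return name + naive_delimiter(object_type)


# A tree named as rev:path gets '/', which is the desired output.
good = output_prefix("v2.9.3:t", "tree") + "test-lib.sh"
# A tree named as rev^{tree} also gets '/', but ':' is what is needed.
bad = output_prefix("v2.9.3^{tree}", "tree") + "t/test-lib.sh"
print(good)
print(bad)
```

Both inputs are trees, so the object type alone cannot tell them apart;
the spelling of the rev the user typed would have to be taken into account.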

Stefan

* Re: ``git clean -xdf'' and ``make clean''
From: Cornelius Weig @ 2017-02-07 15:07 UTC (permalink / raw)
  To: Hongyi Zhao, git
In-Reply-To: <CAGP6PO+qD6eRkKbWAxOfiqUQw8o+dOfgwgvt_8OxHQ5ocAopEQ@mail.gmail.com>

On 02/07/2017 03:17 PM, Hongyi Zhao wrote:
> Hi all,
> 
> In order to delete all of the last build stuff, are the following two
> methods equivalent or not?
> 
> ``git clean -xdf'' and ``make clean''

No, they are not equivalent.

* `make clean` removes any build-related files (assuming that the
`clean` target is properly written). To see exactly what it would do,
run `make clean -n`. Judging from your question, I think this is what
you want to do.

* `git clean -xdf` would remove any files that git does not track. This
also includes build-related files, but also any other files that happen
to be in your working directory. For example, any output from `git
format-patch` would be removed by this, but not `make clean`.
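
The difference can be modelled with plain sets. This is only a sketch: the
file names are made up, and it assumes the `clean` target removes exactly
the build outputs and nothing else:

```python
def removed_by_make_clean(build_outputs):
    """`make clean` removes what the clean target lists (assumed here to be
    exactly the build outputs)."""
    return set(build_outputs)


def removed_by_git_clean_xdf(all_files, tracked):
    """`git clean -xdf` removes everything git does not track, ignored or not."""
    return set(all_files) - set(tracked)


tracked = {"Makefile", "main.c"}
build_outputs = {"main.o", "app"}
worktree = tracked | build_outputs | {"0001-fix.patch", "notes.txt"}

# Files that only `git clean -xdf` would delete.
only_git_clean = (
    removed_by_git_clean_xdf(worktree, tracked)
    - removed_by_make_clean(build_outputs)
)
print(sorted(only_git_clean))
```

Running `git clean -xdn` (dry run) before `git clean -xdf` lists what would
be deleted without deleting it.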

* Re: The design of refs backends, linked worktrees and submodules
From: Duy Nguyen @ 2017-02-07 15:07 UTC (permalink / raw)
  To: Michael Haggerty; +Cc: Git Mailing List
In-Reply-To: <CACsJy8CHoroX2k9GqOFmXkvvPCPN4SBeCg+6aC2WSWNSKVmWQw@mail.gmail.com>

On Thu, Jan 19, 2017 at 6:55 PM, Duy Nguyen <pclouds@gmail.com> wrote:
> I've started working on fixing the "git gc" issue with multiple
> worktrees, which brings me back to this. Just some thoughts. Comments
> are really appreciated.
>
> In the current code, files backend has special cases for both
> submodules (explicitly) and linked worktrees (hidden behind git_path).

It just occurred to me that the refs directory structure of a linked
worktree is exactly like the one in a normal single-worktree setup,
minus the shared (or packed) refs. This means the "files" refs backend
can just see this "per-worktree only" refs directory as a remote refs
storage, which is just another name for "submodule".

So, we could just use the exact same submodule code path in refs to
create a per-worktree refs storage. Doing it this way, the files backend
does not need to learn about linked worktrees at all. To retrieve a
per-worktree refs storage, we do
get_ref_store(".git/worktrees/foobar"). To get all per-worktree refs,
we do for_each_ref_submodule(".git/worktrees/foobar", ...).

Does it make sense? Should we go this way?
-- 
Duy

* Re: Git clonebundles
From: Stefan Beller @ 2017-02-07 15:34 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: Junio C Hamano, Christian Couder, Shawn Pearce, Stefan Saasen,
	Git Mailing List
In-Reply-To: <alpine.DEB.2.20.1702071303370.3496@virtualbox>

On Tue, Feb 7, 2017 at 4:04 AM, Johannes Schindelin
<Johannes.Schindelin@gmx.de> wrote:
> Hi Junio,
>
> On Mon, 6 Feb 2017, Junio C Hamano wrote:
>
>> Christian Couder <christian.couder@gmail.com> writes:
>>
>> > There is also Junio's work on Bundle v3 that was unfortunately
>> > recently discarded.  Look for "jc/bundle" in:
>> >
>> > http://public-inbox.org/git/xmqq4m0cry60.fsf@gitster.mtv.corp.google.com/
>> >
>> > and previous "What's cooking in git.git" emails.
>>
>> If people think it might be useful to have it around to experiment, I
>> can resurrect and keep that in 'pu' (or rather 'jch'), as long as it
>> does not overlap and conflict with other topics in flight.  Let me try
>> that in today's integration cycle.
>
> I would like to remind you of my suggestion to make this more publicly
> visible and substantially easier to play with, by adding it as an
> experimental feature (possibly guarded via an explicit opt-in config
> setting).
>
> Ciao,
> Johannes

For making this more publicly visible, I want to look into publishing
the cooking reports on git-scm.com. Maybe we can have a "dev"
section there that has
* a "getting started" section
  linking to
    Documentation/SubmittingPatches
    How to set up your Travis
* a "current state of development" section,
  e.g. the cooking reports, the
  release calendar, a description of the workflow
  (which branches exist and which purpose they serve)

Most of the static information is already covered quite
well in Documentation/, so there is definitely overlap,
hence lots of links to the ground truth.

The dynamic information, however (release calendar,
cooking reports), is not described well enough in
Documentation/, so I think we'd want to focus on that
in the dev section.

* Re: [RFC] mailmap.blob overrides default .mailmap
From: Stefan Beller @ 2017-02-07 17:27 UTC (permalink / raw)
  To: Cornelius Weig, Jeff King; +Cc: git@vger.kernel.org
In-Reply-To: <77c0182b-8c4f-9727-f56f-d8e2bad8146d@tngtech.com>

On Tue, Feb 7, 2017 at 3:56 AM, Cornelius Weig
<cornelius.weig@tngtech.com> wrote:
> Hi,
>
>  I was reading into the mailmap handling today and I'm a bit puzzled by the overriding behavior.
>
> This is what the documentation says about precedence (emphasis mine):
> -------------
> mailmap.file
>     The location of an augmenting mailmap file. The default mailmap, located
>     in the root of the repository, is loaded first, then the mailmap file
>     pointed to by this variable. The location of the mailmap file may be in a
>     repository subdirectory, or somewhere outside of the repository itself.
>     See git-shortlog(1) and git-blame(1).
>
> mailmap.blob
>     Like mailmap.file, but consider the value as a reference to a blob in the
>     repository. If both mailmap.file and mailmap.blob are given, both are
> !!! parsed, with _entries from mailmap.file taking precedence_. In a bare
>     repository, this defaults to HEAD:.mailmap. In a non-bare repository, it
>     defaults to empty.
> ------------
>
> So from the doc I would have expected that files always get precedence over the blob. IOW entries from .mailmap override entries from mailmap.blob. However, this is not the case.
>
> The code shows why (mailmap.c):
>         err |= read_mailmap_file(map, ".mailmap", repo_abbrev);
>         if (startup_info->have_repository)
>                 err |= read_mailmap_blob(map, git_mailmap_blob, repo_abbrev);
>         err |= read_mailmap_file(map, git_mailmap_file, repo_abbrev);
>
>
> Apparently this is not an oversight, because there is an explicit test for this overriding behavior (t4203 'mailmap.blob overrides .mailmap').

which is blamed to 08610900 (mailmap: support reading mailmap from
blobs, 2012-12-12),
cc'ing Jeff who may remember what he was doing back then, as the
commit message doesn't discuss the implications on ordering.
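
To spell out what the quoted read order implies, here is a small model. It
assumes each source reduces to an email-to-name mapping and that an entry
read later replaces one read earlier; real mailmap entries are richer than
this, and the addresses are placeholders:

```python
def load_mailmaps(default_file, blob, configured_file):
    """Apply the three sources in the order of the quoted mailmap.c snippet:
    .mailmap, then mailmap.blob, then mailmap.file."""
    merged = {}
    for source in (default_file, blob, configured_file):
        merged.update(source)  # later sources overwrite earlier entries
    return merged


result = load_mailmaps(
    default_file={"a@example.com": "Name from .mailmap"},
    blob={
        "a@example.com": "Name from mailmap.blob",
        "b@example.com": "Blob B",
    },
    configured_file={"b@example.com": "Name from mailmap.file"},
)
print(result)
```

Under those assumptions mailmap.file beats mailmap.blob, as documented, but
mailmap.blob also beats the in-tree .mailmap, which is the part that
surprised Cornelius.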

>
> So I wonder: what is the rationale behind this? I find this mixed overriding behavior hard to explain and difficult to understand.
>

* RE: [RFC] Add support for downloading blobs on demand
From: Ben Peart @ 2017-02-07 18:21 UTC (permalink / raw)
  To: 'Christian Couder'
  Cc: 'Jeff King', 'git', 'Johannes Schindelin',
	Ben Peart
In-Reply-To: <CAP8UFD3R6nzDPApNvK6rcXR2qdAE6G4J3xbvEam3xsobO7viiA@mail.gmail.com>

No worries about a late response, I'm sure this is the start of a long conversation. :)

> -----Original Message-----
> From: Christian Couder [mailto:christian.couder@gmail.com]
> Sent: Sunday, February 5, 2017 9:04 AM
> To: Ben Peart <peartben@gmail.com>
> Cc: Jeff King <peff@peff.net>; git <git@vger.kernel.org>; Johannes Schindelin
> <Johannes.Schindelin@gmx.de>
> Subject: Re: [RFC] Add support for downloading blobs on demand
> 
> (Sorry for the late reply and thanks to Dscho for pointing me to this thread.)
> 
> On Tue, Jan 17, 2017 at 10:50 PM, Ben Peart <peartben@gmail.com> wrote:
> >> From: Jeff King [mailto:peff@peff.net] On Fri, Jan 13, 2017 at
> >> 10:52:53AM -0500, Ben Peart wrote:
> >>
> >> > Clone and fetch will pass a --lazy-clone flag (open to a better
> >> > name here) similar to --depth that instructs the server to only return
> >> > commits and trees and to ignore blobs.
> >> >
> >> > Later during git operations like checkout, when a blob cannot be
> >> > found after checking all the regular places (loose, pack,
> >> > alternates, etc), git will download the missing object and place it
> >> > into the local object store (currently as a loose object) then resume the
> >> > operation.
> >>
> >> Have you looked at the "external odb" patches I wrote a while ago,
> >> and which Christian has been trying to resurrect?
> >>
> >>
> >> http://public-inbox.org/git/20161130210420.15982-1-chriscool@tuxfamily.org/
> >>
> >> This is a similar approach, though I pushed the policy for "how do
> >> you get the objects" out into an external script. One advantage there
> >> is that large objects could easily be fetched from another source
> >> entirely (e.g., S3 or equivalent) rather than the repo itself.
> >>
> >> The downside is that it makes things more complicated, because a push
> >> or a fetch now involves three parties (server, client, and the
> >> alternate object store). So questions like "do I have all the objects
> >> I need" are hard to reason about.
> >>
> >> If you assume that there's going to be _some_ central Git repo which
> >> has all of the objects, you might as well fetch from there (and do it
> >> over normal git protocols). And that simplifies things a bit, at the cost of
> >> being less flexible.
> >
> > We looked quite a bit at the external odb patches, as well as lfs and
> > even using alternates.  They all share a common downside that you must
> > maintain a separate service that contains _some_ of the files.
> 
> Pushing the policy for "how do you get the objects" out into an external
> helper doesn't mean that the external helper cannot use the main service.
> The external helper is still free to do whatever it wants including calling the
> main service if it thinks it's better.

That is a good point and you're correct, that means you can avoid having to build out multiple services.

> 
> > These
> > files must also be versioned, replicated, backed up and the service
> > itself scaled out to handle the load.  As you mentioned, having
> > multiple services involved increases flexibility but it also increases
> > the complexity and decreases the reliability of the overall version
> > control service.
> 
> About reliability, I think it depends a lot on the use case. If you want to get
> very big files over an unreliable connection, it can be better if you send those big
> files over a restartable protocol and service like HTTP/S on a regular web
> server.
> 

My primary concern about reliability was the multiplicative effect of making multiple requests across multiple servers to complete a single request.  Putting this all in a single service like you suggested above brings us back to parity on the complexity.

> > For operational simplicity, we opted to go with a design that uses a
> > single, central git repo which has _all_ the objects and to focus on
> > enhancing it to handle large numbers of files efficiently.  This
> > allows us to focus our efforts on a great git service and to avoid
> > having to build out these other services.
> 
> Ok, but I don't think it prevents you from using at least some of the same
> mechanisms that the external odb series is using.
> And reducing the number of mechanisms in Git itself is great for its
> maintainability and simplicity.

I completely agree with the goal of reducing the number of mechanisms in Git itself.  Our proposal is primarily targeting speeding up operations when dealing with large numbers of files.  ObjectDB is primarily targeting large objects but there is a lot of similarity in how we're approaching the solution.  I hope/believe we can come to a common solution that will solve both.

> 
> >> > To prevent git from accidentally downloading all missing blobs,
> >> > some git operations are updated to be aware of the potential for
> >> > missing blobs.
> >> > The most obvious being check_connected which will return success as
> >> > if everything in the requested commits is available locally.
> >>
> >> Actually, Git is pretty good about trying not to access blobs when it
> >> doesn't need to. The important thing is that you know enough about
> >> the blobs to fulfill has_sha1_file() and sha1_object_info() requests
> >> without actually fetching the data.
> >>
> >> So the client definitely needs to have some list of which objects
> >> exist, and which it _could_ get if it needed to.
> 
> Yeah, and the external odb series handles that already, thanks to Peff's initial
> work.
> 

I'm currently working on a patch series that will reimplement our current read-object hook to use the LFS model for long running background processes.  As part of that, I am building a versioned interface that will support multiple commands (like get, have, put).  In my initial implementation, I'm only supporting the "get" verb as that is what we currently need but my intent is to build it so that we could add have and put in future versions.  When I have the first iteration ready, I'll push it up to our fork on GitHub for review, as code is clearer than my description in email.
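
Purely as an illustration of what "versioned, multiple verbs, only get for
now" could look like, here is a hypothetical dispatcher. Nothing here
reflects the actual hook protocol; the version table, the function names
and the return tuples are all invented for this sketch:

```python
# Hypothetical capability table: version 1 implements only "get".
SUPPORTED = {1: {"get"}}


def handle_request(version, verb, sha, fetch_object):
    """Dispatch one request; verbs not implemented in this version are rejected."""
    verbs = SUPPORTED.get(version)
    if verbs is None:
        return ("error", "unsupported version")
    if verb not in verbs:
        return ("error", "unsupported verb")
    return ("ok", fetch_object(sha))


def fake_fetch(sha):
    """Stand-in for the real download step."""
    return b"contents of " + sha.encode()


print(handle_request(1, "get", "abc123", fake_fetch))
print(handle_request(1, "have", "abc123", fake_fetch))
```

Adding "have" or "put" later would then mean extending the table for a new
version instead of changing how existing callers talk to the process.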

Moving forward, the "have" verb is a little problematic as we would "have" 3+ million shas that we'd be required to fetch from the server and then pass along to git when requested.  It would be nice to come up with a way to avoid or reduce that cost.

> >> The one place you'd probably want to tweak things is in the diff
> >> code, as a single "git log -Sfoo" would fault in all of the blobs.
> >
> > It is an interesting idea to explore how we could be smarter about
> > preventing blobs from faulting in if we had enough info to fulfill
> > has_sha1_file() and sha1_object_info().  Given we also heavily prune
> > the working directory using sparse-checkout, this hasn't been our top
> > focus but it is certainly something worth looking into.
> 
> The external odb series doesn't handle preventing blobs from faulting in yet,
> so this could be a common problem.
> 

Agreed.  This is one we've been working on quite a bit out of necessity.  If you look at our patch series, most of the changes are related to dealing with missing objects.

> [...]
> 
> >> One big hurdle to this approach, no matter the protocol, is how you
> >> are going to handle deltas. Right now, a git client tells the server
> >> "I have this commit, but I want this other one". And the server knows
> >> which objects the client has from the first, and which it needs from
> >> the second. Moreover, it knows that it can send objects in delta form
> >> directly from disk if the other side has the delta base.
> >>
> >> So what happens in this system? We know we don't need to send any
> >> blobs in a regular fetch, because the whole idea is that we only send
> >> blobs on demand. So we wait for the client to ask us for blob A. But
> >> then what do we send? If we send the whole blob without deltas, we're
> >> going to waste a lot of bandwidth.
> >>
> >> The on-disk size of all of the blobs in linux.git is ~500MB. The
> >> actual data size is ~48GB. Some of that is from zlib, which you get
> >> even for non-deltas. But the rest of it is from the delta
> >> compression. I don't think it's feasible to give that up, at least
> >> not for "normal" source repos like linux.git (more on that in a minute).
> >>
> >> So ideally you do want to send deltas. But how do you know which
> >> objects the other side already has, which you can use as a delta
> >> base? Sending the list of "here are the blobs I have" doesn't scale.
> >> Just the sha1s start to add up, especially when you are doing incremental
> >> fetches.
> 
> To initialize some paths that the client wants, it could perhaps just ask for
> some pack files, or maybe bundle files, related to these paths.
> Those packs or bundles could be downloaded either directly from the main
> server or from other web or proxy servers.
> 
> >> I think this sort of thing performs a lot better when you just focus
> >> on large objects. Because they don't tend to delta well anyway, and
> >> the savings are much bigger by avoiding ones you don't want. So a
> >> directive like "don't bother sending blobs larger than 1MB" avoids a
> >> lot of these issues. In other words, you have some quick shorthand to
> >> communicate between the client and server: this is what I have, and what I
> >> don't.
> >> Normal git relies on commit reachability for that, but there are
> >> obviously other dimensions. The key thing is that both sides be able
> >> to express the filters succinctly, and apply them efficiently.
> >
> > Our challenge has been more the sheer _number_ of files that exist in
> > the repo rather than the _size_ of the files in the repo.  With >3M
> > source files and any typical developer only needing a small percentage
> > of those files to do their job, our focus has been pruning the tree as
> > much as possible such that they only pay the cost for the files they
> > actually need.  With typical text source files being 10K - 20K in
> > size, the overhead of the round trip is a significant part of the
> > overall transfer time so deltas don't help as much.  I agree that
> > large files are also a problem but it isn't my top focus at this point in time.
> 
> Ok, but it would be nice if both problems could be solved using some
> common mechanisms.
> This way it could probably work better in situations where there are both a
> large number of files _and_ some big files.
> And from what I am seeing, there could be no real downside from using
> some common mechanisms.
> 

Agree completely.  I'm hopeful that we can come up with some common mechanisms that will allow us to solve both problems.

> >> If most of your benefits are not from avoiding blobs in general, but
> >> rather just from sparsely populating the tree, then it sounds like
> >> sparse clone might be an easier path forward. The general idea is to
> >> restrict not just the checkout, but the actual object transfer and
> >> reachability (in the tree dimension, the way shallow clone limits it
> >> in the time dimension, which will require cooperation between the client
> >> and server).
> >>
> >> So that's another dimension of filtering, which should be expressed
> >> pretty succinctly: "I'm interested in these paths, and not these other
> >> ones." It's pretty easy to compute on the server side during graph
> >> traversal (though it interacts badly with reachability bitmaps, so
> >> there would need to be some hacks there).
> >>
> >> It's an idea that's been talked about many times, but I don't recall
> >> that there were ever working patches. You might dig around in the
> >> list archive under the name "sparse clone" or possibly "narrow clone".
> >
> > While a sparse/narrow clone would work with this proposal, it isn't
> > required.  You'd still probably want all the commits and trees but the
> > clone would also bring down the specified blobs.  Combined with using
> > "depth" you could further limit it to those blobs at tip.
> >
> > We did run into problems with this model however as our usage patterns
> > are such that our working directories often contain very sparse trees
> > and as a result, we can end up with thousands of entries in the sparse
> > checkout file.  This makes it difficult for users to manually specify
> > a sparse-checkout before they even do a clone.  We have implemented a
> > hashmap based sparse-checkout to deal with the performance issues of
> > having that many entries but that's a different RFC/PATCH.  In short,
> > we found that a "lazy-clone" and downloading blobs on demand provided
> > a better developer experience.
> 
> I think both ways are possible using the external odb mechanism.
> 
> >> > Future Work
> >> > ~~~~~~~~~~~
> >> >
> >> > The current prototype calls a new hook proc in
> >> > sha1_object_info_extended and read_object, to download each missing
> >> > blob.  A better solution would be to implement this via a long
> >> > running process that is spawned on the first download and listens
> >> > for requests to download additional objects until it terminates
> >> > when the parent git operation exits (similar to the recent long
> >> > running smudge and clean filter work).
> >>
> >> Yeah, see the external-odb discussion. Those prototypes use a process
> >> per object, but I think we all agree after seeing how the git-lfs
> >> interface has scaled that this is a non-starter. Recent versions of
> >> git-lfs do the single-process thing, and I think any sort of
> >> external-odb hook should be modeled on that protocol.
> 
> I agree that the git-lfs scaling work is great, but I think it's not necessary in the
> external odb work to have the same kind of single-process protocol from the
> beginning (though it should be possible and easy to add it).
> For example if the external odb work can be used or extended to handle
> restartable clone by downloading a single bundle when cloning, this would
> not need that kind of protocol.
> 
> > I'm looking into this now and plan to re-implement it this way before
> > sending out the first patch series.  Glad to hear you think it is a
> > good protocol to model it on.
> 
> Yeah, for your use case on Windows, it looks really worth it to use this kind
> of protocol.
> 
> >> > Need to investigate an alternate batching scheme where we can make
> >> > a single request for a set of "related" blobs and receive a single
> >> > packfile (especially during checkout).
> >>
> >> I think this sort of batching is going to be the really hard part to
> >> retrofit onto git. Because you're throwing out the procedural notion
> >> that you can loop over a set of objects and ask for each individually.
> >> You have to start deferring computation until answers are ready. Some
> >> operations can do that reasonably well (e.g., checkout), but
> >> something like "git log -p" is constantly digging down into history.
> >> I suppose you could just perform the skeleton of the operation
> >> _twice_, once to find the list of objects to fault in, and the second
> >> time to actually do it.
> 
> In my opinion, perhaps we can just prevent "git log -p" from faulting in blobs
> and have it show a warning saying that it was performed only on a subset of
> all the blobs.
> 

You might be surprised at how many other places end up faulting in blobs. :)  Rename detection is one we've recently been working on.

> [...]


^ permalink raw reply

* Re: [PATCH/RFC] WIP: log: allow "-" as a short-hand for "previous branch"
From: Siddharth Kannan @ 2017-02-07 19:14 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, pranit.bauva, Matthieu.Moy, peff, pclouds, sandals
In-Reply-To: <xmqqtw86zzk4.fsf@gitster.mtv.corp.google.com>

On Mon, Feb 06, 2017 at 03:09:47PM -0800, Junio C Hamano wrote:
> The focus of GSoC being mentoring those who are new to the open
> source development, and hopefully retain them in the community after
> GSoC is over, we do expect microprojects to be suitable for those
> who are new to the codebase.

Okay, understood! Since I have spent time here anyway, I guess I will
continue on this instead of going over to a new micro project.

> 
> > (c) -> Else look for "r1^-"
> > ...
> > Case (c) is a bit confusing. This could be something like "-^-", and
> > something like "^-" could mean "Not commits on previous branch" or it
> > could mean "All commits on this branch except for the parent of HEAD"
> 
> Do you mean:
> 
>     "git rev-parse ^-" does not mean "git rev-parse HEAD^-", but we
>     probably would want to, and if that is what is going to happen,
>     "^-" should mean "HEAD^-", and cannot be used for "^@{-1}"?
> 
> Its friend "^!" does not mean "HEAD^!", and "^@" does not mean
> "HEAD^@", either (the latter is somewhat borked, though, and "^@"
> translates to "^HEAD" because confusingly "@" stands for "HEAD"
> sometimes).  

Yes, I meant whether we should use ^- as ^@{-1} or HEAD^-.

Oh! So, that's why running `git log ^@` leads to an empty set!
> 
> So my gut feeling is that it is probably OK to make "^-" mean
> "^@{-1}"; it may be prudent to at least initially keep "^-" an error
> like it currently is already, though.

I agree with your gut feeling, and would like to _not_ exclude only
this case. This way, across the code and implementation, there
wouldn't be any particular cases which would have to be excluded.

> > So, this patch reduces to the following 2 tasks:
> > 
> > 1. Teach setup_revisions that something starting with "-" can be an
> > argument as well
> > 2. Teach get_sha1_basic that "-" means the tip of the previous branch
> > perhaps by replacing it with "@{-1}" just before the reflog parsing is
> > done

Making a change in sha1_name.c will touch a lot of commands
(setup_revisions is called from everywhere in the codebase), so, I am
still trying to figure out how to do this such that the rest of the
codepath remains unchanged.

I hope that you do not mind this side-effect, but rather, you intended
for this to happen, right? More commands will start supporting this
shorthand, suddenly.  (such as format-patch, whatchanged, diff to name
a very few).
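
For reference, the existing "@{-1}" syntax that the proposed "-"
short-hand would map to can be tried in a scratch repository. This is
only a sketch of current behaviour, not of the patch; the branch names
"one" and "two" and the identity settings are made up for the demo.

```shell
# Scratch repository; nothing here touches your real repositories.
tmp=$(mktemp -d)
git init -q "$tmp/repo"
cd "$tmp/repo"
git config user.name "Demo User"        # throwaway identity for the demo
git config user.email demo@example.com

git checkout -q -b one                  # hypothetical branch names
git commit -q --allow-empty -m "first commit"
git checkout -q -b two

# "@{-1}" names the branch that was checked out before the current one.
previous=$(git rev-parse --abbrev-ref "@{-1}")
echo "previous branch: $previous"

# "git checkout -" already understands "-" as a short-hand for "@{-1}".
git checkout -q -
current=$(git symbolic-ref --short HEAD)
echo "current branch: $current"
```

Both lines should name the same branch, since "checkout -" goes back to
whatever "@{-1}" resolved to.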

Best Regards,

Siddharth.

^ permalink raw reply

* Re: Request re git status
From: Jacob Keller @ 2017-02-07 19:18 UTC (permalink / raw)
  To: Samuel Lijin; +Cc: Phil Hord, Ron Pero, Git
In-Reply-To: <CAJZjrdWbqvBRtyyfhgAt1E9ZdTUaz+Zpk7iGasNoeSuFJbsKog@mail.gmail.com>

On Tue, Feb 7, 2017 at 6:54 AM, Samuel Lijin <sxlijin@gmail.com> wrote:
> On Mon, Feb 6, 2017 at 6:45 PM, Phil Hord <phil.hord@gmail.com> wrote:
>> On Mon, Feb 6, 2017 at 3:36 PM Ron Pero <rpero@magnadev.com> wrote:
>>> I almost got bit by git: I knew there were changes on the remote
>>> server, but git status said I was uptodate with the remote.
>>>
>>
>> Do you mean you almost pushed some changed history with "--force"
>> which would have lost others' changes?  Use of this option is
>> discouraged on shared branches for this very reason.  But if you do
>> use it, the remote will tell you the hash of the old branch so you can
>> undo the damage.
>>
>> But if you did not use --force, then you were not in danger of being
>> bit.  Git would have prevented the push in that case.
>>
>>
>>> Why ... not design it to [optionally] DO a fetch and THEN declare
>>> whether it is up to date?
>>
>> It's because `git status` does not talk to the remote server, by
>> design.  The only Git commands that do talk to the remote are push,
>> pull and fetch.  All the rest work off-line and they do so
>> consistently.
>>
>> Imagine `git status` did what you requested; that is, it first did a
>> fetch and then reported the status.  Suppose someone pushed a commit
>> to the remote immediately after your fetch completed.  Now git will
>> still report "up to date" but it will be wrong as soon as the remote
>> finishes adding the new push.  Yet the "up to date" message will
>> remain on your console, lying to you.  If you leave and come back in
>> two days, the message will remain there even if it is no longer
>> correct.
>>
>> So you should accept that `git status` tells you the status with
>> respect to your most recent fetch, and that you are responsible for
>> the timing of the most recent fetch.  To have git try to do otherwise
>> would be misleading.
>
> This argument doesn't work for me. Race conditions in *any*
> asynchronous work flow are inevitable; in commits, particularly to a
> shared branch, I also can't imagine them being common. It's like
> saying because there's lag between the remote's response and the
> output on the local, `git fetch` shouldn't bother saying that the
> local remote has been updated.
>
> It wouldn't be hard, though, to define an alias that fetches the
> remote-tracking branch and then reports the status.
>
> Nevertheless, this is one of those cases where I think Git suffers
> from a poor UI/UX - it's letting the underlying model define the
> behavior, rather than using the underlying model to drive the
> behavior.
>

Personally, I think the fact that Git forces the user to think about it
in terms of "oh I have to fetch", instead of that happening
automatically, helps teach the model to the user. If it happened in
the background then the user might not be confronted with the
distributed nature of the tool.

An alias to fetch and then show status is very straightforward, and
you can do so locally if you want.
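
A minimal sketch of such an alias, assuming the made-up name "fs" and a
throwaway bare repository standing in for the remote (neither comes from
the thread):

```shell
# Scratch setup: a bare "remote" plus a clone of it, so the alias has
# something to fetch from.
tmp=$(mktemp -d)
git init -q --bare "$tmp/remote.git"
git clone -q "$tmp/remote.git" "$tmp/work" 2>/dev/null
cd "$tmp/work"

# Hypothetical alias name "fs"; the leading "!" makes git run a shell
# command.  Use "git config --global" instead to have the alias in
# every repository.
git config alias.fs '!git fetch && git status'

# Fetch first, then report the status against the freshly fetched refs.
git fs
```

The status shown is still only as fresh as the fetch that just ran,
which is the point Phil made earlier in the thread.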

Thanks,
Jake

^ permalink raw reply

* Trying to use xfuncname without success.
From: Jack Adrian Zappa @ 2017-02-07 19:21 UTC (permalink / raw)
  To: git-mailing-list

I'm trying to specify a hunk header using xfuncname, and it just
doesn't want to work.

The full question is on SO here:

http://stackoverflow.com/questions/42078376/why-isnt-my-xfuncname-working-in-my-gitconfig-file

But the basic gist is that no matter what regex I specify, git will
not recognise the hunk header.  Am I doing something wrong or is this
a bug?

For those who don't want to jump to the SO site, I've copied the text below:

-----8<--------8<--------8<--------8<--------8<--------8<--------8<--------8<---

I'm trying to setup a hunk header for .natvis files. For some reason,
it doesn't seem to be working. I'm following their instructions from
here, which doesn't say much in terms of restrictions of the regex,
such as, is the matched item considered the hunk header or do I need a
group? I have tried both with no success. This is what I have:

[diff "natvis"]
    xfuncname = "^[\\\t ]*<Type[\\\t ]+Name=\"([^\"])\".*$"

I've also added to my .gitattributes file (even though I'm not
positive that it is necessary):

*.natvis diff=natvis

I've tried \t instead of \\\t as well as replacing the entire regex
with just <Type.* with no luck. I'm using git version 2.7.0.windows.1
on Windows 8.1. EDIT: I upgraded to git version 2.11.1.windows.1 on
Windows 8.1 and even tried git version 2.8.3 on cygwin64 on Windows
8.1 with the same results.

As a test file, I have the following test.natvis file:

<?xml version="1.0" encoding="utf-8"?>
<AutoVisualizer
xmlns="http://schemas.microsoft.com/vstudio/debugger/natvis/2010">

  <Type Name="test">
    <Expand>
      <Item Name="var">var</Item>

      <!-- Non-blank line -->
      <Item Name="added var">added_var</Item>

      <Item Name="var2">var2</Item>
    </Expand>
  </Type>
</AutoVisualizer>

with the <Item Name="added var">added_var</Item> being the new line added.

I'm really not sure why this is so difficult.

EDIT:

Here is a sample output of what I am getting:

$ git diff --word-diff
diff --git a/test.natvis b/test.natvis
index 73c06bc..bc0f549 100644
--- a/test.natvis
+++ b/test.natvis
@@ -18,6 +18,7 @@

      <!-- Non-blank line -->
      {+<Item Name="added var">added_var</Item>+}

      <Item Name="var2">var2</Item>
warning: LF will be replaced by CRLF in test.natvis.
The file will have its original line endings in your working directory.

Even using xfuncname = "^.*$" I would have expected that  would have shown up as my hunk header, but I get
nothing. :(

EDIT:

I've tried the solution proposed by torek, but to no avail. It's like
it doesn't know what to do with the xfuncname entry. :(

^ permalink raw reply related

* Re: [RFC] mailmap.blob overrides default .mailmap
From: Jeff King @ 2017-02-07 19:28 UTC (permalink / raw)
  To: Stefan Beller; +Cc: Cornelius Weig, git@vger.kernel.org
In-Reply-To: <CAGZ79kZ=ikbYpuK6E=ui1ju=bRavcVcxb3AA_dvb2Jp6cRNmJQ@mail.gmail.com>

On Tue, Feb 07, 2017 at 09:27:19AM -0800, Stefan Beller wrote:

> > The code shows why (mailmap.c):
> >         err |= read_mailmap_file(map, ".mailmap", repo_abbrev);
> >         if (startup_info->have_repository)
> >                 err |= read_mailmap_blob(map, git_mailmap_blob, repo_abbrev);
> >         err |= read_mailmap_file(map, git_mailmap_file, repo_abbrev);
> >
> >
> > Apparently this is not an oversight, because there is an explicit
> > test for this overriding behavior (t4203 'mailmap.blob overrides
> > .mailmap').
> 
> which is blamed to 08610900 (mailmap: support reading mailmap from
> blobs, 2012-12-12),
> cc'ing Jeff who may remember what he was doing back then, as the
> commit message doesn't discuss the implications on ordering.

I think it was mostly that I had to define _some_ order. This made sense
to me as similar to things like attributes or excludes, where we prefer
clone-specific data over in-history data (so .git/info/attributes takes
precedence over .gitattributes).

So any mailmap.* would take precedence over the in-tree .mailmap file.
And then between mailmap.file and mailmap.blob, the "blob" form is
more "in-tree" than the "file" form (especially because we turn it on by
default in bare repos, so it really is identical to the in-tree form
there).

I think the easiest way to think of it is the same as we do config. We
read the files in a particular order, least-important to most-important,
and apply "last one wins" (so more-important entries overwrite
less-important ones).
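
A small sketch of that "last one wins" order, in a scratch repository.
The names, the address and the extra.mailmap file are invented for the
demo; only .mailmap and mailmap.file are exercised here, not
mailmap.blob.

```shell
tmp=$(mktemp -d)
git init -q "$tmp/repo"
cd "$tmp/repo"
git -c user.name="Orig Name" -c user.email=dev@example.com \
	commit -q --allow-empty -m "initial"

# In-tree mapping (read first, so least important) ...
echo "In Tree <dev@example.com>" >.mailmap
in_tree=$(git log -1 --format=%aN)

# ... and a clone-specific mapping via mailmap.file (read last, so it
# overrides the in-tree entry for the same address).
echo "From Config <dev@example.com>" >"$tmp/extra.mailmap"
git config mailmap.file "$tmp/extra.mailmap"
with_config=$(git log -1 --format=%aN)

echo "with .mailmap only: $in_tree"
echo "with mailmap.file:  $with_config"
```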

-Peff

^ permalink raw reply

* Re: ``git clean -xdf'' and ``make clean''
From: Jacob Keller @ 2017-02-07 19:35 UTC (permalink / raw)
  To: Cornelius Weig; +Cc: Hongyi Zhao, Git mailing list
In-Reply-To: <fe8595aa-0395-e948-13e9-f952541d106e@tngtech.com>

On Tue, Feb 7, 2017 at 7:07 AM, Cornelius Weig
<cornelius.weig@tngtech.com> wrote:
> On 02/07/2017 03:17 PM, Hongyi Zhao wrote:
>> Hi all,
>>
>> In order to delete all of the last build stuff, are the following two
>> methods equivalent or not?
>>
>> ``git clean -xdf'' and ``make clean''
>
> No, it is not equivalent.
>
> * `make clean` removes any build-related files (assuming that the
> `clean` target is properly written). To see exactly what it would do,
> run `make clean -n`. Judging from your question, I think this is what
> you want to do.
>
> * `git clean -xdf` would remove any files that git does not track. This
> also includes build-related files, but also any other files that happen
> to be in your working directory. For example, any output from `git
> format-patch` would be removed by this, but not `make clean`.

Make clean can run arbitrary code, and really depends on the
implementation. git clean -xdf will result in all non-tracked files
being removed, which should restore you to a pristine pre-build state.
However, this can have the unfortunate side effect of destroying files
which you might not expect.

Properly written, a make clean shouldn't remove anything except what
could be regenerated by make. But that's just a strong convention.
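
To see what "git clean -xdf" would destroy before running it, a dry run
with -n in place of -f lists the files without deleting anything. A
sketch in a scratch repository follows; the file names are made up for
illustration.

```shell
tmp=$(mktemp -d)
git init -q "$tmp/repo"
cd "$tmp/repo"

echo "*.o" >.gitignore
git add .gitignore
git -c user.name="Demo User" -c user.email=demo@example.com \
	commit -q -m "ignore object files"

touch main.o            # ignored build product
touch notes.txt         # untracked file that is NOT a build product

# -n: dry run; nothing is deleted, git only lists what -f would remove.
preview=$(git clean -xdn)
echo "$preview"
```

Both files show up in the preview, which is the "files you might not
expect" case: notes.txt has nothing to do with the build.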

Regards,
Jake

^ permalink raw reply

* Re: subtree merging fails
From: David Aguilar @ 2017-02-07 18:44 UTC (permalink / raw)
  To: Samuel Lijin; +Cc: Stavros Liaskos, git@vger.kernel.org
In-Reply-To: <CAJZjrdU3toam4tDwXBu1Q3UAZm-kML3CzMrsMoJ_2jsGJ3vWrQ@mail.gmail.com>

On Tue, Feb 07, 2017 at 08:59:06AM -0600, Samuel Lijin wrote:
> Have you tried using (without -s subtree) -X subtree=path/to/add/subtree/at?
> 
> From the man page:
> 
>           subtree[=<path>]
>                This option is a more advanced form of subtree strategy,
>                where the strategy makes a guess on how two trees must be
>                shifted to match with each other when merging. Instead, the
>                specified path is prefixed (or stripped from the beginning)
>                to make the shape of two trees to match.

I'm not 100% certain, but it's highly likely that the subtree=<prefix>
argument needs to include a trailing slash "/" in the prefix,
otherwise files will be named e.g. "fooREADME" instead of
"foo/README" when prefix=foo.

These days I would steer users towards the "git-subtree" command in
contrib/ so that users don't need to deal with these details.  It
handles all of this stuff for you.

https://github.com/git/git/blob/master/contrib/subtree/git-subtree.txt

https://github.com/git/git/tree/master/contrib/subtree

Updating the progit book to also mention git-subtree, in addition to the
low-level methods, would probably be a good user-centric change.
-- 
David

^ permalink raw reply

* Re: [PATCH] rev-list-options.txt: update --all about detached HEAD
From: Jeff King @ 2017-02-07 19:42 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy; +Cc: git, Junio C Hamano
In-Reply-To: <20170207133850.14056-1-pclouds@gmail.com>

On Tue, Feb 07, 2017 at 08:38:49PM +0700, Nguyễn Thái Ngọc Duy wrote:

> This is the document patch for f0298cf1c6 (revision walker: include a
> detached HEAD in --all - 2009-01-16)
> [...]
>  --all::
> -	Pretend as if all the refs in `refs/` are listed on the
> -	command line as '<commit>'.
> +	Pretend as if all the refs in `refs/` (and HEAD if detached)
> +	are listed on the command line as '<commit>'.

I think this is an improvement, but I'm not sure about the "if detached"
bit. We always read HEAD, no matter what.

If you only care about reachability, then reading HEAD only has an
impact if it is detached, since otherwise we know that we will grab the
ref via refs/.

I'm not sure if it would matter for some other cases, though. For
example, with "--source", do we report HEAD or the matching ref? It
looks like the latter (because we read the refs first).

I suspect you could also construct a case with excludes like:

  $ git checkout foo
  $ git rev-list --exclude=refs/heads/foo --all

where it is relevant that we read HEAD separately from refs/heads/foo.

So I think just "and HEAD" is better, like:

  Pretend as if all the refs in `refs/`, along with `HEAD`, are
  listed...
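
A quick way to see HEAD being read in addition to refs/ is the detached
case, sketched below in a scratch repository (the commit messages and
identity are placeholders). It does not cover the --exclude or --source
cases mentioned above.

```shell
tmp=$(mktemp -d)
git init -q "$tmp/repo"
cd "$tmp/repo"
git config user.name "Demo User"
git config user.email demo@example.com

git commit -q --allow-empty -m "on a branch"
git checkout -q --detach
git commit -q --allow-empty -m "only reachable from detached HEAD"

# --branches walks refs/heads/ only; --all also picks up HEAD.
branches=$(git rev-list --count --branches)
all=$(git rev-list --count --all)
echo "--branches: $branches commit(s), --all: $all commit(s)"
```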

-Peff

^ permalink raw reply

* Re: [PATCH v2 2/2] grep: use '/' delimiter for paths
From: Jeff King @ 2017-02-07 19:50 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: Brandon Williams, Junio C Hamano, git
In-Reply-To: <20170207150414.GD8583@stefanha-x1.localdomain>

On Tue, Feb 07, 2017 at 03:04:14PM +0000, Stefan Hajnoczi wrote:

> > I assume Stefan just grabbed my naive suggestion hence why it checks
> > equality with a commit.  So that's my fault :)  Either of these may
> > not be enough though, since if you do 'git grep malloc v2.9.3^{tree}'
> > with this change the output prefix is 'v2.9.3^{tree}/' instead of the
> > correct prefix 'v2.9.3^{tree}:'
> 
> I revisited this series again today and am coming to the conclusion that
> forming output based on the user's rev is really hard to get right in
> all cases.  I don't have a good solution to the v2.9.3^{tree} problem.

I think the rule you need is not "are we at a tree", but rather "did we
traverse a path while resolving the name?". Only the get_sha1() parser
can tell you that. I think:

  char delim = ':';
  struct object_context oc;
  if (get_sha1_with_context(name, 0, sha1, &oc))
          die("...");
  if (oc.path[0])
          delim = '/'; /* name had a partial path */

would work. Root trees via "v2.9.3^{tree}" or "v2.9.3:" would have no
path, but "v2.9.3:Documentation" would. I think you'd still need to
avoid duplicating a trailing delimiter, but I can't think of a case
where it is wrong to do that based purely on the name.

-Peff

^ permalink raw reply

* Re: What's cooking in git.git (Feb 2017, #02; Mon, 6)
From: Junio C Hamano @ 2017-02-07 20:01 UTC (permalink / raw)
  To: SZEDER Gábor; +Cc: Karthik Nayak, Git mailing list
In-Reply-To: <CAM0VKjmhO9NQLz9TDv5M3OhxSBt-JdjaouVT0pTA-a6mGaF4_A@mail.gmail.com>

SZEDER Gábor <szeder.dev@gmail.com> writes:

> All failing tests fail with the same error:
>
>   fatal: unrecognized %(refname:strip=2) argument: strip=2
>
> That's because of this topic:
>
>> * kn/ref-filter-branch-list (2017-01-31) 20 commits

Ahh, of course.

Let's make sure the series won't escape to 'master' before the
"strip" breakage is fixed.  How about queuing this on top of the
ref-filter topic?  

It seems to unblock your completion-refs-speedup topic and makes the
test pass ;-)

Thanks.

-- >8 --
Subject: [PATCH] ref-filter: resurrect "strip" as a synonym to "lstrip"

We forgot that "strip" was introduced at 0571979bd6 ("tag: do not
show ambiguous tag names as "tags/foo"", 2016-01-25) as part of Git
2.8 (and 2.7.1), yet in the update to ref-filter, we started calling
it "lstrip" to make it easier to explain the new "rstrip" operation.

We shouldn't have renamed the existing one; "lstrip" should have
been a new synonym that means the same thing as "strip".  Scripts
in the wild are surely using the original form already.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 Documentation/git-for-each-ref.txt |  2 ++
 ref-filter.c                       |  3 ++-
 t/t6300-for-each-ref.sh            | 12 ++++++++++++
 3 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/Documentation/git-for-each-ref.txt b/Documentation/git-for-each-ref.txt
index 2008600e7e..111e1be6f5 100644
--- a/Documentation/git-for-each-ref.txt
+++ b/Documentation/git-for-each-ref.txt
@@ -107,6 +107,8 @@ refname::
 	enough components, the result becomes an empty string if
 	stripping with positive <N>, or it becomes the full refname if
 	stripping with negative <N>.  Neither is an error.
++
+`strip` can be used as a synonym to `lstrip`.
 
 objecttype::
 	The type of the object (`blob`, `tree`, `commit`, `tag`).
diff --git a/ref-filter.c b/ref-filter.c
index 01b5c18ef0..2a94d6da98 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -112,7 +112,8 @@ static void refname_atom_parser_internal(struct refname_atom *atom,
 		atom->option = R_NORMAL;
 	else if (!strcmp(arg, "short"))
 		atom->option = R_SHORT;
-	else if (skip_prefix(arg, "lstrip=", &arg)) {
+	else if (skip_prefix(arg, "lstrip=", &arg) ||
+		 skip_prefix(arg, "strip=", &arg)) {
 		atom->option = R_LSTRIP;
 		if (strtol_i(arg, 10, &atom->lstrip))
 			die(_("Integer value expected refname:lstrip=%s"), arg);
diff --git a/t/t6300-for-each-ref.sh b/t/t6300-for-each-ref.sh
index 25a9973ce9..c87dc1f8bc 100755
--- a/t/t6300-for-each-ref.sh
+++ b/t/t6300-for-each-ref.sh
@@ -59,18 +59,26 @@ test_atom head refname:rstrip=1 refs/heads
 test_atom head refname:rstrip=2 refs
 test_atom head refname:rstrip=-1 refs
 test_atom head refname:rstrip=-2 refs/heads
+test_atom head refname:strip=1 heads/master
+test_atom head refname:strip=2 master
+test_atom head refname:strip=-1 master
+test_atom head refname:strip=-2 heads/master
 test_atom head upstream refs/remotes/origin/master
 test_atom head upstream:short origin/master
 test_atom head upstream:lstrip=2 origin/master
 test_atom head upstream:lstrip=-2 origin/master
 test_atom head upstream:rstrip=2 refs/remotes
 test_atom head upstream:rstrip=-2 refs/remotes
+test_atom head upstream:strip=2 origin/master
+test_atom head upstream:strip=-2 origin/master
 test_atom head push refs/remotes/myfork/master
 test_atom head push:short myfork/master
 test_atom head push:lstrip=1 remotes/myfork/master
 test_atom head push:lstrip=-1 master
 test_atom head push:rstrip=1 refs/remotes/myfork
 test_atom head push:rstrip=-1 refs
+test_atom head push:strip=1 remotes/myfork/master
+test_atom head push:strip=-1 master
 test_atom head objecttype commit
 test_atom head objectsize 171
 test_atom head objectname $(git rev-parse refs/heads/master)
@@ -636,6 +644,10 @@ EOF
 test_expect_success 'Verify usage of %(symref:lstrip) atom' '
 	git for-each-ref --format="%(symref:lstrip=2)" refs/heads/sym > actual &&
 	git for-each-ref --format="%(symref:lstrip=-2)" refs/heads/sym >> actual &&
+	test_cmp expected actual &&
+
+	git for-each-ref --format="%(symref:strip=2)" refs/heads/sym > actual &&
+	git for-each-ref --format="%(symref:strip=-2)" refs/heads/sym >> actual &&
 	test_cmp expected actual
 '
 
-- 
2.12.0-rc0-144-g99fe1a5456
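
With a Git that accepts both spellings (that is, one that includes a
change like the patch above), the two atoms should print the same thing.
A sketch in a scratch repository, with "topic" as a made-up branch name:

```shell
tmp=$(mktemp -d)
git init -q "$tmp/repo"
cd "$tmp/repo"
git checkout -q -b topic     # hypothetical branch name
git -c user.name="Demo User" -c user.email=demo@example.com \
	commit -q --allow-empty -m "initial"

# Strip two leading components ("refs/heads/") from the full refname.
old=$(git for-each-ref --format='%(refname:strip=2)' refs/heads)
new=$(git for-each-ref --format='%(refname:lstrip=2)' refs/heads)
echo "strip=2:  $old"
echo "lstrip=2: $new"
```

On a version without the synonym, the first command dies with the
"unrecognized %(refname:strip=2) argument" error quoted at the top of
this message.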


^ permalink raw reply related

* Re: [PATCH] difftool: fix bug when printing usage
From: Junio C Hamano @ 2017-02-07 20:03 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: David Aguilar, Git ML, Denton Liu
In-Reply-To: <alpine.DEB.2.20.1702071220290.3496@virtualbox>

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

>> > Likewise, this would become
>> >
>> > 	GIT_CEILING_DIRECTORIES="$PWD/not" \
>> > 	test_expect_code 129 git -C not/repo difftool -h >output &&
>> > 	grep ^usage: output
>> 
>> I agree with the intent, but the execution here is "Not quite".
>> test_expect_code being a shell function, it does not take the
>> "one-shot environment assignment for this single invocation," like
>> external commands do.
>
> So now that we know what is wrong, can you please enlighten me about what
> is right?

David's original is just fine, isn't it?

^ permalink raw reply

* Re: [PATCH] difftool: fix bug when printing usage
From: Junio C Hamano @ 2017-02-07 20:06 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: David Aguilar, Git ML, Denton Liu
In-Reply-To: <xmqqh945zs3c.fsf@gitster.mtv.corp.google.com>

Junio C Hamano <gitster@pobox.com> writes:

> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
>
>>> > Likewise, this would become
>>> >
>>> > 	GIT_CEILING_DIRECTORIES="$PWD/not" \
>>> > 	test_expect_code 129 git -C not/repo difftool -h >output &&
>>> > 	grep ^usage: output
>>> 
>>> I agree with the intent, but the execution here is "Not quite".
>>> test_expect_code being a shell function, it does not take the
>>> "one-shot environment assignment for this single invocation," like
>>> external commands do.
>>
>> So now that we know what is wrong, can you please enlighten me about what
>> is right?
>
> David's original is just fine, isn't it?

I've also seen people use "env VAR=VAL git command" as the command
to be tested in t/ scripts.  You can run that under test_expect_code,
methinks.
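
A sketch of that pattern outside the test suite. expect_exit_code is a
made-up stand-in for test_expect_code, and DEMO_VAR and the "sh -c"
command stand in for GIT_CEILING_DIRECTORIES and the git invocation:

```shell
# Hypothetical stand-in for test_expect_code: run a command and succeed
# only if it exits with the expected status.
expect_exit_code () {
	want=$1
	shift
	got=0
	"$@" || got=$?
	test "$got" -eq "$want"
}

# "env" is an external command, so VAR=VAL is confined to the process it
# starts and never touches the shell that runs the helper function.
expect_exit_code 3 env DEMO_VAR=hello \
	sh -c 'test "$DEMO_VAR" = hello && exit 3'
result=$?

echo "helper returned: $result"
echo "DEMO_VAR in the calling shell: ${DEMO_VAR-(unset)}"
```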

^ permalink raw reply

* Re: [PATCH v2 2/2] grep: use '/' delimiter for paths
From: Junio C Hamano @ 2017-02-07 20:24 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: Brandon Williams, git, Jeff King
In-Reply-To: <20170207150414.GD8583@stefanha-x1.localdomain>

Stefan Hajnoczi <stefanha@redhat.com> writes:

> Perhaps it's better to leave this than to merge code that doesn't work
> correctly 100% of the time.

I am not sure if what you are shooting for is "work correctly" to begin
with, to be honest.  The current code always shows the "correct"
output which is "the tree-ish object name (expressed in a way easier
to understand by the humans), followed by a colon, followed by the
path in the tree-ish the hit lies".  You are making it "incorrect
but often more convenient", and sometimes that is a worth goal, but
for the particular use cases you presented, i.e.

    $ git grep -e "$pattern" "$commit:path"

a more natural way to express "I want to find this pattern in the
commit under that path" exists:

    $ git grep -e "$pattern" "$commit" -- path

and because of that, I do not think the former form of the query
should happen _less_ often in the first place, which would make it
"incorrect but more convenient if the user gives an unusual query".

So I am not sure if the change to "grep" is worth it.
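
The two query forms can be compared side by side in a scratch
repository. This is only a sketch; the file name and contents are made
up, and the prefix printed for the first form may differ between Git
versions, so no particular output is claimed for it here.

```shell
tmp=$(mktemp -d)
git init -q "$tmp/repo"
cd "$tmp/repo"
mkdir Documentation
echo "call malloc here" >Documentation/notes.txt
git add Documentation
git -c user.name="Demo User" -c user.email=demo@example.com \
	commit -q -m "add notes"

# Form 1: a tree-ish that already contains a path.
with_path_in_rev=$(git grep -e malloc HEAD:Documentation)
# Form 2: a plain commit plus a pathspec.
with_pathspec=$(git grep -e malloc HEAD -- Documentation)

echo "$with_path_in_rev"
echo "$with_pathspec"
```

The second form prints the name of the commit, a colon, and then the
full path within it, which can be fed straight back to other commands.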

Having said that, I actually think "make it more convenient" without
making anything incorrect would be to teach the revision parser to
understand

    <any-expression-to-name-a-tree-ish>:<path>

as an extended SHA-1 expression to name the blob or the tree at that
path in the tree-ish, e.g. if we can make the revision parser to
take this

    master:Documentation:git.txt

as the name of the blob object, then the current output is both
correct and more convenient.  After all, this sample string starts
at "master:Documentation" (which is an extended SHA-1 expression to
name a tree-ish), followed by a colon, then followed by the path
"git.txt" in it, and "grep -e pattern master:Documentation" would
show hits in that blob prefixed with it.

I.e.

	T=$(git rev-parse master:Documentation)
	git cat-file blob $T:git.txt

would give you the contents of the source to the Git manual.  It is
not all that unreasonable to expect

	git cat-file blob master:Documentation:git.txt

to be able to show the same thing as well.  You'd need to backtrack
the parsing (e.g. attempt to find "Documentation:git.txt" in
"master", fail to find any, then fall back to find "git.txt" in
"master:Documentation", find one, and be happy, or something like
that), and define how to resolve potential ambiguity (e.g. there may
indeed be "Documentation:git.txt" and "Documentation/git.txt" in the
tree-ish "master"), though.

^ permalink raw reply

* Re: [PATCH v2 2/2] grep: use '/' delimiter for paths
From: Junio C Hamano @ 2017-02-07 20:37 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: Brandon Williams, git, Jeff King
In-Reply-To: <xmqq8tphzr41.fsf@gitster.mtv.corp.google.com>

Junio C Hamano <gitster@pobox.com> writes:

Sorry, one shouldn't type while being sick and in bed X-<.

> I am not sure if what you are shooting for is "work correctly" to begin
> with, to be honest.  The current code always shows the "correct"
> output which is "the tree-ish object name (expressed in a way easier
> to understand by the humans), followed by a colon, followed by the
> path in the tree-ish the hit lies".  You are making it "incorrect
> but often more convenient", and sometimes that is a worth goal, but

s/worth/&y/;

> for the particular use cases you presented, i.e.
>
>     $ git grep -e "$pattern" "$commit:path"
>
> a more natural way to express "I want to find this pattern in the
> commit under that path" exists:
>
>     $ git grep -e "$pattern" "$commit" -- path
>
> and because of that, I do not think the former form of the query

s/do not think/do think/

> should happen _less_ often in the first place, which would make it
> "incorrect but more convenient if the user gives an unusual query".
>
> So I am not sure if the change to "grep" is worth it.

Also, it may be fairer to do s/incorrect/inconsistent/.

^ permalink raw reply
