Git development
 help / color / mirror / Atom feed
* Re: What's cooking in git.git (Jan 2017, #04; Mon, 23)
From: Johannes Schindelin @ 2017-01-25 17:01 UTC (permalink / raw)
  To: Jeff King; +Cc: Lars Schneider, Junio C Hamano, git
In-Reply-To: <20170124132749.l3ezupyitvxe4t2l@sigill.intra.peff.net>

Hi Peff,

On Tue, 24 Jan 2017, Jeff King wrote:

> On Tue, Jan 24, 2017 at 11:04:21AM +0100, Lars Schneider wrote:
> 
> > "fsck: prepare dummy objects for --connectivity-check" seems to
> > make t1450-fsck.sh fail on macOS with TravisCI:
> >
> > Initialized empty Git repository in /Users/travis/build/git/git/t/trash directory.t1450-fsck/connectivity-only/.git/
> > [master (root-commit) 86520b7] empty
> >  Author: A U Thor <author@example.com>
> >  2 files changed, 1 insertion(+)
> >  create mode 100644 empty
> >  create mode 100644 empty.t
> > override r--r--r--  travis/staff for .git/objects/e6/9de29bb2d1d6434b8b29ae775ad8c2e48c5391? (y/n [n]) not overwritten
> > dangling blob c21c9352f7526e9576892a6631e0e8cf1fccd34d
> 
> Looks like "mv" prompts and then fails to move the file (so we get the
> dangling blob for the source blob, and fsck doesn't report failure
> because we didn't actually corrupt the destination blob).

IIRC I had similar problems years ago, on a machine where the
administrator defined mandatory aliases, including mv="mv -i".

Ciao,
Johannes

^ permalink raw reply

* [ANNOUNCE] Git Rev News edition 23
From: Christian Couder @ 2017-01-25 17:20 UTC (permalink / raw)
  To: git
  Cc: Thomas Ferris Nicolaisen, Jakub Narebski, Markus Jansen,
	Junio C Hamano, Johannes Schindelin, Jeff King, Stefan Beller,
	Jacob Keller, Eric Wong, Eduardo Habkost, Pranit Bauva,
	Andreas Schwab, Javantea, Aneesh Kumar K.V, Philip Oakley,
	Lars Schneider

Hi everyone,

The 23rd edition of Git Rev News is now published:

https://git.github.io/rev_news/2017/01/25/edition-23/

Thanks a lot to all the contributors and helpers!

Enjoy,
Christian, Thomas, Jakub and Markus.

^ permalink raw reply

* Re: What's cooking in git.git (Jan 2017, #04; Mon, 23)
From: Jeff King @ 2017-01-25 17:39 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Lars Schneider, Junio C Hamano, git
In-Reply-To: <alpine.DEB.2.20.1701251800120.3469@virtualbox>

On Wed, Jan 25, 2017 at 06:01:11PM +0100, Johannes Schindelin wrote:

> > Looks like "mv" prompts and then fails to move the file (so we get the
> > dangling blob for the source blob, and fsck doesn't report failure
> > because we didn't actually corrupt the destination blob).
> 
> IIRC I had similar problems years ago, on a machine where the
> administrator defined mandatory aliases, including mv="mv -i".

Yeah, that was my first thought, too. But this should be a
non-interactive shell, which would generally avoid loading rc files. I
think there are some exceptions, though (e.g., setting ENV or BASH_ENV).
Loading aliases like "mv -i" for non-interactive shells seems somewhat
insane to me. But whatever the cause, I think the workaround I posted is
easy enough to do.

-Peff

^ permalink raw reply

* Re: [PATCH 4/5] revision.c: refactor ref selection handler after --exclude
From: Jacob Keller @ 2017-01-25 17:41 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy; +Cc: Git mailing list, Jeff King
In-Reply-To: <20170125125054.7422-5-pclouds@gmail.com>

On Wed, Jan 25, 2017 at 4:50 AM, Nguyễn Thái Ngọc Duy <pclouds@gmail.com> wrote:
> Behavior change: "--exclude --blah --remotes" will not exclude remote
> branches any more. Only "--exclude --remotes" does.
>
> This is because --exclude is going to have a new friend --decorate-reflog
> who haves the same way. When you allow a distant --remotes to complement
> a previous option, things get complicated. In
>
>     --exclude .. --decorate-reflog ... --remotes
>
> Does it mean decorate remote reflogs, or does it mean exclude remotes
> from the selected revisions?
>
> Granted, there may be valid use cases for such a combination (e.g.
> "decorate all reflogs except remote ones") but I feel option order is
> not a good fit to express them.
>

Limiting the scope of the exclude seems somewhat reasonable to me,
because it makes it much easier to explain and show the user. We do
need to make sure it's not going to break any scripts or other issues.
Is it possible for us to produce an error if the user does "--exclude"
without a necessary connecting option?

Thanks,
Jake

^ permalink raw reply

* Re: git clone problem
From: Santiago Torres @ 2017-01-25 17:58 UTC (permalink / raw)
  To: Jordi Durban; +Cc: git
In-Reply-To: <ef188364-ccd2-e7f5-3c06-62afca79f8d3@gmail.com>

[-- Attachment #1: Type: text/plain, Size: 1375 bytes --]

Hello, Jordi.

Hmm, it should've cloned in the "whatever" directory.

Can you post your git version/configs and maybe the output verbatim of
the command when you run it?

If you can reproduce in an empty dictionary that'd be better

$ mkdir test && cd test

$ git clone --recursive https://github.com/...

$ ls 

Thanks,
-Santiago

On Wed, Jan 25, 2017 at 05:58:58PM +0100, Jordi Durban wrote:
> Hi all! Not sure if that will reach the goal, but let's it a try.
> 
> I have a problem with the git clone command: when I try to clone a remote
> repository with the following:
> 
> git clone --recursive https://github.com/whatever.git
> 
> what I actually obtain is a copy of my own files in the current directory.
> 
> I mean:
> 
> In the current directory:
> 
> $ls
> 
> -rwxr-xr-x 1  1,6K oct 24 17:29 get_fasta.pl
> -rwxr-xr-x 1  1,6K set  5 13:05 script_clus_miRNA_c95.pl
> 
> $git clone --recursive https://github.com/whatever.git WHATEVER
> 
> $ls
> 
> -rwxr-xr-x 1  1,6K oct 24 17:29 get_fasta.pl
> -rwxr-xr-x 1  1,6K set  5 13:05 script_clus_miRNA_c95.pl
> 
> -rwxr-xr-x 1  1,6K set  5 13:05 WHATEVER
> 
> $ls WHATEVER
> 
> -rwxr-xr-x 1  1,6K oct 24 17:29 get_fasta.pl
> -rwxr-xr-x 1  1,6K set  5 13:05 script_clus_miRNA_c95.pl
> 
> I am really confused with that.
> 
> Any help will be appreciated. Thank you
> 

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* Re: [PATCH] tag: add tag.createReflog option
From: Jeff King @ 2017-01-25 18:00 UTC (permalink / raw)
  To: cornelius.weig; +Cc: git, novalis, pclouds
In-Reply-To: <20170125001906.13916-1-cornelius.weig@tngtech.com>

On Wed, Jan 25, 2017 at 01:19:06AM +0100, cornelius.weig@tngtech.com wrote:

> From: Cornelius Weig <cornelius.weig@tngtech.com>
> 
> Git does not create a history for tags, in contrast to common
> expectation to simply version everything. This can be changed by using
> the `--create-reflog` flag when creating the tag. However, a config
> option to enable this behavior by default is missing.

Hmm, I didn't even know we had "tag --create-reflog". Looks like it was
added by 144c76fa39 (update-ref and tag: add --create-reflog arg,
2015-07-21).

IMHO it is a mistake. The "update-ref --create-reflog" variant makes
sense to me as a plumbing operation. But are there end users who want to
create a reflog for just _one_ tag?

As your patch shows, the more likely variant is "I want reflogs for all
tags". But that raises two questions with your patch:

  - yours isn't "reflogs for all tags". It's "reflogs for tags I created
    with git-tag". What about other operations that create tags, like
    fetching (or even just a script that uses update-ref under the
    hood).

    IOW, instead of tag.createReflog, should this be tweaing
    core.logallrefupdates to have a mode that includes tags?

  - Is that the end of it, or is the desire really "I want reflogs for
    _everything_"? That seems like a sane thing to want.

    If so, then the update to core.logallrefupdates should turn it into
    a tri-state:

      - false; no reflogs

      - true; reflogs for branches, remotes, notes, as now

      - always; reflogs for all refs under "refs/"

I made a lot of suppositions about your desires there, so maybe you
really do want just tag.createReflog. But "core.logallrefupdates =
always" sounds a lot more useful to me.

-Peff

^ permalink raw reply

* Re: [PATCH] tag: add tag.createReflog option
From: Junio C Hamano @ 2017-01-25 18:10 UTC (permalink / raw)
  To: Jeff King; +Cc: cornelius.weig, git, novalis, pclouds
In-Reply-To: <20170125180054.7mioop2o6uvqloyt@sigill.intra.peff.net>

Jeff King <peff@peff.net> writes:

> I made a lot of suppositions about your desires there, so maybe you
> really do want just tag.createReflog. But "core.logallrefupdates =
> always" sounds a lot more useful to me.

Thanks for saving me from typing exactly the same thing ;-)

^ permalink raw reply

* Re: What's cooking in git.git (Jan 2017, #04; Mon, 23)
From: Junio C Hamano @ 2017-01-25 18:16 UTC (permalink / raw)
  To: Jeff King; +Cc: Johannes Schindelin, Lars Schneider, git
In-Reply-To: <20170125173958.pg546a6w33dirp5k@sigill.intra.peff.net>

Jeff King <peff@peff.net> writes:

> On Wed, Jan 25, 2017 at 06:01:11PM +0100, Johannes Schindelin wrote:
>
>> > Looks like "mv" prompts and then fails to move the file (so we get the
>> > dangling blob for the source blob, and fsck doesn't report failure
>> > because we didn't actually corrupt the destination blob).
>> 
>> IIRC I had similar problems years ago, on a machine where the
>> administrator defined mandatory aliases, including mv="mv -i".
>
> Yeah, that was my first thought, too. But this should be a
> non-interactive shell, which would generally avoid loading rc files. I
> think there are some exceptions, though (e.g., setting ENV or BASH_ENV).
> Loading aliases like "mv -i" for non-interactive shells seems somewhat
> insane to me. 

It does to me, too.

> But whatever the cause, I think the workaround I posted is
> easy enough to do.

Or spelling it explicitly as "/bin/mv" (forgetting systems that does
not have it in /bin but as /usr/bin/mv) would also defeat alias if
that were the cause.

One downside of working it around like your patch does, or spelling
it out as "/bin/mv", is that we'd need to worry about all the uses
of "mv" in our scripts.  If this were _only_ happening in the Travis
environment, I'd prefer to see why it happens only there and fix that
instead.


^ permalink raw reply

* Re: [PATCH 01/12] for_each_alternate_ref: handle failure from real_pathdup()
From: Junio C Hamano @ 2017-01-25 18:26 UTC (permalink / raw)
  To: Jeff King; +Cc: git
In-Reply-To: <20170124003827.l2rimgitsyxsvtly@sigill.intra.peff.net>

Jeff King <peff@peff.net> writes:

> In older versions of git, if real_path() failed to resolve
> the alternate object store path, we would die() with an
> error. However, since 4ac9006f8 (real_path: have callers use
> real_pathdup and strbuf_realpath, 2016-12-12) we use the
> real_pathdup() function, which may return NULL. Since we
> don't check the return value, we can segfault.
>
> This is hard to trigger in practice, since we check that the
> path is accessible before creating the alternate_object_database
> struct. But it could be removed racily, or we could see a
> transient filesystem error.
>
> We could restore the original behavior by switching back to
> xstrdup(real_path()).  However, dying is probably not the
> best option here. This whole function is best-effort
> already; there might not even be a repository around the
> shared objects at all. And if the alternate store has gone
> away, there are no objects to show.
>
> So let's just quietly return, as we would if we failed to
> open "refs/", or if upload-pack failed to start, etc.

That's sensible, methinks.

>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
>  transport.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/transport.c b/transport.c
> index c86ba2eb8..74d0e45bd 100644
> --- a/transport.c
> +++ b/transport.c
> @@ -1215,6 +1215,8 @@ static int refs_from_alternate_cb(struct alternate_object_database *e,
>  	struct alternate_refs_data *cb = data;
>  
>  	other = real_pathdup(e->path);
> +	if (!other)
> +		return 0;
>  	len = strlen(other);
>  
>  	while (other[len-1] == '/')

^ permalink raw reply

* Re: [PATCH 03/12] for_each_alternate_ref: use strbuf for path allocation
From: Junio C Hamano @ 2017-01-25 18:29 UTC (permalink / raw)
  To: Jeff King; +Cc: git
In-Reply-To: <20170124004038.njjevfku4v7kmnh4@sigill.intra.peff.net>

Jeff King <peff@peff.net> writes:

> We have a string with ".../objects" pointing to the
> alternate object store, and overwrite bits of it to look at
> other paths in the (potential) git repository holding it.
> This works because the only path we care about is "refs",
> which is shorter than "objects".

Yup, this was probably copied from a lazy original I wrote ;-)
Thanks for cleaning up.

^ permalink raw reply

* Re: Fixing the warning about warning(""); was: Re: [PATCH] difftool.c: mark a file-local symbol with static
From: Jeff King @ 2017-01-25 18:35 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: Junio C Hamano, David Aguilar, Ramsay Jones, GIT Mailing-list
In-Reply-To: <alpine.DEB.2.20.1701251135090.3469@virtualbox>

On Wed, Jan 25, 2017 at 11:36:50AM +0100, Johannes Schindelin wrote:

> > Gross, but at least it's self documenting. :)
> > 
> > I guess a less horrible version of that is:
> > 
> >   static inline warning_blank_line(void)
> >   {
> > 	warning("%s", "");
> >   }
> > 
> > We'd potentially need a matching one for error(), but at last it avoids
> > macro trickery.
> 
> I fail to see how this function, or this definition, makes the code better
> than simply calling `warning("%s", "");` and be done with it.

The only advantage is that it is self-documenting, so somebody does not
come through later and convert ("%s", "") back to (""). We could also
write a comment. But I am happy if we simply catch it in review (or
preferably the person is clueful enough to read the output of git-blame
and see why it is that way in the first place).

So maybe:

-- >8 --
Subject: [PATCH] difftool: hack around -Wzero-length-format warning

Building with "gcc -Wall" will complain that the format in:

  warning("")

is empty. Which is true, but the warning is over-eager. We
are calling the function for its side effect of printing
"warning:", even with an empty string.

Our DEVELOPER Makefile knob disables the warning, but not
everybody uses it. Let's silence the warning in the code so
that nobody reports it or tries to "fix" it.

Signed-off-by: Jeff King <peff@peff.net>
---
 builtin/difftool.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/builtin/difftool.c b/builtin/difftool.c
index 42ad9e804..b5e85ab07 100644
--- a/builtin/difftool.c
+++ b/builtin/difftool.c
@@ -567,7 +567,7 @@ static int run_dir_diff(const char *extcmd, int symlinks, const char *prefix,
 				warning(_("both files modified: '%s' and '%s'."),
 					wtdir.buf, rdir.buf);
 				warning(_("working tree file has been left."));
-				warning("");
+				warning("%s", "");
 				err = 1;
 			} else if (unlink(wtdir.buf) ||
 				   copy_file(wtdir.buf, rdir.buf, st.st_mode))
-- 
2.11.0.840.gd37c5973a

^ permalink raw reply related

* Re: What's cooking in git.git (Jan 2017, #04; Mon, 23)
From: Stefan Beller @ 2017-01-25 18:38 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git@vger.kernel.org
In-Reply-To: <xmqqo9yxpaxk.fsf@gitster.mtv.corp.google.com>

On Mon, Jan 23, 2017 at 4:18 PM, Junio C Hamano <gitster@pobox.com> wrote:
>
> * sb/unpack-trees-super-prefix (2017-01-12) 5 commits
>  - SQUASH
>  - unpack-trees: support super-prefix option
>  - t1001: modernize style
>  - t1000: modernize style
>  - read-tree: use OPT_BOOL instead of OPT_SET_INT
>
>  "git read-tree" and its underlying unpack_trees() machinery learned
>  to report problematic paths prefixed with the --super-prefix option.
>
>  Expecting a reroll.
>  The first three are in good shape.  The last one needs a better
>  explanation and possibly an update to its test.
>  cf. <CAGZ79kaHDnVLQr8WLoaD5KKs7EqeW=KbkptF=iHpU5t054Xcdw@mail.gmail.com>
>

Please see
https://public-inbox.org/git/20170118010520.8804-1-sbeller@google.com/


>
> * sb/submodule-doc (2017-01-12) 3 commits
>  - submodules: add a background story
>  - submodule update documentation: don't repeat ourselves
>  - submodule documentation: add options to the subcommand
>
>  Needs review.

The first 2 commit are asking for the standard review, whereas the last patch
(submodules: add a background story), needs more of a design review/opinion
if such a patch is a good idea actually.

Thanks,
Stefan

^ permalink raw reply

* Re: What's cooking in git.git (Jan 2017, #04; Mon, 23)
From: Jeff King @ 2017-01-25 18:39 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Johannes Schindelin, Lars Schneider, git
In-Reply-To: <xmqq4m0nc8dz.fsf@gitster.mtv.corp.google.com>

On Wed, Jan 25, 2017 at 10:16:40AM -0800, Junio C Hamano wrote:

> > But whatever the cause, I think the workaround I posted is
> > easy enough to do.
> 
> Or spelling it explicitly as "/bin/mv" (forgetting systems that does
> not have it in /bin but as /usr/bin/mv) would also defeat alias if
> that were the cause.

Yes, but I think it's less tricky and unportable to write "mv -f" than
"/bin/mv". So even if it _is_ a funny alias thing, I think my patch is
the right fix.

> One downside of working it around like your patch does, or spelling
> it out as "/bin/mv", is that we'd need to worry about all the uses
> of "mv" in our scripts.  If this were _only_ happening in the Travis
> environment, I'd prefer to see why it happens only there and fix that
> instead.

I would be curious to know whether it is a funny thing in the Travis
environment, or if some version of macOS "mv" really is that braindead
(and it is just the case that Travis has that version and Lars's
computer doesn't). I just didn't want to waste anybody's time digging
into it if it won't affect our patch.

I guess the way to dig would be to add a test that looks at the output
of "type mv" or something, push it to a Travis-hooked branch, and then
wait for the output

-Peff

^ permalink raw reply

* Re: [PATCH 03/12] for_each_alternate_ref: use strbuf for path allocation
From: Jeff King @ 2017-01-25 18:40 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git
In-Reply-To: <xmqqvat3at8u.fsf@gitster.mtv.corp.google.com>

On Wed, Jan 25, 2017 at 10:29:05AM -0800, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > We have a string with ".../objects" pointing to the
> > alternate object store, and overwrite bits of it to look at
> > other paths in the (potential) git repository holding it.
> > This works because the only path we care about is "refs",
> > which is shorter than "objects".
> 
> Yup, this was probably copied from a lazy original I wrote ;-)
> Thanks for cleaning up.

To be fair, the original predates all of the helper functions I used
here. :)

-Peff

^ permalink raw reply

* Re: [RFC] what content should go in https://git-scm.com/doc/ext
From: Jeff King @ 2017-01-25 18:42 UTC (permalink / raw)
  To: Christian Couder; +Cc: git
In-Reply-To: <CAP8UFD3Z9ZrLEZVkgL7ACkL5nOsoo2R6dUC=Bfy1AjF_9+4snQ@mail.gmail.com>

On Sun, Jan 22, 2017 at 08:13:53PM +0100, Christian Couder wrote:

> > Basically, we maintain a list of links to outside documentation, as well
> > as to books. Somebody has requested a link to their paid tutorial
> > course. How should we handle it?
> 
> I think it depends if you are ready and willing to handle more changes.

I don't clicking "merge" on more changes if people are willing to do the
design work.

> > Should we do the same for tutorial content like this?
> 
> It could make sense to link to tutorial content, and then it could
> also make sense to link to trainings (online and maybe offline too),
> but my fear is that you could then have many people who want to
> advertise their tutorials and trainings, and that it might be
> difficult to draw a line while keeping the playing field even.
> 
> But maybe it's worth trying anyway.

I think I'm inclined to try it, as long as it's kept in a separate "paid
training" section, and is open to everybody. If it becomes a burden, we
can cut it out later. I think the important thing is that it be equally
open to everybody or nobody, but not just "some". So we'll go with
everybody for now, and switch to nobody later if need be.

-Peff

^ permalink raw reply

* Re: [DEMO][PATCH v2 6/5] compat: add a qsort_s() implementation based on GNU's qsort_r(1)
From: René Scharfe @ 2017-01-25 18:43 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, Git List, Johannes Schindelin
In-Reply-To: <20170124203949.46lbmiyj26xx4hrk@sigill.intra.peff.net>

Am 24.01.2017 um 21:39 schrieb Jeff King:
> On Tue, Jan 24, 2017 at 07:00:03PM +0100, René Scharfe wrote:
>
>>> I do worry about having to support more implementations in the
>>> future that have different function signature for the comparison
>>> callbacks, which will make things ugly, but this addition alone
>>> doesn't look too bad to me.
>>
>> It is unfair of me to show a 5% speedup and then recommend to not
>> include it. ;-)  That difference won't be measurable in real use cases
>> and the patch is not necessary.  This patch is simple, but the overall
>> complexity (incl. #ifdefs etc.) will be lower without it.
>
> I care less about squeezing out the last few percent performance and
> more that somebody libc qsort_r() might offer some other improvement.
> For instance, it could sort in-place to lower memory use for some cases,
> or do some clever thing that gives more than a few percent in the real
> world (something like TimSort).
>
> I don't know to what degree we should care about that.

That's a good question.  Which workloads spend a significant amount of 
time in qsort/qsort_s?

We could track processor time spent and memory allocated in QSORT_S and 
the whole program and show a warning at the end if one of the two 
exceeded, say, 5% of the total, asking nicely to send it to our mailing 
list.  Would something like this be useful for other functions or 
metrics as well?  Would it be too impolite to use users as telemetry 
transports?

If we find such cases then we'd better fix them for all platforms, e.g. 
by importing timsort, no?

René

^ permalink raw reply

* Re: [DEMO][PATCH v2 6/5] compat: add a qsort_s() implementation based on GNU's qsort_r(1)
From: Jeff King @ 2017-01-25 18:51 UTC (permalink / raw)
  To: René Scharfe; +Cc: Junio C Hamano, Git List, Johannes Schindelin
In-Reply-To: <f41e5053-ee24-060f-0fb9-b257b3ba35a0@web.de>

On Wed, Jan 25, 2017 at 07:43:01PM +0100, René Scharfe wrote:

> We could track processor time spent and memory allocated in QSORT_S and the
> whole program and show a warning at the end if one of the two exceeded, say,
> 5% of the total, asking nicely to send it to our mailing list.  Would
> something like this be useful for other functions or metrics as well?  Would
> it be too impolite to use users as telemetry transports?

Frankly, that sounds a bit overboard to me. If people want to profile
there are profiling tools. If we want users to profile on their systems
and send results to us, I think I'd rather give them instructions or a
wrapper script for doing so.

> If we find such cases then we'd better fix them for all platforms, e.g. by
> importing timsort, no?

Yes, as long as they are strict improvements. I can't think of a case
where some sorting behavior is a matter of opinion, and they'd prefer
Git behave like the rest of their system rather than like Git on other
systems[1].

-Peff

[1] I wonder the same about regex implementations. I generally consider
    our GNU regex fallback to be a strict improvement over system
    versions, at least in terms of features. But I was interested to see
    recently that musl implements pcre-style "\x" as an extension, but
    it _doesn't_ do REG_STARTEND. So it's at tradeoff. They have to use
    the fallback on newer versions of git (to get REG_STARTEND), but
    that means the rest of their system understands "\x" but Git does
    not.

^ permalink raw reply

* Re: [PATCH 05/12] for_each_alternate_ref: replace transport code with for-each-ref
From: Junio C Hamano @ 2017-01-25 19:00 UTC (permalink / raw)
  To: Jeff King; +Cc: git
In-Reply-To: <20170124004409.py4eggvrtrej2bgi@sigill.intra.peff.net>

Jeff King <peff@peff.net> writes:

> ...
> This patch omits the peeled information from
> for_each_alternate_ref entirely, and leaves it up to the
> caller whether they want to dig up the information.
>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
> I also tried adding "%(*objectname)" to for-each-ref to just
> grab the peeled information, but the peel implementation in
> ref-filter is _really_ slow. It doesn't use the packed-ref
> peel information, and it doesn't recognize duplicates (so in
> the 80 million case, it really parses 80 million tags).

That's an interesting tangent that may want to be looked into.

>  transport.c | 48 ++++++++++++++++++++++++++++++++++++++----------
>  1 file changed, 38 insertions(+), 10 deletions(-)

The updated code is a more pleasant read.


^ permalink raw reply

* Re: [PATCH 06/12] clone: disable save_commit_buffer
From: Junio C Hamano @ 2017-01-25 19:11 UTC (permalink / raw)
  To: Jeff King; +Cc: git
In-Reply-To: <20170124004500.v7geae55w6zeax7m@sigill.intra.peff.net>

Jeff King <peff@peff.net> writes:

> Normally git caches the raw commit object contents in
> "struct commit". This makes it fast to run parse_commit()
> followed by a pretty-print operation.
>
> For commands which don't actually pretty-print the commits,
> the caching is wasteful (and may use quite a lot of memory
> if git accesses a large number of commits).
>
> For fetching operations like clone, we already disable

s/clone/fetch/ you meant?

> In one real-world case with a large number of tags, this
> cut about 10MB off of clone's heap usage. Not spectacular,
> but there's really no downside.

"There is no downside" is especially true in the modern world post
v2.1, where get_commit_buffer() is what everybody has to go through
to access this information.  I would have been very hesitant to
accept a patch like this one if we didn't do that clean-up, as a
stray codepath could have just done "commit->buffer" and segfaulted
or said "ah, there is no message", neither of which is satisfactory.

Thanks.

> Signed-off-by: Jeff King <peff@peff.net>
> ---
>  builtin/clone.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/builtin/clone.c b/builtin/clone.c
> index 5ef81927a..3fca45e7e 100644
> --- a/builtin/clone.c
> +++ b/builtin/clone.c
> @@ -858,6 +858,7 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
>  	struct refspec *refspec;
>  	const char *fetch_pattern;
>  
> +	save_commit_buffer = 0;
>  	packet_trace_identity("clone");
>  	argc = parse_options(argc, argv, prefix, builtin_clone_options,
>  			     builtin_clone_usage, 0);

^ permalink raw reply

* Re: [PATCH 07/12] fetch-pack: cache results of for_each_alternate_ref
From: Junio C Hamano @ 2017-01-25 19:21 UTC (permalink / raw)
  To: Jeff King; +Cc: git
In-Reply-To: <20170124004559.vlsrwwphuzdsfqoq@sigill.intra.peff.net>

Jeff King <peff@peff.net> writes:

> +struct alternate_object_cache {
> +	struct object **items;
> +	size_t nr, alloc;
> +};
> +
> +static void cache_one_alternate(const char *refname,
> +				const struct object_id *oid,
> +				void *vcache)
> +{
> +	struct alternate_object_cache *cache = vcache;
> +	struct object *obj = parse_object(oid->hash);
> +
> +	if (!obj || (obj->flags & ALTERNATE))
> +		return;
> +
> +	obj->flags |= ALTERNATE;
> +	ALLOC_GROW(cache->items, cache->nr + 1, cache->alloc);
> +	cache->items[cache->nr++] = obj;
> +}

Nice.  I love simplicity.

> diff --git a/object.h b/object.h
> index 614a00675..f52957dcb 100644
> --- a/object.h
> +++ b/object.h
> @@ -29,7 +29,7 @@ struct object_array {
>  /*
>   * object flag allocation:
>   * revision.h:      0---------10                                26
> - * fetch-pack.c:    0---4
> + * fetch-pack.c:    0---5
>   * walker.c:        0-2
>   * upload-pack.c:       4       11----------------19
>   * builtin/blame.c:               12-13

This is a tangent, but I am not sure how much it buys us to keep
track of the information here in object.h, as all that picture says
is "revision traversal machinery given by revision.[ch] can never be
used inside fetch-pack and upload-pack", and that is already said by
the current picture that says fetch-pack.c uses 0 thru 4, and
updating it to say that we now use 0 thru 5 would not change the
conclusion.

What I am trying to get at is that we may want to (re)consider how
we manage these bits.  But that is totally outside the scope of this
series, and updating the above is the right thing to do here.

Thanks.

^ permalink raw reply

* Re: [PATCH 06/12] clone: disable save_commit_buffer
From: Jeff King @ 2017-01-25 19:27 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git
In-Reply-To: <xmqqmvefaray.fsf@gitster.mtv.corp.google.com>

On Wed, Jan 25, 2017 at 11:11:01AM -0800, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > Normally git caches the raw commit object contents in
> > "struct commit". This makes it fast to run parse_commit()
> > followed by a pretty-print operation.
> >
> > For commands which don't actually pretty-print the commits,
> > the caching is wasteful (and may use quite a lot of memory
> > if git accesses a large number of commits).
> >
> > For fetching operations like clone, we already disable
> 
> s/clone/fetch/ you meant?

Well, no, because this patch deals with clone.

It's likely that builtin/fetch.c would want the same treatment. It
didn't come up for me because I've disabled the alternates check for
that case (but you can't do that with stock git), and I didn't dig
further.

Possibly this should just go into fetch-pack.c, right before we walk
over all of the refs and call parse_object() and deref_tag() on all of
them. It feels a little funny to tweak the global save_commit_buffer
flag there, but it already do so in everything_local(), so I don't think
we're really losing much.

> > In one real-world case with a large number of tags, this
> > cut about 10MB off of clone's heap usage. Not spectacular,
> > but there's really no downside.
> 
> "There is no downside" is especially true in the modern world post
> v2.1, where get_commit_buffer() is what everybody has to go through
> to access this information.  I would have been very hesitant to
> accept a patch like this one if we didn't do that clean-up, as a
> stray codepath could have just done "commit->buffer" and segfaulted
> or said "ah, there is no message", neither of which is satisfactory.

Yep, I had a similar thought while writing it. I would have been very
hesitant to propose it back then, too. :)

-Peff

^ permalink raw reply

* Re: [PATCH 09/12] receive-pack: use oidset to de-duplicate .have lines
From: Junio C Hamano @ 2017-01-25 19:32 UTC (permalink / raw)
  To: Jeff King; +Cc: git
In-Reply-To: <20170124004739.5aowmmkesbanyohm@sigill.intra.peff.net>

Jeff King <peff@peff.net> writes:

> If you have an alternate object store with a very large
> number of refs, the peak memory usage of the sha1_array can
> grow high, even if most of them are duplicates that end up
> not being printed at all.
> ...
> Signed-off-by: Jeff King <peff@peff.net>
> ---
>  builtin/receive-pack.c | 26 ++++++++++++--------------
>  1 file changed, 12 insertions(+), 14 deletions(-)

Nice.  

Incidentally, this also shows that the refnames in alternate ref
namespace will not matter.  We are to only show just one of many
anyway (and that name is masked and replaced with ".have").  Perhaps
we want to do 04/12 without refname?

> diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c
> index b9f2c0cc5..27bb52988 100644
> --- a/builtin/receive-pack.c
> +++ b/builtin/receive-pack.c
> @@ -21,6 +21,7 @@
>  #include "sigchain.h"
>  #include "fsck.h"
>  #include "tmp-objdir.h"
> +#include "oidset.h"
>  
>  static const char * const receive_pack_usage[] = {
>  	N_("git receive-pack <git-dir>"),
> @@ -271,27 +272,24 @@ static int show_ref_cb(const char *path_full, const struct object_id *oid,
>  	return 0;
>  }
>  
> -static int show_one_alternate_sha1(const unsigned char sha1[20], void *unused)
> +static void show_one_alternate_ref(const char *refname,
> +				   const struct object_id *oid,
> +				   void *data)
>  {
> -	show_ref(".have", sha1);
> -	return 0;
> -}
> +	struct oidset *seen = data;
>  
> -static void collect_one_alternate_ref(const char *refname,
> -				      const struct object_id *oid,
> -				      void *data)
> -{
> -	struct sha1_array *sa = data;
> -	sha1_array_append(sa, oid->hash);
> +	if (oidset_insert(seen, oid))
> +		return;
> +
> +	show_ref(".have", oid->hash);
>  }
>  
>  static void write_head_info(void)
>  {
> -	struct sha1_array sa = SHA1_ARRAY_INIT;
> +	static struct oidset seen = OIDSET_INIT;
>  
> -	for_each_alternate_ref(collect_one_alternate_ref, &sa);
> -	sha1_array_for_each_unique(&sa, show_one_alternate_sha1, NULL);
> -	sha1_array_clear(&sa);
> +	for_each_alternate_ref(show_one_alternate_ref, &seen);
> +	oidset_clear(&seen);
>  	for_each_ref(show_ref_cb, NULL);
>  	if (!sent_capabilities)
>  		show_ref("capabilities^{}", null_sha1);

^ permalink raw reply

* Re: [PATCH 06/12] clone: disable save_commit_buffer
From: Jeff King @ 2017-01-25 19:35 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git
In-Reply-To: <20170125192740.5lqoc2srqfjiyfwr@sigill.intra.peff.net>

On Wed, Jan 25, 2017 at 02:27:40PM -0500, Jeff King wrote:

> > > For fetching operations like clone, we already disable
> > 
> > s/clone/fetch/ you meant?
> 
> Well, no, because this patch deals with clone.
> 
> It's likely that builtin/fetch.c would want the same treatment. It
> didn't come up for me because I've disabled the alternates check for
> that case (but you can't do that with stock git), and I didn't dig
> further.
> 
> Possibly this should just go into fetch-pack.c, right before we walk
> over all of the refs and call parse_object() and deref_tag() on all of
> them. It feels a little funny to tweak the global save_commit_buffer
> flag there, but it already do so in everything_local(), so I don't think
> we're really losing much.

Hrm. So I thought it might be useful to write a patch that just tweaks
save_commit_buffer at the start of fetch_pack(). But looking it over,
we call everything_local() _before_ we walk over all the refs. So
save_commit_buffer should already be turned off for my case.

I wonder if I made a mistake while measuring the peak RSS. Or if clone
parses some commits before it calls into fetch_pack() (but which
objects? It doesn't have any until it does the fetch).

Perhaps we should just drop this patch (though I think it is logically
consistent and wouldn't hurt anything).

-Peff

^ permalink raw reply

* Re: [PATCH 07/12] fetch-pack: cache results of for_each_alternate_ref
From: Jeff King @ 2017-01-25 19:47 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git
In-Reply-To: <xmqqinp3aqu0.fsf@gitster.mtv.corp.google.com>

On Wed, Jan 25, 2017 at 11:21:11AM -0800, Junio C Hamano wrote:

> > diff --git a/object.h b/object.h
> > index 614a00675..f52957dcb 100644
> > --- a/object.h
> > +++ b/object.h
> > @@ -29,7 +29,7 @@ struct object_array {
> >  /*
> >   * object flag allocation:
> >   * revision.h:      0---------10                                26
> > - * fetch-pack.c:    0---4
> > + * fetch-pack.c:    0---5
> >   * walker.c:        0-2
> >   * upload-pack.c:       4       11----------------19
> >   * builtin/blame.c:               12-13
> 
> This is a tangent, but I am not sure how much it buys us to keep
> track of the information here in object.h, as all that picture says
> is "revision traversal machinery given by revision.[ch] can never be
> used inside fetch-pack and upload-pack", and that is already said by
> the current picture that says fetch-pack.c uses 0 thru 4, and
> updating it to say that we now use 0 thru 5 would not change the
> conclusion.

I agree that bumping 4 to 5 does not help at all, given the current
allocations. But it could eventually, if another allocation wanted to
pick up 5, and might plausibly be used together with fetch-pack.

The main thing is that we should keep this up to date, and that it
should cause textual conflicts when two topics update the allocation, so
that a human looks at it. I actually think we fail at the latter,
because a change to revision.h to use bit 20 would probably get
auto-resolved against a change to allocate the same bit to blame.c.

Probably a better organization is:

  bit 0: revision.h, fetch-pack.c, walker.c
  bit 1: revision.h, fetch-pack.c, walker.c
  ...
  bit 10: revision.h
  bit 11: upload-pack.c

and so forth. It's more tedious to read and write, but any two changes
to the same bit would be forced to generate a conflict (assuming a
line-oriented merge, of course :) ).

> What I am trying to get at is that we may want to (re)consider how
> we manage these bits.  But that is totally outside the scope of this
> series, and updating the above is the right thing to do here.

Perhaps you meant something like the above when you said this. But what
I'd _really_ love is to stop using global bits entirely, and have some
way to allocate a bitset and efficiently convert "struct object" to an
index into the bitset. Then cleaning up just becomes throwing out your
bitset, and by default operations do not see the bits from other
unrelated operations. That makes it impossible to have bugs like the one
fixed by 5c08dc48a (checkout: clear commit marks after detached-orphan
check, 2011-03-20).

I can't think of a way to do that efficiently besides something like the
commit-slab system, but extended to all objects. The lookups there are
fairly cheap, though it does have worse cache performance. I don't know
if that would matter in practice or not.

But yeah, definitely outside the scope of this series. :)

-Peff

^ permalink raw reply

* Re: [PATCH 11/12] receive-pack: treat namespace .have lines like alternates
From: Junio C Hamano @ 2017-01-25 19:51 UTC (permalink / raw)
  To: Jeff King; +Cc: git
In-Reply-To: <20170124004805.nu3w47isrb4bxgi5@sigill.intra.peff.net>

Jeff King <peff@peff.net> writes:

> Namely, de-duplicate them. We use the same set as the
> alternates, since we call them both ".have" (i.e., there is
> no value in showing one versus the other).
>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
>  builtin/receive-pack.c | 10 +++++++---
>  1 file changed, 7 insertions(+), 3 deletions(-)
>
> diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c
> index 8f8762e4a..c55e2f993 100644
> --- a/builtin/receive-pack.c
> +++ b/builtin/receive-pack.c
> @@ -251,8 +251,9 @@ static void show_ref(const char *path, const unsigned char *sha1)
>  }
>  
>  static int show_ref_cb(const char *path_full, const struct object_id *oid,
> -		       int flag, void *unused)
> +		       int flag, void *data)
>  {
> +	struct oidset *seen = data;
>  	const char *path = strip_namespace(path_full);
>  
>  	if (ref_is_hidden(path, path_full))
> @@ -263,8 +264,11 @@ static int show_ref_cb(const char *path_full, const struct object_id *oid,
>  	 * refs, so that the client can use them to minimize data
>  	 * transfer but will otherwise ignore them.
>  	 */
> -	if (!path)
> +	if (!path) {
> +		if (oidset_insert(seen, oid))
> +			return 0;
>  		path = ".have";
> +	}

This is very sensible as an optimization that does not change
semantics.

This is an unrelated tangent, but there may want to be a knob to
make the code return here without even showing, to make the
advertisement even smaller and also to stop miniscule information
leakage?  If the namespaced multiple projects are totally unrelated
(i.e. "My sysadmin gave me a write access only to this single
directory, so I am using the namespace feature to host these three
projects that have nothing to do with each other"), showing objects
of other namespaces will buy us nothing and the user is better off
without this code showing these refs as ".have".

>  	show_ref(path, oid->hash);
>  	return 0;
>  }
> @@ -287,7 +291,7 @@ static void write_head_info(void)
>  
>  	for_each_alternate_ref(show_one_alternate_ref, &seen);
>  	oidset_clear(&seen);
> -	for_each_ref(show_ref_cb, NULL);
> +	for_each_ref(show_ref_cb, &seen);
>  	if (!sent_capabilities)
>  		show_ref("capabilities^{}", null_sha1);

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox