* Re: [PATCH 1/3] Documentation/stash: remove mention of git reset --hard
From: Junio C Hamano @ 2017-01-25 21:20 UTC (permalink / raw)
To: Jakub Narębski
Cc: Jeff King, Thomas Gummerer, git, Stephan Beyer, Marc Strapetz,
Johannes Schindelin
In-Reply-To: <d327cfc5-2f0d-1a67-d5f5-b9b7a06f7570@gmail.com>
Jakub Narębski <jnareb@gmail.com> writes:
>>> - Save your local modifications to a new 'stash', and run `git reset
>>> - --hard` to revert them. The <message> part is optional and gives
>>> + Save your local modifications to a new 'stash', and revert the
>>> + the changes in the working tree to match the index.
>>> + The <message> part is optional and gives
>>
>> Hrm. "git reset --hard" doesn't just make the working tree match the
>> index. It also resets the index to HEAD. So either the original or your
>> new description is wrong.
>>
>> I think it's the latter. We really do reset the index unless
>> --keep-index is specified.
>
> I wonder if it is worth mentioning that "saving local modifications"
> in 'git stash' means storing both the state of the worktree (of tracked
> files in it) _and_ the state of the index, before both get set to
> state of HEAD.
Yes, it is, I would say.
^ permalink raw reply
* Re: [PATCH 1/3] Documentation/stash: remove mention of git reset --hard
From: Junio C Hamano @ 2017-01-25 21:20 UTC (permalink / raw)
To: Jeff King
Cc: Thomas Gummerer, git, Stephan Beyer, Marc Strapetz,
Johannes Schindelin
In-Reply-To: <20170124201415.zzyg4vt35vflwjnb@sigill.intra.peff.net>
Jeff King <peff@peff.net> writes:
> On Sat, Jan 21, 2017 at 08:08:02PM +0000, Thomas Gummerer wrote:
>
>> diff --git a/Documentation/git-stash.txt b/Documentation/git-stash.txt
>> index 2e9cef06e6..0ad5335a3e 100644
>> --- a/Documentation/git-stash.txt
>> +++ b/Documentation/git-stash.txt
>> @@ -47,8 +47,9 @@ OPTIONS
>>
>> save [-p|--patch] [-k|--[no-]keep-index] [-u|--include-untracked] [-a|--all] [-q|--quiet] [<message>]::
>>
>> - Save your local modifications to a new 'stash', and run `git reset
>> - --hard` to revert them. The <message> part is optional and gives
>> + Save your local modifications to a new 'stash', and revert the
>> + the changes in the working tree to match the index.
>> + The <message> part is optional and gives
>
> Hrm. "git reset --hard" doesn't just make the working tree match the
> index. It also resets the index to HEAD. So either the original or your
> new description is wrong.
>
> I think it's the latter. We really do reset the index unless
> --keep-index is specified.
Correct. So the patch is a net loss. Perhaps not requiring readers
to know "reset --hard" might be an improvement (I personally do not
think so), but this loses crucial information from the description.
Save your local modifications (both in the working tree and
to the index) to a new 'stash', and resets the index to HEAD
and makes the working tree match the index (i.e. runs "git
reset --hard").
That's one-and-a-half lines longer than the original, though.
^ permalink raw reply
* Re: [PATCH 4/5] revision.c: refactor ref selection handler after --exclude
From: Junio C Hamano @ 2017-01-25 21:15 UTC (permalink / raw)
To: Nguyễn Thái Ngọc Duy; +Cc: git, Jeff King, jacob.keller
In-Reply-To: <20170125125054.7422-5-pclouds@gmail.com>
Nguyễn Thái Ngọc Duy <pclouds@gmail.com> writes:
> Behavior change: "--exclude --blah --remotes" will not exclude remote
> branches any more. Only "--exclude --remotes" does.
>
> This is because --exclude is going to have a new friend --decorate-reflog
> who haves the same way. When you allow a distant --remotes to complement
> a previous option, things get complicated. In
>
> --exclude .. --decorate-reflog ... --remotes
>
> Does it mean decorate remote reflogs, or does it mean exclude remotes
> from the selected revisions?
I would expect that the effect of exclude, decorate-reflog and
friends will extend until the next occurrence of --remotes (or --all
or whatever you catch in parse_ref_selector_option() function).
So, I'd read it as "add all remote tracking refs, but (1) exclude
exclude the refs matching pattern .. and (2) use reflog of them if
they match pattern ...".
Or did you mean by "..." something other than a single argument that
is a pattern?
^ permalink raw reply
* Re: [PATCH] mingw: allow hooks to be .exe files
From: Jeff King @ 2017-01-25 21:15 UTC (permalink / raw)
To: Johannes Schindelin; +Cc: git, Junio C Hamano
In-Reply-To: <dc388b2df3baee83972f007cd77a57410553a411.1485362812.git.johannes.schindelin@gmx.de>
On Wed, Jan 25, 2017 at 05:58:42PM +0100, Johannes Schindelin wrote:
> This change is necessary to allow the files in .git/hooks/ to optionally
> have the file extension `.exe` on Windows, as the file names are
> hardcoded otherwise.
Make sense as a goal.
> - if (access(path.buf, X_OK) < 0)
> + if (access(path.buf, X_OK) < 0) {
> +#ifdef STRIP_EXTENSION
> + strbuf_addstr(&path, ".exe");
I think STRIP_EXTENSION is a string. Should this line be:
strbuf_addstr(&path, STRIP_EXTENSION);
?
-Peff
^ permalink raw reply
* Re: [PATCH] Retire the `relink` command
From: Jeff King @ 2017-01-25 21:13 UTC (permalink / raw)
To: Johannes Schindelin; +Cc: git, Junio C Hamano
In-Reply-To: <10319c47ff3f7222c3a601827ebd9398861d509d.1485363528.git.johannes.schindelin@gmx.de>
On Wed, Jan 25, 2017 at 05:58:57PM +0100, Johannes Schindelin wrote:
> Back in the olden days, when all objects were loose and rubber boots were
> made out of wood, it made sense to try to share (immutable) objects
> between repositories.
>
> Ever since the arrival of pack files, it is but an anachronism.
Yes, this is a good idea. I could _almost_ see its utility if it linked
packfiles, too, but then it is very unlikely that two repos have the
exact same packfile.
> Let's move the script to the contrib/examples/ directory and no longer
> offer it.
I am OK with this, but perhaps we should go even further and just delete
it entirely. The point of contrib/examples is to show people "this is
how you could script plumbing to implement a porcelain". But this script
does not call a single plumbing script. It just looks directly in
objects/, which is probably not something we would want to encourage. :)
-Peff
^ permalink raw reply
* Re: [PATCH 2/5] revision.c: group ref selection options together
From: Junio C Hamano @ 2017-01-25 21:11 UTC (permalink / raw)
To: Nguyễn Thái Ngọc Duy; +Cc: git, Jeff King, jacob.keller
In-Reply-To: <20170125125054.7422-3-pclouds@gmail.com>
Nguyễn Thái Ngọc Duy <pclouds@gmail.com> writes:
> These options have on thing in common: when specified right after
one thing.
> --exclude, they will de-select refs instead of selecting them by
> default.
>
> This change makes it possible to introduce new options that use these
> options in the same way as --exclude. Such an option would just
> implement something like handle_refs_pseudo_opt().
>
> parse_ref_selector_option() is taken out of handle_refs_pseudo_opt() so
> that similar functions like handle_refs_pseudo_opt() are forced to
> handle all ref selector options, not skipping some by mistake, which may
> revert the option back to default behavior (rev selection).
I am not sure about these two refactorings, for at least two reasons.
* Naming. The function is all about handling "refs options" and I
do not see anything "pseudo" about the options handled by the
handle_refs_pseudo_opt() function. This is minor.
* I am expecting that the new one yet to be introduced will not
share the huge "switch (selector)" part, but does its own things
in a separate function with a similar structure. The only thing
common between these two functions would be the structure
(i.e. it has a big "switch(selector)" that does different things
depending on REF_SELECT_*) and a call to clear_* function.
If we were to add a new kind of REF_SELECT_* (say
REF_SELECT_NOTES just for the sake of being concrete), what
changes will be needed to the code if the addition of "use reflog
from this class of refs for decoration" feature was done with or
without this step? I have a suspicion that the change will be
simpler without this step.
The above comments will be invalid if the new "use reflog for
decoration" feature will be done by extending this pseudo_opt()
function by extending each of the switch/case arms. If that is
the case, please disregard the above. I just didn't get that
impression from the above paragraph.
^ permalink raw reply
* Re: [PATCH 06/12] clone: disable save_commit_buffer
From: Jeff King @ 2017-01-25 21:07 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git
In-Reply-To: <20170125193541.yxqq4avnjngbigmq@sigill.intra.peff.net>
On Wed, Jan 25, 2017 at 02:35:41PM -0500, Jeff King wrote:
> On Wed, Jan 25, 2017 at 02:27:40PM -0500, Jeff King wrote:
>
> > > > For fetching operations like clone, we already disable
> > >
> > > s/clone/fetch/ you meant?
> >
> > Well, no, because this patch deals with clone.
> >
> > It's likely that builtin/fetch.c would want the same treatment. It
> > didn't come up for me because I've disabled the alternates check for
> > that case (but you can't do that with stock git), and I didn't dig
> > further.
> >
> > Possibly this should just go into fetch-pack.c, right before we walk
> > over all of the refs and call parse_object() and deref_tag() on all of
> > them. It feels a little funny to tweak the global save_commit_buffer
> > flag there, but it already do so in everything_local(), so I don't think
> > we're really losing much.
>
> Hrm. So I thought it might be useful to write a patch that just tweaks
> save_commit_buffer at the start of fetch_pack(). But looking it over,
> we call everything_local() _before_ we walk over all the refs. So
> save_commit_buffer should already be turned off for my case.
>
> I wonder if I made a mistake while measuring the peak RSS. Or if clone
> parses some commits before it calls into fetch_pack() (but which
> objects? It doesn't have any until it does the fetch).
>
> Perhaps we should just drop this patch (though I think it is logically
> consistent and wouldn't hurt anything).
OK, I just repeated my heap measurements with valgrind's massif tool,
specifically checking builds of this patch and its parent. I couldn't
find any improvement. So I must have screwed something up in my earlier
measurements.
Sorry for the noise. I think we should probably just drop this patch.
-Peff
^ permalink raw reply
* Re: [PATCH 4/5] revision.c: refactor ref selection handler after --exclude
From: Jeff King @ 2017-01-25 20:57 UTC (permalink / raw)
To: Nguyễn Thái Ngọc Duy; +Cc: git, jacob.keller
In-Reply-To: <20170125125054.7422-5-pclouds@gmail.com>
On Wed, Jan 25, 2017 at 07:50:53PM +0700, Nguyễn Thái Ngọc Duy wrote:
> Behavior change: "--exclude --blah --remotes" will not exclude remote
> branches any more. Only "--exclude --remotes" does.
>
> This is because --exclude is going to have a new friend --decorate-reflog
> who haves the same way. When you allow a distant --remotes to complement
> a previous option, things get complicated. In
>
> --exclude .. --decorate-reflog ... --remotes
>
> Does it mean decorate remote reflogs, or does it mean exclude remotes
> from the selected revisions?
I don't think it means either. It means to include remotes in the
selected revisions, but excluding the entries mentioned by --exclude.
IOW:
--exclude=foo --remotes
include all remotes except refs/remotes/foo
--exclude=foo --unrelated --remotes
same
--exclude=foo --decorate-reflog --remotes
decorate reflogs of all remotes except "foo". Do _not_ use them
as traversal tips.
--decorate-reflog --exclude=foo --remotes
same
IOW, the ref-selector options build up until a group option is given,
which acts on the built-up options (over that group) and then resets the
built-up options. Doing "--unrelated" as above is orthogonal (though I
think in practice nobody would do that, because it's hard to read).
> Granted, there may be valid use cases for such a combination (e.g.
> "decorate all reflogs except remote ones") but I feel option order is
> not a good fit to express them.
That would be spelled:
--exclude=refs/remotes --decorate-reflogs --all
(or you could swap the first two options).
Again, I'm not sure if I'm missing something subtle, or if you are
confused about how --exclude works. :)
-Peff
^ permalink raw reply
* Re: [PATCH v3 14/21] read-cache: touch shared index files when used
From: Junio C Hamano @ 2017-01-25 20:52 UTC (permalink / raw)
To: Christian Couder
Cc: Duy Nguyen, git, Ævar Arnfjörð Bjarmason,
Christian Couder
In-Reply-To: <CAP8UFD3KXOgVOhuMtws36XPiek56U4mQKdUs07hzKc-dC=ppmA@mail.gmail.com>
Christian Couder <christian.couder@gmail.com> writes:
> Well, when we cannot freshen a loose file (with
> freshen_loose_object()), we don't warn or die, we just write the loose
> file. But here we cannot write the shared index file.
I think that is an excellent point. Let me make sure I got you
right. For loose object files, we may attempt to freshen and when
it fails we stay silent and do not talk about the failure. Instead,
we write the same file again. That will have two potential outcomes:
1. write fails and we fail the whole thing. It is very clear to
the user that there is something wrong going on.
2. write succeeds, and because we just wrote it, we know that the
file is fresh and is protected from gc.
So the "freshen and if fails just write" is sufficient. It is
absolutely the right thing to do for loose object files.
When we are forking off a new split index file based on an old
shared index file, we may attempt to "touch" the old shared one, but
we cannot write the old shared one again, because other split index
may be based on that, and we do not have enough information to
recreate the old one [*1*]. The fallback safety is not available.
> And this doesn't lead to a catastrophic failure right away.
Exactly.
> There
> could be a catastrophic failure if the shared index file is needed
> again later, but we are not sure that it's going to be needed later.
> In fact it may have just been removed because it won't be needed
> later.
You are listing only the irrelevant cases. The shared one may be
used immediately, and the user can keep using it for a while without
"touching". Perhaps the shared one was chowned to "root" while the
user is looking the other way, and its timestamp is now frozen to
the time of chown. It is a ticking time-bomb that will go off when
your expiry logic kicks in.
> So I am not sure it's a good idea to anticipate a catastrophic failure
> that may not happen. Perhaps we could just warn, but I am not sure it
> will help the user. If the catastrophic failure doesn't happen because
> the shared index file is not needed, I can't see how the warning could
> help. And if there is a catastrophic failure, the following will be
> displayed:
>
> fatal: .git/sharedindex.cbfe41352a4623d4d3e9757861fed53f3094e0ac:
> index file open failed: No such file or directory
That "fatal" is given _ONLY_ after time passes and our failure to
"touch" the file that is still-in-use left it subject to "gc". Of
course, when "fatal" is given, it is too late to warn about ticking
time bomb.
At the time we notice a ticking time bomb is the only sensible time
to warn. Or better yet take a corrective action.
As I expect (and you seem to agree) that a failure to "touch" would
be a very rare event (like, sysadmin chowning it to "root" by
mistake), I do not mind if the "corrective action" were "immediately
unsplit the index, so that at least the current and the latest index
contents will be written out safely to a new single unsplit index
file". That won't help _other_ split index files that are based on
the same "untouchable" shared index, but I do not think that is a
problem we need to solve---if they are still in use, the code that
use them will notice it, and otherwise they are not in use and can
be aged away safely.
[Footnote]
*1* My understanding is that we lose information on stale entries in
the shared file that are covered by the split index overlay
after read_index() returns, so we _might_ be able to write the
"old" one that is sufficient for _our_ split index, but we do
not have good enough information to recreate "old" one usable by
other split index files.
^ permalink raw reply
* Re: [PATCH 2/5] revision.c: group ref selection options together
From: Jeff King @ 2017-01-25 20:50 UTC (permalink / raw)
To: Nguyễn Thái Ngọc Duy; +Cc: git, jacob.keller
In-Reply-To: <20170125125054.7422-3-pclouds@gmail.com>
On Wed, Jan 25, 2017 at 07:50:51PM +0700, Nguyễn Thái Ngọc Duy wrote:
> These options have on thing in common: when specified right after
> --exclude, they will de-select refs instead of selecting them by
> default.
>
> This change makes it possible to introduce new options that use these
> options in the same way as --exclude. Such an option would just
> implement something like handle_refs_pseudo_opt().
>
> parse_ref_selector_option() is taken out of handle_refs_pseudo_opt() so
> that similar functions like handle_refs_pseudo_opt() are forced to
> handle all ref selector options, not skipping some by mistake, which may
> revert the option back to default behavior (rev selection).
>
> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
> ---
> revision.c | 134 +++++++++++++++++++++++++++++++++++++++++++++----------------
> 1 file changed, 100 insertions(+), 34 deletions(-)
Hmm. I see what you're trying to do here, and abstract the repeated
bits. But I'm not sure the line-count reflects a real simplification.
Everything ends up converted to an enum, and then that enum just expands
to similar C code.
I kind of expected that clear_ref_exclusion() would just become a more
abstract clear_ref_selection(). For now it would clear exclusions, and
then later learn to clear the decoration flags.
Maybe I am missing something in the later patches, though.
-Peff
^ permalink raw reply
* Re: GSoC 2017: application open, deadline = February 9, 2017
From: Jeff King @ 2017-01-25 20:45 UTC (permalink / raw)
To: Matthieu Moy
Cc: git, Pranit Bauva, Lars Schneider, Christian Couder,
Carlos Martín Nieto, Johannes Schindelin, Thomas Gummerer
In-Reply-To: <vpq1svtstud.fsf@anie.imag.fr>
On Mon, Jan 23, 2017 at 04:02:02PM +0100, Matthieu Moy wrote:
> If the answer to one of the above question is yes, then:
>
> * Who's willing to mentor? and to admin?
I did co-admin last year and the year before, but I made Matthieu do all
the work. :)
I do not mind doing the administrative stuff. But the real work is in
polishing up the ideas list and microprojects page. So I think the first
step, if people are interested in GSoC, is not just to say "yes, let's
do it", but to actually flesh out these pages:
> * We need to write the application, i.e. essentially polish and update
> the text here: https://git.github.io/SoC-2016-Org-Application/ and
> update the list of project ideas and microprojects :
> https://git.github.io/SoC-2017-Ideas/
> https://git.github.io/SoC-2016-Microprojects/
That can be done incrementally by people who care (especially mentors)
over the next week or so, and doesn't require any real admin
coordination. If it happens and the result looks good, then the
application process is pretty straightforward.
If it doesn't, then we probably ought not to participate in GSoC.
-Peff
PS If we do participate, we should consider Outreachy as well, which can
expand our base of possible applicants. Though note that ones who are
not eligible for GSoC would need to be funded by us. Last year GitHub
offered to cover the stipend for an Outreachy intern for Git. If
there's interest I can see if that offer can be extended for this
year.
^ permalink raw reply
* Re: What's cooking in git.git (Jan 2017, #04; Mon, 23)
From: Junio C Hamano @ 2017-01-25 20:35 UTC (permalink / raw)
To: Stefan Beller; +Cc: git@vger.kernel.org
In-Reply-To: <CAGZ79kZKDyfP1SPsxAf8oMSU3763QiogLpUzFZHy+OTQQdJP6A@mail.gmail.com>
Stefan Beller <sbeller@google.com> writes:
> On Mon, Jan 23, 2017 at 4:18 PM, Junio C Hamano <gitster@pobox.com> wrote:
>>
>> * sb/unpack-trees-super-prefix (2017-01-12) 5 commits
>> - SQUASH
>> - unpack-trees: support super-prefix option
>> - t1001: modernize style
>> - t1000: modernize style
>> - read-tree: use OPT_BOOL instead of OPT_SET_INT
>>
>> "git read-tree" and its underlying unpack_trees() machinery learned
>> to report problematic paths prefixed with the --super-prefix option.
>>
>> Expecting a reroll.
>> The first three are in good shape. The last one needs a better
>> explanation and possibly an update to its test.
>> cf. <CAGZ79kaHDnVLQr8WLoaD5KKs7EqeW=KbkptF=iHpU5t054Xcdw@mail.gmail.com>
>>
>
> Please see
> https://public-inbox.org/git/20170118010520.8804-1-sbeller@google.com/
Thanks, applied. Let's move this forward.
^ permalink raw reply
* Re: [PATCH v2 25/27] attr: store attribute stack in attr_check structure
From: Junio C Hamano @ 2017-01-25 20:14 UTC (permalink / raw)
To: Brandon Williams; +Cc: git, sbeller, pclouds
In-Reply-To: <20170125195745.GA83343@google.com>
Brandon Williams <bmwill@google.com> writes:
>> In my mind, the value of having a constant check_attr is primarily
>> that it gives us a stable pointer to serve as a hashmap key,
>> i.e. the identifier for each call site, in a later iteration.
>
> We didn't really discuss this notion of having the pointer be a key into
> a hashmap, what sort of information are you envisioning being stored in
> this sort of hashmap?
The "entries relevant to this attr_check() call, that is specific to
the <check_attr instance, the thread> tuple" (aka "what used to be
called the global attr_stack") we discussed would be the primary
example. A thread is likely be looping in a caller that has many
paths inside a directory, calling a function that has a call to
attr_check() for each path. Having something that can use to
identify the check_attr instance in a stable way, even when the
inner function is called and returns many times, would allow us to
populate the "attr stack" just once for the thread when it enters a
directory for the first time (remember, another thread may be
executing the same codepath, checking for paths in a different
directory) and keep using it. There may be other mechanisms you can
come up with, so I wouldn't say it is the only valid way, but it is
a way. That is why I said:
>> But all of the above comes from my intuition, and I'll very much
>> welcome to be proven wrong with an alternative design, or better
>> yet, a working code based on an alternative design ;-).
near the end of my message.
> One issue I can see with this is that the
> functions which have a static attr_check struct would still not be thread
> safe if the initialization of the structure isn't surrounded by a mutex
> itself. ie
Yes, that goes without saying. That is why I suggested Stefan to do
not this:
> static struct attr_check *check;
>
> if (!check)
> init(check);
>
> would need to be:
>
> lock()
> if (!check)
> init(check);
> unlock();
but this:
static struct attr_check *check;
init(&check);
and hide the lock/unlock gymnastics inside the API. I thought that
already was in what you inherited from him and started your work
on top of?
^ permalink raw reply
* Re: [PATCH v2 25/27] attr: store attribute stack in attr_check structure
From: Stefan Beller @ 2017-01-25 20:10 UTC (permalink / raw)
To: Brandon Williams; +Cc: Junio C Hamano, git@vger.kernel.org, Duy Nguyen
In-Reply-To: <20170125195745.GA83343@google.com>
On Wed, Jan 25, 2017 at 11:57 AM, Brandon Williams <bmwill@google.com> wrote:
> On 01/23, Junio C Hamano wrote:
>> Brandon Williams <bmwill@google.com> writes:
>>
>> > ... It seems like breaking the question and answer up
>> > doesn't buy you much in terms of reducing allocation churn and instead
>> > complicates the API with needing to keep track of two structures instead
>> > of a one.
>>
>> In my mind, the value of having a constant check_attr is primarily
>> that it gives us a stable pointer to serve as a hashmap key,
>> i.e. the identifier for each call site, in a later iteration.
>
> We didn't really discuss this notion of having the pointer be a key into
> a hashmap, what sort of information are you envisioning being stored in
> this sort of hashmap? One issue I can see with this is that the
> functions which have a static attr_check struct would still not be thread
> safe if the initialization of the structure isn't surrounded by a mutex
> itself. ie
>
> static struct attr_check *check;
>
> if (!check)
> init(check);
>
> would need to be:
>
> lock()
> if (!check)
> init(check);
> unlock();
>
> inorder to prevent a race to initialize the structure. Which is
> something that the attr system itself can't be refactored to fix (at
> least I can't see how at the moment).
By passing the check pointer into the attr system (using a double pointer)
extern void git_attr_check_initl( \
struct git_attr_check out**, \
const char *, ...)
{
// get the global lock, as construction of new check structs
// is not expected to produce contention
// parse the list of things & construct the thing
*out = /* I made a thing */
// unlock globally
}
>
>> Of course, in order to populate the "question" array, we'd need the
>> interning of attribute names to attr objects, which need to be
>> protected by mutex, and you would probably not want to do that every
>> time the control hits the codepath.
>
> While true that doesn't prevent the mutex needed to create/check that
> the all_attr array that is used to collect attributes is the correct
> size/initialized properly.
>
>> But all of the above comes from my intuition, and I'll very much
>> welcome to be proven wrong with an alternative design, or better
>> yet, a working code based on an alternative design ;-).
>
> Yeah, after working through the problem the two simple solutions I can
> think of are either my v1 or v2 of the series, neither of which allows
> for the attr_check structure to be shared. If we truly want the
> "question" array to be const then that can be done, it would just
> require a bit more boilerplate and making the all_attr array to be
> local to the check_attrs() function itself. An API like this would look
> like:
>
> static const struct attr_check *check;
> struct attr_result result;
>
> if (!check)
> init_check(check);
>
> // Result struct needs to be initialized based on the size of check
> init_result(&result, check);
Behind the scenes we may have a pool that caches result allocations,
such that we avoid memory allocation churn in here
>
> check_attrs(path, check, &result);
>
> // use result
>
> attr_result_clear(&result);
Instead of clearing here, we'd give it back to the pool, which then can keep
parts of the result intact.
^ permalink raw reply
* Re: [PATCH 12/12] receive-pack: avoid duplicates between our refs and alternates
From: Jeff King @ 2017-01-25 20:05 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git
In-Reply-To: <xmqq60l2c3hl.fsf@gitster.mtv.corp.google.com>
On Wed, Jan 25, 2017 at 12:02:30PM -0800, Junio C Hamano wrote:
> > +extract_ref_advertisement () {
> > + perl -lne '
> > + # \\ is there to skip capabilities after \0
> > + /push< ([^\\]+)/ or next;
> > + exit 0 if $1 eq "0000";
> > + print $1;
> > + '
>
> Parsing TRACE_PACKET output? Yuck. But I think this has to do, as
> any other solution will bound to be uglier.
Agreed. My initial attempt was to just run "git receive-pack </dev/null"
and parse the advertisement. But then you have to parse pktlines. :)
> > + # Notable things in this expectation:
> > + # - local refs are not de-duped
> > + # - .have does not duplicate locals
> > + # - .have does not duplicate itself
> > + local=$(git -C fork rev-parse HEAD) &&
> > + shared=$(git -C shared rev-parse only-shared) &&
> > + cat >expect <<-EOF &&
> > + $local refs/heads/master
> > + $local refs/remotes/origin/HEAD
> > + $local refs/remotes/origin/master
> > + $shared .have
> > + EOF
>
> We may want to sort this thing and the extracted one when comparing;
> the order of the entries is not part of the feature we cast in stone.
True (and in fact the .have moved in the previous patch). I wondered,
though, if it might not be a reasonable thing for somebody to have to
look at the output and verify it when the order _does_ change.
I dunno.
-Peff
^ permalink raw reply
* Re: [PATCH 12/12] receive-pack: avoid duplicates between our refs and alternates
From: Junio C Hamano @ 2017-01-25 20:02 UTC (permalink / raw)
To: Jeff King; +Cc: git
In-Reply-To: <20170124004818.7resjwbe6ldqjfyx@sigill.intra.peff.net>
Jeff King <peff@peff.net> writes:
> We de-duplicate ".have" refs among themselves, but never
> check if they are duplicates of our local refs. It's not
> unreasonable that they would be if we are a "--shared" or
> "--reference" clone of a similar repository; we'd have all
> the same tags.
>
> We can handle this by inserting our local refs into the
> oidset, but obviously not suppressing duplicates (since the
> refnames are important).
Makes sense.
> +extract_ref_advertisement () {
> + perl -lne '
> + # \\ is there to skip capabilities after \0
> + /push< ([^\\]+)/ or next;
> + exit 0 if $1 eq "0000";
> + print $1;
> + '
Parsing TRACE_PACKET output? Yuck. But I think this has to do, as
any other solution will bound to be uglier.
> +test_expect_success 'receive-pack de-dupes .have lines' '
> + git init shared &&
> + git -C shared commit --allow-empty -m both &&
> + git clone -s shared fork &&
> + (
> + cd shared &&
> + git checkout -b only-shared &&
> + git commit --allow-empty -m only-shared &&
> + git update-ref refs/heads/foo HEAD
> + ) &&
> +
> + # Notable things in this expectation:
> + # - local refs are not de-duped
> + # - .have does not duplicate locals
> + # - .have does not duplicate itself
> + local=$(git -C fork rev-parse HEAD) &&
> + shared=$(git -C shared rev-parse only-shared) &&
> + cat >expect <<-EOF &&
> + $local refs/heads/master
> + $local refs/remotes/origin/HEAD
> + $local refs/remotes/origin/master
> + $shared .have
> + EOF
We may want to sort this thing and the extracted one when comparing;
the order of the entries is not part of the feature we cast in stone.
> +
> + GIT_TRACE_PACKET=$(pwd)/trace git push fork HEAD:foo &&
> + extract_ref_advertisement <trace >refs &&
> + test_cmp expect refs
> +'
> +
> test_done
^ permalink raw reply
* Re: [PATCH 11/12] receive-pack: treat namespace .have lines like alternates
From: Jeff King @ 2017-01-25 19:58 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git
In-Reply-To: <xmqqa8aec40a.fsf@gitster.mtv.corp.google.com>
On Wed, Jan 25, 2017 at 11:51:17AM -0800, Junio C Hamano wrote:
> This is an unrelated tangent, but there may want to be a knob to
> make the code return here without even showing, to make the
> advertisement even smaller and also to stop miniscule information
> leakage? If the namespaced multiple projects are totally unrelated
> (i.e. "My sysadmin gave me a write access only to this single
> directory, so I am using the namespace feature to host these three
> projects that have nothing to do with each other"), showing objects
> of other namespaces will buy us nothing and the user is better off
> without this code showing these refs as ".have".
Yeah, I'd agree it's a potential issue. And I definitely do the same
with alternates (in my case it isn't "they're not relevant" as much as
"they are too big, and the optimization backfires").
I don't use namespaces myself[1], though, so I'd prefer to leave that to
somebody who has that itch, and could think it through better.
-Peff
[1] They _seem_ like they'd be a good fit for the way GitHub uses
alternates, but since they're only implemented for fetch/push,
they're only a small part of the story. Plus the ref storage has
traditionally been a scaling bottleneck for us, so it's nice for
each fork to have its own ref storage for operations that just touch
that fork.
^ permalink raw reply
* Re: [PATCH v2 25/27] attr: store attribute stack in attr_check structure
From: Brandon Williams @ 2017-01-25 19:57 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git, sbeller, pclouds
In-Reply-To: <xmqqinp5p4x2.fsf@gitster.mtv.corp.google.com>
On 01/23, Junio C Hamano wrote:
> Brandon Williams <bmwill@google.com> writes:
>
> > ... It seems like breaking the question and answer up
> > doesn't buy you much in terms of reducing allocation churn and instead
> > complicates the API with needing to keep track of two structures instead
> > of a one.
>
> In my mind, the value of having a constant check_attr is primarily
> that it gives us a stable pointer to serve as a hashmap key,
> i.e. the identifier for each call site, in a later iteration.
We didn't really discuss this notion of having the pointer be a key into
a hashmap, what sort of information are you envisioning being stored in
this sort of hashmap? One issue I can see with this is that the
functions which have a static attr_check struct would still not be thread
safe if the initialization of the structure isn't surrounded by a mutex
itself. ie
static struct attr_check *check;
if (!check)
init(check);
would need to be:
lock()
if (!check)
init(check);
unlock();
inorder to prevent a race to initialize the structure. Which is
something that the attr system itself can't be refactored to fix (at
least I can't see how at the moment).
> Of course, in order to populate the "question" array, we'd need the
> interning of attribute names to attr objects, which need to be
> protected by mutex, and you would probably not want to do that every
> time the control hits the codepath.
While true that doesn't prevent the mutex needed to create/check that
the all_attr array that is used to collect attributes is the correct
size/initialized properly.
> But all of the above comes from my intuition, and I'll very much
> welcome to be proven wrong with an alternative design, or better
> yet, a working code based on an alternative design ;-).
Yeah, after working through the problem the two simple solutions I can
think of are either my v1 or v2 of the series, neither of which allows
for the attr_check structure to be shared. If we truly want the
"question" array to be const then that can be done, it would just
require a bit more boilerplate and making the all_attr array to be
local to the check_attrs() function itself. An API like this would look
like:
static const struct attr_check *check;
struct attr_result result;
if (!check)
init_check(check);
// Result struct needs to be initialized based on the size of check
init_result(&result, check);
check_attrs(path, check, &result);
// use result
attr_result_clear(&result);
>
It still doesn't handle an initialization race on the check structure
but the check pointer would be const and could be used for some future
optimization. It also will have a bit more allocation churn than either
v1 or v2 of the series. If this is the route you want to take I'll get
working on it, I just want to make sure we're on the same page before
doing a larger refactor like this.
Thanks for the guidance on this, someday we'll get this right :)
--
Brandon Williams
^ permalink raw reply
* Re: [PATCH 09/12] receive-pack: use oidset to de-duplicate .have lines
From: Jeff King @ 2017-01-25 19:54 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git
In-Reply-To: <xmqqefzraqbu.fsf@gitster.mtv.corp.google.com>
On Wed, Jan 25, 2017 at 11:32:05AM -0800, Junio C Hamano wrote:
> Jeff King <peff@peff.net> writes:
>
> > If you have an alternate object store with a very large
> > number of refs, the peak memory usage of the sha1_array can
> > grow high, even if most of them are duplicates that end up
> > not being printed at all.
> > ...
> > Signed-off-by: Jeff King <peff@peff.net>
> > ---
> > builtin/receive-pack.c | 26 ++++++++++++--------------
> > 1 file changed, 12 insertions(+), 14 deletions(-)
>
> Nice.
>
> Incidentally, this also shows that the refnames in alternate ref
> namespace will not matter. We are to only show just one of many
> anyway (and that name is masked and replaced with ".have"). Perhaps
> we want to do 04/12 without refname?
I kept the refnames there as a "plausible" thing that a callback might
want. But I have ulterior motives. :)
I have another series which uses alternate refs as part of
check_connected(). Since that function calls rev-list, it needs to add
something like "rev-list --alternate-refs". Even check_connected() does
not care about the refnames, it seems like that rev-list option should
have some useful name for each ref. It can make things like "--source"
more sensible, instead of just reporting everything as a blank ".have".
I could be persuaded otherwise, though. I don't think we pay a huge for
the refnames here. The generating for-each-ref has them either way, and
the receiving side only keeps one in memory at a time. So we are really
only paying to send them across the pipe and parse them, which doesn't
seem extravagant. I didn't actually measure it, though.
-Peff
^ permalink raw reply
* Re: [PATCH 11/12] receive-pack: treat namespace .have lines like alternates
From: Junio C Hamano @ 2017-01-25 19:51 UTC (permalink / raw)
To: Jeff King; +Cc: git
In-Reply-To: <20170124004805.nu3w47isrb4bxgi5@sigill.intra.peff.net>
Jeff King <peff@peff.net> writes:
> Namely, de-duplicate them. We use the same set as the
> alternates, since we call them both ".have" (i.e., there is
> no value in showing one versus the other).
>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
> builtin/receive-pack.c | 10 +++++++---
> 1 file changed, 7 insertions(+), 3 deletions(-)
>
> diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c
> index 8f8762e4a..c55e2f993 100644
> --- a/builtin/receive-pack.c
> +++ b/builtin/receive-pack.c
> @@ -251,8 +251,9 @@ static void show_ref(const char *path, const unsigned char *sha1)
> }
>
> static int show_ref_cb(const char *path_full, const struct object_id *oid,
> - int flag, void *unused)
> + int flag, void *data)
> {
> + struct oidset *seen = data;
> const char *path = strip_namespace(path_full);
>
> if (ref_is_hidden(path, path_full))
> @@ -263,8 +264,11 @@ static int show_ref_cb(const char *path_full, const struct object_id *oid,
> * refs, so that the client can use them to minimize data
> * transfer but will otherwise ignore them.
> */
> - if (!path)
> + if (!path) {
> + if (oidset_insert(seen, oid))
> + return 0;
> path = ".have";
> + }
This is very sensible as an optimization that does not change
semantics.
This is an unrelated tangent, but there may want to be a knob to
make the code return here without even showing, to make the
advertisement even smaller and also to stop miniscule information
leakage? If the namespaced multiple projects are totally unrelated
(i.e. "My sysadmin gave me a write access only to this single
directory, so I am using the namespace feature to host these three
projects that have nothing to do with each other"), showing objects
of other namespaces will buy us nothing and the user is better off
without this code showing these refs as ".have".
> show_ref(path, oid->hash);
> return 0;
> }
> @@ -287,7 +291,7 @@ static void write_head_info(void)
>
> for_each_alternate_ref(show_one_alternate_ref, &seen);
> oidset_clear(&seen);
> - for_each_ref(show_ref_cb, NULL);
> + for_each_ref(show_ref_cb, &seen);
> if (!sent_capabilities)
> show_ref("capabilities^{}", null_sha1);
^ permalink raw reply
* Re: [PATCH 07/12] fetch-pack: cache results of for_each_alternate_ref
From: Jeff King @ 2017-01-25 19:47 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git
In-Reply-To: <xmqqinp3aqu0.fsf@gitster.mtv.corp.google.com>
On Wed, Jan 25, 2017 at 11:21:11AM -0800, Junio C Hamano wrote:
> > diff --git a/object.h b/object.h
> > index 614a00675..f52957dcb 100644
> > --- a/object.h
> > +++ b/object.h
> > @@ -29,7 +29,7 @@ struct object_array {
> > /*
> > * object flag allocation:
> > * revision.h: 0---------10 26
> > - * fetch-pack.c: 0---4
> > + * fetch-pack.c: 0---5
> > * walker.c: 0-2
> > * upload-pack.c: 4 11----------------19
> > * builtin/blame.c: 12-13
>
> This is a tangent, but I am not sure how much it buys us to keep
> track of the information here in object.h, as all that picture says
> is "revision traversal machinery given by revision.[ch] can never be
> used inside fetch-pack and upload-pack", and that is already said by
> the current picture that says fetch-pack.c uses 0 thru 4, and
> updating it to say that we now use 0 thru 5 would not change the
> conclusion.
I agree that bumping 4 to 5 does not help at all, given the current
allocations. But it could eventually, if another allocation wanted to
pick up 5, and might plausibly be used together with fetch-pack.
The main thing is that we should keep this up to date, and that it
should cause textual conflicts when two topics update the allocation, so
that a human looks at it. I actually think we fail at the latter,
because a change to revision.h to use bit 20 would probably get
auto-resolved against a change to allocate the same bit to blame.c.
Probably a better organization is:
bit 0: revision.h, fetch-pack.c, walker.c
bit 1: revision.h, fetch-pack.c, walker.c
...
bit 10: revision.h
bit 11: upload-pack.c
and so forth. It's more tedious to read and write, but any two changes
to the same bit would be forced to generate a conflict (assuming a
line-oriented merge, of course :) ).
> What I am trying to get at is that we may want to (re)consider how
> we manage these bits. But that is totally outside the scope of this
> series, and updating the above is the right thing to do here.
Perhaps you meant something like the above when you said this. But what
I'd _really_ love is to stop using global bits entirely, and have some
way to allocate a bitset and efficiently convert "struct object" to an
index into the bitset. Then cleaning up just becomes throwing out your
bitset, and by default operations do not see the bits from other
unrelated operations. That makes it impossible to have bugs like the one
fixed by 5c08dc48a (checkout: clear commit marks after detached-orphan
check, 2011-03-20).
I can't think of a way to do that efficiently besides something like the
commit-slab system, but extended to all objects. The lookups there are
fairly cheap, though it does have worse cache performance. I don't know
if that would matter in practice or not.
But yeah, definitely outside the scope of this series. :)
-Peff
^ permalink raw reply
* Re: [PATCH 06/12] clone: disable save_commit_buffer
From: Jeff King @ 2017-01-25 19:35 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git
In-Reply-To: <20170125192740.5lqoc2srqfjiyfwr@sigill.intra.peff.net>
On Wed, Jan 25, 2017 at 02:27:40PM -0500, Jeff King wrote:
> > > For fetching operations like clone, we already disable
> >
> > s/clone/fetch/ you meant?
>
> Well, no, because this patch deals with clone.
>
> It's likely that builtin/fetch.c would want the same treatment. It
> didn't come up for me because I've disabled the alternates check for
> that case (but you can't do that with stock git), and I didn't dig
> further.
>
> Possibly this should just go into fetch-pack.c, right before we walk
> over all of the refs and call parse_object() and deref_tag() on all of
> them. It feels a little funny to tweak the global save_commit_buffer
> flag there, but it already do so in everything_local(), so I don't think
> we're really losing much.
Hrm. So I thought it might be useful to write a patch that just tweaks
save_commit_buffer at the start of fetch_pack(). But looking it over,
we call everything_local() _before_ we walk over all the refs. So
save_commit_buffer should already be turned off for my case.
I wonder if I made a mistake while measuring the peak RSS. Or if clone
parses some commits before it calls into fetch_pack() (but which
objects? It doesn't have any until it does the fetch).
Perhaps we should just drop this patch (though I think it is logically
consistent and wouldn't hurt anything).
-Peff
^ permalink raw reply
* Re: [PATCH 09/12] receive-pack: use oidset to de-duplicate .have lines
From: Junio C Hamano @ 2017-01-25 19:32 UTC (permalink / raw)
To: Jeff King; +Cc: git
In-Reply-To: <20170124004739.5aowmmkesbanyohm@sigill.intra.peff.net>
Jeff King <peff@peff.net> writes:
> If you have an alternate object store with a very large
> number of refs, the peak memory usage of the sha1_array can
> grow high, even if most of them are duplicates that end up
> not being printed at all.
> ...
> Signed-off-by: Jeff King <peff@peff.net>
> ---
> builtin/receive-pack.c | 26 ++++++++++++--------------
> 1 file changed, 12 insertions(+), 14 deletions(-)
Nice.
Incidentally, this also shows that the refnames in alternate ref
namespace will not matter. We are to only show just one of many
anyway (and that name is masked and replaced with ".have"). Perhaps
we want to do 04/12 without refname?
> diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c
> index b9f2c0cc5..27bb52988 100644
> --- a/builtin/receive-pack.c
> +++ b/builtin/receive-pack.c
> @@ -21,6 +21,7 @@
> #include "sigchain.h"
> #include "fsck.h"
> #include "tmp-objdir.h"
> +#include "oidset.h"
>
> static const char * const receive_pack_usage[] = {
> N_("git receive-pack <git-dir>"),
> @@ -271,27 +272,24 @@ static int show_ref_cb(const char *path_full, const struct object_id *oid,
> return 0;
> }
>
> -static int show_one_alternate_sha1(const unsigned char sha1[20], void *unused)
> +static void show_one_alternate_ref(const char *refname,
> + const struct object_id *oid,
> + void *data)
> {
> - show_ref(".have", sha1);
> - return 0;
> -}
> + struct oidset *seen = data;
>
> -static void collect_one_alternate_ref(const char *refname,
> - const struct object_id *oid,
> - void *data)
> -{
> - struct sha1_array *sa = data;
> - sha1_array_append(sa, oid->hash);
> + if (oidset_insert(seen, oid))
> + return;
> +
> + show_ref(".have", oid->hash);
> }
>
> static void write_head_info(void)
> {
> - struct sha1_array sa = SHA1_ARRAY_INIT;
> + static struct oidset seen = OIDSET_INIT;
>
> - for_each_alternate_ref(collect_one_alternate_ref, &sa);
> - sha1_array_for_each_unique(&sa, show_one_alternate_sha1, NULL);
> - sha1_array_clear(&sa);
> + for_each_alternate_ref(show_one_alternate_ref, &seen);
> + oidset_clear(&seen);
> for_each_ref(show_ref_cb, NULL);
> if (!sent_capabilities)
> show_ref("capabilities^{}", null_sha1);
^ permalink raw reply
* Re: [PATCH 06/12] clone: disable save_commit_buffer
From: Jeff King @ 2017-01-25 19:27 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git
In-Reply-To: <xmqqmvefaray.fsf@gitster.mtv.corp.google.com>
On Wed, Jan 25, 2017 at 11:11:01AM -0800, Junio C Hamano wrote:
> Jeff King <peff@peff.net> writes:
>
> > Normally git caches the raw commit object contents in
> > "struct commit". This makes it fast to run parse_commit()
> > followed by a pretty-print operation.
> >
> > For commands which don't actually pretty-print the commits,
> > the caching is wasteful (and may use quite a lot of memory
> > if git accesses a large number of commits).
> >
> > For fetching operations like clone, we already disable
>
> s/clone/fetch/ you meant?
Well, no, because this patch deals with clone.
It's likely that builtin/fetch.c would want the same treatment. It
didn't come up for me because I've disabled the alternates check for
that case (but you can't do that with stock git), and I didn't dig
further.
Possibly this should just go into fetch-pack.c, right before we walk
over all of the refs and call parse_object() and deref_tag() on all of
them. It feels a little funny to tweak the global save_commit_buffer
flag there, but it already do so in everything_local(), so I don't think
we're really losing much.
> > In one real-world case with a large number of tags, this
> > cut about 10MB off of clone's heap usage. Not spectacular,
> > but there's really no downside.
>
> "There is no downside" is especially true in the modern world post
> v2.1, where get_commit_buffer() is what everybody has to go through
> to access this information. I would have been very hesitant to
> accept a patch like this one if we didn't do that clean-up, as a
> stray codepath could have just done "commit->buffer" and segfaulted
> or said "ah, there is no message", neither of which is satisfactory.
Yep, I had a similar thought while writing it. I would have been very
hesitant to propose it back then, too. :)
-Peff
^ permalink raw reply
* Re: [PATCH 07/12] fetch-pack: cache results of for_each_alternate_ref
From: Junio C Hamano @ 2017-01-25 19:21 UTC (permalink / raw)
To: Jeff King; +Cc: git
In-Reply-To: <20170124004559.vlsrwwphuzdsfqoq@sigill.intra.peff.net>
Jeff King <peff@peff.net> writes:
> +struct alternate_object_cache {
> + struct object **items;
> + size_t nr, alloc;
> +};
> +
> +static void cache_one_alternate(const char *refname,
> + const struct object_id *oid,
> + void *vcache)
> +{
> + struct alternate_object_cache *cache = vcache;
> + struct object *obj = parse_object(oid->hash);
> +
> + if (!obj || (obj->flags & ALTERNATE))
> + return;
> +
> + obj->flags |= ALTERNATE;
> + ALLOC_GROW(cache->items, cache->nr + 1, cache->alloc);
> + cache->items[cache->nr++] = obj;
> +}
Nice. I love simplicity.
> diff --git a/object.h b/object.h
> index 614a00675..f52957dcb 100644
> --- a/object.h
> +++ b/object.h
> @@ -29,7 +29,7 @@ struct object_array {
> /*
> * object flag allocation:
> * revision.h: 0---------10 26
> - * fetch-pack.c: 0---4
> + * fetch-pack.c: 0---5
> * walker.c: 0-2
> * upload-pack.c: 4 11----------------19
> * builtin/blame.c: 12-13
This is a tangent, but I am not sure how much it buys us to keep
track of the information here in object.h, as all that picture says
is "revision traversal machinery given by revision.[ch] can never be
used inside fetch-pack and upload-pack", and that is already said by
the current picture that says fetch-pack.c uses 0 thru 4, and
updating it to say that we now use 0 thru 5 would not change the
conclusion.
What I am trying to get at is that we may want to (re)consider how
we manage these bits. But that is totally outside the scope of this
series, and updating the above is the right thing to do here.
Thanks.
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox