* Re: [PATCH v4 05/15] bugreport: add uname info
From: Aaron Schrab @ 2020-01-10 2:05 UTC (permalink / raw)
To: Emily Shaffer; +Cc: git
In-Reply-To: <20191213004312.169753-6-emilyshaffer@google.com>
At 16:43 -0800 12 Dec 2019, Emily Shaffer <emilyshaffer@google.com> wrote:
>The contents of uname() can give us some insight into what sort of
>system the user is running on, and help us replicate their setup if need
>be. The domainname field is not guaranteed to be available, so don't
>collect it.
The manpage on Linux says that it's a GNU extension; on Mac OS it isn't
mentioned. So it would be more accurate to say that it's known not to be
available on many systems rather than just not being guaranteed.
Besides that I think in some cases it may be considered sensitive
information, so another reason to not include it. Perhaps even worthy of
being mentioned as the primary reason.
>+ uname_info.nodename,
I think in some cases this could also be considered as sensitive
information, and is unlikely to help in diagnosing problems. So I'd
move to exclude this as well.
^ permalink raw reply
* [PATCH] unpack-trees: correctly compute result count
From: Derrick Stolee via GitGitGadget @ 2020-01-10 1:59 UTC (permalink / raw)
To: git; +Cc: peff, johannes.schindelin, Derrick Stolee
The clear_ce_flags_dir() method processes the cache entries within
a common directory. The returned int is the number of cache entries
processed by that directory. When using the sparse-checkout feature
in cone mode, we can skip the pattern matching for entries in the
directories that are entirely included or entirely excluded.
eb42feca (unpack-trees: hash less in cone mode, 2019-11-21)
introduced this performance feature. The old mechanism relied on
the counts returned by calling clear_ce_flags_1(), but the new
mechanism calculated the number of rows by subtracting "cache_end"
from "cache" to find the size of the range. However, the equation
is wrong because it divides by sizeof(struct cache_entry *). This
is not how pointer arithmetic works!
A coverity build of Git for Windows in preparation for the 2.25.0
release found this issue with the warning, "Pointer differences,
such as cache_end - cache, are automatically scaled down by the
size (8 bytes) of the pointed-to type (struct cache_entry *).
Most likely, the division by sizeof(struct cache_entry *) is
extraneous and should be eliminated." This warning is correct.
This leaves us with the question "how did this even work?" The
problem that occurs with this incorrect pointer arithmetic is
a performance-only bug, and a very slight one at that. Since
the entry count returned by clear_ce_flags_dir() is reduced by
a factor of 8, the loop in clear_ce_flags_1() will re-process
entries from those directories.
By inserting global counters into unpack-tree.c and tracing
them with trace2_data_intmax() (in a private change, for
testing), I was able to see count how many times the loop inside
clear_ce_flags_1() processed an entry and how many times
clear_ce_flags_dir() was called. Each of these are reduced by at
least a factor of 8 with the current change. A factor larger
than 8 happens when multiple levels of directories are repeated.
Specifically, in the Linux kernel repo, the command
git sparse-checkout set LICENSES
restricts the working directory to only the files at root and
in the LICENSES directory. Here are the measured counts:
clear_ce_flags_1 loop blocks:
Before: 11,520
After: 1,621
clear_ce_flags_dir calls:
Before: 7,048
After: 606
While these are dramatic counts, the time spent in
clear_ce_flags_1() is under one millisecond in each case, so
the improvement is not measurable as an end-to-end time.
Reported-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
unpack-trees: correctly compute result count
Here is a very small fix to the cone-mode pattern-matching in
unpack-trees.c. Johannes found this while running a Coverity scan for
other issues. He alerted me to the problem and I could immediately see
my error here. In creating this patch, most of my time was spent asking
"how did this work before?" and "why didn't this hurt performance?"
Hopefully my commit message explains this thoroughly enough.
As for making it into the release, I don't know. The change is small, it
has a very limited scope, but this flaw is also not really hurting
anything in a major way.
Thanks, -Stolee
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-520%2Fderrickstolee%2Funpack-trees-division-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-520/derrickstolee/unpack-trees-division-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/520
unpack-trees.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/unpack-trees.c b/unpack-trees.c
index 2399b6818b..3dba7f9f09 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -1305,14 +1305,14 @@ static int clear_ce_flags_dir(struct index_state *istate,
if (pl->use_cone_patterns && orig_ret == MATCHED_RECURSIVE) {
struct cache_entry **ce = cache;
- rc = (cache_end - cache) / sizeof(struct cache_entry *);
+ rc = cache_end - cache;
while (ce < cache_end) {
(*ce)->ce_flags &= ~clear_mask;
ce++;
}
} else if (pl->use_cone_patterns && orig_ret == NOT_MATCHED) {
- rc = (cache_end - cache) / sizeof(struct cache_entry *);
+ rc = cache_end - cache;
} else {
rc = clear_ce_flags_1(istate, cache, cache_end - cache,
prefix,
base-commit: 7a6a90c6ec48fc78c83d7090d6c1b95d8f3739c0
--
gitgitgadget
^ permalink raw reply related
* Re: Fwd: Add support for axiterm colors in parse_color
From: Eyal Soha @ 2020-01-10 0:20 UTC (permalink / raw)
To: Jeff King; +Cc: git
In-Reply-To: <20200108095229.GC87523@coredump.intra.peff.net>
> Depending on your terminal, you can already access these colors via
> 256-color mode. Generally 0-7 are the standard colors there, then 8-15
> are the "bright" variants, and then an rgb cube.
Yes. According to the wikipedia page
https://en.wikipedia.org/wiki/ANSI_escape_code#Colors :
0- 7: standard colors (as in ESC [ 30–37 m)
8- 15: high intensity colors (as in ESC [ 90–97 m)
16-231: 6 × 6 × 6 cube (216 colors): 16 + 36 × r + 6 × g + b (0 ≤
r, g, b ≤ 5)
232-255: grayscale from black to white in 24 steps
So 8bit 0-7 and 8bit 8-15 match the ANSI colors 30-37 and 90-97. The
background color versions match 40-47 and 100-107.
On my desktop, the RGB and ANSI colors match in color as described above.
> That said, I'm not entirely opposed to having more human-readable
> aliases. I'm not sure if it's worth using the 16-color version (e.g.,
> "ESC[91m" versus the 256-color version "ESC[38;5;9m"). It's possible it
> would be compatible with more terminals, and it is slightly fewer bytes.
My motivation for this patch was to fix the github workflow log output
that doesn't support 8bit colors properly. Only the "ANSI" colors
0-15 worked. None of the 8-bit colors worked except for 30-37, 40-47,
90-97, and 100-107, which matched the ANSI colors. That is a very
broken color scheme! But that's how it displayed.
> What's AXITERM? I couldn't find it mentioned anywhere around ANSI codes.
Oops, I misread it. The wikipedia page mentions it, it's "aixterm".
> I kind of wonder if we could just call these ANSI as well, and have
> color_output() just decide whether to add an offset of 60 when we see a
> color number between 8 and 15. Or possibly c->value should just have the
> 60-offset value in the first place.
Done.
> This "type" field was already pretty ugly, and I think your patch makes
> it even worse. It would probably be less bad if we just passed in the
> integer 30 instead of '3', and then do:
Done.
> Bonus points for declaring:
>
> enum {
> COLOR_FOREGROUND = 30,
> COLOR_BACKGROUND = 40,
> } color_ground;
Done in a slightly different way.
> This BUG() can go away. The point was to make sure we don't overflow
> "out", but xsnprintf() will check that for us (that's why there isn't
> one in the COLOR_256 and COLOR_RGB case arms).
Removed.
> And if we can handle the regular/bright stuff instead of color_output(),
> we don't need to have this extra fg.type check.
Done. Here's a new patch!
diff --git a/color.c b/color.c
index ebb222ec33..efbe9a1858 100644
--- a/color.c
+++ b/color.c
@@ -24,6 +24,14 @@ const char *column_colors_ansi[] = {
GIT_COLOR_RESET,
};
+enum {
+ COLOR_BACKGROUND_OFFSET = 10,
+ COLOR_FOREGROUND_ANSI = 30,
+ COLOR_FOREGROUND_RGB = 38,
+ COLOR_FOREGROUND_256 = 38,
+ COLOR_FOREGROUND_BRIGHT_ANSI = 90,
+};
+
/* Ignore the RESET at the end when giving the size */
const int column_colors_ansi_max = ARRAY_SIZE(column_colors_ansi) - 1;
@@ -92,7 +100,13 @@ static int parse_color(struct color *out, const
char *name, int len)
for (i = 0; i < ARRAY_SIZE(color_names); i++) {
if (match_word(name, len, color_names[i])) {
out->type = COLOR_ANSI;
- out->value = i;
+ out->value = i + COLOR_FOREGROUND_ANSI;
+ return 0;
+ }
+ if (*name == '+' &&
+ match_word(name+1, len-1, color_names[i])) {
+ out->type = COLOR_ANSI;
+ out->value = i + COLOR_FOREGROUND_BRIGHT_ANSI;
return 0;
}
}
@@ -109,10 +123,15 @@ static int parse_color(struct color *out, const
char *name, int len)
else if (val < 0) {
out->type = COLOR_NORMAL;
return 0;
- /* Rewrite low numbers as more-portable standard colors. */
+ /* Rewrite 0-7 as more-portable standard colors. */
} else if (val < 8) {
out->type = COLOR_ANSI;
- out->value = val;
+ out->value = val + COLOR_FOREGROUND_ANSI;
+ return 0;
+ /* Rewrite 8-15 as more-portable aixterm colors. */
+ } else if (val < 16) {
+ out->type = COLOR_ANSI;
+ out->value = val - 8 + COLOR_FOREGROUND_BRIGHT_ANSI;
return 0;
} else if (val < 256) {
out->type = COLOR_256;
@@ -166,24 +185,24 @@ int color_parse(const char *value, char *dst)
* already have the ANSI escape code in it. "out" should have enough
* space in it to fit any color.
*/
-static char *color_output(char *out, int len, const struct color *c, char type)
+static char *color_output(char *out, int len, const struct color *c,
+ int offset)
{
switch (c->type) {
case COLOR_UNSPECIFIED:
case COLOR_NORMAL:
break;
case COLOR_ANSI:
- if (len < 2)
- BUG("color parsing ran out of space");
- *out++ = type;
- *out++ = '0' + c->value;
+ out += xsnprintf(out, len, "%d", c->value + offset);
break;
case COLOR_256:
- out += xsnprintf(out, len, "%c8;5;%d", type, c->value);
+ out += xsnprintf(out, len, "%d;5;%d", COLOR_FOREGROUND_256 + offset,
+ c->value);
break;
case COLOR_RGB:
- out += xsnprintf(out, len, "%c8;2;%d;%d;%d", type,
- c->red, c->green, c->blue);
+ out += xsnprintf(out, len, "%d;2;%d;%d;%d",
+ COLOR_FOREGROUND_RGB + offset,
+ c->red, c->green, c->blue);
break;
}
return out;
@@ -279,14 +298,12 @@ int color_parse_mem(const char *value, int
value_len, char *dst)
if (!color_empty(&fg)) {
if (sep++)
OUT(';');
- /* foreground colors are all in the 3x range */
- dst = color_output(dst, end - dst, &fg, '3');
+ dst = color_output(dst, end - dst, &fg, 0);
}
if (!color_empty(&bg)) {
if (sep++)
OUT(';');
- /* background colors are all in the 4x range */
- dst = color_output(dst, end - dst, &bg, '4');
+ dst = color_output(dst, end - dst, &bg,
COLOR_BACKGROUND_OFFSET);
}
OUT('m');
}
diff --git a/t/t4026-color.sh b/t/t4026-color.sh
index 671e951ee5..e3f11a94f9 100755
--- a/t/t4026-color.sh
+++ b/t/t4026-color.sh
@@ -30,6 +30,14 @@ test_expect_success 'attribute before color name' '
color "bold red" "[1;31m"
'
+test_expect_success 'axiterm bright fg color' '
+ color "+red" "[91m"
+'
+
+test_expect_success 'axiterm bright bg color' '
+ color "green +blue" "[32;104m"
+'
+
test_expect_success 'color name before attribute' '
color "red bold" "[1;31m"
'
@@ -74,6 +82,10 @@ test_expect_success '0-7 are aliases for basic ANSI
color names' '
color "0 7" "[30;47m"
'
+test_expect_success '8-15 are aliases for aixterm color names' '
+ color "12 13" "[94;105m"
+'
+
test_expect_success '256 colors' '
color "254 bold 255" "[1;38;5;254;48;5;255m"
'
^ permalink raw reply related
* Re: Unreliable 'git rebase --onto'
From: Eugeniu Rosca @ 2020-01-10 0:06 UTC (permalink / raw)
To: Elijah Newren
Cc: Eugeniu Rosca, SZEDER Gábor, Junio C Hamano, Jeff King,
Ævar Arnfjörð, Git Mailing List, Eugeniu Rosca
In-Reply-To: <CABPp-BFiDNb18m8geTCxKLXg0fOd0DS1dWRVWCfnTG0suwGRHg@mail.gmail.com>
Hi Elijah,
On Thu, Jan 09, 2020 at 10:05:52AM -0800, Elijah Newren wrote:
> On Thu, Jan 9, 2020 at 2:53 AM Eugeniu Rosca <erosca@de.adit-jv.com> wrote:
> > Some years ago I was hit by 'git merge' producing slightly different
> > results compared to 'git rebase --onto' and 'git cherry-pick A..B'
> > (maybe I can come up with a reproduction scenario for that too).
>
> If you can, I'd be interested to see it and take a look. I'd normally
> assume it was just some case where A..B included "evil" merge commits
> (merge commits that made additional changes not part of the actual
> merging) since rebasing or cherry-picking such a range would exclude
> the merge commits and thus drop those changes -- but you identified a
> real bug with the default rebase backend so I'm interested to see if
> you happen to have more bugs I should know about.
Here is a _simplified_ scenario to get a totally unexpected result from
'git merge' (initially reproduced years ago, but still happening on
2.25.0.rc2):
## Preparation
0. git --version
git version 2.25.0.rc2
1. git clone https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
2. git remote add linux-stable https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git
3. git fetch linux-stable
# Reproduction
4. git checkout f7a8e38f07a1
5. git merge --no-edit e18da11fc0f959
## Merge v4.4.3 commit
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=e18da11fc0f959
which is a linux-stable backport of vanilla v4.5-rc1 commit
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=f7a8e38f07a1
the latter being checked out at step 4.
6. git show HEAD
## Inspect the _automatic_ conflict resolution performed by git in
drivers/mtd/nand/nand_base.c. Git decided to integrate e18da11fc0f959
alongside f7a8e38f07a1, while essentially they are the same commit.
We end up with two times commit f7a8e38f07a1.
What do you think about that?
> Unfortunately, you should note that git-2.25 is going to have the same
> bug you reported; there are still some loose ends with my series to
> make -m the default, and the 2.25 release is expected within a few
> days, so my change of default won't happen until 2.26. (That series
> would have needed to be completed several weeks ago for it to go into
> 2.25).
Thanks for this piece of information and for the time/effort spent!
--
Best Regards,
Eugeniu
^ permalink raw reply
* Re: [PATCH v2 4/4] config: add '--show-scope' to print the scope of a config value
From: Matt Rogers @ 2020-01-09 23:47 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Matthew Rogers via GitGitGadget, git
In-Reply-To: <xmqqd0bsz9xm.fsf@gitster-ct.c.googlers.com>
> > + switch (scope) {
> > + case CONFIG_SCOPE_LOCAL:
> > + return "local";
> > + case CONFIG_SCOPE_GLOBAL:
> > + return "global";
> > + case CONFIG_SCOPE_SYSTEM:
> > + return "system";
> > + case CONFIG_SCOPE_WORKTREE:
> > + return "worktree";
> > + case CONFIG_SCOPE_COMMAND:
> > + return "command";
> > + case CONFIG_SCOPE_SUBMODULE:
> > + return "submodule";
> > + default:
> > + return "unknown";
>
> Makes me wonder if this wants to be done via a table, which would
> later allow a parser to map in the other direction, taking a string
> and returning the corresponding enum, more easily (imagine a yet to
> be invented feature to "list config settings for only this scope").
I was thinking the same but then I realized such an option already exists,
--local and company.
>
> > + }
> > +}
> > +
> > +static void show_config_scope(struct strbuf *buf) {
> > + const char term = end_nul ? '\0' : '\t';
> > + const char *scope = scope_to_string(current_config_scope());
> > +
> > + strbuf_addstr(buf, N_(scope));
> > + strbuf_addch(buf, term);
> > +}
> > +
> > static int show_all_config(const char *key_, const char *value_, void *cb)
> > {
> > - if (show_origin) {
> > + if (show_origin || show_scope) {
> > struct strbuf buf = STRBUF_INIT;
> > - show_config_origin(&buf);
> > + if (show_scope)
> > + show_config_scope(&buf);
> > + if (show_origin)
> > + show_config_origin(&buf);
> > /* Use fwrite as "buf" can contain \0's if "end_null" is set. */
> > fwrite(buf.buf, 1, buf.len, stdout);
> > strbuf_release(&buf);
> > @@ -213,6 +245,8 @@ struct strbuf_list {
> >
> > static int format_config(struct strbuf *buf, const char *key_, const char *value_)
> > {
> > + if (show_scope)
> > + show_config_origin(buf);
> > if (show_origin)
> > show_config_origin(buf);
>
> Shouldn't one of these show scope instead of origin? I wonder how
> the tests I see later in the patch could have passed with this.
>
Oof, that was a blunder... I think this passes the tests because none of them
actually hits `format_config()` only `show_all_config()` so I'll definitely add
a test case for this. (and fix it while I'm at it)
> > diff --git a/t/helper/test-config.c b/t/helper/test-config.c
> > index 78bbb9eb98..aae2c6fc9e 100644
> > --- a/t/helper/test-config.c
> > +++ b/t/helper/test-config.c
> > @@ -48,8 +48,10 @@ static const char *scope_name(enum config_scope scope)
> > return "repo";
> > case CONFIG_SCOPE_WORKTREE:
> > return "worktree";
> > + case CONFIG_SCOPE_SUBMODULE:
> > + return "submodule";
> > case CONFIG_SCOPE_COMMAND:
> > - return "command";
> > + return "cmdline";
>
> Oh???
>
I definitely got a little too excited to get this patch in, so I
definitely apologize for that,
but thanks for the review.
--
Matthew Rogers
^ permalink raw reply
* Re: [PATCH v2 3/4] config: clarify meaning of command line scoping
From: Matt Rogers @ 2020-01-09 23:41 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Matthew Rogers via GitGitGadget, git
In-Reply-To: <xmqqh814zbm0.fsf@gitster-ct.c.googlers.com>
On Thu, Jan 9, 2020 at 2:14 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> "Matthew Rogers via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
> > From: Matthew Rogers <mattr94@gmail.com>
> >
> > CONFIG_SCOPE_CMDLINE is generally used in the code to refer to config
> > values passed in via the -c option. This is a little bit too specific
> > as there are other methods to pass config values so that the last for a
> > single command (namely --file and --blob).
>
> Sorry, but I cannot parse this, especially around "so that the last
> for ..." part.
>
My bad, I guess I must have not read as carefully as I thought I did.
It should read:
"... there are other methods to pass config values so that _they_ last
for a single command ..."
> > As the "visibility" of config
> > values passed by these situations is common, we unify them as having a
> > scope of "command" rather than "command line".
>
> Is the "unification" something done by this patch? It does not
> appear to be so. The existing code already was using CMDLINE to
> call "git -c VAR=VAL" and also something else (which is not clear to
> me, unfortunately, probably because I failed to parse the above
> X-<), and this step renames CMDLINE to COMMAND perhaps because
> CMDLINE has too strong connotation with the "-c" thing and much less
> with the other thing (which is not clear to me, unfortunately) and
> you found that renaming it to COMMAND would cover both cases better?
Essentially, the "unification" was referring to more the unification of all the
things that affect configuration only for the duration of a specific command.
The gist of this patch was to say "There are a few ways besides -c to pass
a configuration option that lasts for a single command, so it makes sense
to broaden that scope.". The change is definitely justified in that COMMAND
communicates that much clearer than CMDLINE as REPO, GLOBAL, SYSTEM
all describe the thing that can see the configuration options, and
it's specifically the
command the can see the -c options and not the command line as a whole.
>
> I also do not get what you meant by "visibility", but it probably is
> primarily because it is not clear what "the other thing" is.
I was using visibility as another way to conceptualize scoping. A
scope more or less determines who can "see" a thing, so maybe that
language was a little bit too much in my head.
The issue is that currently the code doesn't care about any of that (as only -c
options actually have COMMAND scoping), so maybe I should roll it into the
next patch of the series? As that introduces code that actually cares about the
difference?
^ permalink raw reply
* Re: [PATCH v2 2/4] config: fix config scope enum
From: Matt Rogers @ 2020-01-09 23:29 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Matthew Rogers via GitGitGadget, git
In-Reply-To: <xmqqpnfszbyh.fsf@gitster-ct.c.googlers.com>
On Thu, Jan 9, 2020 at 2:06 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> "Matthew Rogers via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
> > From: Matthew Rogers <mattr94@gmail.com>
> >
> > Previously when iterating through git config variables, worktree config
> > and local config were both considered "CONFIG_SCOPE_REPO". This was
> > never a problem before as no one had needed to differentiate between the
> > two cases.
>
> Hmph, then "fix" on the title is a bit misleading, no?
>
> The enum may not have been as fine grained as you would have liked,
> but if there was nothing broken, then we are doing this not to "fix"
> anything.
>
I see where you're coming from, but I would definitely consider this a "fix"
in that it's something that (as explained in the deleted comment) should
have been this way from the start but was unnecessary as no one had a
need for it yet. But I definitely wouldn't be against changing the phrasing
to something like "clean up" or whatever your preferred wording would be.
> A more important thing to explain would probably be why we
> (i.e. those who propose this change, those who give favoriable
> reviews to it, and those who accept it change to the system) would
> want to see finer-grained classification. What do we expect to be
> able to do with the resulting finer-grained set that we wouldn't be
> able to with the original, and why is it a good thing? That is what
> readers of the proposed log message of this change would want to
> see, I would think.
>
This is really more prep for patch 4 later on in this series, as a user who
ran the proposed '--show-scope' later on in the series would care what
was "worktree" vs "local".
Regardless, I think the two options I have would be to amend the commit
message with that extra information or roll it together with patch 4
^ permalink raw reply
* Re: [PATCH v2 1/4] config: fix typo in variable name
From: Matt Rogers @ 2020-01-09 23:22 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Matthew Rogers via GitGitGadget, git
In-Reply-To: <xmqqlfqgzbwp.fsf@gitster-ct.c.googlers.com>
>
> Malapropism is a new word in "git log" output in our history ;-)
>
Yeah, I remember my coworker saying that to me the day I wrote the
message and it was just stuck in my head
^ permalink raw reply
* Re: [RFC PATCH] unpack-trees: watch for out-of-range index position
From: Emily Shaffer @ 2020-01-09 22:46 UTC (permalink / raw)
To: Jeff King; +Cc: Junio C Hamano, git
In-Reply-To: <20200109075250.GA3978837@coredump.intra.peff.net>
On Thu, Jan 09, 2020 at 02:52:50AM -0500, Jeff King wrote:
> On Wed, Jan 08, 2020 at 12:35:29PM -0800, Junio C Hamano wrote:
>
> > >> It does not sound like a BUG to me, either, but the new condition
> > >> does look correct to me, too. We can turn it into die() later if
> > >> somebody truly cares ;-)
> > >>
> > >> Thanks, both. Will queue.
> > >
> > > Thanks much for the quick turnaround. If I hear more noise I'll give it
> > > a try with die() or error code instead, but for now I'll move on to the
> > > next bug on my list. :)
> >
> > By the way, it is somewhat sad that we proceeded that far in the
> > first place---such a corrupt on-disk index would have caused an
> > early die() if we did not get rid of the trailing-hash integrity
> > check.
>
> Perhaps. The integrity check only protects against an index that was
> modified after the fact, not one that was generated by a buggy Git. I'm
> not sure we know how the index that led to this patch got into this
> state (though it sounds like Emily has a copy and could check the hash
> on it), but other cache-tree segfault I found recently was with an index
> with an intact integrity hash.
Yeah, I can do that, although I'm not sure how. The index itself is very
small - it only contains one file and one tree extension - so I'll go
ahead and paste some poking and prodding, and if it's not what you
wanted then please let me know what else to run.
$ g fsck --cache
Checking object directories: 100% (256/256), done.
Checking objects: 100% (20/20), done.
broken link from commit 153a9a100eae7fdba5989ce39a5dd1782075517f
to commit cca7ecaa5d8c398f41bfec7938cc6a526803579b
broken link from commit 7d6bb91e31d18eadfaf855a9fb7ad6ba81b8b6d9
to commit 03087a617bfe55f862cb1ef43273a2bd08e8b6d6
missing commit 03087a617bfe55f862cb1ef43273a2bd08e8b6d6
missing commit cca7ecaa5d8c398f41bfec7938cc6a526803579b
dangling commit 5e2c635433bc46b13061b276e481f63b1f6642c8
$ hexdump -C .git/index
00000000 44 49 52 43 00 00 00 02 00 00 00 01 5d 89 5e 22 |DIRC........].^"|
00000010 23 bf a3 c4 5d 89 5e 22 23 bf a3 c4 00 00 fe 02 |#...].^"#.......|
00000020 02 c8 f5 83 00 00 81 a4 00 06 c1 dc 00 01 5f 53 |.............._S|
00000030 00 00 06 b3 78 88 a4 f4 22 34 7d ad b0 c4 73 0f |....x..."4}...s.|
00000040 c5 bc f6 ea 1d 2d f0 3a 00 09 52 45 41 44 4d 45 |.....-.:..README|
00000050 2e 6d 64 00 54 52 45 45 00 00 00 3a 00 31 37 20 |.md.TREE...:.17 |
00000060 31 0a da 7f 67 25 40 7d 4e ce 9f d3 72 ce 4c e8 |1...g%@}N...r.L.|
00000070 40 6d 5d ad e9 79 67 69 74 6c 69 6e 74 00 34 20 |@m]..ygitlint.4 |
00000080 30 0a 93 63 25 17 69 e6 d6 92 78 97 55 4b 0f 8b |0..c%.i...x.UK..|
00000090 ff a0 e8 2d 6d 71 32 d1 69 fc f2 38 42 f8 5a 6e |...-mq2.i..8B.Zn|
000000a0 05 35 d6 94 41 c0 9f c7 ba 43 |.5..A....C|
000000aa
$ find .git/objects -type f
.git/objects/pack/pack-5e5d5e7c3cbd60a99b4c0295a2935885fbb235a1.idx
.git/objects/pack/pack-5e5d5e7c3cbd60a99b4c0295a2935885fbb235a1.pack
.git/objects/15/3a9a100eae7fdba5989ce39a5dd1782075517f
.git/objects/5e/2c635433bc46b13061b276e481f63b1f6642c8
(By the way, cat-file barfs on both those loose objects, for the reason fsck
reveals.)
I hope that's helpful.
- Emily
^ permalink raw reply
* Re: [git-for-windows] Git for Windows v2.25.0-rc2, was Re: [ANNOUNCE] Git v2.25.0-rc2
From: Bryan Turner @ 2020-01-09 22:38 UTC (permalink / raw)
To: Johannes Schindelin
Cc: git-for-windows, Git Users, git-packagers, Junio C Hamano
In-Reply-To: <nycvar.QRO.7.76.6.2001092244250.46@tvgsbejvaqbjf.bet>
On Thu, Jan 9, 2020 at 1:47 PM Johannes Schindelin
<Johannes.Schindelin@gmx.de> wrote:
>
> Team,
>
> On Wed, 8 Jan 2020, Junio C Hamano wrote:
>
> > A release candidate Git v2.25.0-rc2 is now available for testing
> > at the usual places. It is comprised of 517 non-merge commits
> > since v2.24.1, contributed by 70 people, 29 of which are new faces.
> >
> > The tarballs are found at:
> >
> > https://www.kernel.org/pub/software/scm/git/testing/
>
> The corresponding Git for Windows v2.25.0-rc2 can be found here:
>
> https://github.com/git-for-windows/git/releases/tag/v2.25.0-rc2.windows.1
>
> In addition to rebasing Git for Windows' patches from v2.25.0-rc1 onto
> v2.25.0-rc2, this also includes the "mingw: safeguard better against
> backslashes in file names" fix that I sent upstream in
> https://lore.kernel.org/git/pull.690.git.git.1578576634678.gitgitgadget@gmail.com/
I just wanted to say thank you for the work you've done adding, and
then refining, these backslash checks! The `core.protectNTFS` changes
in 2.24.1 caused some pretty significant fallout with our testing
repositories, multiple of which have invalid trees due to backslashes.
Our Windows tests don't use any of those trees, though, so the checks
were frustratingly rigid (if understandable, I should add).
One further thing we found while trying to make adjustments to our
test suite to account for the 2.24.1 changes was that, for some
actions, we couldn't find a way to make `core.protectNTFS=false`
actually change behavior. For example, using something like `git -c
core.protectNTFS=false fetch /path/on/disk` still failed due to
invalid trees (as has been mentioned on other threads, I believe).
Anyway, thanks again for these early release candidates, and for your
efforts to polish the backslash checks to retain the protections but
loosen the restrictions.
Best regards,
Bryan Turner
>
> Ciao,
> Johannes
>
> --
> You received this message because you are subscribed to the Google Groups "git-for-windows" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to git-for-windows+unsubscribe@googlegroups.com.
> To view this discussion on the web visit https://groups.google.com/d/msgid/git-for-windows/nycvar.QRO.7.76.6.2001092244250.46%40tvgsbejvaqbjf.bet.
^ permalink raw reply
* Re: [PATCH v2 05/11] object-store: allow threaded access to object reading
From: Matheus Tavares Bernardino @ 2020-01-09 22:02 UTC (permalink / raw)
To: Jeff King
Cc: Jonathan Tan, git, Christian Couder,
Оля Тележная,
Nguyễn Thái Ngọc Duy, Junio C Hamano,
Jonathan Nieder, Stefan Beller
In-Reply-To: <CAHd-oW5qT5LmUd6GTL=O+-yXPmq5Uy9gk3ohL_2r+_K+6UJS3Q@mail.gmail.com>
Hi, folks
On Thu, Dec 19, 2019 at 5:27 PM Matheus Tavares Bernardino
<matheus.bernardino@usp.br> wrote:
>
[...]
> However, re-inspecting the code, it seemed to me that we might already
> have a thread-safe mechanism. The window disposal operations (at
> close_pack_windows() and unuse_one_window()) are only performed if
> window.inuse_cnt == 0. So as a thread which reads from the window will
> also previously increment its inuse_cnt, wouldn't the reading
> operation be already protected?
>
> Another concern would be close_pack_fd(), which can close packs even
> with in-use windows. However, as the mmap docs[1] says: "closing the
> file descriptor does not unmap the region".
>
> Finally, we also considered reprepare_packed_git() as a possible
> conflicting operation. But the function called by it to handle
> packfile opening, prepare_pack(), won't reopen already available
> packs. Therefore, IIUC, it will leave the opened windows intact.
>
> So, aren't perhaps the window readings performed outside the
> obj_read_mutex critical section already thread-safe?
Any thoughts on this?
> Thanks,
> Matheus
>
> [1]: https://linux.die.net/man/2/mmap
^ permalink raw reply
* Re: [PATCH] mingw: safeguard better against backslashes in file names
From: Johannes Schindelin @ 2020-01-09 21:53 UTC (permalink / raw)
To: Johannes Schindelin via GitGitGadget; +Cc: git
In-Reply-To: <pull.690.git.git.1578576634678.gitgitgadget@gmail.com>
Hi,
I forgot to say that I already included this patch into Git for Windows
v2.25.0-rc2.
Thanks,
Johannes
On Thu, 9 Jan 2020, Johannes Schindelin via GitGitGadget wrote:
> In 224c7d70fa1 (mingw: only test index entries for backslashes, not tree
> entries, 2019-12-31), we relaxed the check for backslashes in tree
> entries to check only index entries.
>
> However, the code change was incorrect: it was added to
> `add_index_entry_with_check()`, not to `add_index_entry()`, so under
> certain circumstances it was possible to side-step the protection.
>
> Besides, the description of that commit purported that all index entries
> would be checked when in fact they were only checked when being added to
> the index (there are code paths that do not do that, constructing
> "transient" index entries).
>
> In any case, it was pointed out in one insightful review at
> https://github.com/git-for-windows/git/pull/2437#issuecomment-566771835
> that it would be a much better idea to teach `verify_path()` to perform
> the check for a backslash. This is safer, even if it comes with two
> notable drawbacks:
>
> - `verify_path()` cannot say _what_ is wrong with the path, therefore
> the user will no longer be told that there was a backslash in the
> path, only that the path was invalid.
>
> - The `git apply` command also calls the `verify_path()` function, and
> might have been able to handle Windows-style paths (i.e. with
> backslashes instead of forward slashes). This will no longer be
> possible unless the user (temporarily) sets `core.protectNTFS=false`.
>
> Note that `git add <windows-path>` will _still_ work because
> `normalize_path_copy_len()` will convert the backslashes to forward
> slashes before hitting the code path that creates an index entry.
>
> The clear advantage is that `verify_path()`'s purpose is to check the
> validity of the file name, therefore we naturally tap into all the code
> paths that need safeguarding, also implicitly into future code paths.
>
> The benefits of that approach outweigh the downsides, so let's move the
> check from `add_index_entry_with_check()` to `verify_path()`.
>
> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> ---
> mingw: safeguard better against backslashes in file names
>
> I investigated again, and I think that there are code paths involving
> make_transient_cache_entry() that might be vulnerable again after my
> recent change in 224c7d70fa1 (mingw: only test index entries for
> backslashes, not tree entries, 2019-12-31).
>
> This version should help with keeping Git for Windows' users safe.
>
> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-690%2Fdscho%2Fonly-error-on-backslash-in-index-v1
> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-690/dscho/only-error-on-backslash-in-index-v1
> Pull-Request: https://github.com/git/git/pull/690
>
> read-cache.c | 12 ++++++------
> t/t7415-submodule-names.sh | 2 +-
> 2 files changed, 7 insertions(+), 7 deletions(-)
>
> diff --git a/read-cache.c b/read-cache.c
> index 737916ebd9..aa427c5c17 100644
> --- a/read-cache.c
> +++ b/read-cache.c
> @@ -959,7 +959,7 @@ static int verify_dotfile(const char *rest, unsigned mode)
>
> int verify_path(const char *path, unsigned mode)
> {
> - char c;
> + char c = 0;
>
> if (has_dos_drive_prefix(path))
> return 0;
> @@ -974,6 +974,7 @@ int verify_path(const char *path, unsigned mode)
> if (is_dir_sep(c)) {
> inside:
> if (protect_hfs) {
> +
> if (is_hfs_dotgit(path))
> return 0;
> if (S_ISLNK(mode)) {
> @@ -982,6 +983,10 @@ int verify_path(const char *path, unsigned mode)
> }
> }
> if (protect_ntfs) {
> +#ifdef GIT_WINDOWS_NATIVE
> + if (c == '\\')
> + return 0;
> +#endif
> if (is_ntfs_dotgit(path))
> return 0;
> if (S_ISLNK(mode)) {
> @@ -1278,11 +1283,6 @@ static int add_index_entry_with_check(struct index_state *istate, struct cache_e
> int skip_df_check = option & ADD_CACHE_SKIP_DFCHECK;
> int new_only = option & ADD_CACHE_NEW_ONLY;
>
> -#ifdef GIT_WINDOWS_NATIVE
> - if (protect_ntfs && strchr(ce->name, '\\'))
> - return error(_("filename in tree entry contains backslash: '%s'"), ce->name);
> -#endif
> -
> if (!(option & ADD_CACHE_KEEP_CACHE_TREE))
> cache_tree_invalidate_path(istate, ce->name);
>
> diff --git a/t/t7415-submodule-names.sh b/t/t7415-submodule-names.sh
> index 7ae0dc8ff4..f70368bc2e 100755
> --- a/t/t7415-submodule-names.sh
> +++ b/t/t7415-submodule-names.sh
> @@ -209,7 +209,7 @@ test_expect_success MINGW 'prevent git~1 squatting on Windows' '
> hash="$(echo x | git hash-object -w --stdin)" &&
> test_must_fail git update-index --add \
> --cacheinfo 160000,$rev,d\\a 2>err &&
> - test_i18ngrep backslash err &&
> + test_i18ngrep "Invalid path" err &&
> git -c core.protectNTFS=false update-index --add \
> --cacheinfo 100644,$modules,.gitmodules \
> --cacheinfo 160000,$rev,c \
>
> base-commit: 7a6a90c6ec48fc78c83d7090d6c1b95d8f3739c0
> --
> gitgitgadget
>
^ permalink raw reply
* Git for Windows v2.25.0-rc2, was Re: [ANNOUNCE] Git v2.25.0-rc2
From: Johannes Schindelin @ 2020-01-09 21:47 UTC (permalink / raw)
To: git-for-windows; +Cc: git, git-packagers, Junio C Hamano
In-Reply-To: <xmqq8smh1t3m.fsf@gitster-ct.c.googlers.com>
Team,
On Wed, 8 Jan 2020, Junio C Hamano wrote:
> A release candidate Git v2.25.0-rc2 is now available for testing
> at the usual places. It is comprised of 517 non-merge commits
> since v2.24.1, contributed by 70 people, 29 of which are new faces.
>
> The tarballs are found at:
>
> https://www.kernel.org/pub/software/scm/git/testing/
The corresponding Git for Windows v2.25.0-rc2 can be found here:
https://github.com/git-for-windows/git/releases/tag/v2.25.0-rc2.windows.1
In addition to rebasing Git for Windows' patches from v2.25.0-rc1 onto
v2.25.0-rc2, this also includes the "mingw: safeguard better against
backslashes in file names" fix that I sent upstream in
https://lore.kernel.org/git/pull.690.git.git.1578576634678.gitgitgadget@gmail.com/
Ciao,
Johannes
^ permalink raw reply
* Re: [PATCH v3 01/15] rebase: extend the options for handling of empty commits
From: Johannes Schindelin @ 2020-01-09 21:32 UTC (permalink / raw)
To: Phillip Wood
Cc: Elijah Newren, Elijah Newren via GitGitGadget, Git Mailing List,
Phillip Wood, Denton Liu, Junio C Hamano, Pavel Roskin,
Alban Gruin, SZEDER Gábor
In-Reply-To: <d9d3d360-fc45-a716-4b35-4367c017460e@gmail.com>
Hi Phillip,
On Wed, 8 Jan 2020, Phillip Wood wrote:
> On 07/01/2020 19:15, Elijah Newren wrote:
> >
> > On Tue, Jan 7, 2020 at 6:37 AM Phillip Wood <phillip.wood123@gmail.com>
> > wrote:
> >
> > > On 24/12/2019 19:54, Elijah Newren via GitGitGadget wrote:
> > >
> > [...]
> > > > @@ -466,7 +494,10 @@ int cmd_rebase__interactive(int argc, const char
> > > > **argv, const char *prefix)
> > > > struct option options[] = {
> > > > OPT_NEGBIT(0, "ff", &opts.flags, N_("allow fast-forward"),
> > > > REBASE_FORCE),
> > > > - OPT_BOOL(0, "keep-empty", &opts.keep_empty, N_("keep empty
> > > > commits")),
> > > > + { OPTION_CALLBACK, 'k', "keep-empty", &options, NULL,
> > > > + N_("(DEPRECATED) keep empty commits"),
> > > > + PARSE_OPT_NOARG | PARSE_OPT_NONEG |
> > > > PARSE_OPT_HIDDEN,
> > >
> > > It is all very well deprecating --keep-empty but suddenly making
> > > '--no-keep-empty' an error goes beyond deprecation. Also I'm not sure
> > > it's worth changing these options as I think the only user is
> > > git-rebase--preserve-merges.sh
> >
> > Side track: Since git-rebase--preserve-merges.sh is deprecated and we
> > want to get rid of it, and rebase-merges exists and is a better
> > implementation of the original idea, can we just translate rebase -p
> > into rebase -r and delete git-rebase--preserve-merges.sh? (With a few
> > wrinkles, such as telling users in the middle of an existing
> > preserve-merges-rebase that they either need to use an old version of
> > git to continue their rebase or else abort the rebase?)
>
> dscho posted some patches in November or December moving along this path, I'm
> not sure what the outcome was. It's a bit complicated by the different todo
> list formats between -p and -r but I think the end game should be to treat -p
> as -r as you suggest.
I sent some patches to switch `git svn` over to `rebase -r`, but that was
only in non-interactive mode.
In interactive mode, the todo lists indeed look too different and I think
we'll just need to wait a year or so and then we can just delete
`git-rebase--preserve-merges.sh`.
Ciao,
Dscho
^ permalink raw reply
* Re: [PATCH v3 2/2] rebase-interactive: warn if commit is dropped with `rebase --edit-todo'
From: Alban Gruin @ 2020-01-09 21:13 UTC (permalink / raw)
To: phillip.wood, Junio C Hamano; +Cc: git, Johannes Schindelin
In-Reply-To: <64aa4049-ee35-df4c-1e6c-80707f4f9070@gmail.com>
Hi Phillip,
Le 09/12/2019 à 17:00, Phillip Wood a écrit :
>>> diff --git a/rebase-interactive.c b/rebase-interactive.c
>>> index ad5dd49c31..80b6a2e7a6 100644
>>> --- a/rebase-interactive.c
>>> +++ b/rebase-interactive.c
>>> @@ -97,7 +97,8 @@ int edit_todo_list(struct repository *r, struct
>>> todo_list *todo_list,
>>> struct todo_list *new_todo, const char *shortrevisions,
>>> const char *shortonto, unsigned flags)
>>> {
>>> - const char *todo_file = rebase_path_todo();
>>> + const char *todo_file = rebase_path_todo(),
>>> + *todo_backup = rebase_path_todo_backup();
>>> /* If the user is editing the todo list, we first try to parse
>>> @@ -110,9 +111,9 @@ int edit_todo_list(struct repository *r, struct
>>> todo_list *todo_list,
>>> -1, flags | TODO_LIST_SHORTEN_IDS |
>>> TODO_LIST_APPEND_TODO_HELP))
>>> return error_errno(_("could not write '%s'"), todo_file);
>>> - if (initial && copy_file(rebase_path_todo_backup(), todo_file,
>>> 0666))
>>> - return error(_("could not copy '%s' to '%s'."), todo_file,
>>> - rebase_path_todo_backup());
>>> + unlink(todo_backup);
>>> + if (copy_file(todo_backup, todo_file, 0666))
>>> + return error(_("could not copy '%s' to '%s'."), todo_file,
>>> todo_backup);
>>
>> We used to copy ONLY when initial is set and we left old todo_backup
>> intact when !initial. That is no longer true after this change, but
>> it is intended---we create an exact copy of what we would hand out
>> to the end-user, so that we can compare it with the edited result
>> to figure out what got changed.
>
> I think it would be better to only create a new copy if the last edit
> was successful. As it stands if I edit the todo list and accidentally
> delete some lines and then edit the todo list again to try and fix it
> the second edit will succeed whether or not I reinserted the deleted lines.
>
> We could add this to the tests to check that a subsequent edit that does
> not fix the problem fails
>
> diff --git a/t/t3404-rebase-interactive.sh b/t/t3404-rebase-interactive.sh
> index 969e12d281..8544d8ab2c 100755
> --- a/t/t3404-rebase-interactive.sh
> +++ b/t/t3404-rebase-interactive.sh
>
> @@ -1416,6 +1416,7 @@ test_expect_success 'rebase --edit-todo respects
> rebase.missingCommitsCheck = er
> test_i18ncmp expect actual &&
> test_must_fail git rebase --continue 2>actual &&
> test_i18ncmp expect actual &&
> + test_must_fail git rebase --edit-todo &&
> cp orig .git/rebase-merge/git-rebase-todo &&
> test_must_fail env FAKE_LINES="1 2 3 4" \
> git rebase --edit-todo 2>actual &&
>
>
In which case, if the check did not pass at the previous edit, the new
todo list should be compared to the backup. As sequencer_continue()
already does this, extract this to its own function in
rebase-interactive.c. To keep track of this, a file is created on the
disk (as you suggested in your other email.) At the next edit, if this
file exists and no errors were found, it is deleted. The backup is only
created if there is no errors in `todo_list' and in `new_todo'.
This would guarantee that there is no errors in the backup, and that the
edited list is always compared to a list exempt of errors.
This approach also has the benefit to detect if a commit part of a
badcmd was dropped.
After some tweaks (ie. `expect' now lists 2 commits instead of one),
this passes the test with the change you suggested, and the one you sent
in your other email.
>>
>> We unlink(2) unconditionally because the only effect we want to see
>> here is that todo_backup does not exist before we call copy_file()
>> that wants to do O_CREAT|O_EXCL. I wonder if we want to avoid
>> unlink() when initial, and also if we want to do unlink_or_warn()
>> when !initial (read: this is just "wondering" without thinking long
>> enough to suggest that doing so would be better)
>>
>>> diff --git a/t/t3404-rebase-interactive.sh
>>> b/t/t3404-rebase-interactive.sh
>>> index 29a35840ed..9051c1e11d 100755
>>> --- a/t/t3404-rebase-interactive.sh
>>> +++ b/t/t3404-rebase-interactive.sh
>>> @@ -1343,6 +1343,89 @@ test_expect_success 'rebase -i respects
>>> rebase.missingCommitsCheck = error' '
>>> test B = $(git cat-file commit HEAD^ | sed -ne \$p)
>>> '
>>> +test_expect_success 'rebase --edit-todo respects
>>> rebase.missingCommitsCheck = ignore' '
>>> + test_config rebase.missingCommitsCheck ignore &&
>>> + rebase_setup_and_clean missing-commit &&
>>> + set_fake_editor &&
>>> + FAKE_LINES="break 1 2 3 4 5" git rebase -i --root &&
>>> + FAKE_LINES="1 2 3 4" git rebase --edit-todo 2>actual &&
>>
>> OK, so we lost "pick 5" but with missing-check disabled, that should
>> not trigger any annoying warning or error.
>>
>>> + git rebase --continue 2>actual &&
>
> This clobbers actual which hasn't been used yet
>
Good catch.
>>> + test D = $(git cat-file commit HEAD | sed -ne \$p) &&
>>
>>> + test_i18ngrep \
>>> + "Successfully rebased and updated refs/heads/missing-commit" \
>>> + actual
>>> +'
>>> +
>>> +test_expect_success 'rebase --edit-todo respects
>>> rebase.missingCommitsCheck = warn' '
>>> + cat >expect <<-EOF &&
>>> + error: invalid line 1: badcmd $(git rev-list --pretty=oneline
>>> --abbrev-commit -1 master~4)
>>> + Warning: some commits may have been dropped accidentally.
>>> + Dropped commits (newer to older):
>>> + - $(git rev-list --pretty=oneline --abbrev-commit -1 master)
>>> + To avoid this message, use "drop" to explicitly remove a commit.
>>> + EOF
>>> + tail -n4 expect >expect.2 &&
>>> + test_config rebase.missingCommitsCheck warn &&
>>> + rebase_setup_and_clean missing-commit &&
>>> + set_fake_editor &&
>>> + test_must_fail env FAKE_LINES="bad 1 2 3 4 5" \
>>> + git rebase -i --root &&
>>> + cp .git/rebase-merge/git-rebase-todo.backup orig &&
>>> + FAKE_LINES="2 3 4" git rebase --edit-todo 2>actual.2 &&
>>> + head -n5 actual.2 >actual &&
>>> + test_i18ncmp expect actual &&
>>
>> OK, so we lost "pick 1" while discarding "bad", and we should notice
>> the lossage? I see "head -n5" there, which means we are still
>> getting "invalid line 1: badcmd", even though FAKE_LINES now got rid
>> of "bad"? Puzzled...
>
> Is the bad there to stop the rebase so we can edit the todo list? If so
> it would be better to use 'break' instead.
>
No, it was here to show that we can detect dropped commits, even if the
todo list has an error.
> Best Wishes
>
> Phillip
Cheers,
Alban
^ permalink raw reply
* Re: reftable progress
From: Junio C Hamano @ 2020-01-09 20:18 UTC (permalink / raw)
To: Han-Wen Nienhuys; +Cc: git, Johannes Schindelin, Christian Couder
In-Reply-To: <CAFQ2z_OhNHauK_W1wL7WcOJnm2vCUGXLfYn_ZmLnt2rez+_TDw@mail.gmail.com>
Han-Wen Nienhuys <hanwen@google.com> writes:
> Hi folks,
>
> I have some alpha-quality code for Reftable support in Git at
>
> https://github.com/hanwen/git/tree/reftable
>
> I'd be curious for some feedback, both on the library
> (https://github.com/google/reftable) and the glue code in Git.
If you are asking for feedback, sendign it over to this list with
[RFC PATCH n/3] as the subject prefix would have better chance.
I have a feeling that the patch to show-ref is done at the wrong
level. The show_ref() function is given as the callback function
to head_ref() and for_each_ref(), and the way these functions call
the callback function is part of the ref API. "In the reftable
format, ... are stored in the reference database too," is perfectly
fine (that is the implementation detail of the ref API backend) but
"and are produced when iterating over the refs" is not. Hide that
inside the ref API backend you are writing for reftable and this
change will become unnecessary.
I agree with the general direction of the second patch that extends
setup.c to allow .git/refs to be a regular file to signal that the
repository is using reftable (instead of filesystem-based refs), but
the title "fix repo detection" is misleading---it is adjusting for
the new format (as there is nothing to "fix" if there weren't the
new format). Also, the actual implementation may want to be a bit
more strict, i.e. saying "we allow refs/ to be a searchable
directory, and we also allow a readable regular file".
Thanks.
^ permalink raw reply
* Re: git submodule update strange output behavior.
From: Junio C Hamano @ 2020-01-09 20:03 UTC (permalink / raw)
To: Carlo Wood; +Cc: git
In-Reply-To: <xmqqtv54zcik.fsf@gitster-ct.c.googlers.com>
Junio C Hamano <gitster@pobox.com> writes:
> Carlo Wood <carlo@alinoe.com> writes:
>
>> In a project containing submodules, one of the submodules
>> contains a submodule itself, which in turn also contains
>> a submodule.
>>
>> Overview:
>>
>> project/foobar [submodule]
>> project/cwm4 [submodule]
>> project/evio [submodule]
>> project/evio/protocol/matrixssl [submodule]
>> project/evio/protocol/matrixssl/cwm4 [submodule]
>>
>> ('protocol' is a normal subdirectory)
>>
>> Running (with or without the --quiet),
>>
>> $ git submodule --quiet update --init --recursive --remote
>> Fetching submodule protocol/matrixssl
>> Fetching submodule protocol/matrixssl/cwm4
>> Fetching submodule cwm4
>>
>> This is odd (a bug imho) because
>>
>> 1) it seems to only print this fetching information for submodules inside submodules,
>> not for the top-level submodules.
>> 2) it even prints this when using --quiet
>> 3) it prints this every time (also when there is nothing more to fetch).
>
>
> Sounds like a symptom of (a) the top-level "git submodule update"
> knowing how to react to "--quiet" but (b) it forgets to pass down
> the "--quiet" when it recursively runs "git submodule update" in its
> submodules?
Just a shot in the dark. Not even compile tested ;-)
submodule.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/submodule.c b/submodule.c
index 9da7181321..535bb6bf04 100644
--- a/submodule.c
+++ b/submodule.c
@@ -1454,11 +1454,12 @@ static int get_next_submodule(struct child_process *cp,
argv_array_pushv(&cp->args, spf->args.argv);
argv_array_push(&cp->args, default_argv);
argv_array_push(&cp->args, "--submodule-prefix");
-
strbuf_addf(&submodule_prefix, "%s%s/",
spf->prefix,
task->sub->path);
argv_array_push(&cp->args, submodule_prefix.buf);
+ if (spf->quiet)
+ argv_array_push(&cp->args, "--quiet");
spf->count++;
*task_cb = task;
^ permalink raw reply related
* Re: [PATCH v2 4/4] config: add '--show-scope' to print the scope of a config value
From: Junio C Hamano @ 2020-01-09 19:50 UTC (permalink / raw)
To: Matthew Rogers via GitGitGadget; +Cc: git, Matthew Rogers
In-Reply-To: <92ce9b782429248197bb87cfa11c89082304218f.1578565002.git.gitgitgadget@gmail.com>
"Matthew Rogers via GitGitGadget" <gitgitgadget@gmail.com> writes:
> From: Matthew Rogers <mattr94@gmail.com>
>
> When a user queries config values with --show-origin, often it's
> difficult to determine what the actual "scope" (local, global, etc.) of
> a given value is based on just the origin file.
>
> Teach 'git config' the '--show-scope' option to print the scope of all
> displayed config values. Note that we should never see anything of
> "submodule" scope as that is only ever used by submodule-config.c when
> parsing the '.gitmodules' file.
>
> Signed-off-by: Matthew Rogers <mattr94@gmail.com>
> ---
> Documentation/git-config.txt | 15 ++++++----
> builtin/config.c | 56 ++++++++++++++++++++++++++++++++----
> config.c | 6 +++-
> config.h | 20 +++++++------
> submodule-config.c | 4 ++-
> t/helper/test-config.c | 4 ++-
> t/t1300-config.sh | 50 ++++++++++++++++++++++++++++++++
> 7 files changed, 133 insertions(+), 22 deletions(-)
>
> diff --git a/Documentation/git-config.txt b/Documentation/git-config.txt
> index 899e92a1c9..2e47765aab 100644
> --- a/Documentation/git-config.txt
> +++ b/Documentation/git-config.txt
> @@ -9,18 +9,18 @@ git-config - Get and set repository or global options
> SYNOPSIS
> --------
> [verse]
> -'git config' [<file-option>] [--type=<type>] [--show-origin] [-z|--null] name [value [value_regex]]
> +'git config' [<file-option>] [--type=<type>] [--show-origin] [--show-scope] [-z|--null] name [value [value_regex]]
> 'git config' [<file-option>] [--type=<type>] --add name value
> 'git config' [<file-option>] [--type=<type>] --replace-all name value [value_regex]
> -'git config' [<file-option>] [--type=<type>] [--show-origin] [-z|--null] --get name [value_regex]
> -'git config' [<file-option>] [--type=<type>] [--show-origin] [-z|--null] --get-all name [value_regex]
> -'git config' [<file-option>] [--type=<type>] [--show-origin] [-z|--null] [--name-only] --get-regexp name_regex [value_regex]
> +'git config' [<file-option>] [--type=<type>] [--show-origin] [--show-scope] [-z|--null] --get name [value_regex]
> +'git config' [<file-option>] [--type=<type>] [--show-origin] [--show-scope] [-z|--null] --get-all name [value_regex]
> +'git config' [<file-option>] [--type=<type>] [--show-origin] [--show-scope] [-z|--null] [--name-only] --get-regexp name_regex [value_regex]
> 'git config' [<file-option>] [--type=<type>] [-z|--null] --get-urlmatch name URL
> 'git config' [<file-option>] --unset name [value_regex]
> 'git config' [<file-option>] --unset-all name [value_regex]
> 'git config' [<file-option>] --rename-section old_name new_name
> 'git config' [<file-option>] --remove-section name
> -'git config' [<file-option>] [--show-origin] [-z|--null] [--name-only] -l | --list
> +'git config' [<file-option>] [--show-origin] [--show-scope] [-z|--null] [--name-only] -l | --list
> 'git config' [<file-option>] --get-color name [default]
> 'git config' [<file-option>] --get-colorbool name [stdout-is-tty]
> 'git config' [<file-option>] -e | --edit
> @@ -222,6 +222,11 @@ Valid `<type>`'s include:
> the actual origin (config file path, ref, or blob id if
> applicable).
>
> +--show-scope::
> + Similar to `--show-origin` in that it augments the output of
> + all queried config options with the scope of that value
> + (local, global, system, command).
> +
> --get-colorbool name [stdout-is-tty]::
>
> Find the color setting for `name` (e.g. `color.diff`) and output
> diff --git a/builtin/config.c b/builtin/config.c
> index 52a904cfb1..0367d28ec1 100644
> --- a/builtin/config.c
> +++ b/builtin/config.c
> @@ -33,6 +33,7 @@ static int end_nul;
> static int respect_includes_opt = -1;
> static struct config_options config_options;
> static int show_origin;
> +static int show_scope;
>
> #define ACTION_GET (1<<0)
> #define ACTION_GET_ALL (1<<1)
> @@ -155,6 +156,7 @@ static struct option builtin_config_options[] = {
> OPT_BOOL(0, "name-only", &omit_values, N_("show variable names only")),
> OPT_BOOL(0, "includes", &respect_includes_opt, N_("respect include directives on lookup")),
> OPT_BOOL(0, "show-origin", &show_origin, N_("show origin of config (file, standard input, blob, command line)")),
> + OPT_BOOL(0, "show-scope", &show_scope, N_("show scope of config (worktree, local, global, system, command)")),
> OPT_STRING(0, "default", &default_value, N_("value"), N_("with --get, use default value when missing entry")),
> OPT_END(),
> };
> @@ -189,11 +191,41 @@ static void show_config_origin(struct strbuf *buf)
> strbuf_addch(buf, term);
> }
>
> +static const char *scope_to_string(enum config_scope scope) {
Style: the outermost opening and closing parentheses of a functiond
definition sit on their own lines by themselves. Same for the next
one.
> + switch (scope) {
> + case CONFIG_SCOPE_LOCAL:
> + return "local";
> + case CONFIG_SCOPE_GLOBAL:
> + return "global";
> + case CONFIG_SCOPE_SYSTEM:
> + return "system";
> + case CONFIG_SCOPE_WORKTREE:
> + return "worktree";
> + case CONFIG_SCOPE_COMMAND:
> + return "command";
> + case CONFIG_SCOPE_SUBMODULE:
> + return "submodule";
> + default:
> + return "unknown";
Makes me wonder if this wants to be done via a table, which would
later allow a parser to map in the other direction, taking a string
and returning the corresponding enum, more easily (imagine a yet to
be invented feature to "list config settings for only this scope").
> + }
> +}
> +
> +static void show_config_scope(struct strbuf *buf) {
> + const char term = end_nul ? '\0' : '\t';
> + const char *scope = scope_to_string(current_config_scope());
> +
> + strbuf_addstr(buf, N_(scope));
> + strbuf_addch(buf, term);
> +}
> +
> static int show_all_config(const char *key_, const char *value_, void *cb)
> {
> - if (show_origin) {
> + if (show_origin || show_scope) {
> struct strbuf buf = STRBUF_INIT;
> - show_config_origin(&buf);
> + if (show_scope)
> + show_config_scope(&buf);
> + if (show_origin)
> + show_config_origin(&buf);
> /* Use fwrite as "buf" can contain \0's if "end_null" is set. */
> fwrite(buf.buf, 1, buf.len, stdout);
> strbuf_release(&buf);
> @@ -213,6 +245,8 @@ struct strbuf_list {
>
> static int format_config(struct strbuf *buf, const char *key_, const char *value_)
> {
> + if (show_scope)
> + show_config_origin(buf);
> if (show_origin)
> show_config_origin(buf);
Shouldn't one of these show scope instead of origin? I wonder how
the tests I see later in the patch could have passed with this.
> if (show_keys)
> @@ -599,6 +633,8 @@ int cmd_config(int argc, const char **argv, const char *prefix)
> int nongit = !startup_info->have_repository;
> char *value;
>
> +
> +
Unnecessary edit?
> given_config_source.file = xstrdup_or_null(getenv(CONFIG_ENVIRONMENT));
>
> argc = parse_options(argc, argv, prefix, builtin_config_options,
> @@ -622,6 +658,7 @@ int cmd_config(int argc, const char **argv, const char *prefix)
> !strcmp(given_config_source.file, "-")) {
> given_config_source.file = NULL;
> given_config_source.use_stdin = 1;
> + given_config_source.scope = CONFIG_SCOPE_COMMAND;
> }
>
> if (use_global_config) {
> @@ -637,6 +674,8 @@ int cmd_config(int argc, const char **argv, const char *prefix)
> */
> die(_("$HOME not set"));
>
> + given_config_source.scope = CONFIG_SCOPE_GLOBAL;
> +
> if (access_or_warn(user_config, R_OK, 0) &&
> xdg_config && !access_or_warn(xdg_config, R_OK, 0)) {
> given_config_source.file = xdg_config;
> @@ -646,11 +685,13 @@ int cmd_config(int argc, const char **argv, const char *prefix)
> free(xdg_config);
> }
> }
> - else if (use_system_config)
> + else if (use_system_config) {
> given_config_source.file = git_etc_gitconfig();
> - else if (use_local_config)
> + given_config_source.scope = CONFIG_SCOPE_SYSTEM;
> + } else if (use_local_config) {
> given_config_source.file = git_pathdup("config");
> - else if (use_worktree_config) {
> + given_config_source.scope = CONFIG_SCOPE_LOCAL;
> + } else if (use_worktree_config) {
> struct worktree **worktrees = get_worktrees(0);
> if (repository_format_worktree_config)
> given_config_source.file = git_pathdup("config.worktree");
> @@ -662,13 +703,18 @@ int cmd_config(int argc, const char **argv, const char *prefix)
> "section in \"git help worktree\" for details"));
> else
> given_config_source.file = git_pathdup("config");
> + given_config_source.scope = CONFIG_SCOPE_LOCAL;
> free_worktrees(worktrees);
> } else if (given_config_source.file) {
> if (!is_absolute_path(given_config_source.file) && prefix)
> given_config_source.file =
> prefix_filename(prefix, given_config_source.file);
> + given_config_source.scope = CONFIG_SCOPE_COMMAND;
> + } else if (given_config_source.blob) {
> + given_config_source.scope = CONFIG_SCOPE_COMMAND;
These that set scope for individual sources, not just during the
normal sequence, are new in this round, and I think it is an
improvement.
> }
>
> +
> if (respect_includes_opt == -1)
> config_options.respect_includes = !given_config_source.file;
> else
> diff --git a/config.c b/config.c
> index f319a3d6a0..035fc8fb29 100644
> --- a/config.c
> +++ b/config.c
> @@ -1702,6 +1702,7 @@ static int do_git_config_sequence(const struct config_options *opts,
> char *xdg_config = xdg_config_home("config");
> char *user_config = expand_user_path("~/.gitconfig", 0);
> char *repo_config;
> + enum config_scope prev_parsing_scope = current_parsing_scope;
>
> if (opts->commondir)
> repo_config = mkpathdup("%s/config", opts->commondir);
> @@ -1741,7 +1742,7 @@ static int do_git_config_sequence(const struct config_options *opts,
> if (!opts->ignore_cmdline && git_config_from_parameters(fn, data) < 0)
> die(_("unable to parse command-line config"));
>
> - current_parsing_scope = CONFIG_SCOPE_UNKNOWN;
> + current_parsing_scope = prev_parsing_scope;
> free(xdg_config);
> free(user_config);
> free(repo_config);
> @@ -1762,6 +1763,9 @@ int config_with_options(config_fn_t fn, void *data,
> data = &inc;
> }
>
> + if (config_source)
> + current_parsing_scope = config_source->scope;
> +
> /*
> * If we have a specific filename, use it. Otherwise, follow the
> * regular lookup sequence.
> diff --git a/config.h b/config.h
> index f383a71404..91f851e925 100644
> --- a/config.h
> +++ b/config.h
> @@ -35,10 +35,21 @@ struct object_id;
>
> #define CONFIG_REGEX_NONE ((void *)1)
>
> +enum config_scope {
> + CONFIG_SCOPE_UNKNOWN = 0,
> + CONFIG_SCOPE_SYSTEM,
> + CONFIG_SCOPE_GLOBAL,
> + CONFIG_SCOPE_LOCAL,
> + CONFIG_SCOPE_WORKTREE,
> + CONFIG_SCOPE_COMMAND,
> + CONFIG_SCOPE_SUBMODULE,
> +};
> +
> struct git_config_source {
> unsigned int use_stdin:1;
> const char *file;
> const char *blob;
> + enum config_scope scope;
> };
>
> enum config_origin_type {
> @@ -294,15 +305,6 @@ int config_error_nonbool(const char *);
>
> int git_config_parse_parameter(const char *, config_fn_t fn, void *data);
>
> -enum config_scope {
> - CONFIG_SCOPE_UNKNOWN = 0,
> - CONFIG_SCOPE_SYSTEM,
> - CONFIG_SCOPE_GLOBAL,
> - CONFIG_SCOPE_LOCAL,
> - CONFIG_SCOPE_WORKTREE,
> - CONFIG_SCOPE_COMMAND,
> -};
> -
> enum config_scope current_config_scope(void);
> const char *current_config_origin_type(void);
> const char *current_config_name(void);
> diff --git a/submodule-config.c b/submodule-config.c
> index 85064810b2..b8e97d8ae8 100644
> --- a/submodule-config.c
> +++ b/submodule-config.c
> @@ -635,7 +635,9 @@ static void submodule_cache_check_init(struct repository *repo)
> static void config_from_gitmodules(config_fn_t fn, struct repository *repo, void *data)
> {
> if (repo->worktree) {
> - struct git_config_source config_source = { 0 };
> + struct git_config_source config_source = {
> + 0, .scope = CONFIG_SCOPE_SUBMODULE
> + };
> const struct config_options opts = { 0 };
> struct object_id oid;
> char *file;
> diff --git a/t/helper/test-config.c b/t/helper/test-config.c
> index 78bbb9eb98..aae2c6fc9e 100644
> --- a/t/helper/test-config.c
> +++ b/t/helper/test-config.c
> @@ -48,8 +48,10 @@ static const char *scope_name(enum config_scope scope)
> return "repo";
> case CONFIG_SCOPE_WORKTREE:
> return "worktree";
> + case CONFIG_SCOPE_SUBMODULE:
> + return "submodule";
> case CONFIG_SCOPE_COMMAND:
> - return "command";
> + return "cmdline";
Oh???
> default:
> return "unknown";
> }
> diff --git a/t/t1300-config.sh b/t/t1300-config.sh
> index 983a0a1583..6cecc80b18 100755
> --- a/t/t1300-config.sh
> +++ b/t/t1300-config.sh
> @@ -1766,6 +1766,56 @@ test_expect_success !MINGW '--show-origin blob ref' '
> test_cmp expect output
> '
>
> +test_expect_success '--show-scope with --list' '
> + cat >expect <<-EOF &&
> + global user.global=true
> + global user.override=global
> + global include.path=$INCLUDE_DIR/absolute.include
> + global user.absolute=include
> + local user.local=true
> + local user.override=local
> + local include.path=../include/relative.include
> + local user.relative=include
> + command user.cmdline=true
> + EOF
> + git -c user.cmdline=true config --list --show-scope >output &&
> + test_cmp expect output
> +'
> +
> +test_expect_success !MINGW '--show-scope with --blob' '
> + blob=$(git hash-object -w "$CUSTOM_CONFIG_FILE") &&
> + cat >expect <<-EOF &&
> + command user.custom=true
> + EOF
> + git config --blob=$blob --show-scope --list >output &&
> + test_cmp expect output
> +'
> +test_expect_success '--show-scope with --local' '
> + cat >expect <<-\EOF &&
> + local user.local=true
> + local user.override=local
> + local include.path=../include/relative.include
> + EOF
> + git config --local --list --show-scope >output &&
> + test_cmp expect output
> +'
> +
> +test_expect_success '--show-scope with --show-origin' '
> + cat >expect <<-EOF &&
> + global file:$HOME/.gitconfig user.global=true
> + global file:$HOME/.gitconfig user.override=global
> + global file:$HOME/.gitconfig include.path=$INCLUDE_DIR/absolute.include
> + global file:$INCLUDE_DIR/absolute.include user.absolute=include
> + local file:.git/config user.local=true
> + local file:.git/config user.override=local
> + local file:.git/config include.path=../include/relative.include
> + local file:.git/../include/relative.include user.relative=include
> + command command line: user.cmdline=true
> + EOF
> + git -c user.cmdline=true config --list --show-origin --show-scope >output &&
> + test_cmp expect output
> +'
> +
> test_expect_success '--local requires a repo' '
> # we expect 128 to ensure that we do not simply
> # fail to find anything and return code "1"
^ permalink raw reply
* Re: [PATCH v2 3/4] config: clarify meaning of command line scoping
From: Junio C Hamano @ 2020-01-09 19:13 UTC (permalink / raw)
To: Matthew Rogers via GitGitGadget; +Cc: git, Matthew Rogers
In-Reply-To: <82252735467d876b4726f512a02cc44d271696ca.1578565001.git.gitgitgadget@gmail.com>
"Matthew Rogers via GitGitGadget" <gitgitgadget@gmail.com> writes:
> From: Matthew Rogers <mattr94@gmail.com>
>
> CONFIG_SCOPE_CMDLINE is generally used in the code to refer to config
> values passed in via the -c option. This is a little bit too specific
> as there are other methods to pass config values so that the last for a
> single command (namely --file and --blob).
Sorry, but I cannot parse this, especially around "so that the last
for ..." part.
> As the "visibility" of config
> values passed by these situations is common, we unify them as having a
> scope of "command" rather than "command line".
Is the "unification" something done by this patch? It does not
appear to be so. The existing code already was using CMDLINE to
call "git -c VAR=VAL" and also something else (which is not clear to
me, unfortunately, probably because I failed to parse the above
X-<), and this step renames CMDLINE to COMMAND perhaps because
CMDLINE has too strong connotation with the "-c" thing and much less
with the other thing (which is not clear to me, unfortunately) and
you found that renaming it to COMMAND would cover both cases better?
I also do not get what you meant by "visibility", but it probably is
primarily because it is not clear what "the other thing" is.
> Signed-off-by: Matthew Rogers <mattr94@gmail.com>
> ---
> config.c | 2 +-
> config.h | 2 +-
> t/helper/test-config.c | 4 ++--
> 3 files changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/config.c b/config.c
> index 447a013a15..f319a3d6a0 100644
> --- a/config.c
> +++ b/config.c
> @@ -1737,7 +1737,7 @@ static int do_git_config_sequence(const struct config_options *opts,
> free(path);
> }
>
> - current_parsing_scope = CONFIG_SCOPE_CMDLINE;
> + current_parsing_scope = CONFIG_SCOPE_COMMAND;
> if (!opts->ignore_cmdline && git_config_from_parameters(fn, data) < 0)
> die(_("unable to parse command-line config"));
>
> diff --git a/config.h b/config.h
> index 284d92fb0e..f383a71404 100644
> --- a/config.h
> +++ b/config.h
> @@ -300,7 +300,7 @@ enum config_scope {
> CONFIG_SCOPE_GLOBAL,
> CONFIG_SCOPE_LOCAL,
> CONFIG_SCOPE_WORKTREE,
> - CONFIG_SCOPE_CMDLINE,
> + CONFIG_SCOPE_COMMAND,
> };
>
> enum config_scope current_config_scope(void);
> diff --git a/t/helper/test-config.c b/t/helper/test-config.c
> index 6695e463eb..78bbb9eb98 100644
> --- a/t/helper/test-config.c
> +++ b/t/helper/test-config.c
> @@ -48,8 +48,8 @@ static const char *scope_name(enum config_scope scope)
> return "repo";
> case CONFIG_SCOPE_WORKTREE:
> return "worktree";
> - case CONFIG_SCOPE_CMDLINE:
> - return "cmdline";
> + case CONFIG_SCOPE_COMMAND:
> + return "command";
> default:
> return "unknown";
> }
^ permalink raw reply
* reftable progress
From: Han-Wen Nienhuys @ 2020-01-09 19:16 UTC (permalink / raw)
To: git; +Cc: Johannes Schindelin, Christian Couder
Hi folks,
I have some alpha-quality code for Reftable support in Git at
https://github.com/hanwen/git/tree/reftable
I'd be curious for some feedback, both on the library
(https://github.com/google/reftable) and the glue code in Git.
--
Han-Wen Nienhuys - Google Munich
I work 80%. Don't expect answers from me on Fridays.
--
Google Germany GmbH, Erika-Mann-Strasse 33, 80636 Munich
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg
Geschäftsführer: Paul Manicle, Halimah DeLaine Prado
^ permalink raw reply
* Re: [PATCH 7/9] commit-graph: reuse existing bloom filters during write.
From: Jakub Narebski @ 2020-01-09 19:12 UTC (permalink / raw)
To: Garima Singh via GitGitGadget
Cc: git, Derrick Stolee, SZEDER Gábor, Jonathan Tan,
Jeff Hostetler, Taylor Blau, Jeff King, Junio C Hamano,
Garima Singh
In-Reply-To: <1e2acb37ad710cb0d1c09ed163fdd4473a27335c.1576879520.git.gitgitgadget@gmail.com>
"Garima Singh via GitGitGadget" <gitgitgadget@gmail.com> writes:
> From: Garima Singh <garima.singh@microsoft.com>
>
Overly long lines in the commit message.
> Read previously computed bloom filters from the commit-graph file if possible
> to avoid recomputing during commit-graph write.
Hmmm. This fixes (somewhat) the problem that I have noticed in previous
patch that the Bloom filter was computed at least twice, once for BIDX
chunk, once fo BDAT chunk.
I think the order should be:
- use Bloom filter on slab, if present
- fill it from commit graph, if saved there
- if needed, compute it from scratch (expensive operation!)
If I understand it correctly, it now does it... though possibly with
unnecessary memory allocation if commit-graph file does not include
Bloomm filters data, and (re)computing is not requested (see later).
But I might be wrong here.
>
> Reading from the commit-graph is based on the format in which bloom filters are
> written in the commit graph file. See method `fill_filter_from_graph` in bloom.c
This description reads a bit strange; it looks like it states a truism
(we read in the format we wrote). It think it should be rephrased in
different way for better readability.
>
> For reading the bloom filter for commit at lexicographic position i:
I think it would better read as:
To read Bloom filter for a given commit with lexicographic position
'i' we need to:
> 1. Read BIDX[i] which essentially gives us the starting index in BDAT for filter
> of commit i+1 (called the next_index in the code)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
\- I think would be not needed with better var name
I would also add that it gives the position [one past] the end of Bloom
filter data for i-th commit.
>
> 2. For i>0, read BIDX[i-1] which will give us the starting index in BDAT for
> filter of commit i (called the prev_index in the code)
Minor nitpick: Full stops are missing.
Why it is called prev_index and next_index, while it is either
curr_index and next_index or prev_index and curr_index, or maybe even
better beg_index and end_index?
> For i = 0, prev_index will be 0. The first lexicographic commit's filter will
> start at BDAT.
I would state it
For first commit, with i = 0, Bloom filter data starts at the
beginning, just past the header in BDAT chunk.
>
> 3. The length of the filter will be next_index - prev_index, because BIDX[i]
> gives the cumulative 8-byte words including the ith commit's filter.
>
> We toggle whether bloom filters should be recomputed based on the compute_if_null
> flag.
All right.
>
> Helped-by: Derrick Stolee <dstolee@microsoft.com>
> Signed-off-by: Garima Singh <garima.singh@microsoft.com>
> ---
> bloom.c | 40 ++++++++++++++++++++++++++++++++++++++--
> bloom.h | 3 ++-
> commit-graph.c | 6 +++---
> 3 files changed, 43 insertions(+), 6 deletions(-)
>
> diff --git a/bloom.c b/bloom.c
> index 08328cc381..86b1005802 100644
> --- a/bloom.c
> +++ b/bloom.c
> @@ -1,5 +1,7 @@
> #include "git-compat-util.h"
> #include "bloom.h"
> +#include "commit.h"
> +#include "commit-slab.h"
> #include "commit-graph.h"
> #include "object-store.h"
> #include "diff.h"
> @@ -119,13 +121,35 @@ static void add_key_to_filter(struct bloom_key *key,
> }
> }
>
> +static void fill_filter_from_graph(struct commit_graph *g,
> + struct bloom_filter *filter,
> + struct commit *c)
> +{
> + uint32_t lex_pos, prev_index, next_index;
> +
> + while (c->graph_pos < g->num_commits_in_base)
> + g = g->base_graph;
> +
> + lex_pos = c->graph_pos - g->num_commits_in_base;
This part shares common code with load_oid_from_graph(), only without
some of error checking; perhaps it might be good to extract it into a
separate helper function, e.g. `lex_index(&g, c->graph_pos)`.
Minor nitpick about the consistency of function names: why
load_oid_from_graph(), but fill_filter_from_graph(), and not
load_filter_from_graph() / load_bloom_from_graph()?
> +
> + next_index = get_be32(g->chunk_bloom_indexes + 4 * lex_pos);
> + if (lex_pos)
I think using
+ if (lex_pos > 0)
or
+ if (lex_pos >= 0)
might be easier to reason about.
> + prev_index = get_be32(g->chunk_bloom_indexes + 4 * (lex_pos - 1));
> + else
> + prev_index = 0;
> +
> + filter->len = next_index - prev_index;
The above command reads a bit strange: next - prev? not next - curr,
or curr - prev? Wouldn't it be better to name it begin_index and
end_index, or beg_index and end_index for brevity?
> + filter->data = (uint64_t *)(g->chunk_bloom_data + 8 * prev_index + 12);
Please do not use magic constants; use instead something like:
+ filter->data = (uint64_t *)(g->chunk_bloom_data +
+ sizeof(uint64_t) * prev_index +
+ BLOOMDATA_CHUNK_HEADER_SIZE);
Perhaps using `3*sizeof(unit32_t)` instead of magic value 12 would be
enough; but having symbolic name for BDAT chunk header size is better, I
think.
> +}
> +
> void load_bloom_filters(void)
> {
> init_bloom_filter_slab(&bloom_filters);
> }
>
> struct bloom_filter *get_bloom_filter(struct repository *r,
> - struct commit *c)
> + struct commit *c,
> + int compute_if_null)
I'm not sure about `compute_if_null` name...
> {
> struct bloom_filter *filter;
> struct bloom_filter_settings settings = DEFAULT_BLOOM_FILTER_SETTINGS;
> @@ -134,6 +158,18 @@ struct bloom_filter *get_bloom_filter(struct repository *r,
> const char *revs_argv[] = {NULL, "HEAD", NULL};
>
> filter = bloom_filter_slab_at(&bloom_filters, c);
Note that the documentation for `slab_at(slab, commit)` is documented in
`commit-slab.h` as
* This function locates the data associated with the given commit in
* the indegree slab, and returns the pointer to it. The location to
* store the data is allocated as necessary.
~~~~~~~~~~~~~~~~~~~~~~
Should we worry about this possibly unnecessary allocation (if there is
no Bloom filter chunk in the commit-graph, and we are not recomputing
it)?
There is `slab_peek(slab_commit)` with the following properties:
* This function is similar to indegree_at(), but it will return NULL
* until a call to indegree_at() was made for the commit.
> +
> + if (!filter->data) {
What does the Bloom filter for a commit with no changes looks like?
What about for a commit with more than 512 changes? Do, in either of
those cases, filter->len is 0? If yes, what about filter->data?
> + load_commit_graph_info(r, c);
> + if (c->graph_pos != COMMIT_NOT_FROM_GRAPH && r->objects->commit_graph->chunk_bloom_indexes) {
Please wrap overly long lines (109 characters seems too long; the
CodingGuidelines states:
- We try to keep to at most 80 characters per line.
+ if (c->graph_pos != COMMIT_NOT_FROM_GRAPH &&
+ r->objects->commit_graph->chunk_bloom_indexes) {
You seem to assume here that in the chain of commit-graph files either
all of them would have Bloom filters, or all of them would be missing
Bloom filters. Isn't it however possible for only some of the
commit-graph files in chain to include Bloom filter data chunks?
In such case, the top commit-graph file may have BIDX chunk
(bloom_indexes), but the commit-graph with the commit 'c' might not have
it. Or the top commit-graph file may be missing BIDX chunk, so Git
would recompute it even if the commit-graph file for commit 'c' includes
it.
If such situation is forbidden, how the restriction is managed?
Note: in any case, this needs to be tested!
> + fill_filter_from_graph(r->objects->commit_graph, filter, c);
> + return filter;
> + }
> + }
All right, if it is not in slab, we try to read if from the commit
graph. Looks all right.
> +
> + if (filter->data || !compute_if_null)
> + return filter;
^^^^^^^^
|
\- one tab too many
If we have found existing filter (on slab or in the commit-graph), or if
we won't be recomputing it, return it. O.K.
> +
> init_revisions(&revs, NULL);
> revs.diffopt.flags.recursive = 1;
>
> @@ -198,4 +234,4 @@ struct bloom_filter *get_bloom_filter(struct repository *r,
> DIFF_QUEUE_CLEAR(&diff_queued_diff);
>
> return filter;
> -}
> \ No newline at end of file
> +}
Accidental change.
> diff --git a/bloom.h b/bloom.h
> index ba8ae70b67..101d689bbd 100644
> --- a/bloom.h
> +++ b/bloom.h
> @@ -36,7 +36,8 @@ struct bloom_key {
> void load_bloom_filters(void);
>
> struct bloom_filter *get_bloom_filter(struct repository *r,
> - struct commit *c);
> + struct commit *c,
> + int compute_if_null);
>
All right, this is just update of the function signature.
> void fill_bloom_key(const char *data,
> int len,
> diff --git a/commit-graph.c b/commit-graph.c
> index def2ade166..0580ce75d5 100644
> --- a/commit-graph.c
> +++ b/commit-graph.c
> @@ -1032,7 +1032,7 @@ static void write_graph_chunk_bloom_indexes(struct hashfile *f,
> uint32_t cur_pos = 0;
>
> while (list < last) {
> - struct bloom_filter *filter = get_bloom_filter(ctx->r, *list);
> + struct bloom_filter *filter = get_bloom_filter(ctx->r, *list, 0);
> cur_pos += filter->len;
> hashwrite_be32(f, cur_pos);
> list++;
> @@ -1051,7 +1051,7 @@ static void write_graph_chunk_bloom_data(struct hashfile *f,
> hashwrite_be32(f, settings->bits_per_entry);
>
> while (first < last) {
> - struct bloom_filter *filter = get_bloom_filter(ctx->r, *first);
> + struct bloom_filter *filter = get_bloom_filter(ctx->r, *first, 0);
> hashwrite(f, filter->data, filter->len * sizeof(uint64_t));
> first++;
> }
O.K., so those two do not compute Bloom filters, but they are called
from write_commit_graph_file(), which in turn is called in
write_commit_graph() *after* running compute_bloom_filters().
Looks good to me, then.
> @@ -1218,7 +1218,7 @@ static void compute_bloom_filters(struct write_commit_graph_context *ctx)
>
> for (i = 0; i < ctx->commits.nr; i++) {
> struct commit *c = ctx->commits.list[i];
> - struct bloom_filter *filter = get_bloom_filter(ctx->r, c);
> + struct bloom_filter *filter = get_bloom_filter(ctx->r, c, 1);
> ctx->total_bloom_filter_size += sizeof(uint64_t) * filter->len;
> display_progress(progress, i + 1);
> }
All right, so compute_bloom_filters() ensures that it is actually
computed (if needed).
Looks good to me.
Regards,
--
Jakub Narębski
^ permalink raw reply
* Re: [PATCH v2 1/4] config: fix typo in variable name
From: Junio C Hamano @ 2020-01-09 19:07 UTC (permalink / raw)
To: Matthew Rogers via GitGitGadget; +Cc: git, Matthew Rogers
In-Reply-To: <b40480f03a5643761bd06d4b9c495a99b98a1ac8.1578565001.git.gitgitgadget@gmail.com>
"Matthew Rogers via GitGitGadget" <gitgitgadget@gmail.com> writes:
> From: Matthew Rogers <mattr94@gmail.com>
>
> In git config use of the end_null variable to determine if we should be
> null terminating our output. While it is correct to say a string is
> "null terminated" the character is actually the "nul" character, so this
> malapropism is being fixed.
>
> Signed-off-by: Matthew Rogers <mattr94@gmail.com>
> ---
> builtin/config.c | 10 +++++-----
> 1 file changed, 5 insertions(+), 5 deletions(-)
Obviously correct.
Malapropism is a new word in "git log" output in our history ;-)
> diff --git a/builtin/config.c b/builtin/config.c
> index 98d65bc0ad..52a904cfb1 100644
> --- a/builtin/config.c
> +++ b/builtin/config.c
> @@ -29,7 +29,7 @@ static int use_worktree_config;
> static struct git_config_source given_config_source;
> static int actions, type;
> static char *default_value;
> -static int end_null;
> +static int end_nul;
> static int respect_includes_opt = -1;
> static struct config_options config_options;
> static int show_origin;
> @@ -151,7 +151,7 @@ static struct option builtin_config_options[] = {
> OPT_CALLBACK_VALUE(0, "path", &type, N_("value is a path (file or directory name)"), TYPE_PATH),
> OPT_CALLBACK_VALUE(0, "expiry-date", &type, N_("value is an expiry date"), TYPE_EXPIRY_DATE),
> OPT_GROUP(N_("Other")),
> - OPT_BOOL('z', "null", &end_null, N_("terminate values with NUL byte")),
> + OPT_BOOL('z', "null", &end_nul, N_("terminate values with NUL byte")),
> OPT_BOOL(0, "name-only", &omit_values, N_("show variable names only")),
> OPT_BOOL(0, "includes", &respect_includes_opt, N_("respect include directives on lookup")),
> OPT_BOOL(0, "show-origin", &show_origin, N_("show origin of config (file, standard input, blob, command line)")),
> @@ -178,11 +178,11 @@ static void check_argc(int argc, int min, int max)
>
> static void show_config_origin(struct strbuf *buf)
> {
> - const char term = end_null ? '\0' : '\t';
> + const char term = end_nul ? '\0' : '\t';
>
> strbuf_addstr(buf, current_config_origin_type());
> strbuf_addch(buf, ':');
> - if (end_null)
> + if (end_nul)
> strbuf_addstr(buf, current_config_name());
> else
> quote_c_style(current_config_name(), buf, NULL, 0);
> @@ -678,7 +678,7 @@ int cmd_config(int argc, const char **argv, const char *prefix)
> config_options.git_dir = get_git_dir();
> }
>
> - if (end_null) {
> + if (end_nul) {
> term = '\0';
> delim = '\n';
> key_delim = '\n';
^ permalink raw reply
* Re: [PATCH v2 2/4] config: fix config scope enum
From: Junio C Hamano @ 2020-01-09 19:06 UTC (permalink / raw)
To: Matthew Rogers via GitGitGadget; +Cc: git, Matthew Rogers
In-Reply-To: <e8e05f39407365a1bf5008820267d362e0cbffd6.1578565001.git.gitgitgadget@gmail.com>
"Matthew Rogers via GitGitGadget" <gitgitgadget@gmail.com> writes:
> From: Matthew Rogers <mattr94@gmail.com>
>
> Previously when iterating through git config variables, worktree config
> and local config were both considered "CONFIG_SCOPE_REPO". This was
> never a problem before as no one had needed to differentiate between the
> two cases.
Hmph, then "fix" on the title is a bit misleading, no?
The enum may not have been as fine grained as you would have liked,
but if there was nothing broken, then we are doing this not to "fix"
anything.
A more important thing to explain would probably be why we
(i.e. those who propose this change, those who give favoriable
reviews to it, and those who accept it change to the system) would
want to see finer-grained classification. What do we expect to be
able to do with the resulting finer-grained set that we wouldn't be
able to with the original, and why is it a good thing? That is what
readers of the proposed log message of this change would want to
see, I would think.
> Additionally we rename what was CONFIG_SCOPE_REPO to CONFIG_SCOPE_LOCAL
> to reflect its new, more specific meaning.
>
> The clients of 'current_config_scope()' who cared about
> CONFIG_SCOPE_REPO are also modified to similarly care about
> CONFIG_SCOPE_WORKTREE and CONFIG_SCOPE_LOCAL to preserve previous behavior.
>
> Signed-off-by: Matthew Rogers <mattr94@gmail.com>
> ---
> config.c | 7 ++-----
> config.h | 3 ++-
> remote.c | 3 ++-
> t/helper/test-config.c | 4 +++-
> upload-pack.c | 3 ++-
> 5 files changed, 11 insertions(+), 9 deletions(-)
>
> diff --git a/config.c b/config.c
> index d75f88ca0c..447a013a15 100644
> --- a/config.c
> +++ b/config.c
> @@ -1724,15 +1724,12 @@ static int do_git_config_sequence(const struct config_options *opts,
> if (user_config && !access_or_die(user_config, R_OK, ACCESS_EACCES_OK))
> ret += git_config_from_file(fn, user_config, data);
>
> - current_parsing_scope = CONFIG_SCOPE_REPO;
> + current_parsing_scope = CONFIG_SCOPE_LOCAL;
> if (!opts->ignore_repo && repo_config &&
> !access_or_die(repo_config, R_OK, 0))
> ret += git_config_from_file(fn, repo_config, data);
>
> - /*
> - * Note: this should have a new scope, CONFIG_SCOPE_WORKTREE.
> - * But let's not complicate things before it's actually needed.
> - */
> + current_parsing_scope = CONFIG_SCOPE_WORKTREE;
> if (!opts->ignore_worktree && repository_format_worktree_config) {
> char *path = git_pathdup("config.worktree");
> if (!access_or_die(path, R_OK, 0))
> diff --git a/config.h b/config.h
> index 91fd4c5e96..284d92fb0e 100644
> --- a/config.h
> +++ b/config.h
> @@ -298,7 +298,8 @@ enum config_scope {
> CONFIG_SCOPE_UNKNOWN = 0,
> CONFIG_SCOPE_SYSTEM,
> CONFIG_SCOPE_GLOBAL,
> - CONFIG_SCOPE_REPO,
> + CONFIG_SCOPE_LOCAL,
> + CONFIG_SCOPE_WORKTREE,
> CONFIG_SCOPE_CMDLINE,
> };
>
> diff --git a/remote.c b/remote.c
> index 5c4666b53a..593ce297ed 100644
> --- a/remote.c
> +++ b/remote.c
> @@ -369,7 +369,8 @@ static int handle_config(const char *key, const char *value, void *cb)
> }
> remote = make_remote(name, namelen);
> remote->origin = REMOTE_CONFIG;
> - if (current_config_scope() == CONFIG_SCOPE_REPO)
> + if (current_config_scope() == CONFIG_SCOPE_LOCAL ||
> + current_config_scope() == CONFIG_SCOPE_WORKTREE)
> remote->configured_in_repo = 1;
> if (!strcmp(subkey, "mirror"))
> remote->mirror = git_config_bool(key, value);
> diff --git a/t/helper/test-config.c b/t/helper/test-config.c
> index 214003d5b2..6695e463eb 100644
> --- a/t/helper/test-config.c
> +++ b/t/helper/test-config.c
> @@ -44,8 +44,10 @@ static const char *scope_name(enum config_scope scope)
> return "system";
> case CONFIG_SCOPE_GLOBAL:
> return "global";
> - case CONFIG_SCOPE_REPO:
> + case CONFIG_SCOPE_LOCAL:
> return "repo";
> + case CONFIG_SCOPE_WORKTREE:
> + return "worktree";
> case CONFIG_SCOPE_CMDLINE:
> return "cmdline";
> default:
> diff --git a/upload-pack.c b/upload-pack.c
> index a00d7ece6b..c53249cac1 100644
> --- a/upload-pack.c
> +++ b/upload-pack.c
> @@ -1073,7 +1073,8 @@ static int upload_pack_config(const char *var, const char *value, void *unused)
> precomposed_unicode = git_config_bool(var, value);
> }
>
> - if (current_config_scope() != CONFIG_SCOPE_REPO) {
> + if (current_config_scope() != CONFIG_SCOPE_LOCAL &&
> + current_config_scope() != CONFIG_SCOPE_WORKTREE) {
> if (!strcmp("uploadpack.packobjectshook", var))
> return git_config_string(&pack_objects_hook, var, value);
> }
^ permalink raw reply
* Re: git submodule update strange output behavior.
From: Junio C Hamano @ 2020-01-09 18:54 UTC (permalink / raw)
To: Carlo Wood; +Cc: git
In-Reply-To: <20200109192040.46aaa01e@hikaru>
Carlo Wood <carlo@alinoe.com> writes:
> In a project containing submodules, one of the submodules
> contains a submodule itself, which in turn also contains
> a submodule.
>
> Overview:
>
> project/foobar [submodule]
> project/cwm4 [submodule]
> project/evio [submodule]
> project/evio/protocol/matrixssl [submodule]
> project/evio/protocol/matrixssl/cwm4 [submodule]
>
> ('protocol' is a normal subdirectory)
>
> Running (with or without the --quiet),
>
> $ git submodule --quiet update --init --recursive --remote
> Fetching submodule protocol/matrixssl
> Fetching submodule protocol/matrixssl/cwm4
> Fetching submodule cwm4
>
> This is odd (a bug imho) because
>
> 1) it seems to only print this fetching information for submodules inside submodules,
> not for the top-level submodules.
> 2) it even prints this when using --quiet
> 3) it prints this every time (also when there is nothing more to fetch).
Sounds like a symptom of (a) the top-level "git submodule update"
knowing how to react to "--quiet" but (b) it forgets to pass down
the "--quiet" when it recursively runs "git submodule update" in its
submodules?
^ permalink raw reply
* git submodule update strange output behavior.
From: Carlo Wood @ 2020-01-09 18:20 UTC (permalink / raw)
To: git
In a project containing submodules, one of the submodules
contains a submodule itself, which in turn also contains
a submodule.
Overview:
project/foobar [submodule]
project/cwm4 [submodule]
project/evio [submodule]
project/evio/protocol/matrixssl [submodule]
project/evio/protocol/matrixssl/cwm4 [submodule]
('protocol' is a normal subdirectory)
Running (with or without the --quiet),
$ git submodule --quiet update --init --recursive --remote
Fetching submodule protocol/matrixssl
Fetching submodule protocol/matrixssl/cwm4
Fetching submodule cwm4
This is odd (a bug imho) because
1) it seems to only print this fetching information for submodules inside submodules,
not for the top-level submodules.
2) it even prints this when using --quiet
3) it prints this every time (also when there is nothing more to fetch).
--
Carlo Wood <carlo@alinoe.com>
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox