Git development

Git development
 help / color / mirror / Atom feed

* Re: What's the recommendation for forgetting all rerere's records?
From: Junio C Hamano @ 2023-12-08 23:19 UTC (permalink / raw)
  To: Sean Allred; +Cc: git
In-Reply-To: <m0y1e7kkft.fsf@epic96565.epic.com>

Sean Allred <allred.sean@gmail.com> writes:

> When outside the context of a conflict (no rebase/merge in progress),
> what's the intended workflow to clear out the contents of
> $GIT_DIR/rr-cache?

You could "rm -fr .git/rr-cache/??*" if you want.

The "intended" workflow is there will no need to "clear out" at all;
you may notice mistaken resolution, and you will remove it when you
notice one, but the bulk of the remaining database entries ought to
be still correct.

^ permalink raw reply

* Re: [PATCH 4/7] builtin/clone: fix bundle URIs with mismatching object formats
From: Junio C Hamano @ 2023-12-08 23:11 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git
In-Reply-To: <c7a9d6ef74ff39e660f80e2e104a96b7c875845d.1701863960.git.ps@pks.im>

Patrick Steinhardt <ps@pks.im> writes:

> This is not a problem in the context of the files backend ...
> It will become a problem though once
> we land the reftable backend, which indeed does require to know about
> the proper object format at the time of creation.

OK.  That answers the question I had on the previous step.

Thanks.

^ permalink raw reply

* Re: [PATCH 3/7] remote-curl: rediscover repository when fetching refs
From: Junio C Hamano @ 2023-12-08 23:09 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git
In-Reply-To: <a1b86a0cbbedcc6610b2c563e9e38d439338869d.1701863960.git.ps@pks.im>

Patrick Steinhardt <ps@pks.im> writes:

> We're about to change git-clone(1) so that we set up the reference
> database at a later point. This change will cause git-remote-curl(1) to
> not detect the repository anymore due to "HEAD" not having been created
> yet at the time it spawns, and thus causes it to error out once it is
> asked to fetch the references.
>
> We can address this issue by trying to re-discover the Git repository in
> case none was detected at startup time. With this change, the clone will
> look as following:
>
>   1. git-clone(1) sets up the initial repository, excluding the
>      reference database.
>
>   2. git-clone(1) spawns git-remote-curl(1), which will be unable to
>      detect the repository due to a missing "HEAD".
>
>   3. git-clone(1) asks git-remote-curl(1) to list remote references.
>      This works just fine as this step does not require a local
>      repository
>
>   4. git-clone(1) creates the reference database as it has now learned
>      about the object format.

Sorry, but I am not sure I understand this step.  I assume you mean
by "the object format" what hash function is used to index the
objects (which can be learned from the remote "origin" in step 2 and
we can choose to use the one they use), not what ref backend is used
(which is purely a local matter and we do not need to know what is
used at the "origin").  Why do we need to wait initializing ref
backend until we learn what hash is being in use?

>   5. git-clone(1) asks git-remote-curl(1) to fetch the remote packfile.
>      The latter notices that it doesn't have a repository available, but
>      it now knows to try and re-discover it.
>
> If the re-discovery succeeds in the last step we can continue with the
> clone.

OK.

^ permalink raw reply

* Re: [PATCH 3/9] imap-send: don't use git_die_config() inside callback
From: Taylor Blau @ 2023-12-08 22:58 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: Jeff King, git
In-Reply-To: <ZXGJE-pkb3BjlO-d@tanuki>

On Thu, Dec 07, 2023 at 09:57:55AM +0100, Patrick Steinhardt wrote:
> On Thu, Dec 07, 2023 at 02:24:58AM -0500, Jeff King wrote:
> [snip]
> > diff --git a/imap-send.c b/imap-send.c
> > index 996651e4f8..5b0fe4f95a 100644
> > --- a/imap-send.c
> > +++ b/imap-send.c
> > @@ -1346,7 +1346,7 @@ static int git_imap_config(const char *var, const char *val,
> >  		server.port = git_config_int(var, val, ctx->kvi);
> >  	else if (!strcmp("imap.host", var)) {
> >  		if (!val) {
> > -			git_die_config("imap.host", "Missing value for 'imap.host'");
> > +			return error("Missing value for 'imap.host'");
>
> Nit: while at it we might also mark this error for translation. Not
> worth a reroll on its own though.

This string goes away entirely in the next patch, so I don't think we
need to mark it here.

Thanks,
Taylor

^ permalink raw reply

* Re: [PATCH 1/7] setup: extract function to create the refdb
From: Junio C Hamano @ 2023-12-08 22:54 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: Karthik Nayak, git
In-Reply-To: <ZXFy0_T1AZLh058g@tanuki>

Patrick Steinhardt <ps@pks.im> writes:

> On Wed, Dec 06, 2023 at 10:10:37PM +0100, Karthik Nayak wrote:
>> On Wed, Dec 6, 2023 at 1:40 PM Patrick Steinhardt <ps@pks.im> wrote:
>> > +static void create_reference_database(const char *initial_branch, int quiet)
>> > +{
>> > +       struct strbuf err = STRBUF_INIT;
>> > +       int reinit = is_reinit();
>> > +
>> > +       /*
>> > +        * We need to create a "refs" dir in any case so that older
>> > +        * versions of git can tell that this is a repository.
>> > +        */
>> 
>> How does this work though, even if an earlier version of git can tell
>> that this is a repository, it still won't be able to read the reftable
>> backend. In that sense, what do we achieve here?
>
> This is a good question, and there is related ongoing discussion about
> this topic in the thread starting at [1]. There are a few benefits to
> letting clients discover such repos even if they don't understand the
> new reference backend format:
>
>   - They know to stop walking up the parent-directory chain. Otherwise a
>     client might end up detecting a Git repository in the parent dir.
>
>   - The user gets a proper error message why the repository cannot be
>     accessed. Instead of failing to detect the repository altogether we
>     instead say that we don't understand the "extensions.refFormat"
>     extension.

Yup, both are very good reasons.  Would it help to sneak a condensed
version of it in the in-code comment, perhaps?

^ permalink raw reply

* Re: [PATCH 1/9] config: reject bogus values for core.checkstat
From: Taylor Blau @ 2023-12-08 22:50 UTC (permalink / raw)
  To: Jeff King; +Cc: git
In-Reply-To: <20231207072404.GA1277973@coredump.intra.peff.net>

On Thu, Dec 07, 2023 at 02:24:04AM -0500, Jeff King wrote:
> If you feed nonsense config like:
>
>   git -c core.checkstat=foobar status
>
> we'll silently ignore the unknown value, rather than reporting an error.
> This goes all the way back to c08e4d5b5c (Enable minimal stat checking,
> 2013-01-22).
>
> Detecting and complaining now is technically a backwards-incompatible
> change, but I don't think anybody has any reason to use an invalid value
> here. There are no historical values we'd want to allow for backwards
> compatibility or anything like that. We are better off loudly telling
> the user that their config may not be doing what they expect.

I think this is a good instance of "yes, this is a backwards
incompatible change, but the behavior we're breaking is so obviously
broken already that it's not worth maintaining compatibility."

Well reasoned, I am definitely in favor here.

Thanks,
Taylor

^ permalink raw reply

* Re: [PATCH 09/24] repack: implement `--extend-disjoint` mode
From: Taylor Blau @ 2023-12-08 22:48 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Jeff King, Junio C Hamano
In-Reply-To: <ZXLRjeu66qE6J1K1@tanuki>

On Fri, Dec 08, 2023 at 09:19:25AM +0100, Patrick Steinhardt wrote:
> > Writing this out, I think that you could make an argument that
> > `--exclude-disjoint` is a better name for the last option. So I'm
> > definitely open to suggestions here, but I don't want to get too bogged
> > down on command-line option naming (so long as we're all reasonably
> > happy with the result).
>
> Yeah, as said, I don't mind it too much. It's a complex area and the
> flags all do different things, so it's expected that you may have to do
> some research on what exactly they do. That being said, I do like your
> proposed `--exclude-disjoint` a lot more than `--ignore-disjoint`.

I think that's fair, I renamed the option to be "--exclude-disjoint"
instead of "--ignore-disjoint" for any subsequent round(s) of this
series.

> > > One thing I wondered: do we need to consider the `-l` flag? When using
> > > an alternate object directory it is totally feasible that the alternate
> > > may be creating new disjoint packages without us knowing, and thus we
> > > may not be able to guarantee the disjoint property anymore.
> >
> > I don't think so. We'd only care about one direction of this (that
> > alternates do not create disjoint packs which overlap with ours, instead
> > of the other way around), but since we don't put non-local packs in the
> > MIDX, I think we're OK.
> >
> > I suppose that you might run into trouble if you use the chained MIDX
> > thing (via its `->next` pointer). I haven't used that feature myself, so
> > I'd have to play around with it.
>
> We do use this feature at GitLab for forks, where forks connect to a
> common alternate object directory to deduplicate objects. As both the
> fork repository and the alternate object directory use an MIDX I think
> they would be set up exactly like that.

Yep, that's right. I wasn't sure whether or not this feature had been
used extensively in production or not (we don't use it at GitHub, since
objects only live in their fork repositories for a short while before
moving to the fork network repository).

> I guess the only really viable solution here is to ignore disjoint packs
> in the main repo that connects to the alternate in the case where the
> alternate has any disjoint packs itself.

I think the behavior you'd get here is that we'd only look for disjoint
packs in the first MIDX in the chain (which is always the local one),
and we'd only recognizes packs from that MIDX as being potentially
disjoint.

If you have the bulk of your repositories in the alternate, then I think
you might want to consider how we combine the two. My sense is that
you'd want to be disjoint with respect to anything downstream of you.

Whether or not this is a feature that you/others need, I definitely
think we should leave it out of this series, since I am (a) fairly
certain that this is possible to do, and (b) already feel like this
series on its own is complicated enough.

Thanks,
Taylor

^ permalink raw reply

* [PATCH] revision: parse integer arguments to --max-count, --skip, etc., more carefully
From: Junio C Hamano @ 2023-12-08 22:35 UTC (permalink / raw)
  To: git; +Cc: Britton Kerin
In-Reply-To: <xmqqy1e41lf5.fsf@gitster.g>

The "rev-list" and other commands in the "log" family, being the
oldest part of the system, use their own custom argument parsers,
and integer values of some options are parsed with atoi(), which
allows a non-digit after the number (e.g., "1q") to be silently
ignored.  As a natural consequence, an argument that does not begin
with a digit (e.g., "q") silently becomes zero, too.

Switch to use strtol_i() and parse_timestamp() appropriately to
catch bogus input.

Note that one may naïvely expect that --max-count, --skip, etc., to
only take non-negative values, but we must allow them to also take
negative values, as an escape hatch to countermand a limit set by an
earlier option on the command line; the underlying variables are
initialized to (-1) and "--max-count=-1", for example, is a
legitimate way to reinitialize the limit.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 revision.c                 | 41 ++++++++++++++++++++++++++++----------
 t/t6005-rev-list-count.sh  | 15 +++++++++++++-
 t/t6009-rev-list-parent.sh | 11 ++++++++++
 3 files changed, 55 insertions(+), 12 deletions(-)

diff --git a/revision.c b/revision.c
index 0ae1c76db3..8cda6e43d4 100644
--- a/revision.c
+++ b/revision.c
@@ -2214,6 +2214,27 @@ static void add_message_grep(struct rev_info *revs, const char *pattern)
 	add_grep(revs, pattern, GREP_PATTERN_BODY);
 }
 
+static int parse_count(const char *arg)
+{
+	int count;
+
+	if (strtol_i(arg, 10, &count) < 0)
+		die("'%s': not an integer", arg);
+	return count;
+}
+
+static timestamp_t parse_age(const char *arg)
+{
+	timestamp_t num;
+	char *p;
+
+	errno = 0;
+	num = parse_timestamp(arg, &p, 10);
+	if (errno || *p || p == arg)
+		die("'%s': not a number of seconds since epoch", arg);
+	return num;
+}
+
 static int handle_revision_opt(struct rev_info *revs, int argc, const char **argv,
 			       int *unkc, const char **unkv,
 			       const struct setup_revision_opt* opt)
@@ -2240,29 +2261,27 @@ static int handle_revision_opt(struct rev_info *revs, int argc, const char **arg
 	}
 
 	if ((argcount = parse_long_opt("max-count", argv, &optarg))) {
-		revs->max_count = atoi(optarg);
+		revs->max_count = parse_count(optarg);
 		revs->no_walk = 0;
 		return argcount;
 	} else if ((argcount = parse_long_opt("skip", argv, &optarg))) {
-		revs->skip_count = atoi(optarg);
+		revs->skip_count = parse_count(optarg);
 		return argcount;
 	} else if ((*arg == '-') && isdigit(arg[1])) {
 		/* accept -<digit>, like traditional "head" */
-		if (strtol_i(arg + 1, 10, &revs->max_count) < 0 ||
-		    revs->max_count < 0)
-			die("'%s': not a non-negative integer", arg + 1);
+		revs->max_count = parse_count(arg + 1);
 		revs->no_walk = 0;
 	} else if (!strcmp(arg, "-n")) {
 		if (argc <= 1)
 			return error("-n requires an argument");
-		revs->max_count = atoi(argv[1]);
+		revs->max_count = parse_count(argv[1]);
 		revs->no_walk = 0;
 		return 2;
 	} else if (skip_prefix(arg, "-n", &optarg)) {
-		revs->max_count = atoi(optarg);
+		revs->max_count = parse_count(optarg);
 		revs->no_walk = 0;
 	} else if ((argcount = parse_long_opt("max-age", argv, &optarg))) {
-		revs->max_age = atoi(optarg);
+		revs->max_age = parse_age(optarg);
 		return argcount;
 	} else if ((argcount = parse_long_opt("since", argv, &optarg))) {
 		revs->max_age = approxidate(optarg);
@@ -2274,7 +2293,7 @@ static int handle_revision_opt(struct rev_info *revs, int argc, const char **arg
 		revs->max_age = approxidate(optarg);
 		return argcount;
 	} else if ((argcount = parse_long_opt("min-age", argv, &optarg))) {
-		revs->min_age = atoi(optarg);
+		revs->min_age = parse_age(optarg);
 		return argcount;
 	} else if ((argcount = parse_long_opt("before", argv, &optarg))) {
 		revs->min_age = approxidate(optarg);
@@ -2362,11 +2381,11 @@ static int handle_revision_opt(struct rev_info *revs, int argc, const char **arg
 	} else if (!strcmp(arg, "--no-merges")) {
 		revs->max_parents = 1;
 	} else if (skip_prefix(arg, "--min-parents=", &optarg)) {
-		revs->min_parents = atoi(optarg);
+		revs->min_parents = parse_count(optarg);
 	} else if (!strcmp(arg, "--no-min-parents")) {
 		revs->min_parents = 0;
 	} else if (skip_prefix(arg, "--max-parents=", &optarg)) {
-		revs->max_parents = atoi(optarg);
+		revs->max_parents = parse_count(optarg);
 	} else if (!strcmp(arg, "--no-max-parents")) {
 		revs->max_parents = -1;
 	} else if (!strcmp(arg, "--boundary")) {
diff --git a/t/t6005-rev-list-count.sh b/t/t6005-rev-list-count.sh
index 0729f800c3..62538f5a02 100755
--- a/t/t6005-rev-list-count.sh
+++ b/t/t6005-rev-list-count.sh
@@ -18,13 +18,23 @@ test_expect_success 'no options' '
 '
 
 test_expect_success '--max-count' '
+	test_must_fail git rev-list --max-count=1q HEAD 2>error &&
+	grep "not an integer" error &&
+
 	test_stdout_line_count = 0 git rev-list HEAD --max-count=0 &&
 	test_stdout_line_count = 3 git rev-list HEAD --max-count=3 &&
 	test_stdout_line_count = 5 git rev-list HEAD --max-count=5 &&
-	test_stdout_line_count = 5 git rev-list HEAD --max-count=10
+	test_stdout_line_count = 5 git rev-list HEAD --max-count=10 &&
+	test_stdout_line_count = 5 git rev-list HEAD --max-count=-1
 '
 
 test_expect_success '--max-count all forms' '
+	test_must_fail git rev-list -1q HEAD 2>error &&
+	grep "not an integer" error &&
+	test_must_fail git rev-list --1 HEAD &&
+	test_must_fail git rev-list -n 1q HEAD 2>error &&
+	grep "not an integer" error &&
+
 	test_stdout_line_count = 1 git rev-list HEAD --max-count=1 &&
 	test_stdout_line_count = 1 git rev-list HEAD -1 &&
 	test_stdout_line_count = 1 git rev-list HEAD -n1 &&
@@ -32,6 +42,9 @@ test_expect_success '--max-count all forms' '
 '
 
 test_expect_success '--skip' '
+	test_must_fail git rev-list --skip 1q HEAD 2>error &&
+	grep "not an integer" error &&
+
 	test_stdout_line_count = 5 git rev-list HEAD --skip=0 &&
 	test_stdout_line_count = 2 git rev-list HEAD --skip=3 &&
 	test_stdout_line_count = 0 git rev-list HEAD --skip=5 &&
diff --git a/t/t6009-rev-list-parent.sh b/t/t6009-rev-list-parent.sh
index 5a67bbc760..9c9a8459af 100755
--- a/t/t6009-rev-list-parent.sh
+++ b/t/t6009-rev-list-parent.sh
@@ -62,6 +62,17 @@ test_expect_success 'setup roots, merges and octopuses' '
 	git checkout main
 '
 
+test_expect_success 'parse --max-parents & --min-parents' '
+	test_must_fail git rev-list --max-parents=1q HEAD 2>error &&
+	grep "not an integer" error &&
+
+	test_must_fail git rev-list --min-parents=1q HEAD 2>error &&
+	grep "not an integer" error &&
+
+	git rev-list --max-parents=1 --min-parents=1 HEAD &&
+	git rev-list --max-parents=-1 --min-parents=-1 HEAD
+'
+
 test_expect_success 'rev-list roots' '
 
 	check_revlist "--max-parents=0" one five
-- 
2.43.0


^ permalink raw reply related

* Re: Minor UX annoyance w/`git add --patch untracked/file`
From: Taylor Blau @ 2023-12-08 22:29 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jeff King, Vito Caputo, git
In-Reply-To: <xmqqmsuk1jvp.fsf@gitster.g>

On Sat, Dec 09, 2023 at 06:09:46AM +0900, Junio C Hamano wrote:
> Jeff King <peff@peff.net> writes:
>
> > They _could_, but keep in mind that the argument is not strictly a path.
> > It is a pathspec that may match multiple paths. So:
> >
> >   git add -p path/to/
> >
> > for example will pick up the tracked files in path/to/, but not your
> > untracked one.
>
> The corresponding command w/o "-p", i.e., "git add path/to/", will
> pick up both tracked and untracked ones from the named directory,
> while honoring the ignore settings.  So I suspect it might feel more
> natural if "-p" followed suit.

I tend to agree. I do think that the full specification of when "git add
-p" implies "git add -N ... && git add -p" would be difficult to explain
to users.

But I think it's a worthwhile trade-off in that it makes the UX more
consistent in the common case (where the argument to add is a literal
path, not a pathspec).

Thanks,
Taylor

^ permalink raw reply

* Re: [PATCH v2 00/11] reftable: small set of fixes
From: Taylor Blau @ 2023-12-08 22:26 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Han-Wen Nienhuys, Jonathan Nieder
In-Reply-To: <cover.1702047081.git.ps@pks.im>

On Fri, Dec 08, 2023 at 03:52:53PM +0100, Patrick Steinhardt wrote:
>  reftable/block.c          |  23 ++++-----
>  reftable/block.h          |   6 +++
>  reftable/block_test.c     |   4 +-
>  reftable/blocksource.c    |   2 +-
>  reftable/iter.h           |   8 +--
>  reftable/merged.c         |  31 +++++------
>  reftable/merged.h         |   2 +
>  reftable/reader.c         |   7 ++-
>  reftable/readwrite_test.c |   6 +--
>  reftable/stack.c          |  73 +++++++++++---------------
>  reftable/stack_test.c     | 105 +++++++++++++++++++++++++++++++++++++-
>  reftable/test_framework.h |  58 +++++++++++----------
>  12 files changed, 211 insertions(+), 114 deletions(-)

This all looks good to me. I gave this a pretty careful read and added a
couple of minor suggestions, but nothing that would indicate the need
for a re-roll.

Thanks for working on this!

Thanks,
Taylor

^ permalink raw reply

* Re: [PATCH v2 07/11] reftable/stack: fix stale lock when dying
From: Taylor Blau @ 2023-12-08 22:24 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Han-Wen Nienhuys, Jonathan Nieder
In-Reply-To: <e82a68aecd0a1179df3a59755864c71995e979d3.1702047081.git.ps@pks.im>

On Fri, Dec 08, 2023 at 03:53:23PM +0100, Patrick Steinhardt wrote:
> When starting a transaction via `reftable_stack_init_addition()`, we
> create a lockfile for the reftable stack itself which we'll write the
> new list of tables to. But if we terminate abnormally e.g. via a call to
> `die()`, then we do not remove the lockfile. Subsequent executions of
> Git which try to modify references will thus fail with an out-of-date
> error.
>
> Fix this bug by registering the lock as a `struct tempfile`, which
> ensures automatic cleanup for us.

Makes sense.

> @@ -475,7 +471,7 @@ static int reftable_stack_init_addition(struct reftable_addition *add,
>  		goto done;
>  	}
>  	if (st->config.default_permissions) {
> -		if (chmod(add->lock_file_name.buf, st->config.default_permissions) < 0) {
> +		if (chmod(lock_file_name.buf, st->config.default_permissions) < 0) {

Hmm. Would we want to use add->lock_file->filename.buf here instead? I
don't think that it matters (other than that the lockfile's pathname is
absolute). But it arguably makes it clearer to readers that we're
touching calling chmod() on the lockfile here, and hardens us against
the contents of the temporary strbuf changing.

Thanks,
Taylor

^ permalink raw reply

* Re: [PATCH v2 06/11] reftable/stack: reuse buffers when reloading stack
From: Taylor Blau @ 2023-12-08 22:17 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Han-Wen Nienhuys, Jonathan Nieder
In-Reply-To: <f797feff8dec383f1db9ae403cd89b80d1743432.1702047081.git.ps@pks.im>

On Fri, Dec 08, 2023 at 03:53:18PM +0100, Patrick Steinhardt wrote:
> In `reftable_stack_reload_once()` we iterate over all the tables added
> to the stack in order to figure out whether any of the tables needs to
> be reloaded. We use a set of buffers in this context to compute the
> paths of these tables, but discard those buffers on every iteration.
> This is quite wasteful given that we do not need to transfer ownership
> of the allocated buffer outside of the loop.
>
> Refactor the code to instead reuse the buffers to reduce the number of
> allocations we need to do.

> @@ -267,16 +265,13 @@ static int reftable_stack_reload_once(struct reftable_stack *st, char **names,
>  	for (i = 0; i < cur_len; i++) {
>  		if (cur[i]) {
>  			const char *name = reader_name(cur[i]);
> -			struct strbuf filename = STRBUF_INIT;
> -			stack_filename(&filename, st, name);
> +			stack_filename(&table_path, st, name);

This initially caught me by surprise, but on closer inspection I agree
that this is OK, since stack_filename() calls strbuf_reset() before
adjusting the buffer contents.

(As a side-note, I do find the side-effect of stack_filename() to be a
little surprising, but that's not the fault of this series and not worth
changing here.)

Thanks,
Taylor

^ permalink raw reply

* Re: [PATCH v2 05/11] reftable/stack: perform auto-compaction with transactional interface
From: Taylor Blau @ 2023-12-08 22:14 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Han-Wen Nienhuys, Jonathan Nieder
In-Reply-To: <77b9ae8aa675dd96dd10f4a5369f1f994fa59939.1702047081.git.ps@pks.im>

On Fri, Dec 08, 2023 at 03:53:14PM +0100, Patrick Steinhardt wrote:
> [...] It can thus happen quite fast that the stack grows very long, which
> results in performance issues when trying to read records.

This sentence was a little confusing to read at first glance. Perhaps
instead:

    The stack can grow to become quite long rather quickly, leading to
    performance issues when trying to read records.

> But besides
> performance issues, this can also lead to exhaustion of file descriptors
> very rapidly as every single table requires a separate descriptor when
> opening the stack.

Well explained, thanks. Everything else here looks good to me.

Thanks,
Taylor

^ permalink raw reply

* Re: [PATCH v2 04/11] reftable/stack: verify that `reftable_stack_add()` uses auto-compaction
From: Taylor Blau @ 2023-12-08 21:35 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Han-Wen Nienhuys, Jonathan Nieder
In-Reply-To: <8061b9d2fcb3e8c3d1fd641e705b9a8879e452f4.1702047081.git.ps@pks.im>

On Fri, Dec 08, 2023 at 03:53:10PM +0100, Patrick Steinhardt wrote:
> diff --git a/reftable/stack_test.c b/reftable/stack_test.c
> index 0644c8ad2e..c979d177c2 100644
> --- a/reftable/stack_test.c
> +++ b/reftable/stack_test.c
> @@ -850,6 +850,52 @@ static void test_reftable_stack_auto_compaction(void)
>  	clear_dir(dir);
>  }
>
> +static void test_reftable_stack_add_performs_auto_compaction(void)
> +{
> +	struct reftable_write_options cfg = { 0 };
> +	struct reftable_stack *st = NULL;
> +	char *dir = get_tmp_dir(__LINE__);
> +	int err, i, n = 20;
> +
> +	err = reftable_new_stack(&st, dir, cfg);
> +	EXPECT_ERR(err);
> +
> +	for (i = 0; i <= n; i++) {
> +		struct reftable_ref_record ref = {
> +			.update_index = reftable_stack_next_update_index(st),
> +			.value_type = REFTABLE_REF_SYMREF,
> +			.value.symref = "master",
> +		};
> +		char name[100];
> +
> +		/*
> +		 * Disable auto-compaction for all but the last runs. Like this
> +		 * we can ensure that we indeed honor this setting and have
> +		 * better control over when exactly auto compaction runs.
> +		 */
> +		st->disable_auto_compact = i != n;
> +
> +		snprintf(name, sizeof(name), "branch%04d", i);
> +		ref.refname = name;

Is there a reason that we have to use snprintf() here and not a strbuf?

I would have expected to see something like:

    struct strbuf buf = STRBUF_INIT;
    /* ... */
    strbuf_addf(&buf, "branch%04d", i);
    ref.refname = strbuf_detach(&buf, NULL);

I guess it doesn't matter too much, but I think if we can avoid using
snprintf(), it's worth doing. If we must use snprintf() here, we should
probably use Git's xsnprintf() instead.

> +		err = reftable_stack_add(st, &write_test_ref, &ref);
> +		EXPECT_ERR(err);
> +
> +		/*
> +		 * The stack length should grow continuously for all runs where
> +		 * auto compaction is disabled. When enabled, we should merge
> +		 * all tables in the stack.
> +		 */
> +		if (i != n)
> +			EXPECT(st->merged->stack_len == i + 1);
> +		else
> +			EXPECT(st->merged->stack_len == 1);

You could shorten this to

    EXPECT(st->merged->stack_len == (i == n ? 1 : i + 1);

But I like the version that you wrote here better, because it clearly
indicates when we should and should not perform compaction.

Thanks,
Taylor

^ permalink raw reply

* Re: [PATCH v2 02/11] reftable: handle interrupted reads
From: Taylor Blau @ 2023-12-08 21:30 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Han-Wen Nienhuys, Jonathan Nieder
In-Reply-To: <b404fdf066e802328bcbcefeb9da7c996738f840.1702047081.git.ps@pks.im>

On Fri, Dec 08, 2023 at 03:53:02PM +0100, Patrick Steinhardt wrote:
> There are calls to pread(3P) and read(3P) where we don't properly handle
> interrupts. Convert them to use `pread_in_full()` and `read_in_full()`,

Just checking... do you mean "interrupt" in the kernel sense? Or are you
referring to the possibility of short reads/writes (in later patches)?

Thanks,
Taylor

^ permalink raw reply

* Re: [PATCH] gitk: add setting to hide unknown refs
From: Junio C Hamano @ 2023-12-08 21:13 UTC (permalink / raw)
  To: Joachim B Haga via GitGitGadget; +Cc: git, Joachim B Haga
In-Reply-To: <pull.1619.git.1701695899635.gitgitgadget@gmail.com>

"Joachim B Haga via GitGitGadget" <gitgitgadget@gmail.com> writes:

> This patch adds a setting to gitk to exclude all unknown refs - which
> is considerably simpler than trying to honour the `excludeDecoration`
> pattern.

"This was simpler to implement" is a one-time cost savings for the
developer who added the feature.  For that one-time cost savings,
all the current and future users will pay the price of inconsistent
behaviour between "gitk" and "git log".

It does not look like a good trade-off.

^ permalink raw reply

* Re: Minor UX annoyance w/`git add --patch untracked/file`
From: Junio C Hamano @ 2023-12-08 21:09 UTC (permalink / raw)
  To: Jeff King; +Cc: Vito Caputo, git
In-Reply-To: <20231206195407.GD103708@coredump.intra.peff.net>

Jeff King <peff@peff.net> writes:

> They _could_, but keep in mind that the argument is not strictly a path.
> It is a pathspec that may match multiple paths. So:
>
>   git add -p path/to/
>
> for example will pick up the tracked files in path/to/, but not your
> untracked one.

The corresponding command w/o "-p", i.e., "git add path/to/", will
pick up both tracked and untracked ones from the named directory,
while honoring the ignore settings.  So I suspect it might feel more
natural if "-p" followed suit.

Not that I feel strongly either way.  The command has only worked
with already tracked files since its inception and nobody complained
in the past 15 years or so, probably because nobody cared that much
for relatively rare event of creating a new file and adding it.

Thanks.

^ permalink raw reply

* Re: [PATCH] setup: recognize bare repositories with packed-refs
From: Taylor Blau @ 2023-12-08 21:09 UTC (permalink / raw)
  To: Jeff King; +Cc: Adam Majer, git
In-Reply-To: <20231206210836.GA106480@coredump.intra.peff.net>

On Wed, Dec 06, 2023 at 04:08:36PM -0500, Jeff King wrote:
> On Wed, Nov 29, 2023 at 04:30:46PM -0500, Taylor Blau wrote:
>
> > On Tue, Nov 28, 2023 at 02:04:46PM -0500, Jeff King wrote:
> > >   - whatever is consuming the embedded repos could "mkdir -p refs
> > >     objects" as needed. This is a minor pain, but I think in the long
> > >     term we are moving to a world where you have to explicitly do
> > >     "GIT_DIR=$PWD/embedded.git" to access an embedded bare repo. So
> > >     they're already special and require some setup; adding an extra step
> > >     may not be so bad.
> >
> > I hope not. I suppose that using embedded bare repositories in a test
> > requires additional setup at least to "cd" into the directory (if they
> > are not using `$GIT_DIR` or `--git-dir` already). But I fear that
> > imposing even a small change like this is too tall an order for how many
> > millions of these exist in the wild across all sorts of projects.
>
> I dunno. I am skeptical that there are millions of these. Who really
> wants to embed bare git repos except for projects related to Git itself,
> which want test vectors? Is there a use case I'm missing?

Just picking on GitHub as an example, my copy has a fair number of
embedded bare repositories:

    $ find . -mindepth 2 -type d -name '*.git' | wc -l
    279

That might be an unfair example in general, since GitHub probably has a
greater need to embed bare repositories than most other projects. But I
think that we shouldn't make our decision here based on volume of
embedded bare repositories, but rather on the number of projects which
have >1 embedded bare repository.

In other words, the cost of migrating a single project's embedded bare
repositories is roughly the same whether there are 1 or 279 of them. So
the effort scales with the number of projects, not repositories.

Perhaps I'm over-estimating how difficult this transition would be to
impose on users. But it does make me very leery to make this kind of a
change without having a better sense of how many of them exist in the
wild.

Searching just on GitHub for `path:**/*.git/config` [^1], it looks like
there are ~1,400 results. That provides us an upper-bound on the number
of projects which have embedded bare repositories, so perhaps I really
am overestimating the burden we'd be imposing on other projects.

I dunno :-).

Thanks,
Taylor

[^1]: Searching for "path:**/*.git" doesn't quite work, since GitHub's
  search doesn't match directories here. So the search I actually used
  isn't perfect, but it should give us a rough approximation.

^ permalink raw reply

* Re: [Outreachy][PATCH v3] t2400: avoid using pipes
From: Junio C Hamano @ 2023-12-08 21:00 UTC (permalink / raw)
  To: Achu Luma; +Cc: christian.couder, git
In-Reply-To: <20231204153740.2992-1-ach.lumap@gmail.com>

Achu Luma <ach.lumap@gmail.com> writes:

> Subject: Re: [Outreachy][PATCH v3] t2400: avoid using pipes

"avoid using pipes" is a means to an end.  And it is more important
to tell readers what that "end" is.  With this patch, what are we
trying to achieve?  Cater to platforms that lack pipes?  Help
platforms that cannot run two processes at the same time, so let one
run and store the result in a file, and then let the other one run,
to reduce the CPU load?

If we run a "git" command, especially a command we are testing, on
the upstream side of a pipe, we lose information.  We cannot tell
what exit status the command exited with.  That is what we care
about.

So, it is better to say that in the title, e.g.,

    Subject: [PATCH] t2400: avoid losing exit status to pipes

> The exit code of the preceding command in a pipe is disregarded,
> so it's advisable to refrain from relying on it.

It is unclear what "it" refers to here.  We cannot rely on the exit
code of the command on the upstream side of a pipe, obviously.

> Instead, by
> saving the output of a Git command to a file, we gain the
> ability to examine the exit codes of both commands separately.

Surely.  I personally think that the title that says what the
purpose of the patch is clearly should be sufficient without any
further description in the body, though.
>
> Signed-off-by: Achu Luma <ach.lumap@gmail.com>
> ---
>  Since v2 I don't send a cover  letter anymore, and I changed 
>  my "Signed-of-by: ..." line so that it
>  contains my full real name and I added "Outreachy" to the subject.

Nicely done.

>
>  t/t2400-worktree-add.sh | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/t/t2400-worktree-add.sh b/t/t2400-worktree-add.sh
> index df4aff7825..7ead05bb98 100755
> --- a/t/t2400-worktree-add.sh
> +++ b/t/t2400-worktree-add.sh
> @@ -468,7 +468,8 @@ test_expect_success 'put a worktree under rebase' '
>  		cd under-rebase &&
>  		set_fake_editor &&
>  		FAKE_LINES="edit 1" git rebase -i HEAD^ &&
> -		git worktree list | grep "under-rebase.*detached HEAD"
> +		git worktree list >actual && 
> +		grep "under-rebase.*detached HEAD" actual
>  	)
>  '
>  
> @@ -509,7 +510,8 @@ test_expect_success 'checkout a branch under bisect' '
>  		git bisect start &&
>  		git bisect bad &&
>  		git bisect good HEAD~2 &&
> -		git worktree list | grep "under-bisect.*detached HEAD" &&
> +		git worktree list >actual && 
> +		grep "under-bisect.*detached HEAD" actual &&
>  		test_must_fail git worktree add new-bisect under-bisect &&
>  		! test -d new-bisect
>  	)

^ permalink raw reply

* Re: [BUG] rev-list doesn't validate arguments to -n option
From: Junio C Hamano @ 2023-12-08 20:36 UTC (permalink / raw)
  To: git; +Cc: Britton Kerin
In-Reply-To: <CAC4O8c-nuOTS=a0sVp1603KaM2bZjs+yNZzdAaa5CGTNGFE7hQ@mail.gmail.com>

Britton Kerin <britton.kerin@gmail.com> writes:

> It tolerates non-numeric arguments and garbage after a number:
>
> For example:
>
> $ # -n 1 means same as -n 0:
> $ git rev-list -n q newest_commit
> $ git rev-list -n 0 newest_commit
> $ # Garbage after number is tolerated:
> $ git rev-list -n 1q newest_commit
> 3be33f83695088d968cf084a1a08bdcde25a8d7a
> $ git rev-list -n 2q newest_commit
> 3be33f83695088d968cf084a1a08bdcde25a8d7a
> 286e62e1b68e39334978e6222cbff187ecc17bcf

Indeed, unlike the option parsing for most commands, "rev-list" and
"log" family of commands in the oldest part of the system still use
hand-crafted option parsing and most notably use atoi() instead of a
more robust strtol() family of functions.  The same issue is present
for parsing things like "--max-count=1q" and "--skip=1q".

Perhaps a change along this line, plus a few new tests, would
suffice.  There are others (like "--max-age" we see in the post
context of the patch) that use atoi() but they probably should not
share the same machinery (most importantly, the type of the value)
as "--max-count", so I didn't touch it in the below.

 revision.c | 21 ++++++++++++++-------
 1 file changed, 14 insertions(+), 7 deletions(-)

diff --git c/revision.c w/revision.c
index 00d5c29bfc..99e838bbd1 100644
--- c/revision.c
+++ w/revision.c
@@ -2223,6 +2223,15 @@ static void add_message_grep(struct rev_info *revs, const char *pattern)
 	add_grep(revs, pattern, GREP_PATTERN_BODY);
 }
 
+static int parse_count(const char *arg)
+{
+	int count;
+
+	if (strtol_i(arg, 10, &count) < 0 || count < 0)
+		die("'%s': not a non-negative integer", arg);
+	return count;
+}
+
 static int handle_revision_opt(struct rev_info *revs, int argc, const char **argv,
 			       int *unkc, const char **unkv,
 			       const struct setup_revision_opt* opt)
@@ -2249,26 +2258,24 @@ static int handle_revision_opt(struct rev_info *revs, int argc, const char **arg
 	}
 
 	if ((argcount = parse_long_opt("max-count", argv, &optarg))) {
-		revs->max_count = atoi(optarg);
+		revs->max_count = parse_count(optarg);
 		revs->no_walk = 0;
 		return argcount;
 	} else if ((argcount = parse_long_opt("skip", argv, &optarg))) {
-		revs->skip_count = atoi(optarg);
+		revs->skip_count = parse_count(optarg);
 		return argcount;
 	} else if ((*arg == '-') && isdigit(arg[1])) {
 		/* accept -<digit>, like traditional "head" */
-		if (strtol_i(arg + 1, 10, &revs->max_count) < 0 ||
-		    revs->max_count < 0)
-			die("'%s': not a non-negative integer", arg + 1);
+		revs->max_count = parse_count(arg + 1);
 		revs->no_walk = 0;
 	} else if (!strcmp(arg, "-n")) {
 		if (argc <= 1)
 			return error("-n requires an argument");
-		revs->max_count = atoi(argv[1]);
+		revs->max_count = parse_count(argv[1]);
 		revs->no_walk = 0;
 		return 2;
 	} else if (skip_prefix(arg, "-n", &optarg)) {
-		revs->max_count = atoi(optarg);
+		revs->max_count = parse_count(optarg);
 		revs->no_walk = 0;
 	} else if ((argcount = parse_long_opt("max-age", argv, &optarg))) {
 		revs->max_age = atoi(optarg);

^ permalink raw reply related

* Re: getting git send-email patches from someone who is behind
From: Sandra Snan @ 2023-12-08 18:57 UTC (permalink / raw)
  To: Konstantin Ryabitsev; +Cc: git
In-Reply-To: <20231208-amusing-vengeful-raptor-e71a8b@meerkat>

Konstantin Ryabitsev <konstantin@linuxfoundation.org> writes:
> Yes, setting base-commit is the right solution here -- it will 
> tell you exactly where in the tree it belongs.

I didn't know that before today and that's good advice when 
sending patches, but if I'm receiving patches without that field 
set, I wonder how people are dealing with that.

> In its absence, tools like b4 will also try to guess where the 
> series might belong by comparing the file index information 
> mentioned in the patches.

Ah, this! https://b4.docs.kernel.org/en/latest/

This looks wonderful, thank you!

^ permalink raw reply

* Re: getting git send-email patches from someone who is behind
From: Konstantin Ryabitsev @ 2023-12-08 18:49 UTC (permalink / raw)
  To: Sandra Snan; +Cc: git
In-Reply-To: <87h6ksk2hk.fsf@ellen.idiomdrottning.org>

On Fri, Dec 08, 2023 at 06:50:31PM +0100, Sandra Snan wrote:
> I have a li'l git send-email question. Someone sent me a patch set of six
> patches today but they were not on most current main. I had to guess what
> version they were sending to so I could git am when I was on that particular
> version. I managed to sort it all out so this question is more for future
> reference.
> 
> Isn't there a way inside of the emails that it can show what version to
> apply the patches to?
> 
> Because now I was like "OK, I remember talking to them the other day and
> that means they probably are on what for me is HEAD^^" and that turned out
> to be correct, and sorting out the conflicts was also easy enough,
> but if I hadn't talked to them beforehand I would've been completely lost.
> 
> I asked another friend about it and he said:
> 
> > it's possible to record the base commit:
> > https://git-scm.com/docs/git-format-patch#_base_tree_information
> > however, it's a bit finicky to do with git-send-email
> 
> I dunno.
> 
> I get that one of the fun parts about using patches instead of PRs is
> that you can be a li'l more loosey goosey about exactly what commit
> something is supposed to belong to but here I would've been completely
> lost because the patchset just borked horribly right from the first patch.
> 
> If others have run into this, what's the solution?

Yes, setting base-commit is the right solution here -- it will tell you
exactly where in the tree it belongs. In its absence, tools like b4 will also
try to guess where the series might belong by comparing the file index
information mentioned in the patches. It's a clever, but not exact process,
because the patch may depend on changes in the files that aren't being
modified.

In general, having a base-commit: line in the cover letter or first patch is
absolutely the right way to go.

-K

^ permalink raw reply

* Re: [PATCH] setup: recognize bare repositories with packed-refs
From: Junio C Hamano @ 2023-12-08 18:17 UTC (permalink / raw)
  To: Jeff King; +Cc: Adam Majer, git
In-Reply-To: <20231128190446.GA10477@coredump.intra.peff.net>

Jeff King <peff@peff.net> writes:

> So with regards to the loosening in your patch, my questions would be:
>
>   - if we are going to change the rules for repository detection, is
>     this where we want to end up? We haven't changed them (yet) for
>     reftables. If we are going to do so, should we have a scheme that
>     will work for that transition, too? The "refs is an empty file"
>     scheme would fix your use case, too (though see below).
>
>   - is the rest of Git ready to handle a missing "refs/" directory? It
>     looks like making a ref will auto-create it (since we may have to
>     make refs/foo/bar/... anyway).
>
>   - what about other implementations? Your embedded repos will
>     presumably not work with libgit2, jgit, etc, until they also get
>     similar patches.
>
>   - what about empty repositories? In that case there will be no "refs/"
>     file and no "packed-refs" file (such a repository is less likely, of
>     course, but it may contain objects but no refs, or the point may be
>     to have an empty repo as a test vector). Likewise, it is possible
>     for a repository to have an empty "objects" directory (even with a
>     non-empty refs directory, if there are only symrefs), and your patch
>     doesn't address that.

All good points.

> Getting back to your use case, I'd suggest one of:
>
>   - do the usual "touch refs/.gitignore" trick to explicitly track the
>     empty directory. It looks like the ref code will ignore this (we
>     don't allow ref names to start with "." in a path component)
>
>   - whatever is consuming the embedded repos could "mkdir -p refs
>     objects" as needed. This is a minor pain, but I think in the long
>     term we are moving to a world where you have to explicitly do
>     "GIT_DIR=$PWD/embedded.git" to access an embedded bare repo. So
>     they're already special and require some setup; adding an extra step
>     may not be so bad.

Yeah, it truly is caused by the combination of the fact that we do
not "track" empty directories and that skeleton Git repository
structure does rely on possibly empty directories.  The above two
are reasonable workarounds when you are dealing with any medium that
does not allow empty directories, not just working tree managed by
Git.

^ permalink raw reply

* Re: [PATCH 1/1] MyFirstContribution: configure shallow threads for git format-patch
From: Junio C Hamano @ 2023-12-08 18:06 UTC (permalink / raw)
  To: Stan Hu; +Cc: git, Patrick Steinhardt
In-Reply-To: <e197cbd28135df5523ff5ba1688566edb831f037.1701927372.git.stanhu@gmail.com>

Stan Hu <stanhu@gmail.com> writes:

> +[[configure-shallow-threads]]
> +=== Configuring shallow threads for `git format-patch`
> +
> +It is common practice on the Git mailing list to use "shallow" threads,
> +where every mail is a reply to the first mail of the series. You can
> +configure the default threading style of `git format-patch` via:
> +
> +-----
> +$ git config format.thread shallow
> +----

Hmph, I do not think I have such an option defined, yet I send out
my series as "shallow" threads.  I think the new description is
somewhat misleading as-is.  Isn't it applicable ONLY to those who
let "format-patch" decide the message ID for patches?  If the user
lets "git send-email" assign the message IDs (and hence the thread
structure), the configuration variable would not apply, no?

^ permalink raw reply

* getting git send-email patches from someone who is behind
From: Sandra Snan @ 2023-12-08 17:50 UTC (permalink / raw)
  To: git

I have a li'l git send-email question. Someone sent me a patch set 
of six patches today but they were not on most current main. I had 
to guess what version they were sending to so I could git am when 
I was on that particular version. I managed to sort it all out so 
this question is more for future reference.

Isn't there a way inside of the emails that it can show what 
version to apply the patches to?

Because now I was like "OK, I remember talking to them the other 
day and that means they probably are on what for me is HEAD^^" and 
that turned out to be correct, and sorting out the conflicts was 
also easy enough,
but if I hadn't talked to them beforehand I would've been 
completely lost.

I asked another friend about it and he said:

> it's possible to record the base commit: 
> https://git-scm.com/docs/git-format-patch#_base_tree_information
> however, it's a bit finicky to do with git-send-email

I dunno.

I get that one of the fun parts about using patches instead of PRs 
is
that you can be a li'l more loosey goosey about exactly what 
commit
something is supposed to belong to but here I would've been 
completely
lost because the patchset just borked horribly right from the 
first patch.

If others have run into this, what's the solution?

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox