Git development
 help / color / mirror / Atom feed
* Re: why does git set X in LESS env var?
From: Christoph Anton Mitterer @ 2023-10-11 22:26 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git
In-Reply-To: <xmqqa5sokdd3.fsf@gitster.g>

On Wed, 2023-10-11 at 15:23 -0700, Junio C Hamano wrote:
> I think that was the reason we added it back in 2005.  In any case,
> asking "why" is not a useful use of anybody's time, because it is
> very unlikely to change in the official version we ship, and because
> it is so easy for any individual who does not like it to drop by
> exporting the $LESS environment variable.


Well the other commit I've mentioned kinda read as if it was thought
that either X or both F and X were needed for the effect to exit less
immediately if the output is too short ("F and X because
> sometimes the output Git pipes to less is short").

So I thought maybe that was intended, and the no-clear was just a side-
effect no one ever really thought about.


Anyway, thanks,
Chris.

^ permalink raw reply

* Re: [PATCH] upload-pack: add tracing for fetches
From: Jeff King @ 2023-10-11 22:27 UTC (permalink / raw)
  To: Robert Coup via GitGitGadget; +Cc: git, Robert Coup
In-Reply-To: <pull.1598.git.1697040242703.gitgitgadget@gmail.com>

On Wed, Oct 11, 2023 at 04:04:02PM +0000, Robert Coup via GitGitGadget wrote:

> Improve this by emitting a Trace2 JSON event from upload-pack with
> summary information on the contents of a fetch request.
> 
> * haves, wants, and want-ref counts can help determine (broadly) between
>   fetches and clones, and the use of single-branch, etc.
> * shallow clone depth, tip counts, and deepening options.
> * any partial clone filter type.

I think this is a reasonable thing to have. I see Taylor left a more
detailed review, but I did notice one thing...

> +static void trace2_fetch_info(struct upload_pack_data *data)
> +{
> +	struct json_writer jw = JSON_WRITER_INIT;
> +
> +	jw_object_begin(&jw, 0);
> +	{
> +		jw_object_intmax(&jw, "haves", data->haves.nr);
> +		jw_object_intmax(&jw, "wants", data->want_obj.nr);
> +		jw_object_intmax(&jw, "want-refs", data->wanted_refs.nr);
> +		jw_object_intmax(&jw, "depth", data->depth);
> +		jw_object_intmax(&jw, "shallows", data->shallows.nr);
> +		jw_object_bool(&jw, "deepen-since", data->deepen_since);
> +		jw_object_intmax(&jw, "deepen-not", data->deepen_not.nr);
> +		jw_object_bool(&jw, "deepen-relative", data->deepen_relative);
> +		if (data->filter_options.choice)
> +			jw_object_string(&jw, "filter", list_object_filter_config_name(data->filter_options.choice));
> +		else
> +			jw_object_null(&jw, "filter");
> +	}
> +	jw_end(&jw);
> +
> +	trace2_data_json("upload-pack", the_repository, "fetch-info", &jw);
> +
> +	jw_release(&jw);
> +}

Generating the json output isn't entirely trivial (and certainly
involves allocations), but we throw it away unused if tracing isn't
enabled. Maybe we'd want something like:

  if (!trace2_is_enabled())
          return;

at the top of the function? It looks like other callers of
jw_object_begin() have a similar issue, and this is probably premature
optimization to some degree. It just feels like it should be easy for
tracing to be zero-cost (beyond a single conditional) when it's
disabled.

-Peff

^ permalink raw reply

* Re: [PATCH] upload-pack: add tracing for fetches
From: Robert Coup @ 2023-10-11 22:33 UTC (permalink / raw)
  To: Taylor Blau; +Cc: Robert Coup via GitGitGadget, git
In-Reply-To: <ZScUWrj6CQTg05HL@nand.local>

Hi Taylor,

Thanks for taking a look.

On Wed, 11 Oct 2023 at 22:32, Taylor Blau <me@ttaylorr.com> wrote:
>
> One of the custom patches that GitHub is carrying in its private
> fork is something similar to this that dumps information from
> upload-pack into a custom logging system specific to GitHub

I suspected as much. My initial goal is clones vs fetches, filter use, and
shallow use... since I'm changing code is there anything else that you think is
worthwhile instrumenting? I thought about casting the ints to booleans
(ie: >0 haves; >0 wants; etc) but I figured the actual numbers might come in
handy at some point, and changing them back in future would be incompatible. I
deliberately didn't add thin/ofs/etc flags: most of that stuff is on by default
now so it felt like noise.

> > + jw_object_begin(&jw, 0);
> > + {
>
> Is there a reason that we have a separate scope here? I think we may
> want to drop this as unnecessary, but it's entirely possible that I'm
> missing something here...

I just followed the patterns in test-json-writer.c - personally I think since
JSON objects/arrays have starts & ends then it's easy to read. But one person's
readable is another's coding style violation ;-)

> > + jw_object_string(&jw, "filter", list_object_filter_config_name(data->filter_options.choice));
>
> I'm pretty sure that list_object_filter_config_name() returns characters
> that are safe for JSON-encoding, and/or that jw_object_string() does any
> quoting beforehand, but worth checking nonetheless.

Yes, jw_object_string() calls append_quoted_string() which deals with escaping
and quoting.

Thanks,

Rob :)

^ permalink raw reply

* Re: [PATCH] upload-pack: add tracing for fetches
From: Robert Coup @ 2023-10-11 22:36 UTC (permalink / raw)
  To: Jeff King; +Cc: Robert Coup via GitGitGadget, git
In-Reply-To: <20231011222728.GC518221@coredump.intra.peff.net>

Hi Jeff,

On Wed, 11 Oct 2023 at 23:27, Jeff King <peff@peff.net> wrote:
> I think this is a reasonable thing to have.

Thanks for taking a look

> Generating the json output isn't entirely trivial (and certainly
> involves allocations), but we throw it away unused if tracing isn't
> enabled. Maybe we'd want something like:
>
>   if (!trace2_is_enabled())
>           return;
>
> at the top of the function? It looks like other callers of
> jw_object_begin() have a similar issue, and this is probably premature
> optimization to some degree. It just feels like it should be easy for
> tracing to be zero-cost (beyond a single conditional) when it's
> disabled.

This definitely makes sense to me. Will add to a v2.

Thanks, Rob :)

^ permalink raw reply

* Re: [PATCH] send-email: move process_address_list earlier to avoid, uninitialized address error
From: Junio C Hamano @ 2023-10-11 22:37 UTC (permalink / raw)
  To: Jeff King
  Cc: Michael Strawbridge, Bagas Sanjaya, Todd Zullinger, Luben Tuikov,
	Ævar Arnfjörð Bjarmason, Taylor Blau,
	Git Mailing List
In-Reply-To: <20231011221844.GB518221@coredump.intra.peff.net>

Jeff King <peff@peff.net> writes:

> Which of course implies that we're not (and cannot) validate what
> they're typing at this step, but I think that's OK because we feed it
> through extract_valid_address_or_die().

OK, let's queue it then.

> IOW, I think there are actually two distinct validation steps
> hidden here:
>
>   1. We want to validate that the patch files we were fed are OK.
>
>   2. We want to validate that the addresses, etc, fed by the user are
>      OK.
>
> And after Michael's original patch, we are accidentally hitting some of
> that validation code for (2) while doing (1).

> This is actually a weird split if you think about it. We are feeding to
> the validate hook in (1), so surely it would want to see the full set of
> inputs from the user, too? Which argues for pushing the "if ($validate)"
> down as you suggest. And then either:
>
>   a. We accept that the user experience is a little worse if validation
>      fails after the user typed.
>
>   b. We split (1) into "early" validation that just checks if the files
>      are OK, but doesn't call the hook. And then later on we do the full
>      validation.
>
> I don't have a strong opinion myself (I don't even use send-email
> myself, and if I did, I'd probably mostly be feeding it with "--to" etc
> on the command line, rather than interactively).

I am not affected, either, and do not have a strong opinion either
way.  As long as the end-user input is validated separately, it
would be OK, but if the end-user supplied validation hook cares
about what addresses the messages are going to be sent to, not
knowing the set of recipients mean the validation hook is not
getting the whole picture, which does smell bad.

On the other hand, I am not sure what is wrong with "after the user
typed", actually.  As you said, anybody sane would be using --to (or
an equivalent configuration variable in the repository) to send
their patches to the project address instead of typing, and to them
it is not a problem.  After getting the recipient address from the
end user, the validation may fail due to a wrong address, in which
case it is a good thing.  If the validation failed due to wrong
contents of the patch (perhaps it included a change to the file with
trade secret that appeared in the context lines), as long as the
reason why the validation hook rejected the patches is clear enough
(e.g., "it's the patches, not the recipients"), such "a rejection
after typing" would be only once per a patch series, so it does not
sound too bad, either.

But perhaps I am not seeing the reason why "fail after the user typed"
is so disliked and being unnecessarily unsympathetic.  I dunno.


^ permalink raw reply

* Re: [RFC] Define "precious" attribute and support it in `git clean`
From: Jeff King @ 2023-10-11 22:40 UTC (permalink / raw)
  To: Richard Kerry; +Cc: git@vger.kernel.org
In-Reply-To: <AS8PR02MB73027943EE0A30DD8DAAD4639CCCA@AS8PR02MB7302.eurprd02.prod.outlook.com>

On Wed, Oct 11, 2023 at 10:06:25AM +0000, Richard Kerry wrote:

> The version of CVS that I used to use, CVSNT, was a lot more careful
> about the user's files than Git is inclined to be.
> If CVSNT, while doing an Update, came across a non-tracked file that
> was in the way of something that it wanted to write, then the Update
> would be aborted showing a list of any files that were "in the way".
> The user could then rename/delete them or redo the Update with a
> "force" parameter to indicate that such items could be overwritten.
> Git has tended to take an approach of "if it's important it'll be
> tracked by Git - anything else can be trashed with impunity.".  Over
> the years people have been caught out by this and lost work.  It may
> well be that in a Linux development world anything other than tracked
> source files can be summarily deleted, but in a wider world, like
> Windows, or environments that are not software development, or that
> need special files lying around, this is not always an entirely
> reasonable approach.

I'm not sure if you are just skipping the details of ".gitignore" here,
but to be clear, blowing away untracked files is _not_ Git's default
behavior.

For example:

  [sample repo with established history]
  $ git init
  $ echo content >base
  $ git add base
  $ git commit -m base

  [one branch touches some-file]
  $ git checkout -b side-branch
  $ echo whatever >some-file
  $ git add some-file
  $ git commit -m 'add some-file'

  [but back on master/main, it is untracked]
  $ git checkout main
  $ echo precious >some-file

  [an operation that tries to overwrite the untracked file will fail]
  $ git checkout side-branch
  $ git checkout side-branch
  error: The following untracked working tree files would be overwritten by checkout:
	some-file
  Please move or remove them before you switch branches.
  Aborting

  [providing --force will obliterate it]
  $ git checkout --force side-branch
  Switched to branch 'side-branch'

The issue that people sometimes find with Git is when the user has
explicitly listed a file in ".gitignore", Git takes that to mean it
should never be tracked _and_ it is not precious. But people sometimes
want a way to say "this should never be tracked, but keep treating it as
precious in the usual way".

From the description above it might sound like Git's current behavior is
conflating two orthogonal things, but if you switched the default
behavior of .gitignore'd files to treat them as precious, you will find
lots of cases that are annoying. E.g., if a file is generated by some
parts of history and tracked in others, you'd have to use --force to
move between them to overwrite the generated version.

-Peff

^ permalink raw reply

* Re: [PATCH] send-email: move process_address_list earlier to avoid, uninitialized address error
From: Jeff King @ 2023-10-11 22:47 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Michael Strawbridge, Bagas Sanjaya, Todd Zullinger, Luben Tuikov,
	Ævar Arnfjörð Bjarmason, Taylor Blau,
	Git Mailing List
In-Reply-To: <xmqqzg0oiy4s.fsf@gitster.g>

On Wed, Oct 11, 2023 at 03:37:39PM -0700, Junio C Hamano wrote:

> On the other hand, I am not sure what is wrong with "after the user
> typed", actually.  As you said, anybody sane would be using --to (or
> an equivalent configuration variable in the repository) to send
> their patches to the project address instead of typing, and to them
> it is not a problem.  After getting the recipient address from the
> end user, the validation may fail due to a wrong address, in which
> case it is a good thing.  If the validation failed due to wrong
> contents of the patch (perhaps it included a change to the file with
> trade secret that appeared in the context lines), as long as the
> reason why the validation hook rejected the patches is clear enough
> (e.g., "it's the patches, not the recipients"), such "a rejection
> after typing" would be only once per a patch series, so it does not
> sound too bad, either.
> 
> But perhaps I am not seeing the reason why "fail after the user typed"
> is so disliked and being unnecessarily unsympathetic.  I dunno.

I did not look carefully at the flow of send-email, so this may or may
not be an issue. But what I think would be _really_ annoying is if you
asked to write a cover letter, went through the trouble of writing it,
and then send-email bailed due to some validation failure that could
have been checked earlier.

There is probably a way to recover your work (presumably we leave it in
a temporary file somewhere), but it may not be entirely trivial,
especially for users who are not comfortable with advanced usage of
their editor. ;)

I seem to remember we had one or two such problems in the early days
with "git commit", where you would go to the trouble to type a commit
message only to bail on some condition which _could_ have been checked
earlier. You can recover the message from .git/COMMIT_EDITMSG, but you
need to remember to do so before re-invoking "git commit", otherwise it
gets obliterated.

Now for send-email, if your flow is to generate the patches with
"format-patch", then edit the cover letter separately, and then finally
ship it all out with "send-email", that might not be an issue. But some
workflows use the --compose option instead.

-Peff

^ permalink raw reply

* Re: [PATCH 01/20] chunk-format: note that pair_chunk() is unsafe
From: Jeff King @ 2023-10-11 22:49 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git
In-Reply-To: <ZSXiJZFnmpzAmuwx@nand.local>

On Tue, Oct 10, 2023 at 07:45:41PM -0400, Taylor Blau wrote:

> On Mon, Oct 09, 2023 at 04:58:23PM -0400, Jeff King wrote:
> > There are no callers of the "safe" version yet, but we'll add some in
> > subsequent patches.
> 
> Makes sense.
> 
> > +int pair_chunk_unsafe(struct chunkfile *cf,
> > +		      uint32_t chunk_id,
> > +		      const unsigned char **p)
> >  {
> > -	return read_chunk(cf, chunk_id, pair_chunk_fn, p);
> > +	size_t dummy;
> > +	return pair_chunk(cf, chunk_id, p, &dummy);
> 
> I have always disliked functions that require you to pass a non-NULL
> pointer to some value that you may or may not want to have that function
> fill out. So I was going to suggest something along the lines of
> "pair_chunk() should tolerate a NULL fourth argument instead of filling
> out the size unconditionally".
> 
> But that's (a) the whole point of the series ;-), and (b) unnecessary,
> since this function is going to go away entirely by the end of the
> series, too.

Yeah, for the record, I think a dummy variable like this is usually a
code smell. And it truly is a "problem" here, because we are
intentionally doing the unsafe and stupid thing. :)

-Peff

^ permalink raw reply

* Re: why does git set X in LESS env var?
From: Dragan Simic @ 2023-10-11 22:51 UTC (permalink / raw)
  To: Christoph Anton Mitterer; +Cc: Junio C Hamano, git
In-Reply-To: <0c10c4b95f2a947a5d569a2c3d51fcb02b35e81d.camel@scientia.org>

On 2023-10-12 00:26, Christoph Anton Mitterer wrote:
> On Wed, 2023-10-11 at 15:23 -0700, Junio C Hamano wrote:
>> I think that was the reason we added it back in 2005.  In any case,
>> asking "why" is not a useful use of anybody's time, because it is
>> very unlikely to change in the official version we ship, and because
>> it is so easy for any individual who does not like it to drop by
>> exporting the $LESS environment variable.
> 
> Well the other commit I've mentioned kinda read as if it was thought
> that either X or both F and X were needed for the effect to exit less
> immediately if the output is too short ("F and X because
> sometimes the output Git pipes to less is short").

In general, not clearing the screen (i.e. "-X") is there so the 
displayed contents is still visible in the terminal after exiting the 
pager.  That wouldn't be the case if the screen was cleared, making it 
less usable for most users.

Exiting if less contents than one full screen was displayed (i.e. "-F") 
is there to save people from the frustration of quitting a pager that 
actually wasn't needed to be executed.

When it comes to "-R", it has to be there, otherwise no coloring of the 
paginated output could be possible.

> So I thought maybe that was intended, and the no-clear was just a side-
> effect no one ever really thought about.
> 
> 
> Anyway, thanks,
> Chris.

^ permalink raw reply

* Re: [PATCH 03/20] midx: stop ignoring malformed oid fanout chunk
From: Jeff King @ 2023-10-11 22:52 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git
In-Reply-To: <ZSXjR6BjvaokNwPe@nand.local>

On Tue, Oct 10, 2023 at 07:50:31PM -0400, Taylor Blau wrote:

> All makes sense. I have a mild preference for "missing or corrupt" over
> "missing or corrupted", but it's mild enough that I wouldn't be sad if
> you kept it as-is ;-).

Hmm, now I wonder which is grammatically more appropriate.

According to the internet (which is never wrong), the distinction is
generally one of intrinsic corrupt state versus something that has
become corrupted. So I guess it depends on whether the file was created
corrupt by a bug or corrupted by a hard disk failure. ;)

I do not have a strong preference myself. The commits went into next
already, but I think you also pointed out that many of these could be
marked for translation (though this one is already).

So we could do one or two patches to adjust the error messages on top.

-Peff

^ permalink raw reply

* Re: [PATCH 04/20] commit-graph: check size of oid fanout chunk
From: Jeff King @ 2023-10-11 23:01 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git
In-Reply-To: <ZSXnZXglgbfK3VYd@nand.local>

On Tue, Oct 10, 2023 at 08:08:05PM -0400, Taylor Blau wrote:

> Nice. This makes sense and seems like an obvious improvement over the
> existing code.
> 
> I wonder how common this pattern is. We have read_chunk() which is for
> handling more complex scenarios than this. But the safe version of
> pair_chunk() really just wants to check that the size of the chunk is as
> expected and assign the location in the mmap to some pointer.

Sometimes yes, sometimes no. For fixed-size ones like this, that's
sufficient. For others we have to record the size and use it for later
bounds-checking. IIRC it's about 50/50 between the two.

> Do you think it would be worth changing pair_chunk() to take an expected
> size_t and handle this generically? I.e. have a version of
> chunk-format::pair_chunk_fn() that looks something like:
> 
>     static int pair_chunk_fn(const unsigned char *chunk_start,
>                              size_t chunk_size, void *data)
>     {
>         const unsigned char **p = data;
>         if (chunk_size != data->size)
>             return -1;
>         *p = chunk_start;
>         return 0;
>     }
> 
> and then our call here would be:
> 
>   if (pair_chunk(cf, GRAPH_CHUNKID_OIDFANOUT,
>                  (const unsigned char **)&graph->chunk_oid_fanout,
>                  256 * sizeof(uint32_t)) < 0)
>       return error("commit-graph oid fanout chunk is wrong size");
> 
> I dunno. It's hard to have a more concrete recomendation without having
> read the rest of the series. So it's possible that this is just complete
> nonsense ;-). But my hunch is that there are a number of callers that
> would benefit from having this built in.

I don't think it's nonsense, and I do think other callers would benefit.
On the other hand, I kind of like the notion that there is a complete
validation callback for each of these chunks. Even though it just checks
the size for now, it could handle other things. In the case of OIDF, for
example, we can check whether the entries are monotonic. It's just that
we happen to do those checks elsewhere.

Hmm, actually, looking at that again, I think I may have missed a case
in patch 6. For pack .idx files, we check the fanout table when they are
loaded. And patch 6 adds the same for commit-graph files. I thought midx
files were handled the same .idx, but looking at it again, I only see
the monotonicity check in the "multi-pack-index verify" code paths. So
it might need the same treatment.

I'm not sure how I missed that (I started by making a corrupted midx
first and couldn't get it to fail, which is when I discovered the
existing checks, but maybe I am mixing up .idx and midx in my memory).

-Peff

^ permalink raw reply

* Re: [PATCH v8 1/3] unit tests: Add a project plan document
From: Oswald Buddenhagen @ 2023-10-11 23:05 UTC (permalink / raw)
  To: Josh Steadmon, git, phillip.wood123, linusa, calvinwan, gitster,
	rsbecker
In-Reply-To: <ZScQG5QHznMEGzhC@google.com>

On Wed, Oct 11, 2023 at 02:14:03PM -0700, Josh Steadmon wrote:
>On 2023.10.10 10:57, Oswald Buddenhagen wrote:
>> On Mon, Oct 09, 2023 at 03:21:20PM -0700, Josh Steadmon wrote:
>> > +=== Comparison
>> > +
>> > +[format="csv",options="header",width="33%"]
>> > +|=====
>> > +Framework,"<<license,License>>","<<vendorable-or-ubiquitous,Vendorable or ubiquitous>>","<<maintainable-extensible,Maintainable / extensible>>","<<major-platform-support,Major platform support>>","<<tap-support,TAP support>>","<<diagnostic-output,Diagnostic output>>","<<runtime--skippable-tests,Runtime- skippable tests>>","<<parallel-execution,Parallel execution>>","<<mock-support,Mock support>>","<<signal-error-handling,Signal & error handling>>","<<project-kloc,Project KLOC>>","<<adoption,Adoption>>"
>> > 
>> the redundancy seems unnecessary; asciidoc should automatically use each
>> target's section title as the xreflabel.
>
>Hmm, this doesn't seem to work for me. It only renders as
>"[anchor-label]".
>
i thought
https://docs.asciidoctor.org/asciidoc/latest/attributes/id/#customize-automatic-xreftext 
is pretty clear about it, though. maybe the actual tooling uses an older 
version of the spec? or is buggy? or the placement of the titles is 
incorrect? or this applies to different links or targets only? or am i 
misreading something? or ...?

regards

^ permalink raw reply

* Re: [PATCH 07/20] midx: check size of pack names chunk
From: Jeff King @ 2023-10-11 23:06 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git
In-Reply-To: <ZSa2nbVDTXvFZvSx@nand.local>

On Wed, Oct 11, 2023 at 10:52:13AM -0400, Taylor Blau wrote:

> On Mon, Oct 09, 2023 at 05:05:14PM -0400, Jeff King wrote:
> > @@ -176,9 +176,16 @@ struct multi_pack_index *load_multi_pack_index(const char *object_dir, int local
> >
> >  	cur_pack_name = (const char *)m->chunk_pack_names;
> >  	for (i = 0; i < m->num_packs; i++) {
> > +		const char *end;
> > +		size_t avail = m->chunk_pack_names_len -
> > +				(cur_pack_name - (const char *)m->chunk_pack_names);
> > +
> 
> This patch all looks good to me, but reading this hunk gave me a little
> bit of pause. I was wondering what might happen if chunk_pack_names_len
> was zero, and subtracting some other non-zero size_t from it might cause
> us to wrap around.
> 
> But I think that's a non-issue here, since we'd set cur_pack_name to the
> beginning of the chunk, and compute avail as 0 - (m->chunk_pack_names -
> m->chunk_pack_names), and get 0 back, causing our memchr() lookup below
> to fail, and for us to report this chunk is garbage.

Right. If it is 0, then we should have 0 avail here in the first loop.

I was more worried while writing this that:

  cur_pack_name = end + 1;

later in the loop could get us an off-by-one. But we know we are always
pointing to one past an available NUL there, so at most our subtraction
will equal m->chunk_pack_names_len.

> And since cur_pack_name monotonically increases, I think that there is
> an inductive argument to be made that this subtraction is always safe to
> do. But it couldn't hurt to do something like:
> 
>     size_t read = cur_pack_name - (const char *)m->chunk_pack_names;
>     size_t avail = m->chunk_pack_names_len;
> 
>     if (read > avail)
>         die("...");
>     avail -= read;
> 
> to make absolutely sure that we would never underflow here.

You end up with two die() calls, then. One for "hey, we somehow read too
far", and one for "hey, I ran out of data while reading the entry". And
the first one cannot be triggered.

IOW, I think your die() here is a BUG().

-Peff

^ permalink raw reply

* Re: [PATCH 08/20] midx: enforce chunk alignment on reading
From: Jeff King @ 2023-10-11 23:09 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git
In-Reply-To: <ZSa4tfG2NyM1s9Jz@nand.local>

On Wed, Oct 11, 2023 at 11:01:09AM -0400, Taylor Blau wrote:

> On Mon, Oct 09, 2023 at 05:05:23PM -0400, Jeff King wrote:
> > @@ -120,6 +121,11 @@ int read_table_of_contents(struct chunkfile *cf,
> >  			error(_("terminating chunk id appears earlier than expected"));
> >  			return 1;
> >  		}
> > +		if (chunk_offset % expected_alignment != 0) {
> 
> Oops, I missed this in my first read. I'm definitely nit-picking here,
> but this should probably be:
> 
>     if (chunk_offset % expected_alignment)
> 
> without the trailing "!= 0".
> 
> I don't have a strong preference here, since we are doing a comparison
> of an arithmetic operation against an (un-)expected value. But I think
> the CodingGuidelines would technically call this out of style...

I suppose so, but somehow I consider the subtlety of "%" to merit the
more explicit comparison (versus something like "if (foo)"). Grepping
for "if (.* %)' seems to show some of both.

-Peff

^ permalink raw reply

* Re: [PATCH 10/20] midx: bounds-check large offset chunk
From: Jeff King @ 2023-10-11 23:18 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git
In-Reply-To: <ZSbrj3hmm8H7ce2l@nand.local>

On Wed, Oct 11, 2023 at 02:38:07PM -0400, Taylor Blau wrote:

> >  		offset32 ^= MIDX_LARGE_OFFSET_NEEDED;
> > -		return get_be64(m->chunk_large_offsets +
> > -				st_mult(sizeof(uint64_t), offset32));
> > +		if (offset32 >= m->chunk_large_offsets_len / sizeof(uint64_t))
> > +			die(_("multi-pack-index large offset out of bounds"));
> > +		return get_be64(m->chunk_large_offsets + sizeof(uint64_t) * offset32);
> 
> Makes sense, and this seems very reasonable. I had a couple of thoughts
> on this hunk, one nitpick, and one non-nitpick ;-).
> 
> The nitpick is the easy one, which is that I typically think of scaling
> some index to produce an offset into some chunk, instead of dividing and
> going the other way.
> 
> So I probably would have written something like:
> 
>     if (st_mult(offset32, sizeof(uint64_t)) >= m->chunk_large_offsets_len)
>         die(_("multi-pack-index large offset out of bounds"));

Yeah, I admit that my inclination is to think of it that way, too, and
the division is a bit of an inversion. But I guess I am hard-wired to do
bounds checks with division, because I know it avoids overflow issues.
And the behavior is actually different, since st_mult() dies (with a
somewhat vague message) rather than returning with an error.

I was actually tempted to write a small inline helper to do
bounds-checks (since this pattern appears a few times in this series).
But of course it suffers from the same issues (it dies with a vague
message, or it returns a result code and then is awkward to call/use
because you have to stick the output in an out-parameter).

> But that is definitely a subjective/minor point, and I would not at all
> be sad if you felt differently about it. That said, I do wish for a
> little more information in the die() message, perhaps:
> 
>     if (st_mult(offset32, sizeof(uint64_t)) >= m->chunk_large_offsets_len)
>         die(_("multi-pack-index large offset for %s out of bounds "
>               "(%"PRIuMAX" is beyond %"PRIuMAX")"),
>             (uintmax_t)(offset32 * sizeof(uint64_t)), /* checked earlier */
>             (uintmax_t)m->chunk_large_offsets_len);
> 
> I can imagine that for debugging corrupt MIDXs in the future, having
> some extra information like the above would give us a better starting
> point than popping into a debugger to get the same information.

As you'll see when you get to the BDAT/BIDX messages, I put in a load of
detail. That's because I did those ones first, and then after seeing the
terse "eh, it's the wrong size" messages in the rest of the commit-graph
and midx code, I just followed suit.

We can circle back and improve the detail in the messages. One slightly
annoying thing is dealing with it in the tests. We'd have to do one of:

  - make the tests more brittle by hard-coding positions and offsets we
    don't necessarily care about

  - loosen the tests by just matching with "grep", which can sometimes
    miss other unexpected output

  - do some post-processing on the output to massage out the details we
    don't care about; this in theory is the best of both worlds, but
    reliably post-processing is non-trivial.

So of the three I'd probably just loosen to use "grep" in most spots.

-Peff

^ permalink raw reply

* Re: [PATCH 12/20] commit-graph: check size of commit data chunk
From: Jeff King @ 2023-10-11 23:22 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git
In-Reply-To: <ZSbthK5919QdrEPx@nand.local>

On Wed, Oct 11, 2023 at 02:46:28PM -0400, Taylor Blau wrote:

> > +static int graph_read_commit_data(const unsigned char *chunk_start,
> > +				  size_t chunk_size, void *data)
> > +{
> > +	struct commit_graph *g = data;
> > +	if (chunk_size != g->num_commits * GRAPH_DATA_WIDTH)
> 
> Should this be guarded with an st_mult? I think that GRAPH_DATA_WIDTH is
> defined as (the_hash_algo->rawsz + 16), so I *think* that the expression
> in the parenthesis would get done as a size_t, and then g->num_commits
> would be widened to a size_t for the purposes of evaluating this
> expression.
> 
> So I think that this is all OK in the sense that we'd never underflow
> the 64-bit space, and having more than 2^64-1/36 (some eighteen
> quintillion) commits is... unlikely ;-).

Hmm, yeah, I think you are right, but I agree it's awfully subtle. There
is no reason somebody couldn't later change "rawsz" to a smaller type
(after all, we know it's going to be tiny), and it would be quite
surprising if that introduces an overflow in far-away code. We should
protect ourselves here.

-Peff

^ permalink raw reply

* Re: why does git set X in LESS env var?
From: Christoph Anton Mitterer @ 2023-10-11 23:16 UTC (permalink / raw)
  To: Dragan Simic; +Cc: Junio C Hamano, git
In-Reply-To: <eadc03fc56d530ea31790f8a4b47a16e@manjaro.org>

On Thu, 2023-10-12 at 00:51 +0200, Dragan Simic wrote:
> In general, not clearing the screen (i.e. "-X") is there so the 
> displayed contents is still visible in the terminal after exiting the
> pager.  That wouldn't be the case if the screen was cleared, making
> it 
> less usable for most users.

Well, I personally, also prefer it that way... but I'd also say that
just like in the case of `S`, this is not really needed from the git
side, but rather simply a user choice.

And since, if the output did not fit one one screen, the non-cleared
remains may likely be chopped off,... one could argue that some users
would actually prefer to have it cleared.


> Exiting if less contents than one full screen was displayed (i.e. "-
> F") 
> is there to save people from the frustration of quitting a pager that
> actually wasn't needed to be executed.

Same actually here, at least strictly speaking, ... though I (and
probably everybody else?) would really hate it, if that was removed. ^^


Anyway... that's no request from my side to change the default. I just
wanted to know whether that don't-clear-the-screen part was the
motivation for the `X`.


In case someone cares, I've asked less upstream whether there's a way
to have VTE scrolling work with -X:
https://github.com/gwsw/less/issues/445


Thanks,
Chris.

^ permalink raw reply

* Re: [PATCH 17/20] commit-graph: check bounds when accessing BDAT chunk
From: Jeff King @ 2023-10-11 23:27 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git
In-Reply-To: <ZSbzf0o7r5GWAZFF@nand.local>

On Wed, Oct 11, 2023 at 03:11:59PM -0400, Taylor Blau wrote:

> > +	if (check_bloom_offset(g, lex_pos, end_index) < 0 ||
> > +	    check_bloom_offset(g, lex_pos - 1, start_index) < 0)
> 
> Can lex_pos ever be zero? I can't think of any reason that it couldn't,
> and indeed the (elided) diff context shows that immediately preceding
> this if-statement is another that checks "if (lex_pos > 0)".
> 
> So I think we'd want to avoid checking lex_pos - 1 if we know that
> lex_pos is zero. Not that any of this really matters, since the only
> thing we use the computed value for is showing the graph position in the
> warning() message. So seeing a bogus value there isn't the end of the
> world.

If lex_pos is 0, then we set start_index to 0 manually, which we know to
be valid. So we can't trigger a bogus warning(). My thinking was that it
was OK to just let this fall out naturally as it does here, since it
makes the code shorter. But if we want to cover that case separately,
I think you'd want to split the checks like:

diff --git a/bloom.c b/bloom.c
index 1474aa19fa..d9ad69ad82 100644
--- a/bloom.c
+++ b/bloom.c
@@ -65,16 +65,16 @@ static int load_bloom_filter_from_graph(struct commit_graph *g,
 	lex_pos = graph_pos - g->num_commits_in_base;
 
 	end_index = get_be32(g->chunk_bloom_indexes + 4 * lex_pos);
+	if (check_bloom_offset(g, lex_pos, end_index) < 0)
+		return 0;
 
-	if (lex_pos > 0)
+	if (lex_pos > 0) {
 		start_index = get_be32(g->chunk_bloom_indexes + 4 * (lex_pos - 1));
-	else
+		if (check_bloom_offset(g, lex_pos - 1, start_index) < 0)
+			return 0;
+	} else
 		start_index = 0;
 
-	if (check_bloom_offset(g, lex_pos, end_index) < 0 ||
-	    check_bloom_offset(g, lex_pos - 1, start_index) < 0)
-		return 0;
-
 	if (end_index < start_index) {
 		warning("ignoring decreasing changed-path index offsets"
 			" (%"PRIuMAX" > %"PRIuMAX") for positions"

-Peff

^ permalink raw reply related

* Re: why does git set X in LESS env var?
From: Dragan Simic @ 2023-10-11 23:29 UTC (permalink / raw)
  To: Christoph Anton Mitterer; +Cc: Junio C Hamano, git
In-Reply-To: <ec43820562198de078db7df54d0338edf1f333ea.camel@scientia.org>

On 2023-10-12 01:16, Christoph Anton Mitterer wrote:
> On Thu, 2023-10-12 at 00:51 +0200, Dragan Simic wrote:
>> In general, not clearing the screen (i.e. "-X") is there so the
>> displayed contents is still visible in the terminal after exiting the
>> pager.  That wouldn't be the case if the screen was cleared, making
>> it
>> less usable for most users.
> 
> Well, I personally, also prefer it that way... but I'd also say that
> just like in the case of `S`, this is not really needed from the git
> side, but rather simply a user choice.

It's about providing a set of sane defaults for less(1), which other 
utilities also do, including dmesg, for example.  Of course, everyone 
can set $PAGER or $GIT_PAGER to fit their own prereferences.

> And since, if the output did not fit one one screen, the non-cleared
> remains may likely be chopped off,... one could argue that some users
> would actually prefer to have it cleared.

I'm not sure what do you mean by the non-cleared remains being chopped 
off...  Could you clarify that a bit, please?

As I already mentioned above, everyone is free to configure the pager 
behavior in any way they like.

>> Exiting if less contents than one full screen was displayed (i.e. "-
>> F")
>> is there to save people from the frustration of quitting a pager that
>> actually wasn't needed to be executed.
> 
> Same actually here, at least strictly speaking, ... though I (and
> probably everybody else?) would really hate it, if that was removed. ^^

I'm afraid that I don't understand very well are you complaining about 
the presence of "-F" or not?

> Anyway... that's no request from my side to change the default. I just
> wanted to know whether that don't-clear-the-screen part was the
> motivation for the `X`.

AFAIK, there should be no motivation other than not clearing the screen. 
  Other utilities that invoke the pager internally configure the default 
pager options in a very similar way.

> In case someone cares, I've asked less upstream whether there's a way
> to have VTE scrolling work with -X:
> https://github.com/gwsw/less/issues/445

Quite frankly, I can't stand scrolling in less(1) using the mouse wheel, 
but I do understand why some people like it.

^ permalink raw reply

* Re: [PATCH 0/20] bounds-checks for chunk-based files
From: Jeff King @ 2023-10-11 23:31 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git
In-Reply-To: <ZSb1QAaLX+xcZK4a@nand.local>

On Wed, Oct 11, 2023 at 03:19:28PM -0400, Taylor Blau wrote:

> I reviewed this carefully (well, except for the new Perl script, for
> obvious[^1] reasons ;-)).
> 
> Everything mostly looks good to me, though I
> had a handful of review comments throughout. Many of them are trivial
> (e.g. a number of warning() and error() strings should be marked for
> translation, etc.), but a couple of them I think are worth looking at.

Thanks for taking a look. I think it may make sense to come back on top
and adjust a few of the commit messages, along with adding a few
st_mult() overflow checks that you suggest.

> Most notably, I think that by the end of the series, I was convinced
> that having some kind of 'pair_chunk_expectsz()' or similar would be
> useful and eliminate a good chunk of the boilerplate you have to write
> to check the chunk size against an expected value when using
> read_chunk().

This I'm less convinced by. In fact, I _almost_ just dropped
pair_chunk() entirely. Adding an out-parameter for the size at least
forces the caller to consider what to do with the size. But really, I
think the right mindset is "we should be sanity-checking this chunk as
we load it". And having a callback, even if it is a little bit of
boilerplate, helps set that frame of mind.

I dunno. Maybe that is all just programmer pseudo-psychology. But I also
don't like that about half the calls to pair_chunk() can't do a size
check (so we need two functions, or to make the "expect" parameter
optional).

-Peff

^ permalink raw reply

* Re: [RFC] Define "precious" attribute and support it in `git clean`
From: Junio C Hamano @ 2023-10-11 23:35 UTC (permalink / raw)
  To: Richard Kerry; +Cc: git@vger.kernel.org
In-Reply-To: <AS8PR02MB73027943EE0A30DD8DAAD4639CCCA@AS8PR02MB7302.eurprd02.prod.outlook.com>

Richard Kerry <richard.kerry@eviden.com> writes:

> An option might be to state, in config, whether a project, or
> everything, should be managed on the basis of "all untracked files
> are precious" or "files may be explicitly marked precious", or, as
> now, "nothing is precious".

I do not think there is any need to have a separate "all or none"
option.  We do not have to make things more complicated than
necessary.

If all untracked files are precious, a user should be able to say so
with an entry that matches all paths "*" to mark them precious, and
nothing more needs to be done.  By default nothing is ignored and
nothing is precious, until you start marking paths with .gitignore
entries.

^ permalink raw reply

* Re: gpg.ssh.defaultKeyCommand docs bug?
From: Jeff King @ 2023-10-11 23:41 UTC (permalink / raw)
  To: matthew sporleder; +Cc: Fabian Stelzer, Git Mailing List
In-Reply-To: <CAHKF-AsjY_P6mbAs7KWcgL39KbLbu9OE9XiLabghhTn-f0ybzQ@mail.gmail.com>

On Wed, Oct 11, 2023 at 02:16:27PM -0400, matthew sporleder wrote:

> It gave very confusing errors!
> 
> key::ssh-rsa ABC123 me@localhost (no new line)
> error: Load key "....: invalid format?

It's hard to say without seeing the whole output, but I suspect this is
actually coming from ssh, not Git. We just dump the output into a
tempfile and feed it to "ssh-keygen -f".

Though I'd think we would see the same issue with user.signingKey in
that case.

So I'm not sure what's going on here (I haven't set up ssh signing to
play with yet).

> key::ABC123 (yes new line)
> error: Couldn't load public key ...: No such file or directory?

That one makes sense to me. The "ssh-rsa" part is important, because
without it, ssh-keygen has no idea what format it is in.

> key::ssh-rsa ABC123 me@localhost (yes new line)
> works, I think

This is the recommended format.

> ssh-rsa ABC123 me@localhost (yes new line)
> works (the script I provided)

And this is the historical one.

So I don't think the documentation is _wrong_ here, but I agree that it
is a bit on the confusing side (especially understanding that "key::"
came later, and that raw "ssh-rsa" is deprecated, which is only
mentioned in user.signingKey, not gpg.ssh.defaultKeyCommand.

And I'm still not sure what's going on with your no-new-line example,
which I'd have expected to work.

-Peff

^ permalink raw reply

* Re: why does git set X in LESS env var?
From: Christoph Anton Mitterer @ 2023-10-11 23:43 UTC (permalink / raw)
  To: Dragan Simic; +Cc: Junio C Hamano, git
In-Reply-To: <6457310b8ca0e7d3b288a3bbbe264012@manjaro.org>

On Thu, 2023-10-12 at 01:29 +0200, Dragan Simic wrote:
> I'm not sure what do you mean by the non-cleared remains being
> chopped 
> off...  Could you clarify that a bit, please?

Well if I do say:
$ reset
$ git diff HEAD~10

and from there scroll down a bit and then q to exit less (and the
screen is not cleared), I see the output only so far as I've had
previously scrolled down in less.

Everything that would have come after that is of course not visible.
The place where I exited may be some "well defined" border, like the
end of a commit... or anywhere it the middle of a patch (making the
left over remains on the terminal perhaps even ambiguous).

What's worse, when (in less) I scroll down and up again, perhaps
repeating several times, and then quit... I see (at least in my
less/terminal combination) things twice and mangled up (i.e. when I
scroll up the terminal (outside of less)).

So AFAICS, not clearing the screen only works properly when never
scrolling up again (in less).


> As I already mentioned above, everyone is free to configure the pager
> behavior in any way they like.

Sure :-)

> 
> > > Exiting if less contents than one full screen was displayed (i.e.
> > > "-
> > > F")
> > > is there to save people from the frustration of quitting a pager
> > > that
> > > actually wasn't needed to be executed.
> > 
> > Same actually here, at least strictly speaking, ... though I (and
> > probably everybody else?) would really hate it, if that was
> > removed. ^^
> 
> I'm afraid that I don't understand very well are you complaining
> about 
> the presence of "-F" or not?

No :-) As I've said, I like it that way and I and probably everyone
else would be annoyed, if -F was not present.

I just meant that strictly speaking the same reason why "S" was
removed, could be applied to "F" as well.

It is - like -R - not necessary for less to work with git.

But it is, of course, what virtually everyone will want in practise.


> Quite frankly, I can't stand scrolling in less(1) using the mouse
> wheel, 
> but I do understand why some people like it.

The main reason I want it is, that things don't get messy, when I
forget being in less and mouse scroll. ;-)


Thanks,
Chris.

^ permalink raw reply

* Re: why does git set X in LESS env var?
From: Jeff King @ 2023-10-12  0:04 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Christoph Anton Mitterer, git
In-Reply-To: <xmqqa5sokdd3.fsf@gitster.g>

On Wed, Oct 11, 2023 at 03:23:20PM -0700, Junio C Hamano wrote:

> Christoph Anton Mitterer <calestyo@scientia.org> writes:
> 
> > But I still don't get from that why X would be needed?
> >
> > My less manpage documents it as:
> >> -X or --no‐init
> >>     Disables sending the termcap initialization and deinitialization
> >>     strings to the terminal.  This is sometimes desirable if the
> >>     deinitialization string does something unnecessary, like clearing
> >>     the screen.
> >
> > Is it to avoid clearing the screen?
> 
> I think that was the reason we added it back in 2005.  In any case,
> asking "why" is not a useful use of anybody's time, because it is
> very unlikely to change in the official version we ship, and because
> it is so easy for any individual who does not like it to drop by
> exporting the $LESS environment variable.

I agree it is probably not worth changing now, but I think the history
here is a little interesting.

Yes, I think "X" was added because less would clear the screen after
exiting, and with "F" this meant you'd see nothing. Here's a thread from
the same time period discussing it:

  https://lore.kernel.org/git/cc723f590610210623sbee2075i5f2fd441cceb84ae@mail.gmail.com/

But I also think this was a pretty well-known annoyance with "less" back
then.

However, I can't seem to reproduce it now! Digging into the history and
the changelog, this note is in "changes between less versions 487 and
530":

  Don't output terminal init sequence if using -F and file fits on one
  screen.

So it seems like the problem has been fixed inside less for recent
versions. And in theory we _could_ drop "-X" if it is causing problems.
That version of less is ~5 years old. It does seem a little premature to
assume everybody has it. And as you say, if there are people who really
care about their LESS options, it is easy for them to override it.

-Peff

^ permalink raw reply

* Re: why does git set X in LESS env var?
From: Dragan Simic @ 2023-10-12  0:06 UTC (permalink / raw)
  To: Christoph Anton Mitterer; +Cc: Junio C Hamano, git
In-Reply-To: <fbb3c2bf1c832f0f16cb913da6b862dd313359ef.camel@scientia.org>

On 2023-10-12 01:43, Christoph Anton Mitterer wrote:
> On Thu, 2023-10-12 at 01:29 +0200, Dragan Simic wrote:
>> I'm not sure what do you mean by the non-cleared remains being
>> chopped
>> off...  Could you clarify that a bit, please?
> 
> Well if I do say:
> $ reset
> $ git diff HEAD~10
> 
> and from there scroll down a bit and then q to exit less (and the
> screen is not cleared), I see the output only so far as I've had
> previously scrolled down in less.

There's also scrollback in the terminal, which can be used to show more 
of the contents that was displayed before exiting the pager.

> Everything that would have come after that is of course not visible.
> The place where I exited may be some "well defined" border, like the
> end of a commit... or anywhere it the middle of a patch (making the
> left over remains on the terminal perhaps even ambiguous).

If you didn't select some line or page to be displayed, by scrolling 
within the pager, it obviously isn't going to be displayed, which is the 
whole point of using a pager instead of "spitting" the whole contents 
out at once.

Where and when you exit the pager is up to you only, and you can decide 
what will be left on the screen at that point.

> What's worse, when (in less) I scroll down and up again, perhaps
> repeating several times, and then quit... I see (at least in my
> less/terminal combination) things twice and mangled up (i.e. when I
> scroll up the terminal (outside of less)).

That sounds like some issue with your terminal or terminal emulator, 
which should be debugged and fixed separately.  Such misbehavior isn't 
supposed to happen at all.

> So AFAICS, not clearing the screen only works properly when never
> scrolling up again (in less).

It works just fine for me, for example.  You're obviously having some 
unrelated issues with your terminal emulator.

>> As I already mentioned above, everyone is free to configure the pager
>> behavior in any way they like.
> 
> Sure :-)
> 
>> 
>> > > Exiting if less contents than one full screen was displayed (i.e.
>> > > "-
>> > > F")
>> > > is there to save people from the frustration of quitting a pager
>> > > that
>> > > actually wasn't needed to be executed.
>> >
>> > Same actually here, at least strictly speaking, ... though I (and
>> > probably everybody else?) would really hate it, if that was
>> > removed. ^^
>> 
>> I'm afraid that I don't understand very well are you complaining
>> about
>> the presence of "-F" or not?
> 
> No :-) As I've said, I like it that way and I and probably everyone
> else would be annoyed, if -F was not present.
> 
> I just meant that strictly speaking the same reason why "S" was
> removed, could be applied to "F" as well.

I see.  Actually, removing "-S" was a good decision, IMHO, because 
chopping long lines isn't something that a sane set of defaults should 
do.  Many users would probably be confused with the need to use the 
right arrow to see long lines in their entirety.

> It is - like -R - not necessary for less to work with git.
> 
> But it is, of course, what virtually everyone will want in practise.

Well, "-R" is pretty much mandatory, because coloring the outputs has 
become some kind of defacto standard, and it's very useful when viewing 
diffs, for example.

>> Quite frankly, I can't stand scrolling in less(1) using the mouse
>> wheel,
>> but I do understand why some people like it.
> 
> The main reason I want it is, that things don't get messy, when I
> forget being in less and mouse scroll. ;-)
> 
> 
> Thanks,
> Chris.

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox