Git development

Git development
 help / color / mirror / Atom feed

* Re: [PATCH v2] fuzz: add new oss-fuzz fuzzer for date.c / date.h
From: Junio C Hamano @ 2023-11-13 23:27 UTC (permalink / raw)
  To: Jeff King; +Cc: Arthur Chan via GitGitGadget, git, Arthur Chan
In-Reply-To: <20231113183509.GA3838361@coredump.intra.peff.net>

Jeff King <peff@peff.net> writes:

> On Mon, Nov 13, 2023 at 04:22:48PM +0000, Arthur Chan via GitGitGadget wrote:
>
>> +	str = (char *)malloc(size + 1);
>> +	if (!str)
>> +		return 0;
>> +	memcpy(str, data, size);
>> +	str[size] = '\0';
>
> Is it important that we avoid calling die() if the malloc fails here?
>
> The usual way to write this in our code base is just:
>
>   str = xmemdupz(data, size);
>
> It's not entirely a style thing; we sometimes audit the code base
> looking for computations on malloc sizes (for integer overflows) as well
> as sites that should be using xmalloc and are not. Obviously we can
> exclude oss-fuzz/ from such audits, but if there's no reason not to
> prefer our usual style, it's one less thing to worry about.

Good point.  Thanks.

^ permalink raw reply

* Re: [PATCH v2] fuzz: add new oss-fuzz fuzzer for date.c / date.h
From: Junio C Hamano @ 2023-11-13 23:27 UTC (permalink / raw)
  To: Arthur Chan via GitGitGadget; +Cc: git, Arthur Chan
In-Reply-To: <pull.1612.v2.git.1699892568344.gitgitgadget@gmail.com>

"Arthur Chan via GitGitGadget" <gitgitgadget@gmail.com> writes:

> +int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size);

It is somewhat annoying that everybody has to repeat this twice
here, but it is not your fault X-<.

> +int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size)
> +{
> +	int local;
> +	int num;
> +	uint16_t tz;

tz offset can be negative, so uint16_t is not appropriate.  See
date.c:gm_time_t() that is eventually called from show_date().

> +	char *str;
> +	timestamp_t ts;
> +	enum date_mode_type dmtype;
> +	struct date_mode *dm;
> +
> +	if (size <= 4)
> +		/*
> +		 * we use the first byte to fuzz dmtype and local,
> +		 * then the next three bytes to fuzz tz	offset,
> +		 * and the remainder (at least one byte) is fed
> +		 * as end-user input to approxidate_careful().
> +		 */
> +		return 0;
> +
> +	local = !!(*data & 0x10);
> +	dmtype = (enum date_mode_type)(*data % DATE_UNIX);
> +	if (dmtype == DATE_STRFTIME)
> +		/*
> +		 * Currently DATE_STRFTIME is not supported.
> +		 */
> +		return 0;

There is an off-by-one error above, as modulo DATE_UNIX will never
yield DATE_UNIX.  Presumably we could do something silly like

	tmp = *data % DATE_UNIX;
	if (DATE_STRFTIME <= tmp)
		tmp++;
	dmtime = (enum date_mode_type)tmp;

to pick values from [0..DATE_UNIX) and then shift everything above
DATE_STRFTIME by one to create a hole there and fill DATE_UNIX at
the same time, without wasting a sample by returning.

> +	data++;
> +	size--;
> +
> +	tz = *data++;
> +	tz = (tz << 8) | *data++;
> +	tz = (tz << 8) | *data++;
> +	size -= 3;

If your tz is 16-bit wide, then we do not have to eat three bytes
here, do we?

You never answered my question on your intention.  Is "tz"
considered attacker controlled (and needs to be fuzzed including
invalid values)?

> +	str = (char *)malloc(size + 1);
> +	if (!str)
> +		return 0;
> +	memcpy(str, data, size);
> +	str[size] = '\0';
> +
> +	ts = approxidate_careful(str, &num);
> +	free(str);
> +
> +	dm = date_mode_from_type(dmtype);
> +	dm->local = local;
> +	show_date(ts, (int16_t)tz, dm);
> +
> +	date_mode_release(dm);
> +
> +	return 0;
> +}
>
> base-commit: dadef801b365989099a9929e995589e455c51fed

Thanks.

^ permalink raw reply

* [PATCH v2] glossary: add definitions for dereference & peel
From: Victoria Dye via GitGitGadget @ 2023-11-13 23:17 UTC (permalink / raw)
  To: git; +Cc: ps, Kristoffer Haugsbakk, Victoria Dye, Victoria Dye
In-Reply-To: <pull.1610.git.1699574277143.gitgitgadget@gmail.com>

From: Victoria Dye <vdye@github.com>

Add 'gitglossary' definitions for "dereference" (as it used for both symrefs
and objects) and "peel". These terms are used in options and documentation
throughout Git, but they are not clearly defined anywhere and the behavior
they refer to depends heavily on context. Provide explicit definitions to
clarify existing documentation to users and help contributors to use the
most appropriate terminology possible in their additions to Git.

Update other definitions in the glossary that use the term "dereference" to
link to 'def_dereference'.

Signed-off-by: Victoria Dye <vdye@github.com>
---
    glossary: add definitions for dereference & peel
    
    As promised in [1], this patch adds definitions for "peel" and
    "dereference" in the glossary, based on how they're currently used
    throughout Git. As a result, the definitions are somewhat broad
    (although I did my best to explicitly describe the different contexts in
    which they're used). My hope is that this will at least reduce confusion
    around this terminology. These definitions can also serve as a starting
    point if, in the future, another contributor wants to deprecate certain
    usages of these terms to make them less ambiguous.
    
     * Victoria
    
    [1]
    https://lore.kernel.org/git/21dfe606-39f5-4154-aaa4-695e5f6f784d@github.com/
    
    
    Changes since V1
    ================
    
     * Removed references to "peeling" a commit; the updated definition
       discusses "peeling" only in the context of tags.
     * Added a cross-link from "dereference" to "peel" (one already existed
       for "peel" to "dereference").

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1610%2Fvdye%2Fvdye%2Fglossary-peel-dereference-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1610/vdye/vdye/glossary-peel-dereference-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/1610

Range-diff vs v1:

 1:  e40fc3e5e04 ! 1:  4d9e0d7fc81 glossary: add definitions for dereference & peel
     @@ Documentation/glossary-content.txt: to point at the new commit.
      +<<def_object,object>> a tag points at. Tags are recursively dereferenced by
      +repeating the operation on the result object until the result has either a
      +specified <<def_object_type,object type>> (where applicable) or any non-"tag"
     -+object type.
     ++object type. A synonym for "recursive dereference" in the context of tags is
     ++"<<def_peel,peel>>".
      ++
      +Referring to a <<def_commit_object,commit object>>: the action of accessing
      +the commit's tree object. Commits cannot be dereferenced recursively.
     @@ Documentation/glossary-content.txt: exclude;;
       	parents.
       
      +[[def_peel]]peel::
     -+	Synonym for object <<def_dereference,dereference>>. Most commonly used
     -+	in the context of tags, where it refers to the process of recursively
     -+	dereferencing a <<def_tag_object,tag object>> until the result object's
     -+	<<def_object_type,type>> is something other than "tag".
     ++	The action of recursively <<def_dereference,dereferencing>> a
     ++	<<def_tag_object,tag object>>.
      +
       [[def_pickaxe]]pickaxe::
       	The term <<def_pickaxe,pickaxe>> refers to an option to the diffcore


 Documentation/glossary-content.txt | 49 +++++++++++++++++++++---------
 1 file changed, 35 insertions(+), 14 deletions(-)

diff --git a/Documentation/glossary-content.txt b/Documentation/glossary-content.txt
index 65c89e7b3eb..59d8ab85721 100644
--- a/Documentation/glossary-content.txt
+++ b/Documentation/glossary-content.txt
@@ -98,9 +98,8 @@ to point at the new commit.
 	revision.
 
 [[def_commit-ish]]commit-ish (also committish)::
-	A <<def_commit_object,commit object>> or an
-	<<def_object,object>> that can be recursively dereferenced to
-	a commit object.
+	A <<def_commit_object,commit object>> or an <<def_object,object>> that
+	can be recursively <<def_dereference,dereferenced>> to a commit object.
 	The following are all commit-ishes:
 	a commit object,
 	a <<def_tag_object,tag object>> that points to a commit
@@ -125,6 +124,25 @@ to point at the new commit.
 	dangling object has no references to it from any
 	reference or <<def_object,object>> in the <<def_repository,repository>>.
 
+[[def_dereference]]dereference::
+	Referring to a <<def_symref,symbolic ref>>: the action of accessing the
+	<<def_ref,reference>> pointed at by a symbolic ref. Recursive
+	dereferencing involves repeating the aforementioned process on the
+	resulting ref until a non-symbolic reference is found.
++
+Referring to a <<def_tag_object,tag object>>: the action of accessing the
+<<def_object,object>> a tag points at. Tags are recursively dereferenced by
+repeating the operation on the result object until the result has either a
+specified <<def_object_type,object type>> (where applicable) or any non-"tag"
+object type. A synonym for "recursive dereference" in the context of tags is
+"<<def_peel,peel>>".
++
+Referring to a <<def_commit_object,commit object>>: the action of accessing
+the commit's tree object. Commits cannot be dereferenced recursively.
++
+Unless otherwise specified, "dereferencing" as it used in the context of Git
+commands or protocols is implicitly recursive.
+
 [[def_detached_HEAD]]detached HEAD::
 	Normally the <<def_HEAD,HEAD>> stores the name of a
 	<<def_branch,branch>>, and commands that operate on the
@@ -444,6 +462,10 @@ exclude;;
 	of the logical predecessor(s) in the line of development, i.e. its
 	parents.
 
+[[def_peel]]peel::
+	The action of recursively <<def_dereference,dereferencing>> a
+	<<def_tag_object,tag object>>.
+
 [[def_pickaxe]]pickaxe::
 	The term <<def_pickaxe,pickaxe>> refers to an option to the diffcore
 	routines that help select changes that add or delete a given text
@@ -620,12 +642,11 @@ The most notable example is `HEAD`.
 	copies of) commit objects of the contained submodules.
 
 [[def_symref]]symref::
-	Symbolic reference: instead of containing the <<def_SHA1,SHA-1>>
-	id itself, it is of the format 'ref: refs/some/thing' and when
-	referenced, it recursively dereferences to this reference.
-	'<<def_HEAD,HEAD>>' is a prime example of a symref. Symbolic
-	references are manipulated with the linkgit:git-symbolic-ref[1]
-	command.
+	Symbolic reference: instead of containing the <<def_SHA1,SHA-1>> id
+	itself, it is of the format 'ref: refs/some/thing' and when referenced,
+	it recursively <<def_dereference,dereferences>> to this reference.
+	'<<def_HEAD,HEAD>>' is a prime example of a symref. Symbolic references
+	are manipulated with the linkgit:git-symbolic-ref[1] command.
 
 [[def_tag]]tag::
 	A <<def_ref,ref>> under `refs/tags/` namespace that points to an
@@ -661,11 +682,11 @@ The most notable example is `HEAD`.
 	<<def_tree,tree>> is equivalent to a <<def_directory,directory>>.
 
 [[def_tree-ish]]tree-ish (also treeish)::
-	A <<def_tree_object,tree object>> or an <<def_object,object>>
-	that can be recursively dereferenced to a tree object.
-	Dereferencing a <<def_commit_object,commit object>> yields the
-	tree object corresponding to the <<def_revision,revision>>'s
-	top <<def_directory,directory>>.
+	A <<def_tree_object,tree object>> or an <<def_object,object>> that can
+	be recursively <<def_dereference,dereferenced>> to a tree object.
+	Dereferencing a <<def_commit_object,commit object>> yields the tree
+	object corresponding to the <<def_revision,revision>>'s top
+	<<def_directory,directory>>.
 	The following are all tree-ishes:
 	a <<def_commit-ish,commit-ish>>,
 	a tree object,

base-commit: dadef801b365989099a9929e995589e455c51fed
-- 
gitgitgadget

^ permalink raw reply related

* Re: [PATCH 02/13] Enable builds for z/OS.
From: brian m. carlson @ 2023-11-13 23:03 UTC (permalink / raw)
  To: Haritha D via GitGitGadget; +Cc: git, Haritha
In-Reply-To: <098b9ca8ece4fdce45a9b48e576b474ed81dced1.1699871056.git.gitgitgadget@gmail.com>

[-- Attachment #1: Type: text/plain, Size: 10819 bytes --]

On 2023-11-13 at 10:24:04, Haritha D via GitGitGadget wrote:
> From: Haritha D <harithamma.d@ibm.com>
> 
> This commit enables git to build on z/OS.
> It takes advantage of enahanced ASCII
> services on z/OS to auto-convert input
> files to ASCII

In general, we don't want to convert files to ASCII, since we assume
text files are UTF-8 unless otherwise specified.  If you're suggesting
that your system is normally EBCDIC, changing that should be an initial
piece of setup code or something used in `xopen` on your platform so it
applies everywhere with minimal changes.

> It also adds support for
> [platform]-working-tree-encoding.
> Platform is substituted with uname_info.sysname,
> so it will only apply to the given platform.

I think this is going to need to be a separate patch and feature with
its own explanation of why it's valuable.

> Also adds support for scripts that are not in
> standard locations so that /bin/env bash
> can be specified.

> Signed-off-by: Harithamma D <harithamma.d@ibm.com>
> ---
>  Makefile              | 21 +++++++++---
>  builtin.h             |  3 ++
>  builtin/archive.c     |  6 ++++
>  builtin/hash-object.c | 28 +++++++++++++++
>  combine-diff.c        |  4 +++
>  config.c              |  7 ++++
>  configure.ac          |  3 ++
>  convert.c             | 44 ++++++++++++++++++++----
>  copy.c                |  3 ++
>  diff.c                | 11 ++++++
>  entry.c               | 26 ++++++++++++++
>  environment.c         |  3 ++
>  git-compat-util.h     |  8 +++++
>  negotiator/default.c  |  4 +--
>  negotiator/noop.c     |  4 +--
>  negotiator/skipping.c |  4 +--
>  object-file.c         | 80 ++++++++++++++++++++++++++++++++++++++++++-
>  read-cache.c          |  3 ++
>  utf8.c                | 11 ++++++
>  19 files changed, 255 insertions(+), 18 deletions(-)
> 
> diff --git a/Makefile b/Makefile
> index 9c6a2f125f8..30aa76da4f4 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -20,6 +20,8 @@ include shared.mak
>  #
>  # Define SHELL_PATH to a POSIX shell if your /bin/sh is broken.
>  #
> +# Define SHELL_PATH_FOR_SCRIPTS to a POSIX shell if your /bin/sh is broken.
> +#
>  # Define SANE_TOOL_PATH to a colon-separated list of paths to prepend
>  # to PATH if your tools in /usr/bin are broken.
>  #
> @@ -215,6 +217,8 @@ include shared.mak
>  #
>  # Define PERL_PATH to the path of your Perl binary (usually /usr/bin/perl).
>  #
> +# Define PERL_PATH_FOR_SCRIPTS to a Perl binary if your /usr/bin/perl is broken.

You will probably want to explain in your commit message why these two
are required to be different from the standard values and explain here
what the relevant difference is so that users can set them
appropriately.

> diff --git a/builtin/archive.c b/builtin/archive.c
> index 90761fdfee0..53ec794356f 100644
> --- a/builtin/archive.c
> +++ b/builtin/archive.c
> @@ -14,6 +14,12 @@
>  static void create_output_file(const char *output_file)
>  {
>  	int output_fd = xopen(output_file, O_CREAT | O_WRONLY | O_TRUNC, 0666);
> +#ifdef __MVS__
> + #if (__CHARSET_LIB == 1)
> +	if (setbinaryfd(output_fd))
> +		die_errno(_("could not tag archive file '%s'"), output_file);
> + #endif
> +#endif

This would be better to place in `xopen` itself so that all files are
correctly configured.  That would do well as its own patch, and you'd
want to explain well in the commit message what this function does, why
it's necessary, and what the consequences of not using it are, as well
as any alternatives that you've rejected.

>  	if (output_fd != 1) {
>  		if (dup2(output_fd, 1) < 0)
>  			die_errno(_("could not redirect output"));
> diff --git a/builtin/hash-object.c b/builtin/hash-object.c
> index 5ffec99dcea..b33b32ff977 100644
> --- a/builtin/hash-object.c
> +++ b/builtin/hash-object.c
> @@ -57,11 +57,39 @@ static void hash_fd(int fd, const char *type, const char *path, unsigned flags,
>  	maybe_flush_or_die(stdout, "hash to stdout");
>  }
>  
> +#ifdef __MVS__
> +#  if (__CHARSET_LIB == 1)
> +#  include <stdio.h>
> +#  include <stdlib.h>

We typically don't include the standard headers here.  Instead, they're
included by git-compat-util.h because on some systems they have to be
included in a certain order with certain options.  If you include that
header instead at the top of the file, or one of the headers that
includes it, then typically that should do the right thing.

> +   int setbinaryfd(int fd)
> +   {
> +     attrib_t attr;
> +     int rc;
> +
> +     memset(&attr, 0, sizeof(attr));
> +     attr.att_filetagchg = 1;
> +     attr.att_filetag.ft_ccsid = FT_BINARY;
> +     attr.att_filetag.ft_txtflag = 0;
> +
> +     rc = __fchattr(fd, &attr, sizeof(attr));
> +     return rc;
> +   }
> +#  endif
> +#endif

I would think a comment explaining what this function does and why it's
necessary would be appropriate, since it's not in POSIX and isn't
typically necessary on POSIX systems.

> diff --git a/combine-diff.c b/combine-diff.c
> index f90f4424829..73445a517c7 100644
> --- a/combine-diff.c
> +++ b/combine-diff.c
> @@ -1082,6 +1082,10 @@ static void show_patch_diff(struct combine_diff_path *elem, int num_parent,
>  			ssize_t done;
>  			int is_file, i;
>  
> +#ifdef __MVS__
> +      __disableautocvt(fd);
> +#endif

Again, if this can be centralized, it should be, and it should be
explained well in the commit message.  I'm uncertain what it does or
what value it provides.

> diff --git a/config.c b/config.c
> index f9a1cca4e8a..37c124a37c0 100644
> --- a/config.c
> +++ b/config.c
> @@ -1521,6 +1521,13 @@ static int git_default_core_config(const char *var, const char *value,
>  		return 0;
>  	}
>  
> +	#ifdef __MVS__
> +	if (!strcmp(var, "core.ignorefiletags")) {
> +		ignore_file_tags = git_config_bool(var, value);
> +		return 0;
> +	}
> +	#endif

This should also live in its own patch and the commit message should
explain what it does.  We'd also want it to be documented in the config
options in the Documentation directory.

> diff --git a/convert.c b/convert.c
> index a8870baff36..4f14ff6f1ed 100644
> --- a/convert.c
> +++ b/convert.c
> @@ -377,12 +377,15 @@ static int check_roundtrip(const char *enc_name)
>  static const char *default_encoding = "UTF-8";
>  
>  static int encode_to_git(const char *path, const char *src, size_t src_len,
> -			 struct strbuf *buf, const char *enc, int conv_flags)
> +			 struct strbuf *buf, const char *enc, enum convert_crlf_action attr_action, int conv_flags)
>  {
>  	char *dst;
>  	size_t dst_len;
>  	int die_on_error = conv_flags & CONV_WRITE_OBJECT;
>  
> +  if (attr_action == CRLF_BINARY) {
> +    return 0;
> +  }

I'm pretty sure this is a change in behaviour from what we had before.
It should live in its own patch, with an explanation in the commit
message why it's a compelling and correct change overall, and with
suitable tests.

>  	/*
>  	 * No encoding is specified or there is nothing to encode.
>  	 * Tell the caller that the content was not modified.
> @@ -403,6 +406,11 @@ static int encode_to_git(const char *path, const char *src, size_t src_len,
>  		return 0;
>  
>  	trace_encoding("source", path, enc, src, src_len);
> +#ifdef __MVS__
> +  // Don't convert ISO8859-1 on z/OS
> +  if (strcasecmp("ISO8859-1", enc) == 0)
> +    return 0;
> +#endif

This definitely needs explanation in the commit message and should
probably be its own patch, explaining why z/OS has this compelling need
to not convert ISO8859-1.  Note that ISO8859-1 is not the same as "no
binary conversion", since it doesn't include many control codes.

Note that if, as it says later on, this really means "UTF-8", that's a
platform wart you'd want to paper over in a file in the compat code.  In
general, the compat directory is a good place to put anything that your
platform needs specifically.

> +static const char* get_platform() {
> +	struct utsname uname_info;
> +
> +	if (uname(&uname_info))
> +		die(_("uname() failed with error '%s' (%d)\n"),
> +			    strerror(errno),
> +			    errno);
> +
> +  if (!strcmp(uname_info.sysname, "OS/390"))
> +    return "zos";
> +  return uname_info.sysname;
> +}

This is definitely a new feature, and I'm not sure why it's necessary or
useful.  I suspect there's something about z/OS that makes it valuable,
but I don't know what it is since the commit message doesn't tell me.
I'm also not sure that these values will be correct on Windows.

I think I could go on to make similar comments about the rest of this
series.  I'm not opposed to seeing z/OS changes come in, but you've
amalgamated at least a half-dozen separate patches into one and haven't
explained them very thoroughly in the commit message.

I'd generally want to look at the commit message and understand the
problem the code is trying to solve and then look at the code and think,
"Oh, yes, this seems like the obvious and logical way to solve this
problem," or at least think, "Oh, no, I think we should solve this
problem in a different way," so I can help make a thoughtful review
comment.  Right now, I lack the information to have an informed opinion
and so I can't provide any helpful feedback or analysis of the patches.

When you're adding new features or fixing bugs, we'll also want tests
for those cases to help us avoid regressing that code in the future.
Even if we don't normally run the testsuite on z/OS, at least _you_ will
notice that the tests have failed and then we'll be able to address the
bugs in a timely manner.

I also noted that there were some fixup commits later on in the series
that address whitespace issues.  Typically, we'd want to squash those in
to the earlier patches.  Nobody expects perfection, but squashing errors
into earlier patches helps us keep the history neat and let us pretend
like you never made those errors at all.  It also lets tools like git
blame and git bisect work more nicely for users.

I'd recommend a quick pass over the SubmittingPatches file, which is
also available at https://git-scm.com/docs/SubmittingPatches.  The
sections on making separate commits for separate changes and describing
changes well come to mind as places to focus.

I know this may seem overwhelming and like I'm upset or disappointed;
I'm definitely not.  I'm very much interested in seeing Git available
for more platforms, but right now it's too hard for me to reason about
the changes for z/OS to provide helpful feedback, so I'm hoping you can
send a fixed v2 that helps me (and everyone else) understand these
changes better so you can get a helpful review.
-- 
brian m. carlson (he/him or they/them)
Toronto, Ontario, CA

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 262 bytes --]

^ permalink raw reply

* Re: [PATCH v5 5/5] builtin/merge-tree.c: implement support for `--write-pack`
From: Taylor Blau @ 2023-11-13 22:34 UTC (permalink / raw)
  To: Jeff King
  Cc: Elijah Newren, git, Eric W. Biederman, Junio C Hamano,
	Patrick Steinhardt
In-Reply-To: <20231113220254.GA2065691@coredump.intra.peff.net>

On Mon, Nov 13, 2023 at 05:02:54PM -0500, Jeff King wrote:
> On Fri, Nov 10, 2023 at 03:51:18PM -0800, Elijah Newren wrote:
>
> > This is unsafe; the object may need to be read later within the same
> > merge. [...]
> >
> > I think (untested) that you could fix this by creating two packs
> > instead of just one.  In particular, add a call to
> > flush_odb_transcation() after the "redo_after_renames" block in
> > merge_ort_nonrecursive_internal().
>
> It might not be too hard to just let in-process callers access the
> objects we've written. A quick and dirty patch is below, which works
> with the test you suggested (the test still fails because it finds a
> conflict, but it gets past the "woah, I can't find that sha1" part).

That's a very slick idea, and I think that this series has some legs to
stand on now as a result.

There are a few of interesting conclusions we can draw here:

  1. This solves the recursive merge problem that Elijah mentioned
     earlier where we could generate up to 2^N packs (where N is the
     maximum depth of the recursive merge).

  2. This also solves the case where the merge-ort code needs to read an
     object that it wrote earlier on in the same process without having
     to flush out intermediate packs. So in the best case we need only a
     single pack (instead of two).

  3. This also solves the 'replay' issue, *and* allows us to avoid the
     tmp-objdir thing there entirely, since we can simulate object reads
     in the bulk-checkin pack.

So I think that this is a direction worth pursuing.

In terms of making those lookups faster, I think that what you'd want is
an oidmap. The overhead is slightly unfortunate, but I think that any
other solution which requires keeping the written array in sorted order
would in practice be more expensive as you have to constantly reallocate
and copy portions of the array around to maintain its invariant.

> I don't know if that is sufficient, though. Would other spawned
> processes (hooks, external merge drivers, and so on) need to be able to
> see these objects, too?

Interesting point. In theory those processes could ask about newly
created objects, and if they were spawned before the bulk-checkin pack
was flushed, those lookups would fail.

But that requires that, e.g. for hooks, that we know a-priori the object
ID of some newly-written objects. If you wanted to make those lookups
succeed, I think there are a couple of options:

  - teach child-processes about the bulk-checkin pack, and let them
    perform the same fake lookup

  - flush (but do not close) the bulk-checkin transaction

In any event, I think that this is a sufficiently rare and niche case
that we'd be OK to declare that you should not expect the above
scenarios to work when using `--write-pack`. If someone does ask for
that feature in the future, we could implement it relatively painlessly
using one of the above options.

Thanks,
Taylor

^ permalink raw reply

* [PATCH] completion: add and use the __git_get_config_subsection helper function
From: SZEDER Gábor @ 2023-11-13 22:25 UTC (permalink / raw)
  To: git; +Cc: Philippe Blain, Junio C Hamano, SZEDER Gábor

Our Bash completion script recently learned to complete configured
trailer key aliases for 'git config --trailer=<TAB>', but the helper
function extracting the key alias from 'git config's output, i.e. the
subsection from 'trailer.*.key', ended up more complex than necessary,
with considerable overhead from executing four external processes in a
pipe.

Replace those commands in the pipe with a simple shell loop using only
a pair of parameter expansions and a builtin 'echo', which is easier
to understand and should perform better (I assume that users don't
have that many subsections in any particular section to make the
processing with an external process (let alone four) worth it).

And while at it, let's extract this loop into a generic
__git_get_config_subsections() helper function, as it might be useful
elsewhere in the future as well (at the moment it isn't, AFAICT).

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
---
 contrib/completion/git-completion.bash | 14 +++++++++++++-
 t/t9902-completion.sh                  | 13 +++++++++++++
 2 files changed, 26 insertions(+), 1 deletion(-)

diff --git a/contrib/completion/git-completion.bash b/contrib/completion/git-completion.bash
index 13a39ebd2e..34bbb66f85 100644
--- a/contrib/completion/git-completion.bash
+++ b/contrib/completion/git-completion.bash
@@ -1148,6 +1148,18 @@ __git_get_config_variables ()
 	done
 }
 
+# Lists all subsections in the given section which contain the given
+# config variable, with the section and variable names removed.
+__git_get_config_subsections ()
+{
+	local section="$1" var="$2" i IFS=$'\n'
+	for i in $(__git config --name-only --get-regexp "^$section\..*\.$var$"); do
+		i=${i#$section.}
+		i=${i%.$var}
+		echo "$i"
+	done
+}
+
 __git_pretty_aliases ()
 {
 	__git_get_config_variables "pretty"
@@ -1681,7 +1693,7 @@ __git_untracked_file_modes="all no normal"
 
 __git_trailer_tokens ()
 {
-	__git config --name-only --get-regexp '^trailer\..*\.key$' | cut -d. -f 2- | rev | cut -d. -f2- | rev
+	__git_get_config_subsections trailer key
 }
 
 _git_commit ()
diff --git a/t/t9902-completion.sh b/t/t9902-completion.sh
index a7c3b4eb63..11ed83d0ed 100755
--- a/t/t9902-completion.sh
+++ b/t/t9902-completion.sh
@@ -2130,6 +2130,19 @@ test_expect_success '__git_get_config_variables' '
 	test_cmp expect actual
 '
 
+test_expect_success '__git_get_config_subsections' '
+	cat >expect <<-\EOF &&
+	subsection-1
+	SubSection-2
+	sub.section.3
+	EOF
+	test_config interesting.subsection-1.name good &&
+	test_config Interesting.SubSection-2.Name good &&
+	test_config interesting.sub.section.3.name good &&
+	__git_get_config_subsections interesting name >actual &&
+	test_cmp expect actual
+'
+
 test_expect_success '__git_pretty_aliases' '
 	cat >expect <<-EOF &&
 	author
-- 
2.43.0.rc1.528.g8f9d60d041


^ permalink raw reply related

* Re: [PATCH 1/1] attr: add native file mode values support
From: Junio C Hamano @ 2023-11-13 22:22 UTC (permalink / raw)
  To: Torsten Bögershausen; +Cc: Joanna Wang, git
In-Reply-To: <20231113165031.GA28778@tb-raspi4>

Torsten Bögershausen <tboegi@web.de> writes:

> On Sat, Nov 11, 2023 at 04:03:08AM +0000, Joanna Wang wrote:
>
> Some thoughts and comments inline...
>
>> Gives all paths inherent 'mode' attribute values based on the paths'
>> modes (one of 100644, 100755, 120000, 040000, 160000). Users may use
>> this feature to filter by file types. For example a pathspec such as
>> ':(attr:mode=160000)' could filter for submodules without needing
>
> My spontanous feeling is that filetype may be another choice:
>> ':(attr:filetype=160000)' could filter for submodules without needing

I do agree that "mode" invites "mode of what???" reaction, and that
a term that narrows the scope would be preferrable.  "Filemode" is a
bit questionable, though, as we give this permbits to non-files like
submodules.  "ls-tree" documentation seems to call it %(objectmode).

> And having written this, we can think using something borrowed from
> `find . -type f`
>
> :(attr:filetype=f)' or :(attr:filetype=x)' (for executable)

This would not work for submodules, though.  Naively one might want
to abuse 'd' but I suspect we would eventually want to be able to
give the mode bits to an out-of-cone directory storeed in the index
as a tree in a cone-mode sparse checkout, which would be 040000,
which deserves 'd' more than submodules.

> But then I missed the point why we need an attribute here?
> The mode is already defined by the the file system (and Git),
> is there a special reason that the user can define or re-define the
> value here ?

I think the idea is that "mode" being a too generic word can be used
for totally different purposes in existing projects and the addition
did not want to disturb their own use.  But stepping back a bit,
such an application is likely marking selected few paths with the
attribute, and paths for which "mode" was "unset" are now given
these natural "mode"; it is inevitable to crash with such uses.  If
we want to introduce "native" attributes of this kind, we would
probably need to carve out namespaces a bit more clearaly.

> May be there is, when working with pathspec.
> But then "pathspec=" could be a better construction.
> Since "mode" could make a reader think that Git does somewhat with the file
> when checking out.
> My personal hope reading "mode=100755" in .gitattributes would
> be that Git makes it executable when checking out, if if it is
> recorded in Git as 100644, probably coming from a file-system that
> doesn't support the executable bit in a Unix way.

That is not the intended way this attribute is to be used.  Perhaps
we should make it an error (or ignored) when certain built-in/native
attributes are seen in the attribute file, but again that takes some
namespace carved out to avoid crashing with end-user names.

>> If there is any existing mode attribute for a path (e.g. there is
>> !mode, -mode, mode, mode=<value> in .gitattributes) that setting will
>> take precedence over the native mode value.

Again, this has one hole, I think.  Paths that are not mentioned
(not even with "!mode") would come to the function as ATTR__UNKNOWN
and trigger the fallback behaviour, even when other paths are given
end-user specified "mode" attribute values.

^ permalink raw reply

* Re: [PATCH v5 5/5] builtin/merge-tree.c: implement support for `--write-pack`
From: Jeff King @ 2023-11-13 22:05 UTC (permalink / raw)
  To: Taylor Blau
  Cc: Elijah Newren, git, Eric W. Biederman, Junio C Hamano,
	Patrick Steinhardt
In-Reply-To: <ZU7X3N/rqCK/Y9cj@nand.local>

On Fri, Nov 10, 2023 at 08:24:44PM -0500, Taylor Blau wrote:

> > This does mean that for a recursive merge, that you'll get up to 2*N
> > packfiles, where N is the depth of the recursive merge.
> 
> We definitely want to avoid that ;-). I think there are a couple of
> potential directions forward here, but the most promising one I think is
> due to Johannes who suggests that we write loose objects into a
> temporary directory with a replace_tmp_objdir() call, and then repack
> that side directory before migrating a single pack back into the main
> object store.

I posted an alternative in response to Elijah; the general idea being to
allow the usual object-lookup code to access the in-progress pack. That
would keep us limited to a single pack.

It _might_ be a terrible idea. E.g., if you write a non-bulk object that
references a bulk one, then that non-bulk one is broken from the
perspective of other processes (until the bulk checkin is flushed). But
I think we'd always be writing to one or the other here, never
interleaving?

-Peff

^ permalink raw reply

* Re: [PATCH v5 5/5] builtin/merge-tree.c: implement support for `--write-pack`
From: Jeff King @ 2023-11-13 22:02 UTC (permalink / raw)
  To: Elijah Newren
  Cc: Taylor Blau, git, Eric W. Biederman, Junio C Hamano,
	Patrick Steinhardt
In-Reply-To: <CABPp-BEfy9VOvimP9==ry_rZXu=metOQ8s=_-XiG_Pdx9c06Ww@mail.gmail.com>

On Fri, Nov 10, 2023 at 03:51:18PM -0800, Elijah Newren wrote:

> This is unsafe; the object may need to be read later within the same
> merge. [...]
>
> I think (untested) that you could fix this by creating two packs
> instead of just one.  In particular, add a call to
> flush_odb_transcation() after the "redo_after_renames" block in
> merge_ort_nonrecursive_internal().

It might not be too hard to just let in-process callers access the
objects we've written. A quick and dirty patch is below, which works
with the test you suggested (the test still fails because it finds a
conflict, but it gets past the "woah, I can't find that sha1" part).

I don't know if that is sufficient, though. Would other spawned
processes (hooks, external merge drivers, and so on) need to be able to
see these objects, too?

The patch teaches the packfile code about the special bulk checkin pack.
It might be cleaner to invert it, and just have the bulk checkin code
register a magic packed_git (it would need to fake the .idx somehow).

diff --git a/bulk-checkin.c b/bulk-checkin.c
index bd6151ba3c..566fc36e68 100644
--- a/bulk-checkin.c
+++ b/bulk-checkin.c
@@ -30,6 +30,8 @@ static struct bulk_checkin_packfile {
 	struct pack_idx_entry **written;
 	uint32_t alloc_written;
 	uint32_t nr_written;
+
+	struct packed_git *fake_packed_git;
 } bulk_checkin_packfile;
 
 static void finish_tmp_packfile(struct strbuf *basename,
@@ -82,6 +84,7 @@ static void flush_bulk_checkin_packfile(struct bulk_checkin_packfile *state)
 
 clear_exit:
 	free(state->written);
+	free(state->fake_packed_git);
 	memset(state, 0, sizeof(*state));
 
 	strbuf_release(&packname);
@@ -530,3 +533,37 @@ void end_odb_transaction(void)
 
 	flush_odb_transaction();
 }
+
+static struct packed_git *fake_packed_git(struct bulk_checkin_packfile *state)
+{
+	struct packed_git *p = state->fake_packed_git;
+	if (!p) {
+		FLEX_ALLOC_STR(p, pack_name, "/fake/in-progress.pack");
+		state->fake_packed_git = p;
+		p->pack_fd = state->f->fd;
+		p->do_not_close = 1;
+	}
+
+	hashflush(state->f);
+	p->pack_size = state->f->total; /* maybe add 20 to simulate trailer? */
+
+	return p;
+}
+
+int bulk_checkin_pack_entry(const struct object_id *oid, struct pack_entry *e)
+{
+	size_t i;
+	/*
+	 * this really ought to have some more efficient data structure for
+	 * lookup; ditto for the existing already_written()
+	 */
+	for (i = 0; i < bulk_checkin_packfile.nr_written; i++) {
+		struct pack_idx_entry *p = bulk_checkin_packfile.written[i];
+		if (oideq(&p->oid, oid)) {
+			e->p = fake_packed_git(&bulk_checkin_packfile);
+			e->offset = p->offset;
+			return 0;
+		}
+	}
+	return -1;
+}
diff --git a/bulk-checkin.h b/bulk-checkin.h
index 89786b3954..153fe87c06 100644
--- a/bulk-checkin.h
+++ b/bulk-checkin.h
@@ -44,4 +44,7 @@ void flush_odb_transaction(void);
  */
 void end_odb_transaction(void);
 
+struct pack_entry;
+int bulk_checkin_pack_entry(const struct object_id *oid, struct pack_entry *e);
+
 #endif
diff --git a/packfile.c b/packfile.c
index 9cc0a2e37a..05194b1d9b 100644
--- a/packfile.c
+++ b/packfile.c
@@ -23,6 +23,7 @@
 #include "commit-graph.h"
 #include "pack-revindex.h"
 #include "promisor-remote.h"
+#include "bulk-checkin.h"
 
 char *odb_pack_name(struct strbuf *buf,
 		    const unsigned char *hash,
@@ -2045,6 +2046,9 @@ int find_pack_entry(struct repository *r, const struct object_id *oid, struct pa
 	struct list_head *pos;
 	struct multi_pack_index *m;
 
+	if (!bulk_checkin_pack_entry(oid, e))
+		return 1;
+
 	prepare_packed_git(r);
 	if (!r->objects->packed_git && !r->objects->multi_pack_index)
 		return 0;

^ permalink raw reply related

* Re: [PATCH] remote-curl: avoid hang if curl asks for more data after eof
From: Jonathan Tan @ 2023-11-13 21:22 UTC (permalink / raw)
  To: Jiří Hruška; +Cc: Jonathan Tan, git, Jeff King
In-Reply-To: <CAGE_+C6DJMAO0bj5QHoKBBV3gMEMtZ-ajJ9ZnDGYq6eorr-oig@mail.gmail.com>

Jiří Hruška <jirka@fud.cz> writes:
> It has been observed that under some circumstances, libcurl can call
> our `CURLOPT_READFUNCTION` callback `rpc_out()` again even after
> already getting a return value of zero (EOF) back once before.
> 
> This results in `rpc_read_from_out()` trying to read more from the
> child process pipe, because `rpc->flush_read_but_not_sent` is reset to
> false already the first time EOF is returned, and then the whole
> operation gets stuck - the child process is already trying to read
> a response back and will not write anything to the output pipe anymore,
> while the parent/remote process is now blocked waiting to read more too
> and never even finishes sending the request.

Ah, thanks for finding this bug. It sounds worthwhile to make Git more
resilient in this situation.

I'll just make some preliminary comments.

> This commit addresses the bug with the following changes:

[snip]

This seems like a long list of changes, when from the description of the
bug above, I would have assumed it sufficient to record somewhere when
the CURLOPT_READFUNCTION callback returns zero, and then always return
zero after that if the callback is ever re-invoked. If this is indeed
not sufficient, we should describe why.

Also, if multiple changes are needed, please split them into several
commits.

> A trivial solution would be to just take the line which resets the flag
> 
>     /*
>      * The line length either does not need to be sent at
>      * all or has already been completely sent. Now we can
>      * return 0, indicating EOF, meaning that the flush has
>      * been fully sent.
>      */
> -   rpc->flush_read_but_not_sent = 0;
>     return 0;
> 
> from rpc_out() and reset it only in post_rpc(), before the next time a large
> request is being sent out and rpc_out() will go into play again:
> 
>     if (large_request) {
>         ...
>         rpc->initial_buffer = 1;
> +       rpc->flush_read_but_not_sent = 0;
>         curl_easy_setopt(slot->curl, CURLOPT_READFUNCTION, rpc_out);
> 
> This way the CURLOPT_READFUNCTION would be returning zeros at the end of the
> upload as long as needed, just like fread() at the end of a real file.
> 
> Hence, the bug could be fixed with just that two-lines change.

Ah, you describe the straightforward fix here. I haven't looked at this
closely enough to see if this would work, though.

> But while trying to figure out the above, I noticed a few things that prolonged
> the time I needed to understand what was going on, so I would like to propose
> some more changes to make the code perhaps a bit easier to read for the next
> person who comes to hack on it after me.
> 
> The description of the extra modifications is in the commit message. All of
> these changes are obviously optional and naturally subjective. I think that we
> can all agree on some points (less indentation = good), but naming is hard,
> and so is balance between unclear and too verbose, or when to split all
> non-functional changes to a separate commit. So let me know if there are things
> to do differently and I will gladly obey, it is your codebase after all.

Yes, please split non-functional changes into a separate commit
(preferably one for each concern). I do envision reviewers saying "let's
put patches X, Y, and Z in, but not patches A, B, and C", so splitting
would make it easier to decide what's worthwhile to have.

> Which brings me to the next topic, testing.
> 
> Validating the fix would be trivial with a mocked libcurl, but turns out to be
> much harder with the integration-level test suite of this project.

[snip]

> But outside of that, and as far as the bundled test suite goes, I have failed to
> write a test that could validate this problem does not ever occur again.

Due to the nature of the bug and the fix, I do agree that it's hard to
test and I would be OK with including the fix without associated tests.

 

^ permalink raw reply

* Re: git-bisect reset not deleting .git/BISECT_LOG
From: Jeff King @ 2023-11-13 21:08 UTC (permalink / raw)
  To: Janik Haag; +Cc: git
In-Reply-To: <bef9d5b3-bb64-4662-8952-d000872c5244@aq0.de>

On Mon, Nov 13, 2023 at 09:42:09PM +0100, Janik Haag wrote:

> Thank you for filling out a Git bug report!
> Please answer the following questions to help us understand your issue.
> 
> What did you do before the bug happened? (Steps to reproduce your issue)
> ```bash
> git init /tmp/reproduce-bisect-warning
> cd /tmp/reproduce-bisect-warning
> touch .git/BISECT_LOG
> git bisect reset
> git switch -c log
> ```
> 
> What did you expect to happen? (Expected behavior)
> 
> `git-bisect reset` should have deleted .git/BISECT_LOG
> 
> What happened instead? (Actual behavior)
> 
> .git/BISECT_LOG is still there

I don't think this is really specific to BISECT_LOG. In "bisect reset",
we'll call bisect_clean_state(), which removes a bunch of files. The
problem in your example is that "bisect reset" sees that we are not
bisecting and bails early.

Which I can kind of see, as part of the "reset" process is to reset to
the original pre-bisect commit. If it's not given on the command line,
the default is to use BISECT_START, which of course does not exist.

  As a side note, "git bisect reset HEAD" will do what you want, because
  it skips the BISECT_START check. But obviously that's not something
  normal users should need to know about, and I think is even an
  accidental side-effect of the conversion in 5e82c3dd22
  (bisect--helper: `bisect_reset` shell function in C, 2019-01-02).

So really, you just want the "clean" part of "bisect reset", but not the
"reset". We could have a separate "bisect clean" that would do what you
want (clean state without trying to reset HEAD). But I don't think it
would be unreasonable to "reset" to just unconditionally clean. I think
it would probably just be a few lines in bisect_reset() to avoid the
early return, and skip the call to "checkout" when we have no branch to
go back to.

Maybe a good simple patch for somebody interested in getting into Git
development?

-Peff

^ permalink raw reply

* commit-graph paranoia performance, was Re: [ANNOUNCE] Git v2.43.0-rc1
From: Jeff King @ 2023-11-13 20:55 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Patrick Steinhardt, git
In-Reply-To: <xmqq8r785ev1.fsf@gitster.g>

On Thu, Nov 09, 2023 at 02:33:54AM +0900, Junio C Hamano wrote:

>  * The codepath to traverse the commit-graph learned to notice that a
>    commit is missing (e.g., corrupt repository lost an object), even
>    though it knows something about the commit (like its parents) from
>    what is in commit-graph.
>    (merge 7a5d604443 ps/do-not-trust-commit-graph-blindly-for-existence later to maint).

I happened to be timing "rev-list" for an unrelated topic today, and I
noticed that this change had a rather large effect. The commit message
for 7a5d604443 claims a 30% performance regression. But that's when
using "--topo-order", and actually writing out the result.

Running "rev-list --count" on a copy of linux.git with a fully-built
commit-graph shows that the run-time doubles:

  Benchmark 1: git.v2.42.1 rev-list --count HEAD
    Time (mean ± σ):     658.0 ms ±   5.2 ms    [User: 613.5 ms, System: 44.4 ms]
    Range (min … max):   650.2 ms … 666.0 ms    10 runs

  Benchmark 2: git.v2.43.0-rc1 rev-list --count HEAD
    Time (mean ± σ):      1.333 s ±  0.019 s    [User: 1.263 s, System: 0.069 s]
    Range (min … max):    1.302 s …  1.361 s    10 runs

  Summary
    git.v2.42.1 rev-list --count HEAD ran
      2.03 ± 0.03 times faster than git.v2.43.0-rc1 rev-list --count HEAD

Now in defense of that patch, this particular command is going to be one
of the most sensitive in terms of percent change, simply because it
isn't doing much besides walking the commits. And 650ms isn't _that_ big
in an absolute sense. But it also doesn't quite feel like nothing, even
tacked onto a command that might otherwise take 1000ms to run.

Should we default GIT_COMMIT_GRAPH_PARANOIA to "0"? Yes, some operations
might miss a breakage, but that is true of so much of Git. For day to
day commands we generally assume that the repository is not corrupted,
and avoid looking at any data we can. Other commands (like "commit-graph
verify", but maybe others) would probably want to be more careful
(either by checking this case explicitly, or by enabling the paranoia
flag themselves).

-Peff

^ permalink raw reply

* git-bisect reset not deleting .git/BISECT_LOG
From: Janik Haag @ 2023-11-13 20:42 UTC (permalink / raw)
  To: git

Thank you for filling out a Git bug report!
Please answer the following questions to help us understand your issue.

What did you do before the bug happened? (Steps to reproduce your issue)
```bash
git init /tmp/reproduce-bisect-warning
cd /tmp/reproduce-bisect-warning
touch .git/BISECT_LOG
git bisect reset
git switch -c log
```

What did you expect to happen? (Expected behavior)

`git-bisect reset` should have deleted .git/BISECT_LOG

What happened instead? (Actual behavior)

.git/BISECT_LOG is still there

What's different between what you expected and what actually happened?

git switch and some external tools like shell prompts are relying on 
that file to know if we are doing a bisect so switch printed a warning 
that we are still bisecting. Funnily enough git checkout doesn't print 
that warning.

Anything else you want to add:

In case you are wondering why I created the .git/BISECT_LOG file with 
touch instead of using the git cli, I forgot how I ended up there in the 
first place that was like a year ago and the warning didn't really 
bother me, today someone told me to look for anything in .git namend 
BISECT* and after deleting the log by hand it worked. So I figured 
git-bisect reset should probably have done that after I ended up in that 
unkown state. the content I had in BISECT_LOG before deleting it was:
```
# good: [e24028141f2] qdmr: fixup
git bisect good e24028141f278590407dd5389330f23d01c89cff
```

Please review the rest of the bug report below.
You can delete any lines you don't wish to share.

[System Info]
git version:
git version 2.42.0
cpu: x86_64
no commit associated with this build
sizeof-long: 8
sizeof-size_t: 8
shell-path: 
/nix/store/lf0wpjrj8yx4gsmw2s3xfl58ixmqk8qa-bash-5.2-p15/bin/bash
uname: Linux 6.1.61 #1-NixOS SMP PREEMPT_DYNAMIC Thu Nov  2 08:35:33 UTC 
2023 x86_64
compiler info: gnuc: 12.3
libc info: glibc: 2.38
$SHELL (typically, interactive shell): /run/current-system/sw/bin/zsh

[Enabled Hooks]

^ permalink raw reply

* Re: rev-list default order sometimes very slow (size and metadata dependent)
From: Jeff King @ 2023-11-13 20:39 UTC (permalink / raw)
  To: Kevin Lyles; +Cc: git@vger.kernel.org, allred.sean@gmail.com
In-Reply-To: <DM6PR17MB247594CF3A0511EF95893935AEADA@DM6PR17MB2475.namprd17.prod.outlook.com>

On Sat, Nov 11, 2023 at 04:17:17PM +0000, Kevin Lyles wrote:

> If the archive branch is created with author/commit dates older than the
> rest of the repository, we're able to run:
> 
>   $ git rev-list --count --all
> 
> in ~9-10 seconds on a mirror clone with a commit-graph.  However, if the
> archive branch is instead created with author/commit dates newer than the
> rest of the repository, it takes 4-5 minutes.

Thanks for providing a reproduction script. I had trouble following the
text explanation, but I can easily reproduce the issue using your
script.

Running the slow case under perf, it looks like we are spending most of
the time in commit_list_insert_by_date(). Which makes sense. The queue
of commits to visit during the traversal is kept as an ordered linked
list that uses linear search for the insertion. So the worst case to
insert N commits is quadratic.

In practice, it tends not to be a big problem because we visit commits
in roughly reverse-timestamp order as we traverse. So it's more like
O(N*width), where "width" is the average number of simultaneous lines of
history. And your archive graph is...very wide.

Playing with the commit timestamps helps your case because it just
happens to reorder the queue in such a way that we put off looking at
those huge archive merges until much later in the traversal, when our
remaining "N" is much smaller.

The right solution is obviously to switch to a better data structure.
And some of the infrastructure is in place from the last time we dealt
with a similar case:

  https://lore.kernel.org/git/HE1PR0702MB3788FCDAB764252D9CBB42E5B0560@HE1PR0702MB3788.eurprd07.prod.outlook.com/

There we started using a priority queue instead of the linked list.
That patch stopped shorting of replacing the main revs->commits
traversal list, though, as it's referenced in a ton of places (some of
which really care about its list-like nature). Here's a hacky patch that
tries to lazily convert the list to a queue. It makes your case much
faster, but it also causes a few failures in the test suite (related to
commit ordering), so obviously it's violating some assumptions
somewhere. I'm not sure if it's a fruitful direction or not.

And as you noticed, using --topo-order does not suffer from the same
problem. It follows a completely different path, which...you guessed it,
uses a priority queue. I'm not sure what the current state of things is,
but one of the promises of commit-graphs is that with stored generation
numbers, we could compute topo-order incrementally and efficiently. So
another possible route is that we'd simply enable it by default. But I'm
not sure if there's more work that needs to be done to get there, or if
there are other downsides.

diff --git a/revision.c b/revision.c
index 00d5c29bfc..62d33dbfee 100644
--- a/revision.c
+++ b/revision.c
@@ -3119,6 +3119,7 @@ void release_revisions(struct rev_info *revs)
 	clear_decoration(&revs->treesame, free);
 	line_log_free(revs);
 	oidset_clear(&revs->missing_commits);
+	clear_prio_queue(&revs->commitq);
 }

 static void add_child(struct rev_info *revs, struct commit *parent, struct commit *child)
@@ -4161,6 +4162,36 @@ static void track_linear(struct rev_info *revs, struct commit *commit)

 static struct commit *get_revision_1(struct rev_info *revs)
 {
+	/*
+	 * This is kind of hacky. We'll dynamically move commits from the
+	 * linear list over to our priority queue, which has better worst-case
+	 * behavior. Doing it here serves two purposes:
+	 *
+	 *   - we'll let all of the setup code still operate on the "commits"
+	 *     list, and then just move it over to the queue lazily. This
+	 *     probably could be done in prepare_revision_walk() or similar,
+	 *     but by doing it here we know we're catching all code paths.
+	 *
+	 *   - we'll likewise catch any code which tries to add new commits to
+	 *     the list _during_ the traversal. The main one would be
+	 *     process_parents() below, which we've taught to use the queue
+	 *     directly. But this will catch any other random bits of code,
+	 *     including callers of the revision API, which add to the
+	 *     traversal as they go.
+	 *
+	 * A cleaner solution would probably be to get rid of the "commits"
+	 * list entirely, but that's a much bigger task.
+	 */
+	if (revs->commits && !revs->topo_order) {
+		struct commit_list *p;
+		if (!revs->unsorted_input)
+			revs->commitq.compare = compare_commits_by_commit_date;
+		for (p = revs->commits; p; p = p->next)
+			prio_queue_put(&revs->commitq, p->item);
+		free_commit_list(revs->commits);
+		revs->commits = NULL;
+	}
+
 	while (1) {
 		struct commit *commit;

@@ -4169,7 +4200,7 @@ static struct commit *get_revision_1(struct rev_info *revs)
 		else if (revs->topo_walk_info)
 			commit = next_topo_commit(revs);
 		else
-			commit = pop_commit(&revs->commits);
+			commit = prio_queue_get(&revs->commitq);

 		if (!commit)
 			return NULL;
@@ -4191,7 +4222,7 @@ static struct commit *get_revision_1(struct rev_info *revs)
 				try_to_simplify_commit(revs, commit);
 			else if (revs->topo_walk_info)
 				expand_topo_walk(revs, commit);
-			else if (process_parents(revs, commit, &revs->commits, NULL) < 0) {
+			else if (process_parents(revs, commit, NULL, &revs->commitq) < 0) {
 				if (!revs->ignore_missing_links)
 					die("Failed to traverse parents of commit %s",
 						oid_to_hex(&commit->object.oid));
diff --git a/revision.h b/revision.h
index 94c43138bc..4301b825a0 100644
--- a/revision.h
+++ b/revision.h
@@ -12,6 +12,7 @@
 #include "ident.h"
 #include "list-objects-filter-options.h"
 #include "strvec.h"
+#include "prio-queue.h"

 /**
  * The revision walking API offers functions to build a list of revisions
@@ -124,6 +125,12 @@ struct rev_info {
 	struct object_array pending;
 	struct repository *repo;

+	/*
+	 * We'll use this queue during the actual traversal, rather than
+	 * adding and popping from the "commits" list.
+	 */
+	struct prio_queue commitq;
+
 	/* Parents of shown commits */
 	struct object_array boundary_commits;

^ permalink raw reply related

* [PATCH v2] ci: avoid running the test suite _twice_
From: Johannes Schindelin via GitGitGadget @ 2023-11-13 20:25 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Johannes Schindelin, Johannes Schindelin
In-Reply-To: <pull.1613.git.1699894837844.gitgitgadget@gmail.com>

From: Johannes Schindelin <johannes.schindelin@gmx.de>

This is a late amendment of 4a6e4b960263 (CI: remove Travis CI support,
2021-11-23), whereby the `.prove` file (being written by the `prove`
command that is used to run the test suite) is no longer retained
between CI builds: This feature was only ever used in the Travis CI
builds, we tried for a while to do the same in Azure Pipelines CI runs
(but I gave up on it after a while), and we never used that feature in
GitHub Actions (nor does the new GitLab CI code use it).

Retaining the Prove cache has been fragile from the start, even though
the idea seemed good at the time, the idea being that the `.prove` file
caches information about previous `prove` runs (`save`) and uses them
(`slow`) to run the tests in the order from longer-running to shorter
ones, making optimal use of the parallelism implied by `--jobs=<N>`.

However, using a Prove cache can cause some surprising behavior: When
the `prove` caches information about a test script it has run,
subsequent `prove` runs (with `--state=slow`) will run the same test
script again even if said script is not specified on the `prove`
command-line!

So far, this bug did not matter. Right until d8f416bbb87c (ci: run unit
tests in CI, 2023-11-09) did it not matter.

But starting with that commit, we invoke `prove` _twice_ in CI, once to
run the regular test suite of regression test scripts, and once to run
the unit tests. Due to the bug, the second invocation re-runs all of the
tests that were already run as part of the first invocation. This not
only wastes build minutes, it also frequently causes the `osx-*` jobs to
fail because they already take a long time and now are likely to run
into a timeout.

The worst part about it is that there is actually no benefit to keep
running with `--state=slow,save`, ever since we decided no longer to
try to reuse the Prove cache between CI runs.

So let's just drop that Prove option and live happily ever after.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
    ci: avoid running the test suite twice
    
    This
    [https://github.com/git/git/actions/runs/6845537889/job/18614840166] is
    an example of a osx-* job that times out. Here
    [https://github.com/git/git/actions/runs/6845537889/job/18614840166#step:4:839],
    it is running t0013, and here
    [https://github.com/git/git/actions/runs/6845537889/job/18614840166#step:4:2765],
    it is run again (in the middle of the entire test suite, as part of make
    unit-tests).
    
    While this tries to fix a bug uncovered by js/doc-unit-tests, to avoid
    merge conflicts, this is based on ps/ci-gitlab.
    
    Changes since v1:
    
     * Reworded the commit message, specifically avoiding a reference to a
       patch that has not been upstreamed from Git for Windows yet.

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1613%2Fdscho%2Favoid-running-test-suite-twice-in-ci-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1613/dscho/avoid-running-test-suite-twice-in-ci-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/1613

Range-diff vs v1:

 1:  c74eda36246 ! 1:  880f2c19478 ci: avoid running the test suite _twice_
     @@ Metadata
       ## Commit message ##
          ci: avoid running the test suite _twice_
      
     -    This is a late amendment of 19ec39aab54 (ci: stop linking the `prove`
     -    cache, 2022-07-10), fixing a bug that had been hidden so far.
     -
     -    The bug is that the `.prove` cache stores information about previous
     -    `prove` runs (`save`) and uses them (`slow`, to run the tests in the
     -    order from longer-running to shorter ones).
     -
     -    This bug can cause some surprising behavior: when the Prove cache
     -    contains a reference to a test script, subsequent `prove` runs (with
     -    `--state=slow`) will run the same test script again even if said script
     -    is not specified on the `prove` command-line!
     +    This is a late amendment of 4a6e4b960263 (CI: remove Travis CI support,
     +    2021-11-23), whereby the `.prove` file (being written by the `prove`
     +    command that is used to run the test suite) is no longer retained
     +    between CI builds: This feature was only ever used in the Travis CI
     +    builds, we tried for a while to do the same in Azure Pipelines CI runs
     +    (but I gave up on it after a while), and we never used that feature in
     +    GitHub Actions (nor does the new GitLab CI code use it).
     +
     +    Retaining the Prove cache has been fragile from the start, even though
     +    the idea seemed good at the time, the idea being that the `.prove` file
     +    caches information about previous `prove` runs (`save`) and uses them
     +    (`slow`) to run the tests in the order from longer-running to shorter
     +    ones, making optimal use of the parallelism implied by `--jobs=<N>`.
     +
     +    However, using a Prove cache can cause some surprising behavior: When
     +    the `prove` caches information about a test script it has run,
     +    subsequent `prove` runs (with `--state=slow`) will run the same test
     +    script again even if said script is not specified on the `prove`
     +    command-line!
      
          So far, this bug did not matter. Right until d8f416bbb87c (ci: run unit
     -    tests in CI, 2023-11-09) it did not matter.
     -
     -    But starting with that commit, we run `prove` _twice_ in CI, and with
     -    completely different sets of tests to run. Due to the bug, the second
     -    invocation re-runs all of the tests that were already run as part of the
     -    first invocation. This not only wastes build minutes, it also frequently
     -    causes the `osx-*` jobs to fail because they already take a long time
     -    and now are likely to run into a timeout.
     +    tests in CI, 2023-11-09) did it not matter.
     +
     +    But starting with that commit, we invoke `prove` _twice_ in CI, once to
     +    run the regular test suite of regression test scripts, and once to run
     +    the unit tests. Due to the bug, the second invocation re-runs all of the
     +    tests that were already run as part of the first invocation. This not
     +    only wastes build minutes, it also frequently causes the `osx-*` jobs to
     +    fail because they already take a long time and now are likely to run
     +    into a timeout.
      
          The worst part about it is that there is actually no benefit to keep
          running with `--state=slow,save`, ever since we decided no longer to


 ci/lib.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/ci/lib.sh b/ci/lib.sh
index 6dfc90d7f53..307a8df0b5a 100755
--- a/ci/lib.sh
+++ b/ci/lib.sh
@@ -281,7 +281,7 @@ else
 fi
 
 MAKEFLAGS="$MAKEFLAGS --jobs=$JOBS"
-GIT_PROVE_OPTS="--timer --jobs $JOBS --state=failed,slow,save"
+GIT_PROVE_OPTS="--timer --jobs $JOBS"
 
 GIT_TEST_OPTS="$GIT_TEST_OPTS --verbose-log -x"
 case "$CI_OS_NAME" in

base-commit: 0e3b67e2aa25edb7e1a5c999c87b52a7b3a7649a
-- 
gitgitgadget

^ permalink raw reply related

* Re: [PATCH] format-patch: fix ignored encode_email_headers for cover letter
From: Jeff King @ 2023-11-13 19:00 UTC (permalink / raw)
  To: René Scharfe; +Cc: git, Junio C Hamano, Simon Ser
In-Reply-To: <0c0d685c-29e5-462c-a743-4573a56d7e04@web.de>

On Sun, Nov 12, 2023 at 07:38:22PM +0100, René Scharfe wrote:

> > Grepping for other
> >      code that does the same thing, I see that show_log() and
> >      cmd_format_patch() copy a lot more.
> 
> show_log() copies almost half of the struct 6d167fd7cc members
> from struct rev_info!
> 
> cmd_format_patch() doesn't seem have a struct pretty_print_context,
> though...

Doh, you're right. I grepped for spots setting encode_email_headers, but
the one in cmd_format_patch() is copying it from the config-default into
the rev_info, not into the pretty-print context.

Which makes sense. It is going to call show_log() to show each commit,
which is where the value is copied into the pretty-print context.

> > (For that matter, why doesn't
> >      make_cover_letter() just use the context set up by
> >      cmd_format_patch()?).
> 
> ... which answers this question, but did you perhaps mean a different
> function?

Right, I was just confused.

> >   2. Why are we copying this stuff at all? When we introduced the
> >      pretty-print context back in 6bf139440c (clean up calling
> >      conventions for pretty.c functions, 2011-05-26), the idea was just
> >      to keep all of the format options together. But later, 6d167fd7cc
> >      (pretty: use fmt_output_email_subject(), 2017-03-01) added a
> >      pointer to the rev_info directly.
> 
> Hmm.  Was sticking the rev_info pointer in unwise?  Does it tangle up
> things that should be kept separate?  It uses force_in_body_from,
> grep_filter, sources, nr, total and subject_prefix from struct rev_info.
> That seems a lot, but is just a small fraction of its total members and
> we could just copy those we need.  Or prepare the subject string and
> pass it in, as before 6d167fd7cc.

I don't know that it was unwise. I was mostly just noting the history
because that explains why we _didn't_ simply refer to ctx->revs in
6bf139440c. Has the separation between the two been valuable since then?
I'm not sure. We do have some code paths that do not have a rev_info at
all (e.g., pp_commit_easy(), which makes an ad-hoc empty context
struct).

> > So could/should we just be using
> >      pp->rev->encode_email_headers here?
> 
> Perhaps.  If we want struct pretty_print_context to be a collection of
> named parameters, though, then it makes sense to avoid using
> complicated types to provide a nice interface to its callers, and
> struct rev_info is one of our largest structs we have.

Yeah, philosophically it may be better to keep the modules separated.
But if we end up having to just copy a ton of fields, it may not be as
practical. If we can at least factor that out into a single spot,
though, it may not be so bad.

> >      Or if that field is not always set (or we want to override some
> >      elements), should there be a single helper function to initialize
> >      the pretty_print_context from a rev_info, that could be shared
> >      between spots like show_log() and make_cover_letter()?
> 
> That would help avoid forgetting to copy something.  But copying is
> questionable in general, as you mentioned.  Given the extent of the
> overlap, would it make sense to embed struct pretty_print_context in
> struct rev_info instead?  Or a subset thereof?

I had a similar thought, but the pretty_print_context carries both input
options from the caller, as well as computed state used internally by
the pretty-print code. So you'd have to split those two up, I would
think. And now all of the pretty-print functions have to pass around
_two_ contexts.

That's more annoying, but arguably is a cleaner design (the internal
struct can be a private thing that is not even defined outside of
pretty.c). I dunno.

-Peff

^ permalink raw reply

* Re: [PATCH] ci: avoid running the test suite _twice_
From: Jeff King @ 2023-11-13 18:49 UTC (permalink / raw)
  To: Johannes Schindelin via GitGitGadget
  Cc: Josh Steadmon, Phillip Wood, git, Johannes Schindelin
In-Reply-To: <pull.1613.git.1699894837844.gitgitgadget@gmail.com>

On Mon, Nov 13, 2023 at 05:00:37PM +0000, Johannes Schindelin via GitGitGadget wrote:

> This is a late amendment of 19ec39aab54 (ci: stop linking the `prove`
> cache, 2022-07-10), fixing a bug that had been hidden so far.

We don't seem to have that commit in Junio's tree; it is only in
git-for-windows.

Not that we should not fix things if they are broken, but I am trying
to understand if git/git is experiencing the same bug. It sounds like
not yet, though from looking at 19ec39aab54, I would expect to get these
doubled runs any time we store the prove state. But maybe without that
commit our state-file symlink is going somewhere invalid, and prove
fails to actually store anything?

> But starting with that commit, we run `prove` _twice_ in CI, and with
> completely different sets of tests to run. Due to the bug, the second
> invocation re-runs all of the tests that were already run as part of the
> first invocation. This not only wastes build minutes, it also frequently
> causes the `osx-*` jobs to fail because they already take a long time
> and now are likely to run into a timeout.
> 
> The worst part about it is that there is actually no benefit to keep
> running with `--state=slow,save`, ever since we decided no longer to
> try to reuse the Prove cache between CI runs.
> 
> So let's just drop that Prove option and live happily ever after.

Yes, I think this is the right thing to do regardless. If we are not
saving the state to use between two related runs, there is no point
storing it in the first place.

I do have to wonder, though, as somebody who did not follow the
unit-test topic closely: why are the unit tests totally separate from
the rest of the suite? I would think we'd want them run from one or more
t/t*.sh scripts. That would make bugs like this impossible, but also:

  1. They'd be run via "make test", so developers don't have to remember
     to run them separately.

  2. They can be run in parallel with all of the other tests when using
     "prove -j", etc.

-Peff

^ permalink raw reply

* Re: [PATCH v2] fuzz: add new oss-fuzz fuzzer for date.c / date.h
From: Jeff King @ 2023-11-13 18:35 UTC (permalink / raw)
  To: Arthur Chan via GitGitGadget; +Cc: git, Arthur Chan
In-Reply-To: <pull.1612.v2.git.1699892568344.gitgitgadget@gmail.com>

On Mon, Nov 13, 2023 at 04:22:48PM +0000, Arthur Chan via GitGitGadget wrote:

> +	str = (char *)malloc(size + 1);
> +	if (!str)
> +		return 0;
> +	memcpy(str, data, size);
> +	str[size] = '\0';

Is it important that we avoid calling die() if the malloc fails here?

The usual way to write this in our code base is just:

  str = xmemdupz(data, size);

It's not entirely a style thing; we sometimes audit the code base
looking for computations on malloc sizes (for integer overflows) as well
as sites that should be using xmalloc and are not. Obviously we can
exclude oss-fuzz/ from such audits, but if there's no reason not to
prefer our usual style, it's one less thing to worry about.

-Peff

^ permalink raw reply

* Re: [PATCH v4 4/4] rebase: rewrite --(no-)autosquash documentation
From: Phillip Wood @ 2023-11-13 17:02 UTC (permalink / raw)
  To: Andy Koppe, git; +Cc: gitster, newren
In-Reply-To: <20231111132720.78877-6-andy.koppe@gmail.com>

Hi Andy

On 11/11/2023 13:27, Andy Koppe wrote:
> Rewrite the description of the rebase --(no-)autosquash options to try
> to make it a bit clearer. Don't use "the '...'" to refer to part of a
> commit message, mention how --interactive can be used to review the
> todo list, and add a bit more detail on commit --squash/amend.

This version looks good to me, thanks for working on it

Best Wishes

Phillip

> Signed-off-by: Andy Koppe <andy.koppe@gmail.com>
> ---
>   Documentation/git-rebase.txt | 34 ++++++++++++++++++++--------------
>   1 file changed, 20 insertions(+), 14 deletions(-)
> 
> diff --git a/Documentation/git-rebase.txt b/Documentation/git-rebase.txt
> index 10548e715c..1dd6555f66 100644
> --- a/Documentation/git-rebase.txt
> +++ b/Documentation/git-rebase.txt
> @@ -589,21 +589,27 @@ See also INCOMPATIBLE OPTIONS below.
>   
>   --autosquash::
>   --no-autosquash::
> -	When the commit log message begins with "squash! ..." or "fixup! ..."
> -	or "amend! ...", and there is already a commit in the todo list that
> -	matches the same `...`, automatically modify the todo list of
> -	`rebase`, so that the commit marked for squashing comes right after
> -	the commit to be modified, and change the action of the moved commit
> -	from `pick` to `squash` or `fixup` or `fixup -C` respectively. A commit
> -	matches the `...` if the commit subject matches, or if the `...` refers
> -	to the commit's hash. As a fall-back, partial matches of the commit
> -	subject work, too. The recommended way to create fixup/amend/squash
> -	commits is by using the `--fixup`, `--fixup=amend:` or `--fixup=reword:`
> -	and `--squash` options respectively of linkgit:git-commit[1].
> +	Automatically squash commits with specially formatted messages into
> +	previous commits being rebased.  If a commit message starts with
> +	"squash! ", "fixup! " or "amend! ", the remainder of the subject line
> +	is taken as a commit specifier, which matches a previous commit if it
> +	matches the subject line or the hash of that commit.  If no commit
> +	matches fully, matches of the specifier with the start of commit
> +	subjects are considered.
>   +
> -If the `--autosquash` option is enabled by default using the
> -configuration variable `rebase.autoSquash`, this option can be
> -used to override and disable this setting.
> +In the rebase todo list, the actions of squash, fixup and amend commits are
> +changed from `pick` to `squash`, `fixup` or `fixup -C`, respectively, and they
> +are moved right after the commit they modify.  The `--interactive` option can
> +be used to review and edit the todo list before proceeding.
> ++
> +The recommended way to create commits with squash markers is by using the
> +`--squash`, `--fixup`, `--fixup=amend:` or `--fixup=reword:` options of
> +linkgit:git-commit[1], which take the target commit as an argument and
> +automatically fill in the subject line of the new commit from that.
> ++
> +Settting configuration variable `rebase.autoSquash` to true enables
> +auto-squashing by default for interactive rebase.  The `--no-autosquash`
> +option can be used to override that setting.
>   +
>   See also INCOMPATIBLE OPTIONS below.
>   


^ permalink raw reply

* Re: [PATCH v4 3/4] rebase: test autosquash with and without -i
From: Phillip Wood @ 2023-11-13 17:02 UTC (permalink / raw)
  To: Andy Koppe, git; +Cc: gitster, newren
In-Reply-To: <20231111132720.78877-4-andy.koppe@gmail.com>

On 11/11/2023 13:27, Andy Koppe wrote:
> Amend t3415-rebase-autosquash.sh to test the --autosquash option and
> rebase.autoSquash config with and without -i.

Thanks for adding these tests. I'd be happy to see this squashed into 
the previous commit, though that is probably not worth a re-roll on its own.

Best Wishes

Phillip

> Signed-off-by: Andy Koppe <andy.koppe@gmail.com>
> ---
>   t/t3415-rebase-autosquash.sh | 38 ++++++++++++++++++++++++++----------
>   1 file changed, 28 insertions(+), 10 deletions(-)
> 
> diff --git a/t/t3415-rebase-autosquash.sh b/t/t3415-rebase-autosquash.sh
> index a364530d76..fcc40d6fe1 100755
> --- a/t/t3415-rebase-autosquash.sh
> +++ b/t/t3415-rebase-autosquash.sh
> @@ -43,7 +43,7 @@ test_auto_fixup () {
>   
>   	git tag $1 &&
>   	test_tick &&
> -	git rebase $2 -i HEAD^^^ &&
> +	git rebase $2 HEAD^^^ &&
>   	git log --oneline >actual &&
>   	if test -n "$no_squash"
>   	then
> @@ -61,15 +61,24 @@ test_auto_fixup () {
>   }
>   
>   test_expect_success 'auto fixup (option)' '
> -	test_auto_fixup final-fixup-option --autosquash
> +	test_auto_fixup fixup-option --autosquash &&
> +	test_auto_fixup fixup-option-i "--autosquash -i"
>   '
>   
> -test_expect_success 'auto fixup (config)' '
> +test_expect_success 'auto fixup (config true)' '
>   	git config rebase.autosquash true &&
> -	test_auto_fixup final-fixup-config-true &&
> +	test_auto_fixup ! fixup-config-true &&
> +	test_auto_fixup fixup-config-true-i -i &&
>   	test_auto_fixup ! fixup-config-true-no --no-autosquash &&
> +	test_auto_fixup ! fixup-config-true-i-no "-i --no-autosquash"
> +'
> +
> +test_expect_success 'auto fixup (config false)' '
>   	git config rebase.autosquash false &&
> -	test_auto_fixup ! final-fixup-config-false
> +	test_auto_fixup ! fixup-config-false &&
> +	test_auto_fixup ! fixup-config-false-i -i &&
> +	test_auto_fixup fixup-config-false-yes --autosquash &&
> +	test_auto_fixup fixup-config-false-i-yes "-i --autosquash"
>   '
>   
>   test_auto_squash () {
> @@ -87,7 +96,7 @@ test_auto_squash () {
>   	git commit -m "squash! first" -m "extra para for first" &&
>   	git tag $1 &&
>   	test_tick &&
> -	git rebase $2 -i HEAD^^^ &&
> +	git rebase $2 HEAD^^^ &&
>   	git log --oneline >actual &&
>   	if test -n "$no_squash"
>   	then
> @@ -105,15 +114,24 @@ test_auto_squash () {
>   }
>   
>   test_expect_success 'auto squash (option)' '
> -	test_auto_squash final-squash --autosquash
> +	test_auto_squash squash-option --autosquash &&
> +	test_auto_squash squash-option-i "--autosquash -i"
>   '
>   
> -test_expect_success 'auto squash (config)' '
> +test_expect_success 'auto squash (config true)' '
>   	git config rebase.autosquash true &&
> -	test_auto_squash final-squash-config-true &&
> +	test_auto_squash ! squash-config-true &&
> +	test_auto_squash squash-config-true-i -i &&
>   	test_auto_squash ! squash-config-true-no --no-autosquash &&
> +	test_auto_squash ! squash-config-true-i-no "-i --no-autosquash"
> +'
> +
> +test_expect_success 'auto squash (config false)' '
>   	git config rebase.autosquash false &&
> -	test_auto_squash ! final-squash-config-false
> +	test_auto_squash ! squash-config-false &&
> +	test_auto_squash ! squash-config-false-i -i &&
> +	test_auto_squash squash-config-false-yes --autosquash &&
> +	test_auto_squash squash-config-false-i-yes "-i --autosquash"
>   '
>   
>   test_expect_success 'misspelled auto squash' '


^ permalink raw reply

* Re: [PATCH v4 2/4] rebase: support --autosquash without -i
From: Phillip Wood @ 2023-11-13 17:01 UTC (permalink / raw)
  To: Andy Koppe, git; +Cc: gitster, newren
In-Reply-To: <20231111132720.78877-3-andy.koppe@gmail.com>

Hi Andy

On 11/11/2023 13:27, Andy Koppe wrote:
> The --autosquash option prevents preemptive fast-forwarding and triggers
> conflicts with amend backend options,

"amend" should be "apply". While this sentence is true I'm not quite 
sure how it relates to the change in this commit.

> yet it only actually performs
> auto-squashing when combined with the --interactive (or -i) option.
> 
> Remove the latter restriction and tweak the --autosquash description
> accordingly.

This seems like a reasonable change to me, thanks for working on it.

Best Wishes

Phillip

> Signed-off-by: Andy Koppe <andy.koppe@gmail.com>
> ---
>   Documentation/git-rebase.txt | 2 +-
>   builtin/rebase.c             | 4 +---
>   2 files changed, 2 insertions(+), 4 deletions(-)
> 
> diff --git a/Documentation/git-rebase.txt b/Documentation/git-rebase.txt
> index b4526ca246..10548e715c 100644
> --- a/Documentation/git-rebase.txt
> +++ b/Documentation/git-rebase.txt
> @@ -592,7 +592,7 @@ See also INCOMPATIBLE OPTIONS below.
>   	When the commit log message begins with "squash! ..." or "fixup! ..."
>   	or "amend! ...", and there is already a commit in the todo list that
>   	matches the same `...`, automatically modify the todo list of
> -	`rebase -i`, so that the commit marked for squashing comes right after
> +	`rebase`, so that the commit marked for squashing comes right after
>   	the commit to be modified, and change the action of the moved commit
>   	from `pick` to `squash` or `fixup` or `fixup -C` respectively. A commit
>   	matches the `...` if the commit subject matches, or if the `...` refers
> diff --git a/builtin/rebase.c b/builtin/rebase.c
> index a73de7892b..9f8192e0a5 100644
> --- a/builtin/rebase.c
> +++ b/builtin/rebase.c
> @@ -710,10 +710,8 @@ static int run_specific_rebase(struct rebase_options *opts)
>   	if (opts->type == REBASE_MERGE) {
>   		/* Run sequencer-based rebase */
>   		setenv("GIT_CHERRY_PICK_HELP", resolvemsg, 1);
> -		if (!(opts->flags & REBASE_INTERACTIVE_EXPLICIT)) {
> +		if (!(opts->flags & REBASE_INTERACTIVE_EXPLICIT))
>   			setenv("GIT_SEQUENCE_EDITOR", ":", 1);
> -			opts->autosquash = 0;
> -		}
>   		if (opts->gpg_sign_opt) {
>   			/* remove the leading "-S" */
>   			char *tmp = xstrdup(opts->gpg_sign_opt + 2);


^ permalink raw reply

* Re: [PATCH v4 1/4] rebase: fully ignore rebase.autoSquash without -i
From: Phillip Wood @ 2023-11-13 17:01 UTC (permalink / raw)
  To: Andy Koppe, git; +Cc: gitster, newren
In-Reply-To: <20231111132720.78877-2-andy.koppe@gmail.com>

Hi Andy

On 11/11/2023 13:27, Andy Koppe wrote:
> Setting the rebase.autoSquash config variable to true implies a couple
> of restrictions: it prevents preemptive fast-forwarding and it triggers
> conflicts with amend backend options. However, it only actually results
> in auto-squashing when combined with the --interactive (or -i) option,
> due to code in run_specific_rebase() that disables auto-squashing unless
> the REBASE_INTERACTIVE_EXPLICIT flag is set.
> 
> Doing autosquashing for rebase.autoSquash without --interactive would be
> problematic in terms of backward compatibility, but conversely, there is
> no need for the aforementioned restrictions without --interactive.
> 
> So drop the options.config_autosquash check from the conditions for
> clearing allow_preemptive_ff, as the case where it is combined with
> --interactive is already covered by the REBASE_INTERACTIVE_EXPLICIT
> flag check above it.
> 
> Also drop the "apply options are incompatible with rebase.autoSquash"
> error, because it is unreachable if it is restricted to --interactive,
> as apply options already cause an error when used with --interactive.
> Drop the tests for the error from t3422-rebase-incompatible-options.sh,
> which has separate tests for the conflicts of --interactive with apply
> options.
> 
> When neither --autosquash nor --no-autosquash are given, only set
> options.autosquash to true if rebase.autosquash is combined with
> --interactive.
> 
> Don't initialize options.config_autosquash to -1, as there is no need to
> distinguish between rebase.autoSquash being unset or explicitly set to
> false.
> 
> Finally, amend the rebase.autoSquash documentation to say it only
> affects interactive mode.

Thanks for the well reasoned explanation of the changes. I think 
documenting rebase.autosquash as only applying to interactive rebases is 
a good way forward. The code changes all look sensible to me.

Best Wishes

Phillip

> Signed-off-by: Andy Koppe <andy.koppe@gmail.com>
> ---
>   Documentation/config/rebase.txt        |  4 +++-
>   builtin/rebase.c                       | 13 ++++++-------
>   t/t3422-rebase-incompatible-options.sh | 12 ------------
>   3 files changed, 9 insertions(+), 20 deletions(-)
> 
> diff --git a/Documentation/config/rebase.txt b/Documentation/config/rebase.txt
> index 9c248accec..d59576dbb2 100644
> --- a/Documentation/config/rebase.txt
> +++ b/Documentation/config/rebase.txt
> @@ -9,7 +9,9 @@ rebase.stat::
>   	rebase. False by default.
>   
>   rebase.autoSquash::
> -	If set to true enable `--autosquash` option by default.
> +	If set to true, enable the `--autosquash` option of
> +	linkgit:git-rebase[1] by default for interactive mode.
> +	This can be overridden with the `--no-autosquash` option.
>   
>   rebase.autoStash::
>   	When set to true, automatically create a temporary stash entry
> diff --git a/builtin/rebase.c b/builtin/rebase.c
> index 043c65dccd..a73de7892b 100644
> --- a/builtin/rebase.c
> +++ b/builtin/rebase.c
> @@ -149,7 +149,6 @@ struct rebase_options {
>   		.reapply_cherry_picks = -1,             \
>   		.allow_empty_message = 1,               \
>   		.autosquash = -1,                       \
> -		.config_autosquash = -1,                \
>   		.rebase_merges = -1,                    \
>   		.config_rebase_merges = -1,             \
>   		.update_refs = -1,                      \
> @@ -1405,7 +1404,6 @@ int cmd_rebase(int argc, const char **argv, const char *prefix)
>   	if ((options.flags & REBASE_INTERACTIVE_EXPLICIT) ||
>   	    (options.action != ACTION_NONE) ||
>   	    (options.exec.nr > 0) ||
> -	    (options.autosquash == -1 && options.config_autosquash == 1) ||
>   	    options.autosquash == 1) {
>   		allow_preemptive_ff = 0;
>   	}
> @@ -1508,8 +1506,6 @@ int cmd_rebase(int argc, const char **argv, const char *prefix)
>   			if (is_merge(&options))
>   				die(_("apply options and merge options "
>   					  "cannot be used together"));
> -			else if (options.autosquash == -1 && options.config_autosquash == 1)
> -				die(_("apply options are incompatible with rebase.autoSquash.  Consider adding --no-autosquash"));
>   			else if (options.rebase_merges == -1 && options.config_rebase_merges == 1)
>   				die(_("apply options are incompatible with rebase.rebaseMerges.  Consider adding --no-rebase-merges"));
>   			else if (options.update_refs == -1 && options.config_update_refs == 1)
> @@ -1529,10 +1525,13 @@ int cmd_rebase(int argc, const char **argv, const char *prefix)
>   	options.rebase_merges = (options.rebase_merges >= 0) ? options.rebase_merges :
>   				((options.config_rebase_merges >= 0) ? options.config_rebase_merges : 0);
>   
> -	if (options.autosquash == 1)
> +	if (options.autosquash == 1) {
>   		imply_merge(&options, "--autosquash");
> -	options.autosquash = (options.autosquash >= 0) ? options.autosquash :
> -			     ((options.config_autosquash >= 0) ? options.config_autosquash : 0);
> +	} else if (options.autosquash == -1) {
> +		options.autosquash =
> +			options.config_autosquash &&
> +			(options.flags & REBASE_INTERACTIVE_EXPLICIT);
> +	}
>   
>   	if (options.type == REBASE_UNSPECIFIED) {
>   		if (!strcmp(options.default_backend, "merge"))
> diff --git a/t/t3422-rebase-incompatible-options.sh b/t/t3422-rebase-incompatible-options.sh
> index 2eba00bdf5..b40f26250b 100755
> --- a/t/t3422-rebase-incompatible-options.sh
> +++ b/t/t3422-rebase-incompatible-options.sh
> @@ -100,12 +100,6 @@ test_rebase_am_only () {
>   		test_must_fail git rebase $opt --root A
>   	"
>   
> -	test_expect_success "$opt incompatible with rebase.autosquash" "
> -		git checkout B^0 &&
> -		test_must_fail git -c rebase.autosquash=true rebase $opt A 2>err &&
> -		grep -e --no-autosquash err
> -	"
> -
>   	test_expect_success "$opt incompatible with rebase.rebaseMerges" "
>   		git checkout B^0 &&
>   		test_must_fail git -c rebase.rebaseMerges=true rebase $opt A 2>err &&
> @@ -118,12 +112,6 @@ test_rebase_am_only () {
>   		grep -e --no-update-refs err
>   	"
>   
> -	test_expect_success "$opt okay with overridden rebase.autosquash" "
> -		test_when_finished \"git reset --hard B^0\" &&
> -		git checkout B^0 &&
> -		git -c rebase.autosquash=true rebase --no-autosquash $opt A
> -	"
> -
>   	test_expect_success "$opt okay with overridden rebase.rebaseMerges" "
>   		test_when_finished \"git reset --hard B^0\" &&
>   		git checkout B^0 &&


^ permalink raw reply

* [PATCH] ci: avoid running the test suite _twice_
From: Johannes Schindelin via GitGitGadget @ 2023-11-13 17:00 UTC (permalink / raw)
  To: git; +Cc: Johannes Schindelin, Johannes Schindelin

From: Johannes Schindelin <johannes.schindelin@gmx.de>

This is a late amendment of 19ec39aab54 (ci: stop linking the `prove`
cache, 2022-07-10), fixing a bug that had been hidden so far.

The bug is that the `.prove` cache stores information about previous
`prove` runs (`save`) and uses them (`slow`, to run the tests in the
order from longer-running to shorter ones).

This bug can cause some surprising behavior: when the Prove cache
contains a reference to a test script, subsequent `prove` runs (with
`--state=slow`) will run the same test script again even if said script
is not specified on the `prove` command-line!

So far, this bug did not matter. Right until d8f416bbb87c (ci: run unit
tests in CI, 2023-11-09) it did not matter.

But starting with that commit, we run `prove` _twice_ in CI, and with
completely different sets of tests to run. Due to the bug, the second
invocation re-runs all of the tests that were already run as part of the
first invocation. This not only wastes build minutes, it also frequently
causes the `osx-*` jobs to fail because they already take a long time
and now are likely to run into a timeout.

The worst part about it is that there is actually no benefit to keep
running with `--state=slow,save`, ever since we decided no longer to
try to reuse the Prove cache between CI runs.

So let's just drop that Prove option and live happily ever after.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
    ci: avoid running the test suite twice

    This
    [https://github.com/git/git/actions/runs/6845537889/job/18614840166] is
    an example of a osx-* job that times out. Here
    [https://github.com/git/git/actions/runs/6845537889/job/18614840166#step:4:839],
    it is running t0013, and here
    [https://github.com/git/git/actions/runs/6845537889/job/18614840166#step:4:2765],
    it is run again (in the middle of the entire test suite, as part of make
    unit-tests).

    While this tries to fix a bug uncovered by js/doc-unit-tests, to avoid
    merge conflicts, this is based on ps/ci-gitlab.

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1613%2Fdscho%2Favoid-running-test-suite-twice-in-ci-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1613/dscho/avoid-running-test-suite-twice-in-ci-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/1613

 ci/lib.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/ci/lib.sh b/ci/lib.sh
index 6dfc90d7f53..307a8df0b5a 100755
--- a/ci/lib.sh
+++ b/ci/lib.sh
@@ -281,7 +281,7 @@ else
 fi

 MAKEFLAGS="$MAKEFLAGS --jobs=$JOBS"
-GIT_PROVE_OPTS="--timer --jobs $JOBS --state=failed,slow,save"
+GIT_PROVE_OPTS="--timer --jobs $JOBS"

 GIT_TEST_OPTS="$GIT_TEST_OPTS --verbose-log -x"
 case "$CI_OS_NAME" in

base-commit: 0e3b67e2aa25edb7e1a5c999c87b52a7b3a7649a
-- 
gitgitgadget

^ permalink raw reply related

* Re: [PATCH 1/1] attr: add native file mode values support
From: Torsten Bögershausen @ 2023-11-13 16:50 UTC (permalink / raw)
  To: Joanna Wang; +Cc: git
In-Reply-To: <20231111040309.2848560-1-jojwang@google.com>

On Sat, Nov 11, 2023 at 04:03:08AM +0000, Joanna Wang wrote:

Some thoughts and comments inline...

> Gives all paths inherent 'mode' attribute values based on the paths'
> modes (one of 100644, 100755, 120000, 040000, 160000). Users may use
> this feature to filter by file types. For example a pathspec such as
> ':(attr:mode=160000)' could filter for submodules without needing

My spontanous feeling is that filetype may be another choice:
> ':(attr:filetype=160000)' could filter for submodules without needing
And having written this, we can think using something borrowed from
`find . -type f`

:(attr:filetype=f)' or :(attr:filetype=x)' (for executable)

> `mode=160000` to actually be specified for each subdmoule path in
> .gitattributes. The native mode values are also reflected in
> `git check-attr` results.

But then I missed the point why we need an attribute here?
The mode is already defined by the the file system (and Git),
is there a special reason that the user can define or re-define the
value here ?
May be there is, when working with pathspec.
But then "pathspec=" could be a better construction.
Since "mode" could make a reader think that Git does somewhat with the file
when checking out.
My personal hope reading "mode=100755" in .gitattributes would
be that Git makes it executable when checking out, if if it is
recorded in Git as 100644, probably coming from a file-system that
doesn't support the executable bit in a Unix way.


> If there is any existing mode attribute for a path (e.g. there is
> !mode, -mode, mode, mode=<value> in .gitattributes) that setting will
> take precedence over the native mode value.
>
> ---
>
> I went with 'mode' because that is the word used in documentation
> (e.g. https://git-scm.com/book/sv/v2/Git-Internals-Git-Objects)
> and in the code (e.g. `ce_mode`). Please let me know what you think
> of this UX.
>
> The implementation happens within git_check_attr() method which is
> the common mathod called for getting a pathspec attribute value.
>
> The previous discussed idea did not work with `git check-attr`.
> (https://lore.kernel.org/all/CAMmZTi8swsSMcLUcW+YwUDg8GcrY_ks2+i35-nsHE3o9MNpsUQ@mail.gmail.com/).
>
> There are no tests for excluding based on pathspec attrs in subdirectories
> due to an existing bug. Since it is not specific to native mode, I thought
> it would be better to fix separately.
> https://lore.kernel.org/all/CAMmZTi89Un+bsLXdEdYs44oT8eLNp8y=Pm8ywaurcQ7ccRKGdQ@mail.gmail.com/

Reading "pathspec attrs" above it feels better to call the new attribute pathspec.
But why should a user be allowed to overwrite what we have in the index ?
That is somewhat missing in the motivation for this change,
it would be good to have this explained in the commit message.

>
>  Documentation/gitattributes.txt | 10 +++++++
>  attr.c                          | 42 ++++++++++++++++++++++++--
>  t/t0003-attributes.sh           | 40 +++++++++++++++++++++++++
>  t/t6135-pathspec-with-attrs.sh  | 53 +++++++++++++++++++++++++++++++++
>  4 files changed, 143 insertions(+), 2 deletions(-)
>
> diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
> index 8c1793c148..bb3c11f151 100644
> --- a/Documentation/gitattributes.txt
> +++ b/Documentation/gitattributes.txt
> @@ -156,6 +156,16 @@ Unspecified::
>  Any other value causes Git to act as if `text` has been left
>  unspecified.
>
> +`mode`
> +^^^^^
> +
> +This attribute is for filtering files by their file bit modes
> +(40000, 120000, 160000, 100755, 100644) and has native support in git,
> +meaning values for this attribute are natively set (e.g. mode=100644) by
> +git without having to specify them in .gitattributes. However, if
> +'mode' is set in .gitattributest for some path, that value takes precendence,
> +whether it is 'set', 'unset', 'unspecified', or some other value.
> +
>  `eol`
>  ^^^^^
>
> diff --git a/attr.c b/attr.c
> index e62876dfd3..95dc1cf695 100644
> --- a/attr.c
> +++ b/attr.c
> @@ -1240,20 +1240,58 @@ static struct object_id *default_attr_source(void)
>  	return &attr_source;
>  }
>
> +
> +/*
> + * This implements native file mode attr support.
> + * We always return the current mode of the path, not the mode stored
> + * in the index, unless the path is a directory where we check the index
> + * to see if it is a GITLINK. It is ok to rely on the index for GITLINK
> + * modes because a path cannot become a GITLINK (which is a git concept only)
> + * without having it indexed with a GITLINK mode in git.
> + */
> +static unsigned int native_mode_attr(struct index_state *istate, const char *path)
> +{
> +	struct stat st;
> +	unsigned int mode;
> +	if (lstat(path, &st))
> +		die_errno(_("unable to stat '%s'"), path);
> +	mode = canon_mode(st.st_mode);
> +	if (S_ISDIR(mode)) {
> +		int pos = index_name_pos(istate, path, strlen(path));
> +		if (pos >= 0)
> +			if (S_ISGITLINK(istate->cache[pos]->ce_mode))
> +				return istate->cache[pos]->ce_mode;
> +	}
> +	return mode;
> +}
> +
> +
>  void git_check_attr(struct index_state *istate,
>  		    const char *path,
>  		    struct attr_check *check)
>  {
>  	int i;
>  	const struct object_id *tree_oid = default_attr_source();
> +	struct strbuf sb = STRBUF_INIT;
>
>  	collect_some_attrs(istate, tree_oid, path, check);
>
>  	for (i = 0; i < check->nr; i++) {
>  		unsigned int n = check->items[i].attr->attr_nr;
>  		const char *value = check->all_attrs[n].value;
> -		if (value == ATTR__UNKNOWN)
> -			value = ATTR__UNSET;
> +		if (value == ATTR__UNKNOWN){
> +			if (strcmp(check->all_attrs[n].attr->name, "mode") == 0) {
> +				/*
> +				 * If value is ATTR_UNKNOWN then it has not been set
> +				 * anywhere with -mode (ATTR_FALSE), !mode (ATTR_UNSET),
> +				 * mode (ATTR_TRUE), or an explicit value. We fill
> +				 * value with the native mode value.
> +				 */
> +				strbuf_addf(&sb, "%06o", native_mode_attr(istate, path));
> +				value = sb.buf;
> +			} else
> +				value = ATTR__UNSET;
> +		}
>  		check->items[i].value = value;
>  	}
>  }
> diff --git a/t/t0003-attributes.sh b/t/t0003-attributes.sh
> index aee2298f01..9c2603d8e2 100755
> --- a/t/t0003-attributes.sh
> +++ b/t/t0003-attributes.sh
> @@ -19,6 +19,15 @@ attr_check () {
>  	test_must_be_empty err
>  }
>
> +attr_check_mode () {
> +	path="$1" expect="$2" git_opts="$3" &&
It would be easier to read, if each assignment is on it's on line
> +
> +	git $git_opts check-attr mode -- "$path" >actual 2>err &&
> +	echo "$path: mode: $expect" >expect &&
> +	test_cmp expect actual &&
> +	test_must_be_empty err
> +}
> +
>  attr_check_quote () {
>  	path="$1" quoted_path="$2" expect="$3" &&
>
> @@ -558,4 +567,35 @@ test_expect_success EXPENSIVE 'large attributes file ignored in index' '
>  	test_cmp expect err
>  '
>
> +test_expect_success 'submodule with .git file' '
> +	mkdir sub &&
> +	(cd sub &&
> +	git init &&
> +	mv .git .real &&
> +	echo "gitdir: .real" >.git &&
> +	test_commit first) &&
> +	git update-index --add -- sub

Style and indentation (Please use TAB for indenting)
Using sub-shells is good. Somewhat easier to read would be this:

	mkdir sub &&
	(
	  cd sub &&
	  git init &&
	  mv .git .real &&
	  echo "gitdir: .real" >.git &&
	  test_commit first
	)  &&




> +'
> +
> +test_expect_success 'native mode attributes work' '
> +	>exec && chmod +x exec && attr_check_mode exec 100755 &&
> +	>normal && attr_check_mode normal 100644 &&
> +	mkdir dir && attr_check_mode dir 040000 &&
> +	ln -s normal normal_sym && attr_check_mode normal_sym 120000 &&
> +	attr_check_mode sub 160000
> +'

We need a precondition here:
test_expect_success SYMLINKS

> +
> +test_expect_success '.gitattributes mode values take precedence' '
> +	(
> +		echo "mode_value* mode=myownmode" &&
> +		echo "mode_set* mode" &&
> +		echo "mode_unset* -mode" &&
> +		echo "mode_unspecified* !mode"
> +	) >.gitattributes &&

Can this be written using a
	cat <<-\EOF >expect &&

	EOF
expression ?

> +	>mode_value && attr_check_mode mode_value myownmode &&
> +	>mode_unset && attr_check_mode mode_unset unset &&
> +	>mode_unspecified && attr_check_mode mode_unspecified unspecified &&
> +	>mode_set && attr_check_mode mode_set set
> +'
> +
>  test_done
> diff --git a/t/t6135-pathspec-with-attrs.sh b/t/t6135-pathspec-with-attrs.sh
> index a9c1e4e0ec..fd9569d39b 100755
> --- a/t/t6135-pathspec-with-attrs.sh
> +++ b/t/t6135-pathspec-with-attrs.sh
> @@ -64,6 +64,9 @@ test_expect_success 'setup .gitattributes' '
>  	fileSetLabel label
>  	fileValue label=foo
>  	fileWrongLabel label☺
> +	mode_set* mode=1234
> +	mode_unset* -mode
> +	mode_unspecified* !mode
>  	EOF
>  	echo fileSetLabel label1 >sub/.gitattributes &&
>  	git add .gitattributes sub/.gitattributes &&
> @@ -295,4 +298,54 @@ test_expect_success 'reading from .gitattributes in a subdirectory (3)' '
>  	test_cmp expect actual
>  '
>
> +test_expect_success 'mode attr is handled correctly for overriden values' '
> +	>mode_set_1 &&
> +	>mode_unset_1 &&
> +	>mode_unspecified_1 &&
> +	>mode_regular_file_1 &&
> +
> +	git status -s ":(attr:mode=1234)mode*" >actual &&
> +	cat <<-\EOF >expect &&
> +	?? mode_set_1
> +	EOF
> +	test_cmp expect actual &&
> +
> +	git status -s ":(attr:-mode)mode*" >actual &&
> +	echo ?? mode_unset_1 >expect &&
> +	test_cmp expect actual &&
> +
> +	git status -s ":(attr:!mode)mode*" >actual &&
> +	echo ?? mode_unspecified_1 >expect &&
> +	test_cmp expect actual &&
> +
> +	git status -s ":(attr:mode=100644)mode*" >actual &&
> +	echo ?? mode_regular_file_1 >expect &&
> +	test_cmp expect actual
> +'
> +
> +test_expect_success 'mode attr values are current file modes, not indexed modes' '
> +	>mode_exec_file_1 &&
> +
> +	git status -s ":(attr:mode=100644)mode_exec_*" >actual &&
> +	echo ?? mode_exec_file_1 >expect &&
> +	test_cmp expect actual &&
> +
> +	git add mode_exec_file_1 && chmod +x mode_exec_file_1 &&
> +	git status -s ":(attr:mode=100755)mode_exec_*" >actual &&
> +	echo AM mode_exec_file_1 >expect &&
> +	test_cmp expect actual
> +'
> +
> +test_expect_success 'mode attr can be excluded' '
> +	>mode_1_regular &&
> +	>mode_1_exec  && chmod +x mode_1_exec &&
> +	git status -s ":(exclude,attr:mode=100644)" "mode_1_*" >actual &&
> +	echo ?? mode_1_exec >expect &&
> +	test_cmp expect actual &&
> +
> +	git status -s ":(exclude,attr:mode=100755)" "mode_1_*" >actual &&
> +	echo ?? mode_1_regular >expect &&
> +	test_cmp expect actual
> +'
> +
>  test_done
> --
> 2.42.0.869.gea05f2083d-goog
>
>

^ permalink raw reply

* [PATCH v2] fuzz: add new oss-fuzz fuzzer for date.c / date.h
From: Arthur Chan via GitGitGadget @ 2023-11-13 16:22 UTC (permalink / raw)
  To: git; +Cc: Arthur Chan, Arthur Chan
In-Reply-To: <pull.1612.git.1699724379458.gitgitgadget@gmail.com>

From: Arthur Chan <arthur.chan@adalogics.com>

Signed-off-by: Arthur Chan <arthur.chan@adalogics.com>
---
    fuzz: add new oss-fuzz fuzzer for date.c / date.h
    
    This patch is aimed to add a new oss-fuzz fuzzer to the oss-fuzz
    directory for fuzzing date.c / date.h in the base directory.
    
    The .gitignore of the oss-fuzz directory and the Makefile have been
    modified to accommodate the new fuzzer fuzz-date.c.
    
    Fixed the objects order in .gitignore and Makefiles and fixed some of
    the logic and formatting for the fuzz-date.c fuzzer in v2.

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1612%2Farthurscchan%2Fnew-fuzzer-date-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1612/arthurscchan/new-fuzzer-date-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/1612

Range-diff vs v1:

 1:  d43724c19d0 ! 1:  2928e2b858d fuzz: add new oss-fuzz fuzzer for date.c / date.h
     @@ Commit message
          Signed-off-by: Arthur Chan <arthur.chan@adalogics.com>
      
       ## Makefile ##
     -@@ Makefile: ETAGS_TARGET = TAGS
     +@@ Makefile: SCRIPTS = $(SCRIPT_SH_GEN) \
     + ETAGS_TARGET = TAGS
     + 
       FUZZ_OBJS += oss-fuzz/fuzz-commit-graph.o
     ++FUZZ_OBJS += oss-fuzz/fuzz-date.o
       FUZZ_OBJS += oss-fuzz/fuzz-pack-headers.o
       FUZZ_OBJS += oss-fuzz/fuzz-pack-idx.o
     -+FUZZ_OBJS += oss-fuzz/fuzz-date.o
       .PHONY: fuzz-objs
     - fuzz-objs: $(FUZZ_OBJS)
     - 
      
       ## oss-fuzz/.gitignore ##
      @@
       fuzz-commit-graph
     ++fuzz-date
       fuzz-pack-headers
       fuzz-pack-idx
     -+fuzz-date
      
       ## oss-fuzz/fuzz-date.c (new) ##
      @@
     @@ oss-fuzz/fuzz-date.c (new)
      +
      +int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size)
      +{
     -+	int type;
     -+	int time;
     ++	int local;
      +	int num;
     ++	uint16_t tz;
      +	char *str;
      +	timestamp_t ts;
      +	enum date_mode_type dmtype;
      +	struct date_mode *dm;
      +
     -+	if (size <= 8)
     -+	{
     ++	if (size <= 4)
     ++		/*
     ++		 * we use the first byte to fuzz dmtype and local,
     ++		 * then the next three bytes to fuzz tz	offset,
     ++		 * and the remainder (at least one byte) is fed
     ++		 * as end-user input to approxidate_careful().
     ++		 */
      +		return 0;
     -+	}
      +
     -+	type = (*((int *)data)) % 8;
     -+	data += 4;
     -+	size -= 4;
     ++	local = !!(*data & 0x10);
     ++	dmtype = (enum date_mode_type)(*data % DATE_UNIX);
     ++	if (dmtype == DATE_STRFTIME)
     ++		/*
     ++		 * Currently DATE_STRFTIME is not supported.
     ++		 */
     ++		return 0;
     ++	data++;
     ++	size--;
      +
     -+	time = abs(*((int *)data));
     -+	data += 4;
     -+	size -= 4;
     ++	tz = *data++;
     ++	tz = (tz << 8) | *data++;
     ++	tz = (tz << 8) | *data++;
     ++	size -= 3;
      +
     -+	str = (char *)malloc(size+1);
     ++	str = (char *)malloc(size + 1);
      +	if (!str)
     -+	{
      +		return 0;
     -+	}
      +	memcpy(str, data, size);
      +	str[size] = '\0';
      +
      +	ts = approxidate_careful(str, &num);
      +	free(str);
      +
     -+	switch(type)
     -+	{
     -+		case 0: default:
     -+			dmtype = DATE_NORMAL;
     -+			break;
     -+		case 1:
     -+			dmtype = DATE_HUMAN;
     -+			break;
     -+		case 2:
     -+			dmtype = DATE_SHORT;
     -+			break;
     -+		case 3:
     -+			dmtype = DATE_ISO8601;
     -+			break;
     -+		case 4:
     -+			dmtype = DATE_ISO8601_STRICT;
     -+			break;
     -+		case 5:
     -+			dmtype = DATE_RFC2822;
     -+			break;
     -+		case 6:
     -+			dmtype = DATE_RAW;
     -+			break;
     -+		case 7:
     -+			dmtype = DATE_UNIX;
     -+			break;
     -+	}
     -+
      +	dm = date_mode_from_type(dmtype);
     -+	dm->local = 1;
     -+	show_date(ts, time, dm);
     ++	dm->local = local;
     ++	show_date(ts, (int16_t)tz, dm);
      +
      +	date_mode_release(dm);
      +


 Makefile             |  1 +
 oss-fuzz/.gitignore  |  1 +
 oss-fuzz/fuzz-date.c | 56 ++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 58 insertions(+)
 create mode 100644 oss-fuzz/fuzz-date.c

diff --git a/Makefile b/Makefile
index 03adcb5a480..4b875ef6ce1 100644
--- a/Makefile
+++ b/Makefile
@@ -750,6 +750,7 @@ SCRIPTS = $(SCRIPT_SH_GEN) \
 ETAGS_TARGET = TAGS
 
 FUZZ_OBJS += oss-fuzz/fuzz-commit-graph.o
+FUZZ_OBJS += oss-fuzz/fuzz-date.o
 FUZZ_OBJS += oss-fuzz/fuzz-pack-headers.o
 FUZZ_OBJS += oss-fuzz/fuzz-pack-idx.o
 .PHONY: fuzz-objs
diff --git a/oss-fuzz/.gitignore b/oss-fuzz/.gitignore
index 9acb74412ef..5b954088254 100644
--- a/oss-fuzz/.gitignore
+++ b/oss-fuzz/.gitignore
@@ -1,3 +1,4 @@
 fuzz-commit-graph
+fuzz-date
 fuzz-pack-headers
 fuzz-pack-idx
diff --git a/oss-fuzz/fuzz-date.c b/oss-fuzz/fuzz-date.c
new file mode 100644
index 00000000000..a5c887bf11c
--- /dev/null
+++ b/oss-fuzz/fuzz-date.c
@@ -0,0 +1,56 @@
+#include "git-compat-util.h"
+#include "date.h"
+
+int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size);
+
+int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size)
+{
+	int local;
+	int num;
+	uint16_t tz;
+	char *str;
+	timestamp_t ts;
+	enum date_mode_type dmtype;
+	struct date_mode *dm;
+
+	if (size <= 4)
+		/*
+		 * we use the first byte to fuzz dmtype and local,
+		 * then the next three bytes to fuzz tz	offset,
+		 * and the remainder (at least one byte) is fed
+		 * as end-user input to approxidate_careful().
+		 */
+		return 0;
+
+	local = !!(*data & 0x10);
+	dmtype = (enum date_mode_type)(*data % DATE_UNIX);
+	if (dmtype == DATE_STRFTIME)
+		/*
+		 * Currently DATE_STRFTIME is not supported.
+		 */
+		return 0;
+	data++;
+	size--;
+
+	tz = *data++;
+	tz = (tz << 8) | *data++;
+	tz = (tz << 8) | *data++;
+	size -= 3;
+
+	str = (char *)malloc(size + 1);
+	if (!str)
+		return 0;
+	memcpy(str, data, size);
+	str[size] = '\0';
+
+	ts = approxidate_careful(str, &num);
+	free(str);
+
+	dm = date_mode_from_type(dmtype);
+	dm->local = local;
+	show_date(ts, (int16_t)tz, dm);
+
+	date_mode_release(dm);
+
+	return 0;
+}

base-commit: dadef801b365989099a9929e995589e455c51fed
-- 
gitgitgadget

^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox