Git development
 help / color / mirror / Atom feed
* Re: Method for Calculating Statistics of Developer Contribution to a Specified Branch.
From: Hongyi Zhao @ 2023-10-17 11:37 UTC (permalink / raw)
  To: brian m. carlson, Hongyi Zhao, Git List
In-Reply-To: <ZS2qZtYDvItovjqg@tapette.crustytoothpaste.net>

On Tue, Oct 17, 2023 at 5:26 AM brian m. carlson
<sandals@crustytoothpaste.net> wrote:
>
> On 2023-10-16 at 14:10:01, Hongyi Zhao wrote:
> > Dear Git Mailing List,
> >
> > I am a developer currently working on a project and I wanted to
> > establish statistics for each team member's contribution to a specific
> > branch.
> >
> > Say, for a user "JianboLin", I am currently using the following method:
> >
> > $ git clone https://github.com/OrderN/CONQUEST-release.git
> > $ cd CONQUEST-release
> > $ git log --author="JianboLin" --stat --summary origin/f-mlff | awk
> > 'NF ==4 && $2 =="|" && $3 ~/[0-9]+/ && $4 ~/[+-]+|[+]+|[-]+/ {s+=$3}
> > END {print s}'
> >
> > Using the above command, I am able to calculate the number of lines
> > contributed by a specific author on a specific branch, which allows me
> > to quantify the contribution to a branch by each team member.
> >
> > However, I would like to know if a more efficient or accurate method
> > exists to carry out this task. Are there any other parameters,
> > commands, or aspects I need to consider to get a more comprehensive
> > measure of contribution?
>
> Can you maybe explain what you want to measure and what your goal is in
> doing so?
>
> The problem is that lines of code isn't really that useful as a measure
> of contribution value or developer productivity, which are the reasons
> people typically measure that metric.  For example, with three lines, a
> colleague fixed a persistently difficult-to-reproduce problem which had
> been affecting many of our largest customers.  That was a very valuable
> contribution, but not very large.  I've made similar kinds of changes
> myself, both at work and in open source projects.
>
> Certainly you can compute the number of lines of code changed by a
> developer, but that is not typically a very useful metric, since it
> doesn't lead you to any interesting conclusions about the benefits or
> value of the contributions or developer in question.  However, perhaps
> you have a different goal in mind, and if you can explain what that is,
> we may be able to help you find a better way of doing it.

I want to calculate a certain developer's contribution based on
different standards of code line count and the importance of the code.

> --
> brian m. carlson (he/him or they/them)
> Toronto, Ontario, CA

Regards,
Zhao

^ permalink raw reply

* Re: Git Pathspec bug
From: Kristoffer Haugsbakk @ 2023-10-17 10:51 UTC (permalink / raw)
  To: Moritz Widmann; +Cc: git
In-Reply-To: <7368e4ad-b05b-4b8f-a13b-0a68b442e72b@tweaklab.org>

On Tue, Oct 17, 2023, at 11:45, Moritz Widmann wrote:
> I executed the following command in zsh (added `command` just to be sure 
> that there's no aliases or functions)
>
> command git submodule add 
> 'git@github.com:moritz-t-w/Godot-Onscreen-Keyboard.git' '.'
> fatal: empty string is not a valid pathspec. please use . instead if you 
> meant to match all paths
>
> Git Version: 2.42.0
>
> OS: Arch Linux

Is this the same issue?: https://stackoverflow.com/a/53441183/1725151

^ permalink raw reply

* Git Pathspec bug
From: Moritz Widmann @ 2023-10-17  9:45 UTC (permalink / raw)
  To: git

I executed the following command in zsh (added `command` just to be sure 
that there's no aliases or functions)

command git submodule add 
'git@github.com:moritz-t-w/Godot-Onscreen-Keyboard.git' '.'
fatal: empty string is not a valid pathspec. please use . instead if you 
meant to match all paths

Git Version: 2.42.0

OS: Arch Linux


^ permalink raw reply

* Re: [PATCH v3 00/17] bloom: changed-path Bloom filters v2 (& sundries)
From: Patrick Steinhardt @ 2023-10-17  8:45 UTC (permalink / raw)
  To: Taylor Blau
  Cc: git, Jonathan Tan, Junio C Hamano, Jeff King, SZEDER Gábor
In-Reply-To: <cover.1696969994.git.me@ttaylorr.com>

[-- Attachment #1: Type: text/plain, Size: 10543 bytes --]

On Tue, Oct 10, 2023 at 04:33:17PM -0400, Taylor Blau wrote:
> (Rebased onto the tip of 'master', which is 3a06386e31 (The fifteenth
> batch, 2023-10-04), at the time of writing).
> 
> This series is a reroll of the combined efforts of [1] and [2] to
> introduce the v2 changed-path Bloom filters, which fixes a bug in our
> existing implementation of murmur3 paths with non-ASCII characters (when
> the "char" type is signed).
> 
> In large part, this is the same as the previous round. But this round
> includes some extra bits that address issues pointed out by SZEDER
> Gábor, which are:
> 
>   - not reading Bloom filters for root commits
>   - corrupting Bloom filter reads by tweaking the filter settings
>     between layers.
> 
> These issues were discussed in (among other places) [3], and [4],
> respectively.
> 
> Thanks to Jonathan, Peff, and SZEDER who have helped a great deal in
> assembling these patches. As usual, a range-diff is included below.
> Thanks in advance for your
> review!

As this patch series has been sitting around without reviews for a week
I've tried my best to give it a go. Note though that this area is mostly
outside of my own comfort zone, so some of the questions and suggestions
might ultimately not apply.

Patrick

> [1]: https://lore.kernel.org/git/cover.1684790529.git.jonathantanmy@google.com/
> [2]: https://lore.kernel.org/git/cover.1691426160.git.me@ttaylorr.com/
> [3]: https://public-inbox.org/git/20201015132147.GB24954@szeder.dev/
> [4]: https://lore.kernel.org/git/20230830200218.GA5147@szeder.dev/
> 
> Jonathan Tan (4):
>   gitformat-commit-graph: describe version 2 of BDAT
>   t4216: test changed path filters with high bit paths
>   repo-settings: introduce commitgraph.changedPathsVersion
>   commit-graph: new filter ver. that fixes murmur3
> 
> Taylor Blau (13):
>   t/t4216-log-bloom.sh: harden `test_bloom_filters_not_used()`
>   revision.c: consult Bloom filters for root commits
>   commit-graph: ensure Bloom filters are read with consistent settings
>   t/helper/test-read-graph.c: extract `dump_graph_info()`
>   bloom.h: make `load_bloom_filter_from_graph()` public
>   t/helper/test-read-graph: implement `bloom-filters` mode
>   bloom: annotate filters with hash version
>   bloom: prepare to discard incompatible Bloom filters
>   commit-graph.c: unconditionally load Bloom filters
>   commit-graph: drop unnecessary `graph_read_bloom_data_context`
>   object.h: fix mis-aligned flag bits table
>   commit-graph: reuse existing Bloom filters where possible
>   bloom: introduce `deinit_bloom_filters()`
> 
>  Documentation/config/commitgraph.txt     |  26 ++-
>  Documentation/gitformat-commit-graph.txt |   9 +-
>  bloom.c                                  | 208 ++++++++++++++++-
>  bloom.h                                  |  38 +++-
>  commit-graph.c                           |  61 ++++-
>  object.h                                 |   3 +-
>  oss-fuzz/fuzz-commit-graph.c             |   2 +-
>  repo-settings.c                          |   6 +-
>  repository.h                             |   2 +-
>  revision.c                               |  26 ++-
>  t/helper/test-bloom.c                    |   9 +-
>  t/helper/test-read-graph.c               |  67 ++++--
>  t/t0095-bloom.sh                         |   8 +
>  t/t4216-log-bloom.sh                     | 272 ++++++++++++++++++++++-
>  14 files changed, 682 insertions(+), 55 deletions(-)
> 
> Range-diff against v2:
> 10:  002a06d1e9 !  1:  fe671d616c t/t4216-log-bloom.sh: harden `test_bloom_filters_not_used()`
>     @@ Commit message
>          indicating that no filters were used.
>      
>          Signed-off-by: Taylor Blau <me@ttaylorr.com>
>     -    Signed-off-by: Junio C Hamano <gitster@pobox.com>
>     -    Signed-off-by: Taylor Blau <me@ttaylorr.com>
>      
>       ## t/t4216-log-bloom.sh ##
>      @@ t/t4216-log-bloom.sh: test_bloom_filters_used () {
>  -:  ---------- >  2:  7d0fa93543 revision.c: consult Bloom filters for root commits
>  -:  ---------- >  3:  2ecc0a2d58 commit-graph: ensure Bloom filters are read with consistent settings
>  1:  5fa681b58e !  4:  17703ed89a gitformat-commit-graph: describe version 2 of BDAT
>     @@ Commit message
>          Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
>          Signed-off-by: Junio C Hamano <gitster@pobox.com>
>          Signed-off-by: Taylor Blau <me@ttaylorr.com>
>     -    Signed-off-by: Junio C Hamano <gitster@pobox.com>
>     -    Signed-off-by: Taylor Blau <me@ttaylorr.com>
>      
>       ## Documentation/gitformat-commit-graph.txt ##
>      @@ Documentation/gitformat-commit-graph.txt: All multi-byte numbers are in network byte order.
>  2:  623d840575 !  5:  94552abf45 t/helper/test-read-graph.c: extract `dump_graph_info()`
>     @@ Commit message
>          Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
>          Signed-off-by: Junio C Hamano <gitster@pobox.com>
>          Signed-off-by: Taylor Blau <me@ttaylorr.com>
>     -    Signed-off-by: Junio C Hamano <gitster@pobox.com>
>     -    Signed-off-by: Taylor Blau <me@ttaylorr.com>
>      
>       ## t/helper/test-read-graph.c ##
>      @@
>  3:  bc9d77ae60 !  6:  3d81efa27b bloom.h: make `load_bloom_filter_from_graph()` public
>     @@ Commit message
>          Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
>          Signed-off-by: Junio C Hamano <gitster@pobox.com>
>          Signed-off-by: Taylor Blau <me@ttaylorr.com>
>     -    Signed-off-by: Junio C Hamano <gitster@pobox.com>
>     -    Signed-off-by: Taylor Blau <me@ttaylorr.com>
>      
>       ## bloom.c ##
>      @@ bloom.c: static inline unsigned char get_bitmask(uint32_t pos)
>  4:  ac7008aed3 !  7:  d23cd89037 t/helper/test-read-graph: implement `bloom-filters` mode
>     @@ Commit message
>          Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
>          Signed-off-by: Junio C Hamano <gitster@pobox.com>
>          Signed-off-by: Taylor Blau <me@ttaylorr.com>
>     -    Signed-off-by: Junio C Hamano <gitster@pobox.com>
>     -    Signed-off-by: Taylor Blau <me@ttaylorr.com>
>      
>       ## t/helper/test-read-graph.c ##
>      @@ t/helper/test-read-graph.c: static void dump_graph_info(struct commit_graph *graph)
>     @@ t/helper/test-read-graph.c: int cmd__read_graph(int argc UNUSED, const char **ar
>      -	return 0;
>      +	return ret;
>       }
>     ++
>     ++
>  5:  71755ba856 !  8:  cba766f224 t4216: test changed path filters with high bit paths
>     @@ Commit message
>          Signed-off-by: Taylor Blau <me@ttaylorr.com>
>      
>       ## t/t4216-log-bloom.sh ##
>     -@@ t/t4216-log-bloom.sh: test_expect_success 'Bloom generation backfills empty commits' '
>     - 	)
>     +@@ t/t4216-log-bloom.sh: test_expect_success 'merge graph layers with incompatible Bloom settings' '
>     + 	! grep "disabling Bloom filters" err
>       '
>       
>      +get_first_changed_path_filter () {
>  6:  9768d92c0f !  9:  a08a961f41 repo-settings: introduce commitgraph.changedPathsVersion
>     @@ Commit message
>          Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
>          Signed-off-by: Junio C Hamano <gitster@pobox.com>
>          Signed-off-by: Taylor Blau <me@ttaylorr.com>
>     -    Signed-off-by: Junio C Hamano <gitster@pobox.com>
>     -    Signed-off-by: Taylor Blau <me@ttaylorr.com>
>      
>       ## Documentation/config/commitgraph.txt ##
>      @@ Documentation/config/commitgraph.txt: commitGraph.maxNewFilters::
>  7:  f911b4bfab = 10:  61d44519a5 commit-graph: new filter ver. that fixes murmur3
>  8:  35009900df ! 11:  a8c10f8de8 bloom: annotate filters with hash version
>     @@ Commit message
>          Bloom filter.
>      
>          Signed-off-by: Taylor Blau <me@ttaylorr.com>
>     -    Signed-off-by: Junio C Hamano <gitster@pobox.com>
>     -    Signed-off-by: Taylor Blau <me@ttaylorr.com>
>      
>       ## bloom.c ##
>      @@ bloom.c: int load_bloom_filter_from_graph(struct commit_graph *g,
>  9:  138bc16905 ! 12:  2ba10a4b4b bloom: prepare to discard incompatible Bloom filters
>     @@ Commit message
>          `get_or_compute_bloom_filter()`.
>      
>          Signed-off-by: Taylor Blau <me@ttaylorr.com>
>     -    Signed-off-by: Junio C Hamano <gitster@pobox.com>
>     -    Signed-off-by: Taylor Blau <me@ttaylorr.com>
>      
>       ## bloom.c ##
>      @@ bloom.c: static void init_truncated_large_filter(struct bloom_filter *filter,
> 11:  2437e62813 ! 13:  09d8669c3a commit-graph.c: unconditionally load Bloom filters
>     @@ Commit message
>          either "1" or "2".
>      
>          Signed-off-by: Taylor Blau <me@ttaylorr.com>
>     -    Signed-off-by: Junio C Hamano <gitster@pobox.com>
>     -    Signed-off-by: Taylor Blau <me@ttaylorr.com>
>      
>       ## commit-graph.c ##
>      @@ commit-graph.c: static int graph_read_bloom_data(const unsigned char *chunk_start,
> 12:  fe8fb2f5fe ! 14:  0d4f9dc4ee commit-graph: drop unnecessary `graph_read_bloom_data_context`
>     @@ Commit message
>      
>          Noticed-by: Jonathan Tan <jonathantanmy@google.com>
>          Signed-off-by: Taylor Blau <me@ttaylorr.com>
>     -    Signed-off-by: Junio C Hamano <gitster@pobox.com>
>     -    Signed-off-by: Taylor Blau <me@ttaylorr.com>
>      
>       ## commit-graph.c ##
>      @@ commit-graph.c: static int graph_read_oid_lookup(const unsigned char *chunk_start,
> 13:  825af91e11 ! 15:  1f7f27bc47 object.h: fix mis-aligned flag bits table
>     @@ Commit message
>          Bit position 23 is one column too far to the left.
>      
>          Signed-off-by: Taylor Blau <me@ttaylorr.com>
>     -    Signed-off-by: Junio C Hamano <gitster@pobox.com>
>     -    Signed-off-by: Taylor Blau <me@ttaylorr.com>
>      
>       ## object.h ##
>      @@ object.h: void object_array_init(struct object_array *array);
> 14:  593b317192 ! 16:  abbef95ae8 commit-graph: reuse existing Bloom filters where possible
>     @@ Commit message
>            commits by their generation number.
>      
>          Signed-off-by: Taylor Blau <me@ttaylorr.com>
>     -    Signed-off-by: Junio C Hamano <gitster@pobox.com>
>     -    Signed-off-by: Taylor Blau <me@ttaylorr.com>
>      
>       ## bloom.c ##
>      @@
> 15:  8bf2c9cf98 = 17:  ca362408d5 bloom: introduce `deinit_bloom_filters()`
> -- 
> 2.42.0.342.g8bb3a896ee

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* Re: [PATCH v3 13/17] commit-graph.c: unconditionally load Bloom filters
From: Patrick Steinhardt @ 2023-10-17  8:45 UTC (permalink / raw)
  To: Taylor Blau
  Cc: git, Jonathan Tan, Junio C Hamano, Jeff King, SZEDER Gábor
In-Reply-To: <09d8669c3a074e7a2ace9d650a345244b2362f7e.1696969994.git.me@ttaylorr.com>

[-- Attachment #1: Type: text/plain, Size: 4073 bytes --]

On Tue, Oct 10, 2023 at 04:33:59PM -0400, Taylor Blau wrote:
> In 9e4df4da07 (commit-graph: new filter ver. that fixes murmur3,

Nit: It's a bit funny to read this reference to a commit ID when the
commit in question is part of the same series. Isn't it likely to grow
stale?

> 2023-08-01), we began ignoring the Bloom data ("BDAT") chunk for
> commit-graphs whose Bloom filters were computed using a hash version
> incompatible with the value of `commitGraph.changedPathVersion`.
> 
> Now that the Bloom API has been hardened to discard these incompatible
> filters (with the exception of low-level APIs), we can safely load these
> Bloom filters unconditionally.
> 
> We no longer want to return early from `graph_read_bloom_data()`, and
> similarly do not want to set the bloom_settings' `hash_version` field as
> a side-effect. The latter is because we want to wait until we know which
> Bloom settings we're using (either the defaults, from the GIT_TEST
> variables, or from the previous commit-graph layer) before deciding what
> hash_version to use.
> 
> If we detect an existing BDAT chunk, we'll infer the rest of the
> settings (e.g., number of hashes, bits per entry, and maximum number of
> changed paths) from the earlier graph layer. The hash_version will be
> inferred from the previous layer as well, unless one has already been
> specified via configuration.
> 
> Once all of that is done, we normalize the value of the hash_version to
> either "1" or "2".
> 
> Signed-off-by: Taylor Blau <me@ttaylorr.com>
> ---
>  commit-graph.c | 19 ++++++++++---------
>  1 file changed, 10 insertions(+), 9 deletions(-)
> 
> diff --git a/commit-graph.c b/commit-graph.c
> index db623afd09..fa3b58e762 100644
> --- a/commit-graph.c
> +++ b/commit-graph.c
> @@ -327,12 +327,6 @@ static int graph_read_bloom_data(const unsigned char *chunk_start,
>  	uint32_t hash_version;
>  	hash_version = get_be32(chunk_start);
>  
> -	if (*c->commit_graph_changed_paths_version == -1) {
> -		*c->commit_graph_changed_paths_version = hash_version;
> -	} else if (hash_version != *c->commit_graph_changed_paths_version) {
> -		return 0;
> -	}
> -
>  	g->chunk_bloom_data = chunk_start;
>  	g->bloom_filter_settings = xmalloc(sizeof(struct bloom_filter_settings));
>  	g->bloom_filter_settings->hash_version = hash_version;
> @@ -2473,8 +2467,7 @@ int write_commit_graph(struct object_directory *odb,
>  	ctx->write_generation_data = (get_configured_generation_version(r) == 2);
>  	ctx->num_generation_data_overflows = 0;
>  
> -	bloom_settings.hash_version = r->settings.commit_graph_changed_paths_version == 2
> -		? 2 : 1;
> +	bloom_settings.hash_version = r->settings.commit_graph_changed_paths_version;
>  	bloom_settings.bits_per_entry = git_env_ulong("GIT_TEST_BLOOM_SETTINGS_BITS_PER_ENTRY",
>  						      bloom_settings.bits_per_entry);
>  	bloom_settings.num_hashes = git_env_ulong("GIT_TEST_BLOOM_SETTINGS_NUM_HASHES",
> @@ -2506,10 +2499,18 @@ int write_commit_graph(struct object_directory *odb,
>  		/* We have changed-paths already. Keep them in the next graph */
>  		if (g && g->bloom_filter_settings) {
>  			ctx->changed_paths = 1;
> -			ctx->bloom_settings = g->bloom_filter_settings;
> +
> +			/* don't propagate the hash_version unless unspecified */
> +			if (bloom_settings.hash_version == -1)
> +				bloom_settings.hash_version = g->bloom_filter_settings->hash_version;
> +			bloom_settings.bits_per_entry = g->bloom_filter_settings->bits_per_entry;
> +			bloom_settings.num_hashes = g->bloom_filter_settings->num_hashes;
> +			bloom_settings.max_changed_paths = g->bloom_filter_settings->max_changed_paths;
>  		}
>  	}
>  
> +	bloom_settings.hash_version = bloom_settings.hash_version == 2 ? 2 : 1;
> +

What if there is a future version of Git that writes Bloom filters with
hash version 3? Should we really normalize that to `1`?

Patrick

>  	if (ctx->split) {
>  		struct commit_graph *g = ctx->r->objects->commit_graph;
>  
> -- 
> 2.42.0.342.g8bb3a896ee
> 

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* Re: [PATCH v3 10/17] commit-graph: new filter ver. that fixes murmur3
From: Patrick Steinhardt @ 2023-10-17  8:45 UTC (permalink / raw)
  To: Taylor Blau
  Cc: git, Jonathan Tan, Junio C Hamano, Jeff King, SZEDER Gábor
In-Reply-To: <61d44519a5ffaf2c040198cf8d80d05a09de5de5.1696969994.git.me@ttaylorr.com>

[-- Attachment #1: Type: text/plain, Size: 17868 bytes --]

On Tue, Oct 10, 2023 at 04:33:49PM -0400, Taylor Blau wrote:
> From: Jonathan Tan <jonathantanmy@google.com>
> 
> The murmur3 implementation in bloom.c has a bug when converting series
> of 4 bytes into network-order integers when char is signed (which is
> controllable by a compiler option, and the default signedness of char is
> platform-specific). When a string contains characters with the high bit
> set, this bug causes results that, although internally consistent within
> Git, does not accord with other implementations of murmur3 (thus,
> the changed path filters wouldn't be readable by other off-the-shelf
> implementatios of murmur3) and even with Git binaries that were compiled
> with different signedness of char. This bug affects both how Git writes
> changed path filters to disk and how Git interprets changed path filters
> on disk.
> 
> Therefore, introduce a new version (2) of changed path filters that
> corrects this problem. The existing version (1) is still supported and
> is still the default, but users should migrate away from it as soon
> as possible.
> 
> Because this bug only manifests with characters that have the high bit
> set, it may be possible that some (or all) commits in a given repo would
> have the same changed path filter both before and after this fix is
> applied. However, in order to determine whether this is the case, the
> changed paths would first have to be computed, at which point it is not
> much more expensive to just compute a new changed path filter.
> 
> So this patch does not include any mechanism to "salvage" changed path
> filters from repositories. There is also no "mixed" mode - for each
> invocation of Git, reading and writing changed path filters are done
> with the same version number; this version number may be explicitly
> stated (typically if the user knows which version they need) or
> automatically determined from the version of the existing changed path
> filters in the repository.
> 
> There is a change in write_commit_graph(). graph_read_bloom_data()
> makes it possible for chunk_bloom_data to be non-NULL but
> bloom_filter_settings to be NULL, which causes a segfault later on. I
> produced such a segfault while developing this patch, but couldn't find
> a way to reproduce it neither after this complete patch (or before),
> but in any case it seemed like a good thing to include that might help
> future patch authors.
> 
> The value in t0095 was obtained from another murmur3 implementation
> using the following Go source code:
> 
>   package main
> 
>   import "fmt"
>   import "github.com/spaolacci/murmur3"
> 
>   func main() {
>           fmt.Printf("%x\n", murmur3.Sum32([]byte("Hello world!")))
>           fmt.Printf("%x\n", murmur3.Sum32([]byte{0x99, 0xaa, 0xbb, 0xcc, 0xdd, 0xee, 0xff}))
>   }
> 
> Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
> Signed-off-by: Junio C Hamano <gitster@pobox.com>
> Signed-off-by: Taylor Blau <me@ttaylorr.com>
> Signed-off-by: Junio C Hamano <gitster@pobox.com>
> Signed-off-by: Taylor Blau <me@ttaylorr.com>
> ---
>  Documentation/config/commitgraph.txt |   5 +-
>  bloom.c                              |  69 +++++++++++++++++-
>  bloom.h                              |   8 +-
>  commit-graph.c                       |  32 ++++++--
>  t/helper/test-bloom.c                |   9 ++-
>  t/t0095-bloom.sh                     |   8 ++
>  t/t4216-log-bloom.sh                 | 105 +++++++++++++++++++++++++++
>  7 files changed, 223 insertions(+), 13 deletions(-)
> 
> diff --git a/Documentation/config/commitgraph.txt b/Documentation/config/commitgraph.txt
> index 2dc9170622..acc74a2f27 100644
> --- a/Documentation/config/commitgraph.txt
> +++ b/Documentation/config/commitgraph.txt
> @@ -15,7 +15,7 @@ commitGraph.readChangedPaths::
>  
>  commitGraph.changedPathsVersion::
>  	Specifies the version of the changed-path Bloom filters that Git will read and
> -	write. May be -1, 0 or 1.
> +	write. May be -1, 0, 1, or 2.
>  +
>  Defaults to -1.
>  +
> @@ -28,4 +28,7 @@ filters when instructed to write.
>  If 1, Git will only read version 1 Bloom filters, and will write version 1
>  Bloom filters.
>  +
> +If 2, Git will only read version 2 Bloom filters, and will write version 2
> +Bloom filters.
> ++
>  See linkgit:git-commit-graph[1] for more information.
> diff --git a/bloom.c b/bloom.c
> index 3e78cfe79d..ebef5cfd2f 100644
> --- a/bloom.c
> +++ b/bloom.c
> @@ -66,7 +66,64 @@ int load_bloom_filter_from_graph(struct commit_graph *g,
>   * Not considered to be cryptographically secure.
>   * Implemented as described in https://en.wikipedia.org/wiki/MurmurHash#Algorithm
>   */
> -uint32_t murmur3_seeded(uint32_t seed, const char *data, size_t len)
> +uint32_t murmur3_seeded_v2(uint32_t seed, const char *data, size_t len)
> +{
> +	const uint32_t c1 = 0xcc9e2d51;
> +	const uint32_t c2 = 0x1b873593;
> +	const uint32_t r1 = 15;
> +	const uint32_t r2 = 13;
> +	const uint32_t m = 5;
> +	const uint32_t n = 0xe6546b64;
> +	int i;
> +	uint32_t k1 = 0;
> +	const char *tail;
> +
> +	int len4 = len / sizeof(uint32_t);
> +
> +	uint32_t k;
> +	for (i = 0; i < len4; i++) {
> +		uint32_t byte1 = (uint32_t)(unsigned char)data[4*i];
> +		uint32_t byte2 = ((uint32_t)(unsigned char)data[4*i + 1]) << 8;
> +		uint32_t byte3 = ((uint32_t)(unsigned char)data[4*i + 2]) << 16;
> +		uint32_t byte4 = ((uint32_t)(unsigned char)data[4*i + 3]) << 24;
> +		k = byte1 | byte2 | byte3 | byte4;
> +		k *= c1;
> +		k = rotate_left(k, r1);
> +		k *= c2;
> +
> +		seed ^= k;
> +		seed = rotate_left(seed, r2) * m + n;
> +	}
> +
> +	tail = (data + len4 * sizeof(uint32_t));
> +
> +	switch (len & (sizeof(uint32_t) - 1)) {
> +	case 3:
> +		k1 ^= ((uint32_t)(unsigned char)tail[2]) << 16;
> +		/*-fallthrough*/
> +	case 2:
> +		k1 ^= ((uint32_t)(unsigned char)tail[1]) << 8;
> +		/*-fallthrough*/
> +	case 1:
> +		k1 ^= ((uint32_t)(unsigned char)tail[0]) << 0;
> +		k1 *= c1;
> +		k1 = rotate_left(k1, r1);
> +		k1 *= c2;
> +		seed ^= k1;
> +		break;
> +	}
> +
> +	seed ^= (uint32_t)len;
> +	seed ^= (seed >> 16);
> +	seed *= 0x85ebca6b;
> +	seed ^= (seed >> 13);
> +	seed *= 0xc2b2ae35;
> +	seed ^= (seed >> 16);
> +
> +	return seed;
> +}
> +
> +static uint32_t murmur3_seeded_v1(uint32_t seed, const char *data, size_t len)
>  {
>  	const uint32_t c1 = 0xcc9e2d51;
>  	const uint32_t c2 = 0x1b873593;
> @@ -131,8 +188,14 @@ void fill_bloom_key(const char *data,
>  	int i;
>  	const uint32_t seed0 = 0x293ae76f;
>  	const uint32_t seed1 = 0x7e646e2c;
> -	const uint32_t hash0 = murmur3_seeded(seed0, data, len);
> -	const uint32_t hash1 = murmur3_seeded(seed1, data, len);
> +	uint32_t hash0, hash1;
> +	if (settings->hash_version == 2) {
> +		hash0 = murmur3_seeded_v2(seed0, data, len);
> +		hash1 = murmur3_seeded_v2(seed1, data, len);
> +	} else {
> +		hash0 = murmur3_seeded_v1(seed0, data, len);
> +		hash1 = murmur3_seeded_v1(seed1, data, len);
> +	}
>  
>  	key->hashes = (uint32_t *)xcalloc(settings->num_hashes, sizeof(uint32_t));
>  	for (i = 0; i < settings->num_hashes; i++)
> diff --git a/bloom.h b/bloom.h
> index 1e4f612d2c..138d57a86b 100644
> --- a/bloom.h
> +++ b/bloom.h
> @@ -8,9 +8,11 @@ struct commit_graph;
>  struct bloom_filter_settings {
>  	/*
>  	 * The version of the hashing technique being used.
> -	 * We currently only support version = 1 which is
> +	 * The newest version is 2, which is
>  	 * the seeded murmur3 hashing technique implemented
> -	 * in bloom.c.
> +	 * in bloom.c. Bloom filters of version 1 were created
> +	 * with prior versions of Git, which had a bug in the
> +	 * implementation of the hash function.
>  	 */
>  	uint32_t hash_version;
>  
> @@ -80,7 +82,7 @@ int load_bloom_filter_from_graph(struct commit_graph *g,
>   * Not considered to be cryptographically secure.
>   * Implemented as described in https://en.wikipedia.org/wiki/MurmurHash#Algorithm
>   */
> -uint32_t murmur3_seeded(uint32_t seed, const char *data, size_t len);
> +uint32_t murmur3_seeded_v2(uint32_t seed, const char *data, size_t len);
>  
>  void fill_bloom_key(const char *data,
>  		    size_t len,
> diff --git a/commit-graph.c b/commit-graph.c
> index ea677c87fb..db623afd09 100644
> --- a/commit-graph.c
> +++ b/commit-graph.c
> @@ -314,17 +314,26 @@ static int graph_read_oid_lookup(const unsigned char *chunk_start,
>  	return 0;
>  }
>  
> +struct graph_read_bloom_data_context {
> +	struct commit_graph *g;
> +	int *commit_graph_changed_paths_version;
> +};
> +
>  static int graph_read_bloom_data(const unsigned char *chunk_start,
>  				  size_t chunk_size, void *data)
>  {
> -	struct commit_graph *g = data;
> +	struct graph_read_bloom_data_context *c = data;
> +	struct commit_graph *g = c->g;
>  	uint32_t hash_version;
> -	g->chunk_bloom_data = chunk_start;
>  	hash_version = get_be32(chunk_start);
>  
> -	if (hash_version != 1)
> +	if (*c->commit_graph_changed_paths_version == -1) {
> +		*c->commit_graph_changed_paths_version = hash_version;
> +	} else if (hash_version != *c->commit_graph_changed_paths_version) {
>  		return 0;
> +	}

In case we have `c->commit_graph_changed_paths_version == -1` we lose
the check that the hash version is something that we know and support,
don't we? And while we do start to handle `-1` in the writing path, I
think we don't in the reading path unless I missed something.

> +	g->chunk_bloom_data = chunk_start;
>  	g->bloom_filter_settings = xmalloc(sizeof(struct bloom_filter_settings));
>  	g->bloom_filter_settings->hash_version = hash_version;
>  	g->bloom_filter_settings->num_hashes = get_be32(chunk_start + 4);
> @@ -412,10 +421,14 @@ struct commit_graph *parse_commit_graph(struct repo_settings *s,
>  	}
>  
>  	if (s->commit_graph_changed_paths_version) {
> +		struct graph_read_bloom_data_context context = {
> +			.g = graph,
> +			.commit_graph_changed_paths_version = &s->commit_graph_changed_paths_version
> +		};
>  		pair_chunk(cf, GRAPH_CHUNKID_BLOOMINDEXES,
>  			   &graph->chunk_bloom_indexes);
>  		read_chunk(cf, GRAPH_CHUNKID_BLOOMDATA,
> -			   graph_read_bloom_data, graph);
> +			   graph_read_bloom_data, &context);
>  	}
>  
>  	if (graph->chunk_bloom_indexes && graph->chunk_bloom_data) {
> @@ -2441,6 +2454,13 @@ int write_commit_graph(struct object_directory *odb,
>  	}
>  	if (!commit_graph_compatible(r))
>  		return 0;
> +	if (r->settings.commit_graph_changed_paths_version < -1
> +	    || r->settings.commit_graph_changed_paths_version > 2) {
> +		warning(_("attempting to write a commit-graph, but "
> +			  "'commitgraph.changedPathsVersion' (%d) is not supported"),
> +			r->settings.commit_graph_changed_paths_version);
> +		return 0;
> +	}
>  
>  	CALLOC_ARRAY(ctx, 1);
>  	ctx->r = r;
> @@ -2453,6 +2473,8 @@ int write_commit_graph(struct object_directory *odb,
>  	ctx->write_generation_data = (get_configured_generation_version(r) == 2);
>  	ctx->num_generation_data_overflows = 0;
>  
> +	bloom_settings.hash_version = r->settings.commit_graph_changed_paths_version == 2
> +		? 2 : 1;
>  	bloom_settings.bits_per_entry = git_env_ulong("GIT_TEST_BLOOM_SETTINGS_BITS_PER_ENTRY",
>  						      bloom_settings.bits_per_entry);
>  	bloom_settings.num_hashes = git_env_ulong("GIT_TEST_BLOOM_SETTINGS_NUM_HASHES",
> @@ -2482,7 +2504,7 @@ int write_commit_graph(struct object_directory *odb,
>  		g = ctx->r->objects->commit_graph;
>  
>  		/* We have changed-paths already. Keep them in the next graph */
> -		if (g && g->chunk_bloom_data) {
> +		if (g && g->bloom_filter_settings) {
>  			ctx->changed_paths = 1;
>  			ctx->bloom_settings = g->bloom_filter_settings;
>  		}
> diff --git a/t/helper/test-bloom.c b/t/helper/test-bloom.c
> index aabe31d724..3cbc0a5b50 100644
> --- a/t/helper/test-bloom.c
> +++ b/t/helper/test-bloom.c
> @@ -50,6 +50,7 @@ static void get_bloom_filter_for_commit(const struct object_id *commit_oid)
>  
>  static const char *bloom_usage = "\n"
>  "  test-tool bloom get_murmur3 <string>\n"
> +"  test-tool bloom get_murmur3_seven_highbit\n"
>  "  test-tool bloom generate_filter <string> [<string>...]\n"
>  "  test-tool bloom get_filter_for_commit <commit-hex>\n";
>  
> @@ -64,7 +65,13 @@ int cmd__bloom(int argc, const char **argv)
>  		uint32_t hashed;
>  		if (argc < 3)
>  			usage(bloom_usage);
> -		hashed = murmur3_seeded(0, argv[2], strlen(argv[2]));
> +		hashed = murmur3_seeded_v2(0, argv[2], strlen(argv[2]));
> +		printf("Murmur3 Hash with seed=0:0x%08x\n", hashed);
> +	}
> +
> +	if (!strcmp(argv[1], "get_murmur3_seven_highbit")) {
> +		uint32_t hashed;
> +		hashed = murmur3_seeded_v2(0, "\x99\xaa\xbb\xcc\xdd\xee\xff", 7);
>  		printf("Murmur3 Hash with seed=0:0x%08x\n", hashed);
>  	}
>  
> diff --git a/t/t0095-bloom.sh b/t/t0095-bloom.sh
> index b567383eb8..c8d84ab606 100755
> --- a/t/t0095-bloom.sh
> +++ b/t/t0095-bloom.sh
> @@ -29,6 +29,14 @@ test_expect_success 'compute unseeded murmur3 hash for test string 2' '
>  	test_cmp expect actual
>  '
>  
> +test_expect_success 'compute unseeded murmur3 hash for test string 3' '
> +	cat >expect <<-\EOF &&
> +	Murmur3 Hash with seed=0:0xa183ccfd
> +	EOF
> +	test-tool bloom get_murmur3_seven_highbit >actual &&
> +	test_cmp expect actual
> +'
> +
>  test_expect_success 'compute bloom key for empty string' '
>  	cat >expect <<-\EOF &&
>  	Hashes:0x5615800c|0x5b966560|0x61174ab4|0x66983008|0x6c19155c|0x7199fab0|0x771ae004|
> diff --git a/t/t4216-log-bloom.sh b/t/t4216-log-bloom.sh
> index da67c40134..8f8b5d4966 100755
> --- a/t/t4216-log-bloom.sh
> +++ b/t/t4216-log-bloom.sh
> @@ -536,4 +536,109 @@ test_expect_success 'version 1 changed-path used when version 1 requested' '
>  	)
>  '
>  
> +test_expect_success 'version 1 changed-path not used when version 2 requested' '
> +	(
> +		cd highbit1 &&
> +		git config --add commitgraph.changedPathsVersion 2 &&
> +		test_bloom_filters_not_used "-- another$CENT"
> +	)
> +'
> +
> +test_expect_success 'version 1 changed-path used when autodetect requested' '
> +	(
> +		cd highbit1 &&
> +		git config --add commitgraph.changedPathsVersion -1 &&
> +		test_bloom_filters_used "-- another$CENT"
> +	)
> +'
> +
> +test_expect_success 'when writing another commit graph, preserve existing version 1 of changed-path' '
> +	test_commit -C highbit1 c1double "$CENT$CENT" &&
> +	git -C highbit1 commit-graph write --reachable --changed-paths &&
> +	(
> +		cd highbit1 &&
> +		git config --add commitgraph.changedPathsVersion -1 &&
> +		echo "options: bloom(1,10,7) read_generation_data" >expect &&
> +		test-tool read-graph >full &&
> +		grep options full >actual &&
> +		test_cmp expect actual
> +	)
> +'
> +
> +test_expect_success 'set up repo with high bit path, version 2 changed-path' '
> +	git init highbit2 &&
> +	git -C highbit2 config --add commitgraph.changedPathsVersion 2 &&
> +	test_commit -C highbit2 c2 "$CENT" &&
> +	git -C highbit2 commit-graph write --reachable --changed-paths
> +'
> +
> +test_expect_success 'check value of version 2 changed-path' '
> +	(
> +		cd highbit2 &&
> +		echo "c01f" >expect &&
> +		get_first_changed_path_filter >actual &&
> +		test_cmp expect actual
> +	)
> +'
> +
> +test_expect_success 'setup make another commit' '
> +	# "git log" does not use Bloom filters for root commits - see how, in
> +	# revision.c, rev_compare_tree() (the only code path that eventually calls
> +	# get_bloom_filter()) is only called by try_to_simplify_commit() when the commit
> +	# has one parent. Therefore, make another commit so that we perform the tests on
> +	# a non-root commit.
> +	test_commit -C highbit2 anotherc2 "another$CENT"
> +'
> +
> +test_expect_success 'version 2 changed-path used when version 2 requested' '
> +	(
> +		cd highbit2 &&
> +		test_bloom_filters_used "-- another$CENT"
> +	)
> +'
> +
> +test_expect_success 'version 2 changed-path not used when version 1 requested' '
> +	(
> +		cd highbit2 &&
> +		git config --add commitgraph.changedPathsVersion 1 &&
> +		test_bloom_filters_not_used "-- another$CENT"
> +	)
> +'
> +
> +test_expect_success 'version 2 changed-path used when autodetect requested' '
> +	(
> +		cd highbit2 &&
> +		git config --add commitgraph.changedPathsVersion -1 &&
> +		test_bloom_filters_used "-- another$CENT"
> +	)
> +'
> +
> +test_expect_success 'when writing another commit graph, preserve existing version 2 of changed-path' '
> +	test_commit -C highbit2 c2double "$CENT$CENT" &&
> +	git -C highbit2 commit-graph write --reachable --changed-paths &&
> +	(
> +		cd highbit2 &&
> +		git config --add commitgraph.changedPathsVersion -1 &&
> +		echo "options: bloom(2,10,7) read_generation_data" >expect &&
> +		test-tool read-graph >full &&
> +		grep options full >actual &&
> +		test_cmp expect actual
> +	)
> +'
> +
> +test_expect_success 'when writing commit graph, do not reuse changed-path of another version' '
> +	git init doublewrite &&
> +	test_commit -C doublewrite c "$CENT" &&
> +	git -C doublewrite config --add commitgraph.changedPathsVersion 1 &&
> +	git -C doublewrite commit-graph write --reachable --changed-paths &&
> +	git -C doublewrite config --add commitgraph.changedPathsVersion 2 &&
> +	git -C doublewrite commit-graph write --reachable --changed-paths &&
> +	(
> +		cd doublewrite &&
> +		echo "c01f" >expect &&
> +		get_first_changed_path_filter >actual &&
> +		test_cmp expect actual
> +	)
> +'
> +

With the supposedly missing check in mind, should we also add tests for
currently unknown versions like 3 or -2?

Patrick

>  test_done
> -- 
> 2.42.0.342.g8bb3a896ee
> 

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* Re: [PATCH v3 05/17] t/helper/test-read-graph.c: extract `dump_graph_info()`
From: Patrick Steinhardt @ 2023-10-17  8:45 UTC (permalink / raw)
  To: Taylor Blau
  Cc: git, Jonathan Tan, Junio C Hamano, Jeff King, SZEDER Gábor
In-Reply-To: <94552abf455c6d341a0811333ae4edb4a8cea259.1696969994.git.me@ttaylorr.com>

[-- Attachment #1: Type: text/plain, Size: 2160 bytes --]

On Tue, Oct 10, 2023 at 04:33:33PM -0400, Taylor Blau wrote:
> Prepare for the 'read-graph' test helper to perform other tasks besides
> dumping high-level information about the commit-graph by extracting its
> main routine into a separate function.
> 
> Signed-off-by: Taylor Blau <me@ttaylorr.com>
> Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
> Signed-off-by: Junio C Hamano <gitster@pobox.com>
> Signed-off-by: Taylor Blau <me@ttaylorr.com>

Nit: your signoff is duplicated here. This is also still the case for
some of the other commits.

Patrick

> ---
>  t/helper/test-read-graph.c | 31 ++++++++++++++++++-------------
>  1 file changed, 18 insertions(+), 13 deletions(-)
> 
> diff --git a/t/helper/test-read-graph.c b/t/helper/test-read-graph.c
> index 8c7a83f578..3375392f6c 100644
> --- a/t/helper/test-read-graph.c
> +++ b/t/helper/test-read-graph.c
> @@ -5,20 +5,8 @@
>  #include "bloom.h"
>  #include "setup.h"
>  
> -int cmd__read_graph(int argc UNUSED, const char **argv UNUSED)
> +static void dump_graph_info(struct commit_graph *graph)
>  {
> -	struct commit_graph *graph = NULL;
> -	struct object_directory *odb;
> -
> -	setup_git_directory();
> -	odb = the_repository->objects->odb;
> -
> -	prepare_repo_settings(the_repository);
> -
> -	graph = read_commit_graph_one(the_repository, odb);
> -	if (!graph)
> -		return 1;
> -
>  	printf("header: %08x %d %d %d %d\n",
>  		ntohl(*(uint32_t*)graph->data),
>  		*(unsigned char*)(graph->data + 4),
> @@ -57,6 +45,23 @@ int cmd__read_graph(int argc UNUSED, const char **argv UNUSED)
>  	if (graph->topo_levels)
>  		printf(" topo_levels");
>  	printf("\n");
> +}
> +
> +int cmd__read_graph(int argc UNUSED, const char **argv UNUSED)
> +{
> +	struct commit_graph *graph = NULL;
> +	struct object_directory *odb;
> +
> +	setup_git_directory();
> +	odb = the_repository->objects->odb;
> +
> +	prepare_repo_settings(the_repository);
> +
> +	graph = read_commit_graph_one(the_repository, odb);
> +	if (!graph)
> +		return 1;
> +
> +	dump_graph_info(graph);
>  
>  	UNLEAK(graph);
>  
> -- 
> 2.42.0.342.g8bb3a896ee
> 

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* Re: [PATCH v3 08/17] t4216: test changed path filters with high bit paths
From: Patrick Steinhardt @ 2023-10-17  8:45 UTC (permalink / raw)
  To: Taylor Blau
  Cc: git, Jonathan Tan, Junio C Hamano, Jeff King, SZEDER Gábor
In-Reply-To: <cba766f224b0d2b4fd952b11bef8068c07dfcf88.1696969994.git.me@ttaylorr.com>

[-- Attachment #1: Type: text/plain, Size: 3157 bytes --]

On Tue, Oct 10, 2023 at 04:33:42PM -0400, Taylor Blau wrote:
> From: Jonathan Tan <jonathantanmy@google.com>
> 
> Subsequent commits will teach Git another version of changed path
> filter that has different behavior with paths that contain at least
> one character with its high bit set, so test the existing behavior as
> a baseline.
> 
> Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
> Signed-off-by: Junio C Hamano <gitster@pobox.com>
> Signed-off-by: Taylor Blau <me@ttaylorr.com>
> Signed-off-by: Junio C Hamano <gitster@pobox.com>
> Signed-off-by: Taylor Blau <me@ttaylorr.com>

Nit: the signoffs are still funny here.

> ---
>  t/t4216-log-bloom.sh | 52 ++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 52 insertions(+)
> 
> diff --git a/t/t4216-log-bloom.sh b/t/t4216-log-bloom.sh
> index f49a8f2fbf..da67c40134 100755
> --- a/t/t4216-log-bloom.sh
> +++ b/t/t4216-log-bloom.sh
> @@ -484,4 +484,56 @@ test_expect_success 'merge graph layers with incompatible Bloom settings' '
>  	! grep "disabling Bloom filters" err
>  '
>  
> +get_first_changed_path_filter () {
> +	test-tool read-graph bloom-filters >filters.dat &&
> +	head -n 1 filters.dat
> +}
> +
> +# chosen to be the same under all Unicode normalization forms
> +CENT=$(printf "\302\242")
> +
> +test_expect_success 'set up repo with high bit path, version 1 changed-path' '
> +	git init highbit1 &&
> +	test_commit -C highbit1 c1 "$CENT" &&
> +	git -C highbit1 commit-graph write --reachable --changed-paths
> +'
> +
> +test_expect_success 'setup check value of version 1 changed-path' '
> +	(
> +		cd highbit1 &&
> +		echo "52a9" >expect &&
> +		get_first_changed_path_filter >actual &&
> +		test_cmp expect actual
> +	)
> +'
> +
> +# expect will not match actual if char is unsigned by default. Write the test
> +# in this way, so that a user running this test script can still see if the two
> +# files match. (It will appear as an ordinary success if they match, and a skip
> +# if not.)
> +if test_cmp highbit1/expect highbit1/actual
> +then
> +	test_set_prereq SIGNED_CHAR_BY_DEFAULT
> +fi
> +test_expect_success SIGNED_CHAR_BY_DEFAULT 'check value of version 1 changed-path' '
> +	# Only the prereq matters for this test.
> +	true
> +'

Doesn't this mean that the preceding test where we `test_cmp expect
actual` can fail on some platforms depending on the signedness of
`char`?

Patrick

> +test_expect_success 'setup make another commit' '
> +	# "git log" does not use Bloom filters for root commits - see how, in
> +	# revision.c, rev_compare_tree() (the only code path that eventually calls
> +	# get_bloom_filter()) is only called by try_to_simplify_commit() when the commit
> +	# has one parent. Therefore, make another commit so that we perform the tests on
> +	# a non-root commit.
> +	test_commit -C highbit1 anotherc1 "another$CENT"
> +'
> +
> +test_expect_success 'version 1 changed-path used when version 1 requested' '
> +	(
> +		cd highbit1 &&
> +		test_bloom_filters_used "-- another$CENT"
> +	)
> +'
> +
>  test_done
> -- 
> 2.42.0.342.g8bb3a896ee
> 

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* Re: [PATCH v3 03/17] commit-graph: ensure Bloom filters are read with consistent settings
From: Patrick Steinhardt @ 2023-10-17  8:45 UTC (permalink / raw)
  To: Taylor Blau
  Cc: git, Jonathan Tan, Junio C Hamano, Jeff King, SZEDER Gábor
In-Reply-To: <2ecc0a2d58432b149d73a3e2abfa948eb1f0aa0b.1696969994.git.me@ttaylorr.com>

[-- Attachment #1: Type: text/plain, Size: 8299 bytes --]

On Tue, Oct 10, 2023 at 04:33:26PM -0400, Taylor Blau wrote:
> The changed-path Bloom filter mechanism is parameterized by a couple of
> variables, notably the number of bits per hash (typically "m" in Bloom
> filter literature) and the number of hashes themselves (typically "k").
> 
> It is critically important that filters are read with the Bloom filter
> settings that they were written with. Failing to do so would mean that
> each query is liable to compute different fingerprints, meaning that the
> filter itself could return a false negative. This goes against a basic
> assumption of using Bloom filters (that they may return false positives,
> but never false negatives) and can lead to incorrect results.
> 
> We have some existing logic to carry forward existing Bloom filter
> settings from one layer to the next. In `write_commit_graph()`, we have
> something like:
> 
>     if (!(flags & COMMIT_GRAPH_NO_WRITE_BLOOM_FILTERS)) {
>         struct commit_graph *g = ctx->r->objects->commit_graph;
> 
>         /* We have changed-paths already. Keep them in the next graph */
>         if (g && g->chunk_bloom_data) {
>             ctx->changed_paths = 1;
>             ctx->bloom_settings = g->bloom_filter_settings;
>         }
>     }
> 
> , which drags forward Bloom filter settings across adjacent layers.
> 
> This doesn't quite address all cases, however, since it is possible for
> intermediate layers to contain no Bloom filters at all. For example,
> suppose we have two layers in a commit-graph chain, say, {G1, G2}. If G1
> contains Bloom filters, but G2 doesn't, a new G3 (whose base graph is
> G2) may be written with arbitrary Bloom filter settings, because we only
> check the immediately adjacent layer's settings for compatibility.
> 
> This behavior has existed since the introduction of changed-path Bloom
> filters. But in practice, this is not such a big deal, since the only
> way up until this point to modify the Bloom filter settings at write
> time is with the undocumented environment variables:
> 
>   - GIT_TEST_BLOOM_SETTINGS_BITS_PER_ENTRY
>   - GIT_TEST_BLOOM_SETTINGS_NUM_HASHES
>   - GIT_TEST_BLOOM_SETTINGS_MAX_CHANGED_PATHS
> 
> (it is still possible to tweak MAX_CHANGED_PATHS between layers, but
> this does not affect reads, so is allowed to differ across multiple
> graph layers).
> 
> But in future commits, we will introduce another parameter to change the
> hash algorithm used to compute Bloom fingerprints itself. This will be
> exposed via a configuration setting, making this foot-gun easier to use.
> 
> To prevent this potential issue, validate that all layers of a split
> commit-graph have compatible settings with the newest layer which
> contains Bloom filters.
> 
> Reported-by: SZEDER Gábor <szeder.dev@gmail.com>
> Original-test-by: SZEDER Gábor <szeder.dev@gmail.com>
> Signed-off-by: Taylor Blau <me@ttaylorr.com>
> ---
>  commit-graph.c       | 25 +++++++++++++++++
>  t/t4216-log-bloom.sh | 64 ++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 89 insertions(+)
> 
> diff --git a/commit-graph.c b/commit-graph.c
> index 1a56efcf69..ae0902f7f4 100644
> --- a/commit-graph.c
> +++ b/commit-graph.c
> @@ -498,6 +498,30 @@ static int validate_mixed_generation_chain(struct commit_graph *g)
>  	return 0;
>  }
>  
> +static void validate_mixed_bloom_settings(struct commit_graph *g)
> +{
> +	struct bloom_filter_settings *settings = NULL;
> +	for (; g; g = g->base_graph) {
> +		if (!g->bloom_filter_settings)
> +			continue;
> +		if (!settings) {
> +			settings = g->bloom_filter_settings;
> +			continue;
> +		}
> +
> +		if (g->bloom_filter_settings->bits_per_entry != settings->bits_per_entry ||
> +		    g->bloom_filter_settings->num_hashes != settings->num_hashes) {
> +			g->chunk_bloom_indexes = NULL;
> +			g->chunk_bloom_data = NULL;
> +			FREE_AND_NULL(g->bloom_filter_settings);
> +
> +			warning(_("disabling Bloom filters for commit-graph "
> +				  "layer '%s' due to incompatible settings"),
> +				oid_to_hex(&g->oid));
> +		}
> +	}
> +}
> +
>  static int add_graph_to_chain(struct commit_graph *g,
>  			      struct commit_graph *chain,
>  			      struct object_id *oids,
> @@ -614,6 +638,7 @@ struct commit_graph *load_commit_graph_chain_fd_st(struct repository *r,
>  	}
>  
>  	validate_mixed_generation_chain(graph_chain);
> +	validate_mixed_bloom_settings(graph_chain);
>  
>  	free(oids);
>  	fclose(fp);
> diff --git a/t/t4216-log-bloom.sh b/t/t4216-log-bloom.sh
> index 322640feeb..f49a8f2fbf 100755
> --- a/t/t4216-log-bloom.sh
> +++ b/t/t4216-log-bloom.sh
> @@ -420,4 +420,68 @@ test_expect_success 'Bloom generation backfills empty commits' '
>  	)
>  '
>  
> +graph=.git/objects/info/commit-graph
> +graphdir=.git/objects/info/commit-graphs
> +chain=$graphdir/commit-graph-chain
> +
> +test_expect_success 'setup for mixed Bloom setting tests' '
> +	repo=mixed-bloom-settings &&
> +
> +	git init $repo &&
> +	for i in one two three
> +	do
> +		test_commit -C $repo $i file || return 1
> +	done
> +'
> +
> +test_expect_success 'split' '
> +	# Compute Bloom filters with "unusual" settings.
> +	git -C $repo rev-parse one >in &&
> +	GIT_TEST_BLOOM_SETTINGS_NUM_HASHES=3 git -C $repo commit-graph write \
> +		--stdin-commits --changed-paths --split <in &&
> +	layer=$(head -n 1 $repo/$chain) &&
> +
> +	# A commit-graph layer without Bloom filters "hides" the layers
> +	# below ...
> +	git -C $repo rev-parse two >in &&
> +	git -C $repo commit-graph write --stdin-commits --no-changed-paths \
> +		--split=no-merge <in &&
> +
> +	# Another commit-graph layer that has Bloom filters, but with
> +	# standard settings, and is thus incompatible with the base
> +	# layer written above.
> +	git -C $repo rev-parse HEAD >in &&
> +	git -C $repo commit-graph write --stdin-commits --changed-paths \
> +		--split=no-merge <in &&
> +
> +	test_line_count = 3 $repo/$chain &&
> +
> +	# Ensure that incompatible Bloom filters are ignored.
> +	git -C $repo -c core.commitGraph=false log --oneline --no-decorate -- file \
> +		>expect 2>err &&
> +	git -C $repo log --oneline --no-decorate -- file >actual 2>err &&
> +	test_cmp expect actual &&
> +	grep "disabling Bloom filters for commit-graph layer .$layer." err
> +'

Up to this point everything looks sensible to me.

> +test_expect_success 'merge graph layers with incompatible Bloom settings' '
> +	# Ensure that incompatible Bloom filters are ignored when
> +	# generating new layers.
> +	git -C $repo commit-graph write --reachable --changed-paths 2>err &&
> +	grep "disabling Bloom filters for commit-graph layer .$layer." err &&
> +
> +	test_path_is_file $repo/$graph &&
> +	test_dir_is_empty $repo/$graphdir &&
> +
> +	# ...and merging existing ones.
> +	git -C $repo -c core.commitGraph=false log --oneline --no-decorate -- file \
> +		>expect 2>err &&
> +	GIT_TRACE2_PERF="$(pwd)/trace.perf" \
> +		git -C $repo log --oneline --no-decorate -- file >actual 2>err &&

But this test is a bit confusing to me, to be honest, also because the
comment for the second block here reads funny. We don't really merge
anything, do we? We only generate logs and compare that the log with and
without the resulting merged commit graph is the same. The actual logic
happened before.

> +	test_cmp expect actual && cat err &&

The `cat err` looks like a leftover from debugging.

> +	grep "statistics:{\"filter_not_present\":0" trace.perf &&

Also, why should the filter not be present here? If we merge the
commit-graphs with `--changed-paths` I'd have expected that we either
carry over bloom filters from preexisting commit graphs if compatible,
or otherwise generate them if they are either incompatible or don't
exist.

I feel like I'm missing something obvious, so this may be me just
missing the bigger picture.

> +	! grep "disabling Bloom filters" err

Can we make this assertion stricter and verify that `err` is empty? I
always think that `! grep` is quite a fragile pattern as it is quite
prone to becoming stale, e.g. when the error message itself would
change.

Patrick

> +'
> +
>  test_done
> -- 
> 2.42.0.342.g8bb3a896ee
> 

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* Re: [PATCH] commit: detect commits that exist in commit-graph but not in the ODB
From: Patrick Steinhardt @ 2023-10-17  6:37 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Karthik Nayak, Taylor Blau
In-Reply-To: <xmqq1qdy1iyr.fsf@gitster.g>

[-- Attachment #1: Type: text/plain, Size: 5942 bytes --]

On Fri, Oct 13, 2023 at 11:21:48AM -0700, Junio C Hamano wrote:
> Patrick Steinhardt <ps@pks.im> writes:
> 
> > @@ -572,8 +572,13 @@ int repo_parse_commit_internal(struct repository *r,
> >  		return -1;
> >  	if (item->object.parsed)
> >  		return 0;
> > -	if (use_commit_graph && parse_commit_in_graph(r, item))
> > +	if (use_commit_graph && parse_commit_in_graph(r, item)) {
> > +		if (!has_object(r, &item->object.oid, 0))
> > +			return quiet_on_missing ? -1 :
> > +				error(_("commit %s exists in commit-graph but not in the object database"),
> > +				      oid_to_hex(&item->object.oid));
> >  		return 0;
> > +	}
> 
> Ever since this codepath was introduced by 177722b3 (commit:
> integrate commit graph with commit parsing, 2018-04-10), we blindly
> trusted what commit-graph file says.  This change is a strict
> improvement in the correctness department, but there are two things
> that are a bit worrying.
> 
> One.  The additional check should certainly be cheaper than a full
> reading and parsing of an object, either from a loose object file or
> from a pack entry.  It may not hurt performance too much, but it
> still would give us more confidence if we know by how much we are
> pessimising good cases where the commit-graph does match reality.
> Our stance on these secondary files that store precomputed values
> for optimization purposes is in general to use them blindly unless
> in exceptional cases where the operation values the correctness even
> when the validity of these secondary files is dubious (e.g., "fsck"),
> and doing this extra check regardless of the caller at this low level
> of the callchain is a bit worrying.

Fair point indeed. The following is a worst-case scenario benchmark of
of the change where we do a full topological walk of all reachable
commits in the graph, executed in linux.git. We parse commit parents via
`repo_parse_commit_gently()`, so the new code path now basically has to
check for object existence of every reachable commit:

Benchmark 1: git -c core.commitGraph=true rev-list --topo-order --all (git = master)
  Time (mean ± σ):      2.913 s ±  0.018 s    [User: 2.363 s, System: 0.548 s]
  Range (min … max):    2.894 s …  2.950 s    10 runs

Benchmark 2: git -c core.commitGraph=true rev-list --topo-order --all (git = pks-commit-graph-inconsistency)
  Time (mean ± σ):      3.834 s ±  0.052 s    [User: 3.276 s, System: 0.556 s]
  Range (min … max):    3.780 s …  3.961 s    10 runs

Benchmark 3: git -c core.commitGraph=false rev-list --topo-order --all (git = master)
  Time (mean ± σ):     13.841 s ±  0.084 s    [User: 13.152 s, System: 0.687 s]
  Range (min … max):   13.714 s … 13.995 s    10 runs

Benchmark 4: git -c core.commitGraph=false rev-list --topo-order --all (git = pks-commit-graph-inconsistency)
  Time (mean ± σ):     13.762 s ±  0.116 s    [User: 13.094 s, System: 0.667 s]
  Range (min … max):   13.645 s … 14.038 s    10 runs

Summary
  git -c core.commitGraph=true rev-list --topo-order --all (git = master) ran
    1.32 ± 0.02 times faster than git -c core.commitGraph=true rev-list --topo-order --all (git = pks-commit-graph-inconsistency)
    4.72 ± 0.05 times faster than git -c core.commitGraph=false rev-list --topo-order --all (git = pks-commit-graph-inconsistency)
    4.75 ± 0.04 times faster than git -c core.commitGraph=false rev-list --topo-order --all (git = master)

The added check does lead to a performance regression indeed, which is
not all that unexpected. That being said, the commit-graph still results
in a significant speedup compared to the case where we don't have it.

> Another is that by the time parse_commit_in_graph() returns true and
> we realize that the answer we got is bogus by asking has_object(),
> item->object.parsed has already been toggled on, so the caller now
> has a commit object that claimed it was already parsed and does not
> match reality.  Hopefully the caller takes an early exit upon seeing
> a failure from parse_commit_gently() and the .parsed bit does not
> matter, but maybe I am missing a case where it does.  I dunno.

We could also call `unparse_commit()` when we notice the stale commit
graph item. This would be in the same spirit as the rest of this patch
as it would lead to an overall safer end state.

In any case I'll wait for additional input before sending a v2, most
importantly to see whether we think that consistency trumps performance
in this case. Personally I'm still of the mind that it should, which
also comes from the fact that we were fighting with stale commit graphs
several times in production data.

Patrick

> Other than that, sounds very sensible and the code change is clean.
> 
> Will queue.  Thanks.
> 
> > diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
> > index ba65f17dd9..25f8e9e2d3 100755
> > --- a/t/t5318-commit-graph.sh
> > +++ b/t/t5318-commit-graph.sh
> > @@ -821,4 +821,27 @@ test_expect_success 'overflow during generation version upgrade' '
> >  	)
> >  '
> >  
> > +test_expect_success 'commit exists in commit-graph but not in object database' '
> > +	test_when_finished "rm -rf repo" &&
> > +	git init repo &&
> > +	(
> > +		cd repo &&
> > +
> > +		test_commit A &&
> > +		test_commit B &&
> > +		test_commit C &&
> > +		git commit-graph write --reachable &&
> > +
> > +		# Corrupt the repository by deleting the intermittent commit
> > +		# object. Commands should notice that this object is absent and
> > +		# thus that the repository is corrupt even if the commit graph
> > +		# exists.
> > +		oid=$(git rev-parse B) &&
> > +		rm .git/objects/"$(test_oid_to_path "$oid")" &&
> > +
> > +		test_must_fail git rev-parse HEAD~2 2>error &&
> > +		grep "error: commit $oid exists in commit-graph but not in the object database" error
> > +	)
> > +'
> > +
> >  test_done

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* Re: Proxy Flag for git-clone, push, pull
From: Eric Sunshine @ 2023-10-17  5:19 UTC (permalink / raw)
  To: Jaydeep Das; +Cc: Git Mailing List
In-Reply-To: <CACaPSotRhyFZ-eBZ9KNKRUjLFHKo09P-Un+sitDXEktzmwuaxA@mail.gmail.com>

On Tue, Oct 17, 2023 at 1:15 AM Jaydeep Das <jaydeepjd.8914@gmail.com> wrote:
> To make git work through proxy, we need to set the `http.proxy` config
> or need to set `http_proxy` environment variable.
>
> However wouldn't it be better if there was a flag in the command
> itself (like npm) which
> overrides whatever proxy is set. Something like
>
> git clone --proxy "http://..." <url>

The -c option allows you to specify configuration on the command-line, so:

    git -c http.proxy="http://..." clone <url>

should do what you want.

^ permalink raw reply

* Proxy Flag for git-clone, push, pull
From: Jaydeep Das @ 2023-10-17  5:15 UTC (permalink / raw)
  To: Git Mailing List

To make git work through proxy, we need to set the `http.proxy` config
or need to set `http_proxy` environment variable.

However wouldn't it be better if there was a flag in the command
itself (like npm) which
overrides whatever proxy is set. Something like

git clone --proxy "http://..." <url>

^ permalink raw reply

* [PATCH] Include gettext.h in MyFirstContribution tutorial
From: Jacob Stopak @ 2023-10-17  4:15 UTC (permalink / raw)
  To: git; +Cc: Jacob Stopak

The tutorial in Documentation/MyFirstContribution.txt has steps to print
some text using the "_" function. However, this leads to compiler errors
when running "make" since "gettext.h" is not #included.

Update docs with a note to #include "gettext.h" in "builtin/psuh.c".

Signed-off-by: Jacob Stopak <jacob@initialcommit.io>
---
 Documentation/MyFirstContribution.txt | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/Documentation/MyFirstContribution.txt b/Documentation/MyFirstContribution.txt
index 62d11a5cd7..7cfed60c2e 100644
--- a/Documentation/MyFirstContribution.txt
+++ b/Documentation/MyFirstContribution.txt
@@ -160,10 +160,11 @@ in order to keep the declarations alphabetically sorted:
 int cmd_psuh(int argc, const char **argv, const char *prefix);
 ----
 
-Be sure to `#include "builtin.h"` in your `psuh.c`.
+Be sure to `#include "builtin.h"` in your `psuh.c`. You'll also need to
+`#include "gettext.h"` to use functions related to printing output text.
 
-Go ahead and add some throwaway printf to that function. This is a decent
-starting point as we can now add build rules and register the command.
+Go ahead and add some throwaway printf to the `cmd_psuh` function. This is a
+decent starting point as we can now add build rules and register the command.
 
 NOTE: Your throwaway text, as well as much of the text you will be adding over
 the course of this tutorial, is user-facing. That means it needs to be
-- 
2.42.0.398.ga9ecda2788.dirty


^ permalink raw reply related

* Re: [PATCH v3 1/1] bugreport: include +i in outfile suffix as needed
From: Jacob Stopak @ 2023-10-17  3:17 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git
In-Reply-To: <xmqq4jiqkwi1.fsf@gitster.g>

On Mon, Oct 16, 2023 at 03:55:50PM -0700, Junio C Hamano wrote:
> Jacob Stopak <jacob@initialcommit.io> writes:
> 
> >  builtin/bugreport.c | 83 +++++++++++++++++++++++++++++++--------------
> >  1 file changed, 57 insertions(+), 26 deletions(-)
> 
> Looking good.  It is not easy to do an automated and reliable test
> for this one for obvious reasons ;-), so let's queue it as-is.
> 
> Thanks.
> 

Awesome! Thank you!

> > -	/* fopen doesn't offer us an O_EXCL alternative, except with glibc. */
> > -	report = xopen(report_path.buf, O_CREAT | O_EXCL | O_WRONLY, 0666);
> > +	again:
> > +		/* fopen doesn't offer us an O_EXCL alternative, except with glibc. */
> > +		report = open(report_path.buf, O_CREAT | O_EXCL | O_WRONLY, 0666);
> > +		if (report < 0 && errno == EEXIST && !option_suffix_is_from_user) {
> > +			build_path(&report_path, prefixed_filename,
> > +				   "git-bugreport-", option_suffix, now, &i,
> > +				   ".txt");
> > +			goto again;
> > +		} else if (report < 0) {
> > +			die_errno(_("unable to open '%s'"), report_path.buf);
> > +		}
> 
> I didn't expect a rewrite to add an extra level of indentation like
> this, though ;-).

Whoops... I looked in another file to check the indentation around a
goto label and misread how it should be. Let me know if I should submit
v4 with that corrected.

^ permalink raw reply

* Bug: dir.c traversing the filesystem: unindexed directories do not get recursed into when there is a (non-excluding) pathspec
From: Joanna Wang @ 2023-10-17  0:08 UTC (permalink / raw)
  To: git

What did you do before the bug happened? (Steps to reproduce your issue)
I created a new untracked directory `foo/`.
In .gitattributes i added `bar_yes* text`
and created files `foo/bar_yes` and `foo/bar_no` with content 'chicken'

I tried the following commands:
git grep --no-index chicken -- ':(attr:text)foo/*'
git ls-files --others --exclude-standard ':(attr:text)foo/*'
git status ':(attr:text)foo/*'


What did you expect to happen? (Expected behavior)
For all of the above, I expected `foo/bar_yes` to be returned.

What happened instead? (Actual behavior)
For all three commands, nothing was returned.

Anything else you want to add:
If I change the pathspec to ':(exclude,attr:text)foo/*' it works as expected.
`foo/bar_no` is returned.

(except for git status where the output is just `foo/`, but I think
this is expected.)

This behavior seems due to this section in dir.c:
https://github.com/git/git/blob/master/dir.c#L1893-L1908
for a pathspec like ':(attr:text)foo/*', this section of code will prevent
`foo/` from getting recursed.
It only checks for a pathspec match against the directory name, so any
files under the dir aren't checked.
Commenting that part of the code out, seems to fix this issue.
(for `git status`, `foo/` is returned both with `exclude` in the
pathspec and without)
But I am not sure if simply removing that code is the right fix.

[Enabled Hooks]
commit-msg
pre-commit
prepare-commit-msg

^ permalink raw reply

* Re: [PATCH v2 2/2] Prevent git from rehashing 4GiB files
From: Jeff King @ 2023-10-17  0:00 UTC (permalink / raw)
  To: brian m. carlson; +Cc: git, Junio C Hamano, Taylor Blau, Jason Hatton
In-Reply-To: <20231012160930.330618-3-sandals@crustytoothpaste.net>

On Thu, Oct 12, 2023 at 04:09:30PM +0000, brian m. carlson wrote:

> +static unsigned int munge_st_size(off_t st_size) {
> +	unsigned int sd_size = st_size;
> +
> +	/*
> +	 * If the file is an exact multiple of 4 GiB, modify the value so it
> +	 * doesn't get marked as racily clean (zero).
> +	 */
> +	if (!sd_size && st_size)
> +		return 0x80000000;
> +	else
> +		return sd_size;
> +}

Coverity complained that the "true" side of this conditional is
unreachable, since sd_size is assigned from st_size, so the two values
cannot be both true and false. But obviously we are depending here on
the truncation of off_t to "unsigned int". I'm not sure if Coverity is
just dumb, or if it somehow has a different size for off_t.

I don't _think_ this would ever cause confusion in a real compiler, as
assignment from a larger type to a smaller has well-defined truncation,
as far as I know.

But I do wonder if an explicit "& 0xFFFFFFFF" would make it more obvious
what is happening (which would also do the right thing if in some
hypothetical platform "unsigned int" ended up larger than 32 bits).

-Peff

^ permalink raw reply

* Re: [PATCH v2 1/2] t: add a test helper to truncate files
From: Jeff King @ 2023-10-16 23:53 UTC (permalink / raw)
  To: brian m. carlson; +Cc: git, Junio C Hamano, Taylor Blau, Jason Hatton
In-Reply-To: <20231012160930.330618-2-sandals@crustytoothpaste.net>

On Thu, Oct 12, 2023 at 04:09:29PM +0000, brian m. carlson wrote:

> +int cmd__truncate(int argc, const char **argv)
> +{
> +	char *p = NULL;
> +	uintmax_t sz = 0;
> +	int fd = -1;
> +
> +	if (argc != 3)
> +		die("expected filename and size");
> +
> +	sz = strtoumax(argv[2], &p, 0);
> +	if (*p)
> +		die("invalid size");
> +
> +	fd = open(argv[1], O_WRONLY | O_CREAT, 0600);
> +	if (fd < 0)
> +		die_errno("failed to open file %s", argv[1]);
> +
> +	if (ftruncate(fd, (off_t) sz) < 0)
> +		die_errno("failed to truncate file");
> +	return 0;
> +}

Coverity flagged this as leaking the descriptor "fd" (which is obviously
true). I guess it doesn't matter much in practice, since we're exiting
the process directly afterwards. If it were a memory leak we'd free() it
anyway to shut up the leak detector, but I don't know if we want to be
as careful here (in theory it creates noise that distracts from real
problems, but Coverity is noisy enough that I don't even bother looking
at existing problems).

-Peff

^ permalink raw reply

* Re: [PATCH v3 1/1] bugreport: include +i in outfile suffix as needed
From: Junio C Hamano @ 2023-10-16 22:55 UTC (permalink / raw)
  To: Jacob Stopak; +Cc: git
In-Reply-To: <20231016214045.146862-2-jacob@initialcommit.io>

Jacob Stopak <jacob@initialcommit.io> writes:

>  builtin/bugreport.c | 83 +++++++++++++++++++++++++++++++--------------
>  1 file changed, 57 insertions(+), 26 deletions(-)

Looking good.  It is not easy to do an automated and reliable test
for this one for obvious reasons ;-), so let's queue it as-is.

Thanks.

> -	/* fopen doesn't offer us an O_EXCL alternative, except with glibc. */
> -	report = xopen(report_path.buf, O_CREAT | O_EXCL | O_WRONLY, 0666);
> +	again:
> +		/* fopen doesn't offer us an O_EXCL alternative, except with glibc. */
> +		report = open(report_path.buf, O_CREAT | O_EXCL | O_WRONLY, 0666);
> +		if (report < 0 && errno == EEXIST && !option_suffix_is_from_user) {
> +			build_path(&report_path, prefixed_filename,
> +				   "git-bugreport-", option_suffix, now, &i,
> +				   ".txt");
> +			goto again;
> +		} else if (report < 0) {
> +			die_errno(_("unable to open '%s'"), report_path.buf);
> +		}

I didn't expect a rewrite to add an extra level of indentation like
this, though ;-).

^ permalink raw reply

* Re: [PATCH v2] status: fix branch shown when not only bisecting
From: Junio C Hamano @ 2023-10-16 22:08 UTC (permalink / raw)
  To: Rubén Justo; +Cc: Git List
In-Reply-To: <2e24ca9b-9c5f-f4df-b9f8-6574a714dfb2@gmail.com>

Rubén Justo <rjusto@gmail.com> writes:

> In 83c750acde (wt-status.*: better advice for git status added,
> 2012-06-05), git-status received new informative messages to describe
> the ongoing work in a worktree.
>
> These messages were enhanced in 0722c805d6 (status: show the branch name
> if possible in in-progress info, 2013-02-03), to show, if possible, the
> branch where the operation was initiated.
>
> Since then, we show incorrect information when several operations are in
> progress and one of them is bisect:
>
>    $ git checkout -b foo
>    $ GIT_SEQUENCE_EDITOR='echo break >' git rebase -i HEAD~
>    $ git checkout -b bar
>    $ git bisect start
>    $ git status
>    ...
>
>    You are currently editing a commit while rebasing branch 'bar' on '...'.
>
>    You are currently bisecting, started from branch 'bar'.
>
>    ...
>
> Note that we erroneously say "while rebasing branch 'bar'" when we
> should be referring to "foo".
>
> This must have gone unnoticed for so long because it must be unusual to
> start a bisection while another operation is in progress.  And even less
> usual to involve different branches.
>
> It caught my attention reviewing a leak introduced in 8b87cfd000
> (wt-status: move strbuf into read_and_strip_branch(), 2013-03-16).
>
> A simple change to deal with this situation can be to record in struct
> wt_status_state, the branch where the bisect starts separately from the
> branch related to other operations.
>
> Let's do it and so we'll be able to display correct information and
> we'll avoid the leak as well.
>
> Signed-off-by: Rubén Justo <rjusto@gmail.com>
>
> ---
>
> Let's try again.

Sigh... nobody seems to be interested in making sure this correctly
improves the system X-<.

After a quick re-read, I didn't find anything glaringly wrong, so
let's see if anybody complains after the patch gets merged to
'next'.

Thanks.

^ permalink raw reply

* Re: [PATCH 00/25] Documentation fixes
From: Junio C Hamano @ 2023-10-16 21:54 UTC (permalink / raw)
  To: Elijah Newren via GitGitGadget; +Cc: git, Elijah Newren
In-Reply-To: <pull.1595.git.1696747527.gitgitgadget@gmail.com>

"Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com> writes:

> I did review every single change here, multiple times, and I have tried to
> split up this series in a way to make it easier to review. In particular I:
> ...
> (Note: every patch in this series, except for the whitespace fixes patch,
> are best viewed with --color-words.)

I didn't think of anything clever, so ended up reading a bit by bit
over days.  I didn't find anything glaringly wrong ;-)

Let's mark it for 'next'.

Thanks.

^ permalink raw reply

* [PATCH v3 1/1] bugreport: include +i in outfile suffix as needed
From: Jacob Stopak @ 2023-10-16 21:40 UTC (permalink / raw)
  To: git; +Cc: Jacob Stopak
In-Reply-To: <20231016214045.146862-1-jacob@initialcommit.io>

When the -s flag is absent, git bugreport includes the current hour and
minute values in the default bugreport filename (and diagnostics zip
filename if --diagnose is supplied).

If a user runs the bugreport command more than once within a minute, a
filename conflict with an existing file occurs and the program errors,
since the new output filename was already used for the previous file. If
the user waits anywhere from 1 to 60 seconds (depending on when during
the minute the first command was run) the command works again with no
error since the default filename is now unique, and multiple bug reports
are able to be created with default settings.

This is a minor thing but can cause confusion for first time users of
the bugreport command, who are likely to run it multiple times in quick
succession to learn how it works, (like I did). Or users who quickly
fill in a few details before closing and creating a new one.

Add a '+i' into the bugreport filename suffix where 'i' is an integer
starting at 1 and growing as needed until a unique filename is obtained.

This leads to default output filenames like:

git-bugreport-%Y-%m-%d-%H%M+1.txt
git-bugreport-%Y-%m-%d-%H%M+2.txt
...
git-bugreport-%Y-%m-%d-%H%M+i.txt

This means the user will end up with multiple bugreport files being
created if they run the command multiple times quickly, but that feels
more intuitive and consistent than an error arbitrarily occuring within
a minute, especially given that the time window in which the error
currently occurs is variable as described above.

If --diagnose is supplied, match the incremented suffix of the
diagnostics zip file to the bugreport.

Signed-off-by: Jacob Stopak <jacob@initialcommit.io>
---
 builtin/bugreport.c | 83 +++++++++++++++++++++++++++++++--------------
 1 file changed, 57 insertions(+), 26 deletions(-)

diff --git a/builtin/bugreport.c b/builtin/bugreport.c
index d2ae5c305d..ed65735873 100644
--- a/builtin/bugreport.c
+++ b/builtin/bugreport.c
@@ -11,6 +11,7 @@
 #include "diagnose.h"
 #include "object-file.h"
 #include "setup.h"
+#include "dir.h"
 
 static void get_system_info(struct strbuf *sys_info)
 {
@@ -97,20 +98,41 @@ static void get_header(struct strbuf *buf, const char *title)
 	strbuf_addf(buf, "\n\n[%s]\n", title);
 }
 
+static void build_path(struct strbuf *buf, const char *dir_path,
+		       const char *prefix, const char *suffix,
+		       time_t t, int *i, const char *ext)
+{
+	struct tm tm;
+
+	strbuf_reset(buf);
+	strbuf_addstr(buf, dir_path);
+	strbuf_complete(buf, '/');
+
+	strbuf_addstr(buf, prefix);
+	strbuf_addftime(buf, suffix, localtime_r(&t, &tm), 0, 0);
+
+	if (*i > 0)
+		strbuf_addf(buf, "+%d", *i);
+
+	strbuf_addstr(buf, ext);
+
+	(*i)++;
+}
+
 int cmd_bugreport(int argc, const char **argv, const char *prefix)
 {
 	struct strbuf buffer = STRBUF_INIT;
 	struct strbuf report_path = STRBUF_INIT;
 	int report = -1;
 	time_t now = time(NULL);
-	struct tm tm;
 	enum diagnose_mode diagnose = DIAGNOSE_NONE;
 	char *option_output = NULL;
-	char *option_suffix = "%Y-%m-%d-%H%M";
+	char *option_suffix = "";
+	int option_suffix_is_from_user = 0;
 	const char *user_relative_path = NULL;
 	char *prefixed_filename;
-	size_t output_path_len;
 	int ret;
+	int i = 0;
 
 	const struct option bugreport_options[] = {
 		OPT_CALLBACK_F(0, "diagnose", &diagnose, N_("mode"),
@@ -126,16 +148,16 @@ int cmd_bugreport(int argc, const char **argv, const char *prefix)
 	argc = parse_options(argc, argv, prefix, bugreport_options,
 			     bugreport_usage, 0);
 
+	if (!strlen(option_suffix))
+		option_suffix = "%Y-%m-%d-%H%M";
+	else
+		option_suffix_is_from_user = 1;
+
 	/* Prepare the path to put the result */
 	prefixed_filename = prefix_filename(prefix,
 					    option_output ? option_output : "");
-	strbuf_addstr(&report_path, prefixed_filename);
-	strbuf_complete(&report_path, '/');
-	output_path_len = report_path.len;
-
-	strbuf_addstr(&report_path, "git-bugreport-");
-	strbuf_addftime(&report_path, option_suffix, localtime_r(&now, &tm), 0, 0);
-	strbuf_addstr(&report_path, ".txt");
+	build_path(&report_path, prefixed_filename, "git-bugreport-",
+		   option_suffix, now, &i, ".txt");
 
 	switch (safe_create_leading_directories(report_path.buf)) {
 	case SCLD_OK:
@@ -146,20 +168,6 @@ int cmd_bugreport(int argc, const char **argv, const char *prefix)
 		    report_path.buf);
 	}
 
-	/* Prepare diagnostics, if requested */
-	if (diagnose != DIAGNOSE_NONE) {
-		struct strbuf zip_path = STRBUF_INIT;
-		strbuf_add(&zip_path, report_path.buf, output_path_len);
-		strbuf_addstr(&zip_path, "git-diagnostics-");
-		strbuf_addftime(&zip_path, option_suffix, localtime_r(&now, &tm), 0, 0);
-		strbuf_addstr(&zip_path, ".zip");
-
-		if (create_diagnostics_archive(&zip_path, diagnose))
-			die_errno(_("unable to create diagnostics archive %s"), zip_path.buf);
-
-		strbuf_release(&zip_path);
-	}
-
 	/* Prepare the report contents */
 	get_bug_template(&buffer);
 
@@ -169,14 +177,37 @@ int cmd_bugreport(int argc, const char **argv, const char *prefix)
 	get_header(&buffer, _("Enabled Hooks"));
 	get_populated_hooks(&buffer, !startup_info->have_repository);
 
-	/* fopen doesn't offer us an O_EXCL alternative, except with glibc. */
-	report = xopen(report_path.buf, O_CREAT | O_EXCL | O_WRONLY, 0666);
+	again:
+		/* fopen doesn't offer us an O_EXCL alternative, except with glibc. */
+		report = open(report_path.buf, O_CREAT | O_EXCL | O_WRONLY, 0666);
+		if (report < 0 && errno == EEXIST && !option_suffix_is_from_user) {
+			build_path(&report_path, prefixed_filename,
+				   "git-bugreport-", option_suffix, now, &i,
+				   ".txt");
+			goto again;
+		} else if (report < 0) {
+			die_errno(_("unable to open '%s'"), report_path.buf);
+		}
 
 	if (write_in_full(report, buffer.buf, buffer.len) < 0)
 		die_errno(_("unable to write to %s"), report_path.buf);
 
 	close(report);
 
+	/* Prepare diagnostics, if requested */
+	if (diagnose != DIAGNOSE_NONE) {
+		struct strbuf zip_path = STRBUF_INIT;
+		i--; /* Undo last increment to match zipfile suffix to bugreport */
+		build_path(&zip_path, prefixed_filename, "git-diagnostics-",
+			   option_suffix, now, &i, ".zip");
+
+		if (create_diagnostics_archive(&zip_path, diagnose))
+			die_errno(_("unable to create diagnostics archive %s"),
+				  zip_path.buf);
+
+		strbuf_release(&zip_path);
+	}
+
 	/*
 	 * We want to print the path relative to the user, but we still need the
 	 * path relative to us to give to the editor.
-- 
2.42.0.297.g36452639b8


^ permalink raw reply related

* [PATCH v3 0/1] bugreport: include +i in outfile suffix as needed
From: Jacob Stopak @ 2023-10-16 21:40 UTC (permalink / raw)
  To: git; +Cc: Jacob Stopak, Junio C Hamano
In-Reply-To: <20231015034238.100675-2-jacob@initialcommit.io>

Similar functionality to the previous patch version but with the
following changes:

  * Remove the default_option_suffix variable and clean up the way the
    option_suffix value is set.

  * Refactor code for building report path and diagnostics zip file path
    into the new function build_path(...), which builds a fresh path
    from scratch each time it's called. Although it does take quite a few
    arguments, it reduces duplicated code and simplifies the code by
    removing the need for splicing or inserting the incrementable suffix.

  * Allow the increment value 'i' to grow as needed instead of stopping
    at 9.

  * Replace xopen() with open() so that the program doesn't die with an
    error if the file already exists.

  * Replace the while loop with a fallback loop to prevent TOCTOU.

  * Clean up commit message verbiage.

Some additional comments:

> Perhaps we should refine the error message we give in this case and
> we are done, then?

I had thought about this but due to the variable timed behavior I had
trouble coming up with a message that would convey clearly what's going
on and what the user should do. Here were some things that popped into
my head but they all sounded a bit silly to me:

"A bugreport with this name already exists, try again shortly"
or
"File exists: wait 1 minute and try again or use a custom suffix with -s"
or
"File exists: try again in 1 to 60 seconds :-P"

I think the reason they sound silly to me is that the user is only being
made aware of this info because the program happens to be operating this
way - instead of the program working in a smoother way where this type of
operational info wouldn't need to be conveyed.

> What downside do you see in using 2023-10-15-1008+10 after you tried
> 9 of them?  The code to limit the upper bound smells like a wasted
> effort that helps nobody in practice because it is "unlikely".

> And limiting the upper bound also means you now have to have extra
> code to deal with "we ran out---error out and help the user how to
> recover" anyway.

I completely agree with this and made these updates.

> Notice that you said "in a single minute" without "calendar" and the
> sentence is perfectly understandable?

I googled "calendar minute" and the only thing that came up was some
Java documentation... So I guess I just made it up... :'D

Jacob Stopak (1):
  bugreport: include +i in outfile suffix as needed

 builtin/bugreport.c | 83 +++++++++++++++++++++++++++++++--------------
 1 file changed, 57 insertions(+), 26 deletions(-)


base-commit: 493f4622739e9b64f24b465b21aa85870dd9dc09
-- 
2.42.0.297.g36452639b8


^ permalink raw reply

* Re: Method for Calculating Statistics of Developer Contribution to a Specified Branch.
From: brian m. carlson @ 2023-10-16 21:25 UTC (permalink / raw)
  To: Hongyi Zhao; +Cc: Git List
In-Reply-To: <CAGP6POKg4mSFv-Z+dD1aXDFDbxH9Xu1WCdCA5TGfCAM3NUUQLw@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 2070 bytes --]

On 2023-10-16 at 14:10:01, Hongyi Zhao wrote:
> Dear Git Mailing List,
> 
> I am a developer currently working on a project and I wanted to
> establish statistics for each team member's contribution to a specific
> branch.
> 
> Say, for a user "JianboLin", I am currently using the following method:
> 
> $ git clone https://github.com/OrderN/CONQUEST-release.git
> $ cd CONQUEST-release
> $ git log --author="JianboLin" --stat --summary origin/f-mlff | awk
> 'NF ==4 && $2 =="|" && $3 ~/[0-9]+/ && $4 ~/[+-]+|[+]+|[-]+/ {s+=$3}
> END {print s}'
> 
> Using the above command, I am able to calculate the number of lines
> contributed by a specific author on a specific branch, which allows me
> to quantify the contribution to a branch by each team member.
> 
> However, I would like to know if a more efficient or accurate method
> exists to carry out this task. Are there any other parameters,
> commands, or aspects I need to consider to get a more comprehensive
> measure of contribution?

Can you maybe explain what you want to measure and what your goal is in
doing so?

The problem is that lines of code isn't really that useful as a measure
of contribution value or developer productivity, which are the reasons
people typically measure that metric.  For example, with three lines, a
colleague fixed a persistently difficult-to-reproduce problem which had
been affecting many of our largest customers.  That was a very valuable
contribution, but not very large.  I've made similar kinds of changes
myself, both at work and in open source projects.

Certainly you can compute the number of lines of code changed by a
developer, but that is not typically a very useful metric, since it
doesn't lead you to any interesting conclusions about the benefits or
value of the contributions or developer in question.  However, perhaps
you have a different goal in mind, and if you can explain what that is,
we may be able to help you find a better way of doing it.
-- 
brian m. carlson (he/him or they/them)
Toronto, Ontario, CA

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 263 bytes --]

^ permalink raw reply

* Re: [PATCH v2 0/3] rev-list: add support for commits in `--missing`
From: Junio C Hamano @ 2023-10-16 20:33 UTC (permalink / raw)
  To: Karthik Nayak; +Cc: git, ps
In-Reply-To: <CAOLa=ZT4EuoB8GHMe+a2nfq8Pfg5x7xzrEa_qV39U1HqUyS+Eg@mail.gmail.com>

Karthik Nayak <karthik.188@gmail.com> writes:

> Well, actually the newly introduced tests t6022 require this block,
> but this is specific
> to when commit graph is enabled.

Ah, of course.

> Thanks for the quick review, I'll wait a day/two and send v3 with the
> changes to fix tests
> and cleaner code.

Thanks.

^ permalink raw reply

* Git does not work on my machine!!!
From: jtrites @ 2023-10-16 20:20 UTC (permalink / raw)
  To: git; +Cc: John Trites

I would appreciate some help resolving this issue!!!

JohnTrites@MSI MINGW64 /c/Users/JohnTrites
$ git config --global user.name "JohnTrites"
error: could not lock config file C:/Users/John
Trites/AppData/Roaming/SPB_16.6/.gitconfig: No such file or directory



^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox