All of lore.kernel.org
 help / color / mirror / Atom feed
From: Taylor Blau <me@ttaylorr.com>
To: Derrick Stolee via GitGitGadget <gitgitgadget@gmail.com>
Cc: git@vger.kernel.org, peff@peff.net, me@ttaylorr.com,
	garimasigit@gmail.com, szeder.dev@gmail.com, jnareb@gmail.com,
	Derrick Stolee <dstolee@microsoft.com>
Subject: Re: [PATCH 03/12] Documentation: changed-path Bloom filters use byte words
Date: Fri, 1 May 2020 16:55:00 -0600	[thread overview]
Message-ID: <20200501225500.GA46146@syl.local> (raw)
In-Reply-To: <03b2c84db36754b321dd15cd8a9e1a22b7c11c36.1588347029.git.gitgitgadget@gmail.com>

On Fri, May 01, 2020 at 03:30:20PM +0000, Derrick Stolee via GitGitGadget wrote:
> From: Derrick Stolee <dstolee@microsoft.com>
>
> In Documentation/technical/commit-graph-format.txt, the definition
> of the BIDX chunk specifies the length is a number of 8-byte words.
> During development we discovered that using 8-byte words in the
> Murmur3 hash algorithm causes issues with Big-Endian versus Little-
> Endian machines. Thus, the hash algorithm was adapted to work on a

I'm nit-picking here, but I believe that "Little Endian" is usually
spelled "little endian", right?

At least, 'git log --oneline -Gendian | wc -l' doesn't return many/any
results for me, but '-GEndian' does.

> byte-by-byte basis. However, this caused a change in the definition
> of a "word" in bloom.h. Now, a "word" is a single byte, which allows
> filters to be as small as two bytes. These length-two filters are
> demonstrated in t0095-bloom.sh, and a larger filter of length 25 is
> demonstrated as well.
>
> The original point of using 8-byte words was for alignment reasons.
> It also presented opportunities for extremely sparse Bloom filters
> when there were a small number of changes at a commit, creating a
> very low false-positive rate. However, modifying the format at this
> point is unlikely to be a valuable exercise. Also, this use of
> single-byte granularity does present opportunities to save space.
> It is unclear if 8-byte alignment of the filters would present any
> meaningful performance benefits.
>
> Modify the format document to reflect reality.
>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>

The rest of this patch looks good, and indeed matches reality ;).

  Reviewed-by: Taylor Blau <me@ttaylorr.com>

> ---
>  Documentation/technical/commit-graph-format.txt | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/Documentation/technical/commit-graph-format.txt b/Documentation/technical/commit-graph-format.txt
> index de56f9f1efd..1beef171822 100644
> --- a/Documentation/technical/commit-graph-format.txt
> +++ b/Documentation/technical/commit-graph-format.txt
> @@ -97,10 +97,10 @@ CHUNK DATA:
>        bit on. The other bits correspond to the position of the last parent.
>
>    Bloom Filter Index (ID: {'B', 'I', 'D', 'X'}) (N * 4 bytes) [Optional]
> -    * The ith entry, BIDX[i], stores the number of 8-byte word blocks in all
> -      Bloom filters from commit 0 to commit i (inclusive) in lexicographic
> -      order. The Bloom filter for the i-th commit spans from BIDX[i-1] to
> -      BIDX[i] (plus header length), where BIDX[-1] is 0.
> +    * The ith entry, BIDX[i], stores the number of bytes in all Bloom filters
> +      from commit 0 to commit i (inclusive) in lexicographic order. The Bloom
> +      filter for the i-th commit spans from BIDX[i-1] to BIDX[i] (plus header
> +      length), where BIDX[-1] is 0.
>      * The BIDX chunk is ignored if the BDAT chunk is not present.
>
>    Bloom Filter Data (ID: {'B', 'D', 'A', 'T'}) [Optional]
> --
> gitgitgadget

Thanks,
Taylor

  reply	other threads:[~2020-05-01 22:55 UTC|newest]

Thread overview: 42+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-05-01 15:30 [PATCH 00/12] Integrating line-log and changed-path Bloom filters Derrick Stolee via GitGitGadget
2020-05-01 15:30 ` [PATCH 01/12] bloom: fix whitespace around tab length Derrick Stolee via GitGitGadget
2020-05-01 22:51   ` Taylor Blau
2020-05-01 15:30 ` [PATCH 02/12] test-bloom: fix usage typo Derrick Stolee via GitGitGadget
2020-05-01 22:51   ` Taylor Blau
2020-05-01 15:30 ` [PATCH 03/12] Documentation: changed-path Bloom filters use byte words Derrick Stolee via GitGitGadget
2020-05-01 22:55   ` Taylor Blau [this message]
2020-05-01 15:30 ` [PATCH 04/12] bloom: de-duplicate directory entries Derrick Stolee via GitGitGadget
2020-05-01 23:06   ` Taylor Blau
2020-05-01 15:30 ` [PATCH 05/12] bloom: parse commit before computing filters Derrick Stolee via GitGitGadget
2020-05-01 23:10   ` Taylor Blau
2020-05-01 15:30 ` [PATCH 06/12] bloom: use num_changes not nr for limit detection Derrick Stolee via GitGitGadget
2020-05-01 23:12   ` Taylor Blau
2020-05-01 15:30 ` [PATCH 07/12] completion: offer '--(no-)patch' among 'git log' options SZEDER Gábor via GitGitGadget
2020-05-01 23:44   ` Taylor Blau
2020-05-01 15:30 ` [PATCH 08/12] line-log: remove unused fields from 'struct line_log_data' SZEDER Gábor via GitGitGadget
2020-05-01 23:46   ` Taylor Blau
2020-05-01 15:30 ` [PATCH 09/12] t4211-line-log: add tests for parent oids SZEDER Gábor via GitGitGadget
2020-05-04 17:31   ` Taylor Blau
2020-05-01 15:30 ` [PATCH 10/12] line-log: more responsive, incremental 'git log -L' SZEDER Gábor via GitGitGadget
2020-05-04 17:52   ` Taylor Blau
2020-05-04 17:55     ` Derrick Stolee
2020-05-01 15:30 ` [PATCH 11/12] line-log: try to use generation number-based topo-ordering SZEDER Gábor via GitGitGadget
2020-05-04 21:25   ` Taylor Blau
2020-05-01 15:30 ` [PATCH 12/12] line-log: integrate with changed-path Bloom filters Derrick Stolee via GitGitGadget
2020-05-04 21:50   ` Taylor Blau
2020-05-01 17:34 ` [PATCH 00/12] Integrating line-log and " Junio C Hamano
2020-05-11 11:56 ` [PATCH v2 " Derrick Stolee via GitGitGadget
2020-05-11 11:56   ` [PATCH v2 01/12] bloom: fix whitespace around tab length Derrick Stolee via GitGitGadget
2020-05-11 11:56   ` [PATCH v2 02/12] test-bloom: fix usage typo Derrick Stolee via GitGitGadget
2020-05-11 11:56   ` [PATCH v2 03/12] bloom: parse commit before computing filters Derrick Stolee via GitGitGadget
2020-05-11 11:56   ` [PATCH v2 04/12] Documentation: changed-path Bloom filters use byte words Derrick Stolee via GitGitGadget
2020-05-11 11:56   ` [PATCH v2 05/12] bloom: de-duplicate directory entries Derrick Stolee via GitGitGadget
2020-06-07 21:45     ` SZEDER Gábor
2020-05-11 11:56   ` [PATCH v2 06/12] bloom: use num_changes not nr for limit detection Derrick Stolee via GitGitGadget
2020-08-04 14:51     ` SZEDER Gábor
2020-05-11 11:56   ` [PATCH v2 07/12] completion: offer '--(no-)patch' among 'git log' options SZEDER Gábor via GitGitGadget
2020-05-11 11:56   ` [PATCH v2 08/12] line-log: remove unused fields from 'struct line_log_data' SZEDER Gábor via GitGitGadget
2020-05-11 11:56   ` [PATCH v2 09/12] t4211-line-log: add tests for parent oids SZEDER Gábor via GitGitGadget
2020-05-11 11:56   ` [PATCH v2 10/12] line-log: more responsive, incremental 'git log -L' SZEDER Gábor via GitGitGadget
2020-05-11 11:56   ` [PATCH v2 11/12] line-log: try to use generation number-based topo-ordering SZEDER Gábor via GitGitGadget
2020-05-11 11:56   ` [PATCH v2 12/12] line-log: integrate with changed-path Bloom filters Derrick Stolee via GitGitGadget

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200501225500.GA46146@syl.local \
    --to=me@ttaylorr.com \
    --cc=dstolee@microsoft.com \
    --cc=garimasigit@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=jnareb@gmail.com \
    --cc=peff@peff.net \
    --cc=szeder.dev@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.