* [PATCH 1/7] bulk-checkin: factor out `format_object_header_hash()`
From: Taylor Blau @ 2023-10-06 22:01 UTC (permalink / raw)
To: git; +Cc: Elijah Newren, Eric W. Biederman, Jeff King, Junio C Hamano
In-Reply-To: <cover.1696629697.git.me@ttaylorr.com>
Before deflating a blob into a pack, the bulk-checkin mechanism prepares
the pack object header by calling `format_object_header()`, and writing
into a scratch buffer, the contents of which eventually makes its way
into the pack.
Future commits will add support for deflating multiple kinds of objects
into a pack, and will likewise need to perform a similar operation as
below.
This is a mostly straightforward extraction, with one notable exception.
Instead of hard-coding `the_hash_algo`, pass it in to the new function
as an argument. This isn't strictly necessary for our immediate purposes
here, but will prove useful in the future if/when the bulk-checkin
mechanism grows support for the hash transition plan.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
bulk-checkin.c | 20 ++++++++++++++------
1 file changed, 14 insertions(+), 6 deletions(-)
diff --git a/bulk-checkin.c b/bulk-checkin.c
index 223562b4e7..0aac3dfe31 100644
--- a/bulk-checkin.c
+++ b/bulk-checkin.c
@@ -247,6 +247,19 @@ static void prepare_to_stream(struct bulk_checkin_packfile *state,
die_errno("unable to write pack header");
}
+static void format_object_header_hash(const struct git_hash_algo *algop,
+ git_hash_ctx *ctx, enum object_type type,
+ size_t size)
+{
+ unsigned char header[16384];
+ unsigned header_len = format_object_header((char *)header,
+ sizeof(header),
+ type, size);
+
+ algop->init_fn(ctx);
+ algop->update_fn(ctx, header, header_len);
+}
+
static int deflate_blob_to_pack(struct bulk_checkin_packfile *state,
struct object_id *result_oid,
int fd, size_t size,
@@ -254,8 +267,6 @@ static int deflate_blob_to_pack(struct bulk_checkin_packfile *state,
{
off_t seekback, already_hashed_to;
git_hash_ctx ctx;
- unsigned char obuf[16384];
- unsigned header_len;
struct hashfile_checkpoint checkpoint = {0};
struct pack_idx_entry *idx = NULL;
@@ -263,10 +274,7 @@ static int deflate_blob_to_pack(struct bulk_checkin_packfile *state,
if (seekback == (off_t) -1)
return error("cannot find the current offset");
- header_len = format_object_header((char *)obuf, sizeof(obuf),
- OBJ_BLOB, size);
- the_hash_algo->init_fn(&ctx);
- the_hash_algo->update_fn(&ctx, obuf, header_len);
+ format_object_header_hash(the_hash_algo, &ctx, OBJ_BLOB, size);
/* Note: idx is non-NULL when we are writing */
if ((flags & HASH_WRITE_OBJECT) != 0)
--
2.42.0.8.g7a7e1e881e.dirty
^ permalink raw reply related
* [PATCH 0/7] merge-ort: implement support for packing objects together
From: Taylor Blau @ 2023-10-06 22:01 UTC (permalink / raw)
To: git; +Cc: Elijah Newren, Eric W. Biederman, Jeff King, Junio C Hamano
(Based on the tip of 'eb/limit-bulk-checkin-to-blobs'.)
This series implements support for a new merge-tree option,
`--write-pack`, which causes any newly-written objects to be packed
together instead of being stored individually as loose.
The motivating use-case behind these changes is to better support
repositories who invoke merge-tree frequently, generating a potentially
large number of loose objects, resulting in a possible adverse effect on
performance.
The majority of the changes here are preparatory to refactor common
routines out of the bulk-checkin machinery to prepare for indexing
different types of objects whose contents can be held in-core.
Also worth noting is the relative ease this series can be adapted to
support the $NEW_HASH interop work in 'eb/hash-transition-rfc'. For more
details on the relatively small number of changes necessary to make that
work, see the log message of the second-to-last patch.
Thanks in advance for your review!
Taylor Blau (7):
bulk-checkin: factor out `format_object_header_hash()`
bulk-checkin: factor out `prepare_checkpoint()`
bulk-checkin: factor out `truncate_checkpoint()`
bulk-checkin: factor our `finalize_checkpoint()`
bulk-checkin: introduce `index_blob_bulk_checkin_incore()`
bulk-checkin: introduce `index_tree_bulk_checkin_incore()`
builtin/merge-tree.c: implement support for `--write-pack`
Documentation/git-merge-tree.txt | 4 +
builtin/merge-tree.c | 5 +
bulk-checkin.c | 249 ++++++++++++++++++++++++++-----
bulk-checkin.h | 8 +
merge-ort.c | 43 ++++--
merge-recursive.h | 1 +
t/t4301-merge-tree-write-tree.sh | 93 ++++++++++++
7 files changed, 355 insertions(+), 48 deletions(-)
--
2.42.0.8.g7a7e1e881e.dirty
^ permalink raw reply
* Re: [PATCH 2/4] dir.[ch]: expose 'get_dtype'
From: Junio C Hamano @ 2023-10-06 22:00 UTC (permalink / raw)
To: Victoria Dye via GitGitGadget; +Cc: git, Victoria Dye
In-Reply-To: <24014010ea350a2ea8676b6560ca1d60838c56ef.1696615769.git.gitgitgadget@gmail.com>
"Victoria Dye via GitGitGadget" <gitgitgadget@gmail.com> writes:
> From: Victoria Dye <vdye@github.com>
>
> Move 'get_dtype()' from 'diagnose.c' to 'dir.c' and add its declaration to
> 'dir.h' so that it is accessible to callers in other files. The function and
> its documentation are moved verbatim except for a small addition to the
> description clarifying what the 'path' arg represents.
>
> Signed-off-by: Victoria Dye <vdye@github.com>
> ---
> diagnose.c | 36 ------------------------------------
> dir.c | 28 ++++++++++++++++++++++++++++
> dir.h | 11 +++++++++++
> 3 files changed, 39 insertions(+), 36 deletions(-)
OK. diagnose.c should still have access to the function as it
includes <dir.h>, and to anybody that includes <dir.h> and sees the
declaration of get_dtype(), DT_FOO should be visible because <dir.h>
includes <statinfo.h> that has fallback definition of DT_FOO.
Looking simple and straight-forward.
> diff --git a/diagnose.c b/diagnose.c
> index 8430064000b..fc4d344bd63 100644
> --- a/diagnose.c
> +++ b/diagnose.c
> @@ -71,42 +71,6 @@ static int dir_file_stats(struct object_directory *object_dir, void *data)
> return 0;
> }
>
> -/*
> - * Get the d_type of a dirent. If the d_type is unknown, derive it from
> - * stat.st_mode.
> - *
> - * Note that 'path' is assumed to have a trailing slash. It is also modified
> - * in-place during the execution of the function, but is then reverted to its
> - * original value before returning.
> - */
> -static unsigned char get_dtype(struct dirent *e, struct strbuf *path)
> -{
> - struct stat st;
> - unsigned char dtype = DTYPE(e);
> - size_t base_path_len;
> -
> - if (dtype != DT_UNKNOWN)
> - return dtype;
> -
> - /* d_type unknown in dirent, try to fall back on lstat results */
> - base_path_len = path->len;
> - strbuf_addstr(path, e->d_name);
> - if (lstat(path->buf, &st))
> - goto cleanup;
> -
> - /* determine d_type from st_mode */
> - if (S_ISREG(st.st_mode))
> - dtype = DT_REG;
> - else if (S_ISDIR(st.st_mode))
> - dtype = DT_DIR;
> - else if (S_ISLNK(st.st_mode))
> - dtype = DT_LNK;
> -
> -cleanup:
> - strbuf_setlen(path, base_path_len);
> - return dtype;
> -}
> -
> static int count_files(struct strbuf *path)
> {
> DIR *dir = opendir(path->buf);
> diff --git a/dir.c b/dir.c
> index 8486e4d56ff..5e01af3a25e 100644
> --- a/dir.c
> +++ b/dir.c
> @@ -2235,6 +2235,34 @@ static int get_index_dtype(struct index_state *istate,
> return DT_UNKNOWN;
> }
>
> +unsigned char get_dtype(struct dirent *e, struct strbuf *path)
> +{
> + struct stat st;
> + unsigned char dtype = DTYPE(e);
> + size_t base_path_len;
> +
> + if (dtype != DT_UNKNOWN)
> + return dtype;
> +
> + /* d_type unknown in dirent, try to fall back on lstat results */
> + base_path_len = path->len;
> + strbuf_addstr(path, e->d_name);
> + if (lstat(path->buf, &st))
> + goto cleanup;
> +
> + /* determine d_type from st_mode */
> + if (S_ISREG(st.st_mode))
> + dtype = DT_REG;
> + else if (S_ISDIR(st.st_mode))
> + dtype = DT_DIR;
> + else if (S_ISLNK(st.st_mode))
> + dtype = DT_LNK;
> +
> +cleanup:
> + strbuf_setlen(path, base_path_len);
> + return dtype;
> +}
> +
> static int resolve_dtype(int dtype, struct index_state *istate,
> const char *path, int len)
> {
> diff --git a/dir.h b/dir.h
> index ad06682fd54..28c630ce806 100644
> --- a/dir.h
> +++ b/dir.h
> @@ -363,6 +363,17 @@ struct dir_struct {
>
> struct dirent *readdir_skip_dot_and_dotdot(DIR *dirp);
>
> +/*
> + * Get the d_type of a dirent. If the d_type is unknown, derive it from
> + * stat.st_mode using the path to the dirent's containing directory (path) and
> + * the name of the dirent itself.
> + *
> + * Note that 'path' is assumed to have a trailing slash. It is also modified
> + * in-place during the execution of the function, but is then reverted to its
> + * original value before returning.
> + */
> +unsigned char get_dtype(struct dirent *e, struct strbuf *path);
> +
> /*Count the number of slashes for string s*/
> int count_slashes(const char *s);
^ permalink raw reply
* Re: [PATCH 1/4] ref-cache.c: fix prefix matching in ref iteration
From: Junio C Hamano @ 2023-10-06 21:51 UTC (permalink / raw)
To: Victoria Dye via GitGitGadget; +Cc: git, Victoria Dye
In-Reply-To: <59276a5b3fd1fd3b25db73e096cf0e834af2d4f9.1696615769.git.gitgitgadget@gmail.com>
"Victoria Dye via GitGitGadget" <gitgitgadget@gmail.com> writes:
> From: Victoria Dye <vdye@github.com>
>
> Update 'cache_ref_iterator_advance' to skip over refs that are not matched
> by the given prefix.
>
> Currently, a ref entry is considered "matched" if the entry name is fully
> contained within the prefix:
>
> * prefix: "refs/heads/v1"
> * entry: "refs/heads/v1.0"
>
> OR if the prefix is fully contained in the entry name:
>
> * prefix: "refs/heads/v1.0"
> * entry: "refs/heads/v1"
>
> The first case is always correct, but the second is only correct if the ref
> cache entry is a directory, for example:
>
> * prefix: "refs/heads/example"
> * entry: "refs/heads/"
>
> Modify the logic in 'cache_ref_iterator_advance' to reflect these
> expectations:
>
> 1. If 'overlaps_prefix' returns 'PREFIX_EXCLUDES_DIR', then the prefix and
> ref cache entry do not overlap at all. Skip this entry.
> 2. If 'overlaps_prefix' returns 'PREFIX_WITHIN_DIR', then the prefix matches
> inside this entry if it is a directory. Skip if the entry is not a
> directory, otherwise iterate over it.
> 3. Otherwise, 'overlaps_prefix' returned 'PREFIX_CONTAINS_DIR', indicating
> that the cache entry (directory or not) is fully contained by or equal to
> the prefix. Iterate over this entry.
>
> Note that condition 2 relies on the names of directory entries having the
> appropriate trailing slash. The existing function documentation of
> 'create_dir_entry' explicitly calls out the trailing slash requirement, so
> this is a safe assumption to make.
Thanks for explaining it very well and clearly.
Allowing prefix="refs/heads/v1.0" to yield entry="refs/heads/v1"
(case #2 above that this patch fixes the behaviour for) would cause
ref_iterator_advance() to return a ref outside the hierarhcy,
wouldn't it? So it appears to me that either one of the two would
be true:
* the code is structured in such a way that such a condition does
not actually happen (in which case this patch would be a no-op),
or
* there is a bug in the current code that is fixed by this patch,
whose externally observable behaviour can be verified with a
test.
It is not quite clear to me which is the case here. The code with
the patch looks more logical than the original, but I am not sure
how to demonstrate the existing breakage (if any).
> Signed-off-by: Victoria Dye <vdye@github.com>
> ---
> refs/ref-cache.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/refs/ref-cache.c b/refs/ref-cache.c
> index 2294c4564fb..6e3b725245c 100644
> --- a/refs/ref-cache.c
> +++ b/refs/ref-cache.c
> @@ -412,7 +412,8 @@ static int cache_ref_iterator_advance(struct ref_iterator *ref_iterator)
>
> if (level->prefix_state == PREFIX_WITHIN_DIR) {
> entry_prefix_state = overlaps_prefix(entry->name, iter->prefix);
> - if (entry_prefix_state == PREFIX_EXCLUDES_DIR)
> + if (entry_prefix_state == PREFIX_EXCLUDES_DIR ||
> + (entry_prefix_state == PREFIX_WITHIN_DIR && !(entry->flag & REF_DIR)))
> continue;
> } else {
> entry_prefix_state = level->prefix_state;
^ permalink raw reply
* Re: [PATCH 0/4] Performance improvement & cleanup in loose ref iteration
From: Junio C Hamano @ 2023-10-06 19:09 UTC (permalink / raw)
To: Victoria Dye via GitGitGadget; +Cc: git, Victoria Dye
In-Reply-To: <pull.1594.git.1696615769.gitgitgadget@gmail.com>
"Victoria Dye via GitGitGadget" <gitgitgadget@gmail.com> writes:
> While investigating ref iteration performance in builtins like
> 'for-each-ref' and 'show-ref', I found two small improvement opportunities.
>
> The first patch tweaks the logic around prefix matching in
> 'cache_ref_iterator_advance' so that we correctly skip refs that do not
> actually match a given prefix. The unnecessary iteration doesn't seem to be
> causing any bugs in the ref iteration commands that I've tested, but it
> doesn't hurt to be more precise (and it helps with some other patches I'm
> working on ;) ).
>
> The next three patches update how 'loose_fill_ref_dir' determines the type
> of ref cache entry to create (directory or regular). On platforms that
> include d_type information in 'struct dirent' (as far as I can tell, all
> except NonStop & certain versions of Cygwin), this allows us to skip calling
> 'stat'. In ad-hoc testing, this improved performance of 'git for-each-ref'
> by about 20%.
Yay. That is a very obvious one, once it is pointed out. Thanks
for noticing and improving. Looking forward to reading the patches
themselves.
^ permalink raw reply
* Re: How To Pick And Work On A Microproject
From: Junio C Hamano @ 2023-10-06 19:03 UTC (permalink / raw)
To: Christian Couder; +Cc: git, Naomi Ibe, Kaartic Sivaraam
In-Reply-To: <CAP8UFD1cd5YZqAxYbYUMNkAYJLLGjBpNe_NK5nVq3eLxxSDzEQ@mail.gmail.com>
Christian Couder <christian.couder@gmail.com> writes:
> I am not sure how others feel about this, but I think it would be
> better in the future to not have to prepare such pages, and to just
> have a section with a number of examples of good microprojects on this
> https://git.github.io/General-Microproject-Information/ page. It will
> be easier to update this section when we know about other good ideas
> or better ideas, or when we want to remove an idea that we don't
> consider good anymore, or just update an idea.
If we have curated one-stop shop for microproject candidates to make
it easy to find them, it would be a vast improvement over the status
quo. The easier for us to update the contents of the list, the
better for participants. Having only one place that we need to look
at is one way to do so, and the general microproject information
page would be the best place to host it. I like it.
>> Then it goes on to suggest finding a bug report, but I tend to think
>> that fixing them is way oversized to be a good microproject.
>
> I agree that it's oversized for most bugs. I have just added the
> following paragraph at the end of this "Searching for bug reports"
> subsection:
>
> "Also some bugs are difficult to understand and require too much or
> too difficult work for a microproject, so don’t spend too much time on
> them if it appears that they might not be simple to fix, and don’t
> hesitate to ask on the mailing list if they are a good microproject."
Would that be better, or would it be simpler to gut the whole
paragraph about bug reports? This is "how to pick a microproject",
not "how to pick your main project to work on during your mentoring
program".
Unlike #leftoverbits that sometimes cover trivial but boring style
normalization and easy refactoring of code into helper functions, I
have never seen a bug report on the list that may make a good
microproject. If we were to add a curated list of microproject idea
on the general microproject information page, it probably is better
to remove these mentions of bugreports and #leftoverbits, so that
readers will not get distracted. "Don't hesitate to ask" so that
they may try to tackle more challenging one, if they wish, is a good
thing to say nevertheless.
Thanks.
^ permalink raw reply
* Re: [PATCH v3 3/3] completion: complete '--dd'
From: Sergey Organov @ 2023-10-06 18:53 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git
In-Reply-To: <xmqqlecgeoan.fsf@gitster.g>
Junio C Hamano <gitster@pobox.com> writes:
> Sergey Organov <sorganov@gmail.com> writes:
>
>> '--dd' only makes sense for 'git log' and 'git show', so add it to
>> __git_log_show_options which is referenced in the completion for these
>> two commands.
>>
>> Signed-off-by: Sergey Organov <sorganov@gmail.com>
>> ---
>> contrib/completion/git-completion.bash | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/contrib/completion/git-completion.bash b/contrib/completion/git-completion.bash
>> index 133ec92bfae7..ca4fa39f3ff8 100644
>> --- a/contrib/completion/git-completion.bash
>> +++ b/contrib/completion/git-completion.bash
>> @@ -2042,7 +2042,7 @@ __git_log_shortlog_options="
>> "
>> # Options accepted by log and show
>> __git_log_show_options="
>> - --diff-merges --diff-merges= --no-diff-merges --remerge-diff
>> + --diff-merges --diff-merges= --no-diff-merges --dd --remerge-diff
>> "
>>
>> __git_diff_merges_opts="off none on first-parent 1 separate m combined c dense-combined cc remerge r"
>
> Quite straight-forward. I am kind of surprised that we do not have
> to list "--cc" here. Perhaps it is so short and common that people
> do not need completion help?
>
> But that is not a new problem caused by this series, so it is OK.
>
> Unless "--cc" gets completed without being listed here, using some
> automation like the "--git-completion-helper" option, in which case
> we may want to see if we can remove all of the above and complete
> them the same way as "--cc" gets completed. I didn't check.
I checked, though with rather old 2.25.1 running on my Ubuntu, and it
is not completed.
I think that it's still a good idea to add --cc to completions, so that
it's there in the suggested completion list, for the sake of
discoverability. That's why I bothered to add --dd to the completions.
Thanks,
-- Sergey Organov
>
> Thanks.
^ permalink raw reply
* Re: [PATCH v3 1/3] diff-merges: improve --diff-merges documentation
From: Sergey Organov @ 2023-10-06 18:42 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git
In-Reply-To: <xmqq34yog3ux.fsf@gitster.g>
Junio C Hamano <gitster@pobox.com> writes:
> Sergey Organov <sorganov@gmail.com> writes:
[...]
>> +on, m::
>> + Make diff output for merge commits to be shown in the default
>> + format. The default format could be changed using
>> `log.diffMerges` configuration parameter, which default value
>> is `separate`.
>
> The original is already wrong so these are not problems this patch
> introduces, but
>
> - "configuration variable" is how we refer to these entities.
> - "which default value" -> "whose default value".
Added this amendment to the patch.
Thanks,
-- Sergey Organov
^ permalink raw reply
* Re: [PATCH v3 1/3] diff-merges: improve --diff-merges documentation
From: Sergey Organov @ 2023-10-06 18:36 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Elijah Newren, git
In-Reply-To: <xmqq7cnzaav0.fsf@gitster.g>
Junio C Hamano <gitster@pobox.com> writes:
> Elijah Newren <newren@gmail.com> writes:
>
>>> > +--cc::
>>> > + Produce dense combined diff output for merge commits.
>>> > + Shortcut for '--diff-merges=dense-combined -p'.
>>>
>>> Good.
>>>
>>> > +--remerge-diff::
>>> > + Produce diff against re-merge.
>>> > + Shortcut for '--diff-merges=remerge -p'.
>>> ...
>> Perhaps:
>>
>> Produce remerge-diff output for merge commits, in order to show how
>> conflicts were resolved.
>
> I do not mind it, but then I'd prefer to see ", in order to show
> how" also in the description of "--cc" and "-c" for consistency.
>
> A succinct way to say what they do may be hard to come by, but I
> think of them showing places that did not have obvious natural
> resolution.
So, is it OK with both of you if I leave it as:
"Produce remerge-diff output for merge commits."
for now, and let you tweak the descriptions later on, if needed?
Thanks,
-- Sergey Organov
^ permalink raw reply
* [PATCH 4/4] files-backend.c: avoid stat in 'loose_fill_ref_dir'
From: Victoria Dye via GitGitGadget @ 2023-10-06 18:09 UTC (permalink / raw)
To: git; +Cc: Victoria Dye, Victoria Dye
In-Reply-To: <pull.1594.git.1696615769.gitgitgadget@gmail.com>
From: Victoria Dye <vdye@github.com>
Modify the 'readdir' loop in 'loose_fill_ref_dir' to, rather than 'stat' a
file to determine whether it is a directory or not, use 'get_dtype'.
Currently, the loop uses 'stat' to determine whether each dirent is a
directory itself or not in order to construct the appropriate ref cache
entry. If 'stat' fails (returning a negative value), the dirent is silently
skipped; otherwise, 'S_ISDIR(st.st_mode)' is used to check whether the entry
is a directory.
On platforms that include an entry's d_type in in the 'dirent' struct, this
extra 'stat' check is redundant. We can use the 'get_dtype' method to
extract this information on platforms that support it (i.e. where
NO_D_TYPE_IN_DIRENT is unset), and derive it with 'stat' on platforms that
don't. Because 'stat' is an expensive call, this confers a
modest-but-noticeable performance improvement when iterating over large
numbers of refs (approximately 20% speedup in 'git for-each-ref' in a 30k
ref repo).
Unlike other existing usage of 'get_dtype', the 'follow_symlinks' arg is set
to 1 to replicate the existing handling of symlink dirents. This
unfortunately requires calling 'stat' on the associated entry regardless of
platform, but symlinks in the loose ref store are highly unlikely since
they'd need to be created manually by a user.
Note that this patch also changes the condition for skipping creation of a
ref entry from "when 'stat' fails" to "when the d_type is anything other
than DT_REG or DT_DIR". If a dirent's d_type is DT_UNKNOWN (either because
the platform doesn't support d_type in dirents or some other reason) or
DT_LNK, 'get_dtype' will try to derive the underlying type with 'stat'. If
the 'stat' fails, the d_type will remain 'DT_UNKNOWN' and dirent will be
skipped. However, it will also be skipped if it is any other valid d_type
(e.g. DT_FIFO for named pipes, DT_LNK for a nested symlink). Git does not
handle these properly anyway, so we can safely constrain accepted types to
directories and regular files.
Signed-off-by: Victoria Dye <vdye@github.com>
---
refs/files-backend.c | 14 +++++---------
1 file changed, 5 insertions(+), 9 deletions(-)
diff --git a/refs/files-backend.c b/refs/files-backend.c
index 341354182bb..db5c0c7a724 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -246,10 +246,8 @@ static void loose_fill_ref_dir(struct ref_store *ref_store,
int dirnamelen = strlen(dirname);
struct strbuf refname;
struct strbuf path = STRBUF_INIT;
- size_t path_baselen;
files_ref_path(refs, &path, dirname);
- path_baselen = path.len;
d = opendir(path.buf);
if (!d) {
@@ -262,23 +260,22 @@ static void loose_fill_ref_dir(struct ref_store *ref_store,
while ((de = readdir(d)) != NULL) {
struct object_id oid;
- struct stat st;
int flag;
+ unsigned char dtype;
if (de->d_name[0] == '.')
continue;
if (ends_with(de->d_name, ".lock"))
continue;
strbuf_addstr(&refname, de->d_name);
- strbuf_addstr(&path, de->d_name);
- if (stat(path.buf, &st) < 0) {
- ; /* silently ignore */
- } else if (S_ISDIR(st.st_mode)) {
+
+ dtype = get_dtype(de, &path, 1);
+ if (dtype == DT_DIR) {
strbuf_addch(&refname, '/');
add_entry_to_dir(dir,
create_dir_entry(dir->cache, refname.buf,
refname.len));
- } else {
+ } else if (dtype == DT_REG) {
if (!refs_resolve_ref_unsafe(&refs->base,
refname.buf,
RESOLVE_REF_READING,
@@ -308,7 +305,6 @@ static void loose_fill_ref_dir(struct ref_store *ref_store,
create_ref_entry(refname.buf, &oid, flag));
}
strbuf_setlen(&refname, dirnamelen);
- strbuf_setlen(&path, path_baselen);
}
strbuf_release(&refname);
strbuf_release(&path);
--
gitgitgadget
^ permalink raw reply related
* [PATCH 2/4] dir.[ch]: expose 'get_dtype'
From: Victoria Dye via GitGitGadget @ 2023-10-06 18:09 UTC (permalink / raw)
To: git; +Cc: Victoria Dye, Victoria Dye
In-Reply-To: <pull.1594.git.1696615769.gitgitgadget@gmail.com>
From: Victoria Dye <vdye@github.com>
Move 'get_dtype()' from 'diagnose.c' to 'dir.c' and add its declaration to
'dir.h' so that it is accessible to callers in other files. The function and
its documentation are moved verbatim except for a small addition to the
description clarifying what the 'path' arg represents.
Signed-off-by: Victoria Dye <vdye@github.com>
---
diagnose.c | 36 ------------------------------------
dir.c | 28 ++++++++++++++++++++++++++++
dir.h | 11 +++++++++++
3 files changed, 39 insertions(+), 36 deletions(-)
diff --git a/diagnose.c b/diagnose.c
index 8430064000b..fc4d344bd63 100644
--- a/diagnose.c
+++ b/diagnose.c
@@ -71,42 +71,6 @@ static int dir_file_stats(struct object_directory *object_dir, void *data)
return 0;
}
-/*
- * Get the d_type of a dirent. If the d_type is unknown, derive it from
- * stat.st_mode.
- *
- * Note that 'path' is assumed to have a trailing slash. It is also modified
- * in-place during the execution of the function, but is then reverted to its
- * original value before returning.
- */
-static unsigned char get_dtype(struct dirent *e, struct strbuf *path)
-{
- struct stat st;
- unsigned char dtype = DTYPE(e);
- size_t base_path_len;
-
- if (dtype != DT_UNKNOWN)
- return dtype;
-
- /* d_type unknown in dirent, try to fall back on lstat results */
- base_path_len = path->len;
- strbuf_addstr(path, e->d_name);
- if (lstat(path->buf, &st))
- goto cleanup;
-
- /* determine d_type from st_mode */
- if (S_ISREG(st.st_mode))
- dtype = DT_REG;
- else if (S_ISDIR(st.st_mode))
- dtype = DT_DIR;
- else if (S_ISLNK(st.st_mode))
- dtype = DT_LNK;
-
-cleanup:
- strbuf_setlen(path, base_path_len);
- return dtype;
-}
-
static int count_files(struct strbuf *path)
{
DIR *dir = opendir(path->buf);
diff --git a/dir.c b/dir.c
index 8486e4d56ff..5e01af3a25e 100644
--- a/dir.c
+++ b/dir.c
@@ -2235,6 +2235,34 @@ static int get_index_dtype(struct index_state *istate,
return DT_UNKNOWN;
}
+unsigned char get_dtype(struct dirent *e, struct strbuf *path)
+{
+ struct stat st;
+ unsigned char dtype = DTYPE(e);
+ size_t base_path_len;
+
+ if (dtype != DT_UNKNOWN)
+ return dtype;
+
+ /* d_type unknown in dirent, try to fall back on lstat results */
+ base_path_len = path->len;
+ strbuf_addstr(path, e->d_name);
+ if (lstat(path->buf, &st))
+ goto cleanup;
+
+ /* determine d_type from st_mode */
+ if (S_ISREG(st.st_mode))
+ dtype = DT_REG;
+ else if (S_ISDIR(st.st_mode))
+ dtype = DT_DIR;
+ else if (S_ISLNK(st.st_mode))
+ dtype = DT_LNK;
+
+cleanup:
+ strbuf_setlen(path, base_path_len);
+ return dtype;
+}
+
static int resolve_dtype(int dtype, struct index_state *istate,
const char *path, int len)
{
diff --git a/dir.h b/dir.h
index ad06682fd54..28c630ce806 100644
--- a/dir.h
+++ b/dir.h
@@ -363,6 +363,17 @@ struct dir_struct {
struct dirent *readdir_skip_dot_and_dotdot(DIR *dirp);
+/*
+ * Get the d_type of a dirent. If the d_type is unknown, derive it from
+ * stat.st_mode using the path to the dirent's containing directory (path) and
+ * the name of the dirent itself.
+ *
+ * Note that 'path' is assumed to have a trailing slash. It is also modified
+ * in-place during the execution of the function, but is then reverted to its
+ * original value before returning.
+ */
+unsigned char get_dtype(struct dirent *e, struct strbuf *path);
+
/*Count the number of slashes for string s*/
int count_slashes(const char *s);
--
gitgitgadget
^ permalink raw reply related
* [PATCH 3/4] dir.[ch]: add 'follow_symlink' arg to 'get_dtype'
From: Victoria Dye via GitGitGadget @ 2023-10-06 18:09 UTC (permalink / raw)
To: git; +Cc: Victoria Dye, Victoria Dye
In-Reply-To: <pull.1594.git.1696615769.gitgitgadget@gmail.com>
From: Victoria Dye <vdye@github.com>
Add a 'follow_symlink' boolean option to 'get_type()'. If 'follow_symlink'
is enabled, DT_LNK (in addition to DT_UNKNOWN) d_types triggers the
stat-based d_type resolution, using 'stat' instead of 'lstat' to get the
type of the followed symlink. Note that symlinks are not followed
recursively, so a symlink pointing to another symlink will still resolve to
DT_LNK.
Update callers in 'diagnose.c' to specify 'follow_symlink = 0' to preserve
current behavior.
Signed-off-by: Victoria Dye <vdye@github.com>
---
diagnose.c | 6 +++---
dir.c | 13 +++++++++----
dir.h | 7 ++++++-
3 files changed, 18 insertions(+), 8 deletions(-)
diff --git a/diagnose.c b/diagnose.c
index fc4d344bd63..4d096c857f1 100644
--- a/diagnose.c
+++ b/diagnose.c
@@ -81,7 +81,7 @@ static int count_files(struct strbuf *path)
return 0;
while ((e = readdir_skip_dot_and_dotdot(dir)) != NULL)
- if (get_dtype(e, path) == DT_REG)
+ if (get_dtype(e, path, 0) == DT_REG)
count++;
closedir(dir);
@@ -110,7 +110,7 @@ static void loose_objs_stats(struct strbuf *buf, const char *path)
base_path_len = count_path.len;
while ((e = readdir_skip_dot_and_dotdot(dir)) != NULL)
- if (get_dtype(e, &count_path) == DT_DIR &&
+ if (get_dtype(e, &count_path, 0) == DT_DIR &&
strlen(e->d_name) == 2 &&
!hex_to_bytes(&c, e->d_name, 1)) {
strbuf_setlen(&count_path, base_path_len);
@@ -155,7 +155,7 @@ static int add_directory_to_archiver(struct strvec *archiver_args,
strbuf_add_absolute_path(&abspath, at_root ? "." : path);
strbuf_addch(&abspath, '/');
- dtype = get_dtype(e, &abspath);
+ dtype = get_dtype(e, &abspath, 0);
strbuf_setlen(&buf, len);
strbuf_addstr(&buf, e->d_name);
diff --git a/dir.c b/dir.c
index 5e01af3a25e..16fdb03f2a5 100644
--- a/dir.c
+++ b/dir.c
@@ -2235,19 +2235,24 @@ static int get_index_dtype(struct index_state *istate,
return DT_UNKNOWN;
}
-unsigned char get_dtype(struct dirent *e, struct strbuf *path)
+unsigned char get_dtype(struct dirent *e, struct strbuf *path,
+ int follow_symlink)
{
struct stat st;
unsigned char dtype = DTYPE(e);
size_t base_path_len;
- if (dtype != DT_UNKNOWN)
+ if (dtype != DT_UNKNOWN && !(follow_symlink && dtype == DT_LNK))
return dtype;
- /* d_type unknown in dirent, try to fall back on lstat results */
+ /*
+ * d_type unknown or unfollowed symlink, try to fall back on [l]stat
+ * results. If [l]stat fails, explicitly set DT_UNKNOWN.
+ */
base_path_len = path->len;
strbuf_addstr(path, e->d_name);
- if (lstat(path->buf, &st))
+ if ((follow_symlink && stat(path->buf, &st)) ||
+ (!follow_symlink && lstat(path->buf, &st)))
goto cleanup;
/* determine d_type from st_mode */
diff --git a/dir.h b/dir.h
index 28c630ce806..98aa85fcc0e 100644
--- a/dir.h
+++ b/dir.h
@@ -368,11 +368,16 @@ struct dirent *readdir_skip_dot_and_dotdot(DIR *dirp);
* stat.st_mode using the path to the dirent's containing directory (path) and
* the name of the dirent itself.
*
+ * If 'follow_symlink' is 1, this function will attempt to follow DT_LNK types
+ * using 'stat'. Links are *not* followed recursively, so a symlink pointing
+ * to another symlink will still resolve to 'DT_LNK'.
+ *
* Note that 'path' is assumed to have a trailing slash. It is also modified
* in-place during the execution of the function, but is then reverted to its
* original value before returning.
*/
-unsigned char get_dtype(struct dirent *e, struct strbuf *path);
+unsigned char get_dtype(struct dirent *e, struct strbuf *path,
+ int follow_symlink);
/*Count the number of slashes for string s*/
int count_slashes(const char *s);
--
gitgitgadget
^ permalink raw reply related
* [PATCH 1/4] ref-cache.c: fix prefix matching in ref iteration
From: Victoria Dye via GitGitGadget @ 2023-10-06 18:09 UTC (permalink / raw)
To: git; +Cc: Victoria Dye, Victoria Dye
In-Reply-To: <pull.1594.git.1696615769.gitgitgadget@gmail.com>
From: Victoria Dye <vdye@github.com>
Update 'cache_ref_iterator_advance' to skip over refs that are not matched
by the given prefix.
Currently, a ref entry is considered "matched" if the entry name is fully
contained within the prefix:
* prefix: "refs/heads/v1"
* entry: "refs/heads/v1.0"
OR if the prefix is fully contained in the entry name:
* prefix: "refs/heads/v1.0"
* entry: "refs/heads/v1"
The first case is always correct, but the second is only correct if the ref
cache entry is a directory, for example:
* prefix: "refs/heads/example"
* entry: "refs/heads/"
Modify the logic in 'cache_ref_iterator_advance' to reflect these
expectations:
1. If 'overlaps_prefix' returns 'PREFIX_EXCLUDES_DIR', then the prefix and
ref cache entry do not overlap at all. Skip this entry.
2. If 'overlaps_prefix' returns 'PREFIX_WITHIN_DIR', then the prefix matches
inside this entry if it is a directory. Skip if the entry is not a
directory, otherwise iterate over it.
3. Otherwise, 'overlaps_prefix' returned 'PREFIX_CONTAINS_DIR', indicating
that the cache entry (directory or not) is fully contained by or equal to
the prefix. Iterate over this entry.
Note that condition 2 relies on the names of directory entries having the
appropriate trailing slash. The existing function documentation of
'create_dir_entry' explicitly calls out the trailing slash requirement, so
this is a safe assumption to make.
Signed-off-by: Victoria Dye <vdye@github.com>
---
refs/ref-cache.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/refs/ref-cache.c b/refs/ref-cache.c
index 2294c4564fb..6e3b725245c 100644
--- a/refs/ref-cache.c
+++ b/refs/ref-cache.c
@@ -412,7 +412,8 @@ static int cache_ref_iterator_advance(struct ref_iterator *ref_iterator)
if (level->prefix_state == PREFIX_WITHIN_DIR) {
entry_prefix_state = overlaps_prefix(entry->name, iter->prefix);
- if (entry_prefix_state == PREFIX_EXCLUDES_DIR)
+ if (entry_prefix_state == PREFIX_EXCLUDES_DIR ||
+ (entry_prefix_state == PREFIX_WITHIN_DIR && !(entry->flag & REF_DIR)))
continue;
} else {
entry_prefix_state = level->prefix_state;
--
gitgitgadget
^ permalink raw reply related
* [PATCH 0/4] Performance improvement & cleanup in loose ref iteration
From: Victoria Dye via GitGitGadget @ 2023-10-06 18:09 UTC (permalink / raw)
To: git; +Cc: Victoria Dye
While investigating ref iteration performance in builtins like
'for-each-ref' and 'show-ref', I found two small improvement opportunities.
The first patch tweaks the logic around prefix matching in
'cache_ref_iterator_advance' so that we correctly skip refs that do not
actually match a given prefix. The unnecessary iteration doesn't seem to be
causing any bugs in the ref iteration commands that I've tested, but it
doesn't hurt to be more precise (and it helps with some other patches I'm
working on ;) ).
The next three patches update how 'loose_fill_ref_dir' determines the type
of ref cache entry to create (directory or regular). On platforms that
include d_type information in 'struct dirent' (as far as I can tell, all
except NonStop & certain versions of Cygwin), this allows us to skip calling
'stat'. In ad-hoc testing, this improved performance of 'git for-each-ref'
by about 20%.
Thanks!
* Victoria
Victoria Dye (4):
ref-cache.c: fix prefix matching in ref iteration
dir.[ch]: expose 'get_dtype'
dir.[ch]: add 'follow_symlink' arg to 'get_dtype'
files-backend.c: avoid stat in 'loose_fill_ref_dir'
diagnose.c | 42 +++---------------------------------------
dir.c | 33 +++++++++++++++++++++++++++++++++
dir.h | 16 ++++++++++++++++
refs/files-backend.c | 14 +++++---------
refs/ref-cache.c | 3 ++-
5 files changed, 59 insertions(+), 49 deletions(-)
base-commit: 3a06386e314565108ad56a9bdb8f7b80ac52fb69
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1594%2Fvdye%2Fvdye%2Fref-iteration-cleanup-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1594/vdye/vdye/ref-iteration-cleanup-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/1594
--
gitgitgadget
^ permalink raw reply
* Re: What's cooking in git.git (Oct 2023, #01; Mon, 2)
From: Sergey Organov @ 2023-10-06 18:02 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git
In-Reply-To: <xmqqh6n4eo7n.fsf@gitster.g>
Junio C Hamano <gitster@pobox.com> writes:
> Sergey Organov <sorganov@gmail.com> writes:
>
>> Overall, as an example, I'd understand if you had deflected the patch
>> with "let's rather use -d for '--decorate=short', or '--date=relative'",
>> or something like that, but you don't, leaving me uncertain about your
>> actual worries and intentions.
>
> Oh, I would be very much more sympathetic if somebody wanted to make
> a short-and-sweet single-letter option to stand for "--first-parent
> -p", if they come with the "first-parent chain is special---it is
> the trunk history of the development" world view. And the resulting
> behaviour would be "give me the diffs" in their world view, so I
> would understand if they wanted to use "-d" for such an operation.
>
> However, to folks who do not subscribe to "the first parent chain is
> the trunk history" world view, "give me the diffs" is not an
> explanation of the resulting behaviour, because in "-d" there is no
> trace of hint that it is also about first-parent traversal.
>
> So "-d" may not be a perfect fit for it, either. But at least it is
> based on a more consistent world view, I would think, than
> "--diff-merges=1 -p", whose behaviour becomes unexplainable when it
> hits "reverse" merges in a world where the first parent chain is not
> necessarily the trunk.
>
> Anyway, I've tentatively queued the "--dd" round. Naming is hard,
> I cannot tell what "dd" stards for, and I suspect no user can X-<.
Thanks, it stands for diff-diff (for both merges and regulars), and got
inspired by --cc as well. Also selected as being fairly easy to type.
Should I add this to the commit message?
-- Sergey Organov
^ permalink raw reply
* Re: [PATCH v3 1/3] diff-merges: improve --diff-merges documentation
From: Junio C Hamano @ 2023-10-06 18:01 UTC (permalink / raw)
To: Elijah Newren; +Cc: Sergey Organov, git
In-Reply-To: <CABPp-BFsrt0zS3NHsVAyOSW6vGioe8Z-iN2M3_JNBpP2fWVq9g@mail.gmail.com>
Elijah Newren <newren@gmail.com> writes:
>> > +--cc::
>> > + Produce dense combined diff output for merge commits.
>> > + Shortcut for '--diff-merges=dense-combined -p'.
>>
>> Good.
>>
>> > +--remerge-diff::
>> > + Produce diff against re-merge.
>> > + Shortcut for '--diff-merges=remerge -p'.
>> ...
> Perhaps:
>
> Produce remerge-diff output for merge commits, in order to show how
> conflicts were resolved.
I do not mind it, but then I'd prefer to see ", in order to show
how" also in the description of "--cc" and "-c" for consistency.
A succinct way to say what they do may be hard to come by, but I
think of them showing places that did not have obvious natural
resolution.
> For a two-parent merge commit, a merge of these two commits is
> retried to create a temporary tree object, potentially containing
> files with conflict markers. A diff is then shown between that
> temporary tree and the actual merge commit. This has the effect
> of showing whether and how both semantic and textual conflicts
> were resolved by the user (i.e. what changes the user made after
> running 'git merge' and before finally committing).
Yes, and because we do not have a back reference from here to the
description for "--remerge-diff" we saw earlier, we'd need the "in
order to" you suggested earlier there, too.
>> Either way, it makes readers wonder what happens to merges with more
>> than 2 parents (octopus merges). It is not a new problem and this
>> topic should not attempt to fix it.
>
> We could add:
>
> For octopus merges (merges with more than two parents), currently
> only shows a warning about skipping such commits.
>
> if wanted.
>
> But perhaps I've distracted too much from Sergey's topic, and I should
> submit these wording tweaks as a patch on top? I'm fine either way.
The primary purpose of polishing during a review cycle should be to
help the original contributor to express what they wanted to express
better, so talking about octopus behaviour, which wasn't covered in
the original nor the patch under review, can be left out to avoid
extending the scope of the topic further.
But everything else you said in the message I am responding to falls
into the scope of the "improving existing documentation for various
merge presentation modes" topic, I would think, and they are more or
less usable verbatim, so it would not be too much of a burden to
make sure they are used in the next iteration.
Thanks for a review, and thanks Sergey for streamlining the
documentation around here.
^ permalink raw reply
* Re: [PATCH v2 1/2] attr: add attr.tree for setting the treeish to read attributes from
From: Jonathan Tan @ 2023-10-06 17:20 UTC (permalink / raw)
To: Junio C Hamano
Cc: Jonathan Tan, John Cai via GitGitGadget, git, Jeff King, John Cai
In-Reply-To: <xmqqv8bmlzoi.fsf@gitster.g>
Junio C Hamano <gitster@pobox.com> writes:
> As this adds a new call to git_config_get_string(), which will only
> be available by including <config.h>, a merge-fix into 'seen' of
> this topic needs to revert what b1bda751 (parse: separate out
> parsing functions from config.h, 2023-09-29) did, which made this
> file include only <parse.h>.
>
> As this configuration variable was invented to improve the way the
> attribute source tree is supported by emulating how mailmap.blob is
> done, it deserves a bit of comparison.
>
> The way mailmap.c does this is not have any code that reads or
> parses configuration in mailmap.c (which is a rather library-ish
> place), and leaves it up to callers to pre-populate the global
> variable git_mailmap_blob with config.c:git_default_config(). That
> way, they do not need to include <config.h> (nor <parse.h>) that is
> closer to the UI layer. I am wondering why we are not doing the
> same, and instead making an ad-hoc call to git_config_get_string()
> in this code, and if it is a good direction to move the codebase to
> (in which case we may want to make sure that the same pattern is
> followed in other places).
>
> Folks interested in libification, as to the direction of that
> effort, what's your plan on where to draw a line between "library"
> and "userland"? Should library-ish code be allowed to call
> git_config_anything()? I somehow suspect that it might be cleaner
> if they didn't, and instead have the user of the "attr" module to
> supply the necessary values from outside.
I think that ideally library-ish code shouldn't be allowed to call
config, yes. However I think what's practical would be for libraries
that use very few config variables to get the necessary values from
outside, and libraries that use many config variables (e.g. fetch, if it
becomes a library) to call config.
> On the other hand, once the part we have historically called
> "config" API gets a reasonably solid abstraction so that they become
> pluggable and replaceable, random ad-hoc calls from library code
> outside the "config" library code may not be a huge problem, as long
> as we plumb the necessary object handles around (so "attr" library
> would need to be told which "config" backend is in use, probably in
> the form of a struct that holds the various states in to replace
> the current use of globals, plus a vtable to point at
> implementations of the "config" service, and git_config_get_string()
> call in such a truly libified world would grab the value of the named
> variable transparently from whichever "config" backend is currently
> in use).
This is true, but if we were ever to use the attr library elsewhere
(whether in the git.git repo itself to unit test this library, or
in another software project), we would need to supply a mock/stub of
config. If attr uses very few config variables, I think it's clearer if
it takes in the information from outside.
^ permalink raw reply
* Re: [PATCH v3 1/3] diff-merges: improve --diff-merges documentation
From: Sergey Organov @ 2023-10-06 17:18 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git
In-Reply-To: <xmqq34yog3ux.fsf@gitster.g>
Junio C Hamano <gitster@pobox.com> writes:
> Sergey Organov <sorganov@gmail.com> writes:
[...]
>> --no-diff-merges::
>> + Synonym for '--diff-merges=off'.
>> +
>> +--diff-merges=<format>::
>> Specify diff format to be used for merge commits. Default is
>> - {diff-merges-default} unless `--first-parent` is in use, in which case
>> - `first-parent` is the default.
>> + {diff-merges-default} unless `--first-parent` is in use, in
>> + which case `first-parent` is the default.
>
> This reads well.
>
> In the longer term, "--diff-merge=first-parent" that is used without
> first-parent traversal should be discouraged and be deprecated, I
> think, but that is a separate story [*].
I fail to see why useful harmless feature is to be deprecated. I believe
users are pretty capable to decide if they need it by themselves,
without our guidance.
Thanks,
-- Sergey Organov
^ permalink raw reply
* gpg.ssh.defaultKeyCommand docs bug?
From: matthew sporleder @ 2023-10-06 17:14 UTC (permalink / raw)
To: Git Mailing List
https://git-scm.com/docs/git-config#Documentation/git-config.txt-gpgsshdefaultKeyCommand
This command that will be run when user.signingkey is not set and a
ssh signature is requested. On successful exit a valid ssh public key
prefixed with key:: is expected in the first line of its output. This
allows for a script doing a dynamic lookup of the correct public key
when it is impractical to statically configure user.signingKey. For
example when keys or SSH Certificates are rotated frequently or
selection of the right key depends on external factors unknown to git.
---
The command does not actually work (for me, git version 2.42.0) with
key:: prefixed.
It only works if I cat the public key as-is.
I only figured this out because the docs previously said it took the
format of ssh-add -L, which also doesn't not contain key::.
I am using this script for my "dynamic" key discovery:
#!/bin/sh
f=$(ssh -G $(git remote get-url $(git remote|head -1)|awk -F':' '{
print $1 }') |grep -E '^identityfile'|sed 's#^identityfile ##g')
cat $(eval realpath ${f}.pub)
Thanks,
Matt
^ permalink raw reply
* Re: [PATCH v3 1/3] diff-merges: improve --diff-merges documentation
From: Sergey Organov @ 2023-10-06 17:07 UTC (permalink / raw)
To: Elijah Newren; +Cc: Junio C Hamano, git
In-Reply-To: <CABPp-BFsrt0zS3NHsVAyOSW6vGioe8Z-iN2M3_JNBpP2fWVq9g@mail.gmail.com>
Elijah Newren <newren@gmail.com> writes:
> Hi,
>
> On Thu, Oct 5, 2023 at 2:24 PM Junio C Hamano <gitster@pobox.com> wrote:
>>
>> Sergey Organov <sorganov@gmail.com> writes:
>> > +--remerge-diff::
>> > + Produce diff against re-merge.
>> > + Shortcut for '--diff-merges=remerge -p'.
>>
>> I suspect that many people do not get what "re-merge" in "against
>> re-merge" really is. As "combined diff" and "dense combined diff"
>> are not explained in the previous two entries either, and expect the
>> readers to read the real description (which more or less matches
>> what the original description for "-c" and "--cc" had, which is
>> good), it would be better to say "Produce remerge-diff output for
>> merge commits." here, too. It makes it consistent, and "for merge
>> commits" makes it clear the "magic" does not apply to regular
>> commits (which the above entries for "-c" and "--cc" do, which is
>> very good).
>
> Perhaps:
>
> Produce remerge-diff output for merge commits, in order to show how
> conflicts were resolved.
Will use this description in re-roll.
>
>> > +separate::
>> > + Show full diff with respect to each of parents.
>> > + Separate log entry and diff is generated for each parent.
>>
>> In the early days of Git before -c/--cc were invented, we explained
>> this mode as "pairwise comparison", and the phrase "pairwise" still
>> may be the best one to describe the behaviour here. In fact, we see
>> in the updated description of combined below the exact phrase is used
>> to refer to this oldest output format.
>>
>> Show the `--patch` output pairwise, together with the commit
>> header, repeated for each parent for a merge commit.
>
> I like this.
>
>> or something, perhaps. I added "repeated" here to make the contrast
>> with "simultaneously" stand out.
Please let's left it for some follow-up, as this patch does not rephrase
original, just changes the presentation.
>>
>> > +combined, c::
>> > + Show differences from each of the parents to the merge
>> > + result simultaneously instead of showing pairwise diff between
>> > + a parent and the result one at a time. Furthermore, it lists
>> > + only files which were modified from all parents.
>> > ++
>> > +dense-combined, cc::
>> > + Further compress output produced by `--diff-merges=combined`
>> > + by omitting uninteresting hunks whose contents in the parents
>> > + have only two variants and the merge result picks one of them
>> > + without modification.
>> > ++
>> > +remerge, r::
>> > + Remerge two-parent merge commits to create a temporary tree
>> > + object--potentially containing files with conflict markers
>> > + and such. A diff is then shown between that temporary tree
>> > + and the actual merge commit.
>>
>> The original says "two-parent merge comimts are remerged" so it is
>> not a failure of this patch, but the first verb "Remerge" sounds
>> unnecessarily unfriendly to the readers.
>>
>> For a two-parent merge commit, a merge of these two commits
>> is retried to create a temporary tree object, potentially
>> containing files with conflict markers. A `--patch` output
>> then is shown between ...
>>
>> would be easier to follow and more faithful to the original
>> description added by db757e8b (show, log: provide a --remerge-diff
>> capability, 2022-02-02).
>
> I like it. Perhaps it may also benefit from explaining why this mode
> is useful as well:
>
> For a two-parent merge commit, a merge of these two commits is
> retried to create a temporary tree object, potentially containing
> files with conflict markers. A diff is then shown between that
> temporary tree and the actual merge commit. This has the effect
> of showing whether and how both semantic and textual conflicts
> were resolved by the user (i.e. what changes the user made after
> running 'git merge' and before finally committing).
>
>> Either way, it makes readers wonder what happens to merges with more
>> than 2 parents (octopus merges). It is not a new problem and this
>> topic should not attempt to fix it.
>
> We could add:
>
> For octopus merges (merges with more than two parents), currently
> only shows a warning about skipping such commits.
>
> if wanted.
>
> But perhaps I've distracted too much from Sergey's topic, and I should
> submit these wording tweaks as a patch on top? I'm fine either way.
>
>> [Footnote]
>>
>> * When a project allows fast-forward merges, something like this can
>> happen (and Git was _designed_ to allow and even encourage it)
>>
>> - Linus pulls from Sergey and sees merge conflicts that are very
>> messy. Sergey is asked to resolve the conflict, as Linus knows
>> Sergey understands the changes he is asking Linus to pull much
>> better than Linus does.
>>
>> - Sergey does "git pull origin" that would give the same set of
>> conflicts Linus saw, perhaps ours/theirs sides swapped, resolves
>> the conflicts, and comits the merge result. He may even add a
>> few other improvements on top (or may not). He tells Linus that
>> his tree is ready to be pulled again.
>>
>> - Linus pulls from Sergey again. This time it is fast-forward,
>> without an extra merge commit that records the Linus's previous
>> tip as the first parent and Sergey's work as the second parent.
>>
>> - Linus continues working from here.
>>
>> In such a workflow, merges are nothing more than "combining
>> multiple histories together" and the first parenthood is NOT
>> inherently special among parents at all. The original "-m -p"
>> (aka "pairwise diff") output reflects this world view and ensures
>> that all parents are shown more or less as equals (yes, the first
>> parent diff is shown first before the other parents, but you
>> cannot avoid it when outputting to a single dimension medium).
>>
>> This world view was the only world view Git supported, until I
>> added the "--first-parent" traversal in 0053e902 (git-log
>> --first-parent: show only the first parent log, 2007-03-13).
>>
>> With the "--first-parent", with "--no-ff" option to "git merge", a
>> different world view becomes possible. A merge is not merely
>> combining multiple histories, which are equals. It is bringing
>> work done on a side branch into the trunk. To see the overview of
>> the history, "git log --first-parent" would give the outline,
>> which would be full of merges from side branches, each of which
>> can be seen as summarizing the work done on the side branch that
>> was merged, and it may occasionally have single-parent commits
>> that are hotfixes or trivial clean-ups or project administrivia
>> commits. With "-p", "git log" would show the changes the work
>> done on a side branch as a single unit for a merge, and individual
>> commits if they are single-parent. The life is good.
>>
>> It all breaks down if the "diff against the first parent" is done
>> on a merge that is not bringing the work on a side branch in to
>> the trunk. The merge done in the second step Sergey did for Linus
>> in the above example will have his work on the history leading to
>> its first parent, and from the overall project's point of view,
>> the second parent is the tip of the history of the trunk. Showing
>> first-parent diff for a merge that was *not* discovered via the
>> first-parent traversal would show such a meaningless patch. This
>> is an illustration of the fallout from mixing two incompatible
>> world views together, "--diff-merges=first-parent" wants to work
>> in a world where the first-parent is special among parents, but
>> traversal without "--first-parent" wants to treat all the branches
>> equally.
>>
>> All the other <format>s accepted by the "--diff-merges=<format>"
>> option are symmetrical and they work equally well when in a
>> history of a project that considers the first-parenthood special
>> (i.e. work on a side branch is brought into the trunk history) or
>> in a history with merges whose parent order should not matter, so
>> unlike "--diff-merges=first-parent", it makes sense to apply them
>> with or without first-parent traversal. It however is not true
>> for the "--diff-merges=first-parent" variant, which is asymmetric.
>>
>> And that is why I think use of "--diff-merges=first-parent"
>> without "--first-parent" traversal is a bad thing to teach users
>> to use.
>
> Thanks for writing this up. In the past, I didn't know how to put
> into words why I didn't particularly care for this mode. You explain
> it rather well.
^ permalink raw reply
* Re: [PATCH 1/1] Allow attr magic for git-add, git-stash.
From: Junio C Hamano @ 2023-10-06 17:05 UTC (permalink / raw)
To: Joanna Wang; +Cc: git
In-Reply-To: <xmqqedi8bhf1.fsf@gitster.g>
Junio C Hamano <gitster@pobox.com> writes:
>> -test_expect_success 'fail if attr magic is used places not implemented' '
>> - # The main purpose of this test is to check that we actually fail
>> - # when you attempt to use attr magic in commands that do not implement
>> - # attr magic. This test does not advocate git-add to stay that way,
>> - # though, but git-add is convenient as it has its own internal pathspec
>> - # parsing.
>> - test_must_fail git add ":(attr:labelB)" 2>actual &&
>> - test_i18ngrep "magic not supported" actual
> ...
> IOW, do not remove the above test, but find something
> other than "add" that is suitable to still follow the original
> intention of the test.
We can do something like the attached, instead of discarding the
protection from future breakage. You're welcome to find another
command and magic that are less likely to be implemented than
check-ignore with attr magic and replace them, of course. I just
eyeballed the hits from
$ git grep -A2 parse_pathspec\(
and picked one that has pathspec magic restriction (i.e., the second
parameter is not zero) and appears near the surface (i.e., the one
in blame.c for example is a bad target, because it is to ensure that
a pathspec that is internally created out of an end-user provided
pathname is treated only literally, and is impossible to pass a
pathspec with offending magic to that codepath from the outside) to
make it easier to pass offending magic from the command line.
A tangent (to the topic of testing, but relevant to the whole
patch). I notice 'stash' is mentioned on the topic, but I do not
see changes to the codepath that is specific to 'stash', and changes
to the tests do not demonstrate existing breakage. The lack of code
changes probably is because something shared, which is pretected
with magic guard lifted by the patch, is called from 'stash' as well
as 'add', or something? Whatever the reason is, it would be better
to document it in the proposed log message, and have a test to make
sure that the command works with the attr magic.
Alternatively, leave the mention of 'stash' out of the patch, if it
is not affected, perhaps?
In any case, here is a replacement for the removed test.
diff --git c/t/t6135-pathspec-with-attrs.sh w/t/t6135-pathspec-with-attrs.sh
index f70c395e75..b450cb5b3a 100755
--- c/t/t6135-pathspec-with-attrs.sh
+++ w/t/t6135-pathspec-with-attrs.sh
@@ -242,10 +242,10 @@ test_expect_success 'fail on multiple attr specifiers in one pathspec item' '
test_expect_success 'fail if attr magic is used places not implemented' '
# The main purpose of this test is to check that we actually fail
# when you attempt to use attr magic in commands that do not implement
- # attr magic. This test does not advocate git-add to stay that way,
- # though, but git-add is convenient as it has its own internal pathspec
- # parsing.
- test_must_fail git add ":(attr:labelB)" 2>actual &&
+ # attr magic. This test does not advocate check-ignore to stay that way.
+ # When you teach the command to grok the pathspec, you need to find
+ # another commnad to replace it for the test.
+ test_must_fail git check-ignore ":(attr:labelB)" 2>actual &&
test_i18ngrep "magic not supported" actual
'
^ permalink raw reply related
* Re: [PATCH v3 2/3] diff-merges: introduce '--dd' option
From: Sergey Organov @ 2023-10-06 17:05 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git
In-Reply-To: <xmqqr0m8eoaq.fsf@gitster.g>
Junio C Hamano <gitster@pobox.com> writes:
> Sergey Organov <sorganov@gmail.com> writes:
>
>> This option provides a shortcut to request diff with respect to first
>> parent for any kind of commit, universally. It's implemented as pure
>> synonym for "--diff-merges=first-parent --patch".
>
> That explains what the patch does, but it does not tell us why it is
> useful [*].
>
>> NOTE: originally proposed as '-d', and renamed to '--dd' due to Junio
>> request to keep "short-and-sweet" '-d' reserved for other uses.
>
> The note is not grammatical, and more importantly, readers of "git
> log" 6 months down the road would not care. I'd rather not see it
> in the proposed log message. It is suitable material to place after
> the three-dash line, or in the cover letter for the iteration.
OK, will get rid of it.
Thanks,
-- Sergey Organov
^ permalink raw reply
* Re: [PATCH v3 1/3] diff-merges: improve --diff-merges documentation
From: Sergey Organov @ 2023-10-06 17:03 UTC (permalink / raw)
To: Elijah Newren; +Cc: Junio C Hamano, git
In-Reply-To: <CABPp-BFsrt0zS3NHsVAyOSW6vGioe8Z-iN2M3_JNBpP2fWVq9g@mail.gmail.com>
Elijah Newren <newren@gmail.com> writes:
> Hi,
>
> On Thu, Oct 5, 2023 at 2:24 PM Junio C Hamano <gitster@pobox.com> wrote:
>>
>> Sergey Organov <sorganov@gmail.com> writes:
>>
>> > ---diff-merges=(off|none|on|first-parent|1|separate|m|combined|c|dense-combined|cc|remerge|r)::
>> > +-m::
>> > + Show diffs for merge commits in the default format. This is
>> > + similar to '--diff-merges=on' (which see) except `-m` will
>> > + produce no output unless `-p` is given as well.
>>
>> I think the sentence reads better without the translated (q.v.) that
>> confused Eric.
>
> Agreed; confused me too.
Will remove, no problem.
Thanks,
-- Sergey Organov
^ permalink raw reply
* Re: [PATCH v3 1/3] diff-merges: improve --diff-merges documentation
From: Sergey Organov @ 2023-10-06 17:02 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Eric Sunshine, git
In-Reply-To: <xmqqcyxshizq.fsf@gitster.g>
Junio C Hamano <gitster@pobox.com> writes:
> Eric Sunshine <sunshine@sunshineco.com> writes:
>
>>> diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
>>> @@ -43,66 +43,74 @@ endif::git-diff[]
>>> +-m::
>>> + Show diffs for merge commits in the default format. This is
>>> + similar to '--diff-merges=on' (which see) except `-m` will
>>> + produce no output unless `-p` is given as well.
>>
>> I'm having difficulty grasping the parenthetical "(which see)" comment.
>
> I am, too. I know what it means when written in the more common
> Latin abbreviation (q.v.), but I suspect it may be rare to spell it
> in English like this. I found
Well, I didn't invent it, and didn't lookup it until asked, it just
popped-up out of my head somehow.
I'll remove it as causes confusion.
Thanks,
-- Sergey Organov
^ permalink raw reply
* Re: [PATCH 2/3] revision: clear decoration structs during release_revisions()
From: Junio C Hamano @ 2023-10-06 16:42 UTC (permalink / raw)
To: Jeff King; +Cc: git
In-Reply-To: <20231006005132.GA992085@coredump.intra.peff.net>
Jeff King <peff@peff.net> writes:
> On Thu, Oct 05, 2023 at 04:00:54PM -0700, Junio C Hamano wrote:
>
>> Wow, nested maze of callbacks make my head spin ;-) but they all
>> look reasonable. Thanks.
>
> Yeah, I don't love those one-liner callbacks just to handle the cast.
>
> The other alternative is to write some kind of for_each_decoration()
> macro, but I think it ends up in the usual macro hell (requiring the
> caller to provide iterator variables, hanging half-open braces, and so
> on). It might be worth it if iterating could be used in other places,
> but I don't think it is.
>
> So I tried to choose the lesser of two evils. :)
Oh, I am happy with the result.
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox