From: Victoria Dye <vdye@github.com>
To: Patrick Steinhardt <ps@pks.im>,
	Victoria Dye via GitGitGadget <gitgitgadget@gmail.com>
Cc: git@vger.kernel.org
Subject: Re: [PATCH 10/16] mktree: overwrite duplicate entries
Date: Wed, 12 Jun 2024 11:48:37 -0700
Message-ID: <dab4b0e3-8000-465e-8f0a-61df3d9168a3@github.com>
In-Reply-To: <ZmltGAPQ2dAfW0kG@tanuki>

Patrick Steinhardt wrote:
> On Tue, Jun 11, 2024 at 06:24:42PM +0000, Victoria Dye via GitGitGadget wrote:
>> From: Victoria Dye <vdye@github.com>
>>
>> If multiple tree entries with the same name are provided as input to
>> 'mktree', only write the last one to the tree. Entries are considered
>> duplicates if they have identical names (*not* considering mode); if a blob
>> and a tree with the same name are provided, only the last one will be
>> written to the tree. A tree with duplicate entries is invalid (per 'git
>> fsck'), so that condition should be avoided wherever possible.
>>
>> Signed-off-by: Victoria Dye <vdye@github.com>
>> ---
>>  Documentation/git-mktree.txt |  8 ++++---
>>  builtin/mktree.c             | 45 ++++++++++++++++++++++++++++++++----
>>  t/t1010-mktree.sh            | 36 +++++++++++++++++++++++++++--
>>  3 files changed, 80 insertions(+), 9 deletions(-)
>>
>> diff --git a/Documentation/git-mktree.txt b/Documentation/git-mktree.txt
>> index fb07e40cef0..afbc846d077 100644
>> --- a/Documentation/git-mktree.txt
>> +++ b/Documentation/git-mktree.txt
>> @@ -43,9 +43,11 @@ OPTIONS
>>  INPUT FORMAT
>>  ------------
>>  Tree entries may be specified in any of the formats compatible with the
>> -`--index-info` option to linkgit:git-update-index[1]. The order of the tree
>> -entries is normalized by `mktree` so pre-sorting the input by path is not
>> -required.
>> +`--index-info` option to linkgit:git-update-index[1].
>> +
>> +The order of the tree entries is normalized by `mktree` so pre-sorting the input
>> +by path is not required. Multiple entries provided with the same path are
>> +deduplicated, with only the last one specified added to the tree.
> 
> Hm. I'm not sure whether this is a good idea. With git-mktree(1) being
> part of our plumbing layer, you can expect that it's mostly going to be
> fed input from scripts. And any script that generates duplicate tree
> entries is broken, but we now start to paper over such brokenness
> without giving the user any indicator of this. As user of git-mktree(1)
> in Gitaly I can certainly say that I'd rather want to see it die instead
> of silently fixing my inputs so that I start to notice my own bugs.

'git mktree' already does some cleaning of the inputs by sorting the
entries, presumably so that a valid tree is created rather than one with
ordering errors. Deduplication is also a cleanup of user inputs to ensure a
valid tree is created, so to me it's a consistent extension to existing
behavior. Conversely, rejecting the inputs and failing would be introducing
an error scenario where none existed previously, which to me would be a
bigger deviation.
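
To make the "last one wins" semantics concrete, here is a rough Python
sketch of the normalization described above. It is only an illustration
under simplifying assumptions, not the code in builtin/mktree.c: entries
are plain tuples, the helper name 'normalize_entries' is made up, and the
sort key is an approximation (trees are ordered as if their name ended in
'/', and names are assumed to be ASCII).

```python
# Illustrative sketch only; not the implementation in builtin/mktree.c.
# Each entry is a (mode, type, oid, name) tuple, similar to 'ls-tree' output.

def normalize_entries(entries):
    """Deduplicate entries by name (last one wins), then sort them."""
    by_name = {}
    for mode, obj_type, oid, name in entries:
        # Mode and type are ignored when deciding whether two entries are
        # duplicates, so a later tree replaces an earlier blob of the same name.
        by_name[name] = (mode, obj_type, oid, name)

    def sort_key(entry):
        # Assumption: a tree sorts as if its name had a trailing '/'.
        _mode, obj_type, _oid, name = entry
        return name + "/" if obj_type == "tree" else name

    return sorted(by_name.values(), key=sort_key)


entries = [
    ("100644", "blob", "a" * 40, "b.txt"),
    ("100644", "blob", "b" * 40, "a.txt"),
    ("040000", "tree", "c" * 40, "b.txt"),  # same name as the first entry
]
result = normalize_entries(entries)
```

Here 'result' holds two entries: 'a.txt' as a blob, followed by 'b.txt'
as a tree, because the later 'b.txt' entry replaced the earlier one.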

One potential way to get the kind of functionality you're looking for,
though, might be to combine something like '--literally' and a '--strict'
that validates the tree before writing. Like I mentioned in the cover letter
[1], I do plan to submit a follow-up series with '--strict' (it's just that
this series is already pretty long and it would add 4-ish more patches). 

[1] https://lore.kernel.org/git/pull.1746.git.1718130288.gitgitgadget@gmail.com/

> So without seeing a strong motivating usecase for this feature I'd think
> that git-mktree(1) should reject such inputs and return an error such
> that the user can fix their tooling.

Practically, there are a couple of reasons that led me to want this
behavior. One is that it allows using data structures with more rigid
integrity checks (like the index & cache tree). The other is that, once the
ability to add nested entries is introduced, the concept of a "duplicate"
gets fuzzier and blocking them entirely could lead to inconsistencies and/or
limited flexibility. If, for example, a user wants to create a tree with a
directory 'folder1/' with OID '0123456789012345678901234567890123456789',
but update a blob 'folder1/file1' in it to OID
'0987654321098765432109876543210987654321', the latter is technically a
"duplicate", but rejecting it would make it impossible to create the tree
without first expanding 'folder1/' with something like 'ls-tree', replacing
the appropriate entry, then calling 'mktree'.
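
The 'folder1/' example can be modeled with a toy Python sketch. Everything
here is hypothetical and only meant to show the idea: the "object store" is
a dict named 'TREES', the existing contents of 'folder1/' ('file1' and
'file2' with placeholder OIDs) are invented, and 'build_tree' is not a
function in Git. Nested dicts stand in for subtrees that would have to be
rewritten.

```python
# Toy model only: OIDs are opaque strings and the "object store" is a dict.
# TREES maps a tree OID to its {name: oid} entries; Git does not store
# objects this way.

FOLDER1_OID = "0123456789012345678901234567890123456789"
NEW_FILE1_OID = "0987654321098765432109876543210987654321"

# Hypothetical contents of the existing 'folder1/' tree.
TREES = {
    FOLDER1_OID: {"file1": "old-file1-oid", "file2": "file2-oid"},
}


def build_tree(entries):
    """Apply (path, oid) entries in order; later entries override earlier ones."""
    root = {}
    for path, oid in entries:
        *parents, leaf = path.rstrip("/").split("/")
        node = root
        for part in parents:
            child = node.get(part)
            if not isinstance(child, dict):
                # Expand a previously added tree OID into its entries so
                # that a single nested entry can be replaced.
                child = dict(TREES.get(child, {}))
                node[part] = child
            node = child
        node[leaf] = oid
    return root


tree = build_tree([
    ("folder1/", FOLDER1_OID),
    ("folder1/file1", NEW_FILE1_OID),
])
```

In this model, the second entry expands 'folder1' and replaces only
'file1', leaving 'file2' untouched, so the caller does not need a separate
'ls-tree' step to get the same result.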

> 
> Patrick


