* [WIP PATCH] fast-export: emit deletions first
@ 2026-04-06 6:36 Raymond E. Pasco
2026-04-06 17:15 ` Junio C Hamano
0 siblings, 1 reply; 8+ messages in thread
From: Raymond E. Pasco @ 2026-04-06 6:36 UTC (permalink / raw)
To: git; +Cc: ray
fast-export chooses its output order by pathname, sorting longer
paths earlier. However, this causes faulty output when the deleted
path is a prefix of the added one. For example, deleting a file 'a' and
creating a file 'a/b' emits:
from :prev_label
M 100644 :blob_label a/b
D a
Fix this by sorting deletions to come before other types of change.
Signed-off-by: Raymond E. Pasco <ray@ameretat.dev>
---
This is a quick and dirty fix for the bug. However, I do want to spend a
little more time on it - it may be that we only want to reverse the sort
when the deletion is specifically the prefix of some addition, and I
want to fence this off with new tests.
builtin/fast-export.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/builtin/fast-export.c b/builtin/fast-export.c
index b90da5e616..82d73b2f43 100644
--- a/builtin/fast-export.c
+++ b/builtin/fast-export.c
@@ -354,6 +354,12 @@ static int depth_first(const void *a_, const void *b_)
 	int len_a, len_b, len;
 	int cmp;
 
+	/* emit deletions first */
+	int a_deletes = (a->status == DIFF_STATUS_DELETED);
+	int b_deletes = (b->status == DIFF_STATUS_DELETED);
+	if (a_deletes != b_deletes)
+		return b_deletes - a_deletes;
+
 	name_a = a->one ? a->one->path : a->two->path;
 	name_b = b->one ? b->one->path : b->two->path;
--
2.54.0.rc0.605.g598a273b03.dirty
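For readers who want to try the ordering outside of git, here is a minimal standalone sketch of the comparator idea from the patch above. It is not git's code: `struct change`, `deletions_first()` and `sort_changes()` are hypothetical stand-ins (git's comparator works on `struct diff_filepair` and also handles renames, which this sketch ignores). It only illustrates "deletions first, then longer paths before their prefixes".

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for one diff entry; not git's struct diff_filepair. */
struct change {
	char status;        /* 'D' = deletion, 'M' = addition or modification */
	const char *path;
};

/*
 * qsort() comparator: deletions sort before everything else; within the
 * same class, a longer path sorts before a path that is its prefix.
 */
static int deletions_first(const void *a_, const void *b_)
{
	const struct change *a = a_;
	const struct change *b = b_;
	int a_deletes = (a->status == 'D');
	int b_deletes = (b->status == 'D');
	size_t len_a, len_b, len;
	int cmp;

	/* emit deletions first */
	if (a_deletes != b_deletes)
		return b_deletes - a_deletes;

	len_a = strlen(a->path);
	len_b = strlen(b->path);
	len = len_a < len_b ? len_a : len_b;

	/* compare only the common length, so "d" and "d/e" tie here */
	cmp = memcmp(a->path, b->path, len);
	if (cmp)
		return cmp;

	/* one path is a prefix of the other: the longer one goes first */
	return (len_a < len_b) - (len_a > len_b);
}

/* Sort an array of changes in place using the comparator above. */
static void sort_changes(struct change *changes, size_t nr)
{
	assert(changes || !nr);
	qsort(changes, nr, sizeof(*changes), deletions_first);
}
```

Calling `sort_changes()` on the two entries from the commit message, `M a/b` and `D a`, puts `D a` first, which is the order the patch aims for.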
* Re: [WIP PATCH] fast-export: emit deletions first
2026-04-06 6:36 [WIP PATCH] fast-export: emit deletions first Raymond E. Pasco
@ 2026-04-06 17:15 ` Junio C Hamano
2026-04-06 21:29 ` Jeff King
2026-04-07 1:12 ` Raymond E. Pasco
0 siblings, 2 replies; 8+ messages in thread
From: Junio C Hamano @ 2026-04-06 17:15 UTC (permalink / raw)
To: Raymond E. Pasco; +Cc: git
"Raymond E. Pasco" <ray@ameretat.dev> writes:
> fast-export chooses its output order by pathname, sorting longer
> paths earlier. However, this causes faulty output when the deleted
> path is a prefix of the added one. For example, deleting a file 'a' and
> creating a file 'a/b' emits:
>
> from :prev_label
> M 100644 :blob_label a/b
> D a
>
> Fix this by sorting deletions to come before other types of change.
>
> Signed-off-by: Raymond E. Pasco <ray@ameretat.dev>
> ---
>
> This is a quick and dirty fix for the bug. However, I do want to spend a
> little more time on it - it may be that we only want to reverse the sort
> when the deletion is specifically the prefix of some addition, and I
> want to fence this off with new tests.
I recall doing something like this in "git checkout" and also "git
am" to ensure that a thing deep in the hierarchy will not be
affected by a D/F conflict at a shallower level, so I do not have
any objection to this kind of change in principle. I do not know if
depth_first() is the right place to make this decision, or whether the
function should keep its name if it turns out to be the right place.
In any case, it is a bit surprising that fast-export survived this
long without having encountered the problem you are solving. I
wonder if fast-import handles such output with some smarts to
avoid the issue?
Thanks.
> builtin/fast-export.c | 6 ++++++
> 1 file changed, 6 insertions(+)
>
> diff --git a/builtin/fast-export.c b/builtin/fast-export.c
> index b90da5e616..82d73b2f43 100644
> --- a/builtin/fast-export.c
> +++ b/builtin/fast-export.c
> @@ -354,6 +354,12 @@ static int depth_first(const void *a_, const void *b_)
>  	int len_a, len_b, len;
>  	int cmp;
>
> +	/* emit deletions first */
> +	int a_deletes = (a->status == DIFF_STATUS_DELETED);
> +	int b_deletes = (b->status == DIFF_STATUS_DELETED);
> +	if (a_deletes != b_deletes)
> +		return b_deletes - a_deletes;
> +
>  	name_a = a->one ? a->one->path : a->two->path;
>  	name_b = b->one ? b->one->path : b->two->path;
* Re: [WIP PATCH] fast-export: emit deletions first
2026-04-06 17:15 ` Junio C Hamano
@ 2026-04-06 21:29 ` Jeff King
2026-04-06 21:44 ` Elijah Newren
2026-04-07 1:12 ` Raymond E. Pasco
1 sibling, 1 reply; 8+ messages in thread
From: Jeff King @ 2026-04-06 21:29 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Elijah Newren, Raymond E. Pasco, git
On Mon, Apr 06, 2026 at 10:15:27AM -0700, Junio C Hamano wrote:
> In any case, it is a bit surprising that fast-export survived this
> long without having encountered the problem you are solving. I
> wonder if fast-import handles such output with some smarts to
> avoid the issue?
I think it has come up a few times, but we never actually applied a fix:
2015: https://lore.kernel.org/git/alpine.DEB.2.10.1508191532330.31851@buzzword-bingo.mit.edu/
2017: https://lore.kernel.org/git/1493079137-1838-1-git-send-email-miguel.torroja@gmail.com/
2023: https://lore.kernel.org/git/BBB169A5-0665-47C9-819B-6409A22AB699@lanl.gov/
Looks like discussion got hung up on ordering other types of
modifications, like renames (which can actually have cycles). But I
don't see anything to contradict the view that putting deletions first
solves real problems and would not harm anything. And the answer to "it
hurts to fast-export with renames" is probably "don't do it".
It's also possible that sorting should be the responsibility of the
receiver. I.e., should fast-import see:
M 100644 :blob_label a/b
D a
and figure it out? Or maybe we want both (to help other consumers of
fast-export, but also to help fast-import when consuming output of other
sources).
-Peff
* Re: [WIP PATCH] fast-export: emit deletions first
2026-04-06 21:29 ` Jeff King
@ 2026-04-06 21:44 ` Elijah Newren
2026-04-07 4:24 ` Jeff King
2026-04-07 21:28 ` Raymond E. Pasco
0 siblings, 2 replies; 8+ messages in thread
From: Elijah Newren @ 2026-04-06 21:44 UTC (permalink / raw)
To: Jeff King; +Cc: Junio C Hamano, Raymond E. Pasco, git
On Mon, Apr 6, 2026 at 2:29 PM Jeff King <peff@peff.net> wrote:
>
> On Mon, Apr 06, 2026 at 10:15:27AM -0700, Junio C Hamano wrote:
>
> > In any case, it is a bit surprising that fast-export survived this
> > long without having encountered the problem you are solving. I
> > wonder if fast-import handles such output with some smarts to
> > avoid the issue?
>
> I think it has come up a few times, but we never actually applied a fix:
>
> 2015: https://lore.kernel.org/git/alpine.DEB.2.10.1508191532330.31851@buzzword-bingo.mit.edu/
> 2017: https://lore.kernel.org/git/1493079137-1838-1-git-send-email-miguel.torroja@gmail.com/
> 2023: https://lore.kernel.org/git/BBB169A5-0665-47C9-819B-6409A22AB699@lanl.gov/
>
> Looks like discussion got hung up on ordering other types of
> modifications, like renames (which can actually have cycles). But I
> don't see anything to contradict the view that putting deletions first
> solves real problems and would not harm anything. And the answer to "it
> hurts to fast-export with renames" is probably "don't do it".
>
> It's also possible that sorting should be the responsibility of the
> receiver. I.e., should fast-import see:
>
> M 100644 :blob_label a/b
> D a
>
> and figure it out? Or maybe we want both (to help other consumers of
> fast-export, but also to help fast-import when consuming output of other
> sources).
Would re-ordering on fast-import's side introduce bugs or violate
users' assumptions? Right now, fast-import has no check to prevent
more than one command being given for the same pathname, and it
applies a last-entry-wins rule. Thus filemodify PATH followed by
filedelete PATH gives a different result than the reverse order. Most
users probably wouldn't care or ever want to do that, but I could see
it as a way of changing your mind in the stream and overriding an
earlier directive you sent.
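To make the order sensitivity concrete, here is a small toy model in C. It is only a sketch under stated assumptions, not fast-import's implementation: `struct toy_tree`, `toy_modify()`, `toy_delete()` and `toy_has()` are hypothetical names, the tree is a flat list of file paths, and the model assumes that a modify replaces whatever conflicts with the new path while a delete removes the path and everything beneath it. Commands are applied strictly in the order they are called.

```c
#include <assert.h>
#include <string.h>

#define MAX_PATHS 16

/* Toy tree: a flat set of file paths (hypothetical; not fast-import's data structure). */
struct toy_tree {
	const char *paths[MAX_PATHS];
	int nr;
};

/* True if `path` is `prefix` itself or lies below `prefix` as a directory. */
static int is_at_or_under(const char *path, const char *prefix)
{
	size_t len = strlen(prefix);

	return !strncmp(path, prefix, len) &&
	       (path[len] == '\0' || path[len] == '/');
}

/* Toy 'D': drop the path and anything beneath it. */
static void toy_delete(struct toy_tree *t, const char *path)
{
	int i, kept = 0;

	for (i = 0; i < t->nr; i++)
		if (!is_at_or_under(t->paths[i], path))
			t->paths[kept++] = t->paths[i];
	t->nr = kept;
}

/*
 * Toy 'M': drop entries that conflict with the new file (the same path,
 * anything beneath it, or a file occupying one of its leading
 * directories), then record the new path.
 */
static void toy_modify(struct toy_tree *t, const char *path)
{
	int i, kept = 0;

	for (i = 0; i < t->nr; i++)
		if (!is_at_or_under(t->paths[i], path) &&
		    !is_at_or_under(path, t->paths[i]))
			t->paths[kept++] = t->paths[i];
	t->nr = kept;
	assert(t->nr < MAX_PATHS);
	t->paths[t->nr++] = path;
}

/* True if the tree currently contains exactly this path. */
static int toy_has(const struct toy_tree *t, const char *path)
{
	int i;

	for (i = 0; i < t->nr; i++)
		if (!strcmp(t->paths[i], path))
			return 1;
	return 0;
}
```

In this model, `toy_modify(&t, "f")` followed by `toy_delete(&t, "f")` leaves no `f`, while the reverse order leaves `f` in place. Likewise, starting from a tree that holds only the file `a`, modify `a/b` followed by delete `a` ends with an empty tree, whereas delete `a` followed by modify `a/b` keeps `a/b`.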
Further, from this paragraph:
```
Zero or more `filemodify`, `filedelete`, `filecopy`, `filerename`,
`filedeleteall` and `notemodify` commands
may be included to update the contents of the branch prior to
creating the commit. These commands may be supplied in any order.
However it is recommended that a `filedeleteall` command precede
all `filemodify`, `filecopy`, `filerename` and `notemodify` commands in
the same commit, as `filedeleteall` wipes the branch clean (see below).
```
the comment about ordering with `filedeleteall` does suggest that
ordering matters to fast-import and thus perhaps that we shouldn't be
messing with the order the stream-writer gave us.
On the creator side, I agree that fast-export would definitely want to
sort its deletes before modifies to avoid D/F conflict issues. That
doesn't help with renames, but I agree with you that the answer for
renames is probably "then don't do that."
* Re: [WIP PATCH] fast-export: emit deletions first
2026-04-06 17:15 ` Junio C Hamano
2026-04-06 21:29 ` Jeff King
@ 2026-04-07 1:12 ` Raymond E. Pasco
2026-04-07 4:26 ` Jeff King
1 sibling, 1 reply; 8+ messages in thread
From: Raymond E. Pasco @ 2026-04-07 1:12 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git
On 26/04/06 10:15AM, Junio C Hamano wrote:
> In any case, it is a bit surprising that fast-export survived this
> long without having encountered the problem you are solving. I
> wonder if fast-import handles such output with some smarts to
> avoid the issue?
I was surprised too. The case where this was encountered was a repo
that had a directory symlink and promoted it to a real directory,
but the symlink turned out to be a red herring; it is purely a matter
of path prefixes.
fast-import itself just does things in the order given; you might
express a rename as a copy followed by a delete (though 'R'ename was
added at some point). It's on the stream author to make that order
make sense; fast-export doesn't use this pattern.
* Re: [WIP PATCH] fast-export: emit deletions first
2026-04-06 21:44 ` Elijah Newren
@ 2026-04-07 4:24 ` Jeff King
2026-04-07 21:28 ` Raymond E. Pasco
1 sibling, 0 replies; 8+ messages in thread
From: Jeff King @ 2026-04-07 4:24 UTC (permalink / raw)
To: Elijah Newren; +Cc: Junio C Hamano, Raymond E. Pasco, git
On Mon, Apr 06, 2026 at 02:44:05PM -0700, Elijah Newren wrote:
> > It's also possible that sorting should be the responsibility of the
> > receiver. I.e., should fast-import see:
> >
> > M 100644 :blob_label a/b
> > D a
> >
> > and figure it out? Or maybe we want both (to help other consumers of
> > fast-export, but also to help fast-import when consuming output of other
> > sources).
>
> Would re-ordering on fast-import's side introduce bugs or violate
> users' assumptions? Right now, fast-import has no check to prevent
> more than one command being given for the same pathname, and it
> applies a last-entry-wins rule. Thus filemodify PATH followed by
> filedelete PATH gives a different result than the reverse order. Most
> users probably wouldn't care or ever want to do that, but I could see
> it as a way of changing your mind in the stream and overriding an
> earlier directive you sent.
Hmm, good point. It probably is better to leave the reading side as-is,
then, to be on the safe side.
-Peff
* Re: [WIP PATCH] fast-export: emit deletions first
2026-04-07 1:12 ` Raymond E. Pasco
@ 2026-04-07 4:26 ` Jeff King
0 siblings, 0 replies; 8+ messages in thread
From: Jeff King @ 2026-04-07 4:26 UTC (permalink / raw)
To: Raymond E. Pasco; +Cc: Junio C Hamano, git
On Mon, Apr 06, 2026 at 09:12:19PM -0400, Raymond E. Pasco wrote:
> On 26/04/06 10:15AM, Junio C Hamano wrote:
> > In any case, it is a bit surprising that fast-export survived this
> > long without having encountered the problem you are solving. I
> > wonder if fast-import handles such output with some smarts to
> > avoid the issue?
>
> I was surprised too. The case where this was encountered was a repo
> that had a directory symlink and promoted it to a real directory,
> but the symlink turned out to be a red herring; it is purely a matter
> of path prefixes.
That's the original case from 2015, too. Which I guess is not too
surprising, since it's probably a more common conversion than a true
directory-into-file. There was a patch with a test provided in one of
the threads I linked earlier, in case that helps, but it is pretty easy
to write a new one.
-Peff
* Re: [WIP PATCH] fast-export: emit deletions first
2026-04-06 21:44 ` Elijah Newren
2026-04-07 4:24 ` Jeff King
@ 2026-04-07 21:28 ` Raymond E. Pasco
1 sibling, 0 replies; 8+ messages in thread
From: Raymond E. Pasco @ 2026-04-07 21:28 UTC (permalink / raw)
To: Elijah Newren; +Cc: Jeff King, Junio C Hamano, git
On 26/04/06 02:44PM, Elijah Newren wrote:
> On the creator side, I agree that fast-export would definitely want to
> sort its deletes before modifies to avoid D/F conflict issues. That
> doesn't help with renames, but I agree with you that the answer for
> renames is probably "then don't do that."
fast-export does force 'R'enames (of a to b) to appear after other lines
operating on a; see 4ce6fb80 (fast-export: ensure that a renamed file is
printed after all references).
I think all Ds first works for the patterns fast-export actually uses.
According to a comment, the reason it's sorting by depth at all is a
subset of this, to put D a/b before M 120000 a, or similar.
The additional round-trip tests I'm writing should handle all this, I
hope. I think now is a good time to get round-trips right, since people
might use fast-export | fast-import to switch hash functions
(once commit and tag re-signing are fully supported in fast-import).