Git development
* [WIP PATCH] fast-export: emit deletions first
@ 2026-04-06  6:36 Raymond E. Pasco
  2026-04-06 17:15 ` Junio C Hamano
  0 siblings, 1 reply; 8+ messages in thread
From: Raymond E. Pasco @ 2026-04-06  6:36 UTC
  To: git; +Cc: ray

fast-export chooses its output order by pathname, sorting longer
paths earlier. However, this causes faulty output when the deleted
path is a prefix of the added one. For example, deleting a file 'a' and
creating a file 'a/b' emits:

from :prev_label
M 100644 :blob_label a/b
D a

Fix this by sorting deletions to come before other types of change.

Signed-off-by: Raymond E. Pasco <ray@ameretat.dev>
---

This is a quick and dirty fix for the bug. However, I do want to spend a
little more time on it - it may be that we only want to reverse the sort
when the deletion is specifically the prefix of some addition, and I
want to fence this off with new tests.

 builtin/fast-export.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/builtin/fast-export.c b/builtin/fast-export.c
index b90da5e616..82d73b2f43 100644
--- a/builtin/fast-export.c
+++ b/builtin/fast-export.c
@@ -354,6 +354,12 @@ static int depth_first(const void *a_, const void *b_)
 	int len_a, len_b, len;
 	int cmp;
 
+	/* emit deletions first */
+	int a_deletes = (a->status == DIFF_STATUS_DELETED);
+	int b_deletes = (b->status == DIFF_STATUS_DELETED);
+	if (a_deletes != b_deletes)
+		return b_deletes - a_deletes;
+
 	name_a = a->one ? a->one->path : a->two->path;
 	name_b = b->one ? b->one->path : b->two->path;
 
-- 
2.54.0.rc0.605.g598a273b03.dirty
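
To try the proposed ordering outside of git, here is a minimal
standalone sketch of the comparator with the deletions-first check
applied. `struct change` and `sort_changes` are hypothetical stand-ins
for git's `struct diff_filepair` and the queue sort in
builtin/fast-export.c; only the comparison logic is meant to mirror the
patch, and the real function has further tie-breaks not shown here.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for git's struct diff_filepair. */
struct change {
	char status;        /* 'D' = delete, 'M' = modify, ... */
	const char *path;
};

/* qsort comparator: deletions first, then deeper paths before their prefixes. */
static int deletions_then_depth_first(const void *a_, const void *b_)
{
	const struct change *a = a_;
	const struct change *b = b_;
	int a_deletes = (a->status == 'D');
	int b_deletes = (b->status == 'D');
	size_t len_a, len_b, len;
	int cmp;

	/* emit deletions first, as in the patch */
	if (a_deletes != b_deletes)
		return b_deletes - a_deletes;

	len_a = strlen(a->path);
	len_b = strlen(b->path);
	len = len_a < len_b ? len_a : len_b;

	cmp = memcmp(a->path, b->path, len);
	if (cmp)
		return cmp;
	/* on a shared prefix, the longer path sorts first */
	return (len_b > len_a) - (len_b < len_a);
}

/* Hypothetical helper: sort an array of changes in place. */
static void sort_changes(struct change *changes, size_t nr)
{
	qsort(changes, nr, sizeof(*changes), deletions_then_depth_first);
}
```

Sorting the two entries from the commit message, { 'M', "a/b" } and
{ 'D', "a" }, with `sort_changes` puts the deletion of 'a' ahead of the
modification of 'a/b'. Without the first check, the depth rule alone
would put 'a/b' first.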



* Re: [WIP PATCH] fast-export: emit deletions first
  2026-04-06  6:36 [WIP PATCH] fast-export: emit deletions first Raymond E. Pasco
@ 2026-04-06 17:15 ` Junio C Hamano
  2026-04-06 21:29   ` Jeff King
  2026-04-07  1:12   ` Raymond E. Pasco
  0 siblings, 2 replies; 8+ messages in thread
From: Junio C Hamano @ 2026-04-06 17:15 UTC
  To: Raymond E. Pasco; +Cc: git

"Raymond E. Pasco" <ray@ameretat.dev> writes:

> fast-export chooses its output order by pathname, sorting longer
> paths earlier. However, this causes faulty output when the deleted
> path is a prefix of the added one. For example, deleting a file 'a' and
> creating a file 'a/b' emits:
>
> from :prev_label
> M 100644 :blob_label a/b
> D a
>
> Fix this by sorting deletions to come before other types of change.
>
> Signed-off-by: Raymond E. Pasco <ray@ameretat.dev>
> ---
>
> This is a quick and dirty fix for the bug. However, I do want to spend a
> little more time on it - it may be that we only want to reverse the sort
> when the deletion is specifically the prefix of some addition, and I
> want to fence this off with new tests.

I recall doing something like this in "git checkout" and also "git
am" to ensure that a thing deep in the hierarchy will not be
affected by a D/F conflict at a shallower level, so I have no
objection to this kind of change in principle.  I do not know if
depth_first() is the right place to make this decision or if the
function should keep its name if it turns out to be the right place.

In any case, it is a bit surprising that fast-export survived this
long without having encountered the problem you are solving.  I
wonder if fast-import handles such an output with some smarts to
avoid the issue?

Thanks.

>  builtin/fast-export.c | 6 ++++++
>  1 file changed, 6 insertions(+)
>
> diff --git a/builtin/fast-export.c b/builtin/fast-export.c
> index b90da5e616..82d73b2f43 100644
> --- a/builtin/fast-export.c
> +++ b/builtin/fast-export.c
> @@ -354,6 +354,12 @@ static int depth_first(const void *a_, const void *b_)
>  	int len_a, len_b, len;
>  	int cmp;
>  
> +	/* emit deletions first */
> +	int a_deletes = (a->status == DIFF_STATUS_DELETED);
> +	int b_deletes = (b->status == DIFF_STATUS_DELETED);
> +	if (a_deletes != b_deletes)
> +		return b_deletes - a_deletes;
> +
>  	name_a = a->one ? a->one->path : a->two->path;
>  	name_b = b->one ? b->one->path : b->two->path;


* Re: [WIP PATCH] fast-export: emit deletions first
  2026-04-06 17:15 ` Junio C Hamano
@ 2026-04-06 21:29   ` Jeff King
  2026-04-06 21:44     ` Elijah Newren
  2026-04-07  1:12   ` Raymond E. Pasco
  1 sibling, 1 reply; 8+ messages in thread
From: Jeff King @ 2026-04-06 21:29 UTC
  To: Junio C Hamano; +Cc: Elijah Newren, Raymond E. Pasco, git

On Mon, Apr 06, 2026 at 10:15:27AM -0700, Junio C Hamano wrote:

> In any case, it is a bit surprising that fast-export survived this
> long without having encountered the problem you are solving.  I
> wonder if fast-import handles such an output with some smarts to
> avoid the issue?

I think it has come up a few times, but we never actually applied a fix:

  2015: https://lore.kernel.org/git/alpine.DEB.2.10.1508191532330.31851@buzzword-bingo.mit.edu/
  2017: https://lore.kernel.org/git/1493079137-1838-1-git-send-email-miguel.torroja@gmail.com/
  2023: https://lore.kernel.org/git/BBB169A5-0665-47C9-819B-6409A22AB699@lanl.gov/

Looks like discussion got hung up on ordering other types of
modifications, like renames (which can actually have cycles). But I
don't see anything to contradict the view that putting deletions first
solves real problems and would not harm anything. And the answer to "it
hurts to fast-export with renames" is probably "don't do it".

It's also possible that sorting should be the responsibility of the
receiver. I.e., should fast-import see:

  M 100644 :blob_label a/b
  D a

and figure it out? Or maybe we want both (to help other consumers of
fast-export, but also to help fast-import when consuming output of other
sources).

-Peff
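
To see why the two-line stream above goes wrong when applied strictly in
order, here is a toy model of a receiver. It is only a sketch: the
flat path list and the `toy_*` names are hypothetical and are not
fast-import's data structures. It assumes that a delete removes a
directory recursively and that a modify replaces whatever conflicts with
the new path.

```c
#include <string.h>

#define MAX_PATHS 16

/* Toy tree: a flat list of file paths (hypothetical, for illustration). */
struct toy_tree {
	const char *paths[MAX_PATHS];
	int nr;
};

/* Is "path" equal to "dir", or does it live underneath it? */
static int is_at_or_under(const char *path, const char *dir)
{
	size_t n = strlen(dir);
	return !strncmp(path, dir, n) && (path[n] == '\0' || path[n] == '/');
}

/* 'D': remove a file, or everything under a directory. */
static void toy_delete(struct toy_tree *t, const char *path)
{
	int i, j = 0;
	for (i = 0; i < t->nr; i++)
		if (!is_at_or_under(t->paths[i], path))
			t->paths[j++] = t->paths[i];
	t->nr = j;
}

/* 'M': add or replace a file at path (file content is not modelled). */
static void toy_modify(struct toy_tree *t, const char *path)
{
	int i, j = 0;
	/*
	 * Drop anything that conflicts with the new file: the path itself,
	 * entries under it, and any file sitting where a parent directory
	 * is now needed.
	 */
	for (i = 0; i < t->nr; i++)
		if (!is_at_or_under(t->paths[i], path) &&
		    !is_at_or_under(path, t->paths[i]))
			t->paths[j++] = t->paths[i];
	t->nr = j;
	if (t->nr < MAX_PATHS)
		t->paths[t->nr++] = path;
}
```

Starting from a tree that holds only the file 'a', calling
`toy_modify(&t, "a/b")` and then `toy_delete(&t, "a")` leaves the tree
empty, because the delete now hits the freshly created directory. The
reverse order leaves exactly 'a/b'.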


* Re: [WIP PATCH] fast-export: emit deletions first
  2026-04-06 21:29   ` Jeff King
@ 2026-04-06 21:44     ` Elijah Newren
  2026-04-07  4:24       ` Jeff King
  2026-04-07 21:28       ` Raymond E. Pasco
  0 siblings, 2 replies; 8+ messages in thread
From: Elijah Newren @ 2026-04-06 21:44 UTC
  To: Jeff King; +Cc: Junio C Hamano, Raymond E. Pasco, git

On Mon, Apr 6, 2026 at 2:29 PM Jeff King <peff@peff.net> wrote:
>
> On Mon, Apr 06, 2026 at 10:15:27AM -0700, Junio C Hamano wrote:
>
> > In any case, it is a bit surprising that fast-export survived this
> > long without having encountered the problem you are solving.  I
> > wonder if fast-import handles such an output with some smarts to
> > avoid the issue?
>
> I think it has come up a few times, but we never actually applied a fix:
>
>   2015: https://lore.kernel.org/git/alpine.DEB.2.10.1508191532330.31851@buzzword-bingo.mit.edu/
>   2017: https://lore.kernel.org/git/1493079137-1838-1-git-send-email-miguel.torroja@gmail.com/
>   2023: https://lore.kernel.org/git/BBB169A5-0665-47C9-819B-6409A22AB699@lanl.gov/
>
> Looks like discussion got hung up on ordering other types of
> modifications, like renames (which can actually have cycles). But I
> don't see anything to contradict the view that putting deletions first
> solves real problems and would not harm anything. And the answer to "it
> hurts to fast-export with renames" is probably "don't do it".
>
> It's also possible that sorting should be the responsibility of the
> receiver. I.e., should fast-import see:
>
>   M 100644 :blob_label a/b
>   D a
>
> and figure it out? Or maybe we want both (to help other consumers of
> fast-export, but also to help fast-import when consuming output of other
> sources).

Would re-ordering on fast-import's side introduce bugs or violate
user's assumptions?  Right now, fast-import has no check to prevent
more than one command for the same pathname being given, and has a
last-entry-wins ruling.  Thus filemodify PATH followed by filedelete
PATH gives different results than reversing the order.  Most probably
wouldn't care or want to ever do that, but I could see it as a way of
allowing you to change your mind in the stream and override an earlier
directive you sent.

Further, from this paragraph:
```
Zero or more `filemodify`, `filedelete`, `filecopy`, `filerename`,
`filedeleteall` and `notemodify` commands
may be included to update the contents of the branch prior to
creating the commit.  These commands may be supplied in any order.
However it is recommended that a `filedeleteall` command precede
all `filemodify`, `filecopy`, `filerename` and `notemodify` commands in
the same commit, as `filedeleteall` wipes the branch clean (see below).
```
the comment about ordering with `filedeleteall` does suggest that
ordering matters to fast-import and thus perhaps that we shouldn't be
messing with the order the stream-writer gave us.

On the creator side, I agree that fast-export would definitely want to
sort its deletes before modifies to avoid D/F conflict issues.  That
doesn't help with renames, but I agree with you that the answer for
renames is probably "then don't do that."
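
The last-entry-wins concern can be illustrated with a small sketch. The
`struct cmd` type and both functions are hypothetical and model a single
path only; they are not taken from fast-import. The point is just that
a receiver-side "deletes first" reorder changes the outcome of a stream
that modifies a path and then deletes it.

```c
#include <stddef.h>

/* Hypothetical command on one fixed path: 'M' (filemodify) or 'D' (filedelete). */
struct cmd {
	char op;
};

/* Apply commands in stream order; returns 1 if the path exists afterwards. */
static int path_exists_after(const struct cmd *cmds, size_t nr, int exists)
{
	size_t i;
	for (i = 0; i < nr; i++)
		exists = (cmds[i].op == 'M');   /* last entry wins */
	return exists;
}

/* Stable "deletes first" reorder, as a receiver-side sort might do. */
static void deletes_first(const struct cmd *in, struct cmd *out, size_t nr)
{
	size_t i, n = 0;
	for (i = 0; i < nr; i++)
		if (in[i].op == 'D')
			out[n++] = in[i];
	for (i = 0; i < nr; i++)
		if (in[i].op != 'D')
			out[n++] = in[i];
}
```

For the stream M, D on the same path, applying it as written leaves the
path absent, while applying the reordered stream D, M leaves it present.
A stream author who relied on the later directive overriding the earlier
one would get a different tree.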


* Re: [WIP PATCH] fast-export: emit deletions first
  2026-04-06 17:15 ` Junio C Hamano
  2026-04-06 21:29   ` Jeff King
@ 2026-04-07  1:12   ` Raymond E. Pasco
  2026-04-07  4:26     ` Jeff King
  1 sibling, 1 reply; 8+ messages in thread
From: Raymond E. Pasco @ 2026-04-07  1:12 UTC
  To: Junio C Hamano; +Cc: git

On 26/04/06 10:15AM, Junio C Hamano wrote:
> In any case, it is a bit surprising that fast-export survived this
> long without having encountered the problem you are solving.  I
> wonder if fast-import handles such an output with some smarts to
> avoid the issue?

I was surprised too. The case where this was encountered was a repo
that had a directory symlink, and promoted it to a real directory,
but the symlink turned out to be a red herring; it is purely a matter
of path prefixes.

fast-import itself just does things in the order given; you might
rename with copy followed by delete (though 'R'ename was added at
some point). It's on the stream author for this to make sense;
fast-export doesn't use this pattern.


* Re: [WIP PATCH] fast-export: emit deletions first
  2026-04-06 21:44     ` Elijah Newren
@ 2026-04-07  4:24       ` Jeff King
  2026-04-07 21:28       ` Raymond E. Pasco
  1 sibling, 0 replies; 8+ messages in thread
From: Jeff King @ 2026-04-07  4:24 UTC
  To: Elijah Newren; +Cc: Junio C Hamano, Raymond E. Pasco, git

On Mon, Apr 06, 2026 at 02:44:05PM -0700, Elijah Newren wrote:

> > It's also possible that sorting should be the responsibility of the
> > receiver. I.e., should fast-import see:
> >
> >   M 100644 :blob_label a/b
> >   D a
> >
> > and figure it out? Or maybe we want both (to help other consumers of
> > fast-export, but also to help fast-import when consuming output of other
> > sources).
> 
> Would re-ordering on fast-import's side introduce bugs or violate
> user's assumptions?  Right now, fast-import has no check to prevent
> more than one command for the same pathname being given, and has a
> last-entry-wins ruling.  Thus filemodify PATH followed by filedelete
> PATH gives different results than reversing the order.  Most probably
> wouldn't care or want to ever do that, but I could see it as a way of
> allowing you to change your mind in the stream and override an earlier
> directive you sent.

Hmm, good point. It probably is better to leave the reading side as-is,
then, to be on the safe side.

-Peff


* Re: [WIP PATCH] fast-export: emit deletions first
  2026-04-07  1:12   ` Raymond E. Pasco
@ 2026-04-07  4:26     ` Jeff King
  0 siblings, 0 replies; 8+ messages in thread
From: Jeff King @ 2026-04-07  4:26 UTC
  To: Raymond E. Pasco; +Cc: Junio C Hamano, git

On Mon, Apr 06, 2026 at 09:12:19PM -0400, Raymond E. Pasco wrote:

> On 26/04/06 10:15AM, Junio C Hamano wrote:
> > In any case, it is a bit surprising that fast-export survived this
> > long without having encountered the problem you are solving.  I
> > wonder if fast-import handles such an output with some smarts to
> > avoid the issue?
> 
> I was surprised too. The case where this was encountered was a repo
> that had a directory symlink, and promoted it to a real directory,
> but the symlink turned out to be a red herring; it is purely a matter
> of path prefixes.

That's the original case from 2015, too. Which I guess is not too
surprising, since it's probably a more common conversion than a true
directory-into-file. There was a patch with a test provided in one of
the threads I linked earlier, in case that helps, but it is pretty easy
to write a new one.

-Peff


* Re: [WIP PATCH] fast-export: emit deletions first
  2026-04-06 21:44     ` Elijah Newren
  2026-04-07  4:24       ` Jeff King
@ 2026-04-07 21:28       ` Raymond E. Pasco
  1 sibling, 0 replies; 8+ messages in thread
From: Raymond E. Pasco @ 2026-04-07 21:28 UTC
  To: Elijah Newren; +Cc: Jeff King, Junio C Hamano, git

On 26/04/06 02:44PM, Elijah Newren wrote:
> On the creator side, I agree that fast-export would definitely want to
> sort its deletes before modifies to avoid D/F conflict issues.  That
> doesn't help with renames, but I agree with you that the answer for
> renames is probably "then don't do that."

fast-export does force 'R'enames (of a to b) to appear after other lines
operating on a; see 4ce6fb80 (fast-export: ensure that a renamed file is
printed after all references).

I think all Ds first works for the patterns fast-export actually uses.
According to a comment, the reason it's sorting by depth at all is a
subset of this, to put D a/b before M 120000 a, or similar.

The additional round-trip tests I'm writing should handle all this, I
hope. I think now is a good time to get round-trips right, since people
might potentially use fast-export | fast-import to switch hash functions
(once commit and tag re-signing are fully supported in fast-import).
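
The two existing rules mentioned above (deeper paths first, renames
last among entries on the same path) can be sketched as a standalone
comparator. This is a simplified model, not the code in
builtin/fast-export.c: `struct entry` and its `src` field are
hypothetical stand-ins, and the deletions-first check from the patch is
deliberately left out.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a diff entry; src is the path it operates on. */
struct entry {
	char status;       /* 'M', 'D', 'C', 'R', ... */
	const char *src;
};

/* qsort comparator: deeper paths first; on the same path, renames last. */
static int depth_first_renames_last(const void *a_, const void *b_)
{
	const struct entry *a = a_, *b = b_;
	size_t len_a = strlen(a->src), len_b = strlen(b->src);
	size_t len = len_a < len_b ? len_a : len_b;
	int cmp = memcmp(a->src, b->src, len);

	if (cmp)
		return cmp;
	if (len_a != len_b)
		return len_a > len_b ? -1 : 1;   /* 'a/b' before 'a' */
	/* same path: anything that is not a rename goes first */
	return (a->status == 'R') - (b->status == 'R');
}
```

With this ordering, { 'M', "a" } and { 'D', "a/b" } sort as D a/b then
M a, which covers the directory-to-symlink case. A copy and a rename
that both read from 'a' sort with the copy first, so the source is still
present when it is copied.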

