git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] builtin-apply: keep information about files to be deleted
@ 2009-04-11 19:31 Michał Kiedrowicz
  2009-04-13 13:51 ` Michał Kiedrowicz
  2009-04-13 18:51 ` Junio C Hamano
  0 siblings, 2 replies; 12+ messages in thread
From: Michał Kiedrowicz @ 2009-04-11 19:31 UTC (permalink / raw)
  To: Git Mailing List; +Cc: Michał Kiedrowicz

Example correct diff generated by `diff -M -B' might look like this:

	diff --git a/file1 b/file2
	similarity index 100%
	rename from file1
	rename to file2
	diff --git a/file2 b/file1
	similarity index 100%
	rename from file2
	rename to file1

Information about removing `file2' comes after information about creation
of new `file2' (renamed from `file1'). Existing implementation isn't able to
apply such patch, because it has to know in advance which files will be
removed.

This patch populates fn_table with information about removal of files
before calling check_patch() for each patch to be applied.

Signed-off-by: Michał Kiedrowicz <michal.kiedrowicz@gmail.com>
---
 builtin-apply.c |   44 +++++++++++++++++++++++++++++++++++++++-----
 1 files changed, 39 insertions(+), 5 deletions(-)

diff --git a/builtin-apply.c b/builtin-apply.c
index 1926cd8..6f6bf85 100644
--- a/builtin-apply.c
+++ b/builtin-apply.c
@@ -2271,6 +2271,16 @@ static struct patch *in_fn_table(const char *name)
 	return NULL;
 }
 
+static int to_be_deleted(struct patch *patch)
+{
+	return patch == (struct patch *) -2;
+}
+
+static int was_deleted(struct patch *patch)
+{
+	return patch == (struct patch *) -1;
+}
+
 static void add_to_fn_table(struct patch *patch)
 {
 	struct string_list_item *item;
@@ -2295,6 +2305,24 @@ static void add_to_fn_table(struct patch *patch)
 	}
 }
 
+static void prepare_fn_table(struct patch *patch)
+{
+	/*
+	 * store information about incoming file deletion
+	 */
+
+	while (patch) {
+
+		if ((patch->new_name == NULL) || (patch->is_rename)) {
+			struct string_list_item *item =
+				string_list_insert(patch->old_name, &fn_table);
+			item->util = (struct patch *) -2;
+		}
+
+		patch = patch->next;
+	}
+}
+
 static int apply_data(struct patch *patch, struct stat *st, struct cache_entry *ce)
 {
 	struct strbuf buf = STRBUF_INIT;
@@ -2304,8 +2332,8 @@ static int apply_data(struct patch *patch, struct stat *st, struct cache_entry *
 	struct patch *tpatch;
 
 	if (!(patch->is_copy || patch->is_rename) &&
-	    ((tpatch = in_fn_table(patch->old_name)) != NULL)) {
-		if (tpatch == (struct patch *) -1) {
+	    (tpatch = in_fn_table(patch->old_name)) != NULL && !to_be_deleted(tpatch)) {
+		if (was_deleted(tpatch)) {
 			return error("patch %s has been renamed/deleted",
 				patch->old_name);
 		}
@@ -2399,8 +2427,8 @@ static int check_preimage(struct patch *patch, struct cache_entry **ce, struct s
 	assert(patch->is_new <= 0);
 
 	if (!(patch->is_copy || patch->is_rename) &&
-	    (tpatch = in_fn_table(old_name)) != NULL) {
-		if (tpatch == (struct patch *) -1) {
+	    (tpatch = in_fn_table(old_name)) != NULL && !to_be_deleted(tpatch)) {
+		if (was_deleted(tpatch)) {
 			return error("%s: has been deleted/renamed", old_name);
 		}
 		st_mode = tpatch->new_mode;
@@ -2410,6 +2438,8 @@ static int check_preimage(struct patch *patch, struct cache_entry **ce, struct s
 			return error("%s: %s", old_name, strerror(errno));
 	}
 
+	if(to_be_deleted(tpatch)) tpatch = NULL;
+
 	if (check_index && !tpatch) {
 		int pos = cache_name_pos(old_name, strlen(old_name));
 		if (pos < 0) {
@@ -2471,6 +2501,7 @@ static int check_patch(struct patch *patch)
 	const char *new_name = patch->new_name;
 	const char *name = old_name ? old_name : new_name;
 	struct cache_entry *ce = NULL;
+	struct patch *tpatch;
 	int ok_if_exists;
 	int status;
 
@@ -2481,7 +2512,8 @@ static int check_patch(struct patch *patch)
 		return status;
 	old_name = patch->old_name;
 
-	if (in_fn_table(new_name) == (struct patch *) -1)
+	if ((tpatch = in_fn_table(new_name)) &&
+			(was_deleted(tpatch) || to_be_deleted(tpatch)))
 		/*
 		 * A type-change diff is always split into a patch to
 		 * delete old, immediately followed by a patch to
@@ -2533,6 +2565,8 @@ static int check_patch_list(struct patch *patch)
 {
 	int err = 0;
 
+	prepare_fn_table(patch);
+
 	while (patch) {
 		if (apply_verbosely)
 			say_patch_name(stderr,
-- 
1.6.0.6

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH] builtin-apply: keep information about files to be deleted
  2009-04-11 19:31 [PATCH] builtin-apply: keep information about files to be deleted Michał Kiedrowicz
@ 2009-04-13 13:51 ` Michał Kiedrowicz
  2009-04-13 18:51 ` Junio C Hamano
  1 sibling, 0 replies; 12+ messages in thread
From: Michał Kiedrowicz @ 2009-04-13 13:51 UTC (permalink / raw)
  To: Git Mailing List

Dnia 2009-04-11, o godz. 21:31:00
Michał Kiedrowicz <michal.kiedrowicz@gmail.com> napisał(a):

> Example correct diff generated by `diff -M -B' might look like this:
> 
> 	diff --git a/file1 b/file2
> 	similarity index 100%
> 	rename from file1
> 	rename to file2
> 	diff --git a/file2 b/file1
> 	similarity index 100%
> 	rename from file2
> 	rename to file1
> 
> Information about removing `file2' comes after information about
> creation of new `file2' (renamed from `file1'). Existing
> implementation isn't able to apply such patch, because it has to know
> in advance which files will be removed.
> 
> This patch populates fn_table with information about removal of files
> before calling check_patch() for each patch to be applied.
> 
> Signed-off-by: Michał Kiedrowicz <michal.kiedrowicz@gmail.com>

Can anyone comment on this patch? It should fix bug mentioned at

http://www.spinics.net/lists/git/msg100481.html

-- 
Michał Kiedrowicz

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] builtin-apply: keep information about files to be deleted
  2009-04-11 19:31 [PATCH] builtin-apply: keep information about files to be deleted Michał Kiedrowicz
  2009-04-13 13:51 ` Michał Kiedrowicz
@ 2009-04-13 18:51 ` Junio C Hamano
  2009-04-13 21:03   ` Michał Kiedrowicz
  1 sibling, 1 reply; 12+ messages in thread
From: Junio C Hamano @ 2009-04-13 18:51 UTC (permalink / raw)
  To: Michał Kiedrowicz; +Cc: Git Mailing List

Michał Kiedrowicz <michal.kiedrowicz@gmail.com> writes:

> diff --git a/builtin-apply.c b/builtin-apply.c
> index 1926cd8..6f6bf85 100644
> --- a/builtin-apply.c
> +++ b/builtin-apply.c
> @@ -2271,6 +2271,16 @@ static struct patch *in_fn_table(const char *name)
>  	return NULL;
>  }
>  
> +static int to_be_deleted(struct patch *patch)
> +{
> +	return patch == (struct patch *) -2;
> +}
> +
> +static int was_deleted(struct patch *patch)
> +{
> +	return patch == (struct patch *) -1;
> +}

Please use more descriptive symbolic constants, and add a comment.
Perhaps:

    /*
     * item->util in the filename table records the status of the path.
     * Usually it points at a patch (whose result records the contents
     * of it after applying it), but it could be PATH_WAS_DELETED for a
     * path that a previously applied patch has already removed.
     */
    #define PATH_TO_BE_DELETED ((struct patch *) -2)
    #define PATH_WAS_DELETED ((struct patch *) -1)

> @@ -2295,6 +2305,24 @@ static void add_to_fn_table(struct patch *patch)
> ...
> +static void prepare_fn_table(struct patch *patch)
> +{
> +	/*
> +	 * store information about incoming file deletion
> +	 */
> +	while (patch) {
> +		if ((patch->new_name == NULL) || (patch->is_rename)) {
> +			struct string_list_item *item =
> +				string_list_insert(patch->old_name, &fn_table);
> +			item->util = (struct patch *) -2;
> +		}
> +		patch = patch->next;
> +	}
> +}

This PATH_TO_BE_DELETED logic should be Ok for the normal case, but it
seems a bit fragile.  In a sequence of patches, if you have even one patch
that makes the path disappear, you initialize it as PATH_TO_BE_DELETED,
and special case the "creation should not clobber existing path" rule to
allow it to be present in the tree.

That may make this sequence work, I presume, with your change:

	patch #1	renames frotz.c to hello.c
        patch #2	renames hello.c to frotz.c

because of patch #2, hello.c is marked as PATH_TO_BE_DELETED initially and
then when patch #1 is handled, frotz.c is allowed to replace it.

But if you have further patches that do the following (the "file table"
mechanism was added to handle concatenated patches that affect the same
path more than once), I thing PATH_TO_BE_DELETED logic would break down:

        patch #3	renames alpha.c to hello.c
	patch #4	renames hello.c to alpha.c

When patch #3 is handled, the PATH_TO_BE_DELETED mark is long gone from
hello.c, and we will see the same failure you addressed in your patch,
won't we?

The prepare_fn_table() may be a good place to diagnose such a situation
and warn or error out if the user feeds such an input we cannot handle
sanely.

> @@ -2410,6 +2438,8 @@ static int check_preimage(struct patch *patch, struct cache_entry **ce, struct s
>  			return error("%s: %s", old_name, strerror(errno));
>  	}
>  
> +	if(to_be_deleted(tpatch)) tpatch = NULL;
> +

Style;

	if (to_be_deleted(tpatch))
        	tpatch = NULL;

Other than that, I think it is a sensible approach.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] builtin-apply: keep information about files to be deleted
  2009-04-13 18:51 ` Junio C Hamano
@ 2009-04-13 21:03   ` Michał Kiedrowicz
  2009-04-13 21:30     ` Junio C Hamano
  0 siblings, 1 reply; 12+ messages in thread
From: Michał Kiedrowicz @ 2009-04-13 21:03 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

Junio C Hamano <gitster@pobox.com> wrote:

> This PATH_TO_BE_DELETED logic should be Ok for the normal case, but it
> seems a bit fragile.  In a sequence of patches, if you have even one
> patch that makes the path disappear, you initialize it as
> PATH_TO_BE_DELETED, and special case the "creation should not clobber
> existing path" rule to allow it to be present in the tree.
> 
> That may make this sequence work, I presume, with your change:
> 
> 	patch #1	renames frotz.c to hello.c
>       patch #2	renames hello.c to frotz.c
> 
> because of patch #2, hello.c is marked as PATH_TO_BE_DELETED
> initially and then when patch #1 is handled, frotz.c is allowed to
> replace it.
> 
> But if you have further patches that do the following (the "file
> table" mechanism was added to handle concatenated patches that affect
> the same path more than once), I thing PATH_TO_BE_DELETED logic would
> break down:
> 
>       patch #3	renames alpha.c to hello.c
> 	patch #4	renames hello.c to alpha.c
> 
> When patch #3 is handled, the PATH_TO_BE_DELETED mark is long gone
> from hello.c, and we will see the same failure you addressed in your
> patch, won't we?

As far as I understand the code, diffs are applied independently
(for every file apply_patch() is called) and for every apply_patch()
call fn_table is cleared. So situation you described in only possible
in a *single* diff and I don't think it is possible to happen. 

Performing two criss-cross renames results in following diff:

	mv file1 tmp
	mv file2 file1
	mv tmp file2

	mv file1 tmp
	mv file3 file1
	mv tmp file3

	git diff -M -B

diff --git a/file3 b/file1
similarity index 100%
rename from file3
rename to file1
diff --git a/file1 b/file2
similarity index 100%
rename from file1
rename to file2
diff --git a/file2 b/file3
similarity index 100%
rename from file2
rename to file3

However, sanity checking still may be performed and error printed on
situations which cannot be resolved.

> 
> The prepare_fn_table() may be a good place to diagnose such a
> situation and warn or error out if the user feeds such an input we
> cannot handle sanely.
> 

I'll add some checks to this function as you suggest.

> 
> Style;
> 
> 	if (to_be_deleted(tpatch))
>         	tpatch = NULL;
> 
> Other than that, I think it is a sensible approach.

Thanks for feedback.

-- 
Michał Kiedrowicz

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] builtin-apply: keep information about files to be deleted
  2009-04-13 21:03   ` Michał Kiedrowicz
@ 2009-04-13 21:30     ` Junio C Hamano
  2009-04-17 17:23       ` Michał Kiedrowicz
  0 siblings, 1 reply; 12+ messages in thread
From: Junio C Hamano @ 2009-04-13 21:30 UTC (permalink / raw)
  To: Michał Kiedrowicz; +Cc: git

Michał Kiedrowicz <michal.kiedrowicz@gmail.com> writes:

> Junio C Hamano <gitster@pobox.com> wrote:
> ...
>> But if you have further patches that do the following (the "file
>> table" mechanism was added to handle concatenated patches that affect
>> the same path more than once), I thing PATH_TO_BE_DELETED logic would
>> break down:
>> 
>>      patch #3	renames alpha.c to hello.c
>> 	patch #4	renames hello.c to alpha.c
>> 
>> When patch #3 is handled, the PATH_TO_BE_DELETED mark is long gone
>> from hello.c, and we will see the same failure you addressed in your
>> patch, won't we?
>
> As far as I understand the code, diffs are applied independently
> (for every file apply_patch() is called) and for every apply_patch()
> call fn_table is cleared. So situation you described in only possible
> in a *single* diff and I don't think it is possible to happen. 

Yes, one invocation of "git format-patch -1" will not produce such a
situation.

A single diff file that is concatenation of two "git format-patch -1"
output (or just a plain-old "diff -ru" output from outside git, perhaps
managed in quilt) was what introduced fn_table mechanism.  Apparently
people use "git apply" to apply such a patch.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] builtin-apply: keep information about files to be deleted
  2009-04-13 21:30     ` Junio C Hamano
@ 2009-04-17 17:23       ` Michał Kiedrowicz
  2009-04-18  4:59         ` Junio C Hamano
  0 siblings, 1 reply; 12+ messages in thread
From: Michał Kiedrowicz @ 2009-04-17 17:23 UTC (permalink / raw)
  To: git


W dniu 13 kwietnia 2009 23:30 użytkownik Junio C Hamano
<gitster@pobox.com> napisał:
>  
> >
> > As far as I understand the code, diffs are applied independently
> > (for every file apply_patch() is called) and for every apply_patch()
> > call fn_table is cleared. So situation you described in only
> > possible in a *single* diff and I don't think it is possible to
> > happen.  
>
> Yes, one invocation of "git format-patch -1" will not produce such a
> situation.
>
> A single diff file that is concatenation of two "git format-patch -1"
> output (or just a plain-old "diff -ru" output from outside git,
> perhaps managed in quilt) was what introduced fn_table mechanism.
>  Apparently people use "git apply" to apply such a patch.
>  

I have been thinking about that and IMO something is not right in
handling multiple patches. I'm still new to git, so I may be wrong.
Look:

Suppose I have 3 patches:

patch #1: modify A
patch #2: rename A to B
patch #3: modify B

These patches will be applied correctly.

But, if I swap patches #1 and #3, none of them will be applied. This
is because of 2 rules, implemented in add_to_fn_table():

1. If a file was renamed/deleted, applying a patch is not possible.
2. If a file is new/modified, applying a patch is possible.

They seem reasonable. In previous example, file A comes under rule #1
and file B under rule #2. However, there are some cases when these two
rules may cause problems:

patch #1: rename A to B
patch #2: rename C to A
patch #3: modify A

Should patch #3 modify B (which was A) or A (which was C)?

patch #1: rename A to B
patch #2: rename B to A
patch #3: modify A
patch #4: modify B

Which files should be patched by #3 and #4?

In my opinion both #3 and #4 should fail (or both should succeed) --
with my patch only #3 will work and #4 will be rejected, because in #2
B was marked as deleted.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] builtin-apply: keep information about files to be deleted
  2009-04-17 17:23       ` Michał Kiedrowicz
@ 2009-04-18  4:59         ` Junio C Hamano
  2009-04-18 11:27           ` Andreas Ericsson
  2009-04-18 20:58           ` Michał Kiedrowicz
  0 siblings, 2 replies; 12+ messages in thread
From: Junio C Hamano @ 2009-04-18  4:59 UTC (permalink / raw)
  To: Michał Kiedrowicz; +Cc: git

Michał Kiedrowicz <michal.kiedrowicz@gmail.com> writes:

> ... However, there are some cases when these two
> rules may cause problems:
>
> patch #1: rename A to B
> patch #2: rename C to A
> patch #3: modify A
>
> Should patch #3 modify B (which was A) or A (which was C)?
>
> patch #1: rename A to B
> patch #2: rename B to A
> patch #3: modify A
> patch #4: modify B
>
> Which files should be patched by #3 and #4?
>
> In my opinion both #3 and #4 should fail (or both should succeed) --
> with my patch only #3 will work and #4 will be rejected, because in #2
> B was marked as deleted.

Both of the examples above cannot be emitted as a single commit by
format-patch; the user is feeding a combined patch.  Perhaps renames
in each example sequence were came from one git commit but modifications
are from separate commit or handcrafted "follow-up" patch.

There are two stances we can take:

 (1) The user knows what he is doing.

     In the first example, if he wanted the change in #3 to end up in B,
     he would have arranged the patches in a different order, namely, 3 1
     2, but he didn't.  We should modify A (that came from C).

 (2) In situations like these when it is unusual and there is no clear and
     unambiguous answer, the rule has always been "fail and ask the user
     to clean up", because silently doing a wrong thing in an unusual
     situation that happens only once in a while is far worse than
     interrupting the user and forcing a manual intervention.

     In the first example, there is no clear answer.  Perhaps all three
     patches were independent patches (the first two obviously came from
     git because only we can do renames, but they may have been separate
     commits), and the user may have reordered them (or just picked a
     random order because he was linearizing a history with a merge).

The second one is even iffier.  If we _know_ that originally patch #1 and
#2 came from the same commit, then they represent swapping between A and
B, but if they came from different git commits, and if the user picked
patches in a random order, it may mean something completely different.

I am somewhat tempted to say that we should fail all of them, including
the original "single patch swapping files" brought up by Linus.

BUT

Can we make use simple rule to detect problematic cases?

 - An input to git-apply can contain more than one patch that affects a
   path; however

   - you cannot create a path that still exists, except for a path that
     _will_ be renamed away or removed (your patch fixes this by adding
     this "except for..." part to loosen the existing rule);

   - you cannot modify a path in a separate patch if it is involved in an
     either side of a rename (this will catch the ambiguity of patch #3 in
     your first example and #3 and #4 in your second example);

 - In addition:

   - the same path cannot be renamed from more than once (this will catch
     concatenation of two git generated patches);

With such a change, I think we can keep the safety of "when there are more
than one plausible outcomes, the tool shouldn't silently decide, nor make
progress that the user later needs to undo and redo", while allowing a
sane use of rename patches generated out of a git commit.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] builtin-apply: keep information about files to be deleted
  2009-04-18  4:59         ` Junio C Hamano
@ 2009-04-18 11:27           ` Andreas Ericsson
  2009-04-18 19:56             ` Junio C Hamano
  2009-04-18 20:58           ` Michał Kiedrowicz
  1 sibling, 1 reply; 12+ messages in thread
From: Andreas Ericsson @ 2009-04-18 11:27 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Michał Kiedrowicz, git

Junio C Hamano wrote:
> Michał Kiedrowicz <michal.kiedrowicz@gmail.com> writes:
> 
>> ... However, there are some cases when these two
>> rules may cause problems:
>>
>> patch #1: rename A to B
>> patch #2: rename C to A
>> patch #3: modify A
>>
>> Should patch #3 modify B (which was A) or A (which was C)?
>>
>> patch #1: rename A to B
>> patch #2: rename B to A
>> patch #3: modify A
>> patch #4: modify B
>>
>> Which files should be patched by #3 and #4?
>>
>> In my opinion both #3 and #4 should fail (or both should succeed) --
>> with my patch only #3 will work and #4 will be rejected, because in #2
>> B was marked as deleted.
> 
> Both of the examples above cannot be emitted as a single commit by
> format-patch; the user is feeding a combined patch.  Perhaps renames
> in each example sequence were came from one git commit but modifications
> are from separate commit or handcrafted "follow-up" patch.
> 
> There are two stances we can take:
> 
>  (1) The user knows what he is doing.
> 
>      In the first example, if he wanted the change in #3 to end up in B,
>      he would have arranged the patches in a different order, namely, 3 1
>      2, but he didn't.  We should modify A (that came from C).
> 

This gets my vote. Standard "diff -u" patches have always had to be
numbered properly if they have even the slightest chance of interfering
with each other, so developers are already used to it.

/Andreas

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] builtin-apply: keep information about files to be deleted
  2009-04-18 11:27           ` Andreas Ericsson
@ 2009-04-18 19:56             ` Junio C Hamano
  0 siblings, 0 replies; 12+ messages in thread
From: Junio C Hamano @ 2009-04-18 19:56 UTC (permalink / raw)
  To: Andreas Ericsson; +Cc: Michał Kiedrowicz, git

Andreas Ericsson <exon@op5.com> writes:

>> There are two stances we can take:
>>
>>  (1) The user knows what he is doing.
>>
>>      In the first example, if he wanted the change in #3 to end up in B,
>>      he would have arranged the patches in a different order, namely, 3 1
>>      2, but he didn't.  We should modify A (that came from C).
>>
>
> This gets my vote. Standard "diff -u" patches have always had to be
> numbered properly if they have even the slightest chance of interfering
> with each other, so developers are already used to it.

You stripped the more important part from the quote, where I describe why
this would not work well for the second situation.  Without addressing it,
how could you possibly vote?

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] builtin-apply: keep information about files to be deleted
  2009-04-18  4:59         ` Junio C Hamano
  2009-04-18 11:27           ` Andreas Ericsson
@ 2009-04-18 20:58           ` Michał Kiedrowicz
  2009-04-18 21:03             ` [PATCH] tests: make test-apply-criss-cross-rename more robust Michał Kiedrowicz
  2009-04-18 22:05             ` [PATCH] builtin-apply: keep information about files to be deleted Junio C Hamano
  1 sibling, 2 replies; 12+ messages in thread
From: Michał Kiedrowicz @ 2009-04-18 20:58 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

Junio C Hamano <gitster@pobox.com> wrote:

> Michał Kiedrowicz <michal.kiedrowicz@gmail.com> writes:
>   
>> ... However, there are some cases when these two
>> rules may cause problems:
>>
>> patch #1: rename A to B
>> patch #2: rename C to A
>> patch #3: modify A
>>
>> Should patch #3 modify B (which was A) or A (which was C)?
>>
>> patch #1: rename A to B
>> patch #2: rename B to A
>> patch #3: modify A
>> patch #4: modify B
>>
>> Which files should be patched by #3 and #4?
>>
>> In my opinion both #3 and #4 should fail (or both should succeed) --
>> with my patch only #3 will work and #4 will be rejected, because in
>> #2 B was marked as deleted.  
> 
> Both of the examples above cannot be emitted as a single commit by
> format-patch; the user is feeding a combined patch.  Perhaps renames
> in each example sequence were came from one git commit but
> modifications are from separate commit or handcrafted "follow-up"
> patch.

Yes, that's true. In "normal" case, renames and modifications should be
handled properly and (generally) aren't subject of this discussion.

>
> There are two stances we can take:
> 
>  (1) The user knows what he is doing.
> 
>      In the first example, if he wanted the change in #3 to end up in
> B, he would have arranged the patches in a different order, namely, 3
> 1 2, but he didn't.  We should modify A (that came from C).
> 
>  (2) In situations like these when it is unusual and there is no
> clear and unambiguous answer, the rule has always been "fail and ask
> the user to clean up", because silently doing a wrong thing in an
> unusual situation that happens only once in a while is far worse than
>      interrupting the user and forcing a manual intervention.
> 
>      In the first example, there is no clear answer.  Perhaps all
> three patches were independent patches (the first two obviously came
> from git because only we can do renames, but they may have been
> separate commits), and the user may have reordered them (or just
> picked a random order because he was linearizing a history with a
> merge).
> 
> The second one is even iffier.  If we _know_ that originally patch #1
> and #2 came from the same commit, then they represent swapping
> between A and B, but if they came from different git commits, and if
> the user picked patches in a random order, it may mean something
> completely different.

The problem here is that there are at least two patches which touch the
same file(s) and it is impossible to say which patches should be handled
atomically. However, there is no easy way to specify renames as a
single patch. A diff containing swapping of three files looks like this:

	diff --git a/file2 b/file1
	similarity index 100%
	rename from file2
	rename to file1
	diff --git a/file3 b/file2
	similarity index 100%
	rename from file3
	rename to file2
	diff --git a/file1 b/file3
	similarity index 100%
	rename from file1
	rename to file3

BTW: it applies correctly :).

> 
> I am somewhat tempted to say that we should fail all of them,
> including the original "single patch swapping files" brought up by
> Linus.

I may agree that difficult scenarios should be rejected, but I will
also say that git-apply should always accept git-diff output.

> 
> BUT
> 
> Can we make use simple rule to detect problematic cases?
> 
>  - An input to git-apply can contain more than one patch that affects
> a path; however
> 
>    - you cannot create a path that still exists, except for a path
> that _will_ be renamed away or removed (your patch fixes this by
> adding this "except for..." part to loosen the existing rule);
> 
>    - you cannot modify a path in a separate patch if it is involved
> in an either side of a rename (this will catch the ambiguity of patch
> #3 in your first example and #3 and #4 in your second example);

What should happen in following situation:

patch #1: modify A
patch #2: rename A to B

#2 should fail? Now it creates new B which is a copy of A before
applying any patches and modifies A according to #1.

AFAIC, copies and renames are handled differently from normal
modifications (in_fn_table() is not used for them, but
add_to_fn_table() is, so "rename patches don't look in the past, but
have influence upon the future").

> 
>  - In addition:
> 
>    - the same path cannot be renamed from more than once (this will
> catch concatenation of two git generated patches);
> 
> With such a change, I think we can keep the safety of "when there are
> more than one plausible outcomes, the tool shouldn't silently decide,
> nor make progress that the user later needs to undo and redo", while
> allowing a sane use of rename patches generated out of a git commit.
> 
> 

Do you mean that patches which break above rules should be
skipped when "--reject" is set, as other failures? Or that
whole git-apply should fail regardless of "--reject"?


Michal Kiedrowicz

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH] tests: make test-apply-criss-cross-rename more robust
  2009-04-18 20:58           ` Michał Kiedrowicz
@ 2009-04-18 21:03             ` Michał Kiedrowicz
  2009-04-18 22:05             ` [PATCH] builtin-apply: keep information about files to be deleted Junio C Hamano
  1 sibling, 0 replies; 12+ messages in thread
From: Michał Kiedrowicz @ 2009-04-18 21:03 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano

I realized that this test does check if git-apply succeeds, but doesn't
tell if it applies patches correctly. So I added test_cmp to check it.

I also added a test which checks swapping three files.
---
 t/t4130-apply-criss-cross-rename.sh |   34 +++++++++++++++++++++++++++++++---
 1 files changed, 31 insertions(+), 3 deletions(-)

diff --git a/t/t4130-apply-criss-cross-rename.sh b/t/t4130-apply-criss-cross-rename.sh
index 8623dbe..7cfa2d6 100755
--- a/t/t4130-apply-criss-cross-rename.sh
+++ b/t/t4130-apply-criss-cross-rename.sh
@@ -15,14 +15,17 @@ create_file() {
 test_expect_success 'setup' '
 	create_file file1 "File1 contents" &&
 	create_file file2 "File2 contents" &&
-	git add file1 file2 &&
+	create_file file3 "File3 contents" &&
+	git add file1 file2 file3 &&
 	git commit -m 1
 '
 
 test_expect_success 'criss-cross rename' '
 	mv file1 tmp &&
 	mv file2 file1 &&
-	mv tmp file2
+	mv tmp file2 &&
+	cp file1 file1-swapped &&
+	cp file2 file2-swapped
 '
 
 test_expect_success 'diff -M -B' '
@@ -32,7 +35,32 @@ test_expect_success 'diff -M -B' '
 '
 
 test_expect_success 'apply' '
-	git apply diff
+	git apply diff &&
+	test_cmp file1 file1-swapped &&
+	test_cmp file2 file2-swapped
+'
+
+test_expect_success 'criss-cross rename' '
+	git reset --hard &&
+	mv file1 tmp &&
+	mv file2 file1 &&
+	mv file3 file2
+	mv tmp file3 &&
+	cp file1 file1-swapped &&
+	cp file2 file2-swapped &&
+	cp file3 file3-swapped
+'
+
+test_expect_success 'diff -M -B' '
+	git diff -M -B > diff &&
+	git reset --hard
+'
+
+test_expect_success 'apply' '
+	git apply diff &&
+	test_cmp file1 file1-swapped &&
+	test_cmp file2 file2-swapped &&
+	test_cmp file3 file3-swapped
 '
 
 test_done
-- 
1.6.0.6

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH] builtin-apply: keep information about files to be deleted
  2009-04-18 20:58           ` Michał Kiedrowicz
  2009-04-18 21:03             ` [PATCH] tests: make test-apply-criss-cross-rename more robust Michał Kiedrowicz
@ 2009-04-18 22:05             ` Junio C Hamano
  1 sibling, 0 replies; 12+ messages in thread
From: Junio C Hamano @ 2009-04-18 22:05 UTC (permalink / raw)
  To: Michał Kiedrowicz; +Cc: git

Michał Kiedrowicz <michal.kiedrowicz@gmail.com> writes:

> Junio C Hamano <gitster@pobox.com> wrote:
>
>> Michał Kiedrowicz <michal.kiedrowicz@gmail.com> writes:
>>   
>>> ... However, there are some cases when these two
>>> rules may cause problems:
>>>
>>> patch #1: rename A to B
>>> patch #2: rename C to A
>>> patch #3: modify A
>>>
>>> Should patch #3 modify B (which was A) or A (which was C)?
>>>
>>> patch #1: rename A to B
>>> patch #2: rename B to A
>>> patch #3: modify A
>>> patch #4: modify B
>>>
>>> Which files should be patched by #3 and #4?
>>>
>>> In my opinion both #3 and #4 should fail (or both should succeed) --
>>> with my patch only #3 will work and #4 will be rejected, because in
>>> #2 B was marked as deleted.  
>> 
>> Both of the examples above cannot be emitted as a single commit by
>> format-patch; the user is feeding a combined patch.  Perhaps renames
>> in each example sequence were came from one git commit but
>> modifications are from separate commit or handcrafted "follow-up"
>> patch.
>
> Yes, that's true. In "normal" case, renames and modifications should be
> handled properly and (generally) aren't subject of this discussion.
>
>>
>> There are two stances we can take:
>> 
>>  (1) The user knows what he is doing.
>> 
>>      In the first example, if he wanted the change in #3 to end up in
>> B, he would have arranged the patches in a different order, namely, 3
>> 1 2, but he didn't.  We should modify A (that came from C).
>> 
>>  (2) In situations like these when it is unusual and there is no
>> clear and unambiguous answer, the rule has always been "fail and ask
>> the user to clean up", because silently doing a wrong thing in an
>> unusual situation that happens only once in a while is far worse than
>>      interrupting the user and forcing a manual intervention.
>> 
>>      In the first example, there is no clear answer.  Perhaps all
>> three patches were independent patches (the first two obviously came
>> from git because only we can do renames, but they may have been
>> separate commits), and the user may have reordered them (or just
>> picked a random order because he was linearizing a history with a
>> merge).
>> 
>> The second one is even iffier.  If we _know_ that originally patch #1
>> and #2 came from the same commit, then they represent swapping
>> between A and B, but if they came from different git commits, and if
>> the user picked patches in a random order, it may mean something
>> completely different.
>
> The problem here is that there are at least two patches which touch the
> same file(s) and it is impossible to say which patches should be handled
> atomically. However, there is no easy way to specify renames as a
> single patch. A diff containing swapping of three files looks like this:
>
> 	diff --git a/file2 b/file1
> 	similarity index 100%
> 	rename from file2
> 	rename to file1
> 	diff --git a/file3 b/file2
> 	similarity index 100%
> 	rename from file3
> 	rename to file2
> 	diff --git a/file1 b/file3
> 	similarity index 100%
> 	rename from file1
> 	rename to file3
>
> BTW: it applies correctly :).
>
>> 
>> I am somewhat tempted to say that we should fail all of them,
>> including the original "single patch swapping files" brought up by
>> Linus.
>
> I may agree that difficult scenarios should be rejected, but I will
> also say that git-apply should always accept git-diff output.
>
>> 
>> BUT
>> 
>> Can we make use simple rule to detect problematic cases?
>> 
>>  - An input to git-apply can contain more than one patch that affects
>> a path; however
>> 
>>    - you cannot create a path that still exists, except for a path
>> that _will_ be renamed away or removed (your patch fixes this by
>> adding this "except for..." part to loosen the existing rule);
>> 
>>    - you cannot modify a path in a separate patch if it is involved
>> in an either side of a rename (this will catch the ambiguity of patch
>> #3 in your first example and #3 and #4 in your second example);
>
> What should happen in following situation:
>
> patch #1: modify A
> patch #2: rename A to B
>
> #2 should fail? Now it creates new B which is a copy of A before
> applying any patches and modifies A according to #1.

Yes.  It is obviously a handcrafted sequence, and it could even have been
mechanically created.

Imagine a merge of two branches like this:

               2----HEAD
              /    /
	common----1

and somebody fed "common..HEAD" to his script that internally runs
format-patch and squashes the patch output into one, perhaps:

	#!/bin/sh
	# Create a single patch e-mail, squashed.
        tmp=/var/tmp/my-squash$$
	rm -rf "$tmp" && mkdir -p "$tmp/out" || exit
        trap 'rm -rf "$tmp"' 0 1 2 3 15
	git format-patch -o "$tmp/out" "$@"
	>"$tmp/all.messages"
        >"$tmp/all.patches"
        for mail in "$tmp"/out/0*
        do
        	git mailinfo "$tmp/msg" "$tmp/patch" >"$tmp/info" <"$mail"
                echo "$mail" >>.messages
                cat "$tmp/msg" >>"$tmp/all.messages"
                cat "$tmp/patch" >>"$tmp/all.patches"
	done
	(
	        cat "$tmp/info"; echo
                cat "$tmp/all.messages"; echo
	        cat "$tmp/all.patches"
	)

Depending on the sort order between #1 and #2, you cannot tell "modify A
and then rename it to B" is the order we will see such a patch at all.  I
think it is safer to reject such a patch with the "when in doubt, do not
act too clever and risk making a silent mistake" principle.

In this particular case, the reverse order of "renaming A to B" and then
"modifing A" would fail anyway, but if you have another patch that renames
C to A in the mix of patches whose order cannot be determined, I think you
can come up with a sequence that results in an "applicable in order but is
the order really what the author intended?" situation.

> Do you mean that patches which break above rules should be
> skipped when "--reject" is set, as other failures? Or that
> whole git-apply should fail regardless of "--reject"?

I meant the latter.

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2009-04-18 22:08 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-04-11 19:31 [PATCH] builtin-apply: keep information about files to be deleted Michał Kiedrowicz
2009-04-13 13:51 ` Michał Kiedrowicz
2009-04-13 18:51 ` Junio C Hamano
2009-04-13 21:03   ` Michał Kiedrowicz
2009-04-13 21:30     ` Junio C Hamano
2009-04-17 17:23       ` Michał Kiedrowicz
2009-04-18  4:59         ` Junio C Hamano
2009-04-18 11:27           ` Andreas Ericsson
2009-04-18 19:56             ` Junio C Hamano
2009-04-18 20:58           ` Michał Kiedrowicz
2009-04-18 21:03             ` [PATCH] tests: make test-apply-criss-cross-rename more robust Michał Kiedrowicz
2009-04-18 22:05             ` [PATCH] builtin-apply: keep information about files to be deleted Junio C Hamano

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).