* [PATCH/RFC] fast-import: disallow empty branches as parents
@ 2012-06-20 19:34 Dmitry Ivankov
2012-06-20 19:34 ` [PATCH] " Dmitry Ivankov
2012-06-25 8:00 ` [PATCH/RFC] " Dmitry Ivankov
0 siblings, 2 replies; 5+ messages in thread
From: Dmitry Ivankov @ 2012-06-20 19:34 UTC (permalink / raw)
To: git; +Cc: Jonathan Nieder, Jeff King, Sverre Rabbelier, Dmitry Ivankov
Combinations of "reset", "commit" with "from" and/or "merge" commands
may make fast-import to produce bad objects (null_sha1 parents) or
accept bad inputs (ones asking for empty branches as parents).
Fix this and add some tests.
One RFC here: does following use case make any sense/should it be allowed?
commit refs/heads/master
...
from something
merge refs/heads/master
1. If "from" is omitted or equals refs/heads/master we end up with duplicated parents.
2. And if something is not master we allow to pick a new first parent path.
"2" seems quite legal, while "1" looks worse. Though "1" is not directly related to
this patch and can be reproduced via a simple "merge X X" command for example.
Dmitry Ivankov (1):
fast-import: disallow empty branches as parents
fast-import.c | 49 +++++++++++++++++++++++++++++------------------
t/t9300-fast-import.sh | 48 +++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 78 insertions(+), 19 deletions(-)
--
1.7.3.4
^ permalink raw reply [flat|nested] 5+ messages in thread
* [PATCH] fast-import: disallow empty branches as parents
2012-06-20 19:34 [PATCH/RFC] fast-import: disallow empty branches as parents Dmitry Ivankov
@ 2012-06-20 19:34 ` Dmitry Ivankov
2012-06-21 3:57 ` Jonathan Nieder
2012-06-25 8:00 ` [PATCH/RFC] " Dmitry Ivankov
1 sibling, 1 reply; 5+ messages in thread
From: Dmitry Ivankov @ 2012-06-20 19:34 UTC (permalink / raw)
To: git; +Cc: Jonathan Nieder, Jeff King, Sverre Rabbelier, Dmitry Ivankov
Empty branches (either new or reset-ed) have null_sha1 in fast-import
internals. These null_sha1 heads can slip to the real commit objects.
- parse_merge() had no check against null_sha1, so add it.
- parse_from() didn't have it too. It doesn't cause null_sha1 to slip
because for the first parent there is a null_sha1 check. But still
the input with "from empty_branch" must be rejected, make it so via
adding the check to parse_from().
There is a special case with "merge branch_itself" command. When the
branch_itself is empty it must clearly be rejected. Though in a
presence of "from ..." command it is not. parse_from() writes the new
first parent sha1 to branch->sha1 as a temporary placeholder, which
parse_merge() then picks up. Though this sha1 might have never been a
tip of the branch_itself at all.
Make parse_from() store the first parent sha1 in a temporary buffer
to fix this special case.
Add some tests for all these fixes.
Signed-off-by: Dmitry Ivankov <divanorama@gmail.com>
---
fast-import.c | 49 +++++++++++++++++++++++++++++------------------
t/t9300-fast-import.sh | 48 +++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 78 insertions(+), 19 deletions(-)
diff --git a/fast-import.c b/fast-import.c
index eed97c8..2e089a8 100644
--- a/fast-import.c
+++ b/fast-import.c
@@ -2540,7 +2540,8 @@ static void file_change_deleteall(struct branch *b)
b->num_notes = 0;
}
-static void parse_from_commit(struct branch *b, char *buf, unsigned long size)
+static void parse_from_commit(struct branch *b, unsigned char *sha1,
+ char *buf, unsigned long size)
{
if (!buf || size < 46)
die("Not a valid commit: %s", sha1_to_hex(b->sha1));
@@ -2551,29 +2552,31 @@ static void parse_from_commit(struct branch *b, char *buf, unsigned long size)
b->branch_tree.versions[1].sha1);
}
-static void parse_from_existing(struct branch *b)
+static void parse_from_existing(struct branch *b, unsigned char *sha1)
{
- if (is_null_sha1(b->sha1)) {
+ if (is_null_sha1(sha1)) {
hashclr(b->branch_tree.versions[0].sha1);
hashclr(b->branch_tree.versions[1].sha1);
} else {
unsigned long size;
char *buf;
- buf = read_object_with_reference(b->sha1,
- commit_type, &size, b->sha1);
- parse_from_commit(b, buf, size);
+ buf = read_object_with_reference(sha1,
+ commit_type, &size, sha1);
+ parse_from_commit(b, sha1, buf, size);
free(buf);
}
}
-static int parse_from(struct branch *b)
+static int parse_from(struct branch *b, unsigned char *sha1out)
{
const char *from;
struct branch *s;
- if (prefixcmp(command_buf.buf, "from "))
+ if (prefixcmp(command_buf.buf, "from ")) {
+ hashclr(sha1out);
return 0;
+ }
if (b->branch_tree.tree) {
release_tree_content_recursive(b->branch_tree.tree);
@@ -2586,7 +2589,10 @@ static int parse_from(struct branch *b)
die("Can't create a branch from itself: %s", b->name);
else if (s) {
unsigned char *t = s->branch_tree.versions[1].sha1;
- hashcpy(b->sha1, s->sha1);
+ if (is_null_sha1(s->sha1))
+ die("Can't create a branch from an empty branch:"
+ " %s from %s", b->name, s->name);
+ hashcpy(sha1out, s->sha1);
hashcpy(b->branch_tree.versions[0].sha1, t);
hashcpy(b->branch_tree.versions[1].sha1, t);
} else if (*from == ':') {
@@ -2594,16 +2600,16 @@ static int parse_from(struct branch *b)
struct object_entry *oe = find_mark(idnum);
if (oe->type != OBJ_COMMIT)
die("Mark :%" PRIuMAX " not a commit", idnum);
- hashcpy(b->sha1, oe->idx.sha1);
+ hashcpy(sha1out, oe->idx.sha1);
if (oe->pack_id != MAX_PACK_ID) {
unsigned long size;
char *buf = gfi_unpack_entry(oe, &size);
- parse_from_commit(b, buf, size);
+ parse_from_commit(b, sha1out, buf, size);
free(buf);
} else
- parse_from_existing(b);
- } else if (!get_sha1(from, b->sha1))
- parse_from_existing(b);
+ parse_from_existing(b, sha1out);
+ } else if (!get_sha1(from, sha1out))
+ parse_from_existing(b, sha1out);
else
die("Invalid ref name or SHA1 expression: %s", from);
@@ -2622,9 +2628,11 @@ static struct hash_list *parse_merge(unsigned int *count)
from = strchr(command_buf.buf, ' ') + 1;
n = xmalloc(sizeof(*n));
s = lookup_branch(from);
- if (s)
+ if (s) {
+ if (is_null_sha1(s->sha1))
+ die("Can't merge empty branch: %s", s->name);
hashcpy(n->sha1, s->sha1);
- else if (*from == ':') {
+ } else if (*from == ':') {
uintmax_t idnum = parse_mark_ref_eol(from);
struct object_entry *oe = find_mark(idnum);
if (oe->type != OBJ_COMMIT)
@@ -2656,6 +2664,7 @@ static void parse_new_commit(void)
{
static struct strbuf msg = STRBUF_INIT;
struct branch *b;
+ unsigned char sha1[20];
char *sp;
char *author = NULL;
char *committer = NULL;
@@ -2683,7 +2692,7 @@ static void parse_new_commit(void)
die("Expected committer but didn't get one");
parse_data(&msg, 0, NULL);
read_next_command();
- parse_from(b);
+ parse_from(b, sha1);
merge_list = parse_merge(&merge_count);
/* ensure the branch is active/loaded */
@@ -2730,7 +2739,9 @@ static void parse_new_commit(void)
strbuf_reset(&new_data);
strbuf_addf(&new_data, "tree %s\n",
sha1_to_hex(b->branch_tree.versions[1].sha1));
- if (!is_null_sha1(b->sha1))
+ if (!is_null_sha1(sha1))
+ strbuf_addf(&new_data, "parent %s\n", sha1_to_hex(sha1));
+ else if (!is_null_sha1(b->sha1))
strbuf_addf(&new_data, "parent %s\n", sha1_to_hex(b->sha1));
while (merge_list) {
struct hash_list *next = merge_list->next;
@@ -2855,7 +2866,7 @@ static void parse_reset_branch(void)
else
b = new_branch(sp);
read_next_command();
- parse_from(b);
+ parse_from(b, b->sha1);
if (command_buf.len > 0)
unread_command_buf = 1;
}
diff --git a/t/t9300-fast-import.sh b/t/t9300-fast-import.sh
index 2aa1824..5f25c01 100755
--- a/t/t9300-fast-import.sh
+++ b/t/t9300-fast-import.sh
@@ -2895,6 +2895,54 @@ test_expect_success 'S: merge with garbage after mark must fail' '
test_i18ngrep "after mark" err
'
+test_expect_success 'S: empty branch as merge parent must fail' '
+ test_must_fail git fast-import <<-EOF 2>err &&
+ commit refs/heads/chicken
+ committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+ data <<COMMIT
+ I am the chicken.
+ COMMIT
+ merge refs/heads/chicken
+ EOF
+ cat err &&
+ test_must_fail git rev-parse --verify refs/heads/chicken^
+'
+
+test_expect_success 'S: empty branch as merge parent must fail (2)' '
+ test_must_fail git fast-import <<-EOF 2>err &&
+ commit refs/heads/egg1
+ committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+ data <<COMMIT
+ I am the egg N1.
+ COMMIT
+
+ commit refs/heads/egg2
+ committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+ data <<COMMIT
+ I am the egg N2.
+ COMMIT
+ from refs/heads/egg1
+ merge refs/heads/egg2
+ EOF
+ cat err &&
+ test_must_fail git rev-parse --verify refs/heads/egg2^2
+'
+
+test_expect_success 'S: empty branch as a parent must fail ' '
+ test_must_fail git fast-import <<-EOF 2>err &&
+ reset refs/heads/egg3
+
+ commit refs/heads/egg4
+ committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+ data <<COMMIT
+ I am the egg N4.
+ COMMIT
+ from refs/heads/egg3
+ EOF
+ cat err &&
+ test_must_fail git rev-parse --verify refs/heads/egg4^2
+'
+
#
# tag, from markref
#
--
1.7.3.4
^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [PATCH] fast-import: disallow empty branches as parents
2012-06-20 19:34 ` [PATCH] " Dmitry Ivankov
@ 2012-06-21 3:57 ` Jonathan Nieder
2012-06-21 6:39 ` Dmitry Ivankov
0 siblings, 1 reply; 5+ messages in thread
From: Jonathan Nieder @ 2012-06-21 3:57 UTC (permalink / raw)
To: Dmitry Ivankov; +Cc: git, Jeff King, Sverre Rabbelier, David Barr
Hi Dmitry,
Dmitry Ivankov wrote:
> Empty branches (either new or reset-ed) have null_sha1 in fast-import
> internals. These null_sha1 heads can slip to the real commit objects.
[... nice explanation snipped ...]
Very nice, thanks much for this.
Would it be possible to split this into multiple independent fixes?
See [*] below for one way.
> --- a/fast-import.c
> +++ b/fast-import.c
> @@ -2540,7 +2540,8 @@ static void file_change_deleteall(struct branch *b)
> b->num_notes = 0;
> }
>
> -static void parse_from_commit(struct branch *b, char *buf, unsigned long size)
> +static void parse_from_commit(struct branch *b, unsigned char *sha1,
> + char *buf, unsigned long size)
> {
What is happening here? The new argument doesn't seem to be used.
[...]
> @@ -2551,29 +2552,31 @@ static void parse_from_commit(struct branch *b, char *buf, unsigned long size)
> b->branch_tree.versions[1].sha1);
> }
>
> -static void parse_from_existing(struct branch *b)
> +static void parse_from_existing(struct branch *b, unsigned char *sha1)
> {
> - if (is_null_sha1(b->sha1)) {
> + if (is_null_sha1(sha1)) {
> hashclr(b->branch_tree.versions[0].sha1);
> hashclr(b->branch_tree.versions[1].sha1);
> } else {
> unsigned long size;
> char *buf;
>
> - buf = read_object_with_reference(b->sha1,
> - commit_type, &size, b->sha1);
> + buf = read_object_with_reference(sha1,
> + commit_type, &size, sha1);
This seems to be about delaying the effect of "from" so it doesn't
interfere with a "merge" command referring to the same commit.
[...]
> -static int parse_from(struct branch *b)
> +static int parse_from(struct branch *b, unsigned char *sha1out)
> {
> const char *from;
> struct branch *s;
>
> - if (prefixcmp(command_buf.buf, "from "))
> + if (prefixcmp(command_buf.buf, "from ")) {
> + hashclr(sha1out);
> return 0;
> + }
This code path handles the case where there is no "from" after a
"reset" or "commit" command. We clear sha1out to make the calling
convention simple --- sha1out is always written to, and the caller
does not have to worry about initializing it in advance.
I guess this is part of the change that delays the effect of "from"?
[...]
> @@ -2586,7 +2589,10 @@ static int parse_from(struct branch *b)
> die("Can't create a branch from itself: %s", b->name);
> else if (s) {
> unsigned char *t = s->branch_tree.versions[1].sha1;
> + if (is_null_sha1(s->sha1))
> + die("Can't create a branch from an empty branch:"
> + " %s from %s", b->name, s->name);
This seems to be about protecting against "from" with an unborn
branch.
> - hashcpy(b->sha1, s->sha1);
> + hashcpy(sha1out, s->sha1);
Delaying the effect of "from", maybe.
[...]
> @@ -2594,16 +2600,16 @@ static int parse_from(struct branch *b)
> struct object_entry *oe = find_mark(idnum);
> if (oe->type != OBJ_COMMIT)
> die("Mark :%" PRIuMAX " not a commit", idnum);
> - hashcpy(b->sha1, oe->idx.sha1);
> + hashcpy(sha1out, oe->idx.sha1);
Delaying "from" effect?
> if (oe->pack_id != MAX_PACK_ID) {
> unsigned long size;
> char *buf = gfi_unpack_entry(oe, &size);
> - parse_from_commit(b, buf, size);
> + parse_from_commit(b, sha1out, buf, size);
Likewise?
> free(buf);
> } else
> - parse_from_existing(b);
> + parse_from_existing(b, sha1out);
> - } else if (!get_sha1(from, b->sha1))
> + } else if (!get_sha1(from, sha1out))
> - parse_from_existing(b);
> + parse_from_existing(b, sha1out);
Likewise?
[...]
> else
> die("Invalid ref name or SHA1 expression: %s", from);
>
> @@ -2622,9 +2628,11 @@ static struct hash_list *parse_merge(unsigned int *count)
> from = strchr(command_buf.buf, ' ') + 1;
> n = xmalloc(sizeof(*n));
> s = lookup_branch(from);
> - if (s)
> + if (s) {
> + if (is_null_sha1(s->sha1))
> + die("Can't merge empty branch: %s", s->name);
> hashcpy(n->sha1, s->sha1);
> - else if (*from == ':') {
> + } else if (*from == ':') {
Protecting against "merge" with unknown branch.
[...]
> @@ -2656,6 +2664,7 @@ static void parse_new_commit(void)
> {
> static struct strbuf msg = STRBUF_INIT;
> struct branch *b;
> + unsigned char sha1[20];
> char *sp;
> char *author = NULL;
> char *committer = NULL;
> @@ -2683,7 +2692,7 @@ static void parse_new_commit(void)
> die("Expected committer but didn't get one");
> parse_data(&msg, 0, NULL);
> read_next_command();
> - parse_from(b);
> + parse_from(b, sha1);
Delayed "from" effect?
[...]
> @@ -2730,7 +2739,9 @@ static void parse_new_commit(void)
> strbuf_reset(&new_data);
> strbuf_addf(&new_data, "tree %s\n",
> sha1_to_hex(b->branch_tree.versions[1].sha1));
> - if (!is_null_sha1(b->sha1))
> + if (!is_null_sha1(sha1))
> + strbuf_addf(&new_data, "parent %s\n", sha1_to_hex(sha1));
> + else if (!is_null_sha1(b->sha1))
> strbuf_addf(&new_data, "parent %s\n", sha1_to_hex(b->sha1));
Delaying "from"?
[...]
> @@ -2855,7 +2866,7 @@ static void parse_reset_branch(void)
> else
> b = new_branch(sp);
> read_next_command();
> - parse_from(b);
> + parse_from(b, b->sha1);
Likewise?
[*]
So it looks like this would be easier to read as three patches:
1. protecting against "from" of unborn branch
2. protecting against "merge" of unborn branch
3. delaying the effect of "from" to avoid it confusingly changing
the effect of a "merge" in the same commit
(1) and (2) would be no-brainers, while (3) seems more subtle ---
maybe it should be documented to help importers for other version
control systems to know to make the same change?
[...]
> --- a/t/t9300-fast-import.sh
> +++ b/t/t9300-fast-import.sh
> @@ -2895,6 +2895,54 @@ test_expect_success 'S: merge with garbage after mark must fail' '
> test_i18ngrep "after mark" err
> '
>
> +test_expect_success 'S: empty branch as merge parent must fail' '
> + test_must_fail git fast-import <<-EOF 2>err &&
> + commit refs/heads/chicken
> + committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
> + data <<COMMIT
> + I am the chicken.
> + COMMIT
> + merge refs/heads/chicken
> + EOF
> + cat err &&
> + test_must_fail git rev-parse --verify refs/heads/chicken^
There would be no "chicken" branch after this import at all, right?
[...]
> +test_expect_success 'S: empty branch as merge parent must fail (2)' '
> + test_must_fail git fast-import <<-EOF 2>err &&
> + commit refs/heads/egg1
> + committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
> + data <<COMMIT
> + I am the egg N1.
> + COMMIT
> +
> + commit refs/heads/egg2
> + committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
> + data <<COMMIT
> + I am the egg N2.
> + COMMIT
> + from refs/heads/egg1
> + merge refs/heads/egg2
> + EOF
> + cat err &&
> + test_must_fail git rev-parse --verify refs/heads/egg2^2
Likewise for egg2.
[...]
> +test_expect_success 'S: empty branch as a parent must fail ' '
> + test_must_fail git fast-import <<-EOF 2>err &&
> + reset refs/heads/egg3
> +
> + commit refs/heads/egg4
> + committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
> + data <<COMMIT
> + I am the egg N4.
> + COMMIT
> + from refs/heads/egg3
> + EOF
> + cat err &&
> + test_must_fail git rev-parse --verify refs/heads/egg4^2
Likewise for egg4.
If this were split up as described above, I imagine it would be much
easier to read and most of it would move into the "obviously good"
category (I'm still uncertain about some details of the "delayed from"
implementation and haven't checked carefully whether it misses any
spots). What do you think?
Thanks for a pleasant read, and hope that helps,
Jonathan
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH] fast-import: disallow empty branches as parents
2012-06-21 3:57 ` Jonathan Nieder
@ 2012-06-21 6:39 ` Dmitry Ivankov
0 siblings, 0 replies; 5+ messages in thread
From: Dmitry Ivankov @ 2012-06-21 6:39 UTC (permalink / raw)
To: Jonathan Nieder; +Cc: git, Jeff King, Sverre Rabbelier, David Barr
On Thu, Jun 21, 2012 at 9:57 AM, Jonathan Nieder <jrnieder@gmail.com> wrote:
> Hi Dmitry,
>
> Dmitry Ivankov wrote:
>
>> Empty branches (either new or reset-ed) have null_sha1 in fast-import
>> internals. These null_sha1 heads can slip to the real commit objects.
> [... nice explanation snipped ...]
>
> Very nice, thanks much for this.
>
> Would it be possible to split this into multiple independent fixes?
> See [*] below for one way.
>
>> --- a/fast-import.c
>> +++ b/fast-import.c
>> @@ -2540,7 +2540,8 @@ static void file_change_deleteall(struct branch *b)
>> b->num_notes = 0;
>> }
>>
>> -static void parse_from_commit(struct branch *b, char *buf, unsigned long size)
>> +static void parse_from_commit(struct branch *b, unsigned char *sha1,
>> + char *buf, unsigned long size)
>> {
>
> What is happening here? The new argument doesn't seem to be used.
Ow, sorry. In fact it is here for die() messages.
A bit like spaghetti code, maybe die()-s can be moved to the caller.
>
> [...]
>> @@ -2551,29 +2552,31 @@ static void parse_from_commit(struct branch *b, char *buf, unsigned long size)
>> b->branch_tree.versions[1].sha1);
>> }
>>
>> -static void parse_from_existing(struct branch *b)
>> +static void parse_from_existing(struct branch *b, unsigned char *sha1)
>> {
>> - if (is_null_sha1(b->sha1)) {
>> + if (is_null_sha1(sha1)) {
>> hashclr(b->branch_tree.versions[0].sha1);
>> hashclr(b->branch_tree.versions[1].sha1);
>> } else {
>> unsigned long size;
>> char *buf;
>>
>> - buf = read_object_with_reference(b->sha1,
>> - commit_type, &size, b->sha1);
>> + buf = read_object_with_reference(sha1,
>> + commit_type, &size, sha1);
>
> This seems to be about delaying the effect of "from" so it doesn't
> interfere with a "merge" command referring to the same commit.
Kind of, parse_from_*() arrange b->sha1 and b->branch_tree to correspond
the first parent of a new tip. With this patch b->sha1 is left alone
to avoid the
interference. Oh, yes, luckily parse_merge() doesn't need b->branch_tree-s.
>
> [...]
>> -static int parse_from(struct branch *b)
>> +static int parse_from(struct branch *b, unsigned char *sha1out)
>> {
>> const char *from;
>> struct branch *s;
>>
>> - if (prefixcmp(command_buf.buf, "from "))
>> + if (prefixcmp(command_buf.buf, "from ")) {
>> + hashclr(sha1out);
>> return 0;
>> + }
>
> This code path handles the case where there is no "from" after a
> "reset" or "commit" command. We clear sha1out to make the calling
> convention simple --- sha1out is always written to, and the caller
> does not have to worry about initializing it in advance.
>
> I guess this is part of the change that delays the effect of "from"?
>
> [...]
>> @@ -2586,7 +2589,10 @@ static int parse_from(struct branch *b)
>> die("Can't create a branch from itself: %s", b->name);
>> else if (s) {
>> unsigned char *t = s->branch_tree.versions[1].sha1;
>> + if (is_null_sha1(s->sha1))
>> + die("Can't create a branch from an empty branch:"
>> + " %s from %s", b->name, s->name);
>
> This seems to be about protecting against "from" with an unborn
> branch.
>
>> - hashcpy(b->sha1, s->sha1);
>> + hashcpy(sha1out, s->sha1);
>
> Delaying the effect of "from", maybe.
>
> [...]
>> @@ -2594,16 +2600,16 @@ static int parse_from(struct branch *b)
>> struct object_entry *oe = find_mark(idnum);
>> if (oe->type != OBJ_COMMIT)
>> die("Mark :%" PRIuMAX " not a commit", idnum);
>> - hashcpy(b->sha1, oe->idx.sha1);
>> + hashcpy(sha1out, oe->idx.sha1);
>
> Delaying "from" effect?
>
>> if (oe->pack_id != MAX_PACK_ID) {
>> unsigned long size;
>> char *buf = gfi_unpack_entry(oe, &size);
>> - parse_from_commit(b, buf, size);
>> + parse_from_commit(b, sha1out, buf, size);
>
> Likewise?
>
>> free(buf);
>> } else
>> - parse_from_existing(b);
>> + parse_from_existing(b, sha1out);
>> - } else if (!get_sha1(from, b->sha1))
>> + } else if (!get_sha1(from, sha1out))
>> - parse_from_existing(b);
>> + parse_from_existing(b, sha1out);
>
> Likewise?
>
> [...]
>> else
>> die("Invalid ref name or SHA1 expression: %s", from);
>>
>> @@ -2622,9 +2628,11 @@ static struct hash_list *parse_merge(unsigned int *count)
>> from = strchr(command_buf.buf, ' ') + 1;
>> n = xmalloc(sizeof(*n));
>> s = lookup_branch(from);
>> - if (s)
>> + if (s) {
>> + if (is_null_sha1(s->sha1))
>> + die("Can't merge empty branch: %s", s->name);
>> hashcpy(n->sha1, s->sha1);
>> - else if (*from == ':') {
>> + } else if (*from == ':') {
>
> Protecting against "merge" with unknown branch.
>
> [...]
>> @@ -2656,6 +2664,7 @@ static void parse_new_commit(void)
>> {
>> static struct strbuf msg = STRBUF_INIT;
>> struct branch *b;
>> + unsigned char sha1[20];
>> char *sp;
>> char *author = NULL;
>> char *committer = NULL;
>> @@ -2683,7 +2692,7 @@ static void parse_new_commit(void)
>> die("Expected committer but didn't get one");
>> parse_data(&msg, 0, NULL);
>> read_next_command();
>> - parse_from(b);
>> + parse_from(b, sha1);
>
> Delayed "from" effect?
>
> [...]
>> @@ -2730,7 +2739,9 @@ static void parse_new_commit(void)
>> strbuf_reset(&new_data);
>> strbuf_addf(&new_data, "tree %s\n",
>> sha1_to_hex(b->branch_tree.versions[1].sha1));
>> - if (!is_null_sha1(b->sha1))
>> + if (!is_null_sha1(sha1))
>> + strbuf_addf(&new_data, "parent %s\n", sha1_to_hex(sha1));
>> + else if (!is_null_sha1(b->sha1))
>> strbuf_addf(&new_data, "parent %s\n", sha1_to_hex(b->sha1));
>
> Delaying "from"?
>
> [...]
>> @@ -2855,7 +2866,7 @@ static void parse_reset_branch(void)
>> else
>> b = new_branch(sp);
>> read_next_command();
>> - parse_from(b);
>> + parse_from(b, b->sha1);
>
> Likewise?
>
> [*]
> So it looks like this would be easier to read as three patches:
>
> 1. protecting against "from" of unborn branch
> 2. protecting against "merge" of unborn branch
> 3. delaying the effect of "from" to avoid it confusingly changing
> the effect of a "merge" in the same commit
Thanks, will look into it. Without (3) we'll still have inputs asking
for (2) accepted, like:
commit new_tip
...
from old_tip
merge new_tip
new_tip won't end up with null_sha1 parent written, so should be ok to
postpone this case to (3).
>
> (1) and (2) would be no-brainers, while (3) seems more subtle ---
> maybe it should be documented to help importers for other version
> control systems to know to make the same change?
Will add a note on this.
>
> [...]
>> --- a/t/t9300-fast-import.sh
>> +++ b/t/t9300-fast-import.sh
>> @@ -2895,6 +2895,54 @@ test_expect_success 'S: merge with garbage after mark must fail' '
>> test_i18ngrep "after mark" err
>> '
>>
>> +test_expect_success 'S: empty branch as merge parent must fail' '
>> + test_must_fail git fast-import <<-EOF 2>err &&
>> + commit refs/heads/chicken
>> + committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
>> + data <<COMMIT
>> + I am the chicken.
>> + COMMIT
>> + merge refs/heads/chicken
>> + EOF
>> + cat err &&
>> + test_must_fail git rev-parse --verify refs/heads/chicken^
>
> There would be no "chicken" branch after this import at all, right?
Oh, yes, these rev-parse are all to illustrate the reason why import must fail.
With this one we can add a positive test where chicken is a born branch and
import succeeds.
>
> [...]
>> +test_expect_success 'S: empty branch as merge parent must fail (2)' '
>> + test_must_fail git fast-import <<-EOF 2>err &&
>> + commit refs/heads/egg1
>> + committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
>> + data <<COMMIT
>> + I am the egg N1.
>> + COMMIT
>> +
>> + commit refs/heads/egg2
>> + committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
>> + data <<COMMIT
>> + I am the egg N2.
>> + COMMIT
>> + from refs/heads/egg1
>> + merge refs/heads/egg2
>> + EOF
>> + cat err &&
>> + test_must_fail git rev-parse --verify refs/heads/egg2^2
>
> Likewise for egg2.
Adding a positive one here too looks a good idea.
>
> [...]
>> +test_expect_success 'S: empty branch as a parent must fail ' '
>> + test_must_fail git fast-import <<-EOF 2>err &&
>> + reset refs/heads/egg3
>> +
>> + commit refs/heads/egg4
>> + committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
>> + data <<COMMIT
>> + I am the egg N4.
>> + COMMIT
>> + from refs/heads/egg3
>> + EOF
>> + cat err &&
>> + test_must_fail git rev-parse --verify refs/heads/egg4^2
>
> Likewise for egg4.
There is a copy-paste typo "egg4^2" instead of "egg4". Will just drop
rev-parse here as there should be a positive merge test somewhere
already.
>
> If this were split up as described above, I imagine it would be much
> easier to read and most of it would move into the "obviously good"
> category (I'm still uncertain about some details of the "delayed from"
> implementation and haven't checked carefully whether it misses any
> spots). What do you think?
Thanks, good idea indeed. Will rearrange and resend.
>
> Thanks for a pleasant read, and hope that helps,
> Jonathan
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH/RFC] fast-import: disallow empty branches as parents
2012-06-20 19:34 [PATCH/RFC] fast-import: disallow empty branches as parents Dmitry Ivankov
2012-06-20 19:34 ` [PATCH] " Dmitry Ivankov
@ 2012-06-25 8:00 ` Dmitry Ivankov
1 sibling, 0 replies; 5+ messages in thread
From: Dmitry Ivankov @ 2012-06-25 8:00 UTC (permalink / raw)
To: git
Cc: Jonathan Nieder, Jeff King, Sverre Rabbelier, Dmitry Ivankov,
Shawn Pearce
+Shawn Pearce
On Thu, Jun 21, 2012 at 1:34 AM, Dmitry Ivankov <divanorama@gmail.com> wrote:
> Combinations of "reset", "commit" with "from" and/or "merge" commands
> may make fast-import to produce bad objects (null_sha1 parents) or
> accept bad inputs (ones asking for empty branches as parents).
I was trying to split the patch into small an obvious cases and found out that
there are 4 ways to make or refer to "empty" branches:
1. reset branch_name
2. reset branch_name \n from `0{40}`
3. refer to it as `0{40}`
4. commit branch_name for a new branch name. It's empty up to the
'commit' command end.
Note: I'll use 4 to denote a reference to a branch name from the
'commit' command to this very branch name.
And these branches can be used in 'from' and 'merge' commands.
In 'from' command:
- 1,2,3 are allowed in 'from', no bug of writing 0{40} parent here as
'from' parent is checked against null_sha1
- 4 is not allowed with message "Can't create a branch from itself"
In 'merge' command:
- 1,2 are allowed in 'merge' and lead to 0{40} being actually written
as a commit parent - BUG
- 3 is not allowed with message "Not a valid commit"
- 4 without 'from' is allowed, parent is previous branch tip. May be 0{40} - BUG
- 4 with 'from' is allowed, 'merge' parent is 'from' parent - probably
a bug, also may lead to 0{40} being written - BUG
BUG stuff obviously needs to be fixed, at least we can just skip 0{40}
parents in 'merge' command.
Then difference in 3 between 'from' and 'merge' should be dealt with.
'merge' behaviour comes from
tags/v1.5.0.4~21^2 2f6dc35d2ad0b.. 5 Mar 2007 fast-import: Fail if
a non-existant commit is used for merge
since then 'merge sha1' just looks up commit with this sha1. But
'from' command has it's story too - it was
a 'new_branch' long ago (up to tags/v1.5.0-rc4~14^2~64) and since the
very beginning (tags/v1.5.0-rc4~14^2~76)
is had a special case to allow 0{40} as a parent. So, should we just
allow 'merge 0{40}'?
And finally 4 in 'merge' should probably always refer to the previous
tip to cause less surprise (it was the largest
part of my patch, though it turns out not the most interesting one).
P.S. At first I thought that any reference to an empty branch in
'merge' of 'from' should be rejected, but given that we
allow sha1 0{40} to be used in these and in 2, I guess we should keep
it allowed.
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2012-06-25 8:01 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2012-06-20 19:34 [PATCH/RFC] fast-import: disallow empty branches as parents Dmitry Ivankov
2012-06-20 19:34 ` [PATCH] " Dmitry Ivankov
2012-06-21 3:57 ` Jonathan Nieder
2012-06-21 6:39 ` Dmitry Ivankov
2012-06-25 8:00 ` [PATCH/RFC] " Dmitry Ivankov
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).