git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/3] fast-import: fix pack_id corner cases
@ 2011-09-18 19:01 Dmitry Ivankov
  2011-09-18 19:01 ` [PATCH 1/3] fast-import: die if we produce too many (MAX_PACK_ID) packs Dmitry Ivankov
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Dmitry Ivankov @ 2011-09-18 19:01 UTC (permalink / raw)
  To: git; +Cc: Jonathan Nieder, Shawn O. Pearce, David Barr, Dmitry Ivankov

[1/3] is just a precaution unlikely to happen as having 65536+ packs
produced in fast-import looks extremely rare.
[2/2] is more real bug. Shouldn't be too hard to reproduce, but I'm
currently too lazy to as it is quite rare and not likely to get broken
again.
[3/3] is pure cosmetic change while I'm on pack_id topic.

Dmitry Ivankov (3):
  fast-import: die if we produce too many (MAX_PACK_ID) packs
  fast-import: fix corner case for checkpoint
  fast-import: rename object_count to pack_object_count

 fast-import.c |   29 +++++++++++++++--------------
 1 files changed, 15 insertions(+), 14 deletions(-)

-- 
1.7.3.4

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH 1/3] fast-import: die if we produce too many (MAX_PACK_ID) packs
  2011-09-18 19:01 [PATCH 0/3] fast-import: fix pack_id corner cases Dmitry Ivankov
@ 2011-09-18 19:01 ` Dmitry Ivankov
  2011-09-18 19:17   ` Jonathan Nieder
  2011-09-18 19:01 ` [PATCH 2/3] fast-import: fix corner case for checkpoint Dmitry Ivankov
  2011-09-18 19:01 ` [PATCH 3/3] fast-import: rename object_count to pack_object_count Dmitry Ivankov
  2 siblings, 1 reply; 9+ messages in thread
From: Dmitry Ivankov @ 2011-09-18 19:01 UTC (permalink / raw)
  To: git; +Cc: Jonathan Nieder, Shawn O. Pearce, David Barr, Dmitry Ivankov

In fast-import pack_id is 16-bit with MAX_PACK_ID reserved to identify
pre-existing objects. It is unlikely to wrap under reasonable settings
but still things in fast-import will break once it happens.

Add a check and immediate die() as the simplest reaction to being unable
to continue the import.

Signed-off-by: Dmitry Ivankov <divanorama@gmail.com>
---
 fast-import.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/fast-import.c b/fast-import.c
index 742e7da..907cb05 100644
--- a/fast-import.c
+++ b/fast-import.c
@@ -1009,6 +1009,8 @@ static void end_packfile(void)
 static void cycle_packfile(void)
 {
 	end_packfile();
+	if (pack_id >= MAX_PACK_ID)
+		die("too many (%u) packs produced", pack_id);
 	start_packfile();
 }
 
-- 
1.7.3.4

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH 2/3] fast-import: fix corner case for checkpoint
  2011-09-18 19:01 [PATCH 0/3] fast-import: fix pack_id corner cases Dmitry Ivankov
  2011-09-18 19:01 ` [PATCH 1/3] fast-import: die if we produce too many (MAX_PACK_ID) packs Dmitry Ivankov
@ 2011-09-18 19:01 ` Dmitry Ivankov
  2011-09-18 19:28   ` Jonathan Nieder
  2011-09-18 19:01 ` [PATCH 3/3] fast-import: rename object_count to pack_object_count Dmitry Ivankov
  2 siblings, 1 reply; 9+ messages in thread
From: Dmitry Ivankov @ 2011-09-18 19:01 UTC (permalink / raw)
  To: git; +Cc: Jonathan Nieder, Shawn O. Pearce, David Barr, Dmitry Ivankov

checkpoint command makes fast-import finish current pack and write out
branches/tags and marks. In case no new objects are added in current
pack fast-import falls back to no-op. While it is possible that refs
or marks need to be updated (to point to old objects).

Make fast-import always dump them on checkpoint. But as before do not
cycle_packfile if there are no objects to write.

Signed-off-by: Dmitry Ivankov <divanorama@gmail.com>
---
 fast-import.c |    9 ++++-----
 1 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/fast-import.c b/fast-import.c
index 907cb05..014a807 100644
--- a/fast-import.c
+++ b/fast-import.c
@@ -3025,12 +3025,11 @@ static void parse_ls(struct branch *b)
 static void checkpoint(void)
 {
 	checkpoint_requested = 0;
-	if (object_count) {
+	if (object_count)
 		cycle_packfile();
-		dump_branches();
-		dump_tags();
-		dump_marks();
-	}
+	dump_branches();
+	dump_tags();
+	dump_marks();
 }
 
 static void parse_checkpoint(void)
-- 
1.7.3.4

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH 3/3] fast-import: rename object_count to pack_object_count
  2011-09-18 19:01 [PATCH 0/3] fast-import: fix pack_id corner cases Dmitry Ivankov
  2011-09-18 19:01 ` [PATCH 1/3] fast-import: die if we produce too many (MAX_PACK_ID) packs Dmitry Ivankov
  2011-09-18 19:01 ` [PATCH 2/3] fast-import: fix corner case for checkpoint Dmitry Ivankov
@ 2011-09-18 19:01 ` Dmitry Ivankov
  2011-09-18 19:32   ` Jonathan Nieder
  2 siblings, 1 reply; 9+ messages in thread
From: Dmitry Ivankov @ 2011-09-18 19:01 UTC (permalink / raw)
  To: git; +Cc: Jonathan Nieder, Shawn O. Pearce, David Barr, Dmitry Ivankov

object_count is used to count objects that'll go to the current pack.
While object_count_by_* are used to count total amount of objects and
are not used to determine if current packfile is empty.

Rename (and move declaration) object_count to pack_object_count to
avoid possible confusion.

Signed-off-by: Dmitry Ivankov <divanorama@gmail.com>
---
 fast-import.c |   20 ++++++++++----------
 1 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/fast-import.c b/fast-import.c
index 014a807..8f2411b 100644
--- a/fast-import.c
+++ b/fast-import.c
@@ -290,7 +290,6 @@ static uintmax_t object_count_by_type[1 << TYPE_BITS];
 static uintmax_t duplicate_count_by_type[1 << TYPE_BITS];
 static uintmax_t delta_count_by_type[1 << TYPE_BITS];
 static uintmax_t delta_count_attempts_by_type[1 << TYPE_BITS];
-static unsigned long object_count;
 static unsigned long branch_count;
 static unsigned long branch_load_count;
 static int failure;
@@ -316,6 +315,7 @@ static struct sha1file *pack_file;
 static struct packed_git *pack_data;
 static struct packed_git **all_packs;
 static off_t pack_size;
+static unsigned long pack_object_count;
 
 /* Table of objects we've written. */
 static unsigned int object_entry_alloc = 5000;
@@ -880,7 +880,7 @@ static void start_packfile(void)
 
 	pack_data = p;
 	pack_size = sizeof(hdr);
-	object_count = 0;
+	pack_object_count = 0;
 
 	all_packs = xrealloc(all_packs, sizeof(*all_packs) * (pack_id + 1));
 	all_packs[pack_id] = p;
@@ -894,17 +894,17 @@ static const char *create_index(void)
 	struct object_entry_pool *o;
 
 	/* Build the table of object IDs. */
-	idx = xmalloc(object_count * sizeof(*idx));
+	idx = xmalloc(pack_object_count * sizeof(*idx));
 	c = idx;
 	for (o = blocks; o; o = o->next_pool)
 		for (e = o->next_free; e-- != o->entries;)
 			if (pack_id == e->pack_id)
 				*c++ = &e->idx;
-	last = idx + object_count;
+	last = idx + pack_object_count;
 	if (c != last)
 		die("internal consistency error creating the index");
 
-	tmpfile = write_idx_file(NULL, idx, object_count, &pack_idx_opts, pack_data->sha1);
+	tmpfile = write_idx_file(NULL, idx, pack_object_count, &pack_idx_opts, pack_data->sha1);
 	free(idx);
 	return tmpfile;
 }
@@ -953,7 +953,7 @@ static void end_packfile(void)
 	struct packed_git *old_p = pack_data, *new_p;
 
 	clear_delta_base_cache();
-	if (object_count) {
+	if (pack_object_count) {
 		unsigned char cur_pack_sha1[20];
 		char *idx_name;
 		int i;
@@ -963,7 +963,7 @@ static void end_packfile(void)
 		close_pack_windows(pack_data);
 		sha1close(pack_file, cur_pack_sha1, 0);
 		fixup_pack_header_footer(pack_data->pack_fd, pack_data->sha1,
-				    pack_data->pack_name, object_count,
+				    pack_data->pack_name, pack_object_count,
 				    cur_pack_sha1, pack_size);
 		close(pack_data->pack_fd);
 		idx_name = keep_pack(create_index());
@@ -1103,7 +1103,7 @@ static int store_object(
 	e->type = type;
 	e->pack_id = pack_id;
 	e->idx.offset = pack_size;
-	object_count++;
+	pack_object_count++;
 	object_count_by_type[type]++;
 
 	crc32_begin(pack_file);
@@ -1267,7 +1267,7 @@ static void stream_blob(uintmax_t len, unsigned char *sha1out, uintmax_t mark)
 		e->pack_id = pack_id;
 		e->idx.offset = offset;
 		e->idx.crc32 = crc32_end(pack_file);
-		object_count++;
+		pack_object_count++;
 		object_count_by_type[OBJ_BLOB]++;
 	}
 
@@ -3025,7 +3025,7 @@ static void parse_ls(struct branch *b)
 static void checkpoint(void)
 {
 	checkpoint_requested = 0;
-	if (object_count)
+	if (pack_object_count)
 		cycle_packfile();
 	dump_branches();
 	dump_tags();
-- 
1.7.3.4

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/3] fast-import: die if we produce too many (MAX_PACK_ID) packs
  2011-09-18 19:01 ` [PATCH 1/3] fast-import: die if we produce too many (MAX_PACK_ID) packs Dmitry Ivankov
@ 2011-09-18 19:17   ` Jonathan Nieder
  0 siblings, 0 replies; 9+ messages in thread
From: Jonathan Nieder @ 2011-09-18 19:17 UTC (permalink / raw)
  To: Dmitry Ivankov; +Cc: git, Shawn O. Pearce, David Barr

Dmitry Ivankov wrote:

> In fast-import pack_id is 16-bit with MAX_PACK_ID reserved to identify
> pre-existing objects. It is unlikely to wrap under reasonable settings
> but still things in fast-import will break once it happens.
>
> Add a check and immediate die() as the simplest reaction to being unable
> to continue the import.

Makes a lot of sense.  A few possible minor clarity improvements:

 - missing commas after "In fast-import" and before "with MAX_PACK_ID
   reserved"
 - "pre-existing objects": it would be clearer to say something like
   "objects this fast-import process instance did not write out to a
   packfile", like the comment before gfi_unpack_entry() does
 - I suppose "under reasonable settings" means "with a reasonable
   max-pack-size setting"?
 - "things will break" is a bit vague.
 - "immediate" -> "immediately"

Maybe:

	In fast-import, pack_id is a 16-bit unsigned integer, with MAX_PACK_ID
	(2^16 - 1) reserved for use by objects that are not in a packfile that
	this fast-import process instance wrote.  It is unusual for pack_id to
	hit MAX_PACK_ID with a reasonable --max-pack-size setting, but when it
	does, the pack_id stored in each "struct object_entry" wraps and
	fast-import gets utterly confused.

	Add a check and immediately die() so the operator can at least see what
	went wrong instead of experiencing an unexplained broken import.

With or without that clarification,
Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>

Thanks!  A test would still be nice, if someone has time to write one.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 2/3] fast-import: fix corner case for checkpoint
  2011-09-18 19:01 ` [PATCH 2/3] fast-import: fix corner case for checkpoint Dmitry Ivankov
@ 2011-09-18 19:28   ` Jonathan Nieder
  0 siblings, 0 replies; 9+ messages in thread
From: Jonathan Nieder @ 2011-09-18 19:28 UTC (permalink / raw)
  To: Dmitry Ivankov; +Cc: git, Shawn O. Pearce, David Barr

Dmitry Ivankov wrote:

> checkpoint command makes fast-import finish current pack and write out
> branches/tags and marks. In case no new objects are added in current
> pack fast-import falls back to no-op. While it is possible that refs
> or marks need to be updated (to point to old objects).
>
> Make fast-import always dump them on checkpoint. But as before do not
> cycle_packfile if there are no objects to write.

Yeah, that would be annoying to run into.  Rearranging the description
a little for clarity and brevity:

	fast-import: update refs on checkpoint even if there are no new objects

	During an import using the fast-import command, it is possible for
	no new objects to have been added between two checkpoints requested
	with the SIGUSR1 signal or the "checkpoint" command.  Even in this
	case, fast-import should write out any updated refs and marks to
	fulfill the second checkpoint request.

	As before, fast-import will not write an empty pack and start a new
	one when there are no new objects to write out.

With that change,
Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 3/3] fast-import: rename object_count to pack_object_count
  2011-09-18 19:01 ` [PATCH 3/3] fast-import: rename object_count to pack_object_count Dmitry Ivankov
@ 2011-09-18 19:32   ` Jonathan Nieder
  2011-09-18 19:51     ` Dmitry Ivankov
  0 siblings, 1 reply; 9+ messages in thread
From: Jonathan Nieder @ 2011-09-18 19:32 UTC (permalink / raw)
  To: Dmitry Ivankov; +Cc: git, Shawn O. Pearce, David Barr

Dmitry Ivankov wrote:

> object_count is used to count objects that'll go to the current pack.
> While object_count_by_* are used to count total amount of objects and
> are not used to determine if current packfile is empty.
>
> Rename (and move declaration) object_count to pack_object_count to
> avoid possible confusion.

No strong opinion on this one.  I guess the important thing is that
you are moving the declaration to the group of declarations labelled as

	/* The .pack file being generated */

.  Is it important to rename the variable while at it (which will
disrupt other patches in flight using that variable if they exist)?

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 3/3] fast-import: rename object_count to pack_object_count
  2011-09-18 19:32   ` Jonathan Nieder
@ 2011-09-18 19:51     ` Dmitry Ivankov
  2011-09-18 21:40       ` Jonathan Nieder
  0 siblings, 1 reply; 9+ messages in thread
From: Dmitry Ivankov @ 2011-09-18 19:51 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: git, Shawn O. Pearce, David Barr

On Mon, Sep 19, 2011 at 1:32 AM, Jonathan Nieder <jrnieder@gmail.com> wrote:
> Dmitry Ivankov wrote:
>
>> object_count is used to count objects that'll go to the current pack.
>> While object_count_by_* are used to count total amount of objects and
>> are not used to determine if current packfile is empty.
>>
>> Rename (and move declaration) object_count to pack_object_count to
>> avoid possible confusion.
>
> No strong opinion on this one.  I guess the important thing is that
> you are moving the declaration to the group of declarations labelled as
>
>        /* The .pack file being generated */
>
> .  Is it important to rename the variable while at it (which will
> disrupt other patches in flight using that variable if they exist)?
Not that important. Maybe a huge comment will do more and better.
object_count++ still appears near object_count_by_type[type]++, but
hopefully one will look for their declarations and thus avoid the confusion.

--- a/fast-import.c
+++ b/fast-import.c
@@ -290,7 +290,6 @@ static uintmax_t object_count_by_type[1 << TYPE_BITS];
 static uintmax_t duplicate_count_by_type[1 << TYPE_BITS];
 static uintmax_t delta_count_by_type[1 << TYPE_BITS];
 static uintmax_t delta_count_attempts_by_type[1 << TYPE_BITS];
-static unsigned long object_count;
 static unsigned long branch_count;
 static unsigned long branch_load_count;
 static int failure;
@@ -310,8 +309,16 @@ static unsigned int atom_cnt;
 static struct atom_str **atom_table;

 /* The .pack file being generated */
+/*
+ * objects that are being written to the current pack
+ * all *must* have current pack_id in struct object_entry.
+ * And object_count *must* be a count of object_entry's
+ * having current pack_id. This data is used to create
+ * index file once current pack_file is finished.
+ */
 static struct pack_idx_option pack_idx_opts;
 static unsigned int pack_id;
+static unsigned long object_count;
 static struct sha1file *pack_file;
 static struct packed_git *pack_data;
 static struct packed_git **all_packs;

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 3/3] fast-import: rename object_count to pack_object_count
  2011-09-18 19:51     ` Dmitry Ivankov
@ 2011-09-18 21:40       ` Jonathan Nieder
  0 siblings, 0 replies; 9+ messages in thread
From: Jonathan Nieder @ 2011-09-18 21:40 UTC (permalink / raw)
  To: Dmitry Ivankov; +Cc: git, Shawn O. Pearce, David Barr

Dmitry Ivankov wrote:

> --- a/fast-import.c
> +++ b/fast-import.c
[...]
> @@ -310,8 +309,16 @@ static unsigned int atom_cnt;
>  static struct atom_str **atom_table;
> 
>  /* The .pack file being generated */
> +/*
> + * objects that are being written to the current pack
> + * all *must* have current pack_id in struct object_entry.
> + * And object_count *must* be a count of object_entry's
> + * having current pack_id. This data is used to create
> + * index file once current pack_file is finished.
> + */
>  static struct pack_idx_option pack_idx_opts;
>  static unsigned int pack_id;
> +static unsigned long object_count;
>  static struct sha1file *pack_file;

Closer.  Now I am tempted to nitpick and say that this should be
a single comment, formatted in complete sentences, and written to
be descriptive rather than normative when possible (since norms
will inevitably change over time, and future readers should not
have an excuse to be afraid to adjust the comment to match code
changes).

	/*
	 * The .pack file being generated
	 *
	 * Objects that are being written to the current pack store the
	 * current value of "pack_id" in struct object_entry.
	 * "object_count" counts the object_entrys with the current
	 * pack_id.  These values are used to create the pack index
	 * file when the current pack is finished.
	 */
	static struct pack_idx_option pack_idx_opts;
	static unsigned int pack_id;
	...

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2011-09-18 21:40 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-09-18 19:01 [PATCH 0/3] fast-import: fix pack_id corner cases Dmitry Ivankov
2011-09-18 19:01 ` [PATCH 1/3] fast-import: die if we produce too many (MAX_PACK_ID) packs Dmitry Ivankov
2011-09-18 19:17   ` Jonathan Nieder
2011-09-18 19:01 ` [PATCH 2/3] fast-import: fix corner case for checkpoint Dmitry Ivankov
2011-09-18 19:28   ` Jonathan Nieder
2011-09-18 19:01 ` [PATCH 3/3] fast-import: rename object_count to pack_object_count Dmitry Ivankov
2011-09-18 19:32   ` Jonathan Nieder
2011-09-18 19:51     ` Dmitry Ivankov
2011-09-18 21:40       ` Jonathan Nieder

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).