* [PATCH] Enhance unpack-objects for extracting large objects
@ 2007-05-25 8:20 Dana How
2007-05-25 13:41 ` Nicolas Pitre
2007-05-25 19:22 ` Junio C Hamano
0 siblings, 2 replies; 7+ messages in thread
From: Dana How @ 2007-05-25 8:20 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Git Mailing List, danahow
Nicolas Pitre wrote:
> I wouldn't mind a _separate_ tool that would load a pack index,
> determine object sizes from it, and then extract big objects to write
> them as loose objects ...
Below we add two new options to git-unpack-objects:
--min-blob-size=<n>:: Unpacking is only done for objects
larger than or equal to n kB (uncompressed size by Junio).
--force:: Loose objects will be created even if they
already exist in the repository packed. This is an option
I've wanted before for other reasons.
This passes the tests in "t" but has not yet been used on my large repos.
Based on "next" but should apply to "master" as well.
Signed-off-by: Dana L. How <danahow@gmail.com>
---
Documentation/git-unpack-objects.txt | 17 +++++++++++++----
builtin-unpack-objects.c | 20 ++++++++++++++++++--
cache.h | 2 ++
sha1_file.c | 11 +++++++++--
4 files changed, 42 insertions(+), 8 deletions(-)
diff --git a/Documentation/git-unpack-objects.txt b/Documentation/git-unpack-objects.txt
index ff6184b..4513d8d 100644
--- a/Documentation/git-unpack-objects.txt
+++ b/Documentation/git-unpack-objects.txt
@@ -8,7 +8,7 @@ git-unpack-objects - Unpack objects from a packed archive
SYNOPSIS
--------
-'git-unpack-objects' [-n] [-q] [-r] <pack-file
+'git-unpack-objects' [-n] [-q] [-r] [--force] [--min-blob-size=N] <pack-file
DESCRIPTION
@@ -17,9 +17,10 @@ Read a packed archive (.pack) from the standard input, expanding
the objects contained within and writing them into the repository in
"loose" (one object per file) format.
-Objects that already exist in the repository will *not* be unpacked
-from the pack-file. Therefore, nothing will be unpacked if you use
-this command on a pack-file that exists within the target repository.
+By default, objects that already exist in the repository will *not*
+be unpacked from the pack-file. Therefore, nothing will be unpacked
+if you use this command on a pack-file that exists within the target
+repository, unless you specify --force.
Please see the `git-repack` documentation for options to generate
new packs and replace existing ones.
@@ -40,6 +41,14 @@ OPTIONS
and make the best effort to recover as many objects as
possible.
+--force::
+ Allow loose objects to be created in the same repository that
+ contains the packfile.
+
+--min-blob-size=<n>::
+ Smallest loose object to create, expressed in kB.
+ Blobs smaller than this will not be unpacked. Default is 0.
+
Author
------
diff --git a/builtin-unpack-objects.c b/builtin-unpack-objects.c
index a6ff62f..a42bf0d 100644
--- a/builtin-unpack-objects.c
+++ b/builtin-unpack-objects.c
@@ -10,13 +10,16 @@
#include "progress.h"
static int dry_run, quiet, recover, has_errors;
-static const char unpack_usage[] = "git-unpack-objects [-n] [-q] [-r] < pack-file";
+static const char unpack_usage[] =
+"git-unpack-objects [-n] [-q] [-r] [--force] [--min-blob-size=N] < pack-file";
/* We always read in 4kB chunks. */
static unsigned char buffer[4096];
static unsigned int offset, len;
static off_t consumed_bytes;
static SHA_CTX ctx;
+static int force = 0;
+uint32_t min_blob_size;
/*
* Make sure at least "min" bytes are available in the buffer, and
@@ -131,7 +134,9 @@ static void added_object(unsigned nr, enum object_type type,
static void write_object(unsigned nr, enum object_type type,
void *buf, unsigned long size)
{
- if (write_sha1_file(buf, size, typename(type), obj_list[nr].sha1) < 0)
+ int force2 = size < min_blob_size ? -1 : force;
+ if (write_sha1_file_maybe(buf, size, typename(type),
+ force2, obj_list[nr].sha1) < 0)
die("failed to write object");
added_object(nr, type, buf, size);
}
@@ -361,6 +366,17 @@ int cmd_unpack_objects(int argc, const char **argv, const char *prefix)
recover = 1;
continue;
}
+ if (!strcmp(arg, "--force")) {
+ force = 1;
+ continue;
+ }
+ if (!prefixcmp(arg, "--min-blob-size=")) {
+ char *end;
+ min_blob_size = strtoul(arg+16, &end, 0) * 1024;
+ if (!arg[16] || *end)
+ usage(unpack_usage);
+ continue;
+ }
if (!prefixcmp(arg, "--pack_header=")) {
struct pack_header *hdr;
char *c;
diff --git a/cache.h b/cache.h
index ec85d93..d0c3030 100644
--- a/cache.h
+++ b/cache.h
@@ -343,6 +343,8 @@ extern int sha1_object_info(const unsigned char *, unsigned long *);
extern void * read_sha1_file(const unsigned char *sha1, enum object_type *type, unsigned long *size);
extern int hash_sha1_file(const void *buf, unsigned long len, const char *type, unsigned char *sha1);
extern int write_sha1_file(void *buf, unsigned long len, const char *type, unsigned char *return_sha1);
+extern int write_sha1_file_maybe(void *buf, unsigned long len, const char *type,
+ int ignore, unsigned char *return_sha1);
extern int pretend_sha1_file(void *, unsigned long, enum object_type, unsigned char *);
extern int check_sha1_signature(const unsigned char *sha1, void *buf, unsigned long size, const char *type);
diff --git a/sha1_file.c b/sha1_file.c
index 12d2ef2..68b8db8 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -1979,7 +1979,8 @@ int hash_sha1_file(const void *buf, unsigned long len, const char *type,
return 0;
}
-int write_sha1_file(void *buf, unsigned long len, const char *type, unsigned char *returnsha1)
+int write_sha1_file_maybe(void *buf, unsigned long len, const char *type,
+ int ignore, unsigned char *returnsha1)
{
int size, ret;
unsigned char *compressed;
@@ -1997,7 +1998,7 @@ int write_sha1_file(void *buf, unsigned long len, const char *type, unsigned cha
filename = sha1_file_name(sha1);
if (returnsha1)
hashcpy(returnsha1, sha1);
- if (has_sha1_file(sha1))
+ if (ignore < 0 || !ignore && has_sha1_file(sha1))
return 0;
fd = open(filename, O_RDONLY);
if (fd >= 0) {
@@ -2062,6 +2063,12 @@ int write_sha1_file(void *buf, unsigned long len, const char *type, unsigned cha
return move_temp_to_file(tmpfile, filename);
}
+int write_sha1_file(void *buf, unsigned long len, const char *type,
+ unsigned char *returnsha1)
+{
+ return write_sha1_file_maybe(buf, len, type, 0, returnsha1);
+}
+
/*
* We need to unpack and recompress the object for writing
* it out to a different file.
--
1.5.2.762.gd8c6-dirty
^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH] Enhance unpack-objects for extracting large objects
2007-05-25 8:20 [PATCH] Enhance unpack-objects for extracting large objects Dana How
@ 2007-05-25 13:41 ` Nicolas Pitre
2007-05-25 19:22 ` Junio C Hamano
1 sibling, 0 replies; 7+ messages in thread
From: Nicolas Pitre @ 2007-05-25 13:41 UTC (permalink / raw)
To: Dana How; +Cc: Junio C Hamano, Git Mailing List
On Fri, 25 May 2007, Dana How wrote:
>
> Nicolas Pitre wrote:
> > I wouldn't mind a _separate_ tool that would load a pack index,
> > determine object sizes from it, and then extract big objects to write
> > them as loose objects ...
>
> Below we add two new options to git-unpack-objects:
>
> --min-blob-size=<n>:: Unpacking is only done for objects
> larger than or equal to n kB (uncompressed size by Junio).
>
> --force:: Loose objects will be created even if they
> already exist in the repository packed. This is an option
> I've wanted before for other reasons.
>
> This passes the tests in "t" but has not yet been used on my large repos.
> Based on "next" but should apply to "master" as well.
>
> Signed-off-by: Dana L. How <danahow@gmail.com>
This is clever, and the --force option is a nice thing to have. Not
that I find it particularly useful (I'd personally pack large objects
together rather than keeping them loose), but at least having the option
to explode any pack even in a live repository is a good thing to have at
the plumbing level given the simplicity of the patch.
ACK.
> ---
> Documentation/git-unpack-objects.txt | 17 +++++++++++++----
> builtin-unpack-objects.c | 20 ++++++++++++++++++--
> cache.h | 2 ++
> sha1_file.c | 11 +++++++++--
> 4 files changed, 42 insertions(+), 8 deletions(-)
>
> diff --git a/Documentation/git-unpack-objects.txt b/Documentation/git-unpack-objects.txt
> index ff6184b..4513d8d 100644
> --- a/Documentation/git-unpack-objects.txt
> +++ b/Documentation/git-unpack-objects.txt
> @@ -8,7 +8,7 @@ git-unpack-objects - Unpack objects from a packed archive
>
> SYNOPSIS
> --------
> -'git-unpack-objects' [-n] [-q] [-r] <pack-file
> +'git-unpack-objects' [-n] [-q] [-r] [--force] [--min-blob-size=N] <pack-file
>
>
> DESCRIPTION
> @@ -17,9 +17,10 @@ Read a packed archive (.pack) from the standard input, expanding
> the objects contained within and writing them into the repository in
> "loose" (one object per file) format.
>
> -Objects that already exist in the repository will *not* be unpacked
> -from the pack-file. Therefore, nothing will be unpacked if you use
> -this command on a pack-file that exists within the target repository.
> +By default, objects that already exist in the repository will *not*
> +be unpacked from the pack-file. Therefore, nothing will be unpacked
> +if you use this command on a pack-file that exists within the target
> +repository, unless you specify --force.
>
> Please see the `git-repack` documentation for options to generate
> new packs and replace existing ones.
> @@ -40,6 +41,14 @@ OPTIONS
> and make the best effort to recover as many objects as
> possible.
>
> +--force::
> + Allow loose objects to be created in the same repository that
> + contains the packfile.
> +
> +--min-blob-size=<n>::
> + Smallest loose object to create, expressed in kB.
> + Blobs smaller than this will not be unpacked. Default is 0.
> +
>
> Author
> ------
> diff --git a/builtin-unpack-objects.c b/builtin-unpack-objects.c
> index a6ff62f..a42bf0d 100644
> --- a/builtin-unpack-objects.c
> +++ b/builtin-unpack-objects.c
> @@ -10,13 +10,16 @@
> #include "progress.h"
>
> static int dry_run, quiet, recover, has_errors;
> -static const char unpack_usage[] = "git-unpack-objects [-n] [-q] [-r] < pack-file";
> +static const char unpack_usage[] =
> +"git-unpack-objects [-n] [-q] [-r] [--force] [--min-blob-size=N] < pack-file";
>
> /* We always read in 4kB chunks. */
> static unsigned char buffer[4096];
> static unsigned int offset, len;
> static off_t consumed_bytes;
> static SHA_CTX ctx;
> +static int force = 0;
> +uint32_t min_blob_size;
>
> /*
> * Make sure at least "min" bytes are available in the buffer, and
> @@ -131,7 +134,9 @@ static void added_object(unsigned nr, enum object_type type,
> static void write_object(unsigned nr, enum object_type type,
> void *buf, unsigned long size)
> {
> - if (write_sha1_file(buf, size, typename(type), obj_list[nr].sha1) < 0)
> + int force2 = size < min_blob_size ? -1 : force;
> + if (write_sha1_file_maybe(buf, size, typename(type),
> + force2, obj_list[nr].sha1) < 0)
> die("failed to write object");
> added_object(nr, type, buf, size);
> }
> @@ -361,6 +366,17 @@ int cmd_unpack_objects(int argc, const char **argv, const char *prefix)
> recover = 1;
> continue;
> }
> + if (!strcmp(arg, "--force")) {
> + force = 1;
> + continue;
> + }
> + if (!prefixcmp(arg, "--min-blob-size=")) {
> + char *end;
> + min_blob_size = strtoul(arg+16, &end, 0) * 1024;
> + if (!arg[16] || *end)
> + usage(unpack_usage);
> + continue;
> + }
> if (!prefixcmp(arg, "--pack_header=")) {
> struct pack_header *hdr;
> char *c;
> diff --git a/cache.h b/cache.h
> index ec85d93..d0c3030 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -343,6 +343,8 @@ extern int sha1_object_info(const unsigned char *, unsigned long *);
> extern void * read_sha1_file(const unsigned char *sha1, enum object_type *type, unsigned long *size);
> extern int hash_sha1_file(const void *buf, unsigned long len, const char *type, unsigned char *sha1);
> extern int write_sha1_file(void *buf, unsigned long len, const char *type, unsigned char *return_sha1);
> +extern int write_sha1_file_maybe(void *buf, unsigned long len, const char *type,
> + int ignore, unsigned char *return_sha1);
> extern int pretend_sha1_file(void *, unsigned long, enum object_type, unsigned char *);
>
> extern int check_sha1_signature(const unsigned char *sha1, void *buf, unsigned long size, const char *type);
> diff --git a/sha1_file.c b/sha1_file.c
> index 12d2ef2..68b8db8 100644
> --- a/sha1_file.c
> +++ b/sha1_file.c
> @@ -1979,7 +1979,8 @@ int hash_sha1_file(const void *buf, unsigned long len, const char *type,
> return 0;
> }
>
> -int write_sha1_file(void *buf, unsigned long len, const char *type, unsigned char *returnsha1)
> +int write_sha1_file_maybe(void *buf, unsigned long len, const char *type,
> + int ignore, unsigned char *returnsha1)
> {
> int size, ret;
> unsigned char *compressed;
> @@ -1997,7 +1998,7 @@ int write_sha1_file(void *buf, unsigned long len, const char *type, unsigned cha
> filename = sha1_file_name(sha1);
> if (returnsha1)
> hashcpy(returnsha1, sha1);
> - if (has_sha1_file(sha1))
> + if (ignore < 0 || !ignore && has_sha1_file(sha1))
> return 0;
> fd = open(filename, O_RDONLY);
> if (fd >= 0) {
> @@ -2062,6 +2063,12 @@ int write_sha1_file(void *buf, unsigned long len, const char *type, unsigned cha
> return move_temp_to_file(tmpfile, filename);
> }
>
> +int write_sha1_file(void *buf, unsigned long len, const char *type,
> + unsigned char *returnsha1)
> +{
> + return write_sha1_file_maybe(buf, len, type, 0, returnsha1);
> +}
> +
> /*
> * We need to unpack and recompress the object for writing
> * it out to a different file.
> --
> 1.5.2.762.gd8c6-dirty
>
> -
> To unsubscribe from this list: send the line "unsubscribe git" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
Nicolas
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] Enhance unpack-objects for extracting large objects
2007-05-25 8:20 [PATCH] Enhance unpack-objects for extracting large objects Dana How
2007-05-25 13:41 ` Nicolas Pitre
@ 2007-05-25 19:22 ` Junio C Hamano
2007-05-25 19:49 ` Dana How
2007-05-25 19:52 ` Nicolas Pitre
1 sibling, 2 replies; 7+ messages in thread
From: Junio C Hamano @ 2007-05-25 19:22 UTC (permalink / raw)
To: Dana How; +Cc: Git Mailing List
Dana How <danahow@gmail.com> writes:
> Nicolas Pitre wrote:
>> I wouldn't mind a _separate_ tool that would load a pack index,
>> determine object sizes from it, and then extract big objects to write
>> them as loose objects ...
>
> Below we add two new options to git-unpack-objects:
>
> --min-blob-size=<n>:: Unpacking is only done for objects
> larger than or equal to n kB (uncompressed size by Junio).
Elsewhere you wanted to use --max-* and that was counted in megs;
isn't using kilo here and meg there inconsistent?
> --force:: Loose objects will be created even if they
> already exist in the repository packed. This is an option
> I've wanted before for other reasons.
... but if they already exist in the repository as loose
objects, do not replace it.
Usually we do not overwrite existing loose objects and it is one
of the security measure --- if you have an object already, that
cannot be touched by somebody who maliciously creats a hash
colliding loose object and tries to inject it into your
repository via unpack-objects. It's good that you kept this
behaviour intact.
> This passes the tests in "t" but has not yet been used on my large repos.
> Based on "next" but should apply to "master" as well.
>
> Signed-off-by: Dana L. How <danahow@gmail.com>
> ---
> Documentation/git-unpack-objects.txt | 17 +++++++++++++----
> builtin-unpack-objects.c | 20 ++++++++++++++++++--
> cache.h | 2 ++
> sha1_file.c | 11 +++++++++--
> 4 files changed, 42 insertions(+), 8 deletions(-)
>
> diff --git a/Documentation/git-unpack-objects.txt b/Documentation/git-unpack-objects.txt
> index ff6184b..4513d8d 100644
> --- a/Documentation/git-unpack-objects.txt
> +++ b/Documentation/git-unpack-objects.txt
> ...
> @@ -17,9 +17,10 @@ Read a packed archive (.pack) from the standard input, expanding
> the objects contained within and writing them into the repository in
> "loose" (one object per file) format.
>
> -Objects that already exist in the repository will *not* be unpacked
> -from the pack-file. Therefore, nothing will be unpacked if you use
> -this command on a pack-file that exists within the target repository.
> +By default, objects that already exist in the repository will *not*
> +be unpacked from the pack-file. Therefore, nothing will be unpacked
> +if you use this command on a pack-file that exists within the target
> +repository, unless you specify --force.
I would want to add:
If an object already exists unpacked in the repository,
it will not be replaced with the copy from the pack,
with or without `--force`.
> diff --git a/builtin-unpack-objects.c b/builtin-unpack-objects.c
> index a6ff62f..a42bf0d 100644
> --- a/builtin-unpack-objects.c
> +++ b/builtin-unpack-objects.c
> @@ -10,13 +10,16 @@
> #include "progress.h"
>
> static int dry_run, quiet, recover, has_errors;
> -static const char unpack_usage[] = "git-unpack-objects [-n] [-q] [-r] < pack-file";
> +static const char unpack_usage[] =
> +"git-unpack-objects [-n] [-q] [-r] [--force] [--min-blob-size=N] < pack-file";
Maybe we would want to call it '-f' for consistency. Another
possibility is the other way around, giving others a longer
synonyms, like --quiet, but this command is plumbing and I do
not think long options matters that much, so my preference is to
do '-f' not '--force'.
> @@ -131,7 +134,9 @@ static void added_object(unsigned nr, enum object_type type,
> static void write_object(unsigned nr, enum object_type type,
> void *buf, unsigned long size)
> {
> - if (write_sha1_file(buf, size, typename(type), obj_list[nr].sha1) < 0)
> + int force2 = size < min_blob_size ? -1 : force;
> + if (write_sha1_file_maybe(buf, size, typename(type),
> + force2, obj_list[nr].sha1) < 0)
> die("failed to write object");
> added_object(nr, type, buf, size);
> }
Without --min-blob-size option, min_blob_size is initialized to
0u and force2 always gets the value of force. With the option,
blobs smaller than the threshold gets -1 and otherwise the value
of force.
"write_sha1_file_maybe()" can take 0, 1, or -1 as its fourth
parameter. The reader is left puzzled what the distinction
among these three and decides to read on to figure it out before
complaining too much about the code, but no matter what it does,
doesn't the above logic already feel wrong?
* You already have the size here, so if min_blob_size is set
and the size is larger, you do not even have to call
write_sha1_file() at all.
* If you do so, write_sha1_file_maybe()'s additional parameter
can be "skip the check to see if we have one packed".
> diff --git a/cache.h b/cache.h
> index ec85d93..d0c3030 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -343,6 +343,8 @@ extern int sha1_object_info(const unsigned char *, unsigned long *);
> extern void * read_sha1_file(const unsigned char *sha1, enum object_type *type, unsigned long *size);
> extern int hash_sha1_file(const void *buf, unsigned long len, const char *type, unsigned char *sha1);
> extern int write_sha1_file(void *buf, unsigned long len, const char *type, unsigned char *return_sha1);
> +extern int write_sha1_file_maybe(void *buf, unsigned long len, const char *type,
> + int ignore, unsigned char *return_sha1);
> extern int pretend_sha1_file(void *, unsigned long, enum object_type, unsigned char *);
>
> extern int check_sha1_signature(const unsigned char *sha1, void *buf, unsigned long size, const char *type);
... and it says "ignore". The reader is still puzzled and reads on...
> diff --git a/sha1_file.c b/sha1_file.c
> index 12d2ef2..68b8db8 100644
> --- a/sha1_file.c
> +++ b/sha1_file.c
> @@ -1979,7 +1979,8 @@ int hash_sha1_file(const void *buf, unsigned long len, const char *type,
> return 0;
> }
>
> -int write_sha1_file(void *buf, unsigned long len, const char *type, unsigned char *returnsha1)
> +int write_sha1_file_maybe(void *buf, unsigned long len, const char *type,
> + int ignore, unsigned char *returnsha1)
> {
> int size, ret;
> unsigned char *compressed;
> @@ -1997,7 +1998,7 @@ int write_sha1_file(void *buf, unsigned long len, const char *type, unsigned cha
> filename = sha1_file_name(sha1);
> if (returnsha1)
> hashcpy(returnsha1, sha1);
> - if (has_sha1_file(sha1))
> + if (ignore < 0 || !ignore && has_sha1_file(sha1))
> return 0;
> fd = open(filename, O_RDONLY);
> if (fd >= 0) {
So "ignore" means:
negative: never write it out, even if it does not exist.
zero: do not write it out if it is available (in pack,
or loose, either local or alternate), do
write it out otherwise; it is the same
as the current behaviour of write_sha1_file().
positive: always write it out.
That does not sound like "ignore".
My suggestion would be:
> static void write_object(unsigned nr, enum object_type type,
> void *buf, unsigned long size)
> {
if (!min_blob_size || size < min_blob_size) {
if (write_sha1_file_maybe(buf, size, typename(type),
force, obj_list[nr].sha1) < 0)
die("failed to write object");
}
}
added_object(nr, type, buf, size);
> }
And then.
int write_sha1_file_maybe(void *buf, unsigned long len, const char *type,
int make_loose, unsigned char *returnsha1)
{
...
> filename = sha1_file_name(sha1);
> if (returnsha1)
> hashcpy(returnsha1, sha1);
if (!make_loose && has_sha1_file(sha1))
> return 0;
> fd = open(filename, O_RDONLY);
> if (fd >= 0) {
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] Enhance unpack-objects for extracting large objects
2007-05-25 19:22 ` Junio C Hamano
@ 2007-05-25 19:49 ` Dana How
2007-05-25 19:59 ` Junio C Hamano
2007-05-25 19:52 ` Nicolas Pitre
1 sibling, 1 reply; 7+ messages in thread
From: Dana How @ 2007-05-25 19:49 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Git Mailing List, danahow
On 5/25/07, Junio C Hamano <junkio@cox.net> wrote:
> Dana How <danahow@gmail.com> writes:
> > Nicolas Pitre wrote:
> >> I wouldn't mind a _separate_ tool that would load a pack index,
> >> determine object sizes from it, and then extract big objects to write
> >> them as loose objects ...
> >
> > Below we add two new options to git-unpack-objects:
> >
> > --min-blob-size=<n>:: Unpacking is only done for objects
> > larger than or equal to n kB (uncompressed size by Junio).
>
> Elsewhere you wanted to use --max-* and that was counted in megs;
> isn't using kilo here and meg there inconsistent?
For git-repack --max-pack-size=N I used MB to be consistent
with git-fast-import. I think it makes sense to use MB everywhere
we talk about packfile size.
For the old degunking patch's -max-blob-size=N , and this patch's
-min-blob-size=N , I was using KB to describe blob size.
It looked like I needed finer granularity, at least for experiments.
I think if we always use MB for packfile sizes, and KB
for blob sizes, we should be OK.
> > --force:: Loose objects will be created even if they
> > already exist in the repository packed. This is an option
> > I've wanted before for other reasons.
>
> ... but if they already exist in the repository as loose
> objects, do not replace it.
>
> Usually we do not overwrite existing loose objects and it is one
> of the security measure --- if you have an object already, that
> cannot be touched by somebody who maliciously creats a hash
> colliding loose object and tries to inject it into your
> repository via unpack-objects. It's good that you kept this
> behaviour intact.
I agree.
I'll add the "... but" part to the corrected patch's commit msg.
> > -Objects that already exist in the repository will *not* be unpacked
> > -from the pack-file. Therefore, nothing will be unpacked if you use
> > -this command on a pack-file that exists within the target repository.
> > +By default, objects that already exist in the repository will *not*
> > +be unpacked from the pack-file. Therefore, nothing will be unpacked
> > +if you use this command on a pack-file that exists within the target
> > +repository, unless you specify --force.
> I would want to add:
> If an object already exists unpacked in the repository,
> it will not be replaced with the copy from the pack,
> with or without `--force`.
OK, that's clearer.
> > -static const char unpack_usage[] = "git-unpack-objects [-n] [-q] [-r] < pack-file";
> > +static const char unpack_usage[] =
> > +"git-unpack-objects [-n] [-q] [-r] [--force] [--min-blob-size=N] < pack-file";
>
> Maybe we would want to call it '-f' for consistency. Another
> possibility is the other way around, giving others a longer
> synonyms, like --quiet, but this command is plumbing and I do
> not think long options matters that much, so my preference is to
> do '-f' not '--force'.
I picked the longer one only because I didn't view it as frequently used.
But it will be more used than min-blob-size. I'll change it to -f.
> * You already have the size here, so if min_blob_size is set
> and the size is larger, you do not even have to call
> write_sha1_file() at all.
The way I read the code, it looks like unpack-objects needs
the last argument always to be initialized with the SHA-1 computed
from the object contents. Therefore I always need to call
write_sha1_file(), even if I don't want it to write anything.
> So "ignore" means:
> negative: never write it out, even if it does not exist.
> zero: do not write it out if it is available (in pack,
> or loose, either local or alternate), do
> write it out otherwise; it is the same
> as the current behaviour of write_sha1_file().
> positive: always write it out.
> That does not sound like "ignore".
I agree "ignore" is confusing;
I will change it to "when", which is more consistent with the
_maybe prefix on the function name.
> My suggestion would be:
I like this suggestion, but since I need to call the function
to get the SHA-1, I don't think I can follow it.
I'll send you an updated patch in a moment.
Thanks,
--
Dana L. How danahow@gmail.com +1 650 804 5991 cell
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] Enhance unpack-objects for extracting large objects
2007-05-25 19:22 ` Junio C Hamano
2007-05-25 19:49 ` Dana How
@ 2007-05-25 19:52 ` Nicolas Pitre
1 sibling, 0 replies; 7+ messages in thread
From: Nicolas Pitre @ 2007-05-25 19:52 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Dana How, Git Mailing List
On Fri, 25 May 2007, Junio C Hamano wrote:
> Maybe we would want to call it '-f' for consistency. Another
> possibility is the other way around, giving others a longer
> synonyms, like --quiet, but this command is plumbing and I do
> not think long options matters that much, so my preference is to
> do '-f' not '--force'.
OTOH, I like to have long options for weird or obscur parameters. Their
action is less likely to be presumed by casual inspection of a script
using them. I don't feel strongly about it either ways though.
> > @@ -131,7 +134,9 @@ static void added_object(unsigned nr, enum object_type type,
> > static void write_object(unsigned nr, enum object_type type,
> > void *buf, unsigned long size)
> > {
> > - if (write_sha1_file(buf, size, typename(type), obj_list[nr].sha1) < 0)
> > + int force2 = size < min_blob_size ? -1 : force;
> > + if (write_sha1_file_maybe(buf, size, typename(type),
> > + force2, obj_list[nr].sha1) < 0)
> > die("failed to write object");
> > added_object(nr, type, buf, size);
> > }
>
> Without --min-blob-size option, min_blob_size is initialized to
> 0u and force2 always gets the value of force. With the option,
> blobs smaller than the threshold gets -1 and otherwise the value
> of force.
>
> "write_sha1_file_maybe()" can take 0, 1, or -1 as its fourth
> parameter. The reader is left puzzled what the distinction
> among these three and decides to read on to figure it out before
> complaining too much about the code, but no matter what it does,
> doesn't the above logic already feel wrong?
>
> * You already have the size here, so if min_blob_size is set
> and the size is larger, you do not even have to call
> write_sha1_file() at all.
You still do to get the object's SHA1.
Nicolas
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] Enhance unpack-objects for extracting large objects
2007-05-25 19:49 ` Dana How
@ 2007-05-25 19:59 ` Junio C Hamano
2007-05-25 20:05 ` Nicolas Pitre
0 siblings, 1 reply; 7+ messages in thread
From: Junio C Hamano @ 2007-05-25 19:59 UTC (permalink / raw)
To: Dana How; +Cc: Nicolas Pitre, Git Mailing List
"Dana How" <danahow@gmail.com> writes:
>> * You already have the size here, so if min_blob_size is set
>> and the size is larger, you do not even have to call
>> write_sha1_file() at all.
> The way I read the code, it looks like unpack-objects needs
> the last argument always to be initialized with the SHA-1 computed
> from the object contents. Therefore I always need to call
> write_sha1_file(), even if I don't want it to write anything.
Ah, that is what I missed.
There is a separate function to only hash, named (surprisingly)
"hash_sha1_file(). Maybe you can teach the caller's "don't
write it out" codepath to call it.
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] Enhance unpack-objects for extracting large objects
2007-05-25 19:59 ` Junio C Hamano
@ 2007-05-25 20:05 ` Nicolas Pitre
0 siblings, 0 replies; 7+ messages in thread
From: Nicolas Pitre @ 2007-05-25 20:05 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Dana How, Git Mailing List
On Fri, 25 May 2007, Junio C Hamano wrote:
> "Dana How" <danahow@gmail.com> writes:
>
> >> * You already have the size here, so if min_blob_size is set
> >> and the size is larger, you do not even have to call
> >> write_sha1_file() at all.
> > The way I read the code, it looks like unpack-objects needs
> > the last argument always to be initialized with the SHA-1 computed
> > from the object contents. Therefore I always need to call
> > write_sha1_file(), even if I don't want it to write anything.
>
> Ah, that is what I missed.
>
> There is a separate function to only hash, named (surprisingly)
> "hash_sha1_file(). Maybe you can teach the caller's "don't
> write it out" codepath to call it.
That would be clearer indeed.
Nicolas
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2007-05-25 20:05 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2007-05-25 8:20 [PATCH] Enhance unpack-objects for extracting large objects Dana How
2007-05-25 13:41 ` Nicolas Pitre
2007-05-25 19:22 ` Junio C Hamano
2007-05-25 19:49 ` Dana How
2007-05-25 19:59 ` Junio C Hamano
2007-05-25 20:05 ` Nicolas Pitre
2007-05-25 19:52 ` Nicolas Pitre
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.