* [PATCH 0/2] Support for packs in HTTP
@ 2005-07-10 19:53 Daniel Barkalow
2005-07-10 19:55 ` [PATCH 1/2] Management of packs not yet installed Daniel Barkalow
` (2 more replies)
0 siblings, 3 replies; 15+ messages in thread
From: Daniel Barkalow @ 2005-07-10 19:53 UTC (permalink / raw)
To: git; +Cc: Linus Torvalds, Petr Baudis
This series has one patch which is ready to go in and one that's not
(although it's a reasonable phony for the current state of the git world).
1: Several additional functions are needed in the library to support
progressively getting pack data from some remote location and using it
to determine what else to get.
2: git-http-pull can get packs as appropriate by getting all the index
files first, and then using them to figure out whether the object it's
looking for is in some pack it could get.
Currently, there's no sane way to figure out what pack/index files are
available from an HTTP server. But there only seems to be one pack file
available on an HTTP server at the moment, so this tries to get that
one.
-Daniel
*This .sig left intentionally blank*
^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH 1/2] Management of packs not yet installed
2005-07-10 19:53 [PATCH 0/2] Support for packs in HTTP Daniel Barkalow
@ 2005-07-10 19:55 ` Daniel Barkalow
2005-07-10 19:56 ` [PATCH 2/2] Demo support for packs via HTTP Daniel Barkalow
2005-07-10 23:10 ` [PATCH 0/2] Support for packs in HTTP Junio C Hamano
2 siblings, 0 replies; 15+ messages in thread
From: Daniel Barkalow @ 2005-07-10 19:55 UTC (permalink / raw)
To: git; +Cc: Linus Torvalds, Petr Baudis
Support for parsing index files without pack files, installing pack
files while running, and checking what pack files are available.
Signed-off-by: Daniel Barkalow <barkalow@iabervon.org>
---
commit b686d7a0377c24e05dbed0dafe909dda6c3dfb48
tree ce285b1a0adb4f8d415f72668a77bc1f1f92e1e1
parent 167a4a3308f4a1606e268c2204c98d6999046ae0
author Daniel Barkalow <barkalow@iabervon.org> 1121024924 -0400
committer Daniel Barkalow <barkalow@silva-tulga.(none)> 1121024924 -0400
Index: cache.h
===================================================================
--- dbae854c7c91182c8a124d0b85d802945d1c6223/cache.h (mode:100644 sha1:84d43d366c6145a30865aa65d92ada88ab95bb9f)
+++ ce285b1a0adb4f8d415f72668a77bc1f1f92e1e1/cache.h (mode:100644 sha1:719a77dfabb24e58abd21b7f3a4b846a114e000a)
@@ -161,6 +161,8 @@
extern char *mkpath(const char *fmt, ...);
extern char *git_path(const char *fmt, ...);
extern char *sha1_file_name(const unsigned char *sha1);
+extern char *sha1_pack_name(const unsigned char *sha1);
+extern char *sha1_pack_index_name(const unsigned char *sha1);
int safe_create_leading_directories(char *path);
@@ -189,6 +191,9 @@
extern int has_sha1_pack(const unsigned char *sha1);
extern int has_sha1_file(const unsigned char *sha1);
+extern int has_pack_file(const unsigned char *sha1);
+extern int has_pack_index(const unsigned char *sha1);
+
/* Convert to/from hex/sha1 representation */
extern int get_sha1(const char *str, unsigned char *sha1);
extern int get_sha1_hex(const char *hex, unsigned char *sha1);
@@ -260,6 +265,7 @@
void *pack_base;
unsigned int pack_last_used;
unsigned int pack_use_cnt;
+ unsigned char sha1[20];
char pack_name[0]; /* something like ".git/objects/pack/xxxxx.pack" */
} *packed_git;
@@ -274,7 +280,14 @@
extern int path_match(const char *path, int nr, char **match);
extern int get_ack(int fd, unsigned char *result_sha1);
+extern struct packed_git *parse_pack_index(unsigned char *sha1);
+
extern void prepare_packed_git(void);
+extern void install_packed_git(struct packed_git *pack);
+
+struct packed_git *find_sha1_pack(const unsigned char *sha1,
+ struct packed_git *packs);
+
extern int use_packed_git(struct packed_git *);
extern void unuse_packed_git(struct packed_git *);
extern struct packed_git *add_packed_git(char *, int);
Index: sha1_file.c
===================================================================
--- dbae854c7c91182c8a124d0b85d802945d1c6223/sha1_file.c (mode:100644 sha1:b2914dd2ea629ae974fd4b4c272e77cb04e5c0e0)
+++ ce285b1a0adb4f8d415f72668a77bc1f1f92e1e1/sha1_file.c (mode:100644 sha1:27136fdba0fbf2dd943f2634cb49660cdbf95ec4)
@@ -200,6 +200,56 @@
return base;
}
+char *sha1_pack_name(const unsigned char *sha1)
+{
+ static const char hex[] = "0123456789abcdef";
+ static char *name, *base, *buf;
+ int i;
+
+ if (!base) {
+ const char *sha1_file_directory = get_object_directory();
+ int len = strlen(sha1_file_directory);
+ base = xmalloc(len + 60);
+ sprintf(base, "%s/pack/pack-1234567890123456789012345678901234567890.pack", sha1_file_directory);
+ name = base + len + 11;
+ }
+
+ buf = name;
+
+ for (i = 0; i < 20; i++) {
+ unsigned int val = *sha1++;
+ *buf++ = hex[val >> 4];
+ *buf++ = hex[val & 0xf];
+ }
+
+ return base;
+}
+
+char *sha1_pack_index_name(const unsigned char *sha1)
+{
+ static const char hex[] = "0123456789abcdef";
+ static char *name, *base, *buf;
+ int i;
+
+ if (!base) {
+ const char *sha1_file_directory = get_object_directory();
+ int len = strlen(sha1_file_directory);
+ base = xmalloc(len + 60);
+ sprintf(base, "%s/pack/pack-1234567890123456789012345678901234567890.idx", sha1_file_directory);
+ name = base + len + 11;
+ }
+
+ buf = name;
+
+ for (i = 0; i < 20; i++) {
+ unsigned int val = *sha1++;
+ *buf++ = hex[val >> 4];
+ *buf++ = hex[val & 0xf];
+ }
+
+ return base;
+}
+
struct alternate_object_database *alt_odb;
/*
@@ -360,6 +410,14 @@
int use_packed_git(struct packed_git *p)
{
+ if (!p->pack_size) {
+ struct stat st;
+ // We created the struct before we had the pack
+ stat(p->pack_name, &st);
+ if (!S_ISREG(st.st_mode))
+ die("packfile %s not a regular file", p->pack_name);
+ p->pack_size = st.st_size;
+ }
if (!p->pack_base) {
int fd;
struct stat st;
@@ -387,8 +445,10 @@
* this is cheap.
*/
if (memcmp((char*)(p->index_base) + p->index_size - 40,
- p->pack_base + p->pack_size - 20, 20))
+ p->pack_base + p->pack_size - 20, 20)) {
+
die("packfile %s does not match index.", p->pack_name);
+ }
}
p->pack_last_used = pack_used_ctr++;
p->pack_use_cnt++;
@@ -426,6 +486,37 @@
return p;
}
+struct packed_git *parse_pack_index(unsigned char *sha1)
+{
+ struct packed_git *p;
+ unsigned long idx_size;
+ void *idx_map;
+ char *path = sha1_pack_index_name(sha1);
+
+ if (check_packed_git_idx(path, &idx_size, &idx_map))
+ return NULL;
+
+ path = sha1_pack_name(sha1);
+
+ p = xmalloc(sizeof(*p) + strlen(path) + 2);
+ strcpy(p->pack_name, path);
+ p->index_size = idx_size;
+ p->pack_size = 0;
+ p->index_base = idx_map;
+ p->next = NULL;
+ p->pack_base = NULL;
+ p->pack_last_used = 0;
+ p->pack_use_cnt = 0;
+ memcpy(p->sha1, sha1, 20);
+ return p;
+}
+
+void install_packed_git(struct packed_git *pack)
+{
+ pack->next = packed_git;
+ packed_git = pack;
+}
+
static void prepare_packed_git_one(char *objdir)
{
char path[PATH_MAX];
@@ -999,6 +1090,20 @@
return 0;
}
+struct packed_git *find_sha1_pack(const unsigned char *sha1,
+ struct packed_git *packs)
+{
+ struct packed_git *p;
+ struct pack_entry e;
+
+ for (p = packs; p; p = p->next) {
+ if (find_pack_entry_one(sha1, &e, p))
+ return p;
+ }
+ return NULL;
+
+}
+
int sha1_object_info(const unsigned char *sha1, char *type, unsigned long *sizep)
{
int status;
@@ -1283,6 +1388,22 @@
return 0;
}
+int has_pack_index(const unsigned char *sha1)
+{
+ struct stat st;
+ if (stat(sha1_pack_index_name(sha1), &st))
+ return 0;
+ return 1;
+}
+
+int has_pack_file(const unsigned char *sha1)
+{
+ struct stat st;
+ if (stat(sha1_pack_name(sha1), &st))
+ return 0;
+ return 1;
+}
+
int has_sha1_pack(const unsigned char *sha1)
{
struct pack_entry e;
^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH 2/2] Demo support for packs via HTTP
2005-07-10 19:53 [PATCH 0/2] Support for packs in HTTP Daniel Barkalow
2005-07-10 19:55 ` [PATCH 1/2] Management of packs not yet installed Daniel Barkalow
@ 2005-07-10 19:56 ` Daniel Barkalow
2005-07-11 21:49 ` Darrin Thompson
2005-07-10 23:10 ` [PATCH 0/2] Support for packs in HTTP Junio C Hamano
2 siblings, 1 reply; 15+ messages in thread
From: Daniel Barkalow @ 2005-07-10 19:56 UTC (permalink / raw)
To: git; +Cc: Linus Torvalds, Petr Baudis
Support for downloading the pack file
"e3117bbaf6a59cb53c3f6f0d9b17b9433f0e4135" when appropriate. (Will
support other pack files when the repository has a list of them.)
Signed-off-by: Daniel Barkalow <barkalow@iabervon.org>
---
commit 74132562a2f6cfce9690a5091de7e85bd51d88af
tree c0ae9cb936abac4412aa4a89928f4609d111fd2c
parent b686d7a0377c24e05dbed0dafe909dda6c3dfb48
author Daniel Barkalow <barkalow@iabervon.org> 1121024943 -0400
committer Daniel Barkalow <barkalow@silva-tulga.(none)> 1121024943 -0400
Index: http-pull.c
===================================================================
--- ce285b1a0adb4f8d415f72668a77bc1f1f92e1e1/http-pull.c (mode:100644 sha1:1f9d60b9b1d5eed85b24d96c240666bbfc5a22ed)
+++ c0ae9cb936abac4412aa4a89928f4609d111fd2c/http-pull.c (mode:100644 sha1:2a8d7e71d9447483668cb4a1eb01a096e736f8e3)
@@ -56,6 +56,126 @@
return size;
}
+static int got_indices = 0;
+
+static struct packed_git *packs = NULL;
+
+static int fetch_index(unsigned char *sha1)
+{
+ char *filename;
+ char *url;
+
+ FILE *indexfile;
+
+ if (has_pack_index(sha1))
+ return 0;
+
+ if (get_verbosely)
+ fprintf(stderr, "Getting index for pack %s\n",
+ sha1_to_hex(sha1));
+
+ url = xmalloc(strlen(base) + 64);
+ sprintf(url, "%s/objects/pack/pack-%s.idx",
+ base, sha1_to_hex(sha1));
+
+ filename = sha1_pack_index_name(sha1);
+ indexfile = fopen(filename, "w");
+ if (!indexfile)
+ return error("Unable to open local file %s for pack index",
+ filename);
+
+ curl_easy_setopt(curl, CURLOPT_FILE, indexfile);
+ curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, fwrite);
+ curl_easy_setopt(curl, CURLOPT_URL, url);
+
+ if (curl_easy_perform(curl)) {
+ fclose(indexfile);
+ return error("Unable to get pack index %s", url);
+ }
+
+ fclose(indexfile);
+ return 0;
+}
+
+static int setup_index(unsigned char *sha1)
+{
+ struct packed_git *new_pack;
+ if (has_pack_file(sha1))
+ return 0; // don't list this as something we can get
+
+ if (fetch_index(sha1))
+ return -1;
+
+ new_pack = parse_pack_index(sha1);
+ new_pack->next = packs;
+ packs = new_pack;
+ return 0;
+}
+
+static int fetch_indices(void)
+{
+ unsigned char sha1[20];
+ if (got_indices)
+ return 0;
+ get_sha1_hex("e3117bbaf6a59cb53c3f6f0d9b17b9433f0e4135", sha1);
+ setup_index(sha1);
+ got_indices = 1;
+ return 0;
+}
+
+static int fetch_pack(unsigned char *sha1)
+{
+ char *url;
+ struct packed_git *target;
+ struct packed_git **lst;
+ FILE *packfile;
+ char *filename;
+
+ if (fetch_indices())
+ return -1;
+ target = find_sha1_pack(sha1, packs);
+ if (!target)
+ return error("Couldn't get %s: not separate or in any pack",
+ sha1_to_hex(sha1));
+
+ if (get_verbosely) {
+ fprintf(stderr, "Getting pack %s\n",
+ sha1_to_hex(target->sha1));
+ fprintf(stderr, " which contains %s\n",
+ sha1_to_hex(sha1));
+ }
+
+ url = xmalloc(strlen(base) + 65);
+ sprintf(url, "%s/objects/pack/pack-%s.pack",
+ base, sha1_to_hex(target->sha1));
+
+ filename = sha1_pack_name(target->sha1);
+ packfile = fopen(filename, "w");
+ if (!packfile)
+ return error("Unable to open local file %s for pack",
+ filename);
+
+ curl_easy_setopt(curl, CURLOPT_FILE, packfile);
+ curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, fwrite);
+ curl_easy_setopt(curl, CURLOPT_URL, url);
+
+ if (curl_easy_perform(curl)) {
+ fclose(packfile);
+ return error("Unable to get pack file %s", url);
+ }
+
+ fclose(packfile);
+
+ install_packed_git(target);
+
+ lst = &packs;
+ while (*lst != target)
+ lst = &((*lst)->next);
+ *lst = (*lst)->next;
+
+ return 0;
+}
+
int fetch(unsigned char *sha1)
{
char *hex = sha1_to_hex(sha1);
@@ -67,7 +187,7 @@
local = open(filename, O_WRONLY | O_CREAT | O_EXCL, 0666);
if (local < 0)
- return error("Couldn't open %s\n", filename);
+ return error("Couldn't open local object %s\n", filename);
memset(&stream, 0, sizeof(stream));
@@ -75,6 +195,7 @@
SHA1_Init(&c);
+ curl_easy_setopt(curl, CURLOPT_FAILONERROR, 1);
curl_easy_setopt(curl, CURLOPT_FILE, NULL);
curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, fwrite_sha1_file);
@@ -90,8 +211,12 @@
curl_easy_setopt(curl, CURLOPT_URL, url);
- if (curl_easy_perform(curl))
- return error("Couldn't get %s for %s\n", url, hex);
+ if (curl_easy_perform(curl)) {
+ unlink(filename);
+ if (fetch_pack(sha1))
+ return error("Tried %s", url);
+ return 0;
+ }
close(local);
inflateEnd(&stream);
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 0/2] Support for packs in HTTP
2005-07-10 19:53 [PATCH 0/2] Support for packs in HTTP Daniel Barkalow
2005-07-10 19:55 ` [PATCH 1/2] Management of packs not yet installed Daniel Barkalow
2005-07-10 19:56 ` [PATCH 2/2] Demo support for packs via HTTP Daniel Barkalow
@ 2005-07-10 23:10 ` Junio C Hamano
2005-07-10 23:45 ` Linus Torvalds
2 siblings, 1 reply; 15+ messages in thread
From: Junio C Hamano @ 2005-07-10 23:10 UTC (permalink / raw)
To: Daniel Barkalow; +Cc: Linus Torvalds, Petr Baudis, git
Daniel Barkalow <barkalow@iabervon.org> writes:
> This series has one patch which is ready to go in and one that's not
> (although it's a reasonable phony for the current state of the git world).
I like the general direction in which this patch is leading us.
But before going further, I'd like to see a consensus on the
pack naming convention. The "sha1 of packed object names" was
originally introduced to easily avoid the pack name collisions,
but not enforced, so a user could do the following and still
expect things to work:
$ n=`git-pack-objects pk <list-of-objects`
$ mv pk-$n.pack .git/objects/pack/pk.pack
$ mv pk-$n.idx .git/objects/pack/pk.idx
The first part of this patch makes things stricter, and your
packfile under .git/objects/pack _must_ be named pack-X{40}.pack
(I am not saying this is a bad thing). So I would suggest
either:
- droping the packname parameter from git-pack-objects. Make
the packs always named pack-X{40}.pack (or just X{40}.pack);
also have verify-pack to verify the name of the packfile,
and make sure X{40} part of the name matches what it claims
to contain;
- or drop sha1_pack_name() and let the user name the pack any
way he wants.
I am moderately in favor of the former.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 0/2] Support for packs in HTTP
2005-07-10 23:10 ` [PATCH 0/2] Support for packs in HTTP Junio C Hamano
@ 2005-07-10 23:45 ` Linus Torvalds
2005-07-11 0:20 ` Daniel Barkalow
0 siblings, 1 reply; 15+ messages in thread
From: Linus Torvalds @ 2005-07-10 23:45 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Daniel Barkalow, Petr Baudis, git
On Sun, 10 Jul 2005, Junio C Hamano wrote:
>
> So I would suggest either:
>
> - droping the packname parameter from git-pack-objects. Make
> the packs always named pack-X{40}.pack (or just X{40}.pack);
Well, regardless, we want to be able to specify which directory to write
them to. We don't necessarily want to write them to the current working
directory, nor do we want to write them to their eventual destination in
.git/objects/pack.
In fact, the main current user ("git repack") really wants to write them
to a temporary file, and one that isn't even called "pack-xxx", since it
ends up doing cleanup with
rm -f .tmp-pack-*
in case a previous re-pack was interrupted (in which case it simply cannor
know what the exact name was supposed to be).
So the "basename" ends up being necessary and meaningful regardless. We do
_not_ want to remove that capability.
> also have verify-pack to verify the name of the packfile,
> and make sure X{40} part of the name matches what it claims
> to contain;
Now, that would be fine, but it can't be done. Not the way things are laid
out. A SHA1 checksum depends on the order the data was checksummed in, and
we don't even save that.
> - or drop sha1_pack_name() and let the user name the pack any
> way he wants.
No, I do want to use a SHA1, because I want to make sure that you can mix
packs in a rsync/wget environment where if two files are named the same,
they'll have the same contents.
So we can make verify-pack check that the pack-name matches the style
"pack-xxxxx." naming convention, but we can't match up the sha1 with
anything.
Linus
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 0/2] Support for packs in HTTP
2005-07-10 23:45 ` Linus Torvalds
@ 2005-07-11 0:20 ` Daniel Barkalow
2005-07-11 0:34 ` Linus Torvalds
0 siblings, 1 reply; 15+ messages in thread
From: Daniel Barkalow @ 2005-07-11 0:20 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Junio C Hamano, Petr Baudis, git
On Sun, 10 Jul 2005, Linus Torvalds wrote:
> On Sun, 10 Jul 2005, Junio C Hamano wrote:
> >
> > So I would suggest either:
> >
> > - droping the packname parameter from git-pack-objects. Make
> > the packs always named pack-X{40}.pack (or just X{40}.pack);
>
> Well, regardless, we want to be able to specify which directory to write
> them to. We don't necessarily want to write them to the current working
> directory, nor do we want to write them to their eventual destination in
> .git/objects/pack.
>
> In fact, the main current user ("git repack") really wants to write them
> to a temporary file, and one that isn't even called "pack-xxx", since it
> ends up doing cleanup with
>
> rm -f .tmp-pack-*
>
> in case a previous re-pack was interrupted (in which case it simply cannor
> know what the exact name was supposed to be).
>
> So the "basename" ends up being necessary and meaningful regardless. We do
> _not_ want to remove that capability.
Shouldn't we do the same thing we do with object files? I don't see any
difference in desired behavior.
> > also have verify-pack to verify the name of the packfile,
> > and make sure X{40} part of the name matches what it claims
> > to contain;
>
> Now, that would be fine, but it can't be done. Not the way things are laid
> out. A SHA1 checksum depends on the order the data was checksummed in, and
> we don't even save that.
Why not checksum it in a predictable order, either that of the pack file
or the index? We do care that it's something verifiable, so that people
can't cause intentional collisions (for a DoS) just by naming their packs
after existing packs that users might not have downloaded yet.
-Daniel
*This .sig left intentionally blank*
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 0/2] Support for packs in HTTP
2005-07-11 0:20 ` Daniel Barkalow
@ 2005-07-11 0:34 ` Linus Torvalds
2005-07-11 3:22 ` Daniel Barkalow
0 siblings, 1 reply; 15+ messages in thread
From: Linus Torvalds @ 2005-07-11 0:34 UTC (permalink / raw)
To: Daniel Barkalow; +Cc: Junio C Hamano, Petr Baudis, git
On Sun, 10 Jul 2005, Daniel Barkalow wrote:
> On Sun, 10 Jul 2005, Linus Torvalds wrote:
> >
> > Well, regardless, we want to be able to specify which directory to write
> > them to. We don't necessarily want to write them to the current working
> > directory, nor do we want to write them to their eventual destination in
> > .git/objects/pack.
> >
> > In fact, the main current user ("git repack") really wants to write them
> > to a temporary file, and one that isn't even called "pack-xxx", since it
> > ends up doing cleanup with
> >
> > rm -f .tmp-pack-*
> >
> > in case a previous re-pack was interrupted (in which case it simply cannor
> > know what the exact name was supposed to be).
> >
> > So the "basename" ends up being necessary and meaningful regardless. We do
> > _not_ want to remove that capability.
>
> Shouldn't we do the same thing we do with object files? I don't see any
> difference in desired behavior.
Well, the main difference is that pack-files can be used for many things.
For example, a web interface for getting a pack-file between two releases:
say you knew you had version xyzzy, and you want to get version xyzzy+1,
you could do that through webgit some way even with a "stupid" interface.
Kay already had some patch to generate pack-files for something.
The point being that pack-files are _not_ like objects. Pack-files are
meant for communication. Having them in .git/objects/pack is just one
special case.
> Why not checksum it in a predictable order, either that of the pack file
> or the index? We do care that it's something verifiable, so that people
> can't cause intentional collisions (for a DoS) just by naming their packs
> after existing packs that users might not have downloaded yet.
We could sha1-sum the "sorted by SHA1" list, I guess.
Linus
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 0/2] Support for packs in HTTP
2005-07-11 0:34 ` Linus Torvalds
@ 2005-07-11 3:22 ` Daniel Barkalow
2005-07-11 3:37 ` Linus Torvalds
0 siblings, 1 reply; 15+ messages in thread
From: Daniel Barkalow @ 2005-07-11 3:22 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Junio C Hamano, Petr Baudis, git
On Sun, 10 Jul 2005, Linus Torvalds wrote:
>
>
> On Sun, 10 Jul 2005, Daniel Barkalow wrote:
>
> > On Sun, 10 Jul 2005, Linus Torvalds wrote:
> > >
> > > Well, regardless, we want to be able to specify which directory to write
> > > them to. We don't necessarily want to write them to the current working
> > > directory, nor do we want to write them to their eventual destination in
> > > .git/objects/pack.
> > >
> > > In fact, the main current user ("git repack") really wants to write them
> > > to a temporary file, and one that isn't even called "pack-xxx", since it
> > > ends up doing cleanup with
> > >
> > > rm -f .tmp-pack-*
> > >
> > > in case a previous re-pack was interrupted (in which case it simply cannor
> > > know what the exact name was supposed to be).
> > >
> > > So the "basename" ends up being necessary and meaningful regardless. We do
> > > _not_ want to remove that capability.
> >
> > Shouldn't we do the same thing we do with object files? I don't see any
> > difference in desired behavior.
>
> Well, the main difference is that pack-files can be used for many things.
>
> For example, a web interface for getting a pack-file between two releases:
> say you knew you had version xyzzy, and you want to get version xyzzy+1,
> you could do that through webgit some way even with a "stupid" interface.
> Kay already had some patch to generate pack-files for something.
>
> The point being that pack-files are _not_ like objects. Pack-files are
> meant for communication. Having them in .git/objects/pack is just one
> special case.
Okay, I can see the use for them getting written to arbitrary paths; but I
think that it's worth having a canonical location for a pack that's being
used by the system (either not having been sent anywhere, or after having
been received). Perhaps git-pack-objects should have the base as a
optional argument, with a default of the filename in $GIT_DIR/objects/pack
and an option for sending just the pack file to stdout? I think that
covers everything in order of usefulness, and means that the program deals
with any filename that the user doesn't know in advance.
> > Why not checksum it in a predictable order, either that of the pack file
> > or the index? We do care that it's something verifiable, so that people
> > can't cause intentional collisions (for a DoS) just by naming their packs
> > after existing packs that users might not have downloaded yet.
>
> We could sha1-sum the "sorted by SHA1" list, I guess.
That'd be good; then git-http-pull can validate the hash on the index and
be sure that a matching pack file from a different location still has the
same contents.
-Daniel
*This .sig left intentionally blank*
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 0/2] Support for packs in HTTP
2005-07-11 3:22 ` Daniel Barkalow
@ 2005-07-11 3:37 ` Linus Torvalds
2005-07-11 4:07 ` Daniel Barkalow
0 siblings, 1 reply; 15+ messages in thread
From: Linus Torvalds @ 2005-07-11 3:37 UTC (permalink / raw)
To: Daniel Barkalow; +Cc: Junio C Hamano, Petr Baudis, git
On Sun, 10 Jul 2005, Daniel Barkalow wrote:
>
> Perhaps git-pack-objects should have the base as a optional argument,
> with a default of the filename in $GIT_DIR/objects/pack and an option
> for sending just the pack file to stdout?
You really _mustn't_ try to create the pack directly to the
$GIT_DIR/objects/pack subdirectory - that would make git itself start
possibly using that pack before the index is all done, and that would be
just wrong and nasty.
So you really should _always_ generate the pack somewhere else, and then
move it (pack file first, index file second).
Which is, btw, exactly what "git repack" does, so the solution to the
problem is to just never use git-pack-objects directly if you don't like
the semantics..
Linus
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 0/2] Support for packs in HTTP
2005-07-11 3:37 ` Linus Torvalds
@ 2005-07-11 4:07 ` Daniel Barkalow
2005-07-11 17:44 ` Linus Torvalds
0 siblings, 1 reply; 15+ messages in thread
From: Daniel Barkalow @ 2005-07-11 4:07 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Junio C Hamano, Petr Baudis, git
On Sun, 10 Jul 2005, Linus Torvalds wrote:
> On Sun, 10 Jul 2005, Daniel Barkalow wrote:
> >
> > Perhaps git-pack-objects should have the base as a optional argument,
> > with a default of the filename in $GIT_DIR/objects/pack and an option
> > for sending just the pack file to stdout?
>
> You really _mustn't_ try to create the pack directly to the
> $GIT_DIR/objects/pack subdirectory - that would make git itself start
> possibly using that pack before the index is all done, and that would be
> just wrong and nasty.
>
> So you really should _always_ generate the pack somewhere else, and then
> move it (pack file first, index file second).
It's currently fine ignoring index files without corresponding
pack files (sha1_file.c, line 470). Do you want to make the constraint
that the pack/ directory doesn't have index files for packs that aren't
also there? (I've been putting the index files for packs that might be
possibile to get there, and relying on the above check to make sure that
they don't affect anything if it hasn't fetched the pack.)
Of course, we should never write to files in locations that anything looks
at; we want everything to appear atomically, completely written and
verified. But there's nothing wrong with having the C code place the
objects, which is certainly going to be necessary in the case of
downloading them by HTTP, since the program will want to place them and
enable them while running.
-Daniel
*This .sig left intentionally blank*
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 0/2] Support for packs in HTTP
2005-07-11 4:07 ` Daniel Barkalow
@ 2005-07-11 17:44 ` Linus Torvalds
2005-07-11 20:03 ` Daniel Barkalow
0 siblings, 1 reply; 15+ messages in thread
From: Linus Torvalds @ 2005-07-11 17:44 UTC (permalink / raw)
To: Daniel Barkalow; +Cc: Junio C Hamano, Petr Baudis, git
On Mon, 11 Jul 2005, Daniel Barkalow wrote:
> On Sun, 10 Jul 2005, Linus Torvalds wrote:
>
> >
> > You really _mustn't_ try to create the pack directly to the
> > $GIT_DIR/objects/pack subdirectory - that would make git itself start
> > possibly using that pack before the index is all done, and that would be
> > just wrong and nasty.
> >
> > So you really should _always_ generate the pack somewhere else, and then
> > move it (pack file first, index file second).
>
> It's currently fine ignoring index files without corresponding
> pack files (sha1_file.c, line 470).
That doesn't help.
Redgardless of which order you write them (and you _will_ write the
pack-file first), you'll find that at some point you have both files, but
one or the other isn't fully written, ie they are unusable.
And yes, you can handle that by always checking the SHA1 of the files when
you open them, but the fact is, you shouldn't need to, just to use it.
Checking the SHA1 of the pack-file in particular is very expensive (since
it's potentially a huge file, and you don't even want to read all of it).
So you could have rules like: pack-file must get populated first, and
index file must be SHA1-checked by all users.
Or, you could just have a saner rule: create the pack-files somewhere
else, and move them atomically to the right place. Problem solved.
So that's what I decided the rule is: never ever have a partial file, and
thus you can by definition use them immediately when you see both files.
But that requires that you write them under another name than the final
one. And since you want that _anyway_ for other uses, you don't hide that
inside "git-pack-objects", but you make it an exported interface.
Linus
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 0/2] Support for packs in HTTP
2005-07-11 17:44 ` Linus Torvalds
@ 2005-07-11 20:03 ` Daniel Barkalow
0 siblings, 0 replies; 15+ messages in thread
From: Daniel Barkalow @ 2005-07-11 20:03 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Junio C Hamano, Petr Baudis, git
On Mon, 11 Jul 2005, Linus Torvalds wrote:
>
>
> On Mon, 11 Jul 2005, Daniel Barkalow wrote:
> > On Sun, 10 Jul 2005, Linus Torvalds wrote:
> >
> > >
> > > You really _mustn't_ try to create the pack directly to the
> > > $GIT_DIR/objects/pack subdirectory - that would make git itself start
> > > possibly using that pack before the index is all done, and that would be
> > > just wrong and nasty.
> > >
> > > So you really should _always_ generate the pack somewhere else, and then
> > > move it (pack file first, index file second).
> >
> > It's currently fine ignoring index files without corresponding
> > pack files (sha1_file.c, line 470).
>
> That doesn't help.
Well, it means that the order you move them doesn't matter, because it
will ignore the pair if either hasn't been moved.
> Redgardless of which order you write them (and you _will_ write the
> pack-file first), you'll find that at some point you have both files, but
> one or the other isn't fully written, ie they are unusable.
(Off topic: note that git-http-pull writes the _index_ first, because it
fetches it to determine if it should fetch the pack)
> And yes, you can handle that by always checking the SHA1 of the files when
> you open them, but the fact is, you shouldn't need to, just to use it.
> Checking the SHA1 of the pack-file in particular is very expensive (since
> it's potentially a huge file, and you don't even want to read all of it).
IIRC, we check the size of the pack file and there are hashes around the
ends of the two files which have to match; but this is a die() check, not
an ignore check, so we just crash with a clear error message rather than
doing crazy stuff (like reading from beyond the end of the mmap).
> So that's what I decided the rule is: never ever have a partial file, and
> thus you can by definition use them immediately when you see both files.
>
> But that requires that you write them under another name than the final
> one. And since you want that _anyway_ for other uses, you don't hide that
> inside "git-pack-objects", but you make it an exported interface.
We should never write anything under the final name, anyway, for just this
reason; we already use open/write/close/rename for objects, refs, and
cache (maybe not working directory files, though). I think we're actually
agreeing on this.
My position is that the temporary location should be something like
{final-name}.part, such that it doesn't match *.idx or *.pack beforehand
(so it doesn't look like a complete file that you might want to send to
someone) and it doesn't have to worry about EXDEV on the rename. Also, I
would ideally like to be able to resume an interrupted download, which
means that it would have to find the partial file in a predictable
location, given what it's supposed to contain.
-Daniel
*This .sig left intentionally blank*
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 2/2] Demo support for packs via HTTP
2005-07-10 19:56 ` [PATCH 2/2] Demo support for packs via HTTP Daniel Barkalow
@ 2005-07-11 21:49 ` Darrin Thompson
2005-07-12 1:54 ` Daniel Barkalow
0 siblings, 1 reply; 15+ messages in thread
From: Darrin Thompson @ 2005-07-11 21:49 UTC (permalink / raw)
To: Daniel Barkalow; +Cc: git, Linus Torvalds, Petr Baudis
On Sun, 2005-07-10 at 15:56 -0400, Daniel Barkalow wrote:
> + curl_easy_setopt(curl, CURLOPT_FILE, indexfile);
> + curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, fwrite);
> + curl_easy_setopt(curl, CURLOPT_URL, url);
I was hoping to send in a patch which would turn on user auth and turn
off ssl peer verification.
Your (preliminary obviously) patch puts curl handling in two places. Is
there a place were I can safely start working on adding the needed
setopts?
--
Darrin
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 2/2] Demo support for packs via HTTP
2005-07-11 21:49 ` Darrin Thompson
@ 2005-07-12 1:54 ` Daniel Barkalow
2005-07-12 13:35 ` Darrin Thompson
0 siblings, 1 reply; 15+ messages in thread
From: Daniel Barkalow @ 2005-07-12 1:54 UTC (permalink / raw)
To: Darrin Thompson; +Cc: git, Linus Torvalds, Petr Baudis
On Mon, 11 Jul 2005, Darrin Thompson wrote:
> On Sun, 2005-07-10 at 15:56 -0400, Daniel Barkalow wrote:
> > + curl_easy_setopt(curl, CURLOPT_FILE, indexfile);
> > + curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, fwrite);
> > + curl_easy_setopt(curl, CURLOPT_URL, url);
>
> I was hoping to send in a patch which would turn on user auth and turn
> off ssl peer verification.
>
> Your (preliminary obviously) patch puts curl handling in two places. Is
> there a place were I can safely start working on adding the needed
> setopts?
If I understand the curl documentation, you should be able to set options
on the curl object when it has just been created, if those options aren't
going to change between requests. Note that I make requests from multiple
places, but I use the same curl object for all of them.
-Daniel
*This .sig left intentionally blank*
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 2/2] Demo support for packs via HTTP
2005-07-12 1:54 ` Daniel Barkalow
@ 2005-07-12 13:35 ` Darrin Thompson
0 siblings, 0 replies; 15+ messages in thread
From: Darrin Thompson @ 2005-07-12 13:35 UTC (permalink / raw)
To: Daniel Barkalow; +Cc: git, Linus Torvalds, Petr Baudis
Daniel Barkalow wrote:
> If I understand the curl documentation, you should be able to set options
> on the curl object when it has just been created, if those options aren't
> going to change between requests. Note that I make requests from multiple
> places, but I use the same curl object for all of them.
>
I didn't realize that it was the same object. Great!
--
Darrin
^ permalink raw reply [flat|nested] 15+ messages in thread
end of thread, other threads:[~2005-07-12 13:39 UTC | newest]
Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2005-07-10 19:53 [PATCH 0/2] Support for packs in HTTP Daniel Barkalow
2005-07-10 19:55 ` [PATCH 1/2] Management of packs not yet installed Daniel Barkalow
2005-07-10 19:56 ` [PATCH 2/2] Demo support for packs via HTTP Daniel Barkalow
2005-07-11 21:49 ` Darrin Thompson
2005-07-12 1:54 ` Daniel Barkalow
2005-07-12 13:35 ` Darrin Thompson
2005-07-10 23:10 ` [PATCH 0/2] Support for packs in HTTP Junio C Hamano
2005-07-10 23:45 ` Linus Torvalds
2005-07-11 0:20 ` Daniel Barkalow
2005-07-11 0:34 ` Linus Torvalds
2005-07-11 3:22 ` Daniel Barkalow
2005-07-11 3:37 ` Linus Torvalds
2005-07-11 4:07 ` Daniel Barkalow
2005-07-11 17:44 ` Linus Torvalds
2005-07-11 20:03 ` Daniel Barkalow
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).