* [RFC][RESEND][PATCH] Allow fetching from multiple repositories at once
@ 2006-09-23 16:45 Petr Baudis
2006-09-23 16:57 ` A Large Angry SCM
` (2 more replies)
0 siblings, 3 replies; 13+ messages in thread
From: Petr Baudis @ 2006-09-23 16:45 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git
This patch enables fetching multiple repositories at once over the Git
protocol (and SSH, and locally if git-fetch-pack is your cup of coffee
there). This is done especially for the xorg people who have tons of
repositories and dislike pulls much slower than they were used to with CVS.
I'm eager to hear how this affects the situation.
It's kind of "superproject" thing, basically, taking reverse approach than
the subproject ideas. However, in practice I think it can be used for
subprojects quite well and perhaps if I find some spare time during the day
I will add the lightweight subproject support to Cogito, using this.
So, you need some kind of porcelain for this. The idea is that instead of
telling git-fetch-pack a single repository, you pass multiple --repo=
parameters and refs always "belong" to the latest mentioned repository;
when outputting the new ref values, the appropriate repo is mentioned near
each ref. In order for this to be useful, on your local side you should
share the objects database in some way - either using alternates (then
you must fetch to an object database reachable from everywhere) or symlinked
object databases.
You still need to pass git-fetch-pack some URL in addition to the
repositories - it is used only for git_connect(), the purpose is that
repositories must be local directories so if you want to talk remote, you
need to do something like
git-fetch-pack git://kernel.org/pub/scm/git/git.git --repo=/pub/scm/git/git.git master next --repo=/pub/scm/cogito/cogito.git master
The implementation is, well... rather hackish to say the least. :-)
We can do something nicer if it turns out to be a boost huge enough
in reality too.
The main target of this are the Xorg people who have plenty of small repos
and frequently want to update many of them. It would be nice if someone
from Xorg reading this and concerned with this could measure the
difference.
Signed-off-by: Petr Baudis <pasky@suse.cz>
---
This is an even more awful hack than the previous version, because of
did_refs. I wonder how ugly would it be to have that in stock Git
version, or if there's a good way to eliminate that hack before
proceeding with full libification.
This is a second resend. The first one was from the end of July but
flyed by with not much of a response.
cache.h | 7 ++++-
connect.c | 11 +++++---
daemon.c | 9 +++++--
fetch-pack.c | 58 ++++++++++++++++++++++++++++++---------------
path.c | 5 ++++
peek-remote.c | 4 ++-
refs.c | 6 ++---
send-pack.c | 4 ++-
sha1_file.c | 6 ++---
upload-pack.c | 74 +++++++++++++++++++++++++++++++++++++++++++++++++--------
10 files changed, 136 insertions(+), 48 deletions(-)
diff --git a/cache.h b/cache.h
index a7f673d..59cf0ed 100644
--- a/cache.h
+++ b/cache.h
@@ -240,6 +240,7 @@ int git_config_perm(const char *var, con
int adjust_shared_perm(const char *path);
int safe_create_leading_directories(char *path);
char *enter_repo(char *path, int strict);
+int security_repo_check(int export_all_trees);
/* Read and unpack a sha1 file into memory, write memory to a sha1 file */
extern int sha1_object_info(const unsigned char *, char *, unsigned long *);
@@ -332,6 +333,7 @@ extern struct alternate_object_database
char base[FLEX_ARRAY]; /* more */
} *alt_odb_list;
extern void prepare_alt_odb(void);
+extern int link_alt_odb_entry(const char * entry, int len, const char * relative_base, int depth, int allow_self);
extern struct packed_git {
struct packed_git *next;
@@ -359,6 +361,7 @@ struct ref {
unsigned char new_sha1[20];
unsigned char force;
struct ref *peer_ref; /* when renaming */
+ char *repo; /* when doing multi-repo operations */
char name[FLEX_ARRAY]; /* more */
};
@@ -368,11 +371,11 @@ #define REF_TAGS (1u << 2)
extern pid_t git_connect(int fd[2], char *url, const char *prog);
extern int finish_connect(pid_t pid);
-extern int path_match(const char *path, int nr, char **match);
+extern int path_match(const char *path, const char *repo, int nr, char **match, char **match_repo);
extern int match_refs(struct ref *src, struct ref *dst, struct ref ***dst_tail,
int nr_refspec, char **refspec, int all);
extern int get_ack(int fd, unsigned char *result_sha1);
-extern struct ref **get_remote_heads(int in, struct ref **list, int nr_match, char **match, unsigned int flags);
+extern struct ref **get_remote_heads(int in, struct ref **list, int nr_match, char **match, char *repo, unsigned int flags);
extern int server_supports(const char *feature);
extern struct packed_git *parse_pack_index(unsigned char *sha1);
diff --git a/connect.c b/connect.c
index c55a20a..0b62406 100644
--- a/connect.c
+++ b/connect.c
@@ -45,9 +45,8 @@ static int check_ref(const char *name, i
*/
struct ref **get_remote_heads(int in, struct ref **list,
int nr_match, char **match,
- unsigned int flags)
+ char *repo, unsigned int flags)
{
- *list = NULL;
for (;;) {
struct ref *ref;
unsigned char old_sha1[20];
@@ -74,11 +73,12 @@ struct ref **get_remote_heads(int in, st
if (!check_ref(name, name_len, flags))
continue;
- if (nr_match && !path_match(name, nr_match, match))
+ if (nr_match && !path_match(name, NULL, nr_match, match, NULL))
continue;
ref = xcalloc(1, sizeof(*ref) + len - 40);
hashcpy(ref->old_sha1, old_sha1);
memcpy(ref->name, buffer + 41, len - 40);
+ ref->repo = repo;
*list = ref;
list = &ref->next;
}
@@ -112,7 +112,8 @@ int get_ack(int fd, unsigned char *resul
die("git-fetch_pack: expected ACK/NAK, got '%s'", line);
}
-int path_match(const char *path, int nr, char **match)
+int path_match(const char *path, const char *repo, int nr, char **match,
+ char **match_repo)
{
int i;
int pathlen = strlen(path);
@@ -121,6 +122,8 @@ int path_match(const char *path, int nr,
char *s = match[i];
int len = strlen(s);
+ if (match_repo && repo != match_repo[i])
+ continue;
if (!len || len > pathlen)
continue;
if (memcmp(path + pathlen - len, s, len))
diff --git a/daemon.c b/daemon.c
index eb4f3f1..d0b44b2 100644
--- a/daemon.c
+++ b/daemon.c
@@ -321,8 +321,7 @@ static int run_service(struct interp *it
* path_ok() uses enter_repo() and does whitelist checking.
* We only need to make sure the repository is exported.
*/
-
- if (!export_all_trees && access("git-daemon-export-ok", F_OK)) {
+ if (!security_repo_check(export_all_trees)) {
logerror("'%s': repository not exported.", path);
errno = EACCES;
return -1;
@@ -355,11 +354,15 @@ static int upload_pack(void)
{
/* Timeout as string */
char timeout_buf[64];
+ char checkexport_buf[64];
snprintf(timeout_buf, sizeof timeout_buf, "--timeout=%u", timeout);
+ strcpy(checkexport_buf, "--check-export");
+ if (export_all_trees) checkexport_buf[1] = '\0'; /* XXX */
+
/* git-upload-pack only ever reads stuff, so this is safe */
- execl_git_cmd("upload-pack", "--strict", timeout_buf, ".", NULL);
+ execl_git_cmd("upload-pack", "--strict", timeout_buf, checkexport_buf, ".", NULL);
return -1;
}
diff --git a/fetch-pack.c b/fetch-pack.c
index 99ac08b..edbdf19 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -282,7 +282,8 @@ static void mark_recent_complete_commits
}
}
-static void filter_refs(struct ref **refs, int nr_match, char **match)
+static void filter_refs(struct ref **refs, int nr_match, char **match,
+ char **match_repo)
{
struct ref **return_refs;
struct ref *newlist = NULL;
@@ -313,7 +314,7 @@ static void filter_refs(struct ref **ref
continue;
}
else {
- int order = path_match(ref->name, nr_match, match);
+ int order = path_match(ref->name, ref->repo, nr_match, match, match_repo);
if (order) {
return_refs[order-1] = ref;
continue; /* we will link it later */
@@ -338,7 +339,8 @@ static void filter_refs(struct ref **ref
*refs = newlist;
}
-static int everything_local(struct ref **refs, int nr_match, char **match)
+static int everything_local(struct ref **refs, int nr_match, char **match,
+ char **match_repo)
{
struct ref *ref;
int retval;
@@ -387,7 +389,7 @@ static int everything_local(struct ref *
}
}
- filter_refs(refs, nr_match, match);
+ filter_refs(refs, nr_match, match, match_repo);
for (retval = 1, ref = *refs; ref ; ref = ref->next) {
const unsigned char *remote = ref->old_sha1;
@@ -415,13 +417,15 @@ static int everything_local(struct ref *
return retval;
}
-static int fetch_pack(int fd[2], int nr_match, char **match)
+static int fetch_pack(int fd[2], int nr_repo, char **repo,
+ int nr_match, char **match, char **match_repo)
{
- struct ref *ref;
+ struct ref *ref = NULL, **refn;
unsigned char sha1[20];
int status;
+ int i;
- get_remote_heads(fd[0], &ref, 0, NULL, 0);
+ refn = get_remote_heads(fd[0], &ref, 0, NULL, nr_repo ? repo[0] : NULL, 0);
if (server_supports("multi_ack")) {
if (verbose)
fprintf(stderr, "Server supports multi_ack\n");
@@ -437,11 +441,19 @@ static int fetch_pack(int fd[2], int nr_
fprintf(stderr, "Server supports side-band\n");
use_sideband = 1;
}
+ if (!server_supports("switch-repo") && nr_repo > 1)
+ die("server does not support fetching multiple repositories at once");
+
+ for (i = 1; i < nr_repo; i++) {
+ packet_write(fd[1], "switch-repo %s", repo[i]);
+ refn = get_remote_heads(fd[0], refn, 0, NULL, repo[i], 0);
+ }
+
if (!ref) {
packet_flush(fd[1]);
die("no matching remote head");
}
- if (everything_local(&ref, nr_match, match)) {
+ if (everything_local(&ref, nr_match, match, match_repo)) {
packet_flush(fd[1]);
goto all_done;
}
@@ -462,8 +474,9 @@ static int fetch_pack(int fd[2], int nr_
all_done:
while (ref) {
- printf("%s %s\n",
- sha1_to_hex(ref->old_sha1), ref->name);
+ printf("%s %s%s%s\n",
+ sha1_to_hex(ref->old_sha1), ref->name,
+ ref->repo ? " " : "", ref->repo ? : "");
ref = ref->next;
}
return 0;
@@ -471,15 +484,16 @@ static int fetch_pack(int fd[2], int nr_
int main(int argc, char **argv)
{
- int i, ret, nr_heads;
- char *dest = NULL, **heads;
+ int i, ret, nr_repo = 0, nr_heads = 0;
+ char *dest = NULL;
+ char **repo = xmalloc(argc * sizeof(*repo));
+ char **heads = xmalloc(argc * sizeof(*heads));
+ char **heads_repo = xmalloc(argc * sizeof(*heads_repo));
int fd[2];
pid_t pid;
setup_git_directory();
- nr_heads = 0;
- heads = NULL;
for (i = 1; i < argc; i++) {
char *arg = argv[i];
@@ -504,16 +518,22 @@ int main(int argc, char **argv)
fetch_all = 1;
continue;
}
+ if (!strncmp("--repo=", arg, 7)) {
+ repo[nr_repo++] = arg + 7;
+ continue;
+ }
if (!strcmp("-v", arg)) {
verbose = 1;
continue;
}
usage(fetch_pack_usage);
}
- dest = arg;
- heads = argv + i + 1;
- nr_heads = argc - i - 1;
- break;
+ if (!dest) {
+ dest = arg;
+ continue;
+ }
+ heads_repo[nr_heads] = nr_repo ? repo[nr_repo - 1] : NULL;
+ heads[nr_heads++] = arg;
}
if (!dest)
usage(fetch_pack_usage);
@@ -522,7 +542,7 @@ int main(int argc, char **argv)
pid = git_connect(fd, dest, exec);
if (pid < 0)
return 1;
- ret = fetch_pack(fd, nr_heads, heads);
+ ret = fetch_pack(fd, nr_repo, repo, nr_heads, heads, heads_repo);
close(fd[0]);
close(fd[1]);
ret |= finish_connect(pid);
diff --git a/path.c b/path.c
index bb89fb0..d669cac 100644
--- a/path.c
+++ b/path.c
@@ -251,6 +251,11 @@ char *enter_repo(char *path, int strict)
return NULL;
}
+int security_repo_check(int export_all_trees)
+{
+ return (export_all_trees || access("git-daemon-export-ok", F_OK));
+}
+
int adjust_shared_perm(const char *path)
{
struct stat st;
diff --git a/peek-remote.c b/peek-remote.c
index 353da00..2cf9e50 100644
--- a/peek-remote.c
+++ b/peek-remote.c
@@ -8,9 +8,9 @@ static const char *exec = "git-upload-pa
static int peek_remote(int fd[2], unsigned flags)
{
- struct ref *ref;
+ struct ref *ref = NULL;
- get_remote_heads(fd[0], &ref, 0, NULL, flags);
+ get_remote_heads(fd[0], &ref, 0, NULL, NULL, flags);
packet_flush(fd[1]);
while (ref) {
diff --git a/refs.c b/refs.c
index 9b3986a..10e588e 100644
--- a/refs.c
+++ b/refs.c
@@ -144,12 +144,12 @@ nextitem:;
return list;
}
+int did_read_loose_refs = 0;
static struct ref_list *get_loose_refs(void)
{
- static int did_refs = 0;
static struct ref_list *refs = NULL;
- if (!did_refs) {
+ if (!did_read_loose_refs) {
/* We need to make sure refs/heads and refs/tags always
* go before everything else. */
char *except[] = {"refs/heads", "refs/tags", NULL};
@@ -167,7 +167,7 @@ static struct ref_list *get_loose_refs(v
refs = other_refs;
}
- did_refs = 1;
+ did_read_loose_refs = 1;
}
return refs;
}
diff --git a/send-pack.c b/send-pack.c
index fbd792c..e1cf58b 100644
--- a/send-pack.c
+++ b/send-pack.c
@@ -266,14 +266,14 @@ static int receive_status(int in)
static int send_pack(int in, int out, int nr_refspec, char **refspec)
{
- struct ref *ref;
+ struct ref *ref = NULL;
int new_refs;
int ret = 0;
int ask_for_status_report = 0;
int expect_status_report = 0;
/* No funny business with the matcher */
- remote_tail = get_remote_heads(in, &remote_refs, 0, NULL, REF_NORMAL);
+ remote_tail = get_remote_heads(in, &remote_refs, 0, NULL, NULL, REF_NORMAL);
get_local_heads();
/* Does the other end support the reporting? */
diff --git a/sha1_file.c b/sha1_file.c
index 27b1ebb..14f06c9 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -235,7 +235,7 @@ static void read_info_alternates(const c
* SHA1, an extra slash for the first level indirection, and the
* terminating NUL.
*/
-static int link_alt_odb_entry(const char * entry, int len, const char * relative_base, int depth)
+int link_alt_odb_entry(const char * entry, int len, const char * relative_base, int depth, int allow_self)
{
struct stat st;
const char *objdir = get_object_directory();
@@ -285,7 +285,7 @@ static int link_alt_odb_entry(const char
return -1;
}
}
- if (!memcmp(ent->base, objdir, pfxlen)) {
+ if (!allow_self && !memcmp(ent->base, objdir, pfxlen)) {
free(ent);
return -1;
}
@@ -331,7 +331,7 @@ static void link_alt_odb_entries(const c
relative_base, last);
} else {
link_alt_odb_entry(last, cp - last,
- relative_base, depth);
+ relative_base, depth, 0);
}
}
while (cp < ep && *cp == sep)
diff --git a/upload-pack.c b/upload-pack.c
index 4e53edd..57ae127 100644
--- a/upload-pack.c
+++ b/upload-pack.c
@@ -10,7 +10,7 @@ #include "object.h"
#include "commit.h"
#include "exec_cmd.h"
-static const char upload_pack_usage[] = "git-upload-pack [--strict] [--timeout=nn] <dir>";
+static const char upload_pack_usage[] = "git-upload-pack [--strict] [--timeout=nn] [--check-export] <dir>";
/* bits #0..7 in revision.h, #8..10 in commit.c */
#define THEY_HAVE (1u << 11)
@@ -30,6 +30,8 @@ static unsigned int timeout;
* otherwise maximum packet size (up to 65520 bytes).
*/
static int use_sideband;
+static int check_export;
+static int strict;
static void reset_timeout(void)
{
@@ -455,7 +457,7 @@ static int get_common_commits(void)
}
}
-static void receive_needs(void)
+static char *receive_needs(void)
{
static char line[1000];
int len;
@@ -466,8 +468,10 @@ static void receive_needs(void)
len = packet_read_line(0, line, sizeof(line));
reset_timeout();
if (!len)
- return;
+ return NULL;
+ if (!strncmp("switch-repo ", line, 12))
+ return line+12;
if (strncmp("want ", line, 5) ||
get_sha1_hex(line+5, sha1_buf))
die("git-upload-pack: protocol error, "
@@ -501,7 +505,7 @@ static void receive_needs(void)
static int send_ref(const char *refname, const unsigned char *sha1, int flag, void *cb_data)
{
- static const char *capabilities = "multi_ack thin-pack side-band side-band-64k";
+ static const char *capabilities = "multi_ack thin-pack side-band side-band-64k switch-repo";
struct object *o = parse_object(sha1);
if (!o)
@@ -524,13 +528,60 @@ static int send_ref(const char *refname,
return 0;
}
+extern int did_read_loose_refs;
static void upload_pack(void)
{
- reset_timeout();
- head_ref(send_ref, NULL);
- for_each_ref(send_ref, NULL);
- packet_flush(1);
- receive_needs();
+ int multirepo = 0;
+
+ while (1) {
+ char *repo;
+ char cwd[PATH_MAX];
+
+ reset_timeout();
+ head_ref(send_ref, NULL);
+ for_each_ref(send_ref, NULL);
+ packet_flush(1);
+ repo = receive_needs();
+ if (!repo)
+ /* This is where we go most of the time, unless this is
+ * a multirepo fetch */
+ break;
+ multirepo++;
+
+ fprintf(stderr, "git-upload-pack: switching to repo %s\n", repo);
+
+ /* So that we still find objects of the original repository... */
+ getcwd(cwd, PATH_MAX);
+ if (strlen(cwd) < PATH_MAX - 8)
+ strcat(cwd, "/objects");
+ link_alt_odb_entry(cwd, strlen(cwd), NULL, 0, 1);
+
+ if (!enter_repo(repo, strict) || !security_repo_check(!check_export))
+ die("git-upload-pack: security violation");
+ /* This is a hack and a memory leak. If you are seeing this
+ * in a stock Git version, someone did a horrible mistake. */
+ did_read_loose_refs = 0;
+ }
+
+ if (multirepo) {
+#define ALTENV_SIZE 65536
+ /* Propagate all the repositories to the children */
+ char altenv[ALTENV_SIZE], *p = altenv;
+ struct alternate_object_database *alt;
+ strcpy(p, ALTERNATE_DB_ENVIRONMENT "=");
+ p += sizeof(ALTERNATE_DB_ENVIRONMENT);
+ for (alt = alt_odb_list; alt; alt = alt->next) {
+ strncpy(p, alt->base, alt->name - alt->base);
+ p += alt->name - alt->base;
+ if (p - altenv < ALTENV_SIZE)
+ *p++ = ':';
+ if (p - altenv >= ALTENV_SIZE)
+ die("fetching too many repositories");
+ }
+ p[-1] = '\0';
+ putenv(altenv);
+ }
+
if (want_obj.nr) {
get_common_commits();
create_pack_file();
@@ -541,7 +592,6 @@ int main(int argc, char **argv)
{
char *dir;
int i;
- int strict = 0;
for (i = 1; i < argc; i++) {
char *arg = argv[i];
@@ -556,6 +606,10 @@ int main(int argc, char **argv)
timeout = atoi(arg+10);
continue;
}
+ if (!strcmp(arg, "--check-export")) {
+ check_export = 1;
+ continue;
+ }
if (!strcmp(arg, "--")) {
i++;
break;
^ permalink raw reply related [flat|nested] 13+ messages in thread* Re: [RFC][RESEND][PATCH] Allow fetching from multiple repositories at once
2006-09-23 16:45 [RFC][RESEND][PATCH] Allow fetching from multiple repositories at once Petr Baudis
@ 2006-09-23 16:57 ` A Large Angry SCM
2006-09-23 17:04 ` Petr Baudis
2006-09-23 17:06 ` Petr Baudis
2006-09-23 18:45 ` Jakub Narebski
2 siblings, 1 reply; 13+ messages in thread
From: A Large Angry SCM @ 2006-09-23 16:57 UTC (permalink / raw)
To: Petr Baudis; +Cc: Junio C Hamano, git
Petr Baudis wrote:
> This patch enables fetching multiple repositories at once over the Git
> protocol (and SSH, and locally if git-fetch-pack is your cup of coffee
> there). This is done especially for the xorg people who have tons of
> repositories and dislike pulls much slower than they were used to with CVS.
> I'm eager to hear how this affects the situation.
>
> It's kind of "superproject" thing, basically, taking reverse approach than
> the subproject ideas. However, in practice I think it can be used for
> subprojects quite well and perhaps if I find some spare time during the day
> I will add the lightweight subproject support to Cogito, using this.
>
> So, you need some kind of porcelain for this. The idea is that instead of
> telling git-fetch-pack a single repository, you pass multiple --repo=
> parameters and refs always "belong" to the latest mentioned repository;
> when outputting the new ref values, the appropriate repo is mentioned near
> each ref. In order for this to be useful, on your local side you should
> share the objects database in some way - either using alternates (then
> you must fetch to an object database reachable from everywhere) or symlinked
> object databases.
>
> You still need to pass git-fetch-pack some URL in addition to the
> repositories - it is used only for git_connect(), the purpose is that
> repositories must be local directories so if you want to talk remote, you
> need to do something like
>
> git-fetch-pack git://kernel.org/pub/scm/git/git.git --repo=/pub/scm/git/git.git master next --repo=/pub/scm/cogito/cogito.git master
>
Can't this be done with a simple wrapper around git-fetch* ?
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [RFC][RESEND][PATCH] Allow fetching from multiple repositories at once
2006-09-23 16:57 ` A Large Angry SCM
@ 2006-09-23 17:04 ` Petr Baudis
2006-09-23 17:23 ` Johannes Schindelin
0 siblings, 1 reply; 13+ messages in thread
From: Petr Baudis @ 2006-09-23 17:04 UTC (permalink / raw)
To: A Large Angry SCM; +Cc: Junio C Hamano, git
Dear diary, on Sat, Sep 23, 2006 at 06:57:36PM CEST, I got a letter
where A Large Angry SCM <gitzilla@gmail.com> said that...
> Petr Baudis wrote:
> >You still need to pass git-fetch-pack some URL in addition to the
> >repositories - it is used only for git_connect(), the purpose is that
> >repositories must be local directories so if you want to talk remote, you
> >need to do something like
> >
> > git-fetch-pack git://kernel.org/pub/scm/git/git.git
> > --repo=/pub/scm/git/git.git master next
> > --repo=/pub/scm/cogito/cogito.git master
> >
>
> Can't this be done with a simple wrapper around git-fetch* ?
The point is that that is too slow (the difference is noticeable
especiall in the case there is no new stuff ready), you need to create a
connection and do whole lot of work for each of the repositories.
--
Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
#!/bin/perl -sp0777i<X+d*lMLa^*lN%0]dsXx++lMlN/dsM0<j]dsj
$/=unpack('H*',$_);$_=`echo 16dio\U$k"SK$/SM$n\EsN0p[lN*1
lK[d2%Sa2/d0$^Ixp"|dc`;s/\W//g;$_=pack('H*',/((..)*)$/)
^ permalink raw reply [flat|nested] 13+ messages in thread* Re: [RFC][RESEND][PATCH] Allow fetching from multiple repositories at once
2006-09-23 17:04 ` Petr Baudis
@ 2006-09-23 17:23 ` Johannes Schindelin
2006-09-23 17:39 ` Petr Baudis
0 siblings, 1 reply; 13+ messages in thread
From: Johannes Schindelin @ 2006-09-23 17:23 UTC (permalink / raw)
To: Petr Baudis; +Cc: A Large Angry SCM, Junio C Hamano, git
Hi,
On Sat, 23 Sep 2006, Petr Baudis wrote:
> Dear diary, on Sat, Sep 23, 2006 at 06:57:36PM CEST, I got a letter
> where A Large Angry SCM <gitzilla@gmail.com> said that...
> > Petr Baudis wrote:
> > >You still need to pass git-fetch-pack some URL in addition to the
> > >repositories - it is used only for git_connect(), the purpose is that
> > >repositories must be local directories so if you want to talk remote, you
> > >need to do something like
> > >
> > > git-fetch-pack git://kernel.org/pub/scm/git/git.git
> > > --repo=/pub/scm/git/git.git master next
> > > --repo=/pub/scm/cogito/cogito.git master
> > >
> >
> > Can't this be done with a simple wrapper around git-fetch* ?
>
> The point is that that is too slow (the difference is noticeable
> especiall in the case there is no new stuff ready), you need to create a
> connection and do whole lot of work for each of the repositories.
I still firmly believe that it would be way more efficient to fetch all
those branches into _one_ proxy repository. Especially since you can reuse
the objects with an alternate, which has an additional benefit over your
approach.
And _then_, you can make a wrapper script, which guarantees that the
alternate is set, and that the ref is actually fetched from the proxy.
Ciao,
Dscho
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [RFC][RESEND][PATCH] Allow fetching from multiple repositories at once
2006-09-23 17:23 ` Johannes Schindelin
@ 2006-09-23 17:39 ` Petr Baudis
2006-09-23 17:54 ` Johannes Schindelin
0 siblings, 1 reply; 13+ messages in thread
From: Petr Baudis @ 2006-09-23 17:39 UTC (permalink / raw)
To: Johannes Schindelin; +Cc: A Large Angry SCM, Junio C Hamano, git
Hi,
Dear diary, on Sat, Sep 23, 2006 at 07:23:01PM CEST, I got a letter
where Johannes Schindelin <Johannes.Schindelin@gmx.de> said that...
> On Sat, 23 Sep 2006, Petr Baudis wrote:
>
> > Dear diary, on Sat, Sep 23, 2006 at 06:57:36PM CEST, I got a letter
> > where A Large Angry SCM <gitzilla@gmail.com> said that...
> > > Petr Baudis wrote:
> > > >You still need to pass git-fetch-pack some URL in addition to the
> > > >repositories - it is used only for git_connect(), the purpose is that
> > > >repositories must be local directories so if you want to talk remote, you
> > > >need to do something like
> > > >
> > > > git-fetch-pack git://kernel.org/pub/scm/git/git.git
> > > > --repo=/pub/scm/git/git.git master next
> > > > --repo=/pub/scm/cogito/cogito.git master
> > > >
> > >
> > > Can't this be done with a simple wrapper around git-fetch* ?
> >
> > The point is that that is too slow (the difference is noticeable
> > especiall in the case there is no new stuff ready), you need to create a
> > connection and do whole lot of work for each of the repositories.
>
> I still firmly believe that it would be way more efficient to fetch all
> those branches into _one_ proxy repository. Especially since you can reuse
> the objects with an alternate, which has an additional benefit over your
> approach.
Huh? You can reuse the objects with my approach as well. Actually, it
is automagically done so.
With proxy repository, you would still need a server-side setup to
maintain that repository, and specialized client-side porcelain to fetch
from it. My approach initially requires some core changes (which aren't
very pretty as it is but are not very fundamental or logically intrusive
either) but in the longer run it pays off since you don't need a
convoluted server-side setup for that.
--
Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
#!/bin/perl -sp0777i<X+d*lMLa^*lN%0]dsXx++lMlN/dsM0<j]dsj
$/=unpack('H*',$_);$_=`echo 16dio\U$k"SK$/SM$n\EsN0p[lN*1
lK[d2%Sa2/d0$^Ixp"|dc`;s/\W//g;$_=pack('H*',/((..)*)$/)
^ permalink raw reply [flat|nested] 13+ messages in thread* Re: [RFC][RESEND][PATCH] Allow fetching from multiple repositories at once
2006-09-23 17:39 ` Petr Baudis
@ 2006-09-23 17:54 ` Johannes Schindelin
2006-09-23 18:05 ` Petr Baudis
0 siblings, 1 reply; 13+ messages in thread
From: Johannes Schindelin @ 2006-09-23 17:54 UTC (permalink / raw)
To: Petr Baudis; +Cc: A Large Angry SCM, Junio C Hamano, git
Hi,
On Sat, 23 Sep 2006, Petr Baudis wrote:
> Dear diary, on Sat, Sep 23, 2006 at 07:23:01PM CEST, I got a letter
> where Johannes Schindelin <Johannes.Schindelin@gmx.de> said that...
> > On Sat, 23 Sep 2006, Petr Baudis wrote:
> >
> > > Dear diary, on Sat, Sep 23, 2006 at 06:57:36PM CEST, I got a letter
> > > where A Large Angry SCM <gitzilla@gmail.com> said that...
> > > > Petr Baudis wrote:
> > > > >You still need to pass git-fetch-pack some URL in addition to the
> > > > >repositories - it is used only for git_connect(), the purpose is that
> > > > >repositories must be local directories so if you want to talk remote, you
> > > > >need to do something like
> > > > >
> > > > > git-fetch-pack git://kernel.org/pub/scm/git/git.git
> > > > > --repo=/pub/scm/git/git.git master next
> > > > > --repo=/pub/scm/cogito/cogito.git master
> > > > >
> > > >
> > > > Can't this be done with a simple wrapper around git-fetch* ?
> > >
> > > The point is that that is too slow (the difference is noticeable
> > > especiall in the case there is no new stuff ready), you need to create a
> > > connection and do whole lot of work for each of the repositories.
> >
> > I still firmly believe that it would be way more efficient to fetch all
> > those branches into _one_ proxy repository. Especially since you can reuse
> > the objects with an alternate, which has an additional benefit over your
> > approach.
>
> Huh? You can reuse the objects with my approach as well. Actually, it
> is automagically done so.
>
> With proxy repository, you would still need a server-side setup to
> maintain that repository, and specialized client-side porcelain to fetch
> from it. My approach initially requires some core changes (which aren't
> very pretty as it is but are not very fundamental or logically intrusive
> either) but in the longer run it pays off since you don't need a
> convoluted server-side setup for that.
No, you do not need _any_ server-side setup. And you do not need any
specialized client-side porcelain other than a script, which just does
the job.
The point is, your changes are not only intrusive, they are unnecessary.
You can even declare the first repository as the proxy. Sketch of script:
- find which branches are wanted at all,
- do the fetch (of all wanted branches) in the first repository,
storing the refs under refs/multi/,
- for each repository
- ensure that the first repo is given as alternative
- store the branches this repo wants (you only need to
update the refs, since you have all the objects)
This is not particularly complicated, is it?
Ciao,
Dscho
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [RFC][RESEND][PATCH] Allow fetching from multiple repositories at once
2006-09-23 17:54 ` Johannes Schindelin
@ 2006-09-23 18:05 ` Petr Baudis
2006-09-23 18:14 ` Johannes Schindelin
0 siblings, 1 reply; 13+ messages in thread
From: Petr Baudis @ 2006-09-23 18:05 UTC (permalink / raw)
To: Johannes Schindelin; +Cc: A Large Angry SCM, Junio C Hamano, git
Hi!
Dear diary, on Sat, Sep 23, 2006 at 07:54:10PM CEST, I got a letter
where Johannes Schindelin <Johannes.Schindelin@gmx.de> said that...
> On Sat, 23 Sep 2006, Petr Baudis wrote:
>
> > Dear diary, on Sat, Sep 23, 2006 at 07:23:01PM CEST, I got a letter
> > where Johannes Schindelin <Johannes.Schindelin@gmx.de> said that...
> > > I still firmly believe that it would be way more efficient to fetch all
> > > those branches into _one_ proxy repository. Especially since you can reuse
> > > the objects with an alternate, which has an additional benefit over your
> > > approach.
> >
> > Huh? You can reuse the objects with my approach as well. Actually, it
> > is automagically done so.
> >
> > With proxy repository, you would still need a server-side setup to
> > maintain that repository, and specialized client-side porcelain to fetch
> > from it. My approach initially requires some core changes (which aren't
> > very pretty as it is but are not very fundamental or logically intrusive
> > either) but in the longer run it pays off since you don't need a
> > convoluted server-side setup for that.
>
> No, you do not need _any_ server-side setup. And you do not need any
> specialized client-side porcelain other than a script, which just does
> the job.
So how should the proxy repository be set up at the server side? I can
just imagine a cronjob periodically sweeping the repositories and
populate the proxy repository with new stuff (refs and objects) from
those.
Looking at what you wrote again, you talk about fetching all those
branches _into_ one proxy repository. Perhaps we misunderstand each
other. This patch is not about the "into" part, it is about being able
to fetch _FROM_ _multiple_ repositories.
--
Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
#!/bin/perl -sp0777i<X+d*lMLa^*lN%0]dsXx++lMlN/dsM0<j]dsj
$/=unpack('H*',$_);$_=`echo 16dio\U$k"SK$/SM$n\EsN0p[lN*1
lK[d2%Sa2/d0$^Ixp"|dc`;s/\W//g;$_=pack('H*',/((..)*)$/)
^ permalink raw reply [flat|nested] 13+ messages in thread* Re: [RFC][RESEND][PATCH] Allow fetching from multiple repositories at once
2006-09-23 18:05 ` Petr Baudis
@ 2006-09-23 18:14 ` Johannes Schindelin
2006-09-23 19:17 ` Petr Baudis
2006-09-23 19:35 ` Junio C Hamano
0 siblings, 2 replies; 13+ messages in thread
From: Johannes Schindelin @ 2006-09-23 18:14 UTC (permalink / raw)
To: Petr Baudis; +Cc: A Large Angry SCM, Junio C Hamano, git
Hi,
On Sat, 23 Sep 2006, Petr Baudis wrote:
> Hi!
>
> Dear diary, on Sat, Sep 23, 2006 at 07:54:10PM CEST, I got a letter
> where Johannes Schindelin <Johannes.Schindelin@gmx.de> said that...
> > On Sat, 23 Sep 2006, Petr Baudis wrote:
> >
> > > Dear diary, on Sat, Sep 23, 2006 at 07:23:01PM CEST, I got a letter
> > > where Johannes Schindelin <Johannes.Schindelin@gmx.de> said that...
> > > > I still firmly believe that it would be way more efficient to fetch all
> > > > those branches into _one_ proxy repository. Especially since you can reuse
> > > > the objects with an alternate, which has an additional benefit over your
> > > > approach.
> > >
> > > Huh? You can reuse the objects with my approach as well. Actually, it
> > > is automagically done so.
> > >
> > > With proxy repository, you would still need a server-side setup to
> > > maintain that repository, and specialized client-side porcelain to fetch
> > > from it. My approach initially requires some core changes (which aren't
> > > very pretty as it is but are not very fundamental or logically intrusive
> > > either) but in the longer run it pays off since you don't need a
> > > convoluted server-side setup for that.
> >
> > No, you do not need _any_ server-side setup. And you do not need any
> > specialized client-side porcelain other than a script, which just does
> > the job.
>
> So how should the proxy repository be set up at the server side? I can
> just imagine a cronjob periodically sweeping the repositories and
> populate the proxy repository with new stuff (refs and objects) from
> those.
>
> Looking at what you wrote again, you talk about fetching all those
> branches _into_ one proxy repository. Perhaps we misunderstand each
> other. This patch is not about the "into" part, it is about being able
> to fetch _FROM_ _multiple_ repositories.
Indeed, I misunderstood.
Hmmm. Did you test if this makes things better? If so, could you test with
one proxy repo on the server? I bet that this would be even more
efficient.
And the proxy repo would be updated best by a hook, not a cronjob.
The thing is, your patch optimizes for a very special case, which special
case IMHO should be handled differently to begin with.
Ciao,
Dscho
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [RFC][RESEND][PATCH] Allow fetching from multiple repositories at once
2006-09-23 18:14 ` Johannes Schindelin
@ 2006-09-23 19:17 ` Petr Baudis
2006-09-23 19:35 ` Junio C Hamano
1 sibling, 0 replies; 13+ messages in thread
From: Petr Baudis @ 2006-09-23 19:17 UTC (permalink / raw)
To: Johannes Schindelin; +Cc: A Large Angry SCM, Junio C Hamano, git
Hi,
Dear diary, on Sat, Sep 23, 2006 at 08:14:28PM CEST, I got a letter
where Johannes Schindelin <Johannes.Schindelin@gmx.de> said that...
> Hmmm. Did you test if this makes things better? If so, could you test with
> one proxy repo on the server? I bet that this would be even more
> efficient.
I didn't test it but it _has_ to... ;-))
I will try to do some real benchmarks (repo.or.cz might come handy
here again) but I don't know when will I get to it. I will try the proxy
setup as well.
> And the proxy repo would be updated best by a hook, not a cronjob.
You would have a race window there.
--
Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
#!/bin/perl -sp0777i<X+d*lMLa^*lN%0]dsXx++lMlN/dsM0<j]dsj
$/=unpack('H*',$_);$_=`echo 16dio\U$k"SK$/SM$n\EsN0p[lN*1
lK[d2%Sa2/d0$^Ixp"|dc`;s/\W//g;$_=pack('H*',/((..)*)$/)
^ permalink raw reply [flat|nested] 13+ messages in thread* Re: [RFC][RESEND][PATCH] Allow fetching from multiple repositories at once
2006-09-23 18:14 ` Johannes Schindelin
2006-09-23 19:17 ` Petr Baudis
@ 2006-09-23 19:35 ` Junio C Hamano
1 sibling, 0 replies; 13+ messages in thread
From: Junio C Hamano @ 2006-09-23 19:35 UTC (permalink / raw)
To: Johannes Schindelin; +Cc: Petr Baudis, A Large Angry SCM, git
Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
> The thing is, your patch optimizes for a very special case, which special
> case IMHO should be handled differently to begin with.
I share the feeling, and the usual URL while spelling out the
repository name but used only for the site name, because other
special parameters are used to override the repository path
within the site, is just too ugly.
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [RFC][RESEND][PATCH] Allow fetching from multiple repositories at once
2006-09-23 16:45 [RFC][RESEND][PATCH] Allow fetching from multiple repositories at once Petr Baudis
2006-09-23 16:57 ` A Large Angry SCM
@ 2006-09-23 17:06 ` Petr Baudis
2006-09-23 18:45 ` Jakub Narebski
2 siblings, 0 replies; 13+ messages in thread
From: Petr Baudis @ 2006-09-23 17:06 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git
[-- Attachment #1: Type: text/plain, Size: 1016 bytes --]
Dear diary, on Sat, Sep 23, 2006 at 06:45:09PM CEST, I got a letter
where Petr Baudis <pasky@suse.cz> said that...
> You still need to pass git-fetch-pack some URL in addition to the
> repositories - it is used only for git_connect(), the purpose is that
> repositories must be local directories so if you want to talk remote, you
> need to do something like
>
> git-fetch-pack git://kernel.org/pub/scm/git/git.git --repo=/pub/scm/git/git.git master next --repo=/pub/scm/cogito/cogito.git master
..snip..
> The main target of this are the Xorg people who have plenty of small repos
> and frequently want to update many of them. It would be nice if someone
> from Xorg reading this and concerned with this could measure the
> difference.
Attached is a simple porcelain script for this.
--
Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
#!/bin/perl -sp0777i<X+d*lMLa^*lN%0]dsXx++lMlN/dsM0<j]dsj
$/=unpack('H*',$_);$_=`echo 16dio\U$k"SK$/SM$n\EsN0p[lN*1
lK[d2%Sa2/d0$^Ixp"|dc`;s/\W//g;$_=pack('H*',/((..)*)$/)
[-- Attachment #2: multifetch.pl --]
[-- Type: text/x-perl, Size: 1265 bytes --]
#!/usr/bin/perl
use warnings;
use strict;
# Remember that you must ensure the obj database is shared - either symlink it
# or setup alternates!
#my $remoteurl = 'git://git.kernel.org/pub/scm/git/git.git';
#my %config = (
# '/pub/scm/git/git.git' => {
# 'next' => {
# '/home/xpasky/q/gg' => 'origin'
# },
# 'master' => {
# '/home/xpasky/q/gg' => 'origin2',
# '/home/xpasky/q/gg2' => 'origin'
# }
# },
# '/pub/scm/cogito/cogito-doc.git' => {
# master => {
# '/home/xpasky/q/gg2' => 'origin2'
# }
# }
#);
my $remoteurl = 'puturlofremotehosthere';
my %config = (
'remoterepo1' => {
branch1 => {
'localrepo1' => 'origin'
},
branch2 => {
'localrepo1' => 'origin2',
'localrepo2' => 'origin'
}
},
'remoterepo2' => {
master => {
'localrepo2' => 'origin2'
}
}
);
my @args = ($remoteurl);
foreach my $repo (keys %config) {
push (@args, '--repo='.$repo);
foreach my $branch (keys %{$config{$repo}}) {
push (@args, $branch);
}
}
open (F, '-|', 'git-fetch-pack', @args) or die "$!";
while (<F>) {
chomp;
split / /, $_;
my ($sha, $ref, $repo) = @_;
$ref =~ s#^refs/heads/##;
foreach my $lrepo (keys %{$config{$repo}->{$ref}}) {
system("GIT_DIR=$lrepo git-update-ref $config{$repo}->{$ref}->{$lrepo} $sha");
}
}
close (F);
^ permalink raw reply [flat|nested] 13+ messages in thread* Re: [RFC][RESEND][PATCH] Allow fetching from multiple repositories at once
2006-09-23 16:45 [RFC][RESEND][PATCH] Allow fetching from multiple repositories at once Petr Baudis
2006-09-23 16:57 ` A Large Angry SCM
2006-09-23 17:06 ` Petr Baudis
@ 2006-09-23 18:45 ` Jakub Narebski
2006-09-23 19:14 ` Petr Baudis
2 siblings, 1 reply; 13+ messages in thread
From: Jakub Narebski @ 2006-09-23 18:45 UTC (permalink / raw)
To: git
Petr Baudis wrote:
> This patch enables fetching multiple repositories at once over the Git
> protocol (and SSH, and locally if git-fetch-pack is your cup of coffee
> there). This is done especially for the xorg people who have tons of
> repositories and dislike pulls much slower than they were used to with CVS.
> I'm eager to hear how this affects the situation.
[...]
I see that objects got to the one shared object database. Where the heads
(refs) go to? Separate projects have separate ref spaces, even if they share
object database...
--
Jakub Narebski
Warsaw, Poland
ShadeHawk on #git
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [RFC][RESEND][PATCH] Allow fetching from multiple repositories at once
2006-09-23 18:45 ` Jakub Narebski
@ 2006-09-23 19:14 ` Petr Baudis
0 siblings, 0 replies; 13+ messages in thread
From: Petr Baudis @ 2006-09-23 19:14 UTC (permalink / raw)
To: Jakub Narebski; +Cc: git
Dear diary, on Sat, Sep 23, 2006 at 08:45:01PM CEST, I got a letter
where Jakub Narebski <jnareb@gmail.com> said that...
> Petr Baudis wrote:
>
> > This patch enables fetching multiple repositories at once over the Git
> > protocol (and SSH, and locally if git-fetch-pack is your cup of coffee
> > there). This is done especially for the xorg people who have tons of
> > repositories and dislike pulls much slower than they were used to with CVS.
> > I'm eager to hear how this affects the situation.
> [...]
>
> I see that objects got to the one shared object database. Where the heads
> (refs) go to? Separate projects have separate ref spaces, even if they share
> object database...
git-fetch-pack actually does not save the refs at all, that's the job of
the porcelain (either git-fetch or cg-fetch). git-fetch-pack just emits
the refs list on stdout. See the script in a followup for an example of
a multirepo-aware porcelain wrapper.
--
Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
#!/bin/perl -sp0777i<X+d*lMLa^*lN%0]dsXx++lMlN/dsM0<j]dsj
$/=unpack('H*',$_);$_=`echo 16dio\U$k"SK$/SM$n\EsN0p[lN*1
lK[d2%Sa2/d0$^Ixp"|dc`;s/\W//g;$_=pack('H*',/((..)*)$/)
^ permalink raw reply [flat|nested] 13+ messages in thread
end of thread, other threads:[~2006-09-23 19:35 UTC | newest]
Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2006-09-23 16:45 [RFC][RESEND][PATCH] Allow fetching from multiple repositories at once Petr Baudis
2006-09-23 16:57 ` A Large Angry SCM
2006-09-23 17:04 ` Petr Baudis
2006-09-23 17:23 ` Johannes Schindelin
2006-09-23 17:39 ` Petr Baudis
2006-09-23 17:54 ` Johannes Schindelin
2006-09-23 18:05 ` Petr Baudis
2006-09-23 18:14 ` Johannes Schindelin
2006-09-23 19:17 ` Petr Baudis
2006-09-23 19:35 ` Junio C Hamano
2006-09-23 17:06 ` Petr Baudis
2006-09-23 18:45 ` Jakub Narebski
2006-09-23 19:14 ` Petr Baudis
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).