* [PATCH][RFC] Add git-archive-tree
@ 2006-09-02 12:23 Rene Scharfe
2006-09-02 12:37 ` [PATCH] Add support for tgz archive format Rene Scharfe
` (4 more replies)
0 siblings, 5 replies; 17+ messages in thread
From: Rene Scharfe @ 2006-09-02 12:23 UTC (permalink / raw)
To: Git Mailing List; +Cc: Junio C Hamano, Franck Bui-Huu
git-archive-tree is a command to make tar and ZIP archives of a git tree.
It helps prevent a proliferation of git-{format}-tree commands. This is
useful e.g. for remote archive fetching because we only need to write a
single upload and a single download program that simply pass on the
format option to git-archive-tree.
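For illustration only, the dispatch this enables can be pictured as a table that maps the -f value to a backend. This is a minimal sketch, not the code in the patch below (which uses write_archive_fn_t and a chain of strcmp() calls). The write_fn signature, struct archiver_entry and the stub writers are hypothetical simplifications.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified backend signature: returns 0 on success. */
typedef int (*write_fn)(const char *tree_name);

/* Stub backends standing in for real tar/zip writers. */
static int stub_write_tar(const char *tree_name) { (void)tree_name; return 0; }
static int stub_write_zip(const char *tree_name) { (void)tree_name; return 0; }

struct archiver_entry {
	const char *name;   /* value accepted by -f */
	write_fn write;     /* backend that produces the archive */
};

static const struct archiver_entry archivers[] = {
	{ "tar", stub_write_tar },
	{ "zip", stub_write_zip },
};

/* Returns the backend registered for 'format', or NULL if unknown. */
static write_fn lookup_archiver(const char *format)
{
	size_t i;

	assert(format != NULL);
	for (i = 0; i < sizeof(archivers) / sizeof(archivers[0]); i++)
		if (!strcmp(archivers[i].name, format))
			return archivers[i].write;
	return NULL;
}
```

With a table like this, adding a format is a one-line change, and an upload/download pair can forward the format name without knowing anything about it.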
Speaking of remote, please note the absence of the --remote option of
git-tar-tree. This is intentional; remote operations are special enough
to deserve a separate (yet to be written) command.
Currently git-archive-tree -f tar is slower than git-tar-tree. This is
because it is welded to the side of the existing code to minimize patch
size, and I also suspect read_tree_recursive() to be quite a bit slower
than builtin-tar-tree.c::traverse_tree().
Documentation/git-archive-tree.txt | 99 +++++++++++++++++++++++++++++++++++++
Makefile | 3 -
archive.h | 6 ++
builtin-archive-tree.c | 92 ++++++++++++++++++++++++++++++++++
builtin-tar-tree.c | 66 ++++++++++++++++++++++++
builtin-zip-tree.c | 28 ++++++++++
builtin.h | 1
git.c | 1
8 files changed, 295 insertions(+), 1 deletion(-)
diff --git a/Documentation/git-archive-tree.txt b/Documentation/git-archive-tree.txt
new file mode 100644
index 0000000..122c482
--- /dev/null
+++ b/Documentation/git-archive-tree.txt
@@ -0,0 +1,99 @@
+git-archive-tree(1)
+===================
+
+NAME
+----
+git-archive-tree - Creates an archive of the files in the named tree
+
+
+SYNOPSIS
+--------
+'git-archive-tree' -f {tar|zip} [--prefix=<prefix>/] [-0|...|-9]
+ <tree-ish> [path...]
+
+DESCRIPTION
+-----------
+Creates an archive of the specified format containing the tree structure
+for the named tree. If <prefix> is specified it is prepended to the
+filenames in the archive.
+
+git-archive-tree behaves differently when given a tree ID versus when
+given a commit ID or tag ID. In the first case the current time is used
+as modification time of each file in the archive. In the latter case the
+commit time as recorded in the referenced commit object is used instead.
+Additionally the commit ID is stored in a global extended pax header if
+the tar format is used; it can be extracted using git-get-tar-commit-id.
+In ZIP files it is stored as a file comment.
+
+OPTIONS
+-------
+
+-f::
+ Format of the resulting archive, can be either 'tar' or 'zip'.
+
+<tree-ish>::
+ The tree or commit to produce an archive for.
+
+path::
+ If one or more paths are specified, include only these in the
+ archive, otherwise include all files and subdirectories.
+
+--prefix=<prefix>/::
+ Prepend <prefix>/ to each filename in the archive.
+
+-0::
+ Store files in the archive instead of compressing them. This
+ option has no effect when the tar format is used.
+
+-9::
+ Highest and slowest compression level. You can specify any
+ number from 1 to 9 to adjust compression speed and ratio. This
+ option has no effect when the tar format is used.
+
+CONFIGURATION
+-------------
+By default, file and directory modes are set to 0666 or 0777 in tar
+archives. It is possible to change this by setting the "umask" variable
+in the repository configuration as follows:
+
+[tar]
+ umask = 002 ;# group friendly
+
+The special umask value "user" indicates that the user's current umask
+will be used instead. The default value remains 0, which means world
+readable/writable files and directories.
+
+EXAMPLES
+--------
+git archive -f tar --prefix=junk/ HEAD | (cd /var/tmp/ && tar xf -)::
+
+ Create a tar archive that contains the contents of the
+ latest commit on the current branch, and extracts it in
+ `/var/tmp/junk` directory.
+
+git archive -f tar --prefix=git-1.4.0/ v1.4.0 | gzip >git-1.4.0.tar.gz::
+
+ Create a compressed tarball for v1.4.0 release.
+
+git archive -f tar --prefix=git-1.4.0/ v1.4.0{caret}\{tree\} | gzip >git-1.4.0.tar.gz::
+
+ Create a compressed tarball for v1.4.0 release, but without a
+ global extended pax header.
+
+git archive -f zip --prefix=git-docs/ HEAD:Documentation/ > git-1.4.0-docs.zip::
+
+ Put everything in the current head's Documentation/ directory
+ into 'git-1.4.0-docs.zip', with the prefix 'git-docs/'.
+
+Author
+------
+Written by Rene Scharfe.
+
+Documentation
+--------------
+Documentation by David Greaves, Junio C Hamano and the git-list <git@vger.kernel.org>.
+
+GIT
+---
+Part of the gitlink:git[7] suite
+
diff --git a/Makefile b/Makefile
index 05bd77f..d0a1055 100644
--- a/Makefile
+++ b/Makefile
@@ -231,7 +231,7 @@ LIB_FILE=libgit.a
XDIFF_LIB=xdiff/lib.a
LIB_H = \
- blob.h cache.h commit.h csum-file.h delta.h \
+ archive.h blob.h cache.h commit.h csum-file.h delta.h \
diff.h object.h pack.h pkt-line.h quote.h refs.h \
run-command.h strbuf.h tag.h tree.h git-compat-util.h revision.h \
tree-walk.h log-tree.h dir.h path-list.h unpack-trees.h builtin.h
@@ -255,6 +255,7 @@ LIB_OBJS = \
BUILTIN_OBJS = \
builtin-add.o \
builtin-apply.o \
+ builtin-archive-tree.o \
builtin-cat-file.o \
builtin-checkout-index.o \
builtin-check-ref-format.o \
diff --git a/archive.h b/archive.h
new file mode 100644
index 0000000..7813962
--- /dev/null
+++ b/archive.h
@@ -0,0 +1,6 @@
+#include "tree.h"
+
+typedef int (*write_archive_fn_t)(struct tree *tree, const unsigned char *commit_sha1, const char *prefix, time_t time, const char **pathspec);
+
+int write_tar_archive(struct tree *tree, const unsigned char *commit_sha1, const char *prefix, time_t time, const char **pathspec);
+int write_zip_archive(struct tree *tree, const unsigned char *commit_sha1, const char *prefix, time_t time, const char **pathspec);
diff --git a/builtin-archive-tree.c b/builtin-archive-tree.c
new file mode 100644
index 0000000..2c6ee60
--- /dev/null
+++ b/builtin-archive-tree.c
@@ -0,0 +1,92 @@
+/*
+ * Copyright (c) 2006 Rene Scharfe
+ */
+#include <time.h>
+#include "cache.h"
+#include "builtin.h"
+#include "commit.h"
+#include "tree-walk.h"
+#include "archive.h"
+
+static const char archive_usage[] =
+"git-archive-tree -f {tar|zip} [--prefix=<prefix>/] [-0|...|-9] <tree-ish> [path...]";
+
+static write_archive_fn_t parse_archive_format(const char *format)
+{
+ if (!strcmp(format, "tar"))
+ return write_tar_archive;
+ if (!strcmp(format, "zip"))
+ return write_zip_archive;
+ return NULL;
+}
+
+int cmd_archive_tree(int argc, const char **argv, const char *prefix)
+{
+ int more_args = 1;
+ const char *archive_prefix = "";
+ unsigned char sha1[20];
+ struct commit *commit;
+ time_t archive_time;
+ const char **pathspec;
+ const unsigned char *commit_sha1 = NULL;
+ write_archive_fn_t write_archive = NULL;
+ struct tree *tree;
+ int result;
+
+ while (argc > 2 && more_args) {
+ const char *arg = argv[1];
+ if (!strcmp(arg, "-f")) {
+ write_archive = parse_archive_format(argv[2]);
+ argv++;
+ argc--;
+ } else if (!strncmp(arg, "-f", 2))
+ write_archive = parse_archive_format(arg + 2);
+ else if (arg[0] == '-' && isdigit(arg[1]) && arg[2] == '\0')
+ zlib_compression_level = arg[1] - '0';
+ else if (!strcmp(arg, "--"))
+ more_args = 0;
+ else if (!strncmp(arg, "--prefix=", 9))
+ archive_prefix = arg + 9;
+ else if (arg[0] == '-')
+ usage(archive_usage);
+ else
+ break;
+ argv++;
+ argc--;
+ }
+
+ if (!write_archive)
+ usage(archive_usage);
+ if (argc < 2)
+ usage(archive_usage);
+ if (get_sha1(argv[1], sha1))
+ die("Not a valid object name %s", argv[1]);
+
+ commit = lookup_commit_reference_gently(sha1, 1);
+ if (commit)
+ commit_sha1 = commit->object.sha1;
+
+ tree = parse_tree_indirect(sha1);
+ if (!tree)
+ die("not a tree object");
+
+ if (prefix) {
+ unsigned char tree_sha1[20];
+ unsigned int mode;
+ int err = get_tree_entry(tree->object.sha1, prefix,
+ tree_sha1, &mode);
+ if (err || !S_ISDIR(mode))
+ die("current working directory is untracked");
+ free(tree);
+ tree = parse_tree_indirect(tree_sha1);
+ }
+
+ archive_time = commit ? commit->date : time(NULL);
+ pathspec = get_pathspec(archive_prefix, argv + 2);
+
+ result = write_archive(tree, commit_sha1, archive_prefix,
+ archive_time, pathspec);
+ free(tree);
+
+ return result;
+}
diff --git a/builtin-tar-tree.c b/builtin-tar-tree.c
index 61a4135..e0da01e 100644
--- a/builtin-tar-tree.c
+++ b/builtin-tar-tree.c
@@ -9,6 +9,7 @@ #include "strbuf.h"
#include "tar.h"
#include "builtin.h"
#include "pkt-line.h"
+#include "archive.h"
#define RECORDSIZE (512)
#define BLOCKSIZE (RECORDSIZE * 20)
@@ -338,6 +339,71 @@ static int generate_tar(int argc, const
return 0;
}
+static int write_tar_entry(const unsigned char *sha1,
+ const char *base, int baselen,
+ const char *filename, unsigned mode, int stage)
+{
+ static struct strbuf path;
+ int filenamelen = strlen(filename);
+ void *buffer;
+ char type[20];
+ unsigned long size;
+
+ if (!path.alloc) {
+ path.buf = xmalloc(PATH_MAX);
+ path.alloc = PATH_MAX;
+ path.len = path.eof = 0;
+ }
+ if (path.alloc < baselen + filenamelen) {
+ free(path.buf);
+ path.buf = xmalloc(baselen + filenamelen);
+ path.alloc = baselen + filenamelen;
+ }
+ memcpy(path.buf, base, baselen);
+ memcpy(path.buf + baselen, filename, filenamelen);
+ path.len = baselen + filenamelen;
+ if (S_ISDIR(mode)) {
+ strbuf_append_string(&path, "/");
+ buffer = NULL;
+ size = 0;
+ } else {
+ buffer = read_sha1_file(sha1, type, &size);
+ if (!buffer)
+ die("cannot read %s", sha1_to_hex(sha1));
+ }
+
+ write_entry(sha1, &path, mode, buffer, size);
+
+ return READ_TREE_RECURSIVE;
+}
+
+int write_tar_archive(struct tree *tree, const unsigned char *commit_sha1,
+ const char *prefix, time_t time, const char **pathspec)
+{
+ int plen = strlen(prefix);
+
+ git_config(git_tar_config);
+
+ archive_time = time;
+
+ if (commit_sha1)
+ write_global_extended_header(commit_sha1);
+
+ if (prefix && plen > 0 && prefix[plen - 1] == '/') {
+ char *base = strdup(prefix);
+ int baselen = strlen(base);
+
+ while (baselen > 0 && base[baselen - 1] == '/')
+ base[--baselen] = '\0';
+ write_tar_entry(tree->object.sha1, "", 0, base, 040777, 0);
+ free(base);
+ }
+ read_tree_recursive(tree, prefix, plen, 0, pathspec, write_tar_entry);
+ write_trailer();
+
+ return 0;
+}
+
static const char *exec = "git-upload-tar";
static int remote_tar(int argc, const char **argv)
diff --git a/builtin-zip-tree.c b/builtin-zip-tree.c
index a5b834d..b142771 100644
--- a/builtin-zip-tree.c
+++ b/builtin-zip-tree.c
@@ -8,6 +8,7 @@ #include "blob.h"
#include "tree.h"
#include "quote.h"
#include "builtin.h"
+#include "archive.h"
static const char zip_tree_usage[] =
"git-zip-tree [-0|...|-9] <tree-ish> [ <base> ]";
@@ -351,3 +352,30 @@ int cmd_zip_tree(int argc, const char **
return 0;
}
+
+int write_zip_archive(struct tree *tree, const unsigned char *commit_sha1,
+ const char *prefix, time_t time, const char **pathspec)
+{
+ int plen = strlen(prefix);
+
+ dos_time(&time, &zip_date, &zip_time);
+
+ zip_dir = xmalloc(ZIP_DIRECTORY_MIN_SIZE);
+ zip_dir_size = ZIP_DIRECTORY_MIN_SIZE;
+
+ if (prefix && plen > 0 && prefix[plen - 1] == '/') {
+ char *base = strdup(prefix);
+ int baselen = strlen(base);
+
+ while (baselen > 0 && base[baselen - 1] == '/')
+ base[--baselen] = '\0';
+ write_zip_entry(tree->object.sha1, "", 0, base, 040777, 0);
+ free(base);
+ }
+ read_tree_recursive(tree, prefix, plen, 0, pathspec, write_zip_entry);
+ write_zip_trailer(commit_sha1);
+
+ free(zip_dir);
+
+ return 0;
+}
diff --git a/builtin.h b/builtin.h
index 25431d7..febb9d0 100644
--- a/builtin.h
+++ b/builtin.h
@@ -53,6 +53,7 @@ extern int cmd_stripspace(int argc, cons
extern int cmd_symbolic_ref(int argc, const char **argv, const char *prefix);
extern int cmd_tar_tree(int argc, const char **argv, const char *prefix);
extern int cmd_zip_tree(int argc, const char **argv, const char *prefix);
+extern int cmd_archive_tree(int argc, const char **argv, const char *prefix);
extern int cmd_unpack_objects(int argc, const char **argv, const char *prefix);
extern int cmd_update_index(int argc, const char **argv, const char *prefix);
extern int cmd_update_ref(int argc, const char **argv, const char *prefix);
diff --git a/git.c b/git.c
index 05871ad..937b7f7 100644
--- a/git.c
+++ b/git.c
@@ -264,6 +264,7 @@ static void handle_internal_command(int
{ "symbolic-ref", cmd_symbolic_ref, RUN_SETUP },
{ "tar-tree", cmd_tar_tree, RUN_SETUP },
{ "zip-tree", cmd_zip_tree, RUN_SETUP },
+ { "archive-tree", cmd_archive_tree, RUN_SETUP },
{ "unpack-objects", cmd_unpack_objects, RUN_SETUP },
{ "update-index", cmd_update_index, RUN_SETUP },
{ "update-ref", cmd_update_ref, RUN_SETUP },
* [PATCH] Add support for tgz archive format
2006-09-02 12:23 [PATCH][RFC] Add git-archive-tree Rene Scharfe
@ 2006-09-02 12:37 ` Rene Scharfe
2006-09-02 13:10 ` [PATCH][RFC] Add git-archive-tree Rene Scharfe
` (3 subsequent siblings)
4 siblings, 0 replies; 17+ messages in thread
From: Rene Scharfe @ 2006-09-02 12:37 UTC (permalink / raw)
To: Git Mailing List; +Cc: Junio C Hamano, Franck Bui-Huu
Documentation/git-archive-tree.txt | 9 ++---
archive.h | 2 +
builtin-archive-tree.c | 4 +-
builtin-tar-tree.c | 63 ++++++++++++++++++++++++++++++++-----
4 files changed, 65 insertions(+), 13 deletions(-)
diff --git a/Documentation/git-archive-tree.txt b/Documentation/git-archive-tree.txt
index 122c482..25136d9 100644
--- a/Documentation/git-archive-tree.txt
+++ b/Documentation/git-archive-tree.txt
@@ -8,7 +8,7 @@ git-archive-tree - Creates a archive of the f
SYNOPSIS
--------
-'git-archive-tree' -f {tar|zip} [--prefix=<prefix>/] [-0|...|-9]
+'git-archive-tree' -f {tar|tgz|zip} [--prefix=<prefix>/] [-0|...|-9]
<tree-ish> [path...]
DESCRIPTION
@@ -29,7 +29,8 @@ OPTIONS
-------
-f::
- Format of the resulting archive, can be either 'tar' or 'zip'.
+ Format of the resulting archive, can be either 'tar', 'tgz'
+ or 'zip'.
<tree-ish>::
The tree or commit to produce an archive for.
@@ -71,11 +72,11 @@ git archive -f tar --prefix=junk/ HEAD |
latest commit on the current branch, and extracts it in
`/var/tmp/junk` directory.
-git archive -f tar --prefix=git-1.4.0/ v1.4.0 | gzip >git-1.4.0.tar.gz::
+git archive -f tgz --prefix=git-1.4.0/ v1.4.0 >git-1.4.0.tar.gz::
Create a compressed tarball for v1.4.0 release.
-git archive -f tar --prefix=git-1.4.0/ v1.4.0{caret}\{tree\} | gzip >git-1.4.0.tar.gz::
+git archive -f tgz --prefix=git-1.4.0/ v1.4.0{caret}\{tree\} >git-1.4.0.tar.gz::
Create a compressed tarball for v1.4.0 release, but without a
global extended pax header.
diff --git a/archive.h b/archive.h
index 7813962..6a03864 100644
--- a/archive.h
+++ b/archive.h
@@ -3,4 +3,6 @@ #include "tree.h"
typedef int (*write_archive_fn_t)(struct tree *tree, const unsigned char *commit_sha1, const char *prefix, time_t time, const char **pathspec);
int write_tar_archive(struct tree *tree, const unsigned char *commit_sha1, const char *prefix, time_t time, const char **pathspec);
+int write_tgz_archive(struct tree *tree, const unsigned char *commit_sha1,
+ const char *prefix, time_t time, const char **pathspec);
int write_zip_archive(struct tree *tree, const unsigned char *commit_sha1, const char *prefix, time_t time, const char **pathspec);
diff --git a/builtin-archive-tree.c b/builtin-archive-tree.c
index 2c6ee60..f61e26d 100644
--- a/builtin-archive-tree.c
+++ b/builtin-archive-tree.c
@@ -9,12 +9,14 @@ #include "tree-walk.h"
#include "archive.h"
static const char archive_usage[] =
-"git-archive-tree -f {tar|zip} [--prefix=<prefix>/] [-0|...|-9] <tree-ish> [path...]";
+"git-archive-tree -f {tar|tgz|zip} [--prefix=<prefix>/] [-0|...|-9] <tree-ish> [path...]";
static write_archive_fn_t parse_archive_format(const char *format)
{
if (!strcmp(format, "tar"))
return write_tar_archive;
+ if (!strcmp(format, "tgz"))
+ return write_tgz_archive;
if (!strcmp(format, "zip"))
return write_zip_archive;
return NULL;
diff --git a/builtin-tar-tree.c b/builtin-tar-tree.c
index e0da01e..743d53a 100644
--- a/builtin-tar-tree.c
+++ b/builtin-tar-tree.c
@@ -23,6 +23,10 @@ static unsigned long offset;
static time_t archive_time;
static int tar_umask;
+static gzFile gzstdout;
+
+typedef void (*blocked_write_fn)(const void *data, unsigned long size);
+
/* writes out the whole block, but only if it is full */
static void write_if_needed(void)
{
@@ -36,7 +40,7 @@ static void write_if_needed(void)
* queues up writes, so that all our write(2) calls write exactly one
* full block; pads writes to RECORDSIZE
*/
-static void write_blocked(const void *data, unsigned long size)
+static void do_write_blocked(const void *data, unsigned long size)
{
const char *buf = data;
unsigned long tail;
@@ -68,19 +72,20 @@ static void write_blocked(const void *da
write_if_needed();
}
+static blocked_write_fn write_blocked = do_write_blocked;
+
/*
* The end of tar archives is marked by 2*512 nul bytes and after that
* follows the rest of the block (if any).
*/
static void write_trailer(void)
{
- int tail = BLOCKSIZE - offset;
- memset(block + offset, 0, tail);
- write_or_die(1, block, BLOCKSIZE);
- if (tail < 2 * RECORDSIZE) {
- memset(block, 0, offset);
- write_or_die(1, block, BLOCKSIZE);
- }
+ char zeroes[RECORDSIZE];
+ memset(zeroes, 0, RECORDSIZE);
+ write_blocked(zeroes, RECORDSIZE);
+ write_blocked(zeroes, RECORDSIZE);
+ while (offset)
+ write_blocked(zeroes, RECORDSIZE);
}
static void strbuf_append_string(struct strbuf *sb, const char *s)
@@ -404,6 +409,48 @@ int write_tar_archive(struct tree *tree,
return 0;
}
+static void write_blocked_gzip(const void *data, unsigned long size)
+{
+ const char *p = data;
+ int written;
+ unsigned long tail = size % RECORDSIZE;
+
+ while (size > 0) {
+ written = gzwrite(gzstdout, p, size);
+ if (written == 0) {
+ if (errno == EPIPE)
+ exit(0);
+ die("gzwrite error (%s)", strerror(errno));
+ }
+ size -= written;
+ p += written;
+ }
+
+ if (tail) {
+ z_off_t result = gzseek(gzstdout, RECORDSIZE - tail, SEEK_CUR);
+ if (result == -1)
+ die("gzseek error (%s)", strerror(errno));
+ }
+}
+
+int write_tgz_archive(struct tree *tree, const unsigned char *commit_sha1,
+ const char *prefix, time_t time, const char **pathspec)
+{
+ int result;
+
+ gzstdout = gzdopen(1, "wb");
+ if (!gzstdout)
+ die("zlib is unable to open stdout");
+
+ write_blocked = write_blocked_gzip;
+ result = write_tar_archive(tree, commit_sha1, prefix, time, pathspec);
+
+ if (gzclose(gzstdout) != Z_OK)
+ result = 1;
+
+ return result;
+}
+
static const char *exec = "git-upload-tar";
static int remote_tar(int argc, const char **argv)
* Re: [PATCH][RFC] Add git-archive-tree
2006-09-02 12:23 [PATCH][RFC] Add git-archive-tree Rene Scharfe
2006-09-02 12:37 ` [PATCH] Add support for tgz archive format Rene Scharfe
@ 2006-09-02 13:10 ` Rene Scharfe
2006-09-02 20:13 ` Franck Bui-Huu
2006-09-02 21:19 ` Junio C Hamano
2006-09-02 14:17 ` Rene Scharfe
` (2 subsequent siblings)
4 siblings, 2 replies; 17+ messages in thread
From: Rene Scharfe @ 2006-09-02 13:10 UTC (permalink / raw)
To: Git Mailing List; +Cc: Junio C Hamano, Franck Bui-Huu
The two patches I sent are what I have been able to come up with so far.
The next step would be to add archive-neutral upload and download support.
Having thought a bit about it I propose to keep git-archive-tree for
local operations, only. It can be called by the uploader just like
git-tar-tree is now called by git-upload-tar. As Franck suggested, the
uploader should allow the list of archive formats it supports to be
restricted in a config file. The range of allowed compression levels
should also be configurable.
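A minimal sketch of such an uploader-side policy check, assuming the configuration has already been parsed into a comma-separated list of formats and a compression-level range. struct upload_policy and both helpers are hypothetical; no such config keys exist in the patches above.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical uploader policy, e.g. filled in from a config file. */
struct upload_policy {
	const char *allowed_formats; /* comma-separated, e.g. "tar,zip" */
	int min_level;               /* lowest compression level allowed */
	int max_level;               /* highest compression level allowed */
};

/* Returns 1 if 'format' appears as a whole item in the comma list. */
static int format_allowed(const struct upload_policy *p, const char *format)
{
	const char *s;
	size_t len;

	assert(p != NULL && format != NULL);
	s = p->allowed_formats;
	len = strlen(format);
	while (*s) {
		const char *end = strchr(s, ',');
		size_t item_len = end ? (size_t)(end - s) : strlen(s);

		if (item_len == len && !strncmp(s, format, len))
			return 1;
		if (!end)
			break;
		s = end + 1;
	}
	return 0;
}

/* Returns 1 if the requested compression level lies within the policy. */
static int level_allowed(const struct upload_policy *p, int level)
{
	return level >= p->min_level && level <= p->max_level;
}
```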
Does it make sense to change the wire protocol to simply send the
command line options one by one?
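If the options were sent one by one, each could travel as a pkt-line: four hex digits giving the length (counting those four bytes), followed by the payload, with "0000" (a flush packet) ending the list. The sketch below assumes a hypothetical "argument " prefix per option and no trailing newline, and encodes into a caller-supplied buffer rather than a socket.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Appends one pkt-line ("%04x" length including the 4 header bytes,
 * then the payload) to buf. Returns bytes written, or -1 if it won't fit.
 */
static int pkt_line(char *buf, size_t bufsize, const char *payload)
{
	size_t len = strlen(payload) + 4;

	if (len > 0xffff || len + 1 > bufsize)   /* +1 for snprintf's NUL */
		return -1;
	snprintf(buf, bufsize, "%04x%s", (unsigned)len, payload);
	return (int)len;
}

/* Encodes each option as "argument <opt>", then a flush packet. */
static int encode_args(char *buf, size_t bufsize, int argc, const char **argv)
{
	char payload[1024];
	size_t used = 0;
	int i, n;

	assert(buf != NULL && argc >= 0);
	for (i = 0; i < argc; i++) {
		if (snprintf(payload, sizeof(payload), "argument %s", argv[i])
		    >= (int)sizeof(payload))
			return -1;
		n = pkt_line(buf + used, bufsize - used, payload);
		if (n < 0)
			return -1;
		used += n;
	}
	if (used + 5 > bufsize)
		return -1;
	memcpy(buf + used, "0000", 5);   /* flush packet plus NUL */
	return (int)(used + 4);
}
```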
The interface could be something like this:
git-download-archive <repo> <git-archive-tree options...>
git-upload-archive <directory>
Or, if the big number of git command names is a concern:
git-archive-remote --upload|--download ...
What do you all think?
I won't have time for any of this over the next few days, so Franck, go
wild. ;-)
René
* Re: [PATCH][RFC] Add git-archive-tree
2006-09-02 13:10 ` [PATCH][RFC] Add git-archive-tree Rene Scharfe
@ 2006-09-02 20:13 ` Franck Bui-Huu
2006-09-04 18:22 ` Rene Scharfe
2006-09-02 21:19 ` Junio C Hamano
1 sibling, 1 reply; 17+ messages in thread
From: Franck Bui-Huu @ 2006-09-02 20:13 UTC (permalink / raw)
To: Rene Scharfe; +Cc: Git Mailing List, Junio C Hamano
2006/9/2, Rene Scharfe <rene.scharfe@lsrfire.ath.cx>:
> The two patches I sent are what I have been able to come up with so far.
> The next step would be to add archive-neutral upload and download support.
>
> Having thought a bit about it I propose to keep git-archive-tree for
> local operations, only. It can be called by the uploader just like
Well, I don't see why the remote operations should go in another file. I
was thinking more of something like this:
git-archive --format=<fmt> [--remote=<repo>] <tree-ish> [path...]
This main porcelain function would directly call functions provided by
the archivers' library. We will need to define an API which git-archive will
use for local operations.
Symmetrically, on the server side we would have:
git-upload-archive --format=<fmt> <repo> [path...]
used by git-daemon. It will deal with the protocol and paths, and use the archivers' library.
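A minimal sketch of how such a combined command might split off --remote before deciding between local and remote operation. extract_remote() is hypothetical; it keeps the remaining arguments in order so they could be forwarded to the remote side untouched, and stops interpreting options after "--".

```c
#include <assert.h>
#include <string.h>

/*
 * Removes a "--remote=<repo>" option from argv in place and returns the
 * repository name, or NULL if absent. *argc is updated; the remaining
 * arguments keep their order. Nothing after "--" is treated as an option.
 */
static const char *extract_remote(int *argc, const char **argv)
{
	const char *remote = NULL;
	int i, j = 0, options_done = 0;

	assert(argc != NULL && argv != NULL);
	for (i = 0; i < *argc; i++) {
		if (!options_done && !strcmp(argv[i], "--"))
			options_done = 1;
		if (!options_done && !strncmp(argv[i], "--remote=", 9))
			remote = argv[i] + 9;   /* the last one wins */
		else
			argv[j++] = argv[i];
	}
	*argc = j;
	return remote;
}
```

A caller would run the local archiver when the result is NULL and otherwise hand the compacted argv to the remote side.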
Eventually, we would have 2 commands:
git-archive
git-upload-archive
and get rid of
git-tar-tree
git-zip-tree
git-upload-tar
git-upload-zip
> git-tar-tree is now called by git-upload-tar. As Franck suggested, the
> uploader should allow the list of archive formats it supports to be
> restricted in a config file. The range of allowed compression levels
> should also be configurable.
>
> Does it make sense to change the wire protocol to simply send the
> command line options one by one?
That would make sense if the number of options grows. The current
remote protocol was written by Junio; I just picked up that part
from git-tar-tree and put it into git-archive. But if we allow
pathspec for remote operations, then we need to send them to the
uploader.
>
> The interface could be something like this:
>
> git-download-archive <repo> <git-archive-tree options...>
> git-upload-archive <directory>
>
> Or, if the big number of git command names is a concern:
>
I think it is, IMHO. And that's why I think we could have only one
command for building archives locally or remotely, whatever the format.
git-archive should be a main porcelain command, and we should get rid
of the git-{tar,zip}-tree commands.
--
Franck
* Re: [PATCH][RFC] Add git-archive-tree
2006-09-02 20:13 ` Franck Bui-Huu
@ 2006-09-04 18:22 ` Rene Scharfe
2006-09-04 20:09 ` Junio C Hamano
0 siblings, 1 reply; 17+ messages in thread
From: Rene Scharfe @ 2006-09-04 18:22 UTC (permalink / raw)
To: Franck Bui-Huu; +Cc: Git Mailing List, Junio C Hamano
Franck Bui-Huu wrote:
> 2006/9/2, Rene Scharfe <rene.scharfe@lsrfire.ath.cx>:
>> The two patches I sent are what I have been able to come up with so far.
>> The next step would be to add archive-neutral upload and download
>> support.
>>
>> Having thought a bit about it I propose to keep git-archive-tree for
>> local operations, only. It can be called by the uploader just like
>
> Well, I don't see why the remote operations should go in another file. I
> was thinking more of something like this:
>
> git-archive --format=<fmt> [--remote=<repo>] <tree-ish> [path...]
My intention was to put both halves of the wire protocol implementation
into the same source file. I still think this is a good idea, but of
course it's independent from any command line interface considerations.
Internally, remote and local operations do completely different things,
so it doesn't make sense to mix them. But users probably don't care
about such details and may prefer a single command to express "gimme
that archive, no matter if it's made here or there". OK, sort of.
> This main porcelain function would directly call functions provided by
> the archivers' library. We will need to define an API which git-archive will
> use for local operations.
Yes, this would be write_archive_fn_t in archive.h.
> Symmetrically, on the server side we would have:
>
> git-upload-archive --format=<fmt> <repo> [path...]
No, git-upload-archive would accept only a single parameter: the path to
the repository. It'd receive all other options via the wire protocol,
just like git-upload-tar.
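On the uploader side, a minimal sketch of receiving the options over the wire, assuming they arrive as "argument <opt>" pkt-lines ended by a flush packet (a hypothetical framing). MAX_ARGS and MAX_ARG_LEN are made-up limits; the point is that the uploader bounds and validates what it reads before handing anything to the archiver. The data comes from a buffer instead of a file descriptor to keep the example self-contained.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_ARGS 64        /* hypothetical limit on forwarded options */
#define MAX_ARG_LEN 1000   /* hypothetical limit on a single option */

/* Parses 4 hex digits; returns -1 on malformed input. */
static int parse_hex4(const char *p)
{
	int i, v = 0;

	for (i = 0; i < 4; i++) {
		char c = p[i];
		v <<= 4;
		if (c >= '0' && c <= '9') v |= c - '0';
		else if (c >= 'a' && c <= 'f') v |= c - 'a' + 10;
		else if (c >= 'A' && c <= 'F') v |= c - 'A' + 10;
		else return -1;
	}
	return v;
}

/*
 * Decodes "argument <opt>" pkt-lines from data[0..size) until a flush
 * packet. args must hold MAX_ARGS pointers; each option is malloc'ed
 * (caller frees). Returns the option count, or -1 on bad input.
 */
static int decode_args(const char *data, size_t size, char **args)
{
	size_t pos = 0;
	int n = 0;

	assert(data != NULL && args != NULL);
	for (;;) {
		int len;
		size_t payload_len;

		if (size - pos < 4)
			goto fail;
		len = parse_hex4(data + pos);
		if (len < 0)
			goto fail;
		if (len == 0)
			return n;            /* flush packet: end of options */
		if (len < 4 || (size_t)len > size - pos)
			goto fail;
		payload_len = (size_t)len - 4;
		if (payload_len < 9 || memcmp(data + pos + 4, "argument ", 9))
			goto fail;
		if (n == MAX_ARGS || payload_len - 9 > MAX_ARG_LEN)
			goto fail;
		args[n] = malloc(payload_len - 9 + 1);
		if (!args[n])
			goto fail;
		memcpy(args[n], data + pos + 4 + 9, payload_len - 9);
		args[n][payload_len - 9] = '\0';
		n++;
		pos += (size_t)len;
	}
fail:
	while (n > 0)
		free(args[--n]);
	return -1;
}
```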
> used by git-daemon. It will deal with the protocol and paths, and use the archivers'
> library.
Unless you meant that git-daemon itself would handle the protocol
etc., which I'd disagree with ("It" is slightly ambiguous here).
> Eventually, we would have 2 commands:
>
> git-archive
> git-upload-archive
>
> and get rid of
>
> git-tar-tree
> git-zip-tree
> git-upload-tar
> git-upload-zip
Let's keep the git-<verb>-<object> nomenclature used for most commands
for the sake of consistency. In both cases you'd only type
git-archiv<TAB> anyway. ;-)
We could remove the existing commands git-upload-tar and git-zip-tree
right now, as they have never been part of a release. git-upload-zip
doesn't exist yet, and git-tar-tree would probably survive as a legacy
interface, calling git-archive-tree internals. Junio, am I correct
regarding the cold-blooded killing of unreleased commands?
>> git-tar-tree is now called by git-upload-tar. As Franck suggested, the
>> uploader should allow the list of archive formats it supports to be
>> restricted in a config file. The range of allowed compression levels
>> should also be configurable.
>>
>> Does it make sense to change the wire protocol to simply send the
>> command line options one by one?
>
> That would make sense if the number of options grows. The current
> remote protocol was written by Junio; I just picked up that part
> from git-tar-tree and put it into git-archive. But if we allow
> pathspec for remote operations, then we need to send them to the
> uploader.
If the downloader simply sends each option to the uploader, which passes
them on without understanding them, then the uploader can be kept really
simple. The protocol can be kept simple, too, and it would be
future-proof: we would never have to update it. I'm a bit worried about
the security implications of such a setup, strangely _because_ I can't
see a way to exploit it right now (and because it allows arbitrary
input, which we must be able to cope with anyway).
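One way to take the edge off that worry, sketched here under the assumption that options are still passed on without being interpreted, is a whitelist checked before anything is forwarded. argument_ok() and its prefix list are hypothetical. A non-option word (tree-ish or path) is accepted as-is, which is exactly where the arbitrary-input caveat above still applies.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical whitelist of option prefixes the uploader forwards. */
static const char *const allowed_prefixes[] = { "--format=", "--prefix=", NULL };

/*
 * Returns 1 if a single forwarded argument is acceptable: a non-option
 * word (tree-ish or path), a "-<digit>" compression level, or a
 * whitelisted option. Any other argument starting with '-' is refused.
 */
static int argument_ok(const char *arg)
{
	int i;

	assert(arg != NULL);
	if (arg[0] != '-')
		return 1;   /* tree-ish or pathspec */
	if (isdigit((unsigned char)arg[1]) && arg[2] == '\0')
		return 1;   /* -0 .. -9 */
	for (i = 0; allowed_prefixes[i]; i++)
		if (!strncmp(arg, allowed_prefixes[i], strlen(allowed_prefixes[i])))
			return 1;
	return 0;
}
```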
>>
>> The interface could be something like this:
>>
>> git-download-archive <repo> <git-archive-tree options...>
>> git-upload-archive <directory>
>>
>> Or, if the big number of git command names is a concern:
>>
>
> I think it is, IMHO. And that's why I think we could have only one
> command for building archives locally or remotely, whatever the format.
> git-archive should be a main porcelain command, and we should get rid
> of the git-{tar,zip}-tree commands.
OK, makes sense; users will never need to call git-upload-archive
directly, and having a single command for all archiving is a good thing.
My next steps will be to make traverse_tree() support path specs, in
order to achieve feature parity with read_tree_recursive(). I hope
the former remains significantly faster than the latter even after
that. If it works out, I'm going to convert git-zip-tree to use
traverse_tree().
Other tasks: morphing git-upload-tar into git-upload-archive,
replacing/enhancing the protocol, and adding a matching downloader
(--remote= option handler). I won't start with that stuff until I get
the performance sorted out. I hope this keeps me from stepping on your
toes (again), Franck. ;-)
Thanks,
René
* Re: [PATCH][RFC] Add git-archive-tree
2006-09-04 18:22 ` Rene Scharfe
@ 2006-09-04 20:09 ` Junio C Hamano
2006-09-04 22:02 ` Rene Scharfe
0 siblings, 1 reply; 17+ messages in thread
From: Junio C Hamano @ 2006-09-04 20:09 UTC (permalink / raw)
To: Rene Scharfe; +Cc: Franck Bui-Huu, Git Mailing List
Rene Scharfe <rene.scharfe@lsrfire.ath.cx> writes:
> Franck Bui-Huu wrote:
>>
>> Well, I don't see why the remote operations should go in another file. I
>> was thinking more of something like this:
>>
>> git-archive --format=<fmt> [--remote=<repo>] <tree-ish> [path...]
>
> My intention was to put both halves of the wire protocol implementation
> into the same source file. I still think this is a good idea, but of
> course it's independent from any command line interface considerations.
I think so too, and agree with your reasoning.
> No, git-upload-archive would accept only a single parameter: the path to
> the repository. It'd receive all other options via the wire protocol,
> just like git-upload-tar.
Same here.
> We could remove the existing commands git-upload-tar and git-zip-tree
> right now, as they have never been part of a release. git-upload-zip
> doesn't exist yet, and git-tar-tree would probably survive as a legacy
> interface, calling git-archive-tree internals. Junio, am I correct
> regarding the cold-blooded killing of unreleased commands?
>>> git-tar-tree is now called by git-upload-tar.
I do think leaving tar-tree vs upload-tar protocol alone would
not hurt development of this new archive vs upload-archive
protocol. Also 'git-tar-tree --remote' has been part of git for
some time now (1.4.2 has it). Having said that, I suspect
nobody really relies on that, and I think this restructuring is
a good change. Time to poll the userbase?
Would people get upset if we said "git-tar-tree
--remote=xyzzy.example.com" is no more, and you will
have to say "git-fetch-archive -f tar xyzzy.example.com"?
>>> Does it make sense to change the wire protocol to simply send the
>>> command line options one by one?
>>
>> That would make sense if the number of options grows.
... or different archive backends want different kinds of things
(tar-tree without compression would not want -z6 but zip-tree
would).
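To illustrate, a backend could declare which generic options mean something to it. This is a hypothetical sketch: struct backend_caps and option_applies() are made up, and the table only records what the git-archive-tree documentation in the first patch says about -0 to -9 (no effect for tar).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-backend capabilities. */
struct backend_caps {
	const char *name;
	int takes_level;   /* 1 if -0..-9 means something to this backend */
};

static const struct backend_caps backends[] = {
	{ "tar", 0 },   /* plain tar: a compression level is meaningless */
	{ "zip", 1 },
};

/* Returns 1 if 'format' is known and, when level_given, accepts a level. */
static int option_applies(const char *format, int level_given)
{
	size_t i;

	assert(format != NULL);
	for (i = 0; i < sizeof(backends) / sizeof(backends[0]); i++)
		if (!strcmp(backends[i].name, format))
			return !level_given || backends[i].takes_level;
	return 0;
}
```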
>> But if we allow
>> pathspec for remote operations, then we need to send them to the
>> uploader.
Yup.
> If the downloader simply sends each option to the uploader, which passes
> them on without understanding them, then the uploader can be kept really
> simple. The protocol can be kept simple, too, and it would be
> future-proof: we would never have to update it. I'm a bit worried about
> the security implications of such a setup, strangely _because_ I can't
> see a way to exploit it right now (and because it allows arbitrary
> input, which we must be able to cope with anyway).
>>>
>>> The interface could be something like this:
>>>
>>> git-download-archive <repo> <git-archive-tree options...>
>>> git-upload-archive <directory>
>>>
>> I think it is, IMHO. And that's why I think we could have only one
>> command for building archives locally or remotely, whatever the format.
>> git-archive should be a main porcelain command, and we should get rid
>> of the git-{tar,zip}-tree commands.
>
> OK, makes sense; users will never need to call git-upload-archive
> directly, and having a single command for all archiving is a good thing.
So to recap:
- "git-archive-tree --format <foo> <foo specific options> <tree>"
would know how to create <foo> format archive and send the result
to stdout.
- "git-download-archive <repo> <git-archive-tree-command-line>"
would talk with "git-upload-archive" in the remote repository,
give the archive-tree command line to it, and receive the result.
- "git-upload-archive <repo>" is not used by the end user.
Underlying git-archive-tree command line options are sent
over the protocol from download-archive, just like upload-tar
does.
If this is what you mean, I think the three of us are in agreement
here.
> My next steps will be to make traverse_tree() support path specs, in
> order to achieve feature parity with read_tree_recursive(). I hope that
> the former keeps being significantly faster than the latter even after
> that.
Something that has been bothering me for some time in the pathspec
area is that we have two (eh perhaps three) quite different
pathspec semantics.
- diff-tree family (and anything based on revision.c including
git-log) is a strict prefix directory match (e.g. no
wildcards, and "Documentation/howto" matches the directory
but not "Documentation/howto-index.sh")
- ls-files family (ls-files used to be the one odd man out, but
git-grep mimics it, and "git-commit <paths>" uses ls-files
internally so they form a family) is prefix match with
fnmatch match upon wildcard (e.g. "Documentation/howto" still
matches the directory but not "Documentation/howto-index.sh",
but you can say "Documentation/howto*" to match both, and you
can even say "Document*").
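The two matching rules can be contrasted in a few lines of C. This is a simplified sketch. match_prefix_dir() and match_ls_files() are hypothetical names, not any of git's existing matchers. Wildcard handling defers to POSIX fnmatch(3) without FNM_PATHNAME, so '*' also crosses '/':

```c
#include <assert.h>
#include <fnmatch.h>
#include <string.h>

/*
 * diff-tree style: the pathspec is a literal path that matches itself and
 * anything below it, but never a longer sibling name.
 */
static int match_prefix_dir(const char *spec, const char *path)
{
	size_t len = strlen(spec);

	assert(len > 0);
	if (strncmp(spec, path, len))
		return 0;
	if (spec[len - 1] == '/')
		return 1;	/* "dir/" already ends at a boundary */
	return path[len] == '\0' || path[len] == '/';
}

/*
 * ls-files style: the same directory-prefix rule, falling back to
 * fnmatch(3) when the pathspec contains a wildcard.  Without FNM_PATHNAME,
 * '*' also matches '/'.
 */
static int match_ls_files(const char *spec, const char *path)
{
	if (match_prefix_dir(spec, path))
		return 1;
	if (strpbrk(spec, "*?["))
		return fnmatch(spec, path, 0) == 0;
	return 0;
}
```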
The former semantics is very friendly with "struct tree_desc"
based traversal. Allowing pathspecs with ls-files style
matching would probably be more useful, but more work. And there
are at least four implementations of pathspec matcher with
slightly different interfaces (ugh, sorry) if I am not mistaken:
builtin-ls-files.c has one (match), tree-diff.c has another
(interesting), builtin-grep.c has one (pathspec_matches), dir.c
has another (match_pathspec).
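The strict-prefix rule suits a "struct tree_desc" walker because, with literal pathspecs, the walker can decide from a directory's name alone whether the subtree could contain a match. If it cannot, the walker can skip reading that tree object entirely. Here is a sketch, with need_descend() as a hypothetical helper:

```c
#include <assert.h>
#include <string.h>

/*
 * With literal (diff-tree style) pathspecs, decide whether a tree walker
 * must open the non-empty subtree `dir` (no trailing slash).  Returns 1 if
 * some path below `dir` could match, 0 if the whole subtree can be skipped.
 */
static int need_descend(const char *dir, const char **specs)
{
	size_t dlen;

	assert(dir != NULL && *dir != '\0');
	if (!specs || !*specs)
		return 1;	/* no pathspec: everything matches */
	dlen = strlen(dir);
	for (; *specs; specs++) {
		const char *spec = *specs;
		size_t slen = strlen(spec);

		if (!slen)
			return 1;	/* empty pathspec: whole tree */
		/* spec names dir itself or one of its ancestors */
		if (slen <= dlen && !strncmp(spec, dir, slen) &&
		    (slen == dlen || dir[slen] == '/' || spec[slen - 1] == '/'))
			return 1;
		/* dir is an ancestor of spec, so spec lies below it */
		if (slen > dlen && !strncmp(spec, dir, dlen) && spec[dlen] == '/')
			return 1;
	}
	return 0;
}
```

A wildcard such as "Document*" does not allow this cheap decision, at least not without analyzing the literal part before the first wildcard. That is one reason ls-files style matching is more work in a tree walker.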
We might be able to share code with para-walk.c in "pu" (it is
designed to walk zero or more trees optionally with index and
working tree in parallel and we would be using it to walk only
a single tree without index or working tree). That would give me
an incentive to clean up the code ;-) Currently it is a ten-patch
"WIP" series.
One thing it attempts to do on the side is to consolidate the
diff-tree style ones into one by introducing another in
read-cache.c (pathname_included) but it does not convert
existing users of other matchers to use it yet.
* Re: [PATCH][RFC] Add git-archive-tree
2006-09-04 20:09 ` Junio C Hamano
@ 2006-09-04 22:02 ` Rene Scharfe
2006-09-04 22:20 ` Junio C Hamano
0 siblings, 1 reply; 17+ messages in thread
From: Rene Scharfe @ 2006-09-04 22:02 UTC
To: Junio C Hamano; +Cc: Franck Bui-Huu, Git Mailing List
Junio C Hamano schrieb:
> So to recap:
>
> - "git-archive-tree --format <foo> <foo specific options> <tree>"
> would know how to create <foo> format archive and send the result to
> stdout.
>
> - "git-download-archive <repo> <git-archive-tree-command-line>" would
> talk with "git-upload-archive" in the remote repository, give
> it the archive-tree command line and receive the result.
>
> - "git-upload-archive <repo>" is not used by the end user. Underlying
> git-archive-tree command line options are sent over the protocol from
> download-archive, just like upload-tar does.
>
> If this is what you mean, I think three of us are in agreement here.
Well, this is just _one_ of the positions I've taken on this topic, I
have to admit. Franck then convinced me that merging downloader and
archiver into one command is nice for users (fewer commands to remember,
keeps existing --remote option) even if it doesn't make sense
technically, because their implementations have nothing in common.
This is a bikeshed colour discussion, and I'm bad at it. I'll shut up
on this and simply follow the directions above.
>> My next steps will be to make traverse_tree() support path specs,
>> in order to achieve feature parity with read_tree_recursive(). I
>> hope that the former keeps being significantly faster than the
>> latter even after that.
>
> Something that has been bothering me for some time in the pathspec area is
> that we have two (eh perhaps three) quite different pathspec
> semantics.
>
> - diff-tree family (and anything based on revision.c including
> git-log) is a strict prefix directory match (e.g. no wildcards, and
> "Documentation/howto" matches the directory but not
> "Documentation/howto-index.sh")
>
> - ls-files family (ls-files used to be the one odd man out, but
> git-grep mimics it, and "git-commit <paths>" uses ls-files internally
> so they form a family) is prefix match with fnmatch match upon
> wildcard (e.g. "Documentation/howto" still matches the directory but
> not "Documentation/howto-index.sh", but you can say
> "Documentation/howto*" to match both, and you can even say
> "Document*").
>
> The former semantics is very friendly with "struct tree_desc" based
> traversal. Allowing pathspecs with ls-files style matching would
> probably be more useful, but more work. And there are at least four
> implementations of pathspec matcher with slightly different
> interfaces (ugh, sorry) if I am not mistaken: builtin-ls-files.c has
> one (match), tree-diff.c has another (interesting), builtin-grep.c
> has one (pathspec_matches), dir.c has another (match_pathspec).
>
> We might be able to share code with para-walk.c in "pu" (it is
> designed to walk zero or more trees optionally with index and working
> tree in parallel and we would be using it to walk only a single tree
> without index or working tree). That would give me an incentive to
> clean up the code ;-) Currently it is a ten-patch "WIP" series.
>
> One thing it attempts to do on the side is to consolidate the
> diff-tree style ones into one by introducing another in read-cache.c
> (pathname_included) but it does not convert existing users of other
> matchers to use it yet.
Interesting. OK, I'll check out the existing implementations with an
eye on consolidation and also take a look around that scary place named
'pu'. ;-)
So far I have failed to create a traverse_tree() function with
pathspec support that is also faster than read_tree_recursive().
Maybe the speed difference between 'git-tar-tree' and
'git-archive-tree -ftar' is caused by something else. I'll keep trying.
René
* Re: [PATCH][RFC] Add git-archive-tree
2006-09-04 22:02 ` Rene Scharfe
@ 2006-09-04 22:20 ` Junio C Hamano
2006-09-05 11:43 ` Franck Bui-Huu
0 siblings, 1 reply; 17+ messages in thread
From: Junio C Hamano @ 2006-09-04 22:20 UTC
To: Rene Scharfe; +Cc: git
Rene Scharfe <rene.scharfe@lsrfire.ath.cx> writes:
> Junio C Hamano schrieb:
>...
> Well, this is just _one_ of the positions I've taken on this topic, I
> have to admit. Franck then convinced me that merging downloader and
> archiver into one command is nice for users (fewer commands to remember,
> keeps existing --remote option) even if it doesn't make sense
> technically, because their implementations have nothing in common.
Ah, I was not following the thread closely; I now agree with you
and Franck that the smaller the number of commands, the better.
(IOW, download-archive in my 3-item list is not needed in the
end-user UI and can be implemented as "git-archive-tree
--remote=<repo>".)
Thanks for clarification.
* Re: [PATCH][RFC] Add git-archive-tree
2006-09-04 22:20 ` Junio C Hamano
@ 2006-09-05 11:43 ` Franck Bui-Huu
0 siblings, 0 replies; 17+ messages in thread
From: Franck Bui-Huu @ 2006-09-05 11:43 UTC
To: Junio C Hamano; +Cc: Rene Scharfe, git
2006/9/5, Junio C Hamano <junkio@cox.net>:
> Rene Scharfe <rene.scharfe@lsrfire.ath.cx> writes:
>
> > Junio C Hamano schrieb:
> >...
> > Well, this is just _one_ of the positions I've taken on this topic, I
> > have to admit. Franck then convinced me that merging downloader and
> > archiver into one command is nice for users (fewer commands to remember,
> > keeps existing --remote option) even if it doesn't make sense
> > technically, because their implementations have nothing in common.
>
> Ah, I was not following the thread closely; I now agree with you
> and Franck that the smaller the number of commands, the better.
> (IOW, download-archive in my 3-item list is not needed in the
> end-user UI and can be implemented as "git-archive-tree
> --remote=<repo>".)
>
OK, I'm sending what I've been able to do so far. It was done
before you sent your feedback a day ago, so please forgive me if,
for the moment, I called it "git-archive" instead of "git-archive-tree"
or if I missed some of your points; I'll change that.
I'm sending 2 patches, only to get some comments. They're
not for inclusion in any way.
The pathspec code has been picked up from Rene's initial patch. I don't
have enough knowledge of git's internals to improve it. But I'll be glad
to look at this later (if you have any clues for a starting point, BTW).
--
Franck
* Re: [PATCH][RFC] Add git-archive-tree
2006-09-02 13:10 ` [PATCH][RFC] Add git-archive-tree Rene Scharfe
2006-09-02 20:13 ` Franck Bui-Huu
@ 2006-09-02 21:19 ` Junio C Hamano
1 sibling, 0 replies; 17+ messages in thread
From: Junio C Hamano @ 2006-09-02 21:19 UTC
To: Rene Scharfe; +Cc: Git Mailing List, Franck Bui-Huu
Rene Scharfe <rene.scharfe@lsrfire.ath.cx> writes:
> Does it make sense to change the wire protocol to simply send the
> command line options one by one?
Which wire protocol are you talking about? The one between
upload-tar and local "tar-tree --remote"? I think that one was
a tar-tree specific hack and we do not want to mimic it.
Your idea of making archiver neutral upload/download pair makes
sense. The daemon can invoke upload-archive with a single
parameter (the repository, "."), just like upload_tar() in
"pu:daemon.c" does [*1*], and upload-archive talks with the
other end to know which archiver to run with what parameters. We
would probably want to use some sort of side-band mechanism so
that we can do progress-bar as well.
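The side-band idea can be sketched as a demultiplexer. Each pkt-line payload starts with one band byte: 1 for data, 2 for progress text and 3 for a fatal error. This is the convention upload-pack's side-band uses. The function below works on an in-memory buffer and assumes the caller's output buffers are large enough. demux() and hex4() are hypothetical names.

```c
#include <assert.h>
#include <string.h>

/* Parse the four hex digits of a pkt-line length; -1 if malformed. */
static int hex4(const char *p)
{
	int i, v = 0;

	for (i = 0; i < 4; i++) {
		char c = p[i];
		int d;

		if (c >= '0' && c <= '9')
			d = c - '0';
		else if (c >= 'a' && c <= 'f')
			d = c - 'a' + 10;
		else if (c >= 'A' && c <= 'F')
			d = c - 'A' + 10;
		else
			return -1;
		v = (v << 4) | d;
	}
	return v;
}

/*
 * Split a side-band stream: band 1 payloads are archive data (copied to
 * data), band 2 payloads are progress text (collected, NUL-terminated, in
 * progress), band 3 or anything unexpected aborts.  Stops at a flush packet.
 * Returns the number of data bytes, or -1 on error.
 */
static int demux(const char *in, size_t inlen, char *data, char *progress)
{
	size_t pos = 0, ndata = 0, nprog = 0;

	assert(data != NULL && progress != NULL);
	while (pos + 4 <= inlen) {
		int len = hex4(in + pos);

		if (len < 0)
			return -1;
		if (len == 0)
			break;	/* flush packet: end of stream */
		if (len < 5 || pos + (size_t)len > inlen)
			return -1;
		switch (in[pos + 4]) {
		case 1:
			memcpy(data + ndata, in + pos + 5, (size_t)len - 5);
			ndata += (size_t)len - 5;
			break;
		case 2:
			memcpy(progress + nprog, in + pos + 5, (size_t)len - 5);
			nprog += (size_t)len - 5;
			break;
		default:	/* band 3 carries a fatal error message */
			return -1;
		}
		pos += (size_t)len;
	}
	progress[nprog] = '\0';
	return (int)ndata;
}
```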
*1* Unrelated side note. I find myself typing "git less" quite
often ;-)
git less pu:daemon.c
and get a "no such command, dummy" response.
Yes, I know I could alias it to "-p cat-file -p". I am just
too lazy to do so.
* Re: [PATCH][RFC] Add git-archive-tree
2006-09-02 12:23 [PATCH][RFC] Add git-archive-tree Rene Scharfe
2006-09-02 12:37 ` [PATCH] Add support for tgz archive format Rene Scharfe
2006-09-02 13:10 ` [PATCH][RFC] Add git-archive-tree Rene Scharfe
@ 2006-09-02 14:17 ` Rene Scharfe
2006-09-02 15:24 ` Franck Bui-Huu
2006-09-02 21:27 ` Junio C Hamano
4 siblings, 0 replies; 17+ messages in thread
From: Rene Scharfe @ 2006-09-02 14:17 UTC
To: Git Mailing List; +Cc: Junio C Hamano, Franck Bui-Huu
Oops, forgot to sign off. Please consider both patches
(git-archive-tree and tgz support) as
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
* Re: [PATCH][RFC] Add git-archive-tree
2006-09-02 12:23 [PATCH][RFC] Add git-archive-tree Rene Scharfe
` (2 preceding siblings ...)
2006-09-02 14:17 ` Rene Scharfe
@ 2006-09-02 15:24 ` Franck Bui-Huu
2006-09-02 16:08 ` Rene Scharfe
2006-09-02 21:27 ` Junio C Hamano
4 siblings, 1 reply; 17+ messages in thread
From: Franck Bui-Huu @ 2006-09-02 15:24 UTC
To: Rene Scharfe; +Cc: Git Mailing List, Junio C Hamano
Hi,
2006/9/2, Rene Scharfe <rene.scharfe@lsrfire.ath.cx>:
> git-archive-tree is a command to make tar and ZIP archives of a git tree.
> It helps prevent a proliferation of git-{format}-tree commands. This is
> useful e.g. for remote archive fetching because we only need to write a
> single upload and a single download program that simply pass on the
> format option to git-archive-tree.
OK, Rene you beat me. I started to cook up something this morning but
had no time to go further.
I'm sending a starting implementation just because it seems complementary
to the one you sent a couple of hours ago: it supports the '--remote' option.
But it does _not_ have the path spec support you introduced. I think it's a
cool feature, but I would have to dig into git's internals to implement
it, which would have taken me a while.
From here, do you think we should import my work into your version or
vice versa?
-- >8 --
Subject: [PATCH] Add git-archive (other implementation)
Signed-off-by: Franck Bui-Huu <vagabon.xyz@gmail.com>
---
.gitignore | 1
Makefile | 1
builtin-archive.c | 171 +++++++++++++++++++++++++++++++++++++++++++++++++++
builtin.h | 1
generate-cmdlist.sh | 1
git.c | 1
6 files changed, 176 insertions(+), 0 deletions(-)
diff --git a/.gitignore b/.gitignore
index 78cb671..a3e7ca1 100644
--- a/.gitignore
+++ b/.gitignore
@@ -8,6 +8,7 @@ git-apply
git-applymbox
git-applypatch
git-archimport
+git-archive
git-bisect
git-branch
git-cat-file
diff --git a/Makefile b/Makefile
index 164dbcf..8d963e0 100644
--- a/Makefile
+++ b/Makefile
@@ -266,6 +266,7 @@ LIB_OBJS = \
BUILTIN_OBJS = \
builtin-add.o \
builtin-apply.o \
+ builtin-archive.o \
builtin-cat-file.o \
builtin-checkout-index.o \
builtin-check-ref-format.o \
diff --git a/builtin-archive.c b/builtin-archive.c
new file mode 100644
index 0000000..47898ee
--- /dev/null
+++ b/builtin-archive.c
@@ -0,0 +1,171 @@
+/*
+ * Copyright (c) 2006 Franck Bui-Huu
+ */
+#include "cache.h"
+#include "builtin.h"
+#include "exec_cmd.h"
+#include "pkt-line.h"
+
+#define MAX_ARGS 32
+
+static const char *extra_argv[MAX_ARGS];
+static int extra_argc = 1;
+static const char *base;
+static const char *remote;
+
+static const char archive_usage[] = \
+"git-archive --format=<format> [extra] [--remote=<repo>] [--prefix=<base>] <tree-ish>";
+
+/* archiver's options */
+#define NEED_BASE (1<<0)
+
+struct archiver_info {
+ const char *name;
+ int (*fn)(int argc, const char **argv, const char *prefix);
+ int options;
+};
+
+static struct archiver_info archivers[] = {
+ { "tar", cmd_tar_tree, NEED_BASE },
+ { "zip", cmd_zip_tree, NEED_BASE },
+};
+
+static int run_remote_archiver(struct archiver_info *ar, char *url)
+{
+ int fd[2], len, rv;
+ char buf[1024];
+ pid_t pid;
+
+ /*
+ * For now, remote operations do not support extra options
+ */
+ if (extra_argc > 2)
+ die("Remote operation does not support extra options yet.");
+
+ sprintf(buf, "git-upload-%s", ar->name);
+ pid = git_connect(fd, url, buf);
+ if (pid < 0)
+ return pid;
+
+ packet_write(fd[1], "want %s\n", extra_argv[1]);
+ if (base)
+ packet_write(fd[1], "base %s\n", base);
+ packet_flush(fd[1]);
+
+ len = packet_read_line(fd[0], buf, sizeof(buf));
+ if (!len)
+ die("git-archive: expected ACK/NAK, got EOF");
+ if (buf[len-1] == '\n')
+ buf[--len] = 0;
+ if (strcmp(buf, "ACK")) {
+ if (len > 5 && !strncmp(buf, "NACK ", 5))
+ die("git-archive: NACK %s", buf + 5);
+ die("git-archive: protocol error");
+ }
+ if (packet_read_line(fd[0], buf, sizeof(buf)))
+ die("git-archive: expected a flush");
+
+ /* Now, start reading from fd[0] and spit it out to stdout */
+ rv = copy_fd(fd[0], 1);
+ close(fd[0]);
+
+ return !!(rv | finish_connect(pid));
+}
+
+void insert_extra_option(const char *option, int idx)
+{
+ const char **argv = extra_argv; /* some shortcuts... */
+ int argc = extra_argc;
+
+ if (argc > MAX_ARGS - 2)
+ die("Too many args");
+ if (idx > MAX_ARGS - 1)
+ die("idx overflow extra_argv buffer !");
+
+ /* insert at last position ? */
+ if (idx < 0) {
+ argv[argc++] = option;
+ argv[argc] = NULL;
+ } else {
+ argv += idx;
+ memmove(argv + 1, argv, (argc - idx + 1) * sizeof(*argv));
+ *argv = option;
+ }
+ extra_argc++;
+}
+
+int run_archiver(const char *format)
+{
+ struct archiver_info *ar;
+ char buf[32];
+ int i;
+
+ for (i = 0; i < ARRAY_SIZE(archivers); i++) {
+ ar = &archivers[i];
+ if (strcmp(format, ar->name))
+ continue;
+ goto found;
+ }
+ return error("Unknown archive format '%s'", format);
+found:
+ if (remote) {
+ char *url = strdup(remote+9); /* --remote=<repo> */
+ return run_remote_archiver(ar, url);
+ }
+ if (base) {
+ i = 1;
+ if (ar->options & NEED_BASE) {
+ base += 9; /* --prefix=<base> */
+ i = -1;
+ }
+ insert_extra_option(base, i);
+ }
+ sprintf(buf, "%s-tree", ar->name);
+ extra_argv[0] = buf;
+
+ return ar->fn(extra_argc, extra_argv, setup_git_directory());
+}
+
+static int show_list_formats(void)
+{
+ int i;
+
+ for (i = 0; i < ARRAY_SIZE(archivers); i++) {
+ printf("%s\n", archivers[i].name);
+ }
+ return 0;
+}
+
+int cmd_archive(int argc, const char **argv, const char *prefix)
+{
+ static const char *format;
+ int i;
+
+ for (i = 1; i < argc; i++) {
+ const char *arg = argv[i];
+
+ if (!strcmp(arg, "--list") || !strcmp(arg, "-l")) {
+ return show_list_formats();
+ }
+ if (!strncmp(arg, "--format=", 9)) {
+ format = arg+9;
+ continue;
+ }
+ if (!strncmp(arg, "--prefix=", 9)) {
+ base = arg;
+ continue;
+ }
+ if (!strncmp(arg, "--remote=", 9)) {
+ remote = arg;
+ continue;
+ }
+ if (extra_argc > MAX_ARGS - 2)
+ die("Too many options");
+ extra_argv[extra_argc++] = arg;
+ }
+
+ if (!format)
+ die("You must specify an archive format");
+
+ return run_archiver(format);
+}
diff --git a/builtin.h b/builtin.h
index 25431d7..50852cd 100644
--- a/builtin.h
+++ b/builtin.h
@@ -15,6 +15,7 @@ extern int write_tree(unsigned char *sha
extern int cmd_add(int argc, const char **argv, const char *prefix);
extern int cmd_apply(int argc, const char **argv, const char *prefix);
+extern int cmd_archive(int argc, const char **argv, const char *prefix);
extern int cmd_cat_file(int argc, const char **argv, const char *prefix);
extern int cmd_checkout_index(int argc, const char **argv, const char *prefix);
extern int cmd_check_ref_format(int argc, const char **argv, const char *prefix);
diff --git a/generate-cmdlist.sh b/generate-cmdlist.sh
index ec1eda2..5450918 100755
--- a/generate-cmdlist.sh
+++ b/generate-cmdlist.sh
@@ -12,6 +12,7 @@ struct cmdname_help common_cmds[] = {"
sort <<\EOF |
add
apply
+archive
bisect
branch
checkout
diff --git a/git.c b/git.c
index 403fb3a..ddb1620 100644
--- a/git.c
+++ b/git.c
@@ -218,6 +218,7 @@ static void handle_internal_command(int
} commands[] = {
{ "add", cmd_add, RUN_SETUP },
{ "apply", cmd_apply },
+ { "archive", cmd_archive },
{ "cat-file", cmd_cat_file, RUN_SETUP },
{ "checkout-index", cmd_checkout_index, RUN_SETUP },
{ "check-ref-format", cmd_check_ref_format },
--
1.4.2.g8ed9-dirty
* Re: [PATCH][RFC] Add git-archive-tree
2006-09-02 15:24 ` Franck Bui-Huu
@ 2006-09-02 16:08 ` Rene Scharfe
0 siblings, 0 replies; 17+ messages in thread
From: Rene Scharfe @ 2006-09-02 16:08 UTC
To: Franck Bui-Huu; +Cc: Git Mailing List, Junio C Hamano
Franck Bui-Huu schrieb:
> Hi,
>
> 2006/9/2, Rene Scharfe <rene.scharfe@lsrfire.ath.cx>:
>> git-archive-tree is a command to make tar and ZIP archives of a git tree.
>> It helps prevent a proliferation of git-{format}-tree commands. This is
>> useful e.g. for remote archive fetching because we only need to write a
>> single upload and a single download program that simply pass on the
>> format option to git-archive-tree.
>
> OK, Rene you beat me. I started to cook up something this morning but
> had no time to go further.
I am sorry. My excuse is I was so angry the other night (for unrelated
reasons) that I started coding and I didn't stop until git-archive-tree
was working at three in the morning.
> I'm sending a starting implementation just because it seems complementary
> to the one you sent a couple of hours ago: it supports the '--remote' option.
> But it does _not_ have the path spec support you introduced. I think it's a
> cool feature, but I would have to dig into git's internals to implement
> it, which would have taken me a while.
I cheated by using the already existing function read_tree_recursive().
> From here, do you think we should import my work into your version or
> vice versa?
Of course I like mine better. :-) But first we need to agree on a
direction. I think it makes sense to keep the network stuff out of the
archiver, and to keep the downloader as simple ("dumb") as possible: it
should pass on any options unaltered, including the format option.
That means the wire format changes completely, but since this feature
hasn't been officially released yet, this shouldn't be a problem, right?
So, I propose to not add anything (except new formats) to
git-archive-tree and to add a separate downloader and uploader (maybe
combined in one program) instead. That way the remote archive
operations are format agnostic. OK, almost, because the uploader needs
to determine the format and compression level in order to allow or
disallow uploads based on that.
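Here is a sketch of that uploader-side check. It assumes the option spellings from the git-archive-tree synopsis (-f <format>, -0 ... -9) plus Franck's --format=. The accepted format list and the max_level cap are hypothetical site policy. Everything else would still be passed on unaltered:

```c
#include <assert.h>
#include <string.h>

/*
 * Uploader-side gate: look at the format and compression level, but do
 * not otherwise interpret the options, which are passed on unaltered.
 * Accepting only tar and zip and capping the level at max_level are
 * hypothetical policy choices for this sketch.
 */
static int upload_allowed(int argc, const char **argv, int max_level)
{
	const char *format = NULL;
	int i, level = -1;	/* -1: no explicit level requested */

	assert(argc >= 0);
	for (i = 0; i < argc; i++) {
		const char *arg = argv[i];

		if (!strcmp(arg, "-f") && i + 1 < argc)
			format = argv[++i];
		else if (!strncmp(arg, "--format=", 9))
			format = arg + 9;
		else if (arg[0] == '-' && arg[1] >= '0' && arg[1] <= '9' && !arg[2])
			level = arg[1] - '0';
	}
	if (!format || (strcmp(format, "tar") && strcmp(format, "zip")))
		return 0;
	return level <= max_level;
}
```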
What do you think?
Thanks,
René
* Re: [PATCH][RFC] Add git-archive-tree
2006-09-02 12:23 [PATCH][RFC] Add git-archive-tree Rene Scharfe
` (3 preceding siblings ...)
2006-09-02 15:24 ` Franck Bui-Huu
@ 2006-09-02 21:27 ` Junio C Hamano
2006-09-06 18:05 ` Rene Scharfe
4 siblings, 1 reply; 17+ messages in thread
From: Junio C Hamano @ 2006-09-02 21:27 UTC
To: Rene Scharfe; +Cc: git, Franck Bui-Huu
Rene Scharfe <rene.scharfe@lsrfire.ath.cx> writes:
> Currently git-archive-tree -f tar is slower than git-tar-tree. This is
> because it is welded to the side of the existing code to minimize patch
> size, and I also suspect read_tree_recursive() to be quite a bit slower
> than builtin-tar-tree.c::traverse_tree().
Yes, I suspect "struct object" and friends are very inefficient
to use for things like this. "struct tree_desc" based traverser
is much preferred.
* Re: [PATCH][RFC] Add git-archive-tree
2006-09-02 21:27 ` Junio C Hamano
@ 2006-09-06 18:05 ` Rene Scharfe
2006-09-06 21:47 ` Junio C Hamano
0 siblings, 1 reply; 17+ messages in thread
From: Rene Scharfe @ 2006-09-06 18:05 UTC
To: Junio C Hamano; +Cc: git, Franck Bui-Huu
Junio C Hamano schrieb:
> Rene Scharfe <rene.scharfe@lsrfire.ath.cx> writes:
>
>> Currently git-archive-tree -f tar is slower than git-tar-tree. This is
>> because it is welded to the side of the existing code to minimize patch
>> size, and I also suspect read_tree_recursive() to be quite a bit slower
>> than builtin-tar-tree.c::traverse_tree().
>
> Yes, I suspect "struct object" and friends are very inefficient
> to use for things like this. "struct tree_desc" based traverser
> is much preferred.
Turns out the reason for git-archive-tree -f tar being 10% slower than
git-tar-tree was a stupid memory leak in write_tar_entry(). *blush*
It caused lots of brk() calls (i.e. system calls).
In order to simplify measurement, I commented out the body of
write_entry(), which both git-archive-tree and git-tar-tree are calling
to write their output. The rest left is basically two pure tree
traversers, git-archive-tree using read_tree_recursive() and git-tar-tree
using its struct tree_desc based traverse_tree().
I then let the two chew away on the kernel repository. And as
kcachegrind impressively shows, all we do with our trees and objects is
dwarfed by inflate(). In both cases more than 96.6% of the cost lies
within libz. That's not too surprising, because archivers need to
decompress _all_ objects, not only trees.
So for git-archive we can pretty much choose which traverser to use based
on convenience.
As a second experiment I wrote a struct tree_desc based traverser plus a
matching read_tree_recursive() compatibility wrapper (included below for
reference, not for inclusion) and compared the performance of
'git-ls-tree -r -t' on the kernel repository with and without it. The
result is that the relative cost of all functions from tree.c combined
decreased from 0.93% to 0.66%. Ugh.
So while a struct tree_desc based traverser can be significantly faster
than read_tree_recursive(), as soon as you actually start to do something
to the trees that difference pales to insignificance.
René
diff --git a/tree.c b/tree.c
index ea386e5..977a4aa 100644
--- a/tree.c
+++ b/tree.c
@@ -4,6 +4,7 @@ #include "blob.h"
#include "commit.h"
#include "tag.h"
#include "tree-walk.h"
+#include "strbuf.h"
#include <stdlib.h>
const char *tree_type = "tree";
@@ -227,3 +228,99 @@ struct tree *parse_tree_indirect(const u
parse_object(obj->sha1);
} while (1);
}
+
+static int do_read_tree_recursive_light(struct tree_desc *desc,
+ struct strbuf *base,
+ const char **match, read_tree_fn_t fn)
+{
+ struct name_entry entry;
+ int err = 0;
+ int baselen = base->len;
+
+ while (tree_entry(desc, &entry)) {
+ if (!match_tree_entry(base->buf, base->len, entry.path, entry.mode, match))
+ continue;
+
+ err = fn(entry.sha1, base->buf, base->len, entry.path, entry.mode, 0);
+ switch (err) {
+ case 0:
+ continue;
+ case READ_TREE_RECURSIVE:
+ break;
+ default:
+ return -1;
+ }
+
+ if (S_ISDIR(entry.mode)) {
+ struct tree_desc subtree;
+ char type[20];
+ void *buf;
+ int newbaselen;
+
+ buf = read_sha1_file(entry.sha1, type, &subtree.size);
+ if (!buf)
+ return error("cannot read %s",
+ sha1_to_hex(entry.sha1));
+ if (strcmp(type, tree_type)) {
+ free(buf);
+ return error("Object %s not a tree",
+ sha1_to_hex(entry.sha1));
+ }
+ subtree.buf = buf;
+
+ newbaselen = baselen + entry.pathlen + 1;
+ if (newbaselen > base->alloc) {
+ base->buf = xrealloc(base->buf, newbaselen);
+ base->alloc = newbaselen;
+ }
+ memcpy(base->buf + baselen, entry.path, entry.pathlen);
+ base->buf[baselen + entry.pathlen] = '/';
+ base->len = newbaselen;
+
+ err = do_read_tree_recursive_light(&subtree,
+ base,
+ match, fn);
+ base->len = baselen;
+ free(buf);
+ if (err)
+ break;
+ }
+ }
+
+ return err;
+}
+
+int read_tree_recursive_light(struct tree *tree,
+ const char *base, int baselen, int stage,
+ const char **match, read_tree_fn_t fn)
+{
+ unsigned char *sha1 = tree->object.sha1;
+ struct tree_desc desc;
+ char type[20];
+ void *buf;
+ int err;
+ struct strbuf sb;
+
+ desc.buf = buf = read_sha1_file(sha1, type, &desc.size);
+ if (!buf)
+ return error("Could not read %s", sha1_to_hex(sha1));
+ if (strcmp(type, tree_type)) {
+ free(buf);
+ return error("Object %s not a tree", sha1_to_hex(sha1));
+ }
+
+ sb.buf = xmalloc(PATH_MAX);
+ sb.alloc = PATH_MAX;
+ sb.len = 0;
+ if (baselen > sb.alloc) {
+ sb.buf = xrealloc(sb.buf, baselen);
+ sb.alloc = baselen;
+ }
+ memcpy(sb.buf, base, baselen);
+ sb.len = baselen;
+
+ err = do_read_tree_recursive_light(&desc, &sb, match, fn);
+ free(buf);
+ free(sb.buf);
+ return err;
+}
diff --git a/tree.h b/tree.h
index dd25c53..2294bc2 100644
--- a/tree.h
+++ b/tree.h
@@ -30,4 +30,8 @@ extern int read_tree_recursive(struct tr
extern int read_tree(struct tree *tree, int stage, const char **paths);
+extern int read_tree_recursive_light(struct tree *tree,
+ const char *base, int baselen, int stage,
+ const char **match, read_tree_fn_t fn);
+
#endif /* TREE_H */
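A "struct tree_desc" walker is light because it decodes entries directly from the inflated tree buffer, rather than allocating a "struct object" per entry. Each entry in a raw tree object has the form "<octal mode> <path>\0<20-byte sha1>". Below is a standalone sketch of that decoding step. next_raw_entry() and struct raw_entry are illustrative names, not tree-walk.c's API.

```c
#include <assert.h>
#include <string.h>

/* One tree entry, pointing into the raw (already inflated) tree buffer. */
struct raw_entry {
	unsigned mode;
	const char *path;		/* NUL-terminated inside the buffer */
	const unsigned char *sha1;	/* 20 raw bytes */
};

/*
 * Decode the next "<octal mode> <path>\0<20-byte sha1>" record and advance
 * *buf and *size past it.  Returns 1 on success, 0 at the end of the
 * buffer, -1 if the data is corrupt.  Nothing is allocated per entry.
 */
static int next_raw_entry(const char **buf, size_t *size, struct raw_entry *e)
{
	const char *start, *end, *p, *nul;
	unsigned mode = 0;

	assert(buf && size && e);
	start = p = *buf;
	end = start + *size;
	if (p == end)
		return 0;
	while (p < end && *p >= '0' && *p <= '7')
		mode = (mode << 3) | (unsigned)(*p++ - '0');
	if (p == start || p == end || *p++ != ' ')
		return -1;
	nul = memchr(p, '\0', (size_t)(end - p));
	if (!nul || (size_t)(end - nul) < 21)
		return -1;
	e->mode = mode;
	e->path = p;
	e->sha1 = (const unsigned char *)nul + 1;
	*size -= (size_t)(nul + 21 - start);
	*buf = nul + 21;
	return 1;
}
```

The measurements above show that this saving is real, but small next to the cost of calling inflate() on every object.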
^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [PATCH][RFC] Add git-archive-tree
2006-09-06 18:05 ` Rene Scharfe
@ 2006-09-06 21:47 ` Junio C Hamano
2006-09-17 11:54 ` Rene Scharfe
0 siblings, 1 reply; 17+ messages in thread
From: Junio C Hamano @ 2006-09-06 21:47 UTC
To: Rene Scharfe; +Cc: git, Franck Bui-Huu
Rene Scharfe <rene.scharfe@lsrfire.ath.cx> writes:
> I then let the two chew away on the kernel repository. And as
> kcachegrind impressively shows, all we do with our trees and objects is
> dwarfed by inflate().
The diff output codepath has logic that says "if the blob we
are dealing with has the same object name as the corresponding
blob in the index, and if the index entry is clean (i.e. it is
known that the file sitting in the working tree matches the
blob), then do not inflate() but use data from that file
instead". It was originally done that way because we used to
prepare temporary files out of blobs and feed them to GNU diff,
but I think it is still kept that way. I wonder if doing the
same may be cheaper for archivers.
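The decision that codepath makes can be sketched as follows. This is a simplification. struct idx_entry and struct wt_stat are hypothetical, and "clean" is reduced to comparing size and mtime. Git's real check compares more stat fields and also has to handle racily-clean entries.

```c
#include <assert.h>
#include <string.h>
#include <time.h>

/* Hypothetical, trimmed-down index entry: object name plus cached stat data. */
struct idx_entry {
	unsigned char sha1[20];
	long long size;
	time_t mtime;
};

/* Stat data observed for the working-tree file (e.g. from stat(2)). */
struct wt_stat {
	long long size;
	time_t mtime;
};

enum blob_source { FROM_OBJECT_DB, FROM_WORKTREE };

/*
 * If the tree entry names the same blob as the index entry and the file's
 * stat data still matches what the index recorded, read the file and skip
 * inflate(); otherwise fall back to the object database.
 */
static enum blob_source choose_source(const unsigned char *tree_sha1,
				      const struct idx_entry *ce,
				      const struct wt_stat *st)
{
	assert(tree_sha1 != NULL);
	if (!ce || !st)
		return FROM_OBJECT_DB;	/* not in the index, or no file on disk */
	if (memcmp(tree_sha1, ce->sha1, 20))
		return FROM_OBJECT_DB;	/* the tree has a different version */
	if (st->size != ce->size || st->mtime != ce->mtime)
		return FROM_OBJECT_DB;	/* the file may have been modified */
	return FROM_WORKTREE;
}
```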
* Re: [PATCH][RFC] Add git-archive-tree
2006-09-06 21:47 ` Junio C Hamano
@ 2006-09-17 11:54 ` Rene Scharfe
0 siblings, 0 replies; 17+ messages in thread
From: Rene Scharfe @ 2006-09-17 11:54 UTC
To: Junio C Hamano; +Cc: git, Franck Bui-Huu
Junio C Hamano schrieb:
> Rene Scharfe <rene.scharfe@lsrfire.ath.cx> writes:
>
>> I then let the two chew away on the kernel repository. And as
>> kcachegrind impressively shows, all we do with our trees and
>> objects is dwarfed by inflate().
>
> The diff output codepath has logic that says "if the blob we are
> dealing with has the same object name as the corresponding blob in
> the index, and if the index entry is clean (i.e. it is known that the
> file sitting in the working tree matches the blob), then do not
> inflate() but use data from that file instead".
Nice idea. The tree traverser would need to provide the filenames
relative to the current working directory in addition to the
filenames as they are written to the archive. I guess your para-walk
tree walker could be useful here. I sadly haven't found the time to
look at it yet, and now it has even vanished from the pu branch.
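For the working-tree shortcut, each entry needs two names. The first is the archive name: the --prefix followed by the path from the tree. The second is a name that can be opened from the current directory. Below is a sketch of the second. It assumes the caller knows its position inside the working tree in the usual "sub/dir/" prefix form. worktree_name() is a hypothetical helper.

```c
#include <assert.h>
#include <string.h>

/*
 * prefix: current directory relative to the top of the working tree,
 * either "" or ending in '/', e.g. "sub/dir/".  path: an entry's path
 * relative to the top, as the tree walker reports it.  Writes a name usable
 * from the current directory into out.  Returns 0, or -1 if out is too small.
 */
static int worktree_name(char *out, size_t outsz, const char *prefix, const char *path)
{
	size_t plen = strlen(prefix), used = 0;
	const char *slash;

	assert(plen == 0 || prefix[plen - 1] == '/');
	/* Skip leading directories shared by prefix and path. */
	while ((slash = strchr(prefix, '/')) != NULL) {
		size_t len = (size_t)(slash - prefix) + 1;

		if (strncmp(prefix, path, len))
			break;
		prefix += len;
		path += len;
	}
	/* Climb out of every directory left in the prefix. */
	for (; *prefix; prefix++) {
		if (*prefix != '/')
			continue;
		if (used + 3 > outsz)
			return -1;
		memcpy(out + used, "../", 3);
		used += 3;
	}
	if (used + strlen(path) + 1 > outsz)
		return -1;
	strcpy(out + used, path);
	return 0;
}
```

With the prefix "sub/dir/", the tree path "a/b.c" becomes "../../a/b.c" and "sub/x.c" becomes "../x.c". The archive name is unaffected by this conversion.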
Reading a file is an order of magnitude faster than inflating the same
data, at least that's what I guess from comparing the runtimes of
git-tar-tree and tar. _However_, this doesn't account for I/O costs (in
my tests the repo and all checked-out files were cache hot) or for any
compression that would certainly be applied to the resulting archive.
So the full runtime of archive creation wouldn't be that much shorter.
René
Thread overview: 17+ messages
2006-09-02 12:23 [PATCH][RFC] Add git-archive-tree Rene Scharfe
2006-09-02 12:37 ` [PATCH] Add support for tgz archive format Rene Scharfe
2006-09-02 13:10 ` [PATCH][RFC] Add git-archive-tree Rene Scharfe
2006-09-02 20:13 ` Franck Bui-Huu
2006-09-04 18:22 ` Rene Scharfe
2006-09-04 20:09 ` Junio C Hamano
2006-09-04 22:02 ` Rene Scharfe
2006-09-04 22:20 ` Junio C Hamano
2006-09-05 11:43 ` Franck Bui-Huu
2006-09-02 21:19 ` Junio C Hamano
2006-09-02 14:17 ` Rene Scharfe
2006-09-02 15:24 ` Franck Bui-Huu
2006-09-02 16:08 ` Rene Scharfe
2006-09-02 21:27 ` Junio C Hamano
2006-09-06 18:05 ` Rene Scharfe
2006-09-06 21:47 ` Junio C Hamano
2006-09-17 11:54 ` Rene Scharfe