* [PATCH] Add builtin dump command, to query a repository using a pipe.
@ 2008-01-26 0:24 Han-Wen Nienhuys
2008-01-26 9:35 ` Junio C Hamano
0 siblings, 1 reply; 3+ messages in thread
From: Han-Wen Nienhuys @ 2008-01-26 0:24 UTC (permalink / raw)
To: git
This command prints either contents or metadata on stdout for SHA1s or
symbolic object names supplied to its stdin, so you can extract many
pieces of data of the repository using a single subprocess.
Documentation will follow once the code is approved.
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
Makefile | 1 +
builtin-dump.c | 100 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
builtin.h | 1 +
git.c | 1 +
4 files changed, 103 insertions(+), 0 deletions(-)
create mode 100644 builtin-dump.c
diff --git a/Makefile b/Makefile
index 21c80e6..63a26a7 100644
--- a/Makefile
+++ b/Makefile
@@ -339,6 +339,7 @@ BUILTIN_OBJS = \
builtin-diff-files.o \
builtin-diff-index.o \
builtin-diff-tree.o \
+ builtin-dump.o \
builtin-fast-export.o \
builtin-fetch.o \
builtin-fetch-pack.o \
diff --git a/builtin-dump.c b/builtin-dump.c
new file mode 100644
index 0000000..813c7ab
--- /dev/null
+++ b/builtin-dump.c
@@ -0,0 +1,100 @@
+/*
+ * Copyright (C) 2008 Google Inc.
+ * Author: hanwen@google.com
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
+ */
+
+#include "cache.h"
+#include "exec_cmd.h"
+#include "object.h"
+#include "builtin.h"
+
+/*
+ format:
+
+ 'show' SP type SP objname LF
+ =>
+ 'len' number LF
+ contents LF
+
+ 'metadata' SP objname LF
+ =>
+ type SP size SP sha1 LF
+
+ for non-existent objects, 'metadata' prints
+
+ 'none' SP 0 SP null-hash LF
+
+*/
+
+typedef unsigned char sha1_t[20];
+
+void print_metadata(struct strbuf command_buf) {
+ char *obj_name = strchr(command_buf.buf, ' ') + 1;
+ sha1_t sha1 = {0};
+ const char *type_name = "none";
+ unsigned long size = 0;
+ char header[200];
+
+ if (!get_sha1(obj_name, sha1)) {
+ enum object_type type = sha1_object_info(sha1, &size);
+ type_name = type ? typename(type) : "none";
+ }
+ sprintf(header, "%s %lu %s\n", type_name, size, sha1_to_hex(sha1));
+ write_or_die(1, header, strlen(header));
+}
+
+
+void print_show(struct strbuf command_buf) {
+ char *type = strchr(command_buf.buf, ' ') + 1;
+ char *obj_name;
+ sha1_t sha1;
+ unsigned long size;
+ char header[200];
+ void *buf;
+
+ if (!type)
+ die("commmand format error: %s\n", command_buf.buf);
+
+ obj_name = strchr(type, ' ');
+ if (!obj_name)
+ die("commmand format error: %s\n", command_buf.buf);
+ *obj_name = '\0';
+ obj_name++;
+
+ if (get_sha1(obj_name, sha1))
+ die("not a valid object name: %s", obj_name);
+
+ buf = read_object_with_reference(sha1, type, &size, NULL);
+ sprintf(header, "len %lu\n", size);
+ write_or_die(1, header, strlen(header));
+ write_or_die(1, buf, size);
+ write_or_die(1, "\n", 1);
+}
+
+
+int cmd_dump(int argc, const char **argv, const char *prefix) {
+ struct strbuf command_buf = STRBUF_INIT;
+ while (strbuf_getline(&command_buf, stdin, '\n') != EOF) {
+ if (!prefixcmp(command_buf.buf, "metadata "))
+ print_metadata(command_buf);
+ else if (!prefixcmp(command_buf.buf, "print "))
+ print_show(command_buf);
+ else
+ die("unknown command %s", command_buf.buf);
+ }
+ return 0;
+}
diff --git a/builtin.h b/builtin.h
index cb675c4..df873f9 100644
--- a/builtin.h
+++ b/builtin.h
@@ -33,6 +33,7 @@ extern int cmd_diff_files(int argc, const char **argv, const char *prefix);
extern int cmd_diff_index(int argc, const char **argv, const char *prefix);
extern int cmd_diff(int argc, const char **argv, const char *prefix);
extern int cmd_diff_tree(int argc, const char **argv, const char *prefix);
+extern int cmd_dump(int argc, const char **argv, const char *prefix);
extern int cmd_fast_export(int argc, const char **argv, const char *prefix);
extern int cmd_fetch(int argc, const char **argv, const char *prefix);
extern int cmd_fetch_pack(int argc, const char **argv, const char *prefix);
diff --git a/git.c b/git.c
index 15fec89..853dbc1 100644
--- a/git.c
+++ b/git.c
@@ -303,6 +303,7 @@ static void handle_internal_command(int argc, const char **argv)
{ "diff-files", cmd_diff_files },
{ "diff-index", cmd_diff_index, RUN_SETUP },
{ "diff-tree", cmd_diff_tree, RUN_SETUP },
+ { "dump", cmd_dump, RUN_SETUP },
{ "fast-export", cmd_fast_export, RUN_SETUP },
{ "fetch", cmd_fetch, RUN_SETUP },
{ "fetch-pack", cmd_fetch_pack, RUN_SETUP },
--
1.5.3.5.643.g40e25
^ permalink raw reply related [flat|nested] 3+ messages in thread
* Re: [PATCH] Add builtin dump command, to query a repository using a pipe.
2008-01-26 0:24 [PATCH] Add builtin dump command, to query a repository using a pipe Han-Wen Nienhuys
@ 2008-01-26 9:35 ` Junio C Hamano
2008-01-26 9:46 ` Junio C Hamano
0 siblings, 1 reply; 3+ messages in thread
From: Junio C Hamano @ 2008-01-26 9:35 UTC (permalink / raw)
To: Han-Wen Nienhuys; +Cc: git
Han-Wen Nienhuys <hanwen@xs4all.nl> writes:
> This command prints either contents or metadata on stdout for SHA1s or
> symbolic object names supplied to its stdin, so you can extract many
> pieces of data of the repository using a single subprocess.
>
> Documentation will follow once the code is approved.
I will comment on this part at the end.
> +typedef unsigned char sha1_t[20];
Hmph. We do not seem to do typedefs like this elsewhere. Would
it be worth doing so everywhere? I generally stay away from a
new type that is "an array of some type".
> +void print_metadata(struct strbuf command_buf) {
Style. "{" at the beginning of a function goes to the beginning
of a line.
Are you sure you want to pass "struct strbuf" by value? As you
do not use anything other than its .buf, you may instead want to
make the parameter "const char *" and have the caller do buf.buf
business.
> + char *obj_name = strchr(command_buf.buf, ' ') + 1;
Wasts reader's time by making him wonder if this needs to check
for malformed command buffer. The answer happens to be no only
because the caller does something stupid ;-)
if (!prefixcmp(command_buf.buf, "metadata "))
print_metadata(command_buf);
and the strchr is guaranteed to find that ' ' after metadata.
if (!prefixcmp(command_buf.buf, "metadata "))
print_metadata(command_buf.buf + 9);
would give you the beginning of obj_name in the parameter
without any need for the strchr().
> + sha1_t sha1 = {0};
> + const char *type_name = "none";
> + unsigned long size = 0;
> + char header[200];
> +
> + if (!get_sha1(obj_name, sha1)) {
> + enum object_type type = sha1_object_info(sha1, &size);
> + type_name = type ? typename(type) : "none";
> + }
> + sprintf(header, "%s %lu %s\n", type_name, size, sha1_to_hex(sha1));
> + write_or_die(1, header, strlen(header));
> +}
> +
> +
> +void print_show(struct strbuf command_buf) {
> + char *type = strchr(command_buf.buf, ' ') + 1;
Likewise.
> + char *obj_name;
> + sha1_t sha1;
> + unsigned long size;
> + char header[200];
> + void *buf;
> +
> + if (!type)
> + die("commmand format error: %s\n", command_buf.buf);
Especially, this will never trigger, although "if (!*type)" would.
> +
> + obj_name = strchr(type, ' ');
> + if (!obj_name)
> + die("commmand format error: %s\n", command_buf.buf);
I am not sure if this interface that "die()"s on an obviously
malformed input is nice to the user (which is the process at the
other end of the pipe). It only gets abrupt disconnection
without any diagnosis. But the user may deserve it, as it is
about "obviously malformed" input. I do not have strong opinion
about this.
> + *obj_name = '\0';
> + obj_name++;
> +
> + if (get_sha1(obj_name, sha1))
> + die("not a valid object name: %s", obj_name);
> +
> + buf = read_object_with_reference(sha1, type, &size, NULL);
This is not nice and definitely needs to be tightened for a case
where the object does not exist in this repository. Otherwise
safe use of this dump "server" will require one "metadata" query
to protect "print" query from barfing for each object.
> + sprintf(header, "len %lu\n", size);
> + write_or_die(1, header, strlen(header));
> + write_or_die(1, buf, size);
> + write_or_die(1, "\n", 1);
> +}
IOW, this needs an if() statement that triggers upon (buf == NULL)
to return an error message back to the other end.
Now, back to the proposed commit log message.
> Documentation will follow once the code is approved.
This sends a wrong message to the list audiences. It roughly
says:
I wrote something new. It is used to read many pieces of
data without having to spawn a subprocesses per request.
Isn't it nice?
Oh, by the way, I am not giving you more than the absolute
minimum information right now, the code, to let you become
qualified to say "Ah, yes, that new feature makes sense, it
meshes well with the way how the rest of the system works,
and I think it is a good addition to the git.git project".
If you really want to know how it interfaces to the outside
world (i.e. how the input is structured is parsed, and how
the output is structured and is designed to be parsed, how
errors are reported), go read it.
But trust me, it's a good code, and you will have
documentation once it is accepted anyway.
But the external interface matters. It often matters more than
the details of how the interface is implemented (which is the
code). If the interface is wrong, time spent on reading and
commenting on the code is often wasted. I had to read all the
code to notice that I may not necessarily agree with the
response of the command to a malformed input, or missing
objects. If it was documented, it would have been more obvious
and many other people would have responded, including the ones
who may not be inclined to work on or review dump "server"
implementation side written in C, but are _very much_ interested
in using such a server as a client, perhaps from their
higher-level language scripts.
If the patch was about a completely new command and sizeable, it
is very likely that I might have felt defeated and said "Ah, Ok,
I choose to stay unqualified to judge this patch to say 'yes,
this is a good idea', if I have to wade through that amount of
code". Most likely many others would do the same.
Please don't do that again. It also gives a false impression
that this kind of submission is an appropriate and acceptable
approach to people new to the list.
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: [PATCH] Add builtin dump command, to query a repository using a pipe.
2008-01-26 9:35 ` Junio C Hamano
@ 2008-01-26 9:46 ` Junio C Hamano
0 siblings, 0 replies; 3+ messages in thread
From: Junio C Hamano @ 2008-01-26 9:46 UTC (permalink / raw)
To: Han-Wen Nienhuys; +Cc: git
Another thing.
As currently written, the "dump" server is strictly one-request,
one-response interface.
I can easily see the command set to be extended to allow
transfer of richer set of data (e.g. whether a ref is a symref
and if so where it points at, querying config variables). Also,
if you want to write a repository browser using this "dump"
server, you would first throw a commit at it, learn its parents
and its tree, and then you suddenly have _many_ objects that you
can simultaneously ask about without doing one-request,
one-response exchange.
So it might make sense to base this not on line-per-line message
format using strbuf_getline() interface, but base it instead on
packet_read_line()/packet_write() interface, and explicitly mark
the end of a batch of request with packet_flush(). That would
allow the "protocol" stream, which would matter even more when
you run this over the network, perhaps via git-daemon and/or
from connect.c
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2008-01-26 9:47 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-01-26 0:24 [PATCH] Add builtin dump command, to query a repository using a pipe Han-Wen Nienhuys
2008-01-26 9:35 ` Junio C Hamano
2008-01-26 9:46 ` Junio C Hamano
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).