* [PATCHv3 1/9] Update technical docs to reflect side-band-64k capability in receive-pack
2011-05-15 21:37 ` [PATCHv3 0/9] Push limits Johan Herland
@ 2011-05-15 21:37 ` Johan Herland
2011-05-15 21:37 ` [PATCHv3 2/9] send-pack: Attempt to retrieve remote status even if pack-objects fails Johan Herland
` (8 subsequent siblings)
9 siblings, 0 replies; 53+ messages in thread
From: Johan Herland @ 2011-05-15 21:37 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Shawn Pearce, Johan Herland, git
Signed-off-by: Johan Herland <johan@herland.net>
---
Documentation/technical/pack-protocol.txt | 3 ++-
Documentation/technical/protocol-capabilities.txt | 4 ++--
2 files changed, 4 insertions(+), 3 deletions(-)
diff --git a/Documentation/technical/pack-protocol.txt b/Documentation/technical/pack-protocol.txt
index 369f91d..4a68f0f 100644
--- a/Documentation/technical/pack-protocol.txt
+++ b/Documentation/technical/pack-protocol.txt
@@ -391,7 +391,8 @@ The reference discovery phase is done nearly the same way as it is in the
fetching protocol. Each reference obj-id and name on the server is sent
in packet-line format to the client, followed by a flush-pkt. The only
real difference is that the capability listing is different - the only
-possible values are 'report-status', 'delete-refs' and 'ofs-delta'.
+possible values are 'report-status', 'delete-refs', 'side-band-64k' and
+'ofs-delta'.
Reference Update Request and Packfile Transfer
----------------------------------------------
diff --git a/Documentation/technical/protocol-capabilities.txt b/Documentation/technical/protocol-capabilities.txt
index b15517f..b732e80 100644
--- a/Documentation/technical/protocol-capabilities.txt
+++ b/Documentation/technical/protocol-capabilities.txt
@@ -21,8 +21,8 @@ NOT advertise capabilities it does not understand.
The 'report-status' and 'delete-refs' capabilities are sent and
recognized by the receive-pack (push to server) process.
-The 'ofs-delta' capability is sent and recognized by both upload-pack
-and receive-pack protocols.
+The 'side-band-64k' and 'ofs-delta' capabilities are sent and
+recognized by both upload-pack and receive-pack protocols.
All other capabilities are only recognized by the upload-pack (fetch
from server) process.
--
1.7.5.rc1.3.g4d7b
^ permalink raw reply related [flat|nested] 53+ messages in thread
* [PATCHv3 2/9] send-pack: Attempt to retrieve remote status even if pack-objects fails
2011-05-15 21:37 ` [PATCHv3 0/9] Push limits Johan Herland
2011-05-15 21:37 ` [PATCHv3 1/9] Update technical docs to reflect side-band-64k capability in receive-pack Johan Herland
@ 2011-05-15 21:37 ` Johan Herland
2011-05-16 4:07 ` Jeff King
2011-05-15 21:37 ` [PATCHv3 3/9] pack-objects: Allow --max-pack-size to be used together with --stdout Johan Herland
` (7 subsequent siblings)
9 siblings, 1 reply; 53+ messages in thread
From: Johan Herland @ 2011-05-15 21:37 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Shawn Pearce, Johan Herland, git, Jeff King, Johannes Sixt
When pushing, send-pack uses pack-objects to write the pack data to the
receive-pack process running on the remote end. The scenarios where
pack-objects dies unexpectedly can be roughly divided based on whether the
reason for the failure is _local_ (i.e. something in pack-objects caused
it to fail of its own accord), or _remote_ (i.e. something in the remote
receive-pack process caused it to fail, leaving the local pack-objects
process with a broken pipe).
If the reason for the failure is local, we expect pack-objects to report
an appropriate error message to the user.
However, if the reason for the failure is remote, pack-objects will merely
abort because of the broken pipe, and the user is left with no clue as to
the reason why the remote receive-pack process died.
In certain cases, though, the receive-pack process on the other end may have
produced an error message immediately before exiting. This error message may
be currently waiting to be read by the local send-pack process.
Therefore, we should try to read from the remote end, even when pack-objects
dies unexpectedly. We accomplish this by _always_ calling receive_status()
after pack_objects(). If the remote end managed to produce a well-formed
status report before exiting, then receive_status() simply presents that to
the user. Even if the data from the remote end cannot be understood by
receive_status(), it will print that data as part of its error message. In
any case, we give the user as much information about the failure as possible.
Signed-off-by: Johan Herland <johan@herland.net>
---
I first wrote this patch on a base where e07fd15 (Peff's "send-pack:
unbreak push over stateless rpc") was not present, and then resolved
a conflict when rebasing this patch onto current master. I hope Peff
or Johannes (Sixt) can verify that my patch does not reintroduce the
deadlock they fixed.
...Johan
builtin/send-pack.c | 18 +++++-------------
1 files changed, 5 insertions(+), 13 deletions(-)
diff --git a/builtin/send-pack.c b/builtin/send-pack.c
index 4ac2ca9..f571917 100644
--- a/builtin/send-pack.c
+++ b/builtin/send-pack.c
@@ -251,7 +251,7 @@ int send_pack(struct send_pack_args *args,
int status_report = 0;
int use_sideband = 0;
unsigned cmds_sent = 0;
- int ret;
+ int ret = 0;
struct async demux;
/* Does the other end support the reporting? */
@@ -339,23 +339,15 @@ int send_pack(struct send_pack_args *args,
}
if (new_refs && cmds_sent) {
- if (pack_objects(out, remote_refs, extra_have, args) < 0) {
- for (ref = remote_refs; ref; ref = ref->next)
- ref->status = REF_STATUS_NONE;
- if (args->stateless_rpc)
- close(out);
- if (use_sideband)
- finish_async(&demux);
- return -1;
- }
+ ret = pack_objects(out, remote_refs, extra_have, args);
+ if (ret && args->stateless_rpc)
+ close(out);
}
if (args->stateless_rpc && cmds_sent)
packet_flush(out);
if (status_report && cmds_sent)
- ret = receive_status(in, remote_refs);
- else
- ret = 0;
+ ret |= receive_status(in, remote_refs);
if (args->stateless_rpc)
packet_flush(out);
--
1.7.5.rc1.3.g4d7b
* Re: [PATCHv3 2/9] send-pack: Attempt to retrieve remote status even if pack-objects fails
2011-05-15 21:37 ` [PATCHv3 2/9] send-pack: Attempt to retrieve remote status even if pack-objects fails Johan Herland
@ 2011-05-16 4:07 ` Jeff King
2011-05-16 6:13 ` Jeff King
0 siblings, 1 reply; 53+ messages in thread
From: Jeff King @ 2011-05-16 4:07 UTC (permalink / raw)
To: Johan Herland; +Cc: Junio C Hamano, Shawn Pearce, git, Johannes Sixt
On Sun, May 15, 2011 at 11:37:13PM +0200, Johan Herland wrote:
> I first wrote this patch on a base where e07fd15 (Peff's "send-pack:
> unbreak push over stateless rpc") was not present, and then resolved
> a conflict when rebasing this patch onto current master. I hope Peff
> or Johannes (Sixt) can verify that my patch does not reintroduce the
> deadlock they fixed.
I don't think it reintroduces the deadlock we fixed, but I am worried
that it produces a new, similar one. That is, imagine pack-objects fails
horribly, maybe or maybe not producing any output. We close its
outgoing pipe at the end of run_command, and per Johannes' 09c9957, we
are sure that the sideband demuxer does not hold a pipe end open,
either.
So the remote side sees us close our end of the pipe, knows there is no
more pack data, and then closes their end. So our receive_status should
either get some error message, or EOF, either of which is fine. So no
deadlock there. Essentially, we have done a half-duplex shutdown of the
connection to the remote, and that is enough for everybody to keep
going.
But what if we are not using pipes, but have an actual TCP socket? In
that case, I'm not sure what happens. We don't seem to do a half-duplex
shutdown() anywhere. So I'm concerned that we are still open for sending
from the remote's perspective, and we may deadlock.
However, that would not necessarily be something introduced by your
patch; you would deadlock in receive_status, but prior to that it would
deadlock in the sideband demuxer.
AFAICT, the only way to have an actual TCP connection instead of pipes
is for the push to go over git://, which is enabled almost nowhere. But
we should perhaps check for deadlock on failed pack-objects in that
case, both with and without your patch.
-Peff
* Re: [PATCHv3 2/9] send-pack: Attempt to retrieve remote status even if pack-objects fails
2011-05-16 4:07 ` Jeff King
@ 2011-05-16 6:13 ` Jeff King
2011-05-16 6:39 ` Jeff King
0 siblings, 1 reply; 53+ messages in thread
From: Jeff King @ 2011-05-16 6:13 UTC (permalink / raw)
To: Johan Herland; +Cc: Junio C Hamano, Shawn Pearce, git, Johannes Sixt
On Mon, May 16, 2011 at 12:07:45AM -0400, Jeff King wrote:
> But what if we are not using pipes, but have an actual TCP socket? In
> that case, I'm not sure what happens. We don't seem to do a half-duplex
> shutdown() anywhere. So I'm concerned that we are still open for sending
> from the remote's perspective, and we may deadlock.
>
> However, that would not necessarily be something introduced by your
> patch; you would deadlock in receive_status, but prior to that it would
> deadlock in the sideband demuxer.
>
> AFAICT, the only way to have an actual TCP connection instead of pipes
> is for the push to go over git://, which is enabled almost nowhere. But
> we should perhaps check for deadlock on failed pack-objects in that
> case, both with and without your patch.
Ugh, yeah, yet another deadlock. I can reproduce reliably with this:
[in one terminal]
mkdir daemon &&
git init --bare daemon/repo.git &&
git --git-dir=daemon/repo.git config daemon.receivepack true &&
git daemon --base-path=$PWD/daemon --export-all --verbose
[in another]
git init repo &&
cd repo &&
git remote add origin git://localhost/repo.git &&
echo content >file && git add file && git commit -a -m one &&
git push -f origin HEAD &&
echo content >>file && git commit -a -m two &&
sha1=`git rev-parse HEAD:file` &&
file=`echo $sha1 | sed 's,..,&/,'` &&
rm -fv .git/objects/$file &&
git push
and this patch fixes it:
diff --git a/builtin/send-pack.c b/builtin/send-pack.c
index e2f4e21..b9da044 100644
--- a/builtin/send-pack.c
+++ b/builtin/send-pack.c
@@ -345,6 +345,13 @@ int send_pack(struct send_pack_args *args,
ref->status = REF_STATUS_NONE;
if (args->stateless_rpc)
close(out);
+ /* in case we actually have a full-duplex socket
+ * and not two pipes; we can't use "out" because
+ * it has been closed already, but in the full-duplex
+ * case, "in" and "out" are merely dups of each other.
+ * We can't directly use "in" because it may be
+ * pointing to the sideband demuxer now */
+ shutdown(fd[0], SHUT_WR);
if (use_sideband)
finish_async(&demux);
return -1;
It does call shutdown() on a non-socket in the pipe case. That should be
a harmless noop, AFAIK.
-Peff
* Re: [PATCHv3 2/9] send-pack: Attempt to retrieve remote status even if pack-objects fails
2011-05-16 6:13 ` Jeff King
@ 2011-05-16 6:39 ` Jeff King
2011-05-16 6:46 ` [PATCH 1/3] connect: treat generic proxy processes like ssh processes Jeff King
` (2 more replies)
0 siblings, 3 replies; 53+ messages in thread
From: Jeff King @ 2011-05-16 6:39 UTC (permalink / raw)
To: Johan Herland; +Cc: Junio C Hamano, Shawn Pearce, git, Johannes Sixt
On Mon, May 16, 2011 at 02:13:54AM -0400, Jeff King wrote:
> and this patch fixes it:
>
> diff --git a/builtin/send-pack.c b/builtin/send-pack.c
> index e2f4e21..b9da044 100644
> --- a/builtin/send-pack.c
> +++ b/builtin/send-pack.c
> @@ -345,6 +345,13 @@ int send_pack(struct send_pack_args *args,
> ref->status = REF_STATUS_NONE;
> if (args->stateless_rpc)
> close(out);
> + /* in case we actually have a full-duplex socket
> + * and not two pipes; we can't use "out" because
> + * it has been closed already, but in the full-duplex
> + * case, "in" and "out" are merely dups of each other.
> + * We can't directly use "in" because it may be
> + * pointing to the sideband demuxer now */
> + shutdown(fd[0], SHUT_WR);
> if (use_sideband)
> finish_async(&demux);
> return -1;
>
> It does call shutdown() on a non-socket in the pipe case. That should be
> a harmless noop, AFAIK.
If we do care (or if we just want to be cleaner), this patch series also
works (and goes on top of the same deadlock topic, i.e., e07fd15):
[1/3]: connect: treat generic proxy processes like ssh processes
[2/3]: connect: let callers know if connection is a socket
[3/3]: send-pack: avoid deadlock on git:// push with failed pack-objects
Another approach would be to actually spawn a pipe-based helper for tcp
connections (sort of a "git netcat"). That would mean all git-protocol
connections would get the same descriptor semantics, in case any other
bugs are lurking. I'm not sure if the ugliness (extra process to manage)
and decreased efficiency (pointlessly proxying data through an extra set
of pipes) are worth it. The only thing which makes me not reject it out
of hand is that it is already how git-over-ssh works (and not unlike
git-over-http), so the extra process and inefficiency are probably not
_that_ big a deal. It just feels ugly. I wish there were a portable way
to split a full-duplex socket into two half-duplex halves, but AFAIK,
that is not possible.
-Peff
* [PATCH 1/3] connect: treat generic proxy processes like ssh processes
2011-05-16 6:39 ` Jeff King
@ 2011-05-16 6:46 ` Jeff King
2011-05-16 19:57 ` Johannes Sixt
2011-05-16 6:52 ` [PATCH 2/3] connect: let callers know if connection is a socket Jeff King
2011-05-16 6:52 ` [PATCH 3/3] send-pack: avoid deadlock on git:// push with failed pack-objects Jeff King
2 siblings, 1 reply; 53+ messages in thread
From: Jeff King @ 2011-05-16 6:46 UTC (permalink / raw)
To: Johan Herland; +Cc: Junio C Hamano, Shawn Pearce, git, Johannes Sixt
The git_connect function returns two ends of a pipe for
talking with a remote, plus a struct child_process
representing the other end of the pipe. If we have a direct
socket connection, then this points to a special "no_fork"
child process.
The code path for doing git-over-pipes or git-over-ssh sets
up this child process to point to the child git command or
the ssh process. When we call finish_connect eventually, we
check wait() on the command and report its return value.
The code path for git://, on the other hand, always sets it
to no_fork. In the case of a direct TCP connection, this
makes sense; we have no child process. But in the case of a
proxy command (configured by core.gitproxy), we do have a
child process, but we throw away its pid, and therefore
ignore its return code.
Instead, let's keep that information in the proxy case, and
respect its return code, which can help catch some errors
(though depending on your proxy command, it will be errors
reported by the proxy command itself, and not propagated
from git commands. Still, it is probably better to propagate
such errors than to ignore them).
It also means that the child_process field can reliably be
used to determine whether the returned descriptors are
actually a full-duplex socket, which means we should be
using shutdown() instead of a simple close.
Signed-off-by: Jeff King <peff@peff.net>
---
Obviously I am interested mainly in the last bit for this series. But I
consider the rest of it a minor bugfix on its own.
connect.c | 27 +++++++++++++++------------
1 files changed, 15 insertions(+), 12 deletions(-)
diff --git a/connect.c b/connect.c
index db965c9..86ad150 100644
--- a/connect.c
+++ b/connect.c
@@ -403,12 +403,12 @@ static int git_use_proxy(const char *host)
return (git_proxy_command && *git_proxy_command);
}
-static void git_proxy_connect(int fd[2], char *host)
+static struct child_process *git_proxy_connect(int fd[2], char *host)
{
const char *port = STR(DEFAULT_GIT_PORT);
char *colon, *end;
const char *argv[4];
- struct child_process proxy;
+ struct child_process *proxy;
if (host[0] == '[') {
end = strchr(host + 1, ']');
@@ -431,14 +431,15 @@ static void git_proxy_connect(int fd[2], char *host)
argv[1] = host;
argv[2] = port;
argv[3] = NULL;
- memset(&proxy, 0, sizeof(proxy));
- proxy.argv = argv;
- proxy.in = -1;
- proxy.out = -1;
- if (start_command(&proxy))
+ proxy = xcalloc(1, sizeof(*proxy));
+ proxy->argv = argv;
+ proxy->in = -1;
+ proxy->out = -1;
+ if (start_command(proxy))
die("cannot start proxy %s", argv[0]);
- fd[0] = proxy.out; /* read from proxy stdout */
- fd[1] = proxy.in; /* write to proxy stdin */
+ fd[0] = proxy->out; /* read from proxy stdout */
+ fd[1] = proxy->in; /* write to proxy stdin */
+ return proxy;
}
#define MAX_CMD_LEN 1024
@@ -553,9 +554,11 @@ struct child_process *git_connect(int fd[2], const char *url_orig,
*/
char *target_host = xstrdup(host);
if (git_use_proxy(host))
- git_proxy_connect(fd, host);
- else
+ conn = git_proxy_connect(fd, host);
+ else {
git_tcp_connect(fd, host, flags);
+ conn = &no_fork;
+ }
/*
* Separate original protocol components prog and path
* from extended host header with a NUL byte.
@@ -571,7 +574,7 @@ struct child_process *git_connect(int fd[2], const char *url_orig,
free(url);
if (free_path)
free(path);
- return &no_fork;
+ return conn;
}
conn = xcalloc(1, sizeof(*conn));
--
1.7.5.1.13.g0ca09.dirty
* Re: [PATCH 1/3] connect: treat generic proxy processes like ssh processes
2011-05-16 6:46 ` [PATCH 1/3] connect: treat generic proxy processes like ssh processes Jeff King
@ 2011-05-16 19:57 ` Johannes Sixt
2011-05-16 23:12 ` Junio C Hamano
2011-05-17 5:54 ` Jeff King
0 siblings, 2 replies; 53+ messages in thread
From: Johannes Sixt @ 2011-05-16 19:57 UTC (permalink / raw)
To: Jeff King; +Cc: Johan Herland, Junio C Hamano, Shawn Pearce, git
Am 16.05.2011 08:46, schrieb Jeff King:
> The git_connect function returns two ends of a pipe for
> talking with a remote, plus a struct child_process
> representing the other end of the pipe. If we have a direct
> socket connection, then this points to a special "no_fork"
> child process.
>
> The code path for doing git-over-pipes or git-over-ssh sets
> up this child process to point to the child git command or
> the ssh process. When we call finish_connect eventually, we
> check wait() on the command and report its return value.
>
> The code path for git://, on the other hand, always sets it
> to no_fork. In the case of a direct TCP connection, this
> makes sense; we have no child process. But in the case of a
> proxy command (configured by core.gitproxy), we do have a
> child process, but we throw away its pid, and therefore
> ignore its return code.
>
> Instead, let's keep that information in the proxy case, and
> respect its return code, which can help catch some errors
This patch looks strikingly familiar. I had written an almost identical
change more than 3 years ago and forgot about it, though the
justification I noted in the commit was more to properly shutdown the
proxy process rather than to abandon it and let it be collected by
init(8). Your justification is much better.
There's one problem with your implementation, though:
> -static void git_proxy_connect(int fd[2], char *host)
> +static struct child_process *git_proxy_connect(int fd[2], char *host)
> {
> const char *port = STR(DEFAULT_GIT_PORT);
> char *colon, *end;
> const char *argv[4];
> - struct child_process proxy;
> + struct child_process *proxy;
>
> if (host[0] == '[') {
> end = strchr(host + 1, ']');
> @@ -431,14 +431,15 @@ static void git_proxy_connect(int fd[2], char *host)
> argv[1] = host;
> argv[2] = port;
> argv[3] = NULL;
> - memset(&proxy, 0, sizeof(proxy));
> - proxy.argv = argv;
> - proxy.in = -1;
> - proxy.out = -1;
> - if (start_command(&proxy))
> + proxy = xcalloc(1, sizeof(*proxy));
> + proxy->argv = argv;
> + proxy->in = -1;
> + proxy->out = -1;
> + if (start_command(proxy))
At this point, proxy->argv would point to automatic storage; but we
need argv[0] in finish_command() for error reporting. In my
implementation, I xmalloced the pointer array and leaked it. (And
that's probably the reason that I never submitted the patch.) I
wouldn't dare to make argv just static because this limits us to have
just one open connection at a time established via git_proxy_connect().
Dunno...
Below is the interdiff that turns your patch into mine, mostly for
exposition: The two hunks in git_connect() are just cosmetic
differences. But the first hunk should be squashed into your patch to
fix a potential crash in an error situation (e.g., when the proxy
dies from a signal) at the cost of a small memory leak.
diff --git a/connect.c b/connect.c
index a28e084..8668913 100644
--- a/connect.c
+++ b/connect.c
@@ -399,9 +399,10 @@ static struct child_process *git_proxy_connect(int fd[2], char *host)
{
const char *port = STR(DEFAULT_GIT_PORT);
- const char *argv[4];
+ const char **argv;
struct child_process *proxy;
get_host_and_port(&host, &port);
+ argv = xmalloc(4*sizeof(*argv));
argv[0] = git_proxy_command;
argv[1] = host;
@@ -457,5 +458,5 @@ struct child_process *git_connect(int fd[2], const char *url_orig,
char *end;
int c;
- struct child_process *conn;
+ struct child_process *conn = &no_fork;
enum protocol protocol = PROTO_LOCAL;
int free_path = 0;
@@ -543,8 +544,6 @@ struct child_process *git_connect(int fd[2], const char *url_orig,
if (git_use_proxy(host))
conn = git_proxy_connect(fd, host);
- else {
+ else
git_tcp_connect(fd, host, flags);
- conn = &no_fork;
- }
/*
* Separate original protocol components prog and path
* Re: [PATCH 1/3] connect: treat generic proxy processes like ssh processes
2011-05-16 19:57 ` Johannes Sixt
@ 2011-05-16 23:12 ` Junio C Hamano
2011-05-17 5:54 ` Jeff King
1 sibling, 0 replies; 53+ messages in thread
From: Junio C Hamano @ 2011-05-16 23:12 UTC (permalink / raw)
To: Johannes Sixt; +Cc: Jeff King, Johan Herland, Shawn Pearce, git
Johannes Sixt <j6t@kdbg.org> writes:
> At this point, proxy->argv would point to automatic storage; but we
> need argv[0] in finish_command() for error reporting. In my
> implementation, I xmalloced the pointer array and leaked it. (And
> that's probably the reason that I never submitted the patch.) I
> wouldn't dare to make argv just static because this limits us to have
> just one open connection at a time established via git_proxy_connect().
Good point. If we really care, I would imagine that struct child_process
can gain a boolean flag to tell finish_command() that the argv[] needs to
be freed, so
I do not think leakage here is such a big deal.
Will squash all three hunks in. Assuming no_fork first and updating conn
as we discover what kind of URL we have feels natural.
Thanks, both.
* Re: [PATCH 1/3] connect: treat generic proxy processes like ssh processes
2011-05-16 19:57 ` Johannes Sixt
2011-05-16 23:12 ` Junio C Hamano
@ 2011-05-17 5:54 ` Jeff King
2011-05-17 20:14 ` Johannes Sixt
1 sibling, 1 reply; 53+ messages in thread
From: Jeff King @ 2011-05-17 5:54 UTC (permalink / raw)
To: Johannes Sixt; +Cc: Johan Herland, Junio C Hamano, Shawn Pearce, git
On Mon, May 16, 2011 at 09:57:58PM +0200, Johannes Sixt wrote:
> > The code path for git://, on the other hand, always sets it
> > to no_fork. In the case of a direct TCP connection, this
> > makes sense; we have no child process. But in the case of a
> > proxy command (configured by core.gitproxy), we do have a
> > child process, but we throw away its pid, and therefore
> > ignore its return code.
> >
> > Instead, let's keep that information in the proxy case, and
> > respect its return code, which can help catch some errors
>
> This patch looks strikingly familiar. I had written an almost identical
> change more than 3 years ago and forgot about it, though the
> justification I noted in the commit was more to properly shutdown the
> proxy process rather than to abandon it and let it be collected by
> init(8). Your justification is much better.
Thanks, I had no idea your patch existed. I hate to duplicate work, but
at least it's a sanity check that it's not a totally stupid idea. ;)
> > const char *argv[4];
> > - struct child_process proxy;
> > + struct child_process *proxy;
> [...]
> At this point, proxy->argv would point to automatic storage; but we
> need argv[0] in finish_command() for error reporting.
Ick. Good catch.
> In my implementation, I xmalloced the pointer array and leaked it.
> (And that's probably the reason that I never submitted the patch.) I
> wouldn't dare to make argv just static because this limits us to have
> just one open connection at a time established via
> git_proxy_connect(). Dunno...
We also need to worry about the contents of each argv[] element, no? So
we should be xstrdup()ing the host and port, which point into some
string which gets passed to us. I didn't trace its provenance but I
think it is better to be defensive.
The leak is probably OK in a practical sense (you generally make no more
than one such connection per command), but it does seem ugly. I would
not be surprised if many other run-command invocations leak similarly.
The interdiff with the strdups is:
diff --git a/connect.c b/connect.c
index c678ceb..c24866e 100644
--- a/connect.c
+++ b/connect.c
@@ -398,14 +398,15 @@ static int git_use_proxy(const char *host)
static struct child_process *git_proxy_connect(int fd[2], char *host)
{
const char *port = STR(DEFAULT_GIT_PORT);
- const char *argv[4];
+ const char **argv;
struct child_process *proxy;
get_host_and_port(&host, &port);
+ argv = xmalloc(4 * sizeof(*argv));
argv[0] = git_proxy_command;
- argv[1] = host;
- argv[2] = port;
+ argv[1] = xstrdup(host);
+ argv[2] = xstrdup(port);
argv[3] = NULL;
proxy = xcalloc(1, sizeof(*proxy));
proxy->argv = argv;
* Re: [PATCH 1/3] connect: treat generic proxy processes like ssh processes
2011-05-17 5:54 ` Jeff King
@ 2011-05-17 20:14 ` Johannes Sixt
2011-05-18 8:57 ` Jeff King
0 siblings, 1 reply; 53+ messages in thread
From: Johannes Sixt @ 2011-05-17 20:14 UTC (permalink / raw)
To: Jeff King; +Cc: Johan Herland, Junio C Hamano, Shawn Pearce, git
Am 17.05.2011 07:54, schrieb Jeff King:
> On Mon, May 16, 2011 at 09:57:58PM +0200, Johannes Sixt wrote:
>> In my implementation, I xmalloced the pointer array and leaked it.
I noticed that it actually isn't leaked because finish_connect() frees
it. For this reason, I actually have to wonder why your version that
stored a pointer to automatic storage in ->argv worked.
> We also need to worry about the contents of each argv[] element, no? So
> we should be xstrdup()ing the host and port, which point into some
> string which gets passed to us. I didn't trace its provenance but I
> think it is better to be defensive.
I would not worry too much today. Of course, functions other than
start_command() might begin to access ->argv[i] with i > 0 later, but
then we have to audit all users of struct child_process anyway.
Currently, only start_command() uses these values, which is always
called at a time when they are still valid.
-- Hannes
* Re: [PATCH 1/3] connect: treat generic proxy processes like ssh processes
2011-05-17 20:14 ` Johannes Sixt
@ 2011-05-18 8:57 ` Jeff King
0 siblings, 0 replies; 53+ messages in thread
From: Jeff King @ 2011-05-18 8:57 UTC (permalink / raw)
To: Johannes Sixt; +Cc: Johan Herland, Junio C Hamano, Shawn Pearce, git
On Tue, May 17, 2011 at 10:14:43PM +0200, Johannes Sixt wrote:
> Am 17.05.2011 07:54, schrieb Jeff King:
> > On Mon, May 16, 2011 at 09:57:58PM +0200, Johannes Sixt wrote:
> >> In my implementation, I xmalloced the pointer array and leaked it.
>
> I noticed that it actually isn't leaked because finish_connect() frees
> it. For this reason, I actually have to wonder why your version that
> stored a pointer to automatic storage in ->argv worked.
It probably didn't. As a simple refactoring, I didn't test the proxy
codepath, and apparently nothing in our test suite does, either. :(
We can put the patch below on top (it fails with my original series, but
passes with the bugfix you noticed).
> I would not worry too much today. Of course, functions other than
> start_command() might begin to access ->argv[i] with i > 0 later, but
> then we have to audit all users of struct child_process anyway.
> Currently, only start_command() uses these values, which is always
> called at a time when they are still valid.
That makes sense.
-- >8 --
Subject: [PATCH] test core.gitproxy configuration
This is just a basic sanity test to see whether
core.gitproxy works at all. Until now, we were not testing
anywhere.
Signed-off-by: Jeff King <peff@peff.net>
---
This is really basic. Apparently you can do horrible things like
git config core.gitproxy "./proxy for kernel.org"
git config core.gitproxy "./other-proxy for example.com"
I can make it more elaborate if we really want to care.
t/t5532-fetch-proxy.sh | 42 ++++++++++++++++++++++++++++++++++++++++++
1 files changed, 42 insertions(+), 0 deletions(-)
create mode 100755 t/t5532-fetch-proxy.sh
diff --git a/t/t5532-fetch-proxy.sh b/t/t5532-fetch-proxy.sh
new file mode 100755
index 0000000..07119d9
--- /dev/null
+++ b/t/t5532-fetch-proxy.sh
@@ -0,0 +1,42 @@
+#!/bin/sh
+
+test_description='fetching via git:// using core.gitproxy'
+. ./test-lib.sh
+
+test_expect_success 'setup remote repo' '
+ git init remote &&
+ (cd remote &&
+ echo content >file &&
+ git add file &&
+ git commit -m one
+ )
+'
+
+cat >proxy <<'EOF'
+#!/bin/sh
+echo >&2 "proxying for $*"
+cmd=`perl -e '
+ read(STDIN, $buf, 4);
+ my $n = hex($buf) - 4;
+ read(STDIN, $buf, $n);
+ my ($cmd, $other) = split /\0/, $buf;
+ # drop absolute-path on repo name
+ $cmd =~ s{ /}{ };
+ print $cmd;
+'`
+exec $cmd
+EOF
+chmod +x proxy
+test_expect_success 'setup local repo' '
+ git remote add fake git://example.com/remote &&
+ git config core.gitproxy ./proxy
+'
+
+test_expect_success 'fetch through proxy works' '
+ git fetch fake &&
+ echo one >expect &&
+ git log -1 --format=%s FETCH_HEAD >actual &&
+ test_cmp expect actual
+'
+
+test_done
--
1.7.5.8.ga95ab
* [PATCH 2/3] connect: let callers know if connection is a socket
2011-05-16 6:39 ` Jeff King
2011-05-16 6:46 ` [PATCH 1/3] connect: treat generic proxy processes like ssh processes Jeff King
@ 2011-05-16 6:52 ` Jeff King
2011-05-16 6:52 ` [PATCH 3/3] send-pack: avoid deadlock on git:// push with failed pack-objects Jeff King
2 siblings, 0 replies; 53+ messages in thread
From: Jeff King @ 2011-05-16 6:52 UTC (permalink / raw)
To: Johan Herland; +Cc: Junio C Hamano, Shawn Pearce, git, Johannes Sixt
They might care because they want to do a half-duplex close.
With pipes, that means simply closing the output descriptor;
with a socket, you must actually call shutdown.
Instead of exposing the magic no_fork child_process struct,
let's encapsulate the test in a function.
Signed-off-by: Jeff King <peff@peff.net>
---
A more object-oriented refactoring would be something like:
struct git_connection {
struct child_process child;
int in;
int out;
};
int git_connection_read_fd(struct git_connection *c)
{
return c->in;
}
int git_connection_write_fd(struct git_connection *c)
{
return c->child.pid ? c->out : c->in;
}
void git_connection_half_duplex_close(struct git_connection *c)
{
if (!c->child.pid)
shutdown(c->in, SHUT_WR);
else
close(c->out);
}
but the idea that a git connection is defined by two file descriptors
runs throughout the code (in fact, we don't even explicitly do the
half-duplex close in the pipe case; we hand the descriptor off to the
pack-objects run-command, which takes ownership). So trying to be fancy
and abstracted is not worth it in this case.
cache.h | 1 +
connect.c | 7 ++++++-
2 files changed, 7 insertions(+), 1 deletions(-)
diff --git a/cache.h b/cache.h
index bf468e5..724aad4 100644
--- a/cache.h
+++ b/cache.h
@@ -865,6 +865,7 @@ extern struct ref *find_ref_by_name(const struct ref *list, const char *name);
#define CONNECT_VERBOSE (1u << 0)
extern struct child_process *git_connect(int fd[2], const char *url, const char *prog, int flags);
extern int finish_connect(struct child_process *conn);
+extern int git_connection_is_socket(struct child_process *conn);
extern int path_match(const char *path, int nr, char **match);
struct extra_have_objects {
int nr, alloc;
diff --git a/connect.c b/connect.c
index 86ad150..20a008d 100644
--- a/connect.c
+++ b/connect.c
@@ -634,10 +634,15 @@ struct child_process *git_connect(int fd[2], const char *url_orig,
return conn;
}
+int git_connection_is_socket(struct child_process *conn)
+{
+ return conn == &no_fork;
+}
+
int finish_connect(struct child_process *conn)
{
int code;
- if (!conn || conn == &no_fork)
+ if (!conn || git_connection_is_socket(conn))
return 0;
code = finish_command(conn);
--
1.7.5.1.13.g0ca09.dirty
* [PATCH 3/3] send-pack: avoid deadlock on git:// push with failed pack-objects
2011-05-16 6:39 ` Jeff King
2011-05-16 6:46 ` [PATCH 1/3] connect: treat generic proxy processes like ssh processes Jeff King
2011-05-16 6:52 ` [PATCH 2/3] connect: let callers know if connection is a socket Jeff King
@ 2011-05-16 6:52 ` Jeff King
2011-05-16 20:02 ` Johannes Sixt
2 siblings, 1 reply; 53+ messages in thread
From: Jeff King @ 2011-05-16 6:52 UTC (permalink / raw)
To: Johan Herland; +Cc: Junio C Hamano, Shawn Pearce, git, Johannes Sixt
Commit 09c9957c fixes a deadlock in which pack-objects
fails, the remote end is still waiting for pack data, and we
are still waiting for the remote end to say something (see
that commit for a much more in-depth explanation).
We solved the problem there by making sure the output pipe
is closed on error; thus the remote sees EOF, and proceeds
to complain and close its end of the connection.
However, in the special case of push over git://, we don't
have a pipe, but rather a full-duplex socket, with another
dup()-ed descriptor in place of the second half of the pipe.
In this case, closing the second descriptor signals nothing
to the remote end, and we still deadlock.
This patch calls shutdown() explicitly to signal EOF to the
other side.
Signed-off-by: Jeff King <peff@peff.net>
---
And if you wanted to drop the first two patches, this probably works OK
without the conditional, as the shutdown is just a no-op on a pipe
descriptor then.
builtin-send-pack.c | 2 ++
1 files changed, 2 insertions(+), 0 deletions(-)
diff --git a/builtin-send-pack.c b/builtin-send-pack.c
index 3e70795..682a3f9 100644
--- a/builtin-send-pack.c
+++ b/builtin-send-pack.c
@@ -520,6 +520,8 @@ int send_pack(struct send_pack_args *args,
ref->status = REF_STATUS_NONE;
if (args->stateless_rpc)
close(out);
+ if (git_connection_is_socket(conn))
+ shutdown(fd[0], SHUT_WR);
if (use_sideband)
finish_async(&demux);
return -1;
--
1.7.5.1.13.g0ca09.dirty
* Re: [PATCH 3/3] send-pack: avoid deadlock on git:// push with failed pack-objects
2011-05-16 6:52 ` [PATCH 3/3] send-pack: avoid deadlock on git:// push with failed pack-objects Jeff King
@ 2011-05-16 20:02 ` Johannes Sixt
2011-05-17 5:56 ` Jeff King
0 siblings, 1 reply; 53+ messages in thread
From: Johannes Sixt @ 2011-05-16 20:02 UTC (permalink / raw)
To: Jeff King; +Cc: Johan Herland, Junio C Hamano, Shawn Pearce, git
Am 16.05.2011 08:52, schrieb Jeff King:
> And if you wanted to drop the first two patches, this probably works OK
> without the conditional, as the shutdown is just a no-op on a pipe
> descriptor then.
>
> builtin-send-pack.c | 2 ++
> 1 files changed, 2 insertions(+), 0 deletions(-)
>
> diff --git a/builtin-send-pack.c b/builtin-send-pack.c
> index 3e70795..682a3f9 100644
> --- a/builtin-send-pack.c
> +++ b/builtin-send-pack.c
> @@ -520,6 +520,8 @@ int send_pack(struct send_pack_args *args,
> ref->status = REF_STATUS_NONE;
> if (args->stateless_rpc)
> close(out);
> + if (git_connection_is_socket(conn))
> + shutdown(fd[0], SHUT_WR);
> if (use_sideband)
> finish_async(&demux);
> return -1;
We probably need a wrapper for shutdown() on Windows. I'll look into
this tomorrow.
-- Hannes
* Re: [PATCH 3/3] send-pack: avoid deadlock on git:// push with failed pack-objects
2011-05-16 20:02 ` Johannes Sixt
@ 2011-05-17 5:56 ` Jeff King
2011-05-18 20:24 ` [PATCH] Windows: add a wrapper for the shutdown() system call Johannes Sixt
0 siblings, 1 reply; 53+ messages in thread
From: Jeff King @ 2011-05-17 5:56 UTC (permalink / raw)
To: Johannes Sixt; +Cc: Johan Herland, Junio C Hamano, Shawn Pearce, git
On Mon, May 16, 2011 at 10:02:16PM +0200, Johannes Sixt wrote:
> > + if (git_connection_is_socket(conn))
> > + shutdown(fd[0], SHUT_WR);
>
> We probably need a wrapper for shutdown() on Windows. I'll look into
> this tomorrow.
FWIW, we already make an identical call in transport-helper.c (I was
even tempted to use "1" instead of SHUT_WR for portability, but it seems
nobody has complained so far about the use in transport-helper).
-Peff
* [PATCH] Windows: add a wrapper for the shutdown() system call
2011-05-17 5:56 ` Jeff King
@ 2011-05-18 20:24 ` Johannes Sixt
0 siblings, 0 replies; 53+ messages in thread
From: Johannes Sixt @ 2011-05-18 20:24 UTC (permalink / raw)
To: Jeff King; +Cc: Johan Herland, Junio C Hamano, Shawn Pearce, git
Even though Windows's socket functions look like their POSIX counterparts,
they do not operate on file descriptors, but on "socket objects". To bring
the functions in line with POSIX, we have proxy functions that wrap and
unwrap the socket objects in file descriptors using open_osfhandle and
get_osfhandle. But shutdown() was not proxied yet. Fix this.
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
---
Am 17.05.2011 07:56, schrieb Jeff King:
> On Mon, May 16, 2011 at 10:02:16PM +0200, Johannes Sixt wrote:
>
>>> + if (git_connection_is_socket(conn))
>>> + shutdown(fd[0], SHUT_WR);
>>
>> We probably need a wrapper for shutdown() on Windows. I'll look into
>> this tomorrow.
We have a shutdown() on Windows, but it does not work out of the box.
Making it work is not a big deal, but it would be a purely academic
exercise if it were only for this topic: on Windows, send-pack hangs
for unknown reasons when network connections are involved, as long as
side-band-64k is enabled. If it is not enabled, the deadlock that you
fixed does not happen in the first place, and it is irrelevant whether
shutdown() works.
> FWIW, we already make an identical call in transport-helper.c (I was
> tempted even to use "1" instead of SHUT_WR for portability, but it seems
> nobody has complained so far about the use in transport-helper).
Yes, and for the reasons mentioned above, this patch is intended for
the 1.7.4 series, where transport-helper.c went public. (But it should
apply cleanly to current master as well.) We do not have tests that
exercise the code in transport-helper.c, but I did test that this
shutdown implementation does something reasonable when it is merged
with your send-pack deadlock topic branch.
compat/mingw.c | 7 +++++++
compat/mingw.h | 3 +++
2 files changed, 10 insertions(+), 0 deletions(-)
diff --git a/compat/mingw.c b/compat/mingw.c
index bee6054..1cbc9e8 100644
--- a/compat/mingw.c
+++ b/compat/mingw.c
@@ -1219,6 +1219,13 @@ int mingw_setsockopt(int sockfd, int lvl, int optname, void *optval, int optlen)
return setsockopt(s, lvl, optname, (const char*)optval, optlen);
}
+#undef shutdown
+int mingw_shutdown(int sockfd, int how)
+{
+ SOCKET s = (SOCKET)_get_osfhandle(sockfd);
+ return shutdown(s, how);
+}
+
#undef listen
int mingw_listen(int sockfd, int backlog)
{
diff --git a/compat/mingw.h b/compat/mingw.h
index 14211c6..3b20fa9 100644
--- a/compat/mingw.h
+++ b/compat/mingw.h
@@ -219,6 +219,9 @@ int mingw_bind(int sockfd, struct sockaddr *sa, size_t sz);
int mingw_setsockopt(int sockfd, int lvl, int optname, void *optval, int optlen);
#define setsockopt mingw_setsockopt
+int mingw_shutdown(int sockfd, int how);
+#define shutdown mingw_shutdown
+
int mingw_listen(int sockfd, int backlog);
#define listen mingw_listen
--
1.7.5.rc1.97.ge0653
* [PATCHv3 3/9] pack-objects: Allow --max-pack-size to be used together with --stdout
2011-05-15 21:37 ` [PATCHv3 0/9] Push limits Johan Herland
2011-05-15 21:37 ` [PATCHv3 1/9] Update technical docs to reflect side-band-64k capability in receive-pack Johan Herland
2011-05-15 21:37 ` [PATCHv3 2/9] send-pack: Attempt to retrieve remote status even if pack-objects fails Johan Herland
@ 2011-05-15 21:37 ` Johan Herland
2011-05-15 22:06 ` Shawn Pearce
2011-05-16 6:12 ` Junio C Hamano
2011-05-15 21:37 ` [PATCHv3 4/9] pack-objects: Teach new option --max-object-count, similar to --max-pack-size Johan Herland
` (6 subsequent siblings)
9 siblings, 2 replies; 53+ messages in thread
From: Johan Herland @ 2011-05-15 21:37 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Shawn Pearce, Johan Herland, git
Currently we refuse combining --max-pack-size with --stdout since there's
no way to make multiple packs when the pack is written to stdout. However,
we want to be able to limit the maximum size of the pack created by
--stdout (and abort pack-objects if we are unable to meet that limit).
Therefore, when used together with --stdout, we reinterpret --max-pack-size
to indicate the maximum pack size which - if exceeded - will cause
pack-objects to abort with an error message.
Signed-off-by: Johan Herland <johan@herland.net>
---
Documentation/git-pack-objects.txt | 3 ++
builtin/pack-objects.c | 9 ++++---
t/t5300-pack-object.sh | 43 ++++++++++++++++++++++++++++++++++++
3 files changed, 51 insertions(+), 4 deletions(-)
diff --git a/Documentation/git-pack-objects.txt b/Documentation/git-pack-objects.txt
index 20c8551..ca97463 100644
--- a/Documentation/git-pack-objects.txt
+++ b/Documentation/git-pack-objects.txt
@@ -112,6 +112,9 @@ base-name::
If specified, multiple packfiles may be created.
The default is unlimited, unless the config variable
`pack.packSizeLimit` is set.
++
+When used together with --stdout, the command will fail with an error
+message if the pack output exceeds the given limit.
--honor-pack-keep::
This flag causes an object already in a local pack that
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index f402a84..69f1c51 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -229,7 +229,7 @@ static unsigned long write_object(struct sha1file *f,
if (!entry->delta)
usable_delta = 0; /* no delta */
- else if (!pack_size_limit)
+ else if (!pack_size_limit || pack_to_stdout)
usable_delta = 1; /* unlimited packfile */
else if (entry->delta->idx.offset == (off_t)-1)
usable_delta = 0; /* base was written to another pack */
@@ -478,6 +478,9 @@ static void write_pack_file(void)
* If so, rewrite it like in fast-import
*/
if (pack_to_stdout) {
+ if (nr_written != nr_remaining)
+ die("unable to make pack within the pack size"
+ " limit (%lu bytes)", pack_size_limit);
sha1close(f, sha1, CSUM_CLOSE);
} else if (nr_written == nr_remaining) {
sha1close(f, sha1, CSUM_FSYNC);
@@ -2315,9 +2318,7 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
if (!pack_to_stdout && !pack_size_limit)
pack_size_limit = pack_size_limit_cfg;
- if (pack_to_stdout && pack_size_limit)
- die("--max-pack-size cannot be used to build a pack for transfer.");
- if (pack_size_limit && pack_size_limit < 1024*1024) {
+ if (!pack_to_stdout && pack_size_limit && pack_size_limit < 1024*1024) {
warning("minimum pack size limit is 1 MiB");
pack_size_limit = 1024*1024;
}
diff --git a/t/t5300-pack-object.sh b/t/t5300-pack-object.sh
index 602806d..00f1bd8 100755
--- a/t/t5300-pack-object.sh
+++ b/t/t5300-pack-object.sh
@@ -396,6 +396,49 @@ test_expect_success 'verify resulting packs' '
git verify-pack test-11-*.pack
'
+test_expect_success '--stdout ignores pack.packSizeLimit' '
+ git pack-objects --stdout <obj-list >test-12.pack &&
+ git index-pack --strict test-12.pack
+'
+
+test_expect_success 'verify resulting pack' '
+ git verify-pack test-12.pack
+'
+
+test_expect_success 'honor --max-pack-size' '
+ git config --unset pack.packSizeLimit &&
+ packname_13=$(git pack-objects --max-pack-size=3m test-13 <obj-list) &&
+ test 2 = $(ls test-13-*.pack | wc -l)
+'
+
+test_expect_success 'verify resulting packs' '
+ git verify-pack test-13-*.pack
+'
+
+test_expect_success 'tolerate --max-pack-size smaller than biggest object' '
+ packname_14=$(git pack-objects --max-pack-size=1 test-14 <obj-list) &&
+ test 5 = $(ls test-14-*.pack | wc -l)
+'
+
+test_expect_success 'verify resulting packs' '
+ git verify-pack test-14-*.pack
+'
+
+test_expect_success '--stdout works with large enough --max-pack-size' '
+ git pack-objects --stdout --max-pack-size=10m <obj-list >test-15.pack &&
+ git index-pack --strict test-15.pack
+'
+
+test_expect_success 'verify resulting pack' '
+ git verify-pack test-15.pack
+'
+
+test_expect_success '--stdout fails when pack exceeds --max-pack-size' '
+ test_must_fail git pack-objects --stdout --max-pack-size=1 <obj-list >test-16.pack 2>errs &&
+ test_must_fail git index-pack --strict test-16.pack &&
+ grep -q "pack size limit" errs
+'
+
#
# WARNING!
#
--
1.7.5.rc1.3.g4d7b
* Re: [PATCHv3 3/9] pack-objects: Allow --max-pack-size to be used together with --stdout
2011-05-15 21:37 ` [PATCHv3 3/9] pack-objects: Allow --max-pack-size to be used together with --stdout Johan Herland
@ 2011-05-15 22:06 ` Shawn Pearce
2011-05-16 1:39 ` Johan Herland
2011-05-16 6:12 ` Junio C Hamano
1 sibling, 1 reply; 53+ messages in thread
From: Shawn Pearce @ 2011-05-15 22:06 UTC (permalink / raw)
To: Johan Herland; +Cc: Junio C Hamano, git
On Sun, May 15, 2011 at 14:37, Johan Herland <johan@herland.net> wrote:
> Currently we refuse combining --max-pack-size with --stdout since there's
> no way to make multiple packs when the pack is written to stdout. However,
> we want to be able to limit the maximum size of the pack created by
> --stdout (and abort pack-objects if we are unable to meet that limit).
>
> Therefore, when used together with --stdout, we reinterpret --max-pack-size
> to indicate the maximum pack size which - if exceeded - will cause
> pack-objects to abort with an error message.
...
> if (pack_to_stdout) {
> + if (nr_written != nr_remaining)
> + die("unable to make pack within the pack size"
> + " limit (%lu bytes)", pack_size_limit);
I think this is too late. We have already output a bunch of data, up
to the size limit at this point. If the size limit is non-trivial
(e.g. 5 MB) we have already sent most of that to the remote side, and
it's already written some of that out to disk.
I'd like this to be a soft limit derived from the reused object sizes.
When planning the pack, as we decide where each object will be reused
from, sum those sizes. If the sum of these sizes would break the
limit, then we abort before even writing the pack header out.
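The suggested pre-flight check might look roughly like this (hypothetical names throughout; struct object_entry and in_pack_size here are illustrative stand-ins, not git's actual data structures):

```c
/*
 * Sketch of the proposed soft limit: before writing the pack header,
 * sum the sizes of the on-disk copies we plan to reuse and report
 * whether the estimate already breaks the limit, so the caller can
 * abort before any data reaches the remote side.
 */
struct object_entry {
	unsigned long in_pack_size;	/* size of the reusable on-disk copy */
};

static int estimate_exceeds_limit(const struct object_entry *objects,
				  unsigned int nr_objects,
				  unsigned long pack_size_limit)
{
	unsigned long estimate = 0;
	unsigned int i;

	for (i = 0; i < nr_objects; i++)
		estimate += objects[i].in_pack_size;

	return pack_size_limit && estimate > pack_size_limit;
}
```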
--
Shawn.
* Re: [PATCHv3 3/9] pack-objects: Allow --max-pack-size to be used together with --stdout
2011-05-15 22:06 ` Shawn Pearce
@ 2011-05-16 1:39 ` Johan Herland
0 siblings, 0 replies; 53+ messages in thread
From: Johan Herland @ 2011-05-16 1:39 UTC (permalink / raw)
To: Shawn Pearce; +Cc: git, Junio C Hamano
On Monday 16 May 2011, Shawn Pearce wrote:
> On Sun, May 15, 2011 at 14:37, Johan Herland <johan@herland.net> wrote:
> > Currently we refuse combining --max-pack-size with --stdout since
> > there's no way to make multiple packs when the pack is written to
> > stdout. However, we want to be able to limit the maximum size of the
> > pack created by --stdout (and abort pack-objects if we are unable to
> > meet that limit).
> >
> > Therefore, when used together with --stdout, we reinterpret
> > --max-pack-size to indicate the maximum pack size which - if exceeded
> > - will cause pack-objects to abort with an error message.
>
> ...
>
> > if (pack_to_stdout) {
> > + if (nr_written != nr_remaining)
> > + die("unable to make pack within the pack size"
> > + " limit (%lu bytes)", pack_size_limit);
>
> I think this is too late. We have already output a bunch of data, up
> to the size limit at this point. If the size limit is non-trivial
> (e.g. 5 MB) we have already sent most of that to the remote side, and
> it's already written some of that out to disk.
>
> I'd like this to be a soft limit derived from the reused object sizes.
> When planning the pack by looking at where we will reuse an object
> from, sum those sizes. If the sum of these sizes would break this
> limit, then we abort before even writing the pack header out.
I agree, but it's currently late Sunday (early Monday), and after
looking at this for a while, I'm no longer thinking straight.
If someone who groks the pack-objects internals could help out, I'd be
really grateful. AFAICS, we need to drill into prepare_pack() to find
the details needed to estimate the total pack size, but I don't know
exactly which data structure(s) holds the data needed. We probably need
to accumulate a pack size estimate in find_deltas(), and then sum those
across the threads, before we finally compare the total estimate to
pack_size_limit prior to calling write_pack_file().
...Johan
--
Johan Herland, <johan@herland.net>
www.herland.net
* Re: [PATCHv3 3/9] pack-objects: Allow --max-pack-size to be used together with --stdout
2011-05-15 21:37 ` [PATCHv3 3/9] pack-objects: Allow --max-pack-size to be used together with --stdout Johan Herland
2011-05-15 22:06 ` Shawn Pearce
@ 2011-05-16 6:12 ` Junio C Hamano
2011-05-16 9:27 ` Johan Herland
1 sibling, 1 reply; 53+ messages in thread
From: Junio C Hamano @ 2011-05-16 6:12 UTC (permalink / raw)
To: Johan Herland; +Cc: Shawn Pearce, git
Johan Herland <johan@herland.net> writes:
> Currently we refuse combining --max-pack-size with --stdout since there's
> no way to make multiple packs when the pack is written to stdout. However,
> we want to be able to limit the maximum size of the pack created by
> --stdout (and abort pack-objects if we are unable to meet that limit).
>
> Therefore, when used together with --stdout, we reinterpret --max-pack-size
> to indicate the maximum pack size which - if exceeded - will cause
> pack-objects to abort with an error message.
I only gave the code a cursory look, but I think your patch does more than
the above paragraphs say. I am not sure those extra changes are justified.
For example,
> @@ -229,7 +229,7 @@ static unsigned long write_object(struct sha1file *f,
>
> if (!entry->delta)
> usable_delta = 0; /* no delta */
> - else if (!pack_size_limit)
> + else if (!pack_size_limit || pack_to_stdout)
> usable_delta = 1; /* unlimited packfile */
Why does this conditional have to change its behaviour when writing to the
standard output? I thought that the only thing you are doing "earlier we
didn't allow setting size limit when writing to standard output, now we
do", and I do not see the linkage between that objective and this change.
> @@ -2315,9 +2318,7 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
>
> if (!pack_to_stdout && !pack_size_limit)
> pack_size_limit = pack_size_limit_cfg;
> - if (pack_to_stdout && pack_size_limit)
> - die("--max-pack-size cannot be used to build a pack for transfer.");
> - if (pack_size_limit && pack_size_limit < 1024*1024) {
> + if (!pack_to_stdout && pack_size_limit && pack_size_limit < 1024*1024) {
> warning("minimum pack size limit is 1 MiB");
> pack_size_limit = 1024*1024;
> }
Why does the new combination "writing to the standard output, but the
maximum size is limited" not have the same lower bound on the pack size
limit that on-disk packs do?
If you have a reason to believe 1 MiB is too large for a pack size limit,
shouldn't that logic apply equally to the on-disk case? What does this
change have to do with the interaction with --stdout option?
* Re: [PATCHv3 3/9] pack-objects: Allow --max-pack-size to be used together with --stdout
2011-05-16 6:12 ` Junio C Hamano
@ 2011-05-16 9:27 ` Johan Herland
0 siblings, 0 replies; 53+ messages in thread
From: Johan Herland @ 2011-05-16 9:27 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git, Shawn Pearce
On Monday 16 May 2011, Junio C Hamano wrote:
> Johan Herland <johan@herland.net> writes:
> > Currently we refuse combining --max-pack-size with --stdout since
> > there's no way to make multiple packs when the pack is written to
> > stdout. However, we want to be able to limit the maximum size of the
> > pack created by --stdout (and abort pack-objects if we are unable to
> > meet that limit).
> >
> > Therefore, when used together with --stdout, we reinterpret
> > --max-pack-size to indicate the maximum pack size which - if exceeded
> > - will cause pack-objects to abort with an error message.
>
> I only gave the code a cursory look, but I think your patch does more
> than the above paragraphs say. I am not sure those extra change are
> justified.
>
> For example,
>
> > @@ -229,7 +229,7 @@ static unsigned long write_object(struct sha1file *f,
> >
> > if (!entry->delta)
> > usable_delta = 0; /* no delta */
> > - else if (!pack_size_limit)
> > + else if (!pack_size_limit || pack_to_stdout)
> > usable_delta = 1; /* unlimited packfile */
>
> Why does this conditional have to change its behaviour when writing to
> the standard output? I thought that the only thing you are doing
> "earlier we didn't allow setting size limit when writing to standard
> output, now we do", and I do not see the linkage between that objective
> and this change.
AFAICS, the intention of the above "else if" block, is to enable
usable_delta when there is no chance of a pack split happening.
To establish that a pack split cannot happen, the code checks that
pack_size_limit is disabled. Previously, pack_size_limit and
pack_to_stdout was mutually exclusive (look at the last hunk in
pack-objects.c). However, with this patch, it is possible to have
both pack_size_limit and pack_to_stdout enabled.
Crucially, though, when both are enabled, a pack split is _still_
impossible, since a pack split while pack_to_stdout is enabled, forces
pack-objects to abort altogether.
In other words, when pack_to_stdout is enabled, pack_size_limit is no
longer a good indicator of whether a pack split might happen. Instead,
pack_to_stdout being enabled is a good enough indicator in itself to
guarantee that no pack split can happen (and hence that usable_delta
should be enabled).
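The reasoning above boils down to a simple predicate (an illustrative restatement, not code from the patch):

```c
/*
 * Can writing ever split into multiple packs?  With --stdout, exceeding
 * the limit aborts the whole run instead of starting a new pack, so a
 * split can never happen -- which is why usable_delta may be enabled
 * unconditionally in that case.
 */
static int pack_split_possible(unsigned long pack_size_limit,
			       int pack_to_stdout)
{
	if (pack_to_stdout)
		return 0;
	return pack_size_limit != 0;
}
```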
> > @@ -2315,9 +2318,7 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
> >
> > if (!pack_to_stdout && !pack_size_limit)
> > pack_size_limit = pack_size_limit_cfg;
> > - if (pack_to_stdout && pack_size_limit)
> > - die("--max-pack-size cannot be used to build a pack for transfer.");
> > - if (pack_size_limit && pack_size_limit < 1024*1024) {
> > + if (!pack_to_stdout && pack_size_limit && pack_size_limit < 1024*1024) {
> > warning("minimum pack size limit is 1 MiB");
> > pack_size_limit = 1024*1024;
> > }
>
> Why is the new combination "writing to the standard output, but the
> maximum size is limited" does not have the same lower bound to pack size
> limit while on-disk packs do?
The reason for this change is purely to leave the server (receive-pack)
in control of the pack size limit. If the server for any reason does
specify a pack size limit less than 1 MiB, we do not want the client
blindly ignoring that limit, and then submitting a pack that the server
will reject.
If we _do_ want a hard lower bound on the pack size limit, we should
apply that server-side (by setting 1 MiB as the minimum allowed value
for receive.packSizeLimit). However, we will now have a problem if we
in the future decide to change this lower bound for any reason
whatsoever: We will need to change it in both receive-pack and
pack-objects, and for as long as there are new clients talking to old
servers (or vice versa if we decrease the lower bound), there will be
clients submitting packs that are then rejected by the server.
I'd rather put the server in total control of the pack size limit.
> If you have a reason to believe 1 MiB is too large for a pack size limit,
> shouldn't that logic apply equally to the on-disk case? What does this
> change have to do with the interaction with --stdout option?
I have no opinion on the lower bound of the pack size limit in the
on-disk case.
In the above discussion, the usage of --stdout is synonymous with the
send-pack scenario (pack-objects communicating with receive-pack).
Although not true in general, it is true for this patch, since up until
now pack-objects would abort if both --stdout and --max-pack-size were
in use. Therefore, for the two changes discussed above:
- else if (!pack_size_limit)
+ else if (!pack_size_limit || pack_to_stdout)
and
- if (pack_size_limit && pack_size_limit < 1024*1024) {
+ if (!pack_to_stdout && pack_size_limit && pack_size_limit < 1024*1024) {
I can deduce that they only affect the current use case (the send-pack
scenario), since these changes make no (functional) difference in any
other use case (where --stdout and --max-pack-size are mutually
exclusive).
...Johan
--
Johan Herland, <johan@herland.net>
www.herland.net
* [PATCHv3 4/9] pack-objects: Teach new option --max-object-count, similar to --max-pack-size
2011-05-15 21:37 ` [PATCHv3 0/9] Push limits Johan Herland
` (2 preceding siblings ...)
2011-05-15 21:37 ` [PATCHv3 3/9] pack-objects: Allow --max-pack-size to be used together with --stdout Johan Herland
@ 2011-05-15 21:37 ` Johan Herland
2011-05-15 22:07 ` Shawn Pearce
2011-05-15 21:37 ` [PATCHv3 5/9] pack-objects: Teach new option --max-commit-count, limiting #commits in pack Johan Herland
` (5 subsequent siblings)
9 siblings, 1 reply; 53+ messages in thread
From: Johan Herland @ 2011-05-15 21:37 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Shawn Pearce, Johan Herland, git
The new --max-object-count option behaves similarly to --max-pack-size,
except that the decision to split packs is determined by the number of
objects in the pack, and not by the size of the pack.
The new option also has a corresponding configuration variable, named
pack.objectCountLimit, which works similarly to pack.packSizeLimit,
subject to the difference mentioned above.
As with --max-pack-size, you can use --max-object-count together with
--stdout to put a limit on the number of objects in the pack written to
stdout. If the pack would exceed this limit, pack-objects will abort with
an error message.
Finally, for completeness, the new option is also added to git-repack,
which simply forwards it to pack-objects.
Documentation and tests are included.
Signed-off-by: Johan Herland <johan@herland.net>
---
Documentation/config.txt | 8 ++++++
Documentation/git-pack-objects.txt | 9 +++++++
Documentation/git-repack.txt | 6 +++++
builtin/pack-objects.c | 20 ++++++++++++++++
git-repack.sh | 27 +++++++++++----------
t/t5300-pack-object.sh | 44 ++++++++++++++++++++++++++++++++++++
6 files changed, 101 insertions(+), 13 deletions(-)
diff --git a/Documentation/config.txt b/Documentation/config.txt
index 285c7f7..056e01f 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -1523,6 +1523,14 @@ pack.packSizeLimit::
Common unit suffixes of 'k', 'm', or 'g' are
supported.
+pack.objectCountLimit::
+ The maximum number of objects in a pack. This setting only
+ affects packing to a file when repacking, i.e. the git://
+ protocol is unaffected. It can be overridden by the
+ `\--max-object-count` option of linkgit:git-repack[1].
+ The default is unlimited. Common unit suffixes of 'k', 'm',
+ or 'g' are supported.
+
pager.<cmd>::
If the value is boolean, turns on or off pagination of the
output of a particular git subcommand when writing to a tty.
diff --git a/Documentation/git-pack-objects.txt b/Documentation/git-pack-objects.txt
index ca97463..5ab1fe9 100644
--- a/Documentation/git-pack-objects.txt
+++ b/Documentation/git-pack-objects.txt
@@ -116,6 +116,15 @@ base-name::
When used together with --stdout, the command will fail with an error
message if the pack output exceeds the given limit.
+--max-object-count=<n>::
+ Maximum number of objects in each output pack file. The number
+ can be suffixed with "k", "m", or "g". If specified, multiple
+ packfiles may be created. The default is unlimited, unless
+ the config variable `pack.objectCountLimit` is set.
++
+When used together with --stdout, the command will fail with an error
+message if the pack output exceeds the given limit.
+
--honor-pack-keep::
This flag causes an object already in a local pack that
has a .keep file to be ignored, even if it would have
diff --git a/Documentation/git-repack.txt b/Documentation/git-repack.txt
index 0decee2..f9a44b8 100644
--- a/Documentation/git-repack.txt
+++ b/Documentation/git-repack.txt
@@ -109,6 +109,12 @@ other objects in that pack they already have locally.
The default is unlimited, unless the config variable
`pack.packSizeLimit` is set.
+--max-object-count=<n>::
+ Maximum number of objects in each output pack file. The number
+ can be suffixed with "k", "m", or "g". If specified, multiple
+ packfiles may be created. The default is unlimited, unless
+ the config variable `pack.objectCountLimit` is set.
+
Configuration
-------------
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 69f1c51..c4fbc54 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -74,6 +74,7 @@ static const char *base_name;
static int progress = 1;
static int window = 10;
static unsigned long pack_size_limit, pack_size_limit_cfg;
+static unsigned long object_count_limit, object_count_limit_cfg;
static int depth = 50;
static int delta_search_threads;
static int pack_to_stdout;
@@ -227,6 +228,10 @@ static unsigned long write_object(struct sha1file *f,
else
limit = pack_size_limit - write_offset;
+ /* Trigger new pack when we reach object count limit */
+ if (object_count_limit && nr_written >= object_count_limit)
+ return 0;
+
if (!entry->delta)
usable_delta = 0; /* no delta */
else if (!pack_size_limit || pack_to_stdout)
@@ -1897,6 +1902,10 @@ static int git_pack_config(const char *k, const char *v, void *cb)
pack_size_limit_cfg = git_config_ulong(k, v);
return 0;
}
+ if (!strcmp(k, "pack.objectcountlimit")) {
+ object_count_limit_cfg = git_config_ulong(k, v);
+ return 0;
+ }
return git_default_config(k, v, cb);
}
@@ -2182,6 +2191,12 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
usage(pack_usage);
continue;
}
+ if (!prefixcmp(arg, "--max-object-count=")) {
+ object_count_limit_cfg = 0;
+ if (!git_parse_ulong(arg+19, &object_count_limit))
+ usage(pack_usage);
+ continue;
+ }
if (!prefixcmp(arg, "--window=")) {
char *end;
window = strtoul(arg+9, &end, 0);
@@ -2322,6 +2337,8 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
warning("minimum pack size limit is 1 MiB");
pack_size_limit = 1024*1024;
}
+ if (!pack_to_stdout && !object_count_limit)
+ object_count_limit = object_count_limit_cfg;
if (!pack_to_stdout && thin)
die("--thin cannot be used to build an indexable pack.");
@@ -2349,6 +2366,9 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
if (non_empty && !nr_result)
return 0;
+ if (pack_to_stdout && object_count_limit && object_count_limit < nr_result)
+ die("unable to make pack within the object count limit"
+ " (%lu objects)", object_count_limit);
if (nr_result)
prepare_pack(window, depth);
write_pack_file();
diff --git a/git-repack.sh b/git-repack.sh
index 624feec..e8693a6 100755
--- a/git-repack.sh
+++ b/git-repack.sh
@@ -7,19 +7,20 @@ OPTIONS_KEEPDASHDASH=
OPTIONS_SPEC="\
git repack [options]
--
-a pack everything in a single pack
-A same as -a, and turn unreachable objects loose
-d remove redundant packs, and run git-prune-packed
-f pass --no-reuse-delta to git-pack-objects
-F pass --no-reuse-object to git-pack-objects
-n do not run git-update-server-info
-q,quiet be quiet
-l pass --local to git-pack-objects
+a pack everything in a single pack
+A same as -a, and turn unreachable objects loose
+d remove redundant packs, and run git-prune-packed
+f pass --no-reuse-delta to git-pack-objects
+F pass --no-reuse-object to git-pack-objects
+n do not run git-update-server-info
+q,quiet be quiet
+l pass --local to git-pack-objects
Packing constraints
-window= size of the window used for delta compression
-window-memory= same as the above, but limit memory size instead of entries count
-depth= limits the maximum delta depth
-max-pack-size= maximum size of each packfile
+window= size of the window used for delta compression
+window-memory= same as the above, but limit memory size instead of entries count
+depth= limits the maximum delta depth
+max-pack-size= maximum size of each packfile
+max-object-count= maximum number of objects in each packfile
"
SUBDIRECTORY_OK='Yes'
. git-sh-setup
@@ -38,7 +39,7 @@ do
-f) no_reuse=--no-reuse-delta ;;
-F) no_reuse=--no-reuse-object ;;
-l) local=--local ;;
- --max-pack-size|--window|--window-memory|--depth)
+ --max-pack-size|--max-object-count|--window|--window-memory|--depth)
extra="$extra $1=$2"; shift ;;
--) shift; break;;
*) usage ;;
diff --git a/t/t5300-pack-object.sh b/t/t5300-pack-object.sh
index 00f1bd8..d133c28 100755
--- a/t/t5300-pack-object.sh
+++ b/t/t5300-pack-object.sh
@@ -439,6 +439,50 @@ test_expect_success '--stdout fails when pack exceeds --max-pack-size' '
grep -q "pack size limit" errs
'
+test_expect_success 'honor pack.objectCountLimit' '
+ git config pack.objectCountLimit 5 &&
+ packname_17=$(git pack-objects test-17 <obj-list) &&
+ test 2 = $(ls test-17-*.pack | wc -l)
+'
+
+test_expect_success 'verify resulting packs' '
+ git verify-pack test-17-*.pack
+'
+
+test_expect_success '--stdout ignores pack.objectCountLimit' '
+ git pack-objects --stdout <obj-list >test-18.pack &&
+ git index-pack --strict test-18.pack
+'
+
+test_expect_success 'verify resulting pack' '
+ git verify-pack test-18.pack
+'
+
+test_expect_success 'honor --max-object-count' '
+ git config --unset pack.objectCountLimit &&
+ packname_19=$(git pack-objects --max-object-count=5 test-19 <obj-list) &&
+ test 2 = $(ls test-19-*.pack | wc -l)
+'
+
+test_expect_success 'verify resulting packs' '
+ git verify-pack test-19-*.pack
+'
+
+test_expect_success '--stdout works with large enough --max-object-count' '
+ git pack-objects --stdout --max-object-count=8 <obj-list >test-20.pack &&
+ git index-pack --strict test-20.pack
+'
+
+test_expect_success 'verify resulting pack' '
+ git verify-pack test-20.pack
+'
+
+test_expect_success '--stdout fails when pack exceeds --max-object-count' '
+ test_must_fail git pack-objects --stdout --max-object-count=1 <obj-list >test-21.pack 2>errs &&
+ test_must_fail git index-pack --strict test-21.pack &&
+ grep -q "object count limit" errs
+'
+
#
# WARNING!
#
--
1.7.5.rc1.3.g4d7b
^ permalink raw reply related [flat|nested] 53+ messages in thread
* Re: [PATCHv3 4/9] pack-objects: Teach new option --max-object-count, similar to --max-pack-size
2011-05-15 21:37 ` [PATCHv3 4/9] pack-objects: Teach new option --max-object-count, similar to --max-pack-size Johan Herland
@ 2011-05-15 22:07 ` Shawn Pearce
2011-05-15 22:31 ` Johan Herland
2011-05-16 6:25 ` Junio C Hamano
0 siblings, 2 replies; 53+ messages in thread
From: Shawn Pearce @ 2011-05-15 22:07 UTC (permalink / raw)
To: Johan Herland; +Cc: Junio C Hamano, git
On Sun, May 15, 2011 at 14:37, Johan Herland <johan@herland.net> wrote:
> The new --max-object-count option behaves similarly to --max-pack-size,
> except that the decision to split packs is determined by the number of
> objects in the pack, and not by the size of the pack.
Like my note about pack size for this case... I think doing this
during writing is too late. We should be aborting the counting phase
if the output pack is to stdout and we are going to exceed this limit.
--
Shawn.
* Re: [PATCHv3 4/9] pack-objects: Teach new option --max-object-count, similar to --max-pack-size
2011-05-15 22:07 ` Shawn Pearce
@ 2011-05-15 22:31 ` Johan Herland
2011-05-15 23:48 ` Shawn Pearce
2011-05-16 6:25 ` Junio C Hamano
1 sibling, 1 reply; 53+ messages in thread
From: Johan Herland @ 2011-05-15 22:31 UTC (permalink / raw)
To: Shawn Pearce; +Cc: git, Junio C Hamano
On Monday 16 May 2011, Shawn Pearce wrote:
> On Sun, May 15, 2011 at 14:37, Johan Herland <johan@herland.net> wrote:
> > The new --max-object-count option behaves similarly to --max-pack-size,
> > except that the decision to split packs is determined by the number of
> > objects in the pack, and not by the size of the pack.
>
> Like my note about pack size for this case... I think doing this
> during writing is too late. We should be aborting the counting phase
> if the output pack is to stdout and we are going to exceed this limit.
The patch actually does this in the --stdout case. Look at the last
hunk in builtin/pack-objects.c:
@@ -2349,6 +2366,9 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
if (non_empty && !nr_result)
return 0;
+ if (pack_to_stdout && object_count_limit && object_count_limit < nr_result)
+ die("unable to make pack within the object count limit"
+ " (%lu objects)", object_count_limit);
if (nr_result)
prepare_pack(window, depth);
write_pack_file();
So in the --stdout case, we have already aborted before we start
writing the pack (i.e. after the counting phase).
The commit message you quote above is for the case where someone uses
--max-object-count _without_ --stdout, in which case we compare
nr_written to object_count_limit to determine when to split the pack.
...Johan
--
Johan Herland, <johan@herland.net>
www.herland.net
* Re: [PATCHv3 4/9] pack-objects: Teach new option --max-object-count, similar to --max-pack-size
2011-05-15 22:31 ` Johan Herland
@ 2011-05-15 23:48 ` Shawn Pearce
0 siblings, 0 replies; 53+ messages in thread
From: Shawn Pearce @ 2011-05-15 23:48 UTC (permalink / raw)
To: Johan Herland; +Cc: git, Junio C Hamano
On Sun, May 15, 2011 at 15:31, Johan Herland <johan@herland.net> wrote:
> On Monday 16 May 2011, Shawn Pearce wrote:
>> On Sun, May 15, 2011 at 14:37, Johan Herland <johan@herland.net> wrote:
>> > The new --max-object-count option behaves similarly to --max-pack-size,
>> > except that the decision to split packs is determined by the number of
>> > objects in the pack, and not by the size of the pack.
>>
>> Like my note about pack size for this case... I think doing this
>> during writing is too late. We should be aborting the counting phase
>> if the output pack is to stdout and we are going to exceed this limit.
>
> The patch actually does this in the --stdout case. Look at the last
> hunk in builtin/pack-objects.c:
>
> @@ -2349,6 +2366,9 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
>
> if (non_empty && !nr_result)
> return 0;
> + if (pack_to_stdout && object_count_limit && object_count_limit < nr_result)
> + die("unable to make pack within the object count limit"
> + " (%lu objects)", object_count_limit);
> if (nr_result)
> prepare_pack(window, depth);
> write_pack_file();
>
> So in the --stdout case, we have already aborted before we start
> writing the pack (i.e. after the counting phase).
>
> The commit message you quote above, are for the case where someone uses
> --max-object-count _without_ --stdout, in which case we compare
> nr_written to object_count_limit to determine when to split the pack.
Thanks for the clarification. It's Sunday, I am clearly not scanning
patches with the level of detail I should be. :-)
Given that this block is in here, most of the series looks pretty good
to me. Thanks for following up with this round, I know it's a lot more
than you originally wanted to do for this "simple" limit, but I think
it's a worthwhile improvement.
--
Shawn.
* Re: [PATCHv3 4/9] pack-objects: Teach new option --max-object-count, similar to --max-pack-size
2011-05-15 22:07 ` Shawn Pearce
2011-05-15 22:31 ` Johan Herland
@ 2011-05-16 6:25 ` Junio C Hamano
2011-05-16 9:49 ` Johan Herland
1 sibling, 1 reply; 53+ messages in thread
From: Junio C Hamano @ 2011-05-16 6:25 UTC (permalink / raw)
To: Johan Herland; +Cc: git, Shawn Pearce
Shawn Pearce <spearce@spearce.org> writes:
> On Sun, May 15, 2011 at 14:37, Johan Herland <johan@herland.net> wrote:
>> The new --max-object-count option behaves similarly to --max-pack-size,
>> except that the decision to split packs is determined by the number of
>> objects in the pack, and not by the size of the pack.
>
> Like my note about pack size for this case... I think doing this
> during writing is too late. We should be aborting the counting phase
> if the output pack is to stdout and we are going to exceed this limit.
Well, even more important is whether this is even useful. What is the user
trying to prevent from happening, and is it a useful thing?
I am not interested in a literal answer "The user is trying to prevent a
push that pushes too many objects in a single push into a repository". I
am questioning why anybody even cares about the object count per se.
I think "do not hog too much disk" (i.e. size) is an understandable wish,
and max-pack-size imposed on --stdout would be a good approximation for
that.
I would understand "this project has only these files, and pushing a tree
that has 100x leaves than that may be a mistake" (i.e. recursive sum of
number of entries of an individual tree). I would also sort-of understand
"do not push too deep a history at once" (i.e. we do not welcome pushing a
wildly diverged fork that has been allowed to grow for too long).
But I do not think max-object-count is a good approximation for either
to be useful.
Without a good answer to the above question, this looks like a "because we
could", not "because it is useful", feature.
* Re: [PATCHv3 4/9] pack-objects: Teach new option --max-object-count, similar to --max-pack-size
2011-05-16 6:25 ` Junio C Hamano
@ 2011-05-16 9:49 ` Johan Herland
0 siblings, 0 replies; 53+ messages in thread
From: Johan Herland @ 2011-05-16 9:49 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git, Shawn Pearce
On Monday 16 May 2011, Junio C Hamano wrote:
> Shawn Pearce <spearce@spearce.org> writes:
> > On Sun, May 15, 2011 at 14:37, Johan Herland <johan@herland.net> wrote:
> >> The new --max-object-count option behaves similarly to
> >> --max-pack-size, except that the decision to split packs is
> >> determined by the number of objects in the pack, and not by the size
> >> of the pack.
> >
> > Like my note about pack size for this case... I think doing this
> > during writing is too late. We should be aborting the counting phase
> > if the output pack is to stdout and we are going to exceed this limit.
>
> Well, even more important is if this is even useful. What is the user
> trying to prevent from happening, and is it a useful thing?
I initially ran into this problem at $dayjob, where a user would push a large
pack from an unrelated repo, and the push would be refused by our update
hook, but not without storing the pack on the server first. Obviously, the
main objective of the patch is to prevent the pack from being stored on the
server in the first place. [1]
Based on that, I do not really care exactly what kind of limit I have to set
in order to prevent the push. I can easily enough find usable thresholds in
any of #objects, #commits or pack size.
> I think "do not hog too much disk" (i.e. size) is an understandable wish,
> and max-pack-size imposed on --stdout would be a good approximation for
> that.
Agreed.
> I would understand "this project has only these files, and pushing a tree
> that has 100x leaves than that may be a mistake" (i.e. recursive sum of
> number of entries of an individual tree). I would also sort-of understand
> "do not push too deep a history at once" (i.e. we do not welcome pushing
> a wildly diverged fork that has been allowed to grow for too long).
>
> But I do not think max-object-count is a good approximation for either
> to be useful.
I agree in principle, but currently, it's the only limit that we can enforce
on the server side.
> Without a good answer to the above question, this looks like a "because
> we could", not "because it is useful", feature.
Ok, I can drop the client side part of this patch (including the
limit-object-count capability), but I would like to retain the server side part
(abort if pack header reveals too many objects), since otherwise I have to
wait for all users to upgrade to a "limit-*"-aware client before I can
consider this problem truly solved.
...Johan
[1]: There is a followup to this problem, where a subsequent 'git gc' will
find all the objects in the pushed (but rejected by update-hook) pack to be
unreferenced, and then explode them into loose objects. In one case, a user
pushed a 300 MB pack to the server that was then exploded into 5 GB of loose
objects by a subsequent 'git gc'. Needless to say, we now run
'git gc --prune=now' as a workaround until this can be fixed.
--
Johan Herland, <johan@herland.net>
www.herland.net
* [PATCHv3 5/9] pack-objects: Teach new option --max-commit-count, limiting #commits in pack
2011-05-15 21:37 ` [PATCHv3 0/9] Push limits Johan Herland
` (3 preceding siblings ...)
2011-05-15 21:37 ` [PATCHv3 4/9] pack-objects: Teach new option --max-object-count, similar to --max-pack-size Johan Herland
@ 2011-05-15 21:37 ` Johan Herland
2011-05-15 21:37 ` [PATCHv3 6/9] receive-pack: Prepare for addition of the new 'limit-*' family of capabilities Johan Herland
` (4 subsequent siblings)
9 siblings, 0 replies; 53+ messages in thread
From: Johan Herland @ 2011-05-15 21:37 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Shawn Pearce, Johan Herland, git
The new --max-commit-count option behaves similarly to --max-object-count,
when used together with --stdout: It limits the number of commits in the
pack written to stdout. If the pack would exceed this limit, pack-objects
will abort with an error message.
Unlike --max-pack-size and --max-object-count, --max-commit-count must
always be used together with --stdout. This is because using the commit
count to split packs is not at all a good heuristic, since Git does not
necessarily distribute commit objects uniformly across packs.
Documentation and tests are included.
Signed-off-by: Johan Herland <johan@herland.net>
---
Documentation/git-pack-objects.txt | 8 ++++++++
builtin/pack-objects.c | 24 +++++++++++++++++++++---
t/t5300-pack-object.sh | 34 ++++++++++++++++++++++++++++++++++
3 files changed, 63 insertions(+), 3 deletions(-)
diff --git a/Documentation/git-pack-objects.txt b/Documentation/git-pack-objects.txt
index 5ab1fe9..21c30c0 100644
--- a/Documentation/git-pack-objects.txt
+++ b/Documentation/git-pack-objects.txt
@@ -125,6 +125,14 @@ message if the pack output exceeds the given limit.
When used together with --stdout, the command will fail with an error
message if the pack output exceeds the given limit.
+--max-commit-count=<n>::
+ This option is only useful together with --stdout.
+ Specifies the maximum number of commits allowed in the created
+ pack. If the number of commits would exceed the given limit,
+ pack-objects will fail with an error message.
+ The number can be suffixed with "k", "m", or "g".
+ The default is unlimited.
+
--honor-pack-keep::
This flag causes an object already in a local pack that
has a .keep file to be ignored, even if it would have
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index c4fbc54..6c6945e 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -61,7 +61,7 @@ struct object_entry {
*/
static struct object_entry *objects;
static struct pack_idx_entry **written_list;
-static uint32_t nr_objects, nr_alloc, nr_result, nr_written;
+static uint32_t nr_objects, nr_alloc, nr_result, nr_commits, nr_written;
static int non_empty;
static int reuse_delta = 1, reuse_object = 1;
@@ -661,8 +661,11 @@ static int add_object_entry(const unsigned char *sha1, enum object_type type,
if (ix >= 0) {
if (exclude) {
entry = objects + object_ix[ix] - 1;
- if (!entry->preferred_base)
+ if (!entry->preferred_base) {
nr_result--;
+ if (entry->type == OBJ_COMMIT)
+ nr_commits--;
+ }
entry->preferred_base = 1;
}
return 0;
@@ -702,8 +705,11 @@ static int add_object_entry(const unsigned char *sha1, enum object_type type,
entry->type = type;
if (exclude)
entry->preferred_base = 1;
- else
+ else {
nr_result++;
+ if (type == OBJ_COMMIT)
+ nr_commits++;
+ }
if (found_pack) {
entry->in_pack = found_pack;
entry->in_pack_offset = found_offset;
@@ -2137,6 +2143,7 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
const char **rp_av;
int rp_ac_alloc = 64;
int rp_ac;
+ unsigned long commit_count_limit = 0;
read_replace_refs = 0;
@@ -2197,6 +2204,11 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
usage(pack_usage);
continue;
}
+ if (!prefixcmp(arg, "--max-commit-count=")) {
+ if (!git_parse_ulong(arg+19, &commit_count_limit))
+ usage(pack_usage);
+ continue;
+ }
if (!prefixcmp(arg, "--window=")) {
char *end;
window = strtoul(arg+9, &end, 0);
@@ -2340,6 +2352,9 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
if (!pack_to_stdout && !object_count_limit)
object_count_limit = object_count_limit_cfg;
+ if (!pack_to_stdout && commit_count_limit)
+ die("--max-commit-count is only useful together with --stdout.");
+
if (!pack_to_stdout && thin)
die("--thin cannot be used to build an indexable pack.");
@@ -2369,6 +2384,9 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
if (pack_to_stdout && object_count_limit && object_count_limit < nr_result)
die("unable to make pack within the object count limit"
" (%lu objects)", object_count_limit);
+ if (pack_to_stdout && commit_count_limit && commit_count_limit < nr_commits)
+ die("unable to make pack within the commit count limit"
+ " (%lu commits)", commit_count_limit);
if (nr_result)
prepare_pack(window, depth);
write_pack_file();
diff --git a/t/t5300-pack-object.sh b/t/t5300-pack-object.sh
index d133c28..919e2da 100755
--- a/t/t5300-pack-object.sh
+++ b/t/t5300-pack-object.sh
@@ -483,6 +483,40 @@ test_expect_success '--stdout fails when pack exceeds --max-object-count' '
grep -q "object count limit" errs
'
+test_expect_success 'make a few more commits' '
+ git reset --hard $commit &&
+ echo "change" > file &&
+ git add file &&
+ git commit -m second &&
+ commit2=`git rev-parse --verify HEAD` &&
+ echo "more change" >> file &&
+ git commit -a -m third &&
+ commit3=`git rev-parse --verify HEAD` &&
+ echo "even more change" >> file &&
+ git commit -a -m fourth &&
+ commit4=`git rev-parse --verify HEAD` && {
+ echo $commit &&
+ echo $commit2 &&
+ echo $commit3 &&
+ echo $commit4
+ } >>commit-list
+'
+
+test_expect_success '--stdout works with large enough --max-commit-count' '
+ git pack-objects --revs --stdout --max-commit-count=4 <commit-list >test-22.pack &&
+ git index-pack --strict test-22.pack
+'
+
+test_expect_success 'verify resulting pack' '
+ git verify-pack test-22.pack
+'
+
+test_expect_success '--stdout fails when pack exceeds --max-commit-count' '
+ test_must_fail git pack-objects --revs --stdout --max-commit-count=3 <commit-list >test-23.pack 2>errs &&
+ test_must_fail git index-pack --strict test-23.pack &&
+ grep -q "commit count limit" errs
+'
+
#
# WARNING!
#
--
1.7.5.rc1.3.g4d7b
* [PATCHv3 6/9] receive-pack: Prepare for addition of the new 'limit-*' family of capabilities
2011-05-15 21:37 ` [PATCHv3 0/9] Push limits Johan Herland
` (4 preceding siblings ...)
2011-05-15 21:37 ` [PATCHv3 5/9] pack-objects: Teach new option --max-commit-count, limiting #commits in pack Johan Herland
@ 2011-05-15 21:37 ` Johan Herland
2011-05-16 6:50 ` Junio C Hamano
2011-05-15 21:37 ` [PATCHv3 7/9] send-pack/receive-pack: Allow server to refuse pushes with too many objects Johan Herland
` (3 subsequent siblings)
9 siblings, 1 reply; 53+ messages in thread
From: Johan Herland @ 2011-05-15 21:37 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Shawn Pearce, Johan Herland, git
This adds some technical documentation on the 'limit-*' family of
capabilities that will be added in the following commits.
Also refactor the generation of the capabilities declaration in receive-pack.
This will also be further expanded in the following commits.
Finally, change the return type of server_supports() to allow the caller to
more closely examine the found capability, e.g. by calling
server_supports("limit-foo="), and then use the return value to parse the
value following the '='.
Signed-off-by: Johan Herland <johan@herland.net>
---
Documentation/technical/pack-protocol.txt | 6 ++--
Documentation/technical/protocol-capabilities.txt | 22 +++++++++++++++++++++
builtin/receive-pack.c | 16 +++++++++++---
cache.h | 2 +-
connect.c | 7 +++--
5 files changed, 42 insertions(+), 11 deletions(-)
diff --git a/Documentation/technical/pack-protocol.txt b/Documentation/technical/pack-protocol.txt
index 4a68f0f..ddc0d0e 100644
--- a/Documentation/technical/pack-protocol.txt
+++ b/Documentation/technical/pack-protocol.txt
@@ -167,7 +167,7 @@ MUST peel the ref if it's an annotated tag.
other-peeled = obj-id SP refname "^{}" LF
capability-list = capability *(SP capability)
- capability = 1*(LC_ALPHA / DIGIT / "-" / "_")
+ capability = 1*(LC_ALPHA / DIGIT / "-" / "_" / "=")
LC_ALPHA = %x61-7A
----
@@ -391,8 +391,8 @@ The reference discovery phase is done nearly the same way as it is in the
fetching protocol. Each reference obj-id and name on the server is sent
in packet-line format to the client, followed by a flush-pkt. The only
real difference is that the capability listing is different - the only
-possible values are 'report-status', 'delete-refs', 'side-band-64k' and
-'ofs-delta'.
+possible values are 'report-status', 'delete-refs', 'side-band-64k',
+'ofs-delta' and 'limit-*'.
Reference Update Request and Packfile Transfer
----------------------------------------------
diff --git a/Documentation/technical/protocol-capabilities.txt b/Documentation/technical/protocol-capabilities.txt
index b732e80..11849a3 100644
--- a/Documentation/technical/protocol-capabilities.txt
+++ b/Documentation/technical/protocol-capabilities.txt
@@ -21,6 +21,9 @@ NOT advertise capabilities it does not understand.
The 'report-status' and 'delete-refs' capabilities are sent and
recognized by the receive-pack (push to server) process.
+Any 'limit-*' capability may only be sent by the receive-pack
+process. It is never requested by the client.
+
The 'side-band-64k' and 'ofs-delta' capabilities are sent and
recognized by both upload-pack and receive-pack protocols.
@@ -185,3 +188,22 @@ it is capable of accepting a zero-id value as the target
value of a reference update. It is not sent back by the client, it
simply informs the client that it can be sent zero-id values
to delete references.
+
+limit-*
+-------
+
+If the server sends one or more capabilities that start with "limit-",
+it means that there are certain limits to what kind of pack the server
+will receive. More specifically, these capabilities must be of the form
+"limit-<what>=<num>" where "<what>" (a sequence of lower-case letters,
+digits and "-") describes which property of the pack is limited, and
+"<num>" (a sequence of decimal digits) specifies the limit value.
+Capabilities of this type are not sent back by the client; instead the
+client must verify that the created packfile does not exceed the given
+limits. This check should happen prior to transferring the packfile to
+the server. If the check fails, the client must abort the upload, and
+report the reason for the aborted push back to the user.
+The following "limit-*" capabilities are recognized:
+
+More "limit-*" capabilities may be added in the future. The client
+is free to ignore any "limit-*" capabilities it does not understand.
diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c
index e1ba4dc..c55989d 100644
--- a/builtin/receive-pack.c
+++ b/builtin/receive-pack.c
@@ -106,15 +106,23 @@ static int receive_pack_config(const char *var, const char *value, void *cb)
return git_default_config(var, value, cb);
}
+static const char *capabilities()
+{
+ static char buf[1024];
+ int ret = snprintf(buf, sizeof(buf),
+ " report-status delete-refs side-band-64k%s",
+ prefer_ofs_delta ? " ofs-delta" : "");
+ assert(ret < sizeof(buf));
+ return buf;
+}
+
static int show_ref(const char *path, const unsigned char *sha1, int flag, void *cb_data)
{
if (sent_capabilities)
packet_write(1, "%s %s\n", sha1_to_hex(sha1), path);
else
- packet_write(1, "%s %s%c%s%s\n",
- sha1_to_hex(sha1), path, 0,
- " report-status delete-refs side-band-64k",
- prefer_ofs_delta ? " ofs-delta" : "");
+ packet_write(1, "%s %s%c%s\n",
+ sha1_to_hex(sha1), path, 0, capabilities());
sent_capabilities = 1;
return 0;
}
diff --git a/cache.h b/cache.h
index 2b34116..db97097 100644
--- a/cache.h
+++ b/cache.h
@@ -981,7 +981,7 @@ struct extra_have_objects {
unsigned char (*array)[20];
};
extern struct ref **get_remote_heads(int in, struct ref **list, int nr_match, char **match, unsigned int flags, struct extra_have_objects *);
-extern int server_supports(const char *feature);
+extern const char *server_supports(const char *feature);
extern struct packed_git *parse_pack_index(unsigned char *sha1, const char *idx_path);
diff --git a/connect.c b/connect.c
index 57dc20c..015c570 100644
--- a/connect.c
+++ b/connect.c
@@ -102,10 +102,11 @@ struct ref **get_remote_heads(int in, struct ref **list,
return list;
}
-int server_supports(const char *feature)
+const char *server_supports(const char *feature)
{
- return server_capabilities &&
- strstr(server_capabilities, feature) != NULL;
+ if (server_capabilities)
+ return strstr(server_capabilities, feature);
+ return NULL;
}
int path_match(const char *path, int nr, char **match)
--
1.7.5.rc1.3.g4d7b
* Re: [PATCHv3 6/9] receive-pack: Prepare for addition of the new 'limit-*' family of capabilities
2011-05-15 21:37 ` [PATCHv3 6/9] receive-pack: Prepare for addition of the new 'limit-*' family of capabilities Johan Herland
@ 2011-05-16 6:50 ` Junio C Hamano
2011-05-16 9:53 ` Johan Herland
2011-05-16 22:02 ` Sverre Rabbelier
0 siblings, 2 replies; 53+ messages in thread
From: Junio C Hamano @ 2011-05-16 6:50 UTC (permalink / raw)
To: Johan Herland; +Cc: Shawn Pearce, git
Johan Herland <johan@herland.net> writes:
> +const char *server_supports(const char *feature)
> {
> - return server_capabilities &&
> - strstr(server_capabilities, feature) != NULL;
> + if (server_capabilities)
> + return strstr(server_capabilities, feature);
> + return NULL;
> }
I've been meaning to fix this part, but currently the feature set is given
as a space-separated list " featurea featureb featurec" and we check with a
token without any space around, e.g. "if (server_supports("no-done"))",
which is quite broken.
We should tighten this strstr() to make sure we are not matching in the
middle of a string, and the need to do so is even greater now that you are
going to introduce "foo=<value>" and the value could even be strings in
the future.
How about implementing rules like these:
- feature must appear at the beginning of server_capabilities, or the
byte immediately before the matched location in server_capabilities
must be a SP; and
- if "feature" does not end with an equal sign, it does not expect a
value. The byte after the matched location in server_capabilities must
be either the end of string or a SP. A feature that expects a value is
checked with 'server_supports("feature=")' and the matched location in
server_capabilities can be followed by anything (i.e. if at the end of
string or a SP, it gets an empty string as the value, and otherwise it
will get the stretch of bytes after the '=' up to the next SP).
Given the server_capabilities string "foo=ab bar=froboz boz=nitfol",
I would like to see these happen:
server_supports("foo=") matches "foo=ab";
server_supports("ab") does not match anything;
server_supports("bar") does not match anything;
server_supports("boz") matches boz=nitfol, without being fooled by
the "boz" at the end of bar=froboz that comes earlier.
* Re: [PATCHv3 6/9] receive-pack: Prepare for addition of the new 'limit-*' family of capabilities
2011-05-16 6:50 ` Junio C Hamano
@ 2011-05-16 9:53 ` Johan Herland
2011-05-16 22:02 ` Sverre Rabbelier
1 sibling, 0 replies; 53+ messages in thread
From: Johan Herland @ 2011-05-16 9:53 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git, Shawn Pearce
On Monday 16 May 2011, Junio C Hamano wrote:
> Johan Herland <johan@herland.net> writes:
> > +const char *server_supports(const char *feature)
> >
> > {
> >
> > - return server_capabilities &&
> > - strstr(server_capabilities, feature) != NULL;
> > + if (server_capabilities)
> > + return strstr(server_capabilities, feature);
> > + return NULL;
> >
> > }
>
> I've been meaning to fix this part, but currently the feature set is
> given as space separated list " featurea featureb featurec" and we check
> with a token without any space around, e.g. "if
> (server_supports("no-done"))", which is quite broken.
>
> We should tighten this strstr() to make sure we are not matching in the
> middle of a string, and the need to do so is even greater now that you
> are going to introduce "foo=<value>" and the value could even be strings
> in the future.
>
> How about implementing rules like these:
>
> [...]
Agreed. I'll take a stab at this in the re-roll.
...Johan
--
Johan Herland, <johan@herland.net>
www.herland.net
* Re: [PATCHv3 6/9] receive-pack: Prepare for addition of the new 'limit-*' family of capabilities
2011-05-16 6:50 ` Junio C Hamano
2011-05-16 9:53 ` Johan Herland
@ 2011-05-16 22:02 ` Sverre Rabbelier
2011-05-16 22:07 ` Junio C Hamano
1 sibling, 1 reply; 53+ messages in thread
From: Sverre Rabbelier @ 2011-05-16 22:02 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Johan Herland, Shawn Pearce, git
Heya,
On Sun, May 15, 2011 at 23:50, Junio C Hamano <gitster@pobox.com> wrote:
> We should tighten this strstr() to make sure we are not matching in the
> middle of a string, and the need to do so is even greater now that you are
> going to introduce "foo=<value>" and the value could even be strings in
> the future.
If we are writing this down somewhere, should we also dictate how
spaces should be escaped to be "forward-compatible"?
--
Cheers,
Sverre Rabbelier
* Re: [PATCHv3 6/9] receive-pack: Prepare for addition of the new 'limit-*' family of capabilities
2011-05-16 22:02 ` Sverre Rabbelier
@ 2011-05-16 22:07 ` Junio C Hamano
2011-05-16 22:09 ` Sverre Rabbelier
0 siblings, 1 reply; 53+ messages in thread
From: Junio C Hamano @ 2011-05-16 22:07 UTC (permalink / raw)
To: Sverre Rabbelier; +Cc: Junio C Hamano, Johan Herland, Shawn Pearce, git
Sverre Rabbelier <srabbelier@gmail.com> writes:
> On Sun, May 15, 2011 at 23:50, Junio C Hamano <gitster@pobox.com> wrote:
>> We should tighten this strstr() to make sure we are not matching in the
>> middle of a string, and the need to do so is even greater now that you are
>> going to introduce "foo=<value>" and the value could even be strings in
>> the future.
>
> If we are writing this down somewhere, should we also dictate how
> spaces should be escaped to be "forward-compatible"?
The old clients have treated it as a SP-separated list, e.g. "featurea
featureb featurec", from the beginning of how the capabilities[] code was
written, so I do not see a point. I would expect arbitrary string values
to be encased in something like base64, base85 or hex.
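If arbitrary byte values ever did appear, the hex encasing mentioned here could look like this minimal sketch (purely illustrative; nothing in this thread standardizes any encoding):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Sketch of the kind of encasing Junio mentions: hex-encode a value so
 * it can never contain SP and confuse the SP-separated capability list.
 * The caller must provide an output buffer of at least 2*strlen(in)+1
 * bytes. */
static void hex_encase(const char *in, char *out)
{
	while (*in) {
		sprintf(out, "%02x", (unsigned char)*in++);
		out += 2;
	}
	*out = '\0';
}
```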
* Re: [PATCHv3 6/9] receive-pack: Prepare for addition of the new 'limit-*' family of capabilities
2011-05-16 22:07 ` Junio C Hamano
@ 2011-05-16 22:09 ` Sverre Rabbelier
2011-05-16 22:12 ` Junio C Hamano
0 siblings, 1 reply; 53+ messages in thread
From: Sverre Rabbelier @ 2011-05-16 22:09 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Johan Herland, Shawn Pearce, git
Heya,
On Mon, May 16, 2011 at 15:07, Junio C Hamano <gitster@pobox.com> wrote:
> The old clients have treated it as a SP-separated list, e.g. "featurea
> featureb featurec", from the beginning of how the capabilities[] code was
> written, so I do not see a point. I would expect arbitrary string values
> to be encased in something like base64, base85 or hex.
Right, that's my point: do we want to leave it up to each individual
option to decide what encoding to use?
--
Cheers,
Sverre Rabbelier
* Re: [PATCHv3 6/9] receive-pack: Prepare for addition of the new 'limit-*' family of capabilities
2011-05-16 22:09 ` Sverre Rabbelier
@ 2011-05-16 22:12 ` Junio C Hamano
2011-05-16 22:16 ` Sverre Rabbelier
0 siblings, 1 reply; 53+ messages in thread
From: Junio C Hamano @ 2011-05-16 22:12 UTC (permalink / raw)
To: Sverre Rabbelier; +Cc: Junio C Hamano, Johan Herland, Shawn Pearce, git
Sverre Rabbelier <srabbelier@gmail.com> writes:
> On Mon, May 16, 2011 at 15:07, Junio C Hamano <gitster@pobox.com> wrote:
>> The old clients have treated it as a SP-separated list, e.g. "featurea
>> featureb featurec", from the beginning of how the capabilities[] code was
>> written, so I do not see a point. I would expect arbitrary string values
>> to be encased in something like base64, base85 or hex.
>
> Right, that's my point: do we want to leave it up to each individual
> option to decide what encoding to use?
I do not see any point in deciding that right now, without even knowing
what kind of strings (short? mostly ascii? etc.) we would typically want
as payload, and tying our hands with a poor decision we would end up
making out of thin air.
That is what I meant by "I do not see a point".
* Re: [PATCHv3 6/9] receive-pack: Prepare for addition of the new 'limit-*' family of capabilities
2011-05-16 22:12 ` Junio C Hamano
@ 2011-05-16 22:16 ` Sverre Rabbelier
0 siblings, 0 replies; 53+ messages in thread
From: Sverre Rabbelier @ 2011-05-16 22:16 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Johan Herland, Shawn Pearce, git
Heya,
On Mon, May 16, 2011 at 15:12, Junio C Hamano <gitster@pobox.com> wrote:
> I do not see any point in deciding that right now, without even knowing
> what kind of strings (short? mostly ascii? etc.) we would typically want
> as payload, and tying our hands with a poor decision we would end up
> making out of thin air.
Ok, fair enough. :)
--
Cheers,
Sverre Rabbelier
* [PATCHv3 7/9] send-pack/receive-pack: Allow server to refuse pushes with too many objects
2011-05-15 21:37 ` [PATCHv3 0/9] Push limits Johan Herland
` (5 preceding siblings ...)
2011-05-15 21:37 ` [PATCHv3 6/9] receive-pack: Prepare for addition of the new 'limit-*' family of capabilities Johan Herland
@ 2011-05-15 21:37 ` Johan Herland
2011-05-15 21:37 ` [PATCHv3 8/9] send-pack/receive-pack: Allow server to refuse pushing too large packs Johan Herland
` (2 subsequent siblings)
9 siblings, 0 replies; 53+ messages in thread
From: Johan Herland @ 2011-05-15 21:37 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Shawn Pearce, Johan Herland, git
Add a new receive.objectCountLimit config variable which defines an upper
limit on the number of objects to accept in a single push.
This limit is advertised to clients, using the new "limit-object-count=<num>"
capability. The client side (i.e. send-pack) parses this capability and
forwards it to pack-objects, using the recently added --max-object-count
option. pack-objects then checks the generated pack against the limit and
aborts pack generation if the pack would contain too many objects.
Additionally, for older clients that do not understand the capability, the
server aborts the transfer if the number of objects in the transferred pack
exceeds the limit. This is a suboptimal fallback, as it leaves the client
with a broken pipe and, likely, a confused user.
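The server-side fallback keys off the object count carried in the pack header. A hedged sketch of that comparison (the struct layout follows the pack format, but the helper name is an assumption, not this patch's code):

```c
#include <assert.h>
#include <stdint.h>
#include <arpa/inet.h> /* ntohl, htonl */

/* A pack header carries "PACK", a version, and the object count, each
 * as a 32-bit value in network byte order. */
struct pack_header {
	uint32_t hdr_signature;
	uint32_t hdr_version;
	uint32_t hdr_entries;
};

/* Hypothetical helper: nonzero when the received pack's object count
 * exceeds the configured limit; a limit of 0 means "no limit". */
static int exceeds_object_limit(const struct pack_header *hdr,
				unsigned long limit)
{
	return limit > 0 && ntohl(hdr->hdr_entries) > limit;
}
```

Because the count sits in the header, receive-pack can refuse before handing the stream to unpack-objects or index-pack, as the patch does.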
Server administrators might want to use this config variable to prevent
unintended large pushes from entering the repo (typically a result of the
user not being aware of exactly what is being pushed, e.g. pushing a large
rewritten history). Note that this config variable is not intended to protect
against DoS attacks, since there are countless other ways to attempt to DoS a
server without violating this limit.
Traditionally, this kind of limit would be imposed by a pre-receive or update
hook, but both of those run _after_ the pack has been received and stored by
receive-pack, so they cannot prevent the pack from being stored on the server.
Documentation and tests are included.
Signed-off-by: Johan Herland <johan@herland.net>
---
Documentation/config.txt | 9 +++
Documentation/technical/protocol-capabilities.txt | 2 +
builtin/receive-pack.c | 13 ++++-
builtin/send-pack.c | 10 +++
send-pack.h | 1 +
t/t5400-send-pack.sh | 66 +++++++++++++++++++++
6 files changed, 100 insertions(+), 1 deletions(-)
diff --git a/Documentation/config.txt b/Documentation/config.txt
index 056e01f..fd6c130 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -1599,6 +1599,15 @@ receive.unpackLimit::
especially on slow filesystems. If not set, the value of
`transfer.unpackLimit` is used instead.
+receive.objectCountLimit::
+ If the number of objects received in a push exceeds this limit,
+ then the entire push will be refused. This is meant to prevent
+ an unintended large push (typically a result of the user not
+ being aware of exactly what is being pushed, e.g. pushing a
+ large rewritten history) from entering the repo. If not set,
+ there is no upper limit on the number of objects transferred
+ in a single push.
+
receive.denyDeletes::
If set to true, git-receive-pack will deny a ref update that deletes
the ref. Use this to prevent such a ref deletion via a push.
diff --git a/Documentation/technical/protocol-capabilities.txt b/Documentation/technical/protocol-capabilities.txt
index 11849a3..aac66af 100644
--- a/Documentation/technical/protocol-capabilities.txt
+++ b/Documentation/technical/protocol-capabilities.txt
@@ -205,5 +205,7 @@ the server. If the check fails, the client must abort the upload, and
report the reason for the aborted push back to the user.
The following "limit-*" capabilites are recognized:
+ - limit-object-count=<num> (Maximum number of objects in a pack)
+
More "limit-*" capabilities may be added in the future. The client
is free to ignore any "limit-*" capabilities it does not understand.
diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c
index c55989d..d919e17 100644
--- a/builtin/receive-pack.c
+++ b/builtin/receive-pack.c
@@ -28,6 +28,7 @@ static int receive_fsck_objects;
static int receive_unpack_limit = -1;
static int transfer_unpack_limit = -1;
static int unpack_limit = 100;
+static unsigned long limit_object_count;
static int report_status;
static int use_sideband;
static int prefer_ofs_delta = 1;
@@ -73,6 +74,11 @@ static int receive_pack_config(const char *var, const char *value, void *cb)
return 0;
}
+ if (strcmp(var, "receive.objectcountlimit") == 0) {
+ limit_object_count = git_config_ulong(var, value);
+ return 0;
+ }
+
if (strcmp(var, "receive.fsckobjects") == 0) {
receive_fsck_objects = git_config_bool(var, value);
return 0;
@@ -112,6 +118,9 @@ static const char *capabilities()
int ret = snprintf(buf, sizeof(buf),
" report-status delete-refs side-band-64k%s",
prefer_ofs_delta ? " ofs-delta" : "");
+ if (limit_object_count > 0)
+ ret += snprintf(buf + ret, sizeof(buf) - ret,
+ " limit-object-count=%lu", limit_object_count);
assert(ret < sizeof(buf));
return buf;
}
@@ -656,7 +665,9 @@ static const char *unpack(void)
"--pack_header=%"PRIu32",%"PRIu32,
ntohl(hdr.hdr_version), ntohl(hdr.hdr_entries));
- if (ntohl(hdr.hdr_entries) < unpack_limit) {
+ if (limit_object_count > 0 && ntohl(hdr.hdr_entries) > limit_object_count)
+ return "received pack exceeds configured receive.objectCountLimit";
+ else if (ntohl(hdr.hdr_entries) < unpack_limit) {
int code, i = 0;
const char *unpacker[4];
unpacker[i++] = "unpack-objects";
diff --git a/builtin/send-pack.c b/builtin/send-pack.c
index f571917..4103116 100644
--- a/builtin/send-pack.c
+++ b/builtin/send-pack.c
@@ -49,9 +49,11 @@ static int pack_objects(int fd, struct ref *refs, struct extra_have_objects *ext
NULL,
NULL,
NULL,
+ NULL,
};
struct child_process po;
int i;
+ char buf[40];
i = 4;
if (args->use_thin_pack)
@@ -62,6 +64,11 @@ static int pack_objects(int fd, struct ref *refs, struct extra_have_objects *ext
argv[i++] = "-q";
if (args->progress)
argv[i++] = "--progress";
+ if (args->max_object_count > 0) {
+ snprintf(buf, sizeof(buf), "--max-object-count=%lu",
+ args->max_object_count);
+ argv[i++] = buf;
+ }
memset(&po, 0, sizeof(po));
po.argv = argv;
po.in = -1;
@@ -253,6 +260,7 @@ int send_pack(struct send_pack_args *args,
unsigned cmds_sent = 0;
int ret = 0;
struct async demux;
+ const char *p;
/* Does the other end support the reporting? */
if (server_supports("report-status"))
@@ -263,6 +271,8 @@ int send_pack(struct send_pack_args *args,
args->use_ofs_delta = 1;
if (server_supports("side-band-64k"))
use_sideband = 1;
+ if ((p = server_supports("limit-object-count=")))
+ args->max_object_count = strtoul(p + 19, NULL, 10);
if (!remote_refs) {
fprintf(stderr, "No refs in common and none specified; doing nothing.\n"
diff --git a/send-pack.h b/send-pack.h
index 05d7ab1..eaacb20 100644
--- a/send-pack.h
+++ b/send-pack.h
@@ -12,6 +12,7 @@ struct send_pack_args {
use_ofs_delta:1,
dry_run:1,
stateless_rpc:1;
+ unsigned long max_object_count;
};
int send_pack(struct send_pack_args *args,
diff --git a/t/t5400-send-pack.sh b/t/t5400-send-pack.sh
index 0eace37..f1f8f80 100755
--- a/t/t5400-send-pack.sh
+++ b/t/t5400-send-pack.sh
@@ -222,4 +222,70 @@ test_expect_success 'deny pushing to delete current branch' '
)
'
+echo "0000" > pkt-flush
+
+test_expect_success 'verify that limit-object-count capability is not advertised by default' '
+ rewound_push_setup &&
+ (
+ cd parent &&
+ test_might_fail git receive-pack . <../pkt-flush >output &&
+ test_must_fail grep -q "limit-object-count" output
+ )
+'
+
+test_expect_success 'verify that receive.objectCountLimit triggers limit-object-count capability' '
+ (
+ cd parent &&
+ git config receive.objectCountLimit 1 &&
+ test_might_fail git receive-pack . <../pkt-flush >output &&
+ grep -q "limit-object-count=1" output
+ )
+'
+
+test_expect_success 'deny pushing when receive.objectCountLimit is exceeded' '
+ (
+ cd child &&
+ git reset --hard origin/master &&
+ echo three > file && git commit -a -m three &&
+ test_must_fail git send-pack ../parent master 2>errs &&
+ grep -q "object count limit" errs
+ ) &&
+ parent_head=$(cd parent && git rev-parse --verify master) &&
+ child_head=$(cd child && git rev-parse --verify master) &&
+ test "$parent_head" != "$child_head"
+'
+
+test_expect_success 'repeated push failure proves that objects were not stored remotely' '
+ (
+ cd child &&
+ test_must_fail git send-pack ../parent master 2>errs &&
+ grep -q "object count limit" errs
+ ) &&
+ parent_head=$(cd parent && git rev-parse --verify master) &&
+ child_head=$(cd child && git rev-parse --verify master) &&
+ test "$parent_head" != "$child_head"
+'
+
+test_expect_success 'increase receive.objectCountLimit' '
+ (
+ cd parent &&
+ git config receive.objectCountLimit 10 &&
+ test_might_fail git receive-pack . <../pkt-flush >output &&
+ grep -q "limit-object-count=10" output
+ )
+'
+
+test_expect_success 'push is allowed when object limit is not exceeded' '
+ (
+ cd child &&
+ git send-pack ../parent master 2>errs &&
+ test_must_fail grep -q "object count limit" errs &&
+ # Also no error message from remote receive-pack
+ test_must_fail grep -q "receive\\.objectCountLimit" errs
+ ) &&
+ parent_head=$(cd parent && git rev-parse --verify master) &&
+ child_head=$(cd child && git rev-parse --verify master) &&
+ test "$parent_head" = "$child_head"
+'
+
test_done
--
1.7.5.rc1.3.g4d7b
* [PATCHv3 8/9] send-pack/receive-pack: Allow server to refuse pushing too large packs
2011-05-15 21:37 ` [PATCHv3 0/9] Push limits Johan Herland
` (6 preceding siblings ...)
2011-05-15 21:37 ` [PATCHv3 7/9] send-pack/receive-pack: Allow server to refuse pushes with too many objects Johan Herland
@ 2011-05-15 21:37 ` Johan Herland
2011-05-15 21:37 ` [PATCHv3 9/9] send-pack/receive-pack: Allow server to refuse pushes with too many commits Johan Herland
2011-05-15 21:52 ` [PATCHv3 0/9] Push limits Ævar Arnfjörð Bjarmason
9 siblings, 0 replies; 53+ messages in thread
From: Johan Herland @ 2011-05-15 21:37 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Shawn Pearce, Johan Herland, git
Add a new receive.packSizeLimit config variable which defines an upper
limit on the pack size to accept in a single push.
This limit is advertised to clients, using the new "limit-pack-size=<num>"
capability. The client side (i.e. send-pack) parses this capability and
forwards it to pack-objects, using the --max-pack-size option.
pack-objects then checks the generated pack against the limit and aborts
the pack transmission if the pack becomes too large.
However, older clients that do not understand the capability will not check
their pack against the limit, and will end up pushing the pack to the server.
Currently there is no extra check on the server to detect a push that exceeds
receive.packSizeLimit. However, such a check could be done in a pre-receive
or update hook.
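The hook alternative mentioned here could be built around a comparison like the following sketch; the function name and the config lookup noted in the comment are assumptions, not part of this series:

```shell
#!/bin/sh
# Hypothetical helper for a pre-receive hook: succeed (0) when the
# received pack's size is within the configured limit, fail (1)
# otherwise. A limit of 0 means "no limit".
pack_within_limit() {
	size=$1
	limit=$2
	if [ "$limit" -gt 0 ] && [ "$size" -gt "$limit" ]; then
		return 1
	fi
	return 0
}

# A real hook might obtain $size from the newest pack under
# $GIT_DIR/objects/pack and $limit from
# "git config --int receive.packSizeLimit".
```

Note the caveat from the commit message still applies: by the time a pre-receive hook runs, the pack has already been received and stored.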
Documentation and tests are included.
Signed-off-by: Johan Herland <johan@herland.net>
---
Documentation/config.txt | 9 +++
Documentation/technical/protocol-capabilities.txt | 1 +
builtin/receive-pack.c | 10 +++-
builtin/send-pack.c | 14 ++++-
send-pack.h | 1 +
t/t5400-send-pack.sh | 62 +++++++++++++++++++++
6 files changed, 93 insertions(+), 4 deletions(-)
diff --git a/Documentation/config.txt b/Documentation/config.txt
index fd6c130..acc1ef2 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -1599,6 +1599,15 @@ receive.unpackLimit::
especially on slow filesystems. If not set, the value of
`transfer.unpackLimit` is used instead.
+receive.packSizeLimit::
+ If the pack file transferred in a push exceeds this limit,
+ then the entire push will be refused. This is meant to prevent
+ an unintended large push (typically a result of the user not
+ being aware of exactly what is being pushed, e.g. pushing a
+ large rewritten history) from entering the repo. If not set,
+ there is no upper limit on the size of the pack transferred
+ in a single push.
+
receive.objectCountLimit::
If the number of objects received in a push exceeds this limit,
then the entire push will be refused. This is meant to prevent
diff --git a/Documentation/technical/protocol-capabilities.txt b/Documentation/technical/protocol-capabilities.txt
index aac66af..d3921a9 100644
--- a/Documentation/technical/protocol-capabilities.txt
+++ b/Documentation/technical/protocol-capabilities.txt
@@ -205,6 +205,7 @@ the server. If the check fails, the client must abort the upload, and
report the reason for the aborted push back to the user.
The following "limit-*" capabilites are recognized:
+ - limit-pack-size=<num> (Maximum size (in bytes) of uploaded pack)
- limit-object-count=<num> (Maximum number of objects in a pack)
More "limit-*" capabilities may be added in the future. The client
diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c
index d919e17..97c154b 100644
--- a/builtin/receive-pack.c
+++ b/builtin/receive-pack.c
@@ -28,7 +28,7 @@ static int receive_fsck_objects;
static int receive_unpack_limit = -1;
static int transfer_unpack_limit = -1;
static int unpack_limit = 100;
-static unsigned long limit_object_count;
+static unsigned long limit_pack_size, limit_object_count;
static int report_status;
static int use_sideband;
static int prefer_ofs_delta = 1;
@@ -74,6 +74,11 @@ static int receive_pack_config(const char *var, const char *value, void *cb)
return 0;
}
+ if (strcmp(var, "receive.packsizelimit") == 0) {
+ limit_pack_size = git_config_ulong(var, value);
+ return 0;
+ }
+
if (strcmp(var, "receive.objectcountlimit") == 0) {
limit_object_count = git_config_ulong(var, value);
return 0;
@@ -118,6 +123,9 @@ static const char *capabilities()
int ret = snprintf(buf, sizeof(buf),
" report-status delete-refs side-band-64k%s",
prefer_ofs_delta ? " ofs-delta" : "");
+ if (limit_pack_size > 0)
+ ret += snprintf(buf + ret, sizeof(buf) - ret,
+ " limit-pack-size=%lu", limit_pack_size);
if (limit_object_count > 0)
ret += snprintf(buf + ret, sizeof(buf) - ret,
" limit-object-count=%lu", limit_object_count);
diff --git a/builtin/send-pack.c b/builtin/send-pack.c
index 4103116..e7c82ce 100644
--- a/builtin/send-pack.c
+++ b/builtin/send-pack.c
@@ -50,10 +50,11 @@ static int pack_objects(int fd, struct ref *refs, struct extra_have_objects *ext
NULL,
NULL,
NULL,
+ NULL,
};
struct child_process po;
int i;
- char buf[40];
+ char buf1[40], buf2[40];
i = 4;
if (args->use_thin_pack)
@@ -64,10 +65,15 @@ static int pack_objects(int fd, struct ref *refs, struct extra_have_objects *ext
argv[i++] = "-q";
if (args->progress)
argv[i++] = "--progress";
+ if (args->max_pack_size > 0) {
+ snprintf(buf1, sizeof(buf1), "--max-pack-size=%lu",
+ args->max_pack_size);
+ argv[i++] = buf1;
+ }
if (args->max_object_count > 0) {
- snprintf(buf, sizeof(buf), "--max-object-count=%lu",
+ snprintf(buf2, sizeof(buf2), "--max-object-count=%lu",
args->max_object_count);
- argv[i++] = buf;
+ argv[i++] = buf2;
}
memset(&po, 0, sizeof(po));
po.argv = argv;
@@ -271,6 +277,8 @@ int send_pack(struct send_pack_args *args,
args->use_ofs_delta = 1;
if (server_supports("side-band-64k"))
use_sideband = 1;
+ if ((p = server_supports("limit-pack-size=")))
+ args->max_pack_size = strtoul(p + 16, NULL, 10);
if ((p = server_supports("limit-object-count=")))
args->max_object_count = strtoul(p + 19, NULL, 10);
diff --git a/send-pack.h b/send-pack.h
index eaacb20..1c6c202 100644
--- a/send-pack.h
+++ b/send-pack.h
@@ -12,6 +12,7 @@ struct send_pack_args {
use_ofs_delta:1,
dry_run:1,
stateless_rpc:1;
+ unsigned long max_pack_size;
unsigned long max_object_count;
};
diff --git a/t/t5400-send-pack.sh b/t/t5400-send-pack.sh
index f1f8f80..26fd1b4 100755
--- a/t/t5400-send-pack.sh
+++ b/t/t5400-send-pack.sh
@@ -288,4 +288,66 @@ test_expect_success 'push is allowed when object limit is not exceeded' '
test "$parent_head" = "$child_head"
'
+test_expect_success 'verify that limit-pack-size capability is not advertised by default' '
+ rewound_push_setup &&
+ (
+ cd parent &&
+ test_might_fail git receive-pack . <../pkt-flush >output &&
+ test_must_fail grep -q "limit-pack-size" output
+ )
+'
+
+test_expect_success 'verify that receive.packSizeLimit triggers limit-pack-size capability' '
+ (
+ cd parent &&
+ git config receive.packSizeLimit 10 &&
+ test_might_fail git receive-pack . <../pkt-flush >output &&
+ grep -q "limit-pack-size=10" output
+ )
+'
+
+test_expect_success 'deny pushing when receive.packSizeLimit is exceeded' '
+ (
+ cd child &&
+ git reset --hard origin/master &&
+ echo three > file && git commit -a -m three &&
+ test_must_fail git send-pack ../parent master 2>errs &&
+ grep -q "pack size limit" errs
+ ) &&
+ parent_head=$(cd parent && git rev-parse --verify master) &&
+ child_head=$(cd child && git rev-parse --verify master) &&
+ test "$parent_head" != "$child_head"
+'
+
+test_expect_success 'repeated push failure proves that objects were not stored remotely' '
+ (
+ cd child &&
+ test_must_fail git send-pack ../parent master 2>errs &&
+ grep -q "pack size limit" errs
+ ) &&
+ parent_head=$(cd parent && git rev-parse --verify master) &&
+ child_head=$(cd child && git rev-parse --verify master) &&
+ test "$parent_head" != "$child_head"
+'
+
+test_expect_success 'increase receive.packSizeLimit' '
+ (
+ cd parent &&
+ git config receive.packSizeLimit 1000000 &&
+ test_might_fail git receive-pack . <../pkt-flush >output &&
+ grep -q "limit-pack-size=1000000" output
+ )
+'
+
+test_expect_success 'push is allowed when pack size is not exceeded' '
+ (
+ cd child &&
+ git send-pack ../parent master 2>errs &&
+ test_must_fail grep -q "pack size limit" errs
+ ) &&
+ parent_head=$(cd parent && git rev-parse --verify master) &&
+ child_head=$(cd child && git rev-parse --verify master) &&
+ test "$parent_head" = "$child_head"
+'
+
test_done
--
1.7.5.rc1.3.g4d7b
* [PATCHv3 9/9] send-pack/receive-pack: Allow server to refuse pushes with too many commits
2011-05-15 21:37 ` [PATCHv3 0/9] Push limits Johan Herland
` (7 preceding siblings ...)
2011-05-15 21:37 ` [PATCHv3 8/9] send-pack/receive-pack: Allow server to refuse pushing too large packs Johan Herland
@ 2011-05-15 21:37 ` Johan Herland
2011-05-15 21:52 ` [PATCHv3 0/9] Push limits Ævar Arnfjörð Bjarmason
9 siblings, 0 replies; 53+ messages in thread
From: Johan Herland @ 2011-05-15 21:37 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Shawn Pearce, Johan Herland, git
Add a new receive.commitCountLimit config variable which defines an upper
limit on the number of commits to accept in a single push.
This limit is advertised to clients, using the new "limit-commit-count=<num>"
capability. The client side (i.e. send-pack) parses this capability and
forwards it to pack-objects, using the recently added --max-commit-count
option. pack-objects then checks the generated pack against the limit and
aborts pack generation if the pack would contain too many commits.
However, older clients that do not understand the capability will not check
their pack against the limit, and will end up pushing the pack to the server.
Currently there is no extra check on the server to detect a push that exceeds
receive.commitCountLimit. However, such a check could be done in a pre-receive
or update hook.
Documentation and tests are included.
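The client-side parsing in this series skips each capability prefix with a hard-coded offset (strtoul(p + 16, ...), strtoul(p + 19, ...)). A hedged sketch of an offset-free variant (the helper name is an assumption, and it still shares the loose strstr() matching discussed earlier in the thread):

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: parse the numeric value of a "name=<num>"
 * capability out of the advertised capability list, deriving the skip
 * from the prefix itself instead of a magic constant. Returns 0 when
 * the prefix is absent. */
static unsigned long cap_ulong(const char *caps, const char *prefix)
{
	const char *p = strstr(caps, prefix);

	if (!p)
		return 0;
	return strtoul(p + strlen(prefix), NULL, 10);
}
```

This avoids the risk of the offset silently going stale if a capability name is ever renamed.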
Signed-off-by: Johan Herland <johan@herland.net>
---
Documentation/config.txt | 9 +++
Documentation/technical/protocol-capabilities.txt | 1 +
builtin/receive-pack.c | 10 +++-
builtin/send-pack.c | 10 +++-
send-pack.h | 1 +
t/t5400-send-pack.sh | 63 +++++++++++++++++++++
6 files changed, 92 insertions(+), 2 deletions(-)
diff --git a/Documentation/config.txt b/Documentation/config.txt
index acc1ef2..c63a224 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -1617,6 +1617,15 @@ receive.objectCountLimit::
there is no upper limit on the number of objects transferred
in a single push.
+receive.commitCountLimit::
+ If the number of commits received in a push exceeds this limit,
+ then the entire push will be refused. This is meant to prevent
+ an unintended large push (typically a result of the user not
+ being aware of exactly what is being pushed, e.g. pushing a
+ large rewritten history) from entering the repo. If not set,
+ there is no upper limit on the number of commits transferred
+ in a single push.
+
receive.denyDeletes::
If set to true, git-receive-pack will deny a ref update that deletes
the ref. Use this to prevent such a ref deletion via a push.
diff --git a/Documentation/technical/protocol-capabilities.txt b/Documentation/technical/protocol-capabilities.txt
index d3921a9..c16240b 100644
--- a/Documentation/technical/protocol-capabilities.txt
+++ b/Documentation/technical/protocol-capabilities.txt
@@ -207,6 +207,7 @@ The following "limit-*" capabilites are recognized:
- limit-pack-size=<num> (Maximum size (in bytes) of uploaded pack)
- limit-object-count=<num> (Maximum number of objects in a pack)
+ - limit-commit-count=<num> (Maximum number of commits in a pack)
More "limit-*" capabilities may be added in the future. The client
is free to ignore any "limit-*" capabilities it does not understand.
diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c
index 97c154b..1d1170b 100644
--- a/builtin/receive-pack.c
+++ b/builtin/receive-pack.c
@@ -28,7 +28,7 @@ static int receive_fsck_objects;
static int receive_unpack_limit = -1;
static int transfer_unpack_limit = -1;
static int unpack_limit = 100;
-static unsigned long limit_pack_size, limit_object_count;
+static unsigned long limit_pack_size, limit_object_count, limit_commit_count;
static int report_status;
static int use_sideband;
static int prefer_ofs_delta = 1;
@@ -84,6 +84,11 @@ static int receive_pack_config(const char *var, const char *value, void *cb)
return 0;
}
+ if (strcmp(var, "receive.commitcountlimit") == 0) {
+ limit_commit_count = git_config_ulong(var, value);
+ return 0;
+ }
+
if (strcmp(var, "receive.fsckobjects") == 0) {
receive_fsck_objects = git_config_bool(var, value);
return 0;
@@ -129,6 +134,9 @@ static const char *capabilities()
if (limit_object_count > 0)
ret += snprintf(buf + ret, sizeof(buf) - ret,
" limit-object-count=%lu", limit_object_count);
+ if (limit_commit_count > 0)
+ ret += snprintf(buf + ret, sizeof(buf) - ret,
+ " limit-commit-count=%lu", limit_commit_count);
assert(ret < sizeof(buf));
return buf;
}
diff --git a/builtin/send-pack.c b/builtin/send-pack.c
index e7c82ce..0ef68ea 100644
--- a/builtin/send-pack.c
+++ b/builtin/send-pack.c
@@ -51,10 +51,11 @@ static int pack_objects(int fd, struct ref *refs, struct extra_have_objects *ext
NULL,
NULL,
NULL,
+ NULL,
};
struct child_process po;
int i;
- char buf1[40], buf2[40];
+ char buf1[40], buf2[40], buf3[40];
i = 4;
if (args->use_thin_pack)
@@ -75,6 +76,11 @@ static int pack_objects(int fd, struct ref *refs, struct extra_have_objects *ext
args->max_object_count);
argv[i++] = buf2;
}
+ if (args->max_commit_count > 0) {
+ snprintf(buf3, sizeof(buf3), "--max-commit-count=%lu",
+ args->max_commit_count);
+ argv[i++] = buf3;
+ }
memset(&po, 0, sizeof(po));
po.argv = argv;
po.in = -1;
@@ -281,6 +287,8 @@ int send_pack(struct send_pack_args *args,
args->max_pack_size = strtoul(p + 16, NULL, 10);
if ((p = server_supports("limit-object-count=")))
args->max_object_count = strtoul(p + 19, NULL, 10);
+ if ((p = server_supports("limit-commit-count=")))
+ args->max_commit_count = strtoul(p + 19, NULL, 10);
if (!remote_refs) {
fprintf(stderr, "No refs in common and none specified; doing nothing.\n"
diff --git a/send-pack.h b/send-pack.h
index 1c6c202..9d28cc5 100644
--- a/send-pack.h
+++ b/send-pack.h
@@ -14,6 +14,7 @@ struct send_pack_args {
stateless_rpc:1;
unsigned long max_pack_size;
unsigned long max_object_count;
+ unsigned long max_commit_count;
};
int send_pack(struct send_pack_args *args,
diff --git a/t/t5400-send-pack.sh b/t/t5400-send-pack.sh
index 26fd1b4..a71bc47 100755
--- a/t/t5400-send-pack.sh
+++ b/t/t5400-send-pack.sh
@@ -350,4 +350,67 @@ test_expect_success 'push is allowed when pack size is not exceeded' '
test "$parent_head" = "$child_head"
'
+test_expect_success 'verify that limit-commit-count capability is not advertised by default' '
+ rewound_push_setup &&
+ (
+ cd parent &&
+ test_might_fail git receive-pack . <../pkt-flush >output &&
+ test_must_fail grep -q "limit-commit-count" output
+ )
+'
+
+test_expect_success 'verify that receive.commitCountLimit triggers limit-commit-count capability' '
+ (
+ cd parent &&
+ git config receive.commitCountLimit 1 &&
+ test_might_fail git receive-pack . <../pkt-flush >output &&
+ grep -q "limit-commit-count=1" output
+ )
+'
+
+test_expect_success 'deny pushing when receive.commitCountLimit is exceeded' '
+ (
+ cd child &&
+ git reset --hard origin/master &&
+ echo three > file && git commit -a -m three &&
+ echo four > file && git commit -a -m four &&
+ test_must_fail git send-pack ../parent master 2>errs &&
+ grep -q "commit count limit" errs
+ ) &&
+ parent_head=$(cd parent && git rev-parse --verify master) &&
+ child_head=$(cd child && git rev-parse --verify master) &&
+ test "$parent_head" != "$child_head"
+'
+
+test_expect_success 'repeated push failure proves that objects were not stored remotely' '
+ (
+ cd child &&
+ test_must_fail git send-pack ../parent master 2>errs &&
+ grep -q "commit count limit" errs
+ ) &&
+ parent_head=$(cd parent && git rev-parse --verify master) &&
+ child_head=$(cd child && git rev-parse --verify master) &&
+ test "$parent_head" != "$child_head"
+'
+
+test_expect_success 'increase receive.commitCountLimit' '
+ (
+ cd parent &&
+ git config receive.commitCountLimit 2 &&
+ test_might_fail git receive-pack . <../pkt-flush >output &&
+ grep -q "limit-commit-count=2" output
+ )
+'
+
+test_expect_success 'push is allowed when commit limit is not exceeded' '
+ (
+ cd child &&
+ git send-pack ../parent master 2>errs &&
+ test_must_fail grep -q "commit count limit" errs
+ ) &&
+ parent_head=$(cd parent && git rev-parse --verify master) &&
+ child_head=$(cd child && git rev-parse --verify master) &&
+ test "$parent_head" = "$child_head"
+'
+
test_done
--
1.7.5.rc1.3.g4d7b
* Re: [PATCHv3 0/9] Push limits
2011-05-15 21:37 ` [PATCHv3 0/9] Push limits Johan Herland
` (8 preceding siblings ...)
2011-05-15 21:37 ` [PATCHv3 9/9] send-pack/receive-pack: Allow server to refuse pushes with too many commits Johan Herland
@ 2011-05-15 21:52 ` Ævar Arnfjörð Bjarmason
9 siblings, 0 replies; 53+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2011-05-15 21:52 UTC (permalink / raw)
To: Johan Herland; +Cc: Junio C Hamano, Shawn Pearce, git, Jeff King, Johannes Sixt
On Sun, May 15, 2011 at 23:37, Johan Herland <johan@herland.net> wrote:
> Here's the next iteration of what started out as a simple patch to let
> the server refuse pushes that exceeded a configurable limit on the
> #objects in a pack.
>
> The current patch series allows limiting pushes by
> - size of pack
> - #objects in pack
> - #commits in pack
FWIW I'd find this very useful. I recently spent a fair amount of time
cleaning up the mess created by a user pushing all the tags from
repository A to repository B (don't ask); the options you've implemented
here would have stopped that.
And as you point out, even if you refuse these sorts of things with a
hook (which I later implemented), that doesn't stop the server from
accepting the objects and keeping them around.