git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Clone fails on a repo with too many heads/tags
@ 2012-03-18  8:14 Ivan Todoroski
  2012-03-18 11:37 ` Ivan Todoroski
  2012-03-18 16:36 ` Jakub Narebski
  0 siblings, 2 replies; 54+ messages in thread
From: Ivan Todoroski @ 2012-03-18  8:14 UTC (permalink / raw)
  To: git

I recently tried cloning a fresh copy of a large repo (converted from CVS, 
nearly 10 years of history) and to my surprise "git clone" failed with the 
following message:

    error: cannot spawn git: No such file or directory

The problem is only reproduced using the Smart HTTP transport.

I used msysGit on Windows so my first instinct was to contact them, but after 
some poking around I discovered that the problem is present in the Linux 
version too, although harder to trigger.

Try executing this script:

-------------------------------
git init too-many-refs
cd too-many-refs
echo bla > bla.txt
git add .
git commit -m test
sha=$(git rev-parse HEAD)
for ((i=0; i<100000; i++)); do 
	echo $sha refs/tags/artificially-long-tag-name-to-more-easily-
demonstrate-the-problem-$i >> .git/packed-refs
done
-------------------------------

Now share this repo using the Smart HTTP transport (git-http-backend) and then 
try cloning it in a different directory. This is what you would get:

$ git clone http://localhost/.../too-many-refs/.git
Cloning into 'too-many-refs'...
fatal: cannot exec 'fetch-pack': Argument list too long

So we come to the real reason for the failure: somewhere inside Git a 
subcommand is invoked with all the tags/heads on the command line and if you 
have enough of them it overflows the command line length limit of the OS.

Obviously the number of tags in the "too-many-refs" repo above is absurd (100k) 
because the cmdline length in Linux is much more generous, but on Windows the 
clone fails with as little as 500 tags in the above loop! I am already hitting 
this problem with msysGit on real repos, not just artificial test cases.

I tracked down the problem to remote-curl.c:fetch_git(). That's where the 
"fetch-pack" command line is being constructed with all the refs on one line:

git fetch-pack --stateless-rpc --lock-pack ...<all the refs>...

The solution is conceptually simple: if the list of refs results in a too long 
command line, split the refs in batches and call fetch-pack multiple times such 
that each call is under the cmdline limit:

git fetch-pack --stateless-rpc --lock-pack ...<first batch of refs>...
git fetch-pack --stateless-rpc --lock-pack ...<second batch of refs>...
...
git fetch-pack --stateless-rpc --lock-pack ...<last batch of refs>...

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: Clone fails on a repo with too many heads/tags
  2012-03-18  8:14 Clone fails on a repo with too many heads/tags Ivan Todoroski
@ 2012-03-18 11:37 ` Ivan Todoroski
  2012-03-18 12:04   ` Nguyen Thai Ngoc Duy
  2012-03-18 16:36 ` Jakub Narebski
  1 sibling, 1 reply; 54+ messages in thread
From: Ivan Todoroski @ 2012-03-18 11:37 UTC (permalink / raw)
  To: git

Ivan Todoroski <grnch_lists <at> gmx.net> writes:
> Now share this repo using the Smart HTTP transport (git-http-backend) and 
then 
> try cloning it in a different directory. This is what you would get:
> 
> $ git clone http://localhost/.../too-many-refs/.git
> Cloning into 'too-many-refs'...
> fatal: cannot exec 'fetch-pack': Argument list too long
> 
> [...]
> 
> The solution is conceptually simple: if the list of refs results in a too 
long 
> command line, split the refs in batches and call fetch-pack multiple times 
such 
> that each call is under the cmdline limit:
> 
> git fetch-pack --stateless-rpc --lock-pack ...<first batch of refs>...
> git fetch-pack --stateless-rpc --lock-pack ...<second batch of refs>...
> ...
> git fetch-pack --stateless-rpc --lock-pack ...<last batch of refs>...


BTW, I didn't want to sound like I am expecting or demanding a fix. If the 
experienced Git devs lack the time or inclination to work on this bug 
(understandable), I am certainly willing to try it myself. My C skills are a 
bit rusty and I'm not very familiar with the Git codebase, but I will do my 
best to follow Documentation/SubmittingPatches as well as the existing code 
structure.

I will need a few pointers to get me started in the right direction though...


1) Is splitting the cmdline in batches and executing fetch-pack multiple times 
the right approach? If you have another solution please suggest.


2) Should I add the test case for this bug to existing scripts like t/t5551-
http-fetch.sh and t/t5561-http-backend.sh, or should I create a new test script 
under t/ following their example? There will probably be only one test case for 
this bug, basically the script I pasted in the original email to reproduce it.


3) What would be the most portable way to get the cmdline length limit between 
POSIX and Windows? Would something like this be acceptable:

#ifder _WIN32
	int cmdline_limit = 32767;
#else
	int cmdline_limit = sysconf(_SC_ARG_MAX);
#endif

I couldn't actually find a Windows API to get the cmdline limit, but this blog 
post by one of the Windows people tells the value:

http://blogs.msdn.com/b/oldnewthing/archive/2003/12/10/56028.aspx


4) Should this problem be fixed only in remote-curl.c:fetch_git() or should it 
be solved more generally in run-command.c:start_command(), which is used by 
fetch_git() for the actual invocation?

If this is fixed only in remote-curl:fetch_git(), then the same logic would 
need to be open coded in any other such place that might be found. Are you 
aware of any other internal sub-commands that put all refs on the command line 
and could be susceptible to the same issue?


If it's fixed at a lower level in run-command.c:start_command(), the logic 
would become available to any other sub-command that needs it.

However, this would mean that struct child_process as well as struct rpc_state 
would need an additional field that would tell whether the command is safe to 
execute in multiple batches and how many of the arguments at the beginning of 
child_process.argv must be preserved on every invocation (the switches and 
such).

Something like child_process.split_after, which if non-zero would mean that 
start_command() is free to invoke the command multiple times when argv exceeds 
the cmdline limit, by grouping any arguments after argv[split_after] in smaller 
batches for each invocation.

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: Clone fails on a repo with too many heads/tags
  2012-03-18 11:37 ` Ivan Todoroski
@ 2012-03-18 12:04   ` Nguyen Thai Ngoc Duy
  0 siblings, 0 replies; 54+ messages in thread
From: Nguyen Thai Ngoc Duy @ 2012-03-18 12:04 UTC (permalink / raw)
  To: Ivan Todoroski; +Cc: git, Shawn O. Pearce

I had a look but could not come up with a simple solution for this. CC
Shawn who knows smart-http really well.

On Sun, Mar 18, 2012 at 6:37 PM, Ivan Todoroski <grnch_lists@gmx.net> wrote:
> Ivan Todoroski <grnch_lists <at> gmx.net> writes:
>> Now share this repo using the Smart HTTP transport (git-http-backend) and
> then
>> try cloning it in a different directory. This is what you would get:
>>
>> $ git clone http://localhost/.../too-many-refs/.git
>> Cloning into 'too-many-refs'...
>> fatal: cannot exec 'fetch-pack': Argument list too long
>>
>> [...]
>>
>> The solution is conceptually simple: if the list of refs results in a too
> long
>> command line, split the refs in batches and call fetch-pack multiple times
> such
>> that each call is under the cmdline limit:
>>
>> git fetch-pack --stateless-rpc --lock-pack ...<first batch of refs>...
>> git fetch-pack --stateless-rpc --lock-pack ...<second batch of refs>...
>> ...
>> git fetch-pack --stateless-rpc --lock-pack ...<last batch of refs>...
>
>
> BTW, I didn't want to sound like I am expecting or demanding a fix. If the
> experienced Git devs lack the time or inclination to work on this bug
> (understandable), I am certainly willing to try it myself. My C skills are a
> bit rusty and I'm not very familiar with the Git codebase, but I will do my
> best to follow Documentation/SubmittingPatches as well as the existing code
> structure.
>
> I will need a few pointers to get me started in the right direction though...
>
>
> 1) Is splitting the cmdline in batches and executing fetch-pack multiple times
> the right approach? If you have another solution please suggest.
>
>
> 2) Should I add the test case for this bug to existing scripts like t/t5551-
> http-fetch.sh and t/t5561-http-backend.sh, or should I create a new test script
> under t/ following their example? There will probably be only one test case for
> this bug, basically the script I pasted in the original email to reproduce it.
>
>
> 3) What would be the most portable way to get the cmdline length limit between
> POSIX and Windows? Would something like this be acceptable:
>
> #ifder _WIN32
>        int cmdline_limit = 32767;
> #else
>        int cmdline_limit = sysconf(_SC_ARG_MAX);
> #endif
>
> I couldn't actually find a Windows API to get the cmdline limit, but this blog
> post by one of the Windows people tells the value:
>
> http://blogs.msdn.com/b/oldnewthing/archive/2003/12/10/56028.aspx
>
>
> 4) Should this problem be fixed only in remote-curl.c:fetch_git() or should it
> be solved more generally in run-command.c:start_command(), which is used by
> fetch_git() for the actual invocation?
>
> If this is fixed only in remote-curl:fetch_git(), then the same logic would
> need to be open coded in any other such place that might be found. Are you
> aware of any other internal sub-commands that put all refs on the command line
> and could be susceptible to the same issue?
>
>
> If it's fixed at a lower level in run-command.c:start_command(), the logic
> would become available to any other sub-command that needs it.
>
> However, this would mean that struct child_process as well as struct rpc_state
> would need an additional field that would tell whether the command is safe to
> execute in multiple batches and how many of the arguments at the beginning of
> child_process.argv must be preserved on every invocation (the switches and
> such).
>
> Something like child_process.split_after, which if non-zero would mean that
> start_command() is free to invoke the command multiple times when argv exceeds
> the cmdline limit, by grouping any arguments after argv[split_after] in smaller
> batches for each invocation.
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe git" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



-- 
Duy

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: Clone fails on a repo with too many heads/tags
  2012-03-18  8:14 Clone fails on a repo with too many heads/tags Ivan Todoroski
  2012-03-18 11:37 ` Ivan Todoroski
@ 2012-03-18 16:36 ` Jakub Narebski
  2012-03-18 19:07   ` Jeff King
  1 sibling, 1 reply; 54+ messages in thread
From: Jakub Narebski @ 2012-03-18 16:36 UTC (permalink / raw)
  To: Ivan Todoroski; +Cc: git

Ivan Todoroski <grnch_lists@gmx.net> writes:

> I recently tried cloning a fresh copy of a large repo (converted from CVS, 
> nearly 10 years of history) and to my surprise "git clone" failed with the 
> following message:
> 
>     error: cannot spawn git: No such file or directory
> 
> The problem is only reproduced using the Smart HTTP transport.

[...]
> I tracked down the problem to remote-curl.c:fetch_git(). That's where the 
> "fetch-pack" command line is being constructed with all the refs on one line:
> 
> git fetch-pack --stateless-rpc --lock-pack ...<all the refs>...
> 
> The solution is conceptually simple: if the list of refs results in a too long 
> command line, split the refs in batches and call fetch-pack multiple times such 
> that each call is under the cmdline limit:
> 
> git fetch-pack --stateless-rpc --lock-pack ...<first batch of refs>...
> git fetch-pack --stateless-rpc --lock-pack ...<second batch of refs>...
> ...
> git fetch-pack --stateless-rpc --lock-pack ...<last batch of refs>...

That, or implement --stdin / --revs in git-fetch-pach (perhaps
following git-pack-objects that implements --revs).

-- 
Jakub Narebski

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: Clone fails on a repo with too many heads/tags
  2012-03-18 16:36 ` Jakub Narebski
@ 2012-03-18 19:07   ` Jeff King
  2012-03-18 22:07     ` Jakub Narebski
                       ` (2 more replies)
  0 siblings, 3 replies; 54+ messages in thread
From: Jeff King @ 2012-03-18 19:07 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: Ivan Todoroski, git, Nguyen Thai Ngoc Duy, Shawn O. Pearce

On Sun, Mar 18, 2012 at 09:36:24AM -0700, Jakub Narebski wrote:

> > The solution is conceptually simple: if the list of refs results in a too long 
> > command line, split the refs in batches and call fetch-pack multiple times such 
> > that each call is under the cmdline limit:
> > 
> > git fetch-pack --stateless-rpc --lock-pack ...<first batch of refs>...
> > git fetch-pack --stateless-rpc --lock-pack ...<second batch of refs>...
> > ...
> > git fetch-pack --stateless-rpc --lock-pack ...<last batch of refs>...
> 
> That, or implement --stdin / --revs in git-fetch-pach (perhaps
> following git-pack-objects that implements --revs).

I don't think that will work, as stateless-rpc fetch-pack already uses
stdin to receive the list of advertised refs from the remote. Nor would
you want to have multiple invocations of fetch-pack, since that would
mean multiple http requests and multiple pack responses (which could not
delta between themselves).

And you can't condense the list in the general case. It is the set of
refs that we actually want to fetch. We could try passing just the
original refspecs (not the expansion) and letting fetch-pack try to do
the expansion, but in the worst case, you might really just have a
gigantic list of refs.

I think the only sane solution is to write the values to a temporary
file, and do something like:

  git fetch-pack --stateless-rpc --refs-from=$tmpfile

Even if you put the tmpfile in $GIT_DIR, I don't think this should run
afoul of any read-only repositories, since by definition you are
fetching into the repository (but you could also just put it in /tmp).

-Peff

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: Clone fails on a repo with too many heads/tags
  2012-03-18 19:07   ` Jeff King
@ 2012-03-18 22:07     ` Jakub Narebski
  2012-03-19  2:32       ` Jeff King
  2012-03-19  1:05     ` Ivan Todoroski
  2012-03-19  1:30     ` Nguyen Thai Ngoc Duy
  2 siblings, 1 reply; 54+ messages in thread
From: Jakub Narebski @ 2012-03-18 22:07 UTC (permalink / raw)
  To: Jeff King; +Cc: Ivan Todoroski, git, Nguyen Thai Ngoc Duy, Shawn O. Pearce

Jeff King wrote:
[...]
> I think the only sane solution is to write the values to a temporary
> file, and do something like:
> 
>   git fetch-pack --stateless-rpc --refs-from=$tmpfile
> 
> Even if you put the tmpfile in $GIT_DIR, I don't think this should run
> afoul of any read-only repositories, since by definition you are
> fetching into the repository (but you could also just put it in /tmp).

Or

   git fetch-pack --stateless-rpc --refs-fd=$n

and there would be no need for temporary file.

-- 
Jakub Narebski
Poland

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: Clone fails on a repo with too many heads/tags
  2012-03-18 19:07   ` Jeff King
  2012-03-18 22:07     ` Jakub Narebski
@ 2012-03-19  1:05     ` Ivan Todoroski
  2012-03-19  1:30     ` Nguyen Thai Ngoc Duy
  2 siblings, 0 replies; 54+ messages in thread
From: Ivan Todoroski @ 2012-03-19  1:05 UTC (permalink / raw)
  To: Jeff King; +Cc: Jakub Narebski, git, Nguyen Thai Ngoc Duy, Shawn O. Pearce

On 18.03.2012 20:07, Jeff King wrote:
> Nor would you want to have multiple invocations of fetch-pack, since
> that would mean multiple http requests and multiple pack responses
> (which could not delta between themselves).

Sure, but if fetch-pack is called with so many refs that it overflows 
the command line, it's looking at a lot of work ahead of it anyway, it's 
not going to be a fast operation. So the overhead of multiple HTTP 
requests and lack of delta compression between huge batches of refs 
wouldn't be all that significant in practice.

That said, I really like your temp file idea below. It's the best 
solution so far. Simple, efficient, non-intrusive.


> I think the only sane solution is to write the values to a temporary 
> file, and do something like:
> 
> git fetch-pack --stateless-rpc --refs-from=$tmpfile
> 
> Even if you put the tmpfile in $GIT_DIR, I don't think this should
> run afoul of any read-only repositories, since by definition you are 
> fetching into the repository (but you could also just put it in
> /tmp).

OK, if nobody beats me to it I will try to code this next weekend. Maybe 
sooner, but I don't usually have much free time over the week (day job, 
family, etc.).

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: Clone fails on a repo with too many heads/tags
  2012-03-18 19:07   ` Jeff King
  2012-03-18 22:07     ` Jakub Narebski
  2012-03-19  1:05     ` Ivan Todoroski
@ 2012-03-19  1:30     ` Nguyen Thai Ngoc Duy
  2012-03-19  2:44       ` Jeff King
  2 siblings, 1 reply; 54+ messages in thread
From: Nguyen Thai Ngoc Duy @ 2012-03-19  1:30 UTC (permalink / raw)
  To: Jeff King; +Cc: Jakub Narebski, Ivan Todoroski, git, Shawn O. Pearce

On Mon, Mar 19, 2012 at 2:07 AM, Jeff King <peff@peff.net> wrote:
> I don't think that will work, as stateless-rpc fetch-pack already uses
> stdin to receive the list of advertised refs from the remote. Nor would
> you want to have multiple invocations of fetch-pack, since that would
> mean multiple http requests and multiple pack responses (which could not
> delta between themselves).

remote-curl functions as middle man between http client and
fetch-pack. Can we just send ref list over fetch-pack.stdin first,
followed by maybe empty line, then normal stuff that remote-curl feeds
fetch-pack?
-- 
Duy

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: Clone fails on a repo with too many heads/tags
  2012-03-18 22:07     ` Jakub Narebski
@ 2012-03-19  2:32       ` Jeff King
  2012-03-19  2:43         ` Nguyen Thai Ngoc Duy
  0 siblings, 1 reply; 54+ messages in thread
From: Jeff King @ 2012-03-19  2:32 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: Ivan Todoroski, git, Nguyen Thai Ngoc Duy, Shawn O. Pearce

On Sun, Mar 18, 2012 at 11:07:42PM +0100, Jakub Narebski wrote:

> > I think the only sane solution is to write the values to a temporary
> > file, and do something like:
> > 
> >   git fetch-pack --stateless-rpc --refs-from=$tmpfile
> > 
> > Even if you put the tmpfile in $GIT_DIR, I don't think this should run
> > afoul of any read-only repositories, since by definition you are
> > fetching into the repository (but you could also just put it in /tmp).
> 
>    git fetch-pack --stateless-rpc --refs-fd=$n
> 
> and there would be no need for temporary file.

Yeah. That is a slightly more awkward interface for command-line users,
but this is really meant to be an internal-to-git thing (just like
stateless-rpc is in the first place). And avoiding a tempfile is a good
thing.

However, I think you would need to teach the run-command API how to open
extra pipes to a child beyond stdout/stdin/stderr.

-Peff

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: Clone fails on a repo with too many heads/tags
  2012-03-19  2:32       ` Jeff King
@ 2012-03-19  2:43         ` Nguyen Thai Ngoc Duy
  2012-03-19  2:45           ` Jeff King
  0 siblings, 1 reply; 54+ messages in thread
From: Nguyen Thai Ngoc Duy @ 2012-03-19  2:43 UTC (permalink / raw)
  To: Jeff King
  Cc: Jakub Narebski, Ivan Todoroski, git, Shawn O. Pearce,
	Johannes Sixt

On Mon, Mar 19, 2012 at 9:32 AM, Jeff King <peff@peff.net> wrote:
> However, I think you would need to teach the run-command API how to open
> extra pipes to a child beyond stdout/stdin/stderr.

That might be a problem on Windows. I think Windows exec* has special
support for stdin/stdout/stderr and only those.
-- 
Duy

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: Clone fails on a repo with too many heads/tags
  2012-03-19  1:30     ` Nguyen Thai Ngoc Duy
@ 2012-03-19  2:44       ` Jeff King
  2012-03-21 11:05         ` Ivan Todoroski
  0 siblings, 1 reply; 54+ messages in thread
From: Jeff King @ 2012-03-19  2:44 UTC (permalink / raw)
  To: Nguyen Thai Ngoc Duy; +Cc: Jakub Narebski, Ivan Todoroski, git, Shawn O. Pearce

On Mon, Mar 19, 2012 at 08:30:38AM +0700, Nguyen Thai Ngoc Duy wrote:

> On Mon, Mar 19, 2012 at 2:07 AM, Jeff King <peff@peff.net> wrote:
> > I don't think that will work, as stateless-rpc fetch-pack already uses
> > stdin to receive the list of advertised refs from the remote. Nor would
> > you want to have multiple invocations of fetch-pack, since that would
> > mean multiple http requests and multiple pack responses (which could not
> > delta between themselves).
> 
> remote-curl functions as middle man between http client and
> fetch-pack. Can we just send ref list over fetch-pack.stdin first,
> followed by maybe empty line, then normal stuff that remote-curl feeds
> fetch-pack?

Yes, I think that would work. You'd just have to take care to pass the
residual buffer (i.e., what is left in your input buffer after you
notice that reading the list of wanted refs is finished) along to
the git-protocol code.  So it would require a little refactoring of
get_remote_heads, I think.

-Peff

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: Clone fails on a repo with too many heads/tags
  2012-03-19  2:43         ` Nguyen Thai Ngoc Duy
@ 2012-03-19  2:45           ` Jeff King
  0 siblings, 0 replies; 54+ messages in thread
From: Jeff King @ 2012-03-19  2:45 UTC (permalink / raw)
  To: Nguyen Thai Ngoc Duy
  Cc: Jakub Narebski, Ivan Todoroski, git, Shawn O. Pearce,
	Johannes Sixt

On Mon, Mar 19, 2012 at 09:43:05AM +0700, Nguyen Thai Ngoc Duy wrote:

> On Mon, Mar 19, 2012 at 9:32 AM, Jeff King <peff@peff.net> wrote:
> > However, I think you would need to teach the run-command API how to open
> > extra pipes to a child beyond stdout/stdin/stderr.
> 
> That might be a problem on Windows. I think Windows exec* has special
> support for stdin/stdout/stderr and only those.

Then your sneak-it-across-stdin trick is probably the best bet. It's a
little ugly, but this is for stateless-rpc, so it's not like anything
outside of git is going to actually see this.

-Peff

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: Clone fails on a repo with too many heads/tags
  2012-03-19  2:44       ` Jeff King
@ 2012-03-21 11:05         ` Ivan Todoroski
  2012-03-21 14:28           ` Shawn Pearce
  0 siblings, 1 reply; 54+ messages in thread
From: Ivan Todoroski @ 2012-03-21 11:05 UTC (permalink / raw)
  To: Jeff King; +Cc: Nguyen Thai Ngoc Duy, Jakub Narebski, git, Shawn O. Pearce

On 19.03.2012 03:44, Jeff King wrote:
> On Mon, Mar 19, 2012 at 08:30:38AM +0700, Nguyen Thai Ngoc Duy wrote:
> 
>> On Mon, Mar 19, 2012 at 2:07 AM, Jeff King <peff@peff.net> wrote:
>>> I don't think that will work, as stateless-rpc fetch-pack already uses
>>> stdin to receive the list of advertised refs from the remote. Nor would
>>> you want to have multiple invocations of fetch-pack, since that would
>>> mean multiple http requests and multiple pack responses (which could not
>>> delta between themselves).
>> remote-curl functions as middle man between http client and
>> fetch-pack. Can we just send ref list over fetch-pack.stdin first,
>> followed by maybe empty line, then normal stuff that remote-curl feeds
>> fetch-pack?
> 
> Yes, I think that would work. You'd just have to take care to pass the
> residual buffer (i.e., what is left in your input buffer after you
> notice that reading the list of wanted refs is finished) along to
> the git-protocol code.  So it would require a little refactoring of
> get_remote_heads, I think.

Would it be OK for fetch-pack.c to use the packetized format 
(pkt-line.h) for reading the list of refs from stdin?

This way we can read() the exact number of bytes needed for the refs 
from the fd and there would be no need to refactor get_remote_heads() to 
pass a residual buffer, it could just continue reading straight from the 
same fd.

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: Clone fails on a repo with too many heads/tags
  2012-03-21 11:05         ` Ivan Todoroski
@ 2012-03-21 14:28           ` Shawn Pearce
  2012-03-21 17:14             ` Jeff King
  0 siblings, 1 reply; 54+ messages in thread
From: Shawn Pearce @ 2012-03-21 14:28 UTC (permalink / raw)
  To: Ivan Todoroski; +Cc: Jeff King, Nguyen Thai Ngoc Duy, Jakub Narebski, git

On Wed, Mar 21, 2012 at 04:05, Ivan Todoroski <grnch_lists@gmx.net> wrote:
> On 19.03.2012 03:44, Jeff King wrote:
>> Yes, I think that would work. You'd just have to take care to pass the
>> residual buffer (i.e., what is left in your input buffer after you
>> notice that reading the list of wanted refs is finished) along to
>> the git-protocol code.  So it would require a little refactoring of
>> get_remote_heads, I think.
>
>
> Would it be OK for fetch-pack.c to use the packetized format (pkt-line.h)
> for reading the list of refs from stdin?

This is probably the easiest way to implement the sneak-into-stdin
patch. Use a pkt-line for each argument that should have been in the
argv array from the command line, and a flush pkt to terminate the
list.

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: Clone fails on a repo with too many heads/tags
  2012-03-21 14:28           ` Shawn Pearce
@ 2012-03-21 17:14             ` Jeff King
  2012-03-21 17:59               ` Jakub Narebski
                                 ` (2 more replies)
  0 siblings, 3 replies; 54+ messages in thread
From: Jeff King @ 2012-03-21 17:14 UTC (permalink / raw)
  To: Shawn Pearce; +Cc: Ivan Todoroski, Nguyen Thai Ngoc Duy, Jakub Narebski, git

On Wed, Mar 21, 2012 at 07:28:24AM -0700, Shawn O. Pearce wrote:

> > Would it be OK for fetch-pack.c to use the packetized format (pkt-line.h)
> > for reading the list of refs from stdin?
> 
> This is probably the easiest way to implement the sneak-into-stdin
> patch. Use a pkt-line for each argument that should have been in the
> argv array from the command line, and a flush pkt to terminate the
> list.

Something in me feels slightly uncomfortable with that, just because
simple newline-delimited formats make it easy for people to hack on the
tool and feed input from unexpected sources.

But stateless-rpc is already a pretty deep internal interface anyway,
and the format is already weird (the second half is already packetized
input from a remote anyway). So it's probably not worth caring about
hackability here.

-Peff

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: Clone fails on a repo with too many heads/tags
  2012-03-21 17:14             ` Jeff King
@ 2012-03-21 17:59               ` Jakub Narebski
  2012-03-21 20:02               ` Ivan Todoroski
  2012-03-27  6:23               ` [PATCH/RFC v2 0/4] Fix fetch-pack command line overflow during clone Ivan Todoroski
  2 siblings, 0 replies; 54+ messages in thread
From: Jakub Narebski @ 2012-03-21 17:59 UTC (permalink / raw)
  To: Jeff King; +Cc: Shawn Pearce, Ivan Todoroski, Nguyen Thai Ngoc Duy, git

Jeff King wrote:
> On Wed, Mar 21, 2012 at 07:28:24AM -0700, Shawn O. Pearce wrote:
> 
> > > Would it be OK for fetch-pack.c to use the packetized format (pkt-line.h)
> > > for reading the list of refs from stdin?
> > 
> > This is probably the easiest way to implement the sneak-into-stdin
> > patch. Use a pkt-line for each argument that should have been in the
> > argv array from the command line, and a flush pkt to terminate the
> > list.
> 
> Something in me feels slightly uncomfortable with that, just because
> simple newline-delimited formats make it easy for people to hack on the
> tool and feed input from unexpected sources.

Well, give people "pkt-line" command line tool which would transform
ordinary textual output on input into pkt-line (which is almost pure
text anyway) output and vice versa.

> But stateless-rpc is already a pretty deep internal interface anyway,
> and the format is already weird (the second half is already packetized
> input from a remote anyway). So it's probably not worth caring about
> hackability here.

Right.

-- 
Jakub Narebski
Poland

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: Clone fails on a repo with too many heads/tags
  2012-03-21 17:14             ` Jeff King
  2012-03-21 17:59               ` Jakub Narebski
@ 2012-03-21 20:02               ` Ivan Todoroski
  2012-03-21 20:17                 ` Jeff King
  2012-03-27  6:23               ` [PATCH/RFC v2 0/4] Fix fetch-pack command line overflow during clone Ivan Todoroski
  2 siblings, 1 reply; 54+ messages in thread
From: Ivan Todoroski @ 2012-03-21 20:02 UTC (permalink / raw)
  To: Jeff King; +Cc: Shawn Pearce, Nguyen Thai Ngoc Duy, Jakub Narebski, git

On 21.03.2012 18:14, Jeff King wrote:
> On Wed, Mar 21, 2012 at 07:28:24AM -0700, Shawn O. Pearce wrote:
> 
>>> Would it be OK for fetch-pack.c to use the packetized format (pkt-line.h)
>>> for reading the list of refs from stdin?
>> This is probably the easiest way to implement the sneak-into-stdin
>> patch. Use a pkt-line for each argument that should have been in the
>> argv array from the command line, and a flush pkt to terminate the
>> list.
> 
> Something in me feels slightly uncomfortable with that, just because
> simple newline-delimited formats make it easy for people to hack on the
> tool and feed input from unexpected sources.

I understand what you mean. How about this:

If both --stdin and --stateless-rpc are specified to fetch-pack, it will 
use pkt-line to read the refs from stdin before handing off stdin to 
get_remote_heads().

However, if only --stdin is specified, it will read refs from stdin in a 
script-friendly newline delimited format, one ref per line. This is okay 
because when --stateless-rpc is not specified get_remote_heads() reads 
from an fd different from stdin so there is no issue with residual 
buffers in this case.

This way you preserve scriptability for any other callers who don't use 
--stateless-rpc.

How does this sound?

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: Clone fails on a repo with too many heads/tags
  2012-03-21 20:02               ` Ivan Todoroski
@ 2012-03-21 20:17                 ` Jeff King
  2012-03-24 20:49                   ` Ivan Todoroski
                                     ` (2 more replies)
  0 siblings, 3 replies; 54+ messages in thread
From: Jeff King @ 2012-03-21 20:17 UTC (permalink / raw)
  To: Ivan Todoroski; +Cc: Shawn Pearce, Nguyen Thai Ngoc Duy, Jakub Narebski, git

On Wed, Mar 21, 2012 at 09:02:13PM +0100, Ivan Todoroski wrote:

> >Something in me feels slightly uncomfortable with that, just because
> >simple newline-delimited formats make it easy for people to hack on the
> >tool and feed input from unexpected sources.
> 
> I understand what you mean. How about this:
> 
> If both --stdin and --stateless-rpc are specified to fetch-pack, it
> will use pkt-line to read the refs from stdin before handing off
> stdin to get_remote_heads().
> 
> However, if only --stdin is specified, it will read refs from stdin
> in a script-friendly newline delimited format, one ref per line. This
> is okay because when --stateless-rpc is not specified
> get_remote_heads() reads from an fd different from stdin so there is
> no issue with residual buffers in this case.
> 
> This way you preserve scriptability for any other callers who don't
> use --stateless-rpc.
> 
> How does this sound?

I think that sounds quite reasonable, and shouldn't be more than a few
extra lines to implement.

-Peff

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: Clone fails on a repo with too many heads/tags
  2012-03-21 20:17                 ` Jeff King
@ 2012-03-24 20:49                   ` Ivan Todoroski
  2012-03-25  1:06                     ` Jeff King
  2012-03-24 20:53                   ` [PATCH/RFC 1/2] fetch-pack: new option to read refs from stdin Ivan Todoroski
  2012-03-24 20:54                   ` [PATCH/RFC 2/2] remote-curl: send the refs to fetch-pack on stdin Ivan Todoroski
  2 siblings, 1 reply; 54+ messages in thread
From: Ivan Todoroski @ 2012-03-24 20:49 UTC (permalink / raw)
  To: Jeff King; +Cc: Shawn Pearce, Nguyen Thai Ngoc Duy, Jakub Narebski, git

[-- Attachment #1: Type: text/plain, Size: 2474 bytes --]

On 21.03.2012 21:17, Jeff King wrote:
> On Wed, Mar 21, 2012 at 09:02:13PM +0100, Ivan Todoroski wrote:
> 
>>> Something in me feels slightly uncomfortable with that, just because
>>> simple newline-delimited formats make it easy for people to hack on the
>>> tool and feed input from unexpected sources.
>> I understand what you mean. How about this:
>>
>> If both --stdin and --stateless-rpc are specified to fetch-pack, it
>> will use pkt-line to read the refs from stdin before handing off
>> stdin to get_remote_heads().
>>
>> However, if only --stdin is specified, it will read refs from stdin
>> in a script-friendly newline delimited format, one ref per line. This
>> is okay because when --stateless-rpc is not specified
>> get_remote_heads() reads from an fd different from stdin so there is
>> no issue with residual buffers in this case.
>>
>> This way you preserve scriptability for any other callers who don't
>> use --stateless-rpc.
>>
>> How does this sound?
> 
> I think that sounds quite reasonable, and shouldn't be more than a few
> extra lines to implement.
> 
> -Peff


I wrote the code for this and went to write the test cases, but the test 
suite is not cooperating. :(

I'll send just the code changes for comments while I'm figuring out the 
test suite. I'll include the test cases along with better commit 
messages in the next version of the patches.

So, back to the test suite problem. To get familiar with it I first ran 
the test suite against the vanilla "maint" branch without any of my 
changes, just to see what I should expect on properly running code.

Unfortunately it failed, and to make matters worse it failed exactly on 
the parts dealing with smart HTTP, which is what I need to test in the 
first place. Talk about Murphy's law...

Is it failing for anyone else on the vanilla "maint" branch? I would 
appreciate any help I could get here.

My machine is CentOS 5.8 with latest updates and I have the httpd 
package installed. Here is the command that is failing (you can find the 
full output in the attachment):


$ GIT_TEST_HTTPD=yes ./t5551-http-fetch.sh -i -v
[... skip successful tests ...]
ok 5 - used upload-pack service

expecting success:
         git clone $HTTPD_URL/smart-redir-perm/repo.git --quiet repo-p

error: RPC failed; result=22, HTTP code = 405
fatal: The remote end hung up unexpectedly
not ok - 6 follow redirects (301)
#
#               git clone $HTTPD_URL/smart-redir-perm/repo.git --quiet 
repo-p
#


[-- Attachment #2: http-fetch.log --]
[-- Type: text/plain, Size: 2412 bytes --]

Initialized empty Git repository in /home/itodoroski/git/t/trash directory.t5551-http-fetch/.git/
expecting success: 
	echo content >file &&
	git add file &&
	git commit -m one

[master (root-commit) ba36540] one
 Author: A U Thor <author@example.com>
 1 file changed, 1 insertion(+)
 create mode 100644 file
ok 1 - setup repository

expecting success: 
	mkdir "$HTTPD_DOCUMENT_ROOT_PATH/repo.git" &&
	(cd "$HTTPD_DOCUMENT_ROOT_PATH/repo.git" &&
	 git --bare init
	) &&
	git remote add public "$HTTPD_DOCUMENT_ROOT_PATH/repo.git" &&
	git push public master:master

Initialized empty Git repository in /home/itodoroski/git/t/trash directory.t5551-http-fetch/httpd/www/repo.git/
To /home/itodoroski/git/t/trash directory.t5551-http-fetch/httpd/www/repo.git
 * [new branch]      master -> master
ok 2 - create http-accessible bare repository

expecting success: 
	GIT_CURL_VERBOSE=1 git clone --quiet $HTTPD_URL/smart/repo.git clone 2>err &&
	test_cmp file clone/file &&
	tr '\015' Q <err |
	sed -e "
		s/Q\$//
		/^[*] /d
		/^$/d
		/^< $/d

		/^[^><]/{
			s/^/> /
		}

		/^> User-Agent: /d
		/^> Host: /d
		/^> POST /,$ {
			/^> Accept: [*]\\/[*]/d
		}
		s/^> Content-Length: .*/> Content-Length: xxx/
		/^> 00..want /d
		/^> 00.*done/d

		/^< Server: /d
		/^< Expires: /d
		/^< Date: /d
		/^< Content-Length: /d
		/^< Transfer-Encoding: /d
	" >act &&
	test_cmp exp act

ok 3 - clone http repository

expecting success: 
	echo content >>file &&
	git commit -a -m two &&
	git push public
	(cd clone && git pull) &&
	test_cmp file clone/file

[master ace4881] two
 Author: A U Thor <author@example.com>
 1 file changed, 1 insertion(+)
To /home/itodoroski/git/t/trash directory.t5551-http-fetch/httpd/www/repo.git
   ba36540..ace4881  master -> master
From http://127.0.0.1:5551/smart/repo
   ba36540..ace4881  master     -> origin/master
Updating ba36540..ace4881
Fast-forward
 file |    1 +
 1 file changed, 1 insertion(+)
ok 4 - fetch changes via http

expecting success: 
	sed -e "
		s/^.* \"//
		s/\"//
		s/ [1-9][0-9]*\$//
		s/^GET /GET  /
	" >act <"$HTTPD_ROOT_PATH"/access.log &&
	test_cmp exp act

ok 5 - used upload-pack service

expecting success: 
	git clone $HTTPD_URL/smart-redir-perm/repo.git --quiet repo-p

error: RPC failed; result=22, HTTP code = 405
fatal: The remote end hung up unexpectedly
not ok - 6 follow redirects (301)
#	
#		git clone $HTTPD_URL/smart-redir-perm/repo.git --quiet repo-p
#	

^ permalink raw reply	[flat|nested] 54+ messages in thread

* [PATCH/RFC 1/2] fetch-pack: new option to read refs from stdin
  2012-03-21 20:17                 ` Jeff King
  2012-03-24 20:49                   ` Ivan Todoroski
@ 2012-03-24 20:53                   ` Ivan Todoroski
  2012-03-25  1:19                     ` Jeff King
  2012-03-24 20:54                   ` [PATCH/RFC 2/2] remote-curl: send the refs to fetch-pack on stdin Ivan Todoroski
  2 siblings, 1 reply; 54+ messages in thread
From: Ivan Todoroski @ 2012-03-24 20:53 UTC (permalink / raw)
  To: Jeff King; +Cc: Shawn Pearce, Nguyen Thai Ngoc Duy, Jakub Narebski, git


>From c4bb55f9f27569faa368d823ca6fe4b236e37cd6 Mon Sep 17 00:00:00 2001
From: Ivan Todoroski <grnch@gmx.net>
Date: Sat, 24 Mar 2012 15:13:05 +0100
Subject: [PATCH/RFC 1/2] fetch-pack: new option to read refs from stdin

---
 Documentation/git-fetch-pack.txt |    9 ++++++++
 builtin/fetch-pack.c             |   44 ++++++++++++++++++++++++++++++++++++--
 fetch-pack.h                     |    3 ++-
 3 files changed, 53 insertions(+), 3 deletions(-)

diff --git a/Documentation/git-fetch-pack.txt b/Documentation/git-fetch-pack.txt
index ed1bdaacd1..7cd7c785bc 100644
--- a/Documentation/git-fetch-pack.txt
+++ b/Documentation/git-fetch-pack.txt
@@ -32,6 +32,15 @@ OPTIONS
 --all::
 	Fetch all remote refs.
 
+--stdin::
+	Take the list of refs from stdin, one per line. If there
+	are refs specified on the command line in addition to this
+	option, then the refs from stdin are processed after those
+	on the command line.
+	If '--stateless-rpc' is specified together with this option
+	then the list of refs must be in packet format (pkt-line)
+	with a flush packet terminating the list.
+
 -q::
 --quiet::
 	Pass '-q' flag to 'git unpack-objects'; this makes the
diff --git a/builtin/fetch-pack.c b/builtin/fetch-pack.c
index a4d3e90a86..3c4c193e45 100644
--- a/builtin/fetch-pack.c
+++ b/builtin/fetch-pack.c
@@ -23,7 +23,7 @@ static struct fetch_pack_args args = {
 };
 
 static const char fetch_pack_usage[] =
-"git fetch-pack [--all] [--quiet|-q] [--keep|-k] [--thin] [--include-tag] [--upload-pack=<git-upload-pack>] [--depth=<n>] [--no-progress] [-v] [<host>:]<directory> [<refs>...]";
+"git fetch-pack [--all] [--stdin] [--quiet|-q] [--keep|-k] [--thin] [--include-tag] [--upload-pack=<git-upload-pack>] [--depth=<n>] [--no-progress] [-v] [<host>:]<directory> [<refs>...]";
 
 #define COMPLETE	(1U << 0)
 #define COMMON		(1U << 1)
@@ -896,7 +896,7 @@ static void fetch_pack_setup(void)
 
 int cmd_fetch_pack(int argc, const char **argv, const char *prefix)
 {
-	int i, ret, nr_heads;
+	int i, ret, nr_heads, alloc_heads;
 	struct ref *ref = NULL;
 	char *dest = NULL, **heads;
 	int fd[2];
@@ -941,6 +941,10 @@ int cmd_fetch_pack(int argc, const char **argv, const char *prefix)
 				args.fetch_all = 1;
 				continue;
 			}
+			if (!strcmp("--stdin", arg)) {
+				args.refs_from_stdin = 1;
+				continue;
+			}
 			if (!strcmp("-v", arg)) {
 				args.verbose = 1;
 				continue;
@@ -972,6 +976,42 @@ int cmd_fetch_pack(int argc, const char **argv, const char *prefix)
 	if (!dest)
 		usage(fetch_pack_usage);
 
+	if (args.refs_from_stdin) {
+		char ref[1000];
+		/* copy refs from cmdline to new growable list */
+		int size = nr_heads * sizeof(*heads);
+		heads = memcpy(xmalloc(size), heads, size);
+		alloc_heads = nr_heads;
+		/* append refs from stdin to ones from cmdline */
+		for (;;) {
+			if (args.stateless_rpc) {
+				/* read using pkt-line in stateless RPC mode,
+				   flush packet signifies end of refs */
+				int n = packet_read_line(0, ref, sizeof(ref));
+				if (!n)
+					break;
+				if (ref[n-1] == '\n')
+					ref[n-1] = '\0';
+			}
+			else {
+				int n;
+				if (!fgets(ref, sizeof(ref), stdin)) {
+					if (ferror(stdin))
+						die_errno("cannot read ref");
+					if (feof(stdin))
+						break;
+				}
+				n = strlen(ref);
+				if (ref[n-1] == '\n')
+					ref[n-1] = '\0';
+				else if (!feof(stdin))
+					die("ref too long on stdin");
+			}
+			ALLOC_GROW(heads, nr_heads + 1, alloc_heads);
+			heads[nr_heads++] = xstrdup(ref);
+		}
+	}
+
 	if (args.stateless_rpc) {
 		conn = NULL;
 		fd[0] = 0;
diff --git a/fetch-pack.h b/fetch-pack.h
index 0608edae3f..292d69389e 100644
--- a/fetch-pack.h
+++ b/fetch-pack.h
@@ -13,7 +13,8 @@ struct fetch_pack_args {
 		verbose:1,
 		no_progress:1,
 		include_tag:1,
-		stateless_rpc:1;
+		stateless_rpc:1,
+		refs_from_stdin:1;
 };
 
 struct ref *fetch_pack(struct fetch_pack_args *args,
-- 
1.7.9.4.16.gd24fa.dirty

^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH/RFC 2/2] remote-curl: send the refs to fetch-pack on stdin
  2012-03-21 20:17                 ` Jeff King
  2012-03-24 20:49                   ` Ivan Todoroski
  2012-03-24 20:53                   ` [PATCH/RFC 1/2] fetch-pack: new option to read refs from stdin Ivan Todoroski
@ 2012-03-24 20:54                   ` Ivan Todoroski
  2012-03-25  1:24                     ` Jeff King
  2 siblings, 1 reply; 54+ messages in thread
From: Ivan Todoroski @ 2012-03-24 20:54 UTC (permalink / raw)
  To: Jeff King; +Cc: Shawn Pearce, Nguyen Thai Ngoc Duy, Jakub Narebski, git


>From 723a561946824ee367f57f0d9b9d336a6bc28d13 Mon Sep 17 00:00:00 2001
From: Ivan Todoroski <grnch@gmx.net>
Date: Sat, 24 Mar 2012 15:53:36 +0100
Subject: [PATCH/RFC 2/2] remote-curl: send the refs to fetch-pack on stdin

---
 remote-curl.c |   18 +++++++++++++-----
 1 file changed, 13 insertions(+), 5 deletions(-)

diff --git a/remote-curl.c b/remote-curl.c
index d159fe7f34..9fdfca9f32 100644
--- a/remote-curl.c
+++ b/remote-curl.c
@@ -531,12 +531,13 @@ static int post_rpc(struct rpc_state *rpc)
 	return err;
 }
 
-static int rpc_service(struct rpc_state *rpc, struct discovery *heads)
+static int rpc_service(struct rpc_state *rpc, struct discovery *heads,
+		int nr_fetch, struct ref **to_fetch)
 {
 	const char *svc = rpc->service_name;
 	struct strbuf buf = STRBUF_INIT;
 	struct child_process client;
-	int err = 0;
+	int err = 0, i;
 
 	memset(&client, 0, sizeof(client));
 	client.in = -1;
@@ -545,6 +546,13 @@ static int rpc_service(struct rpc_state *rpc, struct discovery *heads)
 	client.argv = rpc->argv;
 	if (start_command(&client))
 		exit(1);
+
+	/* push the refs to fetch-pack via stdin, if requested */
+	if (to_fetch) {
+		for (i = 0; i < nr_fetch; i++)
+			packet_write(client.in, "%s\n", to_fetch[i]->name);
+		packet_flush(client.in);
+	}
 	if (heads)
 		write_or_die(client.in, heads->buf, heads->len);
 
@@ -633,6 +641,7 @@ static int fetch_git(struct discovery *heads,
 	argv = xmalloc((15 + nr_heads) * sizeof(char*));
 	argv[argc++] = "fetch-pack";
 	argv[argc++] = "--stateless-rpc";
+	argv[argc++] = "--stdin";
 	argv[argc++] = "--lock-pack";
 	if (options.followtags)
 		argv[argc++] = "--include-tag";
@@ -655,7 +664,6 @@ static int fetch_git(struct discovery *heads,
 		struct ref *ref = to_fetch[i];
 		if (!ref->name || !*ref->name)
 			die("cannot fetch by sha1 over smart http");
-		argv[argc++] = ref->name;
 	}
 	argv[argc++] = NULL;
 
@@ -664,7 +672,7 @@ static int fetch_git(struct discovery *heads,
 	rpc.argv = argv;
 	rpc.gzip_request = 1;
 
-	err = rpc_service(&rpc, heads);
+	err = rpc_service(&rpc, heads, nr_heads, to_fetch);
 	if (rpc.result.len)
 		safe_write(1, rpc.result.buf, rpc.result.len);
 	strbuf_release(&rpc.result);
@@ -783,7 +791,7 @@ static int push_git(struct discovery *heads, int nr_spec, char **specs)
 	rpc.service_name = "git-receive-pack",
 	rpc.argv = argv;
 
-	err = rpc_service(&rpc, heads);
+	err = rpc_service(&rpc, heads, 0, NULL);
 	if (rpc.result.len)
 		safe_write(1, rpc.result.buf, rpc.result.len);
 	strbuf_release(&rpc.result);
-- 
1.7.9.4.16.gd24fa.dirty

^ permalink raw reply related	[flat|nested] 54+ messages in thread

* Re: Clone fails on a repo with too many heads/tags
  2012-03-24 20:49                   ` Ivan Todoroski
@ 2012-03-25  1:06                     ` Jeff King
  2012-03-25  2:32                       ` Jeff King
  2012-03-25 15:30                       ` Ivan Todoroski
  0 siblings, 2 replies; 54+ messages in thread
From: Jeff King @ 2012-03-25  1:06 UTC (permalink / raw)
  To: Ivan Todoroski; +Cc: Shawn Pearce, Nguyen Thai Ngoc Duy, Jakub Narebski, git

On Sat, Mar 24, 2012 at 09:49:55PM +0100, Ivan Todoroski wrote:

> Unfortunately it failed, and to make matters worse it failed exactly
> on the parts dealing with smart HTTP, which is what I need to test in
> the first place. Talk about Murphy's law...
> 
> Is it failing for anyone else on the vanilla "maint" branch? I would
> appreciate any help I could get here.

No, it passes fine here (Debian unstable).

> expecting success:
>         git clone $HTTPD_URL/smart-redir-perm/repo.git --quiet repo-p
> 
> error: RPC failed; result=22, HTTP code = 405
> fatal: The remote end hung up unexpectedly
> not ok - 6 follow redirects (301)

That's weird. 405 is "Method Not Allowed". Clone shouldn't be doing
anything more exotic than GET and POST. And the prior tests pass, so it
means that it's working in general. The only thing different about this
test is that apache is configured to use mod_rewrite to issue a
redirect.

Does your machine have mod_rewrite installed and enabled? I would think
apache would complain at startup if it wasn't.  I wonder if there's
something non-portable in the minimal apache config we ship.

Does httpd/error.log in the trash directory say anything interesting?

-Peff

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH/RFC 1/2] fetch-pack: new option to read refs from stdin
  2012-03-24 20:53                   ` [PATCH/RFC 1/2] fetch-pack: new option to read refs from stdin Ivan Todoroski
@ 2012-03-25  1:19                     ` Jeff King
  2012-03-25  9:39                       ` Ivan Todoroski
  2012-03-25 20:00                       ` Ivan Todoroski
  0 siblings, 2 replies; 54+ messages in thread
From: Jeff King @ 2012-03-25  1:19 UTC (permalink / raw)
  To: Ivan Todoroski; +Cc: Shawn Pearce, Nguyen Thai Ngoc Duy, Jakub Narebski, git

On Sat, Mar 24, 2012 at 09:53:26PM +0100, Ivan Todoroski wrote:

> From c4bb55f9f27569faa368d823ca6fe4b236e37cd6 Mon Sep 17 00:00:00 2001
> From: Ivan Todoroski <grnch@gmx.net>
> Date: Sat, 24 Mar 2012 15:13:05 +0100
> Subject: [PATCH/RFC 1/2] fetch-pack: new option to read refs from stdin

You can drop these lines; they are redundant with the actual email
headers.

> ---
>  Documentation/git-fetch-pack.txt |    9 ++++++++
>  builtin/fetch-pack.c             |   44 ++++++++++++++++++++++++++++++++++++--
>  fetch-pack.h                     |    3 ++-
>  3 files changed, 53 insertions(+), 3 deletions(-)

Give more of a commit message. Why is this option useful (I know, of
course, from our previous discussion. But keep in mind the audience of
developers reading "git log" a year from now).

> +--stdin::
> +	Take the list of refs from stdin, one per line. If there
> +	are refs specified on the command line in addition to this
> +	option, then the refs from stdin are processed after those
> +	on the command line.
> +	If '--stateless-rpc' is specified together with this option
> +	then the list of refs must be in packet format (pkt-line)
> +	with a flush packet terminating the list.

Nice. Thanks for taking the time not just to solve your problem, but to
give sane semantics in the non-stateless-rpc case.

I think there is a minor formatting bug in the above. Asciidoc will make
your two paragraphs into a single one, won't it? I think you need to do
the (horribly ugly):

  --stdin::
      First paragraph.
  +
  Second paragraph.

to appease asciidoc.

> --- a/builtin/fetch-pack.c
> +++ b/builtin/fetch-pack.c
> @@ -23,7 +23,7 @@ static struct fetch_pack_args args = {
>  };
>  
>  static const char fetch_pack_usage[] =
> -"git fetch-pack [--all] [--quiet|-q] [--keep|-k] [--thin] [--include-tag] [--upload-pack=<git-upload-pack>] [--depth=<n>] [--no-progress] [-v] [<host>:]<directory> [<refs>...]";
> +"git fetch-pack [--all] [--stdin] [--quiet|-q] [--keep|-k] [--thin] [--include-tag] [--upload-pack=<git-upload-pack>] [--depth=<n>] [--no-progress] [-v] [<host>:]<directory> [<refs>...]";

Not a problem you introduced, but maybe it is time for us to cut down
this gigantic line (it's 180 characters even before your patch!). Even
breaking it across lines at 80 columns would help.

> @@ -972,6 +976,42 @@ int cmd_fetch_pack(int argc, const char **argv, const char *prefix)
>  	if (!dest)
>  		usage(fetch_pack_usage);
>  
> +	if (args.refs_from_stdin) {
> +		char ref[1000];

Ick. Is there any reason not to use a strbuf here? 1000 is probably
plenty, but we are generally moving towards removing such limits where
possible.

You'd also get to use strbuf_getline and strbuf_trim in the
newline-delimited case.

-Peff

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH/RFC 2/2] remote-curl: send the refs to fetch-pack on stdin
  2012-03-24 20:54                   ` [PATCH/RFC 2/2] remote-curl: send the refs to fetch-pack on stdin Ivan Todoroski
@ 2012-03-25  1:24                     ` Jeff King
  2012-03-25  9:52                       ` Ivan Todoroski
  0 siblings, 1 reply; 54+ messages in thread
From: Jeff King @ 2012-03-25  1:24 UTC (permalink / raw)
  To: Ivan Todoroski; +Cc: Shawn Pearce, Nguyen Thai Ngoc Duy, Jakub Narebski, git

On Sat, Mar 24, 2012 at 09:54:16PM +0100, Ivan Todoroski wrote:

> From 723a561946824ee367f57f0d9b9d336a6bc28d13 Mon Sep 17 00:00:00 2001
> From: Ivan Todoroski <grnch@gmx.net>
> Date: Sat, 24 Mar 2012 15:53:36 +0100
> Subject: [PATCH/RFC 2/2] remote-curl: send the refs to fetch-pack on stdin
> 
> ---

Same comments as patch 1. Drop this stuff, and add in more commit
message. :)

>  remote-curl.c |   18 +++++++++++++-----
>  1 file changed, 13 insertions(+), 5 deletions(-)

Overall the code looks sane to me. One minor comment:

> -static int rpc_service(struct rpc_state *rpc, struct discovery *heads)
> +static int rpc_service(struct rpc_state *rpc, struct discovery *heads,
> +		int nr_fetch, struct ref **to_fetch)

I was curious why you needed to add new arguments for this, since we
surely must be passing the information into rpc_service already (since
it has to put them on the command line). And the answer is that they are
already in the argv member of the struct rpc_state. I wonder if it would
fit the existing style of the code to pass the arguments through a
member in the rpc_state in the same way. I don't feel strongly about it,
though.

-Peff

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: Clone fails on a repo with too many heads/tags
  2012-03-25  1:06                     ` Jeff King
@ 2012-03-25  2:32                       ` Jeff King
  2012-03-25 17:33                         ` Ivan Todoroski
  2012-03-25 15:30                       ` Ivan Todoroski
  1 sibling, 1 reply; 54+ messages in thread
From: Jeff King @ 2012-03-25  2:32 UTC (permalink / raw)
  To: Ivan Todoroski; +Cc: Shawn Pearce, Nguyen Thai Ngoc Duy, Jakub Narebski, git

On Sat, Mar 24, 2012 at 09:06:09PM -0400, Jeff King wrote:

> That's weird. 405 is "Method Not Allowed". Clone shouldn't be doing
> anything more exotic than GET and POST. And the prior tests pass, so it
> means that it's working in general. The only thing different about this
> test is that apache is configured to use mod_rewrite to issue a
> redirect.
> 
> Does your machine have mod_rewrite installed and enabled? I would think
> apache would complain at startup if it wasn't.  I wonder if there's
> something non-portable in the minimal apache config we ship.
> 
> Does httpd/error.log in the trash directory say anything interesting?

Also, does it work any better with this patch?

diff --git a/t/lib-httpd/apache.conf b/t/lib-httpd/apache.conf
index 3c12b05..714760d 100644
--- a/t/lib-httpd/apache.conf
+++ b/t/lib-httpd/apache.conf
@@ -61,9 +61,8 @@ ScriptAlias /smart_noexport/ ${GIT_EXEC_PATH}/git-http-backend/
 	Options ExecCGI
 </Files>
 
-RewriteEngine on
-RewriteRule ^/smart-redir-perm/(.*)$ /smart/$1 [R=301]
-RewriteRule ^/smart-redir-temp/(.*)$ /smart/$1 [R=302]
+RedirectMatch 301 ^/smart-redir-perm/(.*)$ /smart/$1
+RedirectMatch 302 ^/smart-redir-temp/(.*)$ /smart/$1
 
 <IfDefine SSL>
 LoadModule ssl_module modules/mod_ssl.so

^ permalink raw reply related	[flat|nested] 54+ messages in thread

* Re: [PATCH/RFC 1/2] fetch-pack: new option to read refs from stdin
  2012-03-25  1:19                     ` Jeff King
@ 2012-03-25  9:39                       ` Ivan Todoroski
  2012-03-25 15:15                         ` Ivan Todoroski
  2012-03-25 20:00                       ` Ivan Todoroski
  1 sibling, 1 reply; 54+ messages in thread
From: Ivan Todoroski @ 2012-03-25  9:39 UTC (permalink / raw)
  To: Jeff King; +Cc: Shawn Pearce, Nguyen Thai Ngoc Duy, Jakub Narebski, git

On 25.03.2012 03:19, Jeff King wrote:
> On Sat, Mar 24, 2012 at 09:53:26PM +0100, Ivan Todoroski wrote:
> 
>> ---
>>  Documentation/git-fetch-pack.txt |    9 ++++++++
>>  builtin/fetch-pack.c             |   44 ++++++++++++++++++++++++++++++++++++--
>>  fetch-pack.h                     |    3 ++-
>>  3 files changed, 53 insertions(+), 3 deletions(-)
> 
> Give more of a commit message. Why is this option useful (I know, of
> course, from our previous discussion. But keep in mind the audience of
> developers reading "git log" a year from now).

Definitely. This was just work in progress I posted while I figure out 
the test suite problem. I will add full commit messages when I complete 
the test cases. Thanks for reviewing, I will incorporate all your 
suggestions in the next version.

>> @@ -972,6 +976,42 @@ int cmd_fetch_pack(int argc, const char **argv, const char *prefix)
>>  	if (!dest)
>>  		usage(fetch_pack_usage);
>>  
>> +	if (args.refs_from_stdin) {
>> +		char ref[1000];
> 
> Ick. Is there any reason not to use a strbuf here? 1000 is probably
> plenty, but we are generally moving towards removing such limits where
> possible.
> 
> You'd also get to use strbuf_getline and strbuf_trim in the
> newline-delimited case.

Right now that "char ref[1000]" code is rejecting refs on stdin if they 
are longer than 1000 chars or if they contain an ASCII NUL char in them. 
When I change this to strbuf_getline() should I be doing any similar 
checks, or do I just pass on whatever I read?

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH/RFC 2/2] remote-curl: send the refs to fetch-pack on stdin
  2012-03-25  1:24                     ` Jeff King
@ 2012-03-25  9:52                       ` Ivan Todoroski
  2012-03-26 17:24                         ` Jeff King
  0 siblings, 1 reply; 54+ messages in thread
From: Ivan Todoroski @ 2012-03-25  9:52 UTC (permalink / raw)
  To: Jeff King; +Cc: Shawn Pearce, Nguyen Thai Ngoc Duy, Jakub Narebski, git

On 25.03.2012 03:24, Jeff King wrote:
> On Sat, Mar 24, 2012 at 09:54:16PM +0100, Ivan Todoroski wrote:
>> -static int rpc_service(struct rpc_state *rpc, struct discovery *heads)
>> +static int rpc_service(struct rpc_state *rpc, struct discovery *heads,
>> +		int nr_fetch, struct ref **to_fetch)
> 
> I was curious why you needed to add new arguments for this, since we
> surely must be passing the information into rpc_service already (since
> it has to put them on the command line). And the answer is that they are
> already in the argv member of the struct rpc_state. I wonder if it would
> fit the existing style of the code to pass the arguments through a
> member in the rpc_state in the same way. I don't feel strongly about it,
> though.

Yeah, I realized this was ugly after I sent it.... not only are new 
arguments added to rpc_service(), but they seem completely arbitrary and 
specific to fetch-pack. There is no reason other callers of 
rpc_service() would want args in the exact same format.

On the other hand I can't reuse rpc_state.argv because that is already 
passed to client.argv for start_command().

Would it be OK if I add a new memeber in rpc_state, e.g. "struct strbuf 
*stdin_preamble"? If non-NULL, it would contain any data the caller of 
rpc_service() wants shoved into the stdin of the sub-command before 
anything else. That way the caller is free to format this data in 
whatever format they need, not only in the pkt-line format needed by 
fetch-pack.

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH/RFC 1/2] fetch-pack: new option to read refs from stdin
  2012-03-25  9:39                       ` Ivan Todoroski
@ 2012-03-25 15:15                         ` Ivan Todoroski
  0 siblings, 0 replies; 54+ messages in thread
From: Ivan Todoroski @ 2012-03-25 15:15 UTC (permalink / raw)
  To: Jeff King; +Cc: Shawn Pearce, Nguyen Thai Ngoc Duy, Jakub Narebski, git

On 25.03.2012 11:39, Ivan Todoroski wrote:
> On 25.03.2012 03:19, Jeff King wrote:
>> On Sat, Mar 24, 2012 at 09:53:26PM +0100, Ivan Todoroski wrote:
>>> @@ -972,6 +976,42 @@ int cmd_fetch_pack(int argc, const char **argv, 
>>> const char *prefix)
>>>      if (!dest)
>>>          usage(fetch_pack_usage);
>>>  
>>> +    if (args.refs_from_stdin) {
>>> +        char ref[1000];
>>
>> Ick. Is there any reason not to use a strbuf here? 1000 is probably
>> plenty, but we are generally moving towards removing such limits where
>> possible.
>>
>> You'd also get to use strbuf_getline and strbuf_trim in the
>> newline-delimited case.
> 
> Right now that "char ref[1000]" code is rejecting refs on stdin if they 
> are longer than 1000 chars or if they contain an ASCII NUL char in them. 
> When I change this to strbuf_getline() should I be doing any similar 
> checks, or do I just pass on whatever I read?

Never mind, there is no checking necessary. I just saw that 
strbuf_getline() grows the buffer automatically so there is no chance of 
buffer overflows, plus it doesn't mind if NUL chars are present in the 
middle of the string. The subsequent code that reads the refs will stop 
at the first NUL char anyway so all is well.

I didn't know about this cool strbuf infrastructure, thanks for the tip.

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: Clone fails on a repo with too many heads/tags
  2012-03-25  1:06                     ` Jeff King
  2012-03-25  2:32                       ` Jeff King
@ 2012-03-25 15:30                       ` Ivan Todoroski
  1 sibling, 0 replies; 54+ messages in thread
From: Ivan Todoroski @ 2012-03-25 15:30 UTC (permalink / raw)
  To: Jeff King; +Cc: Shawn Pearce, Nguyen Thai Ngoc Duy, Jakub Narebski, git

On 25.03.2012 03:06, Jeff King wrote:
> On Sat, Mar 24, 2012 at 09:49:55PM +0100, Ivan Todoroski wrote:
> 
>> Is it failing for anyone else on the vanilla "maint" branch? I would
>> appreciate any help I could get here.
> 
> No, it passes fine here (Debian unstable).

OK, thanks for checking, it's good to know.


> Does your machine have mod_rewrite installed and enabled? I would think
> apache would complain at startup if it wasn't.  I wonder if there's
> something non-portable in the minimal apache config we ship.

Looks like it comes by default with the RPM. Maybe the Apache version is 
too old?

$ rpm -q httpd
httpd-2.2.3-63.el5.centos.1

$ rpm -ql httpd | grep rewrite
/usr/lib64/httpd/modules/mod_rewrite.so

$ ls -l /usr/lib64/httpd/modules/mod_rewrite.so
-rwxr-xr-x 1 root root 60384 Feb 23 19:23 
/usr/lib64/httpd/modules/mod_rewrite.so

$ ls -l trash\ directory.t5551-http-fetch/httpd/modules/mod_rewrite.so
-rwxr-xr-x 1 root root 60384 Feb 23 19:23 trash 
directory.t5551-http-fetch/httpd/modules/mod_rewrite.so


Also, mod_rewrite is enabled in both the master config file 
/etc/httpd/conf/httpd.conf, as well as the minimal 
git/t/lib-httpd/apache.conf.


> Does httpd/error.log in the trash directory say anything interesting?

Nothing, it just gives the startup and shutdown message:

$ cat trash\ directory.t5551-http-fetch/httpd/error.log
[Sun Mar 25 15:27:12 2012] [notice] Apache/2.2.3 (CentOS) configured -- 
resuming normal operations
[Sun Mar 25 15:27:12 2012] [notice] caught SIGTERM, shutting down


I will try your patch from the other message a bit later and let you know.

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: Clone fails on a repo with too many heads/tags
  2012-03-25  2:32                       ` Jeff King
@ 2012-03-25 17:33                         ` Ivan Todoroski
  2012-03-25 17:54                           ` Ivan Todoroski
  0 siblings, 1 reply; 54+ messages in thread
From: Ivan Todoroski @ 2012-03-25 17:33 UTC (permalink / raw)
  To: Jeff King; +Cc: Shawn Pearce, Nguyen Thai Ngoc Duy, Jakub Narebski, git

On 25.03.2012 04:32, Jeff King wrote:
> On Sat, Mar 24, 2012 at 09:06:09PM -0400, Jeff King wrote:
> 
>> That's weird. 405 is "Method Not Allowed". Clone shouldn't be doing
>> anything more exotic than GET and POST. And the prior tests pass, so it
>> means that it's working in general. The only thing different about this
>> test is that apache is configured to use mod_rewrite to issue a
>> redirect.
>>
>> Does your machine have mod_rewrite installed and enabled? I would think
>> apache would complain at startup if it wasn't.  I wonder if there's
>> something non-portable in the minimal apache config we ship.
>>
>> Does httpd/error.log in the trash directory say anything interesting?
> 
> Also, does it work any better with this patch?
> 
> diff --git a/t/lib-httpd/apache.conf b/t/lib-httpd/apache.conf
> index 3c12b05..714760d 100644
> --- a/t/lib-httpd/apache.conf
> +++ b/t/lib-httpd/apache.conf
> @@ -61,9 +61,8 @@ ScriptAlias /smart_noexport/ ${GIT_EXEC_PATH}/git-http-backend/
>  	Options ExecCGI
>  </Files>
>  
> -RewriteEngine on
> -RewriteRule ^/smart-redir-perm/(.*)$ /smart/$1 [R=301]
> -RewriteRule ^/smart-redir-temp/(.*)$ /smart/$1 [R=302]
> +RedirectMatch 301 ^/smart-redir-perm/(.*)$ /smart/$1
> +RedirectMatch 302 ^/smart-redir-temp/(.*)$ /smart/$1
>  
>  <IfDefine SSL>
>  LoadModule ssl_module modules/mod_ssl.so


Yes! That patch did it, t5551-http-fetch.sh now passes fully. Thank you.

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: Clone fails on a repo with too many heads/tags
  2012-03-25 17:33                         ` Ivan Todoroski
@ 2012-03-25 17:54                           ` Ivan Todoroski
  2012-03-26 17:33                             ` Jeff King
  0 siblings, 1 reply; 54+ messages in thread
From: Ivan Todoroski @ 2012-03-25 17:54 UTC (permalink / raw)
  To: Jeff King; +Cc: Shawn Pearce, Nguyen Thai Ngoc Duy, Jakub Narebski, git

On 25.03.2012 19:33, Ivan Todoroski wrote:
> On 25.03.2012 04:32, Jeff King wrote:
>> On Sat, Mar 24, 2012 at 09:06:09PM -0400, Jeff King wrote:
>>
>>> That's weird. 405 is "Method Not Allowed". Clone shouldn't be doing
>>> anything more exotic than GET and POST. And the prior tests pass, so it
>>> means that it's working in general. The only thing different about this
>>> test is that apache is configured to use mod_rewrite to issue a
>>> redirect.
>>>
>>> Does your machine have mod_rewrite installed and enabled? I would think
>>> apache would complain at startup if it wasn't.  I wonder if there's
>>> something non-portable in the minimal apache config we ship.
>>>
>>> Does httpd/error.log in the trash directory say anything interesting?
>>
>> Also, does it work any better with this patch?
>>
>> diff --git a/t/lib-httpd/apache.conf b/t/lib-httpd/apache.conf
>> index 3c12b05..714760d 100644
>> --- a/t/lib-httpd/apache.conf
>> +++ b/t/lib-httpd/apache.conf
>> @@ -61,9 +61,8 @@ ScriptAlias /smart_noexport/ 
>> ${GIT_EXEC_PATH}/git-http-backend/
>>      Options ExecCGI
>>  </Files>
>>  
>> -RewriteEngine on
>> -RewriteRule ^/smart-redir-perm/(.*)$ /smart/$1 [R=301]
>> -RewriteRule ^/smart-redir-temp/(.*)$ /smart/$1 [R=302]
>> +RedirectMatch 301 ^/smart-redir-perm/(.*)$ /smart/$1
>> +RedirectMatch 302 ^/smart-redir-temp/(.*)$ /smart/$1
>>  
>>  <IfDefine SSL>
>>  LoadModule ssl_module modules/mod_ssl.so
> 
> 
> Yes! That patch did it, t5551-http-fetch.sh now passes fully. Thank you.

Ah, no... I spoke too soon.

I didn't look closely enough, I wasn't logged in at the right machine, 
sorry. :)

It's still failing on CentOS 5.8 with Apache 2.2.3. I will just find 
another machine where the test suite works.

If you'd still like to debug the problem on CentOS 5.8 I can run any 
commands or test any patches for you.

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH/RFC 1/2] fetch-pack: new option to read refs from stdin
  2012-03-25  1:19                     ` Jeff King
  2012-03-25  9:39                       ` Ivan Todoroski
@ 2012-03-25 20:00                       ` Ivan Todoroski
  2012-03-26 17:21                         ` Jeff King
  1 sibling, 1 reply; 54+ messages in thread
From: Ivan Todoroski @ 2012-03-25 20:00 UTC (permalink / raw)
  To: Jeff King; +Cc: Shawn Pearce, Nguyen Thai Ngoc Duy, Jakub Narebski, git

On 25.03.2012 03:19, Jeff King wrote:
> On Sat, Mar 24, 2012 at 09:53:26PM +0100, Ivan Todoroski wrote:
> I think there is a minor formatting bug in the above. Asciidoc will make
> your two paragraphs into a single one, won't it? I think you need to do
> the (horribly ugly):
> 
>   --stdin::
>       First paragraph.
>   +
>   Second paragraph.

Apparently this works too (i.e. indent the "+" too):

   --stdin::
       First paragraph.
       +
       Second paragraph.

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH/RFC 1/2] fetch-pack: new option to read refs from stdin
  2012-03-25 20:00                       ` Ivan Todoroski
@ 2012-03-26 17:21                         ` Jeff King
  2012-03-26 17:49                           ` Ivan Todoroski
  0 siblings, 1 reply; 54+ messages in thread
From: Jeff King @ 2012-03-26 17:21 UTC (permalink / raw)
  To: Ivan Todoroski; +Cc: Shawn Pearce, Nguyen Thai Ngoc Duy, Jakub Narebski, git

On Sun, Mar 25, 2012 at 10:00:01PM +0200, Ivan Todoroski wrote:

> On 25.03.2012 03:19, Jeff King wrote:
> >On Sat, Mar 24, 2012 at 09:53:26PM +0100, Ivan Todoroski wrote:
> >I think there is a minor formatting bug in the above. Asciidoc will make
> >your two paragraphs into a single one, won't it? I think you need to do
> >the (horribly ugly):
> >
> >  --stdin::
> >      First paragraph.
> >  +
> >  Second paragraph.
> 
> Apparently this works too (i.e. indent the "+" too):
> 
>   --stdin::
>       First paragraph.
>       +
>       Second paragraph.

Sadly, it's not quite the same (because I consider the source of your
version much more readable). The diff of the resulting HTML between my
version and yours is:

--- no-indent.html	2012-03-26 13:19:26.206728013 -0400
+++ indent.html	2012-03-26 13:19:04.270727319 -0400
@@ -5,7 +5,8 @@
 <dd>
 <p>
     First paragraph.
+   <br />
+    Second paragraph.
 </p>
-<div class="paragraph"><p>Second paragraph.</p></div>
 </dd>
 </dl></div>

So in your case it is putting in a line break, but not actually starting
a new paragraph.

-Peff

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH/RFC 2/2] remote-curl: send the refs to fetch-pack on stdin
  2012-03-25  9:52                       ` Ivan Todoroski
@ 2012-03-26 17:24                         ` Jeff King
  0 siblings, 0 replies; 54+ messages in thread
From: Jeff King @ 2012-03-26 17:24 UTC (permalink / raw)
  To: Ivan Todoroski; +Cc: Shawn Pearce, Nguyen Thai Ngoc Duy, Jakub Narebski, git

On Sun, Mar 25, 2012 at 11:52:35AM +0200, Ivan Todoroski wrote:

> >>-static int rpc_service(struct rpc_state *rpc, struct discovery *heads)
> >>+static int rpc_service(struct rpc_state *rpc, struct discovery *heads,
> >>+		int nr_fetch, struct ref **to_fetch)
> [...]
> Yeah, I realized this was ugly after I sent it.... not only are new
> arguments added to rpc_service(), but they seem completely arbitrary
> and specific to fetch-pack. There is no reason other callers of
> rpc_service() would want args in the exact same format.

Yeah, I think that is what was bugging me about it.

> On the other hand I can't reuse rpc_state.argv because that is
> already passed to client.argv for start_command().

Right. You'd need a new member.

> Would it be OK if I add a new memeber in rpc_state, e.g. "struct
> strbuf *stdin_preamble"? If non-NULL, it would contain any data the
> caller of rpc_service() wants shoved into the stdin of the
> sub-command before anything else. That way the caller is free to
> format this data in whatever format they need, not only in the
> pkt-line format needed by fetch-pack.

I think that is the cleanest solution. It's a little less efficient, in
that we build the whole buffer in memory instead of sending each packet
as we form it. But I think we are talking about a few kilobytes at most.

-Peff

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: Clone fails on a repo with too many heads/tags
  2012-03-25 17:54                           ` Ivan Todoroski
@ 2012-03-26 17:33                             ` Jeff King
  2012-03-27  7:07                               ` Ivan Todoroski
  0 siblings, 1 reply; 54+ messages in thread
From: Jeff King @ 2012-03-26 17:33 UTC (permalink / raw)
  To: Ivan Todoroski; +Cc: Shawn Pearce, Nguyen Thai Ngoc Duy, Jakub Narebski, git

On Sun, Mar 25, 2012 at 07:54:19PM +0200, Ivan Todoroski wrote:

> >>>That's weird. 405 is "Method Not Allowed". Clone shouldn't be doing
> >>>anything more exotic than GET and POST. And the prior tests pass, so it
> [...]
> It's still failing on CentOS 5.8 with Apache 2.2.3. I will just find
> another machine where the test suite works.

I'm running the tests with apache 2.2.22. So I wonder if there is some
config syntax or behavior that is simply different in the much older
version.

> If you'd still like to debug the problem on CentOS 5.8 I can run any
> commands or test any patches for you.

A few more things to try:

  - check httpd/access.log to confirm that apache really is returning a
    405 (I can't imagine that curl would not be reporting it accurately,
    but it's worth a shot)

  - try running with GIT_CURL_VERBOSE=1 to get debug output from curl

  - try adding these to the apache config:

      RewriteLog "httpd/rewrite.log"
      RewriteLogLevel 9

    which might yield more information.

I have no idea what you're looking for in any of those, but maybe
something useful will be obvious. I don't have a lot of apache
experience, so my next step in your shoes would be just trying to get
more information on what's happening.

-Peff

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH/RFC 1/2] fetch-pack: new option to read refs from stdin
  2012-03-26 17:21                         ` Jeff King
@ 2012-03-26 17:49                           ` Ivan Todoroski
  2012-03-26 17:51                             ` Jeff King
  0 siblings, 1 reply; 54+ messages in thread
From: Ivan Todoroski @ 2012-03-26 17:49 UTC (permalink / raw)
  To: Jeff King; +Cc: Shawn Pearce, Nguyen Thai Ngoc Duy, Jakub Narebski, git

On 26.03.2012 19:21, Jeff King wrote:
> On Sun, Mar 25, 2012 at 10:00:01PM +0200, Ivan Todoroski wrote:
> 
>> On 25.03.2012 03:19, Jeff King wrote:
>>> On Sat, Mar 24, 2012 at 09:53:26PM +0100, Ivan Todoroski wrote:
>>> I think there is a minor formatting bug in the above. Asciidoc will make
>>> your two paragraphs into a single one, won't it? I think you need to do
>>> the (horribly ugly):
>>>
>>>  --stdin::
>>>      First paragraph.
>>>  +
>>>  Second paragraph.
>> Apparently this works too (i.e. indent the "+" too):
>>
>>   --stdin::
>>       First paragraph.
>>       +
>>       Second paragraph.
> 
> Sadly, it's not quite the same (because I consider the source of your
> version much more readable). The diff of the resulting HTML between my
> version and yours is:
> 
> --- no-indent.html	2012-03-26 13:19:26.206728013 -0400
> +++ indent.html	2012-03-26 13:19:04.270727319 -0400
> @@ -5,7 +5,8 @@
>  <dd>
>  <p>
>      First paragraph.
> +   <br />
> +    Second paragraph.
>  </p>
> -<div class="paragraph"><p>Second paragraph.</p></div>
>  </dd>
>  </dl></div>
> 
> So in your case it is putting in a line break, but not actually starting
> a new paragraph.
> 
> -Peff

Ah, that's too bad. OK, I guess we have two options:

1) Unindent the second paragraph - looks ugly in source but good in 
final output.

2) Just merge the two paragraphs into a single paragraph. It's not that 
much text anyway, it doesn't really *have* to be in two paragraphs.

Let me know which option you prefer.

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH/RFC 1/2] fetch-pack: new option to read refs from stdin
  2012-03-26 17:49                           ` Ivan Todoroski
@ 2012-03-26 17:51                             ` Jeff King
  0 siblings, 0 replies; 54+ messages in thread
From: Jeff King @ 2012-03-26 17:51 UTC (permalink / raw)
  To: Ivan Todoroski; +Cc: Shawn Pearce, Nguyen Thai Ngoc Duy, Jakub Narebski, git

On Mon, Mar 26, 2012 at 07:49:59PM +0200, Ivan Todoroski wrote:

> >So in your case it is putting in a line break, but not actually starting
> >a new paragraph.
> >
> Ah, that's too bad. OK, I guess we have two options:
> 
> 1) Unindent the second paragraph - looks ugly in source but good in
> final output.
> 
> 2) Just merge the two paragraphs into a single paragraph. It's not
> that much text anyway, it doesn't really *have* to be in two
> paragraphs.
> 
> Let me know which option you prefer.

I would do (1). While I (and I suspect other git devs) do just read the
source straight from Documentation/*.txt, the formatted output is read
by many more people, and should be considered the final product. So
optimize for that case.

-Peff

^ permalink raw reply	[flat|nested] 54+ messages in thread

* [PATCH/RFC v2 0/4] Fix fetch-pack command line overflow during clone
  2012-03-21 17:14             ` Jeff King
  2012-03-21 17:59               ` Jakub Narebski
  2012-03-21 20:02               ` Ivan Todoroski
@ 2012-03-27  6:23               ` Ivan Todoroski
  2012-03-27  6:25                 ` [PATCH/RFC v2 1/4] fetch-pack: new --stdin option to read refs from stdin Ivan Todoroski
                                   ` (3 more replies)
  2 siblings, 4 replies; 54+ messages in thread
From: Ivan Todoroski @ 2012-03-27  6:23 UTC (permalink / raw)
  To: Jeff King; +Cc: Shawn Pearce, Nguyen Thai Ngoc Duy, Jakub Narebski, git

This patch series is against the latest "maint" branch. Please let me
know if I need to rebase this on top of some other branch.

The problem we are fixing is described in the first patch in the series.


Changes since the original patch:

* add test cases
* add full commit messages
* fix formatting problem in --stdin doc
* split overly long fetch_pack_usage line
* use strbuf_getline() instead of fgets() for reading refs from stdin
* minor optimization of the pkt-line reading loop, it was using xstrdup()
  even though the string length was already known, use xmemdupz() instead
* rework the remote-curl.c patch to not add new parameters to rpc_service(),
  instead add a new strbuf member to rpc_state to pass the info around


Ivan Todoroski (4):
  fetch-pack: new --stdin option to read refs from stdin
  remote-curl: send the refs to fetch-pack on stdin
  fetch-pack: test cases for the new --stdin option
  remote-curl: main test case for the OS command line overflow

 Documentation/git-fetch-pack.txt |   10 ++++
 builtin/fetch-pack.c             |   45 ++++++++++++++++-
 fetch-pack.h                     |    3 +-
 remote-curl.c                    |   15 +++++-
 t/t5500-fetch-pack.sh            |  100 ++++++++++++++++++++++++++++++++++++++
 t/t5551-http-fetch.sh            |   32 ++++++++++++
 6 files changed, 201 insertions(+), 4 deletions(-)

-- 
1.7.9.5.4.g4f508

^ permalink raw reply	[flat|nested] 54+ messages in thread

* [PATCH/RFC v2 1/4] fetch-pack: new --stdin option to read refs from stdin
  2012-03-27  6:23               ` [PATCH/RFC v2 0/4] Fix fetch-pack command line overflow during clone Ivan Todoroski
@ 2012-03-27  6:25                 ` Ivan Todoroski
  2012-03-27 16:59                   ` Junio C Hamano
  2012-03-27  6:26                 ` [PATCH/RFC v2 2/4] remote-curl: send the refs to fetch-pack on stdin Ivan Todoroski
                                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 54+ messages in thread
From: Ivan Todoroski @ 2012-03-27  6:25 UTC (permalink / raw)
  To: Jeff King; +Cc: Shawn Pearce, Nguyen Thai Ngoc Duy, Jakub Narebski, git

If a remote repo has too many tags (or branches), cloning it over the
smart HTTP transport can fail because remote-curl.c puts all the refs
from the remote repo on the fetch-pack command line. This can make the
command line longer than the global OS command line limit, causing
fetch-pack to fail.

This is especially a problem on Windows where the command line limit is
orders of magnitude shorter than Linux. There are already real repos out
there that msysGit cannot clone over smart HTTP due to this problem.

Here is an easy way to trigger this problem:

git init too-many-refs
cd too-many-refs
echo bla > bla.txt
git add .
git commit -m test
sha=$(git rev-parse HEAD)
for ((i=0; i<50000; i++)); do
	echo $sha refs/tags/artificially-long-tag-name-to-more-easily-\
demonstrate-the-problem-$i >> .git/packed-refs
done

Then share this repo over the smart HTTP protocol and try cloning it:

	$ git clone http://localhost/.../too-many-refs/.git
	Cloning into 'too-many-refs'...
	fatal: cannot exec 'fetch-pack': Argument list too long

50k tags is obviously an absurd number, but it is required to
demonstrate the problem on Linux because it has a much more generous
command line limit. On Windows the clone fails with as little as 500
tags in the above loop, which is getting uncomfortably close to the
number of tags you might see in real long lived repos.

This is not just theoretical, msysGit is already failing to clone our
company repo due to this. It's a large repo converted from CVS, nearly
10 years of history.

Four possible solutions were discussed on the Git mailing list (in no
particular order):

1) Call fetch-pack multiple times with smaller batches of refs.

This was dismissed as inefficient and inelegant.

2) Add option --refs-fd=$n to pass a an fd from where to read the refs.

This was rejected because inheriting descriptors other than
stdin/stdout/stderr through exec() is apparently problematic on Windows,
plus it would require changes to the run-command API to open extra
pipes.

3) Add option --refs-from=$tmpfile to pass the refs using a temp file.

This was not favored because of the temp file requirement.

4) Add option --stdin to pass the refs on stdin, one per line.

In the end this option was chosen as the most efficient and most
desirable from scripting perspective.

There was however a small complication when using stdin to pass refs to
fetch-pack. The --stateless-rpc option to fetch-pack also uses stdin for
communication with the remote server.

If we were going to sneak refs on stdin line by line in the presence of
--stateless-rpc it would have to be done very carefully, because when
reading refs one by line we might buffer too much data ahead and eat
some of the remote protocol data also coming on stdin.

One way to solve this would be refactor get_remote_heads() in
fetch-pack.c to accept a residual buffer from our stdin line parsing
above, but this function is used in several places so other callers
would be burdened by this residual buffer interface even when most of
them don't need it.

In the end we settled on the following solution:

If --stdin is specified without --stateless-rpc, fetch-pack would read
the refs from stdin one per line, in a script friendly format.

However if --stdin is specified together with --stateless-rpc,
fetch-pack would read the refs from stdin in packetized format
(pkt-line) with a flush packet terminating the list of refs. This way we
can read the exact number of bytes that we need from stdin, and then
get_remote_heads() can continue reading from the same fd without losing
a single byte of remote protocol data.

This way the --stdin option only loses generality and scriptability when
used together with --stateless-rpc, which is not easily scriptable
anyway because it also uses pkt-line when talking to the remote server.
---
 Documentation/git-fetch-pack.txt |   10 +++++++++
 builtin/fetch-pack.c             |   45 +++++++++++++++++++++++++++++++++++++-
 fetch-pack.h                     |    3 ++-
 3 files changed, 56 insertions(+), 2 deletions(-)

diff --git a/Documentation/git-fetch-pack.txt b/Documentation/git-fetch-pack.txt
index ed1bdaacd1..1dd44fd348 100644
--- a/Documentation/git-fetch-pack.txt
+++ b/Documentation/git-fetch-pack.txt
@@ -32,6 +32,16 @@ OPTIONS
 --all::
 	Fetch all remote refs.
 
+--stdin::
+	Take the list of refs from stdin, one per line. If there
+	are refs specified on the command line in addition to this
+	option, then the refs from stdin are processed after those
+	on the command line.
++
+If '--stateless-rpc' is specified together with this option then
+the list of refs must be in packet format (pkt-line) with a flush
+packet terminating the list.
+
 -q::
 --quiet::
 	Pass '-q' flag to 'git unpack-objects'; this makes the
diff --git a/builtin/fetch-pack.c b/builtin/fetch-pack.c
index a4d3e90a86..1a90fa852f 100644
--- a/builtin/fetch-pack.c
+++ b/builtin/fetch-pack.c
@@ -23,7 +23,9 @@ static struct fetch_pack_args args = {
 };
 
 static const char fetch_pack_usage[] =
-"git fetch-pack [--all] [--quiet|-q] [--keep|-k] [--thin] [--include-tag] [--upload-pack=<git-upload-pack>] [--depth=<n>] [--no-progress] [-v] [<host>:]<directory> [<refs>...]";
+"git fetch-pack [--all] [--stdin] [--quiet|-q] [--keep|-k] [--thin] "
+"[--include-tag] [--upload-pack=<git-upload-pack>] [--depth=<n>] "
+"[--no-progress] [-v] [<host>:]<directory> [<refs>...]";
 
 #define COMPLETE	(1U << 0)
 #define COMMON		(1U << 1)
@@ -941,6 +943,10 @@ int cmd_fetch_pack(int argc, const char **argv, const char *prefix)
 				args.fetch_all = 1;
 				continue;
 			}
+			if (!strcmp("--stdin", arg)) {
+				args.refs_from_stdin = 1;
+				continue;
+			}
 			if (!strcmp("-v", arg)) {
 				args.verbose = 1;
 				continue;
@@ -972,6 +978,43 @@ int cmd_fetch_pack(int argc, const char **argv, const char *prefix)
 	if (!dest)
 		usage(fetch_pack_usage);
 
+	if (args.refs_from_stdin) {
+		/* copy refs from cmdline to new growable list,
+		   then append the refs from stdin */
+		int alloc_heads = nr_heads;
+		int size = nr_heads * sizeof(*heads);
+		heads = memcpy(xmalloc(size), heads, size);
+		if (args.stateless_rpc) {
+			/* in stateless RPC mode we use pkt-line to read
+			   from stdin, until we get a flush packet */
+			static char line[1000];
+			for (;;) {
+				int n = packet_read_line(0, line, sizeof(line));
+				if (!n)
+					break;
+				if (line[n-1] == '\n')
+					line[--n] = '\0';
+				ALLOC_GROW(heads, nr_heads + 1, alloc_heads);
+				heads[nr_heads++] = xmemdupz(line, n);
+			}
+		}
+		else {
+			/* read from stdin one ref per line, until EOF */
+			struct strbuf line;
+			strbuf_init(&line, 0);
+			for (;;) {
+				if (strbuf_getline(&line, stdin, '\n') == EOF)
+					break;
+				strbuf_trim(&line);
+				if (!line.len)
+					continue; /* skip empty lines */
+				ALLOC_GROW(heads, nr_heads + 1, alloc_heads);
+				heads[nr_heads++] = strbuf_detach(&line, NULL);
+			}
+			strbuf_release(&line);
+		}
+	}
+
 	if (args.stateless_rpc) {
 		conn = NULL;
 		fd[0] = 0;
diff --git a/fetch-pack.h b/fetch-pack.h
index 0608edae3f..292d69389e 100644
--- a/fetch-pack.h
+++ b/fetch-pack.h
@@ -13,7 +13,8 @@ struct fetch_pack_args {
 		verbose:1,
 		no_progress:1,
 		include_tag:1,
-		stateless_rpc:1;
+		stateless_rpc:1,
+		refs_from_stdin:1;
 };
 
 struct ref *fetch_pack(struct fetch_pack_args *args,
-- 
1.7.9.5.4.g4f508

^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH/RFC v2 2/4] remote-curl: send the refs to fetch-pack on stdin
  2012-03-27  6:23               ` [PATCH/RFC v2 0/4] Fix fetch-pack command line overflow during clone Ivan Todoroski
  2012-03-27  6:25                 ` [PATCH/RFC v2 1/4] fetch-pack: new --stdin option to read refs from stdin Ivan Todoroski
@ 2012-03-27  6:26                 ` Ivan Todoroski
  2012-03-27 17:18                   ` Junio C Hamano
  2012-03-27  6:27                 ` [PATCH/RFC v2 3/4] fetch-pack: test cases for the new --stdin option Ivan Todoroski
  2012-03-27  6:28                 ` [PATCH/RFC v2 4/4] remote-curl: main test case for the OS command line overflow Ivan Todoroski
  3 siblings, 1 reply; 54+ messages in thread
From: Ivan Todoroski @ 2012-03-27  6:26 UTC (permalink / raw)
  To: Jeff King; +Cc: Shawn Pearce, Nguyen Thai Ngoc Duy, Jakub Narebski, git

If a remote repo has too many tags (or branches), cloning it over the
smart HTTP transport can fail because remote-curl.c puts all the refs
from the remote repo on the fetch-pack command line. This can make the
command line longer than the global OS command line limit, causing
fetch-pack to fail.

This is especially a problem on Windows where the command line limit is
orders of magnitude shorter than Linux. There are already real repos out
there that msysGit cannot clone over smart HTTP due to this problem.

To solve this problem we teach remote-curl.c to pipe the refs to
fetch-pack using the new --stdin option, instead of on the fetch-pack
command line.

For a more detailed discussion of the problem see the parent of this
commit, titled "fetch-pack: new --stdin option to read refs from stdin".
---
 remote-curl.c |   15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/remote-curl.c b/remote-curl.c
index d159fe7f34..52c21433c7 100644
--- a/remote-curl.c
+++ b/remote-curl.c
@@ -290,6 +290,7 @@ static void output_refs(struct ref *refs)
 struct rpc_state {
 	const char *service_name;
 	const char **argv;
+	struct strbuf *stdin_preamble;
 	char *service_url;
 	char *hdr_content_type;
 	char *hdr_accept;
@@ -535,6 +536,7 @@ static int rpc_service(struct rpc_state *rpc, struct discovery *heads)
 {
 	const char *svc = rpc->service_name;
 	struct strbuf buf = STRBUF_INIT;
+	struct strbuf *preamble = rpc->stdin_preamble;
 	struct child_process client;
 	int err = 0;
 
@@ -545,6 +547,8 @@ static int rpc_service(struct rpc_state *rpc, struct discovery *heads)
 	client.argv = rpc->argv;
 	if (start_command(&client))
 		exit(1);
+	if (preamble)
+		write_or_die(client.in, preamble->buf, preamble->len);
 	if (heads)
 		write_or_die(client.in, heads->buf, heads->len);
 
@@ -626,6 +630,7 @@ static int fetch_git(struct discovery *heads,
 	int nr_heads, struct ref **to_fetch)
 {
 	struct rpc_state rpc;
+	struct strbuf preamble;
 	char *depth_arg = NULL;
 	const char **argv;
 	int argc = 0, i, err;
@@ -633,6 +638,7 @@ static int fetch_git(struct discovery *heads,
 	argv = xmalloc((15 + nr_heads) * sizeof(char*));
 	argv[argc++] = "fetch-pack";
 	argv[argc++] = "--stateless-rpc";
+	argv[argc++] = "--stdin";
 	argv[argc++] = "--lock-pack";
 	if (options.followtags)
 		argv[argc++] = "--include-tag";
@@ -651,23 +657,28 @@ static int fetch_git(struct discovery *heads,
 		argv[argc++] = depth_arg;
 	}
 	argv[argc++] = url;
+	argv[argc++] = NULL;
+
+	strbuf_init(&preamble, 4);
 	for (i = 0; i < nr_heads; i++) {
 		struct ref *ref = to_fetch[i];
 		if (!ref->name || !*ref->name)
 			die("cannot fetch by sha1 over smart http");
-		argv[argc++] = ref->name;
+		packet_buf_write(&preamble, "%s\n", ref->name);
 	}
-	argv[argc++] = NULL;
+	packet_buf_flush(&preamble);
 
 	memset(&rpc, 0, sizeof(rpc));
 	rpc.service_name = "git-upload-pack",
 	rpc.argv = argv;
+	rpc.stdin_preamble = &preamble;
 	rpc.gzip_request = 1;
 
 	err = rpc_service(&rpc, heads);
 	if (rpc.result.len)
 		safe_write(1, rpc.result.buf, rpc.result.len);
 	strbuf_release(&rpc.result);
+	strbuf_release(&preamble);
 	free(argv);
 	free(depth_arg);
 	return err;
-- 
1.7.9.5.4.g4f508

^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH/RFC v2 3/4] fetch-pack: test cases for the new --stdin option
  2012-03-27  6:23               ` [PATCH/RFC v2 0/4] Fix fetch-pack command line overflow during clone Ivan Todoroski
  2012-03-27  6:25                 ` [PATCH/RFC v2 1/4] fetch-pack: new --stdin option to read refs from stdin Ivan Todoroski
  2012-03-27  6:26                 ` [PATCH/RFC v2 2/4] remote-curl: send the refs to fetch-pack on stdin Ivan Todoroski
@ 2012-03-27  6:27                 ` Ivan Todoroski
  2012-03-27 17:40                   ` Junio C Hamano
  2012-03-27  6:28                 ` [PATCH/RFC v2 4/4] remote-curl: main test case for the OS command line overflow Ivan Todoroski
  3 siblings, 1 reply; 54+ messages in thread
From: Ivan Todoroski @ 2012-03-27  6:27 UTC (permalink / raw)
  To: Jeff King; +Cc: Shawn Pearce, Nguyen Thai Ngoc Duy, Jakub Narebski, git

These test cases focus only on testing the parsing of refs on stdin,
without bothering with the rest of the fetch-pack machinery. We pass in
the refs using different combinations of command line and stdin and then
we watch fetch-pack's stdout to see whether it prints all the refs we
specified and in the same order as we specified them.

For the --stateless-rpc tests we cannot easily execute fetch-pack to the
end because that would require simulating the remote protocol. We settle
for only checking 2 cases:

1) Whether fetch-pack correctly fails to parse the refs if they are not
terminated by a flush packet

2) Whether fetch-pack finishes parsing the refs without error when they
are correctly terminated by a flush packet

The fetch-pack invocation fails in both cases due to the missing remote
side of the protocol, but it fails in different ways which allows us to
determine how the refs parsing ended by inspecting the different error
messages.
---
 t/t5500-fetch-pack.sh |  100 +++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 100 insertions(+)

diff --git a/t/t5500-fetch-pack.sh b/t/t5500-fetch-pack.sh
index 9bf69e9a0f..f4de7d07c1 100755
--- a/t/t5500-fetch-pack.sh
+++ b/t/t5500-fetch-pack.sh
@@ -248,4 +248,104 @@ test_expect_success 'clone shallow object count' '
 	grep "^count: 52" count.shallow
 '
 
+
+cat >stdin.exp <<EOF
+refs/heads/C
+refs/heads/A
+refs/heads/D
+refs/tags/C
+refs/heads/B
+refs/tags/A
+refs/heads/E
+refs/tags/B
+refs/tags/E
+refs/tags/D
+EOF
+
+test_expect_success 'setup tests for --stdin' '
+	for head in C D E F; do
+		add $head
+	done &&
+	for head in A B C D E F; do
+		git tag $head $head
+	done
+'
+
+test_expect_success 'fetch refs from cmdline, make sure it still works OK' '
+	cd client &&
+	git fetch-pack --no-progress .. $(cat ../stdin.exp) |
+	cut -d " " -f 2 > ../stdin.act &&
+	cd .. &&
+	test_cmp stdin.exp stdin.act
+'
+
+test_expect_success 'fetch refs from stdin' '
+	cd client &&
+	cat ../stdin.exp |
+	git fetch-pack --stdin --no-progress .. |
+	cut -d " " -f 2 > ../stdin.act &&
+	cd .. &&
+	test_cmp stdin.exp stdin.act
+'
+
+test_expect_success 'fetch mixed refs from cmdline and stdin in right order' '
+	cd client &&
+	tail -n +5 ../stdin.exp |
+	git fetch-pack --stdin --no-progress .. $(head -n 4 ../stdin.exp) |
+	cut -d " " -f 2 > ../stdin.act &&
+	cd .. &&
+	test_cmp stdin.exp stdin.act
+'
+
+
+# remove final newline and insert random spaces, NULs, and empty lines
+head -c -1 stdin.exp | sed -e '
+	1i
+	2s/^\|$/   /g
+	4s/^/    /
+	6s/$/    /
+	8s/$/\n     /
+	9s/$/Ztrailing garbage/
+	9s/^\|$/    /g
+' | tr "Z" "\000" > stdin.spaces
+
+test_expect_success 'ignore leading/trailing spaces, empty lines and NULs' '
+	cd client &&
+	cat ../stdin.spaces |
+	git fetch-pack --stdin --no-progress .. |
+	cut -d " " -f 2 > ../stdin.act &&
+	cd .. &&
+	test_cmp stdin.exp stdin.act
+'
+
+
+packet_write() { printf "%04x%s\n" $((${#1} + 5)) "$1"; }
+packet_flush() { printf "0000"; }
+
+cat stdin.exp | while read ref; do
+	packet_write $ref
+done | tee stdin.pkt.bad |
+(packet_flush; packet_write garbage) > stdin.pkt.good
+
+echo "fatal: protocol error: expected sha/ref, got 'garbage'" > stdin.pkt.good.exp
+echo "fatal: The remote end hung up unexpectedly" > stdin.pkt.bad.exp
+
+test_expect_success 'refs on stdin in packetized format' '
+	cd client &&
+	cat ../stdin.pkt.good |
+	test_must_fail git fetch-pack --stdin --stateless-rpc .. \
+	2> ../stdin.pkt.good.act &&
+	cd .. &&
+	test_cmp stdin.pkt.good.exp stdin.pkt.good.act
+'
+
+test_expect_success 'fail if flush packet is missing' '
+	cd client &&
+	cat ../stdin.pkt.bad |
+	test_must_fail git fetch-pack --stdin --stateless-rpc .. \
+	2> ../stdin.pkt.bad.act &&
+	cd .. &&
+	test_cmp stdin.pkt.bad.exp stdin.pkt.bad.act
+'
+
 test_done
-- 
1.7.9.5.4.g4f508

^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH/RFC v2 4/4] remote-curl: main test case for the OS command line overflow
  2012-03-27  6:23               ` [PATCH/RFC v2 0/4] Fix fetch-pack command line overflow during clone Ivan Todoroski
                                   ` (2 preceding siblings ...)
  2012-03-27  6:27                 ` [PATCH/RFC v2 3/4] fetch-pack: test cases for the new --stdin option Ivan Todoroski
@ 2012-03-27  6:28                 ` Ivan Todoroski
  2012-03-27 17:43                   ` Junio C Hamano
  3 siblings, 1 reply; 54+ messages in thread
From: Ivan Todoroski @ 2012-03-27  6:28 UTC (permalink / raw)
  To: Jeff King; +Cc: Shawn Pearce, Nguyen Thai Ngoc Duy, Jakub Narebski, git

This is main test case for the original problem that triggered this
patch series. We create a repo with 50k tags and then test whether
git-clone over the smart HTTP protocol succeeds.

Note that we construct the repo in a slightly different way than the
original script used to reproduce the problem. This is because the
original script just created 50k tags all pointing to the same commit,
so if there was a bug where remote-curl.c was not passing all the refs
to fetch-pack we wouldn't know. The clone would succed even if only one
tag was passed, because all the other tags were pointing at the same SHA
and would be considered present.

Instead we create a repo with 50k independent (dangling) commits and
then tag each of those commits with a unique tag. This way if one of the
tags is not given to fetch-pack, later stages of the clone would
complain about it.

This allows us to test both that the command line overflow was fixed, as
well as that it was fixed in a way that doesn't leave out any of the
refs.
---
 t/t5551-http-fetch.sh |   32 ++++++++++++++++++++++++++++++++
 1 file changed, 32 insertions(+)

diff --git a/t/t5551-http-fetch.sh b/t/t5551-http-fetch.sh
index 26d355725f..fb970bda22 100755
--- a/t/t5551-http-fetch.sh
+++ b/t/t5551-http-fetch.sh
@@ -109,5 +109,37 @@ test_expect_success 'follow redirects (302)' '
 	git clone $HTTPD_URL/smart-redir-temp/repo.git --quiet repo-t
 '
 
+
+test -n "$GIT_TEST_LONG" && test_set_prereq EXPENSIVE
+
+test_expect_success EXPENSIVE 'create 50,000 tags in the repo' '
+	(
+	cd "$HTTPD_DOCUMENT_ROOT_PATH/repo.git" &&
+	N=50000 &&
+	for ((i=1; i<=$N; i++)); do
+		echo "commit refs/heads/too-many-refs"
+		echo "mark :$i"
+		echo "committer git <git@example.com> $i +0000"
+		echo "data 0"
+		echo "M 644 inline bla.txt"
+		echo "data 4"
+		echo "bla"
+		# make every commit dangling by always
+		# rewinding the branch after each commit
+		echo "reset refs/heads/too-many-refs"
+		echo "from :1"
+	done | git fast-import --export-marks=marks &&
+
+	# now assign tags to all the dangling commits we created above
+	export tag=refs/tags/artificially-long-tag-name-to-more-easily-demonstrate-the-problem &&
+	perl -nle '\''/:(.+) (.+)/; print "$2 $ENV{tag}-$1"'\'' < marks >> packed-refs
+	)
+'
+
+test_expect_success EXPENSIVE 'clone the 50,000 tag repo to check OS command line overflow' '
+	git clone $HTTPD_URL/smart/repo.git too-many-refs 2> too-many-refs.err &&
+	test_line_count = 0 too-many-refs.err
+'
+
 stop_httpd
 test_done
-- 
1.7.9.5.4.g4f508

^ permalink raw reply related	[flat|nested] 54+ messages in thread

* Re: Clone fails on a repo with too many heads/tags
  2012-03-26 17:33                             ` Jeff King
@ 2012-03-27  7:07                               ` Ivan Todoroski
  0 siblings, 0 replies; 54+ messages in thread
From: Ivan Todoroski @ 2012-03-27  7:07 UTC (permalink / raw)
  To: Jeff King; +Cc: Shawn Pearce, Nguyen Thai Ngoc Duy, Jakub Narebski, git

On 26.03.2012 19:33, Jeff King wrote:
> On Sun, Mar 25, 2012 at 07:54:19PM +0200, Ivan Todoroski wrote:
> 
>>>>> That's weird. 405 is "Method Not Allowed". Clone shouldn't be doing
>>>>> anything more exotic than GET and POST. And the prior tests pass, so it
>> [...]
>> It's still failing on CentOS 5.8 with Apache 2.2.3. I will just find
>> another machine where the test suite works.
> 
> I'm running the tests with apache 2.2.22. So I wonder if there is some
> config syntax or behavior that is simply different in the much older
> version.

Most likely.

>> If you'd still like to debug the problem on CentOS 5.8 I can run any
>> commands or test any patches for you.
> 
> A few more things to try:
> 
>   - check httpd/access.log to confirm that apache really is returning a
>     405 (I can't imagine that curl would not be reporting it accurately,
>     but it's worth a shot)

Yeah, it has 405 in access log too.

>   - try running with GIT_CURL_VERBOSE=1 to get debug output from curl
> 
>   - try adding these to the apache config:
> 
>       RewriteLog "httpd/rewrite.log"
>       RewriteLogLevel 9
> 
>     which might yield more information.
> 
> I have no idea what you're looking for in any of those, but maybe
> something useful will be obvious. I don't have a lot of apache
> experience, so my next step in your shoes would be just trying to get
> more information on what's happening.

Thanks for the tips. I'll keep tinkering with this as time allows and 
post a patch if I find something.

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH/RFC v2 1/4] fetch-pack: new --stdin option to read refs from stdin
  2012-03-27  6:25                 ` [PATCH/RFC v2 1/4] fetch-pack: new --stdin option to read refs from stdin Ivan Todoroski
@ 2012-03-27 16:59                   ` Junio C Hamano
  2012-03-27 23:18                     ` Ivan Todoroski
  0 siblings, 1 reply; 54+ messages in thread
From: Junio C Hamano @ 2012-03-27 16:59 UTC (permalink / raw)
  To: Ivan Todoroski
  Cc: Jeff King, Shawn Pearce, Nguyen Thai Ngoc Duy, Jakub Narebski,
	git

Ivan Todoroski <grnch@gmx.net> writes:

> sha=$(git rev-parse HEAD)
> for ((i=0; i<50000; i++)); do
> 	echo $sha refs/tags/artificially-long-tag-name-to-more-easily-\
> demonstrate-the-problem-$i >> .git/packed-refs
> done

This comment does not have much to the issue, but the above is better
written as

	commit=$(git rev-parse HEAD) i=0
        while test $i -le 50000
        do
        	echo $commit refs/tags/....-$i
	done >>.git/packed.refs

to stay within the usual Bourne shell.

> 4) Add option --stdin to pass the refs on stdin, one per line.
> ...
> In the end we settled on the following solution:
>
> If --stdin is specified without --stateless-rpc, fetch-pack would read
> the refs from stdin one per line, in a script friendly format.
>
> However if --stdin is specified together with --stateless-rpc,
> fetch-pack would read the refs from stdin in packetized format
> (pkt-line) with a flush packet terminating the list of refs. This way we
> can read the exact number of bytes that we need from stdin, and then
> get_remote_heads() can continue reading from the same fd without losing
> a single byte of remote protocol data.

That sounds like the right way to use pkt-line machinery.  Send stuff you
need to send, and mark it with a flush to tell the receiver that you have
finished giving one logical collection of information.

> This way the --stdin option only loses generality and scriptability when
> used together with --stateless-rpc, which is not easily scriptable
> anyway because it also uses pkt-line when talking to the remote server.
> ---

Just a gentle reminder; the final submission will need your sign off here.

> diff --git a/Documentation/git-fetch-pack.txt b/Documentation/git-fetch-pack.txt
> index ed1bdaacd1..1dd44fd348 100644
> --- a/Documentation/git-fetch-pack.txt
> +++ b/Documentation/git-fetch-pack.txt
> @@ -32,6 +32,16 @@ OPTIONS
>  --all::
>  	Fetch all remote refs.
>  
> +--stdin::
> +	Take the list of refs from stdin, one per line. If there
> +	are refs specified on the command line in addition to this
> +	option, then the refs from stdin are processed after those
> +	on the command line.
> ++
> +If '--stateless-rpc' is specified together with this option then
> +the list of refs must be in packet format (pkt-line) with a flush
> +packet terminating the list.
> +

It is not clear from this description alone if this is a single (possibly
giant) packet with multiple lines, each of which describes a ref, or a
series of one packet per ref with a flush at the end of the sequence.

> diff --git a/builtin/fetch-pack.c b/builtin/fetch-pack.c
> index a4d3e90a86..1a90fa852f 100644
> --- a/builtin/fetch-pack.c
> +++ b/builtin/fetch-pack.c
> @@ -972,6 +978,43 @@ int cmd_fetch_pack(int argc, const char **argv, const char *prefix)
>  	if (!dest)
>  		usage(fetch_pack_usage);
>  
> +	if (args.refs_from_stdin) {
> +		/* copy refs from cmdline to new growable list,
> +		   then append the refs from stdin */

        /*
         * We tend to format our multi-line
         * comments like this
         */

> +		int alloc_heads = nr_heads;
> +		int size = nr_heads * sizeof(*heads);
> +		heads = memcpy(xmalloc(size), heads, size);
> +		if (args.stateless_rpc) {
> +			/* in stateless RPC mode we use pkt-line to read
> +			   from stdin, until we get a flush packet */
> +			static char line[1000];

We will never have a refname that is longer than this limit?

> +			for (;;) {
> +				int n = packet_read_line(0, line, sizeof(line));
> +				if (!n)
> +					break;
> +				if (line[n-1] == '\n')
> +					line[--n] = '\0';
> +				ALLOC_GROW(heads, nr_heads + 1, alloc_heads);
> +				heads[nr_heads++] = xmemdupz(line, n);

Micronit. The use of xmemdupz() here means you do not have to replace LF
with NUL above; decrementing 'n' should be sufficient.

> +			}
> +		}
> +		else {
> +			/* read from stdin one ref per line, until EOF */
> +			struct strbuf line;
> +			strbuf_init(&line, 0);
> +			for (;;) {
> +				if (strbuf_getline(&line, stdin, '\n') == EOF)
> +					break;
> +				strbuf_trim(&line);
> +				if (!line.len)
> +					continue; /* skip empty lines */

Curious.  "stop at EOF", "trim" and "skip empty" imply that you are
catering to people who debug this from the terminal by typing (or copy
pasting).  Is that the expected use case?

Otherwise we may want to tighten this part a bit to forbid cruft (e.g. a
line with leading or trailing whitespaces).

> +				ALLOC_GROW(heads, nr_heads + 1, alloc_heads);
> +				heads[nr_heads++] = strbuf_detach(&line, NULL);
> +			}
> +			strbuf_release(&line);
> +		}
> +	}
> +
>  	if (args.stateless_rpc) {
>  		conn = NULL;
>  		fd[0] = 0;
> diff --git a/fetch-pack.h b/fetch-pack.h
> index 0608edae3f..292d69389e 100644
> --- a/fetch-pack.h
> +++ b/fetch-pack.h
> @@ -13,7 +13,8 @@ struct fetch_pack_args {
>  		verbose:1,
>  		no_progress:1,
>  		include_tag:1,
> -		stateless_rpc:1;
> +		stateless_rpc:1,
> +		refs_from_stdin:1;
>  };
>  
>  struct ref *fetch_pack(struct fetch_pack_args *args,

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH/RFC v2 2/4] remote-curl: send the refs to fetch-pack on stdin
  2012-03-27  6:26                 ` [PATCH/RFC v2 2/4] remote-curl: send the refs to fetch-pack on stdin Ivan Todoroski
@ 2012-03-27 17:18                   ` Junio C Hamano
  2012-03-27 23:20                     ` Ivan Todoroski
  0 siblings, 1 reply; 54+ messages in thread
From: Junio C Hamano @ 2012-03-27 17:18 UTC (permalink / raw)
  To: Ivan Todoroski
  Cc: Jeff King, Shawn Pearce, Nguyen Thai Ngoc Duy, Jakub Narebski,
	git

Ivan Todoroski <grnch@gmx.net> writes:

> If a remote repo has too many tags (or branches), cloning it over the
> smart HTTP transport can fail because remote-curl.c puts all the refs
> from the remote repo on the fetch-pack command line. This can make the
> command line longer than the global OS command line limit, causing
> fetch-pack to fail.
>
> This is especially a problem on Windows where the command line limit is
> orders of magnitude shorter than Linux. There are already real repos out
> there that msysGit cannot clone over smart HTTP due to this problem.
>
> To solve this problem we teach remote-curl.c to pipe the refs to
> fetch-pack using the new --stdin option, instead of on the fetch-pack
> command line.
>
> For a more detailed discussion of the problem see the parent of this
> commit, titled "fetch-pack: new --stdin option to read refs from stdin".

Thanks.  As there is no way for this single commit to be applied without
the previous step, you could have just said:

	Now we can throw arbitrary number of refs at fetch-pack using
        its --stdin option, use it in remote-curl helper to lift the
        command line length limit.

or something ;-).

> @@ -626,6 +630,7 @@ static int fetch_git(struct discovery *heads,
>  	int nr_heads, struct ref **to_fetch)
>  {
>  	struct rpc_state rpc;
> +	struct strbuf preamble;
>  	char *depth_arg = NULL;
>  	const char **argv;
>  	int argc = 0, i, err;
> @@ -633,6 +638,7 @@ static int fetch_git(struct discovery *heads,
>  	argv = xmalloc((15 + nr_heads) * sizeof(char*));
>  	argv[argc++] = "fetch-pack";
>  	argv[argc++] = "--stateless-rpc";
> +	argv[argc++] = "--stdin";
>  	argv[argc++] = "--lock-pack";
>  	if (options.followtags)
>  		argv[argc++] = "--include-tag";
> @@ -651,23 +657,28 @@ static int fetch_git(struct discovery *heads,
>  		argv[argc++] = depth_arg;
>  	}
>  	argv[argc++] = url;
> +	argv[argc++] = NULL;
> +
> +	strbuf_init(&preamble, 4);

Curious.

If "4" does not really matter, I would drop this and use STRBUF_INIT at
the beginning instead.

>  	for (i = 0; i < nr_heads; i++) {
>  		struct ref *ref = to_fetch[i];
>  		if (!ref->name || !*ref->name)
>  			die("cannot fetch by sha1 over smart http");
> -		argv[argc++] = ref->name;
> +		packet_buf_write(&preamble, "%s\n", ref->name);
>  	}
> -	argv[argc++] = NULL;
> +	packet_buf_flush(&preamble);
>  
>  	memset(&rpc, 0, sizeof(rpc));
>  	rpc.service_name = "git-upload-pack",
>  	rpc.argv = argv;
> +	rpc.stdin_preamble = &preamble;
>  	rpc.gzip_request = 1;
>  
>  	err = rpc_service(&rpc, heads);
>  	if (rpc.result.len)
>  		safe_write(1, rpc.result.buf, rpc.result.len);
>  	strbuf_release(&rpc.result);
> +	strbuf_release(&preamble);
>  	free(argv);
>  	free(depth_arg);
>  	return err;

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH/RFC v2 3/4] fetch-pack: test cases for the new --stdin option
  2012-03-27  6:27                 ` [PATCH/RFC v2 3/4] fetch-pack: test cases for the new --stdin option Ivan Todoroski
@ 2012-03-27 17:40                   ` Junio C Hamano
  2012-03-27 23:36                     ` Ivan Todoroski
  0 siblings, 1 reply; 54+ messages in thread
From: Junio C Hamano @ 2012-03-27 17:40 UTC (permalink / raw)
  To: Ivan Todoroski
  Cc: Jeff King, Shawn Pearce, Nguyen Thai Ngoc Duy, Jakub Narebski,
	git

Ivan Todoroski <grnch@gmx.net> writes:

> These test cases focus only on testing the parsing of refs on stdin,
> without bothering with the rest of the fetch-pack machinery. We pass in
> the refs using different combinations of command line and stdin and then
> we watch fetch-pack's stdout to see whether it prints all the refs we
> specified and in the same order as we specified them.

Thanks for writing what it does concisely and clearly.  It makes it very
pleasant to review the patch.

It is sensible to expect that we see all the refs we told it to fetch, but
I do not think it is sensible to require they come in the same order as we
have given them to the command.

> For the --stateless-rpc tests we cannot easily execute fetch-pack to the
> end because that would require simulating the remote protocol. We settle
> for only checking 2 cases:
>
> 1) Whether fetch-pack correctly fails to parse the refs if they are not
> terminated by a flush packet
>
> 2) Whether fetch-pack finishes parsing the refs without error when they
> are correctly terminated by a flush packet
>
> The fetch-pack invocation fails in both cases due to the missing remote
> side of the protocol, but it fails in different ways which allows us to
> determine how the refs parsing ended by inspecting the different error
> messages.

Ick.

> ---
>  t/t5500-fetch-pack.sh |  100 +++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 100 insertions(+)
>
> diff --git a/t/t5500-fetch-pack.sh b/t/t5500-fetch-pack.sh
> index 9bf69e9a0f..f4de7d07c1 100755
> --- a/t/t5500-fetch-pack.sh
> +++ b/t/t5500-fetch-pack.sh
> @@ -248,4 +248,104 @@ test_expect_success 'clone shallow object count' '
>  	grep "^count: 52" count.shallow
>  '
>  
> +
> +cat >stdin.exp <<EOF
> +refs/heads/C
> +refs/heads/A
> +refs/heads/D
> +refs/tags/C
> +refs/heads/B
> +refs/tags/A
> +refs/heads/E
> +refs/tags/B
> +refs/tags/E
> +refs/tags/D
> +EOF
> +
> +test_expect_success 'setup tests for --stdin' '
> +	for head in C D E F; do
> +		add $head
> +	done &&
> +	for head in A B C D E F; do
> +		git tag $head $head
> +	done
> +'

Drop ";" and write "do" on its own line with proper indentation.

> +
> +test_expect_success 'fetch refs from cmdline, make sure it still works OK' '
> +	cd client &&
> +	git fetch-pack --no-progress .. $(cat ../stdin.exp) |
> +	cut -d " " -f 2 > ../stdin.act &&
> +	cd .. &&
> +	test_cmp stdin.exp stdin.act
> +'

 - Do not chdir around without being in a subprocess ();

 - Do not place the command you are testing that might crash on the
   upstream of the pipe;

 - style;

i.e.

	(
		cd client &&
                git fetch-pack --no-progress .. $(cat ../stdin.exp) >../stdin.raw
	) &&
        cut -d " " -f 2 <stdin.raw | sort >stdin.act &&
	test_cmp stdin.exp stdin.act

Note that I lifted the "in the same order" requirement, which should not
be there.  You may need to adjust the hardcoded stdin.exp to be sorted.

> +test_expect_success 'fetch refs from stdin' '
> +	cd client &&
> +	cat ../stdin.exp |
> +	git fetch-pack --stdin --no-progress .. |
> +	cut -d " " -f 2 > ../stdin.act &&
> +	cd .. &&
> +	test_cmp stdin.exp stdin.act
> +'

In addition to the comments on the previous one:

 - Do not pipe output from cat.

i.e.

	(
		cd client &&
                git fetch-pack ... <../stdin.exp >stdin.raw
	) &&
	cut -d " " -f 2 <stdin.raw | sort >stdin.act &&
	test_cmp stdin.exp stdin.act

By the way, why are these not called "expect" and "actual" like most other
tests?

> +test_expect_success 'fetch mixed refs from cmdline and stdin in right order' '
> +	cd client &&
> +	tail -n +5 ../stdin.exp |
> +	git fetch-pack --stdin --no-progress .. $(head -n 4 ../stdin.exp) |
> +	cut -d " " -f 2 > ../stdin.act &&
> +	cd .. &&
> +	test_cmp stdin.exp stdin.act
> +'

Ditto.

Do we want to have a separate test to see what happens when there are dups
in the input?

> +# remove final newline and insert random spaces, NULs, and empty lines
> +head -c -1 stdin.exp | sed -e '
> +	1i
> +	2s/^\|$/   /g
> +	4s/^/    /
> +	6s/$/    /
> +	8s/$/\n     /
> +	9s/$/Ztrailing garbage/
> +	9s/^\|$/    /g
> +' | tr "Z" "\000" > stdin.spaces

Somebody may want to try this sed expression on Macs and BSDs, but more
importantly...

> +test_expect_success 'ignore leading/trailing spaces, empty lines and NULs' '
> +	cd client &&
> +	cat ../stdin.spaces |
> +	git fetch-pack --stdin --no-progress .. |
> +	cut -d " " -f 2 > ../stdin.act &&
> +	cd .. &&
> +	test_cmp stdin.exp stdin.act
> +'

... it is unclear why it is a good thing to let the input with garbage
produce a result instead of erroring out.

The rest of the patch skimmed but not thoroughly reviewed; seems to have
similar issues as already have been pointed out above.

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH/RFC v2 4/4] remote-curl: main test case for the OS command line overflow
  2012-03-27  6:28                 ` [PATCH/RFC v2 4/4] remote-curl: main test case for the OS command line overflow Ivan Todoroski
@ 2012-03-27 17:43                   ` Junio C Hamano
  0 siblings, 0 replies; 54+ messages in thread
From: Junio C Hamano @ 2012-03-27 17:43 UTC (permalink / raw)
  To: Ivan Todoroski
  Cc: Jeff King, Shawn Pearce, Nguyen Thai Ngoc Duy, Jakub Narebski,
	git

Ivan Todoroski <grnch@gmx.net> writes:

> +test_expect_success EXPENSIVE 'create 50,000 tags in the repo' '
> +	(
> +	cd "$HTTPD_DOCUMENT_ROOT_PATH/repo.git" &&
> +	N=50000 &&
> +	for ((i=1; i<=$N; i++)); do

That's bash-ism isn't it?

	N=50000 I=1
        while test $I -le $N
        do
		...
                I=$(( $I + 1 ))
	done

The body of the test itself looked reasonable.

Thanks.

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH/RFC v2 1/4] fetch-pack: new --stdin option to read refs from stdin
  2012-03-27 16:59                   ` Junio C Hamano
@ 2012-03-27 23:18                     ` Ivan Todoroski
  2012-03-27 23:26                       ` Junio C Hamano
  0 siblings, 1 reply; 54+ messages in thread
From: Ivan Todoroski @ 2012-03-27 23:18 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Jeff King, Shawn Pearce, Nguyen Thai Ngoc Duy, Jakub Narebski,
	git

On 27.03.2012 18:59, Junio C Hamano wrote:
> Ivan Todoroski <grnch@gmx.net> writes:
> 
>> +		int alloc_heads = nr_heads;
>> +		int size = nr_heads * sizeof(*heads);
>> +		heads = memcpy(xmalloc(size), heads, size);
>> +		if (args.stateless_rpc) {
>> +			/* in stateless RPC mode we use pkt-line to read
>> +			   from stdin, until we get a flush packet */
>> +			static char line[1000];
> 
> We will never have a refname that is longer than this limit?

I don't know. I grepped the code for existing usages of packet_read_line 
and that seemed to be a common idiom everywhere. Should I just bump up 
the size or is there some accepted way to read arbitrary length packets?

>> +			}
>> +		}
>> +		else {
>> +			/* read from stdin one ref per line, until EOF */
>> +			struct strbuf line;
>> +			strbuf_init(&line, 0);
>> +			for (;;) {
>> +				if (strbuf_getline(&line, stdin, '\n') == EOF)
>> +					break;
>> +				strbuf_trim(&line);
>> +				if (!line.len)
>> +					continue; /* skip empty lines */
> 
> Curious.  "stop at EOF", "trim" and "skip empty" imply that you are
> catering to people who debug this from the terminal by typing (or copy
> pasting).  Is that the expected use case?

The expected use case is people using this from shell scripts that could 
be getting refs by slicing and dicing output of other commands with 
regexps and what not which could leave some whitespace here and there, 
so a more liberal interface might be more friendly to such script writers.

Currently you would pass a list of generated refs to fetch-pack using 
something like this:

     generate-refs | xargs fetch-pack

or this:

     fetch-pack $(generate-refs)

Both of these commands will ignore any extra whitespace produced by 
"generate-refs".

Since --stdin is meant to be a spiritual replacement for the two 
commands above, I thought it should behave in a similar spirit.

At least that was the reasoning... if you're not swayed by it I can just 
remove those lines and not tolerate any extra whitespace.

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH/RFC v2 2/4] remote-curl: send the refs to fetch-pack on stdin
  2012-03-27 17:18                   ` Junio C Hamano
@ 2012-03-27 23:20                     ` Ivan Todoroski
  0 siblings, 0 replies; 54+ messages in thread
From: Ivan Todoroski @ 2012-03-27 23:20 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Jeff King, Shawn Pearce, Nguyen Thai Ngoc Duy, Jakub Narebski,
	git

On 27.03.2012 19:18, Junio C Hamano wrote:
> Ivan Todoroski <grnch@gmx.net> writes:
> 
>> +	strbuf_init(&preamble, 4);
> 
> Curious.
> 
> If "4" does not really matter, I would drop this and use STRBUF_INIT at
> the beginning instead.

I only put 4 because I know the strbuf will always have at least 4 
bytes, because it always includes a flush packet. It doesn't matter at 
all otherwise, I will do as you suggest.

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH/RFC v2 1/4] fetch-pack: new --stdin option to read refs from stdin
  2012-03-27 23:18                     ` Ivan Todoroski
@ 2012-03-27 23:26                       ` Junio C Hamano
  2012-03-27 23:48                         ` Ivan Todoroski
  0 siblings, 1 reply; 54+ messages in thread
From: Junio C Hamano @ 2012-03-27 23:26 UTC (permalink / raw)
  To: Ivan Todoroski
  Cc: Jeff King, Shawn Pearce, Nguyen Thai Ngoc Duy, Jakub Narebski,
	git

Ivan Todoroski <grnch@gmx.net> writes:

> On 27.03.2012 18:59, Junio C Hamano wrote:
>> Ivan Todoroski <grnch@gmx.net> writes:
>>
>>> +		int alloc_heads = nr_heads;
>>> +		int size = nr_heads * sizeof(*heads);
>>> +		heads = memcpy(xmalloc(size), heads, size);
>>> +		if (args.stateless_rpc) {
>>> +			/* in stateless RPC mode we use pkt-line to read
>>> +			   from stdin, until we get a flush packet */
>>> +			static char line[1000];
>>
>> We will never have a refname that is longer than this limit?
>
> I don't know. I grepped the code for existing usages of packet_read_line
> and that seemed to be a common idiom everywhere. Should I just bump up
> the size or is there some accepted way to read arbitrary length packets?

As you may have already guessed, you can have up to (64k - slop) bytes in
a packet.  But it probably does not matter, as a loong ref that bust your
limit will not fit in the packet used in the normal codepath anyway.

>> Curious.  "stop at EOF", "trim" and "skip empty" imply that you are
>> catering to people who debug this from the terminal by typing (or copy
>> pasting).  Is that the expected use case?
>
> The expected use case is people using this from shell scripts that
> could be getting refs by slicing and dicing output of other commands
> with regexps and what not

That kind of use would not have leading or trailing whitespaces, and if
the target audience is scripts, I would prefer to force them to be
strict.

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH/RFC v2 3/4] fetch-pack: test cases for the new --stdin option
  2012-03-27 17:40                   ` Junio C Hamano
@ 2012-03-27 23:36                     ` Ivan Todoroski
  2012-03-27 23:43                       ` Junio C Hamano
  2012-03-28  0:14                       ` Ivan Todoroski
  0 siblings, 2 replies; 54+ messages in thread
From: Ivan Todoroski @ 2012-03-27 23:36 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Jeff King, Shawn Pearce, Nguyen Thai Ngoc Duy, Jakub Narebski,
	git

On 27.03.2012 19:40, Junio C Hamano wrote:
> 
> It is sensible to expect that we see all the refs we told it to fetch, but
> I do not think it is sensible to require they come in the same order as we
> have given them to the command.

Oh, OK. I was not aware that fetch-pack is free to reorder commits (at 
least in principle). I will adjust the tests to be order-independent.

>> For the --stateless-rpc tests we cannot easily execute fetch-pack to the
>> end because that would require simulating the remote protocol. We settle
>> for only checking 2 cases:
>>
>> 1) Whether fetch-pack correctly fails to parse the refs if they are not
>> terminated by a flush packet
>>
>> 2) Whether fetch-pack finishes parsing the refs without error when they
>> are correctly terminated by a flush packet
>>
>> The fetch-pack invocation fails in both cases due to the missing remote
>> side of the protocol, but it fails in different ways which allows us to
>> determine how the refs parsing ended by inspecting the different error
>> messages.
> 
> Ick.

Yeah... I couldn't figure out a way to do an isolated test of the 
packetized version of --stdin when --stateless-rpc is also in effect. 
Any guidance here would be welcome.

On second thought, maybe we can just drop these two --stateless-rpc 
tests from this patch? The "git clone" test in the next patch also 
exercises the packetized refs in --stateless-rpc mode and if there was 
anything wrong with them it would fail.

>> +
>> +test_expect_success 'fetch refs from cmdline, make sure it still works OK' '
>> +	cd client &&
>> +	git fetch-pack --no-progress .. $(cat ../stdin.exp) |
>> +	cut -d " " -f 2 > ../stdin.act &&
>> +	cd .. &&
>> +	test_cmp stdin.exp stdin.act
>> +'
> 
>  - Do not chdir around without being in a subprocess ();

Sorry, I didn't realize the tests were eval-ed in the current 
environment. I will correct all such problems in the next version.

>  - Do not place the command you are testing that might crash on the
>    upstream of the pipe;
> 
>  - style;

Noted.

> 	(
> 		cd client &&
>                 git fetch-pack ... <../stdin.exp >stdin.raw
> 	) &&
> 	cut -d " " -f 2 <stdin.raw | sort >stdin.act &&
> 	test_cmp stdin.exp stdin.act
> 
> By the way, why are these not called "expect" and "actual" like most other
> tests?

The test files I worked with used the shorter exp/act convention so I 
followed that.

Or are you wondering about the "stdin." prefix I added? That was because 
I didn't want to overwrite any exp/act files that might be created by 
the previous tests, or is that not a concern?

Sorry, my inexperience with the Git test framework is showing...

> Do we want to have a separate test to see what happens when there are dups
> in the input?

Good idea. I will check what happens in the current fetch-pack if 
duplicate refs are passed on the command line and will write a test to 
ensure that the --stdin version does the same thing.


P.S.
It goes without saying that I implicitly accept all the corrections that 
I did not reply to and will include them in the next version.

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH/RFC v2 3/4] fetch-pack: test cases for the new --stdin option
  2012-03-27 23:36                     ` Ivan Todoroski
@ 2012-03-27 23:43                       ` Junio C Hamano
  2012-03-28  0:14                       ` Ivan Todoroski
  1 sibling, 0 replies; 54+ messages in thread
From: Junio C Hamano @ 2012-03-27 23:43 UTC (permalink / raw)
  To: Ivan Todoroski
  Cc: Junio C Hamano, Jeff King, Shawn Pearce, Nguyen Thai Ngoc Duy,
	Jakub Narebski, git

Ivan Todoroski <grnch@gmx.net> writes:

> On second thought, maybe we can just drop these two --stateless-rpc
> tests from this patch? The "git clone" test in the next patch also
> exercises the packetized refs in --stateless-rpc mode and if there was
> anything wrong with them it would fail.

Yeah, I was thinking about the same.

>>> +
>>> +test_expect_success 'fetch refs from cmdline, make sure it still works OK' '
>>> +	cd client &&
>>> +	git fetch-pack --no-progress .. $(cat ../stdin.exp) |
>>> +	cut -d " " -f 2 > ../stdin.act &&
>>> +	cd .. &&
>>> +	test_cmp stdin.exp stdin.act
>>> +'
>>
>>  - Do not chdir around without being in a subprocess ();
>
> Sorry, I didn't realize the tests were eval-ed in the current
> environment. I will correct all such problems in the next version.
>
>>  - Do not place the command you are testing that might crash on the
>>    upstream of the pipe;
>>
>>  - style;
>
> Noted.
>
>> 	(
>> 		cd client &&
>>                 git fetch-pack ... <../stdin.exp >stdin.raw
>> 	) &&
>> 	cut -d " " -f 2 <stdin.raw | sort >stdin.act &&
>> 	test_cmp stdin.exp stdin.act
>>
>> By the way, why are these not called "expect" and "actual" like most other
>> tests?
>
> The test files I worked with used the shorter exp/act convention so I
> followed that.
>
> Or are you wondering about the "stdin." prefix I added?

No.  I was referring to just that these two files were not literally named
"expect"/"actual", and I was lazy to look beyond what was in the patch
context ;-).  If the surrounding tests uses exp/act, mimicking them in
this patch is a good idea (we may want to fix them later but that is a
separate topic, and should not be done in this patch).

Thanks.

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH/RFC v2 1/4] fetch-pack: new --stdin option to read refs from stdin
  2012-03-27 23:26                       ` Junio C Hamano
@ 2012-03-27 23:48                         ` Ivan Todoroski
  0 siblings, 0 replies; 54+ messages in thread
From: Ivan Todoroski @ 2012-03-27 23:48 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Jeff King, Shawn Pearce, Nguyen Thai Ngoc Duy, Jakub Narebski,
	git

On 28.03.2012 01:26, Junio C Hamano wrote:
> Ivan Todoroski <grnch@gmx.net> writes:
> 
>>> Curious.  "stop at EOF", "trim" and "skip empty" imply that you are
>>> catering to people who debug this from the terminal by typing (or copy
>>> pasting).  Is that the expected use case?
>> The expected use case is people using this from shell scripts that
>> could be getting refs by slicing and dicing output of other commands
>> with regexps and what not
> 
> That kind of use would not have leading or trailing whitespaces, and if
> the target audience is scripts, I would prefer to force them to be
> strict.

Depends on how good the script writer is. :)

But you're right, they can always change the script to tighten up what 
it generates.

Also, I just checked what other commands with --stdin flags do (like 
rev-list, checkout-index, etc). They all expect an exact format like you 
said and they don't tolerate extra whitespace, making my patch the odd 
one out. It is important that similar flags behave consistently across 
different commands.

Sorry for not checking for precedent before deciding to tolerate 
whitespace. I will tighten it up.

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH/RFC v2 3/4] fetch-pack: test cases for the new --stdin option
  2012-03-27 23:36                     ` Ivan Todoroski
  2012-03-27 23:43                       ` Junio C Hamano
@ 2012-03-28  0:14                       ` Ivan Todoroski
  1 sibling, 0 replies; 54+ messages in thread
From: Ivan Todoroski @ 2012-03-28  0:14 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Jeff King, Shawn Pearce, Nguyen Thai Ngoc Duy, Jakub Narebski,
	git

On 28.03.2012 01:36, Ivan Todoroski wrote:
> On 27.03.2012 19:40, Junio C Hamano wrote:
>> 
>> It is sensible to expect that we see all the refs we told it to
>> fetch, but I do not think it is sensible to require they come in
>> the same order as we have given them to the command.
> 
> Oh, OK. I was not aware that fetch-pack is free to reorder commits
> (at least in principle). I will adjust the tests to be
> order-independent.

s/reorder commits/reorder refs for fetching/

I didn't mean actually reordering commits in history, of course.

^ permalink raw reply	[flat|nested] 54+ messages in thread

end of thread, other threads:[~2012-03-28  0:14 UTC | newest]

Thread overview: 54+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2012-03-18  8:14 Clone fails on a repo with too many heads/tags Ivan Todoroski
2012-03-18 11:37 ` Ivan Todoroski
2012-03-18 12:04   ` Nguyen Thai Ngoc Duy
2012-03-18 16:36 ` Jakub Narebski
2012-03-18 19:07   ` Jeff King
2012-03-18 22:07     ` Jakub Narebski
2012-03-19  2:32       ` Jeff King
2012-03-19  2:43         ` Nguyen Thai Ngoc Duy
2012-03-19  2:45           ` Jeff King
2012-03-19  1:05     ` Ivan Todoroski
2012-03-19  1:30     ` Nguyen Thai Ngoc Duy
2012-03-19  2:44       ` Jeff King
2012-03-21 11:05         ` Ivan Todoroski
2012-03-21 14:28           ` Shawn Pearce
2012-03-21 17:14             ` Jeff King
2012-03-21 17:59               ` Jakub Narebski
2012-03-21 20:02               ` Ivan Todoroski
2012-03-21 20:17                 ` Jeff King
2012-03-24 20:49                   ` Ivan Todoroski
2012-03-25  1:06                     ` Jeff King
2012-03-25  2:32                       ` Jeff King
2012-03-25 17:33                         ` Ivan Todoroski
2012-03-25 17:54                           ` Ivan Todoroski
2012-03-26 17:33                             ` Jeff King
2012-03-27  7:07                               ` Ivan Todoroski
2012-03-25 15:30                       ` Ivan Todoroski
2012-03-24 20:53                   ` [PATCH/RFC 1/2] fetch-pack: new option to read refs from stdin Ivan Todoroski
2012-03-25  1:19                     ` Jeff King
2012-03-25  9:39                       ` Ivan Todoroski
2012-03-25 15:15                         ` Ivan Todoroski
2012-03-25 20:00                       ` Ivan Todoroski
2012-03-26 17:21                         ` Jeff King
2012-03-26 17:49                           ` Ivan Todoroski
2012-03-26 17:51                             ` Jeff King
2012-03-24 20:54                   ` [PATCH/RFC 2/2] remote-curl: send the refs to fetch-pack on stdin Ivan Todoroski
2012-03-25  1:24                     ` Jeff King
2012-03-25  9:52                       ` Ivan Todoroski
2012-03-26 17:24                         ` Jeff King
2012-03-27  6:23               ` [PATCH/RFC v2 0/4] Fix fetch-pack command line overflow during clone Ivan Todoroski
2012-03-27  6:25                 ` [PATCH/RFC v2 1/4] fetch-pack: new --stdin option to read refs from stdin Ivan Todoroski
2012-03-27 16:59                   ` Junio C Hamano
2012-03-27 23:18                     ` Ivan Todoroski
2012-03-27 23:26                       ` Junio C Hamano
2012-03-27 23:48                         ` Ivan Todoroski
2012-03-27  6:26                 ` [PATCH/RFC v2 2/4] remote-curl: send the refs to fetch-pack on stdin Ivan Todoroski
2012-03-27 17:18                   ` Junio C Hamano
2012-03-27 23:20                     ` Ivan Todoroski
2012-03-27  6:27                 ` [PATCH/RFC v2 3/4] fetch-pack: test cases for the new --stdin option Ivan Todoroski
2012-03-27 17:40                   ` Junio C Hamano
2012-03-27 23:36                     ` Ivan Todoroski
2012-03-27 23:43                       ` Junio C Hamano
2012-03-28  0:14                       ` Ivan Todoroski
2012-03-27  6:28                 ` [PATCH/RFC v2 4/4] remote-curl: main test case for the OS command line overflow Ivan Todoroski
2012-03-27 17:43                   ` Junio C Hamano

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).