Git development

Git development
 help / color / mirror / Atom feed

* Re: t5800-*.sh: Intermittent test failures
From: Sverre Rabbelier @ 2011-11-02 23:35 UTC (permalink / raw)
  To: Alex Riesen, Ævar Arnfjörð
  Cc: Junio C Hamano, Ramsay Jones, Jeff King, GIT Mailing-list,
	Jonathan Nieder
In-Reply-To: <CALxABCbKSi-aHezjyn5wJ0-BPW1PvvaC2i9VeV7yXOf4yCdx4Q@mail.gmail.com>

Heya,

On Wed, Nov 2, 2011 at 00:02, Alex Riesen <raa.lkml@gmail.com> wrote:
> On Tue, Nov 1, 2011 at 23:18, Junio C Hamano <gitster@pobox.com> wrote:
>> Alex Riesen <raa.lkml@gmail.com> writes:
>>
>>> On Sun, Sep 11, 2011 at 21:14, Ramsay Jones <ramsay@ramsay1.demon.co.uk> wrote:
>>>> ... these hangs *are* the failures of which I speak!  Yes, the script
>>>> doesn't get to declare a failure, but AFAIAC a hanging test (and it
>>>> isn't the same test # each time) is a failing test. :-D
>>>
>>> Was there any outcome of this discussion? I'm asking because I
>>> can reproduce this very reliably on a little server here.
>>
>> I do remember this discussion and recall seeing _no_ outcome.
>>
>> I did see the hang myself once or twice but did not and do not have a
>> reliable reproduction. I have been waiting for somebody to raise the issue
>> again ;-).
>>
>
> I think I managed to bisect it (between 1.7.6 and 1.7.7):
>
> $ git bisect start v1.7.7 v1.7.6
> ...
> $ git bisect good
> a515ebe9f1ac9bc248c12a291dc008570de505ca is the first bad commit
> commit a515ebe9f1ac9bc248c12a291dc008570de505ca
> Author: Sverre Rabbelier <srabbelier@gmail.com>
> Date:   Sat Jul 16 15:03:40 2011 +0200
>
>    transport-helper: implement marks location as capability
>
>    Now that the gitdir location is exported as an environment variable
>    this can be implemented elegantly without requiring any explicit
>    flushes nor an ad-hoc exchange of values.
>
>    Signed-off-by: Sverre Rabbelier <srabbelier@gmail.com>
>    Acked-by: Jeff King <peff@peff.net>
>    Signed-off-by: Junio C Hamano <gitster@pobox.com>
>
> :100644 100644 1ed7a5651ef5a2320c56856b5a1fe784e178ab23
> e9c832bfd3da7db771cc2113027d3e590dc51d59 M      git-remote-testgit.py
> :100644 100644 0cfc9ae9059ce121b567406d7941b71cd54b961c
> 74c3122df1835c45a6b621205fb18b4fc89af366 M      transport-helper.c
>
> Sadly, I'm going to be able to repeat the test in about 20 hours.

Ævar, this seems like something we could look at during the mini
GitTogether in Amsterdam this Saturday, no?

-- 
Cheers,

Sverre Rabbelier

^ permalink raw reply

* Re: [git patches] libata updates, GPG signed (but see admin notes)
From: Junio C Hamano @ 2011-11-02 23:34 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: git, James Bottomley, Jeff Garzik, Andrew Morton, linux-ide, LKML
In-Reply-To: <CA+55aFx_rAA6TJkZn1Zvu6u9UjxnmTVt0HpMnvaE_q9Sx-jzPg@mail.gmail.com>

Linus Torvalds <torvalds@linux-foundation.org> writes:

> I hate how anonymous our branches are. Sure, we can use good names for
> them, but it was a mistake to think we should describe the repository
> (for gitweb), rather than the branch.
>
> Ok, "hate" is a strong word. I don't "hate" it. I don't even think
> it's a major design issue. But I do think that it would have been
> nicer if we had had some branch description model.
> ...
> Maybe just verifying the email message (with the suggested kind of
> change to "git request-pull") is actually the right approach. And what
> I should do is to just wrap my "git pull" in some script that I can
> just cut-and-paste the gpg-signed thing into, and which just does the
> "gpg --verify" on it, and then does the "git pull" after that.
>
> Because in many ways, "git request-pull" is when you do want to sign
> stuff. A developer might well want to push out his stuff for some
> random internal testing (linux-next, for example), and then only later
> decide "Ok, it was all good, now I want to make it 'official' and ask
> Linus to pull it", and sign it at *that* time, rather than when
> actually pushing it out.

You keep saying cut-and-paste, but do you mind feeding the e-mail text
itself to a tool, instead of cut-and-paste?

The reason I am wondering about this is because in another topic (also in
'next') cooking there is an extended support for topic description for the
branch that states what the purpose of the topic is why the requestor
wants you to have it (this information can be set and updated with "git
branch --edit-description").

A respond-to-request-pull wrapper you would use could be:

 - Get the e-mail from the standard input;
 - Pick up the signed bits and validate the signature;
 - Perform the requested fetch; and
 - Record the merge (or prepare .git/MERGE_MSG) with both the signed bits.

and the "signed bits" could include:

   - the repository and the branch you were expected to pull;
   - the topic description.

among other things the requestor can edit when request-pull message is
prepared.

That would get us back to your "the lieutenant tip is not so special, but
the merge commit the integrator makes using that tip has the signature for
this particular pull" model.

^ permalink raw reply

* Re: New Feature wanted: Is it possible to let git clone continue last break point?
From: Jeff King @ 2011-11-02 23:27 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jonathan Nieder, netroby, Git Mail List, Tomas Carnecky
In-Reply-To: <7vwrbigna7.fsf@alter.siamese.dyndns.org>

On Wed, Nov 02, 2011 at 03:41:36PM -0700, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > Which is all a roundabout way of saying that the git protocol is really
> > the sane way to do efficient transfers. An alternative, much simpler
> > scheme would be for the server to just say:
> >
> >   - if you have nothing, then prime with URL http://host/bundle
> >
> > And then _only_ clone would bother with checking mirrors. People doing
> > fetch would be expected to do it often enough that not being resumable
> > isn't a big deal.
> 
> I think that is a sensible place to start.

OK. That had been my original intent, but somebody (you?) mentioned the
"if you have X" thing at the GitTogether, which got me thinking.

I don't mind starting slow, as long as we don't paint ourselves into a
corner for future expansion. I'll try to design the data format for
specifying the mirror locations with that extension in mind.

Even if the bundle thing ends up too wasteful, it may still be useful to
offer a "if you don't have X, go see Y" type of mirror when "Y" is
something efficient, like git:// at a faster host (i.e., the "I built 3
commits on top of Linus" case).

> A more fancy conditional "If you have X then fetch this, if you have Y
> fetch that, ..." sounds nice but depending on what branch you are fetching
> the answer has to be different. If we were to do that, the natural place
> for the server to give the redirect instruction to the client is after the
> client finishes saying "want", and before the client starts saying "have".

Agreed. I was really trying to avoid protocol extensions, though, at
least for an initial version. I'd like to see how far we can get doing
the simplest thing.

-Peff

^ permalink raw reply

* Re: git-p4: problem with commit 97a21ca50ef8
From: Michael Wookey @ 2011-11-02 22:42 UTC (permalink / raw)
  To: Vitor Antunes; +Cc: Pete Wyckoff, Git Mailing List, Luke Diamand
In-Reply-To: <loom.20111102T153631-769@post.gmane.org>

On 3 November 2011 01:43, Vitor Antunes <vitor.hda@gmail.com> wrote:
> Michael Wookey <michaelwookey <at> gmail.com> writes:
>> Of course, I'd love to have git-p4 work seamlessly for this scenario.
>> Even Perforce have a KB article on the limitation of the "apple"
>> filetype with git-p4:
>>
>>   http://kb.perforce.com/article/1417/git-p4
>>
> """
> Step 2: Download Git-p4
>
> Recommended version is ermshiperete’s branch, which is available from:
>
> https://github.com/ermshiperete/git-p4
>
> Note: Omit the “git-p4.py25” file, which is an older version that is no longer
> needed.
> Avoid Kernel.org’s Version of Git-p4
>
> Git’s main source at http://git-scm.com/download and
> http://www.kernel.org/pub/software/scm/git/ contains an older version of Git-p4
> with limitations that ermshiperete’s branch avoids.
> """
>
> I can almost guess _who_ wrote this KB ;)
>
> But this is really frustrating. Why can't people just cooperate to make sure the
> version in the main branch is the latest?

I tried your suggested version of git-p4 (at rev 630fb678c46c) and
unfortunately, the perforce repository fails to import. Firstly, there
was a problem with importing UTF-16 encoded files, secondly the
"apple" filetype files are still skipped.

^ permalink raw reply

* Re: New Feature wanted: Is it possible to let git clone continue last break point?
From: Junio C Hamano @ 2011-11-02 22:41 UTC (permalink / raw)
  To: Jeff King; +Cc: Jonathan Nieder, netroby, Git Mail List, Tomas Carnecky
In-Reply-To: <20111102220614.GB14108@sigill.intra.peff.net>

Jeff King <peff@peff.net> writes:

> Which is all a roundabout way of saying that the git protocol is really
> the sane way to do efficient transfers. An alternative, much simpler
> scheme would be for the server to just say:
>
>   - if you have nothing, then prime with URL http://host/bundle
>
> And then _only_ clone would bother with checking mirrors. People doing
> fetch would be expected to do it often enough that not being resumable
> isn't a big deal.

I think that is a sensible place to start.

A more fancy conditional "If you have X then fetch this, if you have Y
fetch that, ..." sounds nice but depending on what branch you are fetching
the answer has to be different. If we were to do that, the natural place
for the server to give the redirect instruction to the client is after the
client finishes saying "want", and before the client starts saying "have".

^ permalink raw reply

* Re: [PATCH 1/2] http.c: Use curl_multi_fdset to select on curl fds instead of just sleeping
From: Daniel Stenberg @ 2011-11-02 22:40 UTC (permalink / raw)
  To: Mika Fischer; +Cc: Jeff King, git, gitster
In-Reply-To: <CAOs=hR+QqUpYuth8Uvi2o7pm1LO8ogO2pN7nrMchYj96Cutmww@mail.gmail.com>

On Wed, 2 Nov 2011, Mika Fischer wrote:

> The only problem I can see is that curl_multi_fdset is not guaranteed to 
> return any fds. So in theory it could be possible that we don't get fds, but 
> we're actually reading stuff. In this case things would get slow, because we 
> would sleep for 50ms after every read...
>
> However, I don't know if this is a case that actually comes up in the real 
> world. Maybe Daniel has some advice on this.

It doesn't really happen so it should be safe.

The case where no fds are returned is when libcurl cannot return a socket to 
wait for during name resolving (if your particular libcurl is built to use 
such a resolver backend - libcurl has several different ones). And during name 
resolving there won't be any data to read for the libcurl-app anyway.

-- 

  / daniel.haxx.se

^ permalink raw reply

* Re: [PATCH 1/2] http.c: Use curl_multi_fdset to select on curl fds instead of just sleeping
From: Mika Fischer @ 2011-11-02 22:22 UTC (permalink / raw)
  To: Jeff King; +Cc: git, gitster, daniel
In-Reply-To: <20111102203221.GB5628@sigill.intra.peff.net>

On Wed, Nov 2, 2011 at 21:32, Jeff King <peff@peff.net> wrote:
> Do we still need to care about data_received?
>
> My understanding was that the code was originally trying to do:
>
>  1. Call curl, maybe get some data.
>
>  2. If we got data, then ask curl against immediately for some data.
>
>  3. Otherwise, sleep 50ms and then ask curl again.

Yes, that's exactly what it did.

> But now that we are actually selecting on the proper descriptors, it
> should now be safe to just do:
>
>  1. Call curl, maybe get some data.
>
>  2. Call select, which will wake immediately if curl is going to get
>     data.

The only problem I can see is that curl_multi_fdset is not guaranteed
to return any fds. So in theory it could be possible that we don't get
fds, but we're actually reading stuff. In this case things would get
slow, because we would sleep for 50ms after every read...

However, I don't know if this is a case that actually comes up in the
real world. Maybe Daniel has some advice on this.

Best,
 Mika

^ permalink raw reply

* Re: [ANNOUNCE] Git 1.7.7.2
From: Junio C Hamano @ 2011-11-02 22:30 UTC (permalink / raw)
  To: Stefan Roas; +Cc: git, Linux Kernel
In-Reply-To: <20111102214725.GA2860@geminga.roas.networks.roath.org>

Stefan Roas <sroas@roath.org> writes:

> is it possible that you forgot to update the GIT-VERSION-GEN with the
> release of 1.7.7.2? I stll get version 1.7.7.1 from the tarball on
> http://git-scm.com/ and when building from the git repository itself.

Probably.

^ permalink raw reply

* Re: [PATCH] Escape file:// URL's to meet subversion SVN::Ra requirements
From: Eric Wong @ 2011-11-02 22:09 UTC (permalink / raw)
  To: Ben Walton; +Cc: Jonathan Nieder, git
In-Reply-To: <1320260449-sup-479@pinkfloyd.chass.utoronto.ca>

Ben Walton <bwalton@artsci.utoronto.ca> wrote:
> Sorry for the clumsy patch.

I don't have much time to help you fix it, but I got numerous errors on
SVN 1.6.x (svn 1.6.12).  Can you make sure things continue to work on
1.6 and earlier, also?

Maybe just enable the escaping for file:// on >= SVN 1.7

Here are the tests that failed for me:

make[1]: *** [t9100-git-svn-basic.sh] Error 1
make[1]: *** [t9103-git-svn-tracked-directory-removed.sh] Error 1
make[1]: *** [t9104-git-svn-follow-parent.sh] Error 1
make[1]: *** [t9105-git-svn-commit-diff.sh] Error 1
make[1]: *** [t9107-git-svn-migrate.sh] Error 1
make[1]: *** [t9108-git-svn-glob.sh] Error 1
make[1]: *** [t9109-git-svn-multi-glob.sh] Error 1
make[1]: *** [t9110-git-svn-use-svm-props.sh] Error 1
make[1]: *** [t9111-git-svn-use-svnsync-props.sh] Error 1
make[1]: *** [t9114-git-svn-dcommit-merge.sh] Error 1
make[1]: *** [t9116-git-svn-log.sh] Error 1
make[1]: *** [t9117-git-svn-init-clone.sh] Error 1
make[1]: *** [t9118-git-svn-funky-branch-names.sh] Error 1
make[1]: *** [t9120-git-svn-clone-with-percent-escapes.sh] Error 1
make[1]: *** [t9125-git-svn-multi-glob-branch-names.sh] Error 1
make[1]: *** [t9127-git-svn-partial-rebuild.sh] Error 1
make[1]: *** [t9128-git-svn-cmd-branch.sh] Error 1
make[1]: *** [t9130-git-svn-authors-file.sh] Error 1
make[1]: *** [t9135-git-svn-moved-branch-empty-file.sh] Error 1
make[1]: *** [t9136-git-svn-recreated-branch-empty-file.sh] Error 1
make[1]: *** [t9141-git-svn-multiple-branches.sh] Error 1
make[1]: *** [t9145-git-svn-master-branch.sh] Error 1
make[1]: *** [t9146-git-svn-empty-dirs.sh] Error 1
make[1]: *** [t9150-svk-mergetickets.sh] Error 1
make[1]: *** [t9151-svn-mergeinfo.sh] Error 1
make[1]: *** [t9153-git-svn-rewrite-uuid.sh] Error 1
make[1]: *** [t9154-git-svn-fancy-glob.sh] Error 1
make[1]: *** [t9155-git-svn-fetch-deleted-tag.sh] Error 1
make[1]: *** [t9156-git-svn-fetch-deleted-tag-2.sh] Error 1
make[1]: *** [t9157-git-svn-fetch-merge.sh] Error 1
make[1]: *** [t9159-git-svn-no-parent-mergeinfo.sh] Error 1
make[1]: *** [t9161-git-svn-mergeinfo-push.sh] Error 1

^ permalink raw reply

* Re: New Feature wanted: Is it possible to let git clone continue last break point?
From: Jeff King @ 2011-11-02 22:06 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: netroby, Git Mail List, Tomas Carnecky
In-Reply-To: <20111031090717.GA24978@elie.hsd1.il.comcast.net>

On Mon, Oct 31, 2011 at 04:07:18AM -0500, Jonathan Nieder wrote:

> Something like Jeff's "priming the well with a server-specified
> bundle" proposal[2] might be a good way to make the same trick
> transparent to clients in the future.

Yes, that is one of the use cases I hope to address. But it will require
the publisher specifying a mirror location (it's possible we could add
some kind of automagic "hit a bundler service first" config option,
though I fear that the existing small-time bundler services would
crumble under the load).

So in the general case (and in the meantime), you may have to learn to
manually prime the repo using a bundle.

I haven't started on the patches for communicating mirror sites between
the server and client, but I did just write some patches to handle "git
fetch http://host/path/to/file.bundle" automatically, which is the first
step. They need a few finishing touches and some testing, though.

> Even with that, later fetches, which grab a pack generated on the fly
> to only contain the objects not already fetched, are generally not
> resumable.  Overcoming that would presumably require larger protocol
> changes, and I don't know of anyone working on it.  (My workaround
> when in a setup where this mattered was to use the old-fashioned
> "dumb" http protocol.  It worked fine.)

My goal was for the mirror communication between client and server to be
something like:

  - if you don't have object XXXXXX, then prime with URL
    http://host/bundle1

  - if you don't have object YYYYYY, then prime with URL
    http://host/bundle2

and so forth. A cloning client would grab the first bundle, then the
second, and then hit the real repo via the git protocol. A client who
had previously cloned might have XXX, but would now grab bundle2, and
then hit the real repo.

So depending on how often the server side feels like creating new
bundles, you would get most of the changes via bundles, and then only
be getting a small number of objects via git.

The downside of cumulative fetching is that the bundles can only serve
well-known checkpoints. So if you have a timeline like this:

  t0: server publishes bundle/mirror config with one line (the XXX bit
      above)

  t1: you clone, getting the whole bundle. No waste, because you had
      nothing in the first place, and you needed everything.

  t2: you fetch again, getting N commits worth of history via the git
      protocol

  t3: server decides a lot of new objects (let's say M commits worth)
      have accumulated, and generates a new line (the YYY line).

  t4: you fetch, see that you don't yet have YYY, and grab the second
      bundle

But in t4 you grabbed a bundle containing M commits, when you already
had the first N of them. So you actually wasted bandwidth getting
objects you already had. The only benefit is that you grabbed a static
file, which is resumable.

So I suspect there is some black magic involved in deciding when to
create a new bundle, and at what tip. If you create a bundle once a
month, but include only commits up to a week ago, then people pulling
weekly will never grab the bundle, but people pulling less frequently
will get the whole month as a bundle.

A secondary issue is also that in a scheme like this, your mirror list
will grow without bound. So you'd want to periodically repack everything
into a single bundle. But then people who are fetching wouldn't want
that, as it is just an exacerbated version of the same problem above.

Which is all a roundabout way of saying that the git protocol is really
the sane way to do efficient transfers. An alternative, much simpler
scheme would be for the server to just say:

  - if you have nothing, then prime with URL http://host/bundle

And then _only_ clone would bother with checking mirrors. People doing
fetch would be expected to do it often enough that not being resumable
isn't a big deal.

-Peff

^ permalink raw reply

* Re: [ANNOUNCE] Git 1.7.7.2
From: Stefan Roas @ 2011-11-02 21:47 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Linux Kernel
In-Reply-To: <7v7h3jl3kw.fsf@alter.siamese.dyndns.org>

[-- Attachment #1: Type: text/plain, Size: 531 bytes --]

Hi Junio,

is it possible that you forgot to update the GIT-VERSION-GEN with the
release of 1.7.7.2? I stll get version 1.7.7.1 from the tarball on
http://git-scm.com/ and when building from the git repository itself.

Regards,
  Stefan

-- 
Stefan Roas                                         sroas@roath.org
Joh.-Seb.-Bach-Str. 4                            D-91083 Baiersdorf
-------------------------------------------------------------------
Key fingerprint = 557C 99BE 865B 1463 2A44  7936 C662 8970 4DA5 50B8


[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply

* Re: long fsck time
From: Jeff King @ 2011-11-02 21:33 UTC (permalink / raw)
  To: Nguyen Thai Ngoc Duy; +Cc: Git Mailing List
In-Reply-To: <CACsJy8B=5mEWoOBkrTfmJ+p7HxqJM97zdG-k71oW81-3XxuO_Q@mail.gmail.com>

On Wed, Nov 02, 2011 at 07:10:26PM +0700, Nguyen Thai Ngoc Duy wrote:

> On Wed, Nov 2, 2011 at 7:06 PM, Nguyen Thai Ngoc Duy <pclouds@gmail.com> wrote:
> > On git.git
> >
> > $ /usr/bin/time git fsck
> > 333.25user 4.28system 5:37.59elapsed 99%CPU (0avgtext+0avgdata
> > 420080maxresident)k
> > 0inputs+0outputs (0major+726560minor)pagefaults 0swaps
> >
> > That's really long time, perhaps we should print progress so users
> > know it's still running?
> 
> Ahh.. --verbose. Sorry for the noise. Still good to show the number of
> checked objects though.

fsck --verbose is _really_ verbose. It could probably stand to have some
progress meters sprinkled throughout. The patch below produces this on
my git.git repo:

  $ git fsck
  Checking object directories: 100% (256/256), done.
  Verifying packs: 100% (7/7), done.
  Checking objects (pack 1/7): 100% (241/241), done.
  Checking objects (pack 2/7): 100% (176/176), done.
  Checking objects (pack 3/7): 100% (312/312), done.
  Checking objects (pack 4/7): 100% (252/252), done.
  Checking objects (pack 5/7): 100% (353/353), done.
  Checking objects (pack 6/7): 100% (375/375), done.
  Checking objects (pack 7/7): 100% (171079/171079), done.

which gives reasonably smooth progress. The longest hang is that
"Verifying pack" 7 is slow (I believe it's doing a sha1 over the whole
thing). If you really wanted to get fancy, you could probably do a
throughput meter as we sha1 the whole contents.

Patch is below. It would need --{no-,}progress support on the command
line, and to check isatty(2) before it would be acceptable.

---
diff --git a/builtin/fsck.c b/builtin/fsck.c
index df1a88b..481de4e 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -11,6 +11,7 @@
 #include "fsck.h"
 #include "parse-options.h"
 #include "dir.h"
+#include "progress.h"
 
 #define REACHABLE 0x0001
 #define SEEN      0x0002
@@ -512,15 +513,19 @@ static void get_default_heads(void)
 static void fsck_object_dir(const char *path)
 {
 	int i;
+	struct progress *progress;
 
 	if (verbose)
 		fprintf(stderr, "Checking object directory\n");
 
+	progress = start_progress("Checking object directories", 256);
 	for (i = 0; i < 256; i++) {
 		static char dir[4096];
 		sprintf(dir, "%s/%02x", path, i);
 		fsck_dir(i, dir);
+		display_progress(progress, i+1);
 	}
+	stop_progress(&progress);
 	fsck_sha1_list();
 }
 
@@ -622,19 +627,36 @@ int cmd_fsck(int argc, const char **argv, const char *prefix)
 
 	if (check_full) {
 		struct packed_git *p;
+		int i, nr_packs = 0;
+		struct progress *progress;
 
 		prepare_packed_git();
 		for (p = packed_git; p; p = p->next)
+			nr_packs++;
+
+		progress = start_progress("Verifying packs", nr_packs);
+		for (i = 1, p = packed_git; p; p = p->next, i++) {
 			/* verify gives error messages itself */
 			verify_pack(p);
+			display_progress(progress, i);
+		}
+		stop_progress(&progress);
 
-		for (p = packed_git; p; p = p->next) {
+		for (i = 1, p = packed_git; p; p = p->next, i++) {
+			char buf[32];
 			uint32_t j, num;
 			if (open_pack_index(p))
 				continue;
 			num = p->num_objects;
-			for (j = 0; j < num; j++)
+
+			snprintf(buf, sizeof(buf), "Checking objects (pack %d/%d)",
+				 i, nr_packs);
+			progress = start_progress(buf, num);
+			for (j = 0; j < num; j++) {
 				fsck_sha1(nth_packed_object_sha1(p, j));
+				display_progress(progress, j+1);
+			}
+			stop_progress(&progress);
 		}
 	}
 

^ permalink raw reply related

* Re: [PATCH 1/2] http.c: Use curl_multi_fdset to select on curl fds instead of just sleeping
From: Junio C Hamano @ 2011-11-02 21:26 UTC (permalink / raw)
  To: Jeff King; +Cc: Mika Fischer, git, gitster, daniel
In-Reply-To: <20111102203543.GC5628@sigill.intra.peff.net>

Jeff King <peff@peff.net> writes:

> On Wed, Nov 02, 2011 at 04:32:21PM -0400, Jeff King wrote:
>
>> At least that's my reading. I am working on unrelated patches that clean
>> up the handling of data_received, but if it could go away altogether,
>> that would be even simpler.
>
> That patch, btw, looks like this:
>
> -- >8 --
> Subject: [PATCH] http: remove "local" member from slot struct
>
> The curl-multi http code does something like this:
>
>   while (!finished) {
> 	  try_to_read_from_slots();
> 	  if (!data_received)
> 		  wait_for_50_ms();
>   }
>
> ...
> Let's do the same thing for the write-to-file case as we do
> for the write-to-strbuf case: use a thin wrapper callback
> and increment the received flag. This makes both methods
> consistent with each other, and saves us from managing the
> "local" struct member at all, reducing the code size.

Looks very sensible.

^ permalink raw reply

* Re: [git patches] libata updates, GPG signed (but see admin notes)
From: Junio C Hamano @ 2011-11-02 21:13 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: git, James Bottomley, Jeff Garzik, Andrew Morton, linux-ide, LKML
In-Reply-To: <CA+55aFz7TeQQH3D4Tpp31cZYZoQKeK37jouo+2Kh61Wa07knfw@mail.gmail.com>

Linus Torvalds <torvalds@linux-foundation.org> writes:

> And "add a fake empty commit just for the signature" is not the answer
> either - because that is clearly inferior to the tags we already had.
>
> I dunno. Did I miss something? As far as I can tell, the signed tags
> that we've had since day one are *clearly* much better in very
> fundamental ways.

Ok, back to the drawing board (which is not a loss as I wasn't expecting
this to be in the official release in upcoming 1.7.8 anyway).

^ permalink raw reply

* Re: [git patches] libata updates, GPG signed (but see admin notes)
From: Michael J Gruber @ 2011-11-02 21:05 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Linus Torvalds, git, James Bottomley, Jeff Garzik, Andrew Morton,
	linux-ide, LKML
In-Reply-To: <7v1utqjqre.fsf@alter.siamese.dyndns.org>

Junio C Hamano venit, vidit, dixit 02.11.2011 19:58:
> Michael J Gruber <git@drmicha.warpmail.net> writes:
> 
>> The advantage of tags is that they can be added without rewriting the
>> commit, of course.
> 
> And you did neither think about the downsides of tags, nor read what
> others already explained for you?

We're just weighing things differently here, and no accusations of
"misinformation" or "not thinking" will change this.

^ permalink raw reply

* Re: [PATCH 1/2] http.c: Use curl_multi_fdset to select on curl fds instead of just sleeping
From: Jeff King @ 2011-11-02 20:35 UTC (permalink / raw)
  To: Mika Fischer; +Cc: git, gitster, daniel
In-Reply-To: <20111102203221.GB5628@sigill.intra.peff.net>

On Wed, Nov 02, 2011 at 04:32:21PM -0400, Jeff King wrote:

> At least that's my reading. I am working on unrelated patches that clean
> up the handling of data_received, but if it could go away altogether,
> that would be even simpler.

That patch, btw, looks like this:

-- >8 --
Subject: [PATCH] http: remove "local" member from slot struct

The curl-multi http code does something like this:

  while (!finished) {
	  try_to_read_from_slots();
	  if (!data_received)
		  wait_for_50_ms();
  }

Which is horrible enough in itself, because of the
hard-coded 50ms wait.

But there's some additional complexity: the method for
finding whether we received data is to actually run ftell()
on the file we are writing to to see if it got anything. So
the "local" member of the slot struct contains the FILE
pointer. Except that sometimes we don't have a file, because
we're writing to a strbuf. In that case, since curl calls
our custom callback, we just increment the data_received
flag when curl gives data to our callback.

Let's do the same thing for the write-to-file case as we do
for the write-to-strbuf case: use a thin wrapper callback
and increment the received flag. This makes both methods
consistent with each other, and saves us from managing the
"local" struct member at all, reducing the code size.

Signed-off-by: Jeff King <peff@peff.net>
---
 http.c |   23 +++++++----------------
 http.h |    1 -
 2 files changed, 7 insertions(+), 17 deletions(-)

diff --git a/http.c b/http.c
index a4bc770..99386ef 100644
--- a/http.c
+++ b/http.c
@@ -93,6 +93,12 @@ curlioerr ioctl_buffer(CURL *handle, int cmd, void *clientp)
 }
 #endif
 
+size_t fwrite_file(char *ptr, size_t eltsize, size_t nmemb, void *handle)
+{
+	data_received++;
+	return fwrite(ptr, eltsize, nmemb, handle);
+}
+
 size_t fwrite_buffer(char *ptr, size_t eltsize, size_t nmemb, void *buffer_)
 {
 	size_t size = eltsize * nmemb;
@@ -538,7 +544,6 @@ struct active_request_slot *get_active_slot(void)
 
 	active_requests++;
 	slot->in_use = 1;
-	slot->local = NULL;
 	slot->results = NULL;
 	slot->finished = NULL;
 	slot->callback_data = NULL;
@@ -642,8 +647,6 @@ void step_active_slots(void)
 void run_active_slot(struct active_request_slot *slot)
 {
 #ifdef USE_CURL_MULTI
-	long last_pos = 0;
-	long current_pos;
 	fd_set readfds;
 	fd_set writefds;
 	fd_set excfds;
@@ -656,13 +659,6 @@ void run_active_slot(struct active_request_slot *slot)
 		data_received = 0;
 		step_active_slots();
 
-		if (!data_received && slot->local != NULL) {
-			current_pos = ftell(slot->local);
-			if (current_pos > last_pos)
-				data_received++;
-			last_pos = current_pos;
-		}
-
 		if (slot->in_use && !data_received) {
 			max_fd = 0;
 			FD_ZERO(&readfds);
@@ -818,13 +814,12 @@ static int http_request(const char *url, void *result, int target, int options)
 		if (target == HTTP_REQUEST_FILE) {
 			long posn = ftell(result);
 			curl_easy_setopt(slot->curl, CURLOPT_WRITEFUNCTION,
-					 fwrite);
+					 fwrite_file);
 			if (posn > 0) {
 				strbuf_addf(&buf, "Range: bytes=%ld-", posn);
 				headers = curl_slist_append(headers, buf.buf);
 				strbuf_reset(&buf);
 			}
-			slot->local = result;
 		} else
 			curl_easy_setopt(slot->curl, CURLOPT_WRITEFUNCTION,
 					 fwrite_buffer);
@@ -871,7 +866,6 @@ static int http_request(const char *url, void *result, int target, int options)
 		ret = HTTP_START_FAILED;
 	}
 
-	slot->local = NULL;
 	curl_slist_free_all(headers);
 	strbuf_release(&buf);
 
@@ -1066,7 +1060,6 @@ void release_http_pack_request(struct http_pack_request *preq)
 	if (preq->packfile != NULL) {
 		fclose(preq->packfile);
 		preq->packfile = NULL;
-		preq->slot->local = NULL;
 	}
 	if (preq->range_header != NULL) {
 		curl_slist_free_all(preq->range_header);
@@ -1088,7 +1081,6 @@ int finish_http_pack_request(struct http_pack_request *preq)
 
 	fclose(preq->packfile);
 	preq->packfile = NULL;
-	preq->slot->local = NULL;
 
 	lst = preq->lst;
 	while (*lst != p)
@@ -1157,7 +1149,6 @@ struct http_pack_request *new_http_pack_request(
 	}
 
 	preq->slot = get_active_slot();
-	preq->slot->local = preq->packfile;
 	curl_easy_setopt(preq->slot->curl, CURLOPT_FILE, preq->packfile);
 	curl_easy_setopt(preq->slot->curl, CURLOPT_WRITEFUNCTION, fwrite);
 	curl_easy_setopt(preq->slot->curl, CURLOPT_URL, preq->url);
diff --git a/http.h b/http.h
index 3c332a9..7429381 100644
--- a/http.h
+++ b/http.h
@@ -49,7 +49,6 @@ struct slot_results {
 
 struct active_request_slot {
 	CURL *curl;
-	FILE *local;
 	int in_use;
 	CURLcode curl_result;
 	long http_code;
-- 
1.7.7.rc3.12.g571d67

^ permalink raw reply related

* Re: [PATCH 1/2] http.c: Use curl_multi_fdset to select on curl fds instead of just sleeping
From: Jeff King @ 2011-11-02 20:32 UTC (permalink / raw)
  To: Mika Fischer; +Cc: git, gitster, daniel
In-Reply-To: <1320265288-12647-2-git-send-email-mika.fischer@zoopnet.de>

On Wed, Nov 02, 2011 at 09:21:27PM +0100, Mika Fischer wrote:

> Instead of sleeping unconditionally for a 50ms, when no data can be read
> from the http connection(s), use curl_multi_fdset to obtain the actual
> file descriptors of the open connections and use them in the select call.
> This way, the 50ms sleep is interrupted when new data arrives.
> ---
>  http.c |    6 +++---
>  1 files changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/http.c b/http.c
> index a4bc770..ae92318 100644
> --- a/http.c
> +++ b/http.c
> @@ -664,14 +664,14 @@ void run_active_slot(struct active_request_slot *slot)
>  		}
>  
>  		if (slot->in_use && !data_received) {
> -			max_fd = 0;
> +			max_fd = -1;
>  			FD_ZERO(&readfds);
>  			FD_ZERO(&writefds);
>  			FD_ZERO(&excfds);
> +			curl_multi_fdset(curlm, &readfds, &writefds, &excfds, &max_fd);
>  			select_timeout.tv_sec = 0;
>  			select_timeout.tv_usec = 50000;
> -			select(max_fd, &readfds, &writefds,
> -			       &excfds, &select_timeout);
> +			select(max_fd+1, &readfds, &writefds, &excfds, &select_timeout);
>  		}
>  	}

Do we still need to care about data_received?

My understanding was that the code was originally trying to do:

  1. Call curl, maybe get some data.

  2. If we got data, then ask curl against immediately for some data.

  3. Otherwise, sleep 50ms and then ask curl again.

But now that we are actually selecting on the proper descriptors, it
should now be safe to just do:

  1. Call curl, maybe get some data.

  2. Call select, which will wake immediately if curl is going to get
     data.

At least that's my reading. I am working on unrelated patches that clean
up the handling of data_received, but if it could go away altogether,
that would be even simpler.

-Peff

^ permalink raw reply

* [PATCH 2/2] http.c: Use timeout suggested by curl instead of fixed 50ms timeout
From: Mika Fischer @ 2011-11-02 20:21 UTC (permalink / raw)
  To: git; +Cc: gitster, daniel, Mika Fischer
In-Reply-To: <1320265288-12647-1-git-send-email-mika.fischer@zoopnet.de>

Recent versions of curl can suggest a period of time the library user
should sleep and try again, when curl is blocked on reading or writing
(or connecting). Use this timeout instead of always sleeping for 50ms.
---
 http.c |   22 ++++++++++++++++++++--
 1 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/http.c b/http.c
index ae92318..e91a2ab 100644
--- a/http.c
+++ b/http.c
@@ -649,6 +649,9 @@ void run_active_slot(struct active_request_slot *slot)
 	fd_set excfds;
 	int max_fd;
 	struct timeval select_timeout;
+#if LIBCURL_VERSION_NUM >= 0x070f04
+	long curl_timeout;
+#endif
 	int finished = 0;
 
 	slot->finished = &finished;
@@ -664,13 +667,28 @@ void run_active_slot(struct active_request_slot *slot)
 		}
 
 		if (slot->in_use && !data_received) {
+#if LIBCURL_VERSION_NUM >= 0x070f04
+			curl_multi_timeout(curlm, &curl_timeout);
+			if (curl_timeout == 0) {
+				continue;
+			} else if (curl_timeout == -1) {
+				select_timeout.tv_sec  = 0;
+				select_timeout.tv_usec = 50000;
+			} else {
+				select_timeout.tv_sec  =  curl_timeout / 1000;
+				select_timeout.tv_usec = (curl_timeout % 1000) * 1000;
+			}
+#else
+			select_timeout.tv_sec  = 0;
+			select_timeout.tv_usec = 50000;
+#endif
+
 			max_fd = -1;
 			FD_ZERO(&readfds);
 			FD_ZERO(&writefds);
 			FD_ZERO(&excfds);
 			curl_multi_fdset(curlm, &readfds, &writefds, &excfds, &max_fd);
-			select_timeout.tv_sec = 0;
-			select_timeout.tv_usec = 50000;
+
 			select(max_fd+1, &readfds, &writefds, &excfds, &select_timeout);
 		}
 	}
-- 
1.7.8.rc0.33.g09c6f.dirty

^ permalink raw reply related

* [PATCH 1/2] http.c: Use curl_multi_fdset to select on curl fds instead of just sleeping
From: Mika Fischer @ 2011-11-02 20:21 UTC (permalink / raw)
  To: git; +Cc: gitster, daniel, Mika Fischer
In-Reply-To: <1320265288-12647-1-git-send-email-mika.fischer@zoopnet.de>

Instead of sleeping unconditionally for a 50ms, when no data can be read
from the http connection(s), use curl_multi_fdset to obtain the actual
file descriptors of the open connections and use them in the select call.
This way, the 50ms sleep is interrupted when new data arrives.
---
 http.c |    6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/http.c b/http.c
index a4bc770..ae92318 100644
--- a/http.c
+++ b/http.c
@@ -664,14 +664,14 @@ void run_active_slot(struct active_request_slot *slot)
 		}
 
 		if (slot->in_use && !data_received) {
-			max_fd = 0;
+			max_fd = -1;
 			FD_ZERO(&readfds);
 			FD_ZERO(&writefds);
 			FD_ZERO(&excfds);
+			curl_multi_fdset(curlm, &readfds, &writefds, &excfds, &max_fd);
 			select_timeout.tv_sec = 0;
 			select_timeout.tv_usec = 50000;
-			select(max_fd, &readfds, &writefds,
-			       &excfds, &select_timeout);
+			select(max_fd+1, &readfds, &writefds, &excfds, &select_timeout);
 		}
 	}
 #else
-- 
1.7.8.rc0.33.g09c6f.dirty

^ permalink raw reply related

* [PATCHv2] Improve use of select in http backend
From: Mika Fischer @ 2011-11-02 20:21 UTC (permalink / raw)
  To: git; +Cc: gitster, daniel

I've split my previous patch into two, since the two parts actually do
different things.

The first patch uses curl_multi_fdset to use the file descriptors of the http
connections in the call to select, in effect waking up immediately when new
data arrives.

The second patch uses curl_multi_timeout (if available) to get a better
recommendation for how long to sleep from curl instead of always sleeping
for 50ms.

^ permalink raw reply

* Re: [ANNOUNCE] Git 1.7.8.rc0
From: Jeff King @ 2011-11-02 20:09 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Stefan Naewe, Michael J Gruber, git
In-Reply-To: <7vwrbiibgz.fsf@alter.siamese.dyndns.org>

On Wed, Nov 02, 2011 at 12:13:48PM -0700, Junio C Hamano wrote:

> > The simplest thing is to just drop the behavior in (2), and let it drop
> > to a 401. The extra round trip probably isn't that big a deal.
> 
> That is essentially what Stefan's fix is about.

Right. I think it may be the sanest thing to do.

> The cases we have "extra" roundtrip are:
> 
>  - when you have username@ in URL but no password is stored in .netrc;
>  - when you have username@ in URL and no $HOME/.netrc file.
> 
> and in such a case using URL without username@ in it as a workaround would
> save the roundtrip but forces you to type your username@ over and over
> again, which is _not_ a real workaround.

Yeah. There's no way for us to know before we hand off to curl what you
have in netrc. So these netrc cases will always be at odds with the
no-netrc case.

Normally I would say to implement in favor of the no-netrc case, as it
is probably more common (and will hopefully be more so after the auth
helpers are finished). But the problem is that the penalties are
different. On the one hand, we have the extra http round-trip. Which is
annoying, but mostly invisible to the user. But on the other, we have
git prompting the user unnecessarily, which is just awful.

> > The other option is to start parsing netrc ourselves, or do the extra
> > round trip if we detect ~/.netrc or something. But that last one is
> > getting pretty hackish.
> 
> I tend to agree that we wouldn't want to parse netrc ourselves (that is
> what library support e.g. CURLOPT_NETRC is for). The latter is hackish but
> on the other hand it is a cheap, simple and useful hack.

Note that it's not always right, of course. You might have a .netrc but
no entry for that host. But at least it lets the common case people
(i.e., people who never heard of or touched netrc) to avoid the round
trip.

> How would the upcoming keystore support fit in this picture, by the way?

Any time we would call getpass(), we ask the helper for the credential.
So for user@host, we would call out to the helper for the password
proactively, and otherwise wait for a 401.

We _could_ be proactive and actually ask the helpers for a username and
password even for "https://host/repo", which would save a round-trip to
get the 401 in some cases. But that assumes that asking the helper is
cheap. It might actually require the user inputting a password to unlock
the keystore, which would be annoying if the remote doesn't require
auth at all.

We could try to be clever and use a heuristic that fetch probably
doesn't need auth, but push does. Then fetch gets the extra round-trip
but push doesn't. But that just seems needlessly complex to save one
http round-trip on push.

-Peff

^ permalink raw reply

* Re: Q: "git diff" using tag names
From: Alexey Shumkin @ 2011-11-02 20:08 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Ulrich Windl, git
In-Reply-To: <7vty6mkgsg.fsf@alter.siamese.dyndns.org>

Oh, I see

> Alexey Shumkin <alex.crezoff@gmail.com> writes:
> 
> >> Also it seems that both syntaxes work:
> >> git diff v0.4..v0.5
> >> git diff v0.4 v0.5
> > honestly, I do not know the difference (at the moment :))
> > may be gurus or manual will help to discover it
> 
> The latter is the kosher version, as diff is about two "endpoints"
> and not about "ranges". The only reason the former is parsed without
> erroring out is because too many people are used to type .. between
> two things without thinking, learned the notation from "git log",
> which _is_ about ranges.

^ permalink raw reply

* Re: Q: "git diff" using tag names
From: Alexey Shumkin @ 2011-11-02 20:08 UTC (permalink / raw)
  To: Frans Klaver; +Cc: Ulrich Windl, git
In-Reply-To: <CAH6sp9OdjNZ6_j-eSFMpecUcxW_y6fpkDZ0cRds62wOrc12x-Q@mail.gmail.com>

thanks )

> On Wed, Nov 2, 2011 at 10:29 AM, Alexey Shumkin
> <alex.crezoff@gmail.com> wrote:
> 
> >> Also it seems that both syntaxes work:
> >> git diff v0.4..v0.5
> >> git diff v0.4 v0.5
> > honestly, I do not know the difference (at the moment :))
> > may be gurus or manual will help to discover it
> 
> As per the git-diff documentation, these two versions behave equally
> -- i.e. no differences.
> 
> Comparing branches
> $ git diff topic master    <1>
> $ git diff topic..master   <2>
> $ git diff topic...master  <3>
> ‪1.‬ Changes between the tips of the topic and the master branches.
> ‪2.‬ Same as above.
> ‪3.‬ Changes that occurred on the master branch since when the topic
> branch was started off it.
> 
> Cheers,
> Frans

^ permalink raw reply

* Re: [git patches] libata updates, GPG signed (but see admin notes)
From: Linus Torvalds @ 2011-11-02 20:04 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git, James Bottomley, Jeff Garzik, Andrew Morton, linux-ide, LKML
In-Reply-To: <7vk47jld5s.fsf@alter.siamese.dyndns.org>

On Tue, Nov 1, 2011 at 2:56 PM, Junio C Hamano <gitster@pobox.com> wrote:
>
> But on the other hand, in many ways, publishing your commit to the outside
> world, not necessarily for getting pulled into the final destination
> (i.e. your tree) but merely for other people to try it out, is the point
> of no return (aka "don't rewind or rebase once you publish").  "pushing
> out" might be less special than "please pull", but it still is special.

So I really think that signing the top commit itself is fundamentally wrong.

That commit may not even be *yours*. You may have pulled it from a
sub-lieutenant as a fast-forward, or similar. Amending it later would
be actively very very *wrong*.

So quite frankly, I think the stuff in pu (or next?) is completely
mis-designed. Doing it in the commit is wrong for fundamental reasons,
which all boil down to a simple issue:

 - you absolutely *need* to add the signature later. You *cannot* do
it at "git commit" time.

That's a fundamental issue both from a "workflow model" issue (ie you
want to sign stuff after it has passed testing etc, but you may need
to commit it in order to *get* testing), as well as from a
"fundamental git datastructures" issue (ie you would want to sign
commits that aren't yours.

"git commit --amend" is not the answer - that destroys the fundamental
concept of history being immutable, and while it works for your local
commits, it doesn't work for anybody elses commits, or for stuff you
already pushed out.

And "add a fake empty commit just for the signature" is not the answer
either - because that is clearly inferior to the tags we already had.

I dunno. Did I miss something? As far as I can tell, the signed tags
that we've had since day one are *clearly* much better in very
fundamental ways.

                             Linus

^ permalink raw reply

* Re: [PATCH] t3200: add test case for 'branch -m'
From: Junio C Hamano @ 2011-11-02 19:43 UTC (permalink / raw)
  To: Stefan Naewe; +Cc: git, Tay Ray Chuan
In-Reply-To: <1320246425-2141-1-git-send-email-stefan.naewe@gmail.com>

Thanks, both.

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox