Git development
* Re: [git patches] libata updates, GPG signed (but see admin notes)
From: Linus Torvalds @ 2011-11-03  2:19 UTC
  To: Shawn Pearce
  Cc: Junio C Hamano, git, James Bottomley, Jeff Garzik, Andrew Morton,
	linux-ide, LKML
In-Reply-To: <CA+55aFwXu=+HdQ5nW11Ts5p-V=KgpxjyagKqB+Xv+qBOEEWXvQ@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 1275 bytes --]

On Wed, Nov 2, 2011 at 6:45 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
>   [torvalds@i5 linux]$ git fetch git://github.com/rustyrussell/linux.git  refs/tags/rusty@rustcorp.com.au-v3.1-8068-g5087a50

So this trivial patch removes one line of code, and makes this actually work.

However, it also makes us fail many tests that *test* that we peeled
what we fetched. However, I think the tests are wrong.

If the tag doesn't resolve into a commit, we happily output the SHA1
of the tag itself - and we say that it shouldn't be merged.

And if the tag *does* resolve into a commit, why would we output the
SHA1 of the commit? The tag should be peeled properly later when it
gets used, so peeling it here seems to be just a misfeature that makes
signed tags not work well.
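
For reference, the line that store_updated_refs() writes into FETCH_HEAD consists of three tab-separated fields: an object name, either an empty string or "not-for-merge", and a note (the code that follows in the hunk then writes out the URL). Below is a minimal sketch of just that formatting step, outside git's code. The helper name is hypothetical, and the sketch assumes 40-hex-digit SHA-1 names. With the patch applied, the object name passed in is the fetched ref's own value, so for a signed tag it is the tag object rather than the commit the tag points to.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper mirroring the fprintf() in the patch:
 *   "<object name>\t<empty or not-for-merge>\t<note>"
 * After the patch, object_hex is the fetched ref's own value
 * (rm->old_sha1), i.e. the tag object itself, not the peeled commit.
 * Assumes 40-hex-digit SHA-1 names.
 */
static int format_fetch_head_line(char *buf, size_t size,
				  const char *object_hex,
				  int for_merge, const char *note)
{
	assert(strlen(object_hex) == 40);
	return snprintf(buf, size, "%s\t%s\t%s",
			object_hex,
			for_merge ? "" : "not-for-merge",
			note);
}
```

Only the choice of object name changes with the patch. The merge/not-for-merge column and the note are untouched.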

So I suspect we should just apply this patch, but I didn't check
exactly what the failed tests are - except for the first one, that just
compares against a canned response (and the canned response should
just be changed). Maybe there was some reason for the peeling,
although I suspect it was just a fairly mindless case of "make it a
commit, because the merge needs the commit" - never mind that the
merge would peel it anyway.

                           Linus

[-- Attachment #2: patch.diff --]
[-- Type: text/x-patch, Size: 550 bytes --]

 builtin/fetch.c |    3 +--
 1 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/builtin/fetch.c b/builtin/fetch.c
index 91731b909aeb..494a7f9976f8 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -436,8 +436,7 @@ static int store_updated_refs(const char *raw_url, const char *remote_name,
 		}
 		note[note_len] = '\0';
 		fprintf(fp, "%s\t%s\t%s",
-			sha1_to_hex(commit ? commit->object.sha1 :
-				    rm->old_sha1),
+			sha1_to_hex(rm->old_sha1),
 			rm->merge ? "" : "not-for-merge",
 			note);
 		for (i = 0; i < url_len; ++i)

* Re: [git patches] libata updates, GPG signed (but see admin notes)
From: Shawn Pearce @ 2011-11-03  2:14 UTC
  To: Linus Torvalds
  Cc: Junio C Hamano, git, James Bottomley, Jeff Garzik, Andrew Morton,
	linux-ide, LKML
In-Reply-To: <CA+55aFwXu=+HdQ5nW11Ts5p-V=KgpxjyagKqB+Xv+qBOEEWXvQ@mail.gmail.com>

On Wed, Nov 2, 2011 at 18:45, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Wed, Nov 2, 2011 at 6:19 PM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
>>
>> I'm not saying that you shouldn't use them - go ahead and use the
>> feature if you like it. But please spare me your excuses for stupid
>> workarounds that come from the fact that they aren't a good match for
>> sane workflows.

We often disagree. :-)

> Btw, having now done odd things with signed tags (because we've used
> them as a side-band verification mechanism), I can certainly also say
> that the signed tags have their set of problems too.
...
> But practically, all of these issues should be pretty easily solvable.
> So it should be quite easy to make
>
>    git pull <repo> <tag-name>
>
> just do the right thing - including verifying the tag, and adding the
> information in the tag into the merge commit message.

Uhm, sure.

Quoting you 2 days ago:

On Mon, Oct 31, 2011 at 15:52, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Mon, Oct 31, 2011 at 3:44 PM, Junio C Hamano <gitster@pobox.com> wrote:
>>
>> So nobody is worried about this (quoting from my earlier message)?
>
> No, because you haven't been reading what we write.
>
> The tag is useless.
>
> The information *in* the tag is not. But it shouldn't be saved in the
> tag (or note, or whatever). Because that's just an annoying place for
> it to be, with no upside.
>
> Save it in the commit we generate. BAM! Useful, readable, permanent,
> and independently verifiable.

So you propose we put the tag contents into the merge commit message
so it can be verified after the fact? So merges are now going to be
something much more horrific to read, because it will end with Git
object tag cruft, the tag message, and the PGP signature spew that no
human can decode in the head?

Oh, right, tags are almost good enough. Elsewhere in this thread you
also stated we have to redo the way tags are signed so that the tag
message body itself is not part of the signature, allowing you to fix
spelin errors so you are not stuck with them in your commit history.
But I assume we will have to keep the more typical headers of object /
type / tag / tagger fields, as that is the key information the
signature needs to be over to be of any value. So now there will be
two different ways in which a Git annotated tag object will have its
signature created, as certainly you don't mean to remove the tag
message body from the PGP signature content for release tags.

I fail to see how shoving Git object data fields and a complete PGP
signature block into a merge commit message body, which will show by
default in all git log type tools, and exist in cherry-picks or
rebases that might make that data less valuable, is somehow better
than the gpgsig header that neatly tucks it away until requested. I
also fail to see how scraping the message body for the proper fields
in order to implement automated verification of the signature (because
no human can do it themselves and copy-paste sucks) is a good idea.
Everywhere else in Git that we have machine-readable formats, it's very
well structured so that no guessing is required.

> So signed tags are not mis-designed from a conceptual standpoint -
> they just work really really awkwardly right now for what the kernel
> would like to do with them.
>
> With a few UI fixes, I think the signed tag thing would "just work".

Well, UI fixes, protocol changes, improvements to manage a large
reference space which we have previously said is an insane and stupid
workflow, etc. One reason you picked up all of those extra tags was
the include-tag capability kicking on and picking up older tag
history. We now have to disable it in certain cases.

It's not just a few UI fixes. And there is a lot more work to write a
verify for the tag contents+signature that appears in the body of a
merge commit message. Not to mention we now have to do that verify
logic twice, once in the signed pull request tag like but not quite a
tag but uses a tag thing you are advocating, and again for the merge
commit message body that contains the tag object data that we don't
normally show to an end user, but will now be in every merge commit
you make.

Go ahead and call me stupid, but this already is a bigger amount of
surgery to the git-core code, not to mention worse user experience for
the average `git log` reading human, than having a hidden by default
gpgsig header that might ask a contributor to take 2 extra seconds
before making a commit to consider the useful lifespan of that commit.
Or $DEITY forbid, write a new empty commit to record the equivalent of
their Signed-off-by.

Oh, and while I am on that subject...


<rant>
I have never grasped why sometimes a Signed-off-by is added to a
patch, and why sometimes it's not. It seems to be this weird function
of "If the commit SHA-1 is already stable DON'T FUCKING TOUCH IT BY
ADDING SIGNED-OFF-BY IT RUINS THE HISTORY", but if you are too far
down the food chain to be fortunate enough for your commit SHA-1 to
remain frozen, the Signed-off-by has to be added to assert that the
code can be contributed. It sounds like the workflow developed around
where it wasn't acceptable to force history rewriting, you suffer by
not having the SOB, but whenever possible you force a history rewrite
on the contributor just so you can add a SOB and feel good about the
fact that the SOB is added to the commit message.

Get over it. Add the fucking empty commit to show the flow of a
change. Stop forcing every fucking contributor to rebase/rewrite his
commits just so someone higher up in the food chain can wank with
their SOB line.

Everyone I talk to that contributes code to the kernel who isn't Linus
or Ted Tso complains about this, and then asks me to fucking fix it.
They want stable SHA-1s so they know their change arrived into Linus'
tree unmolested. Unfortunately, despite their volume of changes, they
aren't high enough in the food chain to be this lucky. Nope, someone
has to wank their SOB in first. And maybe fix a spelin error.
</rant>

* Re: long fsck time
From: Nguyen Thai Ngoc Duy @ 2011-11-03  1:36 UTC
  To: Jeff King; +Cc: Git Mailing List
In-Reply-To: <20111102213332.GA14108@sigill.intra.peff.net>

2011/11/3 Jeff King <peff@peff.net>:
> On Wed, Nov 02, 2011 at 07:10:26PM +0700, Nguyen Thai Ngoc Duy wrote:
>
>> On Wed, Nov 2, 2011 at 7:06 PM, Nguyen Thai Ngoc Duy <pclouds@gmail.com> wrote:
>> > On git.git
>> >
>> > $ /usr/bin/time git fsck
> >> > 333.25user 4.28system 5:37.59elapsed 99%CPU (0avgtext+0avgdata 420080maxresident)k
>> > 0inputs+0outputs (0major+726560minor)pagefaults 0swaps
>> >
>> > That's really long time, perhaps we should print progress so users
>> > know it's still running?
>>
>> Ahh.. --verbose. Sorry for the noise. Still good to show the number of
>> checked objects though.
>
> fsck --verbose is _really_ verbose. It could probably stand to have some
> progress meters sprinkled throughout. The patch below produces this on
> my git.git repo:


Yes, I wanted something like this.

>  $ git fsck
>  Checking object directories: 100% (256/256), done.
>  Verifying packs: 100% (7/7), done.
>  Checking objects (pack 1/7): 100% (241/241), done.
>  Checking objects (pack 2/7): 100% (176/176), done.
>  Checking objects (pack 3/7): 100% (312/312), done.
>  Checking objects (pack 4/7): 100% (252/252), done.
>  Checking objects (pack 5/7): 100% (353/353), done.
>  Checking objects (pack 6/7): 100% (375/375), done.
>  Checking objects (pack 7/7): 100% (171079/171079), done.

Would be better if we only output one "Checking objects" line.
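
Folding the per-pack meters into one line only requires knowing the total up front. Here is a rough sketch, assuming the per-pack object counts are known before checking starts. The helper names are made up, and the output format only imitates the lines quoted above; it is not git's progress.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Sum the per-pack object counts so a single meter can span all packs. */
static unsigned long total_objects(const unsigned long *per_pack, size_t npacks)
{
	unsigned long total = 0;
	for (size_t i = 0; i < npacks; i++)
		total += per_pack[i];
	return total;
}

/* Render one "Title: NNN% (done/total)" line, adding ", done." at the end. */
static int format_progress(char *buf, size_t size, const char *title,
			   unsigned long done, unsigned long total)
{
	assert(total > 0 && done <= total);
	unsigned percent = (unsigned)(done * 100 / total);
	return snprintf(buf, size, "%s: %3u%% (%lu/%lu)%s", title, percent,
			done, total, done == total ? ", done." : "");
}
```

With the seven per-pack counts quoted above (241 through 171079), the single meter would finish at "Checking objects: 100% (172788/172788), done."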

> which gives reasonably smooth progress. The longest hang is that
> "Verifying pack" 7 is slow (I believe it's doing a sha1 over the whole
> thing). If you really wanted to get fancy, you could probably do a
> throughput meter as we sha1 the whole contents.

I'll give it a try.

> Patch is below. It would need --{no-,}progress support on the command
> line, and to check isatty(2) before it would be acceptable.

Agreed on isatty(), though I think this output should be default (with
maybe --quiet to silence it on tty). Other messages may be prepended
with severity to indicate they are not progress output.
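
The "on by default when stderr is a terminal" behaviour discussed here reduces to a small decision. This sketch assumes a tri-state progress flag (-1 unset, 0 for --no-progress, 1 for --progress) plus a --quiet switch. The function and parameter names are illustrative and do not come from fsck itself.

```c
#include <assert.h>
#include <unistd.h>

/*
 * Hypothetical decision for showing progress meters.
 * progress_flag: -1 = not given, 0 = --no-progress, 1 = --progress.
 */
static int want_progress(int progress_flag, int quiet)
{
	assert(progress_flag >= -1 && progress_flag <= 1);
	if (quiet)
		return 0;               /* --quiet silences the meters */
	if (progress_flag >= 0)
		return progress_flag;   /* an explicit --[no-]progress wins */
	return isatty(2);               /* default: only when stderr is a tty */
}
```

In this sketch --quiet beats an explicit --progress. The opposite precedence is just as defensible.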
-- 
Duy

* Re: [git patches] libata updates, GPG signed (but see admin notes)
From: Linus Torvalds @ 2011-11-03  1:45 UTC
  To: Shawn Pearce
  Cc: Junio C Hamano, git, James Bottomley, Jeff Garzik, Andrew Morton,
	linux-ide, LKML
In-Reply-To: <CA+55aFx0oCd6-sh0psYxho-s=sHAK0RHXJHfLewRuUcdXzxZbg@mail.gmail.com>

On Wed, Nov 2, 2011 at 6:19 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> I'm not saying that you shouldn't use them - go ahead and use the
> feature if you like it. But please spare me your excuses for stupid
> workarounds that come from the fact that they aren't a good match for
> sane workflows.

Btw, having now done odd things with signed tags (because we've used
them as a side-band verification mechanism), I can certainly also say
that the signed tags have their set of problems too.

So signed tags aren't perfect. They were designed for making releases,
and that shows very clearly in how git works with them. The default
choices that git makes are very awkward indeed when you use signed
tags as "security tokens".

But unlike the "sign the commit" approach, those are implementation
and UI issues, not "fundamentally broken design" issues.

For example, fetching a single signed tag with git is surprisingly
hard. It *shouldn't* be hard - and there's no underlying technical or
design reason why it would be hard, but it is. Why? Because all the
git actions when it comes to tags are all geared towards one
particular use, that is *not* about the signature checking aspect of
them.

Here's an example: Rusty Russell now makes nice signed tags for the
things he asks me to pull, and then states them in the pull message.
So he will mention that he has a tag named

   rusty@rustcorp.com.au-v3.1-8068-g5087a50

in his git repository at

   git://github.com/rustyrussell/linux.git

and while I don't think his tag names are all that wonderful, it makes
sense from an automated script kind of standpoint.

Now, let's try to get that tag:

  [torvalds@i5 linux]$ git fetch git://github.com/rustyrussell/linux.git rusty@rustcorp.com.au-v3.1-8068-g5087a50
  fatal: Couldn't find remote ref rusty@rustcorp.com.au-v3.1-8068-g5087a50

oops. Ok, so his tag naming is *really* awkward. Whatever. Let's try again:

   [torvalds@i5 linux]$ git fetch git://github.com/rustyrussell/linux.git refs/tags/rusty@rustcorp.com.au-v3.1-8068-g5087a50
   From git://github.com/rustyrussell/linux
    * tag rusty@rustcorp.com.au-v3.1-8068-g5087a50 -> FETCH_HEAD

Ahh, success!

Oops. Nope. It turns out that git will *peel* the tag when you fetch
it, so FETCH_HEAD actually doesn't contain the tag object at all, but
the commit object that the tag pointed to. MAJOR FAIL.
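
"Peeling" means following a tag to the object it points at, repeatedly, until something that is not a tag is reached. The sketch below uses a made-up object struct (git's real structs differ) to show why a peeled FETCH_HEAD can no longer name the tag: the loop simply walks past it.

```c
#include <assert.h>
#include <stddef.h>

enum obj_type { OBJ_COMMIT, OBJ_TREE, OBJ_BLOB, OBJ_TAG };

/* Hypothetical in-memory object: a tag points at exactly one target. */
struct object {
	enum obj_type type;
	const struct object *tagged; /* only meaningful for OBJ_TAG */
};

/*
 * Follow tag -> target links until a non-tag object is reached.
 * Whatever this returns no longer carries the tag (or its signature).
 */
static const struct object *peel(const struct object *o)
{
	while (o && o->type == OBJ_TAG)
		o = o->tagged;
	return o;
}
```

A tag of a tag peels all the way down to the commit, and a commit peels to itself.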

Quite frankly, I think that's a git bug, but it's a git bug because
"git fetch" was designed to get the commit to merge. Fair enough.
Let's work around it, and rename the tag at the same time:

   [torvalds@i5 linux]$ git fetch git://github.com/rustyrussell/linux.git refs/tags/rusty@rustcorp.com.au-v3.1-8068-g5087a50:refs/tags/rusty
   From git://github.com/rustyrussell/linux
    * [new tag] rusty@rustcorp.com.au-v3.1-8068-g5087a50 -> rusty
    * [new tag] rusty@rustcorp.com.au-v3.1-2-gb1e4d20 -> rusty@rustcorp.com.au-v3.1-2-gb1e4d20
    * [new tag] rusty@rustcorp.com.au-v3.1-4896-g0acf000 -> rusty@rustcorp.com.au-v3.1-4896-g0acf000
    * [new tag] rusty@rustcorp.com.au-v3.1-8068-g5087a50 -> rusty@rustcorp.com.au-v3.1-8068-g5087a50

WTF? Now we finally *did* get the tag, and we can do

   git verify-tag rusty

and that will work. But what the hell happened? We got three other
tags too that we didn't even ask for!

So we have actual git bugs here, that relate to the fact that we've
treated signed tags specially, and have magic code to basically say
"if there's a signed tag that is reachable from the thing you pull,
and you're not just doing a temporary pull into FETCH_HEAD, we'll
fetch that signed tag too".
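
The rule as paraphrased here can be modelled in a few lines. This is a sketch of that description, not git's actual tag auto-following code. Commits are reduced to integers, and the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: commits are small integers, each tag names one. */
struct remote_tag {
	const char *name;
	int target;          /* commit the tag points at */
};

static bool in_set(const int *set, size_t n, int c)
{
	for (size_t i = 0; i < n; i++)
		if (set[i] == c)
			return true;
	return false;
}

/*
 * Rule as paraphrased above: unless the fetch only writes FETCH_HEAD,
 * also pick up every tag whose target is among the commits being fetched.
 * Returns how many tags were selected into 'out'.
 */
static size_t select_followed_tags(const struct remote_tag *tags, size_t ntags,
				   const int *fetched, size_t nfetched,
				   bool fetch_head_only,
				   const struct remote_tag **out)
{
	size_t n = 0;
	if (fetch_head_only)
		return 0;
	for (size_t i = 0; i < ntags; i++)
		if (in_set(fetched, nfetched, tags[i].target))
			out[n++] = &tags[i];
	return n;
}
```

That matches the shape of what happened above. Giving the fetch a real destination (refs/tags/rusty) switched the rule on, so other tags pointing into the fetched history came along too.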

Again - not a fundamental design mistake in the data structures, and
it actually made sense from a "signed tags are important release
points" standpoint, but it makes it *really* inconvenient to use
signed tags for signature verification.

Also, the fact that the signed tag gets peeled when we do fetch into
FETCH_HEAD also means that we can't actually save the signature in
the resulting merge commit. The merge, instead of being able to
perhaps save the information that we merged a nice trusted signed
point, only has the commit.

But practically, all of these issues should be pretty easily solvable.
So it should be quite easy to make

    git pull <repo> <tag-name>

just do the right thing - including verifying the tag, and adding the
information in the tag into the merge commit message.

So signed tags are not mis-designed from a conceptual standpoint -
they just work really really awkwardly right now for what the kernel
would like to do with them.

With a few UI fixes, I think the signed tag thing would "just work".

That said, I do think that the "signature in the pull request" should
also "just work", and I'm not entirely sure which one is better. It
might be more convenient to get the signature data from the pull
request. So I'm not at all married to the notion of using signed tags
for this.

                       Linus

* Re: t5800-*.sh: Intermittent test failures
From: Junio C Hamano @ 2011-11-03  1:30 UTC
  To: Sverre Rabbelier
  Cc: Alex Riesen, Ævar Arnfjörð, Ramsay Jones,
	Jeff King, GIT Mailing-list, Jonathan Nieder
In-Reply-To: <CAGdFq_h+Hpv9perLTU2rbdT6oZ3kZy22t5nghJQeEjNGvunL+A@mail.gmail.com>

Sverre Rabbelier <srabbelier@gmail.com> writes:

> Ævar, this seems like something we could look at during the mini
> GitTogether in Amsterdam this Saturday, no?

Have fun.

I think I happened to hit this while testing today's 'pu' that hasn't been
pushed out. The process chain looks like this:

pid  command                     stuck at
4767 sh t5800-remote-helpers.sh  wait4(-1)
 4793 git push                   read(6)
  4809 git-remote-testgit        wait4(4906)
   4906 git fast-import          wait4(4912)
    4912 git-fast-import         read(0)

lr-x------ 1 junio junio 64 Nov  2 18:21 /proc/4793/fd/6 -> pipe:[133037701]
l-wx------ 1 junio junio 64 Nov  2 18:21 /proc/4793/fd/7 -> pipe:[133037700]
lr-x------ 1 junio junio 64 Nov  2 18:21 /proc/4793/fd/8 -> pipe:[133037701]
lr-x------ 1 junio junio 64 Nov  2 18:05 /proc/4809/fd/0 -> pipe:[133037700]
l-wx------ 1 junio junio 64 Nov  2 18:05 /proc/4809/fd/1 -> pipe:[133037701]
lr-x------ 1 junio junio 64 Nov  2 18:05 /proc/4906/fd/0 -> pipe:[133037700]
l-wx------ 1 junio junio 64 Nov  2 18:05 /proc/4906/fd/1 -> pipe:[133037701]
lr-x------ 1 junio junio 64 Nov  2 18:03 /proc/4912/fd/0 -> pipe:[133037700]
l-wx------ 1 junio junio 64 Nov  2 18:03 /proc/4912/fd/1 -> pipe:[133037701]

So "git push (4793)" is stuck reading from pipe:[133037701], expecting the
innermost "git-fast-import (4912)" to write to it via its standard output,
but the latter is waiting to read from pipe:[133037700], hoping the former
will write to it via its fd#7.

Does this deadlock ring a bell to anybody who's involved in these
codepaths?
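
The reasoning above can be checked mechanically against the fd listing. A read() on a pipe cannot make progress unless some process holding the write end can still run. The sketch below treats the listing as a snapshot in which the blocked processes never wake. The struct and function names are invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One row of the /proc/<pid>/fd listing: a process holding one end of a pipe. */
struct pipe_end {
	int pid;
	unsigned long pipe;   /* pipe inode number, e.g. 133037701 */
	bool write_end;       /* true for "l-wx", false for "lr-x" */
};

static bool is_blocked(int pid, const int *blocked, size_t nblocked)
{
	for (size_t i = 0; i < nblocked; i++)
		if (blocked[i] == pid)
			return true;
	return false;
}

/*
 * A reader of 'pipe' cannot make progress unless some process holding the
 * write end is still able to run (to write, or to exit and close it).
 * Return whether any such writer exists in the snapshot.
 */
static bool pipe_has_running_writer(unsigned long pipe,
				    const struct pipe_end *ends, size_t nends,
				    const int *blocked, size_t nblocked)
{
	for (size_t i = 0; i < nends; i++)
		if (ends[i].pipe == pipe && ends[i].write_end &&
		    !is_blocked(ends[i].pid, blocked, nblocked))
			return true;
	return false;
}
```

Mark all five processes from the table as blocked. Then neither pipe:[133037701] nor pipe:[133037700] has a running writer, which is the cycle described above.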

* Re: [git patches] libata updates, GPG signed (but see admin notes)
From: Linus Torvalds @ 2011-11-03  1:19 UTC
  To: Shawn Pearce
  Cc: Junio C Hamano, git, James Bottomley, Jeff Garzik, Andrew Morton,
	linux-ide, LKML
In-Reply-To: <CAJo=hJv5nAKH_ptYSWfMvFQv0Dj+naPXK35wSzKYkfPOYsWkxg@mail.gmail.com>

On Wed, Nov 2, 2011 at 6:02 PM, Shawn Pearce <spearce@spearce.org> wrote:
>>
>> So I really think that signing the top commit itself is fundamentally wrong.
>
> I really disagree. I like the signed commit approach.

If you like it so much, go ahead and use them.

But stop with the crazy excuses for the downsides. I explained exactly
why amending is stupid and wrong, and why empty commits are f*cking
moronic. But even apart from the *technical* problems with the stupid
mis-designed feature, I explained why it was fundamentally broken from
a workflow standpoint too.

I'm not saying that you shouldn't use them - go ahead and use the
feature if you like it. But please spare me your excuses for stupid
workarounds that come from the fact that they aren't a good match for
sane workflows.

                       Linus

* Re: [git patches] libata updates, GPG signed (but see admin notes)
From: Shawn Pearce @ 2011-11-03  1:02 UTC
  To: Linus Torvalds
  Cc: Junio C Hamano, git, James Bottomley, Jeff Garzik, Andrew Morton,
	linux-ide, LKML
In-Reply-To: <CA+55aFz7TeQQH3D4Tpp31cZYZoQKeK37jouo+2Kh61Wa07knfw@mail.gmail.com>

On Wed, Nov 2, 2011 at 13:04, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Tue, Nov 1, 2011 at 2:56 PM, Junio C Hamano <gitster@pobox.com> wrote:
>>
>> But on the other hand, in many ways, publishing your commit to the outside
>> world, not necessarily for getting pulled into the final destination
>> (i.e. your tree) but merely for other people to try it out, is the point
>> of no return (aka "don't rewind or rebase once you publish").  "pushing
>> out" might be less special than "please pull", but it still is special.
>
> So I really think that signing the top commit itself is fundamentally wrong.

I really disagree. I like the signed commit approach. It allows for a
lot more workflows than just providing a way for you to validate a
pull from a trusted lieutenant. Debian/Gentoo folks want a way to sign
every commit in their workflow. Just because you don't want that and
think it's crazy doesn't mean it's not a valid workflow for that
community, or that it's something Git shouldn't support. I never use `git
stash`. I hate the damn command. Yet it's still there. I just choose
not to use it. Junio's gpgsig header on each commit is also optional,
and communities/contributors can choose to use (or ignore) the feature
as they need to.

> That commit may not even be *yours*. You may have pulled it from a
> sub-lieutenant as a fast-forward, or similar. Amending it later would
> be actively very very *wrong*.

Obviously you shouldn't amend a commit that would otherwise be a
fast-forward. But why not write a new empty signed commit on top, and
teach `git log` without the verify signatures flag to skip over
commits that have a gpgsig header line, have exactly one parent, and
whose parent tree matches the commit's own tree? This removes these
commits from the normal `git log` revision output, but yet the flow of
changes is still very visible within the history.
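
The filter proposed here is a simple predicate over a parsed commit. Below is a sketch with a hypothetical struct. No such `git log` option exists, and comparing tree ids as hex strings is an assumption made for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical summary of a parsed commit; ids are hex strings. */
struct commit_info {
	bool has_gpgsig;             /* commit header carries a gpgsig line */
	int nparents;
	const char *tree;            /* this commit's tree id */
	const char *parent_tree;     /* tree of the only parent, if any */
};

/*
 * Proposed skip rule: hide a commit from plain `git log` when it is
 * signed, has exactly one parent, and records the same tree as that
 * parent (i.e. it is an empty, signature-only commit).
 */
static bool is_signature_only_commit(const struct commit_info *c)
{
	return c->has_gpgsig &&
	       c->nparents == 1 &&
	       c->parent_tree != NULL &&
	       strcmp(c->tree, c->parent_tree) == 0;
}
```

Merges are left visible by this predicate. Like --no-merges, hiding them would be a separate switch.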

As I understand it, the point of multiple Signed-off-by lines in
commit message bodies is to show the flow of a change, who reviewed
and applied a given commit, until it finally lands in a tree where its
commit SHA-1 is frozen in stone and you can later pull it. The empty
signed commit on top of a fast-forward provides that same flow of a
change, readily visible with standard `git log` tools, but doesn't
have to clutter up history if we teach log how to skip this particular
type. Similar to the --no-merges way to skip merges. :-)

> So quite frankly, I think the stuff in pu (or next?) is completely
> mis-designed. Doing it in the commit is wrong for fundamental reasons,
> which all boil down to a simple issue:

Totally disagree. I'm really in favor of embedding these into the
commit headers the way Junio has done.

>  - you absolutely *need* to add the signature later. You *cannot* do
> it at "git commit" time.

Why can't you add it at commit time? What is stopping me from running
`git commit -S` every time I make a commit? Is it that my fingers will
wear out more quickly because I have to type my pass-phrase too often?

What is wrong with making a signed commit on a commit I have a high
level of confidence in, but not signing the others? In my own workflow
I make a lot of commit --amends / rebases until I am pretty confident
in the code being written and organized the way I think it should be
for distribution to others. But at some point in that workflow I'm
doing an --amend or a rebase to make that last final touch, and during
that commit I can add -S to make it signed, because I'm pretty certain
it's ready to go. At that point, barring some horrific bug or reviewer
comments, I am unlikely to change the commit. I know at the time I
make that commit that I am pretty confident in the commit, so I take
the extra few key strokes to sign it.

> That's a fundamental issue both from a "workflow model" issue (ie you
> want to sign stuff after it has passed testing etc,

Why do I have to wait until it's tested to sign it? The gpgsig
signature isn't any more special than the Signed-off-by line I put
into my commit message to agree to the developer's certificate of
origin, nor is it any more special than the committer line in the
commit header. It's just a statement on the commit that I have a
reasonable enough confidence in the value of this particular commit
and its ancestors that I should take the time to unlock my GPG key and
sign the content in case I do distribute this to others.

If you are going to spend time testing a commit, it's probably going to
take longer to perform that testing than it is to perform the GPG key
unlock and signature. So why are you complaining about the time it
takes to sign something you think is worthy of testing?  If the tests
fail, you'll need to rewind/amend/whatever to address the breakage. If
the tests pass, the commit is already signed and ready for
distribution. If you are spending a lot of time signing commits that
are highly likely to fail tests, well, maybe you should look at other
ways to improve your workflow so that you have a higher level of
confidence in the code you record and assume will be a permanent part
of the project's history.

> but you may need
> to commit it in order to *get* testing),

Maybe consider allowing a ".dirty" suffix like git-core does on
builds? Or if you are submitting the code to a remote test cluster
that auto-compiles the code for you (and that is why you need a
commit), it sounds like the time it takes for that to push, compile,
test, and report back is way higher than the time it takes to make the
signature. So you probably should only be submitting something that
you had a reasonable level of confidence in. So you should go ahead
and sign it before sending it for testing, in case the tests do pass
and you want to publish that commit.

> as well as from a
> "fundamental git datastructures" issue (ie you would want to sign
> commits that aren't yours.

Sure. But this is why you can make an empty commit and sign that.

> "git commit --amend" is not the answer - that destroys the fundamental
> concept of history being immutable, and while it works for your local
> commits, it doesn't work for anybody elses commits, or for stuff you
> already pushed out.

Nobody said you had to amend everything. You can add an empty commit.

> And "add a fake empty commit just for the signature" is not the answer
> either - because that is clearly inferior to the tags we already had.

Really? I disagree. The commit DAG scales quite well. The tag
namespace does not. A refs/signatures/$COMMIT_SHA1 namespace also does
not scale well.

An empty commit with a gpgsig header has about the same object cost as
an annotated tag once packed. But it has the advantage that the damn
thing doesn't clog up the reference space, the reference handling
code, or the advertisements in the native protocol. As history goes
on, older signatures are less relevant, and automatically are
avoided/skipped/bypassed by the normal DAG walking code. Tags don't do
this well because they have no relationship to the project history.

The only downside to an empty commit with the gpgsig header is I
cannot grab an arbitrarily deep ancestor and say "Who has signed a
commit that depends on this?" Today we already have this with git
describe --contains (aka git name-rev) for annotated tags. It's a new
feature we have to teach to some part of the log machinery, but the
algorithm will be easier because it doesn't have to mess with the
mapping table of tag objects. It just has to start digging from roots,
remembering each commit that has a gpgsig on any given branch path,
and then outputting the matches when it finds the commit in question.
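
Here is a brute-force version of that query, for illustration only. Instead of the root-first walk described above, it checks each signed commit in turn to see whether the target is among its ancestors. The toy DAG layout (integer ids, at most two parents, a fixed maximum size) is an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_COMMITS 64
#define MAX_PARENTS 2

/* Hypothetical toy DAG: commit i has up to two parents (-1 = none). */
struct dag {
	size_t n;
	int parent[MAX_COMMITS][MAX_PARENTS];
	bool signed_commit[MAX_COMMITS];   /* carries a gpgsig header */
};

/* Depth-first search: is 'target' equal to, or an ancestor of, 'from'? */
static bool reaches(const struct dag *d, int from, int target, bool *seen)
{
	if (from == target)
		return true;
	if (seen[from])
		return false;
	seen[from] = true;
	for (int k = 0; k < MAX_PARENTS; k++) {
		int p = d->parent[from][k];
		if (p >= 0 && reaches(d, p, target, seen))
			return true;
	}
	return false;
}

/* Collect every signed commit whose history contains 'target'. */
static size_t signed_commits_containing(const struct dag *d, int target, int *out)
{
	size_t n = 0;
	for (size_t c = 0; c < d->n; c++) {
		bool seen[MAX_COMMITS] = { false };
		if (d->signed_commit[c] && reaches(d, (int)c, target, seen))
			out[n++] = (int)c;
	}
	return n;
}
```

This costs one history walk per signed commit. A single walk that carries signed commits along, as suggested above, would avoid re-visiting shared history.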

The commit approach also has the advantage that your tree
automatically carries any lieutenant's signatures, by virtue of them
already being frozen in the commits.  This allows anyone downstream of
you to verify the same signatures, and check them against their own
keyring contents. If the signatures are all detached in some transient
annotated tag space, its impossible for anyone other than you to
verify pull requests. I would hate to say we have this nice
distributed version control system, but only Linus can prove the pull
requests in his repository are what they claim, and we have to then
implicitly trust you to resign that data without the original
signatures being present. $DAY_JOB would feel a lot better about the
integrity of the Linux kernel repository if _ANYONE_ can validate pull
requests offline after they have happened.

> I dunno. Did I miss something? As far as I can tell, the signed tags
> that we've had since day one are *clearly* much better in very
> fundamental ways.

Completely disagree. :-)

* Re: New Feature wanted: Is it possible to let git clone continue last break point?
From: Shawn Pearce @ 2011-11-03  0:06 UTC
  To: Jeff King
  Cc: Junio C Hamano, Jonathan Nieder, netroby, Git Mail List,
	Tomas Carnecky
In-Reply-To: <20111102232735.GA17466@sigill.intra.peff.net>

On Wed, Nov 2, 2011 at 16:27, Jeff King <peff@peff.net> wrote:
> On Wed, Nov 02, 2011 at 03:41:36PM -0700, Junio C Hamano wrote:
>> Jeff King <peff@peff.net> writes:
>>
>> > Which is all a roundabout way of saying that the git protocol is really
>> > the sane way to do efficient transfers. An alternative, much simpler
>> > scheme would be for the server to just say:
>> >
>> >   - if you have nothing, then prime with URL http://host/bundle
>> >
>> > And then _only_ clone would bother with checking mirrors. People doing
>> > fetch would be expected to do it often enough that not being resumable
>> > isn't a big deal.
>>
>> I think that is a sensible place to start.

Yup, I agree. The "repo" tool used by Android does this in Python
right now[1].  It's a simple hack: if the protocol is HTTP or HTTPS the
client first tries to download $URL/clone.bundle. My servers have
rules that trap on */clone.bundle and issue an HTTP 302 Found response
to direct the client to a CDN. Works. :-)
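
The client side of that hack amounts to deriving one extra URL. Below is a C re-sketch; repo itself is written in Python. The trailing-slash handling is an assumption of this sketch, not necessarily what repo does, and the server-side 302 redirect is not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Only for http:// or https:// remotes, build "<url>/clone.bundle" to be
 * tried before a normal clone. Returns false when the scheme does not
 * qualify or the output buffer is too small.
 */
static bool clone_bundle_url(const char *url, char *out, size_t size)
{
	if (strncmp(url, "http://", 7) != 0 && strncmp(url, "https://", 8) != 0)
		return false;
	size_t len = strlen(url);
	/* Avoid a double slash if the remote URL already ends in '/'. */
	const char *sep = (len > 0 && url[len - 1] == '/') ? "" : "/";
	int n = snprintf(out, size, "%s%sclone.bundle", url, sep);
	return n >= 0 && (size_t)n < size;
}
```

A client would request that URL first, let the HTTP layer follow any redirect, and presumably fall back to an ordinary clone when the bundle is missing.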

[1] http://code.google.com/p/git-repo/source/detail?r=f322b9abb4cadc67b991baf6ba1b9f2fbd5d7812&name=stable

> OK. That had been my original intent, but somebody (you?) mentioned the
> "if you have X" thing at the GitTogether, which got me thinking.
>
> I don't mind starting slow, as long as we don't paint ourselves into a
> corner for future expansion. I'll try to design the data format for
> specifying the mirror locations with that extension in mind.

Right. Aside from the fact that $URL/clone.bundle is perhaps a bad way
to decide on the URL to actually fetch (and isn't supportable over
git:// or ssh://)... we should start with the clone case and worry
about incremental updates later.

> Even if the bundle thing ends up too wasteful, it may still be useful to
> offer a "if you don't have X, go see Y" type of mirror when "Y" is
> something efficient, like git:// at a faster host (i.e., the "I built 3
> commits on top of Linus" case).

Actually, I really think the bundle thing is wasteful. It's a ton of
additional disk. Hosts like kernel.org want to use sendfile() when
possible to handle bulk transfers. git:// is not efficient for them
because we don't have sendfile() capability.

It's also expensive for kernel.org to create each Git repository twice
on disk. The disk is cheap. It's the kernel buffer cache that is damned
expensive. Assume for a minute that Linus' kernel repository is a
popular thing to access. If 400M of that history is available in a
normal pack file on disk, and again 400M is available as a "clone
bundle thingy", kernel.org now has to eat 800M of disk buffer cache
for that one Git repository, because both of those files are going to
be hot.

I think I messed up with "repo" using a Git bundle file as its data
source. What we should have done was a bog standard pack file. Then
the client can download the pack file into the .git/objects/pack
directory and just generate the index, reusing the entire dumb
protocol transport logic. It also allows the server to pass out the
same file the server retains for the repository itself, and thus makes
the disk buffer cache only 400M for Linus' repository.

> Agreed. I was really trying to avoid protocol extensions, though, at
> least for an initial version. I'd like to see how far we can get doing
> the simplest thing.

One (maybe dumb) idea I had was making the $GIT_DIR/objects/info/packs
file contain other lines to list reference tips at the time the pack
was made. The client just needs the SHA-1s, it doesn't necessarily
need the branch names themselves. A client could initialize itself by
getting this set of references, creating temporary dummy references at
those SHA-1s, and downloading the corresponding pack file, indexing
it, then resuming with a normal fetch.
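A minimal sketch of how a client might read such an extended objects/info/packs file. The "P pack-<sha1>.pack" lines are the existing format; the "T <sha1>" tip lines are purely hypothetical, since no syntax for them has been specified. The fixed-size arrays are a simplification.

```c
#include <string.h>

#define MAX_ENTRIES 64

struct info_packs {
	char packs[MAX_ENTRIES][80];	/* "pack-<sha1>.pack" names from "P" lines */
	int nr_packs;
	char tips[MAX_ENTRIES][41];	/* 40-hex SHA-1s from HYPOTHETICAL "T" lines */
	int nr_tips;
};

/* Parse the text of objects/info/packs. "P <name>" lines are the existing
 * format; "T <sha1>" lines are a HYPOTHETICAL extension listing the ref
 * tips current when the pack was made. Unknown lines are ignored, so an
 * older reader would simply skip the new lines. */
static void parse_info_packs(const char *text, struct info_packs *out)
{
	memset(out, 0, sizeof(*out));
	while (*text) {
		const char *eol = strchr(text, '\n');
		size_t len = eol ? (size_t)(eol - text) : strlen(text);

		if (len > 2 && text[0] == 'P' && text[1] == ' ' &&
		    out->nr_packs < MAX_ENTRIES &&
		    len - 2 < sizeof(out->packs[0])) {
			memcpy(out->packs[out->nr_packs], text + 2, len - 2);
			out->packs[out->nr_packs][len - 2] = '\0';
			out->nr_packs++;
		} else if (len == 42 && text[0] == 'T' && text[1] == ' ' &&
			   out->nr_tips < MAX_ENTRIES) {
			memcpy(out->tips[out->nr_tips], text + 2, 40);
			out->tips[out->nr_tips][40] = '\0';
			out->nr_tips++;
		}
		text += len;
		if (*text == '\n')
			text++;
	}
}
```

The client would then create its temporary dummy refs at each tip, download and index the listed packs, and continue with an ordinary fetch.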

Then we wind up with a git:// or ssh:// protocol extension that
enables sendfile() on an entire pack, and to provide the matching
objects/info/packs data to help a client over git:// or ssh://
initialize off the existing pack files.


Obviously there is the existing security feature that over git:// or
ssh:// (or even smart HTTP), a deleted or rewound reference stops
exposing the content in the repository that isn't reachable from the
other reference tips. The repository owner / server administrator will
have to make a choice here: either the existing packs are not exposed
as available via sendfile() until after GC can be run to rebuild them
around the right content set, or they are exposed and the time to
expunge/hide an unreferenced object is expanded until the GC completes
(rather than being immediate after the reference updates).

But either way, I like the idea of coupling the "resumable pack
download" to the *existing* pack files, because this is easy to deal
with. If you do have a rewind/delete and need to expunge content,
users/administrators already know how to run `git gc --expire=now` to
accomplish a full erase. Adding another thing with bundle files
somewhere else that may or may not contain the data you want to erase
and remembering to clean that up is not a good idea.

^ permalink raw reply

* Re: [git patches] libata updates, GPG signed (but see admin notes)
From: Linus Torvalds @ 2011-11-02 23:42 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git, James Bottomley, Jeff Garzik, Andrew Morton, linux-ide, LKML
In-Reply-To: <7vsjm6gkte.fsf@alter.siamese.dyndns.org>

On Wed, Nov 2, 2011 at 4:34 PM, Junio C Hamano <gitster@pobox.com> wrote:
>
> You keep saying cut-and-paste, but do you mind feeding the e-mail text
> itself to a tool, instead of cut-and-paste?

Feeding the email to a tool is actually a fair amount of extra work.
It would have worked well in the days when I used text-based email
clients that just had a "pipe email to command" model, but that's long
gone.

In contrast, cut-and-paste to another program is easy - but then you
really can't depend on whitespace or headers or other subtle things.

> A respond-to-request-pull wrapper you would use could be:
>
>  - Get the e-mail from the standard input;
>  - Pick up the signed bits and validate the signature;
>  - Perform the requested fetch; and
>  - Record the merge (or prepare .git/MERGE_MSG) with both the signed bits.

So is there any reason this couldn't be cut-and-paste? Make the signed
part small (*not* including diffstat and shortlog), and make it
whitespace-safe, and I wouldn't mind a tool at all.

If it *can* take the whole email, that would probably be a good design
(so that a "pipe email to command" model would still work), but it
would be much better if it doesn't require it.

> and the "signed bits" could include:
>
>   - the repository and the branch you were expected to pull;
>   - the topic description.
>
> among other things the requestor can edit when request-pull message is
> prepared.

One thing I'd like is that it would also fire up an editor for the
merge, even if it gets the topic description from the email or
cut-and-paste. I often want to fix up people's grammar etc. That's a
separate argument for trying to keep the signed part minimal - because
I really don't want to have to maintain spelin errors just because
they are part of what was signed.

                  Linus

^ permalink raw reply

* Re: [git patches] libata updates, GPG signed (but see admin notes)
From: david @ 2011-11-02 23:41 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Linus Torvalds, git, James Bottomley, Jeff Garzik, Andrew Morton,
	linux-ide, LKML
In-Reply-To: <7vsjm6gkte.fsf@alter.siamese.dyndns.org>

On Wed, 2 Nov 2011, Junio C Hamano wrote:

> Linus Torvalds <torvalds@linux-foundation.org> writes:
>
>> I hate how anonymous our branches are. Sure, we can use good names for
>> them, but it was a mistake to think we should describe the repository
>> (for gitweb), rather than the branch.
>>
>> Ok, "hate" is a strong word. I don't "hate" it. I don't even think
>> it's a major design issue. But I do think that it would have been
>> nicer if we had had some branch description model.
>> ...
>> Maybe just verifying the email message (with the suggested kind of
>> change to "git request-pull") is actually the right approach. And what
>> I should do is to just wrap my "git pull" in some script that I can
>> just cut-and-paste the gpg-signed thing into, and which just does the
>> "gpg --verify" on it, and then does the "git pull" after that.
>>
>> Because in many ways, "git request-pull" is when you do want to sign
>> stuff. A developer might well want to push out his stuff for some
>> random internal testing (linux-next, for example), and then only later
>> decide "Ok, it was all good, now I want to make it 'official' and ask
>> Linus to pull it", and sign it at *that* time, rather than when
>> actually pushing it out.
>
> You keep saying cut-and-paste, but do you mind feeding the e-mail text
> itself to a tool, instead of cut-and-paste?

Think webmail (i.e. gmail): to feed the e-mail itself to a tool you either
need to cut-n-paste the entire e-mail or you have to first save the mail
to a text file. Both of these are significantly harder than doing a
cut-n-paste of a portion of the message.

David Lang


^ permalink raw reply

* Re: Fork freedesktop project to bitbucket, make changes, generate patch back to freedesktop?
From: Jeff King @ 2011-11-02 23:37 UTC (permalink / raw)
  To: Alec Taylor; +Cc: git
In-Reply-To: <CAO+9iGeHSsJz+7=N0BzmGGbkGN1P=CyNvxJWO_1nCNjiZZzetA@mail.gmail.com>

On Sat, Oct 29, 2011 at 02:35:42PM +1100, Alec Taylor wrote:

> I am working with a team extending the functionality of this project.
> 
> After many MANY adds, commits and pushes back and forth on the
> bitbucket project, we then want to send this freedesktop project a
> PATCH with the changes we've made.
> 
> Can you tell me the command I need to do this?

Do you want to send them one patch, or a series of patches?

If one, then you probably want to diff off of some known point (either
their current branch tip, or maybe some recently released version). And
then send them the resulting diff in an email. You can just use "git
diff" for this if you want, and include it in an email, or you can
actually create a new "squashed" commit in git, like this:

  git checkout v1.0 ;# or wherever you think they would want to apply
  git merge --squash your-branch
  git commit

and then use "git format-patch" to create a patch (and optionally
git-send-email to send it).

If you want to share the whole series, you can use format-patch to
create the series, but note that a patch series can only represent a
linear history. If you have a lot of merges from pushing back and forth,
you may want to linearize it first using "git rebase -i".

That's just a high level overview of what you'll need. You can try
reading up on those commands to get a better sense of exactly how you
want to proceed, or if you have more specific questions, ask.

-Peff

^ permalink raw reply

* Re: t5800-*.sh: Intermittent test failures
From: Sverre Rabbelier @ 2011-11-02 23:35 UTC (permalink / raw)
  To: Alex Riesen, Ævar Arnfjörð
  Cc: Junio C Hamano, Ramsay Jones, Jeff King, GIT Mailing-list,
	Jonathan Nieder
In-Reply-To: <CALxABCbKSi-aHezjyn5wJ0-BPW1PvvaC2i9VeV7yXOf4yCdx4Q@mail.gmail.com>

Heya,

On Wed, Nov 2, 2011 at 00:02, Alex Riesen <raa.lkml@gmail.com> wrote:
> On Tue, Nov 1, 2011 at 23:18, Junio C Hamano <gitster@pobox.com> wrote:
>> Alex Riesen <raa.lkml@gmail.com> writes:
>>
>>> On Sun, Sep 11, 2011 at 21:14, Ramsay Jones <ramsay@ramsay1.demon.co.uk> wrote:
>>>> ... these hangs *are* the failures of which I speak!  Yes, the script
>>>> doesn't get to declare a failure, but AFAIAC a hanging test (and it
>>>> isn't the same test # each time) is a failing test. :-D
>>>
>>> Was there any outcome of this discussion? I'm asking because I
>>> can reproduce this very reliably on a little server here.
>>
>> I do remember this discussion and recall seeing _no_ outcome.
>>
>> I did see the hang myself once or twice but did not and do not have a
>> reliable reproduction. I have been waiting for somebody to raise the issue
>> again ;-).
>>
>
> I think I managed to bisect it (between 1.7.6 and 1.7.7):
>
> $ git bisect start v1.7.7 v1.7.6
> ...
> $ git bisect good
> a515ebe9f1ac9bc248c12a291dc008570de505ca is the first bad commit
> commit a515ebe9f1ac9bc248c12a291dc008570de505ca
> Author: Sverre Rabbelier <srabbelier@gmail.com>
> Date:   Sat Jul 16 15:03:40 2011 +0200
>
>    transport-helper: implement marks location as capability
>
>    Now that the gitdir location is exported as an environment variable
>    this can be implemented elegantly without requiring any explicit
>    flushes nor an ad-hoc exchange of values.
>
>    Signed-off-by: Sverre Rabbelier <srabbelier@gmail.com>
>    Acked-by: Jeff King <peff@peff.net>
>    Signed-off-by: Junio C Hamano <gitster@pobox.com>
>
> :100644 100644 1ed7a5651ef5a2320c56856b5a1fe784e178ab23
> e9c832bfd3da7db771cc2113027d3e590dc51d59 M      git-remote-testgit.py
> :100644 100644 0cfc9ae9059ce121b567406d7941b71cd54b961c
> 74c3122df1835c45a6b621205fb18b4fc89af366 M      transport-helper.c
>
> Sadly, I won't be able to repeat the test for about 20 hours.

Ævar, this seems like something we could look at during the mini
GitTogether in Amsterdam this Saturday, no?

-- 
Cheers,

Sverre Rabbelier

^ permalink raw reply

* Re: [git patches] libata updates, GPG signed (but see admin notes)
From: Junio C Hamano @ 2011-11-02 23:34 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: git, James Bottomley, Jeff Garzik, Andrew Morton, linux-ide, LKML
In-Reply-To: <CA+55aFx_rAA6TJkZn1Zvu6u9UjxnmTVt0HpMnvaE_q9Sx-jzPg@mail.gmail.com>

Linus Torvalds <torvalds@linux-foundation.org> writes:

> I hate how anonymous our branches are. Sure, we can use good names for
> them, but it was a mistake to think we should describe the repository
> (for gitweb), rather than the branch.
>
> Ok, "hate" is a strong word. I don't "hate" it. I don't even think
> it's a major design issue. But I do think that it would have been
> nicer if we had had some branch description model.
> ...
> Maybe just verifying the email message (with the suggested kind of
> change to "git request-pull") is actually the right approach. And what
> I should do is to just wrap my "git pull" in some script that I can
> just cut-and-paste the gpg-signed thing into, and which just does the
> "gpg --verify" on it, and then does the "git pull" after that.
>
> Because in many ways, "git request-pull" is when you do want to sign
> stuff. A developer might well want to push out his stuff for some
> random internal testing (linux-next, for example), and then only later
> decide "Ok, it was all good, now I want to make it 'official' and ask
> Linus to pull it", and sign it at *that* time, rather than when
> actually pushing it out.

You keep saying cut-and-paste, but do you mind feeding the e-mail text
itself to a tool, instead of cut-and-paste?

The reason I am wondering about this is that in another topic (also
cooking in 'next') there is extended support for a topic description for
the branch, stating what the purpose of the topic is and why the requestor
wants you to have it (this information can be set and updated with "git
branch --edit-description").

A respond-to-request-pull wrapper you would use could be:

 - Get the e-mail from the standard input;
 - Pick up the signed bits and validate the signature;
 - Perform the requested fetch; and
 - Record the merge (or prepare .git/MERGE_MSG) with both the signed bits.

and the "signed bits" could include:

   - the repository and the branch you were expected to pull;
   - the topic description.

among other things the requestor can edit when request-pull message is
prepared.

That would get us back to your "the lieutenant tip is not so special, but
the merge commit the integrator makes using that tip has the signature for
this particular pull" model.
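One step of the wrapper described above, picking out the signed bits, can be sketched as follows. This assumes the signed part is an OpenPGP cleartext-signed message (the "-----BEGIN PGP SIGNED MESSAGE-----" framing that gpg --clearsign produces). extract_signed_block() is a hypothetical name, and verification itself is left to gpg.

```c
#include <stdlib.h>
#include <string.h>

/* Return a malloc'd copy of the first OpenPGP cleartext-signed block in
 * `mail`, from "-----BEGIN PGP SIGNED MESSAGE-----" through the line
 * "-----END PGP SIGNATURE-----", or NULL if there is none. Everything
 * outside the block (headers, diffstat, shortlog) is dropped. */
static char *extract_signed_block(const char *mail)
{
	static const char begin[] = "-----BEGIN PGP SIGNED MESSAGE-----";
	static const char end[] = "-----END PGP SIGNATURE-----";
	const char *start = strstr(mail, begin);
	const char *stop;
	size_t len;
	char *out;

	if (!start)
		return NULL;
	stop = strstr(start, end);
	if (!stop)
		return NULL;
	stop += strlen(end);
	if (*stop == '\n')
		stop++;		/* keep the terminating newline */
	len = (size_t)(stop - start);
	out = malloc(len + 1);
	if (!out)
		return NULL;
	memcpy(out, start, len);
	out[len] = '\0';
	return out;
}
```

The returned text could then be saved to a temporary file and checked with "gpg --verify" before the requested fetch is performed. Because the function only looks for the markers, it behaves the same whether it is fed the whole e-mail on standard input or a cut-and-paste of just the signed part.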

^ permalink raw reply

* Re: New Feature wanted: Is it possible to let git clone continue last break point?
From: Jeff King @ 2011-11-02 23:27 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jonathan Nieder, netroby, Git Mail List, Tomas Carnecky
In-Reply-To: <7vwrbigna7.fsf@alter.siamese.dyndns.org>

On Wed, Nov 02, 2011 at 03:41:36PM -0700, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > Which is all a roundabout way of saying that the git protocol is really
> > the sane way to do efficient transfers. An alternative, much simpler
> > scheme would be for the server to just say:
> >
> >   - if you have nothing, then prime with URL http://host/bundle
> >
> > And then _only_ clone would bother with checking mirrors. People doing
> > fetch would be expected to do it often enough that not being resumable
> > isn't a big deal.
> 
> I think that is a sensible place to start.

OK. That had been my original intent, but somebody (you?) mentioned the
"if you have X" thing at the GitTogether, which got me thinking.

I don't mind starting slow, as long as we don't paint ourselves into a
corner for future expansion. I'll try to design the data format for
specifying the mirror locations with that extension in mind.

Even if the bundle thing ends up too wasteful, it may still be useful to
offer a "if you don't have X, go see Y" type of mirror when "Y" is
something efficient, like git:// at a faster host (i.e., the "I built 3
commits on top of Linus" case).

> A more fancy conditional "If you have X then fetch this, if you have Y
> fetch that, ..." sounds nice but depending on what branch you are fetching
> the answer has to be different. If we were to do that, the natural place
> for the server to give the redirect instruction to the client is after the
> client finishes saying "want", and before the client starts saying "have".

Agreed. I was really trying to avoid protocol extensions, though, at
least for an initial version. I'd like to see how far we can get doing
the simplest thing.

-Peff

^ permalink raw reply

* Re: git-p4: problem with commit 97a21ca50ef8
From: Michael Wookey @ 2011-11-02 22:42 UTC (permalink / raw)
  To: Vitor Antunes; +Cc: Pete Wyckoff, Git Mailing List, Luke Diamand
In-Reply-To: <loom.20111102T153631-769@post.gmane.org>

On 3 November 2011 01:43, Vitor Antunes <vitor.hda@gmail.com> wrote:
> Michael Wookey <michaelwookey <at> gmail.com> writes:
>> Of course, I'd love to have git-p4 work seamlessly for this scenario.
>> Even Perforce have a KB article on the limitation of the "apple"
>> filetype with git-p4:
>>
>>   http://kb.perforce.com/article/1417/git-p4
>>
> """
> Step 2: Download Git-p4
>
> Recommended version is ermshiperete’s branch, which is available from:
>
> https://github.com/ermshiperete/git-p4
>
> Note: Omit the “git-p4.py25” file, which is an older version that is no longer
> needed.
> Avoid Kernel.org’s Version of Git-p4
>
> Git’s main source at http://git-scm.com/download and
> http://www.kernel.org/pub/software/scm/git/ contains an older version of Git-p4
> with limitations that ermshiperete’s branch avoids.
> """
>
> I can almost guess _who_ wrote this KB ;)
>
> But this is really frustrating. Why can't people just cooperate to make sure the
> version in the main branch is the latest?

I tried your suggested version of git-p4 (at rev 630fb678c46c) and
unfortunately, the perforce repository fails to import. Firstly, there
was a problem with importing UTF-16 encoded files, secondly the
"apple" filetype files are still skipped.

^ permalink raw reply

* Re: New Feature wanted: Is it possible to let git clone continue last break point?
From: Junio C Hamano @ 2011-11-02 22:41 UTC (permalink / raw)
  To: Jeff King; +Cc: Jonathan Nieder, netroby, Git Mail List, Tomas Carnecky
In-Reply-To: <20111102220614.GB14108@sigill.intra.peff.net>

Jeff King <peff@peff.net> writes:

> Which is all a roundabout way of saying that the git protocol is really
> the sane way to do efficient transfers. An alternative, much simpler
> scheme would be for the server to just say:
>
>   - if you have nothing, then prime with URL http://host/bundle
>
> And then _only_ clone would bother with checking mirrors. People doing
> fetch would be expected to do it often enough that not being resumable
> isn't a big deal.

I think that is a sensible place to start.

A more fancy conditional "If you have X then fetch this, if you have Y
fetch that, ..." sounds nice but depending on what branch you are fetching
the answer has to be different. If we were to do that, the natural place
for the server to give the redirect instruction to the client is after the
client finishes saying "want", and before the client starts saying "have".

^ permalink raw reply

* Re: [PATCH 1/2] http.c: Use curl_multi_fdset to select on curl fds instead of just sleeping
From: Daniel Stenberg @ 2011-11-02 22:40 UTC (permalink / raw)
  To: Mika Fischer; +Cc: Jeff King, git, gitster
In-Reply-To: <CAOs=hR+QqUpYuth8Uvi2o7pm1LO8ogO2pN7nrMchYj96Cutmww@mail.gmail.com>

On Wed, 2 Nov 2011, Mika Fischer wrote:

> The only problem I can see is that curl_multi_fdset is not guaranteed to 
> return any fds. So in theory it could be possible that we don't get fds, but 
> we're actually reading stuff. In this case things would get slow, because we 
> would sleep for 50ms after every read...
>
> However, I don't know if this is a case that actually comes up in the real 
> world. Maybe Daniel has some advice on this.

It doesn't really happen so it should be safe.

The case where no fds are returned is when libcurl cannot return a socket to 
wait for during name resolving (if your particular libcurl is built to use 
such a resolver backend - libcurl has several different ones). And during name 
resolving there won't be any data to read for the libcurl-app anyway.

-- 

  / daniel.haxx.se

^ permalink raw reply

* Re: [PATCH 1/2] http.c: Use curl_multi_fdset to select on curl fds instead of just sleeping
From: Mika Fischer @ 2011-11-02 22:22 UTC (permalink / raw)
  To: Jeff King; +Cc: git, gitster, daniel
In-Reply-To: <20111102203221.GB5628@sigill.intra.peff.net>

On Wed, Nov 2, 2011 at 21:32, Jeff King <peff@peff.net> wrote:
> Do we still need to care about data_received?
>
> My understanding was that the code was originally trying to do:
>
>  1. Call curl, maybe get some data.
>
>  2. If we got data, then immediately ask curl again for some data.
>
>  3. Otherwise, sleep 50ms and then ask curl again.

Yes, that's exactly what it did.

> But now that we are actually selecting on the proper descriptors, it
> should now be safe to just do:
>
>  1. Call curl, maybe get some data.
>
>  2. Call select, which will wake immediately if curl is going to get
>     data.

The only problem I can see is that curl_multi_fdset is not guaranteed
to return any fds. So in theory it could be possible that we don't get
fds, but we're actually reading stuff. In this case things would get
slow, because we would sleep for 50ms after every read...

However, I don't know if this is a case that actually comes up in the
real world. Maybe Daniel has some advice on this.
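To make the concern concrete, here is a minimal sketch of the wait step with a fallback. It assumes the descriptors were collected with curl_multi_fdset(), which reports max_fd as -1 when libcurl has no descriptors to offer; in that case there is nothing to select() on and the loop can only sleep. wait_for_transfer() is a hypothetical helper, not the code in http.c.

```c
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Wait for activity on the descriptors in `readfds` for at most
 * `timeout_ms`. If max_fd is -1 there is nothing to select on, so fall
 * back to a plain sleep of the same length (select() with no descriptors
 * is a portable sub-second sleep). Returns select()'s result. */
static int wait_for_transfer(int max_fd, fd_set *readfds, long timeout_ms)
{
	struct timeval tv;

	tv.tv_sec = timeout_ms / 1000;
	tv.tv_usec = (timeout_ms % 1000) * 1000;
	if (max_fd < 0)
		return select(0, NULL, NULL, NULL, &tv);
	return select(max_fd + 1, readfds, NULL, NULL, &tv);
}
```

With real descriptors, select() returns as soon as data is readable, so there is no fixed 50ms pause after every read; only the max_fd == -1 case degrades to the old sleep. A real loop would probably derive timeout_ms from curl_multi_timeout() rather than hard-coding it.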

Best,
 Mika

^ permalink raw reply

* Re: [ANNOUNCE] Git 1.7.7.2
From: Junio C Hamano @ 2011-11-02 22:30 UTC (permalink / raw)
  To: Stefan Roas; +Cc: git, Linux Kernel
In-Reply-To: <20111102214725.GA2860@geminga.roas.networks.roath.org>

Stefan Roas <sroas@roath.org> writes:

> is it possible that you forgot to update the GIT-VERSION-GEN with the
> release of 1.7.7.2? I still get version 1.7.7.1 from the tarball on
> http://git-scm.com/ and when building from the git repository itself.

Probably.

^ permalink raw reply

* Re: [PATCH] Escape file:// URL's to meet subversion SVN::Ra requirements
From: Eric Wong @ 2011-11-02 22:09 UTC (permalink / raw)
  To: Ben Walton; +Cc: Jonathan Nieder, git
In-Reply-To: <1320260449-sup-479@pinkfloyd.chass.utoronto.ca>

Ben Walton <bwalton@artsci.utoronto.ca> wrote:
> Sorry for the clumsy patch.

I don't have much time to help you fix it, but I got numerous errors on
SVN 1.6.x (svn 1.6.12).  Can you make sure things continue to work on
1.6 and earlier, also?

Maybe just enable the escaping for file:// URLs on SVN >= 1.7?
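As a rough illustration of that suggestion, the sketch below gates the escaping on the SVN version and percent-encodes the path. git-svn is written in Perl, so this C version only shows the logic. The set of characters left unescaped (RFC 3986 unreserved characters plus '/') is an assumption, not something established here about what SVN 1.7 requires.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Return 1 if a version string like "1.6.12" is at least major.minor. */
static int version_at_least(const char *ver, int want_major, int want_minor)
{
	int major = 0, minor = 0;

	if (sscanf(ver, "%d.%d", &major, &minor) < 2)
		return 0;
	return major > want_major ||
	       (major == want_major && minor >= want_minor);
}

/* Percent-encode `path` for use after "file://", keeping RFC 3986
 * unreserved characters and '/' as-is (an ASSUMED escaping rule).
 * Returns 0 on success, -1 if `out` is too small. */
static int escape_url_path(const char *path, char *out, size_t outsz)
{
	static const char hex[] = "0123456789ABCDEF";
	size_t n = 0;

	for (; *path; path++) {
		unsigned char c = (unsigned char)*path;

		if (isalnum(c) || strchr("-._~/", c)) {
			if (n + 1 >= outsz)
				return -1;
			out[n++] = (char)c;
		} else {
			if (n + 3 >= outsz)
				return -1;
			out[n++] = '%';
			out[n++] = hex[c >> 4];
			out[n++] = hex[c & 0xf];
		}
	}
	if (n >= outsz)
		return -1;
	out[n] = '\0';
	return 0;
}
```

A caller would escape only when version_at_least(svn_version, 1, 7) is true and pass the path through untouched otherwise, so the 1.6.x behaviour that the failing tests exercise stays as it was.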

Here are the tests that failed for me:

make[1]: *** [t9100-git-svn-basic.sh] Error 1
make[1]: *** [t9103-git-svn-tracked-directory-removed.sh] Error 1
make[1]: *** [t9104-git-svn-follow-parent.sh] Error 1
make[1]: *** [t9105-git-svn-commit-diff.sh] Error 1
make[1]: *** [t9107-git-svn-migrate.sh] Error 1
make[1]: *** [t9108-git-svn-glob.sh] Error 1
make[1]: *** [t9109-git-svn-multi-glob.sh] Error 1
make[1]: *** [t9110-git-svn-use-svm-props.sh] Error 1
make[1]: *** [t9111-git-svn-use-svnsync-props.sh] Error 1
make[1]: *** [t9114-git-svn-dcommit-merge.sh] Error 1
make[1]: *** [t9116-git-svn-log.sh] Error 1
make[1]: *** [t9117-git-svn-init-clone.sh] Error 1
make[1]: *** [t9118-git-svn-funky-branch-names.sh] Error 1
make[1]: *** [t9120-git-svn-clone-with-percent-escapes.sh] Error 1
make[1]: *** [t9125-git-svn-multi-glob-branch-names.sh] Error 1
make[1]: *** [t9127-git-svn-partial-rebuild.sh] Error 1
make[1]: *** [t9128-git-svn-cmd-branch.sh] Error 1
make[1]: *** [t9130-git-svn-authors-file.sh] Error 1
make[1]: *** [t9135-git-svn-moved-branch-empty-file.sh] Error 1
make[1]: *** [t9136-git-svn-recreated-branch-empty-file.sh] Error 1
make[1]: *** [t9141-git-svn-multiple-branches.sh] Error 1
make[1]: *** [t9145-git-svn-master-branch.sh] Error 1
make[1]: *** [t9146-git-svn-empty-dirs.sh] Error 1
make[1]: *** [t9150-svk-mergetickets.sh] Error 1
make[1]: *** [t9151-svn-mergeinfo.sh] Error 1
make[1]: *** [t9153-git-svn-rewrite-uuid.sh] Error 1
make[1]: *** [t9154-git-svn-fancy-glob.sh] Error 1
make[1]: *** [t9155-git-svn-fetch-deleted-tag.sh] Error 1
make[1]: *** [t9156-git-svn-fetch-deleted-tag-2.sh] Error 1
make[1]: *** [t9157-git-svn-fetch-merge.sh] Error 1
make[1]: *** [t9159-git-svn-no-parent-mergeinfo.sh] Error 1
make[1]: *** [t9161-git-svn-mergeinfo-push.sh] Error 1

^ permalink raw reply

* Re: New Feature wanted: Is it possible to let git clone continue last break point?
From: Jeff King @ 2011-11-02 22:06 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: netroby, Git Mail List, Tomas Carnecky
In-Reply-To: <20111031090717.GA24978@elie.hsd1.il.comcast.net>

On Mon, Oct 31, 2011 at 04:07:18AM -0500, Jonathan Nieder wrote:

> Something like Jeff's "priming the well with a server-specified
> bundle" proposal[2] might be a good way to make the same trick
> transparent to clients in the future.

Yes, that is one of the use cases I hope to address. But it will require
the publisher specifying a mirror location (it's possible we could add
some kind of automagic "hit a bundler service first" config option,
though I fear that the existing small-time bundler services would
crumble under the load).

So in the general case (and in the meantime), you may have to learn to
manually prime the repo using a bundle.

I haven't started on the patches for communicating mirror sites between
the server and client, but I did just write some patches to handle "git
fetch http://host/path/to/file.bundle" automatically, which is the first
step. They need a few finishing touches and some testing, though.
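For reference, such a fetch has to read the bundle header before handing the rest to the pack code. A v2 bundle starts with a "# v2 git bundle" line, followed by prerequisite lines ("-" then an object name and an optional comment) and reference lines (object name, space, refname), ended by an empty line. Below is a minimal sketch of a parser, assuming 40-hex SHA-1 object names and not validating the hex digits. parse_bundle_header() is a hypothetical name, not the function in git's bundle.c.

```c
#include <string.h>

/* Parse the header of a v2 bundle held in memory. Stores the number of
 * prerequisite and reference lines through the out-params and returns the
 * offset at which the pack data starts, or -1 if the header is malformed. */
static long parse_bundle_header(const char *buf, size_t len,
				int *nr_prereqs, int *nr_refs)
{
	static const char sig[] = "# v2 git bundle\n";
	size_t pos = sizeof(sig) - 1;

	*nr_prereqs = *nr_refs = 0;
	if (len < pos || memcmp(buf, sig, pos))
		return -1;
	while (pos < len) {
		const char *line = buf + pos;
		const char *eol = memchr(line, '\n', len - pos);
		size_t linelen;

		if (!eol)
			return -1;	/* header must end with an empty line */
		linelen = (size_t)(eol - line);
		pos += linelen + 1;
		if (linelen == 0)
			return (long)pos;	/* pack data follows */
		if (line[0] == '-') {
			if (linelen < 41)
				return -1;
			(*nr_prereqs)++;
		} else {
			if (linelen < 42 || line[40] != ' ')
				return -1;
			(*nr_refs)++;
		}
	}
	return -1;
}
```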

> Even with that, later fetches, which grab a pack generated on the fly
> to only contain the objects not already fetched, are generally not
> resumable.  Overcoming that would presumably require larger protocol
> changes, and I don't know of anyone working on it.  (My workaround
> when in a setup where this mattered was to use the old-fashioned
> "dumb" http protocol.  It worked fine.)

My goal was for the mirror communication between client and server to be
something like:

  - if you don't have object XXXXXX, then prime with URL
    http://host/bundle1

  - if you don't have object YYYYYY, then prime with URL
    http://host/bundle2

and so forth. A cloning client would grab the first bundle, then the
second, and then hit the real repo via the git protocol. A client who
had previously cloned might have XXX, but would now grab bundle2, and
then hit the real repo.

So depending on how often the server side feels like creating new
bundles, you would get most of the changes via bundles, and then only
be getting a small number of objects via git.
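The client side of that scheme is simple to sketch. Assuming the config is just an ordered list of (object, URL) pairs, the client fetches every bundle whose object it lacks, then falls back to the git protocol. The struct and the has_object callback are hypothetical stand-ins; no such config format exists yet.

```c
#include <stddef.h>

/* One line of the HYPOTHETICAL mirror config:
 * "if you don't have object <sha1>, then prime with URL <url>". */
struct mirror_entry {
	const char *sha1;
	const char *url;
};

/* Stand-in for an object-database lookup supplied by the caller. */
typedef int (*has_object_fn)(const char *sha1, void *ctx);

/* Collect, in config order, the URLs of the bundles the client still
 * needs (every entry whose object it does not have). Returns the count. */
static size_t bundles_to_fetch(const struct mirror_entry *list, size_t nr,
			       has_object_fn has_object, void *ctx,
			       const char **out)
{
	size_t i, n = 0;

	for (i = 0; i < nr; i++)
		if (!has_object(list[i].sha1, ctx))
			out[n++] = list[i].url;
	return n;
}
```

With the two example lines above, a fresh clone gets both URLs, while a client that already has XXXXXX gets only http://host/bundle2, matching the behaviour described.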

The downside of cumulative fetching is that the bundles can only serve
well-known checkpoints. So if you have a timeline like this:

  t0: server publishes bundle/mirror config with one line (the XXX bit
      above)

  t1: you clone, getting the whole bundle. No waste, because you had
      nothing in the first place, and you needed everything.

  t2: you fetch again, getting N commits worth of history via the git
      protocol

  t3: server decides a lot of new objects (let's say M commits worth)
      have accumulated, and generates a new line (the YYY line).

  t4: you fetch, see that you don't yet have YYY, and grab the second
      bundle

But in t4 you grabbed a bundle containing M commits, when you already
had the first N of them. So you actually wasted bandwidth getting
objects you already had. The only benefit is that you grabbed a static
file, which is resumable.

So I suspect there is some black magic involved in deciding when to
create a new bundle, and at what tip. If you create a bundle once a
month, but include only commits up to a week ago, then people pulling
weekly will never grab the bundle, but people pulling less frequently
will get the whole month as a bundle.

A secondary issue is also that in a scheme like this, your mirror list
will grow without bound. So you'd want to periodically repack everything
into a single bundle. But then people who are fetching wouldn't want
that, as it is just an exacerbated version of the same problem above.

Which is all a roundabout way of saying that the git protocol is really
the sane way to do efficient transfers. An alternative, much simpler
scheme would be for the server to just say:

  - if you have nothing, then prime with URL http://host/bundle

And then _only_ clone would bother with checking mirrors. People doing
fetch would be expected to do it often enough that not being resumable
isn't a big deal.

-Peff

^ permalink raw reply

* Re: [ANNOUNCE] Git 1.7.7.2
From: Stefan Roas @ 2011-11-02 21:47 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Linux Kernel
In-Reply-To: <7v7h3jl3kw.fsf@alter.siamese.dyndns.org>

[-- Attachment #1: Type: text/plain, Size: 531 bytes --]

Hi Junio,

is it possible that you forgot to update the GIT-VERSION-GEN with the
release of 1.7.7.2? I still get version 1.7.7.1 from the tarball on
http://git-scm.com/ and when building from the git repository itself.

Regards,
  Stefan

-- 
Stefan Roas                                         sroas@roath.org
Joh.-Seb.-Bach-Str. 4                            D-91083 Baiersdorf
-------------------------------------------------------------------
Key fingerprint = 557C 99BE 865B 1463 2A44  7936 C662 8970 4DA5 50B8


[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply

* Re: long fsck time
From: Jeff King @ 2011-11-02 21:33 UTC (permalink / raw)
  To: Nguyen Thai Ngoc Duy; +Cc: Git Mailing List
In-Reply-To: <CACsJy8B=5mEWoOBkrTfmJ+p7HxqJM97zdG-k71oW81-3XxuO_Q@mail.gmail.com>

On Wed, Nov 02, 2011 at 07:10:26PM +0700, Nguyen Thai Ngoc Duy wrote:

> On Wed, Nov 2, 2011 at 7:06 PM, Nguyen Thai Ngoc Duy <pclouds@gmail.com> wrote:
> > On git.git
> >
> > $ /usr/bin/time git fsck
> > 333.25user 4.28system 5:37.59elapsed 99%CPU (0avgtext+0avgdata
> > 420080maxresident)k
> > 0inputs+0outputs (0major+726560minor)pagefaults 0swaps
> >
> > That's a really long time; perhaps we should print progress so users
> > know it's still running?
> 
> Ahh.. --verbose. Sorry for the noise. Still good to show the number of
> checked objects though.

fsck --verbose is _really_ verbose. It could probably stand to have some
progress meters sprinkled throughout. The patch below produces this on
my git.git repo:

  $ git fsck
  Checking object directories: 100% (256/256), done.
  Verifying packs: 100% (7/7), done.
  Checking objects (pack 1/7): 100% (241/241), done.
  Checking objects (pack 2/7): 100% (176/176), done.
  Checking objects (pack 3/7): 100% (312/312), done.
  Checking objects (pack 4/7): 100% (252/252), done.
  Checking objects (pack 5/7): 100% (353/353), done.
  Checking objects (pack 6/7): 100% (375/375), done.
  Checking objects (pack 7/7): 100% (171079/171079), done.

which gives reasonably smooth progress. The longest hang is that
verifying pack 7 is slow (I believe it's doing a sha1 over the whole
thing). If you really wanted to get fancy, you could probably do a
throughput meter as we sha1 the whole contents.

Patch is below. It would need --{no-,}progress support on the command
line, and to check isatty(2) before it would be acceptable.
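A minimal sketch of the two pieces that paragraph asks for: a line formatter matching the output shown above, and an isatty(2) check. These helper names are hypothetical; the patch below uses git's own start_progress()/display_progress() API, which does its own formatting.

```c
#include <stdio.h>
#include <unistd.h>

/* Format one progress line in the style shown above, e.g.
 * "Verifying packs: 100% (7/7)". Returns snprintf's result. */
static int format_progress(char *buf, size_t size, const char *title,
			   unsigned done, unsigned total)
{
	unsigned percent = total ?
		(unsigned)((unsigned long long)done * 100 / total) : 100;

	return snprintf(buf, size, "%s: %3u%% (%u/%u)",
			title, percent, done, total);
}

/* Only draw progress when stderr is a terminal; a --[no-]progress
 * option would override this default. */
static int want_progress(void)
{
	return isatty(2);
}
```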

---
diff --git a/builtin/fsck.c b/builtin/fsck.c
index df1a88b..481de4e 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -11,6 +11,7 @@
 #include "fsck.h"
 #include "parse-options.h"
 #include "dir.h"
+#include "progress.h"
 
 #define REACHABLE 0x0001
 #define SEEN      0x0002
@@ -512,15 +513,19 @@ static void get_default_heads(void)
 static void fsck_object_dir(const char *path)
 {
 	int i;
+	struct progress *progress;
 
 	if (verbose)
 		fprintf(stderr, "Checking object directory\n");
 
+	progress = start_progress("Checking object directories", 256);
 	for (i = 0; i < 256; i++) {
 		static char dir[4096];
 		sprintf(dir, "%s/%02x", path, i);
 		fsck_dir(i, dir);
+		display_progress(progress, i+1);
 	}
+	stop_progress(&progress);
 	fsck_sha1_list();
 }
 
@@ -622,19 +627,36 @@ int cmd_fsck(int argc, const char **argv, const char *prefix)
 
 	if (check_full) {
 		struct packed_git *p;
+		int i, nr_packs = 0;
+		struct progress *progress;
 
 		prepare_packed_git();
 		for (p = packed_git; p; p = p->next)
+			nr_packs++;
+
+		progress = start_progress("Verifying packs", nr_packs);
+		for (i = 1, p = packed_git; p; p = p->next, i++) {
 			/* verify gives error messages itself */
 			verify_pack(p);
+			display_progress(progress, i);
+		}
+		stop_progress(&progress);
 
-		for (p = packed_git; p; p = p->next) {
+		for (i = 1, p = packed_git; p; p = p->next, i++) {
+			char buf[32];
 			uint32_t j, num;
 			if (open_pack_index(p))
 				continue;
 			num = p->num_objects;
-			for (j = 0; j < num; j++)
+
+			snprintf(buf, sizeof(buf), "Checking objects (pack %d/%d)",
+				 i, nr_packs);
+			progress = start_progress(buf, num);
+			for (j = 0; j < num; j++) {
 				fsck_sha1(nth_packed_object_sha1(p, j));
+				display_progress(progress, j+1);
+			}
+			stop_progress(&progress);
 		}
 	}
 

* Re: [PATCH 1/2] http.c: Use curl_multi_fdset to select on curl fds instead of just sleeping
From: Junio C Hamano @ 2011-11-02 21:26 UTC
  To: Jeff King; +Cc: Mika Fischer, git, gitster, daniel
In-Reply-To: <20111102203543.GC5628@sigill.intra.peff.net>

Jeff King <peff@peff.net> writes:

> On Wed, Nov 02, 2011 at 04:32:21PM -0400, Jeff King wrote:
>
>> At least that's my reading. I am working on unrelated patches that clean
>> up the handling of data_received, but if it could go away altogether,
>> that would be even simpler.
>
> That patch, btw, looks like this:
>
> -- >8 --
> Subject: [PATCH] http: remove "local" member from slot struct
>
> The curl-multi http code does something like this:
>
>   while (!finished) {
> 	  try_to_read_from_slots();
> 	  if (!data_received)
> 		  wait_for_50_ms();
>   }
>
> ...
> Let's do the same thing for the write-to-file case as we do
> for the write-to-strbuf case: use a thin wrapper callback
> and increment the received flag. This makes both methods
> consistent with each other, and saves us from managing the
> "local" struct member at all, reducing the code size.

Looks very sensible.
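
The "thin wrapper callback" described in the quoted commit message could look something like the sketch below. It is hypothetical: `struct file_sink` and `fwrite_and_count` are invented names, not the ones in the actual patch. The signature matches libcurl's CURLOPT_WRITEFUNCTION callback, which must return the number of bytes it handled; to stay self-contained, the sketch calls the function directly instead of registering it with libcurl.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical per-request state: the output file plus a shared counter. */
struct file_sink {
	FILE *fp;
	int *data_received;
};

/*
 * Same shape as a libcurl write callback. Writes the chunk to the file and
 * bumps the shared counter, so the polling loop knows data arrived and
 * need not sleep. Returns the number of bytes written; libcurl treats a
 * short count as a write error and aborts the transfer.
 */
static size_t fwrite_and_count(char *ptr, size_t eltsize, size_t nmemb,
			       void *data)
{
	struct file_sink *sink = data;
	size_t written = fwrite(ptr, eltsize, nmemb, sink->fp);

	(*sink->data_received)++;
	return written * eltsize;
}
```

With the file case routed through a wrapper like this, both write paths update the same flag, and no separate "local" FILE pointer needs to live in the slot struct.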

* Re: [git patches] libata updates, GPG signed (but see admin notes)
From: Junio C Hamano @ 2011-11-02 21:13 UTC
  To: Linus Torvalds
  Cc: git, James Bottomley, Jeff Garzik, Andrew Morton, linux-ide, LKML
In-Reply-To: <CA+55aFz7TeQQH3D4Tpp31cZYZoQKeK37jouo+2Kh61Wa07knfw@mail.gmail.com>

Linus Torvalds <torvalds@linux-foundation.org> writes:

> And "add a fake empty commit just for the signature" is not the answer
> either - because that is clearly inferior to the tags we already had.
>
> I dunno. Did I miss something? As far as I can tell, the signed tags
> that we've had since day one are *clearly* much better in very
> fundamental ways.

Ok, back to the drawing board (which is no loss, as I wasn't expecting
this to be in the upcoming 1.7.8 release anyway).
