Git development

* How to extract files out of a "git bundle", no matter what?
From: jidanni @ 2008-12-19 19:29 UTC
  To: mdl123; +Cc: git

Someone has handed you a "git bundle".
How do you get the files out of it?
If it were cpio, you would use -i, if it were tar, you would use -x...
You read the git-bundle man page.
You only get as far as
# git-bundle verify bundle.bdl
The bundle contains 1 ref
d01... /heads/master
The bundle requires these 0 ref
bundle.bdl is okay

The rest is mish-mosh. There should be an emergency example for non-git-club
members, starting even from apt-get install git-core, of all the
real steps needed _to get the files out of the bundle_.

Assume the user _just wants to get the files out of the bundle_ and
not learn about or participate in some project.

* Re: How to extract files out of a "git bundle", no matter what?
From: Shawn O. Pearce @ 2008-12-19 19:32 UTC
  To: jidanni; +Cc: mdl123, git
In-Reply-To: <87iqpgc6bn.fsf@jidanni.org>

jidanni@jidanni.org wrote:
> Someone has handed you a "git bundle".
> How do you get the files out of it?
> If it were cpio, you would use -i, if it were tar, you would use -x...
> You read the git-bundle man page.
> You only get as far as
> # git-bundle verify bundle.bdl
> The bundle contains 1 ref
> d01... /heads/master
> The bundle requires these 0 ref
> bundle.bdl is okay
> 
> The rest is mish-mosh. There should be an emergency example for non-git-club
> members, starting even from apt-get install git-core, of all the
> real steps needed _to get the files out of the bundle_.
> 
> Assume the user _just wants to get the files out of the bundle_ and
> not learn about or participate in some project.

You can't just "get the files out".  A bundle contains deltas,
where you need the base in order to recreate the file content.
It can't be unpacked in a vacuum.

To unpack a bundle you need to clone the project and then fetch
from it:

	git clone src...
	git pull bundle.bdl master

If the bundle requires 0 refs (like above) then you can init a
new repository and should be able to fetch from it:

	git init
	git pull bundle.bdl master

-- 
Shawn.
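
Whether the "git init" route is enough depends only on the bundle's
prerequisite list. As a rough sketch, assuming the v2 bundle layout (a
"# v2 git bundle" line, prerequisite lines starting with "-", ref lines of
the form "<sha1> <refname>", one empty line, then the packfile), the header
can be inspected directly. scan_bundle_header is a hypothetical helper, not
a git function.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not part of git): scan the text header of a v2
 * git bundle held in memory. Lines starting with '-' are prerequisite
 * commits the receiver must already have; other lines are refs carried
 * by the bundle. An empty line ends the header; the packfile follows.
 * Returns the offset of the pack data, or -1 if this is not a v2 bundle.
 */
static long scan_bundle_header(const char *buf, size_t len,
                               int *prereqs, int *refs)
{
    static const char sig[] = "# v2 git bundle\n";
    size_t pos = sizeof(sig) - 1;

    *prereqs = 0;
    *refs = 0;
    if (len < pos || memcmp(buf, sig, pos) != 0)
        return -1;
    while (pos < len) {
        const char *nl = memchr(buf + pos, '\n', len - pos);

        if (!nl)
            return -1;                  /* header never terminated */
        if (nl == buf + pos)
            return (long)(pos + 1);     /* empty line: pack data follows */
        if (buf[pos] == '-')
            (*prereqs)++;               /* prerequisite commit */
        else
            (*refs)++;                  /* ref carried by the bundle */
        pos = (size_t)(nl - buf) + 1;
    }
    return -1;
}
```

A prerequisite count of zero corresponds to the "requires these 0 ref" line
printed by git-bundle verify, and the returned offset points at the "PACK"
signature where the packfile begins.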

* Re: How to extract files out of a "git bundle", no matter what?
From: Mark Levedahl @ 2008-12-19 19:57 UTC
  To: Shawn O. Pearce; +Cc: jidanni, git
In-Reply-To: <20081219193256.GU32487@spearce.org>

Shawn O. Pearce wrote:
>
> If the bundle requires 0 refs (like above) then you can init a
> new repository and should be able to fetch from it:
>
> 	git init
> 	git pull bundle.bdl master
>
>   

With a relatively recent git (not sure which version), you can just do

    git clone bundle.bdl

Mark

* Re: How to extract files out of a "git bundle", no matter what?
From: Junio C Hamano @ 2008-12-19 20:07 UTC
  To: jidanni; +Cc: mdl123, git
In-Reply-To: <87iqpgc6bn.fsf@jidanni.org>

jidanni@jidanni.org writes:

> Someone has handed you a "git bundle".
> How do you get the files out of it?
> If it were cpio, you would use -i, if it were tar, you would use -x...
> You read the git-bundle man page.
> You only get as far as
> # git-bundle verify bundle.bdl
> The bundle contains 1 ref
> d01... /heads/master
> The bundle requires these 0 ref
> bundle.bdl is okay
>
> The rest is mish-mosh.

The last example in the git-bundle man page might be a bit cryptic, but
that is how bundles are expected to be used: to give repository access to
people who have no real network connection other than Sneakernet.

For one shot extraction, defining a remote in the config is overkill and
you could just say:

	git ls-remote bundle.bdl

to see what branches it contains, and if you are interested in its
master branch and want to merge it into your history, then

	git pull bundle.bdl master

should do that.

* Re: How to extract files out of a "git bundle", no matter what?
From: jidanni @ 2008-12-19 20:13 UTC
  To: mdl123; +Cc: spearce, git
In-Reply-To: <494BFCAF.9060703@verizon.net>

SOP> If the bundle requires 0 refs (like above) then you can init a
SOP> new repository and should be able to fetch from it:

SOP> 	git init
SOP> 	git pull bundle.bdl master

Phew, that worked. Thank you!

ML> With a relatively recent git (not sure which version), you can just do
ML>    git clone bundle.bdl
Not with git version 1.5.6.5, Debian sid.

Anyway, for man page completeness, I can still foresee the day when:

SOP> You can't just "get the files out".  A bundle contains deltas,
SOP> where you need the base in order to recreate the file content.
SOP> It can't be unpacked in a vacuum.

That is nice, but we here at the forensics department of XYZ police
force just need to get the files out. We tried "PK UNZIP" but that
didn't extract them. We contacted the Computer Science Dept. but
that's who they're holding hostage.

SOP> To unpack a bundle you need to clone the project and then fetch
SOP> from it:

SOP> 	git clone src...
SOP> 	git pull bundle.bdl master

That is nice but the perpetrators have destroyed everything except for
that one bundle.bdl file, which contains the password to defuse the
time bomb.

There must be a way to make a "phony tree" or whatever to "attach to"
so extraction can proceed. Be sure to spell it all out on the
git-bundle man page as a reference in case some non-computer people
need to do aforementioned emergency extraction one day.

* Re: How to extract files out of a "git bundle", no matter what?
From: Jeff King @ 2008-12-19 20:21 UTC
  To: jidanni; +Cc: mdl123, spearce, git
In-Reply-To: <87zlirc49l.fsf@jidanni.org>

On Sat, Dec 20, 2008 at 04:13:26AM +0800, jidanni@jidanni.org wrote:

> There must be a way to make a "phony tree" or whatever to "attach to"
> so extraction can proceed. Be sure to spell it all out on the
> git-bundle man page as a reference in case some non-computer people
> need to do aforementioned emergency extraction one day.

No, that information may not even be in the bundle at all (unless it is
a bundle that has a 0-ref basis). In particular, if a bundle contains
changes between some commit A and some commit B, then:

  - files that were not changed between A and B will not be included at
    all

  - the object pack in the bundle is "thin", meaning it may contain
    deltas against objects that are reachable from A, but not B. So even
    _within_ a changed file, you may see only the changes from A to B.

If the bundle has a 0-ref basis, then you can clone straight from the
bundle, which must have everything.

-Peff

* Re: How to extract files out of a "git bundle", no matter what?
From: jidanni @ 2008-12-19 20:35 UTC
  To: peff; +Cc: mdl123, spearce, git
In-Reply-To: <20081219202118.GA26513@coredump.intra.peff.net>

JK> In particular, if a bundle contains changes between some commit A
JK> and some commit B, then:

JK>   - files that were not changed between A and B will not be included at
JK>     all

JK>   - the object pack in the bundle is "thin", meaning it may contain
JK>     deltas against objects that are reachable from A, but not B. So even
JK>     _within_ a changed file, you may see only the changes from A to B.

OK, we here at the police forensics department would be very happy if
we could at least get some ASCII out of that .BDL file, even if it is
just a diff shred,
-       The password to the time bomb was BLORFZ
+       The password to the time bomb is  NORFLZ
that would be fine. All we know is that after the word PACK it is all
binary, and git-unpack-objects and git-unpack-file don't work on it.

* Re: jgit doesn't support "compare with" and "replace with"?
From: Robin Rosenberg @ 2008-12-19 20:39 UTC
  To: Shawn O. Pearce; +Cc: Martin_S, git
In-Reply-To: <20081219152045.GR32487@spearce.org>

On Friday 19 December 2008 16:20:45, Shawn O. Pearce wrote:
> Martin_S <iksdrijf@yahoo.com> wrote:
> > 
> > Hi, I'm using eclipse 3.4 and jgit 0.4. The right click context menus don't
> > list "compare with" and "replace with". Am I doing something wrong?
> 
> We haven't implemented them in EGit.  So it's not surprising that
> they aren't appearing.

Actually, we had it in v0.3, though it didn't always work; in particular, it didn't work on
Windows...

The history rewrite killed it, but re-adding it would not be too hard. It's mostly about passing two explicit
versions to compare, which is already done in

The old version disappeared in 07f04ae5b1771069667028d225196daff29402a0; check it out and rebuild
if you are really desperate. Reverting that commit is an option, but it is not trivial either, so going forward and
reimplementing the feature (correctly this time) is a more appealing approach. Depending on your needs, i.e. if
you don't need clone/fetch/push, you could go back to the commit mentioned above. The closest
tagged version is v0.3.1. As a bonus, it draws the graph correctly, though not optimally.

-- robin

* Re: How to extract files out of a "git bundle", no matter what?
From: Jeff King @ 2008-12-19 20:51 UTC
  To: jidanni; +Cc: mdl123, spearce, git
In-Reply-To: <87vdtfc389.fsf@jidanni.org>

On Sat, Dec 20, 2008 at 04:35:50AM +0800, jidanni@jidanni.org wrote:

> JK>   - the object pack in the bundle is "thin", meaning it may contain
> JK>     deltas against objects that are reachable from A, but not B. So even
> JK>     _within_ a changed file, you may see only the changes from A to B.
> 
> OK, we here at the police forensics department would be very happy if
> we could at least get some ASCII out of that .BDL file, even if it is
> just a diff shred,
> -       The password to the time bomb was BLORFZ
> +       The password to the time bomb is  NORFLZ
> that would be fine. All we know is that after the word PACK it is all
> binary, and git-unpack-objects and git-unpack-file don't work on it.

AFAIK, there is no tool to try salvaging strings from an incomplete pack
(and you can't just run "strings" because the deltas are zlib
compressed). So if I were in the police forensics department, I think I
would read Documentation/technical/pack-format.txt and start hacking a
solution as quickly as possible.

-Peff
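
Following the pack-format.txt suggestion starts with walking the pack
headers. What follows is a minimal C sketch, assuming pack version 2 or 3
and the entry header layout in which the first byte carries a continuation
bit, a 3-bit object type and the low 4 bits of the size, with 7 more size
bits per continuation byte. read_pack_header, read_entry_header and be32
are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Object types stored in the 3-bit type field of a pack entry header. */
enum pack_obj_type {
    OBJ_COMMIT = 1, OBJ_TREE = 2, OBJ_BLOB = 3, OBJ_TAG = 4,
    OBJ_OFS_DELTA = 6, OBJ_REF_DELTA = 7
};

/* Read a 4-byte big-endian (network order) integer. */
static uint32_t be32(const unsigned char *p)
{
    return (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 |
           (uint32_t)p[2] << 8 | (uint32_t)p[3];
}

/* Check the 12-byte pack header: "PACK", version, object count.
 * Returns 0 and stores the count, or -1 if the header is not valid. */
static int read_pack_header(const unsigned char *p, size_t len,
                            uint32_t *nr_objects)
{
    uint32_t version;

    if (len < 12 || memcmp(p, "PACK", 4) != 0)
        return -1;
    version = be32(p + 4);
    if (version != 2 && version != 3)
        return -1;
    *nr_objects = be32(p + 8);
    return 0;
}

/* Decode one entry header. Returns the number of header bytes used,
 * or 0 if the input is truncated or the size would overflow. */
static size_t read_entry_header(const unsigned char *p, size_t len,
                                int *type, unsigned long *size)
{
    size_t i = 0;
    unsigned shift = 4;
    unsigned char c;

    if (len == 0)
        return 0;
    c = p[i++];
    *type = (c >> 4) & 7;           /* bits 6..4: object type */
    *size = c & 0x0f;               /* bits 3..0: low size bits */
    while (c & 0x80) {              /* MSB set: another size byte follows */
        if (i >= len || shift >= 8 * sizeof(*size))
            return 0;
        c = p[i++];
        *size |= (unsigned long)(c & 0x7f) << shift;
        shift += 7;
    }
    return i;
}
```

Each entry header is followed by zlib-compressed data (preceded, for the
two delta types, by a reference to the base object), which is why
"strings" finds nothing readable. Inflating a delta entry still yields only
instructions against a base object that a thin pack may not contain.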

* Re: Git Notes idea.
From: Jeff King @ 2008-12-19 21:25 UTC
  To: Govind Salinas; +Cc: Johannes Schindelin, Git Mailing List
In-Reply-To: <5d46db230812190938r4e8ff994gfcb616c750be0f22@mail.gmail.com>

On Fri, Dec 19, 2008 at 11:38:55AM -0600, Govind Salinas wrote:

> This is my concern with keeping a history of the notes pseudo-branch.  Let
> me restate what you are saying with an example
> 
> 1) on branch A commit a
> 2) add note a`
> 3) on branch B commit b
> 4) add note b`
> 5) on branch B commit c
> 6) add note c`
> 7) delete branch A
> 8) gc after a time such that a is pruned
> 
> Now either I will always have a note a` as an object forever even though
> the only commit that points to it is gone or I have to re-write the history of
> the notes branch from the point that it was added.

Yes, that's correct.

> Given this problem, is it really such a good idea to keep the history?

I think so. Otherwise how will you push and pull notes? You won't even
know which one is the more recent tree, let alone handle any merges
caused by editing notes in two places.

> On the other, other hand, pushing and pulling notes if a history is kept
> will have to involve a lot of rebasing/merging.

Depending on your workflow. It might just involve a lot of fast forwards
if the note writer is in one place.

> A possible solution is that notes are per-branch,
> 
> refs/notes/heads/master
> refs/notes/heads/foo/bar
> refs/notes/remotes/baz/bang

Sorry, I don't quite get it. You are asking for per-branch notes that
keep history, or per-branch notes that don't keep history?

If the former, then you haven't solved the cruft accumulation problem.
You can get obsolete notes in your note history by rebasing on a branch
that is long-running (which is OK as long as you haven't published
_those particular_ commits). Or are you proposing to rebase and cleanup
the notes history every time you do a destructive operation?

If the latter, then I don't see how you've solved the push-pull and
merge problem (which you need history for).

But in either case, I think the solution is non-intuitive. If I annotate
a commit, and then merge the commit from one branch to another,
shouldn't the annotation stay?

Really, I am not sure this is worth getting too concerned about. Since
we are talking about cruft in the _history_ of the notes branch, it
won't impact actual notes usage (which will always just deal with the
most recent tree). So really we are talking about some uninteresting
objects in the db, which wastes some space. In practice, I suspect this
won't be that large because notes themselves are going to be relatively
short and in many cases, repetitive (i.e., many annotations may have the
same blob hash for several commits). And if it is a space problem, then
the right solution is to periodically truncate the notes history by
rewriting.

-Peff

* Re: Odd merge behaviour involving reverts
From: Nanako Shiraishi @ 2008-12-19 21:45 UTC
  To: Junio C Hamano; +Cc: Linus Torvalds, Alan, git
In-Reply-To: <7vocz8a6zk.fsf@gitster.siamese.dyndns.org>

Quoting Junio C Hamano <gitster@pobox.com>:

> I hope this clears up confusion and fear.

You are correct that I misunderstood what Alan meant by corrected branch.

I think your explanation will help people if we make it part of the documentation.  Especially because two different cases need two different recovery methods, and people need to learn which is which.

Thank you for your detailed response.
-- 
Nanako Shiraishi
http://ivory.ap.teacup.com/nanako3/

* Re: [PATCH] Clarify git-format-patch --in-reply-to
From: Nanako Shiraishi @ 2008-12-19 21:51 UTC
  To: jidanni; +Cc: gitster, git
In-Reply-To: <87k59wc73n.fsf@jidanni.org>

Quoting jidanni@jidanni.org:

> Signed-off-by: jidanni <jidanni@jidanni.org>

I understand that "Signed-off-by" is about code ownership and thought that the official history prefers to have a real name instead of a pseudonym. Perhaps you would want to say "Dan Jacobson <jidanni@jidanni.org>" or something similar?

-- 
Nanako Shiraishi
http://ivory.ap.teacup.com/nanako3/

* [PATCH] diff.c: fix pointer type warning
From: René Scharfe @ 2008-12-19 22:10 UTC
  To: Junio C Hamano; +Cc: Mark Burton, Linus Torvalds, Git Mailing List
In-Reply-To: <20081218121118.3635c53c@crow>

As Mark Burton noted, the conversion to strbuf_readlink() caused a
compile warning on some architectures:

> diff.c: In function ‘diff_populate_filespec’:
> diff.c:1781: warning: passing argument 2 of ‘strbuf_detach’ from incompatible pointer type

A pointer to an unsigned long is given while a pointer to a size_t is
expected; the two types are not considered to be equivalent everywhere.

The real fix would be to change the type of the size member of struct
diff_filespec to size_t, but that would cause other warnings in
connection with functions expecting unsigned long, and attempts to fix
them might loose an avalanche of changes.  Later.  This patch just
silences the warning by adding an (implicit) casting step.

Reported-by: Mark Burton <markb@ordern.com>
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
---
 diff.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/diff.c b/diff.c
index f160c1a..0484601 100644
--- a/diff.c
+++ b/diff.c
@@ -1778,7 +1778,8 @@ int diff_populate_filespec(struct diff_filespec *s, int size_only)

 			if (strbuf_readlink(&sb, s->path, s->size))
 				goto err_empty;
-			s->data = strbuf_detach(&sb, &s->size);
+			s->size = sb.len;
+			s->data = strbuf_detach(&sb, NULL);
 			s->should_free = 1;
 			return 0;
 		}
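
The mismatch can be reproduced outside git. Below is a minimal sketch with
hypothetical stand-in types (struct buf and struct filespec are not git's
strbuf and diff_filespec, and buf_detach only mimics strbuf_detach):
passing &s->size, an unsigned long *, where a size_t * is expected draws
the warning on platforms where size_t is not the same type as unsigned
long, and copying the length first, as the patch does, avoids it.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for git's strbuf and diff_filespec. */
struct buf {
    char *data;
    size_t len;              /* size_t, like strbuf's len */
};

struct filespec {
    char *data;
    unsigned long size;      /* unsigned long, like diff_filespec's size */
};

/* Modeled on strbuf_detach(): give the bytes to the caller and
 * optionally report their length through a size_t pointer. */
static char *buf_detach(struct buf *b, size_t *sz)
{
    char *res = b->data;

    if (sz)
        *sz = b->len;
    b->data = NULL;
    b->len = 0;
    return res;
}

static void fill_spec(struct filespec *s, struct buf *b)
{
    /*
     * buf_detach(b, &s->size) would pass an unsigned long * where a
     * size_t * is expected. Copy the length first, as the patch does,
     * then detach without asking for the length.
     */
    s->size = b->len;
    s->data = buf_detach(b, NULL);
}
```

The assignment s->size = b->len still narrows silently wherever size_t is
wider than unsigned long (64-bit Windows, for example), which fits the
patch's remark that changing the member type to size_t is the real fix.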

* Re: [PATCH] Clarify git-format-patch --in-reply-to
From: Miklos Vajna @ 2008-12-19 22:22 UTC
  To: Nanako Shiraishi; +Cc: jidanni, gitster, git
In-Reply-To: <20081220065135.6117@nanako3.lavabit.com>

On Sat, Dec 20, 2008 at 06:51:35AM +0900, Nanako Shiraishi <nanako3@lavabit.com> wrote:
> I understand that "Signed-off-by" is about code ownership and thought
> that the official history prefers to have a real name instead of a
> pseudonym. Perhaps you would want to say "Dan Jacobson
> <jidanni@jidanni.org>" or something similar?

I don't think it's a requirement, see 2b36b14 for example. Though yes,
in general it's considered childish to hide behind a nickname, instead
of using your real name. (ESR has a section about this in the "hacker
howto".)

* Re: Git Notes idea.
From: Govind Salinas @ 2008-12-19 22:24 UTC
  To: Jeff King; +Cc: Johannes Schindelin, Git Mailing List
In-Reply-To: <20081219212536.GA27168@coredump.intra.peff.net>

On Fri, Dec 19, 2008 at 3:25 PM, Jeff King <peff@peff.net> wrote:
> On Fri, Dec 19, 2008 at 11:38:55AM -0600, Govind Salinas wrote:
>
>> This is my concern with keeping a history of the notes pseudo-branch.  Let
>> me restate what you are saying with an example
>>
>> 1) on branch A commit a
>> 2) add note a`
>> 3) on branch B commit b
>> 4) add note b`
>> 5) on branch B commit c
>> 6) add note c`
>> 7) delete branch A
>> 8) gc after a time such that a is pruned
>>
>> Now either I will always have a note a` as an object forever even though
>> the only commit that points to it is gone or I have to re-write the history of
>> the notes branch from the point that it was added.
>
> Yes, that's correct.
>
>> Given this problem, is it really such a good idea to keep the history?
>
> I think so. Otherwise how will you push and pull notes? You won't even
> know which one is the more recent tree, let alone handle any merges
> caused by editing notes in two places.

Couldn't you simply merge your tree and theirs even if there is no
history?  You would have to find a way to handle merges in any event,
since they could just as easily happen if you have a history.

>> On the other, other hand, pushing and pulling notes if a history is kept
>> will have to involve a lot of rebasing/merging.
>
> Depending on your workflow. It might just involve a lot of fast forwards
> if the note writer is in one place.
>
>> A possible solution is that notes are per-branch,
>>
>> refs/notes/heads/master
>> refs/notes/heads/foo/bar
>> refs/notes/remotes/baz/bang
>
> Sorry, I don't quite get it. You are asking for per-branch notes that
> keep history, or per-branch notes that don't keep history?

Both. At the end of my previous mail I said...

"So perhaps we could use the above layout with no history?"

But they are two separate fixes to two different problems.

> If the former, then you haven't solved the cruft accumulation problem.
> You can get obsolete notes in your note history by rebasing on a branch
> that is long-running (which is OK as long as you haven't published
> _those particular_ commits). Or are you proposing to rebase and cleanup
> the notes history every time you do a destructive operation?

Yes, it does not solve that problem.  But it does solve things like

Dev1 and Dev2 both have branch A and topic branch B, and their notes
are in refs/notes/public (or refs/notes, or something else not branch-specific).

Dev1 adds 100 notes to topic B; let's say half of them are obsolete due
to rebases or whatever.  Dev2 pulls A and updates their notes
as well.  Now Dev2 has acquired all the notes from Dev1, including the
obsolete ones.  So you have 100 commits, 100 blobs and all the new
trees that go with them that the user was not interested in.

Run this across 1000 users and you have a lot of cruft.

Now, if instead we have a per-branch notes scheme, then you only get
the cruft from the branches you were interested in.  If you remove the
history you could end up with no cruft because gc should handle it.

> If the latter, then I don't see how you've solved the push-pull and
> merge problem (which you need history for).

What git-fetch would have to do is say: "This is a note.  The remote
sha is not the same as mine; I will treat this as a forced update, fetch
the objects without checking history, and then run a merge on the two
commits."  The notes merge could have its own strategy that checks
whether an object exists before deciding to add a new item or delete a
removed one.  Then the user would only have to intervene if the
notes were edited.

> But in either case, I think the solution is non-intuitive. If I annotate
> a commit, and then merge the commit from one branch to another,
> shouldn't the annotation stay?

Sure. Either the merge command could run two merges, one for the
real branch and one for the notes pseudo-branch, or the user
could be required to do that manually.  I would think that doing
it automatically would be good, especially if you use a special
merge strategy.

> Really, I am not sure this is worth getting too concerned about. Since
> we are talking about cruft in the _history_ of the notes branch, it
> won't impact actual notes usage (which will always just deal with the
> most recent tree). So really we are talking about some uninteresting
> objects in the db, which wastes some space. In practice, I suspect this
> won't be that large because notes themselves are going to be relatively
> short and in many cases, repetitive (i.e., many annotations may have the
> same blob hash for several commits). And if it is a space problem, then
> the right solution is to periodically truncate the notes history by
> rewriting.

You are correct, of course, that it will just be wasted space.  But I am
concerned that it could end up being a lot of wasted space.  I mean, what
if every person who contributed to the kernel contributed note cruft?  Users
have branches that they consider public, so their notes might go into a public
note store if there is no per-branch store.  Or errant users could use the
public store, obsolete notes included, without understanding how they are
affecting the central repo.

If you *really* don't think it's something to be worried about, then I am OK
with that, since you have a lot more experience with this, but it sounds hairy
to me.

Thanks,
Govind.
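
The history-less notes merge proposed in this message can be sketched under
simple assumptions: each side's notes are a list of (commit id, note blob
id) pairs sorted by commit id, the merge keeps the union, and the user is
asked to intervene only when both sides annotate the same commit
differently. struct note and merge_notes are hypothetical and not part of
git.

```c
#include <string.h>

/* Hypothetical model of a notes tree: (commit id, note blob id) pairs,
 * sorted by commit id. */
struct note {
    const char *commit;
    const char *blob;
};

/*
 * Merge two note lists that share no history: keep the union, and stop
 * with a conflict when both sides annotate the same commit differently.
 * "out" must have room for na + nb entries. Returns the number of merged
 * entries, or -1 with *conflict set to the contested commit id.
 */
static int merge_notes(const struct note *a, int na,
                       const struct note *b, int nb,
                       struct note *out, const char **conflict)
{
    int i = 0, j = 0, n = 0;

    while (i < na || j < nb) {
        int cmp;

        if (i == na)
            cmp = 1;                    /* only b has entries left */
        else if (j == nb)
            cmp = -1;                   /* only a has entries left */
        else
            cmp = strcmp(a[i].commit, b[j].commit);

        if (cmp < 0) {
            out[n++] = a[i++];
        } else if (cmp > 0) {
            out[n++] = b[j++];
        } else {
            if (strcmp(a[i].blob, b[j].blob) != 0) {
                *conflict = a[i].commit;    /* user must intervene */
                return -1;
            }
            out[n++] = a[i++];          /* identical note on both sides */
            j++;
        }
    }
    return n;
}
```

Without a common base, this sketch cannot tell a note deleted on one side
from a note added on the other; it simply keeps every note either side has,
which is the ambiguity behind Jeff's argument for keeping history.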

* Re: [PATCH] Clarify git-format-patch --in-reply-to
From: jidanni @ 2008-12-19 22:27 UTC
  To: nanako3; +Cc: gitster, git
In-Reply-To: <20081220065135.6117@nanako3.lavabit.com>

NS> I understand that "Signed-off-by" is about code ownership and
NS> thought that the official history prefers to have a real name
NS> instead of a pseudonym. Perhaps you would want to say "Dan
NS> Jacobson <jidanni@jidanni.org>" or something similar?

Thanks but I want to be http://zh.wikipedia.org/wiki/積丹尼
(which is what all my ID cards say) and not any
http://en.wikipedia.org/wiki/Jacobson , http://en.wikipedia.org/wiki/Jacobsen .
I've completed a name-change operation, and prefer to use my post-op name.
Childish? Bingo.

* Re: [PATCH] Clarify git-format-patch --in-reply-to
From: Bernt Hansen @ 2008-12-19 22:01 UTC
  To: jidanni; +Cc: gitster, git
In-Reply-To: <87k59wc73n.fsf@jidanni.org>

jidanni@jidanni.org writes:

> Signed-off-by: jidanni <jidanni@jidanni.org>
>
> diff --git a/git-format-patch.txt b/git-format-patch.txt
> index ee27eff..04958de 100644
> --- a/git-format-patch.txt
> +++ b/git-format-patch.txt
> @@ -130 +130,2 @@ include::diff-options.txt[]
> -	provide a new patch series.
> +	provide a new patch series. Generates coresponding References and
                                              ^^^^^^^^^^^^
Typo                                          corresponding
> +	In-Reply-To headers. Angle brackets around <Message-Id> are optional.
> -- 
> 1.5.6.5

* Re: [PATCH] Clarify git-format-patch --in-reply-to
From: Miklos Vajna @ 2008-12-19 22:33 UTC
  To: jidanni; +Cc: nanako3, gitster, git
In-Reply-To: <87k59vby27.fsf@jidanni.org>

On Sat, Dec 20, 2008 at 06:27:28AM +0800, jidanni@jidanni.org wrote:
> NS> I understand that "Signed-off-by" is about code ownership and
> NS> thought that the official history prefers to have a real name
> NS> instead of a pseudonym. Perhaps you would want to say "Dan
> NS> Jacobson <jidanni@jidanni.org>" or something similar?
> 
> Thanks but I want to be http://zh.wikipedia.org/wiki/積丹尼
> (which is what all my ID cards say) and not any
> http://en.wikipedia.org/wiki/Jacobson , http://en.wikipedia.org/wiki/Jacobsen .
> I've completed a name-change operation, and prefer to use my post-op name.

Who said you can't use utf-8 chars in the author field?

See
http://repo.or.cz/w/git.git?a=commit;h=6b312253cb6e8b21e74882a3ae0972fac1290244
for example.

* Very slow clone time over http
From: demerphq @ 2008-12-19 22:33 UTC
  To: git

Hi,

I've been working on the migration of the perl5 repositories from
perforce to git, which is soon to be officially released.

We have set up a server with http/git/ssh access, and we had hoped
also rsync access.

However, it appears that clone using the rsync:// protocol is broken,
and now we are discovering that http cloning is extremely slow. I
reported our experiences with rsync in a previous mail, and now I'm
reporting the http performance issue.

When we strace the clone we observe (to quote our sysadmin):

"git seems to be stuck for 10 minutes after downloading"
"no output, strace shows it's having sex with the memory allocator"
"sex with memory allocator" ==> loads of memory allocation/deallocation

A provisional release of the conversion is available at

   http://dromedary.booking.com/perl.git

We are using git 1.6.0.5 on another more or less equivalent host, with
the same problems, and dromedary is using git version
1.6.0.4.724.ga0d3a. However we believe that the problem is actually on
the client side. The pack download itself appears to be quite fast,
however there is an extremely long pause (minutes) after which a HUGE
amount of essentially incomprehensible output is generated about
walking packs or some such.

Timing a clone via http gets us number like:

real    7m42.459s
user    3m42.154s
sys     0m12.641s

Wheras using the git:// protocol gets us times like:

real    4m6.162s
user    0m43.595s
sys     0m4.852s

The client these numbers are from is git version 1.6.0.3.

So it takes approximately twice as long via http as it does via git.
This seems somewhat strange. Is there anything we can do to improve
this? Repack? Anything like that?

A post about the github system suggests that this is not an isolated problem.

   http://github.com/blog/92-http-cloning

If there is anything we can do to help resolve this issue, please let us know.

cheers,
Yves

-- 
perl -Mre=debug -e "/just|another|perl|hacker/"
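
Working through the timings quoted in this message (all values in seconds):
the http clone takes a little under twice the wall-clock time of the git://
clone but about five times the user CPU time, and the extra user CPU
(about 179 s) accounts for most of the extra wall-clock time (about 216 s).
That is consistent with, though it does not by itself prove, the suspicion
that the delay is client-side work after the download rather than the
transfer itself.

```latex
\begin{aligned}
\text{real:}\quad & \frac{7\cdot 60 + 42.459}{4\cdot 60 + 6.162}
  = \frac{462.459}{246.162} \approx 1.88,
  & 462.459 - 246.162 &= 216.297 \\
\text{user:}\quad & \frac{3\cdot 60 + 42.154}{43.595}
  = \frac{222.154}{43.595} \approx 5.10,
  & 222.154 - 43.595 &= 178.559
\end{aligned}
```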

* Re: Odd merge behaviour involving reverts
From: Junio C Hamano @ 2008-12-19 23:05 UTC
  To: Nanako Shiraishi; +Cc: Linus Torvalds, Alan, git
In-Reply-To: <20081220064532.6117@nanako3.lavabit.com>

Nanako Shiraishi <nanako3@lavabit.com> writes:

> I think your explanation will help people if we make it part of the
> documentation.  Especially because two different cases need two
> different recovery methods, and people need to learn which is which.

Sure.  It needs copyediting to make it readable standalone by not
mentioning "your misunderstanding", inlining "earlier Linus's suggestion",
etc., though.

Patches welcome ;-)

* Re: [PATCH] Clarify git-format-patch --in-reply-to
From: Junio C Hamano @ 2008-12-19 23:07 UTC
  To: Miklos Vajna; +Cc: Nanako Shiraishi, jidanni, gitster, git
In-Reply-To: <20081219222209.GG21154@genesis.frugalware.org>

Miklos Vajna <vmiklos@frugalware.org> writes:

> On Sat, Dec 20, 2008 at 06:51:35AM +0900, Nanako Shiraishi <nanako3@lavabit.com> wrote:
>> I understand that "Signed-off-by" is about code ownership and thought
>> that the official history prefers to have a real name instead of a
>> pseudonym. Perhaps you would want to say "Dan Jacobson
>> <jidanni@jidanni.org>" or something similar?
>
> I don't think it's a requirement, see 2b36b14 for example. Though yes,
> in general it's considered childish to hide behind a nickname, instead
> of using your real name. (ESR has a section about this in the "hacker
> howto".)

An earlier mistake does not justify adding new ones.  Besides, I think
ALASCM once revealed his "real name" on the list.

* Re: [PATCH] diff.c: fix pointer type warning
From: Junio C Hamano @ 2008-12-19 23:09 UTC
  To: René Scharfe; +Cc: Mark Burton, Linus Torvalds, Git Mailing List
In-Reply-To: <494C1BE8.20607@lsrfire.ath.cx>

Thanks; I think I already have it in my tree from your e-mail
yesterday.  I have just been too busy to whip the other branches into
shape to push the results out.

* Re: Odd merge behaviour involving reverts
From: Nanako Shiraishi @ 2008-12-19 23:12 UTC
  To: Junio C Hamano; +Cc: Linus Torvalds, Alan, git
In-Reply-To: <7vljub7oko.fsf@gitster.siamese.dyndns.org>

Quoting Junio C Hamano <gitster@pobox.com>:

> Nanako Shiraishi <nanako3@lavabit.com> writes:
>
>> I think your explanation will help people if we make it part of the
>> documentation.  Especially because two different cases need two
>> different recovery methods, and people need to learn which is which.
>
> Sure.  It needs copyediting to make it readable standalone by not
> mentioning "your misunderstanding", inlining "earlier Linus's suggestion",
> etc., though.
>
> Patches welcome ;-)

Okay, I'll send one later.

Thanks.
-- 
Nanako Shiraishi
http://ivory.ap.teacup.com/nanako3/

* just can't live without a user.name
From: jidanni @ 2008-12-19 23:20 UTC
  To: vmiklos; +Cc: nanako3, gitster, git
In-Reply-To: <20081219223306.GH21154@genesis.frugalware.org>

Actually it's all git's fault for not working if user.name is null or
unset. Ask yourself: would email programs panic if a header contained
only bob@example.org?
> But we need a user.name for legal reasons.
But there should be a way to override it, in case those laws don't apply.
I want
  Author: jidanni@jidanni.org
like my email address above. But the closest I can get is
  Author: jidanni <jidanni@jidanni.org>
And then there are some programs that need
$ git config --global user.name $USER
just to get that, else the springs come loose:
$ git-format-patch -s
*** Please tell me who you are.
Run
  git config --global user.email "you@example.com"
  git config --global user.name "Your Name"
to set your account's default identity.
Omit --global to set the identity only in this repository.
fatal: empty ident  <jidanni@jidanni.org
> not allowed
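
The failure above comes from git insisting on a non-empty name in the
"Name <email>" ident it records. As an illustration only, here is a
hypothetical C sketch of such a check (format_ident is not git's function,
and git's real ident code differs): it builds the ident string and rejects
an empty name, the condition behind the "fatal: empty ident ... not
allowed" message.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical sketch, not git's ident code: write "Name <email>" into
 * out and refuse an empty name. Returns 0 on success, -1 if the name is
 * empty or the result does not fit in outsz bytes.
 */
static int format_ident(const char *name, const char *email,
                        char *out, size_t outsz)
{
    int n;

    if (name == NULL || strlen(name) == 0)
        return -1;                      /* the "empty ident" case */
    n = snprintf(out, outsz, "%s <%s>", name, email);
    if (n < 0 || (size_t)n >= outsz)
        return -1;                      /* output was truncated */
    return 0;
}
```

With user.name set to $USER, as in the workaround above, the name part is
"jidanni" and the ident comes out as "jidanni <jidanni@jidanni.org>",
matching the Author line quoted in the message.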

* [PATCH 0/4] Notes reloaded
From: Johannes Schindelin @ 2008-12-19 23:34 UTC
  To: Jeff King; +Cc: Govind Salinas, Git Mailing List
In-Reply-To: <20081216085108.GA3031@coredump.intra.peff.net>

Hi,

On Tue, 16 Dec 2008, Jeff King wrote:

>   Johannes Schindelin's notes proposal (which is more or less 
>   the current proposal, but I think the on-disk notes index was not 
>   well liked): 
>   http://thread.gmane.org/gmane.comp.version-control.git/52598

I redid the benchmark (this time on a somewhat beefier machine), just
comparing no notes with David's/Peff's idea:


-- snip --
$ GIT_NOTES_TIMING_TESTS=1 sh t3302-notes-index-expensive.sh -i -v
Initialized empty Git repository in /home/gitte/git/t/trash directory.t3302-notes-index-expensive/.git/
* expecting success: create_repo 10
Initialized empty Git repository in /home/gitte/git/t/trash directory.t3302-notes-index-expensive/10/.git/
*   ok 1: setup 10

* expecting success: test_notes 10
*   ok 2: notes work

* expecting success: time_notes 100
no-notes
0.08user 0.10system 0:00.18elapsed 95%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+58926minor)pagefaults 0swaps
notes
0.14user 0.07system 0:00.54elapsed 38%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+60319minor)pagefaults 0swaps
*   ok 3: notes timing

* expecting success: create_repo 100
Initialized empty Git repository in /home/gitte/git/t/trash directory.t3302-notes-index-expensive/100/.git/
*   ok 1: setup 100

* expecting success: test_notes 100
*   ok 2: notes work

* expecting success: time_notes 100
no-notes
0.23user 0.21system 0:00.45elapsed 96%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+68043minor)pagefaults 0swaps
notes
0.38user 0.21system 0:00.59elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+78829minor)pagefaults 0swaps
*   ok 3: notes timing

* expecting success: create_repo 1000
Initialized empty Git repository in /home/gitte/git/t/trash directory.t3302-notes-index-expensive/1000/.git/
*   ok 1: setup 1000

* expecting success: test_notes 1000
*   ok 2: notes work

* expecting success: time_notes 100
no-notes
2.06user 0.95system 0:04.26elapsed 70%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+159115minor)pagefaults 0swaps
notes
2.83user 1.54system 0:04.38elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+267416minor)pagefaults 0swaps
*   ok 3: notes timing

* expecting success: create_repo 10000
Initialized empty Git repository in /home/gitte/git/t/trash directory.t3302-notes-index-expensive/10000/.git/
*   ok 1: setup 10000

* expecting success: test_notes 10000
*   ok 2: notes work

* expecting success: time_notes 100
no-notes
20.46user 7.63system 0:28.30elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+1083378minor)pagefaults 0swaps
notes
28.78user 13.74system 0:42.85elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+2240296minor)pagefaults 0swaps
*   ok 3: notes timing

* passed all 0 test(s)
-- snap --


Keep in mind that the tests run "git log" 99 times, and show the 
accumulated time.

So it seems that an increase of roughly 40% in user time and roughly 
70% in system time is the price of having notes associated with every 
single commit.
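
As a rough cross-check of those two percentages, the sketch below recomputes the relative increase from the accumulated times of the two largest runs in the benchmark output (1000 and 10000 commits). The helper name `overhead` is hypothetical, and rounding to the nearest whole percent is an arbitrary choice.

```shell
#!/bin/sh
# overhead BEFORE AFTER: print the relative increase of AFTER over BEFORE
# as a whole-number percentage, rounded to the nearest percent.
# (overhead is a hypothetical helper name used for illustration.)
overhead() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%d\n", (b / a - 1) * 100 + 0.5 }'
}

# Accumulated user/system seconds for the 99 "git log" runs,
# copied from the no-notes and notes lines of the benchmark.
echo "1000 commits:  user +$(overhead 2.06 2.83)%  system +$(overhead 0.95 1.54)%"
echo "10000 commits: user +$(overhead 20.46 28.78)%  system +$(overhead 7.63 13.74)%"
```

Run as-is, it reports about +37% user and +62% system time for 1000 commits, and +41% user and +80% system time for 10000 commits, a range that brackets the rough 40% and 70% figures above.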

Note that in that very same repository, a single "git show" goes from

0.00user 0.00system 0:00.00elapsed 0%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+561minor)pagefaults 0swaps

to this:

0.03user 0.02system 0:00.04elapsed 113%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+2294minor)pagefaults 0swaps

(In another run, it only used 90%CPU)

That's not too shabby, given that Git needs to unpack double the number of 
objects in this test when using notes vs. no notes.

For comparison, the numbers back then were something like 10% in user time, 
with a penalty of an extraordinary magnitude every time the notes are 
updated: around 800%.

Note: all these numbers are worst-case numbers, i.e. every commit has one 
note.

To be frank, I do not completely understand why the numbers are that high.  
I would have understood an increase of roughly 4 seconds for reading the 
quite large tree 99 times, plus the same ~0.20 seconds as back then.  
Maybe I made a huge mistake when implementing the thing.

And BTW, my code does not yet handle the case when 
refs/notes/commits:$commit is a tree instead of a blob.  That is left as 
an exercise to the reader.



Johannes Schindelin (4):
  Introduce commit notes
  Add a script to edit/inspect notes
  Speed up git notes lookup
  Add an expensive test for git-notes

 .gitignore                       |    1 +
 Documentation/config.txt         |   15 ++++
 Documentation/git-notes.txt      |   46 +++++++++++
 Makefile                         |    3 +
 cache.h                          |    3 +
 command-list.txt                 |    1 +
 commit.c                         |    1 +
 config.c                         |    5 +
 environment.c                    |    1 +
 git-notes.sh                     |   65 +++++++++++++++
 notes.c                          |  159 ++++++++++++++++++++++++++++++++++++++
 notes.h                          |    7 ++
 pretty.c                         |    5 +
 t/t3301-notes.sh                 |   65 +++++++++++++++
 t/t3302-notes-index-expensive.sh |   98 +++++++++++++++++++++++
 15 files changed, 475 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/git-notes.txt
 create mode 100755 git-notes.sh
 create mode 100644 notes.c
 create mode 100644 notes.h
 create mode 100755 t/t3301-notes.sh
 create mode 100755 t/t3302-notes-index-expensive.sh
