* Weird growth in packfile during initial push
@ 2009-04-15 18:27 Robin H. Johnson
  2009-04-15 19:51 ` Nicolas Pitre
  0 siblings, 1 reply; 17+ messages in thread
From: Robin H. Johnson @ 2009-04-15 18:27 UTC (permalink / raw)
  To: git

I was doing a more recent conversion of the Gentoo repo, and ran into
some odd behavior in the packfile size.

For anybody else following the repo, you can now get it on the new hardware at:
http://git-exp.overlays.gentoo.org/gitweb/?p=exp/gentoo-x86.git;a=summary

I did the conversion with cvs2svn, packed, added the remote and pushed, only to
find that the pack on the remote side suddenly seemed to be ~60MiB larger.

$ time git repack -adf --window=250 --depth=250
real    19m59.339s
user    96m48.011s
sys     0m36.914s

$ ls -la /tmp/convert/gentoo-x86-cvs2git/.git/objects/pack
total 903804
drwxr-xr-x 2 robbat2 users       119 Apr 14 08:05 .
drwxr-xr-x 4 robbat2 users        28 Apr 14 08:05 ..
-r--r--r-- 1 robbat2 users 139155472 Apr 14 08:05 pack-f805bb448f864becfeac9c7f8a8ac2ef90c26787.idx
-r--r--r-- 1 robbat2 users 786336481 Apr 14 08:05 pack-f805bb448f864becfeac9c7f8a8ac2ef90c26787.pack

$ git remote add origin git+ssh://git@git-exp.overlays.gentoo.org/exp/gentoo-x86.git
$ git push origin master:master
Initialized empty Git repository in /var/gitroot/exp/gentoo-x86.git/
Counting objects: 4969800, done.
Delta compression using up to 8 threads.
Compressing objects: 100% (1217809/1217809), done.
Writing objects: 100% (4969800/4969800), 810.56 MiB | 21608 KiB/s, done.
Total 4969800 (delta 3735812), reused 4969800 (delta 3735812)
To git+ssh://git@git-exp.overlays.gentoo.org/exp/gentoo-x86.git
 * [new branch]      master -> master

$ ls -la /var/gitroot/exp/gentoo-x86.git/objects/pack
total 966876
drwxr-xr-x 2 git git      4096 Apr 14 08:43 .
drwxr-xr-x 4 git git      4096 Apr 14 08:35 ..
-r--r--r-- 1 git git 139155472 Apr 14 08:43 pack-f805bb448f864becfeac9c7f8a8ac2ef90c26787.idx
-r--r--r-- 1 git git 849936308 Apr 14 08:43 pack-f805bb448f864becfeac9c7f8a8ac2ef90c26787.pack

On the client side after the initial clone, it DOES match (in size) what was
cloned.

(If you're looking for the 849MB one right now, I'll have to get it back for
you, I wanted to save that extra space so just did an rsync of the other pack
over the too-large one for now).
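
The sizes in the two listings can be sanity-checked with a little arithmetic. The sketch below only uses the byte counts shown above; the constant and function names are made up for illustration.

```python
# Byte counts copied from the two `ls -la` listings above.
LOCAL_PACK_BYTES = 786336481    # pack produced by the local repack
REMOTE_PACK_BYTES = 849936308   # pack found on the remote after the push

MIB = 1024 * 1024


def to_mib(n_bytes: int) -> float:
    """Convert a byte count to MiB, rounded to two decimals."""
    return round(n_bytes / MIB, 2)


growth_mib = to_mib(REMOTE_PACK_BYTES - LOCAL_PACK_BYTES)

print("local pack :", to_mib(LOCAL_PACK_BYTES), "MiB")   # 749.91
print("remote pack:", to_mib(REMOTE_PACK_BYTES), "MiB")  # 810.56
print("growth     :", growth_mib, "MiB")                 # 60.65
```

The remote figure, 810.56 MiB, is the same number that `git push` printed in its "Writing objects" line, which suggests the extra ~60 MiB was already present in the data sent over the wire rather than added while the remote stored it.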

-- 
Robin Hugh Johnson
Gentoo Linux Developer & Infra Guy
E-Mail     : robbat2@gentoo.org
GnuPG FP   : 11AC BA4F 4778 E3F6 E4ED  F38E B27B 944E 3488 4E85

* Re: Compatibility between git.git and jgit
@ 2009-05-02 11:00 Mark Struberg
  0 siblings, 0 replies; 17+ messages in thread
From: Mark Struberg @ 2009-05-02 11:00 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Nicolas Pitre, Shawn O. Pearce


As for compatibility between JGIT and GIT:

We (the Apache maven-scm team, with Shawn supporting us; thanks again for patiently answering my sometimes stupid questions) are currently working on a JGit SCM provider for Maven. The command-line git provider has worked pretty well for more than a year now, and once we have the JGit version too, all of this will be tested automatically via our TCK suite.

The TCK suite is pretty high-level, but at least all the fundamental stuff is then guaranteed to work for both implementations.

One step on our road is to further 'abstract' the current jgit-core library and introduce a SimpleRepository which basically contains the most important git commands as Java calls (e.g. addRemote, fetch, ...) [1]. Once we have this, it should be really easy to compare, side by side, the .git/* produced by e.g. git-clone uri vs SimpleRepository.clone(uri).
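
A side-by-side comparison of two repository directories could be sketched roughly as follows. This is an illustration only, not part of maven-scm or JGit: the function names are hypothetical, and it simply hashes every file under two directory trees and reports where they differ.

```python
import hashlib
import os


def tree_digest(root: str) -> dict[str, str]:
    """Map each file path (relative to root) to the SHA-1 of its contents."""
    digests = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root)
            with open(full, "rb") as fh:
                digests[rel] = hashlib.sha1(fh.read()).hexdigest()
    return digests


def compare_trees(left: str, right: str) -> dict[str, list[str]]:
    """Report files present on only one side and files whose bytes differ."""
    a, b = tree_digest(left), tree_digest(right)
    return {
        "only_left": sorted(set(a) - set(b)),
        "only_right": sorted(set(b) - set(a)),
        "different": sorted(p for p in set(a) & set(b) if a[p] != b[p]),
    }
```

Calling `compare_trees("clone-a/.git", "clone-b/.git")` (both paths are placeholders) returns three sorted lists. Byte-for-byte equality is a stricter criterion than compatibility, so some reported differences may be harmless and would need to be judged case by case.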


LieGrue,
strub

[1] http://github.com/sonatype/JGit/ branch struberg
--- Shawn O. Pearce <spearce@spearce.org> wrote on Sat, 2 May 2009:

> From: Shawn O. Pearce <spearce@spearce.org>
> Subject: Re: Compatibility between git.git and jgit
> To: "Nicolas Pitre" <nico@cam.org>
> CC: "Junio C Hamano" <gitster@pobox.com>, git@vger.kernel.org
> Date: Saturday, 2 May 2009, 3:59
> Nicolas Pitre <nico@cam.org> wrote:
> > On Fri, 1 May 2009, Shawn O. Pearce wrote:
> > 
> > > On an unrelated note, someone asked me recently, how do we ensure
> > > compatibility in implementations between git.git and jgit?
> > 
> > Well... this is not exactly easy.  As I said in the past
> > (http://marc.info/?l=git&m=121035043412788&w=2), I think that the C
> > version must remain the reference with regards to protocols and on-disk
> > data structures.
> 
> I agree fully.
> 
> > If people go wild with JGit and start making changes
> > to data structures then it simply won't be Git compatible anymore and
> > the user base will get fragmented.
> 
> Agree.  We may see some prototyping happen in JGit first on some
> topics, and JGit may even support something earlier than git.git,
> e.g. JGit has an amazon-s3:// transport that git.git doesn't have.
> But it also isn't widely used.
> 
> > A formal compatibility test suite would imply that every Git
> > reimplementation should be compatible with the reference C version.
> > You could add some tests in your test suite which are performed in
> > parallel using JGit and the C git, and make sure that the produced
> > results are identical, etc.
> 
> Yea, and to some extent we try to do that already in JGit, but our
> tests aren't complete enough in that area.
>  
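
One small building block for such a parallel test is a reference value that every implementation has to reproduce. The sketch below computes a blob's object ID in plain Python (SHA-1 over a `blob <size>` header, a NUL byte, and the content); the function name is made up for illustration, and a real suite would compare this against what each implementation reports for the same input.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the SHA-1 object ID Git assigns to a blob with this content."""
    header = b"blob %d\0" % len(content)   # "blob <size>" followed by NUL
    return hashlib.sha1(header + content).hexdigest()


# The well-known ID of the empty blob, as printed by C git's hash-object.
EMPTY_BLOB_ID = "e69de29bb2d1d6434b8b29ae775ad8c2e48c5391"

print(git_blob_id(b"") == EMPTY_BLOB_ID)
```

A cross-implementation test would feed the same bytes to each implementation and check that all of them agree with this value.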
> > But to which extent should the C version remain backward compatible with
> > other implementations?  Let's suppose a future protocol extension is
> > made and old unsuspecting C clients work just fine but some other
> > implementation crashes with it?
> 
> This is what I think scares both myself and the folks that have
> recently asked me about compatibility.
> 
> If JGit gets a broader user base, and suddenly it stops working
> against a newer C git-daemon because of a protocol change, those
> users are going to be pissed.  It's no worse than the "github can't
> ever upgrade past 1.6.1" issue we had not too long ago.
> 
> I think we're doing better these days about embedding file format
> version numbers into files (e.g. pack idx v2) to help alert older
> clients that the format is different.  But we also have something
> of a history of looking for "holes" in older C git parsers in
> order to wedge in new features where we didn't plan for them in
> the first place.  E.g. the protocol capability slots we have now.
> 
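
To illustrate the explicit version marker mentioned for pack idx v2, here is a minimal sketch of how a reader might tell the index formats apart. It assumes the documented layout in which a v2 index starts with the 4-byte magic `\377tOc` followed by a 4-byte big-endian version number, while the original format has no header and starts directly with its fan-out table. The function name is hypothetical.

```python
import struct

IDX_MAGIC = b"\xfftOc"  # the bytes \377 t O c


def pack_idx_version(header: bytes) -> int:
    """Guess the pack index version from the first 8 bytes of a .idx file."""
    if len(header) < 8:
        raise ValueError("need at least 8 bytes of the index file")
    if header[:4] == IDX_MAGIC:
        # Explicit marker: the version follows as a 4-byte big-endian integer.
        (version,) = struct.unpack(">I", header[4:8])
        return version
    # No magic: treat the file as the original, header-less v1 format.
    return 1


print(pack_idx_version(b"\xfftOc\x00\x00\x00\x02"))  # 2
```

A reader that finds a version it does not know can refuse the file cleanly instead of misparsing it, which is the behaviour the paragraph above argues for.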
> I think that as reimplementations become more popular, we need to
> rely less on extending things by exploiting parser quirks in older
> C git.git code, and rely more on at least explicit version markers
> that everyone can work with.
> 
> > And the reference implementation cannot be held back because
> > of bugs in all alternative implementations.
> 
> I agree.  A bug is a bug.  But I'd really like to get away from the
> trend where we exploit bugs in older C git.git implementations to
> add new functionality, because maybe JGit doesn't have that same
> bug and will fall flat on its face with that exploit.
> 
> > As long as they're futzing^Wdeveloping on top of Jgit then
> > interoperability shouldn't be at risk.  If people would start adding new
> > object types and pack formats and the like without obtaining a consensus
> > with people around the C version then I might get extremely worried (and
> > pissed) though.
> 
> That's why JGit is BSD, so everyone can use the one f'king library
> and not risk fragmenting the Java market further.
> 
> But yea, I'd be really pissed too if someone hacked up JGit and made
> it incompatible with anything else.  It's a risk that the liberal
> BSD license permits.
> 
> I'm really sort of hoping that the development momentum around
> git.git and JGit trying to keep up will keep them coming back
> to the canonical JGit for updates, forcing them to give back any
> hacks^Wimprovements they have made.  If the improvements really are
> worthwhile, they can be easily ported over to C before they become
> widely used in JGit.
>  
> > One defensive approach we could adopt is to use a capability slot to
> > identify the software version of each peer involved in the network
> > communication.  The advantage would be for a later Git version to avoid
> > doing some things that are known to break with client X or Y.  Of course
> > even such a scheme can be abused and misused, like on some web sites if
> > you don't have the "right" browser, leading some of them to allow faking
> > the User-Agent string, etc.  But maybe the upsides are more important
> > than the downsides.  This doesn't help with on-disk interoperability,
> > but this is probably less important than communication interoperability.
> 
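
As a rough picture of what a capability slot looks like on the wire: capabilities ride on the first advertised ref line, after a NUL byte that older parsers treated as the end of the string. The sketch below parses such a line. The function name is hypothetical, and the `agent=...` entry is only an illustration of the peer-identification idea discussed above, not something proposed in this thread.

```python
def parse_first_ref(payload: bytes):
    """Split a first ref advertisement line into (object id, ref, capabilities)."""
    line = payload.rstrip(b"\n")
    # Everything after the first NUL is the capability list; a parser that
    # stops at NUL only ever sees "<object id> <ref name>".
    ref_part, _, cap_part = line.partition(b"\0")
    object_id, _, ref_name = ref_part.partition(b" ")
    capabilities = {}
    for token in cap_part.split():
        key, _, value = token.partition(b"=")
        capabilities[key.decode()] = value.decode()
    return object_id.decode(), ref_name.decode(), capabilities


# Example payload; the object id is a placeholder and "agent" is illustrative.
example = (b"0" * 40) + b" refs/heads/master\0multi_ack ofs-delta agent=example/1.0\n"
print(parse_first_ref(example))
```

Flag-style capabilities map to an empty string here, while `key=value` entries keep their value, so a peer could read an implementation marker without any change to the framing.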
> Blargh.  I'm with you about the whole User-Agent mess.
> 
> Asking clients and servers to identify with implementation and
> version markers might be useful for analysis of who-is-using-what,
> but I don't think it's a good way to negotiate between the peers of
> what functionality to enable or disable, or what bug workarounds
> to use.  Reminds me of the Apache hack during output to work around
> an HTTP header parsing bug in Netscape 2 when the "\r\n" pair was
> exactly at byte 256 in the stream.  *shudder*
> 
> 
> FWIW, an EGit user recently complained that some random Git hosting
> site they were using couldn't work with EGit, but EGit worked fine
> with other sites, e.g. GitHub.  Apparently this site's SSH forced command
> filter script didn't like EGit asking for "git upload-pack 'path.git'".
> 
> It's not strictly a Git protocol issue, how the client launches
> the remote process over SSH, but this random hosting site was
> apparently relying on C git's current calling convention of
> "git-upload-pack 'path.git'".
> 
> Long story short, I claimed it was the hosting site's
> bug.  :-)
> 
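
A forced-command filter that tolerates both spellings might look something like the sketch below. It is not the hosting site's script; the function name and the allowed-service set are assumptions for illustration, and the input is the command string the client asked for (which OpenSSH exposes to a forced command as `SSH_ORIGINAL_COMMAND`).

```python
import shlex

# Services this hypothetical filter is willing to run.
ALLOWED_SERVICES = {"upload-pack", "receive-pack", "upload-archive"}


def parse_git_ssh_command(command: str):
    """Accept "git-upload-pack 'p'" and "git upload-pack 'p'" alike."""
    argv = shlex.split(command)
    if len(argv) == 3 and argv[0] == "git":
        # Spaced form, as sent by EGit in the report above.
        service, path = argv[1], argv[2]
    elif len(argv) == 2 and argv[0].startswith("git-"):
        # Dashed form, the calling convention C git uses.
        service, path = argv[0][len("git-"):], argv[1]
    else:
        raise ValueError("not a recognised git command: %r" % command)
    if service not in ALLOWED_SERVICES:
        raise ValueError("service not allowed: %r" % service)
    return service, path


print(parse_git_ssh_command("git upload-pack 'path.git'"))
print(parse_git_ssh_command("git-upload-pack 'path.git'"))
```

Both calls yield the same `("upload-pack", "path.git")` pair, so the rest of such a filter would not need to care which client spelling was used.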
> -- 
> Shawn.


      


Thread overview: 17+ messages
2009-04-15 18:27 Weird growth in packfile during initial push Robin H. Johnson
2009-04-15 19:51 ` Nicolas Pitre
2009-04-29 23:57   ` Junio C Hamano
2009-04-30  2:52     ` Nicolas Pitre
2009-05-01  6:17     ` Robin H. Johnson
2009-05-01 20:56     ` [PATCH] allow OFS_DELTA objects during a push Nicolas Pitre
2009-05-01 23:49       ` Junio C Hamano
2009-05-02  0:01         ` Compatibility between git.git and jgit Shawn O. Pearce
2009-05-02  1:14           ` A Large Angry SCM
2009-05-02  1:39           ` Nicolas Pitre
2009-05-02  1:59             ` Shawn O. Pearce
2009-05-02 16:56             ` Ealdwulf Wuffinga
2009-05-02  1:40           ` Michael Witten
2009-05-02  0:24         ` [PATCH] allow OFS_DELTA objects during a push Nicolas Pitre
2009-05-04 22:11       ` Shawn O. Pearce
2009-05-04 22:30         ` Shawn O. Pearce
  -- strict thread matches above, loose matches on Subject: below --
2009-05-02 11:00 Compatibility between git.git and jgit Mark Struberg
