git.vger.kernel.org archive mirror
* Errors cloning large repo
@ 2007-03-09 19:20 Anton Tropashko
  2007-03-09 21:37 ` Linus Torvalds
  0 siblings, 1 reply; 25+ messages in thread
From: Anton Tropashko @ 2007-03-09 19:20 UTC (permalink / raw)
  To: git

I managed to stuff 8.5 GB worth of files into a git repo (in two git commits,
since it was running out of memory when I gave it the -a option),

but when I'm cloning to another Linux box I get:

Generating pack...
Done counting 152200 objects.
Deltifying 152200 objects.
 80% (122137/152200) done
 100% (152200/152200) done
/usr/bin/git-clone: line 321:  2072 File size limit exceeded git-fetch-pack --all -k $quiet "$repo"


It would be nice to be able to work around this somehow if the bug cannot be fixed.
1.5.0 on the server
1.4.1 on the client

* Re: Errors cloning large repo
@ 2007-03-09 23:48 Anton Tropashko
  2007-03-10  0:54 ` Linus Torvalds
  0 siblings, 1 reply; 25+ messages in thread
From: Anton Tropashko @ 2007-03-09 23:48 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git

Answers inline, prefixed with >>>>, while I'm trying to figure out how to deal
with the new "improvement" to Yahoo Mail beta.

On Fri, 9 Mar 2007, Anton Tropashko wrote:
>
> I managed to stuff 8.5 GB worth of files into a git repo (in two git commits,
> since it was running out of memory when I gave it the -a option)

So you might be able to just do

    git add dir1
    git add dir2
    git add dir3
    ..
    git commit

or something.
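A loop over the top-level directories amounts to the same thing (just a
sketch, assuming everything of interest sits directly under the repo root):

    # one git-add per top-level directory, to bound memory use
    for d in */ ; do
        git add "$d"
    done
    git commit -m 'import SDK'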

>>>>>>>>>>>>  For some reason git add . swallowed the whole thing,
>>>>>>>>>>>>  but git commit did not, and I had to split it up. I've trimmed the tree a bit
>>>>>>>>>>>>  since then by removing C & C++ files ;-)

But one caveat: git may not be the right tool for the job. May I inquire 
what the heck you're doing? We may be able to fix git even for your kind 
of usage.

>>>>>>>>>>>>  I dumped a rather large SDK into it: headers, libraries,
>>>>>>>>>>>>  even crs.o from the toolchains that are part of the SDK. The idea is to keep
>>>>>>>>>>>>  the SDK versioned and to be able to pull an arbitrary version once it's tagged.

So I'm not saying that git won't work for you, I'm just warning that the 
whole model of operation may or may not actually match what you want to 
do. Do you really want to track that 8.5GB as *one* entity?

>>>>>>>>>>>> Yes. It would be nice if I didn't have to prune PDFs, TXTs, and who
>>>>>>>>>>>> knows what else people put in there just to reduce the size.

> but when I'm cloning to another linux box I get:
> 
> Generating pack...
> Done counting 152200 objects.
> Deltifying 152200 objects.

.. this is the part that makes me think git *should* be able to work for you. 
Having lots of smallish files is much better for git than a few DVD 
images, for example. And if those 152200 objects are just from two 
commits, you obviously have lots of files ;)

However, if it packs really badly (and without any history, that's quite 
likely), maybe the resulting pack-file is bigger than 4GB, and then you'd 
have trouble (in fact, I think you'd hit trouble at the 2GB pack-file 
mark).

Does "git repack -a -d" work for you?

>>>>>>>>>>>> I'll tell you as soon as I get another failure. As you
>>>>>>>>>>>> might guess it takes a while :-]

> /usr/bin/git-clone: line 321:  2072 File size limit exceeded git-fetch-pack --all -k $quiet "$repo"

"File size limit exceeded" sounds like SIGXFSZ, which is either:

 - you have file limits enabled, and the resulting pack-file was just too 
   big for the limits.

 - the file size is bigger than MAX_NON_LFS (2GB-1), and we don't use 
   O_LARGEFILE.

I suspect the second case. Shawn and Nico have worked on 64-bit packfile 
indexing, so they may have a patch / git tree for you to try out.
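The first case is easy to check, by the way (a sketch; run it on whichever
end is actually writing the pack):

    ulimit -a | grep 'file size'    # 'unlimited' rules out the first case
    # for comparison, SIGXFSZ can be provoked deliberately:
    ( ulimit -f 1024; dd if=/dev/zero of=junk bs=1M count=2 )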

>>>>>>>>>>>> Ok. I think you're correct:
from ulimit -a:
...
file size             (blocks, -f) unlimited
...

Good to know the developers are ahead of the users.

Is there a way to get rid of pending (uncommitted) changes?
git revert does not work the same way as svn revert, as I just discovered,
and git status still reports a ton of pending deletions
(I changed my mind and need my object files back). I suppose I could move .git out
of the way, blow away all the files, move it back, and git pull or whatever
does a local checkout, but there must be a better way.

* Re: Errors cloning large repo
@ 2007-03-10  1:21 Anton Tropashko
  2007-03-10  1:45 ` Linus Torvalds
  0 siblings, 1 reply; 25+ messages in thread
From: Anton Tropashko @ 2007-03-10  1:21 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git

> I should try it out with some made-up auto-generated directory setup, but 
> I'm not sure I have the energy to do it ;)

but your /usr should be large enough if /usr/local and /usr/local/src are not!!!
I don't think you need to generate anything.
Or are you saying that the problem is the number of files I have, not the
total size of the files? In any event there should be plenty of files in /usr.

> That said, it might also be a good idea (regardless of anything else) to 
> split things up, if only because it's quite possible that not everybody is 
> interested in having *everything*. Forcing people to work with an 8.5GB 
> repository when they might not care about it all could be a bad idea.

> "git reset --hard" will do it for you. As will "git checkout -f", for that 
> matter.

> "git revert" will just undo an old commit (as you apparently already found 
> out)
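For the archives, the recovery boils down to this (just a sketch):

    git status          # still reports a ton of pending deletions
    git reset --hard    # index and working tree back to HEAD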

Yep. I found that checkout -f works before I got the reset alternative.

I was pleased that git did not lock me out of committing a few
deletions for *.pdf, *.doc and makefiles after the repack started.
repack -a -d just finished and I started the clone again.
It's already deltifying at 6%.

Thank you.

* Re: Errors cloning large repo
@ 2007-03-10  2:37 Anton Tropashko
  2007-03-10  3:07 ` Shawn O. Pearce
  2007-03-10  5:10 ` Linus Torvalds
  0 siblings, 2 replies; 25+ messages in thread
From: Anton Tropashko @ 2007-03-10  2:37 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git

> I suspect we shouldn't bother with the diffstat for the initial commit. 
> Just removing "--root" might be sufficient.

My problem is git-clone, though, since for commit it's no big deal
to git commit [a-c]*, or to use xargs as a workaround.

For git clone I got this:

Deltifying 144511 objects.
 100% (144511/144511) done
1625.375MB  (1713 kB/s)       
1729.057MB  (499 kB/s)       
/usr/bin/git-clone: line 321: 24360 File size limit exceeded git-fetch-pack --all -k $quiet "$repo"

again, after a git repack, and I don't see how to work around that aside from artificially
splitting the tree at the top or resorting to a tarball on an ftp site.
Would that 64-bit indexing code you previously mentioned force me to upgrade git on both ends?
Anywhere I can pull it from?
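The split-at-the-top workaround would look something like this; purely a
sketch with made-up directory names, one repo per top-level directory so
that no single pack crosses the limit:

    # 'git init' is 1.5.0 syntax; older gits spell it git-init-db
    for d in headers libs toolchain; do
        ( cd "$d" && git init && git add . && git commit -m "import $d" )
    done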






 

* Re: Errors cloning large repo
@ 2007-03-12 17:39 Anton Tropashko
  2007-03-12 18:40 ` Linus Torvalds
  0 siblings, 1 reply; 25+ messages in thread
From: Anton Tropashko @ 2007-03-12 17:39 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git

> For example, if "du -sh" says 8.5GB, it doesn't necessarily mean that 
> there really is 8.5GB of data there.

> Its very likely this did fit in just under 4 GiB of packed data,
> but as you said, without O_LARGEFILE we can't work with it.

.git is 3.5GB according to du -H :)

> As Linus said earlier in this thread; Nico and I are working on
> pushing out the packfile limits, just not fast enough for some users'
> needs, apparently (sorry about that!).  Troy's patch was rejected

No problem.
You're providing workarounds faster than I can process them :-)

> So the "git repack" actually worked for you? It really shouldn't have 
> worked.

It did not complain. I did not check the exit status, but there was not so
much as a single warning message along the lines of:
"index file has overflowed; the kernel will panic shortly. please stand by..."

> Is the server side perhaps 64-bit? If so, the limit ends up being 4GB 
> instead of 2GB, and your 8.5GB project may actually fit.

Both server and client are 32-bit.
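For the record, checked with something like:

    getconf LONG_BIT    # prints 32 or 64
    uname -m            # e.g. i686 vs x86_64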

> If so, we can trivially fix it with the current index file even for a 
> 32-bit machine. The reason we limit pack-files to 2GB on 32-bit machines 

Unfortunately the server machine is managed by IT, so I can't install whatever
I want. The client is not, but it's against IT policy to have rogue Linux boxes
on the net ;)

> So, wouldn't the correct fix be to automatically split a pack
> file in two pieces when it would become larger than 2 GB?

Just curious: why wouldn't you use something like
PostgreSQL for data storage at this point? But then,
I know nothing about git internals :)

Anyhow, I have a patch to apply now and a bash script to hone my
bashing skills on. If you have anything else for me to test, just shoot me
an e-mail.

I'm glad I can keep you all busy.

* Re: Errors cloning large repo
@ 2007-03-13  0:02 Anton Tropashko
  0 siblings, 0 replies; 25+ messages in thread
From: Anton Tropashko @ 2007-03-13  0:02 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git

> probably don't have many deltas either, so I'm hoping that the fact 
> that I only have 5.7GB will approximate your data thanks to it not being 
> compressible).

I made a tarball of the SDK and it's 5.2GB, so
I think you do have a good test set.


end of thread

Thread overview: 25+ messages
-- links below jump to the message on this page --
2007-03-09 19:20 Errors cloning large repo Anton Tropashko
2007-03-09 21:37 ` Linus Torvalds
  -- strict thread matches above, loose matches on Subject: below --
2007-03-09 23:48 Anton Tropashko
2007-03-10  0:54 ` Linus Torvalds
2007-03-10  2:03   ` Linus Torvalds
2007-03-10  2:12     ` Junio C Hamano
2007-03-10  1:21 Anton Tropashko
2007-03-10  1:45 ` Linus Torvalds
2007-03-10  2:37 Anton Tropashko
2007-03-10  3:07 ` Shawn O. Pearce
2007-03-10  5:54   ` Linus Torvalds
2007-03-10  6:01     ` Shawn O. Pearce
2007-03-10 22:32       ` Martin Waitz
2007-03-10 22:46         ` Linus Torvalds
2007-03-11 21:35           ` Martin Waitz
2007-03-10 10:27   ` Jakub Narebski
2007-03-11  2:00     ` Shawn O. Pearce
2007-03-12 11:09       ` Jakub Narebski
2007-03-12 14:24         ` Shawn O. Pearce
2007-03-17 13:23           ` Jakub Narebski
     [not found]   ` <82B0999F-73E8-494E-8D66-FEEEDA25FB91@adacore.com>
2007-03-10 22:21     ` Linus Torvalds
2007-03-10  5:10 ` Linus Torvalds
2007-03-12 17:39 Anton Tropashko
2007-03-12 18:40 ` Linus Torvalds
2007-03-13  0:02 Anton Tropashko
