From: Linus Torvalds <torvalds@linux-foundation.org>
To: Anton Tropashko <atropashko@yahoo.com>
Cc: Git Mailing List <git@vger.kernel.org>
Subject: Re: Errors cloning large repo
Date: Fri, 9 Mar 2007 17:45:22 -0800 (PST)
Message-ID: <Pine.LNX.4.64.0703091736290.10832@woody.linux-foundation.org>
In-Reply-To: <891197.22028.qm@web52611.mail.yahoo.com>
On Fri, 9 Mar 2007, Anton Tropashko wrote:
>
> but your /usr should be large enough if /usr/local and /usr/local/src
> are not!!!
I don't like the size distribution.
My /usr has 181585 files, but is 4.0G in size, which doesn't match yours.
Also, I've wanted to generate bogus data for a while, just for testing, so
I wrote this silly program that I can tweak the size distribution for.
It gives me something that approaches your distribution (I ran it a few
times, I now have 110402 files, and 5.7GB of space according to 'du').
It's totally unrealistic wrt packing, though (no deltas, and no
compression, since the data itself is all random), and I don't know how to
approximate that kind of detail sanely.
I'll need to call it a day for the kids' dinner etc, so I'm probably done
for the day. I'll play with this a bit more to see if I can find various
scalability issues (and just ignore the delta/compression problem - you
probably don't have many deltas either, so I'm hoping that the fact
that I only have 5.7GB will approximate your data thanks to it not being
compressible).
Linus
---
#include <time.h>
#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <fcntl.h>
/*
 * Create a file with a random size in the range
 * 0-512kB, but with a "pink noise"ish distribution
 * (ie roughly as many files in the 1-2kB range as in
 * the 256kB to 512kB range).
 */
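static void create_file(const char *name)
{
	size_t i;
	int fd = open(name, O_CREAT | O_WRONLY | O_TRUNC, 0666);
	static char buffer[1000];
	unsigned long size = random() % (1 << (10+(random() % 10)));

	if (fd < 0)
		return;
	/* Refill the buffer with fresh random bytes for every file */
	for (i = 0; i < sizeof(buffer); i++)
		buffer[i] = random();
	/* The file is this one random block written repeatedly */
	while (size) {
		size_t len = sizeof(buffer);
		ssize_t n;

		if (len > size)
			len = size;
		/* Give up on this file on a write error (e.g. disk full) */
		n = write(fd, buffer, len);
		if (n <= 0)
			break;
		size -= n;
	}
	close(fd);
}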
static void start(const char *base,
	float dir_likely, float dir_expand,
	float end_likely, float end_expand)
{
	int len = strlen(base);
	/* Room for base, '/', up to six digits and the trailing NUL */
	char *name = malloc(len + 10);

	if (!name)
		return;
	mkdir(base, 0777);
	memcpy(name, base, len);
	name[len++] = '/';

	dir_likely *= dir_expand;
	end_likely *= end_expand;
	for (;;) {
		float rand = (random() & 65535) / 65536.0;

		sprintf(name + len, "%ld", random() % 1000000);
		rand -= dir_likely;
		if (rand < 0) {
			start(name, dir_likely, dir_expand, end_likely, end_expand);
			continue;
		}
		rand -= end_likely;
		if (rand < 0)
			break;
		create_file(name);
	}
	free(name);
}
int main(int argc, char **argv)
{
	/*
	 * Tune the numbers to your liking..
	 *
	 * The floats are:
	 *  - dir_likely (likelihood of creating a recursive directory)
	 *  - dir_expand (how dir_likely behaves as we move down recursively)
	 *  - end_likely (likelihood of ending file creation in a directory)
	 *  - end_expand (how end_likely behaves as we move down recursively)
	 *
	 * The numbers 0.3/0.6 0.02/1.1 are totally made up, and for me
	 * generate a tree of between a few hundred files and a few tens
	 * of thousands of files.
	 *
	 * Re-run several times to generate more files in the tree.
	 */
	srandom(time(NULL));
	start("tree",
		0.3, 0.6,
		0.02, 1.1);
	return 0;
}