git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Linus Torvalds <torvalds@linux-foundation.org>
To: Anton Tropashko <atropashko@yahoo.com>
Cc: Git Mailing List <git@vger.kernel.org>
Subject: Re: Errors cloning large repo
Date: Fri, 9 Mar 2007 17:45:22 -0800 (PST)	[thread overview]
Message-ID: <Pine.LNX.4.64.0703091736290.10832@woody.linux-foundation.org> (raw)
In-Reply-To: <891197.22028.qm@web52611.mail.yahoo.com>



On Fri, 9 Mar 2007, Anton Tropashko wrote:
> 
> but your /usr should be large enough if /usr/local and /usr/local/src 
> are not!!!

I don't like the size distribution.

My /usr has 181585 files, but is 4.0G in size, which doesn't match yours. 
Also, I've wanted to generate bogus data for a while, just for testing, so 
I wrote this silly program that I can tweak the size distribution for.

It gives me something that approaches your distribution (I ran it a few 
times, I now have 110402 files, and 5.7GB of space according to 'du').

It's totally unrealistic wrt packing, though (no deltas, and no 
compression, since the data itself is all random), and I don't know how to 
approximate that kind of details samely.

I'll need to call it a day for the kids dinner etc, so I'm probably done 
for the day. I'll play with this a bit more to see if I can find various 
scalability issues (and just ignore the delta/compression problem - you 
probably don't have many deltas either, so I'm hoping that the fact 
that I only have 5.7GB will approximate your data thanks to it not being 
compressible).

		Linus

---
#include <time.h>
#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/fcntl.h>

/*
 * Create a file with a random size in the range
 * 0-1MB, but with a "pink noise"ish distribution
 * (ie equally many files in the 1-2kB range as in
 * the half-meg to megabyte range).
 */
static void create_file(const char *name)
{
	int i;
	int fd = open(name, O_CREAT | O_WRONLY | O_TRUNC, 0666);
	static char buffer[1000];
	unsigned long size = random() % (1 << (10+(random() % 10)));

	if (fd < 0)
		return;
	for (i = 0; i < sizeof(buffer); i++)
		buffer[i] = random();
	while (size) {
		int len = sizeof(buffer);
		if (len > size)
			len = size;
		write(fd, buffer, len);
		size -= len;
	}
	close(fd);
}

static void start(const char *base,
	float dir_likely, float dir_expand,
	float end_likely, float end_expand)
{
	int len = strlen(base);
	char *name = malloc(len + 10);

	mkdir(base, 0777);

	memcpy(name, base, len);
	name[len++] = '/';

	dir_likely *= dir_expand;
	end_likely *= end_expand;

	for (;;) {
		float rand = (random() & 65535) / 65536.0;

		sprintf(name + len, "%ld", random() % 1000000);
		rand -= dir_likely;
		if (rand < 0) {
			start(name, dir_likely, dir_expand, end_likely, end_expand);
			continue;
		}
		rand -= end_likely;
		if (rand < 0)
			break;
		create_file(name);
	}
}

int main(int argc, char **argv)
{
	/*
	 * Tune the numbers to your liking..
	 *
	 * The floats are:
	 *  - dir_likely (likelihood of creating a recursive directory)
	 *  - dir_expand (how dir_likely behaves as we move down recursively)
	 *  - end_likely (likelihood of ending file creation in a directory)
	 *  - end_expand (how end_likely behaves as we move down recursively)
	 *
	 * The numbers 0.3/0.6 0.03/1.1 are totally made up, and for me
	 * generate a tree of between a few hundred files and a few tens 
	 * of thousands of files.
	 *
	 * Re-run several times to generate more files in the tree.
	 */
	srandom(time(NULL));
	start("tree",
		0.3, 0.6,
		0.02, 1.1);
	return 0;
}

  reply	other threads:[~2007-03-10  1:45 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2007-03-10  1:21 Errors cloning large repo Anton Tropashko
2007-03-10  1:45 ` Linus Torvalds [this message]
  -- strict thread matches above, loose matches on Subject: below --
2007-03-13  0:02 Anton Tropashko
2007-03-12 17:39 Anton Tropashko
2007-03-12 18:40 ` Linus Torvalds
2007-03-10  2:37 Anton Tropashko
2007-03-10  3:07 ` Shawn O. Pearce
2007-03-10  5:54   ` Linus Torvalds
2007-03-10  6:01     ` Shawn O. Pearce
2007-03-10 22:32       ` Martin Waitz
2007-03-10 22:46         ` Linus Torvalds
2007-03-11 21:35           ` Martin Waitz
2007-03-10 10:27   ` Jakub Narebski
2007-03-11  2:00     ` Shawn O. Pearce
2007-03-12 11:09       ` Jakub Narebski
2007-03-12 14:24         ` Shawn O. Pearce
2007-03-17 13:23           ` Jakub Narebski
     [not found]   ` <82B0999F-73E8-494E-8D66-FEEEDA25FB91@adacore.com>
2007-03-10 22:21     ` Linus Torvalds
2007-03-10  5:10 ` Linus Torvalds
2007-03-09 23:48 Anton Tropashko
2007-03-10  0:54 ` Linus Torvalds
2007-03-10  2:03   ` Linus Torvalds
2007-03-10  2:12     ` Junio C Hamano
2007-03-09 19:20 Anton Tropashko
2007-03-09 21:37 ` Linus Torvalds

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Pine.LNX.4.64.0703091736290.10832@woody.linux-foundation.org \
    --to=torvalds@linux-foundation.org \
    --cc=atropashko@yahoo.com \
    --cc=git@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).