git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Funny error with git gc...
@ 2009-05-15 18:02 Johannes Schindelin
  2009-05-15 18:10 ` Junio C Hamano
  0 siblings, 1 reply; 18+ messages in thread
From: Johannes Schindelin @ 2009-05-15 18:02 UTC (permalink / raw)
  To: git

Hi,

if you need a chuckle, like me, you might appreciate this story: in one of 
my repositories, "git gc" dies with

	unable to open object pack directory: ...: Too many open files

turns out that there are a whopping 1088 packs in that repository...

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Funny error with git gc...
  2009-05-15 18:02 Funny error with git gc Johannes Schindelin
@ 2009-05-15 18:10 ` Junio C Hamano
  2009-05-15 18:24   ` Avery Pennarun
                     ` (2 more replies)
  0 siblings, 3 replies; 18+ messages in thread
From: Junio C Hamano @ 2009-05-15 18:10 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: git

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

> if you need a chuckle, like me, you might appreciate this story: in one of 
> my repositories, "git gc" dies with
>
> 	unable to open object pack directory: ...: Too many open files
>
> turns out that there are a whopping 1088 packs in that repository...

Isn't it a more serious problem than a mere chuckle?  How would one
recover from such a situation (other than "mv .git/objects/pack-*;
for p in pack-*.pack; do git unpack-objects <$p; done")?

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Funny error with git gc...
  2009-05-15 18:10 ` Junio C Hamano
@ 2009-05-15 18:24   ` Avery Pennarun
  2009-05-15 18:46   ` Linus Torvalds
  2009-05-15 18:52   ` [MAKESHIFT PATCH] Cope better with a _lot_ of packs Johannes Schindelin
  2 siblings, 0 replies; 18+ messages in thread
From: Avery Pennarun @ 2009-05-15 18:24 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Johannes Schindelin, git

On Fri, May 15, 2009 at 2:10 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
>> if you need a chuckle, like me, you might appreciate this story: in one of
>> my repositories, "git gc" dies with
>>
>>       unable to open object pack directory: ...: Too many open files
>>
>> turns out that there are a whopping 1088 packs in that repository...
>
> Isn't it a more serious problem than a mere chuckle?  How would one
> recover from such a situation (other than "mv .git/objects/pack-*;
> for p in pack-*.pack; do git unpack-objects <$p; done")?

I guess if you have root access, you could increase your file ulimit
(-n) temporarily and then repack.  Of course that's not actually
addressing the root cause of the problem.

Avery

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Funny error with git gc...
  2009-05-15 18:10 ` Junio C Hamano
  2009-05-15 18:24   ` Avery Pennarun
@ 2009-05-15 18:46   ` Linus Torvalds
  2009-05-15 19:08     ` Johannes Schindelin
  2009-05-18  8:28     ` Matthias Andree
  2009-05-15 18:52   ` [MAKESHIFT PATCH] Cope better with a _lot_ of packs Johannes Schindelin
  2 siblings, 2 replies; 18+ messages in thread
From: Linus Torvalds @ 2009-05-15 18:46 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Johannes Schindelin, git



On Fri, 15 May 2009, Junio C Hamano wrote:

> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
> 
> > if you need a chuckle, like me, you might appreciate this story: in one of 
> > my repositories, "git gc" dies with
> >
> > 	unable to open object pack directory: ...: Too many open files
> >
> > turns out that there are a whopping 1088 packs in that repository...
> 
> Isn't it a more serious problem than a mere chuckle?  How would one
> recover from such a situation (other than "mv .git/objects/pack-*;
> for p in pack-*.pack; do git unpack-objects <$p; done")?

Well, you can probably just increase the file limits and try again. 
Depending on setup, you may need root to do so, though.

I also think you _should_ be able to avoid this by just limiting the pack 
size usage. IOW, with some packed_git_limit, something like

	[core]
		packedGitWindowSize = 16k
		packedGitLimit = 1M

you should hopefully be able to repack (slowly) even with a low file 
descriptor limit, because of the total limit on the size.

That said, I do agree that ulimit doesn't always work on all systems 
(whether due to hard system limits or due to not having permission to 
raise the limits), and playing games with pack limits is non-obvious. We 
should really try to avoid getting into such a situation. But I think git 
by default avoids it by the auto-gc, no? So you have to disable that 
explicitly to get into this bad situation.

One solution - which I think may be the right one regardless - is to not 
use "mmap()" for small packs or small SHA1 files.

mmap is great for random-access multi-use scenarios (and to avoid some 
memory pressure by allowing sharing of pages), but for anything that is 
just a couple of pages in size, mmap() just adds big overhead with little 
upside. 

So if we use malloc+read for small things, we'd probably avoid this. Now, 
if you have a few thousand _large_ packs, you'd still be screwed, but the 
most likely reason for having a thousand packfiles is that you did daily 
"git pull"s, and have lots and lots of packs that are pretty small.

Dscho? What are your pack-file statistics in this case?

		Linus

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [MAKESHIFT PATCH] Cope better with a _lot_ of packs
  2009-05-15 18:10 ` Junio C Hamano
  2009-05-15 18:24   ` Avery Pennarun
  2009-05-15 18:46   ` Linus Torvalds
@ 2009-05-15 18:52   ` Johannes Schindelin
  2009-05-15 19:17     ` Shawn O. Pearce
  2 siblings, 1 reply; 18+ messages in thread
From: Johannes Schindelin @ 2009-05-15 18:52 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git


You might end up with a situation where you have tons of pack files, e.g.
when using hg2git.  In this situation, all kinds of operations may 
end up with a "too many files open" error.  Let's recover gracefully from 
that.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---

	On Fri, 15 May 2009, Junio C Hamano wrote:

	> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
	> 
	> > if you need a chuckle, like me, you might appreciate this 
	> > story: in one of my repositories, "git gc" dies with
	> >
	> > 	unable to open object pack directory: ...: Too many open files
	> >
	> > turns out that there are a whopping 1088 packs in that 
	> > repository...
	> 
	> Isn't it a more serious problem than a mere chuckle?

	Well, it was kind of morbid humor, I guess.  So much so that I 
	just had to send off that mail when I encountered the problem.

	> How would one recover from such a situation (other than
	> "mv .git/objects/pack-*; for p in pack-*.pack; do git unpack-objects
	> <$p; done")?

	Very easy: one writes a patch.

	I am sure that I did not catch all problematic sites, but this 
	patch works almost, at least.

	Another issue is how to include a test for this... I cannot think
	of a non-expensive way.

 sha1_file.c |    6 ++++++
 1 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/sha1_file.c b/sha1_file.c
index 28bd908..bd5edd8 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -720,6 +720,8 @@ static int open_packed_git_1(struct packed_git *p)
 		return error("packfile %s index unavailable", p->pack_name);
 
 	p->pack_fd = open(p->pack_name, O_RDONLY);
+	while (p->pack_fd < 0 && errno == EMFILE && unuse_one_window(p, -1))
+		p->pack_fd = open(p->pack_name, O_RDONLY);
 	if (p->pack_fd < 0 || fstat(p->pack_fd, &st))
 		return -1;
 
@@ -937,6 +939,8 @@ static void prepare_packed_git_one(char *objdir, int local)
 	sprintf(path, "%s/pack", objdir);
 	len = strlen(path);
 	dir = opendir(path);
+	while (!dir && errno == EMFILE && unuse_one_window(packed_git, -1))
+		dir = opendir(path);
 	if (!dir) {
 		if (errno != ENOENT)
 			error("unable to open object pack directory: %s: %s",
@@ -2339,6 +2343,8 @@ static int write_loose_object(const unsigned char *sha1, char *hdr, int hdrlen,
 
 	filename = sha1_file_name(sha1);
 	fd = create_tmpfile(tmpfile, sizeof(tmpfile), filename);
+	while (fd < 0 && errno == EMFILE && unuse_one_window(packed_git, -1))
+		fd = create_tmpfile(tmpfile, sizeof(tmpfile), filename);
 	if (fd < 0) {
 		if (errno == EACCES)
 			return error("insufficient permission for adding an object to repository database %s\n", get_object_directory());
-- 
1.6.3.1.10.g12b3.dirty

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: Funny error with git gc...
  2009-05-15 18:46   ` Linus Torvalds
@ 2009-05-15 19:08     ` Johannes Schindelin
  2009-05-15 19:12       ` Johannes Schindelin
  2009-05-15 19:16       ` Linus Torvalds
  2009-05-18  8:28     ` Matthias Andree
  1 sibling, 2 replies; 18+ messages in thread
From: Johannes Schindelin @ 2009-05-15 19:08 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Junio C Hamano, git

Hi,

On Fri, 15 May 2009, Linus Torvalds wrote:

> On Fri, 15 May 2009, Junio C Hamano wrote:
> 
> > Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
> > 
> > > if you need a chuckle, like me, you might appreciate this story: in 
> > > one of my repositories, "git gc" dies with
> > >
> > > 	unable to open object pack directory: ...: Too many open files
> > >
> > > turns out that there are a whopping 1088 packs in that repository...
> > 
> > Isn't it a more serious problem than a mere chuckle?  How would one 
> > recover from such a situation (other than "mv .git/objects/pack-*; for 
> > p in pack-*.pack; do git unpack-objects <$p; done")?
> 
> Well, you can probably just increase the file limits and try again. 
> Depending on setup, you may need root to do so, though.
> 
> I also think you _should_ be able to avoid this by just limiting the 
> pack size usage. IOW, with some packed_git_limit, something like
> 
> 	[core]
> 		packedGitWindowSize = 16k
> 		packedGitLimit = 1M
> 
> you should hopefully be able to repack (slowly) even with a low file 
> descriptor limit, because of the total limit on the size.

I don't think so, because the window size has nothing to do with the 
amount of open windows, right?

> That said, I do agree that ulimit doesn't always work on all systems 
> (whether due to hard system limits or due to not having permission to 
> raise the limits), and playing games with pack limits is non-obvious. We 
> should really try to avoid getting into such a situation. But I think git 
> by default avoids it by the auto-gc, no? So you have to disable that 
> explicitly to get into this bad situation.

No, in this case, nothing was disabled.  auto-gc did not kick in, probably 
due to funny Git usage in hg2git.

> One solution - which I think may be the right one regardless - is to not 
> use "mmap()" for small packs or small SHA1 files.
> 
> mmap is great for random-access multi-use scenarios (and to avoid some 
> memory pressure by allowing sharing of pages), but for anything that is 
> just a couple of pages in size, mmap() just adds big overhead with 
> little upside.
> 
> So if we use malloc+read for small things, we'd probably avoid this. Now, 
> if you have a few thousand _large_ packs, you'd still be screwed, but the 
> most likely reason for having a thousand packfiles is that you did daily 
> "git pull"s, and have lots and lots of packs that are pretty small.
> 
> Dscho? What are your pack-file statistics in this case?

Mostly around 50kB.

But using malloc()+read() to avoid my use case sounds not 
straight-forward; it is rather a work-around than a proper solution.

For performance, I agree that malloc()+read() might be a sensible thing in 
a lot of cases.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Funny error with git gc...
  2009-05-15 19:08     ` Johannes Schindelin
@ 2009-05-15 19:12       ` Johannes Schindelin
  2009-05-15 19:18         ` Johannes Schindelin
  2009-05-15 19:16       ` Linus Torvalds
  1 sibling, 1 reply; 18+ messages in thread
From: Johannes Schindelin @ 2009-05-15 19:12 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Junio C Hamano, git

Hi,

On Fri, 15 May 2009, Johannes Schindelin wrote:


> > 	[core]
> > 		packedGitWindowSize = 16k
> > 		packedGitLimit = 1M
> > 
> > you should hopefully be able to repack (slowly) even with a low file 
> > descriptor limit, because of the total limit on the size.
> 
> I don't think so, because the window size has nothing to do with the 
> amount of open windows, right?

Obviously I overlooked the "packedGitLimit" setting.  Still, just a 
work-around, not a solution that tackles the problem heads-on.

Sorry for the noise,
Dscho

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Funny error with git gc...
  2009-05-15 19:08     ` Johannes Schindelin
  2009-05-15 19:12       ` Johannes Schindelin
@ 2009-05-15 19:16       ` Linus Torvalds
  2009-05-15 19:24         ` Johannes Schindelin
  2009-05-15 20:30         ` Daniel Barkalow
  1 sibling, 2 replies; 18+ messages in thread
From: Linus Torvalds @ 2009-05-15 19:16 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Junio C Hamano, git



On Fri, 15 May 2009, Johannes Schindelin wrote:
> > 
> > 	[core]
> > 		packedGitWindowSize = 16k
> > 		packedGitLimit = 1M
> > 
> > you should hopefully be able to repack (slowly) even with a low file 
> > descriptor limit, because of the total limit on the size.
> 
> I don't think so, because the window size has nothing to do with the 
> amount of open windows, right?

No, but the packedGitLimit does.

So the argument is that you can't fit all that many open windows in 1M. 

> No, in this case, nothing was disabled.  auto-gc did not kick in, probably 
> due to funny Git usage in hg2git.

Ahh. Scripting.

> > Dscho? What are your pack-file statistics in this case?
> 
> Mostly around 50kB.

Can you give us an approximation for how many are under 16kB or so?

The 16kB cutoff is where it's usually just better to malloc+read (because 
even if you don't end up using all the 16kB, just a page-fault or two is 
already more expensive than just doing the memcpy implied in a 16k read() 
system call).

> But using malloc()+read() to avoid my use case sounds not 
> straight-forward; it is rather a work-around than a proper solution.

Well, it's a workaround that is correct for other reasons too. So it's 
likely worth doing.

The "proper solution" is likely to not avoid repacking. Scripted stuff 
that does that is buggy.

			Linus

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [MAKESHIFT PATCH] Cope better with a _lot_ of packs
  2009-05-15 18:52   ` [MAKESHIFT PATCH] Cope better with a _lot_ of packs Johannes Schindelin
@ 2009-05-15 19:17     ` Shawn O. Pearce
  2009-05-21  1:22       ` Junio C Hamano
  0 siblings, 1 reply; 18+ messages in thread
From: Shawn O. Pearce @ 2009-05-15 19:17 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Junio C Hamano, git

Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote:
> 
> You might end up with a situation where you have tons of pack files, e.g.
> when using hg2git.  In this situation, all kinds of operations may 
> end up with a "too many files open" error.  Let's recover gracefully from 
> that.
> 
> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>

Yea, this looks right to me.  JGit had a similar problem and now
maintains a core.packedGitOpenFiles parameter, set to 128 by default,
as one of the rules it uses to limit the size of its pack cache area.

> diff --git a/sha1_file.c b/sha1_file.c
> index 28bd908..bd5edd8 100644
> --- a/sha1_file.c
> +++ b/sha1_file.c
> @@ -720,6 +720,8 @@ static int open_packed_git_1(struct packed_git *p)
>  		return error("packfile %s index unavailable", p->pack_name);
>  
>  	p->pack_fd = open(p->pack_name, O_RDONLY);
> +	while (p->pack_fd < 0 && errno == EMFILE && unuse_one_window(p, -1))
> +		p->pack_fd = open(p->pack_name, O_RDONLY);
>  	if (p->pack_fd < 0 || fstat(p->pack_fd, &st))
>  		return -1;
>  
> @@ -937,6 +939,8 @@ static void prepare_packed_git_one(char *objdir, int local)
>  	sprintf(path, "%s/pack", objdir);
>  	len = strlen(path);
>  	dir = opendir(path);
> +	while (!dir && errno == EMFILE && unuse_one_window(packed_git, -1))
> +		dir = opendir(path);
>  	if (!dir) {
>  		if (errno != ENOENT)
>  			error("unable to open object pack directory: %s: %s",
> @@ -2339,6 +2343,8 @@ static int write_loose_object(const unsigned char *sha1, char *hdr, int hdrlen,
>  
>  	filename = sha1_file_name(sha1);
>  	fd = create_tmpfile(tmpfile, sizeof(tmpfile), filename);
> +	while (fd < 0 && errno == EMFILE && unuse_one_window(packed_git, -1))
> +		fd = create_tmpfile(tmpfile, sizeof(tmpfile), filename);
>  	if (fd < 0) {
>  		if (errno == EACCES)
>  			return error("insufficient permission for adding an object to repository database %s\n", get_object_directory());
> -- 

-- 
Shawn.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Funny error with git gc...
  2009-05-15 19:12       ` Johannes Schindelin
@ 2009-05-15 19:18         ` Johannes Schindelin
  2009-05-15 20:08           ` Linus Torvalds
  0 siblings, 1 reply; 18+ messages in thread
From: Johannes Schindelin @ 2009-05-15 19:18 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Junio C Hamano, git

Hi,

On Fri, 15 May 2009, Johannes Schindelin wrote:

> On Fri, 15 May 2009, Johannes Schindelin wrote:
> 
> 
> > > 	[core]
> > > 		packedGitWindowSize = 16k
> > > 		packedGitLimit = 1M
> > > 
> > > you should hopefully be able to repack (slowly) even with a low file 
> > > descriptor limit, because of the total limit on the size.
> > 
> > I don't think so, because the window size has nothing to do with the 
> > amount of open windows, right?
> 
> Obviously I overlooked the "packedGitLimit" setting.  Still, just a 
> work-around, not a solution that tackles the problem heads-on.

But it worked with the work-around, and not with my heads-on approach.  
The problem is in xmkstemp(), but when I apply this patch on top of my 
MAKESHIFT PATCH:

-- snip --
diff --git a/wrapper.c b/wrapper.c
index d8efb13..47806ba 100644
--- a/wrapper.c
+++ b/wrapper.c
@@ -192,8 +192,13 @@ int xmkstemp(char *template)
 	int fd;
 
 	fd = mkstemp(template);
+	if (fd < 0) {
+		release_pack_memory((size_t)-1ll, -1);
+		fd = mkstemp(template);
+	}
 	if (fd < 0)
-		die("Unable to create temporary file: %s", strerror(errno));
+		die("Unable to create temporary file '%s': %s",
+				template, strerror(errno));
 	return fd;
 }
 
-- snap --

I get an even funnier error:

fatal: Unable to create temporary file '.git/objects/pack/tmp_pack_rvhvZb': Invalid argument
error: failed to run repack

Oh well, I'll let it sit over the weekend...

Ciao,
Dscho

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: Funny error with git gc...
  2009-05-15 19:16       ` Linus Torvalds
@ 2009-05-15 19:24         ` Johannes Schindelin
  2009-05-15 20:30         ` Daniel Barkalow
  1 sibling, 0 replies; 18+ messages in thread
From: Johannes Schindelin @ 2009-05-15 19:24 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Junio C Hamano, git

Hi,

On Fri, 15 May 2009, Linus Torvalds wrote:

> On Fri, 15 May 2009, Johannes Schindelin wrote:
>
> > Linus said:
> >
> > > Dscho? What are your pack-file statistics in this case?
> > 
> > Mostly around 50kB.
> 
> Can you give us an approximation for how many are under 16kB or so?

None.

Starts at around 20kB, median is 34kB.

> The "proper solution" is likely to not avoid repacking. Scripted stuff 
> that does that is buggy.

Yeah, but maybe there are even more clueless people than me running into 
that situation.

Anyway, off for the weekend,
Dscho

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Funny error with git gc...
  2009-05-15 19:18         ` Johannes Schindelin
@ 2009-05-15 20:08           ` Linus Torvalds
  2009-05-23  8:53             ` Johannes Schindelin
  0 siblings, 1 reply; 18+ messages in thread
From: Linus Torvalds @ 2009-05-15 20:08 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Junio C Hamano, git



On Fri, 15 May 2009, Johannes Schindelin wrote:
>  
>  	fd = mkstemp(template);
> +	if (fd < 0) {
> +		release_pack_memory((size_t)-1ll, -1);
> +		fd = mkstemp(template);

This is wrong. You can't use "template" twice. You need to re-initialize 
it. The first one will blow away the XXXX pattern.

		Linus

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Funny error with git gc...
  2009-05-15 19:16       ` Linus Torvalds
  2009-05-15 19:24         ` Johannes Schindelin
@ 2009-05-15 20:30         ` Daniel Barkalow
  1 sibling, 0 replies; 18+ messages in thread
From: Daniel Barkalow @ 2009-05-15 20:30 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Johannes Schindelin, Junio C Hamano, git

On Fri, 15 May 2009, Linus Torvalds wrote:

> On Fri, 15 May 2009, Johannes Schindelin wrote:
> > > 
> > > 	[core]
> > > 		packedGitWindowSize = 16k
> > > 		packedGitLimit = 1M
> > > 
> > > you should hopefully be able to repack (slowly) even with a low file 
> > > descriptor limit, because of the total limit on the size.
> > 
> > I don't think so, because the window size has nothing to do with the 
> > amount of open windows, right?
> 
> No, but the packedGitLimit does.
> 
> So the argument is that you can't fit all that many open windows in 1M. 
> 
> > No, in this case, nothing was disabled.  auto-gc did not kick in, probably 
> > due to funny Git usage in hg2git.
> 
> Ahh. Scripting.
> 
> > > Dscho? What are your pack-file statistics in this case?
> > 
> > Mostly around 50kB.
> 
> Can you give us an approximation for how many are under 16kB or so?
> 
> The 16kB cutoff is where it's usually just better to malloc+read (because 
> even if you don't end up using all the 16kB, just a page-fault or two is 
> already more expensive than just doing the memcpy implied in a 16k read() 
> system call).
> 
> > But using malloc()+read() to avoid my use case sounds not 
> > straight-forward; it is rather a work-around than a proper solution.
> 
> Well, it's a workaround that is correct for other reasons too. So it's 
> likely worth doing.
> 
> The "proper solution" is likely to not avoid repacking. Scripted stuff 
> that does that is buggy.

I think it's going to be not-too-uncommon in fast-import scripts to want 
to checkpoint periodically so that you don't lose your progress if 
interacting with the foreign system starts failing partway through, and 
you'll obviously want to repack afterwards rather than in the middle. But 
if you have different input characteristics than the tool was optimized 
for, you could end up with a ton of packs by the time it finishes.

Obviously, you don't want to avoid repacking before you actually use the 
repository, but I think it's reasonable to wait until the end of the 
script to repack, which might actually be too late.

	-Daniel
*This .sig left intentionally blank*

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Funny error with git gc...
  2009-05-15 18:46   ` Linus Torvalds
  2009-05-15 19:08     ` Johannes Schindelin
@ 2009-05-18  8:28     ` Matthias Andree
  1 sibling, 0 replies; 18+ messages in thread
From: Matthias Andree @ 2009-05-18  8:28 UTC (permalink / raw)
  To: Linus Torvalds, Junio C Hamano; +Cc: Johannes Schindelin, git

Am 15.05.2009, 20:46 Uhr, schrieb Linus Torvalds  
<torvalds@linux-foundation.org>:

> I also think you _should_ be able to avoid this by just limiting the pack
> size usage. IOW, with some packed_git_limit, something like
>
> 	[core]
> 		packedGitWindowSize = 16k
> 		packedGitLimit = 1M

Does the latter do the equivalent of "--max-pack-size"?

If it does, then there is a related issue on the output side - and that  
is, that I get humongous amounts of packs with that option. Say, I'm  
repacking my leafnode repos, then
with --max-pack-size=1, I get eight packs of 1 MB (8 MB),
  and --max-pack-size=4  yields one pack of 2 - 3 MB size...

-- 
Matthias Andree

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [MAKESHIFT PATCH] Cope better with a _lot_ of packs
  2009-05-15 19:17     ` Shawn O. Pearce
@ 2009-05-21  1:22       ` Junio C Hamano
  0 siblings, 0 replies; 18+ messages in thread
From: Junio C Hamano @ 2009-05-21  1:22 UTC (permalink / raw)
  To: Shawn O. Pearce; +Cc: Johannes Schindelin, Junio C Hamano, git

"Shawn O. Pearce" <spearce@spearce.org> writes:

> Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote:
>> 
>> You might end up with a situation where you have tons of pack files, e.g.
>> when using hg2git.  In this situation, all kinds of operations may 
>> end up with a "too many files open" error.  Let's recover gracefully from 
>> that.
>> 
>> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
>
> Yea, this looks right to me.  JGit had a similar problem and now
> maintains a core.packedGitOpenFiles parameter, set to 128 by default,
> as one of the rules it uses to limit the size of its pack cache area.
>
>> diff --git a/sha1_file.c b/sha1_file.c
>> index 28bd908..bd5edd8 100644
>> --- a/sha1_file.c
>> +++ b/sha1_file.c
>> @@ -720,6 +720,8 @@ static int open_packed_git_1(struct packed_git *p)
>>  		return error("packfile %s index unavailable", p->pack_name);
>>  
>>  	p->pack_fd = open(p->pack_name, O_RDONLY);
>> +	while (p->pack_fd < 0 && errno == EMFILE && unuse_one_window(p, -1))
>> +		p->pack_fd = open(p->pack_name, O_RDONLY);
>>  	if (p->pack_fd < 0 || fstat(p->pack_fd, &st))
>>  		return -1;

I wonder if this use of unuse_one_window() can be made more clever to drop
windows from the same packfile first.  If you have 200 packfiles with 2
windows open for each one when you ran out the file descriptors, you do
not want to drop one window from each packfile (which would not give you
any free file descriptor) and then finally release another window (at
which point we have one packfile with no window and we can close the
descriptor).

But at the same time I do not know if it matters in practice.  Presumably
you have too many packfiles because many of them are very small.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Funny error with git gc...
  2009-05-15 20:08           ` Linus Torvalds
@ 2009-05-23  8:53             ` Johannes Schindelin
  2009-05-23 16:02               ` Linus Torvalds
  0 siblings, 1 reply; 18+ messages in thread
From: Johannes Schindelin @ 2009-05-23  8:53 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Junio C Hamano, git

Hi,

On Fri, 15 May 2009, Linus Torvalds wrote:

> On Fri, 15 May 2009, Johannes Schindelin wrote:
> >  
> >  	fd = mkstemp(template);
> > +	if (fd < 0) {
> > +		release_pack_memory((size_t)-1ll, -1);
> > +		fd = mkstemp(template);
> 
> This is wrong. You can't use "template" twice. You need to re-initialize 
> it. The first one will blow away the XXXX pattern.

Turns out that would be awkward, as the xmkstemp() function would have to 
copy the template just in case mkstemp() fails due to too many open files, 
and of course it would need to release the copy afterwards.

OTOH we cannot just use the initialized filename, because there 
might be a race condition with another process, right?

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Funny error with git gc...
  2009-05-23  8:53             ` Johannes Schindelin
@ 2009-05-23 16:02               ` Linus Torvalds
  2009-05-24 17:37                 ` Johannes Schindelin
  0 siblings, 1 reply; 18+ messages in thread
From: Linus Torvalds @ 2009-05-23 16:02 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Junio C Hamano, git



On Sat, 23 May 2009, Johannes Schindelin wrote:
> On Fri, 15 May 2009, Linus Torvalds wrote:
> 
> > On Fri, 15 May 2009, Johannes Schindelin wrote:
> > >  
> > >  	fd = mkstemp(template);
> > > +	if (fd < 0) {
> > > +		release_pack_memory((size_t)-1ll, -1);
> > > +		fd = mkstemp(template);
> > 
> > This is wrong. You can't use "template" twice. You need to re-initialize 
> > it. The first one will blow away the XXXX pattern.
> 
> Turns out that would be awkward, as the xmkstemp() function would have to 
> copy the template just in case mkstemp() fails due to too many open files, 
> and of course it would need to release the copy afterwards.
> 
> OTOH we cannot just use the initialized filename, because there 
> might be a race condition with another process, right?

Correct. You basically need to have the caller re-create the template. We 
already do that in some other cases: see odb_mkstemp(), or 
create_tempfile(). Both of them do that template re-creation on failure, 
and try again.

You could save off the template into a local temporary array, of course.

		Linus

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Funny error with git gc...
  2009-05-23 16:02               ` Linus Torvalds
@ 2009-05-24 17:37                 ` Johannes Schindelin
  0 siblings, 0 replies; 18+ messages in thread
From: Johannes Schindelin @ 2009-05-24 17:37 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Junio C Hamano, git

Hi,

On Sat, 23 May 2009, Linus Torvalds wrote:

> On Sat, 23 May 2009, Johannes Schindelin wrote:
> > On Fri, 15 May 2009, Linus Torvalds wrote:
> > 
> > > On Fri, 15 May 2009, Johannes Schindelin wrote:
> > > >  
> > > >  	fd = mkstemp(template);
> > > > +	if (fd < 0) {
> > > > +		release_pack_memory((size_t)-1ll, -1);
> > > > +		fd = mkstemp(template);
> > > 
> > > This is wrong. You can't use "template" twice. You need to re-initialize 
> > > it. The first one will blow away the XXXX pattern.
> > 
> > Turns out that would be awkward, as the xmkstemp() function would have to 
> > copy the template just in case mkstemp() fails due to too many open files, 
> > and of course it would need to release the copy afterwards.
> > 
> > OTOH we cannot just use the initialized filename, because there 
> > might be a race condition with another process, right?
> 
> Correct. You basically need to have the caller re-create the template. We 
> already do that in some other cases: see odb_mkstemp(), or 
> create_tempfile(). Both of them do that template re-creation on failure, 
> and try again.
> 
> You could save off the template into a local temporary array, of course.

I'll let it slip, due to time constraints.

Thanks,
Dscho

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2009-05-24 17:37 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-05-15 18:02 Funny error with git gc Johannes Schindelin
2009-05-15 18:10 ` Junio C Hamano
2009-05-15 18:24   ` Avery Pennarun
2009-05-15 18:46   ` Linus Torvalds
2009-05-15 19:08     ` Johannes Schindelin
2009-05-15 19:12       ` Johannes Schindelin
2009-05-15 19:18         ` Johannes Schindelin
2009-05-15 20:08           ` Linus Torvalds
2009-05-23  8:53             ` Johannes Schindelin
2009-05-23 16:02               ` Linus Torvalds
2009-05-24 17:37                 ` Johannes Schindelin
2009-05-15 19:16       ` Linus Torvalds
2009-05-15 19:24         ` Johannes Schindelin
2009-05-15 20:30         ` Daniel Barkalow
2009-05-18  8:28     ` Matthias Andree
2009-05-15 18:52   ` [MAKESHIFT PATCH] Cope better with a _lot_ of packs Johannes Schindelin
2009-05-15 19:17     ` Shawn O. Pearce
2009-05-21  1:22       ` Junio C Hamano

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).