git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [ANNOUNCE] CGit v0.1-pre
@ 2006-12-10 23:42 Lars Hjemli
  2006-12-11  0:08 ` Jakub Narebski
  2006-12-11  1:04 ` Linus Torvalds
  0 siblings, 2 replies; 12+ messages in thread
From: Lars Hjemli @ 2006-12-10 23:42 UTC (permalink / raw)
  To: git

CGit is another cgi-app for git.

It is written in C (using libgit.a), and implements an internal page
cache. A PoC version is available for cloning at

  git://hjemli.net/pub/git/cgit

and it is also (barely) selfhosted at

  http://hjemli.net/git/


For a quick summary of the cache algorithm, the project README is here:

  http://hjemli.net/git/cgit/view/?id=5917c37ce30b3f0a374c9fa376955f51f1d7bfbf

Enjoy!

-- 

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [ANNOUNCE] CGit v0.1-pre
  2006-12-10 23:42 [ANNOUNCE] CGit v0.1-pre Lars Hjemli
@ 2006-12-11  0:08 ` Jakub Narebski
       [not found]   ` <8c5c35580612101616g179715ecyd02fcbb023246ecc@mail.gmail.com>
  2006-12-11 13:37   ` Michael
  2006-12-11  1:04 ` Linus Torvalds
  1 sibling, 2 replies; 12+ messages in thread
From: Jakub Narebski @ 2006-12-11  0:08 UTC (permalink / raw)
  To: git

Lars Hjemli wrote:

> CGit is another cgi-app for git.

I have added this to GitWiki: check out
  http://git.or.cz/gitwiki/InterfacesFrontendsAndTools#cgit

Hmmm... git has now 4 web interfaces (5 if counting gittracker, which works
but gittracker repository shows empty): gitweb in Perl, git-php in PHP, wit
in Python (is it actively developed? where it can be downloaded from?),
gitarella in Ruby, now CGit in C...

By the way, it is cgit, CGit or CGIt ;-p?
-- 
Jakub Narebski
Warsaw, Poland
ShadeHawk on #git


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [ANNOUNCE] CGit v0.1-pre
       [not found]   ` <8c5c35580612101616g179715ecyd02fcbb023246ecc@mail.gmail.com>
@ 2006-12-11  0:29     ` Jakub Narebski
  0 siblings, 0 replies; 12+ messages in thread
From: Jakub Narebski @ 2006-12-11  0:29 UTC (permalink / raw)
  To: Lars Hjemli; +Cc: git

Lars Hjemli wrote:
> On 12/11/06, Jakub Narebski <jnareb@gmail.com> wrote:
> 
>> Hmmm... git has now 4 web interfaces (5 if counting gittracker, which works
>> but gittracker repository shows empty): gitweb in Perl, git-php in PHP, wit
>> in Python (is it actively developed? where it can be downloaded from?),
>> gitarella in Ruby, now CGit in C...
> 
> Hmm, it must be a popular scm :-D

Miscalculation: it is 5 with yours (6 counting gittracker).

-- 
Jakub Narebski

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [ANNOUNCE] CGit v0.1-pre
  2006-12-10 23:42 [ANNOUNCE] CGit v0.1-pre Lars Hjemli
  2006-12-11  0:08 ` Jakub Narebski
@ 2006-12-11  1:04 ` Linus Torvalds
  2006-12-11  8:33   ` Lars Hjemli
  1 sibling, 1 reply; 12+ messages in thread
From: Linus Torvalds @ 2006-12-11  1:04 UTC (permalink / raw)
  To: Lars Hjemli; +Cc: git



On Mon, 11 Dec 2006, Lars Hjemli wrote:
> 
> For a quick summary of the cache algorithm, the project README is here:
> 
>  http://hjemli.net/git/cgit/view/?id=5917c37ce30b3f0a374c9fa376955f51f1d7bfbf

Your pseudo-algorithm is dubious:

		name = generate_cache_name(request);
	top:
		if (!exists(name)) {
			if (lock_cache(name)) {
				generate_cache(request, name);
				unlock_cache(name);
			} else {
				sched_yield();
				goto top;
			}
		} else if (expired(name)) {
			if (lock_cache(name)) {
				generate_cache(request, name);
				unlock_cache(name);
			}
		}
		print_file(name);


You really should have:

	if (!exists) {
		if (!lock)
			delay-and-repeat;
		/* RETEST exists _after_ getting the lock */
		if (!exists) {
			generate into lock-file
			mv lockfile exists;
		} else {
			rm lockfile
		}
	}

because you really want to re-check the existence after you got the lock, 
otherwise you would race with somebody else that got the lock, generated 
the data, and then unlocked (and you got the lock _after_ the data was 
generated, so now you generate it unnecessarily).

As a side note: how do you release your caches? 


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [ANNOUNCE] CGit v0.1-pre
  2006-12-11  1:04 ` Linus Torvalds
@ 2006-12-11  8:33   ` Lars Hjemli
  2006-12-11 11:59     ` Lars Hjemli
  2006-12-11 17:01     ` Linus Torvalds
  0 siblings, 2 replies; 12+ messages in thread
From: Lars Hjemli @ 2006-12-11  8:33 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git

On 12/11/06, Linus Torvalds <torvalds@osdl.org> wrote:
> you really want to re-check the existence after you got the lock,
> otherwise you would race with somebody else that got the lock, generated
> the data, and then unlocked (and you got the lock _after_ the data was
> generated, so now you generate it unnecessarily).

Yes, you're right. Thanks for noticing.

But this also applies to the case where the cachefile has expired,
right? In that case, after getting the lock, I have to recheck that
the cachefile is _still_ expired.

Anyway, I must say I find it rather unlikely for these cases to occur
(frequently) in real life. That would seem to imply that the caching
isn't really needed at all.

>
> As a side note: how do you release your caches?
>

Simple timeouts (time()-stat.st_mtime), depending on what kind of page
was requested. If anyone cares about invalid cache content (branch
head moving), relevant cachefiles can be deleted with an update-hook.

-- 

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [ANNOUNCE] CGit v0.1-pre
  2006-12-11  8:33   ` Lars Hjemli
@ 2006-12-11 11:59     ` Lars Hjemli
  2006-12-11 17:01     ` Linus Torvalds
  1 sibling, 0 replies; 12+ messages in thread
From: Lars Hjemli @ 2006-12-11 11:59 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git

On 12/11/06, Lars Hjemli <hjemli@gmail.com> wrote:
> Anyway, I must say I find it rather unlikely for these cases to occur
> (frequently) in real life.

*blush*

Of course this will happen in real life, I just needed to think about
it for a while (it just depends on the _size_ of the thundering hurd,
right?)

Anyway, it's fixed (I think) and pushed out

-- 

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [ANNOUNCE] CGit v0.1-pre
  2006-12-11  0:08 ` Jakub Narebski
       [not found]   ` <8c5c35580612101616g179715ecyd02fcbb023246ecc@mail.gmail.com>
@ 2006-12-11 13:37   ` Michael
  1 sibling, 0 replies; 12+ messages in thread
From: Michael @ 2006-12-11 13:37 UTC (permalink / raw)
  To: git

Jakub Narebski wrote:
> wit
> in Python (is it actively developed? where it can be downloaded from?),

here?


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [ANNOUNCE] CGit v0.1-pre
  2006-12-11  8:33   ` Lars Hjemli
  2006-12-11 11:59     ` Lars Hjemli
@ 2006-12-11 17:01     ` Linus Torvalds
  2006-12-11 17:33       ` Linus Torvalds
  2006-12-11 17:40       ` Lars Hjemli
  1 sibling, 2 replies; 12+ messages in thread
From: Linus Torvalds @ 2006-12-11 17:01 UTC (permalink / raw)
  To: Lars Hjemli; +Cc: git



On Mon, 11 Dec 2006, Lars Hjemli wrote:
>
> But this also applies to the case where the cachefile has expired,
> right? In that case, after getting the lock, I have to recheck that
> the cachefile is _still_ expired.

Yes.

> Anyway, I must say I find it rather unlikely for these cases to occur
> (frequently) in real life. That would seem to imply that the caching
> isn't really needed at all.

The point is, if you have races, you will hit them _occasionally_. It may 
not be a performance problem in real life, BUT:

 - quite often, you can take advantage of the serialization guarantees 
   that a front-side cache offers you, and do the uncached accesses (or 
   writing the final cache-file) knowing that there's only one thing that 
   does that.

 - If so, then a race condition in the cache goes from a "unlikely 
   performance problem" to a BUG.

> > As a side note: how do you release your caches?
> 
> Simple timeouts (time()-stat.st_mtime), depending on what kind of page
> was requested. If anyone cares about invalid cache content (branch
> head moving), relevant cachefiles can be deleted with an update-hook.

I was more thinking about the locking part. Again, to be safe, you should 
probably take the lock before releasing any cache.


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [ANNOUNCE] CGit v0.1-pre
  2006-12-11 17:01     ` Linus Torvalds
@ 2006-12-11 17:33       ` Linus Torvalds
  2006-12-11 17:40       ` Lars Hjemli
  1 sibling, 0 replies; 12+ messages in thread
From: Linus Torvalds @ 2006-12-11 17:33 UTC (permalink / raw)
  To: Lars Hjemli; +Cc: git



On Mon, 11 Dec 2006, Linus Torvalds wrote:
> 
> The point is, if you have races, you will hit them _occasionally_. It may 
> not be a performance problem in real life, BUT:

Side note: another problem with races is that

 - exactly because they are rare, any potential problems they cause are 
   really hard to debug - you may have a hard time reproducing things.

 - some loads may be able to trigger them thanks to very unlucky timing, 
   so even if the problem is "just a theoretical performance issue", 
   sometimes that theoretical performance problem that only happens once 
   in a blue moon ends up happening a _lot_ for a particular user.

I'm probably biased, simply because I've done system programming for so 
long (and race conditions etc are one of the most common source of subtle 
bugs), but I've long since come to the very strong opinion that locking is 
simply too important to not do right.

Even if you can argue that "it doesn't matter" (and you may even be 
right), I personally tend to just consider dodgy locking to be a serious 
bug _whether_ it really matters or not.

Just a hangup of mine.


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [ANNOUNCE] CGit v0.1-pre
  2006-12-11 17:01     ` Linus Torvalds
  2006-12-11 17:33       ` Linus Torvalds
@ 2006-12-11 17:40       ` Lars Hjemli
  2006-12-11 18:18         ` Linus Torvalds
  1 sibling, 1 reply; 12+ messages in thread
From: Lars Hjemli @ 2006-12-11 17:40 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git

On 12/11/06, Linus Torvalds <torvalds@osdl.org> wrote:
>
> On Mon, 11 Dec 2006, Lars Hjemli wrote:
> > Anyway, I must say I find it rather unlikely for these cases to occur
> > (frequently) in real life. That would seem to imply that the caching
> > isn't really needed at all.
>
> The point is, if you have races, you will hit them _occasionally_. It may
> not be a performance problem in real life, BUT:
>
>  - quite often, you can take advantage of the serialization guarantees
>    that a front-side cache offers you, and do the uncached accesses (or
>    writing the final cache-file) knowing that there's only one thing that
>    does that.
>
>  - If so, then a race condition in the cache goes from a "unlikely
>    performance problem" to a BUG.

Yes, I finally understood (see my other reply)


> > > As a side note: how do you release your caches?
> >
> > Simple timeouts (time()-stat.st_mtime), depending on what kind of page
> > was requested. If anyone cares about invalid cache content (branch
> > head moving), relevant cachefiles can be deleted with an update-hook.
>
> I was more thinking about the locking part. Again, to be safe, you should
> probably take the lock before releasing any cache.

Ok. Code speeks louder than words, so I'll blatantly paste the key functions:

--->8----

const int NOLOCK = -1;

int cache_lock(struct cacheitem *item)
{
	int i = 0;
	char *lockfile = fmt("%s.lock", item->name);

 top:
	if (++i > cgit_max_lock_attempts)
		die("cache_lock: unable to lock %s: %s",
		    item->name, strerror(errno));

       	item->fd = open(lockfile, O_WRONLY|O_CREAT|O_EXCL, S_IRUSR|S_IWUSR);

	if (item->fd == NOLOCK && errno == ENOENT && cache_create_dirs())
		goto top;

	if (item->fd == NOLOCK && errno == EEXIST &&
	    cache_refill_overdue(lockfile) && !unlink(lockfile))
			goto top;

	return (item->fd > 0);
}

int cache_unlock(struct cacheitem *item)
{
	close(item->fd);
	return (rename(fmt("%s.lock", item->name), item->name) == 0);
}


static void cgit_check_cache(struct cacheitem *item)
{
	int i = 0;

	cache_prepare(item);
 top:
	if (++i > cgit_max_lock_attempts) {
		die("cgit_refresh_cache: unable to lock %s: %s",
		    item->name, strerror(errno));
	}
	if (!cache_exist(item)) {
		if (!cache_lock(item)) {
			sleep(1);
			goto top;
		}
		if (!cache_exist(item))
			cgit_fill_cache(item);
		cache_unlock(item);
	} else if (cache_expired(item) && cache_lock(item)) {
		if (cache_expired(item))
			cgit_fill_cache(item);
		cache_unlock(item);
	}
}

--->8----

I am a bit conserned about the effect of cache_unlink() if another
concurrent process gets "lucky" with the test
"cache_refill_overdue(lockfile) && !unlink(lockfile)".

This is supposed to be a safety valve against a stale lock file (lock
file not modified in n secs), but it doesn't feel quite right.

Probably, if cache_unlink() fails in cgit_check_cache(), I should "goto top".

Opinions?

-- 

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [ANNOUNCE] CGit v0.1-pre
  2006-12-11 17:40       ` Lars Hjemli
@ 2006-12-11 18:18         ` Linus Torvalds
  2006-12-11 18:38           ` Lars Hjemli
  0 siblings, 1 reply; 12+ messages in thread
From: Linus Torvalds @ 2006-12-11 18:18 UTC (permalink / raw)
  To: Lars Hjemli; +Cc: git



On Mon, 11 Dec 2006, Lars Hjemli wrote:
> 
> Ok. Code speeks louder than words, so I'll blatantly paste the key functions:

Yeah, they're still buggy:

> int cache_unlock(struct cacheitem *item)
> {
> 	close(item->fd);
> 	return (rename(fmt("%s.lock", item->name), item->name) == 0);
> }
...
> 	if (!cache_exist(item)) {
> 		if (!cache_lock(item)) {
> 			sleep(1);
> 			goto top;
> 		}
> 		if (!cache_exist(item))
> 			cgit_fill_cache(item);
> 		cache_unlock(item);

What do you think happens if that last "cache_exist()" returned true?

That's right: the "cache_unlock()" will now OVERWRITE the valid cache with 
the (empty) lock-file that you didn't fill in.

Oops.

So you really have two different cases:

 - the "I created the file" case: rename lockfile to final name
 - the "somebody else created the file": remove the lockfile

and you can't use the same "cache_unlock()" for both of them.


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [ANNOUNCE] CGit v0.1-pre
  2006-12-11 18:18         ` Linus Torvalds
@ 2006-12-11 18:38           ` Lars Hjemli
  0 siblings, 0 replies; 12+ messages in thread
From: Lars Hjemli @ 2006-12-11 18:38 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git

On 12/11/06, Linus Torvalds <torvalds@osdl.org> wrote:
>
>
> On Mon, 11 Dec 2006, Lars Hjemli wrote:
> >
> > Ok. Code speeks louder than words, so I'll blatantly paste the key functions:
>
> Yeah, they're still buggy:
>
> > int cache_unlock(struct cacheitem *item)
> > {
> >       close(item->fd);
> >       return (rename(fmt("%s.lock", item->name), item->name) == 0);
> > }
> ...
> >       if (!cache_exist(item)) {
> >               if (!cache_lock(item)) {
> >                       sleep(1);
> >                       goto top;
> >               }
> >               if (!cache_exist(item))
> >                       cgit_fill_cache(item);
> >               cache_unlock(item);
>
> What do you think happens if that last "cache_exist()" returned true?
>
> That's right: the "cache_unlock()" will now OVERWRITE the valid cache with
> the (empty) lock-file that you didn't fill in.
>
> Oops.
>

Wow, that's embarrassing.

Thanks for the debugging, I'll try to actually fix it this time :-)

-- 

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2006-12-11 18:39 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2006-12-10 23:42 [ANNOUNCE] CGit v0.1-pre Lars Hjemli
2006-12-11  0:08 ` Jakub Narebski
     [not found]   ` <8c5c35580612101616g179715ecyd02fcbb023246ecc@mail.gmail.com>
2006-12-11  0:29     ` Jakub Narebski
2006-12-11 13:37   ` Michael
2006-12-11  1:04 ` Linus Torvalds
2006-12-11  8:33   ` Lars Hjemli
2006-12-11 11:59     ` Lars Hjemli
2006-12-11 17:01     ` Linus Torvalds
2006-12-11 17:33       ` Linus Torvalds
2006-12-11 17:40       ` Lars Hjemli
2006-12-11 18:18         ` Linus Torvalds
2006-12-11 18:38           ` Lars Hjemli

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).