git.vger.kernel.org archive mirror
From: Linus Torvalds <torvalds@osdl.org>
To: Junio C Hamano <junkio@cox.net>, Git Mailing List <git@vger.kernel.org>
Subject: Re: Remove "refs" field from "struct object"
Date: Sun, 18 Jun 2006 11:57:40 -0700 (PDT)
Message-ID: <Pine.LNX.4.64.0606181147080.5498@g5.osdl.org>
In-Reply-To: <Pine.LNX.4.64.0606181137380.5498@g5.osdl.org>



On Sun, 18 Jun 2006, Linus Torvalds wrote:
>
> The "refs" field, which is really needed only for fsck, is maintained in
> a separate hashed lookup-table, allowing all normal users to totally
> ignore it.

Btw, in case people wondered: the cost to git-fsck-objects seems to be 
zero and sometimes apparently even negative.

In order to remove "refs" from "struct object", I had to add it to the 
object_refs structure instead, and so you'd think that the memory usage 
for git-fsck-objects (which needs the object refs) should be unchanged, 
while the hashed lookup should be more expensive than just the direct 
pointer lookup.

Actually testing it, though, implies that isn't the case. Lots of objects 
(every single blob object, in fact) have no refs at all, and for that case 
we don't create any "object_refs" structure at all, so we don't actually 
end up with a 1:1 relationship, and we win a small amount of memory.

And the hashing seems to be effective enough that it's no costlier than 
looking up the ref pointer directly from the object. There's probably 
some bad cache behaviour from the hashing, but it didn't show up in the 
benchmarking I did (ie fsck took as long before as it did afterwards, 
both for git and for the kernel archive).

It may be (probably is) that the reachability analysis is just a very 
small portion of the overall costs, and it's just not very noticeable. It 
may also be that whatever bad cache behaviours you get from the extra hash 
lookup are just balanced out by the objects themselves being slightly 
denser and better in the cache (although that is probably partly hidden 
again by the extra malloc padding).

Regardless, there don't really seem to be any downsides, but I didn't 
test it _that_ exhaustively.

		Linus


Thread overview: 2+ messages
2006-06-18 18:45 Remove "refs" field from "struct object" Linus Torvalds
2006-06-18 18:57 ` Linus Torvalds [this message]
