From: Johan Herland <johan@herland.net>
To: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Cc: git@vger.kernel.org, Junio C Hamano <gitster@pobox.com>,
"Shawn O. Pearce" <spearce@spearce.org>,
trast@student.ethz.ch, tavestbo@trolltech.com,
git@drmicha.warpmail.net, chriscool@tuxfamily.org
Subject: Re: [PATCHv4 08/12] Teach the notes lookup code to parse notes trees with various fanout schemes
Date: Fri, 28 Aug 2009 12:40:12 +0200
Message-ID: <200908281240.13311.johan@herland.net>
In-Reply-To: <alpine.DEB.1.00.0908281048320.7434@intel-tinevez-2-302>

On Friday 28 August 2009, Johannes Schindelin wrote:
> Hi,
>
> On Fri, 28 Aug 2009, Johan Herland wrote:
> > On Thursday 27 August 2009, Junio C Hamano wrote:
> > > "Shawn O. Pearce" <spearce@spearce.org> writes:
> > > > Yea, it was me. I still think it might be a useful idea, since
> > > > it allows you better density of loading notes when parsing the
> > > > recent commits. In theory the last 256 commits can easily be in
> > > > each of the 2/ fanout buckets, making 2/38 pointless for
> > > > reducing the search space. Commit date on the other hand can
> > > > probably force all of them into the same bucket, making it easy
> > > > to have the last 256 commits in cache, from a single bucket.
> > > >
> > > > But I thought you shot it down, by saying that we also wanted
> > > > to support notes on blobs. I happen to see no value in a note
> > > > on a blob, a blob alone doesn't make much sense without at
> > > > least an annotated tag or commit to provide it some named
> > > > context, and the latter two have dates.
> > >
> > > Yeah, and in this thread everybody seems to be talking about
> > > commits so I think it is fine to limit notes only to commits.
> >
> > Agreed. I'm starting to come around to the idea of storing them in
> > subtrees based on commit dates. For one, you don't have multiple
> > notes for one commit in the same notes tree. Also, the common-case
> > access pattern seems tempting.
> >
> > Dscho: Were there problems with the date-based approach other
> > than not supporting notes on trees and blobs?
>
> It emphasized an implementation detail too much for my liking.
>
> And I would rather have some flexibility in the code as to _when_ it
> fans out and when not.
>
> So I can easily imagine a full repository which has only, say, 5
> notes. Why not have a single tree for all of those?

Yes, if you only have a handful of notes, the date-based approach is
definitely overkill. On the other hand, if you only have a handful of
notes, performance is not going to be a problem in the first place, no
matter which notes structure you use...

> And I can easily imagine a repository that has a daily note generated
> by an automatic build, and no other notes. The date-based fan-out
> just wastes our time here, and even hurts performance.

What about a month-based fanout? Looking at the kernel repo with

    git log --all --date=iso --format="%ad" |
    cut -c1-7 | sort | uniq -c | sort -n

I find that commits are spread across 66 months, and the most active
month (2008-07) has 5661 commits. If we assume the one-note-per-commit
worst case, that gives up to 5661 notes per month-based subdir. Is that
too much?
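
As a rough sketch, the pipeline above can be wrapped so that it prints
only the two numbers that matter here (how many months there are, and
how full the busiest one is). The function name month_summary is made
up for illustration; it just reads ISO dates on stdin, one per line:

```shell
#!/bin/sh
# month_summary (hypothetical helper): read ISO dates on stdin, one per
# line, e.g. from:  git log --all --date=iso --format="%ad"
# Prints "<N> months, busiest <YYYY-MM> with <count> entries".
month_summary() {
    # cut -c1-7 keeps the "YYYY-MM" prefix; uniq -c counts each month;
    # sort -n puts the busiest month on the last line.
    cut -c1-7 | sort | uniq -c | sort -n |
    awk '{ months++; count = $1; month = $2 }
         END { printf "%d months, busiest %s with %d entries\n",
                      months, month, count }'
}
```

Usage would be something like

    git log --all --date=iso --format="%ad" | month_summary

Note that this counts commits by author date (%ad), as in the pipeline
above; bucketing by committer date may give somewhat different numbers.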

Doing

    for subdir in $(find . -type d); do
        echo "$(ls -1 "$subdir" | wc -l) $subdir"
    done | sort -n

shows me that the currently largest tree in the kernel has 985 entries
(include/linux), so a 5661-entry tree is probably larger than what git
is used to...

...just thinking that we should make things as simple as possible (but no
simpler), and if a month-based fanout works adequately in all practical
cases, then we should go with that...
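
To make the two layouts under discussion concrete, here is a minimal
sketch of where a note would end up in each. Both function names, and
the "YYYY-MM/<sha1>" naming of the month-based subtree, are assumptions
for illustration only; nothing in this thread has settled on a naming:

```shell
#!/bin/sh
# sha1_fanout_path <sha1>  (hypothetical helper)
# The 2/38 layout: the first two hex digits name the subtree and the
# remaining 38 name the entry, so recent commits scatter across buckets.
sha1_fanout_path() {
    printf '%s/%s\n' \
        "$(printf '%s\n' "$1" | cut -c1-2)" \
        "$(printf '%s\n' "$1" | cut -c3-)"
}

# month_fanout_path <iso-date> <sha1>  (hypothetical helper)
# A month-based layout: the "YYYY-MM" prefix of the date names the
# subtree, so commits from the same month share one bucket.
month_fanout_path() {
    printf '%s/%s\n' "$(printf '%s\n' "$1" | cut -c1-7)" "$2"
}
```

With these, two commits made on the same day land in the same month
subtree but (most likely) in different 2/38 subtrees, which is the
locality argument made earlier in the thread.
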
...Johan
--
Johan Herland, <johan@herland.net>
www.herland.net