From: Johan Herland <johan@herland.net>
To: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Cc: git@vger.kernel.org, Junio C Hamano <gitster@pobox.com>,
    "Shawn O. Pearce" <spearce@spearce.org>,
    trast@student.ethz.ch, tavestbo@trolltech.com,
    git@drmicha.warpmail.net, chriscool@tuxfamily.org
Subject: Re: [PATCHv4 08/12] Teach the notes lookup code to parse notes trees with various fanout schemes
Date: Fri, 28 Aug 2009 12:40:12 +0200
Message-ID: <200908281240.13311.johan@herland.net>
In-Reply-To: <alpine.DEB.1.00.0908281048320.7434@intel-tinevez-2-302>

On Friday 28 August 2009, Johannes Schindelin wrote:
> Hi,
>
> On Fri, 28 Aug 2009, Johan Herland wrote:
> > On Thursday 27 August 2009, Junio C Hamano wrote:
> > > "Shawn O. Pearce" <spearce@spearce.org> writes:
> > > > Yea, it was me. I still think it might be a useful idea, since
> > > > it allows you better density of loading notes when parsing the
> > > > recent commits. In theory the last 256 commits can easily be in
> > > > each of the 2/ fanout buckets, making 2/38 pointless for
> > > > reducing the search space. Commit date on the other hand can
> > > > probably force all of them into the same bucket, making it easy
> > > > to have the last 256 commits in cache, from a single bucket.
> > > >
> > > > But I thought you shot it down, by saying that we also wanted
> > > > to support notes on blobs. I happen to see no value in a note
> > > > on a blob, a blob alone doesn't make much sense without at
> > > > least an annotated tag or commit to provide it some named
> > > > context, and the latter two have dates.
> > >
> > > Yeah, and in this thread everybody seems to be talking about
> > > commits so I think it is fine to limit notes only to commits.
> >
> > Agreed. I'm starting to come around to the idea of storing them in
> > subtrees based on commit dates. For one, you don't have multiple
> > notes for one commit in the same notes tree. Also, the common-case
> > access pattern seems tempting.
> >
> > Dscho: Were there other problems with the date-based approach other
> > than not supporting notes on trees and blobs?
>
> It emphasized an implementation detail too much for my liking.
>
> And I would rather have some flexibility in the code as to _when_ it
> fans out and when not.
>
> So I can easily imagine a full repository which has only, say, 5
> notes. Why not have a single tree for all of those?

Yes, if you only have a handful of notes, the date-based approach is
definitely overkill. On the other hand, if you only have a handful of
notes, performance is not going to be a problem in the first place, no
matter which notes structure you use...

> And I can easily imagine a repository that has a daily note generated
> by an automatic build, and no other notes. The date-based fan-out
> just wastes our time here, and even hurts performance.

What about a month-based fanout? Looking at the kernel repo with

    git log --all --date=iso --format="%ad" |
    cut -c1-7 | sort | uniq -c | sort -n

I find that commits are spread across 66 months, and the most active
month (2008-07) has 5661 commits. If we assume the one-note-per-commit
worst case, that gives up to 5661 notes per month-based subdir. Is that
too much?
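To make the month-based idea concrete, here is a minimal sketch of how a note's location could be derived from a commit's date. The helper name month_note_path and the "YYYY-MM/<sha1>" layout are assumptions made purely for illustration; nothing in the patch series defines them.

```shell
# Hypothetical helper, not part of the patch series: build a month-based
# notes path "YYYY-MM/<sha1>" from an ISO date string and an object name.
month_note_path() {
    iso_date=$1   # e.g. "2008-07-15 10:23:45 +0200", as printed by --date=iso
    sha1=$2       # full object name of the annotated commit
    # The first seven characters of an ISO date are "YYYY-MM".
    month=$(printf '%s\n' "$iso_date" | cut -c1-7)
    printf '%s/%s\n' "$month" "$sha1"
}

# Possible use inside a repository (author date, as in the log command above):
#   month_note_path "$(git log -1 --date=iso --format=%ad HEAD)" \
#                   "$(git rev-parse HEAD)"
```

With such a layout, all notes for commits from the same month would land in one subtree, which is where the 5661-entry worst case comes from.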

Doing

    for subdir in $(find . -type d); do
    	echo "$(ls -1 $subdir | wc -l) $subdir"
    done | sort -n

shows me that the currently largest tree in the kernel has 985 entries
(include/linux), so a 5661-entry tree is probably larger than what git
is used to...
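The find/ls loop above counts files in the working tree, so it skips dotfiles and also counts anything untracked. As a rough alternative, the entries could be counted from git's own tree listing. count_tree_entries below is a hypothetical helper, a sketch only: it reads one path per line on stdin and counts the entries directly under each directory.

```shell
# Hypothetical helper: count direct entries per directory from a list of
# paths on stdin (one per line, "/"-separated, "." meaning the top level).
# Assumes plain path names; git quotes unusual characters unless -z is used.
count_tree_entries() {
    awk '{
        n = split($0, parts, "/")
        dir = "."
        # Strip the last path component (and its slash) to get the parent.
        if (n > 1) dir = substr($0, 1, length($0) - length(parts[n]) - 1)
        count[dir]++
    }
    END { for (d in count) print count[d], d }' | sort -n
}

# Possible use inside a repository; -t makes ls-tree list subtrees as well
# as blobs, so every tree entry is counted once under its parent:
#   git ls-tree -r -t --name-only HEAD | count_tree_entries | tail -n 5
```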

...just thinking that we should make things as simple as possible (but no
simpler), and if a month-based fanout works adequately in all practical
cases, then we should go with that...

...Johan

--
Johan Herland, <johan@herland.net>
www.herland.net