From: Johan Herland <johan@herland.net>
To: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Cc: git@vger.kernel.org, Junio C Hamano <gitster@pobox.com>,
	"Shawn O. Pearce" <spearce@spearce.org>,
	trast@student.ethz.ch, tavestbo@trolltech.com,
	git@drmicha.warpmail.net, chriscool@tuxfamily.org
Subject: Re: [PATCHv4 08/12] Teach the notes lookup code to parse notes trees with various fanout schemes
Date: Fri, 28 Aug 2009 12:40:12 +0200
Message-ID: <200908281240.13311.johan@herland.net>
In-Reply-To: <alpine.DEB.1.00.0908281048320.7434@intel-tinevez-2-302>

On Friday 28 August 2009, Johannes Schindelin wrote:
> Hi,
>
> On Fri, 28 Aug 2009, Johan Herland wrote:
> > On Thursday 27 August 2009, Junio C Hamano wrote:
> > > "Shawn O. Pearce" <spearce@spearce.org> writes:
> > > > Yea, it was me.  I still think it might be a useful idea, since
> > > > it allows you better density of loading notes when parsing the
> > > > recent commits.  In theory the last 256 commits can easily be in
> > > > each of the 2/ fanout buckets, making 2/38 pointless for
> > > > reducing the search space.  Commit date on the other hand can
> > > > probably force all of them into the same bucket, making it easy
> > > > to have the last 256 commits in cache, from a single bucket.
> > > >
> > > > But I thought you shot it down, by saying that we also wanted
> > > > to support notes on blobs.  I happen to see no value in a note
> > > > on a blob, a blob alone doesn't make much sense without at
> > > > least an annotated tag or commit to provide it some named
> > > > context, and the latter two have dates.
> > >
> > > Yeah, and in this thread everybody seems to be talking about
> > > commits so I think it is fine to limit notes only to commits.
> >
> > Agreed. I'm starting to come around to the idea of storing them in
> > subtrees based on commit dates. For one, you don't have multiple
> > notes for one commit in the same notes tree. Also, the common-case
> > access pattern seems tempting.
> >
> > Dscho: Were there other problems with the date-based approach other
> > than not supporting notes on trees and blobs?
>
> It emphasized an implementation detail too much for my liking.
>
> And I would rather have some flexibility in the code as to _when_ it
> fans out and when not.
>
> So I can easily imagine a full repository which has only, say, 5
> notes. Why not have a single tree for all of those?

Yes, if you only have a handful of notes, the date-based approach is 
definitely overkill. On the other hand, if you only have a handful of 
notes, performance is not going to be a problem in the first place, no 
matter which notes structure you use...

> And I can easily imagine a repository that has a daily note generated
> by an automatic build, and no other notes.  The date-based fan-out
> just wastes our time here, and even hurts performance.

What about a month-based fanout? Looking at the kernel repo with

  git log --all --date=iso --format="%ad" |
  cut -c1-7 | sort | uniq -c | sort -n

I find that commits are spread across 66 months, and the most active 
month (2008-07) has 5661 commits. If we assume the one-note-per-commit 
worst case, that gives up to 5661 notes per month-based subdir. Is that 
too much?
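
To make the month-based idea concrete, here is a minimal sketch of how a
note could be located under such a fanout. The <YYYY-MM>/<sha1> layout and
the helper names month_bucket and note_path are assumptions made purely for
illustration; nothing in this thread has settled on an actual layout.

```shell
# Hypothetical helper: map an ISO-style commit date, such as
# "2008-07-15 12:34:56 +0200", to its month bucket ("2008-07").
month_bucket() {
    printf '%s\n' "$1" | cut -c1-7
}

# Hypothetical layout: <YYYY-MM>/<commit SHA-1>.
# $1 is the commit date, $2 is the commit's object name.
note_path() {
    printf '%s/%s\n' "$(month_bucket "$1")" "$2"
}
```

For a real commit, the inputs could come from something like
note_path "$(git log -1 --format=%ad --date=iso HEAD)" "$(git rev-parse HEAD)".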

Doing

  for subdir in $(find . -type d); do
      echo "$(ls -1 "$subdir" | wc -l) $subdir"
  done | sort -n

shows me that the currently largest tree in the kernel has 985 entries 
(include/linux), so a 5661-entry tree is probably larger than what git 
is used to...
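
The loop above walks the working directory, so it also counts untracked
files and anything under .git. A rough alternative, sketched here under the
assumption that counting entries of the tree objects themselves is what
matters, is to feed a recursive path listing into a small counter. The
function name entries_per_dir is made up for this example.

```shell
# Read one path per line on stdin and print "<count> <dir>" for every
# parent directory, smallest count first. Top-level paths count as ".".
entries_per_dir() {
    awk '{
        d = $0
        if (index(d, "/") == 0)
            d = "."                    # no slash: entry of the root tree
        else
            sub("/[^/]*$", "", d)      # strip the last path component
        count[d]++
    }
    END { for (d in count) print count[d], d }' | sort -n
}
```

It could be driven by git ls-tree -r -t --name-only HEAD | entries_per_dir,
where -t makes git list the subtree entries as well as the blobs. Paths with
unusual characters are quoted by git in that output, so this is only an
approximation for such repositories.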

...just thinking that we should make things as simple as possible (but no 
simpler), and if a month-based fanout works adequately in all practical 
cases, then we should go with that...


...Johan

-- 
Johan Herland, <johan@herland.net>
www.herland.net
