From: Shawn Pearce <spearce@spearce.org>
To: Linus Torvalds <torvalds@osdl.org>
Cc: Theodore Tso <tytso@mit.edu>, Nicolas Pitre <nico@cam.org>,
	"Randal L. Schwartz" <merlyn@stonehenge.com>,
	git@vger.kernel.org
Subject: Re: cloning the kernel - why long time in "Resolving 313037 deltas"
Date: Tue, 19 Dec 2006 20:54:32 -0500
Message-ID: <20061220015431.GA27638@spearce.org>
In-Reply-To: <Pine.LNX.4.64.0612190855300.3479@woody.osdl.org>

Linus Torvalds <torvalds@osdl.org> wrote:
> On Tue, 19 Dec 2006, Theodore Tso wrote:
> > 
> > So the main reason to use mmap, as Linus puts it, is if the management
> > overhead of needing to read lots of small bits of the file makes the
> > use of malloc/read to be a pain in the *ss, then go for it.
> 
> An example of this in git is the regular pack-file accesses. We're MUCH 
> better off just mmap'ing the whole pack-file (or at least big chunks of 
> it) and not having to maintain difficult structures of "this is where I 
> read that part of the file into memory", or read _big_ chunks when 
> quite often we just use a few kB of it.
> 
> So mmap for pack-files does make sense, but probably only when you can 
> mmap big chunks, and are going to access much smaller (random) parts of 
> it.

Yes, exactly.

git-fast-import mmaps the pack file for this very reason.  Every
so often it needs to go back and reread a tree object which has
expired from its own in-memory LRU cache.  This doesn't happen
very often, but when it does we don't know where in the pack we
will have to jump to get the data.  mmap'ing a huge segment of the
pack file (or the whole thing if it's reasonably small) works for
this case, as the OS buffer cache can just take care of it for us.
But as Linus pointed out, mixing mmap and write() isn't safe on
some systems.  Arrrgh.
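
To illustrate the access pattern described above, here is a minimal
sketch of mapping a whole file read-only and copying bytes out of it
at an arbitrary offset.  It is not git-fast-import's code; the names
map_file(), read_at(), unmap_file() and struct mapped_file are
hypothetical, and it assumes a POSIX system and a non-empty file.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper type: a read-only mapping of an entire file. */
struct mapped_file {
	unsigned char *base;
	size_t len;
};

/* Map the whole file read-only; returns 0 on success, -1 on error. */
static int map_file(const char *path, struct mapped_file *mf)
{
	struct stat st;
	int fd = open(path, O_RDONLY);

	if (fd < 0)
		return -1;
	if (fstat(fd, &st) < 0 || st.st_size <= 0) {
		close(fd);
		return -1;
	}
	mf->len = (size_t)st.st_size;
	mf->base = mmap(NULL, mf->len, PROT_READ, MAP_PRIVATE, fd, 0);
	close(fd);	/* the mapping stays valid after close() */
	if (mf->base == MAP_FAILED)
		return -1;
	return 0;
}

/* Copy n bytes from offset off; returns -1 if the range is out of bounds. */
static int read_at(const struct mapped_file *mf, size_t off, void *dst, size_t n)
{
	if (off > mf->len || n > mf->len - off)
		return -1;
	memcpy(dst, mf->base + off, n);
	return 0;
}

static void unmap_file(struct mapped_file *mf)
{
	munmap(mf->base, mf->len);
	mf->base = NULL;
	mf->len = 0;
}
```

The caller keeps no record of which parts of the file are in memory;
a jump to any offset is just pointer arithmetic, and the kernel pages
the data in on demand.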

However git-fast-import would probably work just as well (or maybe
slightly better) with pread().  I really should port that code
forward to current Git, use pread() instead, and submit the patch
to Junio.  But nobody really showed a lot of interest.
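
For comparison, a pread()-based reread might look roughly like the
sketch below.  pread() reads at an explicit offset without moving the
file position, so it does not disturb a descriptor that is also being
written sequentially.  The helper name pread_full() is hypothetical
and this is not the patch mentioned above, just an illustration of
the call.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: read exactly n bytes starting at file offset off.
 * Returns 0 on success, -1 on error or if the file ends early.
 */
static int pread_full(int fd, void *dst, size_t n, off_t off)
{
	char *p = dst;

	while (n > 0) {
		ssize_t r = pread(fd, p, n, off);

		if (r < 0) {
			if (errno == EINTR)
				continue;	/* interrupted, retry */
			return -1;
		}
		if (r == 0)
			return -1;	/* hit end of file early */
		p += r;
		off += r;
		n -= (size_t)r;
	}
	return 0;
}
```

The trade-off against mmap is one system call and one copy per
reread, in exchange for not depending on mmap and write() being
coherent with each other.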


My sliding window pack-file access implementation (which I'm currently
rewriting on top of current Git) tries to work in very large chunks.
By default it's 32 MiB per chunk, but it's user/repository configurable,
so kernel hackers may just set it to 256 MiB and continue to get
one large mmap for quite some time to come.  Of course I would
also like to get that to autoselect the window size rather than
just hardcode it.  :-)

The implementation would prefer a very small number (<8) of very
large chunks (>32 MiB), but is designed to degrade more gracefully
on huge packs on limited address space systems (e.g. Windows 32 bit)
than the current code does.
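
The idea of a few large, fixed-size windows can be sketched as
follows.  This is only an illustration under stated assumptions, not
the actual sliding window code: struct pack_map, pack_map_open(),
pack_map_use(), pack_map_close() and the MAX_WINDOWS cap are all
hypothetical names, windows are aligned to multiples of the window
size, and the least recently used window is unmapped when all slots
are taken.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

#define MAX_WINDOWS 8	/* hypothetical cap on simultaneously mapped windows */

struct window {
	unsigned char *base;	/* NULL when the slot is unused */
	off_t start;		/* file offset of base[0] */
	size_t len;		/* bytes mapped */
	unsigned long last_used;
};

struct pack_map {
	int fd;
	off_t file_size;
	size_t window_size;	/* always a multiple of the page size */
	unsigned long clock;
	struct window win[MAX_WINDOWS];
};

static int pack_map_open(struct pack_map *pm, const char *path, size_t window_size)
{
	struct stat st;
	size_t page = (size_t)sysconf(_SC_PAGESIZE);

	assert(window_size > 0);
	memset(pm, 0, sizeof(*pm));
	pm->fd = open(path, O_RDONLY);
	if (pm->fd < 0)
		return -1;
	if (fstat(pm->fd, &st) < 0) {
		close(pm->fd);
		return -1;
	}
	pm->file_size = st.st_size;
	/* Round up so every window starts on a page boundary. */
	pm->window_size = (window_size + page - 1) / page * page;
	return 0;
}

/*
 * Return a pointer to the byte at file offset off, mapping a window if
 * needed.  *avail is set to the bytes readable from there in that window.
 */
static const unsigned char *pack_map_use(struct pack_map *pm, off_t off, size_t *avail)
{
	struct window *w, *victim = &pm->win[0];
	off_t start;
	size_t len;
	int i;

	if (off < 0 || off >= pm->file_size)
		return NULL;
	for (i = 0; i < MAX_WINDOWS; i++) {
		w = &pm->win[i];
		if (w->base && off >= w->start && off < w->start + (off_t)w->len)
			goto found;
		/* Prefer an empty slot, else the least recently used one. */
		if (victim->base && (!w->base || w->last_used < victim->last_used))
			victim = w;
	}
	w = victim;
	if (w->base)
		munmap(w->base, w->len);
	start = off - off % (off_t)pm->window_size;
	len = pm->window_size;
	if ((off_t)len > pm->file_size - start)
		len = (size_t)(pm->file_size - start);
	w->base = mmap(NULL, len, PROT_READ, MAP_PRIVATE, pm->fd, start);
	if (w->base == MAP_FAILED) {
		w->base = NULL;
		return NULL;
	}
	w->start = start;
	w->len = len;
found:
	w->last_used = ++pm->clock;
	*avail = w->len - (size_t)(off - w->start);
	return w->base + (off - w->start);
}

static void pack_map_close(struct pack_map *pm)
{
	int i;

	for (i = 0; i < MAX_WINDOWS; i++)
		if (pm->win[i].base)
			munmap(pm->win[i].base, pm->win[i].len);
	close(pm->fd);
}
```

With a window size at least as large as the pack, this degenerates
into a single mmap of the whole file.  A caller that needs data
crossing a window boundary has to check *avail and call
pack_map_use() again for the remainder; a real implementation would
likely handle that case more carefully.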

-- 
