git.vger.kernel.org archive mirror
From: Theodore Tso <tytso@mit.edu>
To: Shawn Pearce <spearce@spearce.org>
Cc: Nicolas Pitre <nico@cam.org>, Linus Torvalds <torvalds@osdl.org>,
	"Randal L. Schwartz" <merlyn@stonehenge.com>,
	git@vger.kernel.org
Subject: Re: cloning the kernel - why long time in "Resolving 313037 deltas"
Date: Tue, 19 Dec 2006 11:19:19 -0500
Message-ID: <20061219161919.GA16980@thunk.org>
In-Reply-To: <20061219063930.GA2511@spearce.org>

On Tue, Dec 19, 2006 at 01:39:30AM -0500, Shawn Pearce wrote:
> This is why git-fast-import mmaps 128 MiB blocks from the file at
> a time.  The mmap region is usually much larger than the file itself;
> the application appends to the file via write() then goes back
> and rereads data when necessary via the already established mmap.
> It's rare for the application to need to unmap/remap a different block
> so there really isn't very much page table manipulation overhead.

Yes, but unless you are using the (non-portable, Linux-specific)
MAP_POPULATE flag to mmap, each time you touch a new page, you end up
taking a page fault; and so malloc/read/free might *still* be faster.
I'd encourage you to make the change and benchmark it; the results may
be surprising.  I played with this in dcraw, the Canon Raw File
converter, a while back (before MAP_POPULATE was added), and found
that with a linear access pattern, if you are reading the entire file,
it's still marginally faster to use read() than mmap(): with
dcraw taking a page fault every 4k of raw file, the system time was
significantly higher.
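
For anyone who wants to try the comparison, here is a minimal sketch of
the two access patterns, assuming a POSIX system.  It is not the dcraw
or git-fast-import code; the function names (sum_with_read,
sum_with_mmap, system_seconds) and the 64 KiB buffer size are made up
for illustration.  Both functions walk the whole file linearly and sum
its bytes, so the only difference is how the data reaches user space.

```c
#include <fcntl.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/resource.h>
#include <sys/stat.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/* Sum every byte of the file via malloc/read/free.  Returns 0 on success. */
int sum_with_read(const char *path, uint64_t *sum)
{
    enum { BUF_SIZE = 1 << 16 };            /* 64 KiB, an arbitrary choice */
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    unsigned char *buf = malloc(BUF_SIZE);
    if (!buf) {
        close(fd);
        return -1;
    }
    uint64_t total = 0;
    ssize_t n;
    while ((n = read(fd, buf, BUF_SIZE)) > 0)
        for (ssize_t i = 0; i < n; i++)
            total += buf[i];
    free(buf);
    close(fd);
    if (n < 0)                              /* read error (EINTR not retried) */
        return -1;
    *sum = total;
    return 0;
}

/* Sum every byte of the file via mmap.  The first touch of each page
 * may take a page fault.  Returns 0 on success. */
int sum_with_mmap(const char *path, uint64_t *sum)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    struct stat st;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }
    uint64_t total = 0;
    if (st.st_size > 0) {                   /* mmap rejects a zero length */
        size_t len = (size_t)st.st_size;
        unsigned char *p = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
        if (p == MAP_FAILED) {
            close(fd);
            return -1;
        }
        for (size_t i = 0; i < len; i++)
            total += p[i];
        munmap(p, len);
    }
    close(fd);
    *sum = total;
    return 0;
}

/* System CPU time used by this process so far, in seconds (-1.0 on error). */
double system_seconds(void)
{
    struct rusage ru;
    if (getrusage(RUSAGE_SELF, &ru) != 0)
        return -1.0;
    return (double)ru.ru_stime.tv_sec + (double)ru.ru_stime.tv_usec / 1e6;
}
```

Calling system_seconds() before and after each function gives a rough
view of the system time each approach costs.  Run each variant several
times on a file that is already in the page cache, otherwise disk I/O
will dominate.  On Linux, OR-ing MAP_POPULATE into the mmap flags asks
the kernel to pre-fault the pages, which is worth measuring as a third
variant.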

So the main reason to use mmap, as Linus puts it, is convenience: if
the management overhead of reading lots of small bits of the file
makes malloc/read a pain in the *ss, then go for it.  But don't
assume that you'll get better performance; in my experience, even on
the hyper-performant Linux kernel, mmap() in general only barely
breaks even with read().  On other systems, things are probably going
to be even worse.
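
To make the "management overhead" concrete, this is roughly what the
malloc/read side looks like when you only need a small piece of the
file.  It is a sketch under assumptions, not code from git: read_range
is a hypothetical helper, and it relies on POSIX pread().  Every caller
has to own and free the buffer, which is the bookkeeping that an
established mmap lets you skip.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Read exactly len bytes starting at offset into a freshly allocated
 * buffer.  Returns NULL on error or if the file ends before len bytes
 * are available (EINTR is not retried).  The caller must free() the
 * result. */
void *read_range(int fd, off_t offset, size_t len)
{
    unsigned char *buf = malloc(len ? len : 1);
    if (!buf)
        return NULL;
    size_t done = 0;
    while (done < len) {
        /* pread() does not move the file offset, so ranges can be
         * fetched in any order. */
        ssize_t n = pread(fd, buf + done, len - done, offset + (off_t)done);
        if (n <= 0) {
            free(buf);
            return NULL;
        }
        done += (size_t)n;
    }
    return buf;
}
```

With mmap the equivalent is just a pointer into the mapping, at the
cost of a possible page fault on first touch.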

