public inbox for linux-kernel@vger.kernel.org
From: Ingo Molnar <mingo@elte.hu>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Davide Libenzi <davidel@xmailserver.org>,
	Ulrich Drepper <drepper@redhat.com>,
	Jeff Garzik <jeff@garzik.org>, Zach Brown <zach.brown@oracle.com>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Arjan van de Ven <arjan@infradead.org>,
	Christoph Hellwig <hch@infradead.org>,
	Andrew Morton <akpm@zip.com.au>,
	Alan Cox <alan@lxorguk.ukuu.org.uk>,
	Evgeniy Polyakov <johnpol@2ka.mipt.ru>,
	"David S. Miller" <davem@davemloft.net>,
	Suparna Bhattacharya <suparna@in.ibm.com>,
	Jens Axboe <jens.axboe@oracle.com>,
	Thomas Gleixner <tglx@linutronix.de>
Subject: Re: Syslets, Threadlets, generic AIO support, v6
Date: Thu, 31 May 2007 11:02:52 +0200	[thread overview]
Message-ID: <20070531090252.GA29817@elte.hu> (raw)
In-Reply-To: <20070531061303.GA4436@elte.hu>


* Ingo Molnar <mingo@elte.hu> wrote:

> it's both a flexibility and a speedup thing:
> 
> flexibility: the need for libraries to be able to open files and keep 
> them open comes up regularly. For example, glibc is currently quite 
> wasteful in a number of common networking-related functions (Ulrich, 
> please correct me if i'm wrong), which could be optimized if glibc 
> could just keep a netlink channel fd open, poll() it for changes and 
> cache the results if there are no changes (or something like that).
> 
> speedup: i suggested O_ANY 6 years ago as a speedup to Apache - 
> non-linear fds are cheaper to allocate/map:
> 
>   http://www.mail-archive.com/linux-kernel@vger.kernel.org/msg23820.html
> 
> (i definitely remember having written code for that too, but i cannot 
> find that in the archives. hm.) In theory we could avoid _all_ 
> fd-bitmap overhead as well and use a per-process list/pool of struct 
> file buffers plus a maximum-fd field as the 'non-linear fd allocator' 
> (at the price of only deallocating them at process exit time).
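The 'per-process list/pool of struct file buffers plus a maximum-fd field' allocator quoted above can be sketched in user space roughly like this. This is a hypothetical illustration with invented names (nl_fdtable, nl_alloc_fd), not actual kernel code; it also adds a trivial free list as one possible refinement of the deallocate-only-at-exit scheme:

```c
#include <assert.h>
#include <stdlib.h>

/* Sketch of a non-linear fd allocator: a per-process pool of file
 * slots plus a maximum-fd watermark.  Allocation is O(1) - there is
 * no bitmap scan for the lowest free bit, so no POSIX
 * lowest-available-fd guarantee either. */
struct nl_fdtable {
	void **files;		/* slot -> struct file * (opaque here) */
	int *free_list;		/* stack of recycled slots */
	int nfree;
	int max_fd;		/* high-water mark, grows monotonically */
	int capacity;
};

static int nl_init(struct nl_fdtable *t, int capacity)
{
	t->files = calloc(capacity, sizeof(*t->files));
	t->free_list = malloc(capacity * sizeof(*t->free_list));
	t->nfree = 0;
	t->max_fd = 0;
	t->capacity = capacity;
	return (t->files && t->free_list) ? 0 : -1;
}

static int nl_alloc_fd(struct nl_fdtable *t, void *file)
{
	int fd;

	if (t->nfree > 0)
		fd = t->free_list[--t->nfree];	/* recycle a slot */
	else if (t->max_fd < t->capacity)
		fd = t->max_fd++;		/* bump the watermark */
	else
		return -1;			/* pool exhausted (EMFILE) */
	t->files[fd] = file;
	return fd;
}

static void nl_free_fd(struct nl_fdtable *t, int fd)
{
	t->files[fd] = NULL;
	t->free_list[t->nfree++] = fd;		/* no bitmap bit to clear */
}
```

note that nl_alloc_fd() never searches for the lowest free slot - that is exactly the fd-bitmap work that non-linear fds would avoid.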

to measure this i've written fd-scale-bench.c:

   http://redhat.com/~mingo/fd-scale-patches/fd-scale-bench.c

which tests the (cache-hot or cache-cold) cost of open()-ing two fds 
while there are N other fds already open: one from the 'middle' of 
the range, one from the end of it.

Let's check our current 'extreme high end' performance with 1 million 
fds (which is not realistic right now, but there certainly are systems 
with over a hundred thousand open fds). Results from a fast CPU with 
2MB of cache:

 cache-hot:

 # ./fd-scale-bench 1000000 0
 checking the cache-hot performance of open()-ing 1000000 fds.
 num_fds: 1, best cost: 1.40 us, worst cost: 2.00 us
 num_fds: 2, best cost: 1.40 us, worst cost: 1.40 us
 num_fds: 3, best cost: 1.40 us, worst cost: 2.00 us
 num_fds: 4, best cost: 1.40 us, worst cost: 1.40 us
 ...
 num_fds: 77117, best cost: 1.60 us, worst cost: 2.00 us
 num_fds: 96397, best cost: 2.00 us, worst cost: 2.20 us
 num_fds: 120497, best cost: 2.20 us, worst cost: 2.40 us
 num_fds: 150622, best cost: 2.20 us, worst cost: 3.00 us
 num_fds: 188278, best cost: 2.60 us, worst cost: 3.00 us
 num_fds: 235348, best cost: 2.80 us, worst cost: 3.80 us
 num_fds: 294186, best cost: 3.40 us, worst cost: 4.20 us
 num_fds: 367733, best cost: 4.00 us, worst cost: 5.00 us
 num_fds: 459667, best cost: 4.60 us, worst cost: 6.00 us
 num_fds: 574584, best cost: 5.60 us, worst cost: 8.20 us
 num_fds: 718231, best cost: 6.40 us, worst cost: 10.00 us
 num_fds: 897789, best cost: 7.60 us, worst cost: 11.80 us
 num_fds: 1000000, best cost: 8.20 us, worst cost: 9.60 us

 cache-cold:

 # ./fd-scale-bench 1000000 1
 checking the performance of open()-ing 1000000 fds.
 num_fds: 1, best cost: 4.60 us, worst cost: 7.00 us
 num_fds: 2, best cost: 5.00 us, worst cost: 6.60 us
 ...
 num_fds: 77117, best cost: 5.60 us, worst cost: 7.40 us
 num_fds: 96397, best cost: 5.60 us, worst cost: 7.40 us
 num_fds: 120497, best cost: 6.20 us, worst cost: 6.80 us
 num_fds: 150622, best cost: 6.40 us, worst cost: 7.60 us
 num_fds: 188278, best cost: 6.80 us, worst cost: 9.20 us
 num_fds: 235348, best cost: 7.20 us, worst cost: 8.80 us
 num_fds: 294186, best cost: 8.00 us, worst cost: 9.40 us
 num_fds: 367733, best cost: 8.80 us, worst cost: 11.60 us
 num_fds: 459667, best cost: 9.20 us, worst cost: 12.20 us
 num_fds: 574584, best cost: 10.00 us, worst cost: 12.40 us
 num_fds: 718231, best cost: 11.00 us, worst cost: 13.40 us
 num_fds: 897789, best cost: 12.80 us, worst cost: 15.80 us
 num_fds: 1000000, best cost: 13.60 us, worst cost: 15.40 us

we are pretty good at the moment: the open() cost starts to increase at 
around 100K open fds, both in the cache-cold and the cache-hot case 
(that roughly corresponds to the fd bitmap falling out of the 32K L1 
cache). With 1 million fds open in a single process, the fd bitmap 
alone has a size of 128K.
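as a quick sanity check on the 128K figure (user-space arithmetic only; the power-of-two rounding is an assumption about how the kernel sizes the fdtable):

```c
#include <assert.h>

/* One bit per fd; with the fdtable rounded up to the next power of
 * two, 1,000,000 fds round up to 2^20 = 1048576 bits. */
static long fd_bitmap_bytes(long nr_fds)
{
	long bits = 1;

	while (bits < nr_fds)
		bits <<= 1;	/* round up to a power of two */
	return bits / 8;	/* 8 bits per byte */
}
```

fd_bitmap_bytes(1000000) gives 131072 bytes, i.e. 128K.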

so while it's certainly not 'urgent' to improve this, private fds are an 
easier target for optimizations in this area: they don't have the 
linear-allocation (lowest-available-fd) requirement anymore, so the fd 
bitmap is not a 'forced' property of them.

	Ingo


Thread overview: 71+ messages
2007-05-29 21:27 Syslets, Threadlets, generic AIO support, v6 Zach Brown
2007-05-29 21:49 ` Linus Torvalds
2007-05-29 22:49   ` Zach Brown
2007-05-29 22:16 ` Jeff Garzik
2007-05-29 23:09   ` Zach Brown
2007-05-29 23:20     ` Ulrich Drepper
2007-05-30  1:11       ` Dave Jones
2007-05-30 17:08         ` Zach Brown
2007-05-30  7:26     ` Ingo Molnar
2007-05-30  7:20   ` Ingo Molnar
2007-05-30  7:31     ` Ulrich Drepper
2007-05-30  8:42       ` Ingo Molnar
2007-05-30  8:51         ` Evgeniy Polyakov
2007-05-30  9:05           ` Ingo Molnar
2007-05-30 15:16         ` Linus Torvalds
2007-05-30 15:39         ` Ulrich Drepper
2007-05-30 19:40         ` Davide Libenzi
2007-05-30 19:55           ` Ulrich Drepper
2007-05-30 20:00           ` Linus Torvalds
2007-05-30 20:21             ` Davide Libenzi
2007-05-30 20:31             ` Eric Dumazet
2007-05-30 20:44               ` Linus Torvalds
2007-05-30 21:53                 ` Eric Dumazet
2007-05-30 21:31               ` Davide Libenzi
2007-05-30 21:16             ` Ulrich Drepper
2007-05-30 21:27               ` Linus Torvalds
2007-05-30 21:47                 ` Ulrich Drepper
2007-05-30 22:06                   ` Davide Libenzi
2007-05-30 21:48                 ` Davide Libenzi
2007-05-30 22:01                   ` Linus Torvalds
2007-05-31  6:13                     ` Ingo Molnar
2007-05-31  7:35                       ` Eric Dumazet
2007-05-31  9:26                         ` Ingo Molnar
2007-05-31  9:02                       ` Ingo Molnar [this message]
2007-05-31 10:41                         ` Eric Dumazet
2007-05-31 10:50                           ` Ingo Molnar
2007-05-31  9:32                       ` Ingo Molnar
2007-05-31  9:34                         ` Jens Axboe
2007-05-30 22:09                   ` Eric Dumazet
2007-05-30 21:51                 ` David M. Lloyd
2007-05-30 22:24                 ` William Lee Irwin III
2007-05-30 21:38               ` Jeremy Fitzhardinge
2007-05-30 21:39               ` Davide Libenzi
2007-05-30 21:36             ` Jeremy Fitzhardinge
2007-05-30 21:44               ` Linus Torvalds
2007-05-30 21:48                 ` Linus Torvalds
2007-05-30 21:54                   ` Jeremy Fitzhardinge
2007-05-30 22:27             ` Matt Mackall
2007-05-30 22:38               ` William Lee Irwin III
2007-05-30  8:32     ` Evgeniy Polyakov
2007-05-30  8:54       ` Ingo Molnar
2007-05-30  9:30         ` Evgeniy Polyakov
2007-05-30  9:28     ` Jeff Garzik
2007-05-30 13:02       ` Ingo Molnar
2007-05-30 13:20         ` Ingo Molnar
2007-05-30 15:31       ` Linus Torvalds
2007-05-30 16:09         ` Ingo Molnar
2007-05-30 17:57           ` Jens Axboe
2007-05-30 19:05           ` Mark Lord
2007-05-30 19:10             ` Jens Axboe
2007-05-30 19:15             ` Linus Torvalds
2007-05-30 19:32               ` Jens Axboe
2007-05-30 20:07               ` Eric Dumazet
2007-05-30 20:31                 ` Linus Torvalds
2007-05-30 20:46                   ` Eric Dumazet
2007-05-30 19:52           ` Davide Libenzi
2007-05-30  7:40 ` Jens Axboe
2007-05-30 16:55   ` Zach Brown
2007-05-30 17:33     ` Jens Axboe
  -- strict thread matches above, loose matches on Subject: below --
2007-05-31  8:15 Albert Cahalan
2007-05-31  9:50 ` Ingo Molnar
