public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Andrea Arcangeli <andrea@suse.de>
To: Rik van Riel <riel@conectiva.com.br>
Cc: Momchil Velikov <velco@fadata.bg>,
	John Stoffel <stoffel@casc.com>,
	Linus Torvalds <torvalds@transmeta.com>,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH] Radix-tree pagecache for 2.5
Date: Thu, 31 Jan 2002 15:36:07 +0100	[thread overview]
Message-ID: <20020131153607.C1309@athlon.random> (raw)
In-Reply-To: <20020131033345.X1309@athlon.random> <Pine.LNX.4.33L.0201311155010.32634-100000@imladris.surriel.com>
In-Reply-To: <Pine.LNX.4.33L.0201311155010.32634-100000@imladris.surriel.com>; from riel@conectiva.com.br on Thu, Jan 31, 2002 at 11:58:10AM -0200

On Thu, Jan 31, 2002 at 11:58:10AM -0200, Rik van Riel wrote:
> On Thu, 31 Jan 2002, Andrea Arcangeli wrote:
> 
> > So I wouldn't merge it, at least until some math is done for the memory
> > consumation with 500k inodes with only 1 page in them each, and on the
> > number of heights/levels that must be walked during the tree lookup,
> > during a access at offset 10G (or worst case in general [biggest
> > height]) of an inode with 10G just allocated in pagecache.
> 
> Ummm, I don't see how this worst case is any more realistic
> as the worst case for the hash table (where all pages live
> in very few hash buckets and we have really deep chains).

Mathematically the hashtable complexity is O(N). But probabilistically
with the tuning we do on the hashtable size, the collisions will be
nearly zero for most buckets for of most workloads. Despite the worst
case is with all the pagecache and swapcache queued in a single linked
list :).

So in short math is wrong about O(N) being bad, hashtable is infact the
only way we can get an effective O(1) by just paying RAM. We pay with
RAM and we get performance back to us.

but with the radix tree (please correct me if I'm wrong) the height will
increase eventually, no matter what (so it won't be an effective O(1)
like the hashtable provides in real life, not the worst case, the common
case). With the hashtable the height won't increase instead.

In short to get the same performance with the radix tree, you'd need to
waste an huge amount of ram per inode, the hashtable is instead global
so we pay only once for all the pages in the system. At least this is my
understanding, I'm not a radix tree guru though, so I may be missing
something.

> 
> People just don't go around caching a single page each for
> all of their 10 GB files and even if they _wanted to_ they
> couldn't because of the readahead code.
> 
> I suspect that for large files we'll always have around
> min_readahead logically contiguous pages cached, if not more.

readahead really doesn't matter at all. consider all the data just in
cache, assume you wrote it and you will never ever need to read it once
again because you've 64G of ram and only 20G of disk.

I/O on large files can and must run as fast as I/O on small files, at
the pagecache level. If the fs doesn't support extents or it's
inefficient with large files that's an enterly different problem, and
the underlying fs doesn't matter any longer once the data is in cache.
pagecache must not slowdown on big files.

Otherwise for an unix fs developer usage (the small files ala dbench)
the rbtree was much nicer data structure than the hash in first place
(and it eats less ram than the radix tree if only one page is queued
etc...).

Andrea

  reply	other threads:[~2002-01-31 14:35 UTC|newest]

Thread overview: 87+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2002-01-29 15:54 [PATCH] Radix-tree pagecache for 2.5 Christoph Hellwig
2002-01-29 19:27 ` Linus Torvalds
2002-01-29 21:40   ` David S. Miller
2002-01-29 22:07     ` Linus Torvalds
2002-01-29 23:01       ` Rik van Riel
2002-01-29 23:32         ` Alan Cox
2002-01-29 23:35           ` Rik van Riel
2002-01-30  3:00             ` Daniel Phillips
2002-01-31 23:44           ` Anton Blanchard
2002-02-01  0:34             ` Alan Cox
2002-02-01 11:04               ` Rik van Riel
2002-02-01 11:33                 ` Arjan van de Ven
2002-02-02 18:57                 ` Richard Henderson
2002-02-02 21:15                   ` Rik van Riel
2002-01-29 23:02   ` Momchil Velikov
2002-01-29 23:33     ` Linus Torvalds
2002-01-29 23:45       ` Christoph Hellwig
2002-01-30 21:25       ` Momchil Velikov
2002-01-30 22:05         ` John Stoffel
2002-01-30 22:15           ` Momchil Velikov
2002-01-31  2:33             ` Andrea Arcangeli
2002-01-31 13:58               ` Rik van Riel
2002-01-31 14:36                 ` Andrea Arcangeli [this message]
2002-01-31 15:32                   ` Alan Cox
2002-01-31 16:39                   ` William Lee Irwin III
2002-01-31 17:19                     ` William Lee Irwin III
2002-01-31 17:21                     ` Andrea Arcangeli
2002-01-31 17:50                       ` William Lee Irwin III
2002-01-31 17:46                   ` Linus Torvalds
2002-01-31 18:02                     ` Andrea Arcangeli
2002-01-31 18:32                       ` Linus Torvalds
2002-01-31 18:38                         ` Rik van Riel
2002-01-31 18:49                           ` Linus Torvalds
2002-01-31 19:09                             ` Momchil Velikov
2002-01-31 19:26                             ` Andrew Morton
2002-01-31 21:12                             ` Momchil Velikov
2002-01-31 19:14                         ` Andrea Arcangeli
2002-01-31 19:23                           ` Linus Torvalds
2002-01-31 21:34                             ` Ingo Molnar
2002-01-31 23:12                               ` Anton Blanchard
2002-01-31 23:55                                 ` Andrea Arcangeli
2002-02-01  0:01                                   ` David S. Miller
2002-02-16 16:20                                     ` Andrea Arcangeli
2002-02-01  3:56                                   ` Anton Blanchard
2002-02-01  6:32                                 ` Momchil Velikov
2002-02-01 18:38                                   ` Anton Blanchard
2002-02-01  9:04                                 ` Ingo Molnar
2002-02-01  7:59                                   ` Momchil Velikov
2002-02-01 10:29                                     ` Ingo Molnar
2002-02-01  9:01                                       ` Momchil Velikov
2002-02-01  9:10                                         ` David S. Miller
2002-02-01  9:07                                       ` David S. Miller
2002-02-01  9:13                                         ` Momchil Velikov
2002-02-01 17:06                                       ` Linus Torvalds
2002-02-01 18:29                                         ` Jeff Garzik
2002-02-01 18:44                                           ` arjan
2002-02-01 19:47                                             ` Jeff Garzik
2002-02-02 15:39                                               ` Rik van Riel
2002-02-05 14:21                                                 ` Pavel Machek
2002-02-05 18:45                                                   ` Rik van Riel
2002-02-05 20:30                                                     ` Eric Dumazet
2002-02-05 20:46                                                     ` Pavel Machek
2002-02-06  9:07                                                     ` Daniel Phillips
2002-02-05  9:19                                           ` Zdenek Kabelac
2002-02-01 23:49                                         ` Ingo Molnar
2002-02-01 14:44                                   ` Andrea Arcangeli
2002-02-01 14:59                                   ` Momchil Velikov
2002-02-01 17:03                                     ` Ingo Molnar
2002-02-01 15:26                                       ` Momchil Velikov
2002-02-01 23:45                                         ` Ingo Molnar
2002-01-31 10:41             ` Josh MacDonald
2002-01-31 14:00               ` Rik van Riel
2002-01-31 14:21               ` Momchil Velikov
2002-01-30 22:22           ` Christoph Hellwig
2002-01-30  3:02     ` Daniel Phillips
2002-01-29 23:00 ` William Lee Irwin III
  -- strict thread matches above, loose matches on Subject: below --
2002-02-02 19:23 rwhron
2002-02-03 14:31 ` chris
2002-02-03 23:33 ` Momchil Velikov
2002-02-04  3:59   ` rwhron
2002-02-06  2:04   ` rwhron
2002-02-06 11:44     ` Rik van Riel
2002-02-06 21:34 rwhron
2002-02-06 21:37 ` Rik van Riel
2002-02-06 22:06   ` rwhron
2002-02-07 11:32   ` Daniel Phillips
2002-02-07 11:32   ` Daniel Phillips

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20020131153607.C1309@athlon.random \
    --to=andrea@suse.de \
    --cc=linux-kernel@vger.kernel.org \
    --cc=riel@conectiva.com.br \
    --cc=stoffel@casc.com \
    --cc=torvalds@transmeta.com \
    --cc=velco@fadata.bg \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox