From: Andrea Arcangeli <andrea@suse.de>
To: Ingo Molnar <mingo@elte.hu>
Cc: Anton Blanchard <anton@samba.org>,
Linus Torvalds <torvalds@transmeta.com>,
Rik van Riel <riel@conectiva.com.br>,
Momchil Velikov <velco@fadata.bg>,
John Stoffel <stoffel@casc.com>,
linux-kernel <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH] Radix-tree pagecache for 2.5
Date: Fri, 1 Feb 2002 15:44:33 +0100 [thread overview]
Message-ID: <20020201154433.C9904@athlon.random> (raw)
In-Reply-To: <20020131231242.GA4138@krispykreme> <Pine.LNX.4.33.0202010958220.2111-100000@localhost.localdomain>
In-Reply-To: <Pine.LNX.4.33.0202010958220.2111-100000@localhost.localdomain>; from mingo@elte.hu on Fri, Feb 01, 2002 at 10:04:50AM +0100
On Fri, Feb 01, 2002 at 10:04:50AM +0100, Ingo Molnar wrote:
>
> On Fri, 1 Feb 2002, Anton Blanchard wrote:
>
> > There were a few solutions (from davem and ingo) to allocate a larger
> > hash but with the radix patch we no longer have to worry about this.
>
> there is one big issue we forgot to consider.
>
> in the case of radix trees it's not only search depth that gets worse with
> big files. The thing i'm worried about is the 'big pagecache lock' being
> reintroduced again. If eg. a database application puts lots of data into a
> single file (multiple gigabytes - why not), then the
> mapping->i_shared_lock becomes a 'big pagecache lock' again, causing
> serious SMP contention for even the read() case. Benchmarks show that it's
> the distribution of locks that matters on big boxes.
Exactly, this is the same point I raised in an earlier email. Having
per-inode data structures does not solve the locking problem
completely: DBMSs tend to store everything in a single file. And with
a structure like a radix tree it would be painful to make it scale
within a single file, unlike the hashtable, where each bucket is
independent of the others.
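To make the per-bucket-independence point concrete, here is a minimal
sketch (not the actual 2.4/2.5 kernel code; `pc_bucket`, `pc_hash` and
the constants are made up for illustration) of why a hashed pagecache
distributes locking even within one file: the hash mixes the inode
with the page index, so consecutive pages of a single huge file land
in different buckets, each with its own lock:

```c
#include <stddef.h>

#define PC_HASH_BITS  10
#define PC_HASH_SIZE  (1UL << PC_HASH_BITS)

/* One lock per bucket: lookups of pages that hash to different
 * buckets never contend on the same spinlock. */
struct pc_bucket {
        int lock;               /* stand-in for a spinlock_t */
        void *chain;            /* hash chain head */
};

static struct pc_bucket pc_table[PC_HASH_SIZE];

/* Mix the inode identity and the file offset so that consecutive
 * pages of one file scatter across buckets -- this is what keeps the
 * locking distributed even for a single multi-gigabyte file. */
static unsigned long pc_hash(unsigned long inode, unsigned long index)
{
        unsigned long h = inode ^ (index * 2654435761UL); /* Knuth multiplicative hash */
        return (h ^ (h >> PC_HASH_BITS)) & (PC_HASH_SIZE - 1);
}
```

A radix tree keyed on the page index has no equivalent: every lookup
in the same file goes through the same per-mapping lock.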
>
> dbench hides this issue, because it uses many temporary files, so the
Indeed, a lot of workloads would benefit from the separate data
structures and locking, but not all of them; some important ones would
not.
> locking overhead is distributed. Would you be willing to run benchmarks
> that measure the scalability of reading from one bigger file, from
> multiple CPUs?
Agreed. Also, with DaveM's patch applied, sizing the hash properly so
it has a mean distribution of about one entry per bucket will shrink
the window for spinlock collisions as well, btw.
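As a back-of-the-envelope sketch of what "one entry per bucket" means
(this is illustrative arithmetic, not DaveM's actual patch; the
function name and the `num_physpages` parameter are assumptions), the
hash would be sized to at least one bucket per physical page, since in
the worst case every page can be in the pagecache:

```c
/* Smallest power-of-two bucket count >= num_physpages, so the
 * expected chain length stays around one entry per bucket. */
static unsigned long pc_hash_order(unsigned long num_physpages)
{
        unsigned long order = 0;

        while ((1UL << order) < num_physpages)
                order++;
        return order;
}
```

With chains of length ~1, most lookups take a bucket lock, touch a
single entry, and release it, so the hold time (and hence the
collision window) is close to minimal.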
>
> with hash based locking, the locking overhead is *always* distributed.
>
> with radix trees the locking overhead is distributed only if multiple
> files are used. With one big file (or a few big files), the i_shared_lock
> will always bounce between CPUs wildly in read() workloads, degrading
> scalability just as much as it is degraded with the pagecache_lock now.
>
> Ingo
Andrea