From: David Chinner <dgc@sgi.com>
To: Christoph Lameter <clameter@sgi.com>
Cc: dgc@sgi.com, Jens Axboe <axboe@suse.de>,
Andrew Morton <akpm@osdl.org>,
linux-kernel@vger.kernel.org, npiggin@suse.de,
linux-mm@kvack.org
Subject: Re: Lockless page cache test results
Date: Sat, 29 Apr 2006 00:01:47 +1000
Message-ID: <20060428140146.GA4657648@melbourne.sgi.com>
In-Reply-To: <Pine.LNX.4.64.0604261330310.20897@schroedinger.engr.sgi.com>
On Wed, Apr 26, 2006 at 01:31:14PM -0700, Christoph Lameter wrote:
> Dave: Can you tell us more about the tree_lock contentions on I/O that you
> have seen?
Sorry to be slow responding - I've been sick the last couple of days.
Take a large file - say Size = 5x RAM or so - and then start
N threads running at offset n * (Size / N), where n = the thread
number. They each read (Size / N) and so typically don't overlap.
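For anyone who wants to reproduce the access pattern, here is a minimal userspace sketch in Python. It is not the tool that produced the numbers in this message (that tool is not named here); the function names and the 256K default block size are illustrative assumptions, and it assumes thread n starts at offset n * (Size / N).

```python
import os
import threading


def read_slice(path, offset, length, block_size, results, n):
    """Read `length` bytes starting at `offset`, `block_size` bytes at a time."""
    total = 0
    with open(path, "rb", buffering=0) as f:
        f.seek(offset)
        while total < length:
            chunk = f.read(min(block_size, length - total))
            if not chunk:  # reached end of file early
                break
            total += len(chunk)
    results[n] = total  # bytes actually read by thread n


def parallel_read(path, num_threads, block_size=256 * 1024):
    """Start num_threads readers, each on its own Size/N slice of the file."""
    size = os.path.getsize(path)
    per_thread = size // num_threads  # any remainder bytes are left unread
    results = [0] * num_threads
    threads = [
        threading.Thread(
            target=read_slice,
            args=(path, n * per_thread, per_thread, block_size, results, n),
        )
        for n in range(num_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Calling parallel_read("/path/to/bigfile", 8) returns the byte count read by each thread. To see the contention described here the file has to be much larger than RAM, so that reclaim runs while the readers are active.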
Throughput with increasing numbers of threads on a 24p Altix
with an XFS filesystem running 2.6.15-rc5 looks like:
++++ Local I/O  Block size 262144 ++++ Thu Dec 22 03:41:42 PST 2005
Loads  Type  blksize    count  av_time     tput  usr%     sys%  intr%
-----  ----  -------  -------  -------  -------  ----  -------  -----
    1  read  256.00K  256.00K    82.92   789.59  1.80   215.40  18.40
    2  read  256.00K  256.00K    53.97  1191.56  2.10   389.40  22.60
    4  read  256.00K  256.00K    37.83  1724.63  2.20   776.00  29.30
    8  read  256.00K  256.00K    52.57  1213.63  2.20  1423.60  24.30
   16  read  256.00K  256.00K    60.05  1057.03  1.90  1951.10  24.30
   32  read  256.00K  256.00K    82.13   744.73  2.00  2277.50  18.60
                                        ^^^^^^^        ^^^^^^^
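To put a number on the drop-off, the tput column can be normalised against the single-thread run. This is only arithmetic on the figures in the table; the helper names are made up for illustration.

```python
# (thread count -> throughput) copied from the tput column of the table
tput = {
    1: 789.59,
    2: 1191.56,
    4: 1724.63,
    8: 1213.63,
    16: 1057.03,
    32: 744.73,
}


def speedups(samples):
    """Throughput of each run relative to the 1-thread run."""
    base = samples[1]
    return {n: t / base for n, t in samples.items()}


def peak_threads(samples):
    """Thread count with the highest measured throughput."""
    return max(samples, key=samples.get)


if __name__ == "__main__":
    for n, s in speedups(tput).items():
        print(f"{n:>2} threads: {s:.2f}x")
    print("peak at", peak_threads(tput), "threads")
```

The peak is at 4 threads, at a bit over twice the single-thread rate, and the 32-thread run ends up slower than a single thread.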
Basically, we hit a scaling limitation between 4 and 8 threads. This was
consistent across I/O sizes from 4KB to 4MB. I took a simple 30s PC sample
profile:
user ticks:           0      0    %
kernel ticks:      2982     99.87 %
idle ticks:           4      0.13 %
Using /proc/kallsyms as the kernel map file.
====================================================================
                           Kernel
   Ticks   Percent   Cumulative   Routine
                      Percent
--------------------------------------------------------------------
    1897     63.62        63.62   _write_lock_irqsave
     467     15.66        79.28   _read_unlock_irq
      91      3.05        82.33   established_get_next
      74      2.48        84.81   generic__raw_read_trylock
      59      1.98        86.79   xfs_iunlock
      47      1.58        88.36   _write_unlock_irq
      46      1.54        89.91   xfs_bmapi
      40      1.34        91.25   do_generic_mapping_read
      35      1.17        92.42   xfs_ilock_map_shared
      26      0.87        93.29   __copy_user
      23      0.77        94.06   __do_page_cache_readahead
      16      0.54        94.60   unlock_page
      15      0.50        95.10   xfs_ilock
      15      0.50        95.61   shrink_cache
      15      0.50        96.11   _spin_unlock_irqrestore
      13      0.44        96.55   sub_preempt_count
      11      0.37        96.91   mpage_end_io_read
      10      0.34        97.25   add_preempt_count
      10      0.34        97.59   xfs_iomap
       9      0.30        97.89   _read_unlock
So the _read_unlock_irq time looks to be triggered by the mapping->tree_lock.
I think that the _write_lock_irqsave() contention is from memory
reclaim (shrink_list()->try_to_release_page()-> ->releasepage()->
xfs_vm_releasepage()->try_to_free_buffers()->clear_page_dirty()->
test_clear_page_dirty()->write_lock_irqsave(&mapping->tree_lock...))
because page cache memory was full of this one file and demand was
causing its pages to be constantly recycled.
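As a rough cross-check, the profile ticks can be summed for the routines that look like locking primitives. Which routines count as locking is an assumption made from their names alone, and the profile does not say which lock each one was operating on, so this is an upper bound on lock time in general rather than a measurement of tree_lock.

```python
KERNEL_TICKS = 2982  # "kernel ticks" line of the profile

# Tick counts copied from the profile; the selection of routines is an
# assumption based on their names (rwlock and spinlock primitives).
lock_routines = {
    "_write_lock_irqsave": 1897,
    "_read_unlock_irq": 467,
    "generic__raw_read_trylock": 74,
    "_write_unlock_irq": 47,
    "_spin_unlock_irqrestore": 15,
    "_read_unlock": 9,
}

lock_ticks = sum(lock_routines.values())
lock_share = 100.0 * lock_ticks / KERNEL_TICKS

if __name__ == "__main__":
    print(f"{lock_ticks} of {KERNEL_TICKS} kernel ticks "
          f"({lock_share:.1f}%) in lock primitives")
```

By that count, 2509 of the 2982 kernel ticks were spent in lock primitives.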
Cheers,
Dave.
--
Dave Chinner
R&D Software Engineer
SGI Australian Software Group