From: David Chinner <dgc@sgi.com>
To: Christoph Lameter <clameter@sgi.com>
Cc: dgc@sgi.com, Jens Axboe <axboe@suse.de>,
	Andrew Morton <akpm@osdl.org>,
	linux-kernel@vger.kernel.org, npiggin@suse.de,
	linux-mm@kvack.org
Subject: Re: Lockless page cache test results
Date: Sat, 29 Apr 2006 00:01:47 +1000
Message-ID: <20060428140146.GA4657648@melbourne.sgi.com>
In-Reply-To: <Pine.LNX.4.64.0604261330310.20897@schroedinger.engr.sgi.com>

On Wed, Apr 26, 2006 at 01:31:14PM -0700, Christoph Lameter wrote:
> Dave: Can you tell us more about the tree_lock contentions on I/O that you 
> have seen?

Sorry to be slow responding - I've been sick the last couple of days.

Take a large file - say Size = 5x RAM or so - and then start
N threads, each running at offset n * (Size / N) where n = the thread
number. They each read (Size / N) and so typically don't overlap.
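
As a rough illustration of the access pattern described above (not the
tool that produced the numbers below), here is a minimal sketch in
Python. The names parallel_read and read_slice, and the 0-based thread
numbering, are assumptions made for this example:

```python
import os
import threading

def read_slice(path, offset, length, block_size, totals, idx):
    """Read `length` bytes starting at `offset`, `block_size` at a time."""
    fd = os.open(path, os.O_RDONLY)
    try:
        done = 0
        while done < length:
            chunk = os.pread(fd, min(block_size, length - done), offset + done)
            if not chunk:  # reached end of file early
                break
            done += len(chunk)
        totals[idx] = done
    finally:
        os.close(fd)

def parallel_read(path, nthreads, block_size=256 * 1024):
    """Start nthreads readers; reader n covers a disjoint Size/N slice."""
    size = os.path.getsize(path)
    per_thread = size // nthreads
    totals = [0] * nthreads
    threads = []
    for n in range(nthreads):
        offset = n * per_thread
        # The last reader also takes the remainder (an assumption of this sketch).
        length = per_thread if n < nthreads - 1 else size - offset
        threads.append(threading.Thread(
            target=read_slice,
            args=(path, offset, length, block_size, totals, n)))
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return totals  # bytes read by each thread
```

Each reader opens its own file descriptor and uses os.pread(), so the
threads do not share a file position.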

Throughput with increasing numbers of threads on a 24p altix
on an XFS filesystem on 2.6.15-rc5 looks like:

++++ Local I/O Block size 262144 ++++ Thu Dec 22 03:41:42 PST 2005


Loads   Type    blksize count   av_time    tput    usr%   sys%   intr%
-----   ----    ------- -----   ------- -------    ----   ----   -----
  1      read   256.00K 256.00K   82.92  789.59    1.80  215.40   18.40
  2      read   256.00K 256.00K   53.97 1191.56    2.10  389.40   22.60
  4      read   256.00K 256.00K   37.83 1724.63    2.20  776.00   29.30
  8      read   256.00K 256.00K   52.57 1213.63    2.20 1423.60   24.30
  16     read   256.00K 256.00K   60.05 1057.03    1.90 1951.10   24.30
  32     read   256.00K 256.00K   82.13  744.73    2.00 2277.50   18.60
                                        ^^^^^^^         ^^^^^^^
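
To make the trend in the tput column easier to see, the following small
sketch computes throughput relative to the single-thread run. The
(threads, tput) pairs are copied from the table above; the helper names
scaling and peak are made up for this example, and the unit of tput is
not stated in the output, so only ratios are computed:

```python
# (threads, tput) pairs copied from the table above.
results = [
    (1, 789.59),
    (2, 1191.56),
    (4, 1724.63),
    (8, 1213.63),
    (16, 1057.03),
    (32, 744.73),
]

def scaling(results):
    """Return (threads, speedup vs. first row, tput per thread) per row."""
    base = results[0][1]
    return [(n, tput / base, tput / n) for n, tput in results]

def peak(results):
    """Return the (threads, tput) row with the highest throughput."""
    return max(results, key=lambda row: row[1])

if __name__ == "__main__":
    for n, speedup, per_thread in scaling(results):
        print(f"{n:3d} threads: {speedup:5.2f}x, {per_thread:8.2f} per thread")
    print("peak at", peak(results)[0], "threads")
```

On these numbers the peak is at 4 threads, at roughly 2.2x the
single-thread figure, and the 32-thread run comes in below the
single-thread run.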

Basically, we hit a scaling limitation between 4 and 8 threads. This was
consistent across I/O sizes from 4KB to 4MB. I took a simple 30s PC sample
profile:

user ticks:             0               0 %
kernel ticks:           2982            99.97 %
idle ticks:             4               0.13 %

Using /proc/kallsyms as the kernel map file.
====================================================================
                           Kernel

      Ticks     Percent  Cumulative   Routine
                          Percent
--------------------------------------------------------------------
       1897       63.62    63.62      _write_lock_irqsave
        467       15.66    79.28      _read_unlock_irq
         91        3.05    82.33      established_get_next
         74        2.48    84.81      generic__raw_read_trylock
         59        1.98    86.79      xfs_iunlock
         47        1.58    88.36      _write_unlock_irq
         46        1.54    89.91      xfs_bmapi
         40        1.34    91.25      do_generic_mapping_read
         35        1.17    92.42      xfs_ilock_map_shared
         26        0.87    93.29      __copy_user
         23        0.77    94.06      __do_page_cache_readahead
         16        0.54    94.60      unlock_page
         15        0.50    95.10      xfs_ilock
         15        0.50    95.61      shrink_cache
         15        0.50    96.11      _spin_unlock_irqrestore
         13        0.44    96.55      sub_preempt_count
         11        0.37    96.91      mpage_end_io_read
         10        0.34    97.25      add_preempt_count
         10        0.34    97.59      xfs_iomap
          9        0.30    97.89      _read_unlock
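
For a sense of how much of the profile is spent in locking primitives
overall, this sketch sums the ticks of the lock-related routines listed
above. Which routines count as lock-related is a classification by name
made for this example, and not all of them necessarily involve
mapping->tree_lock:

```python
KERNEL_TICKS = 2982  # "kernel ticks" line of the profile above

# Ticks copied from the profile; selected by name as locking primitives
# (an assumption of this sketch).
lock_ticks = {
    "_write_lock_irqsave": 1897,
    "_read_unlock_irq": 467,
    "generic__raw_read_trylock": 74,
    "_write_unlock_irq": 47,
    "_spin_unlock_irqrestore": 15,
    "_read_unlock": 9,
}

def lock_share(ticks, total):
    """Return (summed ticks, percentage of total) for the given routines."""
    summed = sum(ticks.values())
    return summed, 100.0 * summed / total

if __name__ == "__main__":
    summed, pct = lock_share(lock_ticks, KERNEL_TICKS)
    print(f"{summed} of {KERNEL_TICKS} kernel ticks ({pct:.1f}%)")
```

That comes to 2509 of 2982 kernel ticks, or about 84%.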


So the time in _read_unlock_irq looks to be coming from mapping->tree_lock.

I think that the write_lock_irqsave() contention is from memory
reclaim (shrink_list()->try_to_release_page()-> ->releasepage()->
xfs_vm_releasepage()-> try_to_free_buffers()->clear_page_dirty()->
test_clear_page_dirty()-> write_lock_irqsave(&mapping->tree_lock...))
because page cache memory was full of this one file and demand is
causing its pages to be constantly recycled.

Cheers,

Dave.
-- 
Dave Chinner
R&D Software Engineer
SGI Australian Software Group
