From: William Lee Irwin III <wli@holomorphy.com>
To: akpm@osdl.org
Cc: linux-kernel@vger.kernel.org
Subject: [0/2] filtered wakeups
Date: Sun, 2 May 2004 19:17:09 -0700
Message-ID: <20040503021709.GF1397@holomorphy.com>

The thundering herd issue in waitqueue hashing has been seen in
practice. In order to preserve the space footprint reduction while
improving performance, I wrote "filtered wakeups", which discriminate
between waiters based on a key.
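
For readers who want the idea in concrete form, here is a minimal
userspace model of a filtered wakeup. It is a sketch only, not the code
from the patches: the names struct model_waiter, struct model_waitqueue,
model_wq_add(), model_wake_all() and model_wake_filtered() are
hypothetical, the "queue" is a fixed array standing in for one hash
bucket, and "waking" is just setting a flag. The point it illustrates is
that waiters for different objects which hash to the same bucket are
told apart by a key, so a wakeup for one object leaves the others asleep.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_MAX_WAITERS 8

/* Hypothetical model of one sleeper on a shared (hashed) waitqueue. */
struct model_waiter {
	const void *key;	/* object being waited on, e.g. a page */
	int awake;		/* stands in for actually waking a task */
};

/* Hypothetical model of one hash bucket shared by unrelated objects. */
struct model_waitqueue {
	struct model_waiter *w[MODEL_MAX_WAITERS];
	size_t n;
};

void model_wq_add(struct model_waitqueue *q, struct model_waiter *w)
{
	assert(q->n < MODEL_MAX_WAITERS);
	w->awake = 0;
	q->w[q->n++] = w;
}

/* Unfiltered wakeup: everyone on the bucket wakes (the thundering herd). */
size_t model_wake_all(struct model_waitqueue *q)
{
	size_t woken = q->n;

	for (size_t i = 0; i < q->n; i++)
		q->w[i]->awake = 1;
	q->n = 0;
	return woken;
}

/* Filtered wakeup: only waiters whose key matches wake; the rest stay queued. */
size_t model_wake_filtered(struct model_waitqueue *q, const void *key)
{
	size_t kept = 0, woken = 0;

	for (size_t i = 0; i < q->n; i++) {
		struct model_waiter *w = q->w[i];

		if (w->key == key) {
			w->awake = 1;
			woken++;
		} else {
			q->w[kept++] = w;	/* compact non-matching waiters */
		}
	}
	q->n = kept;
	return woken;
}
```

With two waiters on one object and one on another in the same bucket,
model_wake_filtered() for the first object wakes two and leaves one
queued, where model_wake_all() would have woken all three.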

The following patch series, vs. 2.6.6-rc3-mm1, drastically reduces the
kernel cpu consumption of tiobench --threads 512 --size 16384 (fed to
tiotest by hand since apparently the perl script is buggy) on a 6x336MHz
UltraSPARC III Sun Enterprise 3000 with 3.5GB RAM, ESP-366HME HBA,
10x10Krpm 18GB U160 SCSI disks configured for dm thusly:
0 355655680 striped 10 64 /dev/sda 0 /dev/sdb 0 /dev/sdc 0 /dev/sdd 0 \
	/dev/sde 0 /dev/sdf 0 /dev/sdg 0 /dev/sdh 0 /dev/sdi 0 /dev/sdj 0
This was mkfs'd freshly to a single 171GB ext2 fs.

1/2, filtered page waitqueues, resolves the thundering herd issue with
	hashed page waitqueues.
2/2, filtered buffer_head waitqueues, resolves the thundering herd issue
	with hashed buffer_head waitqueues.
Futexes appear to have their own solution to this issue, which is
necessarily different from this as it needs to discriminate based on a
longer key. They could in principle be consolidated by passing a
comparator instead of comparing a key field or some similar strategy at
the cost of indirect function calls.
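
To make the comparator alternative concrete, here is a userspace sketch
of what such a consolidation might look like. Everything in it is an
assumption for illustration: struct cmp_waiter, wait_match_fn,
wake_matching(), match_pointer(), struct wide_key and match_wide() are
hypothetical names, and struct wide_key is merely a stand-in for "a
longer key", not the layout futexes actually use. The walk calls the
comparator once per waiter, which is where the indirect function call
cost shows up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical waiter on a singly linked wait list. */
struct cmp_waiter {
	const void *key;
	bool awake;
	struct cmp_waiter *next;
};

/* Caller-supplied comparator: one indirect call per waiter examined. */
typedef bool (*wait_match_fn)(const void *waiter_key, const void *wake_key);

/* Unlink and wake every waiter the comparator accepts; return the count. */
size_t wake_matching(struct cmp_waiter **head, wait_match_fn match,
		     const void *wake_key)
{
	struct cmp_waiter **pp = head;
	size_t woken = 0;

	assert(match != NULL);
	while (*pp) {
		struct cmp_waiter *w = *pp;

		if (match(w->key, wake_key)) {
			*pp = w->next;		/* unlink from the list */
			w->next = NULL;
			w->awake = true;
			woken++;
		} else {
			pp = &w->next;
		}
	}
	return woken;
}

/* Short key: plain pointer identity, as for a page or buffer_head. */
bool match_pointer(const void *a, const void *b)
{
	return a == b;
}

/* Stand-in for a longer key that does not fit in a single word. */
struct wide_key {
	unsigned long word;
	unsigned long offset;
};

bool match_wide(const void *a, const void *b)
{
	const struct wide_key *x = a, *y = b;

	return x->word == y->word && x->offset == y->offset;
}
```

The same wake_matching() then serves both kinds of waiter, with
match_pointer for single-word keys and match_wide for the longer ones.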

I also instrumented the calls to schedule(), including those made
indirectly, in patch 0.5/2 of the series. That patch isn't necessarily
meant to be applied to anything; it merely shows how I collected some of
the information in the runtime logs, which for space reasons I've posted
as URLs instead of including them inline.
ftp://ftp.kernel.org/pub/linux/kernel/people/wli/vm/filtered_wakeup/virgin_mm.log.tar.bz2
ftp://ftp.kernel.org/pub/linux/kernel/people/wli/vm/filtered_wakeup/filtered_wakeup.log.tar.bz2

Here "cpusec" represents 1 second of actual cpu consumed, counting cpu
consumption of both user and kernel. Apart from regular sampling of
profile data, no other load was running on the machine.

before:
Tiotest results for 512 concurrent io threads:
,----------------------------------------------------------------------.
| Item                  | Time     | Rate         | Usr CPU  | Sys CPU |
+-----------------------+----------+--------------+----------+---------+
| Write       16384 MBs | 1118.1 s |  14.654 MB/s |   1.6 %  | 280.9 % |
| Random Write 2000 MBs |  336.2 s |   5.950 MB/s |   0.8 %  |  20.4 % |
| Read        16384 MBs | 1717.1 s |   9.542 MB/s |   1.4 %  |  31.8 % |
| Random Read  2000 MBs |  465.2 s |   4.300 MB/s |   1.1 %  |  36.1 % |
`----------------------------------------------------------------------'

Throughput scaled by %cpu:
Write:            5.1873MB/cpusec
Random Write:    28.0660MB/cpusec
Read:            28.7410MB/cpusec
Random Read:     11.5591MB/cpusec
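
The scaling is the tiotest rate divided by the sum of the Usr CPU and
Sys CPU columns taken as a fraction (100 % = one cpu fully busy). A
small sketch of that arithmetic follows; mb_per_cpusec() and
print_read_rows() are hypothetical helper names, and the inputs are the
sequential Read rows of the before and after tables in this message.

```c
#include <assert.h>
#include <stdio.h>

/*
 * MB transferred per second of cpu consumed (user + kernel).
 * rate_mb_s: the "Rate" column; usr_pct, sys_pct: the "Usr CPU" and
 * "Sys CPU" columns, in percent of one cpu.
 */
double mb_per_cpusec(double rate_mb_s, double usr_pct, double sys_pct)
{
	double cpus_busy = (usr_pct + sys_pct) / 100.0;

	assert(cpus_busy > 0.0);
	return rate_mb_s / cpus_busy;
}

/* Print the scaled throughput for the sequential Read rows. */
void print_read_rows(void)
{
	printf("Read before: %.4fMB/cpusec\n", mb_per_cpusec(9.542, 1.4, 31.8));
	printf("Read after:  %.4fMB/cpusec\n", mb_per_cpusec(9.602, 1.4, 19.1));
}
```

For those rows this works out to 9.542 / 0.332 and 9.602 / 0.205, which
match the 28.7410 and 46.8390 figures reported for Read.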

top 10 kernel cpu consumers:
 21733 finish_task_switch                       113.1927
 11976 __wake_up                                187.1250
 11433 generic_file_aio_write_nolock              5.0321
  9730 read_sched_profile                        43.4375
  9606 file_read_actor                           42.8839
  9116 __do_softirq                              31.6528
  8682 do_anonymous_page                         19.3795
  3635 prepare_to_wait                           28.3984
  2159 kmem_cache_free                           16.8672
  1944 buffered_rmqueue                           3.3750

top 10 callers of scheduling functions:
9391185 wait_on_page_bit                         32608.2812
7280055 cpu_idle                                 37916.9531
1458446 __lock_page                              5064.0486
 258142 __handle_preemption                      16133.8750
 134815 worker_thread                            247.8217
  45989 __wait_on_buffer                         205.3080
  22294 do_exit                                   21.7715
  22187 generic_file_aio_write_nolock              9.7654
  14932 sys_wait4                                 25.9236
  14652 shrink_list                                7.8944


after:
Tiotest results for 512 concurrent io threads:
,----------------------------------------------------------------------.
| Item                  | Time     | Rate         | Usr CPU  | Sys CPU |
+-----------------------+----------+--------------+----------+---------+
| Write       16384 MBs | 1099.5 s |  14.901 MB/s |   2.2 %  | 279.3 % |
| Random Write 2000 MBs |  333.8 s |   5.991 MB/s |   1.0 %  |  14.9 % |
| Read        16384 MBs | 1706.3 s |   9.602 MB/s |   1.4 %  |  19.1 % |
| Random Read  2000 MBs |  460.3 s |   4.345 MB/s |   1.1 %  |  14.8 % |
`----------------------------------------------------------------------'

Throughput scaled by %cpu:
Write:            5.2934MB/cpusec
Random Write:    37.6792MB/cpusec
Read:            46.8390MB/cpusec
Random Read:     27.3270MB/cpusec

top 10 kernel cpu consumers:
 11873 generic_file_aio_write_nolock              5.2258
 10245 file_read_actor                           45.7366
 10212 read_sched_profile                        45.5893
 10135 finish_task_switch                        52.7865
  9171 do_anonymous_page                         20.4710
  8619 __do_softirq                              29.9271
  2905 wake_up_filtered                          18.1562
  2325 __get_page_state                          10.3795
  2278 del_timer_sync                             5.0848
  2033 buffered_rmqueue                           3.5295

top 10 callers of scheduling functions:
3985424 cpu_idle                                 20757.4167
2396754 wait_on_page_bit                         7489.8562
 209453 __handle_preemption                      13090.8125
 164071 worker_thread                            301.6011
  24321 do_exit                                   23.7510
  21272 generic_file_aio_write_nolock              9.3627
  16271 sys_wait4                                 28.2483
  11080 pipe_wait                                 86.5625
   9634 compat_sys_nanosleep                      25.0885
   7742 shrink_list                                4.1713


-- wli
