From: William Lee Irwin III <wli@holomorphy.com>
To: akpm@osdl.org
Cc: linux-kernel@vger.kernel.org
Subject: [0/2] filtered wakeups
Date: Sun, 2 May 2004 19:17:09 -0700
Message-ID: <20040503021709.GF1397@holomorphy.com>
The thundering herd issue in waitqueue hashing has been seen in
practice. In order to preserve the space footprint reduction while
improving performance, I wrote "filtered wakeups", which discriminate
between waiters based on a key.
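
As a rough illustration of the idea only (this is not the code in the
patches), the sketch below models a hashed waitqueue table in userspace.
All names here (wait_on_key, wake_up_all_in_bucket,
wake_up_filtered_sketch, the table size, integer keys) are hypothetical
and chosen for the example; a real kernel version would also need
locking and actual task wakeups.

```c
#include <assert.h>
#include <stddef.h>

#define NR_BUCKETS  4    /* deliberately tiny so that keys collide */
#define MAX_WAITERS 16

/* Hypothetical waiter record; the key names the object waited on. */
struct waiter {
    unsigned long key;   /* stand-in for e.g. a page address */
    int awake;           /* set to 1 once the waiter has been woken */
};

/* One shared queue per bucket; many objects hash to each bucket. */
struct wait_bucket {
    struct waiter *waiters[MAX_WAITERS];
    size_t nr;
};

static struct wait_bucket table[NR_BUCKETS];

static size_t hash_key(unsigned long key)
{
    return key % NR_BUCKETS;
}

/* Queue a waiter on the bucket its key hashes to. */
static void wait_on_key(struct waiter *w, unsigned long key)
{
    struct wait_bucket *b = &table[hash_key(key)];

    assert(b->nr < MAX_WAITERS);
    w->key = key;
    w->awake = 0;
    b->waiters[b->nr++] = w;
}

/* Unfiltered wakeup: every waiter in the bucket wakes (the herd). */
static size_t wake_up_all_in_bucket(unsigned long key)
{
    struct wait_bucket *b = &table[hash_key(key)];
    size_t woken = b->nr;

    for (size_t i = 0; i < b->nr; i++)
        b->waiters[i]->awake = 1;
    b->nr = 0;
    return woken;
}

/* Filtered wakeup: only waiters whose key matches are woken. */
static size_t wake_up_filtered_sketch(unsigned long key)
{
    struct wait_bucket *b = &table[hash_key(key)];
    size_t kept = 0, woken = 0;

    for (size_t i = 0; i < b->nr; i++) {
        struct waiter *w = b->waiters[i];

        if (w->key == key) {
            w->awake = 1;
            woken++;
        } else {
            b->waiters[kept++] = w;  /* stays asleep on the queue */
        }
    }
    b->nr = kept;
    return woken;
}
```

With keys 1 and 5 sharing a bucket, a filtered wakeup on key 1 leaves
the waiter on key 5 asleep, whereas the unfiltered variant would wake
it only for it to recheck its condition and go back to sleep.
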
The following patch series, vs. 2.6.6-rc3-mm1, drastically reduces the
kernel cpu consumption of tiobench --threads 512 --size 16384 (fed to
tiotest by hand since apparently the perl script is buggy) on a 6x336MHz
UltraSPARC III Sun Enterprise 3000 with 3.5GB RAM, ESP-366HME HBA,
10x10Krpm 18GB U160 SCSI disks configured for dm thusly:
0 355655680 striped 10 64 /dev/sda 0 /dev/sdb 0 /dev/sdc 0 /dev/sdd 0 \
/dev/sde 0 /dev/sdf 0 /dev/sdg 0 /dev/sdh 0 /dev/sdi 0 /dev/sdj 0
This was mkfs'd freshly to a single 171GB ext2 fs.
1/2, filtered page waitqueues, resolves the thundering herd issue with
hashed page waitqueues.
2/2, filtered buffer_head waitqueues, resolves the thundering herd issue
with hashed buffer_head waitqueues.
Futexes appear to have their own solution to this issue, which is
necessarily different from this one as it needs to discriminate based on
a longer key. The two could in principle be consolidated by passing a
comparator instead of comparing a key field, or by some similar
strategy, at the cost of indirect function calls.
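
To make the comparator idea concrete, here is a minimal userspace
sketch. It is not taken from the futex code or from these patches; the
types and names (struct long_key, key_match_fn, wake_up_matching) are
assumptions made up for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical multi-field key, standing in for a "longer key". */
struct long_key {
    unsigned long word;
    unsigned long offset;
};

struct cmp_waiter {
    const void *key;   /* opaque; interpreted only by the comparator */
    bool awake;
};

/* Returns true when the waiter's key matches the wakeup key. */
typedef bool (*key_match_fn)(const void *waiter_key, const void *wake_key);

/* Single-word case: compare the key pointers themselves. */
static bool match_pointer(const void *a, const void *b)
{
    return a == b;
}

/* Longer-key case: compare every field of the pointed-to keys. */
static bool match_long_key(const void *a, const void *b)
{
    const struct long_key *x = a, *y = b;

    return x->word == y->word && x->offset == y->offset;
}

/*
 * Wake every sleeping waiter for which match() accepts the key.
 * The indirect call made per waiter is the cost of this approach.
 */
static size_t wake_up_matching(struct cmp_waiter *waiters, size_t n,
                               const void *wake_key, key_match_fn match)
{
    size_t woken = 0;

    for (size_t i = 0; i < n; i++) {
        if (!waiters[i].awake && match(waiters[i].key, wake_key)) {
            waiters[i].awake = true;
            woken++;
        }
    }
    return woken;
}
```

One wake routine then serves both cases: callers with a single-word key
pass match_pointer, callers with a structured key pass match_long_key.
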
I furthermore instrumented the calls to schedule(), direct or indirect,
in patch 0.5/2 of the series. That patch isn't necessarily meant to be
applied to anything; it merely shows how I collected some of the
information in the runtime logs, which for space reasons I've posted as
URLs instead of including them inline:
ftp://ftp.kernel.org/pub/linux/kernel/people/wli/vm/filtered_wakeup/virgin_mm.log.tar.bz2
ftp://ftp.kernel.org/pub/linux/kernel/people/wli/vm/filtered_wakeup/filtered_wakeup.log.tar.bz2
Here "cpusec" represents 1 second of actual cpu consumed, counting cpu
consumption of both user and kernel. Apart from regular sampling of
profile data, no other load was running on the machine.
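
The "Throughput scaled by %cpu" figures in the results appear to be the
transfer rate divided by the summed user and system CPU percentages,
expressed as a number of CPUs. That reading is inferred from the tables
rather than stated anywhere, and the helper name mb_per_cpusec is
hypothetical, but it reproduces the posted numbers:

```c
#include <assert.h>

/* MB moved per CPU-second: rate divided by the number of CPUs kept busy. */
static double mb_per_cpusec(double rate_mb_s, double usr_pct, double sys_pct)
{
    /* 100% corresponds to one fully busy CPU. */
    double cpus_busy = (usr_pct + sys_pct) / 100.0;

    assert(cpus_busy > 0.0);
    return rate_mb_s / cpus_busy;
}
```

For the Write row of the "before" run this gives
14.654 / ((1.6 + 280.9) / 100) = 14.654 / 2.825, or about 5.1873
MB/cpusec.
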
before:
Tiotest results for 512 concurrent io threads:
,----------------------------------------------------------------------.
| Item                  | Time     | Rate         | Usr CPU  | Sys CPU |
+-----------------------+----------+--------------+----------+---------+
| Write       16384 MBs | 1118.1 s |  14.654 MB/s |   1.6 %  | 280.9 % |
| Random Write 2000 MBs |  336.2 s |   5.950 MB/s |   0.8 %  |  20.4 % |
| Read        16384 MBs | 1717.1 s |   9.542 MB/s |   1.4 %  |  31.8 % |
| Random Read  2000 MBs |  465.2 s |   4.300 MB/s |   1.1 %  |  36.1 % |
`----------------------------------------------------------------------'
Throughput scaled by %cpu:
Write: 5.1873MB/cpusec
Random Write: 28.0660MB/cpusec
Read: 28.7410MB/cpusec
Random Read: 11.5591MB/cpusec
top 10 kernel cpu consumers:
21733 finish_task_switch 113.1927
11976 __wake_up 187.1250
11433 generic_file_aio_write_nolock 5.0321
9730 read_sched_profile 43.4375
9606 file_read_actor 42.8839
9116 __do_softirq 31.6528
8682 do_anonymous_page 19.3795
3635 prepare_to_wait 28.3984
2159 kmem_cache_free 16.8672
1944 buffered_rmqueue 3.3750
top 10 callers of scheduling functions:
9391185 wait_on_page_bit 32608.2812
7280055 cpu_idle 37916.9531
1458446 __lock_page 5064.0486
258142 __handle_preemption 16133.8750
134815 worker_thread 247.8217
45989 __wait_on_buffer 205.3080
22294 do_exit 21.7715
22187 generic_file_aio_write_nolock 9.7654
14932 sys_wait4 25.9236
14652 shrink_list 7.8944
after:
Tiotest results for 512 concurrent io threads:
,----------------------------------------------------------------------.
| Item                  | Time     | Rate         | Usr CPU  | Sys CPU |
+-----------------------+----------+--------------+----------+---------+
| Write       16384 MBs | 1099.5 s |  14.901 MB/s |   2.2 %  | 279.3 % |
| Random Write 2000 MBs |  333.8 s |   5.991 MB/s |   1.0 %  |  14.9 % |
| Read        16384 MBs | 1706.3 s |   9.602 MB/s |   1.4 %  |  19.1 % |
| Random Read  2000 MBs |  460.3 s |   4.345 MB/s |   1.1 %  |  14.8 % |
`----------------------------------------------------------------------'
Throughput scaled by %cpu:
Write: 5.2934MB/cpusec
Random Write: 37.6792MB/cpusec
Read: 46.8390MB/cpusec
Random Read: 27.3270MB/cpusec
top 10 kernel cpu consumers:
11873 generic_file_aio_write_nolock 5.2258
10245 file_read_actor 45.7366
10212 read_sched_profile 45.5893
10135 finish_task_switch 52.7865
9171 do_anonymous_page 20.4710
8619 __do_softirq 29.9271
2905 wake_up_filtered 18.1562
2325 __get_page_state 10.3795
2278 del_timer_sync 5.0848
2033 buffered_rmqueue 3.5295
top 10 callers of scheduling functions:
3985424 cpu_idle 20757.4167
2396754 wait_on_page_bit 7489.8562
209453 __handle_preemption 13090.8125
164071 worker_thread 301.6011
24321 do_exit 23.7510
21272 generic_file_aio_write_nolock 9.3627
16271 sys_wait4 28.2483
11080 pipe_wait 86.5625
9634 compat_sys_nanosleep 25.0885
7742 shrink_list 4.1713
-- wli