From: William Lee Irwin III <wli@holomorphy.com>
To: akpm@osdl.org
Cc: linux-kernel@vger.kernel.org
Subject: [0/2] filtered wakeups
Date: Sun, 2 May 2004 19:17:09 -0700
Message-ID: <20040503021709.GF1397@holomorphy.com>
The thundering herd issue in waitqueue hashing has been seen in
practice. In order to preserve the space footprint reduction while
improving performance, I wrote "filtered wakeups", which discriminate
between waiters based on a key.
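As a rough illustration of the idea (a userspace model only, not the code in the patches), the sketch below puts waiters for different objects on one shared hashed queue and tags each with a key. All names here (struct waiter, struct wait_queue, add_waiter, wake_all, wake_filtered) are hypothetical. An unfiltered wakeup wakes everyone on the queue, which is the thundering herd; a filtered wakeup wakes only the waiters whose key matches.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace model; these names do not come from the patches. */
struct waiter {
	const void *key;	/* object (e.g. a page) this waiter sleeps on */
	int awake;		/* set to 1 once the waiter has been woken */
	struct waiter *next;	/* next waiter sharing the same hashed queue */
};

struct wait_queue {
	struct waiter *head;	/* one queue is shared by many objects */
};

/* Queue a waiter, recording which object it is actually waiting for. */
void add_waiter(struct wait_queue *q, struct waiter *w, const void *key)
{
	assert(q != NULL && w != NULL);
	w->key = key;
	w->awake = 0;
	w->next = q->head;
	q->head = w;
}

/* Unfiltered wakeup: every waiter on the shared queue is woken. */
int wake_all(struct wait_queue *q)
{
	int woken = 0;

	while (q->head) {
		struct waiter *w = q->head;

		q->head = w->next;
		w->next = NULL;
		w->awake = 1;
		woken++;
	}
	return woken;
}

/* Filtered wakeup: only waiters whose key matches are unlinked and woken. */
int wake_filtered(struct wait_queue *q, const void *key)
{
	struct waiter **pp = &q->head;
	int woken = 0;

	while (*pp) {
		struct waiter *w = *pp;

		if (w->key == key) {
			*pp = w->next;		/* unlink the matching waiter */
			w->next = NULL;
			w->awake = 1;
			woken++;
		} else {
			pp = &w->next;		/* leave unrelated waiters asleep */
		}
	}
	return woken;
}
```

With two waiters on one object and one waiter on another, all hashed to the same queue, wake_filtered() for the first object wakes two waiters and leaves the third queued, where wake_all() would have woken all three.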
The following patch series, vs. 2.6.6-rc3-mm1, drastically reduces the
kernel cpu consumption of tiobench --threads 512 --size 16384 (fed to
tiotest by hand since apparently the perl script is buggy) on a 6x336MHz
UltraSPARC III Sun Enterprise 3000 with 3.5GB RAM, ESP-366HME HBA,
10x10Krpm 18GB U160 SCSI disks configured for dm thusly:
0 355655680 striped 10 64 /dev/sda 0 /dev/sdb 0 /dev/sdc 0 /dev/sdd 0 \
/dev/sde 0 /dev/sdf 0 /dev/sdg 0 /dev/sdh 0 /dev/sdi 0 /dev/sdj 0
This was mkfs'd freshly to a single 171GB ext2 fs.
1/2, filtered page waitqueues, resolves the thundering herd issue with
hashed page waitqueues.
2/2, filtered buffer_head waitqueues, resolves the thundering herd issue
with hashed buffer_head waitqueues.
Futexes appear to have their own solution to this issue, which is
necessarily different from this as it needs to discriminate based on a
longer key. They could in principle be consolidated by passing a
comparator instead of comparing a key field or some similar strategy at
the cost of indirect function calls.
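A minimal sketch of what such a consolidation could look like, again as a userspace model under assumed names (wake_match_fn, struct cmp_waiter, struct long_key, wake_matching and the two match functions are all hypothetical, and struct long_key is not the real futex key layout). The wakeup path takes a comparator, so a pointer-sized key and a longer multi-field key can share one routine, at the price of one indirect call per waiter examined.

```c
#include <stddef.h>

/* Hypothetical comparator type: nonzero means "this waiter should wake". */
typedef int (*wake_match_fn)(const void *waiter_key, const void *wake_key);

struct cmp_waiter {
	const void *key;		/* opaque key, interpreted by the comparator */
	int awake;			/* set to 1 once the waiter has been woken */
	struct cmp_waiter *next;
};

/* Hypothetical longer key, standing in for something futex-like. */
struct long_key {
	unsigned long word;
	const void *ptr;
	int offset;
};

/* Short-key case: the key is the object's address, compared by identity. */
int match_pointer(const void *waiter_key, const void *wake_key)
{
	return waiter_key == wake_key;
}

/* Long-key case: compare every field of the key. */
int match_long_key(const void *waiter_key, const void *wake_key)
{
	const struct long_key *a = waiter_key;
	const struct long_key *b = wake_key;

	return a->word == b->word && a->ptr == b->ptr && a->offset == b->offset;
}

/* Wake every waiter on the list that the comparator accepts. */
int wake_matching(struct cmp_waiter *head, wake_match_fn match,
		  const void *wake_key)
{
	int woken = 0;
	struct cmp_waiter *w;

	for (w = head; w != NULL; w = w->next) {
		if (!w->awake && match(w->key, wake_key)) {	/* indirect call */
			w->awake = 1;
			woken++;
		}
	}
	return woken;
}
```

For brevity this version only marks waiters awake rather than unlinking them; the point is the single wake_matching() entry point serving both key shapes.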
I furthermore instrumented the callers of schedule(), including those
that reach it indirectly, in patch 0.5/2 of the series. That patch isn't
necessarily meant to be applied to anything; it merely shows how I
collected some of the information in the runtime logs, which for space
reasons I've posted as URLs instead of including them inline:
ftp://ftp.kernel.org/pub/linux/kernel/people/wli/vm/filtered_wakeup/virgin_mm.log.tar.bz2
ftp://ftp.kernel.org/pub/linux/kernel/people/wli/vm/filtered_wakeup/filtered_wakeup.log.tar.bz2
Here "cpusec" represents 1 second of actual cpu consumed, counting cpu
consumption of both user and kernel. Apart from regular sampling of
profile data, no other load was running on the machine.
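The "Throughput scaled by %cpu" figures in the results appear to be the reported rate divided by the combined user and system cpu percentage expressed as a number of busy cpus. A small sketch of that arithmetic follows; the function name mb_per_cpusec is hypothetical, and the formula is inferred from the reported numbers rather than taken from any script.

```c
/*
 * Throughput per cpu-second: MB/s divided by the number of cpus kept
 * busy, where 100% corresponds to one fully busy cpu.
 * Assumes usr_cpu_pct + sys_cpu_pct is nonzero.
 */
double mb_per_cpusec(double rate_mb_per_s, double usr_cpu_pct,
		     double sys_cpu_pct)
{
	double cpus_busy = (usr_cpu_pct + sys_cpu_pct) / 100.0;

	return rate_mb_per_s / cpus_busy;
}
```

For example, the "before" sequential write row (14.654 MB/s, 1.6% user, 280.9% system) gives 14.654 / 2.825, or about 5.187 MB/cpusec.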
before:
Tiotest results for 512 concurrent io threads:
,----------------------------------------------------------------------.
| Item                  | Time     | Rate         | Usr CPU  | Sys CPU |
+-----------------------+----------+--------------+----------+---------+
| Write       16384 MBs | 1118.1 s |  14.654 MB/s |   1.6 %  | 280.9 % |
| Random Write 2000 MBs |  336.2 s |   5.950 MB/s |   0.8 %  |  20.4 % |
| Read        16384 MBs | 1717.1 s |   9.542 MB/s |   1.4 %  |  31.8 % |
| Random Read  2000 MBs |  465.2 s |   4.300 MB/s |   1.1 %  |  36.1 % |
`----------------------------------------------------------------------'
Throughput scaled by %cpu:
Write: 5.1873MB/cpusec
Random Write: 28.0660MB/cpusec
Read: 28.7410MB/cpusec
Random Read: 11.5591MB/cpusec
top 10 kernel cpu consumers:
21733 finish_task_switch 113.1927
11976 __wake_up 187.1250
11433 generic_file_aio_write_nolock 5.0321
9730 read_sched_profile 43.4375
9606 file_read_actor 42.8839
9116 __do_softirq 31.6528
8682 do_anonymous_page 19.3795
3635 prepare_to_wait 28.3984
2159 kmem_cache_free 16.8672
1944 buffered_rmqueue 3.3750
top 10 callers of scheduling functions:
9391185 wait_on_page_bit 32608.2812
7280055 cpu_idle 37916.9531
1458446 __lock_page 5064.0486
258142 __handle_preemption 16133.8750
134815 worker_thread 247.8217
45989 __wait_on_buffer 205.3080
22294 do_exit 21.7715
22187 generic_file_aio_write_nolock 9.7654
14932 sys_wait4 25.9236
14652 shrink_list 7.8944
after:
Tiotest results for 512 concurrent io threads:
,----------------------------------------------------------------------.
| Item                  | Time     | Rate         | Usr CPU  | Sys CPU |
+-----------------------+----------+--------------+----------+---------+
| Write       16384 MBs | 1099.5 s |  14.901 MB/s |   2.2 %  | 279.3 % |
| Random Write 2000 MBs |  333.8 s |   5.991 MB/s |   1.0 %  |  14.9 % |
| Read        16384 MBs | 1706.3 s |   9.602 MB/s |   1.4 %  |  19.1 % |
| Random Read  2000 MBs |  460.3 s |   4.345 MB/s |   1.1 %  |  14.8 % |
`----------------------------------------------------------------------'
Throughput scaled by %cpu:
Write: 5.2934MB/cpusec
Random Write: 37.6792MB/cpusec
Read: 46.8390MB/cpusec
Random Read: 27.3270MB/cpusec
top 10 kernel cpu consumers:
11873 generic_file_aio_write_nolock 5.2258
10245 file_read_actor 45.7366
10212 read_sched_profile 45.5893
10135 finish_task_switch 52.7865
9171 do_anonymous_page 20.4710
8619 __do_softirq 29.9271
2905 wake_up_filtered 18.1562
2325 __get_page_state 10.3795
2278 del_timer_sync 5.0848
2033 buffered_rmqueue 3.5295
top 10 callers of scheduling functions:
3985424 cpu_idle 20757.4167
2396754 wait_on_page_bit 7489.8562
209453 __handle_preemption 13090.8125
164071 worker_thread 301.6011
24321 do_exit 23.7510
21272 generic_file_aio_write_nolock 9.3627
16271 sys_wait4 28.2483
11080 pipe_wait 86.5625
9634 compat_sys_nanosleep 25.0885
7742 shrink_list 4.1713
-- wli