From: Shawn Bohrer <shawn.bohrer@gmail.com>
To: xfs@oss.sgi.com
Subject: High latencies writing to a memory mapped file
Date: Thu, 22 Jul 2010 09:47:06 -0500
Message-ID: <20100722144706.GA2840@BohrerMBP.rgmadvisors.com>
Hello,
We have an application that receives data over the network, and then
writes data out to a shared memory mapped file on xfs. Previously
everything worked great when we were running a CentOS 5.3 kernel
(2.6.18-128.7.1.el5), but when we upgraded to the newer CentOS 5.4 or
5.5 kernels we started to see occasional spikes (up to 1 second) in
the time it takes to write the data to the memory mapped file. I
noticed that between the CentOS 5.3 and 5.4 kernels the upstream xfs
changes were backported. Testing on 2.6.32.16 shows the same
occasional latency spikes.
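For reference, here is a stripped down sketch of the kind of write
loop that shows the problem. The file name, size, and rate here are
illustrative, not our production code:

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <time.h>
#include <unistd.h>

#define FILE_SIZE (256UL * 1024 * 1024)  /* hypothetical file size */
#define CHUNK     1500                   /* ~1.5 MB/s at one write per ms */

static int64_t now_ns(void)
{
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

int main(void)
{
        size_t off;
        int fd = open("/xfs/testfile", O_RDWR | O_CREAT, 0644);
        if (fd < 0 || ftruncate(fd, FILE_SIZE) < 0)
                return 1;
        char *map = mmap(NULL, FILE_SIZE, PROT_READ | PROT_WRITE,
                         MAP_SHARED, fd, 0);
        if (map == MAP_FAILED)
                return 1;
        for (off = 0; off + CHUNK <= FILE_SIZE; off += CHUNK) {
                int64_t t0 = now_ns();
                memset(map + off, 'x', CHUNK);  /* may fault + page_mkwrite */
                int64_t dt = now_ns() - t0;
                if (dt > 10 * 1000000)          /* report stalls > 10 ms */
                        printf("stall: %lld ms at offset %lld\n",
                               (long long)(dt / 1000000), (long long)off);
                usleep(1000);
        }
        return 0;
}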
After doing some debugging on 2.6.32.16 I have found the cause of the
latency spikes. The spikes occur every time pdflush runs to flush the
dirty pages. Note that our app only writes data at about 1.5 MB/s, and
we are well below both dirty_background_ratio and dirty_ratio, so these
flushes only write out pages that are more than dirty_expire_centisecs
old.
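This is the kind of quick check I used to confirm we are below those
thresholds (an illustrative snippet, nothing official):

#include <stdio.h>
#include <string.h>

static void show(const char *path)
{
        char buf[128];
        FILE *f = fopen(path, "r");
        if (f && fgets(buf, sizeof(buf), f))
                printf("%-40s %s", path, buf);
        if (f)
                fclose(f);
}

int main(void)
{
        char line[128];
        FILE *mi;

        show("/proc/sys/vm/dirty_ratio");
        show("/proc/sys/vm/dirty_background_ratio");
        show("/proc/sys/vm/dirty_expire_centisecs");

        /* current amount of dirty page cache */
        mi = fopen("/proc/meminfo", "r");
        while (mi && fgets(line, sizeof(line), mi))
                if (strncmp(line, "Dirty:", 6) == 0)
                        printf("%s", line);
        if (mi)
                fclose(mi);
        return 0;
}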
Capturing one of the spikes with ftrace shows:
...
flush-8:0-601 [011] 1291592.127590: funcgraph_entry: | xfs_ilock() {
flush-8:0-601 [011] 1291592.127590: funcgraph_entry: | down_write() {
flush-8:0-601 [011] 1291592.127590: funcgraph_entry: 0.213 us | _cond_resched();
flush-8:0-601 [011] 1291592.127591: funcgraph_exit: 0.657 us | }
flush-8:0-601 [011] 1291592.127591: funcgraph_exit: 1.174 us | }
flush-8:0-601 [011] 1291592.127764: sched_switch: task flush-8:0:601 [120] (D) ==> swapper:0 [120]
...
Here you can see flush-8:0 acquire the xfs_ilock(); I believe it then
blocks waiting for the I/O to complete. Next you can see my
file_writer process take a page fault while writing new data to the
memory mapped file. The page fault handler tries to acquire the
xfs_ilock() and blocks because the lock is held by the flush-8:0
process.
...
file_writer-10549 [001] 1291593.100416: print: 00000000 write_delay start
file_writer-10549 [001] 1291593.100419: funcgraph_entry: | handle_mm_fault() {
file_writer-10549 [001] 1291593.100420: funcgraph_entry: | __do_fault() {
file_writer-10549 [001] 1291593.100420: funcgraph_entry: | filemap_fault() {
file_writer-10549 [001] 1291593.100421: funcgraph_entry: 0.775 us | find_get_page();
file_writer-10549 [001] 1291593.100422: funcgraph_entry: 0.191 us | _cond_resched();
file_writer-10549 [001] 1291593.100422: funcgraph_exit: 1.980 us | }
file_writer-10549 [001] 1291593.100422: funcgraph_entry: | unlock_page() {
file_writer-10549 [001] 1291593.100423: funcgraph_entry: 0.154 us | page_waitqueue();
file_writer-10549 [001] 1291593.100423: funcgraph_entry: 0.313 us | __wake_up_bit();
file_writer-10549 [001] 1291593.100423: funcgraph_exit: 1.080 us | }
file_writer-10549 [001] 1291593.100424: funcgraph_entry: | xfs_vm_page_mkwrite() {
file_writer-10549 [001] 1291593.100424: funcgraph_entry: | block_page_mkwrite() {
file_writer-10549 [001] 1291593.100424: funcgraph_entry: 0.240 us | _cond_resched();
file_writer-10549 [001] 1291593.100425: funcgraph_entry: | block_prepare_write() {
file_writer-10549 [001] 1291593.100425: funcgraph_entry: | __block_prepare_write() {
file_writer-10549 [001] 1291593.100426: funcgraph_entry: | create_empty_buffers() {
file_writer-10549 [001] 1291593.100426: funcgraph_entry: | alloc_page_buffers() {
file_writer-10549 [001] 1291593.100426: funcgraph_entry: | alloc_buffer_head() {
file_writer-10549 [001] 1291593.100426: funcgraph_entry: | kmem_cache_alloc() {
file_writer-10549 [001] 1291593.100427: funcgraph_entry: 0.491 us | _cond_resched();
file_writer-10549 [001] 1291593.100427: funcgraph_exit: 1.043 us | }
file_writer-10549 [001] 1291593.100428: funcgraph_entry: 0.185 us | recalc_bh_state();
file_writer-10549 [001] 1291593.100428: funcgraph_exit: 2.003 us | }
file_writer-10549 [001] 1291593.100428: funcgraph_entry: 0.137 us | set_bh_page();
file_writer-10549 [001] 1291593.100428: funcgraph_exit: 2.701 us | }
file_writer-10549 [001] 1291593.100429: funcgraph_entry: 0.271 us | _spin_lock();
file_writer-10549 [001] 1291593.100429: funcgraph_exit: 3.751 us | }
file_writer-10549 [001] 1291593.100429: funcgraph_entry: | xfs_get_blocks() {
file_writer-10549 [001] 1291593.100430: funcgraph_entry: | __xfs_get_blocks() {
file_writer-10549 [001] 1291593.100430: funcgraph_entry: | xfs_iomap() {
file_writer-10549 [001] 1291593.100430: funcgraph_entry: | xfs_ilock() {
file_writer-10549 [001] 1291593.100430: funcgraph_entry: | down_write() {
file_writer-10549 [001] 1291593.100431: funcgraph_entry: 0.241 us | _cond_resched();
file_writer-10549 [001] 1291593.100431: funcgraph_entry: 0.258 us | _spin_lock_irq();
file_writer-10549 [001] 1291593.100432: funcgraph_entry: | schedule() {
file_writer-10549 [001] 1291593.100432: funcgraph_entry: 0.236 us | rcu_sched_qs();
file_writer-10549 [001] 1291593.100432: funcgraph_entry: 0.152 us | _spin_lock_irq();
file_writer-10549 [001] 1291593.100433: funcgraph_entry: | deactivate_task() {
file_writer-10549 [001] 1291593.100433: funcgraph_entry: | dequeue_task() {
file_writer-10549 [001] 1291593.100433: funcgraph_entry: | dequeue_task_fair() {
file_writer-10549 [001] 1291593.100433: funcgraph_entry: | dequeue_entity() {
file_writer-10549 [001] 1291593.100434: funcgraph_entry: | update_curr() {
file_writer-10549 [001] 1291593.100434: funcgraph_entry: 0.351 us | task_of();
file_writer-10549 [001] 1291593.100434: funcgraph_exit: 0.862 us | }
file_writer-10549 [001] 1291593.100435: funcgraph_entry: 0.139 us | task_of();
file_writer-10549 [001] 1291593.100435: funcgraph_exit: 1.462 us | }
file_writer-10549 [001] 1291593.100435: funcgraph_entry: 0.159 us | hrtick_start_fair();
file_writer-10549 [001] 1291593.100435: funcgraph_exit: 2.188 us | }
file_writer-10549 [001] 1291593.100435: funcgraph_exit: 2.627 us | }
file_writer-10549 [001] 1291593.100436: funcgraph_exit: 2.934 us | }
file_writer-10549 [001] 1291593.100436: funcgraph_entry: 0.138 us | update_shares_locked();
file_writer-10549 [001] 1291593.100436: funcgraph_entry: | find_busiest_group() {
file_writer-10549 [001] 1291593.100436: funcgraph_entry: 0.144 us | default_scale_smt_power();
file_writer-10549 [001] 1291593.100437: funcgraph_entry: | scale_rt_power() {
file_writer-10549 [001] 1291593.100437: funcgraph_entry: 0.140 us | sched_avg_update();
file_writer-10549 [001] 1291593.100437: funcgraph_exit: 0.413 us | }
file_writer-10549 [001] 1291593.100437: funcgraph_exit: 1.290 us | }
file_writer-10549 [001] 1291593.100438: funcgraph_entry: 0.137 us | msecs_to_jiffies();
file_writer-10549 [001] 1291593.100438: funcgraph_entry: 0.140 us | update_shares_locked();
file_writer-10549 [001] 1291593.100438: funcgraph_entry: 0.502 us | find_busiest_group();
file_writer-10549 [001] 1291593.100439: funcgraph_entry: 0.140 us | msecs_to_jiffies();
file_writer-10549 [001] 1291593.100439: funcgraph_entry: 0.141 us | update_shares_locked();
file_writer-10549 [001] 1291593.100439: funcgraph_entry: 0.706 us | find_busiest_group();
file_writer-10549 [001] 1291593.100440: funcgraph_entry: 0.598 us | find_busiest_queue();
file_writer-10549 [001] 1291593.100441: funcgraph_entry: 0.137 us | msecs_to_jiffies();
file_writer-10549 [001] 1291593.100441: funcgraph_entry: 0.155 us | put_prev_task_fair();
file_writer-10549 [001] 1291593.100442: funcgraph_entry: 0.152 us | pick_next_task_fair();
file_writer-10549 [001] 1291593.100442: funcgraph_entry: 0.144 us | pick_next_task_rt();
file_writer-10549 [001] 1291593.100442: funcgraph_entry: 0.149 us | pick_next_task_fair();
file_writer-10549 [001] 1291593.100442: funcgraph_entry: 0.203 us | pick_next_task_idle();
file_writer-10549 [001] 1291593.100443: funcgraph_entry: 0.143 us | perf_event_task_sched_out();
file_writer-10549 [001] 1291593.100443: funcgraph_entry: | ftrace_raw_event_sched_switch() {
file_writer-10549 [001] 1291593.100443: sched_switch: task file_writer:10549 [120] (D) ==> swapper:0 [120]
...
Finally, xfslogd/1 wakes up the flush-8:0 process, which releases the
xfs_ilock(), waking up my file_writer process and allowing it to
continue.
...
xfslogd/1-213 [001] 1291593.158606: sched_wakeup: task flush-8:0:601 [120] success=1 [011]
<idle>-0 [011] 1291593.158609: sched_switch: task swapper:0 [120] (R) ==> flush-8:0:601 [120]
flush-8:0-601 [011] 1291593.158623: funcgraph_entry: | xfs_iunlock() {
flush-8:0-601 [011] 1291593.158623: funcgraph_entry: | up_write() {
flush-8:0-601 [011] 1291593.158623: funcgraph_entry: 0.266 us | _spin_lock_irqsave();
flush-8:0-601 [011] 1291593.158624: funcgraph_entry: | wake_up_process() {
flush-8:0-601 [011] 1291593.158624: funcgraph_entry: | try_to_wake_up() {
flush-8:0-601 [011] 1291593.158625: funcgraph_entry: | task_rq_lock() {
flush-8:0-601 [011] 1291593.158625: funcgraph_entry: 0.159 us | _spin_lock();
flush-8:0-601 [011] 1291593.158625: funcgraph_exit: 0.763 us | }
flush-8:0-601 [011] 1291593.158626: funcgraph_entry: 0.237 us | _spin_unlock_irqrestore();
flush-8:0-601 [011] 1291593.158626: funcgraph_entry: | select_task_rq_fair() {
flush-8:0-601 [011] 1291593.158627: funcgraph_entry: 0.231 us | update_shares();
flush-8:0-601 [011] 1291593.158627: funcgraph_entry: 0.162 us | effective_load();
flush-8:0-601 [011] 1291593.158628: funcgraph_entry: 0.141 us | effective_load();
flush-8:0-601 [011] 1291593.158628: funcgraph_exit: 1.623 us | }
flush-8:0-601 [011] 1291593.158628: funcgraph_entry: | task_rq_lock() {
flush-8:0-601 [011] 1291593.158628: funcgraph_entry: 0.142 us | _spin_lock();
flush-8:0-601 [011] 1291593.158628: funcgraph_exit: 0.420 us | }
flush-8:0-601 [011] 1291593.158629: funcgraph_entry: | activate_task() {
flush-8:0-601 [011] 1291593.158629: funcgraph_entry: | enqueue_task() {
flush-8:0-601 [011] 1291593.158629: funcgraph_entry: | enqueue_task_fair() {
flush-8:0-601 [011] 1291593.158629: funcgraph_entry: | enqueue_entity() {
flush-8:0-601 [011] 1291593.158630: funcgraph_entry: 0.277 us | update_curr();
flush-8:0-601 [011] 1291593.158630: funcgraph_entry: 0.159 us | place_entity();
flush-8:0-601 [011] 1291593.158630: funcgraph_entry: 0.137 us | task_of();
flush-8:0-601 [011] 1291593.158631: funcgraph_entry: 0.172 us | __enqueue_entity();
flush-8:0-601 [011] 1291593.158631: funcgraph_exit: 1.654 us | }
flush-8:0-601 [011] 1291593.158631: funcgraph_exit: 2.062 us | }
flush-8:0-601 [011] 1291593.158631: funcgraph_exit: 2.444 us | }
flush-8:0-601 [011] 1291593.158631: funcgraph_exit: 2.760 us | }
flush-8:0-601 [011] 1291593.158632: funcgraph_entry: | ftrace_raw_event_sched_wakeup() {
flush-8:0-601 [011] 1291593.158632: sched_wakeup: task file_writer:10549 [120] success=1 [001]
...
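To check my understanding of the interaction, here is a user space
analogue of what I think is happening. This is obviously just a
pthread sketch and not the kernel code, but both paths above take the
rwsem exclusively (down_write), so the faulting writer has to wait for
the full I/O time:

#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_rwlock_t ilock = PTHREAD_RWLOCK_INITIALIZER;

static void *flusher(void *arg)
{
        pthread_rwlock_wrlock(&ilock);  /* xfs_ilock() in writeback */
        usleep(900 * 1000);             /* simulate waiting on the I/O */
        pthread_rwlock_unlock(&ilock);  /* xfs_iunlock() */
        return NULL;
}

int main(void)
{
        pthread_t t;

        pthread_create(&t, NULL, flusher, NULL);
        usleep(10 * 1000);              /* let the flusher take the lock */

        /* the page_mkwrite path wants the same lock exclusively */
        pthread_rwlock_wrlock(&ilock);
        printf("write finally allowed\n");  /* ~890 ms later */
        pthread_rwlock_unlock(&ilock);

        pthread_join(t, NULL);
        return 0;
}

The time blocked in the second wrlock is the length of the simulated
I/O, which matches the roughly one second gap between the two traces
above.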
Looking back at the 2.6.18-128.7.1.el5 kernel I see that the old
kernel provided the following vm operations:
static struct vm_operations_struct xfs_file_vm_ops = {
        .nopage         = filemap_nopage,
        .populate       = filemap_populate,
};
Where the newer CentOS 5.4/5.5 kernels provide:

static struct vm_operations_struct xfs_file_vm_ops = {
        .nopage         = filemap_nopage,
        .populate       = filemap_populate,
        .page_mkwrite   = xfs_vm_page_mkwrite,
};

and 2.6.32 provides:

static const struct vm_operations_struct xfs_file_vm_ops = {
        .fault          = filemap_fault,
        .page_mkwrite   = xfs_vm_page_mkwrite,
};
So I can see that xfs_vm_page_mkwrite is a new addition, and it is
what is causing the stalls on the newer kernels. I'll admit I know
very little about file systems or the implications of these changes.
My question in general is: what can we do to eliminate these
latencies?
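For example, would it help to kick writeback ourselves from user space
at points where a stall is tolerable, so that pdflush never finds a
batch of expired pages to flush? A rough, untested sketch of what I
mean, using sync_file_range(2) (the file name and range are made up):

#define _GNU_SOURCE
#include <fcntl.h>
#include <unistd.h>

/* Untested idea: after writing a chunk through the mapping, start
 * asynchronous writeback on it ourselves instead of letting pdflush
 * batch up dirty_expire_centisecs worth of dirty pages. */
static void kick_writeback(int fd, off_t start, off_t len)
{
        /* SYNC_FILE_RANGE_WRITE starts writeback without waiting for
         * it to complete. */
        sync_file_range(fd, start, len, SYNC_FILE_RANGE_WRITE);
}

int main(void)
{
        int fd = open("/xfs/testfile", O_RDWR);
        if (fd >= 0) {
                kick_writeback(fd, 0, 1500);  /* illustrative range */
                close(fd);
        }
        return 0;
}

I have no idea whether this would actually avoid the xfs_ilock being
held across the I/O while we fault, so any guidance is appreciated.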
Thanks,
Shawn