All of lore.kernel.org
 help / color / mirror / Atom feed
From: Marcelo Tosatti <mtosatti@redhat.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Jan Kara <jack@suse.cz>, David Chinner <dgc@sgi.com>,
	lkml <linux-kernel@vger.kernel.org>, marcelo <marcelo@kvack.org>
Subject: Re: BUG: ext3 hang in transaction commit
Date: Thu, 3 Apr 2008 13:15:00 -0300	[thread overview]
Message-ID: <20080403161500.GA21434@dmt> (raw)
In-Reply-To: <1207235335.8514.843.camel@twins>

On Thu, Apr 03, 2008 at 05:08:55PM +0200, Peter Zijlstra wrote:
> 
> Marcelo has seen something similar, kjournald got stuck and the rest of
> userspace piled on top of it.
> 
> The softlockup watchdog triggered because kjournald got stuck in D state
> for over 2 minutes. However Marcelo told me that it did get going
> eventually...
> 
> He had a rather easy to reproduce case IIRC, I'll let him tell you more
> about it :-)

Run iozone (-t 4) with a sufficiently large file size under a KVM guest
with a disk file residing in an ext3 filesystem with ordered data mode.

There's nothing special in this testcase, boils down to the QEMU process
writing data to a file.

Issue is not seen under "writeback" mode. While fsync latency in ordered
data mode is known to be bad, _2 minutes_ feels more like a DoS rather
than latency issue to me.

vmstat 2 output on the host shows that the first write pass (initial
writers and rewriters) blasts at full disk speed:

 0  2      0 6552980 748972 662876    0    0    58 69838  322 9912  5  6 78 10  0
 1  1      0 6493620 779452 693596    0    0    10 54884  417 7032  1  3 78 17  0
 1  2      0 6322968 870556 772156    0    0    58 43042  379 12651  4  7 77 12  0
 1  1      0 6248416 915004 806428    0    0    20 37600  367 7800  2  4 71 23  0
 1  0      0 5981148 1070300 906120    0    0    88    64  243 19734  8 10 80  1  0
 0  4      0 5875992 1165220 906700    0    0    80 62620  332 13192  5  7 76 13  0
 0  4      0 5867244 1177440 906724    0    0    10 66728  385 2753  1  2 66 31  0
 0  2      0 5792912 1252668 907900    0    0    86 26288  401 12130  4  6 72 18  0
 2  0      0 5760280 1290860 908092    0    0    18 34704  487 7130  2  3 80 15  0

But in the "mixed workload" stage write throughtput slows down drastically:

 0  2      0 5666016 468036 1823052    0    0     0  1248  433 8236  5  5 83  7  0
 0  2      0 5666016 468036 1823052    0    0     0  3162  899  679  0  0 78 22  0
 0  2      0 5666016 468036 1823052    0    0     0  3290  864  829  0  0 76 24  0
 0  2      0 5666024 468036 1823052    0    0     0  3414  925 2170  1  1 79 19  0
 1  1      0 5665528 468036 1823052    0    0     0  3148  854 7935  6  6 83  4  0
 0  2      0 5665024 468036 1823052    0    0     0  2570  799 4065  3  4 80 13  0
 0  2      0 5665024 468036 1823052    0    0     0  2462  779  464  0  0 88 12  0
 0  2      0 5664908 468036 1823052    0    0     0  2458  794 1028  0  0 88 12  0
 1  1      0 5664908 468036 1823052    0    0     0  2548  799 4153  2  2 87  9  0

Which can be seen in the iozone performance inside the guest:

        Children see throughput for 4 mixed workload    =  106073.50 KB/sec
        Parent sees throughput for 4 mixed workload     =    2560.13 KB/sec
        Min throughput per process                      =    3700.33 KB/sec
        Max throughput per process                      =   66393.62 KB/sec
        Avg throughput per process                      =   26518.38 KB/sec
        Min xfer                                        =   44424.00 KB

And this is when fsync becomes nasty.

blktrace output shows that the maximum latency for a single write
request io complete is 1.5 seconds, which is similar to what is seen
under "writeback" mode.

I reduced hung_task_timeout_secs to 30 for this report, but vim and
rsyslogd have been seen hung up to 120 seconds.

INFO: task vim:4476 blocked for more than 30 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
vim           D 00000001000e3c44     0  4476   3591
 ffff8101da80dd28 0000000000000046 0000000000000000 00000000001cc549
 ffff8101fe1e8a00 ffff81022f20e740 ffff8101fe1e8c30 0000000600000246
 00000000ffffffff 0000000000000246 0000000000000000 0000000000000000
Call Trace:
 [<ffffffff8030985b>] log_wait_commit+0xa9/0xfe
 [<ffffffff80242347>] ? autoremove_wake_function+0x0/0x38
 [<ffffffff80304c19>] journal_stop+0x1a4/0x1ed
 [<ffffffff803056c6>] journal_force_commit+0x23/0x25
 [<ffffffff802fe337>] ext3_force_commit+0x26/0x28
 [<ffffffff802f8756>] ext3_write_inode+0x39/0x3f
 [<ffffffff802a58e9>] __writeback_single_inode+0x18c/0x2ab
 [<ffffffff802a5a2c>] sync_inode+0x24/0x39
 [<ffffffff802f4baf>] ext3_sync_file+0x8f/0xa0
 [<ffffffff802a892d>] do_fsync+0x56/0xac
 [<ffffffff802a89b1>] __do_fsync+0x2e/0x44
 [<ffffffff802a89e2>] sys_fsync+0xb/0xd
 [<ffffffff8020b21b>] system_call_after_swapgs+0x7b/0x80

2 locks held by vim/4476:
 #0:  (&sb->s_type->i_mutex_key#9){--..}, at: [<ffffffff802a891c>] do_fsync+0x45/0xac
 #1:  (jbd_handle){--..}, at: [<ffffffff8030567c>] journal_start+0xef/0x116

2 locks held by rsyslogd/3266:
 #0:  (&sb->s_type->i_mutex_key#9){--..}, at: [<ffffffff802a891c>] do_fsync+0x45/0xac
 #1:  (jbd_handle){--..}, at: [<ffffffff8030567c>] journal_start+0xef/0x116
INFO: task rklogd:3269 blocked for more than 30 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
rklogd        D 00000001000fbb55     0  3269      1
 ffff8102280b9d28 0000000000000046 0000000000000000 00000000001cc553
 ffff81022d9b0b40 ffff81022f196540 ffff81022d9b0d70 0000000400000246
 00000000ffffffff 0000000000000246 0000000000000000 0000000000000000
Call Trace:
 [<ffffffff8030985b>] log_wait_commit+0xa9/0xfe
 [<ffffffff80242347>] ? autoremove_wake_function+0x0/0x38
 [<ffffffff80304c19>] journal_stop+0x1a4/0x1ed
 [<ffffffff803056c6>] journal_force_commit+0x23/0x25
 [<ffffffff802fe337>] ext3_force_commit+0x26/0x28
 [<ffffffff802f8756>] ext3_write_inode+0x39/0x3f
 [<ffffffff802a58e9>] __writeback_single_inode+0x18c/0x2ab
 [<ffffffff802a5a2c>] sync_inode+0x24/0x39
 [<ffffffff802f4baf>] ext3_sync_file+0x8f/0xa0
 [<ffffffff802a892d>] do_fsync+0x56/0xac
 [<ffffffff802a89b1>] __do_fsync+0x2e/0x44
 [<ffffffff802a89e2>] sys_fsync+0xb/0xd
 [<ffffffff8020b21b>] system_call_after_swapgs+0x7b/0x80


qemu-system-x D 00000001000c1809     0  4359   4356
 ffff8102111fd838 0000000000000046 0000000000000000 0000000000000046
 ffff81021101c080 ffff81022f196540 ffff81021101c2b0 0000000400000002
 00000000ffffffff ffffffff8024bc0b 0000000000000000 0000000000000000
Call Trace:
 [<ffffffff8024bc0b>] ? mark_held_locks+0x4a/0x6a
 [<ffffffff805593d1>] io_schedule+0x28/0x34
 [<ffffffff803596dc>] get_request_wait+0x122/0x170
 [<ffffffff8055b685>] ? _spin_unlock_irqrestore+0x3f/0x47
 [<ffffffff80242347>] ? autoremove_wake_function+0x0/0x38
 [<ffffffff80357872>] ? elv_merge+0x1df/0x221
 [<ffffffff80359d27>] __make_request+0x33e/0x3dd
 [<ffffffff80245115>] ? up_read+0x26/0x2a
 [<ffffffff803585ad>] generic_make_request+0x33c/0x377
 [<ffffffff802626f3>] ? mempool_alloc_slab+0x11/0x13
 [<ffffffff803599e0>] submit_bio+0xce/0xd7
 [<ffffffff802a970e>] submit_bh+0xea/0x10e
 [<ffffffff802aca88>] sync_dirty_buffer+0x6e/0xd3
 [<ffffffff80305e90>] journal_dirty_data+0xeb/0x1ce
 [<ffffffff802f6805>] ext3_journal_dirty_data+0x18/0x41
 [<ffffffff802f5b74>] walk_page_buffers+0x67/0x8e
 [<ffffffff802f67ed>] ? ext3_journal_dirty_data+0x0/0x41
 [<ffffffff802f84ad>] ext3_ordered_write_end+0x73/0x110
 [<ffffffff8026117d>] generic_file_buffered_write+0x1c2/0x63f
 [<ffffffff8024bda1>] ? trace_hardirqs_on+0xf1/0x115
 [<ffffffff80233eed>] ? current_fs_time+0x22/0x29
 [<ffffffff80261953>] __generic_file_aio_write_nolock+0x359/0x3c3
 [<ffffffff80249d4d>] ? debug_mutex_free_waiter+0x46/0x4a
 [<ffffffff80261a23>] generic_file_aio_write+0x66/0xc2
 [<ffffffff802f4a1e>] ext3_file_write+0x1e/0x9e
 [<ffffffff80289912>] do_sync_write+0xe2/0x126
 [<ffffffff80242347>] ? autoremove_wake_function+0x0/0x38
 [<ffffffff8024c6fa>] ? __lock_acquire+0x4e0/0xc0e
 [<ffffffff80246881>] ? getnstimeofday+0x31/0x88
 [<ffffffff8028a09b>] vfs_write+0xae/0x137
 [<ffffffff8028a6f9>] sys_pwrite64+0x55/0x76
 [<ffffffff8020b21b>] system_call_after_swapgs+0x7b/0x80


kjournald     D 00000001000d0612     0  1252      2
 ffff81022c66fcf0 0000000000000046 0000000000000000 ffff81022c4b6280
 ffff81022e90e2c0 ffff81022f196540 ffff81022e90e4f0 000000042c66fd80
 00000000ffffffff ffffffff8805d272 0000000000000000 0000000000000000
Call Trace:
 [<ffffffff8805d272>] ? :dm_mod:dm_unplug_all+0x2d/0x31
 [<ffffffff805593d1>] io_schedule+0x28/0x34
 [<ffffffff802aa7e1>] sync_buffer+0x3e/0x42
 [<ffffffff8055962e>] __wait_on_bit+0x45/0x77
 [<ffffffff802aa7a3>] ? sync_buffer+0x0/0x42
 [<ffffffff802aa7a3>] ? sync_buffer+0x0/0x42
 [<ffffffff805596ce>] out_of_line_wait_on_bit+0x6e/0x7b
 [<ffffffff8024237f>] ? wake_bit_function+0x0/0x2a
 [<ffffffff802aa72b>] __wait_on_buffer+0x1f/0x21
 [<ffffffff80306a02>] journal_commit_transaction+0x516/0xe70
 [<ffffffff8055b685>] ? _spin_unlock_irqrestore+0x3f/0x47
 [<ffffffff8024bda1>] ? trace_hardirqs_on+0xf1/0x115

As Peter mentioned it eventually gets out of this state (after several minutes) and fsync
instances complete.



  reply	other threads:[~2008-04-03 16:12 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-03-26 23:16 BUG: ext3 hang in transaction commit David Chinner
2008-04-03 10:07 ` Jan Kara
2008-04-03 15:08   ` Peter Zijlstra
2008-04-03 16:15     ` Marcelo Tosatti [this message]
2008-04-04 10:34       ` Jan Kara
2008-04-04 15:02         ` Marcelo Tosatti
2008-04-04  1:27   ` David Chinner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20080403161500.GA21434@dmt \
    --to=mtosatti@redhat.com \
    --cc=dgc@sgi.com \
    --cc=jack@suse.cz \
    --cc=linux-kernel@vger.kernel.org \
    --cc=marcelo@kvack.org \
    --cc=peterz@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.