From: Mel Gorman <mgorman@suse.de>
To: Theodore Ts'o <tytso@mit.edu>
Cc: linux-ext4@vger.kernel.org, LKML <linux-kernel@vger.kernel.org>,
Linux-MM <linux-mm@kvack.org>, Jiri Slaby <jslaby@suse.cz>
Subject: Re: Excessive stall times on ext4 in 3.9-rc2
Date: Thu, 11 Apr 2013 18:04:02 +0100 [thread overview]
Message-ID: <20130411170402.GB11656@suse.de> (raw)
In-Reply-To: <20130410131245.GC4862@thunk.org>
On Wed, Apr 10, 2013 at 09:12:45AM -0400, Theodore Ts'o wrote:
> On Wed, Apr 10, 2013 at 11:56:08AM +0100, Mel Gorman wrote:
> > During major activity there is likely to be "good" behaviour
> > with stalls roughly every 30 seconds roughly corresponding to
> > dirty_expire_centiseconds. As you'd expect, the flusher thread is stuck
> > when this happens.
> >
> > 237 ? 00:00:00 flush-8:0
> > [<ffffffff811a35b9>] sleep_on_buffer+0x9/0x10
> > [<ffffffff811a35ee>] __lock_buffer+0x2e/0x30
> > [<ffffffff8123a21f>] do_get_write_access+0x43f/0x4b0
>
> If we're stalling on lock_buffer(), that implies that buffer was being
> written, and for some reason it was taking a very long time to
> complete.
>
Yes.
> It might be worthwhile to put a timestamp in struct dm_crypt_io, and
> record the time when a particular I/O encryption/decryption is getting
> queued to the kcryptd workqueues, and when they finally squirt out.
>
That somewhat assumes that dm_crypt was at fault which is not unreasonable
but I was skeptical as the workload on dm_crypt was opening a maildir
and mostly reads.
I used a tracepoint in jbd2 to get an idea of what device the buffer_head
was managing and dm did not show up on the list. This is what a trace-cmd
log of the test told me.
flush-8:0-240 [005] 236.655363: jbd2_lock_buffer_stall: dev 8,8 stall_ms 1096
awesome-1364 [005] 290.594396: jbd2_lock_buffer_stall: dev 8,6 stall_ms 7312
gnome-pty-helpe-2256 [005] 290.836952: jbd2_lock_buffer_stall: dev 8,8 stall_ms 7528
flush-8:0-240 [003] 304.012424: jbd2_lock_buffer_stall: dev 8,8 stall_ms 4472
gnome-terminal-2332 [005] 308.290879: jbd2_lock_buffer_stall: dev 8,6 stall_ms 3060
awesome-1364 [006] 308.291318: jbd2_lock_buffer_stall: dev 8,6 stall_ms 3048
flush-8:0-240 [005] 331.525996: jbd2_lock_buffer_stall: dev 8,5 stall_ms 8732
flush-8:0-240 [005] 332.353526: jbd2_lock_buffer_stall: dev 8,5 stall_ms 472
flush-8:0-240 [005] 345.341547: jbd2_lock_buffer_stall: dev 8,5 stall_ms 10024
gnome-terminal-2418 [005] 347.166876: jbd2_lock_buffer_stall: dev 8,6 stall_ms 11852
awesome-1364 [005] 347.167082: jbd2_lock_buffer_stall: dev 8,6 stall_ms 11844
flush-8:0-240 [005] 347.424520: jbd2_lock_buffer_stall: dev 8,5 stall_ms 2012
flush-8:0-240 [005] 347.583752: jbd2_lock_buffer_stall: dev 8,5 stall_ms 156
flush-8:0-240 [005] 390.079682: jbd2_lock_buffer_stall: dev 8,8 stall_ms 396
flush-8:0-240 [002] 407.882385: jbd2_lock_buffer_stall: dev 8,8 stall_ms 12244
flush-8:0-240 [005] 408.003976: jbd2_lock_buffer_stall: dev 8,8 stall_ms 124
gnome-terminal-2610 [005] 413.613365: jbd2_lock_buffer_stall: dev 8,6 stall_ms 3400
awesome-1364 [006] 413.613605: jbd2_lock_buffer_stall: dev 8,6 stall_ms 3736
flush-8:0-240 [002] 430.706475: jbd2_lock_buffer_stall: dev 8,5 stall_ms 9748
flush-8:0-240 [005] 458.188896: jbd2_lock_buffer_stall: dev 8,5 stall_ms 7748
flush-8:0-240 [005] 458.828143: jbd2_lock_buffer_stall: dev 8,5 stall_ms 348
flush-8:0-240 [006] 459.163814: jbd2_lock_buffer_stall: dev 8,5 stall_ms 252
flush-8:0-240 [005] 462.340173: jbd2_lock_buffer_stall: dev 8,5 stall_ms 3160
flush-8:0-240 [005] 469.917705: jbd2_lock_buffer_stall: dev 8,5 stall_ms 6340
flush-8:0-240 [005] 474.434206: jbd2_lock_buffer_stall: dev 8,5 stall_ms 4512
tar-2315 [005] 510.043613: jbd2_lock_buffer_stall: dev 8,5 stall_ms 4316
tclsh-1780 [005] 773.336488: jbd2_lock_buffer_stall: dev 8,5 stall_ms 736
git-3100 [005] 775.933506: jbd2_lock_buffer_stall: dev 8,5 stall_ms 3664
git-4763 [005] 864.093317: jbd2_lock_buffer_stall: dev 8,5 stall_ms 140
flush-8:0-240 [005] 864.242068: jbd2_lock_buffer_stall: dev 8,6 stall_ms 280
git-4763 [005] 864.264157: jbd2_lock_buffer_stall: dev 8,5 stall_ms 148
flush-8:0-240 [005] 865.200004: jbd2_lock_buffer_stall: dev 8,5 stall_ms 464
git-4763 [000] 865.602469: jbd2_lock_buffer_stall: dev 8,5 stall_ms 300
flush-8:0-240 [005] 865.705448: jbd2_lock_buffer_stall: dev 8,5 stall_ms 500
flush-8:0-240 [005] 885.367576: jbd2_lock_buffer_stall: dev 8,8 stall_ms 11024
flush-8:0-240 [005] 895.339697: jbd2_lock_buffer_stall: dev 8,5 stall_ms 120
flush-8:0-240 [005] 895.765488: jbd2_lock_buffer_stall: dev 8,5 stall_ms 424
systemd-journal-265 [005] 915.687201: jbd2_lock_buffer_stall: dev 8,8 stall_ms 14844
flush-8:0-240 [005] 915.690529: jbd2_lock_buffer_stall: dev 8,6 stall_ms 19656
git-5442 [005] 1034.845674: jbd2_lock_buffer_stall: dev 8,5 stall_ms 344
git-5442 [005] 1035.157389: jbd2_lock_buffer_stall: dev 8,5 stall_ms 264
flush-8:0-240 [005] 1035.875478: jbd2_lock_buffer_stall: dev 8,8 stall_ms 1368
flush-8:0-240 [005] 1036.189218: jbd2_lock_buffer_stall: dev 8,8 stall_ms 312
gnome-terminal-5592 [005] 1037.318594: jbd2_lock_buffer_stall: dev 8,6 stall_ms 2628
awesome-1364 [000] 1037.318913: jbd2_lock_buffer_stall: dev 8,6 stall_ms 2632
git-5789 [005] 1076.805405: jbd2_lock_buffer_stall: dev 8,5 stall_ms 184
flush-8:0-240 [005] 1078.401721: jbd2_lock_buffer_stall: dev 8,5 stall_ms 700
flush-8:0-240 [005] 1078.784200: jbd2_lock_buffer_stall: dev 8,5 stall_ms 356
git-5789 [005] 1079.722683: jbd2_lock_buffer_stall: dev 8,5 stall_ms 1452
flush-8:0-240 [005] 1109.928552: jbd2_lock_buffer_stall: dev 8,5 stall_ms 976
flush-8:0-240 [005] 1111.762280: jbd2_lock_buffer_stall: dev 8,5 stall_ms 1832
flush-8:0-240 [005] 1260.197720: jbd2_lock_buffer_stall: dev 8,5 stall_ms 344
flush-8:0-240 [005] 1260.403556: jbd2_lock_buffer_stall: dev 8,5 stall_ms 204
flush-8:0-240 [005] 1260.550904: jbd2_lock_buffer_stall: dev 8,5 stall_ms 108
git-6598 [005] 1260.832948: jbd2_lock_buffer_stall: dev 8,5 stall_ms 1084
flush-8:0-240 [005] 1311.736367: jbd2_lock_buffer_stall: dev 8,5 stall_ms 260
flush-8:0-240 [005] 1313.689297: jbd2_lock_buffer_stall: dev 8,5 stall_ms 412
flush-8:0-240 [005] 1314.230420: jbd2_lock_buffer_stall: dev 8,5 stall_ms 540
git-7022 [006] 1314.241607: jbd2_lock_buffer_stall: dev 8,5 stall_ms 668
flush-8:0-240 [000] 1347.980425: jbd2_lock_buffer_stall: dev 8,5 stall_ms 308
flush-8:0-240 [005] 1348.164598: jbd2_lock_buffer_stall: dev 8,5 stall_ms 104
git-7998 [005] 1547.755328: jbd2_lock_buffer_stall: dev 8,5 stall_ms 304
flush-8:0-240 [006] 1547.764209: jbd2_lock_buffer_stall: dev 8,5 stall_ms 208
flush-8:0-240 [005] 1548.653365: jbd2_lock_buffer_stall: dev 8,5 stall_ms 844
flush-8:0-240 [005] 1549.255022: jbd2_lock_buffer_stall: dev 8,5 stall_ms 460
flush-8:0-240 [005] 1725.036408: jbd2_lock_buffer_stall: dev 8,6 stall_ms 156
git-8743 [005] 1740.492630: jbd2_lock_buffer_stall: dev 8,5 stall_ms 15032
git-8743 [005] 1749.485214: jbd2_lock_buffer_stall: dev 8,5 stall_ms 8648
flush-8:0-240 [005] 1775.937819: jbd2_lock_buffer_stall: dev 8,5 stall_ms 4268
flush-8:0-240 [006] 1776.335682: jbd2_lock_buffer_stall: dev 8,5 stall_ms 336
flush-8:0-240 [006] 1776.446799: jbd2_lock_buffer_stall: dev 8,5 stall_ms 112
flush-8:0-240 [005] 1802.593183: jbd2_lock_buffer_stall: dev 8,6 stall_ms 108
flush-8:0-240 [006] 1802.809237: jbd2_lock_buffer_stall: dev 8,8 stall_ms 208
flush-8:0-240 [005] 2012.041976: jbd2_lock_buffer_stall: dev 8,6 stall_ms 292
tclsh-1778 [005] 2012.055139: jbd2_lock_buffer_stall: dev 8,5 stall_ms 424
latency-output-1933 [002] 2012.055147: jbd2_lock_buffer_stall: dev 8,5 stall_ms 136
git-10209 [005] 2012.074584: jbd2_lock_buffer_stall: dev 8,5 stall_ms 164
flush-8:0-240 [005] 2012.177241: jbd2_lock_buffer_stall: dev 8,5 stall_ms 128
git-10209 [005] 2012.297472: jbd2_lock_buffer_stall: dev 8,5 stall_ms 216
flush-8:0-240 [005] 2012.299828: jbd2_lock_buffer_stall: dev 8,5 stall_ms 120
dm is not obviously at fault there. sda5 is /usr/src (git checkout
running there with some logging), sda6 is /home and sda8 is / . This is
the tracepoint patch used.
---8<---
jbd2: Trace when lock_buffer at the start of a journal write takes a long time
While investigating interactivity problems it was clear that processes
sometimes stall for long periods of times if an attempt is made to lock
a buffer that is already part of a transaction. It would stall in a
trace looking something like
[<ffffffff811a39de>] __lock_buffer+0x2e/0x30
[<ffffffff8123a60f>] do_get_write_access+0x43f/0x4b0
[<ffffffff8123a7cb>] jbd2_journal_get_write_access+0x2b/0x50
[<ffffffff81220f79>] __ext4_journal_get_write_access+0x39/0x80
[<ffffffff811f3198>] ext4_reserve_inode_write+0x78/0xa0
[<ffffffff811f3209>] ext4_mark_inode_dirty+0x49/0x220
[<ffffffff811f57d1>] ext4_dirty_inode+0x41/0x60
[<ffffffff8119ac3e>] __mark_inode_dirty+0x4e/0x2d0
[<ffffffff8118b9b9>] update_time+0x79/0xc0
[<ffffffff8118ba98>] file_update_time+0x98/0x100
[<ffffffff81110ffc>] __generic_file_aio_write+0x17c/0x3b0
[<ffffffff811112aa>] generic_file_aio_write+0x7a/0xf0
[<ffffffff811ea853>] ext4_file_write+0x83/0xd0
[<ffffffff81172b23>] do_sync_write+0xa3/0xe0
[<ffffffff811731ae>] vfs_write+0xae/0x180
[<ffffffff8117361d>] sys_write+0x4d/0x90
[<ffffffff8159d62d>] system_call_fastpath+0x1a/0x1f
[<ffffffffffffffff>] 0xffffffffffffffff
There was a suspicion that dm_crypt might be part responsible so this
patch adds a tracepoint capturing when lock_buffer takes too long
in do_get_write_access() that logs what device is being written and
how long the stall was for.
Signed-off-by: Mel Gorman <mgorman@suse.de>
---
fs/jbd2/transaction.c | 8 ++++++++
include/trace/events/jbd2.h | 21 +++++++++++++++++++++
2 files changed, 29 insertions(+)
diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c
index 325bc01..1be0ccb 100644
--- a/fs/jbd2/transaction.c
+++ b/fs/jbd2/transaction.c
@@ -640,6 +640,7 @@ do_get_write_access(handle_t *handle, struct journal_head *jh,
int error;
char *frozen_buffer = NULL;
int need_copy = 0;
+ unsigned long start_lock, time_lock;
if (is_handle_aborted(handle))
return -EROFS;
@@ -655,9 +656,16 @@ repeat:
/* @@@ Need to check for errors here at some point. */
+ start_lock = jiffies;
lock_buffer(bh);
jbd_lock_bh_state(bh);
+ /* If it takes too long to lock the buffer, trace it */
+ time_lock = jbd2_time_diff(start_lock, jiffies);
+ if (time_lock > HZ/10)
+ trace_jbd2_lock_buffer_stall(bh->b_bdev->bd_dev,
+ jiffies_to_msecs(time_lock));
+
/* We now hold the buffer lock so it is safe to query the buffer
* state. Is the buffer dirty?
*
diff --git a/include/trace/events/jbd2.h b/include/trace/events/jbd2.h
index 070df49..c1d1f3e 100644
--- a/include/trace/events/jbd2.h
+++ b/include/trace/events/jbd2.h
@@ -358,6 +358,27 @@ TRACE_EVENT(jbd2_write_superblock,
MINOR(__entry->dev), __entry->write_op)
);
+TRACE_EVENT(jbd2_lock_buffer_stall,
+
+ TP_PROTO(dev_t dev, unsigned long stall_ms),
+
+ TP_ARGS(dev, stall_ms),
+
+ TP_STRUCT__entry(
+ __field( dev_t, dev )
+ __field(unsigned long, stall_ms )
+ ),
+
+ TP_fast_assign(
+ __entry->dev = dev;
+ __entry->stall_ms = stall_ms;
+ ),
+
+ TP_printk("dev %d,%d stall_ms %lu",
+ MAJOR(__entry->dev), MINOR(__entry->dev),
+ __entry->stall_ms)
+);
+
#endif /* _TRACE_JBD2_H */
/* This part must be outside protection */
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
WARNING: multiple messages have this Message-ID (diff)
From: Mel Gorman <mgorman@suse.de>
To: "Theodore Ts'o" <tytso@mit.edu>
Cc: linux-ext4@vger.kernel.org, LKML <linux-kernel@vger.kernel.org>,
Linux-MM <linux-mm@kvack.org>, Jiri Slaby <jslaby@suse.cz>
Subject: Re: Excessive stall times on ext4 in 3.9-rc2
Date: Thu, 11 Apr 2013 18:04:02 +0100 [thread overview]
Message-ID: <20130411170402.GB11656@suse.de> (raw)
In-Reply-To: <20130410131245.GC4862@thunk.org>
On Wed, Apr 10, 2013 at 09:12:45AM -0400, Theodore Ts'o wrote:
> On Wed, Apr 10, 2013 at 11:56:08AM +0100, Mel Gorman wrote:
> > During major activity there is likely to be "good" behaviour
> > with stalls roughly every 30 seconds roughly corresponding to
> > dirty_expire_centiseconds. As you'd expect, the flusher thread is stuck
> > when this happens.
> >
> > 237 ? 00:00:00 flush-8:0
> > [<ffffffff811a35b9>] sleep_on_buffer+0x9/0x10
> > [<ffffffff811a35ee>] __lock_buffer+0x2e/0x30
> > [<ffffffff8123a21f>] do_get_write_access+0x43f/0x4b0
>
> If we're stalling on lock_buffer(), that implies that buffer was being
> written, and for some reason it was taking a very long time to
> complete.
>
Yes.
> It might be worthwhile to put a timestamp in struct dm_crypt_io, and
> record the time when a particular I/O encryption/decryption is getting
> queued to the kcryptd workqueues, and when they finally squirt out.
>
That somewhat assumes that dm_crypt was at fault which is not unreasonable
but I was skeptical as the workload on dm_crypt was opening a maildir
and mostly reads.
I used a tracepoint in jbd2 to get an idea of what device the buffer_head
was managing and dm did not show up on the list. This is what a trace-cmd
log of the test told me.
flush-8:0-240 [005] 236.655363: jbd2_lock_buffer_stall: dev 8,8 stall_ms 1096
awesome-1364 [005] 290.594396: jbd2_lock_buffer_stall: dev 8,6 stall_ms 7312
gnome-pty-helpe-2256 [005] 290.836952: jbd2_lock_buffer_stall: dev 8,8 stall_ms 7528
flush-8:0-240 [003] 304.012424: jbd2_lock_buffer_stall: dev 8,8 stall_ms 4472
gnome-terminal-2332 [005] 308.290879: jbd2_lock_buffer_stall: dev 8,6 stall_ms 3060
awesome-1364 [006] 308.291318: jbd2_lock_buffer_stall: dev 8,6 stall_ms 3048
flush-8:0-240 [005] 331.525996: jbd2_lock_buffer_stall: dev 8,5 stall_ms 8732
flush-8:0-240 [005] 332.353526: jbd2_lock_buffer_stall: dev 8,5 stall_ms 472
flush-8:0-240 [005] 345.341547: jbd2_lock_buffer_stall: dev 8,5 stall_ms 10024
gnome-terminal-2418 [005] 347.166876: jbd2_lock_buffer_stall: dev 8,6 stall_ms 11852
awesome-1364 [005] 347.167082: jbd2_lock_buffer_stall: dev 8,6 stall_ms 11844
flush-8:0-240 [005] 347.424520: jbd2_lock_buffer_stall: dev 8,5 stall_ms 2012
flush-8:0-240 [005] 347.583752: jbd2_lock_buffer_stall: dev 8,5 stall_ms 156
flush-8:0-240 [005] 390.079682: jbd2_lock_buffer_stall: dev 8,8 stall_ms 396
flush-8:0-240 [002] 407.882385: jbd2_lock_buffer_stall: dev 8,8 stall_ms 12244
flush-8:0-240 [005] 408.003976: jbd2_lock_buffer_stall: dev 8,8 stall_ms 124
gnome-terminal-2610 [005] 413.613365: jbd2_lock_buffer_stall: dev 8,6 stall_ms 3400
awesome-1364 [006] 413.613605: jbd2_lock_buffer_stall: dev 8,6 stall_ms 3736
flush-8:0-240 [002] 430.706475: jbd2_lock_buffer_stall: dev 8,5 stall_ms 9748
flush-8:0-240 [005] 458.188896: jbd2_lock_buffer_stall: dev 8,5 stall_ms 7748
flush-8:0-240 [005] 458.828143: jbd2_lock_buffer_stall: dev 8,5 stall_ms 348
flush-8:0-240 [006] 459.163814: jbd2_lock_buffer_stall: dev 8,5 stall_ms 252
flush-8:0-240 [005] 462.340173: jbd2_lock_buffer_stall: dev 8,5 stall_ms 3160
flush-8:0-240 [005] 469.917705: jbd2_lock_buffer_stall: dev 8,5 stall_ms 6340
flush-8:0-240 [005] 474.434206: jbd2_lock_buffer_stall: dev 8,5 stall_ms 4512
tar-2315 [005] 510.043613: jbd2_lock_buffer_stall: dev 8,5 stall_ms 4316
tclsh-1780 [005] 773.336488: jbd2_lock_buffer_stall: dev 8,5 stall_ms 736
git-3100 [005] 775.933506: jbd2_lock_buffer_stall: dev 8,5 stall_ms 3664
git-4763 [005] 864.093317: jbd2_lock_buffer_stall: dev 8,5 stall_ms 140
flush-8:0-240 [005] 864.242068: jbd2_lock_buffer_stall: dev 8,6 stall_ms 280
git-4763 [005] 864.264157: jbd2_lock_buffer_stall: dev 8,5 stall_ms 148
flush-8:0-240 [005] 865.200004: jbd2_lock_buffer_stall: dev 8,5 stall_ms 464
git-4763 [000] 865.602469: jbd2_lock_buffer_stall: dev 8,5 stall_ms 300
flush-8:0-240 [005] 865.705448: jbd2_lock_buffer_stall: dev 8,5 stall_ms 500
flush-8:0-240 [005] 885.367576: jbd2_lock_buffer_stall: dev 8,8 stall_ms 11024
flush-8:0-240 [005] 895.339697: jbd2_lock_buffer_stall: dev 8,5 stall_ms 120
flush-8:0-240 [005] 895.765488: jbd2_lock_buffer_stall: dev 8,5 stall_ms 424
systemd-journal-265 [005] 915.687201: jbd2_lock_buffer_stall: dev 8,8 stall_ms 14844
flush-8:0-240 [005] 915.690529: jbd2_lock_buffer_stall: dev 8,6 stall_ms 19656
git-5442 [005] 1034.845674: jbd2_lock_buffer_stall: dev 8,5 stall_ms 344
git-5442 [005] 1035.157389: jbd2_lock_buffer_stall: dev 8,5 stall_ms 264
flush-8:0-240 [005] 1035.875478: jbd2_lock_buffer_stall: dev 8,8 stall_ms 1368
flush-8:0-240 [005] 1036.189218: jbd2_lock_buffer_stall: dev 8,8 stall_ms 312
gnome-terminal-5592 [005] 1037.318594: jbd2_lock_buffer_stall: dev 8,6 stall_ms 2628
awesome-1364 [000] 1037.318913: jbd2_lock_buffer_stall: dev 8,6 stall_ms 2632
git-5789 [005] 1076.805405: jbd2_lock_buffer_stall: dev 8,5 stall_ms 184
flush-8:0-240 [005] 1078.401721: jbd2_lock_buffer_stall: dev 8,5 stall_ms 700
flush-8:0-240 [005] 1078.784200: jbd2_lock_buffer_stall: dev 8,5 stall_ms 356
git-5789 [005] 1079.722683: jbd2_lock_buffer_stall: dev 8,5 stall_ms 1452
flush-8:0-240 [005] 1109.928552: jbd2_lock_buffer_stall: dev 8,5 stall_ms 976
flush-8:0-240 [005] 1111.762280: jbd2_lock_buffer_stall: dev 8,5 stall_ms 1832
flush-8:0-240 [005] 1260.197720: jbd2_lock_buffer_stall: dev 8,5 stall_ms 344
flush-8:0-240 [005] 1260.403556: jbd2_lock_buffer_stall: dev 8,5 stall_ms 204
flush-8:0-240 [005] 1260.550904: jbd2_lock_buffer_stall: dev 8,5 stall_ms 108
git-6598 [005] 1260.832948: jbd2_lock_buffer_stall: dev 8,5 stall_ms 1084
flush-8:0-240 [005] 1311.736367: jbd2_lock_buffer_stall: dev 8,5 stall_ms 260
flush-8:0-240 [005] 1313.689297: jbd2_lock_buffer_stall: dev 8,5 stall_ms 412
flush-8:0-240 [005] 1314.230420: jbd2_lock_buffer_stall: dev 8,5 stall_ms 540
git-7022 [006] 1314.241607: jbd2_lock_buffer_stall: dev 8,5 stall_ms 668
flush-8:0-240 [000] 1347.980425: jbd2_lock_buffer_stall: dev 8,5 stall_ms 308
flush-8:0-240 [005] 1348.164598: jbd2_lock_buffer_stall: dev 8,5 stall_ms 104
git-7998 [005] 1547.755328: jbd2_lock_buffer_stall: dev 8,5 stall_ms 304
flush-8:0-240 [006] 1547.764209: jbd2_lock_buffer_stall: dev 8,5 stall_ms 208
flush-8:0-240 [005] 1548.653365: jbd2_lock_buffer_stall: dev 8,5 stall_ms 844
flush-8:0-240 [005] 1549.255022: jbd2_lock_buffer_stall: dev 8,5 stall_ms 460
flush-8:0-240 [005] 1725.036408: jbd2_lock_buffer_stall: dev 8,6 stall_ms 156
git-8743 [005] 1740.492630: jbd2_lock_buffer_stall: dev 8,5 stall_ms 15032
git-8743 [005] 1749.485214: jbd2_lock_buffer_stall: dev 8,5 stall_ms 8648
flush-8:0-240 [005] 1775.937819: jbd2_lock_buffer_stall: dev 8,5 stall_ms 4268
flush-8:0-240 [006] 1776.335682: jbd2_lock_buffer_stall: dev 8,5 stall_ms 336
flush-8:0-240 [006] 1776.446799: jbd2_lock_buffer_stall: dev 8,5 stall_ms 112
flush-8:0-240 [005] 1802.593183: jbd2_lock_buffer_stall: dev 8,6 stall_ms 108
flush-8:0-240 [006] 1802.809237: jbd2_lock_buffer_stall: dev 8,8 stall_ms 208
flush-8:0-240 [005] 2012.041976: jbd2_lock_buffer_stall: dev 8,6 stall_ms 292
tclsh-1778 [005] 2012.055139: jbd2_lock_buffer_stall: dev 8,5 stall_ms 424
latency-output-1933 [002] 2012.055147: jbd2_lock_buffer_stall: dev 8,5 stall_ms 136
git-10209 [005] 2012.074584: jbd2_lock_buffer_stall: dev 8,5 stall_ms 164
flush-8:0-240 [005] 2012.177241: jbd2_lock_buffer_stall: dev 8,5 stall_ms 128
git-10209 [005] 2012.297472: jbd2_lock_buffer_stall: dev 8,5 stall_ms 216
flush-8:0-240 [005] 2012.299828: jbd2_lock_buffer_stall: dev 8,5 stall_ms 120
dm is not obviously at fault there. sda5 is /usr/src (git checkout
running there with some logging), sda6 is /home and sda8 is / . This is
the tracepoint patch used.
---8<---
jbd2: Trace when lock_buffer at the start of a journal write takes a long time
While investigating interactivity problems it was clear that processes
sometimes stall for long periods of times if an attempt is made to lock
a buffer that is already part of a transaction. It would stall in a
trace looking something like
[<ffffffff811a39de>] __lock_buffer+0x2e/0x30
[<ffffffff8123a60f>] do_get_write_access+0x43f/0x4b0
[<ffffffff8123a7cb>] jbd2_journal_get_write_access+0x2b/0x50
[<ffffffff81220f79>] __ext4_journal_get_write_access+0x39/0x80
[<ffffffff811f3198>] ext4_reserve_inode_write+0x78/0xa0
[<ffffffff811f3209>] ext4_mark_inode_dirty+0x49/0x220
[<ffffffff811f57d1>] ext4_dirty_inode+0x41/0x60
[<ffffffff8119ac3e>] __mark_inode_dirty+0x4e/0x2d0
[<ffffffff8118b9b9>] update_time+0x79/0xc0
[<ffffffff8118ba98>] file_update_time+0x98/0x100
[<ffffffff81110ffc>] __generic_file_aio_write+0x17c/0x3b0
[<ffffffff811112aa>] generic_file_aio_write+0x7a/0xf0
[<ffffffff811ea853>] ext4_file_write+0x83/0xd0
[<ffffffff81172b23>] do_sync_write+0xa3/0xe0
[<ffffffff811731ae>] vfs_write+0xae/0x180
[<ffffffff8117361d>] sys_write+0x4d/0x90
[<ffffffff8159d62d>] system_call_fastpath+0x1a/0x1f
[<ffffffffffffffff>] 0xffffffffffffffff
There was a suspicion that dm_crypt might be part responsible so this
patch adds a tracepoint capturing when lock_buffer takes too long
in do_get_write_access() that logs what device is being written and
how long the stall was for.
Signed-off-by: Mel Gorman <mgorman@suse.de>
---
fs/jbd2/transaction.c | 8 ++++++++
include/trace/events/jbd2.h | 21 +++++++++++++++++++++
2 files changed, 29 insertions(+)
diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c
index 325bc01..1be0ccb 100644
--- a/fs/jbd2/transaction.c
+++ b/fs/jbd2/transaction.c
@@ -640,6 +640,7 @@ do_get_write_access(handle_t *handle, struct journal_head *jh,
int error;
char *frozen_buffer = NULL;
int need_copy = 0;
+ unsigned long start_lock, time_lock;
if (is_handle_aborted(handle))
return -EROFS;
@@ -655,9 +656,16 @@ repeat:
/* @@@ Need to check for errors here at some point. */
+ start_lock = jiffies;
lock_buffer(bh);
jbd_lock_bh_state(bh);
+ /* If it takes too long to lock the buffer, trace it */
+ time_lock = jbd2_time_diff(start_lock, jiffies);
+ if (time_lock > HZ/10)
+ trace_jbd2_lock_buffer_stall(bh->b_bdev->bd_dev,
+ jiffies_to_msecs(time_lock));
+
/* We now hold the buffer lock so it is safe to query the buffer
* state. Is the buffer dirty?
*
diff --git a/include/trace/events/jbd2.h b/include/trace/events/jbd2.h
index 070df49..c1d1f3e 100644
--- a/include/trace/events/jbd2.h
+++ b/include/trace/events/jbd2.h
@@ -358,6 +358,27 @@ TRACE_EVENT(jbd2_write_superblock,
MINOR(__entry->dev), __entry->write_op)
);
+TRACE_EVENT(jbd2_lock_buffer_stall,
+
+ TP_PROTO(dev_t dev, unsigned long stall_ms),
+
+ TP_ARGS(dev, stall_ms),
+
+ TP_STRUCT__entry(
+ __field( dev_t, dev )
+ __field(unsigned long, stall_ms )
+ ),
+
+ TP_fast_assign(
+ __entry->dev = dev;
+ __entry->stall_ms = stall_ms;
+ ),
+
+ TP_printk("dev %d,%d stall_ms %lu",
+ MAJOR(__entry->dev), MINOR(__entry->dev),
+ __entry->stall_ms)
+);
+
#endif /* _TRACE_JBD2_H */
/* This part must be outside protection */
next prev parent reply other threads:[~2013-04-11 17:04 UTC|newest]
Thread overview: 105+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-04-02 14:27 Excessive stall times on ext4 in 3.9-rc2 Mel Gorman
2013-04-02 14:27 ` Mel Gorman
2013-04-02 15:00 ` Jiri Slaby
2013-04-02 15:00 ` Jiri Slaby
2013-04-02 15:03 ` Zheng Liu
2013-04-02 15:03 ` Zheng Liu
2013-04-02 15:15 ` Mel Gorman
2013-04-02 15:15 ` Mel Gorman
2013-04-02 15:06 ` Theodore Ts'o
2013-04-02 15:06 ` Theodore Ts'o
2013-04-02 15:14 ` Theodore Ts'o
2013-04-02 15:14 ` Theodore Ts'o
2013-04-02 18:19 ` Theodore Ts'o
2013-04-02 18:19 ` Theodore Ts'o
2013-04-07 21:59 ` Frank Ch. Eigler
2013-04-07 21:59 ` Frank Ch. Eigler
2013-04-08 8:36 ` Mel Gorman
2013-04-08 8:36 ` Mel Gorman
2013-04-08 10:52 ` Frank Ch. Eigler
2013-04-08 10:52 ` Frank Ch. Eigler
2013-04-08 11:01 ` Theodore Ts'o
2013-04-08 11:01 ` Theodore Ts'o
2013-04-03 10:19 ` Mel Gorman
2013-04-03 10:19 ` Mel Gorman
2013-04-03 12:05 ` Theodore Ts'o
2013-04-03 12:05 ` Theodore Ts'o
2013-04-03 15:15 ` Mel Gorman
2013-04-05 22:18 ` Jiri Slaby
2013-04-05 22:18 ` Jiri Slaby
2013-04-05 23:16 ` Theodore Ts'o
2013-04-05 23:16 ` Theodore Ts'o
2013-04-06 7:29 ` Jiri Slaby
2013-04-06 7:29 ` Jiri Slaby
2013-04-06 7:37 ` Jiri Slaby
2013-04-06 7:37 ` Jiri Slaby
2013-04-06 8:19 ` Jiri Slaby
2013-04-06 13:15 ` Theodore Ts'o
2013-04-06 13:15 ` Theodore Ts'o
2013-04-10 10:56 ` Mel Gorman
2013-04-10 10:56 ` Mel Gorman
2013-04-10 13:12 ` Theodore Ts'o
2013-04-10 13:12 ` Theodore Ts'o
2013-04-11 17:04 ` Mel Gorman [this message]
2013-04-11 17:04 ` Mel Gorman
2013-04-11 18:35 ` Theodore Ts'o
2013-04-11 18:35 ` Theodore Ts'o
2013-04-11 21:33 ` Jan Kara
2013-04-11 21:33 ` Jan Kara
2013-04-12 2:57 ` Theodore Ts'o
2013-04-12 2:57 ` Theodore Ts'o
2013-04-12 4:50 ` Dave Chinner
2013-04-12 4:50 ` Dave Chinner
2013-04-12 15:19 ` Theodore Ts'o
2013-04-12 15:19 ` Theodore Ts'o
2013-04-13 1:23 ` Dave Chinner
2013-04-13 1:23 ` Dave Chinner
2013-04-22 14:38 ` Mel Gorman
2013-04-22 14:38 ` Mel Gorman
2013-04-22 22:42 ` Jeff Moyer
2013-04-22 22:42 ` Jeff Moyer
2013-04-23 0:02 ` Theodore Ts'o
2013-04-23 0:02 ` Theodore Ts'o
2013-04-23 9:31 ` Jan Kara
2013-04-23 9:31 ` Jan Kara
2013-04-23 14:01 ` Mel Gorman
2013-04-23 14:01 ` Mel Gorman
2013-04-24 19:09 ` Jeff Moyer
2013-04-24 19:09 ` Jeff Moyer
2013-04-25 12:21 ` Mel Gorman
2013-04-25 12:21 ` Mel Gorman
2013-04-12 9:47 ` Mel Gorman
2013-04-12 9:47 ` Mel Gorman
2013-04-21 0:05 ` Theodore Ts'o
2013-04-21 0:05 ` Theodore Ts'o
2013-04-21 0:07 ` [PATCH 1/3] ext4: mark all metadata I/O with REQ_META Theodore Ts'o
2013-04-21 0:07 ` Theodore Ts'o
2013-04-21 0:07 ` [PATCH 2/3] buffer: add BH_Prio and BH_Meta flags Theodore Ts'o
2013-04-21 0:07 ` Theodore Ts'o
2013-04-21 0:07 ` [PATCH 3/3] ext4: mark metadata blocks using bh flags Theodore Ts'o
2013-04-21 0:07 ` Theodore Ts'o
2013-04-21 6:09 ` Jiri Slaby
2013-04-21 6:09 ` Jiri Slaby
2013-04-21 6:09 ` Jiri Slaby
2013-04-21 19:55 ` Theodore Ts'o
2013-04-21 19:55 ` Theodore Ts'o
2013-04-21 19:55 ` Theodore Ts'o
2013-04-21 20:48 ` [PATCH 3/3 -v2] " Theodore Ts'o
2013-04-21 20:48 ` Theodore Ts'o
2013-04-21 20:48 ` Theodore Ts'o
2013-04-22 12:06 ` [PATCH 1/3] ext4: mark all metadata I/O with REQ_META Zheng Liu
2013-04-22 12:06 ` Zheng Liu
2013-04-23 15:33 ` Excessive stall times on ext4 in 3.9-rc2 Mel Gorman
2013-04-23 15:33 ` Mel Gorman
2013-04-23 15:50 ` Theodore Ts'o
2013-04-23 15:50 ` Theodore Ts'o
2013-04-23 16:13 ` Mel Gorman
2013-04-23 16:13 ` Mel Gorman
2013-04-12 10:18 ` Tvrtko Ursulin
2013-04-12 10:18 ` Tvrtko Ursulin
2013-04-12 9:45 ` Mel Gorman
2013-04-12 9:45 ` Mel Gorman
2013-04-02 23:16 ` Theodore Ts'o
2013-04-02 23:16 ` Theodore Ts'o
2013-04-03 15:22 ` Mel Gorman
2013-04-03 15:22 ` Mel Gorman
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20130411170402.GB11656@suse.de \
--to=mgorman@suse.de \
--cc=jslaby@suse.cz \
--cc=linux-ext4@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=tytso@mit.edu \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.