From: Greg KH <gregkh@suse.de>
To: linux-kernel@vger.kernel.org, stable@kernel.org
Cc: Justin Forbes <jmforbes@linuxtx.org>,
Zwane Mwaikambo <zwane@arm.linux.org.uk>,
"Theodore Ts'o" <tytso@mit.edu>,
Randy Dunlap <rdunlap@xenotime.net>,
Dave Jones <davej@redhat.com>,
Chuck Wolber <chuckw@quantumlinux.com>,
Chris Wedgwood <reviews@ml.cw.f00f.org>,
Michael Krufky <mkrufky@linuxtv.org>,
Chuck Ebbert <cebbert@redhat.com>,
Domenico Andreoli <cavokz@gmail.com>, Willy Tarreau <w@1wt.eu>,
Rodrigo Rubira Branco <rbranco@la.checkpoint.com>,
Jake Edge <jake@lwn.net>,
torvalds@linux-foundation.org, akpm@linux-foundation.org,
alan@lxorguk.ukuu.org.uk, Jens Axboe <jens.axboe@oracle.com>
Subject: [patch 09/47] block: Properly notify block layer of sync writes
Date: Tue, 22 Jul 2008 16:14:48 -0700 [thread overview]
Message-ID: <20080722231448.GJ8282@suse.de> (raw)
In-Reply-To: <20080722231342.GA8282@suse.de>
[-- Attachment #1: block-properly-notify-block-layer-of-sync-writes.patch --]
[-- Type: text/plain, Size: 3958 bytes --]
2.6.25-stable review patch. If anyone has any objections, please let us
know.
------------------
From: Jens Axboe <jens.axboe@oracle.com>
commit 18ce3751ccd488c78d3827e9f6bf54e6322676fb upstream
fsync_buffers_list() and sync_dirty_buffer() both issue async writes and
then immediately wait on them. Conceptually, that makes them sync writes
and we should treat them as such so that the IO schedulers can handle
them appropriately.
This patch fixes a write starvation issue that Lin Ming reported, where
xx is stuck for more than 2 minutes because of a large number of
synchronous IO in the system:
INFO: task kjournald:20558 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
message.
kjournald D ffff810010820978 6712 20558 2
ffff81022ddb1d10 0000000000000046 ffff81022e7baa10 ffffffff803ba6f2
ffff81022ecd0000 ffff8101e6dc9160 ffff81022ecd0348 000000008048b6cb
0000000000000086 ffff81022c4e8d30 0000000000000000 ffffffff80247537
Call Trace:
[<ffffffff803ba6f2>] kobject_get+0x12/0x17
[<ffffffff80247537>] getnstimeofday+0x2f/0x83
[<ffffffff8029c1ac>] sync_buffer+0x0/0x3f
[<ffffffff8066d195>] io_schedule+0x5d/0x9f
[<ffffffff8029c1e7>] sync_buffer+0x3b/0x3f
[<ffffffff8066d3f0>] __wait_on_bit+0x40/0x6f
[<ffffffff8029c1ac>] sync_buffer+0x0/0x3f
[<ffffffff8066d48b>] out_of_line_wait_on_bit+0x6c/0x78
[<ffffffff80243909>] wake_bit_function+0x0/0x23
[<ffffffff8029e3ad>] sync_dirty_buffer+0x98/0xcb
[<ffffffff8030056b>] journal_commit_transaction+0x97d/0xcb6
[<ffffffff8023a676>] lock_timer_base+0x26/0x4b
[<ffffffff8030300a>] kjournald+0xc1/0x1fb
[<ffffffff802438db>] autoremove_wake_function+0x0/0x2e
[<ffffffff80302f49>] kjournald+0x0/0x1fb
[<ffffffff802437bb>] kthread+0x47/0x74
[<ffffffff8022de51>] schedule_tail+0x28/0x5d
[<ffffffff8020cac8>] child_rip+0xa/0x12
[<ffffffff80243774>] kthread+0x0/0x74
[<ffffffff8020cabe>] child_rip+0x0/0x12
Lin Ming confirms that this patch fixes the issue. I've run tests with
it for the past week and no ill effects have been observed, so I'm
proposing it for inclusion into 2.6.26.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
---
fs/buffer.c | 13 ++++++++-----
include/linux/fs.h | 1 +
2 files changed, 9 insertions(+), 5 deletions(-)
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -818,7 +818,7 @@ static int fsync_buffers_list(spinlock_t
* contents - it is a noop if I/O is still in
* flight on potentially older contents.
*/
- ll_rw_block(SWRITE, 1, &bh);
+ ll_rw_block(SWRITE_SYNC, 1, &bh);
brelse(bh);
spin_lock(lock);
}
@@ -2952,16 +2952,19 @@ void ll_rw_block(int rw, int nr, struct
for (i = 0; i < nr; i++) {
struct buffer_head *bh = bhs[i];
- if (rw == SWRITE)
+ if (rw == SWRITE || rw == SWRITE_SYNC)
lock_buffer(bh);
else if (test_set_buffer_locked(bh))
continue;
- if (rw == WRITE || rw == SWRITE) {
+ if (rw == WRITE || rw == SWRITE || rw == SWRITE_SYNC) {
if (test_clear_buffer_dirty(bh)) {
bh->b_end_io = end_buffer_write_sync;
get_bh(bh);
- submit_bh(WRITE, bh);
+ if (rw == SWRITE_SYNC)
+ submit_bh(WRITE_SYNC, bh);
+ else
+ submit_bh(WRITE, bh);
continue;
}
} else {
@@ -2990,7 +2993,7 @@ int sync_dirty_buffer(struct buffer_head
if (test_clear_buffer_dirty(bh)) {
get_bh(bh);
bh->b_end_io = end_buffer_write_sync;
- ret = submit_bh(WRITE, bh);
+ ret = submit_bh(WRITE_SYNC, bh);
wait_on_buffer(bh);
if (buffer_eopnotsupp(bh)) {
clear_buffer_eopnotsupp(bh);
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -83,6 +83,7 @@ extern int dir_notify_enable;
#define READ_SYNC (READ | (1 << BIO_RW_SYNC))
#define READ_META (READ | (1 << BIO_RW_META))
#define WRITE_SYNC (WRITE | (1 << BIO_RW_SYNC))
+#define SWRITE_SYNC (SWRITE | (1 << BIO_RW_SYNC))
#define WRITE_BARRIER ((1 << BIO_RW) | (1 << BIO_RW_BARRIER))
#define SEL_IN 1
--
next prev parent reply other threads:[~2008-07-22 23:21 UTC|newest]
Thread overview: 58+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <20080722230208.148102983@mini.kroah.org>
2008-07-22 23:13 ` [patch 00/47] 2.6.25-stable review Greg KH
2008-07-22 23:14 ` [patch 01/47] b43legacy: Do not return TX_BUSY from op_tx Greg KH
2008-07-22 23:14 ` [patch 02/47] b43: " Greg KH
2008-07-22 23:14 ` [patch 03/47] b43: Fix possible MMIO access while device is down Greg KH
2008-07-22 23:14 ` [patch 04/47] mac80211: detect driver tx bugs Greg KH
2008-07-22 23:14 ` [patch 05/47] block: Fix the starving writes bug in the anticipatory IO scheduler Greg KH
2008-07-22 23:14 ` [patch 06/47] md: Fix error paths if md_probe fails Greg KH
2008-07-22 23:14 ` [patch 07/47] md: Dont acknowlege that stripe-expand is complete until it really is Greg KH
2008-07-22 23:14 ` [patch 08/47] md: Ensure interrupted recovery completed properly (v1 metadata plus bitmap) Greg KH
2008-07-22 23:14 ` Greg KH [this message]
2008-07-22 23:14 ` [patch 10/47] OHCI: Fix problem if SM501 and another platform driver is selected Greg KH
2008-07-22 23:14 ` [patch 11/47] USB: ehci - fix timer regression Greg KH
2008-07-22 23:14 ` [patch 12/47] USB: ohci - record data toggle after unlink Greg KH
2008-07-22 23:15 ` [patch 13/47] USB: fix interrupt disabling for HCDs with shared interrupt handlers Greg KH
2008-07-22 23:15 ` [patch 14/47] hdaps: add support for various newer Lenovo thinkpads Greg KH
2008-07-22 23:15 ` [patch 15/47] b43legacy: Fix possible NULL pointer dereference in DMA code Greg KH
2008-07-22 23:15 ` [patch 16/47] netdrvr: 3c59x: remove irqs_disabled warning from local_bh_enable Greg KH
2008-07-22 23:15 ` [patch 17/47] SCSI: esp: Fix OOPS in esp_reset_cleanup() Greg KH
2008-07-22 23:15 ` [patch 18/47] SCSI: esp: tidy up target reference counting Greg KH
2008-07-22 23:15 ` [patch 19/47] SCSI: ses: Fix timeout Greg KH
2008-07-22 23:16 ` [patch 20/47] mm: switch node meminfo Active & Inactive pages to Kbytes Greg KH
2008-07-22 23:16 ` [patch 21/47] reiserfs: discard prealloc in reiserfs_delete_inode Greg KH
2008-07-22 23:16 ` [patch 22/47] cciss: read config to obtain max outstanding commands per controller Greg KH
2008-07-22 23:16 ` [patch 23/47] serial: fix serial_match_port() for dynamic major tty-device numbers Greg KH
2008-07-22 23:16 ` [patch 24/47] can: add sanity checks Greg KH
2008-07-22 23:16 ` [patch 25/47] sisusbvga: Fix oops on disconnect Greg KH
2008-07-22 23:16 ` [patch 26/47] md: ensure all blocks are uptodate or locked when syncing Greg KH
2008-07-22 23:16 ` [patch 27/47] textsearch: fix Boyer-Moore text search bug Greg KH
2008-07-22 23:16 ` [patch 28/47] netfilter: nf_conntrack_tcp: fixing to check the lower bound of valid ACK Greg KH
2008-07-22 23:16 ` [patch 29/47] zd1211rw: add ID for AirTies WUS-201 Greg KH
2008-07-22 23:16 ` [patch 30/47] exec: fix stack excutability without PT_GNU_STACK Greg KH
2008-07-22 23:16 ` [patch 31/47] slub: Fix use-after-preempt of per-CPU data structure Greg KH
2008-07-22 23:16 ` [patch 32/47] rtc: fix reported IRQ rate for when HPET is enabled Greg KH
2008-07-22 23:16 ` [patch 33/47] rapidio: fix device reference counting Greg KH
2008-07-22 23:16 ` [patch 34/47] tpm: add Intel TPM TIS device HID Greg KH
2008-07-22 23:16 ` [patch 35/47] cifs: fix wksidarr declaration to be big-endian friendly Greg KH
2008-07-22 23:16 ` [patch 36/47] ov7670: clean up ov7670_read semantics Greg KH
2008-07-22 23:17 ` [patch 37/47] serial8250: sanity check nr_uarts on all paths Greg KH
2008-07-22 23:17 ` [patch 38/47] fbdev: bugfix for multiprocess defio Greg KH
2008-07-22 23:17 ` [patch 39/47] drivers/isdn/i4l/isdn_common.c fix small resource leak Greg KH
2008-07-22 23:17 ` [patch 40/47] drivers/char/pcmcia/ipwireless/hardware.c fix " Greg KH
2008-07-22 23:17 ` [patch 41/47] SCSI: mptspi: fix oops in mptspi_dv_renegotiate_work() Greg KH
2008-07-22 23:17 ` [patch 42/47] crypto: chainiv - Invoke completion function Greg KH
2008-07-22 23:17 ` [patch 43/47] powerpc: Add missing reference to coherent_dma_mask Greg KH
2008-07-22 23:17 ` [patch 44/47] pxamci: fix byte aligned DMA transfers Greg KH
2008-07-23 7:01 ` pHilipp Zabel
2008-07-23 20:12 ` [stable] " Greg KH
2008-07-23 20:24 ` Linus Torvalds
2008-07-23 20:32 ` Greg KH
2008-07-24 10:33 ` pHilipp Zabel
2008-07-24 15:05 ` Greg KH
2008-07-24 19:22 ` Linus Torvalds
2008-07-24 20:34 ` Pierre Ossman
2008-07-22 23:17 ` [patch 45/47] mmc: dont use DMA on newer ENE controllers Greg KH
2008-07-22 23:17 ` [patch 46/47] hrtimer: prevent migration for raising softirq Greg KH
2008-07-22 23:17 ` [patch 47/47] V4L/DVB (7475): Added support for Terratec Cinergy T USB XXS Greg KH
2008-07-23 4:42 ` [patch 00/47] 2.6.25-stable review Michael Krufky
2008-07-23 4:51 ` Michael Krufky
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20080722231448.GJ8282@suse.de \
--to=gregkh@suse.de \
--cc=akpm@linux-foundation.org \
--cc=alan@lxorguk.ukuu.org.uk \
--cc=cavokz@gmail.com \
--cc=cebbert@redhat.com \
--cc=chuckw@quantumlinux.com \
--cc=davej@redhat.com \
--cc=jake@lwn.net \
--cc=jens.axboe@oracle.com \
--cc=jmforbes@linuxtx.org \
--cc=linux-kernel@vger.kernel.org \
--cc=mkrufky@linuxtv.org \
--cc=rbranco@la.checkpoint.com \
--cc=rdunlap@xenotime.net \
--cc=reviews@ml.cw.f00f.org \
--cc=stable@kernel.org \
--cc=torvalds@linux-foundation.org \
--cc=tytso@mit.edu \
--cc=w@1wt.eu \
--cc=zwane@arm.linux.org.uk \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox