From: Fengguang Wu <wfg@mail.ustc.edu.cn>
To: Andrew Morton <akpm@osdl.org>
Cc: Ken Chen <kenchen@google.com>,
 Andrew Morton <akpm@linux-foundation.org>
Cc: linux-kernel@vger.kernel.org
Cc: linux-fsdevel@vger.kernel.org
Subject: [PATCH 3/6] writeback: remove pages_skipped accounting in __block_write_full_page()
Date: Sun, 12 Aug 2007 17:11:23 +0800
Message-ID: <386910468.27673@ustc.edu.cn>
Message-ID: <20070812092052.848213359@mail.ustc.edu.cn>
In-Reply-To: 20070812091120.189651872@mail.ustc.edu.cn
[-- Attachment #1: no-skipped.patch --]
[-- Type: text/plain, Size: 6253 bytes --]
Miklos Szeredi <miklos@szeredi.hu> and I identified a writeback bug:
> The following strange behavior can be observed:
>
> 1. large file is written
> 2. after 30 seconds, nr_dirty goes down by 1024
> 3. then for some time (< 30 sec) nothing happens (disk idle)
> 4. then nr_dirty again goes down by 1024
> 5. repeat from 3. until whole file is written
>
> So basically a 4Mbyte chunk of the file is written every 30 seconds.
> I'm quite sure this is not the intended behavior.
It can be reproduced with the following test script:
# cat bin/test-writeback.sh
grep nr_dirty /proc/vmstat
echo 1 > /proc/sys/fs/inode_debug
dd if=/dev/zero of=/var/x bs=1K count=204800&
while true; do grep nr_dirty /proc/vmstat; sleep 1; done
# bin/test-writeback.sh
nr_dirty 19207
nr_dirty 19207
nr_dirty 30924
204800+0 records in
204800+0 records out
209715200 bytes (210 MB) copied, 1.58363 seconds, 132 MB/s
nr_dirty 47150
nr_dirty 47141
nr_dirty 47142
nr_dirty 47142
nr_dirty 47142
nr_dirty 47142
nr_dirty 47205
nr_dirty 47214
nr_dirty 47214
nr_dirty 47214
nr_dirty 47214
nr_dirty 47214
nr_dirty 47215
nr_dirty 47216
nr_dirty 47216
nr_dirty 47216
nr_dirty 47154
nr_dirty 47143
nr_dirty 47143
nr_dirty 47143
nr_dirty 47143
nr_dirty 47143
nr_dirty 47142
nr_dirty 47142
nr_dirty 47142
nr_dirty 47142
nr_dirty 47134
nr_dirty 47134
nr_dirty 47135
nr_dirty 47135
nr_dirty 47135
nr_dirty 46097 <== -1038
nr_dirty 46098
nr_dirty 46098
nr_dirty 46098
[...]
nr_dirty 46091
nr_dirty 46092
nr_dirty 46092
nr_dirty 45069 <== -1023
nr_dirty 45056
nr_dirty 45056
nr_dirty 45056
[...]
nr_dirty 37822
nr_dirty 36799 <== -1023
[...]
nr_dirty 36781
nr_dirty 35758 <== -1023
[...]
nr_dirty 34708
nr_dirty 33672 <== -1024
[...]
nr_dirty 33692
nr_dirty 32669 <== -1023
% ls -li /var/x
847824 -rw-r--r-- 1 root root 200M 2007-08-12 04:12 /var/x
% dmesg|grep 847824 # generated by a debug printk
[ 529.263184] redirtied inode 847824 line 548
[ 564.250872] redirtied inode 847824 line 548
[ 594.272797] redirtied inode 847824 line 548
[ 629.231330] redirtied inode 847824 line 548
[ 659.224674] redirtied inode 847824 line 548
[ 689.219890] redirtied inode 847824 line 548
[ 724.226655] redirtied inode 847824 line 548
[ 759.198568] redirtied inode 847824 line 548
# line 548 in fs/fs-writeback.c:
543 if (wbc->pages_skipped != pages_skipped) {
544 /*
545 * writeback is not making progress due to locked
546 * buffers. Skip this inode for now.
547 */
548 redirty_tail(inode);
549 }
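As a quick check on the dmesg timestamps above, here is a minimal user-space sketch that computes the gaps between consecutive "redirtied inode" messages. The names redirty_ts and timestamp_gaps() are made up for this illustration; only the timestamp values come from the log. With these values the gaps come out close to either 30 or 35 seconds.

```c
#include <stddef.h>

/* Timestamps (seconds since boot) copied from the dmesg lines above. */
static const double redirty_ts[] = {
    529.263184, 564.250872, 594.272797, 629.231330,
    659.224674, 689.219890, 724.226655, 759.198568,
};

/*
 * Stores the gaps between consecutive timestamps in gaps[], which must
 * have room for n - 1 entries. Returns the number of gaps written.
 * (Hypothetical helper, not kernel code.)
 */
static size_t timestamp_gaps(const double *ts, size_t n, double *gaps)
{
    if (n < 2)
        return 0;
    for (size_t i = 1; i < n; i++)
        gaps[i - 1] = ts[i] - ts[i - 1];
    return n - 1;
}
```

Call it as timestamp_gaps(redirty_ts, 8, gaps) with a double gaps[7] buffer.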
More debugging shows that __block_write_full_page()
never gets the chance to call submit_bh() for that big dirty file:
the buffer heads are *clean*. So basically no page I/O is issued by
__block_write_full_page(), hence pages_skipped goes up.
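The interaction can be modeled in user space roughly as follows. This is only a simplified sketch of my understanding, not the kernel code: struct wbc_model, struct page_model and both functions are stand-ins invented for the illustration, and the 'patched' flag models removing the pages_skipped++ line.

```c
#include <stdbool.h>

/* Simplified stand-ins for the kernel structures; not the real definitions. */
struct wbc_model {
    long pages_skipped;     /* mirrors wbc->pages_skipped */
};

struct page_model {
    bool buffers_dirty;     /* false models "page dirty, buffer heads clean" */
};

/*
 * Models the end of __block_write_full_page(): when no buffer is
 * submitted, the unpatched code counts the page as skipped.
 * Returns the number of buffers submitted (0 or 1 in this model).
 */
static int write_full_page_model(const struct page_model *page,
                                 struct wbc_model *wbc, bool patched)
{
    if (page->buffers_dirty)
        return 1;               /* submit_bh() would be called here */
    if (!patched)
        wbc->pages_skipped++;   /* the accounting this patch removes */
    return 0;
}

/*
 * Models the check in generic_sync_sb_inodes(): returns true when the
 * inode would be handed to redirty_tail() after writing its pages.
 */
static bool inode_redirtied_model(const struct page_model *pages, int n,
                                  bool patched)
{
    struct wbc_model wbc = { 0 };
    long pages_skipped = wbc.pages_skipped;  /* snapshot before writeback */

    for (int i = 0; i < n; i++)
        write_full_page_model(&pages[i], &wbc, patched);

    return wbc.pages_skipped != pages_skipped;
}
```

In this model, an inode whose pages all have clean buffers is redirtied with patched == false and left alone with patched == true, which is the behavior change the patch aims for.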
This patch fixes the bug, though I'm not quite sure it is the right fix.
But at least the comment in generic_sync_sb_inodes():
544 /*
545 * writeback is not making progress due to locked
546 * buffers. Skip this inode for now.
547 */
and the comment in __block_write_full_page():
1713 /*
1714 * The page was marked dirty, but the buffers were
1715 * clean. Someone wrote them back by hand with
1716 * ll_rw_block/submit_bh. A rare case.
1717 */
do not quite agree with each other. The page writeback is skipped not because
of a 'locked buffer', but because of a 'clean buffer'.
This is the new behavior after the patch:
# bin/test-writeback.sh
nr_dirty 60
847824 /var/x
nr_dirty 60
nr_dirty 31139
204800+0 records in
204800+0 records out
209715200 bytes (210 MB) copied, 1.55338 seconds, 135 MB/s
nr_dirty 47137
nr_dirty 46147
nr_dirty 46147
nr_dirty 46147
nr_dirty 46148
nr_dirty 46148
nr_dirty 46148
nr_dirty 46148
nr_dirty 46193
nr_dirty 46193
nr_dirty 46193
nr_dirty 46193
nr_dirty 46126
nr_dirty 46126
nr_dirty 46126
nr_dirty 46126
nr_dirty 46126
nr_dirty 46109
nr_dirty 46109
nr_dirty 46109
nr_dirty 46113
nr_dirty 46113
nr_dirty 46106
nr_dirty 46106
nr_dirty 46106
nr_dirty 46106
nr_dirty 46106
nr_dirty 46089
nr_dirty 46089
nr_dirty 46090
nr_dirty 46093
nr_dirty 46093
nr_dirty 15
nr_dirty 15
nr_dirty 15
nr_dirty 15
These are pretty numbers: wait 30s and write ALL :)
But another run is not so good:
# sh bin/test-writeback.sh
mount: proc already mounted
nr_dirty 223
nr_dirty 223
nr_dirty 23664
204800+0 records in
204800+0 records out
209715200 bytes (210 MB) copied, 1.51092 seconds, 139 MB/s
nr_dirty 47299
nr_dirty 47271
nr_dirty 47260
nr_dirty 47260
nr_dirty 47267
nr_dirty 47267
nr_dirty 47329
nr_dirty 47352
nr_dirty 47352
nr_dirty 47352
nr_dirty 47352
nr_dirty 47352
nr_dirty 47352
nr_dirty 47352
nr_dirty 47352
nr_dirty 47352
nr_dirty 47606
nr_dirty 47604
nr_dirty 47604
nr_dirty 47604
nr_dirty 47604
nr_dirty 47604
nr_dirty 47604
nr_dirty 47604
nr_dirty 47604
nr_dirty 47604
nr_dirty 47480
nr_dirty 47492
nr_dirty 47492
nr_dirty 47492
nr_dirty 47492
nr_dirty 46470
nr_dirty 46473
nr_dirty 46473
nr_dirty 46473
nr_dirty 46473
nr_dirty 45428
nr_dirty 45435
nr_dirty 45436
nr_dirty 45436
nr_dirty 45436
nr_dirty 257
nr_dirty 259
nr_dirty 259
nr_dirty 259
nr_dirty 259
nr_dirty 16
nr_dirty 16
nr_dirty 16
nr_dirty 16
nr_dirty 16
Basically, the amounts written back are:
- during the dd: ~16M
- after 30s: ~4M
- after 5s: ~4M
- after 5s: ~176M
The box has 2G memory.
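The sizes above follow from the nr_dirty deltas. A minimal sketch of the conversion, assuming 4 KiB pages (which is what "1024 pages is about 4 Mbyte" implies); PAGE_SIZE_KIB and pages_to_mib() are names made up for this illustration:

```c
/* Assumption: 4 KiB pages, as implied by 1024 pages being about 4 MB. */
#define PAGE_SIZE_KIB 4L

/*
 * Converts a drop in nr_dirty (counted in pages) to MiB, rounded to the
 * nearest whole MiB. (Hypothetical helper, not kernel code.)
 */
static long pages_to_mib(long pages)
{
    return (pages * PAGE_SIZE_KIB + 512) / 1024;
}
```

For example, the drop from 47492 to 46470 is 1022 pages, which gives 4 MiB, and the drop from 45436 to 257 is 45179 pages, which gives 176 MiB.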
Question 1:
Where do the 5s delays come from? I ran 4 tests in total, 2 of which showed such 5s delays.
Question 2:
__block_write_full_page() is virtually doing nothing for the whole dirty file.
Isn't that abnormal? Who did the actual writeback for us? The journal? How should it be fixed?
Any suggestions? Thank you.
Cc: Ken Chen <kenchen@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn>
---
fs/buffer.c | 1 -
1 file changed, 1 deletion(-)
--- linux-2.6.23-rc2-mm2.orig/fs/buffer.c
+++ linux-2.6.23-rc2-mm2/fs/buffer.c
@@ -1713,7 +1713,6 @@ done:
* The page and buffer_heads can be released at any time from
* here on.
*/
- wbc->pages_skipped++; /* We didn't write this page */
}
return err;
--