From: Fengguang Wu <wfg@mail.ustc.edu.cn>
To: Andrew Morton <akpm@osdl.org>
Cc: "Cc: Ken Chen" <kenchen@google.com>,
Andrew Morton <akpm@linux-foundation.org>
Cc: linux-kernel@vger.kernel.org
Cc: linux-fsdevel@vger.kernel.org
Subject: [PATCH 3/6] writeback: remove pages_skipped accounting in __block_write_full_page()
Date: Sun, 12 Aug 2007 17:11:23 +0800 [thread overview]
Message-ID: <386910468.27673@ustc.edu.cn> (raw)
Message-ID: <20070812092052.848213359@mail.ustc.edu.cn> (raw)
In-Reply-To: 20070812091120.189651872@mail.ustc.edu.cn
[-- Attachment #1: no-skipped.patch --]
[-- Type: text/plain, Size: 6253 bytes --]
Miklos Szeredi <miklos@szeredi.hu> and me identified a writeback bug:
> The following strange behavior can be observed:
>
> 1. large file is written
> 2. after 30 seconds, nr_dirty goes down by 1024
> 3. then for some time (< 30 sec) nothing happens (disk idle)
> 4. then nr_dirty again goes down by 1024
> 5. repeat from 3. until whole file is written
>
> So basically a 4Mbyte chunk of the file is written every 30 seconds.
> I'm quite sure this is not the intended behavior.
It can be produced by the following test scheme:
# cat bin/test-writeback.sh
grep nr_dirty /proc/vmstat
echo 1 > /proc/sys/fs/inode_debug
dd if=/dev/zero of=/var/x bs=1K count=204800&
while true; do grep nr_dirty /proc/vmstat; sleep 1; done
# bin/test-writeback.sh
nr_dirty 19207
nr_dirty 19207
nr_dirty 30924
204800+0 records in
204800+0 records out
209715200 bytes (210 MB) copied, 1.58363 seconds, 132 MB/s
nr_dirty 47150
nr_dirty 47141
nr_dirty 47142
nr_dirty 47142
nr_dirty 47142
nr_dirty 47142
nr_dirty 47205
nr_dirty 47214
nr_dirty 47214
nr_dirty 47214
nr_dirty 47214
nr_dirty 47214
nr_dirty 47215
nr_dirty 47216
nr_dirty 47216
nr_dirty 47216
nr_dirty 47154
nr_dirty 47143
nr_dirty 47143
nr_dirty 47143
nr_dirty 47143
nr_dirty 47143
nr_dirty 47142
nr_dirty 47142
nr_dirty 47142
nr_dirty 47142
nr_dirty 47134
nr_dirty 47134
nr_dirty 47135
nr_dirty 47135
nr_dirty 47135
nr_dirty 46097 <== -1038
nr_dirty 46098
nr_dirty 46098
nr_dirty 46098
[...]
nr_dirty 46091
nr_dirty 46092
nr_dirty 46092
nr_dirty 45069 <== -1023
nr_dirty 45056
nr_dirty 45056
nr_dirty 45056
[...]
nr_dirty 37822
nr_dirty 36799 <== -1023
[...]
nr_dirty 36781
nr_dirty 35758 <== -1023
[...]
nr_dirty 34708
nr_dirty 33672 <== -1024
[...]
nr_dirty 33692
nr_dirty 32669 <== -1023
% ls -li /var/x
847824 -rw-r--r-- 1 root root 200M 2007-08-12 04:12 /var/x
% dmesg|grep 847824 # generated by a debug printk
[ 529.263184] redirtied inode 847824 line 548
[ 564.250872] redirtied inode 847824 line 548
[ 594.272797] redirtied inode 847824 line 548
[ 629.231330] redirtied inode 847824 line 548
[ 659.224674] redirtied inode 847824 line 548
[ 689.219890] redirtied inode 847824 line 548
[ 724.226655] redirtied inode 847824 line 548
[ 759.198568] redirtied inode 847824 line 548
# line 548 in fs/fs-writeback.c:
543 if (wbc->pages_skipped != pages_skipped) {
544 /*
545 * writeback is not making progress due to locked
546 * buffers. Skip this inode for now.
547 */
548 redirty_tail(inode);
549 }
More debug efforts show that __block_write_full_page()
never has the chance to call submit_bh() for that big dirty file:
the buffer head is *clean*. So basicly no page io is issued by
__block_write_full_page(), hence pages_skipped goes up.
This patch fixes this bug. I'm not quite sure about it.
But at least the comment in generic_sync_sb_inodes():
544 /*
545 * writeback is not making progress due to locked
546 * buffers. Skip this inode for now.
547 */
and the comment in __block_write_full_page():
1713 /*
1714 * The page was marked dirty, but the buffers were
1715 * clean. Someone wrote them back by hand with
1716 * ll_rw_block/submit_bh. A rare case.
1717 */
do not quite agree with each other. The page writeback is skipped not because
of 'locked buffer', but 'clean buffer'.
This is the new behavior after the patch:
# bin/test-writeback.sh
nr_dirty 60
847824 /var/x
nr_dirty 60
nr_dirty 31139
204800+0 records in
204800+0 records out
209715200 bytes (210 MB) copied, 1.55338 seconds, 135 MB/s
nr_dirty 47137
nr_dirty 46147
nr_dirty 46147
nr_dirty 46147
nr_dirty 46148
nr_dirty 46148
nr_dirty 46148
nr_dirty 46148
nr_dirty 46193
nr_dirty 46193
nr_dirty 46193
nr_dirty 46193
nr_dirty 46126
nr_dirty 46126
nr_dirty 46126
nr_dirty 46126
nr_dirty 46126
nr_dirty 46109
nr_dirty 46109
nr_dirty 46109
nr_dirty 46113
nr_dirty 46113
nr_dirty 46106
nr_dirty 46106
nr_dirty 46106
nr_dirty 46106
nr_dirty 46106
nr_dirty 46089
nr_dirty 46089
nr_dirty 46090
nr_dirty 46093
nr_dirty 46093
nr_dirty 15
nr_dirty 15
nr_dirty 15
nr_dirty 15
It is pretty numbers: wait 30s and write ALL:)
But another run is not so good:
# sh bin/test-writeback.sh
mount: proc already mounted
nr_dirty 223
nr_dirty 223
nr_dirty 23664
204800+0 records in
204800+0 records out
209715200 bytes (210 MB) copied, 1.51092 seconds, 139 MB/s
nr_dirty 47299
nr_dirty 47271
nr_dirty 47260
nr_dirty 47260
nr_dirty 47267
nr_dirty 47267
nr_dirty 47329
nr_dirty 47352
nr_dirty 47352
nr_dirty 47352
nr_dirty 47352
nr_dirty 47352
nr_dirty 47352
nr_dirty 47352
nr_dirty 47352
nr_dirty 47352
nr_dirty 47606
nr_dirty 47604
nr_dirty 47604
nr_dirty 47604
nr_dirty 47604
nr_dirty 47604
nr_dirty 47604
nr_dirty 47604
nr_dirty 47604
nr_dirty 47604
nr_dirty 47480
nr_dirty 47492
nr_dirty 47492
nr_dirty 47492
nr_dirty 47492
nr_dirty 46470
nr_dirty 46473
nr_dirty 46473
nr_dirty 46473
nr_dirty 46473
nr_dirty 45428
nr_dirty 45435
nr_dirty 45436
nr_dirty 45436
nr_dirty 45436
nr_dirty 257
nr_dirty 259
nr_dirty 259
nr_dirty 259
nr_dirty 259
nr_dirty 16
nr_dirty 16
nr_dirty 16
nr_dirty 16
nr_dirty 16
Basicly they are
- during the dd: ~16M
- after 30s: ~4M
- after 5s: ~4M
- after 5s: ~176M
The box has 2G memory.
Question 1:
How come the 5s delays? I run 4 tests in total, 2 of which have such 5s delays.
Question 2:
__block_write_full_page() is virtually doing nothing for the whole dirty file.
Isn't it abnormal? Who did the actual write back for us? The jounal? How to fix it?
Any suggestions? Thank you.
Cc: Ken Chen <kenchen@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn>
---
fs/buffer.c | 1 -
1 file changed, 1 deletion(-)
--- linux-2.6.23-rc2-mm2.orig/fs/buffer.c
+++ linux-2.6.23-rc2-mm2/fs/buffer.c
@@ -1713,7 +1713,6 @@ done:
* The page and buffer_heads can be released at any time from
* here on.
*/
- wbc->pages_skipped++; /* We didn't write this page */
}
return err;
--
next prev parent reply other threads:[~2007-08-12 9:23 UTC|newest]
Thread overview: 27+ messages / expand[flat|nested] mbox.gz Atom feed top
2007-08-12 9:11 [PATCH 0/6] writeback time order/delay fixes take 3 Fengguang Wu
2007-08-12 9:11 ` Fengguang Wu
2007-08-22 0:23 ` Chris Mason
2007-08-22 1:18 ` Fengguang Wu
2007-08-22 1:18 ` Fengguang Wu
2007-08-22 12:42 ` Chris Mason
2007-08-23 2:47 ` David Chinner
2007-08-23 12:13 ` Chris Mason
2007-08-24 12:56 ` Fengguang Wu
2007-08-24 12:56 ` Fengguang Wu
2007-08-24 13:24 ` Fengguang Wu
2007-08-24 13:24 ` Fengguang Wu
2007-08-24 14:36 ` Chris Mason
2007-08-23 2:33 ` David Chinner
2007-08-24 13:55 ` Fengguang Wu
2007-08-24 13:55 ` Fengguang Wu
2007-08-28 14:55 ` David Chinner
2007-08-28 15:08 ` Chris Mason
2007-08-28 16:33 ` David Chinner
2007-08-28 16:57 ` Chris Mason
2007-08-29 7:53 ` Fengguang Wu
2007-08-29 7:53 ` Fengguang Wu
2007-08-12 9:11 ` [PATCH 1/6] writeback: fix time ordering of the per superblock inode lists 8 Fengguang Wu
2007-08-12 9:11 ` Fengguang Wu
2007-08-12 9:11 ` [PATCH 2/6] writeback: fix ntfs with sb_has_dirty_inodes() Fengguang Wu
2007-08-12 9:11 ` Fengguang Wu
2007-08-12 9:11 ` Fengguang Wu [this message]
2007-08-12 9:11 ` [PATCH 3/6] writeback: remove pages_skipped accounting in __block_write_full_page() Fengguang Wu
2007-08-13 1:03 ` David Chinner
2007-08-13 10:30 ` Fengguang Wu
2007-08-13 10:30 ` Fengguang Wu
2007-08-17 7:13 ` Fengguang Wu
2007-08-17 7:13 ` Fengguang Wu
2007-08-17 7:13 ` Fengguang Wu
2007-08-12 9:11 ` [PATCH 4/6] check dirty inode list Fengguang Wu
2007-08-12 9:11 ` Fengguang Wu
2007-08-12 9:11 ` [PATCH 5/6] prevent time-ordering warnings Fengguang Wu
2007-08-12 9:11 ` Fengguang Wu
2007-08-12 9:11 ` [PATCH 6/6] track redirty_tail() calls Fengguang Wu
2007-08-12 9:11 ` Fengguang Wu
-- strict thread matches above, loose matches on Subject: below --
2007-08-19 6:53 [PATCH 0/6] dirty inode lists time delay/ordering fixes Fengguang Wu
2007-08-19 6:53 ` [PATCH 3/6] writeback: remove pages_skipped accounting in __block_write_full_page() Fengguang Wu
2007-08-19 6:53 ` Fengguang Wu
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=386910468.27673@ustc.edu.cn \
--to=wfg@mail.ustc.edu.cn \
--cc=akpm@linux-foundation.org \
--cc=akpm@osdl.org \
--cc=kenchen@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.