* 2.6.24-rc1 - Regularly getting processes stuck in D state on startup
@ 2007-11-05 18:23 David
2007-11-06 6:46 ` Stephen Rothwell
[not found] ` <E1IpJLy-0002ag-TL@localhost>
0 siblings, 2 replies; 12+ messages in thread
From: David @ 2007-11-05 18:23 UTC (permalink / raw)
To: Linux Kernel Mailing List
[-- Attachment #1: Type: text/plain, Size: 378 bytes --]
I've been testing rc1 for a week or so, and about 25% of the time I'm
seeing Firefox and Thunderbird getting stuck in 'D' state as they start up.
I've attached the output of Sysrq-T to this mail... system is a
dual-core AMD64, and files are on a RAID-1 root partition connected to two
SATA disks on the on-board NVidia controller. I've had no problems
before .24 rc1.
Cheers
David
[-- Attachment #2: sysrq-t.bz2 --]
[-- Type: application/x-bzip, Size: 15488 bytes --]
[-- Attachment #3: config.gz --]
[-- Type: application/x-gzip, Size: 18889 bytes --]
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: 2.6.24-rc1 - Regularly getting processes stuck in D state on startup
2007-11-05 18:23 2.6.24-rc1 - Regularly getting processes stuck in D state on startup David
@ 2007-11-06 6:46 ` Stephen Rothwell
2007-11-06 12:20 ` Peter Zijlstra
2007-11-07 3:24 ` Stephen Rothwell
[not found] ` <E1IpJLy-0002ag-TL@localhost>
1 sibling, 2 replies; 12+ messages in thread
From: Stephen Rothwell @ 2007-11-06 6:46 UTC (permalink / raw)
To: David; +Cc: Linux Kernel Mailing List
[-- Attachment #1: Type: text/plain, Size: 1170 bytes --]
On Mon, 05 Nov 2007 18:23:07 +0000 David <david@unsolicited.net> wrote:
>
> I've been testing rc1 for a week or so, and about 25% of the time I'm
> seeing Firefox and Thunderbird getting stuck in 'D' state as they startup.
>
> I've attached the output of Sysrq-T to this mail... system is a
> dual-core AMD64, and files are on a RAID-1 root partition connected two
> SATA disks on the on-board NVidia controller. I've had no problems
> before .24 rc1

I am seeing something very similar on a PowerPC machine where copying a
file from an LVM volume with ext3 on it to a simple scsi partition (again
ext3) on the same disk will hang in congestion_wait. If I am patient
enough, the copy makes very slow progress. A kill -9 will kill it
eventually, but a simple control-C will not.

This hang occurs more often than not (and usually when I am trying to
install a new kernel into /boot for testing :-)).

I don't have access to the machine today, but if more information would
be useful, I could boot into 2.6.24-rc1-<mumble> again tomorrow.

--
Cheers,
Stephen Rothwell sfr@canb.auug.org.au
http://www.canb.auug.org.au/~sfr/
[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: 2.6.24-rc1 - Regularly getting processes stuck in D state on startup
2007-11-06 6:46 ` Stephen Rothwell
@ 2007-11-06 12:20 ` Peter Zijlstra
2007-11-07 3:24 ` Stephen Rothwell
1 sibling, 0 replies; 12+ messages in thread
From: Peter Zijlstra @ 2007-11-06 12:20 UTC (permalink / raw)
To: Stephen Rothwell
Cc: David, Linux Kernel Mailing List, Fengguang Wu, Andrew Morton, Dave Chinner, Christoph Lameter
[-- Attachment #1: Type: text/plain, Size: 9125 bytes --]
On Tue, 2007-11-06 at 17:46 +1100, Stephen Rothwell wrote:
> On Mon, 05 Nov 2007 18:23:07 +0000 David <david@unsolicited.net> wrote:
> >
> > I've been testing rc1 for a week or so, and about 25% of the time I'm
> > seeing Firefox and Thunderbird getting stuck in 'D' state as they startup.
> >
> > I've attached the output of Sysrq-T to this mail... system is a
> > dual-core AMD64, and files are on a RAID-1 root partition connected two
> > SATA disks on the on-board NVidia controller. I've had no problems
> > before .24 rc1
>
> I am seeing something very similar on a PowerPC machine where copying a
> file from an LVM volume with ext3 on it to a simple scsi partition (again
> ext3) on the same disk will hang in congestion_wait. If I am patient
> enough, the copy makes very slow progress. A kill -9 will kill it
> eventually, but a simple control-C will not.
>
> This hang occurs more often than not (and usually when I am trying to
> install a new kernel into /boot for testing :-)).
>
> I don't have access to the machine today, but if more information would
> be useful, I could boot into 2.6.24-rc1-<mumble> again tomorrow.

LVM will provide a different BDI even though it could be on the same
disk as another 'real' partition. Still that should not make the copy
take that long.

I tried copying a 1M file from the lvm to a real partition on the same
disk (after ensuring the lvm had all the dirty limit), works like
advertised.

x86_64 SMP PREEMPT v2.6.24-rc1-748-g2655e2c + the four attached patches rawhide x86_64 userland To test this scenario I made an lvm thingy /dev/lvm/foo on /dev/sdb6 / -> /dev/sda3 /dev/sdb1 /mnt/sdb1 /dev/lvm/foo -> /mnt/foo All ext3 for this test. The pretty numbers come from: # while sleep 1; do cat /sys/class/bdi/*/bdi_dirty_kb | awk '{t=$0; n+= $0; while (getline) { t=t " " $0; n+=$0; } ; getline total < "/sys/class/bdi/sda/dirty_kb" ; print t " : " n "/" total }' ; done while doing: # dd if=/dev/zero of=/mnt/foo/zero bs=4096 count=$((1024*1024/4)) dm-0 ............................................. sda sdb .......... 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 159440 0 0 0 0 0 0 : 159440/193540 5848 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 89588 0 0 0 0 0 0 : 95436/193092 41488 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 82908 0 0 0 0 0 0 : 124396/192576 69984 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 62100 0 0 0 0 0 0 : 132084/191952 93488 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 67132 0 0 0 0 0 0 : 160620/191752 114452 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 57676 0 0 0 0 0 0 : 172128/191696 124260 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 53508 0 0 0 0 0 0 : 177768/191544 138072 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 53140 0 0 0 0 0 0 : 191212/191252 145004 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 45748 0 0 0 0 0 0 : 190752/190804 155408 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 35508 0 0 0 0 0 0 : 190916/190920 162252 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 29192 0 0 0 0 0 0 : 191444/191392 165968 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 25108 0 0 0 0 0 0 : 191076/191036 168480 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 22316 0 0 0 0 0 0 : 190796/190768 173308 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 17428 0 0 0 0 0 0 : 190736/190640 177504 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 13784 0 0 0 0 0 0 : 191288/191240 179792 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 12036 0 0 0 0 0 0 : 191828/191768 179976 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 11920 0 0 0 0 0 0 : 191896/191836 179956 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 11920 0 0 0 0 0 0 : 191876/191828 179996 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 11900 0 0 0 0 0 0 : 191896/191836 180088 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 11904 0 0 0 0 0 0 : 191992/191932 180084 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 11904 0 0 0 0 0 0 : 191988/191928 180092 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 11904 0 0 0 0 0 0 : 191996/191948 180108 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 11904 0 0 0 0 0 0 : 192012/191952 180128 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 11904 0 0 0 0 0 0 : 192032/191976 180112 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 11904 0 0 0 0 0 0 : 192016/191968 180124 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 11904 0 0 0 0 0 0 : 192028/191972 180120 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 11904 0 0 0 0 0 0 : 192024/191964 180116 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 11904 0 0 0 0 0 0 : 192020/191960 180108 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 11904 0 0 0 0 0 0 : 192012/191952 180116 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 11904 0 0 0 0 0 0 : 192020/191960 180112 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 11904 0 0 0 0 0 0 : 192016/191956 180116 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 11904 0 0 0 0 0 0 : 192020/191960 180108 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 11904 0 0 0 0 0 0 : 192012/191964 182444 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 9344 0 0 0 0 0 0 : 191788/191744 182436 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 9344 0 0 0 0 0 0 : 191780/191736 182452 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 9344 0 0 0 0 0 0 : 191796/191752 182412 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 9340 0 0 0 0 0 0 : 191752/191712 182436 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 9344 0 0 0 0 0 0 : 191780/191736 
182620 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 9352 0 0 0 0 0 0 : 191972/191940 182616 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 9352 0 0 0 0 0 0 : 191968/191924 182600 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 9352 0 0 0 0 0 0 : 191952/191920 182636 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 9352 0 0 0 0 0 0 : 191988/191948 # dd if=/dev/zero of=/mnt/sdb1/zero bs=4096 count=$((1024*1024/4)) dm-0 ............................................. sda sdb .......... 107608 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 9344 0 0 0 0 0 0 : 116952/191732 78824 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 7984 27644 0 0 0 0 0 : 114452/191544 77372 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 6548 56972 0 0 0 0 0 : 140892/191400 81412 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 5392 80476 0 0 0 0 0 : 167280/191224 76444 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 4252 104060 0 0 0 0 0 : 184756/191492 63408 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 3412 121332 0 0 0 0 0 : 188152/191464 57868 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2976 130160 0 0 0 0 0 : 191004/191368 49324 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2520 139324 0 0 0 0 0 : 191168/191192 40516 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2072 148420 0 0 0 0 0 : 191008/191020 33748 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1724 156288 0 0 0 0 0 : 191760/191772 29280 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1496 160896 0 0 0 0 0 : 191672/191688 26288 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1344 163744 0 0 0 0 0 : 191376/191400 21440 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1096 168844 0 0 0 0 0 : 191380/191372 17796 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 908 172452 0 0 0 0 0 : 191156/191164 16004 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 816 174636 0 0 0 0 0 : 191456/191468 15048 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 768 175836 0 0 0 0 0 : 191652/191664 15052 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 768 175896 0 0 0 0 0 : 191716/191728 12904 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 660 178228 0 0 0 0 0 : 191792/191812 12880 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 656 178264 0 0 0 0 0 : 191800/191812 12884 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 656 178284 0 0 0 0 0 : 191824/191832 12900 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 656 178512 0 0 0 0 0 : 192068/192092 12900 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 656 178528 0 0 0 0 0 : 192084/192096 12900 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 656 178516 0 0 0 0 0 : 192072/192084 9256 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 472 182184 0 0 0 0 0 : 191912/191892 9256 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 472 182156 0 0 0 0 0 : 191884/191860 9256 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 472 182180 0 0 0 0 0 : 191908/191888 9256 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 472 182172 0 0 0 0 0 : 191900/191880 9260 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 472 182192 0 0 0 0 0 : 191924/191900 9268 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 472 182352 0 0 0 0 0 : 192092/192080 9268 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 472 182384 0 0 0 0 0 : 192124/192100 9268 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 472 182372 0 0 0 0 0 : 192112/192100 9268 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 472 182380 0 0 0 0 0 : 192120/192096 9268 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 472 182364 0 0 0 0 0 : 192104/192092 9268 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 472 182396 0 0 0 0 0 : 192136/192112 9268 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 472 182392 0 0 0 0 0 : 192132/192108 [-- Attachment #2: wu-reiser.patch --] [-- Type: application/mbox, Size: 5588 bytes --] [-- Attachment #3: writeback-early.patch --] [-- Type: text/x-patch, Size: 1939 bytes --] Subject: mm: speed up writeback ramp-up on clean systems We allow violation of bdi limits if there is a lot of room on the system. 
Once we hit half the total limit we start enforcing bdi limits and bdi ramp-up
should happen. Doing it this way avoids many small writeouts on an otherwise
idle system and should also speed up the ramp-up.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Reviewed-by: Fengguang Wu <wfg@mail.ustc.edu.cn>
---
 mm/page-writeback.c |   19 +++++++++++++++++--
 1 file changed, 17 insertions(+), 2 deletions(-)

Index: linux-2.6/mm/page-writeback.c
===================================================================
--- linux-2.6.orig/mm/page-writeback.c	2007-09-28 10:08:33.937415368 +0200
+++ linux-2.6/mm/page-writeback.c	2007-09-28 10:54:26.018247516 +0200
@@ -355,8 +355,8 @@ get_dirty_limits(long *pbackground, long
  */
 static void balance_dirty_pages(struct address_space *mapping)
 {
-	long bdi_nr_reclaimable;
-	long bdi_nr_writeback;
+	long nr_reclaimable, bdi_nr_reclaimable;
+	long nr_writeback, bdi_nr_writeback;
 	long background_thresh;
 	long dirty_thresh;
 	long bdi_thresh;
@@ -376,11 +376,26 @@ static void balance_dirty_pages(struct a

 		get_dirty_limits(&background_thresh, &dirty_thresh,
 				&bdi_thresh, bdi);
+
+		nr_reclaimable = global_page_state(NR_FILE_DIRTY) +
+					global_page_state(NR_UNSTABLE_NFS);
+		nr_writeback = global_page_state(NR_WRITEBACK);
+
 		bdi_nr_reclaimable = bdi_stat(bdi, BDI_RECLAIMABLE);
 		bdi_nr_writeback = bdi_stat(bdi, BDI_WRITEBACK);
+
 		if (bdi_nr_reclaimable + bdi_nr_writeback <= bdi_thresh)
 			break;

+		/*
+		 * Throttle it only when the background writeback cannot
+		 * catch-up. This avoids (excessively) small writeouts
+		 * when the bdi limits are ramping up.
+		 */
+		if (nr_reclaimable + nr_writeback <
+				(background_thresh + dirty_thresh) / 2)
+			break;
+
 		if (!bdi->dirty_exceeded)
 			bdi->dirty_exceeded = 1;


[-- Attachment #4: bdi-task-dirty.patch --]
[-- Type: text/x-patch, Size: 1314 bytes --]

Subject: mm: bdi: tweak task dirty penalty

Penalizing heavy dirtiers with 1/8-th the total dirty limit might be rather
excessive on large memory machines.
Use sqrt to scale it sub-linearly.

Update the comment while we're there.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
 mm/page-writeback.c |   12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

Index: linux-2.6-2/mm/page-writeback.c
===================================================================
--- linux-2.6-2.orig/mm/page-writeback.c
+++ linux-2.6-2/mm/page-writeback.c
@@ -213,17 +213,21 @@ static inline void task_dirties_fraction
 }

 /*
- * scale the dirty limit
+ * Task specific dirty limit:
  *
- * task specific dirty limit:
+ *   dirty -= 8 * sqrt(dirty) * p_{t}
  *
- *   dirty -= (dirty/8) * p_{t}
+ * Penalize tasks that dirty a lot of pages by lowering their dirty limit. This
+ * avoids infrequent dirtiers from getting stuck in this other guys dirty
+ * pages.
+ *
+ * Use a sub-linear function to scale the penalty, we only need a little room.
  */
 void task_dirty_limit(struct task_struct *tsk, long *pdirty)
 {
 	long numerator, denominator;
 	long dirty = *pdirty;
-	u64 inv = dirty >> 3;
+	u64 inv = 8*int_sqrt(dirty);

 	task_dirties_fraction(tsk, &numerator, &denominator);
 	inv *= numerator;

[-- Attachment #5: bdi-sysfs.patch --]
[-- Type: text/x-patch, Size: 14227 bytes --]

Subject: mm: sysfs: expose the BDI object in sysfs

Provide a place in sysfs for the backing_dev_info object. This allows us to see
and set the various BDI specific variables.

In particular this properly exposes the read-ahead window for all relevant
users and /sys/block/<block>/queue/read_ahead_kb should be deprecated.

With patient help from Kay Sievers and Greg KH Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> --- block/genhd.c | 3 + fs/fuse/inode.c | 3 - fs/nfs/client.c | 24 +++++---- fs/nfs/internal.h | 10 ++-- fs/nfs/super.c | 10 ++-- include/linux/backing-dev.h | 19 +++++++ include/linux/writeback.h | 3 + lib/percpu_counter.c | 1 mm/backing-dev.c | 109 ++++++++++++++++++++++++++++++++++++++++++++ mm/page-writeback.c | 2 10 files changed, 163 insertions(+), 21 deletions(-) Index: linux-2.6-2/block/genhd.c =================================================================== --- linux-2.6-2.orig/block/genhd.c +++ linux-2.6-2/block/genhd.c @@ -182,6 +182,8 @@ void add_disk(struct gendisk *disk) disk->minors, NULL, exact_match, exact_lock, disk); register_disk(disk); blk_register_queue(disk); + bdi_register(&disk->queue->backing_dev_info, NULL, + "%s", disk->disk_name); } EXPORT_SYMBOL(add_disk); @@ -190,6 +192,7 @@ EXPORT_SYMBOL(del_gendisk); /* in partit void unlink_gendisk(struct gendisk *disk) { blk_unregister_queue(disk); + bdi_unregister(&disk->queue->backing_dev_info); blk_unregister_region(MKDEV(disk->major, disk->first_minor), disk->minors); } Index: linux-2.6-2/fs/fuse/inode.c =================================================================== --- linux-2.6-2.orig/fs/fuse/inode.c +++ linux-2.6-2/fs/fuse/inode.c @@ -467,7 +467,8 @@ static struct fuse_conn *new_conn(void) atomic_set(&fc->num_waiting, 0); fc->bdi.ra_pages = (VM_MAX_READAHEAD * 1024) / PAGE_CACHE_SIZE; fc->bdi.unplug_io_fn = default_unplug_io_fn; - err = bdi_init(&fc->bdi); + err = bdi_init_fmt(&fc->bdi, NULL, + "fuse-%llu", (unsigned long long)fc->id); if (err) { kfree(fc); fc = NULL; Index: linux-2.6-2/fs/nfs/client.c =================================================================== --- linux-2.6-2.orig/fs/nfs/client.c +++ linux-2.6-2/fs/nfs/client.c @@ -657,7 +657,8 @@ static void nfs_server_set_fsinfo(struct /* * Probe filesystem information, including the FSID on v2/v3 */ -static int 
nfs_probe_fsinfo(struct nfs_server *server, struct nfs_fh *mntfh, struct nfs_fattr *fattr) +static int nfs_probe_fsinfo(struct nfs_server *server, struct nfs_fh *mntfh, + struct nfs_fattr *fattr, const char *dev_name) { struct nfs_fsinfo fsinfo; struct nfs_client *clp = server->nfs_client; @@ -678,7 +679,8 @@ static int nfs_probe_fsinfo(struct nfs_s goto out_error; nfs_server_set_fsinfo(server, &fsinfo); - error = bdi_init(&server->backing_dev_info); + error = bdi_init_fmt(&server->backing_dev_info, NULL, + "nfs-%s", dev_name); if (error) goto out_error; @@ -772,7 +774,7 @@ void nfs_free_server(struct nfs_server * * - keyed on server and FSID */ struct nfs_server *nfs_create_server(const struct nfs_parsed_mount_data *data, - struct nfs_fh *mntfh) + struct nfs_fh *mntfh, const char *dev_name) { struct nfs_server *server; struct nfs_fattr fattr; @@ -792,7 +794,7 @@ struct nfs_server *nfs_create_server(con BUG_ON(!server->nfs_client->rpc_ops->file_inode_ops); /* Probe the root fh to retrieve its FSID */ - error = nfs_probe_fsinfo(server, mntfh, &fattr); + error = nfs_probe_fsinfo(server, mntfh, &fattr, dev_name); if (error < 0) goto error; if (server->nfs_client->rpc_ops->version == 3) { @@ -949,7 +951,7 @@ static int nfs4_init_server(struct nfs_s * - keyed on server and FSID */ struct nfs_server *nfs4_create_server(const struct nfs_parsed_mount_data *data, - struct nfs_fh *mntfh) + struct nfs_fh *mntfh, const char *dev_name) { struct nfs_fattr fattr; struct nfs_server *server; @@ -991,7 +993,7 @@ struct nfs_server *nfs4_create_server(co (unsigned long long) server->fsid.minor); dprintk("Mount FH: %d\n", mntfh->size); - error = nfs_probe_fsinfo(server, mntfh, &fattr); + error = nfs_probe_fsinfo(server, mntfh, &fattr, dev_name); if (error < 0) goto error; @@ -1021,7 +1023,8 @@ error: * Create an NFS4 referral server record */ struct nfs_server *nfs4_create_referral_server(struct nfs_clone_mount *data, - struct nfs_fh *mntfh) + struct nfs_fh *mntfh, + const char 
*dev_name) { struct nfs_client *parent_client; struct nfs_server *server, *parent_server; @@ -1066,7 +1069,7 @@ struct nfs_server *nfs4_create_referral_ goto error; /* probe the filesystem info for this server filesystem */ - error = nfs_probe_fsinfo(server, mntfh, &fattr); + error = nfs_probe_fsinfo(server, mntfh, &fattr, dev_name); if (error < 0) goto error; @@ -1100,7 +1103,8 @@ error: */ struct nfs_server *nfs_clone_server(struct nfs_server *source, struct nfs_fh *fh, - struct nfs_fattr *fattr) + struct nfs_fattr *fattr, + const char *dev_name) { struct nfs_server *server; struct nfs_fattr fattr_fsinfo; @@ -1128,7 +1132,7 @@ struct nfs_server *nfs_clone_server(stru nfs_init_server_aclclient(server); /* probe the filesystem info for this server filesystem */ - error = nfs_probe_fsinfo(server, fh, &fattr_fsinfo); + error = nfs_probe_fsinfo(server, fh, &fattr_fsinfo, dev_name); if (error < 0) goto out_free_server; Index: linux-2.6-2/include/linux/backing-dev.h =================================================================== --- linux-2.6-2.orig/include/linux/backing-dev.h +++ linux-2.6-2/include/linux/backing-dev.h @@ -11,6 +11,8 @@ #include <linux/percpu_counter.h> #include <linux/log2.h> #include <linux/proportions.h> +#include <linux/kernel.h> +#include <linux/device.h> #include <asm/atomic.h> struct page; @@ -48,11 +50,28 @@ struct backing_dev_info { struct prop_local_percpu completions; int dirty_exceeded; + + struct device *dev; }; int bdi_init(struct backing_dev_info *bdi); void bdi_destroy(struct backing_dev_info *bdi); +int bdi_register(struct backing_dev_info *bdi, struct device *parent, + const char *fmt, ...); +void bdi_unregister(struct backing_dev_info *bdi); + +#define bdi_init_fmt(bdi, parent, fmt...) 
\ + ({ \ + int ret = bdi_init(bdi); \ + if (!ret) { \ + ret = 0; /* bdi_register(bdi, parent, ##fmt); */ \ + if (ret) \ + bdi_destroy(bdi); \ + } \ + ret; \ + }) + static inline void __add_bdi_stat(struct backing_dev_info *bdi, enum bdi_stat_item item, s64 amount) { Index: linux-2.6-2/include/linux/writeback.h =================================================================== --- linux-2.6-2.orig/include/linux/writeback.h +++ linux-2.6-2/include/linux/writeback.h @@ -113,6 +113,9 @@ struct file; int dirty_writeback_centisecs_handler(struct ctl_table *, int, struct file *, void __user *, size_t *, loff_t *); +void get_dirty_limits(long *pbackground, long *pdirty, long *pbdi_dirty, + struct backing_dev_info *bdi); + void page_writeback_init(void); void balance_dirty_pages_ratelimited_nr(struct address_space *mapping, unsigned long nr_pages_dirtied); Index: linux-2.6-2/mm/backing-dev.c =================================================================== --- linux-2.6-2.orig/mm/backing-dev.c +++ linux-2.6-2/mm/backing-dev.c @@ -4,12 +4,119 @@ #include <linux/fs.h> #include <linux/sched.h> #include <linux/module.h> +#include <linux/writeback.h> +#include <linux/device.h> + + +static struct class *bdi_class; + +static ssize_t read_ahead_kb_store(struct device *dev, + struct device_attribute *attr, + const char *buf, size_t count) +{ + struct backing_dev_info *bdi = dev_get_drvdata(dev); + char *end; + + bdi->ra_pages = simple_strtoul(buf, &end, 10) >> (PAGE_SHIFT - 10); + + return end - buf; +} + +#define K(pages) ((pages) << (PAGE_SHIFT - 10)) + +#define BDI_SHOW(name, expr) \ +static ssize_t name##_show(struct device *dev, \ + struct device_attribute *attr, char *page) \ +{ \ + struct backing_dev_info *bdi = dev_get_drvdata(dev); \ + \ + return snprintf(page, PAGE_SIZE-1, "%lld\n", (long long)expr); \ +} + +BDI_SHOW(read_ahead_kb, K(bdi->ra_pages)) + +BDI_SHOW(reclaimable_kb, K(bdi_stat(bdi, BDI_RECLAIMABLE))) +BDI_SHOW(writeback_kb, K(bdi_stat(bdi, BDI_WRITEBACK))) + 
+static inline unsigned long get_dirty(struct backing_dev_info *bdi, int i) +{ + unsigned long thresh[3]; + + get_dirty_limits(&thresh[0], &thresh[1], &thresh[2], bdi); + + return thresh[i]; +} + +BDI_SHOW(dirty_kb, K(get_dirty(bdi, 1))) +BDI_SHOW(bdi_dirty_kb, K(get_dirty(bdi, 2))) + +#define __ATTR_RW(attr) __ATTR(attr, 0644, attr##_show, attr##_store) + +static struct device_attribute bdi_dev_attrs[] = { + __ATTR_RW(read_ahead_kb), + __ATTR_RO(reclaimable_kb), + __ATTR_RO(writeback_kb), + __ATTR_RO(dirty_kb), + __ATTR_RO(bdi_dirty_kb), + __ATTR_NULL, +}; + +static __init int bdi_class_init(void) +{ + bdi_class = class_create(THIS_MODULE, "bdi"); + bdi_class->dev_attrs = bdi_dev_attrs; + return 0; +} + +__initcall(bdi_class_init); + +int bdi_register(struct backing_dev_info *bdi, struct device *parent, + const char *fmt, ...) +{ + char *name; + va_list args; + int ret = 0; + struct device *dev; + + va_start(args, fmt); + name = kvasprintf(GFP_KERNEL, fmt, args); + va_end(args); + + if (!name) + return -ENOMEM; + + dev = device_create(bdi_class, parent, MKDEV(0,0), name); + if (IS_ERR(dev)) { + ret = PTR_ERR(dev); + goto exit; + } + + bdi->dev = dev; + dev_set_drvdata(bdi->dev, bdi); + +exit: + kfree(name); + return ret; +} + +void bdi_unregister(struct backing_dev_info *bdi) +{ + if (bdi->dev) { + device_unregister(bdi->dev); + bdi->dev = NULL; + } +} + +EXPORT_SYMBOL(bdi_register); +EXPORT_SYMBOL(bdi_unregister); int bdi_init(struct backing_dev_info *bdi) { int i, j; int err; + bdi->dev = NULL; + for (i = 0; i < NR_BDI_STAT_ITEMS; i++) { err = percpu_counter_init_irq(&bdi->bdi_stat[i], 0); if (err) @@ -33,6 +140,8 @@ void bdi_destroy(struct backing_dev_info { int i; + bdi_unregister(bdi); + for (i = 0; i < NR_BDI_STAT_ITEMS; i++) percpu_counter_destroy(&bdi->bdi_stat[i]); Index: linux-2.6-2/mm/page-writeback.c =================================================================== --- linux-2.6-2.orig/mm/page-writeback.c +++ linux-2.6-2/mm/page-writeback.c @@ -295,7 
+295,7 @@ static unsigned long determine_dirtyable return x + 1; /* Ensure that we never return 0 */ } -static void +void get_dirty_limits(long *pbackground, long *pdirty, long *pbdi_dirty, struct backing_dev_info *bdi) { Index: linux-2.6-2/lib/percpu_counter.c =================================================================== --- linux-2.6-2.orig/lib/percpu_counter.c +++ linux-2.6-2/lib/percpu_counter.c @@ -102,6 +102,7 @@ void percpu_counter_destroy(struct percp return; free_percpu(fbc->counters); + fbc->counters = NULL; #ifdef CONFIG_HOTPLUG_CPU mutex_lock(&percpu_counters_lock); list_del(&fbc->list); Index: linux-2.6-2/fs/nfs/internal.h =================================================================== --- linux-2.6-2.orig/fs/nfs/internal.h +++ linux-2.6-2/fs/nfs/internal.h @@ -65,16 +65,18 @@ extern void nfs_put_client(struct nfs_cl extern struct nfs_client *nfs_find_client(const struct sockaddr_in *, int); extern struct nfs_server *nfs_create_server( const struct nfs_parsed_mount_data *, - struct nfs_fh *); + struct nfs_fh *, const char *); extern struct nfs_server *nfs4_create_server( const struct nfs_parsed_mount_data *, - struct nfs_fh *); + struct nfs_fh *, const char *); extern struct nfs_server *nfs4_create_referral_server(struct nfs_clone_mount *, - struct nfs_fh *); + struct nfs_fh *, + const char *); extern void nfs_free_server(struct nfs_server *server); extern struct nfs_server *nfs_clone_server(struct nfs_server *, struct nfs_fh *, - struct nfs_fattr *); + struct nfs_fattr *, + const char *); #ifdef CONFIG_PROC_FS extern int __init nfs_fs_proc_init(void); extern void nfs_fs_proc_exit(void); Index: linux-2.6-2/fs/nfs/super.c =================================================================== --- linux-2.6-2.orig/fs/nfs/super.c +++ linux-2.6-2/fs/nfs/super.c @@ -1359,7 +1359,7 @@ static int nfs_get_sb(struct file_system goto out; /* Get a volume representation */ - server = nfs_create_server(&data, &mntfh); + server = nfs_create_server(&data, 
&mntfh, dev_name); if (IS_ERR(server)) { error = PTR_ERR(server); goto out; @@ -1442,7 +1442,7 @@ static int nfs_xdev_get_sb(struct file_s dprintk("--> nfs_xdev_get_sb()\n"); /* create a new volume representation */ - server = nfs_clone_server(NFS_SB(data->sb), data->fh, data->fattr); + server = nfs_clone_server(NFS_SB(data->sb), data->fh, data->fattr, dev_name); if (IS_ERR(server)) { error = PTR_ERR(server); goto out_err_noserver; @@ -1702,7 +1702,7 @@ static int nfs4_get_sb(struct file_syste goto out; /* Get a volume representation */ - server = nfs4_create_server(&data, &mntfh); + server = nfs4_create_server(&data, &mntfh, dev_name); if (IS_ERR(server)) { error = PTR_ERR(server); goto out; @@ -1787,7 +1787,7 @@ static int nfs4_xdev_get_sb(struct file_ dprintk("--> nfs4_xdev_get_sb()\n"); /* create a new volume representation */ - server = nfs_clone_server(NFS_SB(data->sb), data->fh, data->fattr); + server = nfs_clone_server(NFS_SB(data->sb), data->fh, data->fattr, dev_name); if (IS_ERR(server)) { error = PTR_ERR(server); goto out_err_noserver; @@ -1861,7 +1861,7 @@ static int nfs4_referral_get_sb(struct f dprintk("--> nfs4_referral_get_sb()\n"); /* create a new volume representation */ - server = nfs4_create_referral_server(data, &mntfh); + server = nfs4_create_referral_server(data, &mntfh, dev_name); if (IS_ERR(server)) { error = PTR_ERR(server); goto out_err_noserver; ^ permalink raw reply [flat|nested] 12+ messages in thread
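
For readers who want to play with the numbers, here is a rough userspace sketch of the arithmetic in the two mm/page-writeback.c patches above (writeback-early.patch and bdi-task-dirty.patch). It is not kernel code. The names isqrt_ul(), limit_linear(), limit_sqrt() and falls_through() are made up for illustration; isqrt_ul() merely stands in for the kernel's int_sqrt(); and dividing by the denominator is an assumption based on the "p_{t}" fraction in the patch comment, since the quoted hunk stops at "inv *= numerator". Any further clamping the kernel may apply is not modelled.

```c
#include <stdint.h>

/* Floor integer square root; hypothetical stand-in for the kernel's int_sqrt(). */
static unsigned long isqrt_ul(unsigned long x)
{
	unsigned long r = 0;
	unsigned long bit = 1UL << (sizeof(unsigned long) * 8 - 2);

	while (bit > x)
		bit >>= 2;
	while (bit) {
		if (x >= r + bit) {
			x -= r + bit;
			r = (r >> 1) + bit;
		} else {
			r >>= 1;
		}
		bit >>= 2;
	}
	return r;
}

/* Old rule (removed lines): dirty -= (dirty/8) * p_t, with p_t = num/den. */
static long limit_linear(long dirty, long num, long den)
{
	uint64_t inv = (uint64_t)(dirty >> 3);

	inv = inv * (uint64_t)num / (uint64_t)den;
	return dirty - (long)inv;
}

/* New rule (added lines): dirty -= 8 * sqrt(dirty) * p_t, with p_t = num/den. */
static long limit_sqrt(long dirty, long num, long den)
{
	uint64_t inv = 8 * (uint64_t)isqrt_ul((unsigned long)dirty);

	inv = inv * (uint64_t)num / (uint64_t)den;
	return dirty - (long)inv;
}

/*
 * The two early exits in balance_dirty_pages() after writeback-early.patch.
 * Returns 0 if either "break" would be taken, 1 if execution would fall
 * through to the dirty_exceeded handling.
 */
static int falls_through(long nr_reclaimable, long nr_writeback,
			 long bdi_nr_reclaimable, long bdi_nr_writeback,
			 long background_thresh, long dirty_thresh,
			 long bdi_thresh)
{
	/* This bdi is within its own share of the limit. */
	if (bdi_nr_reclaimable + bdi_nr_writeback <= bdi_thresh)
		return 0;
	/* Globally still below the midpoint of the two thresholds. */
	if (nr_reclaimable + nr_writeback <
			(background_thresh + dirty_thresh) / 2)
		return 0;
	return 1;
}
```

With a limit of 10000 pages and p_t = 1, the old rule lowers the limit by 1250 pages and the new one by 800. The two rules give the same penalty at 4096 pages (512 either way); below that the sqrt rule is actually the harsher one, above it the milder one, which fits the "large memory machines" motivation in the changelog.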
* Re: 2.6.24-rc1 - Regularly getting processes stuck in D state on startup
2007-11-06 6:46 ` Stephen Rothwell
2007-11-06 12:20 ` Peter Zijlstra
@ 2007-11-07 3:24 ` Stephen Rothwell
1 sibling, 0 replies; 12+ messages in thread
From: Stephen Rothwell @ 2007-11-07 3:24 UTC (permalink / raw)
To: David; +Cc: Linux Kernel Mailing List
[-- Attachment #1: Type: text/plain, Size: 625 bytes --]
On Tue, 6 Nov 2007 17:46:26 +1100 Stephen Rothwell <sfr@canb.auug.org.au> wrote:
>
> I am seeing something very similar on a PowerPC machine where copying a
> file from an LVM volume with ext3 on it to a simple scsi partition (again
> ext3) on the same disk will hang in congestion_wait. If I am patient
> enough, the copy makes very slow progress. A kill -9 will kill it
> eventually, but a simple control-C will not.

Turns out a simple control-C would kill the copy, I was just not patient
enough :-)

--
Cheers,
Stephen Rothwell sfr@canb.auug.org.au
http://www.canb.auug.org.au/~sfr/
[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]
^ permalink raw reply [flat|nested] 12+ messages in thread
[parent not found: <E1IpJLy-0002ag-TL@localhost>]
* Re: 2.6.24-rc1 - Regularly getting processes stuck in D state on startup
[not found] ` <E1IpJLy-0002ag-TL@localhost>
@ 2007-11-06 8:00 ` Fengguang Wu
2007-11-06 18:03 ` David
[not found] ` <E1IpJgH-0003H1-AD@localhost>
1 sibling, 1 reply; 12+ messages in thread
From: Fengguang Wu @ 2007-11-06 8:00 UTC (permalink / raw)
To: David; +Cc: Linux Kernel Mailing List
[-- Attachment #1: Type: text/plain, Size: 644 bytes --]
On Mon, Nov 05, 2007 at 06:23:07PM +0000, David wrote:
> I've been testing rc1 for a week or so, and about 25% of the time I'm
> seeing Firefox and Thunderbird getting stuck in 'D' state as they startup.
>
> I've attached the output of Sysrq-T to this mail... system is a
> dual-core AMD64, and files are on a RAID-1 root partition connected two
> SATA disks on the on-board NVidia controller. I've had no problems
> before .24 rc1

David, thank you for the report.

Could you try with the attached 4 patches?
Two of them are expected to fix your problem, another two are debugging
ones (in case the problem persists).

Thank you,
Fengguang

[-- Attachment #2: reiserfs-writeback-fix.patch --]
[-- Type: text/x-diff, Size: 3264 bytes --]

Subject: reiserfs: fix writeback

Reiserfs could leave newly created sub-page-size files in dirty state for ever.
They cannot be synced to disk by pdflush routines or an explicit `sync'
command. Only `umount' can do the trick.

This is not a new issue in 2.6.23-git17. 2.6.23 is buggy in the same way.

The direct cause is, the dirty page's PG_dirty is cleared on
reiserfs_file_release().

Call trace: [<ffffffff8027e920>] cancel_dirty_page+0xd0/0xf0 [<ffffffff8816d470>] :reiserfs:reiserfs_cut_from_item+0x660/0x710 [<ffffffff8816d791>] :reiserfs:reiserfs_do_truncate+0x271/0x530 [<ffffffff8815872d>] :reiserfs:reiserfs_truncate_file+0xfd/0x3b0 [<ffffffff8815d3d0>] :reiserfs:reiserfs_file_release+0x1e0/0x340 [<ffffffff802a187c>] __fput+0xcc/0x1b0 [<ffffffff802a1ba6>] fput+0x16/0x20 [<ffffffff8029e676>] filp_close+0x56/0x90 [<ffffffff8029fe0d>] sys_close+0xad/0x110 [<ffffffff8020c41e>] system_call+0x7e/0x83 Fix the problem by simply removing the cancel_dirty_page() call. Here are more detailed demonstrations of the problem: 1) the page has both PG_dirty(D)/PAGECACHE_TAG_DIRTY(d) after being written to; and then only PAGECACHE_TAG_DIRTY(d) remains after the file is closed. ------------------------------ screen 0 ------------------------------ [T0] root /home/wfg# cat > /test/tiny [T1] hi [T2] root /home/wfg# ------------------------------ screen 1 ------------------------------ [T1] root /home/wfg# echo /test/tiny > /proc/filecache [T1] root /home/wfg# cat /proc/filecache # file /test/tiny # flags R:referenced A:active M:mmap U:uptodate D:dirty W:writeback O:owner B:buffer d:dirty w:writeback # idx len state refcnt 0 1 ___UD__Bd_ 2 [T2] root /home/wfg# cat /proc/filecache # file /test/tiny # flags R:referenced A:active M:mmap U:uptodate D:dirty W:writeback O:owner B:buffer d:dirty w:writeback # idx len state refcnt 0 1 ___U___Bd_ 2 2) note the non-zero `cancelled_write_bytes' after /tmp/hi is copied. 
------------------------------ screen 0 ------------------------------ [T0] root /home/wfg# echo hi > /tmp/hi [T1] root /home/wfg# cp /tmp/hi /dev/stdin /test [T2] hi [T3] root /home/wfg# ------------------------------ screen 1 ------------------------------ [T1] root /proc/4397# cd /proc/`pidof cp` [T1] root /proc/4713# cat io rchar: 8396 wchar: 3 syscr: 20 syscw: 1 read_bytes: 0 write_bytes: 20480 cancelled_write_bytes: 4096 [T2] root /proc/4713# cat io rchar: 8399 wchar: 6 syscr: 21 syscw: 2 read_bytes: 0 write_bytes: 24576 cancelled_write_bytes: 4096 Cc: Maxim Levitsky <maximlevitsky@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn> --- fs/reiserfs/stree.c | 3 --- 1 file changed, 3 deletions(-) --- linux-2.6.24-git17.orig/fs/reiserfs/stree.c +++ linux-2.6.24-git17/fs/reiserfs/stree.c @@ -1458,9 +1458,6 @@ static void unmap_buffers(struct page *p } bh = next; } while (bh != head); - if (PAGE_SIZE == bh->b_size) { - cancel_dirty_page(page, PAGE_CACHE_SIZE); - } } } } [-- Attachment #3: mm-speed-up-writeback-ramp-up-on-clean-systems.patch --] [-- Type: text/x-diff, Size: 1827 bytes --] From: Peter Zijlstra <a.p.zijlstra@chello.nl> Subject: mm: speed up writeback ramp-up on clean systems We allow violation of bdi limits if there is a lot of room on the system. Once we hit half the total limit we start enforcing bdi limits and bdi ramp-up should happen. Doing it this way avoids many small writeouts on an otherwise idle system and should also speed up the ramp-up. 
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn> --- mm/page-writeback.c | 19 +++++++++++++++++-- 1 file changed, 17 insertions(+), 2 deletions(-) --- linux-2.6.24-git17.orig/mm/page-writeback.c +++ linux-2.6.24-git17/mm/page-writeback.c @@ -355,8 +355,8 @@ get_dirty_limits(long *pbackground, long */ static void balance_dirty_pages(struct address_space *mapping) { - long bdi_nr_reclaimable; - long bdi_nr_writeback; + long nr_reclaimable, bdi_nr_reclaimable; + long nr_writeback, bdi_nr_writeback; long background_thresh; long dirty_thresh; long bdi_thresh; @@ -376,11 +376,26 @@ static void balance_dirty_pages(struct a get_dirty_limits(&background_thresh, &dirty_thresh, &bdi_thresh, bdi); + + nr_reclaimable = global_page_state(NR_FILE_DIRTY) + + global_page_state(NR_UNSTABLE_NFS); + nr_writeback = global_page_state(NR_WRITEBACK); + bdi_nr_reclaimable = bdi_stat(bdi, BDI_RECLAIMABLE); bdi_nr_writeback = bdi_stat(bdi, BDI_WRITEBACK); + if (bdi_nr_reclaimable + bdi_nr_writeback <= bdi_thresh) break; + /* + * Throttle it only when the background writeback cannot + * catch-up. This avoids (excessively) small writeouts + * when the bdi limits are ramping up. 
+ */ + if (nr_reclaimable + nr_writeback < + (background_thresh + dirty_thresh) / 2) + break; + if (!bdi->dirty_exceeded) bdi->dirty_exceeded = 1; [-- Attachment #4: writeback-debug.patch --] [-- Type: text/x-diff, Size: 1926 bytes --] --- mm/page-writeback.c | 23 +++++++++++++++++++++++ 1 file changed, 23 insertions(+) --- linux-2.6.23-rc8-mm2.orig/mm/page-writeback.c +++ linux-2.6.23-rc8-mm2/mm/page-writeback.c @@ -98,6 +98,26 @@ EXPORT_SYMBOL(laptop_mode); /* End of sysctl-exported parameters */ +#define writeback_debug_report(n, wbc) do { \ + __writeback_debug_report(n, wbc, __FILE__, __LINE__, __FUNCTION__); \ +} while (0) + +void __writeback_debug_report(long n, struct writeback_control *wbc, + const char *file, int line, const char *func) +{ + printk("%s %d %s: %s(%d) %ld " + "global %lu %lu %lu " + "wc %c%c tw %ld sk %ld\n", + file, line, func, + current->comm, current->pid, n, + global_page_state(NR_FILE_DIRTY), + global_page_state(NR_WRITEBACK), + global_page_state(NR_UNSTABLE_NFS), + wbc->encountered_congestion ? 'C':'_', + wbc->more_io ? 
'M':'_', + wbc->nr_to_write, + wbc->pages_skipped); +} static void background_writeout(unsigned long _min_pages); @@ -404,6 +424,7 @@ static void balance_dirty_pages(struct a pages_written += write_chunk - wbc.nr_to_write; get_dirty_limits(&background_thresh, &dirty_thresh, &bdi_thresh, bdi); + writeback_debug_report(pages_written, &wbc); } /* @@ -568,6 +589,7 @@ static void background_writeout(unsigned wbc.pages_skipped = 0; writeback_inodes(&wbc); min_pages -= MAX_WRITEBACK_PAGES - wbc.nr_to_write; + writeback_debug_report(min_pages, &wbc); if (wbc.nr_to_write > 0 || wbc.pages_skipped > 0) { /* Wrote less than expected */ if (wbc.encountered_congestion) @@ -643,6 +665,7 @@ static void wb_kupdate(unsigned long arg wbc.encountered_congestion = 0; wbc.nr_to_write = MAX_WRITEBACK_PAGES; writeback_inodes(&wbc); + writeback_debug_report(nr_to_write, &wbc); if (wbc.nr_to_write > 0) { if (wbc.encountered_congestion) congestion_wait(WRITE, HZ/10); [-- Attachment #5: requeue_io-debug.patch --] [-- Type: text/x-diff, Size: 1140 bytes --] Subject: track redirty_tail() calls It helps a lot to know how redirty_tail() are called. Cc: Ken Chen <kenchen@google.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn> --- fs/fs-writeback.c | 16 +++++++++++++++- 1 file changed, 15 insertions(+), 1 deletion(-) --- linux-2.6.24-git17.orig/fs/fs-writeback.c +++ linux-2.6.24-git17/fs/fs-writeback.c @@ -164,12 +164,26 @@ static void redirty_tail(struct inode *i list_move(&inode->i_list, &sb->s_dirty); } +#define requeue_io(inode) \ + do { \ + __requeue_io(inode, __LINE__); \ + } while (0) + /* * requeue inode for re-scanning after sb->s_io list is exhausted. 
*/ -static void requeue_io(struct inode *inode) +static void __requeue_io(struct inode *inode, int line) { list_move(&inode->i_list, &inode->i_sb->s_more_io); + + printk(KERN_DEBUG "requeue_io %d: inode %lu size %llu at %02x:%02x(%s)\n", + line, + inode->i_ino, + i_size_read(inode), + MAJOR(inode->i_sb->s_dev), + MINOR(inode->i_sb->s_dev), + inode->i_sb->s_id + ); } static void inode_sync_complete(struct inode *inode) ^ permalink raw reply [flat|nested] 12+ messages in thread
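The `cancelled_write_bytes' figure in demonstration 2) of the reiserfs patch is read from /proc/<pid>/io. For anyone who wants to watch that counter from a test program rather than with `cat io', here is a minimal sketch in C. The helper names io_field() and io_field_from_file() are made up for this example and are not part of any patch in this thread; the only assumption about the file is the "key: value" layout shown in the transcript.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: scan text laid out like /proc/<pid>/io
 * ("key: value", one pair per line) and return the value stored
 * under `key`, or -1 if the key is absent or has no number.
 */
long long io_field(const char *text, const char *key)
{
	size_t klen = strlen(key);
	const char *line = text;

	while (line && *line) {
		/* The key must start the line and be followed by ':'. */
		if (strncmp(line, key, klen) == 0 && line[klen] == ':') {
			long long val;

			if (sscanf(line + klen + 1, "%lld", &val) == 1)
				return val;
			return -1;
		}
		/* Advance to the start of the next line, if any. */
		line = strchr(line, '\n');
		if (line)
			line++;
	}
	return -1;
}

/*
 * Hypothetical wrapper: read a whole io file (it is small) and look
 * up one field. Returns -1 if the file cannot be opened.
 */
long long io_field_from_file(const char *path, const char *key)
{
	char buf[1024];
	size_t n;
	FILE *f = fopen(path, "r");

	if (!f)
		return -1;
	n = fread(buf, 1, sizeof(buf) - 1, f);
	fclose(f);
	buf[n] = '\0';
	return io_field(buf, key);
}
```

A caller could use io_field_from_file("/proc/self/io", "cancelled_write_bytes") before and after closing a small file on the filesystem under test. The file only exists on kernels built with task I/O accounting, so a return value of -1 may simply mean the file is not there.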
* Re: 2.6.24-rc1 - Regularly getting processes stuck in D state on startup
2007-11-06 8:00 ` Fengguang Wu
@ 2007-11-06 18:03 ` David
0 siblings, 0 replies; 12+ messages in thread
From: David @ 2007-11-06 18:03 UTC (permalink / raw)
To: Fengguang Wu; +Cc: Linux Kernel Mailing List
Fengguang Wu wrote:
> On Mon, Nov 05, 2007 at 06:23:07PM +0000, David wrote:
>
>> I've attached the output of Sysrq-T to this mail... system is a
>> dual-core AMD64, and files are on a RAID-1 root partition connected two
>> SATA disks on the on-board NVidia controller. I've had no problems
>> before .24 rc1
>>
>
> Could you try with the attached 4 patches? Two of them are expected to
> fix your problem, another two are debugging ones (in case the problem
> persists).
>
>
I've applied the patches, and have tried a few reboots with no problems
so far. I will report back if I see any further problems.

Thanks
David
[parent not found: <E1IpJgH-0003H1-AD@localhost>]
* Re: 2.6.24-rc1 - Regularly getting processes stuck in D state on startup
[not found] ` <E1IpJgH-0003H1-AD@localhost>
@ 2007-11-06 8:21 ` Fengguang Wu
2007-11-07 3:17 ` Stephen Rothwell
0 siblings, 1 reply; 12+ messages in thread
From: Fengguang Wu @ 2007-11-06 8:21 UTC (permalink / raw)
To: David
Cc: Stephen Rothwell, Andrew Morton, Linux Kernel Mailing List, Peter Zijlstra

[added CC list]

On Tue, Nov 06, 2007 at 04:00:06PM +0800, Fengguang Wu wrote:
> On Mon, Nov 05, 2007 at 06:23:07PM +0000, David wrote:
> > I've been testing rc1 for a week or so, and about 25% of the time I'm
> > seeing Firefox and Thunderbird getting stuck in 'D' state as they startup.
> >
> > I've attached the output of Sysrq-T to this mail... system is a
> > dual-core AMD64, and files are on a RAID-1 root partition connected two
> > SATA disks on the on-board NVidia controller. I've had no problems
> > before .24 rc1
>
> David, thank you for the report.
>
> Could you try with the attached 4 patches? Two of them are expected to
> fix your problem, another two are debugging ones (in case the problem
> persists).
>
> Thank you,
> Fengguang

> Subject: reiserfs: fix writeback
>
> Reiserfs could leave newly created sub-page-size files in dirty state forever.
> They cannot be synced to disk by pdflush routines or an explicit `sync' command.
> Only `umount' can do the trick.
>
> This is not a new issue in 2.6.23-git17. 2.6.23 is buggy in the same way.
>
> The direct cause is, the dirty page's PG_dirty is cleared on
> reiserfs_file_release(). Call trace:
>
> [<ffffffff8027e920>] cancel_dirty_page+0xd0/0xf0
> [<ffffffff8816d470>] :reiserfs:reiserfs_cut_from_item+0x660/0x710
> [<ffffffff8816d791>] :reiserfs:reiserfs_do_truncate+0x271/0x530
> [<ffffffff8815872d>] :reiserfs:reiserfs_truncate_file+0xfd/0x3b0
> [<ffffffff8815d3d0>] :reiserfs:reiserfs_file_release+0x1e0/0x340
> [<ffffffff802a187c>] __fput+0xcc/0x1b0
> [<ffffffff802a1ba6>] fput+0x16/0x20
> [<ffffffff8029e676>] filp_close+0x56/0x90
> [<ffffffff8029fe0d>] sys_close+0xad/0x110
> [<ffffffff8020c41e>] system_call+0x7e/0x83
>
> Fix the problem by simply removing the cancel_dirty_page() call.
>
>
> Here are more detailed demonstrations of the problem:
>
> 1) the page has both PG_dirty(D)/PAGECACHE_TAG_DIRTY(d) after being written to;
> and then only PAGECACHE_TAG_DIRTY(d) remains after the file is closed.
>
> ------------------------------ screen 0 ------------------------------
> [T0] root /home/wfg# cat > /test/tiny
> [T1] hi
> [T2] root /home/wfg#
>
> ------------------------------ screen 1 ------------------------------
> [T1] root /home/wfg# echo /test/tiny > /proc/filecache
> [T1] root /home/wfg# cat /proc/filecache
> # file /test/tiny
> # flags R:referenced A:active M:mmap U:uptodate D:dirty W:writeback O:owner B:buffer d:dirty w:writeback
> # idx len state refcnt
> 0 1 ___UD__Bd_ 2
> [T2] root /home/wfg# cat /proc/filecache
> # file /test/tiny
> # flags R:referenced A:active M:mmap U:uptodate D:dirty W:writeback O:owner B:buffer d:dirty w:writeback
> # idx len state refcnt
> 0 1 ___U___Bd_ 2
>
> 2) note the non-zero `cancelled_write_bytes' after /tmp/hi is copied.
>
> ------------------------------ screen 0 ------------------------------
> [T0] root /home/wfg# echo hi > /tmp/hi
> [T1] root /home/wfg# cp /tmp/hi /dev/stdin /test
> [T2] hi
> [T3] root /home/wfg#
>
> ------------------------------ screen 1 ------------------------------
> [T1] root /proc/4397# cd /proc/`pidof cp`
> [T1] root /proc/4713# cat io
> rchar: 8396
> wchar: 3
> syscr: 20
> syscw: 1
> read_bytes: 0
> write_bytes: 20480
> cancelled_write_bytes: 4096
> [T2] root /proc/4713# cat io
> rchar: 8399
> wchar: 6
> syscr: 21
> syscw: 2
> read_bytes: 0
> write_bytes: 24576
> cancelled_write_bytes: 4096
>
> Cc: Maxim Levitsky <maximlevitsky@gmail.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn>
> ---
>  fs/reiserfs/stree.c | 3 ---
>  1 file changed, 3 deletions(-)
>
> --- linux-2.6.24-git17.orig/fs/reiserfs/stree.c
> +++ linux-2.6.24-git17/fs/reiserfs/stree.c
> @@ -1458,9 +1458,6 @@ static void unmap_buffers(struct page *p
>  				}
>  				bh = next;
>  			} while (bh != head);
> -			if (PAGE_SIZE == bh->b_size) {
> -				cancel_dirty_page(page, PAGE_CACHE_SIZE);
> -			}
>  		}
>  	}
>  }

> From: Peter Zijlstra <a.p.zijlstra@chello.nl>
> Subject: mm: speed up writeback ramp-up on clean systems
>
> We allow violation of bdi limits if there is a lot of room on the
> system. Once we hit half the total limit we start enforcing bdi limits
> and bdi ramp-up should happen. Doing it this way avoids many small
> writeouts on an otherwise idle system and should also speed up the
> ramp-up.
>
> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
> Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn>
> ---
>  mm/page-writeback.c | 19 +++++++++++++++++--
>  1 file changed, 17 insertions(+), 2 deletions(-)
>
> --- linux-2.6.24-git17.orig/mm/page-writeback.c
> +++ linux-2.6.24-git17/mm/page-writeback.c
> @@ -355,8 +355,8 @@ get_dirty_limits(long *pbackground, long
>   */
>  static void balance_dirty_pages(struct address_space *mapping)
>  {
> -	long bdi_nr_reclaimable;
> -	long bdi_nr_writeback;
> +	long nr_reclaimable, bdi_nr_reclaimable;
> +	long nr_writeback, bdi_nr_writeback;
>  	long background_thresh;
>  	long dirty_thresh;
>  	long bdi_thresh;
> @@ -376,11 +376,26 @@ static void balance_dirty_pages(struct a
>
>  		get_dirty_limits(&background_thresh, &dirty_thresh,
>  				&bdi_thresh, bdi);
> +
> +		nr_reclaimable = global_page_state(NR_FILE_DIRTY) +
> +					global_page_state(NR_UNSTABLE_NFS);
> +		nr_writeback = global_page_state(NR_WRITEBACK);
> +
>  		bdi_nr_reclaimable = bdi_stat(bdi, BDI_RECLAIMABLE);
>  		bdi_nr_writeback = bdi_stat(bdi, BDI_WRITEBACK);
> +
>  		if (bdi_nr_reclaimable + bdi_nr_writeback <= bdi_thresh)
>  			break;
>
> +		/*
> +		 * Throttle it only when the background writeback cannot
> +		 * catch-up. This avoids (excessively) small writeouts
> +		 * when the bdi limits are ramping up.
> +		 */
> +		if (nr_reclaimable + nr_writeback <
> +				(background_thresh + dirty_thresh) / 2)
> +			break;
> +
>  		if (!bdi->dirty_exceeded)
>  			bdi->dirty_exceeded = 1;
>

> ---
>  mm/page-writeback.c | 23 +++++++++++++++++++++++
>  1 file changed, 23 insertions(+)
>
> --- linux-2.6.23-rc8-mm2.orig/mm/page-writeback.c
> +++ linux-2.6.23-rc8-mm2/mm/page-writeback.c
> @@ -98,6 +98,26 @@ EXPORT_SYMBOL(laptop_mode);
>
>  /* End of sysctl-exported parameters */
>
> +#define writeback_debug_report(n, wbc) do { \
> +	__writeback_debug_report(n, wbc, __FILE__, __LINE__, __FUNCTION__); \
> +} while (0)
> +
> +void __writeback_debug_report(long n, struct writeback_control *wbc,
> +		const char *file, int line, const char *func)
> +{
> +	printk("%s %d %s: %s(%d) %ld "
> +		"global %lu %lu %lu "
> +		"wc %c%c tw %ld sk %ld\n",
> +		file, line, func,
> +		current->comm, current->pid, n,
> +		global_page_state(NR_FILE_DIRTY),
> +		global_page_state(NR_WRITEBACK),
> +		global_page_state(NR_UNSTABLE_NFS),
> +		wbc->encountered_congestion ? 'C':'_',
> +		wbc->more_io ? 'M':'_',
> +		wbc->nr_to_write,
> +		wbc->pages_skipped);
> +}
>
>  static void background_writeout(unsigned long _min_pages);
>
> @@ -404,6 +424,7 @@ static void balance_dirty_pages(struct a
>  			pages_written += write_chunk - wbc.nr_to_write;
>  			get_dirty_limits(&background_thresh, &dirty_thresh,
>  					&bdi_thresh, bdi);
> +			writeback_debug_report(pages_written, &wbc);
>  		}
>
>  		/*
> @@ -568,6 +589,7 @@ static void background_writeout(unsigned
>  		wbc.pages_skipped = 0;
>  		writeback_inodes(&wbc);
>  		min_pages -= MAX_WRITEBACK_PAGES - wbc.nr_to_write;
> +		writeback_debug_report(min_pages, &wbc);
>  		if (wbc.nr_to_write > 0 || wbc.pages_skipped > 0) {
>  			/* Wrote less than expected */
>  			if (wbc.encountered_congestion)
> @@ -643,6 +665,7 @@ static void wb_kupdate(unsigned long arg
>  		wbc.encountered_congestion = 0;
>  		wbc.nr_to_write = MAX_WRITEBACK_PAGES;
>  		writeback_inodes(&wbc);
> +		writeback_debug_report(nr_to_write, &wbc);
>  		if (wbc.nr_to_write > 0) {
>  			if (wbc.encountered_congestion)
>  				congestion_wait(WRITE, HZ/10);

> Subject: track redirty_tail() calls
>
> It helps a lot to know how redirty_tail() is called.
>
> Cc: Ken Chen <kenchen@google.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn>
> ---
>  fs/fs-writeback.c | 16 +++++++++++++++-
>  1 file changed, 15 insertions(+), 1 deletion(-)
>
> --- linux-2.6.24-git17.orig/fs/fs-writeback.c
> +++ linux-2.6.24-git17/fs/fs-writeback.c
> @@ -164,12 +164,26 @@ static void redirty_tail(struct inode *i
>  	list_move(&inode->i_list, &sb->s_dirty);
>  }
>
> +#define requeue_io(inode) \
> +	do { \
> +		__requeue_io(inode, __LINE__); \
> +	} while (0)
> +
>  /*
>   * requeue inode for re-scanning after sb->s_io list is exhausted.
>   */
> -static void requeue_io(struct inode *inode)
> +static void __requeue_io(struct inode *inode, int line)
>  {
>  	list_move(&inode->i_list, &inode->i_sb->s_more_io);
> +
> +	printk(KERN_DEBUG "requeue_io %d: inode %lu size %llu at %02x:%02x(%s)\n",
> +			line,
> +			inode->i_ino,
> +			i_size_read(inode),
> +			MAJOR(inode->i_sb->s_dev),
> +			MINOR(inode->i_sb->s_dev),
> +			inode->i_sb->s_id
> +			);
>  }
>
>  static void inode_sync_complete(struct inode *inode)
* Re: 2.6.24-rc1 - Regularly getting processes stuck in D state on startup
2007-11-06 8:21 ` Fengguang Wu
@ 2007-11-07 3:17 ` Stephen Rothwell
2007-11-07 3:26 ` Stephen Rothwell
0 siblings, 1 reply; 12+ messages in thread
From: Stephen Rothwell @ 2007-11-07 3:17 UTC (permalink / raw)
To: Fengguang Wu
Cc: David, Andrew Morton, Linux Kernel Mailing List, Peter Zijlstra
[-- Attachment #1: Type: text/plain, Size: 462 bytes --]
On Tue, Nov 06, 2007 at 04:00:06PM +0800, Fengguang Wu wrote:
>
> Could you try with the attached 4 patches? Two of them are expected to
> fix your problem, another two are debugging ones (in case the problem
> persists).

Applying these four patches fixes it for me. Obviously the reiserfs patch
was not relevant in my case (only using ext3).

--
Cheers,
Stephen Rothwell sfr@canb.auug.org.au
http://www.canb.auug.org.au/~sfr/
[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]
* Re: 2.6.24-rc1 - Regularly getting processes stuck in D state on startup
2007-11-07 3:17 ` Stephen Rothwell
@ 2007-11-07 3:26 ` Stephen Rothwell
[not found] ` <E1IpegZ-0001aK-OJ@localhost>
0 siblings, 1 reply; 12+ messages in thread
From: Stephen Rothwell @ 2007-11-07 3:26 UTC (permalink / raw)
To: Fengguang Wu
Cc: David, Andrew Morton, Linux Kernel Mailing List, Peter Zijlstra
[-- Attachment #1: Type: text/plain, Size: 699 bytes --]
On Wed, 7 Nov 2007 14:17:17 +1100 Stephen Rothwell <sfr@canb.auug.org.au> wrote:
>
> On Tue, Nov 06, 2007 at 04:00:06PM +0800, Fengguang Wu wrote:
> >
> > Could you try with the attached 4 patches? Two of them are expected to
> > fix your problem, another two are debugging ones (in case the problem
> > persists).
>
> Applying these four patches fixes it for me. Obviously the reiserfs patch
> was not relevant in my case (only using ext3).

I am now running on a kernel with just the
mm-speed-up-writeback-ramp-up-on-clean-systems.patch applied and I am
seeing no hangs.

--
Cheers,
Stephen Rothwell sfr@canb.auug.org.au
http://www.canb.auug.org.au/~sfr/
[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]
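Since mm-speed-up-writeback-ramp-up-on-clean-systems.patch on its own is reported to cure the hang, it may help to see the decision it introduces in isolation. The following is a rough userspace model of the two early-exit checks as they read in the posted diff, not kernel code: `struct dirty_state' and should_throttle() are invented for this sketch, and the fields stand in for the values that balance_dirty_pages() obtains from global_page_state(), bdi_stat() and get_dirty_limits().

```c
#include <stdbool.h>

/*
 * Hypothetical snapshot of the counters balance_dirty_pages() looks at.
 * All values are page counts.
 */
struct dirty_state {
	long nr_reclaimable;     /* global: NR_FILE_DIRTY + NR_UNSTABLE_NFS */
	long nr_writeback;       /* global: NR_WRITEBACK */
	long bdi_nr_reclaimable; /* this device: BDI_RECLAIMABLE */
	long bdi_nr_writeback;   /* this device: BDI_WRITEBACK */
	long background_thresh;
	long dirty_thresh;
	long bdi_thresh;
};

/*
 * Returns true when the writer would go on to be throttled, false when
 * either of the two "break" conditions from the patch applies.
 */
bool should_throttle(const struct dirty_state *s)
{
	/* Existing check: the device is within its own limit. */
	if (s->bdi_nr_reclaimable + s->bdi_nr_writeback <= s->bdi_thresh)
		return false;

	/*
	 * Check added by the patch: the device is over its (still
	 * ramping-up) limit, but the system as a whole is below the
	 * midpoint of the background and dirty thresholds.
	 */
	if (s->nr_reclaimable + s->nr_writeback <
	    (s->background_thresh + s->dirty_thresh) / 2)
		return false;

	return true;
}
```

With a tiny bdi_thresh on a freshly booted system, the first check alone would throttle almost every writer; the second check lets writers through until the global dirty and writeback total reaches the midpoint, which matches the patch description of enforcing bdi limits only "once we hit half the total limit".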
[parent not found: <E1IpegZ-0001aK-OJ@localhost>]
* Re: 2.6.24-rc1 - Regularly getting processes stuck in D state on startup
[not found] ` <E1IpegZ-0001aK-OJ@localhost>
@ 2007-11-07 6:46 ` Fengguang Wu
2007-11-13 5:11 ` Stephen Rothwell
0 siblings, 1 reply; 12+ messages in thread
From: Fengguang Wu @ 2007-11-07 6:46 UTC (permalink / raw)
To: Stephen Rothwell
Cc: David, Andrew Morton, Linux Kernel Mailing List, Peter Zijlstra
On Wed, Nov 07, 2007 at 02:26:09PM +1100, Stephen Rothwell wrote:
> On Wed, 7 Nov 2007 14:17:17 +1100 Stephen Rothwell <sfr@canb.auug.org.au> wrote:
> >
> > On Tue, Nov 06, 2007 at 04:00:06PM +0800, Fengguang Wu wrote:
> > >
> > > Could you try with the attached 4 patches? Two of them are expected to
> > > fix your problem, another two are debugging ones (in case the problem
> > > persists).
> >
> > Applying these four patches fixes it for me. Obviously the reiserfs patch
> > was not relevant in my case (only using ext3).
>
> I am now running on a kernel with just the
> mm-speed-up-writeback-ramp-up-on-clean-systems.patch applied and I am
> seeing no hangs.

Thank you (including David :-)) for the confirmation.

Andrew: so mm-speed-up-writeback-ramp-up-on-clean-systems.patch is a
safe and working patch ;-)

Fengguang
* Re: 2.6.24-rc1 - Regularly getting processes stuck in D state on startup
2007-11-07 6:46 ` Fengguang Wu
@ 2007-11-13 5:11 ` Stephen Rothwell
2007-11-13 5:29 ` Andrew Morton
0 siblings, 1 reply; 12+ messages in thread
From: Stephen Rothwell @ 2007-11-13 5:11 UTC (permalink / raw)
To: Fengguang Wu
Cc: David, Andrew Morton, Linux Kernel Mailing List, Peter Zijlstra
[-- Attachment #1: Type: text/plain, Size: 469 bytes --]
On Wed, 7 Nov 2007 14:46:47 +0800 Fengguang Wu <wfg@mail.ustc.edu.cn> wrote:
>
> Thank you (including David :-)) for the confirmation.
>
> Andrew: so mm-speed-up-writeback-ramp-up-on-clean-systems.patch is a
> safe and working patch ;-)

So is anything happening with this patch? It is really necessary to have
it (or something equivalent) in 2.6.24.

--
Cheers,
Stephen Rothwell sfr@canb.auug.org.au
http://www.canb.auug.org.au/~sfr/
[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]
* Re: 2.6.24-rc1 - Regularly getting processes stuck in D state on startup
2007-11-13 5:11 ` Stephen Rothwell
@ 2007-11-13 5:29 ` Andrew Morton
0 siblings, 0 replies; 12+ messages in thread
From: Andrew Morton @ 2007-11-13 5:29 UTC (permalink / raw)
To: Stephen Rothwell
Cc: Fengguang Wu, David, Linux Kernel Mailing List, Peter Zijlstra
On Tue, 13 Nov 2007 16:11:45 +1100 Stephen Rothwell <sfr@canb.auug.org.au> wrote:
> On Wed, 7 Nov 2007 14:46:47 +0800 Fengguang Wu <wfg@mail.ustc.edu.cn> wrote:
> >
> > Thank you (including David :-)) for the confirmation.
> >
> > Andrew: so mm-speed-up-writeback-ramp-up-on-clean-systems.patch is a
> > safe and working patch ;-)
>
> So is anything happening with this patch? It is really necessary to have
> it (or something equivalent) in 2.6.24.
>

It's in my queue of 2.6.24 stuff. Along with, umm, 112 other patches.
Thread overview: 12+ messages
2007-11-05 18:23 2.6.24-rc1 - Regularly getting processes stuck in D state on startup David
2007-11-06 6:46 ` Stephen Rothwell
2007-11-06 12:20 ` Peter Zijlstra
2007-11-07 3:24 ` Stephen Rothwell
[not found] ` <E1IpJLy-0002ag-TL@localhost>
2007-11-06 8:00 ` Fengguang Wu
2007-11-06 18:03 ` David
[not found] ` <E1IpJgH-0003H1-AD@localhost>
2007-11-06 8:21 ` Fengguang Wu
2007-11-07 3:17 ` Stephen Rothwell
2007-11-07 3:26 ` Stephen Rothwell
[not found] ` <E1IpegZ-0001aK-OJ@localhost>
2007-11-07 6:46 ` Fengguang Wu
2007-11-13 5:11 ` Stephen Rothwell
2007-11-13 5:29 ` Andrew Morton