* Temporary lockup on loopback block device
From: Mikulas Patocka @ 2007-11-10 19:51 UTC
To: linux-kernel
Hi
I am experiencing a transient lockup in 'D' state with a loopback device. It
happens when a process writes to a filesystem on a loopback device with a
command like
dd if=/dev/zero of=/s/fill bs=4k
The CPU is idle and the disk is idle too, yet the dd process is waiting in
'D' state in congestion_wait, called from balance_dirty_pages.
After about 30 seconds the lockup clears and dd resumes, but it soon locks
up again.
I added a printk to balance_dirty_pages:
printk("wait: nr_reclaimable %d, nr_writeback %d, dirty_thresh %d, "
       "pages_written %d, write_chunk %d\n",
       nr_reclaimable, global_page_state(NR_WRITEBACK), dirty_thresh,
       pages_written, write_chunk);
and it shows this during the lockup:
wait: nr_reclaimable 3099, nr_writeback 0, dirty_thresh 2985, pages_written 1021, write_chunk 1522
wait: nr_reclaimable 3099, nr_writeback 0, dirty_thresh 2985, pages_written 1021, write_chunk 1522
wait: nr_reclaimable 3099, nr_writeback 0, dirty_thresh 2985, pages_written 1021, write_chunk 1522
What apparently happens:
writeback_inodes syncs inodes only on the given wbc->bdi, however
balance_dirty_pages checks against global counts of dirty pages. So if
there's nothing to sync on a given device, but there are other dirty pages
so that the counts are over the limit, it will loop without doing any
work.
To reproduce it, you need a totally idle machine (no GUI, etc.) -- if
something writes to the backing device, it flushes the dirty pages
generated by the loopback and the lockup is gone. If you add the printk,
don't forget to stop klogd, otherwise the logging would end the lockup.
The hotfix (that I verified to work) is to not set wbc->bdi, so that all
devices are flushed ... but the code probably needs some redesign (i.e.
either account per-device and flush per-device, or account-global and
flush-global).
Mikulas
diff -u -r ../x/linux-2.6.23.1/mm/page-writeback.c mm/page-writeback.c
--- ../x/linux-2.6.23.1/mm/page-writeback.c 2007-10-12 18:43:44.000000000 +0200
+++ mm/page-writeback.c 2007-11-10 20:32:43.000000000 +0100
@@ -214,7 +214,6 @@
 
         for (;;) {
                 struct writeback_control wbc = {
-                        .bdi = bdi,
                         .sync_mode = WB_SYNC_NONE,
                         .older_than_this = NULL,
                         .nr_to_write = write_chunk,
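
The stall and the effect of the hotfix can be modelled in user space. What follows is a minimal sketch, not kernel code: `struct dirty_model`, `flush_device` and `idle_passes` are hypothetical names, and the split of the 3099 dirty pages (none on the loop device, all on the backing device) is an assumption read off the report above. The model only captures the mismatch between a global limit check and a per-device flush; it leaves out the real loop's other exit conditions, such as pages_written reaching write_chunk.

```c
#define NDEV 2	/* assumed: device 0 = loop device, device 1 = backing disk */

/* Toy state: dirty pages accounted per device, plus one global threshold. */
struct dirty_model {
	long dirty[NDEV];
	long dirty_thresh;
};

static long total_dirty(const struct dirty_model *m)
{
	long sum = 0;

	for (int i = 0; i < NDEV; i++)
		sum += m->dirty[i];
	return sum;
}

/* Write back at most nr pages from one device; returns pages written. */
static long flush_device(struct dirty_model *m, int dev, long nr)
{
	long n = m->dirty[dev] < nr ? m->dirty[dev] : nr;

	m->dirty[dev] -= n;
	return n;
}

/*
 * Model of the throttling loop. scope >= 0 flushes only that device
 * (like setting wbc->bdi); scope < 0 flushes every device (the hotfix).
 * Returns the number of passes that wrote nothing, i.e. passes that
 * would have ended in a sleep, capped at max_passes.
 */
long idle_passes(struct dirty_model *m, int scope, long write_chunk,
		 int max_passes)
{
	long idle = 0;

	for (int pass = 0; pass < max_passes; pass++) {
		long written = 0;

		if (total_dirty(m) <= m->dirty_thresh)
			break;	/* under the global limit: stop throttling */
		if (scope >= 0) {
			written = flush_device(m, scope, write_chunk);
		} else {
			for (int d = 0; d < NDEV; d++)
				written += flush_device(m, d, write_chunk - written);
		}
		if (written == 0)
			idle++;
	}
	return idle;
}
```

With the numbers from the printk output (3099 dirty pages, dirty_thresh 2985, write_chunk 1522), a call scoped to device 0 makes no progress on any pass, while a call with scope -1 writes 1522 pages on the first pass and then drops below the limit.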

* Re: Temporary lockup on loopback block device
From: Andrew Morton @ 2007-11-10 22:54 UTC
To: Mikulas Patocka; +Cc: linux-kernel, Peter Zijlstra, WU Fengguang

On Sat, 10 Nov 2007 20:51:31 +0100 (CET) Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz> wrote:

> Hi
>
> I am experiencing a transient lockup in 'D' state with loopback device. It
> happens when process writes to a filesystem in loopback with command like
> dd if=/dev/zero of=/s/fill bs=4k
>
> CPU is idle, disk is idle too, yet the dd process is waiting in 'D' in
> congestion_wait called from balance_dirty_pages.
>
> After about 30 seconds, the lockup is gone and dd resumes, but it locks up
> soon again.
>
> I added a printk to the balance_dirty_pages
> printk("wait: nr_reclaimable %d, nr_writeback %d, dirty_thresh %d,
> pages_written %d, write_chunk %d\n", nr_reclaimable,
> global_page_state(NR_WRITEBACK), dirty_thresh, pages_written,
> write_chunk);
>
> and it shows this during the lockup:
>
> wait: nr_reclaimable 3099, nr_writeback 0, dirty_thresh 2985,
> pages_written 1021, write_chunk 1522
> wait: nr_reclaimable 3099, nr_writeback 0, dirty_thresh 2985,
> pages_written 1021, write_chunk 1522
> wait: nr_reclaimable 3099, nr_writeback 0, dirty_thresh 2985,
> pages_written 1021, write_chunk 1522
>
> What apparently happens:
>
> writeback_inodes syncs inodes only on the given wbc->bdi, however
> balance_dirty_pages checks against global counts of dirty pages. So if
> there's nothing to sync on a given device, but there are other dirty pages
> so that the counts are over the limit, it will loop without doing any
> work.
>
> To reproduce it, you need totally idle machine (no GUI, etc.) -- if
> something writes to the backing device, it flushes the dirty pages
> generated by the loopback and the lockup is gone. If you add printk, don't
> forget to stop klogd, otherwise logging would end the lockup.

erk.

> The hotfix (that I verified to work) is to not set wbc->bdi, so that all
> devices are flushed ... but the code probably needs some redesign (i.e.
> either account per-device and flush per-device, or account-global and
> flush-global).
>
> Mikulas
>
>
> diff -u -r ../x/linux-2.6.23.1/mm/page-writeback.c mm/page-writeback.c
> --- ../x/linux-2.6.23.1/mm/page-writeback.c 2007-10-12 18:43:44.000000000 +0200
> +++ mm/page-writeback.c 2007-11-10 20:32:43.000000000 +0100
> @@ -214,7 +214,6 @@
>
> for (;;) {
> struct writeback_control wbc = {
> - .bdi = bdi,
> .sync_mode = WB_SYNC_NONE,
> .older_than_this = NULL,
> .nr_to_write = write_chunk,

Arguably we just have the wrong backing-device here, and what we should do
is to propagate the real backing device's pointer through up into the
filesystem. There's machinery for this which things like DM stacks use.

I wonder if the post-2.6.23 changes happened to make this problem go away.

* Re: Temporary lockup on loopback block device
From: Peter Zijlstra @ 2007-11-10 23:02 UTC
To: Andrew Morton; +Cc: Mikulas Patocka, linux-kernel, WU Fengguang, Miklos Szeredi

On Sat, 2007-11-10 at 14:54 -0800, Andrew Morton wrote:

> On Sat, 10 Nov 2007 20:51:31 +0100 (CET) Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz> wrote:
>
> > Hi
> >
> > I am experiencing a transient lockup in 'D' state with loopback device. It
> > happens when process writes to a filesystem in loopback with command like
> > dd if=/dev/zero of=/s/fill bs=4k
> >
> > CPU is idle, disk is idle too, yet the dd process is waiting in 'D' in
> > congestion_wait called from balance_dirty_pages.
> >
> > After about 30 seconds, the lockup is gone and dd resumes, but it locks up
> > soon again.
> >
> > I added a printk to the balance_dirty_pages
> > printk("wait: nr_reclaimable %d, nr_writeback %d, dirty_thresh %d,
> > pages_written %d, write_chunk %d\n", nr_reclaimable,
> > global_page_state(NR_WRITEBACK), dirty_thresh, pages_written,
> > write_chunk);
> >
> > and it shows this during the lockup:
> >
> > wait: nr_reclaimable 3099, nr_writeback 0, dirty_thresh 2985,
> > pages_written 1021, write_chunk 1522
> > wait: nr_reclaimable 3099, nr_writeback 0, dirty_thresh 2985,
> > pages_written 1021, write_chunk 1522
> > wait: nr_reclaimable 3099, nr_writeback 0, dirty_thresh 2985,
> > pages_written 1021, write_chunk 1522
> >
> > What apparently happens:
> >
> > writeback_inodes syncs inodes only on the given wbc->bdi, however
> > balance_dirty_pages checks against global counts of dirty pages. So if
> > there's nothing to sync on a given device, but there are other dirty pages
> > so that the counts are over the limit, it will loop without doing any
> > work.
> >
> > To reproduce it, you need totally idle machine (no GUI, etc.) -- if
> > something writes to the backing device, it flushes the dirty pages
> > generated by the loopback and the lockup is gone. If you add printk, don't
> > forget to stop klogd, otherwise logging would end the lockup.
>
> erk.

known issue.

> > The hotfix (that I verified to work) is to not set wbc->bdi, so that all
> > devices are flushed ... but the code probably needs some redesign (i.e.
> > either account per-device and flush per-device, or account-global and
> > flush-global).

.24 will have the per-device solution.

> >
> > diff -u -r ../x/linux-2.6.23.1/mm/page-writeback.c mm/page-writeback.c
> > --- ../x/linux-2.6.23.1/mm/page-writeback.c 2007-10-12 18:43:44.000000000 +0200
> > +++ mm/page-writeback.c 2007-11-10 20:32:43.000000000 +0100
> > @@ -214,7 +214,6 @@
> >
> > for (;;) {
> > struct writeback_control wbc = {
> > - .bdi = bdi,
> > .sync_mode = WB_SYNC_NONE,
> > .older_than_this = NULL,
> > .nr_to_write = write_chunk,
>
> Arguably we just have the wrong backing-device here, and what we should do
> is to propagate the real backing device's pointer through up into the
> filesystem. There's machinery for this which things like DM stacks use.
>
> I wonder if the post-2.6.23 changes happened to make this problem go away.

The per BDI dirty stuff in 24 should make this work, I just checked and
loopback thingies seem to have their own BDI, so all should be well.
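
The per-device idea mentioned in the message above can be illustrated with a small user-space sketch. It is not the 2.6.24 implementation: the thread does not show how the kernel divides the limit among devices, so the equal split in `device_thresh` is purely an assumption, and `device_thresh` and `must_throttle` are hypothetical names. The point is only that the throttling decision looks at the writer's own device, which the writer is also allowed to flush.

```c
#include <stdbool.h>

#define NDEV 2	/* assumed: device 0 = loop device, device 1 = backing disk */

/* Assumption: every device gets an equal share of the global limit. */
static long device_thresh(long global_thresh)
{
	return global_thresh / NDEV;
}

/*
 * A writer to device `dev` is throttled only when that device's own
 * dirty count exceeds its share, so dirty pages on another device
 * can no longer stall it.
 */
bool must_throttle(const long dirty[NDEV], int dev, long global_thresh)
{
	return dirty[dev] > device_thresh(global_thresh);
}
```

With the reported numbers and the same assumed split of dirty pages (0 on the loop device, 3099 on the backing device), a writer to the loop device is not throttled, while a writer to the backing device is, and that writer can make progress by flushing its own device.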

* Re: Temporary lockup on loopback block device
From: Mikulas Patocka @ 2007-11-11 0:38 UTC
To: Peter Zijlstra; +Cc: Andrew Morton, linux-kernel, WU Fengguang, Miklos Szeredi

> > Arguably we just have the wrong backing-device here, and what we should do
> > is to propagate the real backing device's pointer through up into the
> > filesystem. There's machinery for this which things like DM stacks use.
> >
> > I wonder if the post-2.6.23 changes happened to make this problem go away.
>
> The per BDI dirty stuff in 24 should make this work, I just checked and
> loopback thingies seem to have their own BDI, so all should be well.

This is not only about loopback (I think the lockup can happen even
without loopback) --- the main problem is:

Why are there over-limit dirty pages that no one is writing?

Mikulas

* Re: Temporary lockup on loopback block device
From: Miklos Szeredi @ 2007-11-11 7:50 UTC
To: mikulas; +Cc: a.p.zijlstra, akpm, linux-kernel, wfg, miklos

> > > Arguably we just have the wrong backing-device here, and what we should do
> > > is to propagate the real backing device's pointer through up into the
> > > filesystem. There's machinery for this which things like DM stacks use.
> > >
> > > I wonder if the post-2.6.23 changes happened to make this problem go away.
> >
> > The per BDI dirty stuff in 24 should make this work, I just checked and
> > loopback thingies seem to have their own BDI, so all should be well.
>
> This is not only about loopback (I think the lockup can happen even
> without loopback) --- the main problem is:
>
> Why are there over-limit dirty pages that no one is writing?

Please do a sysrq-t, and cat /proc/vmstat during the hang. Those
will show us what exactly is happening.

I've seen this type of hang many times, and I agree with Peter, that
it's probably about loopback, and is fixed in 2.6.24-rc.

Thanks,
Miklos
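
For sampling /proc/vmstat repeatedly during a hang, a small helper can pull out single counters. This is a sketch under stated assumptions: `vmstat_lookup` and `vmstat_read` are hypothetical helper names, and the code assumes the usual "name value" one-pair-per-line layout of /proc/vmstat, with nr_dirty and nr_writeback as the counters of interest here.

```c
#include <stdio.h>
#include <string.h>

/*
 * Look up `key` in vmstat-style text ("name value", one pair per line).
 * Returns the value, or -1 if the key is absent or malformed.
 */
long vmstat_lookup(const char *text, const char *key)
{
	size_t klen = strlen(key);
	const char *line = text;

	while (line && *line) {
		/* Require a space after the key so a longer name sharing the prefix is skipped. */
		if (strncmp(line, key, klen) == 0 && line[klen] == ' ') {
			long value;

			if (sscanf(line + klen, "%ld", &value) == 1)
				return value;
			return -1;
		}
		line = strchr(line, '\n');
		if (line)
			line++;		/* step past the newline */
	}
	return -1;
}

/* Read /proc/vmstat once and return one counter, or -1 on failure. */
long vmstat_read(const char *key)
{
	char buf[16384];
	size_t n;
	FILE *f = fopen("/proc/vmstat", "r");

	if (!f)
		return -1;
	n = fread(buf, 1, sizeof(buf) - 1, f);
	fclose(f);
	buf[n] = '\0';
	return vmstat_lookup(buf, key);
}
```

Calling `vmstat_read("nr_dirty")` and `vmstat_read("nr_writeback")` in a loop while dd is stuck would show whether the dirty count stays flat with nothing under writeback, which is the pattern the printk output in the first message suggests.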

* Re: Temporary lockup on loopback block device
From: Mikulas Patocka @ 2007-11-11 18:29 UTC
To: Miklos Szeredi; +Cc: a.p.zijlstra, akpm, linux-kernel, wfg

> > Why are there over-limit dirty pages that no one is writing?
>
> Please do a sysrq-t, and cat /proc/vmstat during the hang. Those
> will show us what exactly is happening.

I did and I posted relevant information from my finding --- it looped in
balance_dirty_pages.

> I've seen this type of hang many times, and I agree with Peter, that
> it's probably about loopback, and is fixed in 2.6.24-rc.

On 2.6.23 it could happen even without loopback --- loopback just made it
happen very often. 2.6.24 seems ok.

Mikulas

> Thanks,
> Miklos

* Re: Temporary lockup on loopback block device
From: Miklos Szeredi @ 2007-11-12 13:32 UTC
To: mikulas; +Cc: a.p.zijlstra, akpm, linux-kernel, wfg

> On 2.6.23 it could happen even without loopback

Let's focus on this point, because we already know how the lockup
happens _with_ loopback and any other kind of bdi stacking.

Can you describe the setup? Or better still, can you reproduce it and
post the sysrq-t output?

Thanks,
Miklos

* Re: Temporary lockup on loopback block device
From: Mikulas Patocka @ 2007-11-15 22:35 UTC
To: Miklos Szeredi; +Cc: a.p.zijlstra, akpm, linux-kernel, wfg

> > On 2.6.23 it could happen even without loopback
>
> Let's focus on this point, because we already know how the lockup
> happens _with_ loopback and any other kind of bdi stacking.
>
> Can you describe the setup? Or better still, can you reproduce it and
> post the sysrq-t output?

Hi

The trace is this, it is perfectly reproducible. It is a 128M machine,
Pentium 2 300MHz, host filesystem ext2, loop filesystems ext2 and spadfs
(both of them locked up).

But the problem is really over in 2.6.24, I think there is no more need to
investigate it.

Mikulas

Nov 10 19:34:45 gerlinda kernel: SysRq : HELP : loglevel0-8 reBoot tErm Full kIll saK showMem Nice powerOff showPc show-all-timers(Q) unRaw Sync showTasks Unmount shoW-blocked-tasks
Nov 10 19:34:53 gerlinda kernel: SysRq : Show Blocked State
Nov 10 19:34:53 gerlinda kernel: task PC stack pid father
Nov 10 19:34:54 gerlinda kernel: dd D 00000286 0 4603 2985
Nov 10 19:34:55 gerlinda kernel: c580bcdc 00000086 c0308c20 00000286 00000286 c580bcec 002a4e87 00000000
Nov 10 19:34:55 gerlinda kernel: c580bd10 c0284bba c580bd1c 00000000 c03775e0 c03775e0 002a4e87 c011d050
Nov 10 19:34:55 gerlinda kernel: c117c030 c03771a0 00000064 c02f8eb4 c0283efe c580bd44 c0145ebc 00000000
Nov 10 19:34:55 gerlinda kernel: Call Trace:
Nov 10 19:34:55 gerlinda kernel: [<c0284bba>] schedule_timeout+0x4a/0xc0
Nov 10 19:34:55 gerlinda kernel: [<c011d050>] process_timeout+0x0/0x10
Nov 10 19:34:55 gerlinda kernel: [<c0283efe>] io_schedule_timeout+0xe/0x20
Nov 10 19:34:55 gerlinda kernel: [<c0145ebc>] congestion_wait+0x6c/0x90
Nov 10 19:34:55 gerlinda kernel: [<c01274e0>] autoremove_wake_function+0x0/0x50
Nov 10 19:34:55 gerlinda kernel: [<c014135f>] balance_dirty_pages_ratelimited_nr+0x11f/0x1e0
Nov 10 19:34:55 gerlinda kernel: [<c013cb98>] generic_file_buffered_write+0x2f8/0x6f0
Nov 10 19:34:55 gerlinda kernel: [<c01198b7>] irq_exit+0x47/0x70
Nov 10 19:34:55 gerlinda kernel: [<c01049e7>] do_IRQ+0x47/0x80
Nov 10 19:34:55 gerlinda kernel: [<c0102cbf>] common_interrupt+0x23/0x28
Nov 10 19:34:55 gerlinda kernel: [<c013d1e3>] __generic_file_aio_write_nolock+0x253/0x540
Nov 10 19:34:55 gerlinda kernel: [<c012a87b>] hrtimer_run_queues+0x6b/0x290
Nov 10 19:34:55 gerlinda kernel: [<c013d526>] generic_file_aio_write+0x56/0xd0
Nov 10 19:34:55 gerlinda kernel: [<c012ed9f>] tick_handle_periodic+0xf/0x70
Nov 10 19:34:55 gerlinda kernel: [<c015a1d6>] do_sync_write+0xc6/0x110
Nov 10 19:34:55 gerlinda kernel: [<c01274e0>] autoremove_wake_function+0x0/0x50
Nov 10 19:34:55 gerlinda kernel: [<c01c604f>] clear_user+0x2f/0x50
Nov 10 19:34:55 gerlinda kernel: [<c0120000>] ptrace_notify+0x30/0x90
Nov 10 19:34:55 gerlinda kernel: [<c015aa56>] vfs_write+0xa6/0x140
Nov 10 19:34:55 gerlinda kernel: [<c8926310>] SPADFS_FILE_WRITE+0x0/0x10 [spadfs]
Nov 10 19:34:55 gerlinda kernel: [<c015b031>] sys_write+0x41/0x70
Nov 10 19:34:55 gerlinda kernel: [<c0102b16>] syscall_call+0x7/0xb
Nov 10 19:34:55 gerlinda kernel: =======================

> Thanks,
> Miklos

* Re: Temporary lockup on loopback block device
From: Mikulas Patocka @ 2007-11-11 0:33 UTC
To: Andrew Morton; +Cc: linux-kernel, Peter Zijlstra, WU Fengguang

On Sat, 10 Nov 2007, Andrew Morton wrote:

> On Sat, 10 Nov 2007 20:51:31 +0100 (CET) Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz> wrote:
>
> > Hi
> >
> > I am experiencing a transient lockup in 'D' state with loopback device. It
> > happens when process writes to a filesystem in loopback with command like
> > dd if=/dev/zero of=/s/fill bs=4k
> >
> > CPU is idle, disk is idle too, yet the dd process is waiting in 'D' in
> > congestion_wait called from balance_dirty_pages.
> >
> > After about 30 seconds, the lockup is gone and dd resumes, but it locks up
> > soon again.
> >
> > I added a printk to the balance_dirty_pages
> > printk("wait: nr_reclaimable %d, nr_writeback %d, dirty_thresh %d,
> > pages_written %d, write_chunk %d\n", nr_reclaimable,
> > global_page_state(NR_WRITEBACK), dirty_thresh, pages_written,
> > write_chunk);
> >
> > and it shows this during the lockup:
> >
> > wait: nr_reclaimable 3099, nr_writeback 0, dirty_thresh 2985,
> > pages_written 1021, write_chunk 1522
> > wait: nr_reclaimable 3099, nr_writeback 0, dirty_thresh 2985,
> > pages_written 1021, write_chunk 1522
> > wait: nr_reclaimable 3099, nr_writeback 0, dirty_thresh 2985,
> > pages_written 1021, write_chunk 1522
> >
> > What apparently happens:
> >
> > writeback_inodes syncs inodes only on the given wbc->bdi, however
> > balance_dirty_pages checks against global counts of dirty pages. So if
> > there's nothing to sync on a given device, but there are other dirty pages
> > so that the counts are over the limit, it will loop without doing any
> > work.
> >
> > To reproduce it, you need totally idle machine (no GUI, etc.) -- if
> > something writes to the backing device, it flushes the dirty pages
> > generated by the loopback and the lockup is gone. If you add printk, don't
> > forget to stop klogd, otherwise logging would end the lockup.
>
> erk.
>
> > The hotfix (that I verified to work) is to not set wbc->bdi, so that all
> > devices are flushed ... but the code probably needs some redesign (i.e.
> > either account per-device and flush per-device, or account-global and
> > flush-global).
> >
> > Mikulas
> >
> >
> > diff -u -r ../x/linux-2.6.23.1/mm/page-writeback.c mm/page-writeback.c
> > --- ../x/linux-2.6.23.1/mm/page-writeback.c 2007-10-12 18:43:44.000000000 +0200
> > +++ mm/page-writeback.c 2007-11-10 20:32:43.000000000 +0100
> > @@ -214,7 +214,6 @@
> >
> > for (;;) {
> > struct writeback_control wbc = {
> > - .bdi = bdi,
> > .sync_mode = WB_SYNC_NONE,
> > .older_than_this = NULL,
> > .nr_to_write = write_chunk,
>
> Arguably we just have the wrong backing-device here, and what we should do
> is to propagate the real backing device's pointer through up into the
> filesystem. There's machinery for this which things like DM stacks use.

If you change the loopback backing-device, you just turn this nicely
reproducible example into a subtle race condition that can happen whether
you use loopback or not. Think about what happens when a different process
dirties memory:

You have process "A" that dirtied a lot of pages on device "1" but has not
started writing them.
You have process "B" that is trying to write to device "2", sees the dirty
page count over the limit, but can't do anything about it, because it is only
allowed to flush pages on device "2" --- so it endlessly loops.

If you want to use the current flushing semantics, you just have to audit
the whole kernel to make sure that if some process sees an over-limit dirty
page count, there is another process that is flushing the pages. Currently
that is not true: the "dd" process sees an over-limit count, but there is
no-one writing.

> I wonder if the post-2.6.23 changes happened to make this problem go away.

I will try 2.6.24-rc2, but I don't think the root cause of this went away.
Maybe you just reduced the probability.

Mikulas

* Re: Temporary lockup on loopback block device
From: Mikulas Patocka @ 2007-11-11 3:56 UTC
To: Andrew Morton; +Cc: linux-kernel, Peter Zijlstra, WU Fengguang

On Sun, 11 Nov 2007, Mikulas Patocka wrote:

> On Sat, 10 Nov 2007, Andrew Morton wrote:
>
> > On Sat, 10 Nov 2007 20:51:31 +0100 (CET) Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz> wrote:
> >
> > > Hi
> > >
> > > I am experiencing a transient lockup in 'D' state with loopback device. It
> > > happens when process writes to a filesystem in loopback with command like
> > > dd if=/dev/zero of=/s/fill bs=4k
> > >
> > > CPU is idle, disk is idle too, yet the dd process is waiting in 'D' in
> > > congestion_wait called from balance_dirty_pages.
> > >
> > > After about 30 seconds, the lockup is gone and dd resumes, but it locks up
> > > soon again.
> > >
> > > I added a printk to the balance_dirty_pages
> > > printk("wait: nr_reclaimable %d, nr_writeback %d, dirty_thresh %d,
> > > pages_written %d, write_chunk %d\n", nr_reclaimable,
> > > global_page_state(NR_WRITEBACK), dirty_thresh, pages_written,
> > > write_chunk);
> > >
> > > and it shows this during the lockup:
> > >
> > > wait: nr_reclaimable 3099, nr_writeback 0, dirty_thresh 2985,
> > > pages_written 1021, write_chunk 1522
> > > wait: nr_reclaimable 3099, nr_writeback 0, dirty_thresh 2985,
> > > pages_written 1021, write_chunk 1522
> > > wait: nr_reclaimable 3099, nr_writeback 0, dirty_thresh 2985,
> > > pages_written 1021, write_chunk 1522
> > >
> > > What apparently happens:
> > >
> > > writeback_inodes syncs inodes only on the given wbc->bdi, however
> > > balance_dirty_pages checks against global counts of dirty pages. So if
> > > there's nothing to sync on a given device, but there are other dirty pages
> > > so that the counts are over the limit, it will loop without doing any
> > > work.
> > >
> > > To reproduce it, you need totally idle machine (no GUI, etc.) -- if
> > > something writes to the backing device, it flushes the dirty pages
> > > generated by the loopback and the lockup is gone. If you add printk, don't
> > > forget to stop klogd, otherwise logging would end the lockup.
> >
> > erk.
> >
> > > The hotfix (that I verified to work) is to not set wbc->bdi, so that all
> > > devices are flushed ... but the code probably needs some redesign (i.e.
> > > either account per-device and flush per-device, or account-global and
> > > flush-global).
> > >
> > > Mikulas
> > >
> > >
> > > diff -u -r ../x/linux-2.6.23.1/mm/page-writeback.c mm/page-writeback.c
> > > --- ../x/linux-2.6.23.1/mm/page-writeback.c 2007-10-12 18:43:44.000000000 +0200
> > > +++ mm/page-writeback.c 2007-11-10 20:32:43.000000000 +0100
> > > @@ -214,7 +214,6 @@
> > >
> > > for (;;) {
> > > struct writeback_control wbc = {
> > > - .bdi = bdi,
> > > .sync_mode = WB_SYNC_NONE,
> > > .older_than_this = NULL,
> > > .nr_to_write = write_chunk,
> >
> > Arguably we just have the wrong backing-device here, and what we should do
> > is to propagate the real backing device's pointer through up into the
> > filesystem. There's machinery for this which things like DM stacks use.
>
> If you change loopback backing-device, you just turn this nicely
> reproducible example into a subtle race condition that can happen whenever
> you use loopback or not. Think, what happens when different process
> dirties memory:
>
> You have process "A" that dirtied a lot of pages on device "1" but has not
> started writing them.
> You have process "B" that is trying to write to device "2", sees dirty
> page count over limit, but can't do anything about it, because it is only
> allowed to flush pages on device "2". --- so it endlessly loops.
>
> If you want to use the current flushing semantics, you just have to audit
> the whole kernel to make sure that if some process sees over-limit dirty
> page count, there is another process that is flushing the pages. Currently
> it is not true, the "dd" process sees over-limit count, but there is
> no-one writing.
>
> > I wonder if the post-2.6.23 changes happened to make this problem go away.
>
> I will try 2.6.24-rc2, but I don't think the root cause of this went away.
> Maybe you just reduced probability.
>
> Mikulas

So I compiled it and I don't see any more lock-ups. The writeback loop
doesn't depend on any global page count, so the above scenario can't
happen here. Good.

Mikulas

* Re: Temporary lockup on loopback block device
From: Mikulas Patocka @ 2007-11-11 5:33 UTC
To: Andrew Morton; +Cc: linux-kernel, Peter Zijlstra, WU Fengguang

> > > Arguably we just have the wrong backing-device here, and what we
> > > should do is to propagate the real backing device's pointer through
> > > up into the filesystem. There's machinery for this which things
> > > like DM stacks use.

Just thinking about the new implementation --- you shouldn't really
propagate the physical block device's backing_device into the loopback
device.

If you leave it as is (each loop device has its own backing store), you
can nicely avoid the long-standing loopback deadlock coming from the fact
that flushing one page on a loopback device can generate several more dirty
pages on the filesystem.

If you let the loopback device and the physical device have the same backing
store, then it can go wild creating more and more dirty pages up to
memory exhaustion.

If you let them have different backing stores, it can't happen ---
loopback flushing will just wait until the pages on the filesystem are
written.

Mikulas

> So I compiled it and I don't see any more lock-ups. The writeback loop
> doesn't depend on any global page count, so the above scenario can't
> happen here. Good.
>
> Mikulas