* 2.6.18 ext3 panic. @ 2006-10-02 19:47 Dave Jones 2006-10-03 5:22 ` Dave Jones 0 siblings, 1 reply; 37+ messages in thread From: Dave Jones @ 2006-10-02 19:47 UTC (permalink / raw) To: Linux Kernel Not sure what exactly happened here. Was running fsx on ext3 over 2 disk raid0, and running a yum update. Box locked up solid after a few minutes.. http://www.codemonkey.org.uk/junk/DSC00747.JPG The unwinder getting stuck meant I lost the top of the trace though. (I have backporting the .19 fixes to .18 on my todo unless someone beats me to it and they end up in -stable). Will try to reproduce with a serial console hooked up. Dave -- http://www.codemonkey.org.uk ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: 2.6.18 ext3 panic. 2006-10-02 19:47 2.6.18 ext3 panic Dave Jones @ 2006-10-03 5:22 ` Dave Jones 2006-10-03 5:43 ` Eric Sandeen 0 siblings, 1 reply; 37+ messages in thread From: Dave Jones @ 2006-10-03 5:22 UTC (permalink / raw) To: Linux Kernel; +Cc: esandeen <Cc'd Eric as he's been looking into this> On Mon, Oct 02, 2006 at 03:47:11PM -0400, Dave Jones wrote: > Not sure what exactly happened here. Was running fsx on ext3 over 2 disk raid0, > and running a yum update. Box locked up solid after a few minutes.. > http://www.codemonkey.org.uk/junk/DSC00747.JPG > > The unwinder getting stuck meant I lost the top of the trace though. > (I have backporting the .19 fixes to .18 on my todo unless someone > beats me to it and they end up in -stable). > > Will try to reproduce with a serial console hooked up. So I managed to reproduce it with an 'fsx foo' and a 'fsstress -d . -r -n 100000 -p 20 -r'. This time I grabbed it from a vanilla 2.6.18 with none of the Fedora patches.. I'll give 2.6.18-git a try next. Dave ----------- [cut here ] --------- [please bite here ] --------- Kernel BUG at fs/buffer.c:2791 invalid opcode: 0000 [1] SMP CPU 1 Modules linked in: hidp l2cap bluetooth nfs lockd nfs_acl sunrpc ipv6 dm_mirror dm_mod video sbs i2c_ec button battery asus_acpi ac parport_pc lp parport snd_hda_intel snd_hda_codec snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq sr_mod snd_seq_device cdrom intel_rng snd_pcm_oss sg snd_mixer_oss snd_pcm shpchp floppy serio_raw pcspkr i2c_i801 ohci1394 ieee1394 snd_timer snd e1000 i2c_core soundcore snd_page_alloc sata_sil ahci libata sd_mod scsi_mod raid0 ext3 jbd ehci_hcd ohci_hcd uhci_hcd Pid: 408, comm: kjournald Not tainted 2.6.18 #1 RIP: 0010:[<ffffffff8021b425>] [<ffffffff8021b425>] submit_bh+0x29/0x124 RSP: 0018:ffff81007ebcbd40 EFLAGS: 00010246 RAX: 0000000000000005 RBX: ffff810063dd0ec8 RCX: 8000000000000000 RDX: ffff81007f1f5430 RSI: ffff810063dd0ec8 RDI: 0000000000000001 RBP: ffff81007ebcbd60 R08: 0000000000800000 R09: 0000000000000003 R10: ffff810068621510 R11: 0000000000000400 R12: ffff81007f7f48c8 R13: 0000000000000001 R14: 0000000000000033 R15: 0000000000000080 FS: 0000000000000000(0000) GS:ffff810037ff1cc0(0000) knlGS:0000000000000000 CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b CR2: 00002b76d5365000 CR3: 00000000658e3000 CR4: 00000000000006e0 Process kjournald (pid: 408, threadinfo ffff81007ebca000, task ffff810037f52040) Stack: 0000000000000003 ffff810063dd0ec8 ffff81007f7f48c8 0000000000000003 ffff81007ebcbda0 ffffffff80217ca1 ffff81007aa572a0 ffff810063f8d400 ffff810064145478 ffff81007f1ea208 ffff81007aa572a0 0000000000000080 Call Trace: [<ffffffff80217ca1>] ll_rw_block+0xa6/0xcd [<ffffffff88035991>] :jbd:journal_commit_transaction+0x40b/0x10ce [<ffffffff8803a033>] :jbd:kjournald+0xc7/0x222 [<ffffffff80236089>] kthread+0x100/0x136 [<ffffffff802624a0>] child_rip+0xa/0x12 DWARF2 unwinder stuck at child_rip+0xa/0x12 Leftover inexact backtrace: [<ffffffff80268c22>] _spin_unlock_irq+0x2b/0x31 [<ffffffff80261adc>] restore_args+0x0/0x30 [<ffffffff80250ec3>] run_workqueue+0x19/0xfa [<ffffffff80250ec3>] run_workqueue+0x19/0xfa [<ffffffff80235f89>] kthread+0x0/0x136 [<ffffffff80262496>] child_rip+0x0/0x12 Code: 0f 0b 68 c8 86 49 80 c2 e7 0a 48 83 7b 38 00 75 0a 0f 0b 68 RIP [<ffffffff8021b425>] submit_bh+0x29/0x124 RSP <ffff81007ebcbd40> <0>general protection fault: 0000 [2] SMP CPU 1 Modules linked in: hidp l2cap bluetooth nfs lockd nfs_acl sunrpc ipv6 dm_mirror dm_mod video sbs i2c_ec button battery asus_acpi ac parport_pc lp parport snd_hda_intel snd_hda_codec snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq sr_mod snd_seq_device cdrom intel_rng snd_pcm_oss sg snd_mixer_oss snd_pcm shpchp floppy serio_raw pcspkr i2c_i801 ohci1394 ieee1394 snd_timer snd e1000 i2c_core soundcore snd_page_alloc sata_sil ahci libata sd_mod scsi_mod raid0 ext3 jbd ehci_hcd ohci_hcd uhci_hcd Pid: 0, comm: swapper Not tainted 2.6.18 #1 RIP: 0010:[<ffffffff8028e1a1>] [<ffffffff8028e1a1>] task_rq_lock+0x2b/0x74 RSP: 0018:ffff810037e17df0 EFLAGS: 00010006 RAX: 6b6b6b6b6b6b6b6b RBX: ffffffff80aae480 RCX: ffff81007f1ea5a8 RDX: ffff81007ee71080 RSI: ffff810037e17e78 RDI: ffff810037f52040 RBP: ffff810037e17e10 R08: ffff810037e17eb0 R09: 0000000000000001 R10: 0000000000000001 R11: ffffffff8029995d R12: ffffffff80aae480 R13: ffff810037e17e78 R14: ffff810037f52040 R15: 0000000000000100 FS: 0000000000000000(0000) GS:ffff810037ff1cc0(0000) knlGS:0000000000000000 CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b CR2: 00002b76d535d000 CR3: 000000007e290000 CR4: 00000000000006e0 Process swapper (pid: 0, threadinfo ffff810037e10000, task ffff81007ee71080) Stack: 000000000000000f ffff810037e08000 ffff810037f52040 ffffffff880398d9 ffff810037e17eb0 ffffffff80249e98 ffff810037e08000 000000007ee71080 ffffffff80268c22 0000000200000001 0000000000000000 0000000100000000 Call Trace: [<ffffffff80249e98>] try_to_wake_up+0x27/0x421 [<ffffffff8028e3ce>] wake_up_process+0x10/0x12 [<ffffffff880398e2>] :jbd:commit_timeout+0x9/0xb [<ffffffff80299a67>] run_timer_softirq+0x14c/0x1d5 [<ffffffff80212724>] __do_softirq+0x68/0xf5 [<ffffffff802627f8>] call_softirq+0x1c/0x28 DWARF2 unwinder stuck at call_softirq+0x1c/0x28 Leftover inexact backtrace: <IRQ> [<ffffffff80270c65>] do_softirq+0x39/0x9f [<ffffffff802962a3>] irq_exit+0x57/0x59 [<ffffffff8027b063>] smp_apic_timer_interrupt+0x5d/0x62 [<ffffffff8025b784>] mwait_idle+0x0/0x54 [<ffffffff8026216f>] apic_timer_interrupt+0x6b/0x70 <EOI> [<ffffffff80266026>] __sched_text_start+0xaa6/0xadd [<ffffffff8025b7c3>] mwait_idle+0x3f/0x54 [<ffffffff8025b78d>] mwait_idle+0x9/0x54 [<ffffffff8024c916>] cpu_idle+0xa2/0xc5 [<ffffffff8027a674>] start_secondary+0x468/0x477 Code: 8b 40 18 48 8b 04 c5 c0 59 a6 80 4c 03 60 08 4c 89 e7 e8 5e RIP [<ffffffff8028e1a1>] task_rq_lock+0x2b/0x74 RSP <ffff810037e17df0> <3>BUG: sleeping function called from invalid context at kernel/rwsem.c:20 in_atomic():1, irqs_disabled():1 Call Trace: [<ffffffff8026f956>] show_trace+0xae/0x336 [<ffffffff8026fbf3>] dump_stack+0x15/0x17 [<ffffffff8020bb01>] __might_sleep+0xb2/0xb4 [<ffffffff802a5160>] down_read+0x1d/0x4a [<ffffffff8029cf62>] blocking_notifier_call_chain+0x1b/0x41 [<ffffffff80293e10>] profile_task_exit+0x15/0x17 [<ffffffff80215a74>] do_exit+0x25/0x96a [<ffffffff8026fe21>] kernel_math_error+0x0/0x96 [<ffff810037e17d48>] DWARF2 unwinder stuck at 0xffff810037e17d48 Leftover inexact backtrace: <IRQ> [<ffffffff8026993f>] do_general_protection+0x10a/0x115 [<ffffffff8808bab2>] :scsi_mod:scsi_run_queue+0x1ab/0x1ba [<ffffffff802622d1>] error_exit+0x0/0x96 [<ffffffff8029995d>] run_timer_softirq+0x42/0x1d5 [<ffffffff8028e1a1>] task_rq_lock+0x2b/0x74 [<ffffffff880398d9>] :jbd:commit_timeout+0x0/0xb [<ffffffff80249e98>] try_to_wake_up+0x27/0x421 [<ffffffff80268c22>] _spin_unlock_irq+0x2b/0x31 [<ffffffff80268c22>] _spin_unlock_irq+0x2b/0x31 [<ffffffff880398d9>] :jbd:commit_timeout+0x0/0xb [<ffffffff880398d9>] :jbd:commit_timeout+0x0/0xb [<ffffffff8028e3ce>] wake_up_process+0x10/0x12 [<ffffffff880398e2>] :jbd:commit_timeout+0x9/0xb [<ffffffff80299a67>] run_timer_softirq+0x14c/0x1d5 [<ffffffff80212724>] __do_softirq+0x68/0xf5 [<ffffffff802627f8>] call_softirq+0x1c/0x28 [<ffffffff80270c65>] do_softirq+0x39/0x9f [<ffffffff802962a3>] irq_exit+0x57/0x59 [<ffffffff8027b063>] smp_apic_timer_interrupt+0x5d/0x62 [<ffffffff8025b784>] mwait_idle+0x0/0x54 [<ffffffff8026216f>] apic_timer_interrupt+0x6b/0x70 <EOI> [<ffffffff80266026>] __sched_text_start+0xaa6/0xadd [<ffffffff8025b7c3>] mwait_idle+0x3f/0x54 [<ffffffff8025b78d>] mwait_idle+0x9/0x54 [<ffffffff8024c916>] cpu_idle+0xa2/0xc5 [<ffffffff8027a674>] start_secondary+0x468/0x477 Kernel panic - not syncing: Aiee, killing interrupt handler! -- http://www.codemonkey.org.uk ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: 2.6.18 ext3 panic. 2006-10-03 5:22 ` Dave Jones @ 2006-10-03 5:43 ` Eric Sandeen 2006-10-03 6:19 ` Andrew Morton 0 siblings, 1 reply; 37+ messages in thread From: Eric Sandeen @ 2006-10-03 5:43 UTC (permalink / raw) To: Dave Jones, Linux Kernel, esandeen Dave Jones wrote: > So I managed to reproduce it with an 'fsx foo' and a > 'fsstress -d . -r -n 100000 -p 20 -r'. This time I grabbed it from > a vanilla 2.6.18 with none of the Fedora patches.. > > I'll give 2.6.18-git a try next. > > Dave > > ----------- [cut here ] --------- [please bite here ] --------- > Kernel BUG at fs/buffer.c:2791 I had thought/hoped that this was fixed by Jan's patch at http://lkml.org/lkml/2006/9/7/236 from the thread started at http://lkml.org/lkml/2006/9/1/149, but it seems maybe not. Dave hit this bug first by going through that new codepath.... -Eric ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: 2.6.18 ext3 panic. 2006-10-03 5:43 ` Eric Sandeen @ 2006-10-03 6:19 ` Andrew Morton 2006-10-03 6:40 ` Dave Jones 2006-10-09 19:46 ` Eric Sandeen 0 siblings, 2 replies; 37+ messages in thread From: Andrew Morton @ 2006-10-03 6:19 UTC (permalink / raw) To: Eric Sandeen Cc: Dave Jones, Linux Kernel, esandeen, Badari Pulavarty, Jan Kara On Tue, 03 Oct 2006 00:43:01 -0500 Eric Sandeen <sandeen@sandeen.net> wrote: > Dave Jones wrote: > > > So I managed to reproduce it with an 'fsx foo' and a > > 'fsstress -d . -r -n 100000 -p 20 -r'. This time I grabbed it from > > a vanilla 2.6.18 with none of the Fedora patches.. > > > > I'll give 2.6.18-git a try next. > > > > Dave > > > > ----------- [cut here ] --------- [please bite here ] --------- > > Kernel BUG at fs/buffer.c:2791 > > I had thought/hoped that this was fixed by Jan's patch at > http://lkml.org/lkml/2006/9/7/236 from the thread started at > http://lkml.org/lkml/2006/9/1/149, but it seems maybe not. Dave hit this bug > first by going through that new codepath.... Yes, Jan's patch is supposed to fix that !buffer_mapped() assertion. iirc, Badari was hitting that BUG and was able to confirm that Jan's patch (3998b9301d3d55be8373add22b6bc5e11c1d9b71 in post-2.6.18 mainline) fixed it. ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: 2.6.18 ext3 panic. 2006-10-03 6:19 ` Andrew Morton @ 2006-10-03 6:40 ` Dave Jones 2006-10-03 16:45 ` Dave Jones 2006-10-09 19:46 ` Eric Sandeen 1 sibling, 1 reply; 37+ messages in thread From: Dave Jones @ 2006-10-03 6:40 UTC (permalink / raw) To: Andrew Morton Cc: Eric Sandeen, Linux Kernel, esandeen, Badari Pulavarty, Jan Kara On Mon, Oct 02, 2006 at 11:19:45PM -0700, Andrew Morton wrote: > On Tue, 03 Oct 2006 00:43:01 -0500 > Eric Sandeen <sandeen@sandeen.net> wrote: > > > Dave Jones wrote: > > > > > So I managed to reproduce it with an 'fsx foo' and a > > > 'fsstress -d . -r -n 100000 -p 20 -r'. This time I grabbed it from > > > a vanilla 2.6.18 with none of the Fedora patches.. > > > > > > I'll give 2.6.18-git a try next. > > > > > > Dave > > > > > > ----------- [cut here ] --------- [please bite here ] --------- > > > Kernel BUG at fs/buffer.c:2791 > > > > I had thought/hoped that this was fixed by Jan's patch at > > http://lkml.org/lkml/2006/9/7/236 from the thread started at > > http://lkml.org/lkml/2006/9/1/149, but it seems maybe not. Dave hit this bug > > first by going through that new codepath.... > > Yes, Jan's patch is supposed to fix that !buffer_mapped() assertion. iirc, > Badari was hitting that BUG and was able to confirm that Jan's patch > (3998b9301d3d55be8373add22b6bc5e11c1d9b71 in post-2.6.18 mainline) fixed > it. Ok, this afternoon I was definitly running a kernel with that patch in it, and managed to get a trace (It was the one from the top of this thread that unfortunatly got truncated). Now, I can't reproduce it on a plain 2.6.18+that patch. I'll leave the stress test running overnight, and see if anything falls out in the morning. Dave -- http://www.codemonkey.org.uk ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: 2.6.18 ext3 panic. 2006-10-03 6:40 ` Dave Jones @ 2006-10-03 16:45 ` Dave Jones 0 siblings, 0 replies; 37+ messages in thread From: Dave Jones @ 2006-10-03 16:45 UTC (permalink / raw) To: Andrew Morton, Eric Sandeen, Linux Kernel, esandeen, Badari Pulavarty, Jan Kara On Tue, Oct 03, 2006 at 02:40:30AM -0400, Dave Jones wrote: > > > > ----------- [cut here ] --------- [please bite here ] --------- > > > > Kernel BUG at fs/buffer.c:2791 > > > > > > I had thought/hoped that this was fixed by Jan's patch at > > > http://lkml.org/lkml/2006/9/7/236 from the thread started at > > > http://lkml.org/lkml/2006/9/1/149, but it seems maybe not. Dave hit this bug > > > first by going through that new codepath.... > > > > Yes, Jan's patch is supposed to fix that !buffer_mapped() assertion. iirc, > > Badari was hitting that BUG and was able to confirm that Jan's patch > > (3998b9301d3d55be8373add22b6bc5e11c1d9b71 in post-2.6.18 mainline) fixed > > it. > > Ok, this afternoon I was definitly running a kernel with that patch in it, > and managed to get a trace (It was the one from the top of this thread > that unfortunatly got truncated). > > Now, I can't reproduce it on a plain 2.6.18+that patch. > I'll leave the stress test running overnight, and see if anything > falls out in the morning. Been chugging away for 10 hrs now without repeating that incident. Hmm. That patch looks like good -stable material. I'll keep digging to see if I can somehow reproduce the problem I saw with the patch applied, but in absense of something better, I think we should go with it. One thing that did happen in the 10hrs was fsx-over-NFS spewed some nasty looking trace. I'll post that separately next. Dave -- http://www.codemonkey.org.uk ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: 2.6.18 ext3 panic. 2006-10-03 6:19 ` Andrew Morton 2006-10-03 6:40 ` Dave Jones @ 2006-10-09 19:46 ` Eric Sandeen 2006-10-09 19:59 ` Eric Sandeen ` (2 more replies) 1 sibling, 3 replies; 37+ messages in thread From: Eric Sandeen @ 2006-10-09 19:46 UTC (permalink / raw) To: Andrew Morton Cc: Dave Jones, Linux Kernel, esandeen, Badari Pulavarty, Jan Kara Andrew Morton wrote: > On Tue, 03 Oct 2006 00:43:01 -0500 > Eric Sandeen <sandeen@sandeen.net> wrote: > >> Dave Jones wrote: >> >>> So I managed to reproduce it with an 'fsx foo' and a >>> 'fsstress -d . -r -n 100000 -p 20 -r'. This time I grabbed it from >>> a vanilla 2.6.18 with none of the Fedora patches.. >>> >>> I'll give 2.6.18-git a try next. >>> >>> Dave >>> >>> ----------- [cut here ] --------- [please bite here ] --------- >>> Kernel BUG at fs/buffer.c:2791 >> I had thought/hoped that this was fixed by Jan's patch at >> http://lkml.org/lkml/2006/9/7/236 from the thread started at >> http://lkml.org/lkml/2006/9/1/149, but it seems maybe not. Dave hit this bug >> first by going through that new codepath.... > > Yes, Jan's patch is supposed to fix that !buffer_mapped() assertion. iirc, > Badari was hitting that BUG and was able to confirm that Jan's patch > (3998b9301d3d55be8373add22b6bc5e11c1d9b71 in post-2.6.18 mainline) fixed > it. Looking at some BH traces*, it appears that what Dave hit is a truncate racing with a sync... truncate ... ext3_invalidate_page journal_invalidatepage journal_unmap buffer going off at the same time as sync ... journal_dirty_data sync_dirty_buffer submit_bh <-- finds unmapped buffer, boom. I'm not sure what should be coordinating this, and I'm not sure why we've not yet seen it on a stock kernel, but only FC6... I haven't found anything in FC6 that looks like it may affect this. -Eric *http://people.redhat.com/esandeen/traces/davej_ext3_oops1.txt ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: 2.6.18 ext3 panic. 2006-10-09 19:46 ` Eric Sandeen @ 2006-10-09 19:59 ` Eric Sandeen 2006-10-09 21:59 ` Badari Pulavarty 2006-10-09 22:40 ` Jan-Benedict Glaw 2 siblings, 0 replies; 37+ messages in thread From: Eric Sandeen @ 2006-10-09 19:59 UTC (permalink / raw) To: Eric Sandeen Cc: Andrew Morton, Dave Jones, Linux Kernel, Badari Pulavarty, Jan Kara Eric Sandeen wrote: >>> I had thought/hoped that this was fixed by Jan's patch at >>> http://lkml.org/lkml/2006/9/7/236 from the thread started at >>> http://lkml.org/lkml/2006/9/1/149, but it seems maybe not. Dave hit this bug >>> first by going through that new codepath.... >> Yes, Jan's patch is supposed to fix that !buffer_mapped() assertion. iirc, >> Badari was hitting that BUG and was able to confirm that Jan's patch >> (3998b9301d3d55be8373add22b6bc5e11c1d9b71 in post-2.6.18 mainline) fixed >> it. > > Looking at some BH traces*, it appears that what Dave hit is a truncate > racing with a sync... (oh btw this is -with the above patch from Jan in place...) -Eric > truncate ... > ext3_invalidate_page > journal_invalidatepage > journal_unmap buffer > > going off at the same time as > > sync ... > journal_dirty_data > sync_dirty_buffer > submit_bh <-- finds unmapped buffer, boom. > > I'm not sure what should be coordinating this, and I'm not sure why > we've not yet seen it on a stock kernel, but only FC6... I haven't found > anything in FC6 that looks like it may affect this. > > -Eric > > *http://people.redhat.com/esandeen/traces/davej_ext3_oops1.txt ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: 2.6.18 ext3 panic. 2006-10-09 19:46 ` Eric Sandeen 2006-10-09 19:59 ` Eric Sandeen @ 2006-10-09 21:59 ` Badari Pulavarty 2006-10-09 22:50 ` Dave Jones 2006-10-09 22:40 ` Jan-Benedict Glaw 2 siblings, 1 reply; 37+ messages in thread From: Badari Pulavarty @ 2006-10-09 21:59 UTC (permalink / raw) To: Eric Sandeen; +Cc: Andrew Morton, Dave Jones, Linux Kernel, esandeen, Jan Kara On Mon, 2006-10-09 at 14:46 -0500, Eric Sandeen wrote: > Andrew Morton wrote: > > On Tue, 03 Oct 2006 00:43:01 -0500 > > Eric Sandeen <sandeen@sandeen.net> wrote: > > > >> Dave Jones wrote: > >> > >>> So I managed to reproduce it with an 'fsx foo' and a > >>> 'fsstress -d . -r -n 100000 -p 20 -r'. This time I grabbed it from > >>> a vanilla 2.6.18 with none of the Fedora patches.. > >>> > >>> I'll give 2.6.18-git a try next. > >>> > >>> Dave > >>> > >>> ----------- [cut here ] --------- [please bite here ] --------- > >>> Kernel BUG at fs/buffer.c:2791 > >> I had thought/hoped that this was fixed by Jan's patch at > >> http://lkml.org/lkml/2006/9/7/236 from the thread started at > >> http://lkml.org/lkml/2006/9/1/149, but it seems maybe not. Dave hit this bug > >> first by going through that new codepath.... > > > > Yes, Jan's patch is supposed to fix that !buffer_mapped() assertion. iirc, > > Badari was hitting that BUG and was able to confirm that Jan's patch > > (3998b9301d3d55be8373add22b6bc5e11c1d9b71 in post-2.6.18 mainline) fixed > > it. > > Looking at some BH traces*, it appears that what Dave hit is a truncate > racing with a sync... > > truncate ... > ext3_invalidate_page > journal_invalidatepage > journal_unmap buffer > > going off at the same time as > > sync ... > journal_dirty_data > sync_dirty_buffer > submit_bh <-- finds unmapped buffer, boom. > I don't understand how this can happen .. journal_unmap_buffer() zapping the buffer since its not attached to any transaction. journal_unmap_buffer():[fs/jbd/transaction.c:1789] not on any transaction: zap b_state:0x10402f b_jlist:BJ_None cpu:0 b_count:3 b_blocknr:52735707 b_jbd:1 b_frozen_data:0000000000000000 b_committed_data:0000000000000000 b_transaction:0 b_next_transaction:0 b_cp_transaction:0 b_trans_is_running:0 b_trans_is_comitting:0 b_jcount:2 pg_dirty:1 journal_dirty_data() would do submit_bh() ONLY if its part of the older transaction. I need to take a closer look to understand the race. BTW, is this 1k or 2k filesystem ? How easy is to reproduce the problem ? Thanks, Badari ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: 2.6.18 ext3 panic. 2006-10-09 21:59 ` Badari Pulavarty @ 2006-10-09 22:50 ` Dave Jones 2006-10-10 14:11 ` Jan Kara 0 siblings, 1 reply; 37+ messages in thread From: Dave Jones @ 2006-10-09 22:50 UTC (permalink / raw) To: Badari Pulavarty Cc: Eric Sandeen, Andrew Morton, Linux Kernel, esandeen, Jan Kara On Mon, Oct 09, 2006 at 02:59:25PM -0700, Badari Pulavarty wrote: > journal_dirty_data() would do submit_bh() ONLY if its part of the older > transaction. > > I need to take a closer look to understand the race. > > BTW, is this 1k or 2k filesystem ? (18:41:11:davej@gelk:~)$ sudo tune2fs -l /dev/md0 | grep size Block size: 1024 Fragment size: 1024 Inode size: 128 (18:41:16:davej@gelk:~)$ > How easy is to reproduce the problem ? I can reproduce it within a few hours of stressing, but only on that one box. I've not figured out exactly what's so special about it yet (though the 1k block thing may be it). I had been thinking it was a raid0 only thing, as none of my other boxes have that. I'm not entirely sure how it got set up that way either. The Fedora installer being too smart for its own good perhaps. Dave -- http://www.codemonkey.org.uk ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: 2.6.18 ext3 panic. 2006-10-09 22:50 ` Dave Jones @ 2006-10-10 14:11 ` Jan Kara 2006-10-10 18:42 ` Andrew Morton 2006-10-10 22:03 ` Eric Sandeen 0 siblings, 2 replies; 37+ messages in thread From: Jan Kara @ 2006-10-10 14:11 UTC (permalink / raw) To: Dave Jones, Badari Pulavarty, Eric Sandeen, Andrew Morton, Linux Kernel, esandeen > On Mon, Oct 09, 2006 at 02:59:25PM -0700, Badari Pulavarty wrote: > > > journal_dirty_data() would do submit_bh() ONLY if its part of the older > > transaction. > > > > I need to take a closer look to understand the race. > > > > BTW, is this 1k or 2k filesystem ? > > (18:41:11:davej@gelk:~)$ sudo tune2fs -l /dev/md0 | grep size > Block size: 1024 > Fragment size: 1024 > Inode size: 128 > (18:41:16:davej@gelk:~)$ > > > How easy is to reproduce the problem ? > > I can reproduce it within a few hours of stressing, but only > on that one box. I've not figured out exactly what's so > special about it yet (though the 1k block thing may be it). > I had been thinking it was a raid0 only thing, as none of > my other boxes have that. > > I'm not entirely sure how it got set up that way either. > The Fedora installer being too smart for its own good perhaps. I think it's really the 1KB block size that makes it happen. I've looked at journal_dirty_data() code and I think the following can happen: sync() eventually ends up in journal_dirty_data(bh) as Eric writes. There is finds dirty buffer attached to the comitting transaction. So it drops all locks and calls sync_dirty_buffer(bh). Now in other process, file is truncated so that 'bh' gets just after EOF. As we have 1kb buffers, it can happen that bh is in the partially truncated page. Buffer is marked unmapped and clean. But in a moment the page is marked dirty and msync() is called. That eventually calls set_page_dirty() and all buffers in the page are marked dirty. The first process now wakes up, locks the buffer, clears the dirty bit and does submit_bh() - Oops. This is essentially the same problem Badari found but in a different place. There are two places that are arguably wrong... 1) We mark buffer dirty after EOF. But actually that may be needed - or what is the expected behaviour when we write into mmapped file after EOF, then extend the file and do msync()? 2) We submit a buffer after EOF for IO. This should be clearly avoided but getting the needed info from bh is really ugly... What we could do is: Instead od calling sync_dirty_buffer() we do something like: lock_buffer(bh); jbd_lock_bh_state(bh); if (!buffer_jbd(bh) || jh != bh2jh(bh) || jh->b_transaction != journal->j_committing_transaction) { jbd_unlock_bh_state(bh); unlock_buffer(bh); } jbd_unlock_bh_state(bh); if (test_clear_buffer_dirty(bh)) { get_bh(bh); bh->b_end_io = end_buffer_write_sync; submit_bh(WRITE, bh); wait_on_buffer(bh); } else unlock_buffer(bh); That should deal with the problem... Much more adventurous change would be to remove the syncing code altogether - the new commit code makes sure to write out each buffer just once so the livelock should not happen now. But then we'd have to put running transaction in j_next_transaction and refile data buffers instead of unfiling them. That should actually give quite some performance improvement when intensively rewriting files. But I guess that is JBD2 matter ;). Honza -- Jan Kara <jack@suse.cz> SuSE CR Labs ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: 2.6.18 ext3 panic. 2006-10-10 14:11 ` Jan Kara @ 2006-10-10 18:42 ` Andrew Morton 2006-10-10 22:03 ` Eric Sandeen 1 sibling, 0 replies; 37+ messages in thread From: Andrew Morton @ 2006-10-10 18:42 UTC (permalink / raw) To: Jan Kara; +Cc: Dave Jones, Badari Pulavarty, Eric Sandeen, Linux Kernel, esandeen On Tue, 10 Oct 2006 16:11:45 +0200 Jan Kara <jack@suse.cz> wrote: > > On Mon, Oct 09, 2006 at 02:59:25PM -0700, Badari Pulavarty wrote: > > > > > journal_dirty_data() would do submit_bh() ONLY if its part of the older > > > transaction. > > > > > > I need to take a closer look to understand the race. > > > > > > BTW, is this 1k or 2k filesystem ? > > > > (18:41:11:davej@gelk:~)$ sudo tune2fs -l /dev/md0 | grep size > > Block size: 1024 > > Fragment size: 1024 > > Inode size: 128 > > (18:41:16:davej@gelk:~)$ > > > > > How easy is to reproduce the problem ? > > > > I can reproduce it within a few hours of stressing, but only > > on that one box. I've not figured out exactly what's so > > special about it yet (though the 1k block thing may be it). > > I had been thinking it was a raid0 only thing, as none of > > my other boxes have that. > > > > I'm not entirely sure how it got set up that way either. > > The Fedora installer being too smart for its own good perhaps. > I think it's really the 1KB block size that makes it happen. > I've looked at journal_dirty_data() code and I think the following can > happen: > sync() eventually ends up in journal_dirty_data(bh) as Eric writes. > There is finds dirty buffer attached to the comitting transaction. So it drops > all locks and calls sync_dirty_buffer(bh). > Now in other process, file is truncated so that 'bh' gets just after EOF. > As we have 1kb buffers, it can happen that bh is in the partially > truncated page. Buffer is marked unmapped and clean. But in a moment the page > is marked dirty and msync() is called. That eventually calls > set_page_dirty() and all buffers in the page are marked dirty. > The first process now wakes up, locks the buffer, clears the dirty bit > and does submit_bh() - Oops. > > This is essentially the same problem Badari found but in a different > place. There are two places that are arguably wrong... > 1) We mark buffer dirty after EOF. But actually that may be needed - > or what is the expected behaviour when we write into mmapped file after > EOF, then extend the file and do msync()? yup. > 2) We submit a buffer after EOF for IO. This should be clearly avoided > but getting the needed info from bh is really ugly... Things like __block_write_full_page() avoid this by checking the block's offset against i_size. (Not racy against truncate-down because the page is locked, not racy against truncate-up because the bh is zero and up-to-date). But for jbd writeout we don't hold the page lock, so checking against bh->b_page->host->i_size is a bit racy. hm. But we do lock the buffers in journal_invalidatepage(), so checking i_size after locking the buffer in the writeout path might get us there. ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: 2.6.18 ext3 panic. 2006-10-10 14:11 ` Jan Kara 2006-10-10 18:42 ` Andrew Morton @ 2006-10-10 22:03 ` Eric Sandeen 2006-10-10 22:25 ` Badari Pulavarty 1 sibling, 1 reply; 37+ messages in thread From: Eric Sandeen @ 2006-10-10 22:03 UTC (permalink / raw) To: Jan Kara Cc: Dave Jones, Badari Pulavarty, Eric Sandeen, Andrew Morton, Linux Kernel Jan Kara wrote: > I think it's really the 1KB block size that makes it happen. > I've looked at journal_dirty_data() code and I think the following can > happen: > sync() eventually ends up in journal_dirty_data(bh) as Eric writes. > There is finds dirty buffer attached to the comitting transaction. So it drops > all locks and calls sync_dirty_buffer(bh). > Now in other process, file is truncated so that 'bh' gets just after EOF. > As we have 1kb buffers, it can happen that bh is in the partially > truncated page. Buffer is marked unmapped and clean. But in a moment the page > is marked dirty and msync() is called. That eventually calls > set_page_dirty() and all buffers in the page are marked dirty. > The first process now wakes up, locks the buffer, clears the dirty bit > and does submit_bh() - Oops. Hm, just FWIW I have a couple traces* of the buffer getting unmapped -before- journal_submit_data_buffers ever even finds it... journal_submit_data_buffers():[fs/jbd/commit.c:242] needs writeout, adding to array pid 1836 b_state:0x114025 b_jlist:BJ_SyncData cpu:0 b_count:2 b_blocknr:27130 b_jbd:1 b_frozen_data:0000000000000000 b_committed_data:0000000000000000 b_transaction:1 b_next_transaction:0 b_cp_transaction:0 b_trans_is_running:0 b_trans_is_comitting:1 b_jcount:0 pg_dirty:0 so it's already unmapped at this point. Could journal_submit_data_buffers benefit from some buffer_mapped checks? Or is that just a bandaid too late... -Eric *http://people.redhat.com/esandeen/traces/eric_ext3_oops1.txt http://people.redhat.com/esandeen/traces/eric_ext3_oops2.txt ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: 2.6.18 ext3 panic. 2006-10-10 22:03 ` Eric Sandeen @ 2006-10-10 22:25 ` Badari Pulavarty 2006-10-11 1:43 ` Eric Sandeen 0 siblings, 1 reply; 37+ messages in thread From: Badari Pulavarty @ 2006-10-10 22:25 UTC (permalink / raw) To: Eric Sandeen Cc: Jan Kara, Dave Jones, Eric Sandeen, Andrew Morton, Linux Kernel On Tue, 2006-10-10 at 17:03 -0500, Eric Sandeen wrote: > Jan Kara wrote: > > > I think it's really the 1KB block size that makes it happen. > > I've looked at journal_dirty_data() code and I think the following can > > happen: > > sync() eventually ends up in journal_dirty_data(bh) as Eric writes. > > There is finds dirty buffer attached to the comitting transaction. So it drops > > all locks and calls sync_dirty_buffer(bh). > > Now in other process, file is truncated so that 'bh' gets just after EOF. > > As we have 1kb buffers, it can happen that bh is in the partially > > truncated page. Buffer is marked unmapped and clean. But in a moment the page > > is marked dirty and msync() is called. That eventually calls > > set_page_dirty() and all buffers in the page are marked dirty. > > The first process now wakes up, locks the buffer, clears the dirty bit > > and does submit_bh() - Oops. > > Hm, just FWIW I have a couple traces* of the buffer getting unmapped > -before- journal_submit_data_buffers ever even finds it... > > journal_submit_data_buffers():[fs/jbd/commit.c:242] needs writeout, > adding to array pid 1836 > b_state:0x114025 b_jlist:BJ_SyncData cpu:0 b_count:2 b_blocknr:27130 > b_jbd:1 b_frozen_data:0000000000000000 > b_committed_data:0000000000000000 > b_transaction:1 b_next_transaction:0 b_cp_transaction:0 > b_trans_is_running:0 > b_trans_is_comitting:1 b_jcount:0 pg_dirty:0 > > so it's already unmapped at this point. Could > journal_submit_data_buffers benefit from some buffer_mapped checks? Or > is that just a bandaid too late... Hmm.. b_state: 0x114025 ^ means BH_Mapped. Isn't it ? Thanks, Badari ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: 2.6.18 ext3 panic. 2006-10-10 22:25 ` Badari Pulavarty @ 2006-10-11 1:43 ` Eric Sandeen 2006-10-11 10:33 ` Jan Kara 0 siblings, 1 reply; 37+ messages in thread From: Eric Sandeen @ 2006-10-11 1:43 UTC (permalink / raw) To: Badari Pulavarty Cc: Eric Sandeen, Jan Kara, Dave Jones, Andrew Morton, Linux Kernel Badari Pulavarty wrote: > On Tue, 2006-10-10 at 17:03 -0500, Eric Sandeen wrote: >> Jan Kara wrote: >> >>> I think it's really the 1KB block size that makes it happen. >>> I've looked at journal_dirty_data() code and I think the following can >>> happen: >>> sync() eventually ends up in journal_dirty_data(bh) as Eric writes. >>> There is finds dirty buffer attached to the comitting transaction. So it drops >>> all locks and calls sync_dirty_buffer(bh). >>> Now in other process, file is truncated so that 'bh' gets just after EOF. >>> As we have 1kb buffers, it can happen that bh is in the partially >>> truncated page. Buffer is marked unmapped and clean. But in a moment the page >>> is marked dirty and msync() is called. That eventually calls >>> set_page_dirty() and all buffers in the page are marked dirty. >>> The first process now wakes up, locks the buffer, clears the dirty bit >>> and does submit_bh() - Oops. >> Hm, just FWIW I have a couple traces* of the buffer getting unmapped >> -before- journal_submit_data_buffers ever even finds it... >> >> journal_submit_data_buffers():[fs/jbd/commit.c:242] needs writeout, >> adding to array pid 1836 >> b_state:0x114025 b_jlist:BJ_SyncData cpu:0 b_count:2 b_blocknr:27130 >> b_jbd:1 b_frozen_data:0000000000000000 >> b_committed_data:0000000000000000 >> b_transaction:1 b_next_transaction:0 b_cp_transaction:0 >> b_trans_is_running:0 >> b_trans_is_comitting:1 b_jcount:0 pg_dirty:0 >> >> so it's already unmapped at this point. Could >> journal_submit_data_buffers benefit from some buffer_mapped checks? Or >> is that just a bandaid too late... > > Hmm.. > > b_state: 0x114025 > ^ > means BH_Mapped. Isn't it ? Whoops, I pasted in the wrong one, I guess, from earlier in the trace. Here are the ones I was looking at: journal_submit_data_buffers():[fs/jbd/commit.c:242] needs writeout, adding to array pid 1690 b_state:0x104005 b_jlist:BJ_SyncData cpu:0 b_count:2 b_blocknr:30045 b_jbd:1 b_frozen_data:0000000000000000 b_committed_data:0000000000000000 b_transaction:1 b_next_transaction:0 b_cp_transaction:0 b_trans_is_running:0 b_trans_is_comitting:1 b_jcount:0 pg_dirty:1 and journal_submit_data_buffers():[fs/jbd/commit.c:242] needs writeout, adding to array pid 1836 b_state:0x114005 b_jlist:BJ_SyncData cpu:1 b_count:2 b_blocknr:27130 b_jbd:1 b_frozen_data:0000000000000000 b_committed_data:0000000000000000 b_transaction:1 b_next_transaction:0 b_cp_transaction:0 b_trans_is_running:0 b_trans_is_comitting:1 b_jcount:0 pg_dirty:1 -Eric ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: 2.6.18 ext3 panic. 2006-10-11 1:43 ` Eric Sandeen @ 2006-10-11 10:33 ` Jan Kara 2006-10-11 13:44 ` Eric Sandeen 0 siblings, 1 reply; 37+ messages in thread From: Jan Kara @ 2006-10-11 10:33 UTC (permalink / raw) To: Eric Sandeen Cc: Badari Pulavarty, Eric Sandeen, Jan Kara, Dave Jones, Andrew Morton, Linux Kernel > Badari Pulavarty wrote: > >On Tue, 2006-10-10 at 17:03 -0500, Eric Sandeen wrote: > >>Jan Kara wrote: > >> > >>> I think it's really the 1KB block size that makes it happen. > >>>I've looked at journal_dirty_data() code and I think the following can > >>>happen: > >>> sync() eventually ends up in journal_dirty_data(bh) as Eric writes. > >>>There is finds dirty buffer attached to the comitting transaction. So it > >>>drops > >>>all locks and calls sync_dirty_buffer(bh). > >>> Now in other process, file is truncated so that 'bh' gets just after > >>> EOF. > >>>As we have 1kb buffers, it can happen that bh is in the partially > >>>truncated page. Buffer is marked unmapped and clean. But in a moment the > >>>page > >>>is marked dirty and msync() is called. That eventually calls > >>>set_page_dirty() and all buffers in the page are marked dirty. > >>> The first process now wakes up, locks the buffer, clears the dirty bit > >>>and does submit_bh() - Oops. > >>Hm, just FWIW I have a couple traces* of the buffer getting unmapped > >>-before- journal_submit_data_buffers ever even finds it... > >> > >> journal_submit_data_buffers():[fs/jbd/commit.c:242] needs writeout, > >>adding to array pid 1836 > >> b_state:0x114025 b_jlist:BJ_SyncData cpu:0 b_count:2 b_blocknr:27130 > >> b_jbd:1 b_frozen_data:0000000000000000 > >>b_committed_data:0000000000000000 > >> b_transaction:1 b_next_transaction:0 b_cp_transaction:0 > >>b_trans_is_running:0 > >> b_trans_is_comitting:1 b_jcount:0 pg_dirty:0 > >> > >>so it's already unmapped at this point. Could > >>journal_submit_data_buffers benefit from some buffer_mapped checks? Or > >>is that just a bandaid too late... > > > >Hmm.. > > > >b_state: 0x114025 > > ^ > >means BH_Mapped. Isn't it ? > > Whoops, I pasted in the wrong one, I guess, from earlier in the trace. > Here are the ones I was looking at: > > journal_submit_data_buffers():[fs/jbd/commit.c:242] needs writeout, adding > to array pid 1690 > b_state:0x104005 b_jlist:BJ_SyncData cpu:0 b_count:2 b_blocknr:30045 > b_jbd:1 b_frozen_data:0000000000000000 > b_committed_data:0000000000000000 > b_transaction:1 b_next_transaction:0 b_cp_transaction:0 > b_trans_is_running:0 > b_trans_is_comitting:1 b_jcount:0 pg_dirty:1 > > and > > journal_submit_data_buffers():[fs/jbd/commit.c:242] needs writeout, adding > to array pid 1836 > b_state:0x114005 b_jlist:BJ_SyncData cpu:1 b_count:2 b_blocknr:27130 > b_jbd:1 b_frozen_data:0000000000000000 > b_committed_data:0000000000000000 > b_transaction:1 b_next_transaction:0 b_cp_transaction:0 > b_trans_is_running:0 > b_trans_is_comitting:1 b_jcount:0 pg_dirty:1 Umm, but these two traces confuse me: 1) They are different traces that those you wrote about initially, aren't they? Because here we would not call sync_dirty_buffer() from journal_dirty_data(). BTW: Does this buffer trace lead to that Oops in submit_bh()? I guess not as the buffer is not dirty... Am I right that now there are no Oopses because of buffers submitted from the commit code? Only those from journal_dirty_data()? 2) Those buffers have strange states - they are !mapped, !dirty (so this is fine) but they are also JBD_Dirty and ion BJ_SyncData. This is really strange! Either it is ordered data buffer and should not be JBD_Dirty or it is not ordered data buffer and then it should not be in BJ_SyncData! The second buffer even has JBD_JWrite set so it really was metadabuffer under commit and later something took it from the committing transaction (without clearing the JWrite bit) and filed it in the SyncData list... Umm, this reminds me something... <looks into commit code> Oh no, who could write that BJ_Forget list handling.. me? ;) I think the problem is in my change to BJ_Forget list handling - if we find out buffer has b_next_transaction set, we just refile it (previously we BUGged). That it just fine, except when we are in the ordered mode and the buffer is reallocated as data. Then we refile the buffer to BJ_Metadata or BJ_Reserved list, instead of BJ_SyncData. What then happens is uncertain but probably something gets (rightfully so) confused and JBD_Dirty buffer gets to BJ_SyncData list. This bug does not seem to cause those problems with unmapped buffers but still we should fix it as it is asking for trouble. Honza -- Jan Kara <jack@suse.cz> SuSE CR Labs ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: 2.6.18 ext3 panic. 2006-10-11 10:33 ` Jan Kara @ 2006-10-11 13:44 ` Eric Sandeen 2006-10-11 14:22 ` Jan Kara 0 siblings, 1 reply; 37+ messages in thread From: Eric Sandeen @ 2006-10-11 13:44 UTC (permalink / raw) To: Jan Kara Cc: Badari Pulavarty, Eric Sandeen, Dave Jones, Andrew Morton, Linux Kernel Jan Kara wrote: > Umm, but these two traces confuse me: > 1) They are different traces that those you wrote about initially, > aren't they? Because here we would not call sync_dirty_buffer() from > journal_dirty_data(). > BTW: Does this buffer trace lead to that Oops in submit_bh()? I guess not > as the buffer is not dirty... They do wind up at the same oops, from the same "testcase" (i.e. beat the tar out of the filesystem with multiple fsx's and fsstress...) The buffer is not dirty at that tracepoint because it has just done if (locked && test_clear_buffer_dirty(bh)) { prior to the tracepoint... > Am I right that now there are no Oopses because of buffers submitted > from the commit code? Only those from journal_dirty_data()? Well, it's hard to sort out which thread is doing what; I added a pid to each tracepoint to try to make that a little easier. Both of the traces I posted seem to be journal_submit_data_buffers leading to an unmapped buffer submitted in submit_bh. > 2) Those buffers have strange states - they are !mapped, !dirty (so this > is fine) Well, they were just undirtied in journal_submit_data_buffers. See the whole traces at http://people.redhat.com/esandeen/traces/eric_ext3_oops1.txt http://people.redhat.com/esandeen/traces/eric_ext3_oops2.txt I will try to reproduce with fewer threads, to see if we can find a simpler sequence of events... As an aside, when we do journal_unmap_buffer... should it stay on t_sync_datalist? Thanks, -Eric > but they are also JBD_Dirty and ion BJ_SyncData. This is really > strange! Either it is ordered data buffer and should not be JBD_Dirty or > it is not ordered data buffer and then it should not be in BJ_SyncData! > The second buffer even has JBD_JWrite set so it really was metadabuffer > under commit and later something took it from the committing transaction > (without clearing the JWrite bit) and filed it in the SyncData list... > Umm, this reminds me something... <looks into commit code> Oh no, who could > write that BJ_Forget list handling.. me? ;) > > I think the problem is in my change to BJ_Forget list handling - if we > find out buffer has b_next_transaction set, we just refile it > (previously we BUGged). That it just fine, except when we are in the > ordered mode and the buffer is reallocated as data. Then we refile the > buffer to BJ_Metadata or BJ_Reserved list, instead of BJ_SyncData. What > then happens is uncertain but probably something gets (rightfully so) > confused and JBD_Dirty buffer gets to BJ_SyncData list. This bug does > not seem to cause those problems with unmapped buffers but still we > should fix it as it is asking for trouble. > > Honza ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: 2.6.18 ext3 panic. 2006-10-11 13:44 ` Eric Sandeen @ 2006-10-11 14:22 ` Jan Kara 2006-10-11 17:54 ` Badari Pulavarty 0 siblings, 1 reply; 37+ messages in thread From: Jan Kara @ 2006-10-11 14:22 UTC (permalink / raw) To: Eric Sandeen Cc: Badari Pulavarty, Eric Sandeen, Dave Jones, Andrew Morton, Linux Kernel > Jan Kara wrote: > > > Umm, but these two traces confuse me: > >1) They are different traces that those you wrote about initially, > >aren't they? Because here we would not call sync_dirty_buffer() from > >journal_dirty_data(). > > BTW: Does this buffer trace lead to that Oops in submit_bh()? I guess not > >as the buffer is not dirty... > > They do wind up at the same oops, from the same "testcase" (i.e. beat the > tar out of the filesystem with multiple fsx's and fsstress...) > > The buffer is not dirty at that tracepoint because it has just done > if (locked && test_clear_buffer_dirty(bh)) { > prior to the tracepoint... Oh, I see. OK. > > See the whole traces at > > http://people.redhat.com/esandeen/traces/eric_ext3_oops1.txt > http://people.redhat.com/esandeen/traces/eric_ext3_oops2.txt Hmm, those traces look really useful. I just have to digest them ;). > As an aside, when we do journal_unmap_buffer... should it stay on > t_sync_datalist? Yes, it should and it seems it really was removed from it at some point. Only later journal_dirty_data() came and filed it back to the BJ_SyncData list. And the buffer remained unmapped till the commit time and then *bang*... It may even be a race in ext3 itself that it called journal_dirty_data() on an unmapped buffer but I have to read some more code. Bye Honza -- Jan Kara <jack@suse.cz> SuSE CR Labs ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: 2.6.18 ext3 panic. 2006-10-11 14:22 ` Jan Kara @ 2006-10-11 17:54 ` Badari Pulavarty 2006-10-12 2:36 ` Eric Sandeen 0 siblings, 1 reply; 37+ messages in thread From: Badari Pulavarty @ 2006-10-11 17:54 UTC (permalink / raw) To: Jan Kara Cc: Eric Sandeen, Eric Sandeen, Dave Jones, Andrew Morton, Linux Kernel On Wed, 2006-10-11 at 16:22 +0200, Jan Kara wrote: > > Jan Kara wrote: > > > > > Umm, but these two traces confuse me: > > >1) They are different traces that those you wrote about initially, > > >aren't they? Because here we would not call sync_dirty_buffer() from > > >journal_dirty_data(). > > > BTW: Does this buffer trace lead to that Oops in submit_bh()? I guess not > > >as the buffer is not dirty... > > > > They do wind up at the same oops, from the same "testcase" (i.e. beat the > > tar out of the filesystem with multiple fsx's and fsstress...) > > > > The buffer is not dirty at that tracepoint because it has just done > > if (locked && test_clear_buffer_dirty(bh)) { > > prior to the tracepoint... > Oh, I see. OK. > > > > > See the whole traces at > > > > http://people.redhat.com/esandeen/traces/eric_ext3_oops1.txt > > http://people.redhat.com/esandeen/traces/eric_ext3_oops2.txt > Hmm, those traces look really useful. I just have to digest them ;). > > > As an aside, when we do journal_unmap_buffer... should it stay on > > t_sync_datalist? > Yes, it should and it seems it really was removed from it at some > point. Only later journal_dirty_data() came and filed it back to the > BJ_SyncData list. And the buffer remained unmapped till the commit time > and then *bang*... It may even be a race in ext3 itself that it called > journal_dirty_data() on an unmapped buffer but I have to read some more > code. > Yes. calling journal_dirty_data() on unmapped buffer can definitely happen. (only thing i am not sure is - why doesn't happen with a simple testcase like dirtying only a part of a page in 1k filesystem. I am not sure why we need journal_unmap_buffer() in the sequence). Here is what I think is happening.. journal_unmap_buffer() - cleaned the buffer, since its outside EOF, but its a part of the same page. So it remained on the page->buffers list. (at this time its not part of any transaction). Then, ordererd_commit_write() called journal_dirty_data() and we added all these buffers to BJ_SyncData list. (at this time buffer is clean - not dirty). Now msync() called __set_page_dirty_buffers() and dirtied *all* the buffers attached to this page. journal_submit_data_buffers() got around to this buffer and tried to submit the buffer... Andrew is right - only option for us to check the filesize in the write out path and skip the buffers beyond EOF. Thanks, Badari ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: 2.6.18 ext3 panic. 2006-10-11 17:54 ` Badari Pulavarty @ 2006-10-12 2:36 ` Eric Sandeen 2006-10-12 4:34 ` John Wendel 2006-10-12 12:28 ` Jan Kara 0 siblings, 2 replies; 37+ messages in thread From: Eric Sandeen @ 2006-10-12 2:36 UTC (permalink / raw) To: Badari Pulavarty Cc: Jan Kara, Eric Sandeen, Dave Jones, Andrew Morton, Linux Kernel Badari Pulavarty wrote: > Here is what I think is happening.. > > journal_unmap_buffer() - cleaned the buffer, since its outside EOF, but > its a part of the same page. So it remained on the page->buffers > list. (at this time its not part of any transaction). > > Then, ordererd_commit_write() called journal_dirty_data() and we added > all these buffers to BJ_SyncData list. (at this time buffer is clean - > not dirty). > > Now msync() called __set_page_dirty_buffers() and dirtied *all* the > buffers attached to this page. > > journal_submit_data_buffers() got around to this buffer and tried to > submit the buffer... This seems about right, but one thing bothers me in the traces; it seems like there is some locking that is missing. In http://people.redhat.com/esandeen/traces/eric_ext3_oops1.txt for example, it looks like journal_dirty_data gets started, but then the buffer_head is acted on by journal_unmap_buffer, which decides this buffer is part of the running transaction, past EOF, and clears mapped, dirty, etc. Then journal_dirty_data picks up again, decides that the buffer is not on the right list (now BJ_None) and puts it back on BJ_SyncData. Then it gets picked up by journal_submit_data_buffers and submitted, and oops. Talking with Stephen, it seemed like the page lock should synchronize these threads, but I've found that we can get to journal_dirty_data acting on the buffer heads w/o having the page locked... I'm still digging, and, er, grasping at straws here... Am I off base? -Eric > Andrew is right - only option for us to check the filesize in the > write out path and skip the buffers beyond EOF. > > Thanks, > Badari > ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: 2.6.18 ext3 panic. 2006-10-12 2:36 ` Eric Sandeen @ 2006-10-12 4:34 ` John Wendel 2006-10-12 6:57 ` Jan-Benedict Glaw 2006-10-12 12:28 ` Jan Kara 1 sibling, 1 reply; 37+ messages in thread From: John Wendel @ 2006-10-12 4:34 UTC (permalink / raw) To: Eric Sandeen Cc: Badari Pulavarty, Jan Kara, Eric Sandeen, Dave Jones, Andrew Morton, Linux Kernel Eric Sandeen wrote: > Badari Pulavarty wrote: > >> Here is what I think is happening.. >> >> journal_unmap_buffer() - cleaned the buffer, since its outside EOF, but >> its a part of the same page. So it remained on the page->buffers >> list. (at this time its not part of any transaction). >> >> Then, ordererd_commit_write() called journal_dirty_data() and we added >> all these buffers to BJ_SyncData list. (at this time buffer is clean - >> not dirty). >> >> Now msync() called __set_page_dirty_buffers() and dirtied *all* the >> buffers attached to this page. >> >> journal_submit_data_buffers() got around to this buffer and tried to >> submit the buffer... > > This seems about right, but one thing bothers me in the traces; it > seems like there is some locking that is missing. In > http://people.redhat.com/esandeen/traces/eric_ext3_oops1.txt > for example, it looks like journal_dirty_data gets started, but then > the buffer_head is acted on by journal_unmap_buffer, which decides > this buffer is part of the running transaction, past EOF, and clears > mapped, dirty, etc. Then journal_dirty_data picks up again, decides > that the buffer is not on the right list (now BJ_None) and puts it > back on BJ_SyncData. Then it gets picked up by > journal_submit_data_buffers and submitted, and oops. > > Talking with Stephen, it seemed like the page lock should synchronize > these threads, but I've found that we can get to journal_dirty_data > acting on the buffer heads w/o having the page locked... > > I'm still digging, and, er, grasping at straws here... Am I off base? > > -Eric > > >> Andrew is right - only option for us to check the filesize in the >> write out path and skip the buffers beyond EOF. >> >> Thanks, >> Badari >> Here's another data point for your consideration. I've been seeing this error since I started running 2.6.18, I assumed it was hardware, so I've tried 3 different disks, a PATA and 2 SATA drives, with VIA and Promise controllers, the error has occurred on all of them. I see the error infrequently, always when downloading lots of small files from Usenet and building, copying and deleting large (200 - 300 MB). I haven't ever had an oops/panic, just this error. When I run fsck, I always see a single message that "deleted inode nnn has zero dtime". I hope this will be useful. Oct 11 20:37:32 Godzilla kernel: EXT3-fs error (device hda5): ext3_free_blocks_sb: bit already cleared for block 4740550 Oct 11 20:37:32 Godzilla kernel: Aborting journal on device hda5. Oct 11 20:37:32 Godzilla kernel: EXT3-fs error (device hda5) in ext3_free_blocks_sb: Journal has aborted Oct 11 20:37:32 Godzilla kernel: EXT3-fs error (device hda5) in ext3_free_blocks_sb: Journal has aborted Oct 11 20:37:32 Godzilla kernel: EXT3-fs error (device hda5) in ext3_reserve_inode_write: Journal has aborted Oct 11 20:37:32 Godzilla kernel: EXT3-fs error (device hda5) in ext3_truncate: Journal has aborted Oct 11 20:37:32 Godzilla kernel: EXT3-fs error (device hda5) in ext3_reserve_inode_write: Journal has aborted Oct 11 20:37:32 Godzilla kernel: EXT3-fs error (device hda5) in ext3_orphan_del: Journal has aborted Oct 11 20:37:32 Godzilla kernel: EXT3-fs error (device hda5) in ext3_reserve_inode_write: Journal has aborted Oct 11 20:37:32 Godzilla kernel: EXT3-fs error (device hda5) in ext3_delete_inode: Journal has aborted Oct 11 20:37:32 Godzilla kernel: __journal_remove_journal_head: freeing b_committed_data Oct 11 20:37:32 Godzilla kernel: __journal_remove_journal_head: freeing b_committed_data Oct 11 20:37:32 Godzilla kernel: ext3_abort called. Oct 11 20:37:32 Godzilla kernel: EXT3-fs error (device hda5): ext3_journal_start_sb: Detected aborted journal Oct 11 20:37:32 Godzilla kernel: Remounting filesystem read-only ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: 2.6.18 ext3 panic. 2006-10-12 4:34 ` John Wendel @ 2006-10-12 6:57 ` Jan-Benedict Glaw 0 siblings, 0 replies; 37+ messages in thread From: Jan-Benedict Glaw @ 2006-10-12 6:57 UTC (permalink / raw) To: John Wendel Cc: Eric Sandeen, Badari Pulavarty, Jan Kara, Eric Sandeen, Dave Jones, Andrew Morton, Linux Kernel [-- Attachment #1: Type: text/plain, Size: 1793 bytes --] On Wed, 2006-10-11 21:34:13 -0700, John Wendel <jwendel10@comcast.net> wrote: > Oct 11 20:37:32 Godzilla kernel: EXT3-fs error (device hda5): ext3_free_blocks_sb: bit already cleared for block 4740550 > Oct 11 20:37:32 Godzilla kernel: Aborting journal on device hda5. > Oct 11 20:37:32 Godzilla kernel: EXT3-fs error (device hda5) in ext3_free_blocks_sb: Journal has aborted > Oct 11 20:37:32 Godzilla kernel: EXT3-fs error (device hda5) in ext3_free_blocks_sb: Journal has aborted > Oct 11 20:37:32 Godzilla kernel: EXT3-fs error (device hda5) in ext3_reserve_inode_write: Journal has aborted > Oct 11 20:37:32 Godzilla kernel: EXT3-fs error (device hda5) in ext3_truncate: Journal has aborted > Oct 11 20:37:32 Godzilla kernel: EXT3-fs error (device hda5) in ext3_reserve_inode_write: Journal has aborted > Oct 11 20:37:32 Godzilla kernel: EXT3-fs error (device hda5) in ext3_orphan_del: Journal has aborted > Oct 11 20:37:32 Godzilla kernel: EXT3-fs error (device hda5) in ext3_reserve_inode_write: Journal has aborted > Oct 11 20:37:32 Godzilla kernel: EXT3-fs error (device hda5) in ext3_delete_inode: Journal has aborted > Oct 11 20:37:32 Godzilla kernel: __journal_remove_journal_head: freeing b_committed_data > Oct 11 20:37:32 Godzilla kernel: __journal_remove_journal_head: freeing b_committed_data > Oct 11 20:37:32 Godzilla kernel: ext3_abort called. > Oct 11 20:37:32 Godzilla kernel: EXT3-fs error (device hda5): ext3_journal_start_sb: Detected aborted journal > Oct 11 20:37:32 Godzilla kernel: Remounting filesystem read-only This looks very much like the issue I see. MfG, JBG -- Jan-Benedict Glaw jbglaw@lug-owl.de +49-172-7608481 Signature of: http://catb.org/~esr/faqs/smart-questions.html the second : [-- Attachment #2: Digital signature --] [-- Type: application/pgp-signature, Size: 189 bytes --] ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: 2.6.18 ext3 panic. 2006-10-12 2:36 ` Eric Sandeen 2006-10-12 4:34 ` John Wendel @ 2006-10-12 12:28 ` Jan Kara 2006-10-12 13:20 ` Eric Sandeen 2006-10-12 16:40 ` Andrew Morton 1 sibling, 2 replies; 37+ messages in thread From: Jan Kara @ 2006-10-12 12:28 UTC (permalink / raw) To: Eric Sandeen Cc: Badari Pulavarty, Eric Sandeen, Dave Jones, Andrew Morton, Linux Kernel > Badari Pulavarty wrote: > > >Here is what I think is happening.. > > > >journal_unmap_buffer() - cleaned the buffer, since its outside EOF, but > >its a part of the same page. So it remained on the page->buffers > >list. (at this time its not part of any transaction). > > > >Then, ordererd_commit_write() called journal_dirty_data() and we added > >all these buffers to BJ_SyncData list. (at this time buffer is clean - > >not dirty). > > > >Now msync() called __set_page_dirty_buffers() and dirtied *all* the > >buffers attached to this page. > > > >journal_submit_data_buffers() got around to this buffer and tried to > >submit the buffer... Yes, this is certainly one we need to fix. > This seems about right, but one thing bothers me in the traces; it seems > like there is some locking that is missing. In > http://people.redhat.com/esandeen/traces/eric_ext3_oops1.txt > for example, it looks like journal_dirty_data gets started, but then the > buffer_head is acted on by journal_unmap_buffer, which decides this buffer > is part of the running transaction, past EOF, and clears mapped, dirty, It's part of the committing transaction. > etc. Then journal_dirty_data picks up again, decides that the buffer is > not on the right list (now BJ_None) and puts it back on BJ_SyncData. Then > it gets picked up by journal_submit_data_buffers and submitted, and oops. Now it is put on the running transaction's BJ_SyncData list. But otherwise you're right. > Talking with Stephen, it seemed like the page lock should synchronize these > threads, but I've found that we can get to journal_dirty_data acting on the > buffer heads w/o having the page locked... Yes, PageLock should protect us. Where can we call journal_dirty_data() without PageLock? I see the following callers: ext3_ordered_commit_write - should have PageLock ext3_ordered_writepage - has PageLock ext3_block_truncate_page - has PageLock And that are all callers from ext3. Am I missing something? Honza -- Jan Kara <jack@suse.cz> SuSE CR Labs ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: 2.6.18 ext3 panic. 2006-10-12 12:28 ` Jan Kara @ 2006-10-12 13:20 ` Eric Sandeen 2006-10-12 16:40 ` Andrew Morton 1 sibling, 0 replies; 37+ messages in thread From: Eric Sandeen @ 2006-10-12 13:20 UTC (permalink / raw) To: Jan Kara Cc: Eric Sandeen, Badari Pulavarty, Dave Jones, Andrew Morton, Linux Kernel Jan Kara wrote: >> Talking with Stephen, it seemed like the page lock should synchronize these >> threads, but I've found that we can get to journal_dirty_data acting on the >> buffer heads w/o having the page locked... > Yes, PageLock should protect us. Where can we call > journal_dirty_data() without PageLock? I see the following callers: > ext3_ordered_commit_write - should have PageLock > ext3_ordered_writepage - has PageLock > ext3_block_truncate_page - has PageLock > > And that are all callers from ext3. Am I missing something? > > Honza I put an assert about the page being locked in journal_dirty_data, and hit it right away. I'll look a bit more but I think this is how I got there: ext3_ordered_writepage <-- assert PageLocked ... block_write_full_page __block_write_full_page unlock_page() ... walk_page_buffers journal_dirty_data_fn ext3_journal_dirty_data journal_dirty_data <-- find page unlocked there's a comment in ext3_ordered_writepage: /* * The page can become unlocked at any point now, and * truncate can then come in and change things. So we * can't touch *page from now on. But *page_bufs is * safe due to elevated refcount. */ -Eric ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: 2.6.18 ext3 panic. 2006-10-12 12:28 ` Jan Kara 2006-10-12 13:20 ` Eric Sandeen @ 2006-10-12 16:40 ` Andrew Morton 2006-10-12 16:44 ` Eric Sandeen 2006-10-12 20:07 ` Eric Sandeen 1 sibling, 2 replies; 37+ messages in thread From: Andrew Morton @ 2006-10-12 16:40 UTC (permalink / raw) To: Jan Kara Cc: Eric Sandeen, Badari Pulavarty, Eric Sandeen, Dave Jones, Linux Kernel On Thu, 12 Oct 2006 14:28:20 +0200 Jan Kara <jack@suse.cz> wrote: > Where can we call > journal_dirty_data() without PageLock? block_write_full_page() will unlock the page, so ext3_writepage() will run journal_dirty_data_fn() against an unlocked page. I haven't looked into the exact details of the race, but it should be addressable via jbd_lock_bh_state() or j_list_lock coverage. ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: 2.6.18 ext3 panic. 2006-10-12 16:40 ` Andrew Morton @ 2006-10-12 16:44 ` Eric Sandeen 2006-10-12 20:07 ` Eric Sandeen 1 sibling, 0 replies; 37+ messages in thread From: Eric Sandeen @ 2006-10-12 16:44 UTC (permalink / raw) To: Andrew Morton Cc: Jan Kara, Badari Pulavarty, Eric Sandeen, Dave Jones, Linux Kernel Andrew Morton wrote: > On Thu, 12 Oct 2006 14:28:20 +0200 > Jan Kara <jack@suse.cz> wrote: > >> Where can we call >> journal_dirty_data() without PageLock? > > block_write_full_page() will unlock the page, so ext3_writepage() > will run journal_dirty_data_fn() against an unlocked page. > > I haven't looked into the exact details of the race, but it should > be addressable via jbd_lock_bh_state() or j_list_lock coverage. Yep, that's what I've been hashing out with Stephen today... In one of my cases journal_dirty_data has dropped & re-acquired the bh_state lock and j_list_lock, and journal_unmap_buffer has come along in the meantime. So it looks like we are missing some state tests, i.e. buffer_mapped(), at a couple points after we acquire jbd_lock_bh_state(). -Eric ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: 2.6.18 ext3 panic. 2006-10-12 16:40 ` Andrew Morton 2006-10-12 16:44 ` Eric Sandeen @ 2006-10-12 20:07 ` Eric Sandeen 2006-10-12 21:55 ` Badari Pulavarty 1 sibling, 1 reply; 37+ messages in thread From: Eric Sandeen @ 2006-10-12 20:07 UTC (permalink / raw) To: Andrew Morton Cc: Jan Kara, Badari Pulavarty, Eric Sandeen, Dave Jones, Linux Kernel Andrew Morton wrote: > On Thu, 12 Oct 2006 14:28:20 +0200 > Jan Kara <jack@suse.cz> wrote: > > >> Where can we call >> journal_dirty_data() without PageLock? >> > > block_write_full_page() will unlock the page, so ext3_writepage() > will run journal_dirty_data_fn() against an unlocked page. > > I haven't looked into the exact details of the race, but it should > be addressable via jbd_lock_bh_state() or j_list_lock coverage I'm testing with something like this now; seem sane? journal_dirty_data & journal_unmap_data both check do jbd_lock_bh_state(bh) close to the top... journal_dirty_data_fn has checked buffer_mapped before getting into journal_dirty_data, but that state may change before the lock is grabbed. Similarly re-check after we drop the lock. -Eric Index: linux-2.6.18-1.2737.fc6/fs/jbd/transaction.c =================================================================== --- linux-2.6.18-1.2737.fc6.orig/fs/jbd/transaction.c +++ linux-2.6.18-1.2737.fc6/fs/jbd/transaction.c @@ -967,6 +967,13 @@ int journal_dirty_data(handle_t *handle, */ jbd_lock_bh_state(bh); spin_lock(&journal->j_list_lock); + + /* Now that we have bh_state locked, are we really still mapped? */ + if (!buffer_mapped(bh)) { + JBUFFER_TRACE(jh, "unmapped, bailing out"); + goto no_journal; + } + if (jh->b_transaction) { JBUFFER_TRACE(jh, "has transaction"); if (jh->b_transaction != handle->h_transaction) { @@ -1028,6 +1036,11 @@ int journal_dirty_data(handle_t *handle, sync_dirty_buffer(bh); jbd_lock_bh_state(bh); spin_lock(&journal->j_list_lock); + /* Since we dropped the lock... */ + if (!buffer_mapped(bh)) { + JBUFFER_TRACE(jh, "Got unmapped"); + goto no_journal; + } /* The buffer may become locked again at any time if it is redirtied */ } ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: 2.6.18 ext3 panic. 2006-10-12 20:07 ` Eric Sandeen @ 2006-10-12 21:55 ` Badari Pulavarty 2006-10-12 21:57 ` Eric Sandeen 2006-10-13 7:56 ` Jan Kara 0 siblings, 2 replies; 37+ messages in thread From: Badari Pulavarty @ 2006-10-12 21:55 UTC (permalink / raw) To: Eric Sandeen Cc: Andrew Morton, Jan Kara, Eric Sandeen, Dave Jones, Linux Kernel Eric Sandeen wrote: > Andrew Morton wrote: > > >> On Thu, 12 Oct 2006 14:28:20 +0200 >> Jan Kara <jack@suse.cz> wrote: >> >> >> >>> Where can we call >>> journal_dirty_data() without PageLock? >>> >>> >> block_write_full_page() will unlock the page, so ext3_writepage() >> will run journal_dirty_data_fn() against an unlocked page. >> >> I haven't looked into the exact details of the race, but it should >> be addressable via jbd_lock_bh_state() or j_list_lock coverage >> > I'm testing with something like this now; seem sane? > > journal_dirty_data & journal_unmap_data both check do > jbd_lock_bh_state(bh) close to the top... journal_dirty_data_fn has checked > buffer_mapped before getting into journal_dirty_data, but that state may > change before the lock is grabbed. Similarly re-check after we drop the lock. > > This is exactly the solution I proposed earlier (to check buffer_mapped() before calling submit_bh()). But at that time, Jan pointed out that the whole handling is wrong. But if this is the only case we need to handle, I am okay with this band aid :) Thanks, Badari ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: 2.6.18 ext3 panic. 2006-10-12 21:55 ` Badari Pulavarty @ 2006-10-12 21:57 ` Eric Sandeen 2006-10-12 22:34 ` Badari Pulavarty 2006-10-13 7:56 ` Jan Kara 1 sibling, 1 reply; 37+ messages in thread From: Eric Sandeen @ 2006-10-12 21:57 UTC (permalink / raw) To: Badari Pulavarty Cc: Eric Sandeen, Andrew Morton, Jan Kara, Dave Jones, Linux Kernel Badari Pulavarty wrote: > This is exactly the solution I proposed earlier (to check > buffer_mapped() before calling submit_bh()). > But at that time, Jan pointed out that the whole handling is wrong. > > But if this is the only case we need to handle, I am okay with this band > aid :) Doh! And we come full circle... ok let me go reread that thread, it got long enough I had to swap out... :) -Eric ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: 2.6.18 ext3 panic. 2006-10-12 21:57 ` Eric Sandeen @ 2006-10-12 22:34 ` Badari Pulavarty 0 siblings, 0 replies; 37+ messages in thread From: Badari Pulavarty @ 2006-10-12 22:34 UTC (permalink / raw) To: Eric Sandeen Cc: Eric Sandeen, Andrew Morton, Jan Kara, Dave Jones, Linux Kernel Eric Sandeen wrote: > Badari Pulavarty wrote: > > >> This is exactly the solution I proposed earlier (to check >> buffer_mapped() before calling submit_bh()). >> But at that time, Jan pointed out that the whole handling is wrong. >> >> But if this is the only case we need to handle, I am okay with this band >> aid :) >> > > Doh! > > And we come full circle... ok let me go reread that thread, it got long > enough I had to swap out... :) Don't bother. Lets see if this is the only case that needs fixing and move forward.. Thanks, Badari ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: 2.6.18 ext3 panic. 2006-10-12 21:55 ` Badari Pulavarty 2006-10-12 21:57 ` Eric Sandeen @ 2006-10-13 7:56 ` Jan Kara 2006-10-13 16:08 ` Eric Sandeen 1 sibling, 1 reply; 37+ messages in thread From: Jan Kara @ 2006-10-13 7:56 UTC (permalink / raw) To: Badari Pulavarty Cc: Eric Sandeen, Andrew Morton, Eric Sandeen, Dave Jones, Linux Kernel > Eric Sandeen wrote: > >Andrew Morton wrote: > > > > > >>On Thu, 12 Oct 2006 14:28:20 +0200 > >>Jan Kara <jack@suse.cz> wrote: > >> > >> > >> > >>>Where can we call > >>>journal_dirty_data() without PageLock? > >>> > >>> > >>block_write_full_page() will unlock the page, so ext3_writepage() > >>will run journal_dirty_data_fn() against an unlocked page. > >> > >>I haven't looked into the exact details of the race, but it should > >>be addressable via jbd_lock_bh_state() or j_list_lock coverage > >> > >I'm testing with something like this now; seem sane? > > > >journal_dirty_data & journal_unmap_data both check do > >jbd_lock_bh_state(bh) close to the top... journal_dirty_data_fn has > >checked buffer_mapped before getting into journal_dirty_data, but that > >state may > >change before the lock is grabbed. Similarly re-check after we drop the > >lock. > > > > > This is exactly the solution I proposed earlier (to check > buffer_mapped() before calling submit_bh()). > But at that time, Jan pointed out that the whole handling is wrong. Yes, and it was. However it turned out that there are more problems than I thought ;). > But if this is the only case we need to handle, I am okay with this band > aid :) I think Eric's patch may be a part of it. But we still need to check whether the buffer is not after EOF before submitting it (or better said just after we manage to lock the buffer). Because while we are waiting for the buffer lock, journal_unmap_buffer() can still come and steal the buffer - at least the write-out in journal_dirty_data() definitely needs the check if I haven't overlooked something. Honza -- Jan Kara <jack@suse.cz> SuSE CR Labs ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: 2.6.18 ext3 panic. 2006-10-13 7:56 ` Jan Kara @ 2006-10-13 16:08 ` Eric Sandeen 2006-10-16 16:54 ` Jan Kara 0 siblings, 1 reply; 37+ messages in thread From: Eric Sandeen @ 2006-10-13 16:08 UTC (permalink / raw) To: Jan Kara Cc: Badari Pulavarty, Andrew Morton, Eric Sandeen, Dave Jones, Linux Kernel Jan Kara wrote: >> This is exactly the solution I proposed earlier (to check >> buffer_mapped() before calling submit_bh()). >> But at that time, Jan pointed out that the whole handling is wrong. > Yes, and it was. However it turned out that there are more problems > than I thought ;). > >> But if this is the only case we need to handle, I am okay with this band >> aid :) > I think Eric's patch may be a part of it. But we still need to check whether > the buffer is not after EOF before submitting it (or better said just > after we manage to lock the buffer). Because while we are waiting for > the buffer lock, journal_unmap_buffer() can still come and steal the > buffer - at least the write-out in journal_dirty_data() definitely needs > the check if I haven't overlooked something. Ok, let me think on that today. My first reaction is that if we have the bh state lock and pay attention to mapped in journal_dirty_data(), then any blocks past EOF which have gotten unmapped by journal_unmap_buffer will be recognized as such (because they are now unmapped... without needing to check for past EOF...) and we'll be fine. As a datapoint, davej's stresstest (several fsx's and fsstresses) survived an overnight run on his box, which used to panic in < 2 hrs. Survived about 6 hours on my box until I intentionally stopped it; my box had added a write/truncate test in a loop, with a bunch of periodic syncs as well.... -Eric ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: 2.6.18 ext3 panic. 2006-10-13 16:08 ` Eric Sandeen @ 2006-10-16 16:54 ` Jan Kara 2006-10-16 16:56 ` Eric Sandeen 0 siblings, 1 reply; 37+ messages in thread From: Jan Kara @ 2006-10-16 16:54 UTC (permalink / raw) To: Eric Sandeen Cc: Badari Pulavarty, Andrew Morton, Eric Sandeen, Dave Jones, Linux Kernel > Jan Kara wrote: > > >> This is exactly the solution I proposed earlier (to check > >> buffer_mapped() before calling submit_bh()). > >> But at that time, Jan pointed out that the whole handling is wrong. > > Yes, and it was. However it turned out that there are more problems > > than I thought ;). > > > >> But if this is the only case we need to handle, I am okay with this band > >> aid :) > > I think Eric's patch may be a part of it. But we still need to check whether > > the buffer is not after EOF before submitting it (or better said just > > after we manage to lock the buffer). Because while we are waiting for > > the buffer lock, journal_unmap_buffer() can still come and steal the > > buffer - at least the write-out in journal_dirty_data() definitely needs > > the check if I haven't overlooked something. > > Ok, let me think on that today. My first reaction is that if we have > the bh state lock and pay attention to mapped in journal_dirty_data(), > then any blocks past EOF which have gotten unmapped by > journal_unmap_buffer will be recognized as such (because they are now > unmapped... without needing to check for past EOF...) and we'll be fine. Hmm, yes, you're right. If we do the test in journal_dirty_data() we should not file unmapped buffer into transaction's list and hence we should be safe. Fine. In case we eventually hit the assertion, we can think further ;). > As a datapoint, davej's stresstest (several fsx's and fsstresses) > survived an overnight run on his box, which used to panic in < 2 hrs. > Survived about 6 hours on my box until I intentionally stopped it; my > box had added a write/truncate test in a loop, with a bunch of periodic > syncs as well.... Perfect :). Honza -- Jan Kara <jack@suse.cz> SuSE CR Labs ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: 2.6.18 ext3 panic. 2006-10-16 16:54 ` Jan Kara @ 2006-10-16 16:56 ` Eric Sandeen 0 siblings, 0 replies; 37+ messages in thread From: Eric Sandeen @ 2006-10-16 16:56 UTC (permalink / raw) To: Jan Kara Cc: Badari Pulavarty, Andrew Morton, Eric Sandeen, Dave Jones, Linux Kernel Jan Kara wrote: >>> I think Eric's patch may be a part of it. But we still need to check whether >>> the buffer is not after EOF before submitting it (or better said just >>> after we manage to lock the buffer). Because while we are waiting for >>> the buffer lock, journal_unmap_buffer() can still come and steal the >>> buffer - at least the write-out in journal_dirty_data() definitely needs >>> the check if I haven't overlooked something. >> Ok, let me think on that today. My first reaction is that if we have >> the bh state lock and pay attention to mapped in journal_dirty_data(), >> then any blocks past EOF which have gotten unmapped by >> journal_unmap_buffer will be recognized as such (because they are now >> unmapped... without needing to check for past EOF...) and we'll be fine. > Hmm, yes, you're right. If we do the test in journal_dirty_data() we > should not file unmapped buffer into transaction's list and hence we > should be safe. Fine. In case we eventually hit the assertion, we can > think further ;). Awww no way that can possibly happen right? :) >> As a datapoint, davej's stresstest (several fsx's and fsstresses) >> survived an overnight run on his box, which used to panic in < 2 hrs. >> Survived about 6 hours on my box until I intentionally stopped it; my >> box had added a write/truncate test in a loop, with a bunch of periodic >> syncs as well.... > Perfect :). Ok, thanks Jan! I'll send a patch for -mm as a new thread, this one has gotten buried pretty deep. -Eric ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: 2.6.18 ext3 panic. 2006-10-09 19:46 ` Eric Sandeen 2006-10-09 19:59 ` Eric Sandeen 2006-10-09 21:59 ` Badari Pulavarty @ 2006-10-09 22:40 ` Jan-Benedict Glaw 2006-10-10 13:16 ` Jan Kara 2 siblings, 1 reply; 37+ messages in thread From: Jan-Benedict Glaw @ 2006-10-09 22:40 UTC (permalink / raw) To: Eric Sandeen Cc: Andrew Morton, Dave Jones, Linux Kernel, esandeen, Badari Pulavarty, Jan Kara [-- Attachment #1: Type: text/plain, Size: 1891 bytes --] On Mon, 2006-10-09 14:46:30 -0500, Eric Sandeen <sandeen@sandeen.net> wrote: > Andrew Morton wrote: > > On Tue, 03 Oct 2006 00:43:01 -0500 > > Eric Sandeen <sandeen@sandeen.net> wrote: > > > Dave Jones wrote: > > > > So I managed to reproduce it with an 'fsx foo' and a > > > > 'fsstress -d . -r -n 100000 -p 20 -r'. This time I grabbed it from > > > > a vanilla 2.6.18 with none of the Fedora patches.. > > > > > > > > I'll give 2.6.18-git a try next. > > > > > > > > ----------- [cut here ] --------- [please bite here ] --------- > > > > Kernel BUG at fs/buffer.c:2791 > > > I had thought/hoped that this was fixed by Jan's patch at > > > http://lkml.org/lkml/2006/9/7/236 from the thread started at > > > http://lkml.org/lkml/2006/9/1/149, but it seems maybe not. Dave hit this bug > > > first by going through that new codepath.... > > > > Yes, Jan's patch is supposed to fix that !buffer_mapped() assertion. iirc, > > Badari was hitting that BUG and was able to confirm that Jan's patch > > (3998b9301d3d55be8373add22b6bc5e11c1d9b71 in post-2.6.18 mainline) fixed > > it. > > Looking at some BH traces*, it appears that what Dave hit is a truncate > racing with a sync... > > truncate ... > ext3_invalidate_page > journal_invalidatepage > journal_unmap buffer > > going off at the same time as > > sync ... > journal_dirty_data > sync_dirty_buffer > submit_bh <-- finds unmapped buffer, boom. Is this possibly related to the issues that are discussed in another thread? We're seeing problems while unlinking large files (usually get it within some hours with 200MB files, but couldn't yet reproduce it with 20MB.) MfG, JBG -- Jan-Benedict Glaw jbglaw@lug-owl.de +49-172-7608481 Signature of: Alles wird gut! ...und heute wirds schon ein bißchen besser. the second : [-- Attachment #2: Digital signature --] [-- Type: application/pgp-signature, Size: 189 bytes --] ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: 2.6.18 ext3 panic. 2006-10-09 22:40 ` Jan-Benedict Glaw @ 2006-10-10 13:16 ` Jan Kara 2006-10-10 16:39 ` Jan-Benedict Glaw 0 siblings, 1 reply; 37+ messages in thread From: Jan Kara @ 2006-10-10 13:16 UTC (permalink / raw) To: Eric Sandeen, Andrew Morton, Dave Jones, Linux Kernel, esandeen, Badari Pulavarty > On Mon, 2006-10-09 14:46:30 -0500, Eric Sandeen <sandeen@sandeen.net> wrote: > > Andrew Morton wrote: > > > On Tue, 03 Oct 2006 00:43:01 -0500 > > > Eric Sandeen <sandeen@sandeen.net> wrote: > > > > Dave Jones wrote: > > > > > So I managed to reproduce it with an 'fsx foo' and a > > > > > 'fsstress -d . -r -n 100000 -p 20 -r'. This time I grabbed it from > > > > > a vanilla 2.6.18 with none of the Fedora patches.. > > > > > > > > > > I'll give 2.6.18-git a try next. > > > > > > > > > > ----------- [cut here ] --------- [please bite here ] --------- > > > > > Kernel BUG at fs/buffer.c:2791 > > > > I had thought/hoped that this was fixed by Jan's patch at > > > > http://lkml.org/lkml/2006/9/7/236 from the thread started at > > > > http://lkml.org/lkml/2006/9/1/149, but it seems maybe not. Dave hit this bug > > > > first by going through that new codepath.... > > > > > > Yes, Jan's patch is supposed to fix that !buffer_mapped() assertion. iirc, > > > Badari was hitting that BUG and was able to confirm that Jan's patch > > > (3998b9301d3d55be8373add22b6bc5e11c1d9b71 in post-2.6.18 mainline) fixed > > > it. > > > > Looking at some BH traces*, it appears that what Dave hit is a truncate > > racing with a sync... > > > > truncate ... > > ext3_invalidate_page > > journal_invalidatepage > > journal_unmap buffer > > > > going off at the same time as > > > > sync ... > > journal_dirty_data > > sync_dirty_buffer > > submit_bh <-- finds unmapped buffer, boom. > > Is this possibly related to the issues that are discussed in another > thread? We're seeing problems while unlinking large files (usually get > it within some hours with 200MB files, but couldn't yet reproduce it > with 20MB.) I don't think this is related (BTW: I've run your test for 5 hours without any luck ;( Maybe I'll try again for some longer time...). Honza -- Jan Kara <jack@suse.cz> SuSE CR Labs ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: 2.6.18 ext3 panic. 2006-10-10 13:16 ` Jan Kara @ 2006-10-10 16:39 ` Jan-Benedict Glaw 0 siblings, 0 replies; 37+ messages in thread From: Jan-Benedict Glaw @ 2006-10-10 16:39 UTC (permalink / raw) To: Jan Kara Cc: Eric Sandeen, Andrew Morton, Dave Jones, Linux Kernel, esandeen, Badari Pulavarty [-- Attachment #1: Type: text/plain, Size: 841 bytes --] On Tue, 2006-10-10 15:16:03 +0200, Jan Kara <jack@suse.cz> wrote: > > Is this possibly related to the issues that are discussed in another > > thread? We're seeing problems while unlinking large files (usually get > > it within some hours with 200MB files, but couldn't yet reproduce it > > with 20MB.) > I don't think this is related (BTW: I've run your test for 5 hours > without any luck ;( Maybe I'll try again for some longer time...). Looking at it, I also think it's a different thing. Just to add, I couldn't make it bug with a 10 MB file in a day, but it failed again using a 100MB file. So file size seems to matter somehow. MfG, JBG -- Jan-Benedict Glaw jbglaw@lug-owl.de +49-172-7608481 Signature of: Wenn ich wach bin, träume ich. the second : [-- Attachment #2: Digital signature --] [-- Type: application/pgp-signature, Size: 189 bytes --] ^ permalink raw reply [flat|nested] 37+ messages in thread
end of thread, other threads:[~2006-10-16 16:57 UTC | newest] Thread overview: 37+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2006-10-02 19:47 2.6.18 ext3 panic Dave Jones 2006-10-03 5:22 ` Dave Jones 2006-10-03 5:43 ` Eric Sandeen 2006-10-03 6:19 ` Andrew Morton 2006-10-03 6:40 ` Dave Jones 2006-10-03 16:45 ` Dave Jones 2006-10-09 19:46 ` Eric Sandeen 2006-10-09 19:59 ` Eric Sandeen 2006-10-09 21:59 ` Badari Pulavarty 2006-10-09 22:50 ` Dave Jones 2006-10-10 14:11 ` Jan Kara 2006-10-10 18:42 ` Andrew Morton 2006-10-10 22:03 ` Eric Sandeen 2006-10-10 22:25 ` Badari Pulavarty 2006-10-11 1:43 ` Eric Sandeen 2006-10-11 10:33 ` Jan Kara 2006-10-11 13:44 ` Eric Sandeen 2006-10-11 14:22 ` Jan Kara 2006-10-11 17:54 ` Badari Pulavarty 2006-10-12 2:36 ` Eric Sandeen 2006-10-12 4:34 ` John Wendel 2006-10-12 6:57 ` Jan-Benedict Glaw 2006-10-12 12:28 ` Jan Kara 2006-10-12 13:20 ` Eric Sandeen 2006-10-12 16:40 ` Andrew Morton 2006-10-12 16:44 ` Eric Sandeen 2006-10-12 20:07 ` Eric Sandeen 2006-10-12 21:55 ` Badari Pulavarty 2006-10-12 21:57 ` Eric Sandeen 2006-10-12 22:34 ` Badari Pulavarty 2006-10-13 7:56 ` Jan Kara 2006-10-13 16:08 ` Eric Sandeen 2006-10-16 16:54 ` Jan Kara 2006-10-16 16:56 ` Eric Sandeen 2006-10-09 22:40 ` Jan-Benedict Glaw 2006-10-10 13:16 ` Jan Kara 2006-10-10 16:39 ` Jan-Benedict Glaw
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox