* possible ext4 race situation freezing linux
@ 2009-02-24 12:22 Andreas Friedrich Berendsen
  2009-02-24 14:17 ` Theodore Tso
  0 siblings, 1 reply; 9+ messages in thread
From: Andreas Friedrich Berendsen @ 2009-02-24 12:22 UTC (permalink / raw)
  To: linux-ext4

Kernel version: 2.6.28.7
e2fsprogs version: 1.41.4
arch: x86_64 (amd)

I made this test four times and the results were the same: linux
freezes and becomes unresponsive. The only solution is to reset the
box. I do not know if the problem is with the USB device subsystem or
a possible ext4 race condition.

This is the scenario:

1. a volume group (VGSTORE) using usb mass storage devices
2. a non-lvm controlled usb mass storage device named ANDREAS
3. a cpio operation running from the ANDREAS device to a VGSTORE.LV
   filesystem (LV1)
4. a cpio operation from a VGSTORE.LV filesystem (LV2) to LV1
5. a rm operation on LV1

Nothing is recorded in /var/log/messages

I know that this is not much information, but I'm keen to repeat the
tests and turn on any debug options necessary to track the problem.

-- 
Berendsen

^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: possible ext4 race situation freezing linux
From: Theodore Tso @ 2009-02-24 14:17 UTC
  To: Andreas Friedrich Berendsen; +Cc: linux-ext4

On Wed, Feb 25, 2009 at 01:22:22AM +1300, Andreas Friedrich Berendsen wrote:
> Kernel version: 2.6.28.7
> e2fsprogs version: 1.41.4
> arch: x86_64 (amd)
>
> I made this test four times and the results were the same: linux
> freezes and becomes unresponsive. The only solution is to reset the
> box. I do not know if the problem is with the USB device subsystem or
> a possible ext4 race condition.

Can you use alt-sysrq to get some stack traces or a register dump out,
so we can see where the kernel is hanging?

                                                - Ted
* Re: possible ext4 race situation freezing linux
From: Eric Sandeen @ 2009-02-24 15:38 UTC
  To: Theodore Tso; +Cc: Andreas Friedrich Berendsen, linux-ext4

Theodore Tso wrote:
> On Wed, Feb 25, 2009 at 01:22:22AM +1300, Andreas Friedrich Berendsen wrote:
>> Kernel version: 2.6.28.7
>> e2fsprogs version: 1.41.4
>> arch: x86_64 (amd)
>>
>> I made this test four times and the results were the same: linux
>> freezes and becomes unresponsive. The only solution is to reset the
>> box. I do not know if the problem is with the USB device subsystem or
>> a possible ext4 race condition.
>
> Can you use alt-sysrq to get some stack traces or a register dump out,
> so we can see where the kernel is hanging?
>
>                                                 - Ted

alt-sysrq-w would be a good place to start (just in case you're not
familiar w/ the sysrq keys)

It'd also be great to test w/ 2.6.29, as a deadlock was fixed there
recently (it's on its way to .28.x too AFAIK)

Thanks,
-Eric
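Note for readers unfamiliar with SysRq: besides the Alt-SysRq-w key
combination suggested above, the same command can be issued by writing
the letter to /proc/sysrq-trigger as root, as long as a shell still
responds. The following is a minimal sketch and not part of the original
exchange. The function names and the path parameters are assumptions,
introduced only so that the helpers can be pointed at scratch files
instead of the real /proc entries.

```python
from pathlib import Path

# Real kernel interfaces; writing to the trigger file requires root.
SYSRQ_ENABLE = Path("/proc/sys/kernel/sysrq")  # setting for keyboard SysRq
SYSRQ_TRIGGER = Path("/proc/sysrq-trigger")    # accepts one command letter


def keyboard_sysrq_enabled(enable_path=SYSRQ_ENABLE):
    """Return True if the keyboard SysRq setting is non-zero.

    The path parameter is a hypothetical convenience for testing.
    """
    return int(enable_path.read_text().strip()) != 0


def trigger_sysrq(key, trigger_path=SYSRQ_TRIGGER):
    """Write a single SysRq command letter (for example 'w') to the trigger file."""
    if len(key) != 1 or not key.isalnum():
        raise ValueError("SysRq command must be a single letter or digit")
    trigger_path.write_text(key)
```

On a live system, calling trigger_sysrq("w") as root asks the kernel to
dump the tasks that are in uninterruptible (blocked) state to the kernel
log. If the machine is already completely frozen, this path will not
help and the keyboard or a serial console is still needed.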
* Re: possible ext4 race situation freezing linux
From: Andreas Friedrich Berendsen @ 2009-02-24 17:03 UTC
  To: linux-ext4

Tso & Eric,

I'll try again to reproduce the error, but when the system freezes, I
have the X interface running. And, indeed, I'm not familiar with the
behaviour of sysrq. Which sysrq should I use when the problem happens?
Will the system change to the text interface? If there is no disk
activity (all LEDs are off), where will the information be recorded?

-- 
Berendsen

-- 
__________________________________________
Andreas Friedrich Berendsen
SCA OCP MSCA A+ Linux+ Network+ HpMASE
* Re: possible ext4 race situation freezing linux
From: Theodore Tso @ 2009-02-24 17:30 UTC
  To: Andreas Friedrich Berendsen; +Cc: linux-ext4

On Wed, Feb 25, 2009 at 06:03:33AM +1300, Andreas Friedrich Berendsen wrote:
> Tso & Eric,
>
> I'll try again to reproduce the error, but when the system freezes, I
> have the X interface running. And, indeed, I'm not familiar with the
> behaviour of sysrq. Which sysrq should I use when the problem happens?
> Will the system change to the text interface? If there is no disk
> activity (all LEDs are off), where will the information be recorded?

Well, given that you can reproduce it fairly reliably, can't you just
switch to a text console before triggering your reproduction case?
You can use different VT consoles, switching between them using Alt-F2,
Alt-F3, Alt-F4, etc., instead of using different terminal windows.

Depending on how badly the system is wedged, it may not be possible to
record the information to disk. Usually what folks will do is use a
digital camera and record snapshots from the text console, or, if they
have a serial console set up, they can record output on another
machine. A serial console has the advantage that you can reliably
capture the entire sysrq output (you send a serial break character
instead of using the sysrq key) and it also works even if you have X
running. As such, it's the preferred method, but it's a bit of a pain
for most people to set up, and some modern laptops no longer have 8250
serial ports, so we make do with what we have.

Regards,

                                                - Ted
* Re: possible ext4 race situation freezing linux
From: Andreas Friedrich Berendsen @ 2009-03-04 6:42 UTC
  To: Theodore Tso; +Cc: linux-ext4

[-- Attachment #1: Type: text/plain, Size: 2050 bytes --]

Ts'o,

The problem still exists. Now, when executing 'fsck.ext4 -C 0 -F -y -v',
I receive a list of inodes, and at a certain point the system freezes.

Attached I'm sending the output for SysRq as requested.

Cheers
-- 
Berendsen

[-- Attachment #2: messages.gz --]
[-- Type: application/x-gzip, Size: 29441 bytes --]
* Re: possible ext4 race situation freezing linux
From: Eric Sandeen @ 2009-03-04 6:56 UTC
  To: Andreas Friedrich Berendsen; +Cc: Theodore Tso, linux-ext4

Andreas Friedrich Berendsen wrote:
> Ts'o,
>
> The problem still exists. Now, when executing 'fsck.ext4 -C 0 -F -y -v',
> I receive a list of inodes, and at a certain point the system freezes.

so now it's freezing when ext4 isn't even mounted but simply being
fsck'd? This may point to a generic storage problem.

> Attached I'm sending the output for SysRq as requested

$ zcat messages.gz | grep -i sysrq
$

The sysrq output didn't seem to make it to that log.

-Eric
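For readers who want to run the same check on their own logs, the
zcat/grep pipeline in the message above can be approximated in Python.
This is only a sketch of the idea. The function name grep_sysrq is an
assumption made for illustration and does not come from the thread.

```python
import gzip


def grep_sysrq(log_path):
    """Return lines of a gzip-compressed log that mention 'sysrq'.

    Matching is case-insensitive, like `grep -i sysrq`. An empty list
    means no SysRq output reached the log.
    """
    with gzip.open(log_path, "rt", errors="replace") as log:
        return [line.rstrip("\n") for line in log if "sysrq" in line.lower()]
```

An empty result corresponds to the empty prompt shown above: the log
contains no line mentioning SysRq, so the dump was never written to it.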
* Re: possible ext4 race situation freezing linux
From: Andreas Friedrich Berendsen @ 2009-03-04 8:03 UTC
  To: Eric Sandeen; +Cc: Theodore Tso, linux-ext4

Steps:

1. Original FS with problems.
2. Using fsck with -y was problematic because at a certain point a
   segmentation fault occurred
3. Using fsck manually. Answering 'y' for all questions but the one
   which caused the segmentation fault
4. Moved as many files as possible from the FS to a new LV in the same VG
5. A new fsck run worked
6. resize2fs+lvreduce to reduce FS size and have more free space in VG
7. Moved more files to new LV inside the same VG
8. New run of fsck worked
9. resize2fs to prepare for a new lvreduce. Power failure after almost
   24 hours of run
10. After system reboot, FS can be mounted and FS seems to be ok.
    Executed find, find+grep, cp, and other tools to check file
    accessibility. No messages in /var/log/messages
11. New run of fsck. system freeze
12. Per request, used Alt+PrintScreen+(dlmpqtvw)
13. Used Alt+PrintScreen+(resuib) to restart system
14. System restarted
15. Copy of /var/log/messages to /tmp/messages. Removed lines before and
    after Alt+PrintScreen commands

Problem can be reproduced as many times as needed. Do you want me to
execute any procedures to collect data?

Extract from /tmp/messages:

Mar 4 19:19:28 storage kernel: ------------[ cut here ]------------
Mar 4 19:19:28 storage kernel: WARNING: at arch/x86/mm/ioremap.c:226 __ioremap_caller+0xc7/0x299()
Mar 4 19:19:28 storage kernel: Modules linked in: ck804xrom(+) i2c_core mtd chipreg map_funcs joydev usb_storage ata_generic pata_acpi pata_amd [last unloaded: scsi_wait_scan]
Mar 4 19:19:28 storage kernel: Pid: 881, comm: modprobe Not tainted 2.6.28.7.afb.fc10.4.x86_amd64 #1
Mar 4 19:19:28 storage kernel: Call Trace:
Mar 4 19:19:28 storage kernel: [<ffffffff8104516d>] warn_on_slowpath+0x58/0x7d
Mar 4 19:19:28 storage kernel: [<ffffffff810458a5>] ? release_console_sem+0x1c6/0x1fb
Mar 4 19:19:28 storage kernel: [<ffffffff81347775>] ? printk+0x3c/0x3f
Mar 4 19:19:28 storage kernel: [<ffffffff81028787>] ? default_spin_lock_flags+0x9/0xe
Mar 4 19:19:28 storage kernel: [<ffffffffa006c000>] ? init_ck804xrom+0x0/0x556 [ck804xrom]
Mar 4 19:19:28 storage kernel: [<ffffffff8102e4a6>] __ioremap_caller+0xc7/0x299
Mar 4 19:19:28 storage kernel: [<ffffffffa006c25c>] ? init_ck804xrom+0x25c/0x556 [ck804xrom]
Mar 4 19:19:28 storage kernel: [<ffffffffa006c000>] ? init_ck804xrom+0x0/0x556 [ck804xrom]
Mar 4 19:19:28 storage kernel: [<ffffffff8102e74d>] ioremap_nocache+0x12/0x14
Mar 4 19:19:28 storage kernel: [<ffffffffa006c25c>] init_ck804xrom+0x25c/0x556 [ck804xrom]
Mar 4 19:19:28 storage kernel: [<ffffffff810ae835>] ? vfree+0x29/0x2b
Mar 4 19:19:28 storage kernel: [<ffffffff810699e5>] ? load_module+0x1803/0x197a
Mar 4 19:19:28 storage kernel: [<ffffffffa006c000>] ? init_ck804xrom+0x0/0x556 [ck804xrom]
Mar 4 19:19:28 storage kernel: [<ffffffff8100a058>] do_one_initcall+0x58/0x145
Mar 4 19:19:28 storage kernel: [<ffffffff810c612c>] ? do_sync_read+0xe7/0x12d
Mar 4 19:19:28 storage kernel: [<ffffffff81069ce2>] sys_init_module+0xa9/0x1b6
Mar 4 19:19:28 storage kernel: [<ffffffff8101104a>] system_call_fastpath+0x16/0x1b
Mar 4 19:19:28 storage kernel: ---[ end trace 66d1cdaa6433edb1 ]---
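A note on reading the call trace in the message above: entries prefixed
with "?" are addresses the kernel found on the stack but could not
confirm as part of the actual call chain, so the entries without "?"
are the ones to read first. The sketch below separates the two kinds.
It is an illustration only. The names FRAME_RE, Frame and parse_frames
are assumptions and not part of any kernel tool, and the pattern is
written only against the trace format shown in this thread.

```python
import re
from typing import NamedTuple, Optional

# Matches entries such as:
#   [<ffffffffa006c000>] ? init_ck804xrom+0x0/0x556 [ck804xrom]
# Whitespace before "+" is tolerated because mail wrapping can insert it.
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+(?P<guess>\?\s+)?"
    r"(?P<symbol>[A-Za-z_][\w.]*)\s*\+(?P<offset>0x[0-9a-f]+)/(?P<size>0x[0-9a-f]+)"
    r"(?:\s+\[(?P<module>\w+)\])?"
)


class Frame(NamedTuple):
    address: int           # return address printed in the trace
    symbol: str            # function name
    offset: int            # offset into the function
    size: int              # size of the function
    module: Optional[str]  # module name, or None for built-in code
    reliable: bool         # False when the entry was prefixed with "?"


def parse_frames(text):
    """Extract every stack frame found in a block of kernel log text."""
    return [
        Frame(
            address=int(m["addr"], 16),
            symbol=m["symbol"],
            offset=int(m["offset"], 16),
            size=int(m["size"], 16),
            module=m["module"],
            reliable=m["guess"] is None,
        )
        for m in FRAME_RE.finditer(text)
    ]
```

Filtering the result with `[f.symbol for f in parse_frames(log) if f.reliable]`
gives the confirmed chain. For the trace above that chain runs through
sys_init_module and init_ck804xrom, that is, a warning raised while
modprobe was loading the ck804xrom module.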
* Re: possible ext4 race situation freezing linux
From: Andreas Friedrich Berendsen @ 2009-03-06 19:21 UTC
  To: Eric Sandeen; +Cc: Theodore Tso, linux-ext4

After downloading and installing kernel 2.6.29-rc7, fsck is working.
A few (451) files have multiply-claimed blocks. fsck has been running
for the last 12 hours and looks like it is doing the job.
end of thread, other threads:[~2009-03-06 19:22 UTC | newest]

Thread overview: 9+ messages
2009-02-24 12:22 possible ext4 race situation freezing linux Andreas Friedrich Berendsen
2009-02-24 14:17 ` Theodore Tso
2009-02-24 15:38   ` Eric Sandeen
2009-02-24 17:03     ` Andreas Friedrich Berendsen
2009-02-24 17:30       ` Theodore Tso
2009-03-04 6:42         ` Andreas Friedrich Berendsen
2009-03-04 6:56           ` Eric Sandeen
2009-03-04 8:03             ` Andreas Friedrich Berendsen
2009-03-06 19:21               ` Andreas Friedrich Berendsen