* Kernel BUG when running xfs_fsr with 2.6.35.1
@ 2010-08-16  8:32 Arto Jantunen
  2010-08-16 22:09 ` Dave Chinner
  0 siblings, 1 reply; 6+ messages in thread

From: Arto Jantunen @ 2010-08-16  8:32 UTC (permalink / raw)
To: xfs

[-- Attachment #1: Type: text/plain, Size: 462 bytes --]

Hi List,

I had a kernel BUG yesterday when running xfs_fsr on my Debian Unstable
laptop. The kernel is upstream 2.6.35.1. I'm attaching the backtrace
below. I haven't tried reproducing the problem yet and don't know if it is
reproducible. I can try that, and test patches etc. if it is useful. Let me
know if there is any other information I can provide to help with debugging.

Please CC me on any replies, I'm not subscribed to the list.

-- 
Arto Jantunen

[-- Attachment #2: Backtrace --]
[-- Type: text/plain, Size: 8232 bytes --]

[18695.285232] BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
[18695.285245] IP: [<ffffffffa017f4f0>] xfs_trans_find_item+0x0/0x5 [xfs]
[18695.285290] PGD 24285067 PUD 2f6f0067 PMD 0
[18695.285300] Oops: 0000 [#1] PREEMPT SMP
[18695.285309] last sysfs file: /sys/devices/pci0000:00/0000:00:1d.1/usb3/3-1/3-1:1.0/input/input8/name
[18695.285316] CPU 1
[18695.285319] Modules linked in: usbhid hid cryptd aes_x86_64 aes_generic ppdev lp sco acpi_cpufreq mperf cpufreq_userspace cpufreq_stats bnep cpufreq_powersave cpufreq_conservative rfcomm l2cap crc16 bluetooth uinput fuse loop snd_hda_codec_realtek arc4 ecb pcmcia snd_hda_intel i915 snd_hda_codec snd_hwdep drm_kms_helper iwl3945 drm iwlcore i2c_algo_bit tifm_7xx1 joydev tifm_core snd_pcm_oss snd_mixer_oss yenta_socket snd_pcm pcmcia_rsrc pcmcia_core snd_timer mac80211 snd i2c_i801 rng_core cfg80211 psmouse parport_pc parport i2c_core intel_agp tpm_tis tpm tpm_bios soundcore evdev serio_raw rfkill snd_page_alloc irda crc_ccitt container battery processor video output wmi ac button xfs exportfs sd_mod crc_t10dif ata_generic ata_piix libata firewire_ohci tg3 firewire_core uhci_hcd libphy crc_itu_t scsi_mod ehci_hcd thermal thermal_sys [last unloaded: scsi_wait_scan]
[18695.285464]
[18695.285470] Pid: 4050, comm: xfs_fsr Not tainted 2.6.35.1 #1 Ness2 /TravelMate 3040
[18695.285476] RIP: 0010:[<ffffffffa017f4f0>]  [<ffffffffa017f4f0>] xfs_trans_find_item+0x0/0x5 [xfs]
[18695.285512] RSP: 0018:ffff880035055c40  EFLAGS: 00010206
[18695.285517] RAX: 0000000000000000 RBX: ffff88003cda68d0 RCX: 0000000000000005
[18695.285522] RDX: 0000000000000005 RSI: 0000000000000000 RDI: ffff88003cda68d0
[18695.285528] RBP: ffff88000462f400 R08: ffff88003e6db300 R09: ffff880035055d18
[18695.285533] R10: 0000000000000000 R11: ffff88003d081800 R12: 0000000000000005
[18695.285539] R13: ffff88000462f438 R14: 0000000000000000 R15: 0000000000000000
[18695.285546] FS:  00007f232413d700(0000) GS:ffff880001900000(0000) knlGS:0000000000000000
[18695.285552] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[18695.285557] CR2: 0000000000000018 CR3: 000000000fc7d000 CR4: 00000000000006e0
[18695.285563] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[18695.285569] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[18695.285575] Process xfs_fsr (pid: 4050, threadinfo ffff880035054000, task ffff88002f562c10)
[18695.285580] Stack:
[18695.285583]  ffffffffa017f421 ffff88000462f400 0000000000000000 0000000000000000
[18695.285591] <0> ffffffffa0152a4a ffff880035055d58 0000000000000000 ffff880000000000
[18695.285601] <0> 0000000000000000 ffff880035055cf8 ffff880035055d18 ffff880000000000
[18695.285612] Call Trace:
[18695.285646]  [<ffffffffa017f421>] ? xfs_trans_log_inode+0x19/0x42 [xfs]
[18695.285679]  [<ffffffffa0152a4a>] ? xfs_bunmapi+0xb2a/0xb9e [xfs]
[18695.285693]  [<ffffffff8103497f>] ? get_parent_ip+0x9/0x1b
[18695.285730]  [<ffffffffa016b57a>] ? xfs_itruncate_finish+0x1b8/0x329 [xfs]
[18695.285764]  [<ffffffffa01821c5>] ? xfs_inactive+0x1e1/0x402 [xfs]
[18695.285772]  [<ffffffff8103497f>] ? get_parent_ip+0x9/0x1b
[18695.285785]  [<ffffffff810eaf9f>] ? clear_inode+0x58/0xaf
[18695.285792]  [<ffffffff810eb4f3>] ? generic_delete_inode+0xa4/0x110
[18695.285801]  [<ffffffff810e7b25>] ? d_kill+0x42/0x61
[18695.285808]  [<ffffffff810e8295>] ? dput+0x149/0x156
[18695.285815]  [<ffffffff810da2a9>] ? fput+0x17b/0x1a4
[18695.285824]  [<ffffffff810d7941>] ? filp_close+0x5f/0x6a
[18695.285831]  [<ffffffff810d79ee>] ? sys_close+0xa2/0xdb
[18695.285840]  [<ffffffff81002a82>] ? system_call_fastpath+0x16/0x1b
[18695.285845] Code: 9d fe ff 85 c0 75 1b 48 85 ed 74 16 89 44 24 08 44 89 e2 48 8b 33 48 89 ef e8 67 ff ff ff 8b 44 24 08 5a 59 5b 5d 41 5c c3 90 90 <48> 8b 46 18 c3 44 8a 46 0b bf 01 00 00 00 41 0f b6 c0 45 0f b6
[18695.285930] RIP  [<ffffffffa017f4f0>] xfs_trans_find_item+0x0/0x5 [xfs]
[18695.285963] RSP <ffff880035055c40>
[18695.285966] CR2: 0000000000000018
[18695.285973] ---[ end trace 95a1de083b2b773e ]---
[18696.225289] ------------[ cut here ]------------
[18696.225300] kernel BUG at fs/xfs/xfs_iget.c:301!
[18696.225306] invalid opcode: 0000 [#2] PREEMPT SMP
[18696.225315] last sysfs file: /sys/devices/pci0000:00/0000:00:1d.1/usb3/3-1/3-1:1.0/input/input8/name
[18696.225320] CPU 1
[18696.225323] Modules linked in: usbhid hid cryptd aes_x86_64 aes_generic ppdev lp sco acpi_cpufreq mperf cpufreq_userspace cpufreq_stats bnep cpufreq_powersave cpufreq_conservative rfcomm l2cap crc16 bluetooth uinput fuse loop snd_hda_codec_realtek arc4 ecb pcmcia snd_hda_intel i915 snd_hda_codec snd_hwdep drm_kms_helper iwl3945 drm iwlcore i2c_algo_bit tifm_7xx1 joydev tifm_core snd_pcm_oss snd_mixer_oss yenta_socket snd_pcm pcmcia_rsrc pcmcia_core snd_timer mac80211 snd i2c_i801 rng_core cfg80211 psmouse parport_pc parport i2c_core intel_agp tpm_tis tpm tpm_bios soundcore evdev serio_raw rfkill snd_page_alloc irda crc_ccitt container battery processor video output wmi ac button xfs exportfs sd_mod crc_t10dif ata_generic ata_piix libata firewire_ohci tg3 firewire_core uhci_hcd libphy crc_itu_t scsi_mod ehci_hcd thermal thermal_sys [last unloaded: scsi_wait_scan]
[18696.225469]
[18696.225476] Pid: 2284, comm: plasma-desktop Tainted: G      D     2.6.35.1 #1 Ness2 /TravelMate 3040
[18696.225482] RIP: 0010:[<ffffffffa016962c>]  [<ffffffffa016962c>] xfs_iget+0x3e3/0x593 [xfs]
[18696.225527] RSP: 0018:ffff88003069b9f8  EFLAGS: 00010246
[18696.225532] RAX: 0000000000000000 RBX: ffff88003d081800 RCX: ffffc900102ae8d0
[18696.225537] RDX: ffffffff00000001 RSI: 0000000000000004 RDI: ffff88000462f4e0
[18696.225543] RBP: 0000000000000004 R08: ffffc900102ae8e0 R09: 0000000000000000
[18696.225549] R10: ffff88003d081800 R11: 0000000000000250 R12: ffff88000462f458
[18696.225554] R13: ffff88003d67af0c R14: ffff88003d67aec0 R15: 00000000000140c0
[18696.225561] FS:  00007f2005eac780(0000) GS:ffff880001900000(0000) knlGS:0000000000000000
[18696.225567] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[18696.225573] CR2: 00007f2002dbd300 CR3: 00000000307a1000 CR4: 00000000000006e0
[18696.225578] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[18696.225584] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[18696.225590] Process plasma-desktop (pid: 2284, threadinfo ffff88003069a000, task ffff8800305d71e0)
[18696.225595] Stack:
[18696.225599]  ffff880000000000 0000000000000002 ffff88002437cb19 000000000424fe22
[18696.225608] <0> 0000001900000000 ffff88003d67af10 0424fe2200000002 0000000100000001
[18696.225619] <0> ffff88003069a000 ffff88002437cb30 ffff88003069bb18 000000000024fe22
[18696.225631] Call Trace:
[18696.225668]  [<ffffffffa017f4c8>] ? xfs_trans_iget+0x16/0x3e [xfs]
[18696.225707]  [<ffffffffa016b953>] ? xfs_ialloc+0xaf/0x53f [xfs]
[18696.225742]  [<ffffffffa0183e34>] ? kmem_zone_alloc+0x58/0x9c [xfs]
[18696.225777]  [<ffffffffa017fcd1>] ? xfs_dir_ialloc+0xa0/0x273 [xfs]
[18696.225814]  [<ffffffffa017dd7e>] ? xfs_trans_reserve+0xc9/0x195 [xfs]
[18696.225849]  [<ffffffffa018164e>] ? xfs_create+0x2f9/0x572 [xfs]
[18696.225882]  [<ffffffffa018c02f>] ? xfs_vn_mknod+0xcc/0x168 [xfs]
[18696.225895]  [<ffffffff810e1b86>] ? vfs_create+0x66/0x88
[18696.225906]  [<ffffffff81035b1a>] ? sub_preempt_count+0x83/0x94
[18696.225914]  [<ffffffff810e278e>] ? do_last+0x290/0x55f
[18696.225923]  [<ffffffff810e4285>] ? do_filp_open+0x1ee/0x5c8
[18696.225933]  [<ffffffff8103497f>] ? get_parent_ip+0x9/0x1b
[18696.225942]  [<ffffffff812f538b>] ? _raw_spin_unlock+0x25/0x33
[18696.225952]  [<ffffffff810ecb05>] ? alloc_fd+0x110/0x122
[18696.225962]  [<ffffffff810d7a7d>] ? do_sys_open+0x56/0xf9
[18696.225972]  [<ffffffff81002a82>] ? system_call_fastpath+0x16/0x1b
[18696.225978] Code: 00 0f 84 28 01 00 00 bf d0 00 00 00 e8 d0 f1 00 e1 85 c0 0f 85 11 01 00 00 85 ed 74 12 89 ee 4c 89 e7 e8 2c f7 ff ff 85 c0 75 04 <0f> 0b eb fe 4c 89 ef e8 61 ba 18 e1 8a 4c 24 10 be 01 00 00 00
[18696.226060] RIP  [<ffffffffa016962c>] xfs_iget+0x3e3/0x593 [xfs]
[18696.226095] RSP <ffff88003069b9f8>
[18696.226101] ---[ end trace 95a1de083b2b773f ]---
[18696.226109] note: plasma-desktop[2284] exited with preempt_count 1

[-- Attachment #3: Type: text/plain, Size: 121 bytes --]

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
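[Editorial aside: the faulting address 0000000000000018 in the first oops is the classic signature of reading a field at offset 0x18 through a NULL structure pointer. That matches the marked instruction in the Code: line, `48 8b 46 18` = `mov 0x18(%rsi),%rax`, with RSI shown as zero in the register dump. A minimal sketch of the idea, using a hypothetical field layout (not the real xfs_log_item structure):]

```python
import ctypes

# Hypothetical 64-bit struct: three 8-byte pointer fields, then the field
# the crashed code would have read. With the base pointer NULL, the access
# lands on virtual address 0x00 + offsetof(field) = 0x18.
class Item(ctypes.Structure):
    _fields_ = [
        ("a", ctypes.c_void_p),        # offset 0x00
        ("b", ctypes.c_void_p),        # offset 0x08
        ("c", ctypes.c_void_p),        # offset 0x10
        ("li_desc", ctypes.c_void_p),  # offset 0x18 <- the faulting address
    ]

print(hex(Item.li_desc.offset))
```

This is only an illustration of how a small fault address maps back to a struct member; which xfs_log_item member actually sits at 0x18 in this 2.6.35.1 build would have to be checked against the kernel headers.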
* Re: Kernel BUG when running xfs_fsr with 2.6.35.1
  2010-08-16  8:32 Kernel BUG when running xfs_fsr with 2.6.35.1 Arto Jantunen
@ 2010-08-16 22:09 ` Dave Chinner
  2010-08-17  9:19   ` Arto Jantunen
  2010-08-17 17:05   ` Arto Jantunen
  0 siblings, 2 replies; 6+ messages in thread

From: Dave Chinner @ 2010-08-16 22:09 UTC (permalink / raw)
To: Arto Jantunen; +Cc: xfs

On Mon, Aug 16, 2010 at 11:32:29AM +0300, Arto Jantunen wrote:
>
> Hi List,
>
> I had a kernel BUG yesterday when running xfs_fsr on my Debian Unstable
> laptop. The kernel is upstream 2.6.35.1. I'm attaching the backtrace
> below. I haven't tried reproducing the problem yet and don't know if it is
> reproducible. I can try that, and test patches etc. if it is useful. Let me
> know if there is any other information I can provide to help with debugging.

It's not obvious what has gone wrong at all - I haven't seen
anything like this in all my recent testing, so it's something new.
The first oops implies the inode has not been joined to the
transaction, but from code inspection I cannot see how that can
happen.

What compiler did you use to build the kernel? Can you reproduce the
problem at all?

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
* Re: Kernel BUG when running xfs_fsr with 2.6.35.1
  2010-08-16 22:09 ` Dave Chinner
@ 2010-08-17  9:19   ` Arto Jantunen
  2010-08-17 17:05   ` Arto Jantunen
  1 sibling, 0 replies; 6+ messages in thread

From: Arto Jantunen @ 2010-08-17  9:19 UTC (permalink / raw)
To: Dave Chinner; +Cc: xfs

Dave Chinner <david@fromorbit.com> writes:
> On Mon, Aug 16, 2010 at 11:32:29AM +0300, Arto Jantunen wrote:
>> I had a kernel BUG yesterday when running xfs_fsr on my Debian Unstable
>> laptop. The kernel is upstream 2.6.35.1. I'm attaching the backtrace
>> below. I haven't tried reproducing the problem yet and don't know if it is
>> reproducible. I can try that, and test patches etc. if it is useful. Let me
>> know if there is any other information I can provide to help with debugging.
>
> It's not obvious what has gone wrong at all - I haven't seen
> anything like this in all my recent testing, so it's something new.
> The first oops implies the inode has not been joined to the
> transaction, but from code inspection I cannot see how that can
> happen.
>
> What compiler did you use to build the kernel? Can you reproduce the
> problem at all?

I used the compiler which is available on current Debian Testing;
--version reports "gcc (Debian 4.4.4-8) 4.4.5 20100728 (prerelease)".
I'll try to reproduce the problem later today.

-- 
Arto Jantunen
* Re: Kernel BUG when running xfs_fsr with 2.6.35.1
  2010-08-16 22:09 ` Dave Chinner
  2010-08-17  9:19   ` Arto Jantunen
@ 2010-08-17 17:05   ` Arto Jantunen
  2010-08-17 23:03     ` Dave Chinner
  1 sibling, 1 reply; 6+ messages in thread

From: Arto Jantunen @ 2010-08-17 17:05 UTC (permalink / raw)
To: Dave Chinner; +Cc: xfs

[-- Attachment #1: Type: text/plain, Size: 1401 bytes --]

Dave Chinner <david@fromorbit.com> writes:
>> I had a kernel BUG yesterday when running xfs_fsr on my Debian Unstable
>> laptop. The kernel is upstream 2.6.35.1. I'm attaching the backtrace
>> below. I haven't tried reproducing the problem yet and don't know if it is
>> reproducible. I can try that, and test patches etc. if it is useful. Let me
>> know if there is any other information I can provide to help with debugging.
>
> It's not obvious what has gone wrong at all - I haven't seen
> anything like this in all my recent testing, so it's something new.
> The first oops implies the inode has not been joined to the
> transaction, but from code inspection I cannot see how that can
> happen.

I tried to reproduce the problem, and this time xfs_fsr finished without
reporting errors, but the kernel output the following two lines (one of
which is essentially empty):

[ 6372.878945] Filesystem "sda4": Access to block zero in inode 67203861
start_block: 0 start_off: 0 blkcnt: 0 extent-state: 0 lastx: 2
[ 6372.878950]

I decided to boot from a usb stick and try xfs_repair -n, I have attached
the output of that. There were errors reported. Is this simply a case of
random (possibly hardware related) fs corruption, or were the errors
actually caused by the xfs_fsr run that crashed the system? Is there a way
to tell from this data, is there anything else I can provide?

-- 
Arto Jantunen

[-- Attachment #2: xfs_repair -n --]
[-- Type: text/plain, Size: 10170 bytes --]

Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan (but don't clear) agi unlinked lists...
error following ag 0 unlinked list
error following ag 2 unlinked list
error following ag 3 unlinked list
        - process known inodes and perform inode discovery...
        - agno = 0
b766fb90: Badness in key lookup (length)
bp=(bno 208, len 16384 bytes) key=(bno 208, len 8192 bytes)
b766fb90: Badness in key lookup (length)
bp=(bno 720, len 16384 bytes) key=(bno 720, len 8192 bytes)
b766fb90: Badness in key lookup (length)
bp=(bno 752, len 16384 bytes) key=(bno 752, len 8192 bytes)
b766fb90: Badness in key lookup (length)
bp=(bno 14224, len 16384 bytes) key=(bno 14224, len 8192 bytes)
b766fb90: Badness in key lookup (length)
bp=(bno 15440, len 16384 bytes) key=(bno 15440, len 8192 bytes)
b766fb90: Badness in key lookup (length)
bp=(bno 127952, len 16384 bytes) key=(bno 127952, len 8192 bytes)
b766fb90: Badness in key lookup (length)
bp=(bno 178096, len 16384 bytes) key=(bno 178096, len 8192 bytes)
b766fb90: Badness in key lookup (length)
bp=(bno 282256, len 16384 bytes) key=(bno 282256, len 8192 bytes)
b766fb90: Badness in key lookup (length)
bp=(bno 283888, len 16384 bytes) key=(bno 283888, len 8192 bytes)
b766fb90: Badness in key lookup (length)
bp=(bno 380176, len 16384 bytes) key=(bno 380176, len 8192 bytes)
        - agno = 1
b6e6fb90: Badness in key lookup (length)
bp=(bno 29853904, len 16384 bytes) key=(bno 29853904, len 8192 bytes)
b6e6fb90: Badness in key lookup (length)
bp=(bno 29854000, len 16384 bytes) key=(bno 29854000, len 8192 bytes)
b6e6fb90: Badness in key lookup (length)
bp=(bno 29854032, len 16384 bytes) key=(bno 29854032, len 8192 bytes)
b6e6fb90: Badness in key lookup (length)
bp=(bno 29854320, len 16384 bytes) key=(bno 29854320, len 8192 bytes)
b6e6fb90: Badness in key lookup (length)
bp=(bno 29855120, len 16384 bytes) key=(bno 29855120, len 8192 bytes)
b6e6fb90: Badness in key lookup (length)
bp=(bno 29855632, len 16384 bytes) key=(bno 29855632, len 8192 bytes)
b6e6fb90: Badness in key lookup (length)
bp=(bno 29860912, len 16384 bytes) key=(bno 29860912, len 8192 bytes)
b6e6fb90: Badness in key lookup (length)
bp=(bno 29861328, len 16384 bytes) key=(bno 29861328, len 8192 bytes)
b6e6fb90: Badness in key lookup (length)
bp=(bno 29865328, len 16384 bytes) key=(bno 29865328, len 8192 bytes)
b6e6fb90: Badness in key lookup (length)
bp=(bno 30082480, len 16384 bytes) key=(bno 30082480, len 8192 bytes)
b6e6fb90: Badness in key lookup (length)
bp=(bno 30153392, len 16384 bytes) key=(bno 30153392, len 8192 bytes)
b6e6fb90: Badness in key lookup (length)
bp=(bno 30156400, len 16384 bytes) key=(bno 30156400, len 8192 bytes)
        - agno = 2
b4cecb90: Badness in key lookup (length)
bp=(bno 44738944, len 16384 bytes) key=(bno 44738944, len 8192 bytes)
b4cecb90: Badness in key lookup (length)
bp=(bno 44756544, len 16384 bytes) key=(bno 44756544, len 8192 bytes)
b4cecb90: Badness in key lookup (length)
bp=(bno 44910976, len 16384 bytes) key=(bno 44910976, len 8192 bytes)
b4cecb90: Badness in key lookup (length)
bp=(bno 44917376, len 16384 bytes) key=(bno 44917376, len 8192 bytes)
b4cecb90: Badness in key lookup (length)
bp=(bno 44991680, len 16384 bytes) key=(bno 44991680, len 8192 bytes)
b4cecb90: Badness in key lookup (length)
bp=(bno 45189120, len 16384 bytes) key=(bno 45189120, len 8192 bytes)
b4cecb90: Badness in key lookup (length)
bp=(bno 45241856, len 16384 bytes) key=(bno 45241856, len 8192 bytes)
        - agno = 3
        - agno = 4
        - agno = 5
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
No modify flag set, skipping phase 5
Phase 6 - check inode connectivity...
        - traversing filesystem ...
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
disconnected inode 475, would move to lost+found
disconnected inode 1457, would move to lost+found
disconnected inode 1462, would move to lost+found
disconnected inode 1515, would move to lost+found
disconnected inode 1527, would move to lost+found
disconnected inode 12936, would move to lost+found
disconnected inode 28462, would move to lost+found
disconnected inode 28486, would move to lost+found
disconnected inode 30897, would move to lost+found
disconnected inode 30900, would move to lost+found
disconnected inode 255910, would move to lost+found
disconnected inode 356253, would move to lost+found
disconnected inode 564550, would move to lost+found
disconnected inode 564552, would move to lost+found
disconnected inode 564558, would move to lost+found
disconnected inode 564560, would move to lost+found
disconnected inode 567818, would move to lost+found
disconnected inode 760356, would move to lost+found
disconnected inode 67167339, would move to lost+found
disconnected inode 67167345, would move to lost+found
disconnected inode 67167367, would move to lost+found
disconnected inode 67167377, would move to lost+found
disconnected inode 67167381, would move to lost+found
disconnected inode 67167523, would move to lost+found
disconnected inode 67167530, would move to lost+found
disconnected inode 67167540, would move to lost+found
disconnected inode 67167567, would move to lost+found
disconnected inode 67167589, would move to lost+found
disconnected inode 67167590, would move to lost+found
disconnected inode 67167611, would move to lost+found
disconnected inode 67167995, would move to lost+found
disconnected inode 67168188, would move to lost+found
disconnected inode 67168199, would move to lost+found
disconnected inode 67169785, would move to lost+found
disconnected inode 67169805, would move to lost+found
disconnected inode 67170769, would move to lost+found
disconnected inode 67170784, would move to lost+found
disconnected inode 67181371, would move to lost+found
disconnected inode 67181372, would move to lost+found
disconnected inode 67181399, would move to lost+found
disconnected inode 67182212, would move to lost+found
disconnected inode 67190216, would move to lost+found
disconnected inode 67624500, would move to lost+found
disconnected inode 67766334, would move to lost+found
disconnected inode 67772371, would move to lost+found
disconnected inode 100667199, would move to lost+found
disconnected inode 100702371, would move to lost+found
disconnected inode 101011291, would move to lost+found
disconnected inode 101024061, would move to lost+found
disconnected inode 101026567, would move to lost+found
disconnected inode 101172659, would move to lost+found
disconnected inode 101567559, would move to lost+found
disconnected inode 101567560, would move to lost+found
disconnected inode 101567561, would move to lost+found
disconnected inode 101567563, would move to lost+found
disconnected inode 101673003, would move to lost+found
disconnected inode 101673004, would move to lost+found
Phase 7 - verify link counts...
would have reset inode 475 nlinks from 0 to 1
would have reset inode 1457 nlinks from 0 to 1
would have reset inode 1462 nlinks from 0 to 1
would have reset inode 1515 nlinks from 0 to 1
would have reset inode 1527 nlinks from 0 to 1
would have reset inode 12936 nlinks from 0 to 1
would have reset inode 28462 nlinks from 0 to 1
would have reset inode 28486 nlinks from 0 to 1
would have reset inode 30897 nlinks from 0 to 1
would have reset inode 30900 nlinks from 0 to 1
would have reset inode 255910 nlinks from 0 to 1
would have reset inode 356253 nlinks from 0 to 1
would have reset inode 564550 nlinks from 0 to 1
would have reset inode 564552 nlinks from 0 to 1
would have reset inode 564558 nlinks from 0 to 1
would have reset inode 564560 nlinks from 0 to 1
would have reset inode 567818 nlinks from 0 to 1
would have reset inode 760356 nlinks from 0 to 1
would have reset inode 67167339 nlinks from 0 to 1
would have reset inode 67167345 nlinks from 0 to 1
would have reset inode 67167367 nlinks from 0 to 1
would have reset inode 67167377 nlinks from 0 to 1
would have reset inode 67167381 nlinks from 0 to 1
would have reset inode 67167523 nlinks from 0 to 1
would have reset inode 67167530 nlinks from 0 to 1
would have reset inode 67167540 nlinks from 0 to 1
would have reset inode 67167567 nlinks from 0 to 1
would have reset inode 67167589 nlinks from 0 to 1
would have reset inode 67167590 nlinks from 0 to 1
would have reset inode 67167611 nlinks from 0 to 1
would have reset inode 67167995 nlinks from 0 to 1
would have reset inode 67168188 nlinks from 0 to 1
would have reset inode 67168199 nlinks from 0 to 1
would have reset inode 67169785 nlinks from 0 to 1
would have reset inode 67169805 nlinks from 0 to 1
would have reset inode 67170769 nlinks from 0 to 1
would have reset inode 67170784 nlinks from 0 to 1
would have reset inode 67181371 nlinks from 0 to 1
would have reset inode 67181372 nlinks from 0 to 1
would have reset inode 67181399 nlinks from 0 to 1
would have reset inode 67182212 nlinks from 0 to 1
would have reset inode 67190216 nlinks from 0 to 1
would have reset inode 67624500 nlinks from 0 to 1
would have reset inode 67766334 nlinks from 0 to 1
would have reset inode 67772371 nlinks from 0 to 1
would have reset inode 100667199 nlinks from 0 to 1
would have reset inode 100702371 nlinks from 0 to 1
would have reset inode 101011291 nlinks from 0 to 1
would have reset inode 101024061 nlinks from 0 to 1
would have reset inode 101026567 nlinks from 0 to 1
would have reset inode 101172659 nlinks from 0 to 1
would have reset inode 101567559 nlinks from 0 to 1
would have reset inode 101567560 nlinks from 0 to 1
would have reset inode 101567561 nlinks from 0 to 1
would have reset inode 101567563 nlinks from 0 to 1
would have reset inode 101673003 nlinks from 0 to 1
would have reset inode 101673004 nlinks from 0 to 1
No modify flag set, skipping filesystem flush and exiting.
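[Editorial aside: each "Badness in key lookup (length)" pair in the attached repair output compares a buffer already in xfs_repair's cache (`bp=`) against a fresh lookup at the same block number with a different length (`key=`). A small sketch of the overlap, assuming the reported `bno` is in 512-byte sectors (an assumption about repair's units, not stated in the thread):]

```python
SECTOR = 512  # assumed unit of the bno values in the repair output

def buffer_span(bno, nbytes):
    """Half-open sector range covered by a buffer reported as bp=(bno, len)."""
    return (bno, bno + nbytes // SECTOR)

cached = buffer_span(208, 16384)  # the 16 KiB buffer already in the cache
lookup = buffer_span(208, 8192)   # the 8 KiB buffer repair then asked for
print(cached, lookup)             # (208, 240) (208, 224)
```

The requested range being a strict prefix of the cached one is what the later analysis in the thread describes: half of an inode cluster treated as free space while the other half still appears in use.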
* Re: Kernel BUG when running xfs_fsr with 2.6.35.1
  2010-08-17 17:05   ` Arto Jantunen
@ 2010-08-17 23:03     ` Dave Chinner
  2010-08-18  8:48       ` Arto Jantunen
  0 siblings, 1 reply; 6+ messages in thread

From: Dave Chinner @ 2010-08-17 23:03 UTC (permalink / raw)
To: Arto Jantunen; +Cc: xfs

On Tue, Aug 17, 2010 at 08:05:35PM +0300, Arto Jantunen wrote:
> Dave Chinner <david@fromorbit.com> writes:
> >> I had a kernel BUG yesterday when running xfs_fsr on my Debian Unstable
> >> laptop. The kernel is upstream 2.6.35.1. I'm attaching the backtrace
> >> below. I haven't tried reproducing the problem yet and don't know if it is
> >> reproducible. I can try that, and test patches etc. if it is useful. Let me
> >> know if there is any other information I can provide to help with debugging.
> >
> > It's not obvious what has gone wrong at all - I haven't seen
> > anything like this in all my recent testing, so it's something new.
> > The first oops implies the inode has not been joined to the
> > transaction, but from code inspection I cannot see how that can
> > happen.
>
> I tried to reproduce the problem, and this time xfs_fsr finished without
> reporting errors, but the kernel output the following two lines (one of
> which is essentially empty):
>
> [ 6372.878945] Filesystem "sda4": Access to block zero in inode 67203861
> start_block: 0 start_off: 0 blkcnt: 0 extent-state: 0 lastx: 2
> [ 6372.878950]

That's a corrupt extent record - it's all zeros, and judging by the
fact that it's only got 2 extents, it's probably inline in the inode
(i.e. the inode fork has been zeroed.)

> I decided to boot from a usb stick and try xfs_repair -n, I have attached
> the output of that. There were errors reported. Is this simply a case of
> random (possibly hardware related) fs corruption, or were the errors
> actually caused by the xfs_fsr run that crashed the system? Is there a way
> to tell from this data, is there anything else I can provide?
....
> Phase 1 - find and verify superblock...
> Phase 2 - using internal log
>         - scan filesystem freespace and inode maps...
>         - found root inode chunk
> Phase 3 - for each AG...
>         - scan (but don't clear) agi unlinked lists...
> error following ag 0 unlinked list
> error following ag 2 unlinked list
> error following ag 3 unlinked list

Ok, so a corrupt set of inode unlinked lists.

>         - process known inodes and perform inode discovery...
>         - agno = 0
> b766fb90: Badness in key lookup (length)
> bp=(bno 208, len 16384 bytes) key=(bno 208, len 8192 bytes)
> b766fb90: Badness in key lookup (length)
> bp=(bno 720, len 16384 bytes) key=(bno 720, len 8192 bytes)

[snip]

> Phase 6 - check inode connectivity...
>         - traversing filesystem ...
>         - traversal finished ...
>         - moving disconnected inodes to lost+found ...
> disconnected inode 475, would move to lost+found
> disconnected inode 1457, would move to lost+found

[snip]

> Phase 7 - verify link counts...
> would have reset inode 475 nlinks from 0 to 1
> would have reset inode 1457 nlinks from 0 to 1

Ok, so inode #475 is in the inode chunk at block 208, likewise
inode #1457 is in the chunk at bno 720. This all implies that
at some point there's been a problem with the second phase of
the unlink procedure and freeing the inode cluster. It looks like
the inode cluster has been partially freed (by the "Badness in key
lookup" errors) as half of the chunk is free space and half appears
to be in use. The freespace btree is clearly confused about this.

Along with the inodes being removed from the directory structure and
the link counts being zero, this really does indicate that something
went wrong with an inode cluster freeing transaction at some point.

I can't see how normal execution would do this, so it leads me to
think that transaction recovery might be involved. It smells like
partial transaction recovery failures, so my next question is this:
what is your hardware, have you had any power loss events, and are
you using barriers?

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
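[Editorial aside: the jump above from disconnected inode numbers to inode-chunk block addresses follows from how XFS packs an allocation group number, an AG-relative block number, and an in-block slot into each absolute inode number. A rough sketch of that decomposition; the geometry parameters below are illustrative assumptions (4 KiB blocks, 256-byte inodes, 4M-block AGs), not values read from this particular filesystem:]

```python
# Assumed geometry, for illustration only.
INOPBLOG = 4   # log2(inodes per block): 4096-byte blocks / 256-byte inodes
AGBLKLOG = 22  # log2(filesystem blocks per allocation group)

def ino_to_location(ino):
    """Split an absolute inode number into (AG number, AG block, slot),
    following the shape of the kernel's XFS_INO_TO_AGNO/AGBNO/OFFSET macros."""
    agino = ino & ((1 << (AGBLKLOG + INOPBLOG)) - 1)  # AG-relative inode number
    agno = ino >> (AGBLKLOG + INOPBLOG)               # which allocation group
    agbno = agino >> INOPBLOG                         # block within the AG
    slot = agino & ((1 << INOPBLOG) - 1)              # inode slot in that block
    return agno, agbno, slot

print(ino_to_location(475))       # a low AG 0 inode from the repair output
print(ino_to_location(67167339))  # one of the high disconnected inodes
```

With these assumed parameters, 67167339 decodes to AG 1, which is why inode numbers in the repair output cluster into a few widely separated ranges: each range is a different allocation group.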
* Re: Kernel BUG when running xfs_fsr with 2.6.35.1
  2010-08-17 23:03     ` Dave Chinner
@ 2010-08-18  8:48       ` Arto Jantunen
  0 siblings, 0 replies; 6+ messages in thread

From: Arto Jantunen @ 2010-08-18  8:48 UTC (permalink / raw)
To: Dave Chinner; +Cc: xfs

Dave Chinner <david@fromorbit.com> writes:
> On Tue, Aug 17, 2010 at 08:05:35PM +0300, Arto Jantunen wrote:
>> Dave Chinner <david@fromorbit.com> writes:
>> >> I had a kernel BUG yesterday when running xfs_fsr on my Debian Unstable
>> >> laptop. The kernel is upstream 2.6.35.1. I'm attaching the backtrace
>> >> below. I haven't tried reproducing the problem yet and don't know if it is
>> >> reproducible. I can try that, and test patches etc. if it is useful. Let me
>> >> know if there is any other information I can provide to help with debugging.
>> >
>> > It's not obvious what has gone wrong at all - I haven't seen
>> > anything like this in all my recent testing, so it's something new.
>> > The first oops implies the inode has not been joined to the
>> > transaction, but from code inspection I cannot see how that can
>> > happen.
>>
>> I tried to reproduce the problem, and this time xfs_fsr finished without
>> reporting errors, but the kernel output the following two lines (one of
>> which is essentially empty):
>>
>> [ 6372.878945] Filesystem "sda4": Access to block zero in inode 67203861
>> start_block: 0 start_off: 0 blkcnt: 0 extent-state: 0 lastx: 2
>> [ 6372.878950]
>
> That's a corrupt extent record - it's all zeros, and judging by the
> fact that it's only got 2 extents, it's probably inline in the inode
> (i.e. the inode fork has been zeroed.)
>
>> I decided to boot from a usb stick and try xfs_repair -n, I have attached
>> the output of that. There were errors reported. Is this simply a case of
>> random (possibly hardware related) fs corruption, or were the errors
>> actually caused by the xfs_fsr run that crashed the system? Is there a way
>> to tell from this data, is there anything else I can provide?
> ....
>
>> Phase 1 - find and verify superblock...
>> Phase 2 - using internal log
>>         - scan filesystem freespace and inode maps...
>>         - found root inode chunk
>> Phase 3 - for each AG...
>>         - scan (but don't clear) agi unlinked lists...
>> error following ag 0 unlinked list
>> error following ag 2 unlinked list
>> error following ag 3 unlinked list
>
> Ok, so a corrupt set of inode unlinked lists.
>
>>         - process known inodes and perform inode discovery...
>>         - agno = 0
>> b766fb90: Badness in key lookup (length)
>> bp=(bno 208, len 16384 bytes) key=(bno 208, len 8192 bytes)
>> b766fb90: Badness in key lookup (length)
>> bp=(bno 720, len 16384 bytes) key=(bno 720, len 8192 bytes)
>
> [snip]
>
>> Phase 6 - check inode connectivity...
>>         - traversing filesystem ...
>>         - traversal finished ...
>>         - moving disconnected inodes to lost+found ...
>> disconnected inode 475, would move to lost+found
>> disconnected inode 1457, would move to lost+found
>
> [snip]
>
>> Phase 7 - verify link counts...
>> would have reset inode 475 nlinks from 0 to 1
>> would have reset inode 1457 nlinks from 0 to 1
>
> Ok, so inode #475 is in the inode chunk at block 208, likewise
> inode #1457 is in the chunk at bno 720. This all implies that
> at some point there's been a problem with the second phase of
> the unlink procedure and freeing the inode cluster. It looks like
> the inode cluster has been partially freed (by the "Badness in key
> lookup" errors) as half of the chunk is free space and half appears
> to be in use. The freespace btree is clearly confused about this.
>
> Along with the inodes being removed from the directory structure and
> the link counts being zero, this really does indicate that something
> went wrong with an inode cluster freeing transaction at some point.
>
> I can't see how normal execution would do this, so it leads me to
> think that transaction recovery might be involved. It smells like
> partial transaction recovery failures, so my next question is this:
> what is your hardware, have you had any power loss events, and are
> you using barriers?

Could this corruption have been caused by having to reboot via sysrq after
the original crash (with sync, umount, sync, reboot)? Other than that one,
I don't remember having any power failures or such.

The hardware is an Acer TravelMate 3040 laptop with a single SATA disk
(120Gb IIRC). I haven't disabled barriers manually and am not using any
layers between the fs and disk (dm or md or such), so as far as I
understand barriers should be enabled (I'll check the kernel log when I'm
at the machine again and send another mail tonight if that is not in fact
the case).

Any idea whether the original crash during xfs_fsr was caused by existing
problems in the fs, or whether the crash was the cause of the problems seen
now? Should I allow xfs_repair to fix the fs, or will that lose data that
could be useful for debugging?

-- 
Arto Jantunen
end of thread, other threads:[~2010-08-18  8:48 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-08-16  8:32 Kernel BUG when running xfs_fsr with 2.6.35.1 Arto Jantunen
2010-08-16 22:09 ` Dave Chinner
2010-08-17  9:19   ` Arto Jantunen
2010-08-17 17:05   ` Arto Jantunen
2010-08-17 23:03     ` Dave Chinner
2010-08-18  8:48       ` Arto Jantunen