From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754049Ab2I0SCj (ORCPT ); Thu, 27 Sep 2012 14:02:39 -0400 Received: from h1446028.stratoserver.net ([85.214.92.142]:44080 "EHLO mail.ahsoftware.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752903Ab2I0SCh convert rfc822-to-8bit (ORCPT ); Thu, 27 Sep 2012 14:02:37 -0400 Message-ID: <5064945C.4020403@ahsoftware.de> Date: Thu, 27 Sep 2012 20:01:00 +0200 From: Alexander Holler User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:15.0) Gecko/20120911 Thunderbird/15.0.1 MIME-Version: 1.0 To: Jan Kara CC: Dan Carpenter , linux-kernel@vger.kernel.org Subject: Re: kernel BUG at fs/buffer.c:3205 (stable 3.5.3) References: <50531629.9020107@ahsoftware.de> <20120925110206.GD28937@mwanda> <50643C4A.9010202@ahsoftware.de> <20120927151232.GA12210@quack.suse.cz> <506474E8.3030300@ahsoftware.de> <50647CCC.7010407@ahsoftware.de> In-Reply-To: <50647CCC.7010407@ahsoftware.de> Content-Type: text/plain; charset=iso-8859-1; format=flowed Content-Transfer-Encoding: 8BIT Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Am 27.09.2012 18:20, schrieb Alexander Holler: > Am 27.09.2012 17:46, schrieb Alexander Holler: >> Hello, >> >> Am 27.09.2012 17:12, schrieb Jan Kara: >>> Just some thoughts about your oops: >>> The assertion which fails is: >>> BUG_ON(!list_empty(&bh->b_assoc_buffers)); >>> >>> Now b_assoc_buffers isn't used very much. In particular ext4 which you >>> seem >>> to be using doesn't use this list at all (except when mounted in >>> nojournal >>> mode but that doesn't seem to be your case). That would point rather >>> strongly at a memory corruption issue. >>> >>> So if you can reproduce the oops, it might be interesting to print >>> bh->b_assoc_buffers.next and &bh->b_assoc_buffers.next if the list is >>> found >>> to be non-empty. >> >> Hmm, a loose pointer would explain it all too. Especially the cases when >> I just have seen wrong content in the archive without having any oops. I >> try to reproduce it with >> >> pr_info("AHO: %p %p\n", bh->b_assoc_buffers.next, >> &bh->b_assoc_buffers.next); >> >> after the BUG_ON(). >> >> Thanks for the hint. I wasn't already that far to know that >> b_assoc_buffers isn't used that much. > > Hmm, that doesn't look very practicable because b_assoc_buffers seems to > be used a lot here. ;) > Maybe I should have mentioned that I'm mounting the source filesystem > (root with ext4) with > nodelalloc(rw,noatime,nodelalloc,errors=remount-ro,data=ordered), and to > backup it, I'm using a bind-mount (mount -o bind / /foo) as source. > > But the debug output starts very early on boot, where no bind-mount is > used: > > --------------------- > Sep 27 18:03:23 krabat udevd[1254]: invalid rule > '/etc/udev/rules.d/80-aho.rules:26' > Sep 27 18:03:23 krabat kernel: [ 4.562670] usb usb8: New USB device > found, idVendor=1d6b, idProduct=0001 > Sep 27 18:03:23 krabat kernel: [ 4.562671] usb usb8: New USB device > strings: Mfr=3, Product=2, SerialNumber=1 > Sep 27 18:03:23 krabat systemd-uaccess[1363]: Failed to apply ACL on > /dev/kvm: Operation not supported > Sep 27 18:03:23 krabat kernel: [ 4.562673] usb usb8: Product: UHCI > Host Controller > Sep 27 18:03:23 krabat kernel: [ 4.562674] usb usb8: Manufacturer: > Linux 3.5.4-00009-gfa43f23-dirty uhci_hcd > Sep 27 18:03:23 krabat kernel: [ 4.562676] usb usb8: SerialNumber: > 0000:00:1d.0 > Sep 27 18:03:23 krabat systemd-uaccess[1716]: Failed to apply ACL on > /dev/kvm: Operation not supported > Sep 27 18:03:23 krabat kernel: [ 4.563285] hub 8-0:1.0: USB hub found > Sep 27 18:03:23 krabat kernel: [ 4.563288] hub 8-0:1.0: 2 ports detected > Sep 27 18:03:23 krabat systemd-uaccess[2324]: Failed to apply ACL on > /dev/snd/timer: Operation not supported > Sep 27 18:03:23 krabat kernel: [ 4.563316] AHO: ffff880212e4b048 > ffff880212e4b048 > Sep 27 18:03:23 krabat kernel: [ 4.563318] AHO: ffff880212e4b0b0 > ffff880212e4b0b0 > Sep 27 18:03:23 krabat kernel: [ 4.563319] AHO: ffff880212e4b118 > ffff880212e4b118 > --------------------- > > And afterwards I see tons of those messages, so it doesn't look usable. > Anyway, I retry to repdroduce the problem without that debug line, just > to see if still can reproduce the problem with F17 as userspace (and > kernel 3.5.4 instead of 3.5.3). After 2 successful tries in sequence, the third failed (sorry, LANG=de): --------------------------------------------------------------- [root@krabat bind]# tar cp . | mbuffer | bzip2smp >/mnt/usb3/Krabat.Fedora17.sdb2.27.09.12.tar.bz2 in @ 33.1 MiB/s, out @ 38.3 MiB/s, 888 MiB total, buffer 95% fulltar: ./tmp/.X11-unix/X0: Socket ignoriert in @ 0.0 KiB/s, out @ 20.8 MiB/s, 24.9 GiB total, buffer 22% full summary: 24.9 GiByte in 19 min 53.0 sec - average of 21.4 MiB/s [root@krabat bind]# tar djf /mnt/usb3/Krabat.Fedora17.sdb2.27.09.12.tar.bz2 ./var/log/messages: Änderungszeit ist unterschiedlich ./var/log/messages: Größe ist unterschiedlich ./var/tmp/kdecache-aholler/icon-cache.kcache: Änderungszeit ist unterschiedlich ./var/tmp/kdecache-aholler/icon-cache.kcache: Unterschiedliche Inhalte ./var/tmp/kdecache-aholler/plasma_theme_oxygen.kcache: Änderungszeit ist unterschiedlich ./var/tmp/kdecache-aholler/plasma_theme_oxygen.kcache: Unterschiedliche Inhalte ./var/lib/chrony/drift: Änderungszeit ist unterschiedlich ./var/lib/chrony/drift: Unterschiedliche Inhalte ./home/aholler/.kde/share/apps/konqueror/autosave/_1.77: Änderungszeit ist unterschiedlich ./home/aholler/.kde/share/apps/konqueror/autosave/_1.77: Größe ist unterschiedlich ./home/aholler/.kde/share/apps/konqueror/konq_history: Änderungszeit ist unterschiedlich ./home/aholler/.kde/share/apps/konqueror/konq_history: Größe ist unterschiedlich ./home/aholler/.kde/share/apps/kcookiejar/cookies: Änderungszeit ist unterschiedlich ./home/aholler/thinstation_src-2.0beta2.tar.bz2: Unterschiedliche Inhalte bzip2: Data integrity error when decompressing. Input file = (stdin), output file = (stdout) It is possible that the compressed file(s) have become corrupted. You can use the -tvv option to test integrity of such files. You can use the `bzip2recover' program to attempt to recover data from undamaged sections of corrupted files. tar: Unerwartetes Dateiende im Archiv. tar: Child returned status 2 tar: Error is not recoverable: exiting now [root@krabat bind]# --------------------------------------------------------------- This time without any oops, dmesg just shows some --------------------------------------------------------------- [ 111.087356] EXT4-fs (sdc1): mounted filesystem with ordered data mode. Opts: (null) [ 672.868948] CPU4: Core temperature above threshold, cpu clock throttled (total events = 1) [ 672.868949] CPU0: Core temperature above threshold, cpu clock throttled (total events = 1) [ 672.869970] CPU0: Core temperature/speed normal [ 672.869971] CPU4: Core temperature/speed normal [ 688.285419] CPU6: Core temperature above threshold, cpu clock throttled (total events = 1) [ 688.285421] CPU2: Core temperature above threshold, cpu clock throttled (total events = 1) [ 688.286442] CPU2: Core temperature/speed normal [ 688.286443] CPU6: Core temperature/speed normal [ 698.822614] CPU3: Core temperature above threshold, cpu clock throttled (total events = 1) [ 698.822615] CPU7: Core temperature above threshold, cpu clock throttled (total events = 1) [ 698.824674] CPU3: Core temperature/speed normal [ 698.824675] CPU7: Core temperature/speed normal [ 706.979633] CPU1: Core temperature above threshold, cpu clock throttled (total events = 1) [ 706.979635] CPU5: Core temperature above threshold, cpu clock throttled (total events = 1) [ 706.980648] CPU1: Core temperature/speed normal [ 706.980649] CPU5: Core temperature/speed normal [ 899.540485] [Hardware Error]: Machine check events logged --------------------------------------------------------------- Nothing else. Kernel is 3.5.4 userland now F17. Regards.