* Fwd: Re: Broken nilfs2 filesystem
From: Anton Eliasson @ 2013-07-26 16:52 UTC
To: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

I forgot to send this to the list.

-------- Original message --------
Subject: Re: Broken nilfs2 filesystem
Date: Fri, 26 Jul 2013 18:49:40 +0200
From: Anton Eliasson <devel-17Olwe7vw2dLC78zk6coLg@public.gmane.org>
To: Vyacheslav Dubeyko <slava-2lV3ebY47BVBDgjK7y7TUQ@public.gmane.org>

Vyacheslav Dubeyko wrote 2013-07-26 14:37:
> Hi Anton,
>
> Are you ready to try to obtain debug output? I am really waiting for your
> readiness because your opportunity to reproduce the issue is very
> important.
>
> I think that the following configuration options need to be enabled:
> 1. CONFIG_NILFS2_DEBUG_SHOW_ERRORS
> 2. CONFIG_NILFS2_DEBUG_BASE_OPERATIONS
> 3. CONFIG_NILFS2_DEBUG_MDT_FILES
> 4. CONFIG_NILFS2_DEBUG_SEGMENTS_SUBSYSTEM
> 5. CONFIG_NILFS2_DEBUG_BLOCK_MAPPING
> 6. CONFIG_NILFS2_DEBUG_DUMP_STACK
>
> Thanks,
> Vyacheslav Dubeyko.

Hi,

Thanks for the reminder! I tried it out a few weeks ago. My plan then was to restore the old root system (running Linux 3.9) and /home to the mechanical hard drive, boot them up and use that system to build the newer kernel with your patches. I thought this would make minimal changes to the system I had experienced the bugs with. Building the kernel took me some time but I eventually succeeded. I forgot those configuration options though, so I suppose that build did nothing out of the ordinary, even though it had the patches. I got it to boot but then ran into the issue of incompatible video drivers. I've been down that road before and I did not feel like going back.

My next plan was to wait until 3.10 got released into the core repository, which happened yesterday or so.
I would then use my fully updated system (currently running the 3.10 kernel on an ext4 root filesystem) to build a kernel with the debug patches. It would be a slightly different system with many updated packages, but at least the root filesystem would be a healthy ext4.

So that's what I did today. Build log excerpts are at the bottom of this message. I got the kernel to boot and X to start. But as soon as anything tried to read from /home (even just logging into a virtual terminal as a regular user), that terminal froze. Logging in as root on a different VT showed that syslog-ng and nilfs_cleanerd took 100% CPU each. After a while the entire system froze. Pressing Alt+SysRq+R caused the kernel to spew out an endless stream of call traces to the terminal, leaving no other way to reboot than a hard reset.

After a reboot I found that the running nilfs_cleanerd does not respond to either a SIGTERM or a SIGKILL. Changing the mount options of /home from rw,noatime,discard to ro,norecovery,noatime,discard seems to work. No crashes during login to a VT. Read-only is no fun though. The next attempt was rw,nogc,noatime,discard. It seems to work too, though there are a lot of call traces in dmesg. After just a few reboots and a few minutes of uptime, /var/log/kernel.log, messages.log and everything.log have grown to 3.5 GB each. My / is only 30 GB so I can't sustain this for very long.

I have aborted the experiments for today. kernel.log has 35 million lines and compresses to 220 MB. I've uploaded it here (http://antoneliasson.se/publicdump/kernel.log.20130726.gz). What should I do next?

Build logs
=======

* Download kernel package build scripts and patches (from the Arch Build System). Unpack and apply downstream patches:

$ makepkg -o
==> Making package: linux 3.10.2-1 (Fri Jul 26 17:08:16 CEST 2013)
==> Retrieving sources...
  -> Found linux-3.10.tar.xz
  -> Downloading patch-3.10.2.xz...
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   178  100   178    0     0    188      0 --:--:-- --:--:-- --:--:--   188
100 27876  100 27876    0     0  13761      0  0:00:02  0:00:02 --:--:-- 37978
  -> Found config
  -> Found config.x86_64
  -> Found linux.preset
  -> Found change-default-console-loglevel.patch
==> Validating source files with md5sums...
    linux-3.10.tar.xz ... Passed
    patch-3.10.2.xz ... Passed
    config ... Passed
    config.x86_64 ... Passed
    linux.preset ... Passed
    change-default-console-loglevel.patch ... Passed
==> Extracting sources...
  -> Extracting linux-3.10.tar.xz with bsdtar
  -> Extracting patch-3.10.2.xz with xz
==> Starting prepare()...
patching file Documentation/parisc/registers
patching file MAINTAINERS
patching file Makefile
patching file arch/arm/boot/dts/imx23.dtsi
patching file arch/arm/boot/dts/imx28.dtsi
patching file arch/arm/boot/dts/imx6dl.dtsi
patching file arch/arm/boot/dts/imx6q.dtsi
patching file arch/arm/include/asm/mmu_context.h
patching file arch/arm/kernel/perf_event.c
[...]
patching file mm/page_alloc.c
patching file mm/slab.c
patching file net/ceph/auth_none.c
patching file kernel/printk.c
Hunk #1 succeeded at 56 with fuzz 2 (offset -2 lines).
==> Sources are ready.
$ ls -l
-rw-r--r-- 1 anton anton 2,7K 26 jul 00.06 alsa-firmware-loading-3.8.8.patch
-rw-r--r-- 1 anton anton  605 26 jul 00.06 change-default-console-loglevel.patch
-rw-r--r-- 1 anton anton 142K 26 jul 17.13 config
-rw-r--r-- 1 anton anton 142K 26 jul 00.06 config~
-rw-r--r-- 1 anton anton 138K 26 jul 17.13 config.x86_64
-rw-r--r-- 1 anton anton 138K 26 jul 00.06 config.x86_64~
-rw-r--r-- 1 anton anton  926 26 jul 00.06 linux.install
-rw-r--r-- 1 anton anton  376 26 jul 00.06 linux.preset
drwxr-xr-x 2 anton anton 4,0K 26 jul 17.06 nilfs2-debug-output-patch-set-25-06-2013
-rw-r--r-- 1 anton anton  13K 26 jul 00.06 PKGBUILD
drwxr-xr-x 3 anton anton 4,0K 26 jul 17.08 src

* Apply nilfs2 debug patches:

$ cd src/linux-3.10/
$ for file in ../../nilfs2-debug-output-patch-set-25-06-2013/*.patch; do patch -p1 < "$file"; done
patching file fs/nilfs2/Kconfig
patching file fs/nilfs2/debug.h
patching file fs/nilfs2/Kconfig
patching file fs/nilfs2/debug.h
patching file fs/nilfs2/dir.c
patching file fs/nilfs2/file.c
patching file fs/nilfs2/inode.c
patching file fs/nilfs2/ioctl.c
patching file fs/nilfs2/namei.c
patching file fs/nilfs2/nilfs.h
[...]
patching file fs/nilfs2/segbuf.c
patching file fs/nilfs2/segment.c
patching file fs/nilfs2/sufile.c
patching file fs/nilfs2/super.c
patching file fs/nilfs2/the_nilfs.c
patching file fs/nilfs2/Kconfig
patching file fs/nilfs2/debug.h

* Append the following lines to config (just in case) and config.x86_64 (which I assume I will use):

CONFIG_NILFS2_DEBUG_SHOW_ERRORS=y
CONFIG_NILFS2_DEBUG_BASE_OPERATIONS=y
CONFIG_NILFS2_DEBUG_MDT_FILES=y
CONFIG_NILFS2_DEBUG_SEGMENTS_SUBSYSTEM=y
CONFIG_NILFS2_DEBUG_BLOCK_MAPPING=y
CONFIG_NILFS2_DEBUG_DUMP_STACK=y

* Build the kernel. There were some interactive configuration options concerning nilfs2 that I responded yes to:

$ makepkg -e
==> Making package: linux 3.10.2-1 (Fri Jul 26 17:16:41 CEST 2013)
==> Checking runtime dependencies...
==> Checking buildtime dependencies...
==> WARNING: Using existing src/ tree
==> Starting build()...
  HOSTCC  scripts/basic/fixdep
  HOSTCC  scripts/kconfig/conf.o
  SHIPPED scripts/kconfig/zconf.tab.c
  SHIPPED scripts/kconfig/zconf.lex.c
  SHIPPED scripts/kconfig/zconf.hash.c
  HOSTCC  scripts/kconfig/zconf.tab.o
  HOSTLD  scripts/kconfig/conf
scripts/kconfig/conf --silentoldconfig Kconfig
*
* Restart config...
*
*
* File systems
*
[...]
NILFS2 file system support (NILFS2_FS) [M/n/y/?] m
  NILFS2 debugging (NILFS2_DEBUG) [N/y/?] (NEW) y
    Use pr_debug() instead of printk() (NILFS2_USE_PR_DEBUG) [N/y/?] (NEW) y
    Show internal errors (NILFS2_DEBUG_SHOW_ERRORS) [N/y/?] (NEW) y
    Enable dump stack output (NILFS2_DEBUG_DUMP_STACK) [N/y/?] (NEW) y
[...]
#
# configuration written to .config
#
  SYSHDR  arch/x86/syscalls/../include/generated/uapi/asm/unistd_32.h
  SYSHDR  arch/x86/syscalls/../include/generated/uapi/asm/unistd_64.h
  SYSHDR  arch/x86/syscalls/../include/generated/uapi/asm/unistd_x32.h
  SYSTBL  arch/x86/syscalls/../include/generated/asm/syscalls_32.h
  SYSHDR  arch/x86/syscalls/../include/generated/asm/unistd_32_ia32.h
  SYSHDR  arch/x86/syscalls/../include/generated/asm/unistd_64_x32.h
  SYSTBL  arch/x86/syscalls/../include/generated/asm/syscalls_64.h
  WRAP    arch/x86/include/generated/asm/clkdev.h
  CHK     include/generated/uapi/linux/version.h
  UPD     include/generated/uapi/linux/version.h
  CHK     include/generated/utsrelease.h
  UPD     include/generated/utsrelease.h
  HOSTCC  arch/x86/tools/relocs_32.o
[...]
  INSTALL /home/anton/build/linux/pkg/linux/lib/firmware/edgeport/down3.bin
  INSTALL /home/anton/build/linux/pkg/linux/lib/firmware/keyspan_pda/keyspan_pda.fw
  INSTALL /home/anton/build/linux/pkg/linux/lib/firmware/keyspan_pda/xircom_pgs.fw
  INSTALL /home/anton/build/linux/pkg/linux/lib/firmware/cpia2/stv0672_vp4.bin
  INSTALL /home/anton/build/linux/pkg/linux/lib/firmware/yam/1200.bin
  INSTALL /home/anton/build/linux/pkg/linux/lib/firmware/yam/9600.bin
  DEPMOD  3.10.2-1-ARCH
==> Tidying install...
  -> Purging unwanted files...
  -> Compressing man and info pages...
==> Creating package "linux"...
  -> Generating .PKGINFO file...
  -> Adding install file...
  -> Generating .MTREE file...
  -> Compressing package...
==> Starting package_linux-headers()...
==> Tidying install...
  -> Purging unwanted files...
  -> Compressing man and info pages...
==> Creating package "linux-headers"...
  -> Generating .PKGINFO file...
  -> Generating .MTREE file...
  -> Compressing package...
==> Starting package_linux-docs()...
==> Tidying install...
  -> Purging unwanted files...
  -> Compressing man and info pages...
==> Creating package "linux-docs"...
  -> Generating .PKGINFO file...
  -> Generating .MTREE file...
  -> Compressing package...
==> Leaving fakeroot environment.
==> Finished making: linux 3.10.2-1 (Fri Jul 26 17:46:42 CEST 2013)

* Install over the stock 3.10 kernel, point /home to the old nilfs2 filesystem in /etc/fstab and reboot.

--
Best Regards,
Anton Eliasson
--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
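With a kernel.log of 35 million lines, a first triage step could be to see which messages dominate it. The following is only a rough sketch, not something used in the thread: the function name top_log_messages is made up, and it assumes syslog-style lines of the form "Jul 26 17:50:01 host kernel: [ 12.345678] message", where everything up to the first "]" is a prefix that can be dropped.

```shell
#!/bin/sh
# Hypothetical helper: print the N most frequent messages in a
# gzip-compressed kernel log.
# $1: path to the .gz log, $2: number of messages to show (default 10).
top_log_messages() {
    log=$1
    count=${2:-10}
    # Drop everything up to and including the first "]" (syslog prefix and
    # kernel timestamp), then count identical message texts.
    gzip -dc "$log" |
        sed 's/^[^]]*] *//' |
        sort | uniq -c | sort -rn | head -n "$count"
}
```

Called as `top_log_messages kernel.log.20130726.gz 20`, it prints a count followed by the message text, most frequent first. Lines without a "]" are counted unchanged.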
* Re: Broken nilfs2 filesystem
From: Vyacheslav Dubeyko @ 2013-07-27 16:23 UTC
To: Anton Eliasson; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

Hi Anton,

On Jul 26, 2013, at 8:52 PM, Anton Eliasson wrote:

Thank you for your efforts. But, as I understand it, you have not reproduced the issue yet, and the shared system log doesn't contain any new details about the issue. Please see my description below.

[snip]
>
> I have aborted the experiments for today. kernel.log has 35 million
> lines and compresses to 220 MB. I've uploaded it here
> (http://antoneliasson.se/publicdump/kernel.log.20130726.gz). What should
> I do next?
>

Unfortunately, the shared system log doesn't contain any NILFS2 error messages. So it means that the issue was not reproduced. Are you really confident that you can reproduce the issue before starting to collect debug output? Could you first check that the issue is reproducible?

You made one mistake during configuration of debug output. Please see my description below.

[snip]
> * Append the following lines to config (just in case) and config.x86_64
> (which I assume I will use):
>
> CONFIG_NILFS2_DEBUG_SHOW_ERRORS=y
> CONFIG_NILFS2_DEBUG_BASE_OPERATIONS=y
> CONFIG_NILFS2_DEBUG_MDT_FILES=y
> CONFIG_NILFS2_DEBUG_SEGMENTS_SUBSYSTEM=y
> CONFIG_NILFS2_DEBUG_BLOCK_MAPPING=y
> CONFIG_NILFS2_DEBUG_DUMP_STACK=y
>

I think it is better to use "make menuconfig" for debug output configuration because the above-mentioned options depend on other ones. Please use the "make menuconfig" way because it is not so easy to describe which sets of configuration options are valid.

[snip]
> * File systems
> *
> [...]
> NILFS2 file system support (NILFS2_FS) [M/n/y/?] m
> NILFS2 debugging (NILFS2_DEBUG) [N/y/?] (NEW) y
> Use pr_debug() instead of printk() (NILFS2_USE_PR_DEBUG)
> [N/y/?] (NEW) y

No, no, no... When you select pr_debug(), you disable the CONFIG_NILFS2_DEBUG_BASE_OPERATIONS, CONFIG_NILFS2_DEBUG_MDT_FILES, CONFIG_NILFS2_DEBUG_SEGMENTS_SUBSYSTEM and CONFIG_NILFS2_DEBUG_BLOCK_MAPPING options, because you then need to use the dynamic debug facility (please see Documentation/dynamic-debug-howto.txt). Moreover, when you select CONFIG_NILFS2_DEBUG_DUMP_STACK in the dynamic debug case, every function emits dump_stack() output. Please read the comments for the configuration options.

First, I want to get debug output without enabling pr_debug(). With plain printk() we will have debug output only from the requested subsystems. So improper configuration of debug output is the reason for the huge size of the system log. I suggest not using the CONFIG_NILFS2_DEBUG_DUMP_STACK option at first.

> Show internal errors (NILFS2_DEBUG_SHOW_ERRORS) [N/y/?] (NEW) y
> Enable dump stack output (NILFS2_DEBUG_DUMP_STACK) [N/y/?] (NEW) y
>

So, first of all, we need to reproduce the issue in its initial state. Then debug output needs to be configured properly, and we need to get debug output for the reproduced issue.

Thanks,
Vyacheslav Dubeyko.
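Since options appended by hand can be silently dropped when their dependencies are not met, it may help to check the generated .config before starting a long build. The sketch below is an illustration only: the function names are invented, and the option names are simply the ones quoted in this thread (they come from the nilfs2 debug patch set, not from a mainline kernel). It flags a config where pr_debug() output is selected or where one of the requested options is missing.

```shell
#!/bin/sh
# Hypothetical helper: return 0 if option $2 is set to "y" in config file $1.
config_enabled() {
    grep -q "^CONFIG_$2=y\$" "$1"
}

# Hypothetical helper: print one line per problem found in config file $1
# and return non-zero if there was any problem.
check_nilfs2_debug_config() {
    cfg=$1
    status=0
    # The thread asks for plain printk() output first, not pr_debug().
    if config_enabled "$cfg" NILFS2_USE_PR_DEBUG; then
        echo "NILFS2_USE_PR_DEBUG is enabled"
        status=1
    fi
    # Options requested in the thread (names as quoted there).
    for opt in NILFS2_DEBUG NILFS2_DEBUG_SHOW_ERRORS \
               NILFS2_DEBUG_BASE_OPERATIONS NILFS2_DEBUG_MDT_FILES \
               NILFS2_DEBUG_SEGMENTS_SUBSYSTEM NILFS2_DEBUG_BLOCK_MAPPING; do
        if ! config_enabled "$cfg" "$opt"; then
            echo "$opt is not enabled"
            status=1
        fi
    done
    return "$status"
}
```

Running `check_nilfs2_debug_config .config` in the kernel source tree after the configuration step prints nothing and returns 0 when all of the listed options survived.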
* Re: Broken nilfs2 filesystem
From: Anton Eliasson @ 2013-07-27 22:32 UTC
To: Vyacheslav Dubeyko; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

Vyacheslav Dubeyko wrote 2013-07-27 18:23:
> Hi Anton,
>
> On Jul 26, 2013, at 8:52 PM, Anton Eliasson wrote:
>
> Thank you for your efforts. But, as I understand it, you have not
> reproduced the issue yet, and the shared system log doesn't contain
> any new details about the issue. Please see my description below.

That is correct, I just wanted to know if I was on the right track (and it turned out that I wasn't).

> [snip]
>> I have aborted the experiments for today. kernel.log has 35 million
>> lines and compresses to 220 MB. I've uploaded it here
>> (http://antoneliasson.se/publicdump/kernel.log.20130726.gz). What should
>> I do next?
>>
> Unfortunately, the shared system log doesn't contain any NILFS2
> error messages. So it means that the issue was not reproduced. Are you
> really confident that you can reproduce the issue before starting to collect
> debug output? Could you first check that the issue is reproducible?

You're right, I should try that first. Unfortunately, I'll be away from this computer again for the next week or two. I'll try to allocate some time for this investigation after that. I can access it via SSH so I might spend an evening recompiling the kernel remotely, but I don't want to reboot the computer remotely.

>
> You made one mistake during configuration of debug output. Please see
> my description below.
>
> [snip]
>> * Append the following lines to config (just in case) and config.x86_64
>> (which I assume I will use):
>>
>> CONFIG_NILFS2_DEBUG_SHOW_ERRORS=y
>> CONFIG_NILFS2_DEBUG_BASE_OPERATIONS=y
>> CONFIG_NILFS2_DEBUG_MDT_FILES=y
>> CONFIG_NILFS2_DEBUG_SEGMENTS_SUBSYSTEM=y
>> CONFIG_NILFS2_DEBUG_BLOCK_MAPPING=y
>> CONFIG_NILFS2_DEBUG_DUMP_STACK=y
>>
> I think it is better to use "make menuconfig" for debug output configuration
> because the above-mentioned options depend on other ones.
> Please use the "make menuconfig" way because it is not so easy to describe
> which sets of configuration options are valid.
>
> [snip]
>> * File systems
>> *
>> [...]
>> NILFS2 file system support (NILFS2_FS) [M/n/y/?] m
>> NILFS2 debugging (NILFS2_DEBUG) [N/y/?] (NEW) y
>> Use pr_debug() instead of printk() (NILFS2_USE_PR_DEBUG)
>> [N/y/?] (NEW) y
> No, no, no... When you select pr_debug(), you disable the
> CONFIG_NILFS2_DEBUG_BASE_OPERATIONS,
> CONFIG_NILFS2_DEBUG_MDT_FILES, CONFIG_NILFS2_DEBUG_SEGMENTS_SUBSYSTEM
> and CONFIG_NILFS2_DEBUG_BLOCK_MAPPING options, because you then need to use
> the dynamic debug facility (please see Documentation/dynamic-debug-howto.txt).
> Moreover, when you select CONFIG_NILFS2_DEBUG_DUMP_STACK in the dynamic
> debug case, every function emits dump_stack() output. Please read the
> comments for the configuration options.
>
> First, I want to get debug output without enabling pr_debug(). With plain
> printk() we will have debug output only from the requested subsystems. So improper
> configuration of debug output is the reason for the huge size of the system log. I suggest
> not using the CONFIG_NILFS2_DEBUG_DUMP_STACK option at first.
>
>> Show internal errors (NILFS2_DEBUG_SHOW_ERRORS) [N/y/?] (NEW) y
>> Enable dump stack output (NILFS2_DEBUG_DUMP_STACK) [N/y/?] (NEW) y
>>
> So, first of all, we need to reproduce the issue in its initial state. Then debug output
> needs to be configured properly, and we need to get debug output for the reproduced issue.
>
> Thanks,
> Vyacheslav Dubeyko.
>

Thanks for all your advice. I'm very new to compiling and configuring kernels. I will keep you updated on how my next attempt works out.

--
Best Regards,
Anton Eliasson
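For reference, if the pr_debug() variant is built after all, its messages stay silent until they are switched on through the kernel's dynamic debug control file. The sketch below is an assumption-laden illustration rather than part of the thread's procedure: the function name set_module_dyndbg is invented, the default path assumes debugfs is mounted at /sys/kernel/debug and the kernel has CONFIG_DYNAMIC_DEBUG, and writing to the real control file needs root.

```shell
#!/bin/sh
# Hypothetical helper: enable (+p) or disable (-p) all pr_debug() call sites
# of one module by writing a dynamic-debug query to the control file.
# $1: module name, $2: "+p" or "-p",
# $3: control file (defaults to the usual debugfs location).
set_module_dyndbg() {
    module=$1
    flags=$2
    control=${3:-/sys/kernel/debug/dynamic_debug/control}
    case $flags in
        +p|-p) ;;
        *) echo "flags must be +p or -p" >&2; return 2 ;;
    esac
    # Query syntax: "module <name> <flags>", one query per write.
    printf 'module %s %s\n' "$module" "$flags" > "$control"
}
```

`set_module_dyndbg nilfs2 +p` would turn the module's pr_debug() output on and `set_module_dyndbg nilfs2 -p` would turn it off again. The third argument exists mainly so the function can be tried against an ordinary file.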
* Nilfs2 crash debugging (was: Broken nilfs2 filesystem)
From: Anton Eliasson @ 2013-08-15 10:40 UTC
To: Vyacheslav Dubeyko; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

Vyacheslav Dubeyko wrote 2013-07-27 18:23:
> Hi Anton,
>
> On Jul 26, 2013, at 8:52 PM, Anton Eliasson wrote:
>
> Thank you for your efforts. But, as I understand it, you have not
> reproduced the issue yet, and the shared system log doesn't contain
> any new details about the issue. Please see my description below.
> [snip]

Hi again. I was able to reproduce the crash on a fully updated system by starting the two virtual machines simultaneously as described in my e-mail from May 25. I made a new attempt to rebuild the kernel with your patches. I selected these options in make menuconfig [1], which resulted in this generated config.x86_64 [2], which has the following diff compared to the stock config.x86_64:

--- config.x86_64	2013-08-11 00:06:09.000000000 +0200
+++ config.x86_64.last	2013-08-11 12:48:44.094979947 +0200
@@ -1,6 +1,6 @@
 #
 # Automatically generated file; DO NOT EDIT.
-# Linux/x86 3.10.0-1 Kernel Configuration
+# Linux/x86 3.10.5-1 Kernel Configuration
 #
 CONFIG_64BIT=y
 CONFIG_X86_64=y
@@ -5450,6 +5450,11 @@
 # CONFIG_BTRFS_FS_RUN_SANITY_TESTS is not set
 # CONFIG_BTRFS_DEBUG is not set
 CONFIG_NILFS2_FS=m
+CONFIG_NILFS2_DEBUG=y
+# CONFIG_NILFS2_USE_PR_DEBUG is not set
+CONFIG_NILFS2_DEBUG_SHOW_ERRORS=y
+CONFIG_NILFS2_DEBUG_DUMP_STACK=y
+# CONFIG_NILFS2_DEBUG_SUBSYSTEMS is not set
 CONFIG_FS_POSIX_ACL=y
 CONFIG_EXPORTFS=y
 CONFIG_FILE_LOCKING=y

I hope those build options are the ones you want.

Using the custom kernel and mount options, I could reproduce the crash right away.
Here's the log [3] (crash at timestamp "Aug 15 10:26:26 riven kernel: [ 376.625992]"). The cleaner wasn't running at the time. I don't remember if I used the mount option nogc or if I killed it manually after booting up.

Because of these uncertainties and the fact that the log is a bit messy, I attempted to rotate the logs, reboot and try again. Of course, that caused this heisenbug to disappear again. I produced some pretty logs showing lots of errors without the cleaner [4], with the cleaner started manually [5] and with the cleaner started at boot [6]. None of them show the crash however so they may be of limited use for you.

Okay, one final attempt. I reinstalled the stock kernel and managed to crash the system using the virtual machines like before. I then reinstalled the custom kernel, rotated the logs, rebooted with the mount options "rw,noatime,discard", left the cleanerd running and fired up VMware. I was happy to see the system die as expected. [7] and [8] should contain beautiful logs of everything from boot to crash.

[1]: http://antoneliasson.se/publicdump/menuconfig.png
[2]: http://antoneliasson.se/publicdump/config.x86_64.last
[3]: http://antoneliasson.se/publicdump/kernel.log.2.gz
[4]: http://antoneliasson.se/publicdump/kernel.log.nogc-nocleanerd-nocrash.2013-08-15.1048.log.gz
[5]: http://antoneliasson.se/publicdump/kernel.log.nogc-cleanerd-nocrash.2013-08-15.1054.log.gz
[6]: http://antoneliasson.se/publicdump/kernel.log.gc-cleanerd-nocrash.2013-08-15.1104.log.gz
[7]: http://antoneliasson.se/publicdump/kernel.log.gc-cleanerd-crash.2013-08-15.1205.log.gz
[8]: http://antoneliasson.se/publicdump/everything.log.gc-cleanerd-crash.2013-08-15.1211.log.gz

--
Best Regards,
Anton Eliasson
* Re: Nilfs2 crash debugging (was: Broken nilfs2 filesystem)
From: Vyacheslav Dubeyko @ 2013-08-16 7:11 UTC
To: Anton Eliasson; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

Hi Anton,

On Thu, 2013-08-15 at 12:40 +0200, Anton Eliasson wrote:

> Hi again. I was able to reproduce the crash on a fully updated system by
> starting the two virtual machines simultaneously as described in my
> e-mail from May 25. I made a new attempt to rebuild the kernel with your
> patches. I selected these options in make menuconfig [1], which resulted
> in this generated config.x86_64 [2] which has the following diff
> compared to the stock config.x86_64:
>

Thank you for sharing the debug output for the issue. I am going to look through the shared materials this weekend (or next week). Unfortunately, I have a busy week.

Thanks,
Vyacheslav Dubeyko.
* Re: Nilfs2 crash debugging (was: Broken nilfs2 filesystem)
From: Vyacheslav Dubeyko @ 2013-08-19 19:55 UTC
To: Anton Eliasson; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

On Aug 15, 2013, at 2:40 PM, Anton Eliasson wrote:

[snip]
> Hi again. I was able to reproduce the crash on a fully updated system by starting the two virtual machines simultaneously as described in my e-mail from May 25. I made a new attempt to rebuild the kernel with your patches. I selected these options in make menuconfig [1], which resulted in this generated config.x86_64 [2] which has the following diff compared to the stock config.x86_64:
>

As I remember, you initially reported the issue with the file system being remounted in RO mode and many "broken bnode" error messages. Unfortunately, as far as I can see, you can't reproduce this issue. I really had hoped that you could reproduce this important issue.

As I see it, the logs with the crash that you shared contain details about an issue that was also reported by Jérôme Poulin <jeromepoulin-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>. I mean this error message:

[ 304.494448] BUG: unable to handle kernel paging request at 00000000000013f6
[ 304.494456] IP: [<ffffffffa1327232>] nilfs_end_page_io+0x12/0xc0 [nilfs2]

I can reproduce this issue on my side and it is still under investigation.

But anyway... Could you try to reproduce the issue with the file system being remounted in RO mode? It is a really important and annoying issue.

> --- config.x86_64	2013-08-11 00:06:09.000000000 +0200
> +++ config.x86_64.last	2013-08-11 12:48:44.094979947 +0200
> @@ -1,6 +1,6 @@
>  #
>  # Automatically generated file; DO NOT EDIT.
> -# Linux/x86 3.10.0-1 Kernel Configuration
> +# Linux/x86 3.10.5-1 Kernel Configuration
>  #
>  CONFIG_64BIT=y
>  CONFIG_X86_64=y
> @@ -5450,6 +5450,11 @@
>  # CONFIG_BTRFS_FS_RUN_SANITY_TESTS is not set
>  # CONFIG_BTRFS_DEBUG is not set
>  CONFIG_NILFS2_FS=m
> +CONFIG_NILFS2_DEBUG=y
> +# CONFIG_NILFS2_USE_PR_DEBUG is not set
> +CONFIG_NILFS2_DEBUG_SHOW_ERRORS=y
> +CONFIG_NILFS2_DEBUG_DUMP_STACK=y
> +# CONFIG_NILFS2_DEBUG_SUBSYSTEMS is not set
>  CONFIG_FS_POSIX_ACL=y
>  CONFIG_EXPORTFS=y
>  CONFIG_FILE_LOCKING=y
>

As I remember, I asked you to enable more configuration options. I mean these options:
CONFIG_NILFS2_DEBUG_BASE_OPERATIONS,
CONFIG_NILFS2_DEBUG_MDT_FILES,
CONFIG_NILFS2_DEBUG_SEGMENTS_SUBSYSTEM,
CONFIG_NILFS2_DEBUG_BLOCK_MAPPING.

I suppose that you didn't enable these options because they depend on the "Enable output from subsystem" option. But, anyway, I am afraid that you won't reproduce the issue with these options enabled. But maybe you will be luckier this time. :)

Anyway, thank you for your efforts. It would be really great if you were lucky and reproduced the issue with the file system remounted in RO mode and many "broken bnode" error messages. Could you try again?

Thanks,
Vyacheslav Dubeyko.
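To locate oops reports like the one quoted above inside a large compressed log, something along these lines might do. It is a sketch only: the function name show_kernel_bugs is made up, and it assumes that every report of interest contains the literal text "BUG: " as in the quoted message.

```shell
#!/bin/sh
# Hypothetical helper: print every line containing "BUG: " from a
# gzip-compressed kernel log, each followed by up to N lines of context.
# $1: path to the .gz log, $2: number of context lines (default 20).
show_kernel_bugs() {
    log=$1
    context=${2:-20}
    gzip -dc "$log" |
        awk -v n="$context" '
            /BUG: / { left = n + 1 }      # matching line plus n lines after it
            left > 0 { print; left-- }
        '
}
```

For example, `show_kernel_bugs kernel.log.gc-cleanerd-crash.2013-08-15.1205.log.gz 40` would print each report with the 40 lines that follow it, which normally covers the call trace.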
* Re: Nilfs2 crash debugging
From: Anton Eliasson @ 2013-08-25 15:02 UTC
To: Vyacheslav Dubeyko; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

Vyacheslav Dubeyko wrote 2013-08-19 21:55:
> On Aug 15, 2013, at 2:40 PM, Anton Eliasson wrote:
>
> [snip]
>> Hi again. I was able to reproduce the crash on a fully updated system by starting the two virtual machines simultaneously as described in my e-mail from May 25. I made a new attempt to rebuild the kernel with your patches. I selected these options in make menuconfig [1], which resulted in this generated config.x86_64 [2] which has the following diff compared to the stock config.x86_64:
>>
> As I remember, you initially reported the issue with the file system
> being remounted in RO mode and many "broken bnode" error messages.
> Unfortunately, as far as I can see, you can't reproduce this issue. I really
> had hoped that you could reproduce this important issue.
>
> As I see it, the logs with the crash that you shared contain details about an issue
> that was also reported by Jérôme Poulin <jeromepoulin-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>.
> I mean this error message:
>
> [ 304.494448] BUG: unable to handle kernel paging request at 00000000000013f6
> [ 304.494456] IP: [<ffffffffa1327232>] nilfs_end_page_io+0x12/0xc0 [nilfs2]
>
> I can reproduce this issue on my side and it is still under investigation.
>
> But anyway... Could you try to reproduce the issue with the file system
> being remounted in RO mode? It is a really important and annoying issue.

Yes, that one is much easier to reproduce. I simply try to read one of the corrupted files in /home. See below. I have no idea how the actual corruption happened, however.

> [...]
> As I remember, I asked you to enable more configuration options.
> I mean these options:
> CONFIG_NILFS2_DEBUG_BASE_OPERATIONS,
> CONFIG_NILFS2_DEBUG_MDT_FILES,
> CONFIG_NILFS2_DEBUG_SEGMENTS_SUBSYSTEM,
> CONFIG_NILFS2_DEBUG_BLOCK_MAPPING.
>
> I suppose that you didn't enable these options because they depend
> on the "Enable output from subsystem" option. But, anyway, I am afraid
> that you won't reproduce the issue with these options enabled.
> But maybe you will be luckier this time. :)

I think I got it right this time. The missing options appeared after I enabled CONFIG_NILFS2_DEBUG_SUBSYSTEMS. The config I used is here [1], which has the following diff compared to the upstream config:

--- config.x86_64	2013-08-25 06:53:05.000000000 +0200
+++ config.x86_64.last	2013-08-25 15:24:51.118711529 +0200
@@ -1,6 +1,6 @@
 #
 # Automatically generated file; DO NOT EDIT.
-# Linux/x86 3.10.5-1 Kernel Configuration
+# Linux/x86 3.10.9-1 Kernel Configuration
 #
 CONFIG_64BIT=y
 CONFIG_X86_64=y
@@ -5452,6 +5452,20 @@
 # CONFIG_BTRFS_FS_RUN_SANITY_TESTS is not set
 # CONFIG_BTRFS_DEBUG is not set
 CONFIG_NILFS2_FS=m
+CONFIG_NILFS2_DEBUG=y
+# CONFIG_NILFS2_USE_PR_DEBUG is not set
+CONFIG_NILFS2_DEBUG_SHOW_ERRORS=y
+CONFIG_NILFS2_DEBUG_DUMP_STACK=y
+CONFIG_NILFS2_DEBUG_SUBSYSTEMS=y
+CONFIG_NILFS2_DEBUG_BASE_OPERATIONS=y
+CONFIG_NILFS2_DEBUG_MDT_FILES=y
+CONFIG_NILFS2_DEBUG_SEGMENTS_SUBSYSTEM=y
+# CONFIG_NILFS2_DEBUG_GC_SUBSYSTEM is not set
+# CONFIG_NILFS2_DEBUG_RECOVERY_SUBSYSTEM is not set
+CONFIG_NILFS2_DEBUG_BLOCK_MAPPING=y
+# CONFIG_NILFS2_DEBUG_BUFFER_MANAGEMENT is not set
+# CONFIG_NILFS2_DEBUG_SHOW_SPAM is not set
+# CONFIG_NILFS2_DEBUG_HEXDUMP is not set
 CONFIG_FS_POSIX_ACL=y
 CONFIG_EXPORTFS=y
 CONFIG_FILE_LOCKING=y

> Anyway, thank you for your efforts. It would be really great if you were lucky
> and reproduced the issue with the file system remounted in RO mode
> and many "broken bnode" error messages. Could you try again?
>
> Thanks,
> Vyacheslav Dubeyko.
>

Yes. Here's another huge kernel.log for you [2]. It's 19 MB compressed and 282 MB uncompressed. I blanked the log while running the stock kernel and then rebooted to the custom debugging kernel. X wouldn't start so I just logged in to a virtual terminal, changed directory to "~/Bilder/20130321-28 Jakobs bilder från Nederländerna" and then executed `cat 179.JPG >/dev/null`.

This caused a read-only remount and a bunch of "broken bmap" messages to show, followed by an "Input/Output error". I saved a copy of /var/log/kernel.log as soon as I could after that, before reinstalling the stock kernel and rebooting.

[1]: http://antoneliasson.se/publicdump/config.x86_64.last.20130825
[2]: http://antoneliasson.se/publicdump/kernel.log.20130825.gz

--
Best Regards,
Anton Eliasson
* Re: Nilfs2 crash debugging
From: Vyacheslav Dubeyko @ 2013-08-26 9:56 UTC
To: Anton Eliasson; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

On Sun, 2013-08-25 at 17:02 +0200, Anton Eliasson wrote:

[snip]
> Yes. Here's another huge kernel.log for you [2]. It's 19 MB compressed
> and 282 MB uncompressed. I blanked the log while running the stock
> kernel and then rebooted to the custom debugging kernel. X wouldn't
> start so I just logged in to a virtual terminal, changed directory to
> "~/Bilder/20130321-28 Jakobs bilder från Nederländerna" and then
> executed `cat 179.JPG >/dev/null`.
>
> This caused a read-only remount and a bunch of "broken bmap" messages to
> show, followed by an "Input/Output error". I saved a copy of
> /var/log/kernel.log as soon as I could after that, before reinstalling
> the stock kernel and rebooting.
>
> [1]: http://antoneliasson.se/publicdump/config.x86_64.last.20130825
> [2]: http://antoneliasson.se/publicdump/kernel.log.20130825.gz
>

Yes, it's great. Thank you. Now I can investigate the issue's
environment. I suspect that this issue is related to the issue with
"unable to handle kernel paging request" in nilfs_end_page_io(). But,
maybe, I am wrong. Anyway, it is a good basis for more detailed
understanding of the issue.

Thanks,
Vyacheslav Dubeyko.
* Re: Nilfs2 crash debugging
From: Anton Eliasson @ 2013-08-26 18:37 UTC
To: slava-yeENwD64cLxBDgjK7y7TUQ; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

Vyacheslav Dubeyko wrote on 2013-08-26 11:56:
> On Sun, 2013-08-25 at 17:02 +0200, Anton Eliasson wrote:
>
> [snip]
>> Yes. Here's another huge kernel.log for you [2]. It's 19 MB compressed
>> and 282 MB uncompressed. I blanked the log while running the stock
>> kernel and then rebooted to the custom debugging kernel. X wouldn't
>> start so I just logged in to a virtual terminal, changed directory to
>> "~/Bilder/20130321-28 Jakobs bilder från Nederländerna" and then
>> executed `cat 179.JPG >/dev/null`.
>>
>> This caused a read-only remount and a bunch of "broken bmap" messages to
>> show, followed by an "Input/Output error". I saved a copy of
>> /var/log/kernel.log as soon as I could after that, before reinstalling
>> the stock kernel and rebooting.
>>
>> [1]: http://antoneliasson.se/publicdump/config.x86_64.last.20130825
>> [2]: http://antoneliasson.se/publicdump/kernel.log.20130825.gz
>>
> Yes, it's great. Thank you. Now I can investigate the issue's
> environment. I suspect that this issue is related to the issue with
> "unable to handle kernel paging request" in nilfs_end_page_io(). But,
> maybe, I am wrong. Anyway, it is a good basis for more detailed
> understanding of the issue.
>
> Thanks,
> Vyacheslav Dubeyko.

You're welcome. And thank you for your thorough instructions. It's been
very informative and worthwhile for me to patch and build a kernel with
custom options. Let me know if you need more experiments run on the
damaged filesystem. Otherwise I'll delete the stored disk images in a
month or two.

--
Best Regards,
Anton Eliasson
* Re: Nilfs2 crash debugging
From: Vyacheslav Dubeyko @ 2013-08-30 5:58 UTC
To: Anton Eliasson; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

On Mon, 2013-08-26 at 20:37 +0200, Anton Eliasson wrote:

> You're welcome. And thank you for your thorough instructions. It's been
> very informative and worthwhile for me to patch and build a kernel with
> custom options. Let me know if you need more experiments run on the
> damaged filesystem. Otherwise I'll delete the stored disk images in a
> month or two.
>

As I remember, you reproduced the issue by starting two virtual
machines. I think that I will try to reproduce the issue this way. But I
am currently investigating another issue and, unfortunately, I don't
have the opportunity to investigate this one in parallel.

I am not fully confident that it is possible, but could you collect
strace output of the virtual machines starting in the case where the
issue is reproduced? What do you think? So far you have shared the
kernel log for the reproduced issue, but strace output can give
interesting details from the user-space point of view.

Thanks,
Vyacheslav Dubeyko.
* Re: Nilfs2 crash debugging
From: Anton Eliasson @ 2013-09-04 19:39 UTC
To: slava-yeENwD64cLxBDgjK7y7TUQ; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

Vyacheslav Dubeyko wrote on 2013-08-30 07:58:
> On Mon, 2013-08-26 at 20:37 +0200, Anton Eliasson wrote:
>
>> You're welcome. And thank you for your thorough instructions. It's been
>> very informative and worthwhile for me to patch and build a kernel with
>> custom options. Let me know if you need more experiments run on the
>> damaged filesystem. Otherwise I'll delete the stored disk images in a
>> month or two.
>>
> As I remember, you reproduced the issue by starting two virtual
> machines. I think that I will try to reproduce the issue this way. But I
> am currently investigating another issue and, unfortunately, I don't
> have the opportunity to investigate this one in parallel.
>
> I am not fully confident that it is possible, but could you collect
> strace output of the virtual machines starting in the case where the
> issue is reproduced? What do you think? So far you have shared the
> kernel log for the reproduced issue, but strace output can give
> interesting details from the user-space point of view.
>
> Thanks,
> Vyacheslav Dubeyko.

I spent about an hour trying to reproduce this today. I built Linux
3.10.10 using your patches from June. The patch command reported some
offsets and fuzz so it seems that the nilfs driver has changed since the
last kernel version. I don't know if the updates affect this bug. With
this new custom kernel, everything I/O related ran very slowly. The
nilfs garbage collector used 100 % CPU constantly. Killing it sped
things up a little.

I started and stopped the virtual machines a few times, with reboots in
between. Eventually the system tried to touch some corrupted parts of
the virtual machine image and /home remounted read-only. At that point I
gave up. I doubt the strace output will help you but I uploaded it here
[1] anyway. VMware Workstation is a complex application that consists of
many executables. Some are run directly by the user, some as system
services and some as kernel modules. Picking the right place to stick
the multimeter probe is probably difficult.

Unfortunately I forgot to install syslog-ng today and my instance of
systemd is not configured to log verbosely enough to capture the kernel
debug output. So no kernel.log for today. This is all starting to feel
like a waste of time for me as I don't even use nilfs on any of my
machines anymore. I'm going to withdraw my offer to debug these issues
any further. Sorry. I hope you have gathered enough information to solve
them and I wish you the best of luck.

[1]: http://antoneliasson.se/publicdump/vmware-strace.log.gz

--
Best Regards,
Anton Eliasson
* Re: Nilfs2 crash debugging
From: Vyacheslav Dubeyko @ 2013-09-04 20:00 UTC
To: Anton Eliasson; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

Hi Anton,

On Sep 4, 2013, at 11:39 PM, Anton Eliasson wrote:

[snip]
> I spent about an hour trying to reproduce this today. I built Linux
> 3.10.10 using your patches from June. The patch command reported some
> offsets and fuzz so it seems that the nilfs driver has changed since the
> last kernel version. I don't know if the updates affect this bug. With
> this new custom kernel, everything I/O related ran very slowly. The
> nilfs garbage collector used 100 % CPU constantly. Killing it sped
> things up a little.
>
> I started and stopped the virtual machines a few times, with reboots in
> between. Eventually the system tried to touch some corrupted parts of
> the virtual machine image and /home remounted read-only. At that point I
> gave up. I doubt the strace output will help you but I uploaded it here
> [1] anyway. VMware Workstation is a complex application that consists of
> many executables. Some are run directly by the user, some as system
> services and some as kernel modules. Picking the right place to stick
> the multimeter probe is probably difficult.
>
> Unfortunately I forgot to install syslog-ng today and my instance of
> systemd is not configured to log verbosely enough to capture the kernel
> debug output. So no kernel.log for today. This is all starting to feel
> like a waste of time for me as I don't even use nilfs on any of my
> machines anymore. I'm going to withdraw my offer to debug these issues
> any further. Sorry. I hope you have gathered enough information to solve
> them and I wish you the best of luck.
>
> [1]: http://antoneliasson.se/publicdump/vmware-strace.log.gz
>

Thank you for your efforts. I think that I have discovered the cause of
all the issues that you reported. I posted the patch ([PATCH] [CRITICAL]
nilfs2: fix issue with race condition of competition between segments
for dirty blocks) on Monday. Currently, this patch is under discussion.
So, I hope that NILFS2 will be more stable. Anyway, your reports were
very important for finding the cause of the issues and working out the
fix.

Thanks,
Vyacheslav Dubeyko.
* Re: Broken nilfs2 filesystem
From: Anton Eliasson @ 2013-05-27 12:45 UTC
To: linux-nilfs-u79uwXL29TY76Z2rM5mHXA; +Cc: devel-17Olwe7vw2dLC78zk6coLg

Vyacheslav Dubeyko wrote on 2013-05-26 14:59:
> Hi Anton,
>
> On May 25, 2013, at 4:07 PM, Anton Eliasson wrote:
>
> Thank you for additional details.
>
> But, as I remember, Ryusuke asked to try such commands too:
>
> $ sudo nilfs-tune -l /dev/dm-3
> $ sudo dumpseg /dev/dm-3 7007
> $ lssu -a /dev/dm-3
>
> Could you share output of these commands?

My messages are being silently swallowed! Maybe your list doesn't like
my attachments? This is the third attempt and this time without
attachments.

Ryusuke Konishi wrote on 2013-05-23 03:40:
> Hi,
> On Wed, 22 May 2013 22:36:02 +0200, Anton Eliasson wrote:
>> Anton Eliasson wrote on 2013-05-22 22:33:
>>> Greetings!
>>> It pains me to report that my /home filesystem broke down today. My
>>> system is running Arch Linux 64-bit. The filesystem resides on a
>>> Crucial M4 256 GB SSD, on top of a LVM2 volume. The drive and
>>> filesystem are both around six months old. Partition table and error
>>> log excerpts are at the bottom of this e-mail. Full logs are available
>>> upon request.
>>>
>>> I am providing this information as a bug report. I have no reason to
>>> suspect the hardware but I cannot exclude it either. If you (the
>>> developers) are interested in troubleshooting this for posterity, I
>>> can be your hands and run whatever tools are required. If not, I'll
>>> reformat the filesystem, restore the data from backup and forget that
>>> this happened.
>>>
>>> In case the formatting gets mangled, this e-mail is also available at
>> Right here: http://paste.debian.net/5841/
> Thank you for the report.
>
> According to the log, btree of a regular file is destroyed for some reason.
> I think we should look into how the btree block is broken.
>
> Could you try the following commands to inspect the broken disk segment ?
>
> $ sudo dd if=/dev/dm-3 bs=4k count=2048 skip=14350336 iflag=direct 2>/dev/null | hexdump -C

There's some semi-private stuff in there so I'll e-mail it separately to
Ryusuke Konishi and Vyacheslav Dubeyko.

>
> This will print out blocks of the segment 7007 which includes the
> broken btree block.
>
> The following commands are also useful to get debug information.
> Could you try them, too ?
>
> $ sudo nilfs-tune -l /dev/dm-3

Today (May 23) it's called dm-2 but I don't think that should matter.

nilfs-tune 2.1.5
Filesystem volume name: home
Filesystem UUID: e4e8bd9a-12f6-4c2a-b32f-9471f1b321fc
Filesystem magic number: 0x3434
Filesystem revision #: 2.0
Filesystem features: (none)
Filesystem state: invalid or mounted,error
Filesystem OS type: Linux
Block size: 4096
Filesystem created: Sat Oct 6 15:52:11 2012
Last mount time: Sat May 25 10:42:30 2013
Last write time: Sat May 25 10:42:30 2013
Mount count: 143
Maximum mount count: 50
Reserve blocks uid: 0 (user root)
Reserve blocks gid: 0 (group root)
First inode: 11
Inode size: 128
DAT entry size: 32
Checkpoint size: 192
Segment usage size: 16
Number of segments: 14039
Device size: 117771862016
First data block: 1
# of blocks per segment: 2048
Reserved segments %: 5
Last checkpoint #: 1260585
Last block address: 430080
Last sequence #: 1557848
Free blocks count: 10317824
Commit interval: 0
# of blks to create seg: 0
CRC seed: 0xfb8deb0b
CRC check sum: 0x0db18bf2
CRC check data size: 0x00000118

> $ sudo dumpseg /dev/dm-3 7007

http://antoneliasson.se/publicdump/dumpseg-home-Anton_Eliasson-20130525.gz

> $ lssu -a /dev/dm-3

I ran this on May 23 but haven't had time to compose this e-mail until
two days ago. During that period I mounted the filesystem as rw once or
twice and I unfortunately forgot to kill nilfs_cleanerd so some of the
segments might have moved around. So I have rerun lssu and uploaded both
outputs here:
http://antoneliasson.se/publicdump/lssu-Anton_Eliasson-20130523.gz
http://antoneliasson.se/publicdump/lssu-Anton_Eliasson-20130525.gz

>
> The third command requires the device is mounted, so /home should be
> mounted previously with a readonly option and a norecovery option:
>
> $ sudo mount -t nilfs2 -o ro,norecovery /dev/dm-3 /home
>

Additionally, I have uploaded /var/log/everything.log spanning May 19-22
here:
http://antoneliasson.se/publicdump/everything.log.gz

The first system crash is on line 14748. On line 15829 onwards nilfs
warns that an fs is unchecked and has a bad checksum. On line 16206 is
the first bad btree node error. I copied the entire /var/log tree a
reboot or two after I figured out that I had a bad fs. Please tell me if
you need any other log files from there.

--
Best Regards
Anton Eliasson
* Re: Broken nilfs2 filesystem
From: Vyacheslav Dubeyko @ 2013-05-27 13:23 UTC
To: Anton Eliasson; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

Hi Anton,

On Mon, 2013-05-27 at 14:45 +0200, Anton Eliasson wrote:

[snip]
> Additionally, I have uploaded /var/log/everything.log spanning May 19-22
> here:
> http://antoneliasson.se/publicdump/everything.log.gz
>
> The first system crash is on line 14748. On line 15829 onwards nilfs
> warns that an fs is unchecked and has a bad checksum. On line 16206 is
> the first bad btree node error. I copied the entire /var/log tree a
> reboot or two after I figured out that I had a bad fs. Please tell me if
> you need any other log files from there.
>

Thank you for all the additional details and, especially, for the system
log. Your system log is really interesting. It is the first time that I
can see a crash dump before the error messages about broken bmap. It is
a really important detail, I suppose.

So, I need to investigate all your information more deeply. Maybe I will
ask for additional information later. But, currently, I have enough and
I need to think over the environment of the known issue.

Thanks,
Vyacheslav Dubeyko.
* Broken nilfs2 filesystem
From: Anton Eliasson @ 2013-05-22 20:33 UTC
To: linux-nilfs-u79uwXL29TY76Z2rM5mHXA; +Cc: devel-17Olwe7vw2dLC78zk6coLg
Greetings!
It pains me to report that my /home filesystem broke down today. My
system is running Arch Linux 64-bit. The filesystem resides on a Crucial
M4 256 GB SSD, on top of a LVM2 volume. The drive and filesystem are
both around six months old. Partition table and error log excerpts are
at the bottom of this e-mail. Full logs are available upon request.
I am providing this information as a bug report. I have no reason to
suspect the hardware but I cannot exclude it either. If you (the
developers) are interested in troubleshooting this for posterity, I can
be your hands and run whatever tools are required. If not, I'll reformat
the filesystem, restore the data from backup and forget that this happened.
In case the formatting gets mangled, this e-mail is also available at
What happened today, in chronological order:
~18:00
======
I am troubleshooting some issues that turn out to be caused by a wrongly
configured system clock. The RTC (hardware clock) is set to local time
(UTC+2) but the OS is configured to treat the RTC as UTC. This is
because it was set to UTC previously, but then I reinstalled Windows
which promptly reset it to local time.
This set the mtime of some files in both / and /home to dates in the
future. When I discovered this, I `touch`ed all affected files (`touch
now; sudo find / /home -xdev -newer now -exec touch {} \;`) to reset
their mtime and rebooted the system. I do not know if this is relevant;
if not, it makes reading the log files more fun.
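For anyone retracing this step, here is a minimal Python sketch of the same check: it lists entries whose mtime lies in the future instead of touching them. The function name `future_mtime_paths` is made up for this illustration. It compares against the current time rather than against a reference file as `find -newer now` does, and staying on one filesystem only approximates `-xdev`.

```python
import os
import time


def future_mtime_paths(root, now=None):
    """Yield paths below `root` whose mtime is later than `now` (default: current time)."""
    now = time.time() if now is None else now
    root_dev = os.lstat(root).st_dev
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune directories on other filesystems, roughly what find's -xdev does.
        kept = []
        for d in dirnames:
            try:
                if os.lstat(os.path.join(dirpath, d)).st_dev == root_dev:
                    kept.append(d)
            except OSError:
                pass  # vanished or unreadable directory: do not descend
        dirnames[:] = kept
        for name in kept + filenames:
            path = os.path.join(dirpath, name)
            try:
                if os.lstat(path).st_mtime > now:
                    yield path
            except OSError:
                continue  # entry disappeared between listing and lstat
```

Calling `list(future_mtime_paths("/home"))` before and after correcting the clock would show which files the wrong RTC setting affected, without modifying anything.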
I then launch my command line backup program "bup", Firefox and some
other apps.
~18:50-19:00
============
Firefox freezes. The system keeps running but I can't launch new
programs. It looked like all I/O broke down. However, bup kept running.
I left the computer alone for perhaps 30-60 min.
~20:00
======
When I came back, bup had frozen (/var/log/messages at 18:53:31).[1] I
restart X by pressing Alt+SysRq+K (/var/log/messages at 20:06:33) and
return to the login screen. The system freezes during login though,
probably because /home had been mounted read-only. So I reboot
using Alt+SysRq+REISUB (/var/log/messages at 20:07:05). I noticed some
I/O errors during shutdown.
After the reboot there are no immediate signs of disaster. I launch bup
again. Some time later, /home remounts as read only. I notice that bup
has reported I/O errors while reading some files in /home.[2] dmesg and
/var/log/kern.log contain errors mentioning "bad btree node" and
"nilfs_bmap_lookup_contig: broken bmap".[3]
I proceed to examine one of the files that bup reported I/O errors for:
[2/5.0.2]{1}anton@riven:~/Bilder/20130321-28 Jakobs bilder från Nederländerna> LANG=C stat 179.JPG
File: '179.JPG'
Size: 3774546 Blocks: 7416 IO Block: 4096 regular file
Device: fe03h/65027d Inode: 136492 Links: 1
Access: (0644/-rw-r--r--) Uid: ( 1000/ anton) Gid: ( 1000/ anton)
Access: 2013-03-28 22:04:48.000000000 +0100
Modify: 2013-03-28 22:04:48.000000000 +0100
Change: 2013-04-30 16:40:34.053914840 +0200
Birth: -
[2/5.0.2]anton@riven:~/Bilder/20130321-28 Jakobs bilder från Nederländerna> LANG= cat 179.JPG > /dev/null
cat: 179.JPG: Input/output error
[2/5.0.2]{1}anton@riven:~/Bilder/20130321-28 Jakobs bilder från Nederländerna> LANG=C cat 179.JPG > /dev/null
cat: 179.JPG: Input/output error
[2/5.0.2]{1}anton@riven:~/Bilder/20130321-28 Jakobs bilder från Nederländerna> dmesg | tail
[ 3762.363260] NILFS: bad btree node (blocknr=14351789): level = 0, flags = 0x0, nchildren = 0
[ 3762.363269] NILFS error (device dm-3): nilfs_bmap_lookup_contig: broken bmap (inode number=136492)
[ 3855.881972] NILFS: bad btree node (blocknr=14351789): level = 0, flags = 0x0, nchildren = 0
[ 3855.881980] NILFS error (device dm-3): nilfs_bmap_lookup_contig: broken bmap (inode number=136492)
[ 3857.977754] NILFS: bad btree node (blocknr=14351789): level = 0, flags = 0x0, nchildren = 0
[ 3857.977763] NILFS error (device dm-3): nilfs_bmap_lookup_contig: broken bmap (inode number=136492)
[2/5.0.2]anton@riven:~/Bilder/20130321-28 Jakobs bilder från Nederländerna> sudo journalctl -xn
[sudo] password for anton:
-- Logs begin at Fri 2012-10-19 17:29:01 CEST, end at Wed 2013-05-22 21:12:17 CEST. --
May 22 21:09:45 riven sudo[1545]: pam_unix(sudo:session): session closed for user root
May 22 21:10:42 riven kernel: NILFS: bad btree node (blocknr=14351789): level = 0, flags = 0x0, nchildren = 0
May 22 21:10:42 riven kernel: [87B blob data]
May 22 21:11:12 riven sudo[1563]: anton : TTY=pts/3 ; PWD=/Athena/Dump/nilfs-felsökning-2013-05-22 ; USER=root ; COMM
May 22 21:11:12 riven sudo[1563]: pam_unix(sudo:session): session opened for user root by anton(uid=0)
May 22 21:11:18 riven sudo[1563]: pam_unix(sudo:session): session closed for user root
May 22 21:12:15 riven kernel: NILFS: bad btree node (blocknr=14351789): level = 0, flags = 0x0, nchildren = 0
May 22 21:12:15 riven kernel: [87B blob data]
May 22 21:12:17 riven kernel: NILFS: bad btree node (blocknr=14351789): level = 0, flags = 0x0, nchildren = 0
May 22 21:12:17 riven kernel: [87B blob data]
[2/5.0.2]anton@riven:~/Bilder/20130321-28 Jakobs bilder från Nederländerna> LANG=C rm 179.JPG
rm: cannot remove '179.JPG': Read-only file system
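The kernel messages above repeat the same block and inode many times, so a short script can reduce them to the distinct values. The following Python sketch is only an illustration: the name `summarize` is made up, and it assumes that `blocknr` is a disk block number counted from the start of the device and that each segment holds 2048 blocks, as the nilfs-tune output and the dd command (skip=14350336 for segment 7007) elsewhere in this thread suggest.

```python
import re

# Assumption: 2048 blocks per segment, per the nilfs-tune output in this thread.
BLOCKS_PER_SEGMENT = 2048

BAD_NODE = re.compile(r"NILFS: bad btree node \(blocknr=(\d+)\)")
BROKEN_BMAP = re.compile(r"broken bmap \(inode number=(\d+)\)")


def summarize(log_lines):
    """Return ({bad block number: segment number}, {affected inode numbers})."""
    blocks = {}
    inodes = set()
    for line in log_lines:
        m = BAD_NODE.search(line)
        if m:
            blocknr = int(m.group(1))
            blocks[blocknr] = blocknr // BLOCKS_PER_SEGMENT
        m = BROKEN_BMAP.search(line)
        if m:
            inodes.add(int(m.group(1)))
    return blocks, inodes


# Two lines copied from the dmesg output above.
sample = [
    "[ 3762.363260] NILFS: bad btree node (blocknr=14351789): level = 0, flags = 0x0, nchildren = 0",
    "[ 3762.363269] NILFS error (device dm-3): nilfs_bmap_lookup_contig: broken bmap (inode number=136492)",
]
blocks, inodes = summarize(sample)
print(blocks)  # {14351789: 7007}
print(inodes)  # {136492}
```

Under those assumptions the bad block falls in segment 7007, and the inode number matches the one `stat` printed for 179.JPG.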
System configuration
====================
Summary
-------
/ is on the logical volume riven/arch. /home is on the logical volume
riven/home. The volume group riven is on the physical volume sda2. The
physical volume sda2 is on the Crucial M4 SSD.
There is also a volume group riven-proto on a 1 TB hard disk drive
containing an old unused Arch Linux installation and a filesystem
"supplement" which is still used.
There is one NTFS partition on the SSD sda and one on the HDD sdb. sdb3
is an old unused ext4 partition.
/etc/fstab
----------
tmpfs /tmp tmpfs nodev,nosuid 0 0
/dev/mapper/riven-arch / nilfs2 rw,noatime,discard 0 0
/dev/mapper/riven-home /home nilfs2 rw,noatime,discard 0 0
/dev/mapper/riven-swap none swap defaults 0 0
/dev/riven-proto/supplement /Supplement ext4 defaults,noatime 0 0
# some NFS mounts excluded
$ sudo lvdisplay
----------------
--- Logical volume ---
LV Path /dev/riven-proto/arch
LV Name arch
VG Name riven-proto
LV UUID GA0SNf-N1rZ-ErCU-ALG5-0L6D-Ix4j-s3cLe4
LV Write Access read/write
LV Creation host, time archiso, 2012-09-27 22:42:09 +0200
LV Status available
# open 0
LV Size 30,00 GiB
Current LE 7680
Segments 1
Allocation inherit
Read ahead sectors auto
- currently set to 256
Block device 254:1
--- Logical volume ---
LV Path /dev/riven-proto/supplement
LV Name supplement
VG Name riven-proto
LV UUID jgrYcK-fEm9-tAOq-I6PR-PB4N-Khet-E0qsaU
LV Write Access read/write
LV Creation host, time riven, 2012-10-31 18:31:31 +0100
LV Status available
# open 1
LV Size 200,00 GiB
Current LE 51200
Segments 1
Allocation inherit
Read ahead sectors auto
- currently set to 256
Block device 254:4
--- Logical volume ---
LV Path /dev/riven/swap
LV Name swap
VG Name riven
LV UUID 28r67b-M7hy-2orC-5snN-7CUu-jqGn-x1vxXc
LV Write Access read/write
LV Creation host, time archiso, 2012-10-06 15:47:56 +0200
LV Status available
# open 2
LV Size 1,00 GiB
Current LE 256
Segments 1
Allocation inherit
Read ahead sectors auto
- currently set to 256
Block device 254:0
--- Logical volume ---
LV Path /dev/riven/arch
LV Name arch
VG Name riven
LV UUID QAjHWq-5eDv-IyQe-ihiq-dpyZ-acIR-Imox4Y
LV Write Access read/write
LV Creation host, time archiso, 2012-10-06 15:48:50 +0200
LV Status available
# open 2
LV Size 30,00 GiB
Current LE 7680
Segments 1
Allocation inherit
Read ahead sectors auto
- currently set to 256
Block device 254:2
--- Logical volume ---
LV Path /dev/riven/home
LV Name home
VG Name riven
LV UUID YAQgTA-3Cvo-fuzu-6Uaj-0C6m-Tzt9-SyGv2y
LV Write Access read/write
LV Creation host, time archiso, 2012-10-06 15:50:25 +0200
LV Status available
# open 1
LV Size 109,68 GiB
Current LE 28079
Segments 1
Allocation inherit
Read ahead sectors auto
- currently set to 256
Block device 254:3
$ sudo vgdisplay
----------------------
--- Volume group ---
VG Name riven-proto
System ID
Format lvm2
Metadata Areas 1
Metadata Sequence No 11
VG Access read/write
VG Status resizable
MAX LV 0
Cur LV 2
Open LV 1
Max PV 0
Cur PV 1
Act PV 1
VG Size 512,41 GiB
PE Size 4,00 MiB
Total PE 131178
Alloc PE / Size 58880 / 230,00 GiB
Free PE / Size 72298 / 282,41 GiB
VG UUID HGfujG-CdYE-zQuD-xIOf-Lyus-8B3f-4WPPxp
--- Volume group ---
VG Name riven
System ID
Format lvm2
Metadata Areas 1
Metadata Sequence No 4
VG Access read/write
VG Status resizable
MAX LV 0
Cur LV 3
Open LV 3
Max PV 0
Cur PV 1
Act PV 1
VG Size 140,68 GiB
PE Size 4,00 MiB
Total PE 36015
Alloc PE / Size 36015 / 140,68 GiB
Free PE / Size 0 / 0
VG UUID EZSXiE-F9Ec-vUX2-Ny0l-Fa2U-IIr6-6sAmIV
$ sudo pvdisplay
--------------------------
--- Physical volume ---
PV Name /dev/sdb2
VG Name riven-proto
PV Size 512,42 GiB / not usable 4,00 MiB
Allocatable yes
PE Size 4,00 MiB
Total PE 131178
Free PE 72298
Allocated PE 58880
PV UUID dAN0do-QWac-iBO6-tBxg-fr2z-ciNL-EZyOko
--- Physical volume ---
PV Name /dev/sda2
VG Name riven
PV Size 140,68 GiB / not usable 0
Allocatable yes (but full)
PE Size 4,00 MiB
Total PE 36015
Free PE 0
Allocated PE 36015
PV UUID KtuR1D-G2vj-8qb9-Kzyz-xkDP-weI9-NtHFMc
$ LANG=C sudo fdisk -l
----------------------
Disk /dev/sda: 256.1 GB, 256060514304 bytes, 500118192 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk label type: dos
Disk identifier: 0x000e08ae
Device Boot Start End Blocks Id System
/dev/sda1 * 2048 205078527 102538240 7 HPFS/NTFS/exFAT
/dev/sda2 205078528 500115455 147518464 83 Linux
Disk /dev/sdb: 1000.2 GB, 1000204886016 bytes, 1953525168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk label type: dos
Disk identifier: 0x000b5f5c
Device Boot Start End Blocks Id System
/dev/sdb1 63 615353759 307676848+ 7 HPFS/NTFS/exFAT
/dev/sdb2 878905344 1953523711 537309184 83 Linux
/dev/sdb3 * 877930496 878905343 487424 83 Linux
Partition table entries are not in disk order
Disk /dev/mapper/riven-swap: 1073 MB, 1073741824 bytes, 2097152 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk /dev/mapper/riven--proto-arch: 32.2 GB, 32212254720 bytes, 62914560 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk label type: dos
Disk identifier: 0x20ac7dda
This doesn't look like a partition table
Probably you selected the wrong device.
Device Boot Start End Blocks Id System
/dev/mapper/riven--proto-arch1 ? 3224498923 3657370039 216435558+ 7 HPFS/NTFS/exFAT
/dev/mapper/riven--proto-arch2 ? 3272020941 5225480974 976730017 16 Hidden FAT16
/dev/mapper/riven--proto-arch3 ? 0 0 0 6f Unknown
/dev/mapper/riven--proto-arch4 50200576 974536369 462167897 0 Empty
Partition table entries are not in disk order
Disk /dev/mapper/riven-arch: 32.2 GB, 32212254720 bytes, 62914560 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk /dev/mapper/riven-home: 117.8 GB, 117771862016 bytes, 230023168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk /dev/mapper/riven--proto-supplement: 214.7 GB, 214748364800 bytes, 419430400 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
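As a cross-check, the sizes reported by the LVM tools and by fdisk above can be compared with a few lines of arithmetic. This is only a sketch: the numbers are copied by hand from the lvdisplay, vgdisplay and fdisk output in this message, and the variable names are illustrative.

```python
MIB = 1024 * 1024

pe_size_bytes = 4 * MIB       # "PE Size 4,00 MiB" from vgdisplay
home_extents = 28079          # "Current LE 28079" for /dev/riven/home
fdisk_bytes = 117771862016    # size fdisk prints for /dev/mapper/riven-home
sector_size = 512             # logical sector size reported by fdisk

lv_bytes = home_extents * pe_size_bytes
print(lv_bytes == fdisk_bytes)        # True
print(lv_bytes // sector_size)        # 230023168
print(round(lv_bytes / 1024**3, 2))   # 109.68

# The three logical volumes in VG riven (swap, arch, home) use every extent.
print(256 + 7680 + home_extents)      # 36015
```

The figures agree with each other and with the "Device size" line in the nilfs-tune output quoted elsewhere in this thread, so the filesystem appears to span the whole logical volume.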
$ ls -l /dev/riven/
-------------------
total 0
lrwxrwxrwx 1 root root 7 22 May 20.08 arch -> ../dm-2
lrwxrwxrwx 1 root root 7 22 May 20.08 home -> ../dm-3
lrwxrwxrwx 1 root root 7 22 May 20.08 swap -> ../dm-0
/etc/nilfs-cleanerd.conf
------------------------
protection_period 0
min_clean_segments 0
max_clean_segments 100%
clean_check_interval 10
selection_policy timestamp # timestamp in ascend order
nsegments_per_clean 2
mc_nsegments_per_clean 4
cleaning_interval 5
mc_cleaning_interval 1
retry_interval 60
use_mmap
log_priority info
References
==========
[1]:
See attached file "messages-excerpt-Anton_Eliasson-20130522"
[2]:
[2/5.0.2]anton@riven:~> bup.sh
/home/anton/etc-tomahna-20120615/etc/vmware/: [Errno 13] Permission
denied
Indexing: 117930, done.
bup: merging indexes (211106/211106), done.
WARNING: 1 errors encountered.
Reading index: 211091, done.
/home/anton/vmware/WXP/WXP-disk1.vmdk: [Errno 5] Input/output error
/home/anton/Bilder/20130321-28 Jakobs bilder från Nederländerna/179.JPG: [Errno 5] Input/output error
/home/anton/Bilder/20130321-28 Jakobs bilder från Nederländerna/172.JPG: [Errno 5] Input/output error
/home/anton/Bilder/20130321-28 Jakobs bilder från Nederländerna/170.JPG: [Errno 5] Input/output error
/home/anton/Bilder/20130321-28 Jakobs bilder från Nederländerna/165.JPG: [Errno 5] Input/output error
/home/anton/Bilder/20130321-28 Jakobs bilder från Nederländerna/164.JPG: [Errno 5] Input/output error
/home/anton/Bilder/20130321-28 Jakobs bilder från Nederländerna/163.JPG: [Errno 5] Input/output error
/home/anton/Bilder/20130321-28 Jakobs bilder från Nederländerna/160.JPG: [Errno 5] Input/output error
Saving: 100.00% (52192637/52192637k, 211091/211091 files), done.
bloom: adding 1 file (59294 objects).
Traceback (most recent call last):
  File "/usr/lib/bup/cmd/bup-save", line 431, in <module>
    w.close() # must close before we can update the ref
  File "/usr/lib/bup/bup/client.py", line 316, in close
    id = self._end()
  File "/usr/lib/bup/bup/client.py", line 313, in _end
    return self.suggest_packs() # Returns last idx received
  File "/usr/lib/bup/bup/client.py", line 233, in _suggest_packs
    self.sync_index(idx)
  File "/usr/lib/bup/bup/client.py", line 193, in sync_index
    f = open(fn + '.tmp', 'w')
IOError: [Errno 30] Read-only file system: '/home/anton/.bup/index-cache/hermes__Athena_Backup_bup/pack-90022de5619eee12a7611a33caf41047fb57a90a.idx.tmp'
[2/5.0.2]{1}anton@riven:~>
[3]:
[ 233.951973] NILFS: bad btree node (blocknr=19549978): level = 0, flags = 0x0, nchildren = 0
[ 233.951982] NILFS error (device dm-3): nilfs_bmap_lookup_contig: broken bmap (inode number=4301)
[ 233.955999] Remounting filesystem read-only
[ 233.956119] NILFS: bad btree node (blocknr=19549978): level = 0, flags = 0x0, nchildren = 0
[ 233.956125] NILFS error (device dm-3): nilfs_bmap_lookup_contig: broken bmap (inode number=4301)
[ 233.956417] NILFS: bad btree node (blocknr=19549978): level = 0, flags = 0x0, nchildren = 0
[ 233.956422] NILFS error (device dm-3): nilfs_bmap_lookup_contig: broken bmap (inode number=4301)
[ 233.956524] NILFS: bad btree node (blocknr=19549978): level = 0, flags = 0x0, nchildren = 0
[ 233.956530] NILFS error (device dm-3): nilfs_bmap_lookup_contig: broken bmap (inode number=4301)
...
[ 819.004092] NILFS: bad btree node (blocknr=14351789): level = 0, flags = 0x0, nchildren = 0
[ 819.004101] NILFS error (device dm-3): nilfs_bmap_lookup_contig: broken bmap (inode number=136492)
[ 819.004177] NILFS: bad btree node (blocknr=14351789): level = 0, flags = 0x0, nchildren = 0
[ 819.004181] NILFS error (device dm-3): nilfs_bmap_lookup_contig: broken bmap (inode number=136492)
[ 819.004257] NILFS: bad btree node (blocknr=14351789): level = 0, flags = 0x0, nchildren = 0
[ 819.004263] NILFS error (device dm-3): nilfs_bmap_lookup_contig: broken bmap (inode number=136492)
...
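Since the same two messages repeat many times, a short script could tally which block numbers and inode numbers are affected. This is only a sketch: the function name `summarize` and the sample text are made up for illustration, and the regular expressions assume the exact message wording quoted in [3], which may differ on other kernel versions.

```python
import re
from collections import Counter

# Patterns for the two message shapes quoted in [3]. The wording is an
# assumption taken from that excerpt, not from the nilfs2 source.
BAD_NODE = re.compile(r"NILFS: bad btree node \(blocknr=(\d+)\)")
BROKEN_BMAP = re.compile(r"broken bmap \(inode number=(\d+)\)")


def summarize(text):
    """Return (blocks, inodes): how often each block/inode number is reported."""
    blocks = Counter(int(n) for n in BAD_NODE.findall(text))
    inodes = Counter(int(n) for n in BROKEN_BMAP.findall(text))
    return blocks, inodes


# Hypothetical sample built from lines in the excerpt above.
sample = """\
[ 233.951973] NILFS: bad btree node (blocknr=19549978): level = 0, flags = 0x0, nchildren = 0
[ 233.951982] NILFS error (device dm-3): nilfs_bmap_lookup_contig: broken bmap (inode number=4301)
[ 819.004092] NILFS: bad btree node (blocknr=14351789): level = 0, flags = 0x0, nchildren = 0
[ 819.004101] NILFS error (device dm-3): nilfs_bmap_lookup_contig: broken bmap (inode number=136492)
"""

blocks, inodes = summarize(sample)
for blocknr, count in sorted(blocks.items()):
    print("blocknr", blocknr, "reported", count, "time(s)")
for ino, count in sorted(inodes.items()):
    print("inode", ino, "reported", count, "time(s)")
```

Fed the full kernel.log instead of the sample, this would show whether the errors are confined to a handful of inodes or spread across the filesystem.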
--
Best Regards,
Anton Eliasson
[-- Attachment #2: messages-excerpt-Anton_Eliasson-20130522 --]
[-- Type: text/plain, Size: 36354 bytes --]
May 22 18:50:14 riven slim[274]: 18:50:14 | ListenerTcp | Pinging tcp://notifications.sparkleshare.org:443/
May 22 18:50:14 riven slim[274]: 18:50:14 | ListenerTcp | Received pong from tcp://notifications.sparkleshare.org:443/
May 22 18:53:31 riven kernel: [ 3821.605591] PGD 19636d067 PUD 19636e067 PMD 0
May 22 18:53:31 riven kernel: [ 3821.605597] Oops: 0000 [#1] PREEMPT SMP
May 22 18:53:31 riven kernel: [ 3821.605602] Modules linked in: nfsv3 nfs_acl ppdev parport_pc parport fuse vsock btrfs nvidia(PO) raid6_pq crc32c libcrc32c zlib_deflate iTCO_wdt iTCO_vendor_support gpio_ich ext4 crc16 mbcache xor jbd2 coretemp kvm_intel kvm snd_hda_codec_realtek microcode psmouse pcspkr serio_raw lpc_ich i2c_i801 snd_hda_intel r8169 evdev snd_hda_codec drm mii i2c_core snd_hwdep snd_pcm acpi_cpufreq snd_page_alloc mperf snd_timer snd button soundcore intel_agp intel_gtt processor loop nfs lockd sunrpc fscache nilfs2 dm_mod sd_mod sr_mod cdrom ata_generic pata_acpi hid_generic usbhid hid pata_it8213 ahci libahci firewire_ohci libata firewire_core scsi_mod crc_itu_t ehci_pci uhci_hcd ehci_hcd usbcore usb_common
May 22 18:53:31 riven kernel: [ 3821.605669] CPU 2
May 22 18:53:31 riven kernel: [ 3821.605674] Pid: 250, comm: nilfs_cleanerd Tainted: P O 3.9.3-1-ARCH #1 Gigabyte Technology Co., Ltd. EP45-DS4/EP45-DS4
May 22 18:53:31 riven kernel: [ 3821.605677] RIP: 0010:[<ffffffffa027f1a2>] [<ffffffffa027f1a2>] nilfs_end_page_io+0x12/0xc0 [nilfs2]
May 22 18:53:31 riven kernel: [ 3821.605686] RSP: 0018:ffff8801960f7b30 EFLAGS: 00010202
May 22 18:53:31 riven kernel: [ 3821.605690] RAX: ffff880101b49250 RBX: 00000000000036cd RCX: 0000000000000034
May 22 18:53:31 riven kernel: [ 3821.605692] RDX: 000000000000000d RSI: 0000000000000000 RDI: 00000000000036cd
May 22 18:53:31 riven kernel: [ 3821.605695] RBP: ffff8801960f7b38 R08: 1c00000000000000 R09: a80000c80e000000
May 22 18:53:31 riven kernel: [ 3821.605697] R10: 57ffe937f2320380 R11: 0000000000000019 R12: ffff8801a27ac738
May 22 18:53:31 riven kernel: [ 3821.605700] R13: ffff880101b49208 R14: ffffea00001eab40 R15: ffffea00001dee80
May 22 18:53:31 riven kernel: [ 3821.605703] FS: 00007f44fa5d0740(0000) GS:ffff8801afd00000(0000) knlGS:0000000000000000
May 22 18:53:31 riven kernel: [ 3821.605706] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
May 22 18:53:31 riven kernel: [ 3821.605709] CR2: 00000000000036cd CR3: 000000019636a000 CR4: 00000000000007e0
May 22 18:53:31 riven kernel: [ 3821.605711] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
May 22 18:53:31 riven kernel: [ 3821.605714] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
May 22 18:53:31 riven kernel: [ 3821.605717] Process nilfs_cleanerd (pid: 250, threadinfo ffff8801960f6000, task ffff8801a509cb60)
May 22 18:53:31 riven kernel: [ 3821.605719] Stack:
May 22 18:53:31 riven kernel: [ 3821.605721] ffff8801a27ac690 ffff8801960f7c20 ffffffffa0280ed5 0000000000000002
May 22 18:53:31 riven kernel: [ 3821.605727] ffff8801a509cb60 ffff8801a509cb60 ffff8801a509cb60 ffff8801a3dc6070
May 22 18:53:31 riven kernel: [ 3821.605731] ffff8801a5cddd60 ffff8801a5cddc00 00000001001ead00 ffff8801a3dc6060
May 22 18:53:31 riven kernel: [ 3821.605736] Call Trace:
May 22 18:53:31 riven kernel: [ 3821.605747] [<ffffffffa0280ed5>] nilfs_segctor_do_construct+0xd65/0x1ab0 [nilfs2]
May 22 18:53:31 riven kernel: [ 3821.605756] [<ffffffffa0281e42>] nilfs_segctor_construct+0x172/0x290 [nilfs2]
May 22 18:53:31 riven kernel: [ 3821.605765] [<ffffffffa0282ead>] nilfs_clean_segments+0xed/0x270 [nilfs2]
May 22 18:53:31 riven kernel: [ 3821.605771] [<ffffffff811bc4bc>] ? __set_page_dirty+0x6c/0xc0
May 22 18:53:31 riven kernel: [ 3821.605780] [<ffffffffa028906f>] nilfs_ioctl_clean_segments.isra.14+0x4bf/0x740 [nilfs2]
May 22 18:53:31 riven kernel: [ 3821.605788] [<ffffffffa0279a8d>] ? nilfs_btree_lookup+0x4d/0x70 [nilfs2]
May 22 18:53:31 riven kernel: [ 3821.605797] [<ffffffffa028970c>] nilfs_ioctl+0x21c/0x740 [nilfs2]
May 22 18:53:31 riven kernel: [ 3821.605802] [<ffffffff814d0b76>] ? __schedule+0x3f6/0x940
May 22 18:53:31 riven kernel: [ 3821.605808] [<ffffffff8119cf65>] do_vfs_ioctl+0x2e5/0x4d0
May 22 18:53:31 riven kernel: [ 3821.605813] [<ffffffff81152930>] ? do_munmap+0x2b0/0x3e0
May 22 18:53:31 riven kernel: [ 3821.605818] [<ffffffff8119d1d1>] sys_ioctl+0x81/0xa0
May 22 18:53:31 riven kernel: [ 3821.605822] [<ffffffff814d3769>] ? do_device_not_available+0x19/0x20
May 22 18:53:31 riven kernel: [ 3821.605827] [<ffffffff814d9e9d>] system_call_fastpath+0x1a/0x1f
May 22 18:53:31 riven kernel: [ 3821.605829] Code: ff ff ff 48 81 c4 88 00 00 00 5b 41 5c 5d c3 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 55 48 85 ff 48 89 e5 53 48 89 fb 74 4e <48> 8b 07 f6 c4 08 0f 84 8c 00 00 00 48 8b 47 30 48 8b 00 f6 c4
May 22 18:53:31 riven kernel: [ 3821.605881] RSP <ffff8801960f7b30>
May 22 18:53:31 riven kernel: [ 3821.605884] CR2: 00000000000036cd
May 22 18:53:31 riven kernel: [ 3821.605887] ---[ end trace 21dfcc9b8d62edba ]---
May 22 18:55:14 riven slim[274]: 18:55:14 | ListenerTcp | Pinging tcp://notifications.sparkleshare.org:443/
May 22 18:58:33 riven slim[274]: /home/buildbot/buildslave_steam/steam_rel_client_ubuntu12_linux/build/src/clientdll/../common/pipes.cpp (722) : Assertion Failed: Stalled cross-thread pipe
May 22 18:58:33 riven slim[274]: /home/buildbot/buildslave_steam/steam_rel_client_ubuntu12_linux/build/src/clientdll/../common/pipes.cpp (722) : Fatal assert failed: /home/buildbot/buildslave_steam/steam_rel_client_ubuntu12_linux/build/src/clientdll/../common/pipes.cpp, line 722. Application exiting.
May 22 18:58:33 riven slim[274]: Assert( Fatal assert ):/home/buildbot/buildslave_steam/steam_rel_client_ubuntu12_linux/build/src/clientdll/../common/pipes.cpp:722
May 22 18:58:33 riven slim[274]: Installing breakpad exception handler for appid(steam)/version(1368838102_client)
May 22 20:05:56 riven slim[274]: [16:43:03.489783 Warning] [ZeitgeistPlugin] Zeitgeist search failed: GDBus.Error:org.freedesktop.DBus.Error.UnknownMethod: No such interface `org.gnome.zeitgeist.Index' on object at path /org/gnome/zeitgeist/index/activity
May 22 20:05:56 riven slim[274]: [16:43:03.570057 Warning] [ZeitgeistPlugin] Zeitgeist search failed: GDBus.Error:org.freedesktop.DBus.Error.UnknownMethod: No such interface `org.gnome.zeitgeist.Index' on object at path /org/gnome/zeitgeist/index/activity
May 22 20:05:56 riven slim[274]: [16:43:03.681706 Warning] [ZeitgeistPlugin] Zeitgeist search failed: GDBus.Error:org.freedesktop.DBus.Error.UnknownMethod: No such interface `org.gnome.zeitgeist.Index' on object at path /org/gnome/zeitgeist/index/activity
May 22 20:05:56 riven slim[274]: Got Event! 2, -1
May 22 20:05:56 riven slim[274]: Got KeyPress! keycode: 65, modifiers: 64
May 22 20:05:56 riven slim[274]: Calling handler for '<Super>space'...
May 22 20:06:23 riven slim[274]: ** (zeitgeist-datahub:592): WARNING **: zeitgeist-datahub.vala:209: Error during inserting events: Timeout was reached
May 22 20:06:33 riven kernel: [ 8203.651816] SysRq : SAK
May 22 20:06:33 riven kernel: [ 8203.651872] SAK: killed process 318 (X): task_session(p)==tty->session
May 22 20:06:33 riven kernel: [ 8203.651977] SAK: killed process 318 (X): task_session(p)==tty->session
May 22 20:06:33 riven slim[274]: xfce4-session: Fatal IO error 11 (Resource temporarily unavailable) on X server :0.0.
May 22 20:06:33 riven slim[274]: xfwm4: Fatal IO error 11 (Resource temporarily unavailable) on X server :0.0.
May 22 20:06:33 riven slim[274]: xfsettingsd: Fatal IO error 11 (Resource temporarily unavailable) on X server :0.0.
May 22 20:06:33 riven slim[274]: Thunar: Fatal IO error 11 (Resource temporarily unavailable) on X server :0.0.
May 22 20:06:33 riven slim[274]: xfce4-panel: Fatal IO error 11 (Resource temporarily unavailable) on X server :0.0.
May 22 20:06:33 riven slim[274]: wrapper: Fatal IO error 11 (Resource temporarily unavailable) on X server :0.0.
May 22 20:06:33 riven slim[274]: (pasystray:564): Gdk-WARNING **: pasystray: Fatal IO error 11 (Resource temporarily unavailable) on X server :0.0.
May 22 20:06:33 riven slim[274]: xfdesktop: Fatal IO error 11 (Resource temporarily unavailable) on X server :0.0.
May 22 20:06:33 riven slim[274]: wrapper: Fatal IO error 11 (Resource temporarily unavailable) on X server :0.0.
May 22 20:06:33 riven slim[274]: terminator: Fatal IO error 11 (Resource temporarily unavailable) on X server :0.0.
May 22 20:06:33 riven slim[274]: AL lib: pulseaudio.c:353: Received context failure!
May 22 20:06:33 riven slim[274]: AL lib: pulseaudio.c:366: Received stream failure!
May 22 20:06:33 riven slim[274]: /usr/bin/xauth: file /var/run/slim.auth does not exist
May 22 20:06:33 riven slim[274]: X.Org X Server 1.14.1
May 22 20:06:33 riven slim[274]: Release Date: 2013-04-17
May 22 20:06:33 riven slim[274]: X Protocol Version 11, Revision 0
May 22 20:06:33 riven slim[274]: Build Operating System: Linux 3.8.7-1-ARCH x86_64
May 22 20:06:33 riven slim[274]: Current Operating System: Linux riven 3.9.3-1-ARCH #1 SMP PREEMPT Sun May 19 22:50:29 CEST 2013 x86_64
May 22 20:06:33 riven slim[274]: Kernel command line: BOOT_IMAGE=/boot/vmlinuz-linux root=/dev/mapper/riven-arch ro quiet
May 22 20:06:33 riven slim[274]: Build Date: 17 April 2013 02:37:06PM
May 22 20:06:33 riven slim[274]: Current version of pixman: 0.30.0
May 22 20:06:33 riven slim[274]: Before reporting problems, check http://wiki.x.org
May 22 20:06:33 riven slim[274]: to make sure that you have the latest version.
May 22 20:06:33 riven slim[274]: Markers: (--) probed, (**) from config file, (==) default setting,
May 22 20:06:33 riven slim[274]: (++) from command line, (!!) notice, (II) informational,
May 22 20:06:33 riven slim[274]: (WW) warning, (EE) error, (NI) not implemented, (??) unknown.
May 22 20:06:33 riven slim[274]: (==) Log file: "/var/log/Xorg.0.log", Time: Wed May 22 20:06:33 2013
May 22 20:06:33 riven slim[274]: (==) Using config directory: "/etc/X11/xorg.conf.d"
May 22 20:06:33 riven slim[274]: Initializing built-in extension Generic Event Extension
May 22 20:06:33 riven slim[274]: Initializing built-in extension SHAPE
May 22 20:06:33 riven slim[274]: Initializing built-in extension MIT-SHM
May 22 20:06:33 riven slim[274]: Initializing built-in extension XInputExtension
May 22 20:06:33 riven slim[274]: Initializing built-in extension XTEST
May 22 20:06:33 riven slim[274]: Initializing built-in extension BIG-REQUESTS
May 22 20:06:33 riven slim[274]: Initializing built-in extension SYNC
May 22 20:06:33 riven slim[274]: Initializing built-in extension XKEYBOARD
May 22 20:06:33 riven slim[274]: Initializing built-in extension XC-MISC
May 22 20:06:33 riven slim[274]: Initializing built-in extension SECURITY
May 22 20:06:33 riven slim[274]: Initializing built-in extension XINERAMA
May 22 20:06:33 riven slim[274]: Initializing built-in extension XFIXES
May 22 20:06:33 riven slim[274]: Initializing built-in extension RENDER
May 22 20:06:33 riven slim[274]: Initializing built-in extension RANDR
May 22 20:06:33 riven slim[274]: Initializing built-in extension COMPOSITE
May 22 20:06:33 riven slim[274]: Initializing built-in extension DAMAGE
May 22 20:06:33 riven slim[274]: Initializing built-in extension MIT-SCREEN-SAVER
May 22 20:06:33 riven slim[274]: Initializing built-in extension DOUBLE-BUFFER
May 22 20:06:33 riven slim[274]: Initializing built-in extension RECORD
May 22 20:06:33 riven slim[274]: Initializing built-in extension DPMS
May 22 20:06:33 riven slim[274]: Initializing built-in extension X-Resource
May 22 20:06:33 riven slim[274]: Initializing built-in extension XVideo
May 22 20:06:33 riven slim[274]: Initializing built-in extension XVideo-MotionCompensation
May 22 20:06:33 riven slim[274]: Initializing built-in extension XFree86-VidModeExtension
May 22 20:06:33 riven slim[274]: Initializing built-in extension XFree86-DGA
May 22 20:06:33 riven slim[274]: Initializing built-in extension XFree86-DRI
May 22 20:06:33 riven slim[274]: Initializing built-in extension DRI2
May 22 20:06:33 riven slim[274]: Loading extension GLX
May 22 20:06:33 riven slim[274]: Loading extension NV-GLX
May 22 20:06:33 riven slim[274]: Loading extension NV-CONTROL
May 22 20:06:33 riven slim[274]: Loading extension XINERAMA
May 22 20:06:34 riven slim[274]: The XKEYBOARD keymap compiler (xkbcomp) reports:
May 22 20:06:34 riven slim[274]: > Warning: Type "ONE_LEVEL" has 1 levels, but <RALT> has 2 symbols
May 22 20:06:34 riven slim[274]: > Ignoring extra symbols
May 22 20:06:34 riven slim[274]: Errors from xkbcomp are not fatal to the X server
May 22 20:07:05 riven kernel: [ 8235.987564] SysRq : Keyboard mode set to system default
May 22 20:07:06 riven kernel: [ 8236.819557] SysRq : Terminate All Tasks
May 22 20:07:06 riven [ 8236.827760] systemd-journald[150]: Received SIGTERM
May 22 20:07:06 riven [ 8236.842434] systemd[1]: getty@tty1.service holdoff time over, scheduling restart.
May 22 20:07:06 riven [ 8236.842702] systemd[1]: Cannot add dependency job for unit lvm.service, ignoring: Unit lvm.service failed to load: No such file or directory. See system logs and 'systemctl status lvm.service' for details.
May 22 20:07:06 riven [ 8236.842783] systemd[1]: Stopping Getty on tty1...
May 22 20:07:06 riven [ 8236.842948] systemd[1]: Starting Getty on tty1...
May 22 20:07:06 riven [ 8236.843662] systemd-udevd[4989]: starting version 204
May 22 20:07:06 riven [ 8236.843883] systemd[1]: Started Getty on tty1.
May 22 20:07:06 riven [ 8236.844554] systemd[1]: Started udev Kernel Device Manager.
May 22 20:07:06 riven [ 8236.847758] systemd[1]: systemd-journald.service holdoff time over, scheduling restart.
May 22 20:07:06 riven [ 8236.847820] systemd[1]: Stopping Journal Service...
May 22 20:07:06 riven [ 8236.847915] systemd[1]: Starting Journal Service...
May 22 20:07:06 riven [ 8236.849708] systemd[1]: Started Journal Service.
May 22 20:07:06 riven [ 8236.849770] systemd[1]: Starting Trigger Flushing of Journal to Persistent Storage...
May 22 20:07:06 riven ntpd[441]: ntpd exiting on signal 15
May 22 20:07:06 riven dhcpcd[415]: received SIGTERM, stopping
May 22 20:07:06 riven dhcpcd[415]: eth0: removing interface
May 22 20:07:06 riven nilfs_cleanerd[182]: shutdown
May 22 20:07:06 riven systemd[1]: systemd-udevd.service holdoff time over, scheduling restart.
May 22 20:07:06 riven systemd[1]: Stopping udev Kernel Device Manager...
May 22 20:07:06 riven systemd[1]: Starting udev Kernel Device Manager...
May 22 20:07:06 riven systemd[1]: rpcbind.service: main process exited, code=exited, status=2/INVALIDARGUMENT
May 22 20:07:06 riven systemd[1]: Unit rpcbind.service entered failed state.
May 22 20:07:06 riven systemd[1]: systemd-logind.service holdoff time over, scheduling restart.
May 22 20:07:06 riven systemd[1]: Cannot add dependency job for unit lvm.service, ignoring: Unit lvm.service failed to load: No such file or directory. See system logs and 'systemctl status lvm.service' for details.
May 22 20:07:06 riven systemd[1]: Stopping Login Service...
May 22 20:07:06 riven systemd[1]: Starting Login Service...
May 22 20:07:06 riven systemd[1]: Started Trigger Flushing of Journal to Persistent Storage.
May 22 20:07:06 riven systemd[1]: Cannot add dependency job for unit lvm.service, ignoring: Unit lvm.service failed to load: No such file or directory. See system logs and 'systemctl status lvm.service' for details.
May 22 20:07:06 riven systemd[1]: dhcpcd@eth0.service: main process exited, code=exited, status=1/FAILURE
May 22 20:07:06 riven systemd[1]: dhcpcd@eth0.service: control process exited, code=exited status=1
May 22 20:07:06 riven systemd[1]: Unit dhcpcd@eth0.service entered failed state.
May 22 20:07:06 riven dhcpcd[5022]: dhcpcd[5022]: dhcpcd not running
May 22 20:07:06 riven systemd[1]: Started Login Service.
May 22 20:07:06 riven systemd[1]: sshd.service holdoff time over, scheduling restart.
May 22 20:07:06 riven systemd[1]: Cannot add dependency job for unit lvm.service, ignoring: Unit lvm.service failed to load: No such file or directory. See system logs and 'systemctl status lvm.service' for details.
May 22 20:07:06 riven systemd[1]: Stopping OpenSSH Daemon...
May 22 20:07:06 riven systemd[1]: Started SSH Key Generation.
May 22 20:07:06 riven systemd[1]: Starting OpenSSH Daemon...
May 22 20:07:06 riven systemd[1]: Started OpenSSH Daemon.
May 22 20:07:06 riven systemd[1]: Cannot add dependency job for unit lvm.service, ignoring: Unit lvm.service failed to load: No such file or directory. See system logs and 'systemctl status lvm.service' for details.
May 22 20:07:06 riven systemd[1]: Starting System Logger Daemon...
May 22 20:07:06 riven systemd[1]: Started System Logger Daemon.
May 22 20:07:06 riven systemd[1]: cronie.service holdoff time over, scheduling restart.
May 22 20:07:06 riven systemd[1]: Cannot add dependency job for unit lvm.service, ignoring: Unit lvm.service failed to load: No such file or directory. See system logs and 'systemctl status lvm.service' for details.
May 22 20:07:06 riven systemd[1]: Started NFSv2/3 Network Status Monitor Daemon.
May 22 20:07:09 riven systemd[1]: sshd.service: main process exited, code=killed, status=9/KILL
May 22 20:07:09 riven systemd[1]: Unit sshd.service entered failed state.
May 22 20:07:09 riven systemd[1]: syslog-ng.service: main process exited, code=killed, status=9/KILL
May 22 20:07:09 riven systemd[1]: Unit syslog-ng.service entered failed state.
May 22 20:07:09 riven systemd[1]: systemd-journald.service: main process exited, code=killed, status=9/KILL
May 22 20:07:10 riven kernel: [ 8239.939532] SysRq : Kill All Tasks
May 22 20:07:10 riven [ 8239.942395] systemd[1]: Unit systemd-journald.service entered failed state.
May 22 20:07:10 riven [ 8239.943000] systemd[1]: systemd-udevd.service: main process exited, code=killed, status=9/KILL
May 22 20:07:10 riven [ 8239.943778] systemd[1]: Unit systemd-udevd.service entered failed state.
May 22 20:07:10 riven [ 8239.944266] systemd[1]: dbus.service: main process exited, code=killed, status=9/KILL
May 22 20:07:10 riven [ 8239.944989] systemd[1]: Unit dbus.service entered failed state.
May 22 20:07:10 riven [ 8239.945422] systemd[1]: cronie.service: main process exited, code=killed, status=9/KILL
May 22 20:07:10 riven [ 8239.946121] systemd[1]: Unit cronie.service entered failed state.
May 22 20:07:10 riven [ 8239.946582] systemd[1]: rpcbind.service: main process exited, code=killed, status=9/KILL
May 22 20:07:10 riven [ 8239.947427] systemd[1]: Unit rpcbind.service entered failed state.
May 22 20:07:10 riven rpc.statd[5056]: Version 1.2.8 starting
May 22 20:07:10 riven [ 8239.947857] systemd[1]: rpc-statd.service: main process exited, code=killed, status=9/KILL
May 22 20:07:10 riven [ 8239.948624] systemd[1]: Unit rpc-statd.service entered failed state.
May 22 20:07:10 riven [ 8239.949063] systemd[1]: getty@tty1.service holdoff time over, scheduling restart.
May 22 20:07:10 riven [ 8239.949661] systemd[1]: Cannot add dependency job for unit lvm.service, ignoring: Unit lvm.service failed to load: No such file or directory. See system logs and 'systemctl status lvm.service' for details.
May 22 20:07:10 riven [ 8239.950658] systemd[1]: Stopping Getty on tty1...
May 22 20:07:10 riven [ 8239.951036] systemd[1]: Starting Getty on tty1...
May 22 20:07:10 riven [ 8239.951991] systemd[1]: Started Getty on tty1.
May 22 20:07:10 riven [ 8239.952501] systemd[1]: systemd-journald.service holdoff time over, scheduling restart.
May 22 20:07:10 riven [ 8239.953037] systemd[1]: Stopping Journal Service...
May 22 20:07:10 riven [ 8239.961433] systemd[1]: Starting Journal Service...
May 22 20:07:10 riven [ 8239.970154] systemd[1]: Started Journal Service.
May 22 20:07:10 riven [ 8239.978363] systemd[1]: Starting Trigger Flushing of Journal to Persistent Storage...
May 22 20:07:10 riven [ 8239.994267] systemd[1]: systemd-udevd.service holdoff time over, scheduling restart.
May 22 20:07:10 riven [ 8240.009916] systemd[1]: Stopping udev Kernel Device Manager...
May 22 20:07:10 riven [ 8240.012250] systemd-journald[5047]: File /var/log/journal/5b2137919d6f4039a3b2c2f21333daa6/system.journal corrupted or uncleanly shut down, renaming and replacing.
May 22 20:07:10 riven [ 8240.041143] systemd[1]: Starting udev Kernel Device Manager...
May 22 20:07:10 riven [ 8240.050383] systemd[1]: ntpd.service: main process exited, code=killed, status=9/KILL
May 22 20:07:10 riven [ 8240.053378] systemd-udevd[5049]: starting version 204
May 22 20:07:10 riven [ 8240.074818] systemd[1]: Unit ntpd.service entered failed state.
May 22 20:07:10 riven [ 8240.082734] systemd[1]: sshd.service holdoff time over, scheduling restart.
May 22 20:07:10 riven [ 8240.090582] systemd[1]: Cannot add dependency job for unit lvm.service, ignoring: Unit lvm.service failed to load: No such file or directory. See system logs and 'systemctl status lvm.service' for details.
May 22 20:07:10 riven [ 8240.115806] systemd[1]: Stopping OpenSSH Daemon...
May 22 20:07:10 riven [ 8240.122627] systemd[1]: Started SSH Key Generation.
May 22 20:07:10 riven [ 8240.129163] systemd[1]: Starting OpenSSH Daemon...
May 22 20:07:10 riven [ 8240.136086] systemd[1]: Started OpenSSH Daemon.
May 22 20:07:10 riven [ 8240.142750] systemd[1]: cronie.service holdoff time over, scheduling restart.
May 22 20:07:10 riven [ 8240.149212] systemd[1]: Cannot add dependency job for unit lvm.service, ignoring: Unit lvm.service failed to load: No such file or directory. See system logs and 'systemctl status lvm.service' for details.
May 22 20:07:10 riven [ 8240.169042] systemd[1]: Stopping Periodic Command Scheduler...
May 22 20:07:10 riven [ 8240.175785] systemd[1]: Starting Periodic Command Scheduler...
May 22 20:07:10 riven [ 8240.182815] systemd[1]: Started Periodic Command Scheduler.
May 22 20:07:10 riven [ 8240.189721] systemd[1]: rpcbind.service holdoff time over, scheduling restart.
May 22 20:07:10 riven [ 8240.203185] systemd[1]: Cannot add dependency job for unit lvm.service, ignoring: Unit lvm.service failed to load: No such file or directory. See system logs and 'systemctl status lvm.service' for details.
May 22 20:07:10 riven [ 8240.223731] systemd[1]: Stopping RPC Bind...
May 22 20:07:10 riven [ 8240.230514] systemd[1]: Starting RPC Bind...
May 22 20:07:10 riven [ 8240.237587] systemd[1]: Started udev Kernel Device Manager.
May 22 20:07:10 riven [ 8240.244349] systemd[1]: systemd-logind.service: main process exited, code=killed, status=9/KILL
May 22 20:07:10 riven [ 8240.257182] systemd[1]: Unit systemd-logind.service entered failed state.
May 22 20:07:10 riven [ 8240.263939] systemd[1]: Started RPC Bind.
May 22 20:07:10 riven [ 8240.270314] systemd[1]: Starting NFSv2/3 Network Status Monitor Daemon...
May 22 20:07:10 riven [ 8240.277481] systemd[1]: Cannot add dependency job for unit lvm.service, ignoring: Unit lvm.service failed to load: No such file or directory. See system logs and 'systemctl status lvm.service' for details.
May 22 20:07:10 riven [ 8240.297612] systemd[1]: Starting System Logger Daemon...
May 22 20:07:10 riven [ 8240.304684] systemd[1]: Started System Logger Daemon.
May 22 20:07:10 riven [ 8240.311549] systemd[1]: ntpd.service holdoff time over, scheduling restart.
May 22 20:07:10 riven [ 8240.318164] systemd[1]: Cannot add dependency job for unit lvm.service, ignoring: Unit lvm.service failed to load: No such file or directory. See system logs and 'systemctl status lvm.service' for details.
May 22 20:07:10 riven [ 8240.338387] systemd[1]: Stopping Network Time Service...
May 22 20:07:10 riven [ 8240.345345] systemd[1]: Starting Network Time Service...
May 22 20:07:10 riven ntpd[5059]: ntpd 4.2.6p5@1.2349-o Mon May 6 10:20:10 UTC 2013 (1)
May 22 20:07:10 riven ntpd[5060]: proto: precision = 0.558 usec
May 22 20:07:10 riven ntpd[5060]: Listen and drop on 0 v4wildcard 0.0.0.0 UDP 123
May 22 20:07:10 riven ntpd[5060]: Listen and drop on 1 v6wildcard :: UDP 123
May 22 20:07:10 riven ntpd[5060]: Listen normally on 2 lo 127.0.0.1 UDP 123
May 22 20:07:10 riven ntpd[5060]: Listen normally on 3 lo ::1 UDP 123
May 22 20:07:10 riven ntpd[5060]: Listen normally on 4 eth0 fe80::21f:d0ff:fe26:7491 UDP 123
May 22 20:07:10 riven ntpd[5060]: peers refreshed
May 22 20:07:10 riven ntpd[5060]: Listening on routing socket on fd #21 for interface updates
May 22 20:07:10 riven ntpd[5060]: Deferring DNS for 0.pool.ntp.org 1
May 22 20:07:10 riven [ 8240.352774] systemd[1]: Started NFSv2/3 Network Status Monitor Daemon.
May 22 20:07:10 riven ntpd[5060]: Deferring DNS for 1.pool.ntp.org 1
May 22 20:07:10 riven ntpd[5060]: Deferring DNS for 2.pool.ntp.org 1
May 22 20:07:10 riven [ 8240.360274] systemd[1]: Started Network Time Service.
May 22 20:07:10 riven [ 8240.366831] systemd[1]: systemd-logind.service holdoff time over, scheduling restart.
May 22 20:07:10 riven [ 8240.379809] systemd[1]: Cannot add dependency job for unit lvm.service, ignoring: Unit lvm.service failed to load: No such file or directory. See system logs and 'systemctl status lvm.service' for details.
May 22 20:07:10 riven [ 8240.400281] systemd[1]: Stopping Login Service...
May 22 20:07:10 riven [ 8240.407082] systemd[1]: Starting Login Service...
May 22 20:07:10 riven [ 8240.416399] systemd[1]: Started Trigger Flushing of Journal to Persistent Storage.
May 22 20:07:10 riven [ 8240.430330] systemd[1]: Cannot add dependency job for unit lvm.service, ignoring: Unit lvm.service failed to load: No such file or directory. See system logs and 'systemctl status lvm.service' for details.
May 22 20:07:10 riven [ 8240.451224] systemd[1]: Starting D-Bus System Message Bus...
May 22 20:07:10 riven [ 8240.458934] systemd[1]: Started D-Bus System Message Bus.
May 22 20:07:10 riven systemd[1]: Started Login Service.
May 22 20:07:12 riven kernel: [ 8242.947507] SysRq : Emergency Sync
May 22 20:07:14 riven kernel: [ 8244.899488] SysRq : Emergency Remount R/O
May 22 20:08:06 riven systemd-sysctl[148]: Duplicate assignment of kernel/sysrq in file '/usr/lib/sysctl.d/50-default.conf', ignoring.
May 22 20:08:06 riven systemd[1]: Mounting Arbitrary Executable File Formats File System...
May 22 20:08:06 riven systemd[1]: Mounted Debug File System.
May 22 20:08:06 riven systemd[1]: Mounted POSIX Message Queue File System.
May 22 20:08:06 riven systemd[1]: Mounted Huge Pages File System.
May 22 20:08:06 riven systemd[1]: Started udev Kernel Device Manager.
May 22 20:08:06 riven systemd[1]: Mounted Arbitrary Executable File Formats File System.
May 22 20:08:06 riven systemd[1]: Started Set Up Additional Binary Formats.
May 22 20:08:06 riven systemd-fsck[161]: Ignoring error.
May 22 20:08:07 riven kernel: [ 0.000000] Initializing cgroup subsys cpuset
May 22 20:08:07 riven kernel: [ 0.000000] Initializing cgroup subsys cpu
May 22 20:08:07 riven kernel: [ 0.000000] Linux version 3.9.3-1-ARCH (tobias@T-POWA-LX) (gcc version 4.8.0 20130502 (prerelease) (GCC) ) #1 SMP PREEMPT Sun May 19 22:50:29 CEST 2013
May 22 20:08:07 riven kernel: [ 0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-linux root=/dev/mapper/riven-arch ro quiet
May 22 20:08:07 riven kernel: [ 0.000000] e820: BIOS-provided physical RAM map:
May 22 20:08:07 riven kernel: [ 0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009e7ff] usable
May 22 20:08:07 riven kernel: [ 0.000000] BIOS-e820: [mem 0x000000000009f800-0x000000000009ffff] reserved
May 22 20:08:07 riven kernel: [ 0.000000] BIOS-e820: [mem 0x00000000000f0000-0x00000000000fffff] reserved
May 22 20:08:07 riven kernel: [ 0.000000] BIOS-e820: [mem 0x0000000000100000-0x00000000cfeaffff] usable
May 22 20:08:07 riven kernel: [ 0.000000] BIOS-e820: [mem 0x00000000cfeb0000-0x00000000cfee2fff] ACPI NVS
May 22 20:08:06 riven systemd[1]: Started File System Check on Root Device.
May 22 20:08:07 riven systemd[1]: Reached target Sockets.
May 22 20:08:07 riven kernel: [ 0.000000] BIOS-e820: [mem 0x00000000cfee3000-0x00000000cfeeffff] ACPI data
May 22 20:08:07 riven kernel: [ 0.000000] BIOS-e820: [mem 0x00000000cfef0000-0x00000000cfefffff] reserved
May 22 20:08:07 riven kernel: [ 0.000000] BIOS-e820: [mem 0x00000000e0000000-0x00000000e3ffffff] reserved
May 22 20:08:07 riven kernel: [ 0.000000] BIOS-e820: [mem 0x00000000fec00000-0x00000000ffffffff] reserved
May 22 20:08:07 riven kernel: [ 0.000000] BIOS-e820: [mem 0x0000000100000000-0x00000001afffffff] usable
May 22 20:08:07 riven kernel: [ 0.000000] NX (Execute Disable) protection: active
May 22 20:08:07 riven kernel: [ 0.000000] SMBIOS 2.4 present.
May 22 20:08:07 riven dhcpcd[274]: eth0: waiting for carrier
May 22 20:08:07 riven kernel: [ 0.000000] No AGP bridge found
May 22 20:08:07 riven kernel: [ 0.000000] e820: last_pfn = 0x1b0000 max_arch_pfn = 0x400000000
May 22 20:08:07 riven kernel: [ 0.000000] x86 PAT enabled: cpu 0, old 0x7040600070406, new 0x7010600070106
May 22 20:08:07 riven kernel: [ 0.000000] e820: last_pfn = 0xcfeb0 max_arch_pfn = 0x400000000
May 22 20:08:07 riven kernel: [ 0.000000] found SMP MP-table at [mem 0x000f58e0-0x000f58ef] mapped at [ffff8800000f58e0]
May 22 20:08:07 riven kernel: [ 0.000000] Scanning 1 areas for low memory corruption
May 22 20:08:07 riven kernel: [ 0.000000] init_memory_mapping: [mem 0x00000000-0x000fffff]
May 22 20:08:07 riven kernel: [ 0.000000] init_memory_mapping: [mem 0x1afe00000-0x1afffffff]
May 22 20:08:07 riven kernel: [ 0.000000] init_memory_mapping: [mem 0x1ac000000-0x1afdfffff]
May 22 20:08:07 riven kernel: [ 0.000000] init_memory_mapping: [mem 0x180000000-0x1abffffff]
May 22 20:08:07 riven kernel: [ 0.000000] init_memory_mapping: [mem 0x00100000-0xcfeaffff]
May 22 20:08:07 riven kernel: [ 0.000000] init_memory_mapping: [mem 0x100000000-0x17fffffff]
May 22 20:08:07 riven kernel: [ 0.000000] RAMDISK: [mem 0x37ae4000-0x37d69fff]
May 22 20:08:07 riven kernel: [ 0.000000] ACPI: RSDP 00000000000f7710 00014 (v00 GBT )
May 22 20:08:07 riven kernel: [ 0.000000] ACPI: RSDT 00000000cfee3040 00034 (v01 GBT GBTUACPI 42302E31 GBTU 01010101)
May 22 20:08:07 riven kernel: [ 0.000000] ACPI: FACP 00000000cfee30c0 00074 (v01 GBT GBTUACPI 42302E31 GBTU 01010101)
May 22 20:08:07 riven kernel: [ 0.000000] ACPI: DSDT 00000000cfee3180 03C78 (v01 GBT GBTUACPI 00001000 MSFT 0100000C)
May 22 20:08:07 riven kernel: [ 0.000000] ACPI: FACS 00000000cfeb0000 00040
May 22 20:08:07 riven kernel: [ 0.000000] ACPI: MCFG 00000000cfee6fc0 0003C (v01 GBT GBTUACPI 42302E31 GBTU 01010101)
May 22 20:08:07 riven kernel: [ 0.000000] ACPI: APIC 00000000cfee6e40 00084 (v01 GBT GBTUACPI 42302E31 GBTU 01010101)
May 22 20:08:07 riven kernel: [ 0.000000] ACPI: SSDT 00000000cfee79a0 003AB (v01 PmRef CpuPm 00003000 INTL 20040311)
May 22 20:08:07 riven kernel: [ 0.000000] No NUMA configuration found
May 22 20:08:07 riven kernel: [ 0.000000] Faking a node at [mem 0x0000000000000000-0x00000001afffffff]
May 22 20:08:07 riven kernel: [ 0.000000] Initmem setup node 0 [mem 0x00000000-0x1afffffff]
May 22 20:08:07 riven kernel: [ 0.000000] NODE_DATA [mem 0x1afff6000-0x1afffafff]
May 22 20:08:07 riven kernel: [ 0.000000] Zone ranges:
May 22 20:08:07 riven kernel: [ 0.000000] DMA [mem 0x00001000-0x00ffffff]
May 22 20:08:07 riven kernel: [ 0.000000] DMA32 [mem 0x01000000-0xffffffff]
May 22 20:08:07 riven kernel: [ 0.000000] Normal [mem 0x100000000-0x1afffffff]
May 22 20:08:07 riven kernel: [ 0.000000] Movable zone start for each node
May 22 20:08:07 riven kernel: [ 0.000000] Early memory node ranges
May 22 20:08:07 riven kernel: [ 0.000000] node 0: [mem 0x00001000-0x0009dfff]
May 22 20:08:07 riven kernel: [ 0.000000] node 0: [mem 0x00100000-0xcfeaffff]
May 22 20:08:07 riven kernel: [ 0.000000] node 0: [mem 0x100000000-0x1afffffff]
May 22 20:08:07 riven kernel: [ 0.000000] ACPI: PM-Timer IO Port: 0x408
May 22 20:08:07 riven kernel: [ 0.000000] ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
May 22 20:08:07 riven kernel: [ 0.000000] ACPI: LAPIC (acpi_id[0x01] lapic_id[0x03] enabled)
May 22 20:08:07 riven kernel: [ 0.000000] ACPI: LAPIC (acpi_id[0x02] lapic_id[0x02] enabled)
May 22 20:08:07 riven kernel: [ 0.000000] ACPI: LAPIC (acpi_id[0x03] lapic_id[0x01] enabled)
May 22 20:08:07 riven kernel: [ 0.000000] ACPI: LAPIC_NMI (acpi_id[0x00] dfl dfl lint[0x1])
May 22 20:08:07 riven kernel: [ 0.000000] ACPI: LAPIC_NMI (acpi_id[0x01] dfl dfl lint[0x1])
May 22 20:08:07 riven kernel: [ 0.000000] ACPI: LAPIC_NMI (acpi_id[0x02] dfl dfl lint[0x1])
May 22 20:08:07 riven kernel: [ 0.000000] ACPI: LAPIC_NMI (acpi_id[0x03] dfl dfl lint[0x1])
May 22 20:08:07 riven kernel: [ 0.000000] ACPI: IOAPIC (id[0x02] address[0xfec00000] gsi_base[0])
May 22 20:08:07 riven kernel: [ 0.000000] IOAPIC[0]: apic_id 2, version 32, address 0xfec00000, GSI 0-23
May 22 20:08:07 riven kernel: [ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
May 22 20:08:07 riven kernel: [ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
May 22 20:08:07 riven kernel: [ 0.000000] Using ACPI (MADT) for SMP configuration information
May 22 20:08:07 riven kernel: [ 0.000000] smpboot: Allowing 4 CPUs, 0 hotplug CPUs
May 22 20:08:07 riven kernel: [ 0.000000] PM: Registered nosave memory: 000000000009e000 - 00000000000a0000
May 22 20:08:07 riven kernel: [ 0.000000] PM: Registered nosave memory: 00000000000a0000 - 00000000000f0000
May 22 20:08:07 riven kernel: [ 0.000000] PM: Registered nosave memory: 00000000000f0000 - 0000000000100000
May 22 20:08:07 riven kernel: [ 0.000000] PM: Registered nosave memory: 00000000cfeb0000 - 00000000cfee3000
May 22 20:08:07 riven kernel: [ 0.000000] PM: Registered nosave memory: 00000000cfee3000 - 00000000cfef0000
May 22 20:08:07 riven kernel: [ 0.000000] PM: Registered nosave memory: 00000000cfef0000 - 00000000cff00000
May 22 20:08:07 riven kernel: [ 0.000000] PM: Registered nosave memory: 00000000cff00000 - 00000000e0000000
May 22 20:08:07 riven kernel: [ 0.000000] PM: Registered nosave memory: 00000000e0000000 - 00000000e4000000
May 22 20:08:07 riven kernel: [ 0.000000] PM: Registered nosave memory: 00000000e4000000 - 00000000fec00000
May 22 20:08:07 riven kernel: [ 0.000000] PM: Registered nosave memory: 00000000fec00000 - 0000000100000000
May 22 20:08:07 riven kernel: [ 0.000000] e820: [mem 0xe4000000-0xfebfffff] available for PCI devices
May 22 20:08:07 riven kernel: [ 0.000000] Booting paravirtualized kernel on bare hardware
May 22 20:08:07 riven kernel: [ 0.000000] setup_percpu: NR_CPUS:64 nr_cpumask_bits:64 nr_cpu_ids:4 nr_node_ids:1
May 22 20:08:07 riven kernel: [ 0.000000] PERCPU: Embedded 28 pages/cpu @ffff8801afc00000 s85824 r8192 d20672 u524288
May 22 20:08:07 riven kernel: [ 0.000000] Built 1 zonelists in Node order, mobility grouping on. Total pages: 1547837
May 22 20:08:07 riven kernel: [ 0.000000] Policy zone: Normal
May 22 20:08:07 riven kernel: [ 0.000000] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-linux root=/dev/mapper/riven-arch ro quiet
May 22 20:08:07 riven kernel: [ 0.000000] PID hash table entries: 4096 (order: 3, 32768 bytes)
May 22 20:08:07 riven kernel: [ 0.000000] __ex_table already sorted, skipping sort
May 22 20:08:07 riven kernel: [ 0.000000] Checking aperture...
May 22 20:08:07 riven kernel: [ 0.000000] No AGP bridge found
May 22 20:08:07 riven kernel: [ 0.000000] Memory: 6110480k/7077888k available (4983k kernel code, 788172k absent, 179236k reserved, 3967k data, 1092k init)
May 22 20:08:07 riven kernel: [ 0.000000] SLUB: Genslabs=15, HWalign=64, Order=0-3, MinObjects=0, CPUs=4, Nodes=1
May 22 20:08:07 riven kernel: [ 0.000000] Preemptible hierarchical RCU implementation.
May 22 20:08:07 riven kernel: [ 0.000000] RCU dyntick-idle grace-period acceleration is enabled.
May 22 20:08:07 riven kernel: [ 0.000000] Dump stacks of tasks blocking RCU-preempt GP.
* Re: Broken nilfs2 filesystem
From: Anton Eliasson @ 2013-05-22 20:36 UTC
To: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

Anton Eliasson wrote 2013-05-22 22:33:
> Greetings!
> It pains me to report that my /home filesystem broke down today. My
> system is running Arch Linux 64-bit. The filesystem resides on a
> Crucial M4 256 GB SSD, on top of an LVM2 volume. The drive and
> filesystem are both around six months old. Partition table and error
> log excerpts are at the bottom of this e-mail. Full logs are available
> upon request.
>
> I am providing this information as a bug report. I have no reason to
> suspect the hardware but I cannot exclude it either. If you (the
> developers) are interested in troubleshooting this for posterity, I
> can be your hands and run whatever tools are required. If not, I'll
> reformat the filesystem, restore the data from backup and forget that
> this happened.
>
> In case the formatting gets mangled, this e-mail is also available at

Right here: http://paste.debian.net/5841/

--
Best Regards,
Anton Eliasson
--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
* Re: Broken nilfs2 filesystem
From: Ryusuke Konishi @ 2013-05-23 1:40 UTC
To: Anton Eliasson; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

Hi,

On Wed, 22 May 2013 22:36:02 +0200, Anton Eliasson wrote:
> Anton Eliasson wrote 2013-05-22 22:33:
>> Greetings!
>> It pains me to report that my /home filesystem broke down today. My
>> system is running Arch Linux 64-bit. The filesystem resides on a
>> Crucial M4 256 GB SSD, on top of an LVM2 volume. The drive and
>> filesystem are both around six months old. Partition table and error
>> log excerpts are at the bottom of this e-mail. Full logs are available
>> upon request.
>>
>> I am providing this information as a bug report. I have no reason to
>> suspect the hardware but I cannot exclude it either. If you (the
>> developers) are interested in troubleshooting this for posterity, I
>> can be your hands and run whatever tools are required. If not, I'll
>> reformat the filesystem, restore the data from backup and forget that
>> this happened.
>>
>> In case the formatting gets mangled, this e-mail is also available at
> Right here: http://paste.debian.net/5841/

Thank you for the report.

According to the log, the btree of a regular file was destroyed for some
reason. I think we should look into how the btree block was broken.

Could you try the following command to inspect the broken disk segment?

  $ sudo dd if=/dev/dm-3 bs=4k count=2048 skip=14350336 iflag=direct 2>/dev/null | hexdump -C

This will print out the blocks of segment 7007, which includes the
broken btree block.

The following commands are also useful for getting debug information.
Could you try them, too?

  $ sudo nilfs-tune -l /dev/dm-3
  $ sudo dumpseg /dev/dm-3 7007
  $ lssu -a /dev/dm-3

The third command requires the device to be mounted, so /home should be
mounted beforehand with the ro and norecovery options:

  $ sudo mount -t nilfs2 -o ro,norecovery /dev/dm-3 /home

With regards,
Ryusuke Konishi

> --
> Best Regards,
> Anton Eliasson
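A note on the numbers in the dd command above: the skip value is consistent with the segment number multiplied by the number of blocks per segment (7007 x 2048 = 14350336), reading bs=4k as the block size and count=2048 as the blocks per segment. Both values are taken from the command line in this thread, not confirmed from the superblock; the output of nilfs-tune -l should show the real ones. The sketch below only illustrates that arithmetic. The function names are hypothetical and not part of nilfs-utils, and it assumes segments are laid out back to back from the start of the device, which matches the numbers quoted here.

```shell
#!/bin/sh
# Hypothetical helpers, not part of nilfs-utils.

# First block of a segment, assuming segments are laid out back to back
# from block 0 (an assumption that matches the numbers in this thread).
segment_start_block() {
    echo $(( $1 * $2 ))   # segment number * blocks per segment
}

# Print (without running it) a dd command that would dump one whole segment.
# Arguments: device, segment number, blocks per segment, block size.
segment_dump_cmd() {
    printf 'dd if=%s bs=%s count=%s skip=%s iflag=direct\n' \
        "$1" "$4" "$3" "$(segment_start_block "$2" "$3")"
}

segment_start_block 7007 2048            # prints 14350336
segment_dump_cmd /dev/dm-3 7007 2048 4k  # prints the dd command line
```

Printing the command instead of executing it keeps the sketch harmless; the printed line can be checked against the blocks-per-segment value from nilfs-tune before it is run with sudo.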
* Re: Broken nilfs2 filesystem
From: Vyacheslav Dubeyko @ 2013-05-23 6:44 UTC
To: Anton Eliasson; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

Hi Anton,

On Wed, 2013-05-22 at 22:33 +0200, Anton Eliasson wrote:
> Greetings!
> It pains me to report that my /home filesystem broke down today. My
> system is running Arch Linux 64-bit. The filesystem resides on a Crucial
> M4 256 GB SSD, on top of an LVM2 volume. The drive and filesystem are
> both around six months old. Partition table and error log excerpts are
> at the bottom of this e-mail. Full logs are available upon request.
>
> I am providing this information as a bug report. I have no reason to
> suspect the hardware but I cannot exclude it either. If you (the
> developers) are interested in troubleshooting this for posterity, I can
> be your hands and run whatever tools are required. If not, I'll reformat
> the filesystem, restore the data from backup and forget that this happened.
>
> In case the formatting gets mangled, this e-mail is also available at
> What happened today, in chronological order:
>
> ~18:00
> ======
> I am troubleshooting some issues that turn out to be caused by a wrongly
> configured system clock. The RTC (hardware clock) is set to local time
> (UTC+2) but the OS is configured to treat the RTC as UTC. This is
> because it was set to UTC previously, but then I reinstalled Windows
> which promptly reset it to local time.
>
> This set the mtime of some files in both / and /home to dates in the
> future. When I discovered this, I `touch`ed all affected files (`touch
> now; sudo find / /home -xdev -newer now -exec touch {} \;`) to reset
> their mtime and rebooted the system. I do not know if this is relevant;
> if not, it makes reading the log files more fun.
>
> I then launch my command line backup program "bup", Firefox and some
> other apps.
>
> ~18:50-19:00
> ============
> Firefox freezes. The system keeps running but I can't launch new
> programs. It looked like all I/O broke down. However, bup kept running.
> I left the computer alone for perhaps 30-60 min.
>

So, as I understand it, the reproduction path is:
(1) set the mtime of some files to a date in the future;
(2) touch all affected files;
(3) reboot the system;
(4) launch the backup program "bup", Firefox and some other apps.

I think it makes sense to try this reproduction path. However, other
users have reported an issue with similar symptoms
(nilfs_bmap_lookup_contig: broken bmap) for the case of a 4 KB block
size. Unfortunately, I could not reproduce that issue with a 4 KB block
size earlier. I feel that a clear reproduction path is crucial for this
issue.

I understand that it can be hard to reproduce the issue again. But do
you have the opportunity to try to reproduce it on another NILFS2
partition on your side?

In any case, I am going to try this reproduction path on my side.

> ~20:00
> ======
> When I came back, bup had frozen (/var/log/messages at 18:53:31).[1] I
> restart X by pressing Alt+SysRq+K (/var/log/messages at 20:06:33) and
> return to the login screen. The system freezes during login though
> (probably because /home had been mounted read only). So I reboot
> using Alt+SysRq+REISUB (/var/log/messages at 20:07:05). I noticed some
> I/O errors during shutdown.
>
> After the reboot there are no immediate signs of disaster. I launch bup
> again. Some time later, /home remounts as read only. I notice that bup
> has reported I/O errors while reading some files in /home.[2] dmesg and
> /var/log/kern.log contain errors mentioning "bad btree node" and
> "nilfs_bmap_lookup_contig: broken bmap".[3]
>

We now have a patch that overcomes the system freeze after such an issue:
http://www.mail-archive.com/linux-nilfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org/msg01614.html.

With the best regards,
Vyacheslav Dubeyko.
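Steps (1) and (2) of the reproduction path can be tried on a scratch directory instead of / and /home. The sketch below is only an illustration under stated assumptions: it relies on GNU coreutils and findutils (touch -d with a relative date, mktemp, find -newer), the function names are made up for this note, and unlike the original one-liner it restricts itself to regular files.

```shell
#!/bin/sh
# Hypothetical helpers; assumes GNU touch, mktemp and find.

# Step (1): create a file (or update an existing one) with an mtime
# two hours ahead of the current time.
make_future_file() {
    touch -d '+2 hours' "$1"
}

# Print regular files under DIR whose mtime is later than the current time.
# A fresh temporary file serves as the "now" reference, in the same way as
# `touch now; find ... -newer now` in the original report.
list_future_files() {
    ref=$(mktemp) || return 1
    find "$1" -xdev -type f -newer "$ref" -print
    rm -f "$ref"
}

# Step (2): reset the mtime of those files to the current time.
reset_future_files() {
    ref=$(mktemp) || return 1
    find "$1" -xdev -type f -newer "$ref" -exec touch {} +
    rm -f "$ref"
}
```

Running list_future_files on the test directory before and after reset_future_files shows whether step (2) caught every file; steps (3) and (4) still have to be done by hand.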
* Re: Broken nilfs2 filesystem
From: Anton Eliasson @ 2013-05-25 11:59 UTC
To: slava-yeENwD64cLxBDgjK7y7TUQ; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

Vyacheslav Dubeyko wrote 2013-05-23 08:44:
> Hi Anton,
>
> On Wed, 2013-05-22 at 22:33 +0200, Anton Eliasson wrote:
>> Greetings!
>> It pains me to report that my /home filesystem broke down today. My
>> system is running Arch Linux 64-bit. The filesystem resides on a Crucial
>> M4 256 GB SSD, on top of an LVM2 volume. The drive and filesystem are
>> both around six months old. Partition table and error log excerpts are
>> at the bottom of this e-mail. Full logs are available upon request.
>>
>> I am providing this information as a bug report. I have no reason to
>> suspect the hardware but I cannot exclude it either. If you (the
>> developers) are interested in troubleshooting this for posterity, I can
>> be your hands and run whatever tools are required. If not, I'll reformat
>> the filesystem, restore the data from backup and forget that this happened.
>>
>> In case the formatting gets mangled, this e-mail is also available at
>> What happened today, in chronological order:
>>
>> ~18:00
>> ======
>> I am troubleshooting some issues that turn out to be caused by a wrongly
>> configured system clock. The RTC (hardware clock) is set to local time
>> (UTC+2) but the OS is configured to treat the RTC as UTC. This is
>> because it was set to UTC previously, but then I reinstalled Windows
>> which promptly reset it to local time.
>>
>> This set the mtime of some files in both / and /home to dates in the
>> future. When I discovered this, I `touch`ed all affected files (`touch
>> now; sudo find / /home -xdev -newer now -exec touch {} \;`) to reset
>> their mtime and rebooted the system. I do not know if this is relevant;
>> if not, it makes reading the log files more fun.
>>
>> I then launch my command line backup program "bup", Firefox and some
>> other apps.
>>
>> ~18:50-19:00
>> ============
>> Firefox freezes. The system keeps running but I can't launch new
>> programs. It looked like all I/O broke down. However, bup kept running.
>> I left the computer alone for perhaps 30-60 min.
>>
> So, as I understand it, the reproduction path is:
> (1) set the mtime of some files to a date in the future;
> (2) touch all affected files;
> (3) reboot the system;
> (4) launch the backup program "bup", Firefox and some other apps.

That about sums up what I did, yes. While debugging the clock problems I
rebooted more than once in a short time period.

> I think it makes sense to try this reproduction path. However, other
> users have reported an issue with similar symptoms
> (nilfs_bmap_lookup_contig: broken bmap) for the case of a 4 KB block
> size. Unfortunately, I could not reproduce that issue with a 4 KB block
> size earlier. I feel that a clear reproduction path is crucial for this
> issue.
>
> I understand that it can be hard to reproduce the issue again. But do
> you have the opportunity to try to reproduce it on another NILFS2
> partition on your side?
>
> In any case, I am going to try this reproduction path on my side.

I have created a new nilfs filesystem about the same size as the old one
on another drive and restored /home to it. If I find the time this
weekend, I'll give it the same treatment.

>> ~20:00
>> ======
>> When I came back, bup had frozen (/var/log/messages at 18:53:31).[1] I
>> restart X by pressing Alt+SysRq+K (/var/log/messages at 20:06:33) and
>> return to the login screen. The system freezes during login though
>> (probably because /home had been mounted read only). So I reboot
>> using Alt+SysRq+REISUB (/var/log/messages at 20:07:05). I noticed some
>> I/O errors during shutdown.
>>
>> After the reboot there are no immediate signs of disaster. I launch bup
>> again. Some time later, /home remounts as read only. I notice that bup
>> has reported I/O errors while reading some files in /home.[2] dmesg and
>> /var/log/kern.log contain errors mentioning "bad btree node" and
>> "nilfs_bmap_lookup_contig: broken bmap".[3]
>>
> We now have a patch that overcomes the system freeze after such an issue:
> http://www.mail-archive.com/linux-nilfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org/msg01614.html.

That is good. I shall await the next release with great anticipation.

> With the best regards,
> Vyacheslav Dubeyko.

--
Best Regards,
Anton Eliasson
* Re: Broken nilfs2 filesystem
From: Anton Eliasson @ 2013-05-25 16:26 UTC
To: slava-yeENwD64cLxBDgjK7y7TUQ; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

Anton Eliasson wrote 2013-05-25 13:59:
[...]
>>> ~20:00
>>> ======
>>> When I came back, bup had frozen (/var/log/messages at 18:53:31).[1] I
>>> restart X by pressing Alt+SysRq+K (/var/log/messages at 20:06:33) and
>>> return to the login screen. The system freezes during login though
>>> (probably because /home had been mounted read only). So I reboot
>>> using Alt+SysRq+REISUB (/var/log/messages at 20:07:05). I noticed some
>>> I/O errors during shutdown.
>>>
>>> After the reboot there are no immediate signs of disaster. I launch bup
>>> again. Some time later, /home remounts as read only. I notice that bup
>>> has reported I/O errors while reading some files in /home.[2] dmesg and
>>> /var/log/kern.log contain errors mentioning "bad btree node" and
>>> "nilfs_bmap_lookup_contig: broken bmap".[3]
>>>
>> We now have a patch that overcomes the system freeze after such an issue:
>> http://www.mail-archive.com/linux-nilfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org/msg01614.html.
> That is good. I shall await the next release with great anticipation.

I don't think the bug described in the patch you linked to is responsible
for my crashes.
Check this out:

May 25 17:15:12 riven kernel: [ 1165.629786] /dev/vmnet: port on hub 0 successfully opened
May 25 17:15:15 riven kernel: [ 1168.871258] /dev/vmnet: open called by PID 2073 (vmx-vcpu-0)
May 25 17:15:15 riven kernel: [ 1168.871281] /dev/vmnet: port on hub 0 successfully opened
May 25 17:15:34 riven kernel: [ 1187.572676] /dev/vmnet: open called by PID 2075 (vmx-vcpu-1)
May 25 17:15:34 riven kernel: [ 1187.572693] /dev/vmnet: port on hub 0 successfully opened
May 25 17:15:38 riven kernel: [ 1192.188770] BUG: unable to handle kernel NULL pointer dereference at 0000000000000b95
May 25 17:15:38 riven kernel: [ 1192.188781] IP: [<ffffffffa03021a2>] nilfs_end_page_io+0x12/0xc0 [nilfs2]
May 25 17:15:38 riven kernel: [ 1192.188798] PGD 1982f8067 PUD 198e2b067 PMD 0
May 25 17:15:38 riven kernel: [ 1192.188803] Oops: 0000 [#1] PREEMPT SMP
May 25 17:15:38 riven kernel: [ 1192.188809] Modules linked in: nfsv3 nfs_acl vmnet(O) ppdev parport_pc parport fuse vsock vmci(O) vmmon(O) ext4 crc16 mbcache jbd2 nvidia(PO) gpio_ich iTCO_wdt iTCO_vendor_support coretemp kvm_intel kvm snd_hda_codec_realtek pcspkr psmouse microcode serio_raw i2c_i801 snd_hda_intel lpc_ich snd_hda_codec drm evdev r8169 snd_hwdep snd_pcm i2c_core snd_page_alloc mii acpi_cpufreq snd_timer intel_agp mperf intel_gtt snd soundcore button processor loop nfs lockd sunrpc fscache nilfs2 dm_mod sd_mod sr_mod cdrom ata_generic pata_acpi hid_generic usbhid hid ahci libahci pata_it8213 libata firewire_ohci scsi_mod firewire_core crc_itu_t ehci_pci uhci_hcd ehci_hcd usbcore usb_common
May 25 17:15:38 riven kernel: [ 1192.188877] CPU 1
May 25 17:15:38 riven kernel: [ 1192.188883] Pid: 262, comm: nilfs_cleanerd Tainted: P O 3.9.3-1-ARCH #1 Gigabyte Technology Co., Ltd. EP45-DS4/EP45-DS4
May 25 17:15:38 riven kernel: [ 1192.188888] RIP: 0010:[<ffffffffa03021a2>] [<ffffffffa03021a2>] nilfs_end_page_io+0x12/0xc0 [nilfs2]
May 25 17:15:38 riven kernel: [ 1192.188897] RSP: 0018:ffff880195afdb30 EFLAGS: 00010206
May 25 17:15:38 riven kernel: [ 1192.188900] RAX: ffff8801a25e7d48 RBX: 0000000000000b95 RCX: 0000000000000034
May 25 17:15:38 riven kernel: [ 1192.188903] RDX: 000000000000000d RSI: 0000000000000000 RDI: 0000000000000b95
May 25 17:15:38 riven kernel: [ 1192.188906] RBP: ffff880195afdb38 R08: a200000000000000 R09: a800028051000000
May 25 17:15:38 riven kernel: [ 1192.188908] R10: 57ffe77fafa01440 R11: 0000000000000019 R12: ffff8801988b2648
May 25 17:15:38 riven kernel: [ 1192.188911] R13: ffff8801a25e7d00 R14: ffffea00000d04c0 R15: ffffea0000a01180
May 25 17:15:38 riven kernel: [ 1192.188914] FS: 00007f8bf81f3740(0000) GS:ffff8801afc80000(0000) knlGS:0000000000000000
May 25 17:15:38 riven kernel: [ 1192.188917] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
May 25 17:15:38 riven kernel: [ 1192.188920] CR2: 0000000000000b95 CR3: 00000001959eb000 CR4: 00000000000007e0
May 25 17:15:38 riven kernel: [ 1192.188923] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
May 25 17:15:38 riven kernel: [ 1192.188925] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
May 25 17:15:38 riven kernel: [ 1192.188928] Process nilfs_cleanerd (pid: 262, threadinfo ffff880195afc000, task ffff880195f2c300)
May 25 17:15:38 riven kernel: [ 1192.188930] Stack:
May 25 17:15:38 riven kernel: [ 1192.188932] ffff8801988b25a0 ffff880195afdc20 ffffffffa0303ed5 ffffea0002dfb7c0
May 25 17:15:38 riven kernel: [ 1192.188938] ffff880195f2c300 ffff880195f2c300 ffff880195f2c300 ffff8801a56b8a70
May 25 17:15:38 riven kernel: [ 1192.188942] ffff8801a49d0b60 ffff8801a49d0a00 0000000102dfb7c0 ffff8801a56b8a60
May 25 17:15:38 riven kernel: [ 1192.188947] Call Trace:
May 25 17:15:38 riven kernel: [ 1192.188959] [<ffffffffa0303ed5>] nilfs_segctor_do_construct+0xd65/0x1ab0 [nilfs2]
May 25 17:15:38 riven kernel: [ 1192.188969] [<ffffffffa0304e42>] nilfs_segctor_construct+0x172/0x290 [nilfs2]
May 25 17:15:38 riven kernel: [ 1192.188978] [<ffffffffa0305ead>] nilfs_clean_segments+0xed/0x270 [nilfs2]
May 25 17:15:38 riven kernel: [ 1192.188985] [<ffffffff811bc4bc>] ? __set_page_dirty+0x6c/0xc0
May 25 17:15:38 riven kernel: [ 1192.188994] [<ffffffffa030c06f>] nilfs_ioctl_clean_segments.isra.14+0x4bf/0x740 [nilfs2]
May 25 17:15:38 riven kernel: [ 1192.189003] [<ffffffffa02fca8d>] ? nilfs_btree_lookup+0x4d/0x70 [nilfs2]
May 25 17:15:38 riven kernel: [ 1192.189012] [<ffffffffa030c70c>] nilfs_ioctl+0x21c/0x740 [nilfs2]
May 25 17:15:38 riven kernel: [ 1192.189018] [<ffffffff8119cf65>] do_vfs_ioctl+0x2e5/0x4d0
May 25 17:15:38 riven kernel: [ 1192.189025] [<ffffffff81152930>] ? do_munmap+0x2b0/0x3e0
May 25 17:15:38 riven kernel: [ 1192.189029] [<ffffffff8119d1d1>] sys_ioctl+0x81/0xa0
May 25 17:15:38 riven kernel: [ 1192.189036] [<ffffffff814d3769>] ? do_device_not_available+0x19/0x20
May 25 17:15:38 riven kernel: [ 1192.189042] [<ffffffff814d9e9d>] system_call_fastpath+0x1a/0x1f
May 25 17:15:38 riven kernel: [ 1192.189044] Code: ff ff ff 48 81 c4 88 00 00 00 5b 41 5c 5d c3 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 55 48 85 ff 48 89 e5 53 48 89 fb 74 4e <48> 8b 07 f6 c4 08 0f 84 8c 00 00 00 48 8b 47 30 48 8b 00 f6 c4
May 25 17:15:38 riven kernel: [ 1192.189089] RIP [<ffffffffa03021a2>] nilfs_end_page_io+0x12/0xc0 [nilfs2]
May 25 17:15:38 riven kernel: [ 1192.189098] RSP <ffff880195afdb30>
May 25 17:15:38 riven kernel: [ 1192.189100] CR2: 0000000000000b95
May 25 17:15:38 riven kernel: [ 1192.189104] ---[ end trace 0c7496171e3b9dfd ]---
May 25 18:03:02 riven kernel: [ 0.000000] Initializing cgroup subsys cpuset
May 25 18:03:02 riven kernel: [ 0.000000] Initializing cgroup subsys cpu
May 25 18:03:02 riven kernel: [ 0.000000] Linux version 3.9.3-1-ARCH (tobias@T-POWA-LX) (gcc version 4.8.0 20130502 (prerelease) (GCC) ) #1 SMP PREEMPT Sun May 19 22:50:29 CEST 2013
May 25 18:03:02 riven kernel: [ 0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-linux root=/dev/mapper/riven-arch ro quiet

No remounts, just a kernel oops. I can reproduce this without fail by
booting a VMware Workstation (9.0.2) virtual machine that resides on the
nilfs /home volume while another virtual machine is doing something
IO-intensive.

More specifically, I have a virtual machine running Windows XP in /home,
a nilfs filesystem, and a virtual machine running Windows 7 in
/Supplement. /Supplement is an ext4 volume in the same LVM volume group
as /home on the same slow hard drive. I can crash the host by either:

* Starting both machines at the same time.
* Starting the W7 machine first and when it is fully booted to the
  desktop, but still doing I/O intensive Windows stuff, starting the WXP
  machine.

If I first start the WXP machine and let it boot to the desktop, at the
point where it is actually I/O idle, I can safely start the W7 machine.
After that I found no trouble installing software updates and logging in
and out of both machines at the same time, though the HDD made it very
slow of course.

After the host had crashed, I could still list and read files in /home
but as soon as I attempted to `touch` a file, that terminal froze. Any
terminal that attempted to read a file after that point froze as well
and there was nothing left to do but to Alt+SysRq+B.

--
Best Regards,
Anton Eliasson
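In the oops quoted in this message, the faulting address (0000000000000b95) is the same value that appears in RBX, RDI and CR2, and the "IP:" line names nilfs_end_page_io. When a log runs to millions of lines, it can help to pull out just those two facts. The sketch below is an illustration only: the function name oops_summary is made up for this note, and it assumes the 3.9-era x86 oops layout shown above (a "NULL pointer dereference at" line followed by an "IP: [<addr>] func+off/size" line).

```shell
#!/bin/sh
# Hypothetical helper: summarize an x86 "NULL pointer dereference" oops
# read from stdin. Prints nothing if no such oops is found; if several
# are present, only the last one is reported.
oops_summary() {
    awk '
        # The BUG line ends with the faulting address (hex, no 0x prefix).
        /NULL pointer dereference at/ { addr = $NF }

        # The "IP:" line names the faulting function as func+offset/size.
        / IP: \[</ {
            for (i = 1; i <= NF; i++) {
                if ($i ~ /\+0x[0-9a-f]+\/0x[0-9a-f]+$/) {
                    fn = $i
                    sub(/\+0x.*$/, "", fn)
                }
            }
        }

        END {
            if (addr != "")
                printf "fault_addr=0x%s function=%s\n", addr, fn
        }
    '
}
```

It can be fed from a saved log, for example `oops_summary < /var/log/kernel.log`, or from `dmesg | oops_summary` on a system that is still responsive.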
* Re: Broken nilfs2 filesystem
From: Vyacheslav Dubeyko @ 2013-05-26 12:54 UTC
To: Anton Eliasson; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

Hi Anton,

On May 25, 2013, at 8:26 PM, Anton Eliasson wrote:

> Anton Eliasson wrote 2013-05-25 13:59:
> [...]
>>>> ~20:00
>>>> ======
>>>> When I came back, bup had frozen (/var/log/messages at 18:53:31).[1] I
>>>> restart X by pressing Alt+SysRq+K (/var/log/messages at 20:06:33) and
>>>> return to the login screen. The system freezes during login though
>>>> (probably because /home had been mounted read only). So I reboot
>>>> using Alt+SysRq+REISUB (/var/log/messages at 20:07:05). I noticed some
>>>> I/O errors during shutdown.
>>>>
>>>> After the reboot there are no immediate signs of disaster. I launch bup
>>>> again. Some time later, /home remounts as read only. I notice that bup
>>>> has reported I/O errors while reading some files in /home.[2] dmesg and
>>>> /var/log/kern.log contain errors mentioning "bad btree node" and
>>>> "nilfs_bmap_lookup_contig: broken bmap".[3]
>>>>
>>> We now have a patch that overcomes the system freeze after such an issue:
>>> http://www.mail-archive.com/linux-nilfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org/msg01614.html.
>> That is good. I shall await the next release with great anticipation.
> I don't think the bug described in the patch you linked to is
> responsible for my crashes. Check this out:
>

I didn't say that this patch solves your issue. I meant that, when the
issue occurs, the NILFS2 driver still has dirty pages after the remount
in RO mode. The patch stops the kernel flush thread from trying endlessly
to flush these dirty pages. As I understand it, those endless flush
attempts can freeze the system.

[snip]
> No remounts, just a kernel oops. I can reproduce this without fail by
> booting a VMware Workstation (9.0.2) virtual machine that resides on
> the nilfs /home volume while another virtual machine is doing something
> IO-intensive.
>

Sorry, I am slightly confused by the different descriptions of the issue
in your e-mails. Initially, I understood that your main issue was the
remount in RO mode. But now you are talking about a crash without a
remount. Could you share the full system log that you have for the
failure? I need to understand the sequence of events. Maybe you have two
issues instead of one. Currently, I don't have a clear picture of the
environment in which the issue occurs.

Thanks,
Vyacheslav Dubeyko.

> More specifically, I have a virtual machine running Windows XP in
> /home, a nilfs filesystem, and a virtual machine running Windows 7 in
> /Supplement. /Supplement is an ext4 volume in the same LVM volume group
> as /home on the same slow hard drive. I can crash the host by either:
>
> * Starting both machines at the same time.
> * Starting the W7 machine first and when it is fully booted to the
>   desktop, but still doing I/O intensive Windows stuff, starting the
>   WXP machine.
>
> If I first start the WXP machine and let it boot to the desktop, at the
> point where it is actually I/O idle, I can safely start the W7 machine.
> After that I found no trouble installing software updates and logging
> in and out of both machines at the same time, though the HDD made it
> very slow of course.
>
> After the host had crashed, I could still list and read files in /home
> but as soon as I attempted to `touch` a file, that terminal froze. Any
> terminal that attempted to read a file after that point froze as well
> and there was nothing left to do but to Alt+SysRq+B.
>
> --
> Best Regards,
> Anton Eliasson
* Re: Broken nilfs2 filesystem [not found] ` <51A0E62D.5060600-17Olwe7vw2dLC78zk6coLg@public.gmane.org> 2013-05-26 12:54 ` Vyacheslav Dubeyko @ 2013-05-29 6:39 ` Vyacheslav Dubeyko 2013-05-29 14:37 ` Ryusuke Konishi 2013-05-30 8:10 ` Anton Eliasson 1 sibling, 2 replies; 37+ messages in thread From: Vyacheslav Dubeyko @ 2013-05-29 6:39 UTC (permalink / raw) To: Anton Eliasson; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

Hi Anton

On Sat, 2013-05-25 at 18:26 +0200, Anton Eliasson wrote:

[snip]
> More specifically, I have a virtual machine running Windows XP in /home,
> a nilfs filesystem, and a virtual machine running Windows 7 in
> /Supplement. /Supplement is an ext4 volume in the same LVM volume group
> as /home on the same slow hard drive. I can crash the host by either:
>
> * Starting both machines at the same time.
> * Starting the W7 machine first and when it is fully booted to the
> desktop, but still doing I/O intensive Windows stuff, starting the WXP
> machine.
>
> If I first start the WXP machine and let it boot to the desktop, at the
> point where it is actually I/O idle, I can safely start the W7 machine.
> After that I found no trouble installing software updates and logging in
> and out of both machines at the same time, though the HDD made it very
> slow of course.

Currently, I am thinking about the reproduction path. It is really important to have a clear reproduction path, but I don't have a clear picture of your environment yet. As I understand, you have two VMware virtual machines (Win XP and Win 7). Am I correct?

Moreover, I am thinking about the fact that virtual machines on different volumes influence each other in the issue environment. Currently, I don't have a clear understanding of this.
> /etc/fstab
> ----------
> tmpfs                        /tmp         tmpfs   nodev,nosuid        0 0
> /dev/mapper/riven-arch       /            nilfs2  rw,noatime,discard  0 0
> /dev/mapper/riven-home       /home        nilfs2  rw,noatime,discard  0 0
> /dev/mapper/riven-swap       none         swap    defaults            0 0
> /dev/riven-proto/supplement  /Supplement  ext4    defaults,noatime    0 0
> # some NFS mounts excluded
>

As I can see, riven-arch, riven-home and riven-swap are under the device mapper but riven-proto is not. Could you share more details about how your Logical Volumes environment was prepared?

In its current state, fsck.nilfs2 doesn't give many useful details. But the debug output of fsck.nilfs2 contains detailed info about the first superblock, the second superblock and the segment summaries of all segments. I think this output can give me a better understanding of the NILFS2 volume state. Could you share the debug output of fsck.nilfs2 with me?

You can find an archive with the fsck.nilfs2 source code here: (http://dubeyko.com/development/FileSystems/NILFS/nilfs-utils-fsck-v.0.04-under-development.tar.gz). Please build fsck.nilfs2 but don't install it. fsck.nilfs2 is at an initial stage of development. Currently, it doesn't perform any write operations. So you can execute the command this way: "fsck.nilfs2 -v debug [device] 2> [output-file]". The output file is usually big.

I am preparing a patch for the NILFS2 driver with debug output. I think it makes sense to get more details about the issue on your side because you can reproduce the issue reliably. So I'll send you this patch as soon as it is ready. Do you have the opportunity to patch your kernel and share the debug output for the reproduced issue case?

Thanks,
Vyacheslav Dubeyko.

^ permalink raw reply [flat|nested] 37+ messages in thread
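The fsck.nilfs2 run described above can be scripted so that the large debug output goes straight to a file. This is only a minimal Python sketch. It assumes the tool was built in the current directory as ./fsck.nilfs2 and that "-v debug" behaves as described in the message; the helper names, the default binary path and the example device are illustrative, not part of nilfs-utils.

```python
import subprocess


def build_fsck_command(device, binary="./fsck.nilfs2"):
    """Return the argv for a debug run; '-v debug' is taken from the message."""
    return [binary, "-v", "debug", device]


def run_fsck_debug(device, output_file, binary="./fsck.nilfs2"):
    """Run the tool and send stderr to output_file (the shell's '2>')."""
    with open(output_file, "wb") as log:
        # check=False: keep the log even if the tool exits with an error.
        result = subprocess.run(build_fsck_command(device, binary),
                                stderr=log, check=False)
    return result.returncode


# Hypothetical device name, shown only to illustrate the resulting command.
print(" ".join(build_fsck_command("/dev/mapper/riven-home")))
```

Calling run_fsck_debug("/dev/mapper/riven-home", "fsck-debug.log") would then correspond to the quoted command line, with the file name chosen freely.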
* Re: Broken nilfs2 filesystem 2013-05-29 6:39 ` Vyacheslav Dubeyko @ 2013-05-29 14:37 ` Ryusuke Konishi [not found] ` <20130529.233757.27789741.konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org> 2013-05-30 8:10 ` Anton Eliasson 1 sibling, 1 reply; 37+ messages in thread From: Ryusuke Konishi @ 2013-05-29 14:37 UTC (permalink / raw) To: Vyacheslav Dubeyko; +Cc: Anton Eliasson, linux-nilfs-u79uwXL29TY76Z2rM5mHXA I don't know whether this may be a hint of this trouble, but according to the system log, page_buffers() of nilfs_end_page_io() seems to hit an Oops due to an invalid page address "0x36cd": May 22 18:53:31 riven kernel: [ 3821.605568] BUG: unable to handle kernel paging request at 00000000000036cd May 22 18:53:31 riven kernel: [ 3821.605577] IP: [<ffffffffa027f1a2>] nilfs_end_page_io+0x12/0xc0 [nilfs2] May 22 18:53:31 riven kernel: [ 3821.605591] PGD 19636d067 PUD 19636e067 PMD 0 May 22 18:53:31 riven kernel: [ 3821.605597] Oops: 0000 [#1] PREEMPT SMP <snip> May 22 18:53:31 riven kernel: [ 3821.605829] Code: ff ff ff 48 81 c4 88 00 00 00 5b 41 5c 5d c3 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 55 48 85 ff 48 89 e5 53 48 89 fb 74 4e <48> 8b 07 f6 c4 08 0f 84 8c 00 00 00 48 8b 47 30 48 8b 00 f6 c4 May 22 18:53:31 riven kernel: [ 3821.605873] RIP [<ffffffffa027f1a2>] nilfs_end_page_io+0x12/0xc0 [nilfs2] May 22 18:53:31 riven kernel: [ 3821.605881] RSP <ffff8801960f7b30> May 22 18:53:31 riven kernel: [ 3821.605884] CR2: 00000000000036cd where the instruction sequence of "<48> 8b 07 f6 c4 08" is "mov (%rdi),%rax; test $0x8, %ah", and corresponds to the part testing PagePrivate(page) in page_buffers() macro called within nilfs_end_page_io() routine: if (buffer_nilfs_node(page_buffers(page)) && !PageWriteback(page)) { This cannot happen, but there may be something we missed. 
Regards, Ryusuke Konishi On Wed, 29 May 2013 10:39:33 +0400, Vyacheslav Dubeyko wrote: > Hi Anton > > On Sat, 2013-05-25 at 18:26 +0200, Anton Eliasson wrote: > > [snip] >> More specifically, I have a virtual machine running Windows XP in /home, >> a nilfs filesystem, and a virtual machine running Windows 7 in >> /Supplement. /Supplement is an ext4 volume in the same LVM volume group >> as /home on the same slow hard drive. I can crash the host by either: >> >> * Starting both machines at the same time. >> * Starting the W7 machine first and when it is fully booted to the >> desktop, but still doing I/O intensive Windows stuff, starting the WXP >> machine. >> >> If I first start the WXP machine and let it boot to the desktop, at the >> point where it is actually I/O idle, I can safely start the W7 machine. >> After that I found no trouble installing software updates and logging in >> and out of both machines at the same time, though the HDD made it very >> slow of course. > > Currently, I am thinking about reproducing path. It is really important > to have clear reproducing path. But I haven't clear picture of your > environment yet. As I understand, you have two virtual VmWare machine > (Win XP and Win 7). Am I correct? > > Moreover, I am thinking about the fact that virtual machine on different > volumes influence on each other in the issue environment. Currently, I > haven't clear understanding of this. > >> /etc/fstab >> ---------- >> tmpfs /tmp tmpfs nodev,nosuid 0 0 >> /dev/mapper/riven-arch / nilfs2 rw,noatime,discard 0 0 >> /dev/mapper/riven-home /home nilfs2 rw,noatime,discard 0 0 >> /dev/mapper/riven-swap none swap defaults >> 0 0 >> /dev/riven-proto/supplement /Supplement ext4 defaults,noatime 0 0 >> # some NFS mounts excluded >> > > As I can see, riven-arch, riven-home and riven-swap are under device > mapper but riven-proto is not. Could you share more details about how > your Logical Volumes environment was prepared? 
> > Current state of fsck.nilfs2 doesn't give many useful details. But debug > output of fsck.nilfs2 contains detailed info about first superblock, > second superblock and segment summaries of all segments. I think that > this output can give to me more understanding about NILFS2 volume state. > Could you share debug output of fsck.nilfs2 for me? > > You can found archive with fsck.nilf2 source code in this place: > (http://dubeyko.com/development/FileSystems/NILFS/nilfs-utils-fsck-v.0.04-under-development.tar.gz). Please, build fsck.nilfs2 but don't install it. The fsck.nilfs2 on the initial state of development. Currently, fsck.nilfs2 doesn't make any writing operations. So, you can execute command in such way: "fsck.nilfs2 -v debug [device] 2> [output-file]". The output file has a big size, usually. > > I am preparing patch for NILFS2 driver with debug output. I think that > it makes sense to get more detail about the issue on your side because > you can reproduce the issue stably. So, I'll send you this patch as it > will be ready. Have you opportunity to patch your kernel and share debug > output for the reproduced issue case? > > Thanks, > Vyacheslav Dubeyko. > > > -- > To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in > the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: Broken nilfs2 filesystem [not found] ` <20130529.233757.27789741.konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org> @ 2013-05-30 6:13 ` Vyacheslav Dubeyko 2013-05-30 6:55 ` Ryusuke Konishi 0 siblings, 1 reply; 37+ messages in thread From: Vyacheslav Dubeyko @ 2013-05-30 6:13 UTC (permalink / raw) To: Ryusuke Konishi; +Cc: Anton Eliasson, linux-nilfs-u79uwXL29TY76Z2rM5mHXA

On Wed, 2013-05-29 at 23:37 +0900, Ryusuke Konishi wrote:
> I don't know whether this may be a hint of this trouble, but according
> to the system log, page_buffers() of nilfs_end_page_io() seems to hit
> an Oops due to an invalid page address "0x36cd":
>

Yes. There are two possible ways to reach nilfs_end_page_io(): (1) nilfs_segctor_complete_write(); (2) nilfs_abort_logs(). Currently, I suspect nilfs_abort_logs() because of compiler optimization, but I have no evidence of it yet. I think the issue needs deeper investigation before stating anything definite.

With the best regards,
Vyacheslav Dubeyko.
> May 22 18:53:31 riven kernel: [ 3821.605568] BUG: unable to handle kernel paging request at 00000000000036cd > May 22 18:53:31 riven kernel: [ 3821.605577] IP: [<ffffffffa027f1a2>] nilfs_end_page_io+0x12/0xc0 [nilfs2] > May 22 18:53:31 riven kernel: [ 3821.605591] PGD 19636d067 PUD 19636e067 PMD 0 > May 22 18:53:31 riven kernel: [ 3821.605597] Oops: 0000 [#1] PREEMPT SMP > <snip> > May 22 18:53:31 riven kernel: [ 3821.605829] Code: ff ff ff 48 81 c4 88 00 00 00 5b 41 5c 5d c3 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 55 48 85 ff 48 89 e5 53 48 89 fb 74 4e <48> 8b 07 f6 c4 08 0f 84 8c 00 00 00 48 8b 47 30 48 8b 00 f6 c4 > May 22 18:53:31 riven kernel: [ 3821.605873] RIP [<ffffffffa027f1a2>] nilfs_end_page_io+0x12/0xc0 [nilfs2] > May 22 18:53:31 riven kernel: [ 3821.605881] RSP <ffff8801960f7b30> > May 22 18:53:31 riven kernel: [ 3821.605884] CR2: 00000000000036cd > > where the instruction sequence of "<48> 8b 07 f6 c4 08" is "mov > (%rdi),%rax; test $0x8, %ah", and corresponds to the part testing > PagePrivate(page) in page_buffers() macro called within > nilfs_end_page_io() routine: > > if (buffer_nilfs_node(page_buffers(page)) && !PageWriteback(page)) { > > This cannot happen, but there may be something we missed. > > > Regards, > Ryusuke Konishi -- To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: Broken nilfs2 filesystem 2013-05-30 6:13 ` Vyacheslav Dubeyko @ 2013-05-30 6:55 ` Ryusuke Konishi [not found] ` <20130530.155543.480320022.konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org> 0 siblings, 1 reply; 37+ messages in thread From: Ryusuke Konishi @ 2013-05-30 6:55 UTC (permalink / raw) To: Vyacheslav Dubeyko; +Cc: Anton Eliasson, linux-nilfs-u79uwXL29TY76Z2rM5mHXA

On Thu, 30 May 2013 10:13:05 +0400, Vyacheslav Dubeyko wrote:
> On Wed, 2013-05-29 at 23:37 +0900, Ryusuke Konishi wrote:
>> I don't know whether this may be a hint of this trouble, but according
>> to the system log, page_buffers() of nilfs_end_page_io() seems to hit
>> an Oops due to an invalid page address "0x36cd":
>>
>
> Yes. There are two possible way to be in nilfs_end_page_io(): (1)
> nilfs_segctor_complete_write(); (2) nilfs_abort_logs(). Currently, I
> suspect the nilfs_abort_logs()

That sounds like a likely cause.

Can you test nilfs_abort_logs() by injecting a random fault in some easy way?

Regards,
Ryusuke Konishi

> because of compiler optimization. But now
> I haven't evidence of it. And it needs to investigate issue more deeply
> for stating something definitely, I think.
> With the best regards,
> Vyacheslav Dubeyko.
> >> May 22 18:53:31 riven kernel: [ 3821.605568] BUG: unable to handle kernel paging request at 00000000000036cd >> May 22 18:53:31 riven kernel: [ 3821.605577] IP: [<ffffffffa027f1a2>] nilfs_end_page_io+0x12/0xc0 [nilfs2] >> May 22 18:53:31 riven kernel: [ 3821.605591] PGD 19636d067 PUD 19636e067 PMD 0 >> May 22 18:53:31 riven kernel: [ 3821.605597] Oops: 0000 [#1] PREEMPT SMP >> <snip> >> May 22 18:53:31 riven kernel: [ 3821.605829] Code: ff ff ff 48 81 c4 88 00 00 00 5b 41 5c 5d c3 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 55 48 85 ff 48 89 e5 53 48 89 fb 74 4e <48> 8b 07 f6 c4 08 0f 84 8c 00 00 00 48 8b 47 30 48 8b 00 f6 c4 >> May 22 18:53:31 riven kernel: [ 3821.605873] RIP [<ffffffffa027f1a2>] nilfs_end_page_io+0x12/0xc0 [nilfs2] >> May 22 18:53:31 riven kernel: [ 3821.605881] RSP <ffff8801960f7b30> >> May 22 18:53:31 riven kernel: [ 3821.605884] CR2: 00000000000036cd >> >> where the instruction sequence of "<48> 8b 07 f6 c4 08" is "mov >> (%rdi),%rax; test $0x8, %ah", and corresponds to the part testing >> PagePrivate(page) in page_buffers() macro called within >> nilfs_end_page_io() routine: >> >> if (buffer_nilfs_node(page_buffers(page)) && !PageWriteback(page)) { >> >> This cannot happen, but there may be something we missed. >> >> >> Regards, >> Ryusuke Konishi > > > -- > To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in > the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: Broken nilfs2 filesystem [not found] ` <20130530.155543.480320022.konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org> @ 2013-05-30 7:21 ` Vyacheslav Dubeyko 2013-06-06 6:56 ` Vyacheslav Dubeyko 1 sibling, 0 replies; 37+ messages in thread From: Vyacheslav Dubeyko @ 2013-05-30 7:21 UTC (permalink / raw) To: Ryusuke Konishi; +Cc: Anton Eliasson, linux-nilfs-u79uwXL29TY76Z2rM5mHXA On Thu, 2013-05-30 at 15:55 +0900, Ryusuke Konishi wrote: > On Thu, 30 May 2013 10:13:05 +0400, Vyacheslav Dubeyko wrote: > > On Wed, 2013-05-29 at 23:37 +0900, Ryusuke Konishi wrote: > >> I don't know whether this may be a hint of this trouble, but according > >> to the system log, page_buffers() of nilfs_end_page_io() seems to hit > >> an Oops due to an invalid page address "0x36cd": > >> > > > > Yes. There are two possible way to be in nilfs_end_page_io(): (1) > > nilfs_segctor_complete_write(); (2) nilfs_abort_logs(). Currently, I > > suspect the nilfs_abort_logs() > > That sounds a likely cause. > > Can you test nilfs_abort_logs by injecting a random fault in some easy > way ? > Yes, sure. Now I am thinking about proper place for such injection. I'll share results of such attempt. With the best regards, Vyacheslav Dubeyko. > Regards, > Ryusuke Konishi > > > > because of compiler optimization. But now > > I haven't evidence of it. And it needs to investigate issue more deeply > > for stating something definitely, I think. > > > > With the best regards, > > Vyacheslav Dubeyko. 
> > > >> May 22 18:53:31 riven kernel: [ 3821.605568] BUG: unable to handle kernel paging request at 00000000000036cd > >> May 22 18:53:31 riven kernel: [ 3821.605577] IP: [<ffffffffa027f1a2>] nilfs_end_page_io+0x12/0xc0 [nilfs2] > >> May 22 18:53:31 riven kernel: [ 3821.605591] PGD 19636d067 PUD 19636e067 PMD 0 > >> May 22 18:53:31 riven kernel: [ 3821.605597] Oops: 0000 [#1] PREEMPT SMP > >> <snip> > >> May 22 18:53:31 riven kernel: [ 3821.605829] Code: ff ff ff 48 81 c4 88 00 00 00 5b 41 5c 5d c3 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 55 48 85 ff 48 89 e5 53 48 89 fb 74 4e <48> 8b 07 f6 c4 08 0f 84 8c 00 00 00 48 8b 47 30 48 8b 00 f6 c4 > >> May 22 18:53:31 riven kernel: [ 3821.605873] RIP [<ffffffffa027f1a2>] nilfs_end_page_io+0x12/0xc0 [nilfs2] > >> May 22 18:53:31 riven kernel: [ 3821.605881] RSP <ffff8801960f7b30> > >> May 22 18:53:31 riven kernel: [ 3821.605884] CR2: 00000000000036cd > >> > >> where the instruction sequence of "<48> 8b 07 f6 c4 08" is "mov > >> (%rdi),%rax; test $0x8, %ah", and corresponds to the part testing > >> PagePrivate(page) in page_buffers() macro called within > >> nilfs_end_page_io() routine: > >> > >> if (buffer_nilfs_node(page_buffers(page)) && !PageWriteback(page)) { > >> > >> This cannot happen, but there may be something we missed. 
> >> > >> > >> Regards, > >> Ryusuke Konishi > > > > > > -- > > To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in > > the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org > > More majordomo info at http://vger.kernel.org/majordomo-info.html > -- > To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in > the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: Broken nilfs2 filesystem [not found] ` <20130530.155543.480320022.konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org> 2013-05-30 7:21 ` Vyacheslav Dubeyko @ 2013-06-06 6:56 ` Vyacheslav Dubeyko 2013-06-06 9:20 ` Reinoud Zandijk 2013-06-12 20:31 ` Anton Eliasson 1 sibling, 2 replies; 37+ messages in thread From: Vyacheslav Dubeyko @ 2013-06-06 6:56 UTC (permalink / raw) To: Ryusuke Konishi; +Cc: Anton Eliasson, linux-nilfs-u79uwXL29TY76Z2rM5mHXA

On Thu, 2013-05-30 at 15:55 +0900, Ryusuke Konishi wrote:
> On Thu, 30 May 2013 10:13:05 +0400, Vyacheslav Dubeyko wrote:
> > On Wed, 2013-05-29 at 23:37 +0900, Ryusuke Konishi wrote:
> >> I don't know whether this may be a hint of this trouble, but according
> >> to the system log, page_buffers() of nilfs_end_page_io() seems to hit
> >> an Oops due to an invalid page address "0x36cd":
> >>
> >
> > Yes. There are two possible way to be in nilfs_end_page_io(): (1)
> > nilfs_segctor_complete_write(); (2) nilfs_abort_logs(). Currently, I
> > suspect the nilfs_abort_logs()
>
> That sounds a likely cause.
>
> Can you test nilfs_abort_logs by injecting a random fault in some easy
> way ?
>

So, here is what I have discovered so far.

First of all, unfortunately, I can't reproduce the issue yet. I suspect that the aging state of the volume, the peculiarities of the workload and the environment play a very important role in this issue. As I remember, all reporters of similar symptoms (broken bnode error messages) talked about several months of successful NILFS2 operation.

I tried to set up the LVM environment as Anton described it, but I didn't catch the issue in this environment. So I think that my NILFS2 volume was not properly aged and that I tried the wrong workload. The proper workload needs more thought. As I can see from Anton's system log, there was frequent update and git activity.
Moreover, update and git were nearly before crash: May 22 18:48:45 riven slim[274]: [2013-05-22 18:48:43] Downloading update (37 782 of 41 158 KB)... May 22 18:48:45 riven slim[274]: [2013-05-22 18:48:43] Downloading update (38 390 of 41 158 KB)... May 22 18:48:45 riven slim[274]: [2013-05-22 18:48:43] Downloading update (39 066 of 41 158 KB)... May 22 18:48:45 riven slim[274]: [2013-05-22 18:48:44] Downloading update (39 742 of 41 158 KB)... May 22 18:48:45 riven slim[274]: [2013-05-22 18:48:44] Downloading update (40 311 of 41 158 KB)... May 22 18:48:45 riven slim[274]: [2013-05-22 18:48:44] Downloading update (40 956 of 41 158 KB)... May 22 18:48:45 riven slim[274]: [2013-05-22 18:48:45] Downloading update (41 158 of 41 158 KB)... May 22 18:50:13 riven slim[274]: [2013-05-22 18:48:45] Downl18:50:13 | Git | default | Checking for remote changes... May 22 18:50:13 riven slim[274]: 18:50:13 | Cmd | default | git rev-parse HEAD May 22 18:50:13 riven slim[274]: 18:50:13 | Cmd | default | git ls-remote --heads --exit-code "ssh://storage@hephaestus/home/storage/default" master May 22 18:50:13 riven slim[274]: 18:50:13 | Git | default | No remote changes, local+remote: 8eab1e96aa618010ff17c11a955f4423d823beb6 May 22 18:50:14 riven slim[274]: 18:50:14 | ListenerTcp | Pinging tcp://notifications.sparkleshare.org:443/ May 22 18:50:14 riven slim[274]: 18:50:14 | ListenerTcp | Received pong from tcp://notifications.sparkleshare.org:443/ May 22 18:53:31 riven kernel: [ 3821.605568] BUG: unable to handle kernel paging request at 00000000000036cd May 22 18:53:31 riven kernel: [ 3821.605577] IP: [<ffffffffa027f1a2>] nilfs_end_page_io+0x12/0xc0 [nilfs2] So, maybe, git activity is a possible workload for the issue reproducing. It needs to check it, I suppose. 
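To put a number on "nearly before crash", the syslog timestamps quoted above can be subtracted. This is a small Python sketch that only uses the two log lines copied from the excerpt. Syslog lines carry no year, so 2013 is taken from the thread dates; the helper name is illustrative.

```python
from datetime import datetime


def parse_syslog_time(line, year=2013):
    """Parse the leading 'Mon DD HH:MM:SS' stamp of a syslog line."""
    stamp = " ".join(line.split()[:3])
    return datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")


# Last SparkleShare entry before the oops, and the oops itself.
last_activity = ("May 22 18:50:14 riven slim[274]: 18:50:14 | ListenerTcp | "
                 "Received pong from tcp://notifications.sparkleshare.org:443/")
oops = ("May 22 18:53:31 riven kernel: [ 3821.605568] BUG: unable to handle "
        "kernel paging request at 00000000000036cd")

gap = parse_syslog_time(oops) - parse_syslog_time(last_activity)
print(gap.total_seconds())
```

The last logged git/SparkleShare activity is a little over three minutes before the oops, so the log alone does not show git running at the moment of the crash.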
I tried to simulate the occurrence of errors in the nilfs_segctor_do_construct() method by removing the error checking at these places:

http://lxr.free-electrons.com/source/fs/nilfs2/segment.c#L1942
http://lxr.free-electrons.com/source/fs/nilfs2/segment.c#L1953
http://lxr.free-electrons.com/source/fs/nilfs2/segment.c#L1962
http://lxr.free-electrons.com/source/fs/nilfs2/segment.c#L1976
http://lxr.free-electrons.com/source/fs/nilfs2/segment.c#L1989

Initially, by chance, I simply commented out the error-checking statement. Then I commented out the error-checking statement and additionally set the error code to -EINVAL. It is strange, but if I set the error code I don't see any visible failure in the NILFS2 driver's operation. But I get a very interesting error when I simply comment out the error-checking statement without setting the error code:

May 31 15:05:49 slavad-ubuntu nilfs_cleanerd[2409]: run (manual)
May 31 15:05:50 slavad-ubuntu kernel: [ 737.725827] [nilfs_segctor_do_construct] fs/nilfs2/segment.c:1944
May 31 15:05:50 slavad-ubuntu nilfs_cleanerd[2409]: cannot clean segments: File exists
May 31 15:05:50 slavad-ubuntu nilfs_cleanerd[2409]: shutdown
May 31 15:05:50 slavad-ubuntu kernel: [ 737.744660] ------------[ cut here ]------------
May 31 15:05:50 slavad-ubuntu kernel: [ 737.744674] WARNING: at fs/nilfs2/ioctl.c:449 nilfs_ioctl_clean_segments.isra.11+0x667/0x690()
May 31 15:05:50 slavad-ubuntu kernel: [ 737.744676] Hardware name: OptiPlex 760
May 31 15:05:50 slavad-ubuntu kernel: [ 737.744679] Modules linked in: snd_hda_codec_analog snd_hda_intel i915 snd_hda_codec snd_hwdep snd_pcm snd_seq_midi snd_rawmidi snd_seq_midi_event bnep rfcomm snd_seq drm_kms_helper drm bluetooth nfsv4 snd_timer snd_seq_device i2c_algo_bit snd joydev hid_generic soundcore dell_wmi video dcdbas coretemp psmouse serio_raw mei sparse_keymap ppdev snd_page_alloc lpc_ich mac_hid parport_pc microcode wmi lp parport binfmt_misc nfsd nfs_acl auth_rpcgss nfs fscache lockd sunrpc e1000e ptp pps_core usbhid hid
May 31
15:05:50 slavad-ubuntu kernel: [ 737.744746] Pid: 2409, comm: nilfs_cleanerd Tainted: G I 3.9.0-rc6+ #35 May 31 15:05:50 slavad-ubuntu kernel: [ 737.744748] Call Trace: May 31 15:05:50 slavad-ubuntu kernel: [ 737.744756] [<ffffffff8105c7df>] warn_slowpath_common+0x7f/0xc0 May 31 15:05:50 slavad-ubuntu kernel: [ 737.744760] [<ffffffff8105c83a>] warn_slowpath_null+0x1a/0x20 May 31 15:05:50 slavad-ubuntu kernel: [ 737.744765] [<ffffffff81301837>] nilfs_ioctl_clean_segments.isra.11+0x667/0x690 May 31 15:05:50 slavad-ubuntu kernel: [ 737.744771] [<ffffffff81098f0f>] ? local_clock+0x6f/0x80 May 31 15:05:50 slavad-ubuntu kernel: [ 737.744776] [<ffffffff81301e44>] nilfs_ioctl+0x3d4/0x690 May 31 15:05:50 slavad-ubuntu kernel: [ 737.744781] [<ffffffff810c370f>] ? lock_release_non_nested+0x30f/0x350 May 31 15:05:50 slavad-ubuntu kernel: [ 737.744785] [<ffffffff81098ca5>] ? sched_clock_local+0x25/0x90 May 31 15:05:50 slavad-ubuntu kernel: [ 737.744790] [<ffffffff811b7e26>] do_vfs_ioctl+0x96/0x570 May 31 15:05:50 slavad-ubuntu kernel: [ 737.744795] [<ffffffff81169e4c>] ? might_fault+0x5c/0xb0 May 31 15:05:50 slavad-ubuntu kernel: [ 737.744801] [<ffffffff81748985>] ? sysret_check+0x22/0x5d May 31 15:05:50 slavad-ubuntu kernel: [ 737.744805] [<ffffffff811b8391>] sys_ioctl+0x91/0xb0 May 31 15:05:50 slavad-ubuntu kernel: [ 737.744809] [<ffffffff813a70be>] ? 
trace_hardirqs_on_thunk+0x3a/0x3f
May 31 15:05:50 slavad-ubuntu kernel: [ 737.744813] [<ffffffff81748959>] system_call_fastpath+0x16/0x1b
May 31 15:05:50 slavad-ubuntu kernel: [ 737.744816] ---[ end trace 374fc1d251cc46c6 ]---
May 31 15:05:50 slavad-ubuntu kernel: [ 737.744933] NILFS: GC failed during preparation: cannot read source blocks: err=-17
May 31 15:09:44 slavad-ubuntu kernel: [ 972.324583] [nilfs_segctor_do_construct] fs/nilfs2/segment.c:1944
May 31 15:09:49 slavad-ubuntu kernel: [ 977.349257] [nilfs_segctor_do_construct] fs/nilfs2/segment.c:1944
May 31 15:11:57 slavad-ubuntu nilfs_cleanerd[2820]: start
May 31 15:11:57 slavad-ubuntu nilfs_cleanerd[2820]: pause (clean check)
May 31 15:12:08 slavad-ubuntu nilfs_cleanerd[2820]: run (manual)
May 31 15:12:08 slavad-ubuntu nilfs_cleanerd[2820]: cannot clean segments: File exists
May 31 15:12:08 slavad-ubuntu nilfs_cleanerd[2820]: shutdown
May 31 15:12:08 slavad-ubuntu kernel: [ 1115.562880] nilfs_ioctl_move_inode_block: conflicting data buffer: ino=4, cno=0, offset=0, blocknr=2086, vblocknr=232528
May 31 15:12:08 slavad-ubuntu kernel: [ 1115.562887] NILFS: GC failed during preparation: cannot read source blocks: err=-17

As I understand, this error looks like Anton's latest reports about the complete failure to use the corrupted NILFS2 volume. So maybe it is possible to assume that, when the issue occurs, segment construction is aborted continuously and permanently. But simulating this by commenting out the error-checking statement without setting the error code is not the driver's proper workflow, as I understand, and that confuses me. Currently, I don't have a clear understanding of it. So, from my viewpoint, the investigation of the issue needs to continue.

With the best regards,
Vyacheslav Dubeyko.
> Regards,
> Ryusuke Konishi
>

^ permalink raw reply [flat|nested] 37+ messages in thread
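In the logs of the message above, the kernel's "err=-17" and nilfs_cleanerd's "cannot clean segments: File exists" appear to be the same error seen from two sides, since errno 17 is EEXIST on Linux. The following Python sketch pulls such codes out of log lines and names them. The regular expression and the function name are illustrative, and the sample line is copied from the log above.

```python
import errno
import os
import re

ERR_RE = re.compile(r"err=-(\d+)")


def decode_errors(lines):
    """Yield (number, symbolic name, message) for every 'err=-N' in lines."""
    for line in lines:
        for match in ERR_RE.finditer(line):
            num = int(match.group(1))
            yield num, errno.errorcode.get(num, "?"), os.strerror(num)


sample = [
    "May 31 15:05:50 slavad-ubuntu kernel: [ 737.744933] NILFS: GC failed "
    "during preparation: cannot read source blocks: err=-17",
]
decoded = list(decode_errors(sample))
for num, name, text in decoded:
    print(num, name, text)
```

Run over a whole kernel.log, the same helper gives a quick count of which error codes the driver reports, which may help when comparing a fault-injection run with a real corruption case.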
* Re: Broken nilfs2 filesystem 2013-06-06 6:56 ` Vyacheslav Dubeyko @ 2013-06-06 9:20 ` Reinoud Zandijk [not found] ` <20130606092054.GA201-HNv6YvNvQKMNqjISwOrxaLFspR4gePGN@public.gmane.org> 2013-06-12 20:31 ` Anton Eliasson 1 sibling, 1 reply; 37+ messages in thread From: Reinoud Zandijk @ 2013-06-06 9:20 UTC (permalink / raw) To: Vyacheslav Dubeyko; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

Hi,

just my $0.02 so to say:

On Thu, Jun 06, 2013 at 10:56:09AM +0400, Vyacheslav Dubeyko wrote:
> First of all, unfortunately, I can't reproduce the issue yet, currently.
> I suspect that in this issue the aging state of volume, peculiarity of
> workload and environment play very important role. As I remember, all
> reporters of likewise symptoms (broken bnode error messages) talked
> about several months of successful working of NILFS2 file system.

Sounds to me as if a b-tree is in a peculiar state and that updating the b-tree results in this corruption.

Have you tried mounting one of the earlier checkpoints/snapshots read-only to see if those are correct? If so, dumping both DATs and both b-trees might give a clue as to what went wrong, if only as to how complicated the b-tree is before the update and what actions are taken on it.

With regards,
Reinoud

^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: Broken nilfs2 filesystem [not found] ` <20130606092054.GA201-HNv6YvNvQKMNqjISwOrxaLFspR4gePGN@public.gmane.org> @ 2013-06-06 9:34 ` Vyacheslav Dubeyko 2013-06-06 14:19 ` Reinoud Zandijk 2013-06-12 20:12 ` Anton Eliasson 1 sibling, 1 reply; 37+ messages in thread From: Vyacheslav Dubeyko @ 2013-06-06 9:34 UTC (permalink / raw) To: Reinoud Zandijk; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

On Thu, 2013-06-06 at 11:20 +0200, Reinoud Zandijk wrote:
> Hi,
>
> just my $0.02 so to say:
>
> On Thu, Jun 06, 2013 at 10:56:09AM +0400, Vyacheslav Dubeyko wrote:
> > First of all, unfortunately, I can't reproduce the issue yet, currently.
> > I suspect that in this issue the aging state of volume, peculiarity of
> > workload and environment play very important role. As I remember, all
> > reporters of likewise symptoms (broken bnode error messages) talked
> > about several months of successful working of NILFS2 file system.
>
> sounds to me as if a b-tree is in a perculiar state and that updating the
> btree results in this corruption.
>
> Have you tried to mount one of the checkpoints/snapshots earlier as RO and see
> if those are correct? If so, dumping both DATs and both btrees might give a
> clue as to what went wrong. If only it gives a clue as to how complicated the
> btree is before the updating and what actions are taken on it.
>

Unfortunately, I haven't reproduced the issue on my side. On my side everything is OK. I am trying to reproduce an issue that has been reported many times by different users but, so far, without any success. So I can't investigate the essence of the issue. I know the symptoms but I don't know the reproduction path.

Thank you for your advice, but without a corruption on my side I can't investigate anything.

Thanks,
Vyacheslav Dubeyko.
> With regards,
> Reinoud
>

^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: Broken nilfs2 filesystem 2013-06-06 9:34 ` Vyacheslav Dubeyko @ 2013-06-06 14:19 ` Reinoud Zandijk 0 siblings, 0 replies; 37+ messages in thread From: Reinoud Zandijk @ 2013-06-06 14:19 UTC (permalink / raw) To: Vyacheslav Dubeyko; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

On Thu, Jun 06, 2013 at 01:34:48PM +0400, Vyacheslav Dubeyko wrote:
> On Thu, 2013-06-06 at 11:20 +0200, Reinoud Zandijk wrote:
> Unfortunately, I haven't reproduced issue on my side. On my side all is
> OK. I am trying to reproduce the issue that was reported by many times
> of different users. But, currently, without any success on my side. So,
> I can't investigate the essence of the issue. I know symptoms but I
> don't know reproducing path of the issue.

Oops, I must have CC'd it to you instead of the submitter. Well, I hope he reads it on the list :)

With regards,
Reinoud

^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: Broken nilfs2 filesystem
  [not found] ` <20130606092054.GA201-HNv6YvNvQKMNqjISwOrxaLFspR4gePGN@public.gmane.org>
  2013-06-06 9:34 ` Vyacheslav Dubeyko
@ 2013-06-12 20:12 ` Anton Eliasson
  1 sibling, 0 replies; 37+ messages in thread
From: Anton Eliasson @ 2013-06-12 20:12 UTC (permalink / raw)
To: Reinoud Zandijk; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

Reinoud Zandijk wrote 2013-06-06 11:20:
> Hi,
>
> just my $0.02 so to say:
>
> On Thu, Jun 06, 2013 at 10:56:09AM +0400, Vyacheslav Dubeyko wrote:
>> First of all, unfortunately, I can't reproduce the issue yet, currently.
>> I suspect that in this issue the aging state of volume, peculiarity of
>> workload and environment play very important role. As I remember, all
>> reporters of likewise symptoms (broken bnode error messages) talked
>> about several months of successful working of NILFS2 file system.
> sounds to me as if a b-tree is in a perculiar state and that updating the
> btree results in this corruption.
>
> Have you tried to mount one of the checkpoints/snapshots earlier as RO and see
> if those are correct? If so, dumping both DATs and both btrees might give a
> clue as to what went wrong. If only it gives a clue as to how complicated the
> btree is before the updating and what actions are taken on it.
>
> With regards,
> Reinoud
>

I have configured nilfs_cleanerd.conf to clean very aggressively, so my
earliest checkpoint is from after the incident. I included the contents of
that file in my first email sent on May 22
(http://article.gmane.org/gmane.comp.file-systems.nilfs.user/2920).

Even so, I tried to loopback-mount the oldest checkpoint I have, and found
that it is affected by the same corruption.

# losetup /dev/loop0 /Athena/Dump/riven/riven-home-20130531.img
# mount /dev/loop0 /mnt
$ mount | tail -1
/dev/loop0 on /mnt type nilfs2 (ro,relatime,norecovery)
$ lscp /dev/loop0
    CNO        DATE     TIME  MODE  FLG  NBLKINC    ICNT
1260571  2013-05-23 16:51:49    cp    -      140  155496
1260572  2013-05-23 16:51:51    cp    -     1632  155495
1260575  2013-05-23 16:52:06    cp    -     1473  155496
1260576  2013-05-23 16:52:09    cp    -       49  155495
1260580  2013-05-24 23:36:11    cp    -     1345  155496
1260581  2013-05-24 23:36:16    cp    -     1500  155495
1260582  2013-05-24 23:36:21    cp    -     1356  155497
1260583  2013-05-24 23:36:26    cp    -     1465  155495
# chcp ss /dev/loop0 1260571
# umount /mnt
# mount -o ro,norecovery,cp=1260571 /dev/loop0 /mnt
$ cd /mnt/anton/Bilder/20130321-28\ Jakobs\ bilder\ från\ Nederländerna
$ LANG=C cat *>/dev/null
cat: 160.JPG: Input/output error
cat: 163.JPG: Input/output error
cat: 164.JPG: Input/output error
cat: 165.JPG: Input/output error
cat: 170.JPG: Input/output error
cat: 172.JPG: Input/output error
cat: 179.JPG: Input/output error

--
Best Regards,
Anton Eliasson

^ permalink raw reply [flat|nested] 37+ messages in thread
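The manual check in the message above (reading every file and watching for "Input/output error") can be wrapped in a small helper. This is only a sketch: `list_unreadable` is a hypothetical name, not part of nilfs-utils, and it assumes a POSIX shell with `find` and `cat`. It reads each regular file in full and prints the path of any file whose read fails.

```shell
#!/bin/sh
# list_unreadable DIR
# Hypothetical helper (not part of nilfs-utils): read every regular file
# below DIR in full and print the path of each file that cannot be read.
list_unreadable() {
    find "$1" -type f -exec sh -c '
        for f in "$@"; do
            # cat exits non-zero when a read fails, e.g. with EIO
            cat -- "$f" >/dev/null 2>&1 || printf "%s\n" "$f"
        done
    ' sh {} +
}
```

For example, `list_unreadable /mnt/anton` on a read-only, norecovery mount would list the damaged files one per line; an empty result means every file could be read.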
* Re: Broken nilfs2 filesystem
  2013-06-06 6:56 ` Vyacheslav Dubeyko
  2013-06-06 9:20 ` Reinoud Zandijk
@ 2013-06-12 20:31 ` Anton Eliasson
  [not found] ` <51B8DA8E.6020802-17Olwe7vw2dLC78zk6coLg@public.gmane.org>
  1 sibling, 1 reply; 37+ messages in thread
From: Anton Eliasson @ 2013-06-12 20:31 UTC (permalink / raw)
To: slava-yeENwD64cLxBDgjK7y7TUQ; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

Vyacheslav Dubeyko wrote 2013-06-06 08:56:
> On Thu, 2013-05-30 at 15:55 +0900, Ryusuke Konishi wrote:
>> On Thu, 30 May 2013 10:13:05 +0400, Vyacheslav Dubeyko wrote:
>>> On Wed, 2013-05-29 at 23:37 +0900, Ryusuke Konishi wrote:
>>>> I don't know whether this may be a hint of this trouble, but according
>>>> to the system log, page_buffers() of nilfs_end_page_io() seems to hit
>>>> an Oops due to an invalid page address "0x36cd":
>>>>
>>> Yes. There are two possible way to be in nilfs_end_page_io(): (1)
>>> nilfs_segctor_complete_write(); (2) nilfs_abort_logs(). Currently, I
>>> suspect the nilfs_abort_logs()
>> That sounds a likely cause.
>>
>> Can you test nilfs_abort_logs by injecting a random fault in some easy
>> way ?
>>
> So, what I discovered currently.
>
> First of all, unfortunately, I can't reproduce the issue yet, currently.
> I suspect that in this issue the aging state of volume, peculiarity of
> workload and environment play very important role. As I remember, all
> reporters of likewise symptoms (broken bnode error messages) talked
> about several months of successful working of NILFS2 file system.
>
> I tried to make LVM environment as it was described by Anton. But I
> didn't catch the issue in this environment. So, I think that I haven't
> properly aged NILFS2 volume state and I tried not proper workload. It
> needs to think about proper workload more deeply. As I can see from
> Anton's system log that it took place frequent update and git activity.
> Moreover, update and git were nearly before crash:

I'm not so sure that my issues are caused by aging of the filesystem. As
I described in my third e-mail on May 30
(http://article.gmane.org/gmane.comp.file-systems.nilfs.user/2957), I
was able to trash my new /home, which was only a week old. I'm starting
to think it has something to do with either VMware or bup (which is
git-based) or a combination of both.

> May 22 18:48:45 riven slim[274]: [2013-05-22 18:48:43] Downloading update (37 782 of 41 158 KB)...
> May 22 18:48:45 riven slim[274]: [2013-05-22 18:48:43] Downloading update (38 390 of 41 158 KB)...
> May 22 18:48:45 riven slim[274]: [2013-05-22 18:48:43] Downloading update (39 066 of 41 158 KB)...
> May 22 18:48:45 riven slim[274]: [2013-05-22 18:48:44] Downloading update (39 742 of 41 158 KB)...
> May 22 18:48:45 riven slim[274]: [2013-05-22 18:48:44] Downloading update (40 311 of 41 158 KB)...
> May 22 18:48:45 riven slim[274]: [2013-05-22 18:48:44] Downloading update (40 956 of 41 158 KB)...
> May 22 18:48:45 riven slim[274]: [2013-05-22 18:48:45] Downloading update (41 158 of 41 158 KB)...
> May 22 18:50:13 riven slim[274]: [2013-05-22 18:48:45] Downl18:50:13 | Git | default | Checking for remote changes...
> May 22 18:50:13 riven slim[274]: 18:50:13 | Cmd | default | git rev-parse HEAD
> May 22 18:50:13 riven slim[274]: 18:50:13 | Cmd | default | git ls-remote --heads --exit-code "ssh://storage@hephaestus/home/storage/default" master
> May 22 18:50:13 riven slim[274]: 18:50:13 | Git | default | No remote changes, local+remote: 8eab1e96aa618010ff17c11a955f4423d823beb6
> May 22 18:50:14 riven slim[274]: 18:50:14 | ListenerTcp | Pinging tcp://notifications.sparkleshare.org:443/
> May 22 18:50:14 riven slim[274]: 18:50:14 | ListenerTcp | Received pong from tcp://notifications.sparkleshare.org:443/
> May 22 18:53:31 riven kernel: [ 3821.605568] BUG: unable to handle kernel paging request at 00000000000036cd
> May 22 18:53:31 riven kernel: [ 3821.605577] IP: [<ffffffffa027f1a2>] nilfs_end_page_io+0x12/0xc0 [nilfs2]
>
> So, maybe, git activity is a possible workload for the issue
> reproducing. It needs to check it, I suppose.

Git in this case is a part of SparkleShare. SparkleShare is a Git-based
file synchronisation program, much like Dropbox but self-hosted. However,
I've made very few changes to the files tracked by SparkleShare, so the
Git workload should be extremely light. I believe Steam is what's
printing "Downloading update".

> I tried to simulate errors occurrence in nilfs_segctor_do_construct()
> method by means of excluding of error checking in places:
> [...]

--
Best Regards,
Anton Eliasson

^ permalink raw reply [flat|nested] 37+ messages in thread
[parent not found: <51B8DA8E.6020802-17Olwe7vw2dLC78zk6coLg@public.gmane.org>]
* Re: Broken nilfs2 filesystem
  [not found] ` <51B8DA8E.6020802-17Olwe7vw2dLC78zk6coLg@public.gmane.org>
@ 2013-06-13 10:01 ` Vyacheslav Dubeyko
  0 siblings, 0 replies; 37+ messages in thread
From: Vyacheslav Dubeyko @ 2013-06-13 10:01 UTC (permalink / raw)
To: Anton Eliasson; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

On Wed, 2013-06-12 at 22:31 +0200, Anton Eliasson wrote:

[snip]
> > I tried to make LVM environment as it was described by Anton. But I
> > didn't catch the issue in this environment. So, I think that I haven't
> > properly aged NILFS2 volume state and I tried not proper workload. It
> > needs to think about proper workload more deeply. As I can see from
> > Anton's system log that it took place frequent update and git activity.
> > Moreover, update and git were nearly before crash:
> I'm not so sure that my issues are caused by aging of the filesystem. As
> I described in my third e-mail on May 30
> (http://article.gmane.org/gmane.comp.file-systems.nilfs.user/2957), I
> was able to trash my new /home which was only a week old. I'm starting
> to think it has something to do with either VMware or bup (which is git
> based) or a combination of both.

As I understand it, the issue takes place on the GC side. This complicates
the situation because the real problem can be far from the detected
symptoms; I mean that the real issue can occur earlier without being
detected. So I assume the possible causes are: (1) a special file system
aging state; (2) a race condition. Either of these can be reproduced only
by following a clear and strict reproduction path, so I have to find that
path first.

I'll soon finish preparing the debugging output patch set. I hope this
patch set will give more details about the issue when it is reproduced on
your side.

With the best regards,
Vyacheslav Dubeyko.

^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: Broken nilfs2 filesystem
  2013-05-29 6:39 ` Vyacheslav Dubeyko
  2013-05-29 14:37 ` Ryusuke Konishi
@ 2013-05-30 8:10 ` Anton Eliasson
  [not found] ` <51A70971.40602-17Olwe7vw2dLC78zk6coLg@public.gmane.org>
  1 sibling, 1 reply; 37+ messages in thread
From: Anton Eliasson @ 2013-05-30 8:10 UTC (permalink / raw)
To: slava-yeENwD64cLxBDgjK7y7TUQ; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

Vyacheslav Dubeyko wrote 2013-05-29 08:39:
> Hi Anton
>
> On Sat, 2013-05-25 at 18:26 +0200, Anton Eliasson wrote:
>
> [snip]
>> More specifically, I have a virtual machine running Windows XP in /home,
>> a nilfs filesystem, and a virtual machine running Windows 7 in
>> /Supplement. /Supplement is an ext4 volume in the same LVM volume group
>> as /home on the same slow hard drive. I can crash the host by either:
>>
>> * Starting both machines at the same time.
>> * Starting the W7 machine first and when it is fully booted to the
>> desktop, but still doing I/O intensive Windows stuff, starting the WXP
>> machine.
>>
>> If I first start the WXP machine and let it boot to the desktop, at the
>> point where it is actually I/O idle, I can safely start the W7 machine.
>> After that I found no trouble installing software updates and logging in
>> and out of both machines at the same time, though the HDD made it very
>> slow of course.
> Currently, I am thinking about reproducing path. It is really important
> to have clear reproducing path. But I haven't clear picture of your
> environment yet. As I understand, you have two virtual VmWare machine
> (Win XP and Win 7). Am I correct?

Correct. The Windows XP machine is stored in /home/anton/vmware/ and the
Windows 7 machine is stored in /Supplement/anton/vmware/.

> Moreover, I am thinking about the fact that virtual machine on different
> volumes influence on each other in the issue environment. Currently, I
> haven't clear understanding of this.
>
>> /etc/fstab
>> ----------
>> tmpfs                        /tmp         tmpfs   nodev,nosuid        0 0
>> /dev/mapper/riven-arch       /            nilfs2  rw,noatime,discard  0 0
>> /dev/mapper/riven-home       /home        nilfs2  rw,noatime,discard  0 0
>> /dev/mapper/riven-swap       none         swap    defaults            0 0
>> /dev/riven-proto/supplement  /Supplement  ext4    defaults,noatime    0 0
>> # some NFS mounts excluded
>>
> As I can see, riven-arch, riven-home and riven-swap are under device
> mapper but riven-proto is not. Could you share more details about how
> your Logical Volumes environment was prepared?

I drew some sketches to help illustrate my partitioning scheme because
it's getting quite complicated: http://imgur.com/HC8GstJ,MlKc3DN

riven-proto is also managed by device mapper. An LVM volume can be
specified either as /dev/mapper/<vg>-<lv> or /dev/<vg>/<lv>. Both of these
files are symlinks to the same device. I think I've read somewhere that
the former symlinks are created earlier in the boot process, but other
than that the two forms are basically equivalent.

> Current state of fsck.nilfs2 doesn't give many useful details. But debug
> output of fsck.nilfs2 contains detailed info about first superblock,
> second superblock and segment summaries of all segments. I think that
> this output can give to me more understanding about NILFS2 volume state.
> Could you share debug output of fsck.nilfs2 for me?
>
> You can found archive with fsck.nilf2 source code in this place:
> (http://dubeyko.com/development/FileSystems/NILFS/nilfs-utils-fsck-v.0.04-under-development.tar.gz).
> Please, build fsck.nilfs2 but don't install it. The fsck.nilfs2 on the
> initial state of development. Currently, fsck.nilfs2 doesn't make any
> writing operations. So, you can execute command in such way:
> "fsck.nilfs2 -v debug [device] 2> [output-file]". The output file has
> a big size, usually.

I'll see if I can do that tonight.

> I am preparing patch for NILFS2 driver with debug output. I think that
> it makes sense to get more detail about the issue on your side because
> you can reproduce the issue stably. So, I'll send you this patch as it
> will be ready. Have you opportunity to patch your kernel and share debug
> output for the reproduced issue case?

I've never patched a kernel before but I could give it a try. I probably
won't have time for that until next weekend though, as I will be away from
this particular computer for the next week.

> Thanks,
> Vyacheslav Dubeyko.
>
>

--
Best Regards,
Anton Eliasson

^ permalink raw reply [flat|nested] 37+ messages in thread
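The two LVM path forms mentioned in the message above can be derived from each other. What follows is a minimal sketch, assuming the usual device-mapper naming in which a hyphen inside a volume group or logical volume name is doubled in the /dev/mapper name; `lv_mapper_path` is a hypothetical helper name used for illustration only.

```shell
#!/bin/sh
# lv_mapper_path VG LV
# Hypothetical helper: print the /dev/mapper path that corresponds to
# /dev/VG/LV. Assumes the common LVM convention that "-" inside a VG or
# LV name is escaped as "--" in the device-mapper name.
lv_mapper_path() {
    vg=$(printf '%s' "$1" | sed 's/-/--/g')
    lv=$(printf '%s' "$2" | sed 's/-/--/g')
    printf '/dev/mapper/%s-%s\n' "$vg" "$lv"
}
```

With the volumes from this thread, `lv_mapper_path riven home` prints /dev/mapper/riven-home and `lv_mapper_path riven-proto supplement` prints /dev/mapper/riven--proto-supplement, which is consistent with the /dev/riven-proto/supplement fstab entry also being a device-mapper volume.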
[parent not found: <51A70971.40602-17Olwe7vw2dLC78zk6coLg@public.gmane.org>]
* Re: Broken nilfs2 filesystem
  [not found] ` <51A70971.40602-17Olwe7vw2dLC78zk6coLg@public.gmane.org>
@ 2013-05-30 15:30 ` Anton Eliasson
  [not found] ` <51A770A8.9070105-17Olwe7vw2dLC78zk6coLg@public.gmane.org>
  0 siblings, 1 reply; 37+ messages in thread
From: Anton Eliasson @ 2013-05-30 15:30 UTC (permalink / raw)
To: slava-yeENwD64cLxBDgjK7y7TUQ; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

Anton Eliasson wrote 2013-05-30 10:10:
> Vyacheslav Dubeyko wrote 2013-05-29 08:39:

[snip]
>> Current state of fsck.nilfs2 doesn't give many useful details. But debug
>> output of fsck.nilfs2 contains detailed info about first superblock,
>> second superblock and segment summaries of all segments. I think that
>> this output can give to me more understanding about NILFS2 volume state.
>> Could you share debug output of fsck.nilfs2 for me?
>>
>> You can found archive with fsck.nilf2 source code in this place:
>> (http://dubeyko.com/development/FileSystems/NILFS/nilfs-utils-fsck-v.0.04-under-development.tar.gz).
>> Please, build fsck.nilfs2 but don't install it. The fsck.nilfs2 on the
>> initial state of development. Currently, fsck.nilfs2 doesn't make any
>> writing operations. So, you can execute command in such way:
>> "fsck.nilfs2 -v debug [device] 2> [output-file]". The output file has
>> a big size, usually.
> I'll see if I can do that tonight.

Okay, this is what was printed to stdout by fsck.nilfs2 with debug
verbosity:

fsck.nilfs2 v.0.04-under-development (nilfs-utils 2.1.4)
[UI_INFO]: The NILFS superblocks checking begins.
[FS_INFO]: [SB] [ID: 0x10200020000005f] [SEG: 0 LOG: 0] Superblock state flag *tells* that filesystem stays in mounted state.
[FS_INFO]: [SB] [ID: 0x10200020000006a] Primary and secondary superblocks have different info about last checkpoint.
[FS_INFO]: [SB] [ID: 0x10200020000006b] Primary and secondary superblocks have different info about disk block address of partial segment.
[FS_INFO]: [SB] [ID: 0x10200020000006c] Primary and secondary superblocks have different info about sequential number of partial segment.
[FS_INFO]: [SB] [ID: 0x102000200000070] Primary and secondary superblocks have different last write time.
[FS_INFO]: [SB] [ID: 0x102000200000073] Primary and secondary superblocks have different file system state flags.
[FS_INFO]: [SB] [ID: 0x10200020000003d] NILFS has valid primary and secondary superblocks.
[INTERNAL_INFO]: NILFS has valid primary and secondary superblocks. Requested device: /dev/riven/home.
[UI_INFO]: NILFS volume's segments checking begins.
[UI_INFO]: FSCK currently has partial and experimental checking functionality. Sorry. Functionality is not implemented yet.
[UI_INFO]: All is OK. Have a nice day.

And here's the log that was printed to stderr:
http://antoneliasson.se/publicdump/riven-home-fsck-stderr.log.gz
It's 12 MB gzipped and 457 MB uncompressed.

--
Best Regards,
Anton Eliasson

^ permalink raw reply [flat|nested] 37+ messages in thread
[parent not found: <51A770A8.9070105-17Olwe7vw2dLC78zk6coLg@public.gmane.org>]
* Re: Broken nilfs2 filesystem
  [not found] ` <51A770A8.9070105-17Olwe7vw2dLC78zk6coLg@public.gmane.org>
@ 2013-05-30 20:50 ` Anton Eliasson
  [not found] ` <51A7BB84.3010505-17Olwe7vw2dLC78zk6coLg@public.gmane.org>
  0 siblings, 1 reply; 37+ messages in thread
From: Anton Eliasson @ 2013-05-30 20:50 UTC (permalink / raw)
To: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

Sorry for my frequent posting today. It looks like my system is falling
apart.

Earlier today /home died again. This is the volume "newhome" on the sketch
I sent two e-mails ago. Just like the last time, it involved a file
related to VMware, and just like last time the ro remount was triggered by
my backup program bup as it tried to read the corrupted file. VMware
itself was not running at the time.

I created a new logical volume riven-proto/homesweethome, formatted it as
ext4, changed the fstab entry for /home to /dev/riven-proto/homesweethome,
rebooted and checked the logs. kernel.log said:

May 30 19:41:24 riven kernel: [88324.864707] NILFS: bad btree node (blocknr=16521285): level = 100, flags = 0x3c, nchildren = 29793
May 30 19:41:24 riven kernel: [88324.864716] NILFS error (device dm-4): nilfs_bmap_lookup_contig: broken bmap (inode number=117612)
May 30 19:41:24 riven kernel: [88324.864716]
May 30 19:41:24 riven kernel: [88324.875626] Remounting filesystem read-only
May 30 19:41:24 riven kernel: [88324.875803] NILFS: bad btree node (blocknr=16521285): level = 100, flags = 0x3c, nchildren = 29793
May 30 19:41:24 riven kernel: [88324.875809] NILFS error (device dm-4): nilfs_bmap_lookup_contig: broken bmap (inode number=117612)
May 30 19:41:24 riven kernel: [88324.875809]

This output makes me believe that only one file is corrupted:

$ sudo mount -o ro,norecovery /dev/riven-proto/newhome /mnt
$ cd /mnt/anton/
$ LANG=C find . -type f -exec cat {} >/dev/null \;
cat: ./vmware/WXP/WXP-15dc29db.vmem: Input/output error

Next issue: after said reboot I got these errors:

May 30 20:09:35 riven kernel: [ 7.298727] nilfs_ioctl_move_inode_block: conflicting data buffer: ino=8079, cno=726783, offset=911, blocknr=4812804, vblocknr=565882
May 30 20:09:35 riven kernel: [ 7.299406] NILFS: GC failed during preparation: cannot read source blocks: err=-17

nilfs_cleanerd won't start on the root fs. I get the same errors if I try
to start it manually (`nilfs_cleanerd /dev/riven/arch` as root).

I'd really like to have my SSD back now. Can I dd /home ("old" home on
volume group "riven" that we've been debugging these last few days) to an
image file and then reformat? I could keep riven-proto/newhome around if
you want to debug that as well. As far as I know, riven-proto/newhome died
very cleanly, with no rw mounts after the corruption was first discovered.

--
Best Regards,
Anton Eliasson

^ permalink raw reply [flat|nested] 37+ messages in thread
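The repeated kernel messages in the message above can be reduced to the set of affected inode numbers. This is a rough sketch, assuming the message format shown in that log excerpt ("NILFS error ... (inode number=N)"); `broken_inodes` is a hypothetical helper name, and the log file is whichever file your syslog writes kernel messages to.

```shell
#!/bin/sh
# broken_inodes LOGFILE
# Hypothetical helper: print each distinct inode number that appears in a
# "NILFS error ... (inode number=N)" kernel log line, sorted numerically.
broken_inodes() {
    sed -n 's/.*NILFS error.*(inode number=\([0-9][0-9]*\)).*/\1/p' "$1" |
        sort -n -u
}
```

For the excerpt above this prints 117612. On a read-only mount, something like `find /mnt -xdev -inum 117612` may then map the number back to a path, assuming the inode number in the log is the one the file system reports to userspace.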
[parent not found: <51A7BB84.3010505-17Olwe7vw2dLC78zk6coLg@public.gmane.org>]
* Re: Broken nilfs2 filesystem
  [not found] ` <51A7BB84.3010505-17Olwe7vw2dLC78zk6coLg@public.gmane.org>
@ 2013-05-31 6:39 ` Vyacheslav Dubeyko
  0 siblings, 0 replies; 37+ messages in thread
From: Vyacheslav Dubeyko @ 2013-05-31 6:39 UTC (permalink / raw)
To: Anton Eliasson; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

Hi Anton,

On Thu, 2013-05-30 at 22:50 +0200, Anton Eliasson wrote:

[snip]
>
> I'd really like to have my SSD back now. Can I dd /home ("old" home on
> volume group "riven" that we've been debugging these last few days) to
> an image file and then reformat? I could keep riven-proto/newhome around
> if you want to debug that as well. As far as I know, riven-proto/newhome
> died very cleanly, with no rw mounts after the corruption was first
> discovered.
>

Yes, of course, you can reformat your drive and have a working file
system. Please simply make an image of the corrupted partition with the
reproducible issue. It would be great to have such an image so that the
issue can be investigated further on your side if necessary. In any case,
I'll first try to investigate and fix the issue on my side.

Thank you for all the information and details that you have provided to
us.

Thanks,
Vyacheslav Dubeyko.

^ permalink raw reply [flat|nested] 37+ messages in thread
end of thread, other threads:[~2013-09-04 20:00 UTC | newest]
Thread overview: 37+ messages
[not found] <51F2A8A4.4020400@antoneliasson.se>
[not found] ` <51F2A8A4.4020400-17Olwe7vw2dLC78zk6coLg@public.gmane.org>
2013-07-26 16:52 ` Fwd: Re: Broken nilfs2 filesystem Anton Eliasson
[not found] ` <51F2A945.6050909-17Olwe7vw2dLC78zk6coLg@public.gmane.org>
2013-07-27 16:23 ` Vyacheslav Dubeyko
[not found] ` <9016EBD5-1E01-476F-B1B9-66AE593F4728-yeENwD64cLxBDgjK7y7TUQ@public.gmane.org>
2013-07-27 22:32 ` Anton Eliasson
2013-08-15 10:40 ` Nilfs2 crash debugging (was: Broken nilfs2 filesystem) Anton Eliasson
[not found] ` <520CB032.2000602-17Olwe7vw2dLC78zk6coLg@public.gmane.org>
2013-08-16 7:11 ` Vyacheslav Dubeyko
2013-08-19 19:55 ` Vyacheslav Dubeyko
[not found] ` <FEA41B6A-7D82-4563-AAF5-D5AFA3734D79-yeENwD64cLxBDgjK7y7TUQ@public.gmane.org>
2013-08-25 15:02 ` Nilfs2 crash debugging Anton Eliasson
[not found] ` <521A1C88.9080100-17Olwe7vw2dLC78zk6coLg@public.gmane.org>
2013-08-26 9:56 ` Vyacheslav Dubeyko
2013-08-26 18:37 ` Anton Eliasson
[not found] ` <521BA084.80901-17Olwe7vw2dLC78zk6coLg@public.gmane.org>
2013-08-30 5:58 ` Vyacheslav Dubeyko
2013-09-04 19:39 ` Anton Eliasson
[not found] ` <52278C63.6090303-17Olwe7vw2dLC78zk6coLg@public.gmane.org>
2013-09-04 20:00 ` Vyacheslav Dubeyko
[not found] <51A0A97A.4020503@antoneliasson.se>
[not found] ` <713B7146-DC0C-45AE-9ED2-30EB8F84FA57@dubeyko.com>
[not found] ` <713B7146-DC0C-45AE-9ED2-30EB8F84FA57-yeENwD64cLxBDgjK7y7TUQ@public.gmane.org>
2013-05-27 12:45 ` Broken nilfs2 filesystem Anton Eliasson
[not found] ` <51A35558.1080503-17Olwe7vw2dLC78zk6coLg@public.gmane.org>
2013-05-27 13:23 ` Vyacheslav Dubeyko
2013-05-22 20:33 Anton Eliasson
[not found] ` <519D2B96.9000106-17Olwe7vw2dLC78zk6coLg@public.gmane.org>
2013-05-22 20:36 ` Anton Eliasson
[not found] ` <519D2C32.5040600-17Olwe7vw2dLC78zk6coLg@public.gmane.org>
2013-05-23 1:40 ` Ryusuke Konishi
2013-05-23 6:44 ` Vyacheslav Dubeyko
2013-05-25 11:59 ` Anton Eliasson
[not found] ` <51A0A7A0.6010207-17Olwe7vw2dLC78zk6coLg@public.gmane.org>
2013-05-25 16:26 ` Anton Eliasson
[not found] ` <51A0E62D.5060600-17Olwe7vw2dLC78zk6coLg@public.gmane.org>
2013-05-26 12:54 ` Vyacheslav Dubeyko
2013-05-29 6:39 ` Vyacheslav Dubeyko
2013-05-29 14:37 ` Ryusuke Konishi
[not found] ` <20130529.233757.27789741.konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org>
2013-05-30 6:13 ` Vyacheslav Dubeyko
2013-05-30 6:55 ` Ryusuke Konishi
[not found] ` <20130530.155543.480320022.konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org>
2013-05-30 7:21 ` Vyacheslav Dubeyko
2013-06-06 6:56 ` Vyacheslav Dubeyko
2013-06-06 9:20 ` Reinoud Zandijk
[not found] ` <20130606092054.GA201-HNv6YvNvQKMNqjISwOrxaLFspR4gePGN@public.gmane.org>
2013-06-06 9:34 ` Vyacheslav Dubeyko
2013-06-06 14:19 ` Reinoud Zandijk
2013-06-12 20:12 ` Anton Eliasson
2013-06-12 20:31 ` Anton Eliasson
[not found] ` <51B8DA8E.6020802-17Olwe7vw2dLC78zk6coLg@public.gmane.org>
2013-06-13 10:01 ` Vyacheslav Dubeyko
2013-05-30 8:10 ` Anton Eliasson
[not found] ` <51A70971.40602-17Olwe7vw2dLC78zk6coLg@public.gmane.org>
2013-05-30 15:30 ` Anton Eliasson
[not found] ` <51A770A8.9070105-17Olwe7vw2dLC78zk6coLg@public.gmane.org>
2013-05-30 20:50 ` Anton Eliasson
[not found] ` <51A7BB84.3010505-17Olwe7vw2dLC78zk6coLg@public.gmane.org>
2013-05-31 6:39 ` Vyacheslav Dubeyko