* Fwd: Re: Broken nilfs2 filesystem
       [not found] ` <51F2A8A4.4020400-17Olwe7vw2dLC78zk6coLg@public.gmane.org>
@ 2013-07-26 16:52   ` Anton Eliasson
       [not found]     ` <51F2A945.6050909-17Olwe7vw2dLC78zk6coLg@public.gmane.org>
  0 siblings, 1 reply; 12+ messages in thread
From: Anton Eliasson @ 2013-07-26 16:52 UTC (permalink / raw)
  To: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

I forgot to send this to the list.

-------- Original message --------
Subject: 	Re: Broken nilfs2 filesystem
Date: 	Fri, 26 Jul 2013 18:49:40 +0200
From: 	Anton Eliasson <devel-17Olwe7vw2dLC78zk6coLg@public.gmane.org>
To: 	Vyacheslav Dubeyko <slava-2lV3ebY47BVBDgjK7y7TUQ@public.gmane.org>



Vyacheslav Dubeyko wrote on 2013-07-26 14:37:
> Hi Anton,
>
> Are you ready to try to obtain debug output? I am eagerly awaiting your
> readiness, because your ability to reproduce the issue is very
> important.
>
> I think you need to enable the following configuration options:
> 1. CONFIG_NILFS2_DEBUG_SHOW_ERRORS
> 2. CONFIG_NILFS2_DEBUG_BASE_OPERATIONS
> 3. CONFIG_NILFS2_DEBUG_MDT_FILES
> 4. CONFIG_NILFS2_DEBUG_SEGMENTS_SUBSYSTEM
> 5. CONFIG_NILFS2_DEBUG_BLOCK_MAPPING
> 6. CONFIG_NILFS2_DEBUG_DUMP_STACK
>
> Thanks,
> Vyacheslav Dubeyko.

Hi,
Thanks for the reminder! I tried it out a few weeks ago. My plan then
was to restore the old root system (running Linux 3.9) and /home to the
mechanical hard drive, boot them up and use that system to build the
newer kernel with your patches. I thought this would make minimal changes
to the system I had experienced the bugs with.

Building the kernel took me some time but I eventually succeeded. I
forgot those configuration options though, so I suppose that build did
nothing out of the ordinary, even though it had the patches. I got it to
boot but then ran into the issue of incompatible video drivers. I've
been down that road before and I did not feel like going back.

My next plan was to wait until 3.10 got released into the core
repository which happened yesterday or so. I would then use my fully
updated system (currently running the 3.10 kernel on an ext4 root
filesystem) to build a kernel with the debug patches. It would be a
slightly different system with many updated packages, but at least the
root filesystem would be a healthy ext4. So that's what I did today.
Build log excerpts are at the bottom of this message.

I got the kernel to boot and X to start. But as soon as anything tried
to read from /home (even just logging into a virtual terminal as a
regular user), that terminal froze. Logging in as root to a different VT
showed that syslog-ng and nilfs_cleanerd took 100% CPU each. After a
while the entire system froze. Pressing Alt+SysRq+R caused the kernel to
spew out an endless stream of call traces to the terminal, leaving no
other way to reboot than a hard reset. After a reboot I found that the
running nilfs_cleanerd does not respond to either a SIGTERM or a SIGKILL.

Changing the mount options of /home from rw,noatime,discard to
ro,norecovery,noatime,discard seems to work. No crashes during login to
a VT. Read-only is no fun though.

Next attempt is rw,nogc,noatime,discard. It seems to work also, though
there are a lot of call traces in dmesg. After just a few reboots and a
few minutes of uptime, /var/log/kernel.log, messages.log and
everything.log have grown to 3.5 GB each. My / is only 30 GB so I can't
sustain this for very long.

I have aborted the experiments for today. kernel.log has 35 million
lines and compresses to 220 MB. I've uploaded it here
(http://antoneliasson.se/publicdump/kernel.log.20130726.gz). What should
I do next?

Build logs
=======

* Download kernel package build scripts and patches (from the Arch Build
System). Unpack and apply downstream patches:

     $ makepkg -o
     ==> Skapar paket: linux 3.10.2-1 (fre jul 26 17:08:16 CEST 2013)
     ==> Retrieving sources...
       -> Hittade linux-3.10.tar.xz
       -> Laddar ner patch-3.10.2.xz...
       % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                      Dload  Upload   Total   Spent    Left  Speed
     100   178  100   178    0     0    188      0 --:--:-- --:--:-- --:--:--   188
     100 27876  100 27876    0     0  13761      0  0:00:02  0:00:02 --:--:-- 37978
       -> Hittade config
       -> Hittade config.x86_64
       -> Hittade linux.preset
       -> Hittade change-default-console-loglevel.patch
     ==> Validerar källfiler med md5sums...
         linux-3.10.tar.xz ... Godkänd
         patch-3.10.2.xz ... Godkänd
         config ... Godkänd
         config.x86_64 ... Godkänd
         linux.preset ... Godkänd
         change-default-console-loglevel.patch ... Godkänd
     ==> Extracting sources...
       -> Extraherar linux-3.10.tar.xz med bsdtar
       -> Extraherar patch-3.10.2.xz med xz
     ==> Startar prepare()...
     patching file Documentation/parisc/registers
     patching file MAINTAINERS
     patching file Makefile
     patching file arch/arm/boot/dts/imx23.dtsi
     patching file arch/arm/boot/dts/imx28.dtsi
     patching file arch/arm/boot/dts/imx6dl.dtsi
     patching file arch/arm/boot/dts/imx6q.dtsi
     patching file arch/arm/include/asm/mmu_context.h
     patching file arch/arm/kernel/perf_event.c
     [...]
     patching file mm/page_alloc.c
     patching file mm/slab.c
     patching file net/ceph/auth_none.c
     patching file kernel/printk.c
     Hunk #1 succeeded at 56 with fuzz 2 (offset -2 lines).
     ==> Källor är redo.
     $ ls -l
     -rw-r--r-- 1 anton anton 2,7K 26 jul 00.06 alsa-firmware-loading-3.8.8.patch
     -rw-r--r-- 1 anton anton  605 26 jul 00.06 change-default-console-loglevel.patch
     -rw-r--r-- 1 anton anton 142K 26 jul 17.13 config
     -rw-r--r-- 1 anton anton 142K 26 jul 00.06 config~
     -rw-r--r-- 1 anton anton 138K 26 jul 17.13 config.x86_64
     -rw-r--r-- 1 anton anton 138K 26 jul 00.06 config.x86_64~
     -rw-r--r-- 1 anton anton  926 26 jul 00.06 linux.install
     -rw-r--r-- 1 anton anton  376 26 jul 00.06 linux.preset
     drwxr-xr-x 2 anton anton 4,0K 26 jul 17.06 nilfs2-debug-output-patch-set-25-06-2013
     -rw-r--r-- 1 anton anton  13K 26 jul 00.06 PKGBUILD
     drwxr-xr-x 3 anton anton 4,0K 26 jul 17.08 src

* Apply nilfs2 debug patches:

     $ cd src/linux-3.10/
     $ for file in ../../nilfs2-debug-output-patch-set-25-06-2013/*.patch; do patch -p1 < "$file"; done
     patching file fs/nilfs2/Kconfig
     patching file fs/nilfs2/debug.h
     patching file fs/nilfs2/Kconfig
     patching file fs/nilfs2/debug.h
     patching file fs/nilfs2/dir.c
     patching file fs/nilfs2/file.c
     patching file fs/nilfs2/inode.c
     patching file fs/nilfs2/ioctl.c
     patching file fs/nilfs2/namei.c
     patching file fs/nilfs2/nilfs.h
     [...]
     patching file fs/nilfs2/segbuf.c
     patching file fs/nilfs2/segment.c
     patching file fs/nilfs2/sufile.c
     patching file fs/nilfs2/super.c
     patching file fs/nilfs2/the_nilfs.c
     patching file fs/nilfs2/Kconfig
     patching file fs/nilfs2/debug.h

* Append the following lines to config (just in case) and config.x86_64
(which I assume I will use):

     CONFIG_NILFS2_DEBUG_SHOW_ERRORS=y
     CONFIG_NILFS2_DEBUG_BASE_OPERATIONS=y
     CONFIG_NILFS2_DEBUG_MDT_FILES=y
     CONFIG_NILFS2_DEBUG_SEGMENTS_SUBSYSTEM=y
     CONFIG_NILFS2_DEBUG_BLOCK_MAPPING=y
     CONFIG_NILFS2_DEBUG_DUMP_STACK=y

* Build the kernel. There were some interactive configuration options
concerning nilfs2 that I responded yes to:

     $ makepkg -e
     ==> Skapar paket: linux 3.10.2-1 (fre jul 26 17:16:41 CEST 2013)
     ==> Checking runtime dependencies...
     ==> Checking buildtime dependencies...
     ==> VARNING:  Using existing src/ tree
     ==> Startar build()...
       HOSTCC  scripts/basic/fixdep
       HOSTCC  scripts/kconfig/conf.o
       SHIPPED scripts/kconfig/zconf.tab.c
       SHIPPED scripts/kconfig/zconf.lex.c
       SHIPPED scripts/kconfig/zconf.hash.c
       HOSTCC  scripts/kconfig/zconf.tab.o
       HOSTLD  scripts/kconfig/conf
     scripts/kconfig/conf --silentoldconfig Kconfig
     *
     * Restart config...
     *
     *
     * File systems
     *
     [...]
     NILFS2 file system support (NILFS2_FS) [M/n/y/?] m
       NILFS2 debugging (NILFS2_DEBUG) [N/y/?] (NEW) y
         Use pr_debug() instead of printk() (NILFS2_USE_PR_DEBUG) [N/y/?] (NEW) y
         Show internal errors (NILFS2_DEBUG_SHOW_ERRORS) [N/y/?] (NEW) y
         Enable dump stack output (NILFS2_DEBUG_DUMP_STACK) [N/y/?] (NEW) y
     [...]
     #
     # configuration written to .config
     #
       SYSHDR arch/x86/syscalls/../include/generated/uapi/asm/unistd_32.h
       SYSHDR arch/x86/syscalls/../include/generated/uapi/asm/unistd_64.h
       SYSHDR arch/x86/syscalls/../include/generated/uapi/asm/unistd_x32.h
       SYSTBL arch/x86/syscalls/../include/generated/asm/syscalls_32.h
       SYSHDR arch/x86/syscalls/../include/generated/asm/unistd_32_ia32.h
       SYSHDR arch/x86/syscalls/../include/generated/asm/unistd_64_x32.h
       SYSTBL arch/x86/syscalls/../include/generated/asm/syscalls_64.h
       WRAP    arch/x86/include/generated/asm/clkdev.h
       CHK     include/generated/uapi/linux/version.h
       UPD     include/generated/uapi/linux/version.h
       CHK     include/generated/utsrelease.h
       UPD     include/generated/utsrelease.h
       HOSTCC  arch/x86/tools/relocs_32.o
     [...]
       INSTALL /home/anton/build/linux/pkg/linux/lib/firmware/edgeport/down3.bin
       INSTALL /home/anton/build/linux/pkg/linux/lib/firmware/keyspan_pda/keyspan_pda.fw
       INSTALL /home/anton/build/linux/pkg/linux/lib/firmware/keyspan_pda/xircom_pgs.fw
       INSTALL /home/anton/build/linux/pkg/linux/lib/firmware/cpia2/stv0672_vp4.bin
       INSTALL /home/anton/build/linux/pkg/linux/lib/firmware/yam/1200.bin
       INSTALL /home/anton/build/linux/pkg/linux/lib/firmware/yam/9600.bin
       DEPMOD  3.10.2-1-ARCH
     ==> Städar upp efter installationen...
       -> Rensar oönskade filer...
       -> Komprimerar man och info sidor...
     ==> Creating package "linux"...
       -> Skapar .PKGINFO fil...
       -> Lägger till install fil...
       -> Generating .MTREE file...
       -> Komprimerar paket...
     ==> Startar package_linux-headers()...
     ==> Städar upp efter installationen...
       -> Rensar oönskade filer...
       -> Komprimerar man och info sidor...
     ==> Creating package "linux-headers"...
       -> Skapar .PKGINFO fil...
       -> Generating .MTREE file...
       -> Komprimerar paket...
     ==> Startar package_linux-docs()...
     ==> Städar upp efter installationen...
       -> Rensar oönskade filer...
       -> Komprimerar man och info sidor...
     ==> Creating package "linux-docs"...
       -> Skapar .PKGINFO fil...
       -> Generating .MTREE file...
       -> Komprimerar paket...
     ==> Leaving fakeroot environment.
     ==> Kompilering klar: linux 3.10.2-1 (fre jul 26 17:46:42 CEST 2013)

* Install over the stock 3.10 kernel, point /home to the old nilfs2
filesystem in /etc/fstab and reboot.

-- 
Best Regards,
Anton Eliasson



--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: Broken nilfs2 filesystem
       [not found]     ` <51F2A945.6050909-17Olwe7vw2dLC78zk6coLg@public.gmane.org>
@ 2013-07-27 16:23       ` Vyacheslav Dubeyko
       [not found]         ` <9016EBD5-1E01-476F-B1B9-66AE593F4728-yeENwD64cLxBDgjK7y7TUQ@public.gmane.org>
  0 siblings, 1 reply; 12+ messages in thread
From: Vyacheslav Dubeyko @ 2013-07-27 16:23 UTC (permalink / raw)
  To: Anton Eliasson; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

Hi Anton,

On Jul 26, 2013, at 8:52 PM, Anton Eliasson wrote:

Thank you for your efforts. But, as I understand it, you have not yet
reproduced the issue, and the shared system log doesn't contain
any new details about it. Please see my comments below.

[snip]
> 
> I have aborted the experiments for today. kernel.log has 35 million
> lines and compresses to 220 MB. I've uploaded it here
> (http://antoneliasson.se/publicdump/kernel.log.20130726.gz). What should
> I do next?
> 

Unfortunately, the shared system log doesn't contain any NILFS2
error messages, which means the issue was not reproduced. Are you
really confident that you can reproduce the issue before you start collecting
debug output? Could you first check that the issue is reproducible?

You made one mistake while configuring the debug output. Please see
my comments below.

[snip]
> * Append the following lines to config (just in case) and config.x86_64
> (which I assume I will use):
> 
>    CONFIG_NILFS2_DEBUG_SHOW_ERRORS=y
>    CONFIG_NILFS2_DEBUG_BASE_OPERATIONS=y
>    CONFIG_NILFS2_DEBUG_MDT_FILES=y
>    CONFIG_NILFS2_DEBUG_SEGMENTS_SUBSYSTEM=y
>    CONFIG_NILFS2_DEBUG_BLOCK_MAPPING=y
>    CONFIG_NILFS2_DEBUG_DUMP_STACK=y
> 

I think it is better to use "make menuconfig" to configure the debug output,
because the above-mentioned options depend on other ones.
Please use "make menuconfig", because it is not so easy to describe
which sets of configuration options are valid.
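
Whichever way the options are set, one way to confirm that they survived the
configuration step is to look for them in the generated .config. The following
is only a minimal sketch; the function name missing_opts is hypothetical and
not part of the kernel tree or the patch set:

```shell
# Print every requested option that is NOT set to "=y" in a kernel config file.
# Usage: missing_opts CONFIG_FILE OPTION...
missing_opts() {
    cfg=$1
    shift
    for opt in "$@"; do
        # -q: no output, -x: match the whole line, -F: fixed string (no regex)
        grep -qxF "${opt}=y" "$cfg" || echo "$opt"
    done
}
```

For example, "missing_opts .config CONFIG_NILFS2_DEBUG_BASE_OPERATIONS
CONFIG_NILFS2_DEBUG_MDT_FILES" prints nothing when both options are enabled,
and prints the name of each option that was dropped because a dependency was
not met.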

[snip]
>    * File systems
>    *
>    [...]
>    NILFS2 file system support (NILFS2_FS) [M/n/y/?] m
>      NILFS2 debugging (NILFS2_DEBUG) [N/y/?] (NEW) y
>        Use pr_debug() instead of printk() (NILFS2_USE_PR_DEBUG) [N/y/?] (NEW) y

No, no, no... When you select pr_debug(), you disable the
CONFIG_NILFS2_DEBUG_BASE_OPERATIONS,
CONFIG_NILFS2_DEBUG_MDT_FILES, CONFIG_NILFS2_DEBUG_SEGMENTS_SUBSYSTEM and
CONFIG_NILFS2_DEBUG_BLOCK_MAPPING options, because you then need to use
the dynamic debug facility (please see Documentation/dynamic-debug-howto.txt).
Moreover, when you select CONFIG_NILFS2_DEBUG_DUMP_STACK in the dynamic
debug case, every function emits dump_stack() output. Please read the
comments for the configuration options.

First, I want to get debug output without enabling pr_debug(). With plain printk()
we will have debug output only from the requested subsystems. So the improper
configuration of debug output is the reason for the huge size of the system log. I suggest
not using the CONFIG_NILFS2_DEBUG_DUMP_STACK option at first.
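
For reference, if the pr_debug() variant were used later, its output would be
switched on per module through the dynamic debug control file. This is a
sketch only, assuming CONFIG_DYNAMIC_DEBUG is enabled and debugfs is mounted
at /sys/kernel/debug; the helper name ddebug_query is hypothetical:

```shell
# Build a dynamic debug query that enables (+p) or disables (-p) all
# pr_debug() call sites of one module.
# Usage: ddebug_query MODULE [ACTION]   (ACTION defaults to +p)
ddebug_query() {
    module=$1
    action=${2:-+p}
    printf 'module %s %s\n' "$module" "$action"
}
```

As root, "ddebug_query nilfs2 > /sys/kernel/debug/dynamic_debug/control" would
enable the messages and "ddebug_query nilfs2 -p > ..." would disable them again.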

>        Show internal errors (NILFS2_DEBUG_SHOW_ERRORS) [N/y/?] (NEW) y
>        Enable dump stack output (NILFS2_DEBUG_DUMP_STACK) [N/y/?] (NEW) y
> 

So, first of all, we need to reproduce the issue in its initial state. Then the
debug output needs to be configured properly and collected while the issue is reproduced.

Thanks,
Vyacheslav Dubeyko.


* Re: Broken nilfs2 filesystem
       [not found]         ` <9016EBD5-1E01-476F-B1B9-66AE593F4728-yeENwD64cLxBDgjK7y7TUQ@public.gmane.org>
@ 2013-07-27 22:32           ` Anton Eliasson
  2013-08-15 10:40           ` Nilfs2 crash debugging (was: Broken nilfs2 filesystem) Anton Eliasson
  1 sibling, 0 replies; 12+ messages in thread
From: Anton Eliasson @ 2013-07-27 22:32 UTC (permalink / raw)
  To: Vyacheslav Dubeyko; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

Vyacheslav Dubeyko wrote on 2013-07-27 18:23:
> Hi Anton,
>
> On Jul 26, 2013, at 8:52 PM, Anton Eliasson wrote:
>
> Thank you for your efforts. But, as I understand it, you have not yet
> reproduced the issue, and the shared system log doesn't contain
> any new details about it. Please see my comments below.
That is correct, I just wanted to know if I was on the right track (and
it turned out that I wasn't).
> [snip]
>> I have aborted the experiments for today. kernel.log has 35 million
>> lines and compresses to 220 MB. I've uploaded it here
>> (http://antoneliasson.se/publicdump/kernel.log.20130726.gz). What should
>> I do next?
>>
> Unfortunately, the shared system log doesn't contain any NILFS2
> error messages, which means the issue was not reproduced. Are you
> really confident that you can reproduce the issue before you start collecting
> debug output? Could you first check that the issue is reproducible?
You're right, I should try that first. Unfortunately, I'll be away from 
this computer again for the next week or two. I'll try to allocate some 
time for this investigation after that. I can access it via SSH so I 
might spend an evening recompiling the kernel remotely, but I don't want 
to reboot the computer remotely.
>
> You made one mistake while configuring the debug output. Please see
> my comments below.
>
> [snip]
>> * Append the following lines to config (just in case) and config.x86_64
>> (which I assume I will use):
>>
>>     CONFIG_NILFS2_DEBUG_SHOW_ERRORS=y
>>     CONFIG_NILFS2_DEBUG_BASE_OPERATIONS=y
>>     CONFIG_NILFS2_DEBUG_MDT_FILES=y
>>     CONFIG_NILFS2_DEBUG_SEGMENTS_SUBSYSTEM=y
>>     CONFIG_NILFS2_DEBUG_BLOCK_MAPPING=y
>>     CONFIG_NILFS2_DEBUG_DUMP_STACK=y
>>
> I think it is better to use "make menuconfig" to configure the debug output,
> because the above-mentioned options depend on other ones.
> Please use "make menuconfig", because it is not so easy to describe
> which sets of configuration options are valid.
>
> [snip]
>>     * File systems
>>     *
>>     [...]
>>     NILFS2 file system support (NILFS2_FS) [M/n/y/?] m
>>       NILFS2 debugging (NILFS2_DEBUG) [N/y/?] (NEW) y
>>         Use pr_debug() instead of printk() (NILFS2_USE_PR_DEBUG) [N/y/?] (NEW) y
> No, no, no... When you select pr_debug(), you disable the
> CONFIG_NILFS2_DEBUG_BASE_OPERATIONS,
> CONFIG_NILFS2_DEBUG_MDT_FILES, CONFIG_NILFS2_DEBUG_SEGMENTS_SUBSYSTEM and
> CONFIG_NILFS2_DEBUG_BLOCK_MAPPING options, because you then need to use
> the dynamic debug facility (please see Documentation/dynamic-debug-howto.txt).
> Moreover, when you select CONFIG_NILFS2_DEBUG_DUMP_STACK in the dynamic
> debug case, every function emits dump_stack() output. Please read the
> comments for the configuration options.
>
> First, I want to get debug output without enabling pr_debug(). With plain printk()
> we will have debug output only from the requested subsystems. So the improper
> configuration of debug output is the reason for the huge size of the system log. I suggest
> not using the CONFIG_NILFS2_DEBUG_DUMP_STACK option at first.
>
>>         Show internal errors (NILFS2_DEBUG_SHOW_ERRORS) [N/y/?] (NEW) y
>>         Enable dump stack output (NILFS2_DEBUG_DUMP_STACK) [N/y/?] (NEW) y
>>
> So, first of all, we need to reproduce the issue in its initial state. Then the
> debug output needs to be configured properly and collected while the issue is reproduced.
>
> Thanks,
> Vyacheslav Dubeyko.
>
Thanks for all your advice. I'm very new to compiling and configuring 
kernels. I will keep you updated on how my next attempt works out.

-- 
Best Regards,
Anton Eliasson


* Nilfs2 crash debugging (was: Broken nilfs2 filesystem)
       [not found]         ` <9016EBD5-1E01-476F-B1B9-66AE593F4728-yeENwD64cLxBDgjK7y7TUQ@public.gmane.org>
  2013-07-27 22:32           ` Anton Eliasson
@ 2013-08-15 10:40           ` Anton Eliasson
       [not found]             ` <520CB032.2000602-17Olwe7vw2dLC78zk6coLg@public.gmane.org>
  1 sibling, 1 reply; 12+ messages in thread
From: Anton Eliasson @ 2013-08-15 10:40 UTC (permalink / raw)
  To: Vyacheslav Dubeyko; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

Vyacheslav Dubeyko wrote on 2013-07-27 18:23:
> Hi Anton,
>
> On Jul 26, 2013, at 8:52 PM, Anton Eliasson wrote:
>
> Thank you for your efforts. But, as I understand it, you have not yet
> reproduced the issue, and the shared system log doesn't contain
> any new details about it. Please see my comments below.
>
[snip]

Hi again. I was able to reproduce the crash on a fully updated system by 
starting the two virtual machines simultaneously as described in my 
e-mail from May 25. I made a new attempt to rebuild the kernel with your 
patches. I selected these options in make menuconfig [1], which resulted 
in this generated config.x86_64 [2] which has the following diff 
compared to the stock config.x86_64:

     --- config.x86_64    2013-08-11 00:06:09.000000000 +0200
     +++ config.x86_64.last    2013-08-11 12:48:44.094979947 +0200
     @@ -1,6 +1,6 @@
      #
      # Automatically generated file; DO NOT EDIT.
     -# Linux/x86 3.10.0-1 Kernel Configuration
     +# Linux/x86 3.10.5-1 Kernel Configuration
      #
      CONFIG_64BIT=y
      CONFIG_X86_64=y
     @@ -5450,6 +5450,11 @@
      # CONFIG_BTRFS_FS_RUN_SANITY_TESTS is not set
      # CONFIG_BTRFS_DEBUG is not set
      CONFIG_NILFS2_FS=m
     +CONFIG_NILFS2_DEBUG=y
     +# CONFIG_NILFS2_USE_PR_DEBUG is not set
     +CONFIG_NILFS2_DEBUG_SHOW_ERRORS=y
     +CONFIG_NILFS2_DEBUG_DUMP_STACK=y
     +# CONFIG_NILFS2_DEBUG_SUBSYSTEMS is not set
      CONFIG_FS_POSIX_ACL=y
      CONFIG_EXPORTFS=y
      CONFIG_FILE_LOCKING=y

I hope those build options are the ones you want. Using the custom
kernel and mount options, I could reproduce the crash right away. Here's
the log [3] (crash at timestamp
"Aug 15 10:26:26 riven kernel: [  376.625992]"). The cleaner wasn't running
at the time. I don't remember if I used the mount option nogc or if I killed
it manually after booting up.

Because of these uncertainties and the fact that the log is a bit messy, 
I attempted to rotate the logs, reboot and try again. Of course, that 
caused this heisenbug to disappear again. I produced some pretty logs 
showing lots of errors without the cleaner [4], with the cleaner started 
manually [5] and with the cleaner started at boot [6]. None of them show 
the crash however so they may be of limited use for you.

Okay, one final attempt. I reinstalled the stock kernel and managed to 
crash the system using the virtual machines like before. I then 
reinstalled the custom kernel, rotated the logs, rebooted with the mount 
options "rw,noatime,discard", left the cleanerd running and fired up 
VMware. I was happy to see the system die as expected. [7] and [8] 
should contain beautiful logs of everything from boot to crash.

[1]: http://antoneliasson.se/publicdump/menuconfig.png
[2]: http://antoneliasson.se/publicdump/config.x86_64.last
[3]: http://antoneliasson.se/publicdump/kernel.log.2.gz
[4]: 
http://antoneliasson.se/publicdump/kernel.log.nogc-nocleanerd-nocrash.2013-08-15.1048.log.gz
[5]: 
http://antoneliasson.se/publicdump/kernel.log.nogc-cleanerd-nocrash.2013-08-15.1054.log.gz
[6]: 
http://antoneliasson.se/publicdump/kernel.log.gc-cleanerd-nocrash.2013-08-15.1104.log.gz
[7]: 
http://antoneliasson.se/publicdump/kernel.log.gc-cleanerd-crash.2013-08-15.1205.log.gz
[8]: 
http://antoneliasson.se/publicdump/everything.log.gc-cleanerd-crash.2013-08-15.1211.log.gz

-- 
Best Regards,
Anton Eliasson


* Re: Nilfs2 crash debugging (was: Broken nilfs2 filesystem)
       [not found]             ` <520CB032.2000602-17Olwe7vw2dLC78zk6coLg@public.gmane.org>
@ 2013-08-16  7:11               ` Vyacheslav Dubeyko
  2013-08-19 19:55               ` Vyacheslav Dubeyko
  1 sibling, 0 replies; 12+ messages in thread
From: Vyacheslav Dubeyko @ 2013-08-16  7:11 UTC (permalink / raw)
  To: Anton Eliasson; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

Hi Anton,

On Thu, 2013-08-15 at 12:40 +0200, Anton Eliasson wrote:

> Hi again. I was able to reproduce the crash on a fully updated system by 
> starting the two virtual machines simultaneously as described in my 
> e-mail from May 25. I made a new attempt to rebuild the kernel with your 
> patches. I selected these options in make menuconfig [1], which resulted 
> in this generated config.x86_64 [2] which has the following diff 
> compared to the stock config.x86_64:
> 

Thank you for sharing the debug output for the issue. I am going to
look through the shared material this weekend (or next
week). Unfortunately, I have a busy week.

Thanks,
Vyacheslav Dubeyko.



* Re: Nilfs2 crash debugging (was: Broken nilfs2 filesystem)
       [not found]             ` <520CB032.2000602-17Olwe7vw2dLC78zk6coLg@public.gmane.org>
  2013-08-16  7:11               ` Vyacheslav Dubeyko
@ 2013-08-19 19:55               ` Vyacheslav Dubeyko
       [not found]                 ` <FEA41B6A-7D82-4563-AAF5-D5AFA3734D79-yeENwD64cLxBDgjK7y7TUQ@public.gmane.org>
  1 sibling, 1 reply; 12+ messages in thread
From: Vyacheslav Dubeyko @ 2013-08-19 19:55 UTC (permalink / raw)
  To: Anton Eliasson; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA


On Aug 15, 2013, at 2:40 PM, Anton Eliasson wrote:

[snip]
> Hi again. I was able to reproduce the crash on a fully updated system by starting the two virtual machines simultaneously as described in my e-mail from May 25. I made a new attempt to rebuild the kernel with your patches. I selected these options in make menuconfig [1], which resulted in this generated config.x86_64 [2] which has the following diff compared to the stock config.x86_64:
> 

As I remember, you initially reported an issue where the file system was
remounted in RO mode with many "broken bnode" error messages. Unfortunately,
as far as I can see, you can't reproduce this issue. I had really hoped that you
could reproduce this important issue.

As I see it, the logs with the crash that you shared contain details about an issue
that was also reported by Jérôme Poulin <jeromepoulin-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>.
I mean this error message:

[  304.494448] BUG: unable to handle kernel paging request at 00000000000013f6     
[  304.494456] IP: [<ffffffffa1327232>] nilfs_end_page_io+0x12/0xc0 [nilfs2]

I can reproduce this issue on my side, and it is still under investigation.
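
To locate this kind of message in one of the compressed logs, something like
the following could be used. It is a minimal sketch; the function name
find_kernel_bugs is hypothetical, and the -A option assumes GNU or BSD grep:

```shell
# Print each "BUG: unable to handle kernel ..." line of a gzip-compressed log
# with its line number, followed by the next line (in the excerpt above that
# is the "IP:" line naming the faulting function).
# Usage: find_kernel_bugs LOG.gz
find_kernel_bugs() {
    gzip -dc < "$1" | grep -n -A1 'BUG: unable to handle kernel'
}
```

Matching lines are printed as "NUMBER:text" and the following context line as
"NUMBER-text", so the position of the crash in a very large log can be found
without unpacking it to disk.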

But anyway... Could you try to reproduce the issue with the file system
being remounted in RO mode? It is a really important and annoying issue.

>    --- config.x86_64    2013-08-11 00:06:09.000000000 +0200
>    +++ config.x86_64.last    2013-08-11 12:48:44.094979947 +0200
>    @@ -1,6 +1,6 @@
>     #
>     # Automatically generated file; DO NOT EDIT.
>    -# Linux/x86 3.10.0-1 Kernel Configuration
>    +# Linux/x86 3.10.5-1 Kernel Configuration
>     #
>     CONFIG_64BIT=y
>     CONFIG_X86_64=y
>    @@ -5450,6 +5450,11 @@
>     # CONFIG_BTRFS_FS_RUN_SANITY_TESTS is not set
>     # CONFIG_BTRFS_DEBUG is not set
>     CONFIG_NILFS2_FS=m
>    +CONFIG_NILFS2_DEBUG=y
>    +# CONFIG_NILFS2_USE_PR_DEBUG is not set
>    +CONFIG_NILFS2_DEBUG_SHOW_ERRORS=y
>    +CONFIG_NILFS2_DEBUG_DUMP_STACK=y
>    +# CONFIG_NILFS2_DEBUG_SUBSYSTEMS is not set
>     CONFIG_FS_POSIX_ACL=y
>     CONFIG_EXPORTFS=y
>     CONFIG_FILE_LOCKING=y
> 

As I remember, I asked you to enable more configuration options.
I mean these options:
CONFIG_NILFS2_DEBUG_BASE_OPERATIONS,
CONFIG_NILFS2_DEBUG_MDT_FILES, 
CONFIG_NILFS2_DEBUG_SEGMENTS_SUBSYSTEM,
CONFIG_NILFS2_DEBUG_BLOCK_MAPPING.

I suppose you didn't enable these options because they depend
on the "Enable output from subsystem" option. But, anyway, I am afraid
that you won't reproduce the issue with these options enabled.
But maybe you will be luckier when you try. :)

Anyway, thank you for your efforts. It would be really great if you were lucky
and reproduced the issue with the file system remounted in RO mode
and many "broken bnode" error messages. Could you try again?

Thanks,
Vyacheslav Dubeyko.


* Re: Nilfs2 crash debugging
       [not found]                 ` <FEA41B6A-7D82-4563-AAF5-D5AFA3734D79-yeENwD64cLxBDgjK7y7TUQ@public.gmane.org>
@ 2013-08-25 15:02                   ` Anton Eliasson
       [not found]                     ` <521A1C88.9080100-17Olwe7vw2dLC78zk6coLg@public.gmane.org>
  0 siblings, 1 reply; 12+ messages in thread
From: Anton Eliasson @ 2013-08-25 15:02 UTC (permalink / raw)
  To: Vyacheslav Dubeyko; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

Vyacheslav Dubeyko wrote on 2013-08-19 21:55:
> On Aug 15, 2013, at 2:40 PM, Anton Eliasson wrote:
>
> [snip]
>> Hi again. I was able to reproduce the crash on a fully updated system by starting the two virtual machines simultaneously as described in my e-mail from May 25. I made a new attempt to rebuild the kernel with your patches. I selected these options in make menuconfig [1], which resulted in this generated config.x86_64 [2] which has the following diff compared to the stock config.x86_64:
>>
> As I remember, you initially reported an issue where the file system was
> remounted in RO mode with many "broken bnode" error messages. Unfortunately,
> as far as I can see, you can't reproduce this issue. I had really hoped that you
> could reproduce this important issue.
>
> As I see it, the logs with the crash that you shared contain details about an issue
> that was also reported by Jérôme Poulin <jeromepoulin-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>.
> I mean this error message:
>
> [  304.494448] BUG: unable to handle kernel paging request at 00000000000013f6
> [  304.494456] IP: [<ffffffffa1327232>] nilfs_end_page_io+0x12/0xc0 [nilfs2]
>
> I can reproduce this issue on my side, and it is still under investigation.
>
> But anyway... Could you try to reproduce the issue with the file system
> being remounted in RO mode? It is a really important and annoying issue.
Yes, that one is much easier to reproduce. I simply try to read one of 
the corrupted files in /home. See below. I have no idea how the actual 
corruption happened, however.

> [...]
> As I remember, I asked you to enable more configuration options.
> I mean these options:
> CONFIG_NILFS2_DEBUG_BASE_OPERATIONS,
> CONFIG_NILFS2_DEBUG_MDT_FILES,
> CONFIG_NILFS2_DEBUG_SEGMENTS_SUBSYSTEM,
> CONFIG_NILFS2_DEBUG_BLOCK_MAPPING.
>
> I suppose you didn't enable these options because they depend
> on the "Enable output from subsystem" option. But, anyway, I am afraid
> that you won't reproduce the issue with these options enabled.
> But maybe you will be luckier when you try. :)
I think I got it right this time. The missing options appeared after I 
enabled CONFIG_NILFS2_DEBUG_SUBSYSTEMS. The config I used is here [1], 
which has the following diff compared to the upstream config:

     --- config.x86_64    2013-08-25 06:53:05.000000000 +0200
     +++ config.x86_64.last    2013-08-25 15:24:51.118711529 +0200
     @@ -1,6 +1,6 @@
      #
      # Automatically generated file; DO NOT EDIT.
     -# Linux/x86 3.10.5-1 Kernel Configuration
     +# Linux/x86 3.10.9-1 Kernel Configuration
      #
      CONFIG_64BIT=y
      CONFIG_X86_64=y
     @@ -5452,6 +5452,20 @@
      # CONFIG_BTRFS_FS_RUN_SANITY_TESTS is not set
      # CONFIG_BTRFS_DEBUG is not set
      CONFIG_NILFS2_FS=m
     +CONFIG_NILFS2_DEBUG=y
     +# CONFIG_NILFS2_USE_PR_DEBUG is not set
     +CONFIG_NILFS2_DEBUG_SHOW_ERRORS=y
     +CONFIG_NILFS2_DEBUG_DUMP_STACK=y
     +CONFIG_NILFS2_DEBUG_SUBSYSTEMS=y
     +CONFIG_NILFS2_DEBUG_BASE_OPERATIONS=y
     +CONFIG_NILFS2_DEBUG_MDT_FILES=y
     +CONFIG_NILFS2_DEBUG_SEGMENTS_SUBSYSTEM=y
     +# CONFIG_NILFS2_DEBUG_GC_SUBSYSTEM is not set
     +# CONFIG_NILFS2_DEBUG_RECOVERY_SUBSYSTEM is not set
     +CONFIG_NILFS2_DEBUG_BLOCK_MAPPING=y
     +# CONFIG_NILFS2_DEBUG_BUFFER_MANAGEMENT is not set
     +# CONFIG_NILFS2_DEBUG_SHOW_SPAM is not set
     +# CONFIG_NILFS2_DEBUG_HEXDUMP is not set
      CONFIG_FS_POSIX_ACL=y
      CONFIG_EXPORTFS=y
      CONFIG_FILE_LOCKING=y
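
Since the sub-options only show up once CONFIG_NILFS2_DEBUG_SUBSYSTEMS is enabled, it is easy to build a kernel that silently lacks some of them. A small Python sketch for checking a config file before building is given below. The option names are the ones listed in this thread; the helper names (enabled_options, missing_options) and the sample text are illustrative assumptions, not part of any kernel tooling.

```python
import re

# Options requested in this thread, plus the two parent switches seen in the diff.
WANTED = [
    "CONFIG_NILFS2_DEBUG",
    "CONFIG_NILFS2_DEBUG_SHOW_ERRORS",
    "CONFIG_NILFS2_DEBUG_DUMP_STACK",
    "CONFIG_NILFS2_DEBUG_SUBSYSTEMS",
    "CONFIG_NILFS2_DEBUG_BASE_OPERATIONS",
    "CONFIG_NILFS2_DEBUG_MDT_FILES",
    "CONFIG_NILFS2_DEBUG_SEGMENTS_SUBSYSTEM",
    "CONFIG_NILFS2_DEBUG_BLOCK_MAPPING",
]


def enabled_options(config_text):
    """Return the set of CONFIG_* symbols set to y or m in kernel config text."""
    enabled = set()
    for line in config_text.splitlines():
        # "# CONFIG_FOO is not set" lines start with '#' and never match.
        m = re.match(r"^(CONFIG_\w+)=([ym])$", line.strip())
        if m:
            enabled.add(m.group(1))
    return enabled


def missing_options(config_text, wanted=WANTED):
    """Return the wanted options that are not enabled, in the order given."""
    on = enabled_options(config_text)
    return [name for name in wanted if name not in on]


# Hypothetical fragment resembling the first, incomplete attempt.
sample = """\
CONFIG_NILFS2_FS=m
CONFIG_NILFS2_DEBUG=y
CONFIG_NILFS2_DEBUG_SHOW_ERRORS=y
CONFIG_NILFS2_DEBUG_DUMP_STACK=y
# CONFIG_NILFS2_DEBUG_SUBSYSTEMS is not set
"""
for name in missing_options(sample):
    print("missing:", name)
```

To check a real file, pass its contents, for example missing_options(open("config.x86_64.last").read()).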

> Anyway, thank you for your efforts. It would be really great if you could
> reproduce the issue where the file system is remounted in RO mode
> with many "broken bnode" error messages. Could you try again?
>
> Thanks,
> Vyacheslav Dubeyko.
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
Yes. Here's another huge kernel.log for you [2]. It's 19 MB compressed 
and 282 MB uncompressed. I blanked the log while running the stock 
kernel and then rebooted to the custom debugging kernel. X wouldn't 
start so I just logged in to a virtual terminal, changed directory to 
"~/Bilder/20130321-28 Jakobs bilder från Nederländerna" and then 
executed `cat 179.JPG >/dev/null`.

This caused a read-only remount and a bunch of "broken bmap" messages to 
show, followed by an "Input/Output error". I saved a copy of 
/var/log/kernel.log as soon as I could after that, before reinstalling 
the stock kernel and rebooting.
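
The reproduction above boils down to two observations: reading the file fails with EIO, and the mount flips to read-only. A rough Python sketch of both checks follows, in case someone wants to script the test across many files. The function names are made up for illustration, and the read-only check assumes the usual /proc/mounts layout (device, mount point, type, options).

```python
import errno


def read_through(path, chunk_size=1 << 20):
    """Read a file to the end, much like `cat FILE >/dev/null`.

    Returns (bytes_read, None) on success, or (bytes_read, errno_name) if the
    kernel reports an error when opening or reading the file.
    """
    total = 0
    try:
        with open(path, "rb") as f:
            while True:
                chunk = f.read(chunk_size)
                if not chunk:
                    break
                total += len(chunk)
    except OSError as e:
        return total, errno.errorcode.get(e.errno, str(e.errno))
    return total, None


def mount_is_readonly(mounts_text, mount_point):
    """Look for 'ro' in the options column of /proc/mounts-style text.

    Returns True or False for the last entry matching mount_point, or None
    if the mount point is not listed at all.
    """
    result = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[1] == mount_point:
            result = "ro" in fields[3].split(",")
    return result
```

A corrupted file would be expected to give something like (N, "EIO"), after which mount_is_readonly(open("/proc/mounts").read(), "/home") should report True. Note that /proc/mounts escapes spaces in mount points as \040.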

[1]: http://antoneliasson.se/publicdump/config.x86_64.last.20130825
[2]: http://antoneliasson.se/publicdump/kernel.log.20130825.gz

-- 
Best Regards,
Anton Eliasson

--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: Nilfs2 crash debugging
       [not found]                     ` <521A1C88.9080100-17Olwe7vw2dLC78zk6coLg@public.gmane.org>
@ 2013-08-26  9:56                       ` Vyacheslav Dubeyko
  2013-08-26 18:37                         ` Anton Eliasson
  0 siblings, 1 reply; 12+ messages in thread
From: Vyacheslav Dubeyko @ 2013-08-26  9:56 UTC (permalink / raw)
  To: Anton Eliasson; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

On Sun, 2013-08-25 at 17:02 +0200, Anton Eliasson wrote:

[snip]
> Yes. Here's another huge kernel.log for you [2]. It's 19 MB compressed 
> and 282 MB uncompressed. I blanked the log while running the stock 
> kernel and then rebooted to the custom debugging kernel. X wouldn't 
> start so I just logged in to a virtual terminal, changed directory to 
> "~/Bilder/20130321-28 Jakobs bilder från Nederländerna" and then 
> executed `cat 179.JPG >/dev/null`.
> 
> This caused a read-only remount and a bunch of "broken bmap" messages to 
> show, followed by an "Input/Output error". I saved a copy of 
> /var/log/kernel.log as soon as I could after that, before reinstalling 
> the stock kernel and rebooting.
> 
> [1]: http://antoneliasson.se/publicdump/config.x86_64.last.20130825
> [2]: http://antoneliasson.se/publicdump/kernel.log.20130825.gz
> 

Yes, it's great. Thank you. Now I can investigate the issue's
environment. I suspect that this issue is related to the issue with
"unable to handle kernel paging request" in nilfs_end_page_io(). But,
maybe, I am wrong. Anyway, it is a good basis for more detailed
understanding of the issue.
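
With a log of this size (282 MB uncompressed), a first pass that only counts the interesting messages can help before reading anything in detail. Below is a minimal Python sketch that streams the gzipped log without unpacking it to disk. The marker strings are taken from the wording used in this thread and are an assumption about what the log actually contains; count_markers is a hypothetical helper name.

```python
import gzip
from collections import Counter

# Marker strings as they are worded in this thread (assumed, not verified
# against the real log).
MARKERS = [
    "broken bmap",
    "unable to handle kernel paging request",
]


def count_markers(log_path, markers=MARKERS):
    """Count how many lines of a gzipped log contain each marker string."""
    counts = Counter()
    # errors="replace" keeps the scan going if the log has stray bytes.
    with gzip.open(log_path, "rt", encoding="utf-8", errors="replace") as f:
        for line in f:
            for marker in markers:
                if marker in line:
                    counts[marker] += 1
    return counts
```

For example, count_markers("kernel.log.20130825.gz") returns a Counter keyed by marker, with zero for any marker that never appears.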

Thanks,
Vyacheslav Dubeyko.



* Re: Nilfs2 crash debugging
  2013-08-26  9:56                       ` Vyacheslav Dubeyko
@ 2013-08-26 18:37                         ` Anton Eliasson
       [not found]                           ` <521BA084.80901-17Olwe7vw2dLC78zk6coLg@public.gmane.org>
  0 siblings, 1 reply; 12+ messages in thread
From: Anton Eliasson @ 2013-08-26 18:37 UTC (permalink / raw)
  To: slava-yeENwD64cLxBDgjK7y7TUQ; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

Vyacheslav Dubeyko wrote on 2013-08-26 11:56:
> On Sun, 2013-08-25 at 17:02 +0200, Anton Eliasson wrote:
>
> [snip]
>> Yes. Here's another huge kernel.log for you [2]. It's 19 MB compressed
>> and 282 MB uncompressed. I blanked the log while running the stock
>> kernel and then rebooted to the custom debugging kernel. X wouldn't
>> start so I just logged in to a virtual terminal, changed directory to
>> "~/Bilder/20130321-28 Jakobs bilder från Nederländerna" and then
>> executed `cat 179.JPG >/dev/null`.
>>
>> This caused a read-only remount and a bunch of "broken bmap" messages to
>> show, followed by an "Input/Output error". I saved a copy of
>> /var/log/kernel.log as soon as I could after that, before reinstalling
>> the stock kernel and rebooting.
>>
>> [1]: http://antoneliasson.se/publicdump/config.x86_64.last.20130825
>> [2]: http://antoneliasson.se/publicdump/kernel.log.20130825.gz
>>
> Yes, it's great. Thank you. Now I can investigate the issue's
> environment. I suspect that this issue is related to the issue with
> "unable to handle kernel paging request" in nilfs_end_page_io(). But,
> maybe, I am wrong. Anyway, it is a good basis for more detailed
> understanding of the issue.
>
> Thanks,
> Vyacheslav Dubeyko.
>
>
You're welcome. And thank you for your thorough instructions. It's been 
very informative and worthwhile for me to patch and build a kernel with 
custom options. Let me know if you need more experiments run on the 
damaged filesystem. Otherwise I'll delete the stored disk images in a 
month or two.

-- 
Best Regards,
Anton Eliasson


* Re: Nilfs2 crash debugging
       [not found]                           ` <521BA084.80901-17Olwe7vw2dLC78zk6coLg@public.gmane.org>
@ 2013-08-30  5:58                             ` Vyacheslav Dubeyko
  2013-09-04 19:39                               ` Anton Eliasson
  0 siblings, 1 reply; 12+ messages in thread
From: Vyacheslav Dubeyko @ 2013-08-30  5:58 UTC (permalink / raw)
  To: Anton Eliasson; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

On Mon, 2013-08-26 at 20:37 +0200, Anton Eliasson wrote:

> >
> >
> You're welcome. And thank you for your thorough instructions. It's been 
> very informative and worthwhile for me to patch and build a kernel with 
> custom options. Let me know if you need more experiments run on the 
> damaged filesystem. Otherwise I'll delete the stored disk images in a 
> month or two.
> 

As I remember, you reproduced the issue by starting two virtual
machines. I think I will try to reproduce the issue the same way. But I
am currently investigating another issue and, unfortunately, I don't
have the opportunity to investigate this one in parallel.

I am not fully confident that it is possible, but could you collect
strace output of the virtual machines starting up in the case where the
issue is reproduced? What do you think? So far you have shared the
kernel log for the reproduced issue, but strace output can give
interesting details from the user-space point of view.
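
One possible way to collect such a trace is sketched below in Python: it only builds the strace command line, either attaching to a running PID or launching a program under strace. The flags used (-f to follow child processes, -tt for timestamps, -o for the output file, -p to attach) are standard strace options. The program name "vmware" and the output file name are placeholders, since the right process to trace is not known here.

```python
import shlex


def strace_command(target, output_path):
    """Build an strace argument list.

    target is either an int PID (attach with -p) or a list of program
    arguments to launch under strace.
    """
    cmd = ["strace", "-f", "-tt", "-o", output_path]
    if isinstance(target, int):
        cmd += ["-p", str(target)]
    else:
        cmd += list(target)
    return cmd


# Placeholder program name; substitute whichever executable starts the VMs.
cmd = strace_command(["vmware"], "vmware-strace.log")
print(" ".join(shlex.quote(arg) for arg in cmd))
```

The resulting list can be passed to subprocess.run(cmd) as is. Attaching with -p usually needs root or matching ptrace permissions.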

Thanks,
Vyacheslav Dubeyko.



* Re: Nilfs2 crash debugging
  2013-08-30  5:58                             ` Vyacheslav Dubeyko
@ 2013-09-04 19:39                               ` Anton Eliasson
       [not found]                                 ` <52278C63.6090303-17Olwe7vw2dLC78zk6coLg@public.gmane.org>
  0 siblings, 1 reply; 12+ messages in thread
From: Anton Eliasson @ 2013-09-04 19:39 UTC (permalink / raw)
  To: slava-yeENwD64cLxBDgjK7y7TUQ; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

Vyacheslav Dubeyko wrote on 2013-08-30 07:58:
> On Mon, 2013-08-26 at 20:37 +0200, Anton Eliasson wrote:
>
>>>
>> You're welcome. And thank you for your thorough instructions. It's been
>> very informative and worthwhile for me to patch and build a kernel with
>> custom options. Let me know if you need more experiments run on the
>> damaged filesystem. Otherwise I'll delete the stored disk images in a
>> month or two.
>>
> As I remember, you reproduced the issue by starting two virtual
> machines. I think I will try to reproduce the issue the same way. But I
> am currently investigating another issue and, unfortunately, I don't
> have the opportunity to investigate this one in parallel.
>
> I am not fully confident that it is possible, but could you collect
> strace output of the virtual machines starting up in the case where the
> issue is reproduced? What do you think? So far you have shared the
> kernel log for the reproduced issue, but strace output can give
> interesting details from the user-space point of view.
>
> Thanks,
> Vyacheslav Dubeyko.
>
>
I spent about an hour trying to reproduce this today. I built Linux 
3.10.10 using your patches from June. The patch command reported some 
offsets and fuzz so it seems that the nilfs driver has changed since the 
last kernel version. I don't know if the updates affect this bug. With 
this new custom kernel, everything I/O related ran very slowly. The nilfs 
garbage collector used 100% CPU constantly. Killing it sped things up a 
little.

I started and stopped the virtual machines a few times, with reboots in 
between. Eventually the system tried to touch some corrupted parts of 
the virtual machine image and /home remounted read-only. At that point I 
gave up. I doubt the strace output will help you but I uploaded it here 
[1] anyway. VMware Workstation is a complex application that consists of 
many executables. Some are run directly by the user, some as system 
services and some as kernel modules. Picking the right place to stick 
the multimeter probe is probably difficult.

Unfortunately I forgot to install syslog-ng today and my instance of 
systemd is not configured to log verbosely enough to capture the kernel 
debug output. So no kernel.log for today. This is all starting to feel 
like a waste of time for me as I don't even use nilfs on any of my 
machines anymore. I'm going to withdraw my offer to debug these issues 
any further. Sorry. I hope you have gathered enough information to solve 
them and I wish you the best of luck.

[1]: http://antoneliasson.se/publicdump/vmware-strace.log.gz

-- 
Best Regards,
Anton Eliasson


* Re: Nilfs2 crash debugging
       [not found]                                 ` <52278C63.6090303-17Olwe7vw2dLC78zk6coLg@public.gmane.org>
@ 2013-09-04 20:00                                   ` Vyacheslav Dubeyko
  0 siblings, 0 replies; 12+ messages in thread
From: Vyacheslav Dubeyko @ 2013-09-04 20:00 UTC (permalink / raw)
  To: Anton Eliasson; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

Hi Anton,

On Sep 4, 2013, at 11:39 PM, Anton Eliasson wrote:

[snip]
>> 
> I spent about an hour trying to reproduce this today. I built Linux 3.10.10 using your patches from June. The patch command reported some offsets and fuzz so it seems that the nilfs driver has changed since the last kernel version. I don't know if the updates affect this bug. With this new custom kernel, everything I/O related ran very slowly. The nilfs garbage collector used 100% CPU constantly. Killing it sped things up a little.
> 
> I started and stopped the virtual machines a few times, with reboots in between. Eventually the system tried to touch some corrupted parts of the virtual machine image and /home remounted read-only. At that point I gave up. I doubt the strace output will help you but I uploaded it here [1] anyway. VMware Workstation is a complex application that consists of many executables. Some are run directly by the user, some as system services and some as kernel modules. Picking the right place to stick the multimeter probe is probably difficult.
> 
> Unfortunately I forgot to install syslog-ng today and my instance of systemd is not configured to log verbosely enough to capture the kernel debug output. So no kernel.log for today. This is all starting to feel like a waste of time for me as I don't even use nilfs on any of my machines anymore. I'm going to withdraw my offer to debug these issues any further. Sorry. I hope you have gathered enough information to solve them and I wish you the best of luck.
> 
> [1]: http://antoneliasson.se/publicdump/vmware-strace.log.gz
> 

Thank you for your efforts.

I think that I have discovered the cause of all the issues that you reported.
I posted the patch ([PATCH] [CRITICAL] nilfs2: fix issue with race condition
of competition between segments for dirty blocks) on Monday. Currently, this
patch is under discussion. So, I hope that NILFS2 will be more stable.

Anyway, your reports were very important for finding the cause of the issues
and for elaborating the fix.

Thanks,
Vyacheslav Dubeyko.


Thread overview: 12+ messages
     [not found] <51F2A8A4.4020400@antoneliasson.se>
     [not found] ` <51F2A8A4.4020400-17Olwe7vw2dLC78zk6coLg@public.gmane.org>
2013-07-26 16:52   ` Fwd: Re: Broken nilfs2 filesystem Anton Eliasson
     [not found]     ` <51F2A945.6050909-17Olwe7vw2dLC78zk6coLg@public.gmane.org>
2013-07-27 16:23       ` Vyacheslav Dubeyko
     [not found]         ` <9016EBD5-1E01-476F-B1B9-66AE593F4728-yeENwD64cLxBDgjK7y7TUQ@public.gmane.org>
2013-07-27 22:32           ` Anton Eliasson
2013-08-15 10:40           ` Nilfs2 crash debugging (was: Broken nilfs2 filesystem) Anton Eliasson
     [not found]             ` <520CB032.2000602-17Olwe7vw2dLC78zk6coLg@public.gmane.org>
2013-08-16  7:11               ` Vyacheslav Dubeyko
2013-08-19 19:55               ` Vyacheslav Dubeyko
     [not found]                 ` <FEA41B6A-7D82-4563-AAF5-D5AFA3734D79-yeENwD64cLxBDgjK7y7TUQ@public.gmane.org>
2013-08-25 15:02                   ` Nilfs2 crash debugging Anton Eliasson
     [not found]                     ` <521A1C88.9080100-17Olwe7vw2dLC78zk6coLg@public.gmane.org>
2013-08-26  9:56                       ` Vyacheslav Dubeyko
2013-08-26 18:37                         ` Anton Eliasson
     [not found]                           ` <521BA084.80901-17Olwe7vw2dLC78zk6coLg@public.gmane.org>
2013-08-30  5:58                             ` Vyacheslav Dubeyko
2013-09-04 19:39                               ` Anton Eliasson
     [not found]                                 ` <52278C63.6090303-17Olwe7vw2dLC78zk6coLg@public.gmane.org>
2013-09-04 20:00                                   ` Vyacheslav Dubeyko
