From: Vyacheslav Dubeyko <slava-yeENwD64cLxBDgjK7y7TUQ@public.gmane.org>
To: Ryusuke Konishi
<konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org>
Cc: Anton Eliasson <devel-17Olwe7vw2dLC78zk6coLg@public.gmane.org>,
linux-nilfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
Subject: Re: Broken nilfs2 filesystem
Date: Thu, 06 Jun 2013 10:56:09 +0400
Message-ID: <1370501769.2202.36.camel@slavad-ubuntu>
In-Reply-To: <20130530.155543.480320022.konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org>
On Thu, 2013-05-30 at 15:55 +0900, Ryusuke Konishi wrote:
> On Thu, 30 May 2013 10:13:05 +0400, Vyacheslav Dubeyko wrote:
> > On Wed, 2013-05-29 at 23:37 +0900, Ryusuke Konishi wrote:
> >> I don't know whether this may be a hint of this trouble, but according
> >> to the system log, page_buffers() of nilfs_end_page_io() seems to hit
> >> an Oops due to an invalid page address "0x36cd":
> >>
> >
> > Yes. There are two possible way to be in nilfs_end_page_io(): (1)
> > nilfs_segctor_complete_write(); (2) nilfs_abort_logs(). Currently, I
> > suspect the nilfs_abort_logs()
>
> That sounds a likely cause.
>
> Can you test nilfs_abort_logs by injecting a random fault in some easy
> way ?
>
So, here is what I have found so far.
First of all, unfortunately, I can't reproduce the issue yet.
I suspect that the aging state of the volume and the peculiarities of
the workload and environment play a very important role in this issue.
As I remember, all reporters of similar symptoms (broken bnode error
messages) mentioned several months of successful use of the NILFS2
file system.
I tried to set up the LVM environment as Anton described it, but I
didn't catch the issue in this environment. So, I think that I haven't
aged the NILFS2 volume properly and that I didn't use the right
workload. The proper workload needs more thought. As I can see from
Anton's system log, there was frequent update and git activity.
Moreover, the update and git activity happened shortly before the crash:
May 22 18:48:45 riven slim[274]: [2013-05-22 18:48:43] Downloading update (37 782 of 41 158 KB)...
May 22 18:48:45 riven slim[274]: [2013-05-22 18:48:43] Downloading update (38 390 of 41 158 KB)...
May 22 18:48:45 riven slim[274]: [2013-05-22 18:48:43] Downloading update (39 066 of 41 158 KB)...
May 22 18:48:45 riven slim[274]: [2013-05-22 18:48:44] Downloading update (39 742 of 41 158 KB)...
May 22 18:48:45 riven slim[274]: [2013-05-22 18:48:44] Downloading update (40 311 of 41 158 KB)...
May 22 18:48:45 riven slim[274]: [2013-05-22 18:48:44] Downloading update (40 956 of 41 158 KB)...
May 22 18:48:45 riven slim[274]: [2013-05-22 18:48:45] Downloading update (41 158 of 41 158 KB)...
May 22 18:50:13 riven slim[274]: [2013-05-22 18:48:45] Downl18:50:13 | Git | default | Checking for remote changes...
May 22 18:50:13 riven slim[274]: 18:50:13 | Cmd | default | git rev-parse HEAD
May 22 18:50:13 riven slim[274]: 18:50:13 | Cmd | default | git ls-remote --heads --exit-code "ssh://storage@hephaestus/home/storage/default" master
May 22 18:50:13 riven slim[274]: 18:50:13 | Git | default | No remote changes, local+remote: 8eab1e96aa618010ff17c11a955f4423d823beb6
May 22 18:50:14 riven slim[274]: 18:50:14 | ListenerTcp | Pinging tcp://notifications.sparkleshare.org:443/
May 22 18:50:14 riven slim[274]: 18:50:14 | ListenerTcp | Received pong from tcp://notifications.sparkleshare.org:443/
May 22 18:53:31 riven kernel: [ 3821.605568] BUG: unable to handle kernel paging request at 00000000000036cd
May 22 18:53:31 riven kernel: [ 3821.605577] IP: [<ffffffffa027f1a2>] nilfs_end_page_io+0x12/0xc0 [nilfs2]
So, maybe, git activity is a possible workload for reproducing the
issue. It needs to be checked, I suppose.
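As a small aid for reading the Oops lines in the log excerpt above, here
is a minimal Python sketch that pulls out the faulting address and the
function+offset of the instruction pointer. The names parse_oops and
KERNEL_SPACE_START are only illustrative, and the kernel-space boundary
assumes x86_64 with 48-bit virtual addresses. It only shows that 0x36cd
lies far below the kernel half of the address space, so it cannot be a
valid pointer to a struct page.

```python
import re

# Two lines copied from the log excerpt above.
OOPS_LINES = [
    "May 22 18:53:31 riven kernel: [ 3821.605568] BUG: unable to handle "
    "kernel paging request at 00000000000036cd",
    "May 22 18:53:31 riven kernel: [ 3821.605577] IP: [<ffffffffa027f1a2>] "
    "nilfs_end_page_io+0x12/0xc0 [nilfs2]",
]

# Assumption: x86_64 with 48-bit virtual addresses, where the kernel half
# of the canonical address space starts here.
KERNEL_SPACE_START = 0xFFFF800000000000

FAULT_RE = re.compile(r"unable to handle kernel paging request at ([0-9a-f]+)")
IP_RE = re.compile(r"IP: \[<([0-9a-f]+)>\] (\w+)\+0x([0-9a-f]+)/0x([0-9a-f]+)")


def parse_oops(lines):
    """Collect fault address, function name, and offset/size from Oops lines."""
    info = {}
    for line in lines:
        m = FAULT_RE.search(line)
        if m:
            info["fault_addr"] = int(m.group(1), 16)
        m = IP_RE.search(line)
        if m:
            info["ip"] = int(m.group(1), 16)
            info["func"] = m.group(2)
            info["offset"] = int(m.group(3), 16)
            info["size"] = int(m.group(4), 16)
    return info


info = parse_oops(OOPS_LINES)
# A valid kernel pointer would be at or above KERNEL_SPACE_START.
info["is_kernel_pointer"] = info["fault_addr"] >= KERNEL_SPACE_START
```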
I tried to simulate the occurrence of errors in the
nilfs_segctor_do_construct() method by removing the error checks at the
following places:
http://lxr.free-electrons.com/source/fs/nilfs2/segment.c#L1942
http://lxr.free-electrons.com/source/fs/nilfs2/segment.c#L1953
http://lxr.free-electrons.com/source/fs/nilfs2/segment.c#L1962
http://lxr.free-electrons.com/source/fs/nilfs2/segment.c#L1976
http://lxr.free-electrons.com/source/fs/nilfs2/segment.c#L1989
Initially, by chance, I simply commented out the error checking
statement. Then I commented out the error checking statement and
additionally set the error code to -EINVAL. It is strange, but if I set
the error code then I don't see any visible failure in the operation of
the NILFS2 driver. But I get a very interesting error in the case when I
simply comment out the error checking statement without setting the
error code:
May 31 15:05:49 slavad-ubuntu nilfs_cleanerd[2409]: run (manual)
May 31 15:05:50 slavad-ubuntu kernel: [ 737.725827] [nilfs_segctor_do_construct] fs/nilfs2/segment.c:1944
May 31 15:05:50 slavad-ubuntu nilfs_cleanerd[2409]: cannot clean segments: File exists
May 31 15:05:50 slavad-ubuntu nilfs_cleanerd[2409]: shutdown
May 31 15:05:50 slavad-ubuntu kernel: [ 737.744660] ------------[ cut here ]------------
May 31 15:05:50 slavad-ubuntu kernel: [ 737.744674] WARNING: at fs/nilfs2/ioctl.c:449 nilfs_ioctl_clean_segments.isra.11+0x667/0x690()
May 31 15:05:50 slavad-ubuntu kernel: [ 737.744676] Hardware name: OptiPlex 760
May 31 15:05:50 slavad-ubuntu kernel: [ 737.744679] Modules linked in: snd_hda_codec_analog snd_hda_intel i915 snd_hda_codec snd_hwdep snd_pcm snd_seq_midi snd_rawmidi snd_seq_midi_event bnep rfcomm snd_seq drm_kms_helper drm bluetooth nfsv4 snd_timer snd_seq_device i2c_algo_bit snd joydev hid_generic soundcore dell_wmi video dcdbas coretemp psmouse serio_raw mei sparse_keymap ppdev snd_page_alloc lpc_ich mac_hid parport_pc microcode wmi lp parport binfmt_misc nfsd nfs_acl auth_rpcgss nfs fscache lockd sunrpc e1000e ptp pps_core usbhid hid
May 31 15:05:50 slavad-ubuntu kernel: [ 737.744746] Pid: 2409, comm: nilfs_cleanerd Tainted: G I 3.9.0-rc6+ #35
May 31 15:05:50 slavad-ubuntu kernel: [ 737.744748] Call Trace:
May 31 15:05:50 slavad-ubuntu kernel: [ 737.744756] [<ffffffff8105c7df>] warn_slowpath_common+0x7f/0xc0
May 31 15:05:50 slavad-ubuntu kernel: [ 737.744760] [<ffffffff8105c83a>] warn_slowpath_null+0x1a/0x20
May 31 15:05:50 slavad-ubuntu kernel: [ 737.744765] [<ffffffff81301837>] nilfs_ioctl_clean_segments.isra.11+0x667/0x690
May 31 15:05:50 slavad-ubuntu kernel: [ 737.744771] [<ffffffff81098f0f>] ? local_clock+0x6f/0x80
May 31 15:05:50 slavad-ubuntu kernel: [ 737.744776] [<ffffffff81301e44>] nilfs_ioctl+0x3d4/0x690
May 31 15:05:50 slavad-ubuntu kernel: [ 737.744781] [<ffffffff810c370f>] ? lock_release_non_nested+0x30f/0x350
May 31 15:05:50 slavad-ubuntu kernel: [ 737.744785] [<ffffffff81098ca5>] ? sched_clock_local+0x25/0x90
May 31 15:05:50 slavad-ubuntu kernel: [ 737.744790] [<ffffffff811b7e26>] do_vfs_ioctl+0x96/0x570
May 31 15:05:50 slavad-ubuntu kernel: [ 737.744795] [<ffffffff81169e4c>] ? might_fault+0x5c/0xb0
May 31 15:05:50 slavad-ubuntu kernel: [ 737.744801] [<ffffffff81748985>] ? sysret_check+0x22/0x5d
May 31 15:05:50 slavad-ubuntu kernel: [ 737.744805] [<ffffffff811b8391>] sys_ioctl+0x91/0xb0
May 31 15:05:50 slavad-ubuntu kernel: [ 737.744809] [<ffffffff813a70be>] ? trace_hardirqs_on_thunk+0x3a/0x3f
May 31 15:05:50 slavad-ubuntu kernel: [ 737.744813] [<ffffffff81748959>] system_call_fastpath+0x16/0x1b
May 31 15:05:50 slavad-ubuntu kernel: [ 737.744816] ---[ end trace 374fc1d251cc46c6 ]---
May 31 15:05:50 slavad-ubuntu kernel: [ 737.744933] NILFS: GC failed during preparation: cannot read source blocks: err=-17
May 31 15:09:44 slavad-ubuntu kernel: [ 972.324583] [nilfs_segctor_do_construct] fs/nilfs2/segment.c:1944
May 31 15:09:49 slavad-ubuntu kernel: [ 977.349257] [nilfs_segctor_do_construct] fs/nilfs2/segment.c:1944
May 31 15:11:57 slavad-ubuntu nilfs_cleanerd[2820]: start
May 31 15:11:57 slavad-ubuntu nilfs_cleanerd[2820]: pause (clean check)
May 31 15:12:08 slavad-ubuntu nilfs_cleanerd[2820]: run (manual)
May 31 15:12:08 slavad-ubuntu nilfs_cleanerd[2820]: cannot clean segments: File exists
May 31 15:12:08 slavad-ubuntu nilfs_cleanerd[2820]: shutdown
May 31 15:12:08 slavad-ubuntu kernel: [ 1115.562880] nilfs_ioctl_move_inode_block: conflicting data buffer: ino=4, cno=0, offset=0, blocknr=2086, vblocknr=232528
May 31 15:12:08 slavad-ubuntu kernel: [ 1115.562887] NILFS: GC failed during preparation: cannot read source blocks: err=-17
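The "File exists" message from nilfs_cleanerd and the "err=-17" in the
kernel message in the log above are the same error code seen from two
sides: 17 is EEXIST in the Linux errno numbering. A minimal Python sketch
of the decoding follows; decode_kernel_err is a hypothetical helper name,
and the sketch assumes it runs on a system that uses the Linux errno
values.

```python
import errno
import os
import re

# Line copied from the kernel log above.
GC_LINE = ("NILFS: GC failed during preparation: "
           "cannot read source blocks: err=-17")


def decode_kernel_err(line):
    """Map an 'err=-N' field to its errno symbol and message text."""
    m = re.search(r"err=(-?\d+)", line)
    if m is None:
        return None
    code = abs(int(m.group(1)))  # the kernel reports errors as negative errno
    return errno.errorcode.get(code, "unknown"), os.strerror(code)


name, text = decode_kernel_err(GC_LINE)
```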
As I understand, this error looks like Anton's latest reports about the
complete failure to use the corrupted NILFS2 volume. So, maybe, it is
possible to assume that, when the issue occurs, segment construction is
aborted continuously and permanently. But simulation by commenting out
the error checking statement without setting the error code is not a
proper workflow for the driver, as I understand. And it confuses
me. Currently, I don't have a clear understanding of it.
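To make the difference between the two injection variants concrete, here
is a toy Python model. It is only a sketch under one possible reading of
the experiment, namely that commenting out the check makes the abort path
run unconditionally while the error code stays 0. The message does not
confirm this reading, and do_construct, caller_view and the inject
values are hypothetical names, not NILFS2 code.

```python
import errno


def do_construct(inject):
    """Toy model of a construction routine with one instrumented check.

    inject = None     : unmodified check, the step succeeds, no abort
    inject = "skip"   : check commented out, abort path taken with err == 0
    inject = "einval" : check commented out and err forced to -EINVAL
    """
    state = {"aborted": False}
    err = 0  # the instrumented step itself succeeds in this model
    if inject == "einval":
        err = -errno.EINVAL
    # "skip" models an unconditional jump to the abort label (assumption).
    if err or inject == "skip":
        state["aborted"] = True  # stand-in for the log-abort path
        return err, state
    return 0, state


def caller_view(inject):
    """Return (caller believes success, logs were actually aborted)."""
    err, state = do_construct(inject)
    # The caller only sees the return code, not the internal abort.
    return err == 0, state["aborted"]
```

In this model only the "skip" variant reports success to the caller
although the construction was aborted, and the "einval" variant behaves
like a regular failure. If the assumption holds, that mismatch could be
why only the first variant produces visible errors.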
So, the investigation of the issue needs to continue, from my
viewpoint.
With the best regards,
Vyacheslav Dubeyko.
> Regards,
> Ryusuke Konishi
>