From: Li Zefan <lizefan@huawei.com>
To: Jan Kara <jack@suse.cz>
Cc: qixuan wu <wuqixuan@gmail.com>, Tao Ma <tm@tao.ma>,
	"Theodore Ts'o" <tytso@mit.edu>,
	Eric Sandeen <sandeen@redhat.com>,
	Yafang Shao <laoar.shao@gmail.com>,
	<linux-fsdevel@vger.kernel.org>, <linux-ext4@vger.kernel.org>,
	<wuqixuan@huawei.com>, <xieshuangyi@huawei.com>
Subject: Re: help about ext3 read-only issue on ext3(2.6.16.30)
Date: Tue, 11 Dec 2012 16:01:51 +0800
Message-ID: <50C6E86F.8040308@huawei.com>
In-Reply-To: <50C1BF05.6020605@huawei.com>

>>> We have already dumped the data with debugfs. The data looks fine,
>>> with no errors. But we did that before running fsck, and even fsck did
>>> not report any error. I want to know whether fsck can modify on-disk
>>> data without reporting any error.
>>   Ah, OK. So it seems that directory block is OK, just  f_pos gets corrupted
>> somehow. There are guards in ext3_readdir() to rescan dir block when
>> directory is modified but maybe that's not working correctly. I don't want
>> to burn too much time on this since this is so ancient kernel but I'd be
>> looking in that direction...
>>
> 
> I've added some debug code into ext3, which does these things:
> - dump the dir block
> - print the current and last f_pos and offset
> - dump_stack() to see which process triggers the bug
> 
> Hope we can trigger the bug in our labs (we did see this happen twice this week
> in a lab), though we can't patch the kernel in the products.
> 
> I compared ext3_readdir() with latest ext3, and saw no difference except some
> API changes. I'll dig deeper. Thanks for the suggestion!
> 

We've managed to trigger the bug once and collected some debug information. The
buffer head wasn't corrupted, but f_pos was set to 4024, after which ext3
reported an error:

EXT3-fs error (device sda7): ext3_readdir: bad entry in directory #12747345: rec_len is smaller than minimal - offset=4024, inode=0, rec_len=0, name_len=0
Aborting journal on device sda7.
ext3_abort called.
EXT3-fs error (device sda7): ext3_journal_start_sb: Detected aborted journal
Remounting filesystem read-only
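
For reference, the first message above comes from the per-entry sanity check
that ext3_readdir() runs on each directory entry. A minimal sketch of that
check in Python, for illustration only: the first message string is the one in
the log above, while the remaining checks and their wording are paraphrased
from memory of fs/ext3/dir.c and should be verified against the 2.6.16 source.
The 4096-byte block size and the helper names are assumptions.

```python
from typing import Optional

BLOCK_SIZE = 4096  # assumption: 4 KiB filesystem block


def ext3_dir_rec_len(name_len: int) -> int:
    """Smallest legal rec_len: 8-byte header plus the name, rounded up to 4."""
    return (name_len + 8 + 3) & ~3


def check_dir_entry(offset: int, rec_len: int, name_len: int,
                    block_size: int = BLOCK_SIZE) -> Optional[str]:
    """Return an error description, or None if the entry looks sane."""
    if rec_len < ext3_dir_rec_len(1):
        return "rec_len is smaller than minimal"
    if rec_len % 4 != 0:
        return "rec_len % 4 != 0"                   # paraphrased
    if rec_len < ext3_dir_rec_len(name_len):
        return "rec_len is too small for name_len"  # paraphrased
    if offset + rec_len > block_size:
        return "directory entry across blocks"      # paraphrased
    return None


# The values reported in the log: offset=4024, rec_len=0, name_len=0.
reported = check_dir_entry(4024, 0, 0)
# The last live entry in the dumped block: offset 204, rec_len 0x0f34, name_len 9.
last_live = check_dir_entry(204, 0x0F34, 9)
```

In this sketch any all-zero region trips the first check, because the minimal
rec_len for a one-character name is 12.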

00000000: 51 82 c2 00 0c 00 01 02 2e 00 00 00 04 80 c2 00  Q...............
00000010: 0c 00 02 02 2e 2e 00 00 d6 80 c2 00 10 00 06 02  ................
00000020: 62 61 63 6b 75 70 00 00 bb 82 c2 00 1c 00 11 01  backup..........
00000030: 4d 6f 6e 69 74 6f 72 53 65 72 76 69 63 65 2e 6f  MonitorService.o
00000040: 70 00 00 00 be 82 c2 00 1c 00 13 01 43 6f 6d 70  p...........Comp
00000050: 6c 61 69 6e 74 50 72 6f 63 65 73 73 2e 6f 70 00  laintProcess.op.
00000060: c2 82 c2 00 20 00 15 01 4c 6f 63 61 74 69 6f 6e  .... ...Location
00000070: 50 72 65 50 72 6f 63 65 73 73 2e 6f 70 00 00 00  PreProcess.op...
00000080: c9 82 c2 00 18 00 0f 01 4e 6f 72 74 68 50 72 6f  ........NorthPro
00000090: 63 65 73 73 2e 6f 70 00 d4 82 c2 00 18 00 0d 01  cess.op.........
000000a0: 53 79 73 4d 6f 6e 69 74 6f 72 2e 6f 70 00 00 00  SysMonitor.op...
000000b0: db 82 c2 00 1c 00 13 01 56 56 49 50 4e 6f 72 74  ........VVIPNort
000000c0: 68 50 72 6f 63 65 73 73 2e 6f 70 00 e1 82 c2 00  hProcess.op.....
000000d0: 34 0f 09 01 72 61 6e 73 61 75 2e 6f 70 00 00 00  4...ransau.op...
000000e0: 4f 83 c2 00 20 0f 1e 01 72 61 6e 73 61 75 2e 6f  O... ...ransau.o
000000f0: 70 2e 32 30 31 32 31 32 31 30 30 32 30 39 32 34  p.20121210020924
00000100: 34 35 31 33 39 34 00 00 79 83 c2 00 f8 0e 18 01  451394..y.......
00000110: 72 61 6e 73 61 75 2e 6f 70 2e 32 30 31 32 31 32  ransau.op.201212
00000120: 31 30 30 32 30 39 32 34 00 00 00 00 00 00 00 00  10020924........
...
00000ff0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................

last_offset=-1, last_fpos=-1, f_pos=4024

-1 means we hit the bug in the first iteration of the inner while loop in ext3_readdir().
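
The dump can be cross-checked by walking the rec_len chain by hand. A minimal
sketch in Python, assuming the standard on-disk ext3_dir_entry_2 layout (le32
inode, le16 rec_len, u8 name_len, u8 file_type, then the name) and a 4096-byte
block. Only the first 0xe0 bytes of the dump are used, so nothing from the
elided region is assumed. The helper name walk_entries is made up for
illustration.

```python
import struct

BLOCK_SIZE = 4096  # assumption: 4 KiB filesystem block

# First 0xe0 bytes of the directory block, copied from the dump above.
DUMP = """
51 82 c2 00 0c 00 01 02 2e 00 00 00 04 80 c2 00
0c 00 02 02 2e 2e 00 00 d6 80 c2 00 10 00 06 02
62 61 63 6b 75 70 00 00 bb 82 c2 00 1c 00 11 01
4d 6f 6e 69 74 6f 72 53 65 72 76 69 63 65 2e 6f
70 00 00 00 be 82 c2 00 1c 00 13 01 43 6f 6d 70
6c 61 69 6e 74 50 72 6f 63 65 73 73 2e 6f 70 00
c2 82 c2 00 20 00 15 01 4c 6f 63 61 74 69 6f 6e
50 72 65 50 72 6f 63 65 73 73 2e 6f 70 00 00 00
c9 82 c2 00 18 00 0f 01 4e 6f 72 74 68 50 72 6f
63 65 73 73 2e 6f 70 00 d4 82 c2 00 18 00 0d 01
53 79 73 4d 6f 6e 69 74 6f 72 2e 6f 70 00 00 00
db 82 c2 00 1c 00 13 01 56 56 49 50 4e 6f 72 74
68 50 72 6f 63 65 73 73 2e 6f 70 00 e1 82 c2 00
34 0f 09 01 72 61 6e 73 61 75 2e 6f 70 00 00 00
"""
data = bytes.fromhex("".join(DUMP.split()))


def walk_entries(block, block_size=BLOCK_SIZE):
    """Follow rec_len from offset 0; return (entries, offset where the walk stopped)."""
    entries = []
    off = 0
    while off < block_size and off + 8 <= len(block):
        inode, rec_len, name_len, _ftype = struct.unpack_from("<IHBB", block, off)
        name = block[off + 8:off + 8 + name_len].decode("ascii")
        entries.append((off, inode, rec_len, name))
        if rec_len < 12:  # smaller than the minimal record: stop, as the kernel would complain
            break
        off += rec_len
    return entries, off


entries, end = walk_entries(data)
offsets = [e[0] for e in entries]
for off, inode, rec_len, name in entries:
    print(f"offset={off:4d} inode={inode} rec_len={rec_len:4d} name={name}")
```

On these bytes the chain visits entries at offsets 0, 12, 24, 40, 68, 96, 128,
152, 176 and 204, and the last one (ransau.op, rec_len 3892) runs to exactly
4096. So 4024 is not an entry boundary in this block; it falls inside the
space covered by the last entry.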

I've checked how ext3_readdir() works and how f_pos, f_version and i_version
get initialized and modified. Now I'm lost. I really can't see how f_pos got
corrupted. :(
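
For context, the guard Jan mentioned re-walks the block from offset 0 when
f_version differs from i_version, so that the in-block offset lands on an
entry boundary again. A rough Python model of that resync loop follows,
written from memory of ext3_readdir() and not copied from the 2.6.16 source.
The rec_len list is the chain read from the dump above, and resync_offset is
a made-up name.

```python
BLOCK_SIZE = 4096   # assumption: 4 KiB filesystem block
MIN_REC_LEN = 12    # minimal record length, i.e. EXT3_DIR_REC_LEN(1)

# rec_len values of the live entries in the dumped block, in order.
REC_LENS = [12, 12, 16, 28, 28, 32, 24, 24, 28, 3892]


def resync_offset(rec_lens, stale_offset, block_size=BLOCK_SIZE):
    """Walk entries from the block start until reaching or passing stale_offset."""
    # Map each entry's start offset to its rec_len.
    rec_len_at = {}
    off = 0
    for rec_len in rec_lens:
        rec_len_at[off] = rec_len
        off += rec_len

    i = 0
    while i < block_size and i < stale_offset:
        rec_len = rec_len_at.get(i, 0)
        if rec_len < MIN_REC_LEN:
            break  # bad entry: stop here and let the normal check report it
        i += rec_len
    return i


snapped = resync_offset(REC_LENS, 4024)
```

In this model an in-block offset of 4024 is snapped forward to 4096, the end
of the block. If the real guard behaves like the sketch, the error would not
be reported whenever the guard runs, which points at the case where f_version
and i_version still match.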

A sample strace output:

10062 getdents64(46, /* 4 entries */, 4096) = 136
10062 stat64("/xxx/current_log.txt", {st_mode=S_IFREG|0600, st_size=4436494, ...}) = 0
10062 stat64("/xxx/20121205054350.593907.txt", {st_mode=S_IFREG|0600, st_size=8388846, ...}) = 0
10062 getdents64(46, /* 0 entries */, 4096) = 0

In the second call to getdents64(), f_pos should have been 4096, but somehow it
was changed to 4024. But how?
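
As a sanity check on the strace sample, the 136 bytes returned by the first
getdents64() can be reproduced from the record sizes. A minimal sketch,
assuming the usual struct linux_dirent64 layout (d_name at byte offset 19,
each record padded to a multiple of 8) and assuming the four entries are ".",
".." and the two files that were stat64()'ed afterwards:

```python
NAME_OFFSET = 19  # d_ino (8) + d_off (8) + d_reclen (2) + d_type (1)


def dirent64_reclen(name: str) -> int:
    """Record size: header + name + NUL terminator, rounded up to 8 bytes."""
    return (NAME_OFFSET + len(name.encode()) + 1 + 7) & ~7


# Assumed directory contents; only the two file names appear in the strace.
names = [".", "..", "current_log.txt", "20121205054350.593907.txt"]
sizes = [dirent64_reclen(n) for n in names]
total = sum(sizes)
print(sizes, total)
```

Under those assumptions the record sizes are 24, 24, 40 and 48 bytes, which
add up to the 136 bytes shown by strace, so the first call appears to have
returned every entry in the directory.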

Any hints?

