Re: help about ext3 read-only issue on ext3(2.6.16.30)

From: Li Zefan <lizefan@huawei.com>
To: Jan Kara <jack@suse.cz>
Cc: qixuan wu <wuqixuan@gmail.com>, Tao Ma <tm@tao.ma>,
	"Theodore Ts'o" <tytso@mit.edu>,
	Eric Sandeen <sandeen@redhat.com>,
	Yafang Shao <laoar.shao@gmail.com>,
	<linux-fsdevel@vger.kernel.org>, <linux-ext4@vger.kernel.org>,
	<wuqixuan@huawei.com>, <xieshuangyi@huawei.com>
Subject: Re: help about ext3 read-only issue on ext3(2.6.16.30)
Date: Wed, 12 Dec 2012 19:31:08 +0800
Message-ID: <50C86AFC.7080301@huawei.com>
In-Reply-To: <20121212100444.GB18885@quack.suse.cz>

On 2012/12/12 18:04, Jan Kara wrote:
> On Tue 11-12-12 16:01:51, Li Zefan wrote:
>>>>> We have already dumped the data with debugfs. The data looks good,
>>>>> without errors. But we did that before running fsck, and fsck did not
>>>>> report any error either. I want to know whether fsck can modify on-disk
>>>>> data without reporting any error.
>>>>   Ah, OK. So it seems that the directory block is OK, just f_pos gets
>>>> corrupted somehow. There are guards in ext3_readdir() to rescan the dir
>>>> block when the directory is modified, but maybe that's not working
>>>> correctly. I don't want to burn too much time on this since this is such
>>>> an ancient kernel, but I'd be looking in that direction...
>>>>
>>>
>>> I've added some debug code into ext3, which does these things:
>>> - dump the dir block
>>> - print the current and last f_pos and offset
>>> - dump_stack() to see which process triggers the bug
>>>
>>> Hopefully we can trigger the bug in our labs (we did see this happen twice
>>> this week in a lab), though we can't patch the kernel in the products.
>>>
>>> I compared ext3_readdir() with the latest ext3, and saw no difference except
>>> some API changes. I'll dig deeper. Thanks for the suggestion!
>>>
>>
>> We've managed to trigger the bug once, and collected some debug information. We
>> found the buffer head wasn't corrupted, but f_pos was set to 4024 and then ext3
>> reported an error.
>>
>> EXT3-fs error (device sda7): ext3_readdir: bad entry in directory #12747345: rec_len is smaller than minimal - offset=4024, inode=0, rec_len=0, name_len=0
>> Aborting journal on device sda7.
>> ext3_abort called.
>> EXT3-fs error (device sda7): ext3_journal_start_sb: Detected aborted journal
>> Remounting filesystem read-only
>>
>> 00000000: 51 82 c2 00 0c 00 01 02 2e 00 00 00 04 80 c2 00  Q...............
>> 00000010: 0c 00 02 02 2e 2e 00 00 d6 80 c2 00 10 00 06 02  ................
>> 00000020: 62 61 63 6b 75 70 00 00 bb 82 c2 00 1c 00 11 01  backup..........
>> 00000030: 4d 6f 6e 69 74 6f 72 53 65 72 76 69 63 65 2e 6f  MonitorService.o
>> 00000040: 70 00 00 00 be 82 c2 00 1c 00 13 01 43 6f 6d 70  p...........Comp
>> 00000050: 6c 61 69 6e 74 50 72 6f 63 65 73 73 2e 6f 70 00  laintProcess.op.
>> 00000060: c2 82 c2 00 20 00 15 01 4c 6f 63 61 74 69 6f 6e  .... ...Location
>> 00000070: 50 72 65 50 72 6f 63 65 73 73 2e 6f 70 00 00 00  PreProcess.op...
>> 00000080: c9 82 c2 00 18 00 0f 01 4e 6f 72 74 68 50 72 6f  ........NorthPro
>> 00000090: 63 65 73 73 2e 6f 70 00 d4 82 c2 00 18 00 0d 01  cess.op.........
>> 000000a0: 53 79 73 4d 6f 6e 69 74 6f 72 2e 6f 70 00 00 00  SysMonitor.op...
>> 000000b0: db 82 c2 00 1c 00 13 01 56 56 49 50 4e 6f 72 74  ........VVIPNort
>> 000000c0: 68 50 72 6f 63 65 73 73 2e 6f 70 00 e1 82 c2 00  hProcess.op.....
>> 000000d0: 34 0f 09 01 72 61 6e 73 61 75 2e 6f 70 00 00 00  4...ransau.op...
>> 000000e0: 4f 83 c2 00 20 0f 1e 01 72 61 6e 73 61 75 2e 6f  O... ...ransau.o
>> 000000f0: 70 2e 32 30 31 32 31 32 31 30 30 32 30 39 32 34  p.20121210020924
>> 00000100: 34 35 31 33 39 34 00 00 79 83 c2 00 f8 0e 18 01  451394..y.......
>> 00000110: 72 61 6e 73 61 75 2e 6f 70 2e 32 30 31 32 31 32  ransau.op.201212
>> 00000120: 31 30 30 32 30 39 32 34 00 00 00 00 00 00 00 00  10020924........
>> ...
>> 00000ff0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
>>
>> last_offset=-1, last_fpos=-1, f_pos=4024
>>
>> -1 means we hit the bug in the first iteration of the inner while loop in
>> ext3_readdir().
>>
>> I've checked how ext3_readdir() works and how f_pos, f_version and i_version
>> get initialized and modified. Now I'm lost. I really can't see how f_pos got
>> corrupted. :(
>   Hum, it looks really curious. So f_pos has been 4024 when we entered
> ext3_readdir()?

Dunno, but what else could it be?
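
The logged values are at least consistent with a bad f_pos landing in an
unused part of an otherwise sane block. Below is a minimal user-space sketch
(not the kernel code) that rebuilds the live entries from the dump quoted
above and applies the same kind of sanity checks that produce the "rec_len is
smaller than minimal" message. The helper names (check_dir_entry, put_entry,
build_dumped_block) are made up for illustration, and the order of the checks
is an assumption modelled on ext3_check_dir_entry(). The stale entries left in
the slack of the last record are omitted, and the bytes at offset 4024 are
assumed to be zero because the error message reports inode=0, rec_len=0,
name_len=0.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE 4096u
/* On-disk record size for a name: 8-byte header + name, rounded up to 4. */
#define DIR_REC_LEN(name_len) (((name_len) + 8u + 3u) & ~3u)

/* Read the little-endian rec_len field at byte 4 of an entry. */
static unsigned entry_rec_len(const uint8_t *block, unsigned off)
{
    return block[off + 4] | ((unsigned)block[off + 5] << 8);
}

/* Return NULL if the entry at `off` looks sane, else a reason string. */
static const char *check_dir_entry(const uint8_t *block, unsigned off)
{
    unsigned rec_len = entry_rec_len(block, off);
    unsigned name_len = block[off + 6];

    if (rec_len < DIR_REC_LEN(1))
        return "rec_len is smaller than minimal";
    if (rec_len % 4 != 0)
        return "rec_len % 4 != 0";
    if (rec_len < DIR_REC_LEN(name_len))
        return "rec_len is too small for name_len";
    if (off + rec_len > BLOCK_SIZE)
        return "directory entry across blocks";
    return NULL;
}

/* Write one entry (little-endian header fields, then the name). */
static void put_entry(uint8_t *block, unsigned off, uint32_t inode,
                      unsigned rec_len, uint8_t type, const char *name)
{
    size_t n = strlen(name);

    block[off + 0] = inode & 0xff;
    block[off + 1] = (inode >> 8) & 0xff;
    block[off + 2] = (inode >> 16) & 0xff;
    block[off + 3] = (inode >> 24) & 0xff;
    block[off + 4] = rec_len & 0xff;
    block[off + 5] = (rec_len >> 8) & 0xff;
    block[off + 6] = (uint8_t)n;
    block[off + 7] = type;
    memcpy(block + off + 8, name, n);
}

/* Rebuild the live entries of the dumped block; everything else stays zero. */
static void build_dumped_block(uint8_t *block)
{
    static const struct {
        uint32_t inode; unsigned rec_len; uint8_t type; const char *name;
    } e[] = {
        { 0x00c28251,   12, 2, "." },
        { 0x00c28004,   12, 2, ".." },
        { 0x00c280d6,   16, 2, "backup" },
        { 0x00c282bb,   28, 1, "MonitorService.op" },
        { 0x00c282be,   28, 1, "ComplaintProcess.op" },
        { 0x00c282c2,   32, 1, "LocationPreProcess.op" },
        { 0x00c282c9,   24, 1, "NorthProcess.op" },
        { 0x00c282d4,   24, 1, "SysMonitor.op" },
        { 0x00c282db,   28, 1, "VVIPNorthProcess.op" },
        { 0x00c282e1, 3892, 1, "ransau.op" },  /* runs to end of block */
    };
    unsigned off = 0;

    memset(block, 0, BLOCK_SIZE);
    for (size_t i = 0; i < sizeof e / sizeof e[0]; i++) {
        put_entry(block, off, e[i].inode, e[i].rec_len, e[i].type, e[i].name);
        off += e[i].rec_len;
    }
}
```

With this layout, check_dir_entry(block, 204) accepts the last live entry
("ransau.op", rec_len 3892, which ends exactly at byte 4096), while
check_dir_entry(block, 4024) fails the first check because the header bytes
there are all zero. So the block itself parses cleanly and only the starting
offset is wrong.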

> Do you know what it was when we last left ext3_readdir()
> for that filp? You can store that value in some debug entry added to struct
> file... Also any chance we ever hit:
>                                 if (version != filp->f_version)
>                                         goto revalidate;
> I don't think it can ever happen since we hold i_mutex and
> generic_file_llseek() takes i_mutex as well. But better be sure.
> 
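
For reference, the rescan guarded by the f_version check can be modelled in
user space roughly as below. This is a simplified sketch of what
ext3_readdir() does when filp->f_version differs from the inode version, not
a copy of the kernel code. revalidate_offset(), lay_out_records() and the
dumped_rec_lens table (record lengths read off the dump above) are
illustrative names and assumptions, and the loop bound is slightly stricter
than a plain "i < blocksize" so the sketch never reads past the buffer.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE 4096u
/* Smallest legal record: 8-byte header + 1-byte name, rounded up to 4. */
#define DIR_REC_LEN(name_len) (((name_len) + 8u + 3u) & ~3u)

/* Record lengths of the live entries in the dumped block, in order. */
static const unsigned dumped_rec_lens[] = {
    12, 12, 16, 28, 28, 32, 24, 24, 28, 3892
};

/* Write only the rec_len field of each record; the rescan reads nothing else. */
static void lay_out_records(uint8_t *block, const unsigned *rec_lens, size_t n)
{
    unsigned off = 0;

    memset(block, 0, BLOCK_SIZE);
    for (size_t k = 0; k < n && off + 8 <= BLOCK_SIZE; k++) {
        block[off + 4] = rec_lens[k] & 0xff;
        block[off + 5] = (rec_lens[k] >> 8) & 0xff;
        off += rec_lens[k];
    }
}

/*
 * Walk the block from the start, record by record, until reaching or
 * passing `offset`, and return the record boundary found. A record with
 * an impossibly small rec_len stops the walk early.
 */
static unsigned revalidate_offset(const uint8_t *block, unsigned offset)
{
    unsigned i = 0;

    while (i < offset && i + 8 <= BLOCK_SIZE) {
        unsigned rec_len = block[i + 4] | ((unsigned)block[i + 5] << 8);

        if (rec_len < DIR_REC_LEN(1))
            break;
        i += rec_len;
    }
    return i;
}
```

In this model, revalidate_offset(block, 4024) returns 4096, the end of the
block, because the last live record starts at 204 and spans the remaining
3892 bytes. If the rescan had run with f_pos at 4024, the offset would have
been moved to a record boundary instead of being parsed as an entry, which
fits the suspicion that the revalidate path was not taken.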

Yesterday I added more debug aids, which cover all the information mentioned
above. Actually the code tracks all the places that change f_pos, and
I think only lseek() and readdir() can change it.
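
As a rough illustration of that kind of tracking (a user-space model, not the
actual debug patch, which is not shown in this thread), every assignment to
f_pos can be funneled through one helper that records where the change came
from. struct traced_file, set_f_pos(), last_event() and the site strings are
hypothetical names chosen for this sketch.

```c
#include <stdio.h>

#define TRACE_SLOTS 8

/* One recorded change of f_pos. */
struct fpos_event {
    const char *site;     /* which code path changed f_pos */
    long long old_pos;
    long long new_pos;
};

/* Stand-in for the debug fields one might add to struct file. */
struct traced_file {
    long long f_pos;
    struct fpos_event log[TRACE_SLOTS];  /* ring buffer of recent changes */
    unsigned next;                       /* total number of events recorded */
};

/* Single choke point for every f_pos update. */
static void set_f_pos(struct traced_file *f, long long new_pos, const char *site)
{
    struct fpos_event *ev = &f->log[f->next % TRACE_SLOTS];

    ev->site = site;
    ev->old_pos = f->f_pos;
    ev->new_pos = new_pos;
    f->next++;
    f->f_pos = new_pos;
}

/* Most recent change, or NULL if nothing has been recorded yet. */
static const struct fpos_event *last_event(const struct traced_file *f)
{
    return f->next ? &f->log[(f->next - 1) % TRACE_SLOTS] : NULL;
}

/* Print the retained history, oldest first. */
static void dump_trace(const struct traced_file *f, FILE *out)
{
    unsigned count = f->next < TRACE_SLOTS ? f->next : TRACE_SLOTS;

    for (unsigned k = f->next - count; k < f->next; k++) {
        const struct fpos_event *ev = &f->log[k % TRACE_SLOTS];

        fprintf(out, "%s: f_pos %lld -> %lld\n",
                ev->site, ev->old_pos, ev->new_pos);
    }
}
```

When the directory check fails, dumping this history next to the block dump
would show which path last wrote f_pos and what the value was before, which
is the question Jan asks above.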

Now I'm waiting for the bug to happen again, which can take several days...
