From: Li Zefan <lizefan@huawei.com>
To: Jan Kara <jack@suse.cz>
Cc: qixuan wu <wuqixuan@gmail.com>, Tao Ma <tm@tao.ma>,
	"Theodore Ts'o" <tytso@mit.edu>,
	Eric Sandeen <sandeen@redhat.com>,
	Yafang Shao <laoar.shao@gmail.com>,
	<linux-fsdevel@vger.kernel.org>, <linux-ext4@vger.kernel.org>,
	<wuqixuan@huawei.com>, <xieshuangyi@huawei.com>
Subject: Re: help about ext3 read-only issue on ext3(2.6.16.30)
Date: Wed, 12 Dec 2012 19:31:08 +0800
Message-ID: <50C86AFC.7080301@huawei.com>
In-Reply-To: <20121212100444.GB18885@quack.suse.cz>

On 2012/12/12 18:04, Jan Kara wrote:
> On Tue 11-12-12 16:01:51, Li Zefan wrote:
>>>>> We have already dumped the data with debugfs. The data looks good,
>>>>> with no errors. But we did that before running fsck, and fsck did
>>>>> not report any error either. I want to know whether fsck can modify
>>>>> on-disk data without reporting an error.
>>>>   Ah, OK. So it seems the directory block is OK, and just f_pos gets
>>>> corrupted somehow. There are guards in ext3_readdir() to rescan the dir
>>>> block when the directory is modified, but maybe they are not working
>>>> correctly. I don't want to burn too much time on this since this is such
>>>> an ancient kernel, but I'd be looking in that direction...
>>>>
>>>
>>> I've added some debug code into ext3, which does these things:
>>> - dump the dir block
>>> - print the current and last f_pos and offset
>>> - dump_stack() to see which process triggers the bug
>>>
>>> I hope we can trigger the bug in our labs (we did see it happen twice this
>>> week in a lab), though we can't patch the kernel in the products.
>>>
>>> I compared ext3_readdir() with the latest ext3, and saw no difference except
>>> some API changes. I'll dig deeper. Thanks for the suggestion!
>>>
>>
>> We've managed to trigger the bug once, and collected some debug information.
>> We found the buffer head wasn't corrupted, but f_pos was set to 4024, and
>> then ext3 reported an error.
>>
>> EXT3-fs error (device sda7): ext3_readdir: bad entry in directory #12747345: rec_len is smaller than minimal - offset=4024, inode=0, rec_len=0, name_len=0
>> Aborting journal on device sda7.
>> ext3_abort called.
>> EXT3-fs error (device sda7): ext3_journal_start_sb: Detected aborted journal
>> Remounting filesystem read-only
>>
>> 00000000: 51 82 c2 00 0c 00 01 02 2e 00 00 00 04 80 c2 00  Q...............
>> 00000010: 0c 00 02 02 2e 2e 00 00 d6 80 c2 00 10 00 06 02  ................
>> 00000020: 62 61 63 6b 75 70 00 00 bb 82 c2 00 1c 00 11 01  backup..........
>> 00000030: 4d 6f 6e 69 74 6f 72 53 65 72 76 69 63 65 2e 6f  MonitorService.o
>> 00000040: 70 00 00 00 be 82 c2 00 1c 00 13 01 43 6f 6d 70  p...........Comp
>> 00000050: 6c 61 69 6e 74 50 72 6f 63 65 73 73 2e 6f 70 00  laintProcess.op.
>> 00000060: c2 82 c2 00 20 00 15 01 4c 6f 63 61 74 69 6f 6e  .... ...Location
>> 00000070: 50 72 65 50 72 6f 63 65 73 73 2e 6f 70 00 00 00  PreProcess.op...
>> 00000080: c9 82 c2 00 18 00 0f 01 4e 6f 72 74 68 50 72 6f  ........NorthPro
>> 00000090: 63 65 73 73 2e 6f 70 00 d4 82 c2 00 18 00 0d 01  cess.op.........
>> 000000a0: 53 79 73 4d 6f 6e 69 74 6f 72 2e 6f 70 00 00 00  SysMonitor.op...
>> 000000b0: db 82 c2 00 1c 00 13 01 56 56 49 50 4e 6f 72 74  ........VVIPNort
>> 000000c0: 68 50 72 6f 63 65 73 73 2e 6f 70 00 e1 82 c2 00  hProcess.op.....
>> 000000d0: 34 0f 09 01 72 61 6e 73 61 75 2e 6f 70 00 00 00  4...ransau.op...
>> 000000e0: 4f 83 c2 00 20 0f 1e 01 72 61 6e 73 61 75 2e 6f  O... ...ransau.o
>> 000000f0: 70 2e 32 30 31 32 31 32 31 30 30 32 30 39 32 34  p.20121210020924
>> 00000100: 34 35 31 33 39 34 00 00 79 83 c2 00 f8 0e 18 01  451394..y.......
>> 00000110: 72 61 6e 73 61 75 2e 6f 70 2e 32 30 31 32 31 32  ransau.op.201212
>> 00000120: 31 30 30 32 30 39 32 34 00 00 00 00 00 00 00 00  10020924........
>> ...
>> 00000ff0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
>>
>> last_offset=-1, last_fpos=-1, f_pos=4024
>>
>> -1 means we hit the bug in the first iteration of the inner while loop in
>> ext3_readdir().
>>
>> I've checked how ext3_readdir() works and how f_pos, f_version and i_version
>> get initialized and modified. Now I'm lost. I really can't see how f_pos got
>> corrupted. :(
>   Hum, it looks really curious. So f_pos has been 4024 when we entered
> ext3_readdir()?

Dunno, but what else could it be?
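
For what it's worth, here is a minimal userspace sketch (Python, not kernel
code) that walks the live entries in the dump quoted above by following
rec_len from offset 0. The bytes are copied from the first 0xe0 bytes of
the dump. BLOCK_SIZE = 4096 is an assumption based on the dump running to
0xff0, and walk() is a made-up helper name. By my reading, the chain of
live entries ends exactly at 4096 and 4024 is not the start of any entry,
so a reader starting there lands inside the space covered by the last
entry's rec_len, where the log shows rec_len=0.

```python
import struct

BLOCK_SIZE = 4096   # assumption: the dump runs to 0xff0, so one 4 KiB block
MIN_REC_LEN = 12    # 8-byte header + 1-byte name, rounded up to 4 bytes

# First 0xe0 bytes of the directory block, copied from the dump above.
DUMP_HEX = """
51 82 c2 00 0c 00 01 02 2e 00 00 00 04 80 c2 00
0c 00 02 02 2e 2e 00 00 d6 80 c2 00 10 00 06 02
62 61 63 6b 75 70 00 00 bb 82 c2 00 1c 00 11 01
4d 6f 6e 69 74 6f 72 53 65 72 76 69 63 65 2e 6f
70 00 00 00 be 82 c2 00 1c 00 13 01 43 6f 6d 70
6c 61 69 6e 74 50 72 6f 63 65 73 73 2e 6f 70 00
c2 82 c2 00 20 00 15 01 4c 6f 63 61 74 69 6f 6e
50 72 65 50 72 6f 63 65 73 73 2e 6f 70 00 00 00
c9 82 c2 00 18 00 0f 01 4e 6f 72 74 68 50 72 6f
63 65 73 73 2e 6f 70 00 d4 82 c2 00 18 00 0d 01
53 79 73 4d 6f 6e 69 74 6f 72 2e 6f 70 00 00 00
db 82 c2 00 1c 00 13 01 56 56 49 50 4e 6f 72 74
68 50 72 6f 63 65 73 73 2e 6f 70 00 e1 82 c2 00
34 0f 09 01 72 61 6e 73 61 75 2e 6f 70 00 00 00
"""
data = bytes.fromhex("".join(DUMP_HEX.split()))


def walk(block, block_size=BLOCK_SIZE):
    """Follow rec_len from offset 0; return (entries, final offset)."""
    entries = []
    off = 0
    # Stop at the end of the block, or when we run out of copied bytes.
    while off < block_size and off + 8 <= len(block):
        # Little-endian: u32 inode, u16 rec_len, u8 name_len, u8 file_type
        inode, rec_len, name_len, _ftype = struct.unpack_from("<IHBB", block, off)
        if rec_len < MIN_REC_LEN:
            break  # the condition the kernel log calls "smaller than minimal"
        name = block[off + 8:off + 8 + name_len].decode("ascii", "replace")
        entries.append((off, inode, rec_len, name))
        off += rec_len
    return entries, off


entries, end = walk(data)
for off, inode, rec_len, name in entries:
    print(f"offset={off:4d} inode={inode} rec_len={rec_len:4d} name={name}")
print("walk ended at", end, "| 4024 is an entry start:",
      4024 in [e[0] for e in entries])
```

The walk never reads the elided part of the dump: the last live entry
("ransau.op" at offset 204) carries a rec_len that reaches the end of the
block, so the loop stops there.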

> Do you know what it was when we last left ext3_readdir()
> for that filp? You can store that value in some debug entry added to struct
> file... Also any chance we ever hit:
>                                 if (version != filp->f_version)
>                                         goto revalidate;
> I don't think it can ever happen since we hold i_mutex and
> generic_file_llseek() takes i_mutex as well. But better be sure.
> 
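
On the version check: below is a rough userspace model (Python) of the
rescan that ext3_readdir() does when filp->f_version differs from
inode->i_version, as I read the code. It is not the kernel code itself.
rescan() and build_block() are made-up names, the rec_len list is read off
the dump quoted earlier, and the inode numbers are placeholders. If the
model is right, a rescan with f_pos=4024 would have moved the offset to
4096 rather than leaving it at 4024, which would suggest the rescan did
not run on the call that hit the error.

```python
import struct

BLOCK_SIZE = 4096   # assumption: one 4 KiB directory block
MIN_REC_LEN = 12    # 8-byte header + 1-byte name, rounded up to 4 bytes

# rec_len of each live entry, read off the dump (they sum to 4096).
REC_LENS = [12, 12, 16, 28, 28, 32, 24, 24, 28, 3892]


def build_block(rec_lens, block_size=BLOCK_SIZE):
    """Build a zero-filled block whose entry headers follow rec_lens."""
    block = bytearray(block_size)
    off = 0
    for n, rec_len in enumerate(rec_lens):
        # Placeholder inode (1000 + n), name_len 1, file_type 1: only
        # rec_len matters for the rescan.
        struct.pack_into("<IHBB", block, off, 1000 + n, rec_len, 1, 1)
        off += rec_len
    return bytes(block)


def rescan(block, offset, block_size=BLOCK_SIZE):
    """Return the first entry boundary at or after `offset`.

    Walks from the start of the block, stepping by rec_len, and stops
    early if it meets a record shorter than the minimum.
    """
    i = 0
    while i < block_size and i < offset:
        rec_len = struct.unpack_from("<H", block, i + 4)[0]
        if rec_len < MIN_REC_LEN:
            break
        i += rec_len
    return i


block = build_block(REC_LENS)
for pos in (100, 204, 4024):
    print(f"offset {pos} -> rescanned offset {rescan(block, pos)}")
```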

Yesterday I added more debug aids, which cover all the information
mentioned above. The code actually tracks every place that changes f_pos,
and I think only lseek() and readdir() can change it.

Now I'm waiting for the bug to happen again, which can take several days...


