linux-fsdevel.vger.kernel.org archive mirror
From: Li Zefan <lizefan@huawei.com>
To: Jan Kara <jack@suse.cz>
Cc: qixuan wu <wuqixuan@gmail.com>, Tao Ma <tm@tao.ma>,
	"Theodore Ts'o" <tytso@mit.edu>,
	Eric Sandeen <sandeen@redhat.com>,
	Yafang Shao <laoar.shao@gmail.com>,
	<linux-fsdevel@vger.kernel.org>, <linux-ext4@vger.kernel.org>,
	<wuqixuan@huawei.com>, <xieshuangyi@huawei.com>,
	<tao.peng@emc.com>
Subject: Re: help about ext3 read-only issue on ext3(2.6.16.30)
Date: Mon, 17 Dec 2012 18:51:27 +0800
Message-ID: <50CEF92F.6050306@huawei.com>
In-Reply-To: <50C86AFC.7080301@huawei.com>

>>> last_offset=-1, last_fpos=-1, f_pos=4024
>>>
>>> -1 means we hit the bug in the first iteration of the inner while loop in
>>> ext3_readdir().
>>>
>>> I've checked how ext3_readdir() works and how f_pos, f_version and i_version
>>> get initialized and modified. Now I'm lost. I really can't see how f_pos got
>>> corrupted. :(
>>   Hum, it looks really curious. So f_pos has been 4024 when we entered
>> ext3_readdir()?
> 
> Dunno, but what else could it be?
> 
>> Do you know what it was when we last left ext3_readdir()
>> for that filp? You can store that value in some debug entry added to struct
>> file... Also any chance we ever hit:
>>                                 if (version != filp->f_version)
>>                                         goto revalidate;
>> I don't think it can ever happen since we hold i_mutex and
>> generic_file_llseek() takes i_mutex as well. But better be sure.
>>
> 
> Yesterday I added more debug aids, which cover all the information mentioned
> above. The code tracks every place that changes f_pos, and I think only
> lseek() and readdir() can change it.
> 
> Now I'm waiting for the bug to happen again, which may take several days...
> 
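
A minimal userspace sketch of the debug aid Jan suggests: snapshot f_pos when readdir returns and compare it on the next entry, treating lseek as the only other legitimate writer. struct dbg_file and the dbg_* hooks are hypothetical illustrations, not real struct file members or kernel functions.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical model; dbg_* fields are illustrative, not in struct file. */
struct dbg_file {
	long long f_pos;         /* directory position, as in filp->f_pos */
	long long dbg_exit_fpos; /* f_pos recorded at the last readdir exit */
	int dbg_valid;           /* set once a snapshot has been taken */
};

/* Readdir exit hook: remember where we left the directory. */
static void dbg_readdir_exit(struct dbg_file *f)
{
	f->dbg_exit_fpos = f->f_pos;
	f->dbg_valid = 1;
}

/* lseek hook: a legitimate f_pos change, so refresh the snapshot too. */
static void dbg_llseek(struct dbg_file *f, long long pos)
{
	f->f_pos = pos;
	f->dbg_exit_fpos = pos;
}

/* Readdir entry hook: returns 1 (and logs) if f_pos differs from the
 * snapshot, i.e. something other than readdir/lseek modified it. */
static int dbg_readdir_enter(const struct dbg_file *f)
{
	if (f->dbg_valid && f->f_pos != f->dbg_exit_fpos) {
		fprintf(stderr, "f_pos changed between calls: exit=%lld entry=%lld\n",
			f->dbg_exit_fpos, f->f_pos);
		return 1;
	}
	return 0;
}
```

In the kernel, both hooks would run under i_mutex, so a mismatch at entry would point to a writer that does not take that lock.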

The bug was triggered again:

EXT3-fs error (device sda7): ext3_readdir: bad entry in directory #9372013: rec_len is smaller than minimal - offset=4028, inode=0, rec_len=0, name_len=0

And I've confirmed that f_pos was 4028 when we entered ext3_readdir(), whereas it should have been 4096.
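
For reference, a simplified userspace re-implementation of the directory-entry checks behind this message. It is written from memory of ext3_check_dir_entry() and EXT3_DIR_REC_LEN() and is not the kernel code verbatim: the inode-count check is omitted, fields are treated as host-endian, and check_dir_entry() and struct dirent_hdr are illustrative names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Entry length rounded up to 4 bytes, as ext3 computes it. */
#define EXT3_DIR_ROUND 3
#define EXT3_DIR_REC_LEN(name_len) \
	(((name_len) + 8 + EXT3_DIR_ROUND) & ~EXT3_DIR_ROUND)

struct dirent_hdr {          /* header of an on-disk ext3_dir_entry_2 */
	uint32_t inode;
	uint16_t rec_len;
	uint8_t  name_len;
	uint8_t  file_type;
};

/* Returns NULL if the entry at 'offset' looks sane, otherwise the
 * error string that the corresponding ext3 check would report. */
static const char *check_dir_entry(const struct dirent_hdr *de,
				   unsigned long offset, unsigned long blocksize)
{
	unsigned int rlen = de->rec_len;

	if (rlen < EXT3_DIR_REC_LEN(1))          /* minimal entry is 12 bytes */
		return "rec_len is smaller than minimal";
	if (rlen % 4 != 0)
		return "rec_len % 4 != 0";
	if (rlen < EXT3_DIR_REC_LEN(de->name_len))
		return "rec_len is too small for name_len";
	if ((offset % blocksize) + rlen > blocksize)
		return "directory entry across blocks";
	return NULL;
}
```

An all-zero header such as the one reported at offset=4028 (inode=0, rec_len=0, name_len=0) fails the very first check, which matches the error text above.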

I wrote a simple ring buffer to track operations on the log directories, and
the ring buffer shows that there were no lseek, unlink, rename, etc. calls.
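
The ring buffer itself is not included in this mail. Below is a minimal sketch of a fixed-size trace ring with the same record fields (dir, seq, spot, f_pos, f_pos_delta). All names are hypothetical, and a kernel version would also need locking or per-CPU buffers.

```c
#include <assert.h>
#include <stddef.h>

#define TRACE_SLOTS 256 /* power of two so the index can be masked */

struct trace_rec {
	unsigned long dir;      /* directory inode number */
	unsigned long seq;      /* global sequence number */
	const char *spot;       /* e.g. "readdir_1", "readdir_5" */
	long long f_pos;
	long long f_pos_delta;
};

struct trace_ring {
	struct trace_rec rec[TRACE_SLOTS];
	unsigned long next_seq; /* total records ever written */
};

/* Append a record, overwriting the oldest one once the ring is full. */
static void trace_add(struct trace_ring *r, unsigned long dir,
		      const char *spot, long long f_pos, long long delta)
{
	struct trace_rec *t = &r->rec[r->next_seq & (TRACE_SLOTS - 1)];

	t->dir = dir;
	t->seq = r->next_seq++;
	t->spot = spot;
	t->f_pos = f_pos;
	t->f_pos_delta = delta;
}

/* Return the i-th oldest record still held (0 = oldest), or NULL. */
static const struct trace_rec *trace_get(const struct trace_ring *r, size_t i)
{
	unsigned long held = r->next_seq < TRACE_SLOTS ? r->next_seq : TRACE_SLOTS;
	unsigned long first = r->next_seq - held;

	if (i >= held)
		return NULL;
	return &r->rec[(first + i) & (TRACE_SLOTS - 1)];
}
```

Because seq keeps growing after older slots are overwritten, gaps between dumps (such as the jump from seq 1629 to 1676 in the excerpts) are easy to spot.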

This part of the trace is correct:

dir=9372013, seq=1549, spot=readdir_1, f_pos=0, f_pos_delta=0
dir=9372013, seq=1550, spot=readdir_3, f_pos=0, f_pos_delta=0
dir=9372013, seq=1551, spot=readdir_5, f_pos=12, f_pos_delta=12
dir=9372013, seq=1552, spot=readdir_5, f_pos=24, f_pos_delta=12
...
dir=9372013, seq=1595, spot=readdir_5, f_pos=1488, f_pos_delta=28
dir=9372013, seq=1596, spot=readdir_5, f_pos=1516, f_pos_delta=28
dir=9372013, seq=1597, spot=readdir_5, f_pos=1556, f_pos_delta=40
dir=9372013, seq=1598, spot=readdir_5, f_pos=1584, f_pos_delta=28
...
dir=9372013, seq=1627, spot=readdir_5, f_pos=2392, f_pos_delta=28
dir=9372013, seq=1628, spot=readdir_5, f_pos=4096, f_pos_delta=1704
dir=9372013, seq=1629, spot=readdir_1, f_pos=4096, f_pos_delta=0

(readdir_1 is the entry point of readdir(), readdir_3 is where we enter the
(f_version != i_version) branch, and readdir_5 is where we iterate over the dir block.)

Then f_pos suddenly went wrong:

dir=9372013, seq=1676, spot=readdir_5, f_pos=1488, f_pos_delta=28
dir=9372013, seq=1677, spot=readdir_5, f_pos=1516, f_pos_delta=28
dir=9372013, seq=1678, spot=readdir_5, f_pos=1556, f_pos_delta=40
dir=9372013, seq=1679, spot=readdir_5, f_pos=1516, f_pos_delta=28   <-- !!!!!!!!
dir=9372013, seq=1680, spot=readdir_5, f_pos=1540, f_pos_delta=24
...
dir=9372013, seq=1708, spot=readdir_5, f_pos=2324, f_pos_delta=28
dir=9372013, seq=1709, spot=readdir_5, f_pos=4028, f_pos_delta=1704
dir=9372013, seq=1710, spot=readdir_1, f_pos=4028, f_pos_delta=0
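
One way to flag the bad step mechanically: within a pass over a block, each readdir_5 record should satisfy f_pos = previous f_pos + f_pos_delta. The sketch below checks that invariant. struct pos_rec and first_bad_step() are hypothetical helpers, and the test data are the values from the two trace excerpts.

```c
#include <assert.h>
#include <stddef.h>

struct pos_rec {
	long long f_pos;
	long long f_pos_delta;
};

/* Return the index of the first record whose f_pos is not the previous
 * f_pos plus its own delta, or -1 if the whole chain is consistent. */
static long first_bad_step(const struct pos_rec *r, size_t n)
{
	for (size_t i = 1; i < n; i++)
		if (r[i].f_pos != r[i - 1].f_pos + r[i].f_pos_delta)
			return (long)i;
	return -1;
}
```

Applied to the second excerpt, the check fires at seq 1679, where 1556 + 28 should give 1584 but 1516 is recorded. That 68-byte shortfall carries through the rest of the block: 2392 - 68 = 2324, and the final 1704-byte step lands at 4096 - 68 = 4028, the offset in the error message.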

This is odd...

Although f_pos was wrong, offset was always correct, and this is not a
single-bit memory error. So did someone else change f_pos? But we were
holding i_mutex, and the ring buffer shows nothing except readdir...



Thread overview: 35+ messages
2012-12-01 14:22 help about ext3 read-only issue on ext3(2.6.16.30) Yafang Shao
2012-12-03 17:59 ` Eric Sandeen
2012-12-04 13:54   ` Li Zefan
2012-12-04 15:09     ` Theodore Ts'o
2012-12-05 10:43       ` Li Zefan
2012-12-05 14:26         ` Tao Ma
2012-12-05 15:51           ` qixuan wu
2012-12-06  1:13           ` Li Zefan
2012-12-06 12:37             ` Jan Kara
2012-12-06 16:21               ` qixuan wu
2012-12-06 17:09                 ` Jan Kara
2012-12-07 10:03                   ` Li Zefan
2012-12-11  8:01                     ` Li Zefan
2012-12-12 10:04                       ` Jan Kara
2012-12-12 11:31                         ` Li Zefan
2012-12-14  3:32                           ` Peng, Tao
2012-12-17 10:51                           ` Li Zefan [this message]
2012-12-20 11:32                             ` Jan Kara
2013-02-12 12:19                               ` Jan Kara
2012-12-04 15:29     ` Tao Ma
2012-12-04 16:11       ` Bernd Schubert
2012-12-04 20:20         ` Theodore Ts'o
2012-12-04 16:16       ` qixuan wu
2012-12-04 20:45         ` Theodore Ts'o
2012-12-05 13:58         ` Tao Ma
2012-12-05 15:05           ` Theodore Ts'o
2012-12-06  1:54             ` Tao Ma
2012-12-06 15:48               ` qixuan wu
2012-12-05 15:46           ` qixuan wu
2012-12-06  2:58             ` Yongqiang Yang
2012-12-06 16:26               ` qixuan wu
2012-12-07  1:49                 ` Yongqiang Yang
2012-12-05 10:46       ` Li Zefan
2012-12-05 14:02         ` Tao Ma
2012-12-06  1:17           ` Li Zefan
