From: Li Zefan <lizefan@huawei.com>
To: Jan Kara <jack@suse.cz>
Cc: qixuan wu <wuqixuan@gmail.com>, Tao Ma <tm@tao.ma>,
	"Theodore Ts'o" <tytso@mit.edu>,
	Eric Sandeen <sandeen@redhat.com>,
	Yafang Shao <laoar.shao@gmail.com>,
	<linux-fsdevel@vger.kernel.org>, <linux-ext4@vger.kernel.org>,
	<wuqixuan@huawei.com>, <xieshuangyi@huawei.com>,
	<tao.peng@emc.com>
Subject: Re: help about ext3 read-only issue on ext3(2.6.16.30)
Date: Mon, 17 Dec 2012 18:51:27 +0800
Message-ID: <50CEF92F.6050306@huawei.com>
In-Reply-To: <50C86AFC.7080301@huawei.com>

>>> last_offset=-1, last_fpos=-1, f_pos=4024
>>>
>>> -1 means we hit the bug in the first iteration of the inner while loop in
>>> ext3_readdir().
>>>
>>> I've checked how ext3_readdir() works and how f_pos, f_version and i_version
>>> get initialized and modified. Now I'm lost. I really can't see how f_pos got
>>> corrupted. :(
>>   Hum, it looks really curious. So f_pos has been 4024 when we entered
>> ext3_readdir()?
> 
> I don't know, but what else could it be?
> 
>> Do you know what it was when we last left ext3_readdir()
>> for that filp? You can store that value in some debug entry added to struct
>> file... Also any chance we ever hit:
>>                                 if (version != filp->f_version)
>>                                         goto revalidate;
>> I don't think it can ever happen since we hold i_mutex and
>> generic_file_llseek() takes i_mutex as well. But better be sure.
>>
> 
> Yesterday I added more debug aids, which cover all of the information
> mentioned above. The code now tracks every place that changes f_pos, and
> I think only lseek() and readdir() can change it.
> 
> Now I'm waiting for the bug to happen again, which can take several days...
> 

The bug was triggered again:

EXT3-fs error (device sda7): ext3_readdir: bad entry in directory #9372013: rec_len is smaller than minimal - offset=4028, inode=0, rec_len=0, name_len=0

And I've confirmed f_pos=4028 when we entered ext3_readdir(), while it should be 4096.

I wrote a simple ring buffer to track operations on log dirs, and from the
ring buffer, we can see that there were no lseek, unlink, rename, etc.
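
For readers who want to picture the tracking, here is a minimal userspace
sketch of a fixed-size ring buffer of this kind. It is not the debug patch
that was actually used: the names trace_ring, trace_rec, trace_add() and
TRACE_SLOTS are hypothetical, and the record fields are only assumed from the
fields printed in the log lines in this message. A kernel version would also
need locking, which is left out here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TRACE_SLOTS 8  /* hypothetical capacity; a real trace would be larger */

/* One trace record; fields are assumed from the log lines in this message. */
struct trace_rec {
	unsigned long dir;         /* directory inode number */
	unsigned long seq;         /* sequence number of the record */
	const char   *spot;        /* code location, e.g. "readdir_5" */
	long long     f_pos;       /* file position seen at that spot */
	long long     f_pos_delta; /* change applied to f_pos at that spot */
};

struct trace_ring {
	struct trace_rec slot[TRACE_SLOTS];
	unsigned long    next_seq; /* total number of records ever added */
};

void trace_init(struct trace_ring *r)
{
	memset(r, 0, sizeof(*r));
}

/* Append one record, overwriting the oldest record once the ring is full. */
void trace_add(struct trace_ring *r, unsigned long dir, const char *spot,
	       long long f_pos, long long f_pos_delta)
{
	struct trace_rec *rec = &r->slot[r->next_seq % TRACE_SLOTS];

	rec->dir = dir;
	rec->seq = r->next_seq++;
	rec->spot = spot;
	rec->f_pos = f_pos;
	rec->f_pos_delta = f_pos_delta;
}

/* Number of records currently held (at most TRACE_SLOTS). */
size_t trace_count(const struct trace_ring *r)
{
	return r->next_seq < TRACE_SLOTS ? (size_t)r->next_seq : TRACE_SLOTS;
}

/* Return the i-th oldest record still held, for i in [0, trace_count()). */
const struct trace_rec *trace_get(const struct trace_ring *r, size_t i)
{
	size_t n = trace_count(r);

	assert(i < n);
	return &r->slot[(r->next_seq - n + i) % TRACE_SLOTS];
}
```

Calling trace_add() at every spot that reads or changes f_pos, and walking the
ring with trace_get() after the error fires, would give a dump shaped like the
ones in this message.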

This is correct:

dir=9372013, seq=1549, spot=readdir_1, f_pos=0, f_pos_delta=0
dir=9372013, seq=1550, spot=readdir_3, f_pos=0, f_pos_delta=0
dir=9372013, seq=1551, spot=readdir_5, f_pos=12, f_pos_delta=12
dir=9372013, seq=1552, spot=readdir_5, f_pos=24, f_pos_delta=12
...
dir=9372013, seq=1595, spot=readdir_5, f_pos=1488, f_pos_delta=28
dir=9372013, seq=1596, spot=readdir_5, f_pos=1516, f_pos_delta=28
dir=9372013, seq=1597, spot=readdir_5, f_pos=1556, f_pos_delta=40
dir=9372013, seq=1598, spot=readdir_5, f_pos=1584, f_pos_delta=28
...
dir=9372013, seq=1627, spot=readdir_5, f_pos=2392, f_pos_delta=28
dir=9372013, seq=1628, spot=readdir_5, f_pos=4096, f_pos_delta=1704
dir=9372013, seq=1629, spot=readdir_1, f_pos=4096, f_pos_delta=0

(readdir_1 is the entry of readdir(), readdir_3 is where we enter the
(f_version != i_version) branch, and readdir_5 is where we iterate over the dir block)

Then f_pos went wrong suddenly:

dir=9372013, seq=1676, spot=readdir_5, f_pos=1488, f_pos_delta=28
dir=9372013, seq=1677, spot=readdir_5, f_pos=1516, f_pos_delta=28
dir=9372013, seq=1678, spot=readdir_5, f_pos=1556, f_pos_delta=40
dir=9372013, seq=1679, spot=readdir_5, f_pos=1516, f_pos_delta=28   <-- !!!!!!!!
dir=9372013, seq=1680, spot=readdir_5, f_pos=1540, f_pos_delta=24
...
dir=9372013, seq=1708, spot=readdir_5, f_pos=2324, f_pos_delta=28
dir=9372013, seq=1709, spot=readdir_5, f_pos=4028, f_pos_delta=1704
dir=9372013, seq=1710, spot=readdir_1, f_pos=4028, f_pos_delta=0

This is odd...

While f_pos was wrong, the offset was always correct, and this is not a
single-bit error in memory. So did someone else change f_pos? But we were
holding i_mutex, and we see nothing except readdir in the ring
buffer...
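
The jump can also be located mechanically. The sketch below assumes that
f_pos_delta is the increment applied to the previous f_pos to reach the logged
f_pos, which holds for every pair of adjacent records shown in the correct
sequence. The names pos_rec and first_inconsistent() are hypothetical, and the
check is only meaningful for records with consecutive seq numbers, not across
the elided "..." parts of the dumps.

```c
#include <assert.h>
#include <stddef.h>

/* Subset of a trace record that is needed for the consistency check. */
struct pos_rec {
	unsigned long seq;
	long long     f_pos;
	long long     f_pos_delta;
};

/*
 * Return the index of the first record whose f_pos does not equal the
 * previous record's f_pos plus its own f_pos_delta, or -1 if all agree.
 * The records must be consecutive entries for the same open file.
 */
long first_inconsistent(const struct pos_rec *recs, size_t n)
{
	assert(recs != NULL || n == 0);

	for (size_t i = 1; i < n; i++) {
		if (recs[i].f_pos != recs[i - 1].f_pos + recs[i].f_pos_delta)
			return (long)i;
	}
	return -1;
}
```

Fed the five records seq=1676..1680 above, it flags seq=1679: the expected
value is 1556 + 28 = 1584, but 1516 was logged. That is 68 short, the same
amount by which the final 4028 falls short of 4096.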

