From: Jan Kara <jack@suse.cz>
To: Li Zefan <lizefan@huawei.com>
Cc: Jan Kara <jack@suse.cz>, qixuan wu <wuqixuan@gmail.com>,
Tao Ma <tm@tao.ma>, Theodore Ts'o <tytso@mit.edu>,
Eric Sandeen <sandeen@redhat.com>,
Yafang Shao <laoar.shao@gmail.com>,
linux-fsdevel@vger.kernel.org, linux-ext4@vger.kernel.org,
wuqixuan@huawei.com, xieshuangyi@huawei.com, tao.peng@emc.com
Subject: Re: help about ext3 read-only issue on ext3(2.6.16.30)
Date: Tue, 12 Feb 2013 13:19:49 +0100 [thread overview]
Message-ID: <20130212121949.GA19583@quack.suse.cz> (raw)
In-Reply-To: <20121220113213.GB3301@quack.suse.cz>
On Thu 20-12-12 12:32:13, Jan Kara wrote:
> Hello,
>
> On Mon 17-12-12 18:51:27, Li Zefan wrote:
> > >>> last_offset=-1, last_fpos=-1, f_pos=4024
> > >>>
> > >>> -1 means we hit the bug in the first iteration in the insde while in
> > >>> ext3_readdir().
> > >>>
> > >>> I've checked how ext3_readdir() works and how f_pos, f_version and i_version
> > >>> get initialized and modified. Now I'm lost. I really can't see how f_pos got
> > >>> corrupted. :(
> > >> Hum, it looks really curious. So f_pos has been 4024 when we entered
> > >> ext3_readdir()?
> > >
> > > dunno. but what else can be
> > >
> > >> Do you know what it was when we last left ext3_readdir()
> > >> for that filp? You can store that value in some debug entry added to struct
> > >> file... Also any chance we ever hit:
> > >> if (version != filp->f_version)
> > >> goto revalidate;
> > >> I don't think it can ever happen since we hold i_mutex and
> > >> generic_file_llseek() takes i_mutex as well. But better be sure.
> > >>
> > >
> > > Yesterday I've added more debug aids, which convers all the above information
> > > mentioned. Actually the code tracks all the places that change f_pos, and
> > > I think only lseek() and readdir() can change it.
> > >
> > > Now I'm waiting for the bug to happen again, can be several days...
> > >
> >
> > The bug was triggered again:
> >
> > EXT3-fs error (device sda7): ext3_readdir: bad entry in directory #9372013: rec_len is smaller than minimal - offset=4028, inode=0, rec_len=0, name_len=0
> >
> > And I've confirmed f_pos=4028 when we entered ext3_readdir(), while it
> > should be 4096.
> OK, interesting.
>
> > I wrote a simple ring buffer to track operations on log dirs, and from the
> > ring buffer, we can see that there were no lseek, unlink, rename, etc.
> >
> > This is correct:
> >
> > dir=9372013, seq=1549, spot=readdir_1, f_pos=0, f_pos_delta=0
> > dir=9372013, seq=1550, spot=readdir_3, f_pos=0, f_pos_delta=0
> > dir=9372013, seq=1551, spot=readdir_5, f_pos=12, f_pos_delta=12
> > dir=9372013, seq=1552, spot=readdir_5, f_pos=24, f_pos_delta=12
> > ...
> > dir=9372013, seq=1595, spot=readdir_5, f_pos=1488, f_pos_delta=28
> > dir=9372013, seq=1596, spot=readdir_5, f_pos=1516, f_pos_delta=28
> > dir=9372013, seq=1597, spot=readdir_5, f_pos=1556, f_pos_delta=40
> > dir=9372013, seq=1598, spot=readdir_5, f_pos=1584, f_pos_delta=28
> > ...
> > dir=9372013, seq=1627, spot=readdir_5, f_pos=2392, f_pos_delta=28
> > dir=9372013, seq=1628, spot=readdir_5, f_pos=4096, f_pos_delta=1704
> > dir=9372013, seq=1629, spot=readdir_1, f_pos=4096, f_pos_delta=0
> >
> > (readir_1 is the entry of readdir(), and readdir_3 is when we enter
> > (f_version != i_version), and readdir_5 is we iterate the dir block)
> Can you post the readdir() code you now run including your logging?
> Thanks!
>
> > Then f_pos went wrong suddenly:
> >
> > dir=9372013, seq=1676, spot=readdir_5, f_pos=1488, f_pos_delta=28
> > dir=9372013, seq=1677, spot=readdir_5, f_pos=1516, f_pos_delta=28
> > dir=9372013, seq=1678, spot=readdir_5, f_pos=1556, f_pos_delta=40
> > dir=9372013, seq=1679, spot=readdir_5, f_pos=1516, f_pos_delta=28 <-- !!!!!!!!
> > dir=9372013, seq=1680, spot=readdir_5, f_pos=1540, f_pos_delta=24
> > ...
> > dir=9372013, seq=1708, spot=readdir_5, f_pos=2324, f_pos_delta=28
> > dir=9372013, seq=1709, spot=readdir_5, f_pos=4028, f_pos_delta=1704
> > dir=9372013, seq=1710, spot=readdir_1, f_pos=4028, f_pos_delta=0
> >
> > This is odd...
> >
> > While f_pos was wrong, offset is always correct, and this is not some
> > single-bit error in memory, so someone else changed f_pos? but we were
> > holding i_mutex, and we see nothing else except readdir in the ring
> > buffer...
> How do you know 'offset' was correct? I don't see it in the ring
> buffer... Anyway, this all points to us taking the:
> if (version != filp->f_version)
> goto revalidate;
> branch when 'offset' is already advanced but f_pos isn't. Then we don't
> enter:
> if (filp->f_version != inode->i_version) {
> branch and thus f_pos and 'offset' stay out of sync and problems happen.
>
> But how these strange issues with f_version happen is a mystery for me so
> far.
Were there any more test results regarding this? I'm just curious what
the problem is / was...
Honza
--
Jan Kara <jack@suse.cz>
SUSE Labs, CR
next prev parent reply other threads:[~2013-02-12 12:19 UTC|newest]
Thread overview: 35+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-12-01 14:22 help about ext3 read-only issue on ext3(2.6.16.30) Yafang Shao
2012-12-03 17:59 ` Eric Sandeen
2012-12-04 13:54 ` Li Zefan
2012-12-04 15:09 ` Theodore Ts'o
2012-12-05 10:43 ` Li Zefan
2012-12-05 14:26 ` Tao Ma
2012-12-05 15:51 ` qixuan wu
2012-12-06 1:13 ` Li Zefan
2012-12-06 12:37 ` Jan Kara
2012-12-06 16:21 ` qixuan wu
2012-12-06 17:09 ` Jan Kara
2012-12-07 10:03 ` Li Zefan
2012-12-11 8:01 ` Li Zefan
2012-12-12 10:04 ` Jan Kara
2012-12-12 11:31 ` Li Zefan
2012-12-14 3:32 ` Peng, Tao
2012-12-17 10:51 ` Li Zefan
2012-12-20 11:32 ` Jan Kara
2013-02-12 12:19 ` Jan Kara [this message]
2012-12-04 15:29 ` Tao Ma
2012-12-04 16:11 ` Bernd Schubert
2012-12-04 20:20 ` Theodore Ts'o
2012-12-04 16:16 ` qixuan wu
2012-12-04 20:45 ` Theodore Ts'o
2012-12-05 13:58 ` Tao Ma
2012-12-05 15:05 ` Theodore Ts'o
2012-12-06 1:54 ` Tao Ma
2012-12-06 15:48 ` qixuan wu
2012-12-05 15:46 ` qixuan wu
2012-12-06 2:58 ` Yongqiang Yang
2012-12-06 16:26 ` qixuan wu
2012-12-07 1:49 ` Yongqiang Yang
2012-12-05 10:46 ` Li Zefan
2012-12-05 14:02 ` Tao Ma
2012-12-06 1:17 ` Li Zefan
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20130212121949.GA19583@quack.suse.cz \
--to=jack@suse.cz \
--cc=laoar.shao@gmail.com \
--cc=linux-ext4@vger.kernel.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=lizefan@huawei.com \
--cc=sandeen@redhat.com \
--cc=tao.peng@emc.com \
--cc=tm@tao.ma \
--cc=tytso@mit.edu \
--cc=wuqixuan@gmail.com \
--cc=wuqixuan@huawei.com \
--cc=xieshuangyi@huawei.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).