linux-fsdevel.vger.kernel.org archive mirror
From: Tao Ma <tm@tao.ma>
To: qixuan wu <wuqixuan@gmail.com>
Cc: Li Zefan <lizefan@huawei.com>, Eric Sandeen <sandeen@redhat.com>,
	Yafang Shao <laoar.shao@gmail.com>,
	linux-fsdevel@vger.kernel.org, linux-ext4@vger.kernel.org,
	wuqixuan@huawei.com
Subject: Re: help about ext3 read-only issue on ext3(2.6.16.30)
Date: Wed, 05 Dec 2012 21:58:17 +0800
Message-ID: <50BF52F9.5080707@tao.ma>
In-Reply-To: <CAEjEV8D7G3xaT_Y4KdH7eVA-bdexjEE5iKPFRQ=Psgfmmoe-+g@mail.gmail.com>

Hi qixuan,
On 12/05/2012 12:16 AM, qixuan wu wrote:
> Hi Tao, all,
> 
>     I guess it's a memory (or ext3/kernel) issue. Because on one
> machine, after this issue was reported and the partition was made read-only,
> we used debugfs to "ls dir", and it was fine. It could list all files
> without error. If the disk had a problem, the ls command would
> give an error as well. (The dir name was also obtained with debugfs, from
> the inode ID in the error.)
Are you sure the disk is good? I just checked the code in e2fsprogs, and
it seems that it will not complain if rec_len = 0; the dir iteration
just aborts. I guess the right way is to dd the corresponding block
out, then decode and read it in binary format. :(
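
A minimal sketch of that decoding step, in Python, assuming the on-disk
ext3_dir_entry_2 layout (little-endian 32-bit inode, 16-bit rec_len,
8-bit name_len, 8-bit file_type, then the name) and a 4096-byte block
size. The function names, the device path and the block number are
hypothetical; the real block number would have to come from debugfs.

```python
import struct

# Assumption: smallest legal rec_len is 12 (8-byte header plus a
# 1-byte name padded to 4 bytes), matching the kernel's
# "rec_len is smaller than minimal" check.
MIN_REC_LEN = 12


def read_block(path, block_no, block_size=4096):
    """Read one filesystem block straight from a device or image file."""
    with open(path, "rb") as f:
        f.seek(block_no * block_size)
        return f.read(block_size)


def walk_dir_block(block):
    """Yield (offset, inode, rec_len, name_len, name) for each entry.

    The walk stops after yielding the first entry whose rec_len is
    too small, unaligned, or runs past the end of the block.
    """
    offset = 0
    while offset + 8 <= len(block):
        inode, rec_len, name_len, _ftype = struct.unpack_from(
            "<IHBB", block, offset)
        name = bytes(block[offset + 8:offset + 8 + name_len])
        yield offset, inode, rec_len, name_len, name
        if (rec_len < MIN_REC_LEN or rec_len % 4
                or offset + rec_len > len(block)):
            break
        offset += rec_len


def tail_is_zeroed(block, offset):
    """Return True if every byte from offset to the end of the block is 0."""
    return not any(block[offset:])
```

For example, `for e in walk_dir_block(read_block("/dev/sda7", 123456)): print(e)`
(with 123456 standing in for a real block number) would print each entry
up to and including the bad one, and `tail_is_zeroed(block, 3860)` would
then tell whether the block is all zeros from the reported offset onward.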

Thanks
Tao
> 
>     Is the following possible: one thread (A) is in read_dir (reading
> directly from the buffer head), while another thread (B) is creating an
> item and filling the same buffer head at the same time. When creating an
> item, B first modifies the last item's rec_len (so that it points to the
> next item, which is initially zero), then fills in the new item. Suppose
> the sequence is as below:
>    1) B: modifies the last item's rec_len.
>    2) A: reads the last item; rec_len has already been modified by B, so A
> concludes that a next item exists.
>    3) A: reads the new item; all fields are still zero.
>    4) B: fills the new item with the correct values.
> 
>    This may cause a problem. Sorry, I have not yet checked the code
> properly. I raise this hypothesis only in the hope that ext3 experts can
> help think about whether such a concurrent scenario is a problem or not.
>    Any idea or clue is welcome.
> 
> Regards & Thanks a lot.
> Wuqx
> 
> On Tue, Dec 4, 2012 at 11:29 PM, Tao Ma <tm@tao.ma> wrote:
>> Hi zefan,
>> On 12/04/2012 09:54 PM, Li Zefan wrote:
>>>>> We have many x86 boards, and we've been using 2.6.16.60 for a long
>>>>> time. Previously we occasionally found ext3 switched to read-only
>>>>> while services were running, and we took it for granted that it must be
>>>>> some hardware problem.
>>>>>     But recently this issue has been happening frequently, on both old
>>>>> and new boards. We've analyzed logs, and on one board we did find an
>>>>> exceptional reboot (but the ext3 error happened 9 days after), and on
>>>>> another board we found the mptbase recovery routine, but on all other
>>>>> boards there's no suspicious output at all.
>>>>>     The only change to the system is some application updates, and
>>>>> the apps now put more IO load on the disks.
>>>>>     The error always happened in ext3_readdir, like this:
>>>>>
>>>>> EXT3-fs error (device sda7): ext3_readdir: bad entry in directory#6685458: rec_len is smaller than minimal - offset=3860, inode=0, rec_len=0, name_len=0
>>>>> Aborting journal on device sda7.
>>>>> EXT3-fs error (device sda7) in start_transaction: Readonly filesystem
>>>>> Aborting journal on device sda7.
>>>>> ext3_abort called.
>>>>> EXT3-fs error (device sda7): ext3_journal_start_sb: Detected aborted journal
>>>>> Remounting filesystem read-only
>>>>> __journal_remove_journal_head: freeing b_committed_data
>>>>>
>>>>> Given this frequency, we highly doubt these are hardware failures, so
>>>>> we're wondering whether there is some ext3 bug fix for this issue
>>>>> that has been merged into mainline but not into our old kernel?
>>>>
>>>> Absolutely there are.  There have been 87 changes just to namei.c since 2.6.16.
>>>> You could look through git logs to see if anything looks applicable.
>>>>
>>>> You might try:
>>>>
>>>> ef2b02d3e617cb0400eedf2668f86215e1b0e6af ext34: ensure do_split leaves enough free space in both blocks
>>>
>>> I've been asked to investigate this issue. Thanks for the reply!
>>>
>>> I found this fix while searching for similar bug reports, but I don't think
>>> it's worth trying as we don't use the dir_index feature.
>>>
>>> I've collected some logs from different machines, and the error was always
>>> triggered in ext3_readdir:
>>>
>>> EXT3-fs error (device sda7): ext3_readdir: bad entry in directory #6685458: rec_len is smaller than minimal - offset=3860, inode=0, rec_len=0, name_len=0
>>> EXT3-fs error (device sda7): ext3_readdir: bad entry in directory #9650541: rec_len is smaller than minimal - offset=3960, inode=0, rec_len=0, name_len=0
>>> EXT3-fs error (device sda7): ext3_readdir: bad entry in directory #11124783: rec_len is smaller than minimal - offset=4072, inode=0, rec_len=0, name_len=0
>>> EXT3-fs error (device sda7): ext3_readdir: bad entry in directory #52740880: rec_len is smaller than minimal - offset=4024, inode=0, rec_len=0, name_len=0
>>> EXT3-fs error (device sda7): ext3_readdir: bad entry in directory #52740880: rec_len is smaller than minimal - offset=4084, inode=0, rec_len=0, name_len=0
>>>
>>> The last two errors happened on the same machine, and on the same inode! One
>>> happened on 11/22 (I was told they ran fsck later on), and one on 12/01.
>> So this directory has now been fscked and is correct? You can try just
>> running ls on this directory and checking whether there are any errors in
>> dmesg.
>>
>> Having said that, as this error happens 2 times for the same inode,
>> maybe there is a kernel bug. At least as Ted said in another mail, the
>> end of this buffer head seems to be cleared. So I guess next time when
>> you see this error, please do:
>> 1. use debugfs to find the disk layout for this dir
>> 2. read the blocks from the block device directly
>> 3. check whether the end of a block(from offset to the end) is zeroed.
>> 4. If yes, I guess there should be a kernel bug and we can go on to
>> investigate the code.
>>
>> Thanks
>> Tao
>>>
>>> The offset is always a bit smaller than blocksize, and all the fields are 0.
>>> I dumped one of the dirs, and only ~1.6K was used (fsck reported no error).
>>>
>>> In some machines fsck reported no error at all, and in others filesystems
>>> were corrupted though fixable.
>>>
>>> I didn't see any other error messages before this error at all.
>>>
>>> Does this remind you of some old ext3 bug?
>>>
>>> I'll send you fsck output, dir contents and other logs if you're interested.
>>>
>>>>
>>>> but to be honest, sticking with such an old kernel means you are largely on your own, or may need contract help if you can't resolve it.
>>>>
>>>
>>> There're numerous machines running old kernels, and many of them are hard to
>>> change. :(
>>>
>>> Yesterday they upgraded apps on ~30 machines, and soon after that 5 machines
>>> had corrupted filesystems. However they won't stop upgrading other machines!
>>>
>>> On the other hand, we can hardly reproduce this bug in the lab.
>>>
>>> So this is critical and urgent. Any help is appreciated.
>>>
>>> Regards
>>> Li Zefan
>>>
>>

