From: Tao Ma <tm@tao.ma>
To: qixuan wu <wuqixuan@gmail.com>
Cc: Li Zefan <lizefan@huawei.com>, Eric Sandeen <sandeen@redhat.com>,
	Yafang Shao <laoar.shao@gmail.com>,
	linux-fsdevel@vger.kernel.org, linux-ext4@vger.kernel.org,
	wuqixuan@huawei.com
Subject: Re: help about ext3 read-only issue on ext3(2.6.16.30)
Date: Wed, 05 Dec 2012 21:58:17 +0800
Message-ID: <50BF52F9.5080707@tao.ma>
In-Reply-To: <CAEjEV8D7G3xaT_Y4KdH7eVA-bdexjEE5iKPFRQ=Psgfmmoe-+g@mail.gmail.com>

Hi qixuan,
On 12/05/2012 12:16 AM, qixuan wu wrote:
> Hi Tao, all,
> 
>     I guess it's a memory (or ext3/kernel) issue. On one machine,
> after this issue was reported and the partition was made read-only,
> we used debugfs to "ls" the directory, and it was fine: it listed all
> files without error. If the disk had a problem, the ls command would
> have given an error too. (We also used debugfs to get the directory
> name from the inode number in the error message.)
Are you sure the disk is good? I just checked the code in e2fsprogs, and
it seems that it does not complain if rec_len = 0; the dir iteration
just aborts. I guess the right way is to dd the corresponding block
out, then decode and read it in binary format. :(
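
A minimal sketch of that decoding step, assuming the block has already
been copied out to a file, the filesystem has the filetype feature (so
each entry header is a 32-bit inode, 16-bit rec_len, 8-bit name_len and
8-bit file_type, little-endian), and the block is a plain linear
directory block. The names parse_dir_block and rec_len_for are made up
for this example, and the validity checks are only modelled on the kind
of check that prints "rec_len is smaller than minimal"; they are not
copied from the kernel.

```python
import struct

# Assumed on-disk entry header: inode, rec_len, name_len, file_type.
HEADER = struct.Struct("<IHBB")


def rec_len_for(name_len):
    """Smallest 4-byte-aligned record that holds the header plus the name."""
    return (name_len + HEADER.size + 3) & ~3


def parse_dir_block(block):
    """Walk the directory entries in one raw block.

    Returns (entries, bad_offset). bad_offset is None when the walk
    reaches the end of the block cleanly; otherwise it is the offset of
    the first entry that fails the sanity checks.
    """
    entries = []
    offset = 0
    while offset < len(block):
        if offset + HEADER.size > len(block):
            return entries, offset
        inode, rec_len, name_len, _ftype = HEADER.unpack_from(block, offset)
        if (rec_len < rec_len_for(1)               # too small for any entry
                or rec_len % 4 != 0                # not 4-byte aligned
                or rec_len < rec_len_for(name_len)  # cannot hold the name
                or offset + rec_len > len(block)):  # runs past the block
            return entries, offset
        name = bytes(block[offset + HEADER.size:offset + HEADER.size + name_len])
        entries.append((offset, inode, rec_len, name))
        offset += rec_len
    return entries, None
```

Feeding it the bytes of the dumped block and printing bad_offset should
show whether the walk stops at the same offset the kernel reported.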

Thanks
Tao
> 
>     Is the following possible: one thread (A) is in read_dir (reading
> directly from the buffer head), and another thread (B) is creating an
> entry and filling the same buffer head at the same time. When creating
> an entry, B first modifies the last entry's rec_len (so that it points
> to the next entry, which is initially zero), then fills in the new
> entry. Suppose the sequence is as follows:
>    1) B: modifies the last entry's rec_len.
>    2) A: reads the last entry; rec_len has already been modified by B,
> so A concludes that a next entry exists.
>    3) A: reads the new entry; all fields are still zero.
>    4) B: fills the new entry with correct values.
> 
>    This may cause a problem. Sorry, I have not checked the code
> properly yet. I raise this hypothesis just in the hope that ext3
> experts can help think about whether such a concurrent scenario is
> a problem or not.
>    Any idea or clue is welcome.
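
The interleaving in the steps above can be replayed deterministically
on a toy buffer. This is only a sketch of the hypothesised ordering: it
says nothing about whether the kernel actually writes the fields in
this order or whether locking prevents the overlap. The 64-byte block,
the inode numbers and the function names are all made up for
illustration.

```python
import struct

# Assumed entry header: inode, rec_len, name_len, file_type (little-endian).
HEADER = struct.Struct("<IHBB")
BLOCK_SIZE = 64   # toy size, chosen only to keep the example small
MIN_REC_LEN = 12  # smallest record: 8-byte header + 1-byte name, 4-aligned


def make_block():
    """One existing entry "a" whose rec_len spans the whole block."""
    block = bytearray(BLOCK_SIZE)
    HEADER.pack_into(block, 0, 11, BLOCK_SIZE, 1, 1)
    block[8:9] = b"a"
    return block


def writer_step1(block):
    """B: shrink the last entry's rec_len so it points at the new slot."""
    HEADER.pack_into(block, 0, 11, MIN_REC_LEN, 1, 1)


def writer_step2(block):
    """B: fill in the new entry "b" at the slot exposed by step 1."""
    HEADER.pack_into(block, MIN_REC_LEN, 12, BLOCK_SIZE - MIN_REC_LEN, 1, 1)
    block[MIN_REC_LEN + 8:MIN_REC_LEN + 9] = b"b"


def reader(block):
    """A: walk the entries; return (names, offset of first bad entry)."""
    names = []
    offset = 0
    while offset < len(block):
        _inode, rec_len, name_len, _ftype = HEADER.unpack_from(block, offset)
        if rec_len < MIN_REC_LEN:
            return names, offset
        names.append(bytes(block[offset + 8:offset + 8 + name_len]))
        offset += rec_len
    return names, None


block = make_block()
writer_step1(block)              # step 1
mid_walk = reader(block)         # steps 2 and 3: A runs between B's writes
writer_step2(block)              # step 4
after_walk = reader(block)
print(mid_walk, after_walk)
```

In this model the walk that runs between the two writes stops at an
all-zero entry, while the walk after the second write completes
normally.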
> 
> Regards & Thanks a lot.
> Wuqx
> 
> On Tue, Dec 4, 2012 at 11:29 PM, Tao Ma <tm@tao.ma> wrote:
>> Hi zefan,
>> On 12/04/2012 09:54 PM, Li Zefan wrote:
>>>>> We have many x86 boards, and we've been using 2.6.16.60 for a long
>>>>> time. In the past we occasionally found ext3 switched to read-only
>>>>> while services were running, and we assumed it must be a hardware
>>>>> problem.
>>>>>     But recently this issue has been happening frequently, on both
>>>>> old and new boards. We've analyzed the logs: on one board we did find
>>>>> an abnormal reboot (but the ext3 error happened 9 days later), and on
>>>>> another board we found output from the mptbase recovery routine, but
>>>>> on all other boards there's no suspicious output at all.
>>>>>     The only change to the system is some application updates, and
>>>>> the apps now put more I/O load on the disks.
>>>>>     The error always happens in ext3_readdir, like this:
>>>>>
>>>>> EXT3-fs error (device sda7): ext3_readdir: bad entry in directory#6685458: rec_len is smaller than minimal - offset=3860, inode=0, rec_len=0, name_len=0
>>>>> Aborting journal on device sda7.
>>>>> EXT3-fs error (device sda7) in start_transaction: Readonly filesystem
>>>>> Aborting journal on device sda7.
>>>>> ext3_abort called.
>>>>> EXT3-fs error (device sda7): ext3_journal_start_sb: Detected aborted journal
>>>>> Remounting filesystem read-only
>>>>> __journal_remove_journal_head: freeing b_committed_data
>>>>>
>>>>> Given this frequency, we highly doubt these are hardware failures,
>>>>> so we're wondering whether some ext3 bug fix has been merged into
>>>>> mainline but is missing from our old kernel?
>>>>
>>>> Absolutely there are.  There have been 87 changes just to namei.c since 2.6.16.
>>>> You could look through git logs to see if anything looks applicable.
>>>>
>>>> You might try:
>>>>
>>>> ef2b02d3e617cb0400eedf2668f86215e1b0e6af ext34: ensure do_split leaves enough free space in both blocks
>>>
>>> I've been asked to investigate this issue. Thanks for the reply!
>>>
>>> I found this fix while searching for similar bug reports, but I don't think
>>> it's worth trying, as we don't use the dir_index feature.
>>>
>>> I've collected some logs from different machines, and the error was always
>>> triggered in ext3_readdir:
>>>
>>> EXT3-fs error (device sda7): ext3_readdir: bad entry in directory #6685458: rec_len is smaller than minimal - offset=3860, inode=0, rec_len=0, name_len=0
>>> EXT3-fs error (device sda7): ext3_readdir: bad entry in directory #9650541: rec_len is smaller than minimal - offset=3960, inode=0, rec_len=0, name_len=0
>>> EXT3-fs error (device sda7): ext3_readdir: bad entry in directory #11124783: rec_len is smaller than minimal - offset=4072, inode=0, rec_len=0, name_len=0
>>> EXT3-fs error (device sda7): ext3_readdir: bad entry in directory #52740880: rec_len is smaller than minimal - offset=4024, inode=0, rec_len=0, name_len=0
>>> EXT3-fs error (device sda7): ext3_readdir: bad entry in directory #52740880: rec_len is smaller than minimal - offset=4084, inode=0, rec_len=0, name_len=0
>>>
>>> The last two errors happened on the same machine, and on the same inode! One
>>> happened on 11/22 (I was told they ran fsck afterwards), and one on 12/01.
>> So this directory has now been fixed by fsck, right? You can check by
>> just running ls on it and seeing whether any errors show up in dmesg.
>>
>> Having said that, as this error has happened twice for the same inode,
>> maybe there is a kernel bug. At least, as Ted said in another mail, the
>> end of this buffer head seems to be cleared. So I guess the next time
>> you see this error, please do the following:
>> 1. Use debugfs to find the disk layout for this dir.
>> 2. Read the blocks from the block device directly.
>> 3. Check whether the end of the block (from offset to the end) is zeroed.
>> 4. If yes, I guess there is a kernel bug and we can go on to
>> investigate the code.
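
Steps 2 and 3 can be sketched as below, assuming a 4096-byte block size
and that step 1 has already produced the physical block number from
debugfs. The function names, the device path and the block number in the
usage note are placeholders, not values from this thread.

```python
def read_block(path, block_no, block_size=4096):
    """Read one filesystem block straight from a device node or image file."""
    with open(path, "rb") as dev:
        dev.seek(block_no * block_size)
        data = dev.read(block_size)
    if len(data) != block_size:
        raise ValueError("short read: got %d bytes" % len(data))
    return data


def tail_is_zeroed(block, offset):
    """True if every byte from offset to the end of the block is zero."""
    return not any(block[offset:])
```

For example, tail_is_zeroed(read_block("/dev/sdXN", BLOCK_NO), 3860)
would answer step 3 for an error reported at offset=3860, with /dev/sdXN
and BLOCK_NO replaced by the real device and the block number found in
step 1.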
>>
>> Thanks
>> Tao
>>>
>>> The offset is always a bit smaller than the block size, and all the fields
>>> are 0. I dumped one of the dirs, and only ~1.6K was used (fsck reported no
>>> error).
>>>
>>> On some machines fsck reported no error at all, and on others the
>>> filesystems were corrupted, though fixable.
>>>
>>> I didn't see any other error messages before this error at all.
>>>
>>> Does this remind you of some old ext3 bug?
>>>
>>> I'll send you fsck output, dir contents and other logs if you're interested.
>>>
>>>>
>>>> but to be honest, sticking with such an old kernel means you are largely on your own, or may need contract help if you can't resolve it.
>>>>
>>>
>>> There are numerous machines running old kernels, and many of them are hard
>>> to change. :(
>>>
>>> Yesterday they upgraded apps on ~30 machines, and soon after that 5 machines
>>> had corrupted filesystems. However, they won't stop upgrading other machines!
>>>
>>> On the other hand, we can hardly reproduce this bug in the lab.
>>>
>>> So this is critical and urgent. Any help is appreciated.
>>>
>>> Regards
>>> Li Zefan
>>>
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
>>> the body of a message to majordomo@vger.kernel.org
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>
>>


