[Bug 114821] New: Frequent and recurring ext4 "bad header invalid magic" errors on a healthy drive

public inbox for linux-ext4@vger.kernel.org
 help / color / mirror / Atom feed

* [Bug 114821] New: Frequent and recurring ext4 "bad header invalid magic" errors on a healthy drive
@ 2016-03-17  5:05 bugzilla-daemon
  2016-03-17  7:38 ` [Bug 114821] " bugzilla-daemon
                   ` (7 more replies)
  0 siblings, 8 replies; 9+ messages in thread
From: bugzilla-daemon @ 2016-03-17  5:05 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=114821

            Bug ID: 114821
           Summary: Frequent and recurring ext4 "bad header invalid magic"
                    errors on a healthy drive
           Product: File System
           Version: 2.5
    Kernel Version: 3.18.5+
          Hardware: ARM
                OS: Linux
              Tree: Mainline
            Status: NEW
          Severity: normal
          Priority: P1
         Component: ext4
          Assignee: fs_ext4@kernel-bugs.osdl.org
          Reporter: dse.ssd@gmail.com
        Regression: No

Recurring ext4 "bad header" errors are reported for a healthy, few months old
external USB drive (WD MyBook 1230) connected to a Raspberry Pi running
Raspbian 7, kernel 3.18.5+. The drive was working fine previously when
formatted in NTFS, SMART has no errors logged, drive passes short and long
SMART self-tests as well as manufacturer diagnostic tests (WD Data Lifeguard).
Another external USB drive (a WD MyBook 1140), formatted in NTFS shows no
problems when connected to the same machine. The specific error is:

ext4_ext_check_inode:495: inode #1523: comm ls: pblk 0 bad header/extent:
invalid magic - magic 0, entries 0, max 0(0), depth 0(0)

The errors are produced whenever some certain files are accessed and the hard
drive produces a distinct (and rather unsettling) clunk every few seconds,
which sounds like a head load/unload or something, but the files seem to be
read without error.

Running fsck reports and fixes some errors and makes the ext4 partition clean
again, but the same errors are produced when the files are read again. The
offending files can be moved to another device and back to the same location to
stop the error from recurring, but it soon appears when accessing other files.

-- 
You are receiving this mail because:
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Bug 114821] Frequent and recurring ext4 "bad header invalid magic" errors on a healthy drive
  2016-03-17  5:05 [Bug 114821] New: Frequent and recurring ext4 "bad header invalid magic" errors on a healthy drive bugzilla-daemon
@ 2016-03-17  7:38 ` bugzilla-daemon
  2016-03-18  5:17 ` bugzilla-daemon
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: bugzilla-daemon @ 2016-03-17  7:38 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=114821

Christian Kujau <lists@nerdbynature.de> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |lists@nerdbynature.de

--- Comment #1 from Christian Kujau <lists@nerdbynature.de> ---
"the hard drive produces a distinct clunk every few seconds" sounds like
hardware problems. Other file systems have different access patterns and it may
not be vis^W audible with e.g. NTFS, but I wouldn't trust a hard disk that
makes "clunk" noises. That being said, there are crucial parts missing from
this report: syslog errors during the access problems (a full dmesg may be
helpful too); does it happen with a current vanilla kernel? Also: has this been
reported to the linux-ext4 mailing list prior opening this bug?

Also, this error seems to be rather common with Raspberry environments, see:

https://www.linuxquestions.org/questions/slackware-arm-108/raspberry-pi2-keeps-crashing-and-corrupting-disk-4175541613/
 (May 2015)

https://github.com/atom/atom/issues/8381 (Aug 2015)

https://unix.stackexchange.com/questions/269609/recurring-ext4-bad-header-invalid-magic-errors-on-a-healty-drive
(from 3 days ago, is that you?)

-- 
You are receiving this mail because:
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Bug 114821] Frequent and recurring ext4 "bad header invalid magic" errors on a healthy drive
  2016-03-17  5:05 [Bug 114821] New: Frequent and recurring ext4 "bad header invalid magic" errors on a healthy drive bugzilla-daemon
  2016-03-17  7:38 ` [Bug 114821] " bugzilla-daemon
@ 2016-03-18  5:17 ` bugzilla-daemon
  2016-03-18  5:17 ` bugzilla-daemon
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: bugzilla-daemon @ 2016-03-18  5:17 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=114821

--- Comment #2 from Lukas <dse.ssd@gmail.com> ---

I was suspecting hardware problems initially, but now I don't think that's the
case - both short and long SMART self-tests as well as vendor diagnostic tests
(including a 10 hour scan of every sector) produced no errors. All SMART
attributes that would indicate electromechanical failure are at zero raw
values.

I've also just verified that the 'clunk sound' is a head load cycle. By copying
a certain file I can trigger the error and if I check the SMART attributes
before and after, I see the 'Load Cycle Count (ID #193)' increased by 2.
Currently it is at 1823.

Unfortunately, I'm not skilled enough to quickly test with a vanilla kernel on
the same OS. Would it be useful if I'd tested using Raspbian Jessie with 4.1
kernel instead? I can exchange the SD card with the OS while keeping everything
else the same.

The stackexchange post is indeed mine and I have not posted this to the
linux-ext4 mailing list. Should I do so? I apologize if I'm doing things out of
order here, I have little experience in official linux bug reporting.

Attaching dmesg ans syslog files.

-- 
You are receiving this mail because:
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Bug 114821] Frequent and recurring ext4 "bad header invalid magic" errors on a healthy drive
  2016-03-17  5:05 [Bug 114821] New: Frequent and recurring ext4 "bad header invalid magic" errors on a healthy drive bugzilla-daemon
  2016-03-17  7:38 ` [Bug 114821] " bugzilla-daemon
  2016-03-18  5:17 ` bugzilla-daemon
@ 2016-03-18  5:17 ` bugzilla-daemon
  2016-03-18  5:18 ` bugzilla-daemon
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: bugzilla-daemon @ 2016-03-18  5:17 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=114821

--- Comment #3 from Lukas <dse.ssd@gmail.com> ---
Created attachment 209701
  --> https://bugzilla.kernel.org/attachment.cgi?id=209701&action=edit
dmesg

-- 
You are receiving this mail because:
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Bug 114821] Frequent and recurring ext4 "bad header invalid magic" errors on a healthy drive
  2016-03-17  5:05 [Bug 114821] New: Frequent and recurring ext4 "bad header invalid magic" errors on a healthy drive bugzilla-daemon
                   ` (2 preceding siblings ...)
  2016-03-18  5:17 ` bugzilla-daemon
@ 2016-03-18  5:18 ` bugzilla-daemon
  2016-03-18 22:46 ` bugzilla-daemon
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: bugzilla-daemon @ 2016-03-18  5:18 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=114821

--- Comment #4 from Lukas <dse.ssd@gmail.com> ---
Created attachment 209711
  --> https://bugzilla.kernel.org/attachment.cgi?id=209711&action=edit
syslog

-- 
You are receiving this mail because:
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Bug 114821] Frequent and recurring ext4 "bad header invalid magic" errors on a healthy drive
  2016-03-17  5:05 [Bug 114821] New: Frequent and recurring ext4 "bad header invalid magic" errors on a healthy drive bugzilla-daemon
                   ` (3 preceding siblings ...)
  2016-03-18  5:18 ` bugzilla-daemon
@ 2016-03-18 22:46 ` bugzilla-daemon
  2016-03-20  6:46 ` bugzilla-daemon
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: bugzilla-daemon @ 2016-03-18 22:46 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=114821

Theodore Tso <tytso@mit.edu> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |tytso@mit.edu

--- Comment #5 from Theodore Tso <tytso@mit.edu> ---
No file system activity should be causing a head load cycle in the absence of a
hardware problem.

Any chance you can try copying the file system image to another external usb
drive, run e2fsck on the copied file system image, and then try using the
copied file systme image on the new usb drive?

-- 
You are receiving this mail because:
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Bug 114821] Frequent and recurring ext4 "bad header invalid magic" errors on a healthy drive
  2016-03-17  5:05 [Bug 114821] New: Frequent and recurring ext4 "bad header invalid magic" errors on a healthy drive bugzilla-daemon
                   ` (4 preceding siblings ...)
  2016-03-18 22:46 ` bugzilla-daemon
@ 2016-03-20  6:46 ` bugzilla-daemon
  2016-03-20 15:12 ` bugzilla-daemon
  2016-03-20 20:50 ` bugzilla-daemon
  7 siblings, 0 replies; 9+ messages in thread
From: bugzilla-daemon @ 2016-03-20  6:46 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=114821

--- Comment #6 from Lukas <dse.ssd@gmail.com> ---
I agree, the head cycling does seem unlikely to be caused by software. But I'm
confused how can the drive be 'failing' simple reads so often and yet pass
every fitness, including a vendor diagnostic test of every sector...

I cannot create a drive image... It is 4TB in size and I don't have that much
contiguous free storage at my disposal. What I could do is wipe the drive and
re-create the ext4 partition. I have most of the contents backed up. It sounds
like a radical way of testing, but let me know if that's my only option.

In the mean time I've done some more testing. Maybe it will prove useful.

Firstly, I've reproduced the error on Raspbian 8 with 4.1.18-v7+ kernel running
on a different Raspberry Pi device. (Sorry, the kernel is again non-vanilla).
Upon accessing a particular file on the drive, I can hear the head load cycle
and a dmesg ext4 error is logged.

Secondly, I've noticed that most of the ext4 errors are due to smbd and for the
same inode #1523. That inode seems to be accessed quite often whenever I do
anything with the drive.

I ran

$ debugfs -R "ncheck 1523" /dev/sdb1

and determined that the inode holds a single text file in the root directory of
the drive - 'file.txt'. I cannot list or delete this file:

% ls -li

ls: cannot access file.txt: Input/output error
? -?????????  ? ?     ?               ?            ? file.txt

$ rm file.txt
rm: cannot remove `file.txt': Input/output error

If I execute

$ debugfs -R 'cat <1523>' /dev/sdb1

it returns nothing and I believe the file should be empty. The command takes a
few minutes to execute, which I don't think is normal too.

I know this sounds even more like hardware problems, but, again, there's
nothing in the SMART attributes, self-tests or vendor diagnostic tests to
support that.

Should I attempt to delete this file using debugfs? Or should I re-create the
partition from scratch? Or should I just get another drive, close this 'bug'
and stop troubleshooting?

-- 
You are receiving this mail because:
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Bug 114821] Frequent and recurring ext4 "bad header invalid magic" errors on a healthy drive
  2016-03-17  5:05 [Bug 114821] New: Frequent and recurring ext4 "bad header invalid magic" errors on a healthy drive bugzilla-daemon
                   ` (5 preceding siblings ...)
  2016-03-20  6:46 ` bugzilla-daemon
@ 2016-03-20 15:12 ` bugzilla-daemon
  2016-03-20 20:50 ` bugzilla-daemon
  7 siblings, 0 replies; 9+ messages in thread
From: bugzilla-daemon @ 2016-03-20 15:12 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=114821

Andreas Dilger <adilger.kernelbugzilla@dilger.ca> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |adilger.kernelbugzilla@dilg
                   |                            |er.ca

--- Comment #7 from Andreas Dilger <adilger.kernelbugzilla@dilger.ca> ---
Probably better than deleting the file, which would just release the bad
block(s) to be allocated by some other file, is to use "debugfs -c -R 'stat
<1523>' /dev/sdb1" to list the blocks allocated to the file, and then run
"badblocks" to mark those blocks bad.  This might also be the inode block that
is bad, which you could tell by listening for the head jump when you run
"debugfs stat" on the inode.  I don't think ext4 or e2fsck currently handles
bad blocks in the middle of the inode table, though I guess they could by
permanently marking those inodes allocated when parsing the bad block numbers. 

One way to check which is the bad block(s), if it isn't the inode itself, is
once you isolate the file's block number(s) is to run "dd if=/dev/sdb1
of=/dev/null bs=4k count=1 skip={blocknr}" one at a time for each block and
listen for the head jump.  This may be complicated by readahead in both the
drive and the kernel, so it may be better to do this from higher to lower block
numbers.

-- 
You are receiving this mail because:
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Bug 114821] Frequent and recurring ext4 "bad header invalid magic" errors on a healthy drive
  2016-03-17  5:05 [Bug 114821] New: Frequent and recurring ext4 "bad header invalid magic" errors on a healthy drive bugzilla-daemon
                   ` (6 preceding siblings ...)
  2016-03-20 15:12 ` bugzilla-daemon
@ 2016-03-20 20:50 ` bugzilla-daemon
  7 siblings, 0 replies; 9+ messages in thread
From: bugzilla-daemon @ 2016-03-20 20:50 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=114821

--- Comment #8 from Lukas <dse.ssd@gmail.com> ---
The command

$ debugfs -c -R 'stat <1523>' /dev/sdb1

produces the following output:

Inode: 1523   Type: regular    Mode:  0644   Flags: 0x80000
Generation: 0    Version: 0x00000000
User:  1001   Group:   100   Size: 0
File ACL: 0    Directory ACL: 0
Links: 1   Blockcount: 0
Fragment:  Address: 0    Number: 0    Size: 0
ctime: 0x56b66f93 -- Sat Feb  6 17:11:31 2016
atime: 0x56b66f93 -- Sat Feb  6 17:11:31 2016
mtime: 0x4e4d75ac -- Thu Aug 18 16:27:24 2011
Size of extra inode fields: 0
EXTENTS:
(END)

Executing 'stat <1523>' command does not cause a head load cycle.

As I understand, there are no blocks assigned to the inode. The file is an
empty text file, so maybe it makes sense? I've created a new empty text file
and it also has no blocks assigned to its inode, but it can be read and stat'ed
with no problems.

I've checked a few of the other inode numbers that appear in dmesg less
frequently and - all of them point to empty text files! I would guess the
reason why inode #1523 appears so often in the logs is because the file is in
the root directory and gets listed each time the root is traversed.

So, is it possible that something went wrong with just the zero-length files
while I was copying my data to it? I did use ext2fsd on Windows 8.1 to copy
some of the files... I just tested it again, and I can copy zero-length files
to the drive using Windows and read them when mounted on linux with no
problems.

And since the zero-length files have no useful information (other than their
file names, which I can still read) and have no blocks assigned to them can I
now just remove them using debugfs?

-- 
You are receiving this mail because:
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2016-03-20 20:50 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-03-17  5:05 [Bug 114821] New: Frequent and recurring ext4 "bad header invalid magic" errors on a healthy drive bugzilla-daemon
2016-03-17  7:38 ` [Bug 114821] " bugzilla-daemon
2016-03-18  5:17 ` bugzilla-daemon
2016-03-18  5:17 ` bugzilla-daemon
2016-03-18  5:18 ` bugzilla-daemon
2016-03-18 22:46 ` bugzilla-daemon
2016-03-20  6:46 ` bugzilla-daemon
2016-03-20 15:12 ` bugzilla-daemon
2016-03-20 20:50 ` bugzilla-daemon

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox