Filesystem corruption on Synology iSCSI LUN

linux-ext4.vger.kernel.org archive mirror

* Filesystem corruption on Synology iSCSI LUN
From: Villa @ 2014-11-28 21:32 UTC
  To: linux-ext4

Hi everyone,

I've got an interesting ext4 corruption problem that I can
successfully reproduce and I'm trying to determine where the fault is
coming from.  Let me start out by saying that I am not a kernel
developer, nor am I much of a programmer.  My understanding of
filesystems is rudimentary (by computer science standards), but after
20 years in the IT field, I certainly know more than your average
person.  Having said that, I can't offer deep technical insight into
filesystem issues - but I hope you can.

The problem is occurring with an iSCSI LUN presented to an Ubuntu
12.04 x64 Linux system via a Synology DS1513 using DSM version 5.1.
This filesystem has been running flawlessly for quite some time.  It
is on UPS and no power outages or unscheduled shutdowns have taken
place lately.  I very recently upgraded from DSM 5.0 to 5.1, and
roughly after this I started noticing the filesystem corruption
problem.  However, it is far too simplistic to immediately assume that
DSM 5.1 is the culprit, and instead I am trying to find out what else
may be causing the issue.

The LUN is approximately 4TB, and only a few days passed between
installing DSM 5.1 and the point where I began noticing problems
(again, this doesn't prove the Synology DSM is involved).  In those
few days, almost no new files were added to the filesystem.  However,
the day after I added a directory and some new files, I noticed
(thanks to a Logwatch report) that the kernel had recorded several
errors.  I unmounted the LUN and ran "fsck.ext4 -f" on the
device, which detected several errors and fixed them.  The recovered
files were in the "lost+found" directory and I was able to move them
into the correct place.  However, on a hunch, I tried the same thing
again - and got the same errors.  This situation seems to be
completely repeatable on my system.  I just subscribed to this list
today and I am not familiar with your established standards or
expectations, so I am including as much relevant information as I can.
If anyone has any insight or clues, or needs more information, please
let me know.

"uname -a" output:
Linux cj148869-a 3.2.0-72-generic #107-Ubuntu SMP Thu Nov 6 14:24:01 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
--------------------------------------------------------------------------------

Mounted iSCSI device/partition:
/dev/sdd1
--------------------------------------------------------------------------------

"fdisk" p:
Disk /dev/sdd: 4402.3 GB, 4402341478400 bytes
255 heads, 63 sectors/track, 535220 cylinders, total 8598323200 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 8192 bytes / 8192 bytes
Disk identifier: 0x00000000

   Device Boot      Start         End      Blocks   Id  System
/dev/sdd1               1  4294967295  2147483647+  ee  GPT
Partition 1 does not start on physical sector boundary
--------------------------------------------------------------------------------

"iscsiadm -m node" output:
172.16.8.10:3260,0 iqn.2000-01.com.synology:regusersfs.cjserver-lun1-target
--------------------------------------------------------------------------------

"lspci | grep -i ethernet" output:
02:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 06)
05:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 06)
06:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 06)
--------------------------------------------------------------------------------

NIC kernel module:
r8168 (version 8.037.00)
--------------------------------------------------------------------------------

Command to mount LUN:
mount -t ext4 -o acl,user_xattr /dev/sdd1 /storage/iscsi-lun1
--------------------------------------------------------------------------------

Commands to trigger fault/corruption:
mkdir /storage/iscsi-lun1/mymedia/pub/software/linux/mobile
vi /storage/iscsi-lun1/mymedia/pub/software/linux/mobile/text.txt
     (an attempt to write a simple text file)
--------------------------------------------------------------------------------

output of "dmesg" (beginning with the mounting of the device):
[125975.883678] EXT4-fs (sdd1): mounted filesystem with ordered data mode. Opts: acl,user_xattr
[126085.888075] sd 9:0:0:0: [sdd]  Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[126085.888081] sd 9:0:0:0: [sdd]  Sense Key : Illegal Request [current]
[126085.888086] sd 9:0:0:0: [sdd]  <<vendor>> ASC=0x80 ASCQ=0x0ASC=0x80 ASCQ=0x0
[126085.888093] sd 9:0:0:0: [sdd] CDB: Write(16): 8a 00 00 00 00 01 e1 c0 95 c0 00 00 00 08 00 00
[126085.888105] end_request: I/O error, dev sdd, sector 8082462144
[126085.890808] Buffer I/O error on device sdd1, logical block 1010307512
[126085.893509] lost page write due to I/O error on sdd1
[126105.933792] EXT4-fs error (device sdd1): add_dirent_to_buf:1273: inode #126289726: block 1010307512: comm vi: bad entry in directory: rec_len is smaller than minimal - offset=0(0), inode=0, rec_len=0, name_len=0
[126105.935569] EXT4-fs error (device sdd1): add_dirent_to_buf:1273: inode #126289726: block 1010307512: comm vi: bad entry in directory: rec_len is smaller than minimal - offset=0(0), inode=0, rec_len=0, name_len=0
[126111.933747] EXT4-fs error (device sdd1): add_dirent_to_buf:1273: inode #126289726: block 1010307512: comm vi: bad entry in directory: rec_len is smaller than minimal - offset=0(0), inode=0, rec_len=0, name_len=0
--------------------------------------------------------------------------------
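One plausible reading of the repeated "rec_len is smaller than minimal" errors above: the failed Write(16) targeted logical block 1010307512, which is the same block ext4 later complains about, so the new directory's first block never received its "." and ".." entries on disk. The sketch below, in Python, mimics the sanity check on a block whose first bytes read back as zeros. It assumes the usual ext4 on-disk directory entry header (32-bit inode, 16-bit rec_len, 8-bit name_len, 8-bit file type, little-endian); the function names are made up for illustration and this is not the kernel's code.

```python
import struct

# ext4 directory entry header: inode (u32), rec_len (u16),
# name_len (u8), file_type (u8), all little-endian.
DIRENT_HEADER = struct.Struct("<IHBB")


def dir_rec_len(name_len):
    """Smallest record that holds the 8-byte header plus the name, rounded up to 4 bytes."""
    return (name_len + 8 + 3) & ~3


def check_first_dirent(block):
    """Return (error, inode, rec_len, name_len) for the entry at offset 0.

    error is None when rec_len is at least the minimum for a 1-byte name.
    """
    inode, rec_len, name_len, _file_type = DIRENT_HEADER.unpack_from(block, 0)
    if rec_len < dir_rec_len(1):
        return "rec_len is smaller than minimal", inode, rec_len, name_len
    return None, inode, rec_len, name_len


# A 4096-byte block that was never written and reads back as zeros
# (an assumption about what the target returned).
zero_block = bytes(4096)
print(check_first_dirent(zero_block))

# For comparison, a block that starts with a well-formed "." entry.
dot_entry = DIRENT_HEADER.pack(126289726, 12, 1, 2) + b".\x00\x00\x00"
good_block = dot_entry + bytes(4096 - len(dot_entry))
print(check_first_dirent(good_block))
```

With a zeroed header the decoded values are inode=0, rec_len=0, name_len=0, the same values the kernel messages report.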

After unmounting, output of "fsck.ext4 -f /dev/sdd1":
e2fsck 1.42 (29-Nov-2011)
/dev/sdd1: recovering journal
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Directory inode 126289726, block #0, offset 0: directory corrupted
Salvage<y>? yes

Missing '.' in directory inode 126289726.
Fix<y>? yes

Setting filetype for entry '.' in ??? (126289726) to 2.
Missing '..' in directory inode 126289726.
Fix<y>? yes

Setting filetype for entry '..' in ??? (126289726) to 2.
Pass 3: Checking directory connectivity
'..' in /mymedia/pub/software/linux/mobile (126289726) is <The NULL inode> (0), should be /mymedia/pub/software/linux (126091366).
Fix<y>? yes

Pass 4: Checking reference counts
Inode 2 ref count is 4, should be 5.  Fix<y>? yes

Inode 126091366 ref count is 19, should be 18.  Fix<y>? yes

Pass 5: Checking group summary information

/dev/sdd1: ***** FILE SYSTEM WAS MODIFIED *****
/dev/sdd1: 147160/134348800 files (0.8% non-contiguous), 478740596/1074789888 blocks
--------------------------------------------------------------------------------

After running this clean up and either moving around files from
lost+found (or just deleting them), the filesystem seems to behave --
until I try to write files.

Other relevant "dmesg" warnings from other recent failures/problems
(happened immediately after mounting and trying to write
files/folders):
[28315.611845] EXT4-fs (sdd1): mounted filesystem with ordered data mode. Opts: acl,user_xattr
[28360.135947] EXT4-fs error (device sdd1): htree_dirblock_to_tree:587: inode #126289726: block 1010307512: comm rm: bad entry in directory: rec_len is smaller than minimal - offset=0(0), inode=0, rec_len=0, name_len=0
[28360.138737] EXT4-fs warning (device sdd1): empty_dir:1926: bad directory (dir #126289726) - no `.' or `..'
[28580.746047] EXT4-fs (sdd1): mounted filesystem with ordered data mode. Opts: acl,user_xattr
[28597.680443] sd 9:0:0:0: [sdd]  Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[28597.680449] sd 9:0:0:0: [sdd]  Sense Key : Illegal Request [current]
[28597.680454] sd 9:0:0:0: [sdd]  <<vendor>> ASC=0x80 ASCQ=0x0ASC=0x80 ASCQ=0x0
[28597.680466] sd 9:0:0:0: [sdd] CDB: Write(16): 8a 00 00 00 00 01 e1 c0 95 c0 00 00 00 08 00 00
[28597.680472] end_request: I/O error, dev sdd, sector 8082462144
[28597.681706] Buffer I/O error on device sdd1, logical block 1010307512
[28597.682936] lost page write due to I/O error on sdd1
[28617.421379] Aborting journal on device sdd1-8.
[28617.425268] EXT4-fs error (device sdd1): ext4_put_super:819: Couldn't clean up the journal
[28617.427950] EXT4-fs (sdd1): Remounting filesystem read-only
[28621.076820] sd 9:0:0:0: [sdd]  Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[28621.076824] sd 9:0:0:0: [sdd]  Sense Key : Illegal Request [current]
[28621.076828] sd 9:0:0:0: [sdd]  <<vendor>> ASC=0x80 ASCQ=0x0ASC=0x80 ASCQ=0x0
[28621.076834] sd 9:0:0:0: [sdd] CDB: Write(16): 8a 00 00 00 00 01 e1 c0 95 c0 00 00 00 08 00 00
[28621.076844] end_request: I/O error, dev sdd, sector 8082462144
[28621.078991] Buffer I/O error on device sdd1, logical block 1010307512
[28621.081116] lost page write due to I/O error on sdd1
[28670.043409] sd 9:0:0:0: [sdd]  Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[28670.043413] sd 9:0:0:0: [sdd]  Sense Key : Illegal Request [current]
[28670.043417] sd 9:0:0:0: [sdd]  <<vendor>> ASC=0x80 ASCQ=0x0ASC=0x80 ASCQ=0x0
[28670.043421] sd 9:0:0:0: [sdd] CDB: Write(16): 8a 00 00 00 00 01 e1 c0 95 c0 00 00 00 08 00 00
[28670.043429] end_request: I/O error, dev sdd, sector 8082462144
[28670.045163] Buffer I/O error on device sdd1, logical block 1010307512
[28670.046886] lost page write due to I/O error on sdd1
[28700.734181] EXT4-fs (sdd1): mounted filesystem with ordered data mode. Opts: acl,user_xattr
[28721.134899] EXT4-fs error (device sdd1): htree_dirblock_to_tree:587: inode #126289726: block 1010307512: comm rm: bad entry in directory: rec_len is smaller than minimal - offset=0(0), inode=0, rec_len=0, name_len=0
[28721.137720] EXT4-fs warning (device sdd1): empty_dir:1926: bad directory (dir #126289726) - no `.' or `..'
--------------------------------------------------------------------------------

I know this list doesn't exist to fix my personal problems and I
understand that this is a lot (especially for the first post in the
thread), but I'd like to know if any of you think this filesystem is
salvageable and if it can be permanently fixed.  Luckily this is a
backup LUN and all of the data is safely elsewhere, so I can
"experiment" if necessary.  I wonder if this is some sort of
kernel/module problem.  If anyone can help, I'd greatly appreciate it.
Let me know if you need more info.

Thanks,

Villa


* Re: Filesystem corruption on Synology iSCSI LUN
From: Theodore Ts'o @ 2014-11-29  3:04 UTC
  To: Villa; +Cc: linux-ext4

On Fri, Nov 28, 2014 at 09:32:21PM +0000, Villa wrote:
> The problem is occurring with an iSCSI LUN presented to an Ubuntu
> 12.04 x64 Linux system via a Synology DS1513 using DSM version 5.1.
> This filesystem has been running flawlessly for quite some time.  It
> is on UPS and no power outages or unscheduled shutdowns have taken
> place lately.  I very recently upgraded from DSM 5.0 to 5.1, and
> roughly after this I started noticing the filesystem corruption
> problem.  However, it is far too simplistic to immediately assume that
> DSM 5.1 is the culprit, and instead I am trying to find out what else
> may be causing the issue.

Unfortunately, I suspect all we can say is that DSM 5.1 is probably
the issue.

> [126085.888075] sd 9:0:0:0: [sdd]  Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
> [126085.888081] sd 9:0:0:0: [sdd]  Sense Key : Illegal Request [current]
> [126085.888086] sd 9:0:0:0: [sdd]  <<vendor>> ASC=0x80 ASCQ=0x0ASC=0x80 ASCQ=0x0
> [126085.888093] sd 9:0:0:0: [sdd] CDB: Write(16): 8a 00 00 00 00 01 e1 c0 95 c0 00 00 00 08 00 00
> [126085.888105] end_request: I/O error, dev sdd, sector 8082462144
> [126085.890808] Buffer I/O error on device sdd1, logical block 1010307512
> [126085.893509] lost page write due to I/O error on sdd1

This I/O error is coming from the SCSI stack, and indicates something
is going very wrong at the iSCSI target or iSCSI initiator.  Until you
can get this resolved, it's hopeless to try to look at anything at the
file system layer.  You always have to fix the problems lowest on the
storage stack before moving upwards...
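To make the quoted failure concrete, here is a small Python sketch that decodes the rejected command from the log. It assumes the standard WRITE(16) layout (opcode 0x8a, an 8-byte big-endian LBA in bytes 2-9, a 4-byte transfer length in bytes 10-13) and 512-byte logical sectors as reported by fdisk. The helper name decode_write16 is hypothetical, and the partition-offset comparison is an inference from the log values, not something the logs state directly.

```python
def decode_write16(cdb_hex):
    """Decode LBA and transfer length (in logical blocks) from a WRITE(16) CDB."""
    cdb = bytes.fromhex(cdb_hex)  # spaces between byte pairs are ignored
    if len(cdb) != 16 or cdb[0] != 0x8A:
        raise ValueError("not a 16-byte WRITE(16) CDB")
    lba = int.from_bytes(cdb[2:10], "big")
    nblocks = int.from_bytes(cdb[10:14], "big")
    return lba, nblocks


SECTOR_SIZE = 512       # logical sector size from the fdisk output
FS_BLOCK_SIZE = 4096    # assumed ext4 block size (8 sectors per block)

lba, nblocks = decode_write16("8a 00 00 00 00 01 e1 c0 95 c0 00 00 00 08 00 00")
print("LBA:", lba)                          # sector on sdd
print("bytes:", nblocks * SECTOR_SIZE)      # size of the rejected write

# Compare with "logical block 1010307512" on sdd1: the difference is the
# sector at which the partition would have to start for both to match.
fs_block = 1010307512
partition_start = lba - fs_block * (FS_BLOCK_SIZE // SECTOR_SIZE)
print("implied partition start sector:", partition_start)
```

The decoded LBA matches the sector in the end_request line, so the target rejected a single 4096-byte write. Nothing in this decoding involves ext4, which is in line with the point that the fault sits below the file system.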

> I know this list doesn't exist to fix my personal problems and I
> understand that this is a lot (especially for the first post in the
> thread), but I'd like to know if any of you think this filesystem is
> salvageable and if it can be permanently fixed.  Luckily this is a
> backup LUN and all of the data is safely elsewhere, so I can
> "experiment" if necessary.  I wonder if this is some sort of
> kernel/module problem.  If anyone can help, I'd greatly appreciate it.
> Let me know if you need more info.

The main thing is whether you can get the bits off of the iSCSI volume
successfully.  If you can, there is a high probability that the file
system can be fixed, with hopefully minor amounts of data loss.  But
given the malfunctioning at the SCSI layer, any attempt to try to
"fix" things at the file system level can very easily make things
much worse.
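As a rough illustration of checking whether the bits can be read off the volume, the following Python sketch reads a device or image file sequentially, strictly read-only, and records any ranges that raise an I/O error. The function name and chunk size are arbitrary choices for the example. Note that the failures in the logs are on writes, so a clean read pass would only show that reads succeed; it would say nothing about whether the target will accept writes.

```python
import os


def scan_readable(path, chunk_size=1024 * 1024):
    """Read `path` front to back without writing to it.

    Returns (bytes_read, bad_ranges), where bad_ranges is a list of
    (offset, length) pairs for chunks whose read raised OSError.
    """
    bad_ranges = []
    bytes_read = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.lseek(fd, 0, os.SEEK_END)  # also works for Linux block devices
        offset = 0
        while offset < size:
            want = min(chunk_size, size - offset)
            try:
                data = os.pread(fd, want, offset)
            except OSError:
                # Note the unreadable range and continue past it.
                bad_ranges.append((offset, want))
                offset += want
                continue
            if not data:
                break  # unexpected end of device
            bytes_read += len(data)
            offset += len(data)
    finally:
        os.close(fd)
    return bytes_read, bad_ranges


# Example call (the device name is taken from the original post):
#   total, bad = scan_readable("/dev/sdd1")
```

In practice a dedicated imaging tool would be the usual choice for copying the data elsewhere; the sketch only shows the idea of a non-destructive read pass before anything touches the file system.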

Cheers,

						- Ted
