* Disk error, then endless loop
From: Chris Dunlop @ 2015-11-17 8:03 UTC
To: xfs
Hi,
XFS error handling on linux 3.18.21 looks to be "suboptimal".
I had an XFS disk start returning read errors, then disappear from the
controller altogether (only to come back under a different /dev/sdXX name).
XFS is now endlessly flooding these messages into kern.log (55,000 copies
and counting...):
[5358213.926049] XFS (sdu1): metadata I/O error: block 0x2b163a0 ("xfs_trans_read_buf_map") error 5 numblks 16
[5358213.926141] XFS (sdu1): xfs_imap_to_bp: xfs_trans_read_buf() returned error -5.
More info below, but some questions:
Is this a known issue? If so, has it been fixed, and in which commit?
I guess I'm going to have to hard boot the machine to get out of this,
right?
More info...
The XFS is the only thing on the GPT-partitioned disk (on partition 1), with
a log device on a partition of an SSD-backed md RAID-1.
The disk is:
Device Model: WDC WD60EFRX-68MYMN1
User Capacity: 6,001,175,126,016 bytes [6.00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
The XFS was formatted like:
# mkfs.xfs -V
mkfs.xfs version 3.2.1
# mkfs.xfs -l logdev=/dev/md8p5 -i size=2048 /dev/sdu1
meta-data=/dev/sdu1 isize=2048 agcount=6, agsize=268435455 blks
= sectsz=4096 attr=2, projid32bit=1
= crc=0 finobt=0
data = bsize=4096 blocks=1465130385, imaxpct=5
= sunit=0 swidth=0 blks
naming =version 2 bsize=4096 ascii-ci=0 ftype=0
log =/dev/md8p5 bsize=4096 blocks=409600, version=2
= sectsz=512 sunit=0 blks, lazy-count=1
realtime =none extsz=4096 blocks=0, rtextents=0
I tried to umount the filesystem but the umount is now hung and unkillable:
# ps -ostat,wchan='WCHAN-xxxxxxxxxxxxxxxxxx',cmd -C umount
STAT WCHAN-xxxxxxxxxxxxxxxxxx CMD
D+ xfs_ail_push_all_sync umount /var/lib/ceph/osd/ceph-18
As previously mentioned, the disk has actually reappeared under a different
/dev/sdXX name (it was sdu, now sdbh). Trying to mount the disk (read only)
results in:
# mkdir /mnt/xfs && mount -ologdev=/dev/md8p5,ro /dev/sdbh1 /mnt/xfs
mount: /dev/sdbh1 already mounted or /mnt/xfs busy
kern.log leading up to this event:
[5358213.665887] mpt2sas0: log_info(0x31120436): originator(PL), code(0x12), sub_code(0x0436)
[5358213.665939] mpt2sas0: log_info(0x31120436): originator(PL), code(0x12), sub_code(0x0436)
[5358213.665990] mpt2sas0: log_info(0x31120436): originator(PL), code(0x12), sub_code(0x0436)
[5358213.666042] mpt2sas0: log_info(0x31120436): originator(PL), code(0x12), sub_code(0x0436)
[5358213.666138] sd 0:0:20:0: [sdu]
[5358213.666165] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[5358213.666196] sd 0:0:20:0: [sdu] CDB:
[5358213.666222] Write(16): 8a 00 00 00 00 00 2e 99 9b 00 00 00 02 98 00 00
[5358213.666295] blk_update_request: I/O error, dev sdu, sector 781818624
[5358213.666423] Buffer I/O error on dev sdu1, logical block 363305032, lost async page write
[5358213.666480] sd 0:0:20:0: [sdu]
[5358213.666504] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[5358213.666532] sd 0:0:20:0: [sdu] CDB:
[5358213.666555] Write(16): 8a 00 00 00 00 00 2e 99 97 00 00 00 04 00 00 00
[5358213.666626] blk_update_request: I/O error, dev sdu, sector 781817600
[5358213.666661] sd 0:0:20:0: [sdu]
[5358213.666684] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[5358213.666713] sd 0:0:20:0: [sdu] CDB:
[5358213.666736] Write(16): 8a 00 00 00 00 00 2e 99 93 00 00 00 04 00 00 00
[5358213.666808] blk_update_request: I/O error, dev sdu, sector 781816576
[5358213.666842] sd 0:0:20:0: [sdu]
[5358213.666865] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[5358213.666893] sd 0:0:20:0: [sdu] CDB:
[5358213.666917] Read(16): 88 00 00 00 00 01 27 9b 51 10 00 00 00 08 00 00
[5358213.666988] blk_update_request: I/O error, dev sdu, sector 4959457552
[5358213.667025] sd 0:0:20:0: [sdu]
[5358213.667048] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[5358213.667077] sd 0:0:20:0: [sdu] CDB:
[5358213.667100] Write(16): 8a 00 00 00 00 01 2c 40 b8 a8 00 00 01 78 00 00
[5358213.667171] blk_update_request: I/O error, dev sdu, sector 5037406376
[5358213.667206] sd 0:0:20:0: [sdu]
[5358213.667229] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[5358213.667257] sd 0:0:20:0: [sdu] CDB:
[5358213.667281] Write(16): 8a 00 00 00 00 01 2c 40 b4 a8 00 00 04 00 00 00
[5358213.667351] blk_update_request: I/O error, dev sdu, sector 5037405352
[5358213.667385] blk_update_request: I/O error, dev sdu, sector 0
[5358213.667419] blk_update_request: I/O error, dev sdu, sector 0
[5358213.667452] sd 0:0:20:0: [sdu]
[5358213.667475] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[5358213.667504] sd 0:0:20:0: [sdu] CDB:
[5358213.667527] Write(16): 8a 00 00 00 00 01 27 9b 50 b0 00 00 00 60 00 00
[5358213.667598] blk_update_request: I/O error, dev sdu, sector 4959457456
[5358213.667628] Buffer I/O error on dev sdu1, logical block 619931926, lost async page write
[5358213.667678] Buffer I/O error on dev sdu1, logical block 619931927, lost async page write
[5358213.667727] Buffer I/O error on dev sdu1, logical block 619931928, lost async page write
[5358213.667774] Buffer I/O error on dev sdu1, logical block 619931929, lost async page write
[5358213.667821] Buffer I/O error on dev sdu1, logical block 619931930, lost async page write
[5358213.667868] Buffer I/O error on dev sdu1, logical block 619931931, lost async page write
[5358213.667915] Buffer I/O error on dev sdu1, logical block 619931932, lost async page write
[5358213.667962] Buffer I/O error on dev sdu1, logical block 619931933, lost async page write
[5358213.668010] Buffer I/O error on dev sdu1, logical block 619931934, lost async page write
[5358213.668065] sd 0:0:20:0: [sdu]
[5358213.668088] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[5358213.668118] sd 0:0:20:0: [sdu] CDB:
[5358213.668141] Write(16): 8a 00 00 00 00 00 2e 99 91 98 00 00 01 68 00 00
<<< above 4 errors repeat a number of times, then >>>
[5358213.672847] sd 0:0:20:0: [sdu]
[5358213.672870] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[5358213.672898] sd 0:0:20:0: [sdu] CDB:
[5358213.672922] Write(16): 8a 00 00 00 00 00 ad 40 60 38 00 00 04 00 00 00
[5358213.673083] XFS (sdu1): metadata I/O error: block 0x817f21d8 ("xfs_trans_read_buf_map") error 5 numblks 8
[5358213.673086] XFS (sdu1): metadata I/O error: block 0x183698f78 ("xfs_trans_read_buf_map") error 5 numblks 16
[5358213.673093] XFS (sdu1): xfs_imap_to_bp: xfs_trans_read_buf() returned error -5.
[5358213.673225] sd 0:0:20:0: [sdu]
[5358213.673226] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[5358213.673227] sd 0:0:20:0: [sdu] CDB:
[5358213.673233] Write(16): 8a 00 00 00 00 01 ab b3 5a c8 00 00 04 00 00 00
[5358213.678590] XFS (sdu1): metadata I/O error: block 0x2b163a0 ("xfs_trans_read_buf_map") error 5 numblks 16
[5358213.678686] XFS (sdu1): xfs_imap_to_bp: xfs_trans_read_buf() returned error -5.
[5358213.679799] XFS (sdu1): metadata I/O error: block 0x28 ("xfs_buf_iodone_callbacks") error 5 numblks 8
[5358213.725951] XFS (sdu1): Detected failing async write on buffer block 0x805d4cd8. Retrying async write.
[5358213.726069] XFS (sdu1): Detected failing async write on buffer block 0x20d390918. Retrying async write.
[5358213.726181] XFS (sdu1): Detected failing async write on buffer block 0x88a017f0. Retrying async write.
[5358213.726292] XFS (sdu1): Detected failing async write on buffer block 0x80d04890. Retrying async write.
[5358213.726428] XFS (sdu1): Detected failing async write on buffer block 0x85bd33d8. Retrying async write.
[5358213.726539] XFS (sdu1): Detected failing async write on buffer block 0x80ca6110. Retrying async write.
[5358213.726650] XFS (sdu1): Detected failing async write on buffer block 0x857f1bb8. Retrying async write.
[5358213.726762] XFS (sdu1): Detected failing async write on buffer block 0x88a017e0. Retrying async write.
[5358213.726873] XFS (sdu1): Detected failing async write on buffer block 0x804f1c10. Retrying async write.
[5358213.726984] XFS (sdu1): Detected failing async write on buffer block 0x859381b8. Retrying async write.
[5358213.727126] XFS (sdu1): metadata I/O error: block 0x2b163a0 ("xfs_trans_read_buf_map") error 5 numblks 16
[5358213.727212] XFS (sdu1): xfs_imap_to_bp: xfs_trans_read_buf() returned error -5.
[5358213.775880] XFS (sdu1): metadata I/O error: block 0x2b163a0 ("xfs_trans_read_buf_map") error 5 numblks 16
[5358213.775972] XFS (sdu1): xfs_imap_to_bp: xfs_trans_read_buf() returned error -5.
[5358213.825966] XFS (sdu1): metadata I/O error: block 0x2b163a0 ("xfs_trans_read_buf_map") error 5 numblks 16
[5358213.826061] XFS (sdu1): xfs_imap_to_bp: xfs_trans_read_buf() returned error -5.
[5358213.876050] XFS (sdu1): metadata I/O error: block 0x2b163a0 ("xfs_trans_read_buf_map") error 5 numblks 16
[5358213.876142] XFS (sdu1): xfs_imap_to_bp: xfs_trans_read_buf() returned error -5.
[5358213.926049] XFS (sdu1): metadata I/O error: block 0x2b163a0 ("xfs_trans_read_buf_map") error 5 numblks 16
[5358213.926141] XFS (sdu1): xfs_imap_to_bp: xfs_trans_read_buf() returned error -5.
...and the "metadata" and "xfs_imap_to_bp" messages continue to flood into
kern.log (120,000 and counting...)
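As a cross-check on the log above: bytes 2-9 of a Write(16)/Read(16) CDB are
the 64-bit big-endian LBA, so each CDB line can be decoded and compared with
the sector blk_update_request reports. A minimal sketch, assuming the
partition starts at sector 2048 (the usual GPT default, not stated in the
log; check /sys/block/sdu/sdu1/start):

```shell
# Decode the big-endian LBA from bytes 2-9 of a SCSI Write(16) CDB.
# CDB copied from the "sector 4959457456" entry in the log above.
cdb="8a 00 00 00 00 01 27 9b 50 b0 00 00 00 60 00 00"
set -- $cdb
lba=$(printf '%d' "0x$3$4$5$6$7$8$9${10}")
echo "lba=$lba"          # matches blk_update_request: sector 4959457456

# Whole-disk 512b sector -> 4k logical block on sdu1, assuming the
# partition starts at sector 2048 (an assumption, not from the log).
part_start=2048
fsblock=$(( (lba - part_start) / 8 ))
echo "fsblock=$fsblock"  # matches "Buffer I/O error ... logical block 619931926"
```

For this particular CDB the decoded values line up with the
blk_update_request and buffer-error lines, which at least says the errors
are self-consistent rather than garbled.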
Cheers,
Chris
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
* Re: Disk error, then endless loop
From: Brian Foster @ 2015-11-17 12:41 UTC
To: Chris Dunlop; +Cc: xfs
On Tue, Nov 17, 2015 at 07:03:33PM +1100, Chris Dunlop wrote:
> Hi,
>
> XFS error handling on linux 3.18.21 looks to be "suboptimal".
>
> I had an XFS disk start returning read errors, then disappear from the
> controller altogether (only to come back under a different /dev/sdXX name).
> XFS is now endlessly flooding these messages into kern.log (55,000 copies
> and counting...):
>
Note that disks returning errors and/or dropping offline and back online is
outside the realm of the filesystem. There isn't much the fs can do in
that case.
> [5358213.926049] XFS (sdu1): metadata I/O error: block 0x2b163a0 ("xfs_trans_read_buf_map") error 5 numblks 16
> [5358213.926141] XFS (sdu1): xfs_imap_to_bp: xfs_trans_read_buf() returned error -5.
>
> More info below, but some questions:
>
> Is this a known issue? If so, has it been fixed, and in which commit?
>
This is expected and the most likely scenario is that the filesystem
eventually shuts itself down to avoid further damage. At this point,
most operations will return -EIO (-5) back to the user.
> I guess I'm going to have to hard boot the machine to get out of this,
> right?
>
Normally you should be able to unmount a filesystem that has shut
down...
> More info...
>
> The XFS is the only thing on the GPT-partitioned disk (on partition 1), with
> a log device on a partition of an SSD-backed md RAID-1.
>
> The disk is:
>
> Device Model: WDC WD60EFRX-68MYMN1
> User Capacity: 6,001,175,126,016 bytes [6.00 TB]
> Sector Sizes: 512 bytes logical, 4096 bytes physical
>
> The XFS was formatted like:
>
> # mkfs.xfs -V
> mkfs.xfs version 3.2.1
> # mkfs.xfs -l logdev=/dev/md8p5 -i size=2048 /dev/sdu1
> meta-data=/dev/sdu1 isize=2048 agcount=6, agsize=268435455 blks
> = sectsz=4096 attr=2, projid32bit=1
> = crc=0 finobt=0
> data = bsize=4096 blocks=1465130385, imaxpct=5
> = sunit=0 swidth=0 blks
> naming =version 2 bsize=4096 ascii-ci=0 ftype=0
> log =/dev/md8p5 bsize=4096 blocks=409600, version=2
> = sectsz=512 sunit=0 blks, lazy-count=1
> realtime =none extsz=4096 blocks=0, rtextents=0
>
> I tried to umount the filesystem but the umount is now hung and unkillable:
>
> # ps -ostat,wchan='WCHAN-xxxxxxxxxxxxxxxxxx',cmd -C umount
> STAT WCHAN-xxxxxxxxxxxxxxxxxx CMD
> D+ xfs_ail_push_all_sync umount /var/lib/ceph/osd/ceph-18
>
... but it appears you still have something pending in the AIL which is
holding everything up. The most likely case is an EFI/EFD item hanging
around from an extent free operation, as this was a known issue,
particularly on fs shutdowns. Fixes for this went into the v4.3 kernel.
Note that this won't address whatever is wrong with the disk in the
first place, just (hopefully) the ability to unmount when it ultimately
fails.
Does the umount process actually appear to be doing anything? E.g., are
you seeing noticeable CPU load, do I/O errors continue in the logs, or
has everything pretty much locked up? You could also enable tracepoints
(trace-cmd start -e "xfs:*"; cat /sys/kernel/debug/tracing/trace_pipe)
to get a quick idea of what's going on.
> As previously mentioned, the disk has actually reappeared under a different
> /dev/sdXX name (it was sdu, now sdbh). Trying to mount the disk (read only)
> results in:
>
> # mkdir /mnt/xfs && mount -ologdev=/dev/md8p5,ro /dev/sdbh1 /mnt/xfs
> mount: /dev/sdbh1 already mounted or /mnt/xfs busy
>
Probably due to either a uuid check or blocking on access to the
external log device. You'll probably need to clean up the stale mount
before this will work.
As it is, something is clearly wrong with the drive. I can't really
interpret the I/O errors and whatnot (linux-scsi?), but you probably
want to look into health assessment tools (e.g., smart) to get an idea
of what's wrong and/or replace the device and restore from backups (or
perhaps heal via the ceph cluster, in your case).
Brian
> kern.log leading up to this event:
>
> [5358213.665887] mpt2sas0: log_info(0x31120436): originator(PL), code(0x12), sub_code(0x0436)
> [5358213.665939] mpt2sas0: log_info(0x31120436): originator(PL), code(0x12), sub_code(0x0436)
> [5358213.665990] mpt2sas0: log_info(0x31120436): originator(PL), code(0x12), sub_code(0x0436)
> [5358213.666042] mpt2sas0: log_info(0x31120436): originator(PL), code(0x12), sub_code(0x0436)
> [5358213.666138] sd 0:0:20:0: [sdu]
> [5358213.666165] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
> [5358213.666196] sd 0:0:20:0: [sdu] CDB:
> [5358213.666222] Write(16): 8a 00 00 00 00 00 2e 99 9b 00 00 00 02 98 00 00
> [5358213.666295] blk_update_request: I/O error, dev sdu, sector 781818624
> [5358213.666423] Buffer I/O error on dev sdu1, logical block 363305032, lost async page write
> [5358213.666480] sd 0:0:20:0: [sdu]
> [5358213.666504] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
> [5358213.666532] sd 0:0:20:0: [sdu] CDB:
> [5358213.666555] Write(16): 8a 00 00 00 00 00 2e 99 97 00 00 00 04 00 00 00
> [5358213.666626] blk_update_request: I/O error, dev sdu, sector 781817600
> [5358213.666661] sd 0:0:20:0: [sdu]
> [5358213.666684] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
> [5358213.666713] sd 0:0:20:0: [sdu] CDB:
> [5358213.666736] Write(16): 8a 00 00 00 00 00 2e 99 93 00 00 00 04 00 00 00
> [5358213.666808] blk_update_request: I/O error, dev sdu, sector 781816576
> [5358213.666842] sd 0:0:20:0: [sdu]
> [5358213.666865] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
> [5358213.666893] sd 0:0:20:0: [sdu] CDB:
> [5358213.666917] Read(16): 88 00 00 00 00 01 27 9b 51 10 00 00 00 08 00 00
> [5358213.666988] blk_update_request: I/O error, dev sdu, sector 4959457552
> [5358213.667025] sd 0:0:20:0: [sdu]
> [5358213.667048] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
> [5358213.667077] sd 0:0:20:0: [sdu] CDB:
> [5358213.667100] Write(16): 8a 00 00 00 00 01 2c 40 b8 a8 00 00 01 78 00 00
> [5358213.667171] blk_update_request: I/O error, dev sdu, sector 5037406376
> [5358213.667206] sd 0:0:20:0: [sdu]
> [5358213.667229] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
> [5358213.667257] sd 0:0:20:0: [sdu] CDB:
> [5358213.667281] Write(16): 8a 00 00 00 00 01 2c 40 b4 a8 00 00 04 00 00 00
> [5358213.667351] blk_update_request: I/O error, dev sdu, sector 5037405352
> [5358213.667385] blk_update_request: I/O error, dev sdu, sector 0
> [5358213.667419] blk_update_request: I/O error, dev sdu, sector 0
> [5358213.667452] sd 0:0:20:0: [sdu]
> [5358213.667475] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
> [5358213.667504] sd 0:0:20:0: [sdu] CDB:
> [5358213.667527] Write(16): 8a 00 00 00 00 01 27 9b 50 b0 00 00 00 60 00 00
> [5358213.667598] blk_update_request: I/O error, dev sdu, sector 4959457456
> [5358213.667628] Buffer I/O error on dev sdu1, logical block 619931926, lost async page write
> [5358213.667678] Buffer I/O error on dev sdu1, logical block 619931927, lost async page write
> [5358213.667727] Buffer I/O error on dev sdu1, logical block 619931928, lost async page write
> [5358213.667774] Buffer I/O error on dev sdu1, logical block 619931929, lost async page write
> [5358213.667821] Buffer I/O error on dev sdu1, logical block 619931930, lost async page write
> [5358213.667868] Buffer I/O error on dev sdu1, logical block 619931931, lost async page write
> [5358213.667915] Buffer I/O error on dev sdu1, logical block 619931932, lost async page write
> [5358213.667962] Buffer I/O error on dev sdu1, logical block 619931933, lost async page write
> [5358213.668010] Buffer I/O error on dev sdu1, logical block 619931934, lost async page write
> [5358213.668065] sd 0:0:20:0: [sdu]
> [5358213.668088] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
> [5358213.668118] sd 0:0:20:0: [sdu] CDB:
> [5358213.668141] Write(16): 8a 00 00 00 00 00 2e 99 91 98 00 00 01 68 00 00
> <<< above 4 errors repeat a number of times, then >>>
> [5358213.672847] sd 0:0:20:0: [sdu]
> [5358213.672870] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
> [5358213.672898] sd 0:0:20:0: [sdu] CDB:
> [5358213.672922] Write(16): 8a 00 00 00 00 00 ad 40 60 38 00 00 04 00 00 00
> [5358213.673083] XFS (sdu1): metadata I/O error: block 0x817f21d8 ("xfs_trans_read_buf_map") error 5 numblks 8
> [5358213.673086] XFS (sdu1): metadata I/O error: block 0x183698f78 ("xfs_trans_read_buf_map") error 5 numblks 16
> [5358213.673093] XFS (sdu1): xfs_imap_to_bp: xfs_trans_read_buf() returned error -5.
> [5358213.673225] sd 0:0:20:0: [sdu]
> [5358213.673226] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
> [5358213.673227] sd 0:0:20:0: [sdu] CDB:
> [5358213.673233] Write(16): 8a 00 00 00 00 01 ab b3 5a c8 00 00 04 00 00 00
> [5358213.678590] XFS (sdu1): metadata I/O error: block 0x2b163a0 ("xfs_trans_read_buf_map") error 5 numblks 16
> [5358213.678686] XFS (sdu1): xfs_imap_to_bp: xfs_trans_read_buf() returned error -5.
> [5358213.679799] XFS (sdu1): metadata I/O error: block 0x28 ("xfs_buf_iodone_callbacks") error 5 numblks 8
> [5358213.725951] XFS (sdu1): Detected failing async write on buffer block 0x805d4cd8. Retrying async write.
> [5358213.726069] XFS (sdu1): Detected failing async write on buffer block 0x20d390918. Retrying async write.
> [5358213.726181] XFS (sdu1): Detected failing async write on buffer block 0x88a017f0. Retrying async write.
> [5358213.726292] XFS (sdu1): Detected failing async write on buffer block 0x80d04890. Retrying async write.
> [5358213.726428] XFS (sdu1): Detected failing async write on buffer block 0x85bd33d8. Retrying async write.
> [5358213.726539] XFS (sdu1): Detected failing async write on buffer block 0x80ca6110. Retrying async write.
> [5358213.726650] XFS (sdu1): Detected failing async write on buffer block 0x857f1bb8. Retrying async write.
> [5358213.726762] XFS (sdu1): Detected failing async write on buffer block 0x88a017e0. Retrying async write.
> [5358213.726873] XFS (sdu1): Detected failing async write on buffer block 0x804f1c10. Retrying async write.
> [5358213.726984] XFS (sdu1): Detected failing async write on buffer block 0x859381b8. Retrying async write.
> [5358213.727126] XFS (sdu1): metadata I/O error: block 0x2b163a0 ("xfs_trans_read_buf_map") error 5 numblks 16
> [5358213.727212] XFS (sdu1): xfs_imap_to_bp: xfs_trans_read_buf() returned error -5.
> [5358213.775880] XFS (sdu1): metadata I/O error: block 0x2b163a0 ("xfs_trans_read_buf_map") error 5 numblks 16
> [5358213.775972] XFS (sdu1): xfs_imap_to_bp: xfs_trans_read_buf() returned error -5.
> [5358213.825966] XFS (sdu1): metadata I/O error: block 0x2b163a0 ("xfs_trans_read_buf_map") error 5 numblks 16
> [5358213.826061] XFS (sdu1): xfs_imap_to_bp: xfs_trans_read_buf() returned error -5.
> [5358213.876050] XFS (sdu1): metadata I/O error: block 0x2b163a0 ("xfs_trans_read_buf_map") error 5 numblks 16
> [5358213.876142] XFS (sdu1): xfs_imap_to_bp: xfs_trans_read_buf() returned error -5.
> [5358213.926049] XFS (sdu1): metadata I/O error: block 0x2b163a0 ("xfs_trans_read_buf_map") error 5 numblks 16
> [5358213.926141] XFS (sdu1): xfs_imap_to_bp: xfs_trans_read_buf() returned error -5.
>
> ...and the "metadata" and "xfs_imap_to_bp" messages continue to flood into
> kern.log (120,000 and counting...)
>
>
> Cheers,
>
> Chris
>
* Re: Disk error, then endless loop
From: Chris Dunlop @ 2015-11-17 16:28 UTC
To: Brian Foster; +Cc: xfs
G'day Brian,
Thanks for your response...
On Tue, Nov 17, 2015 at 07:41:48AM -0500, Brian Foster wrote:
> On Tue, Nov 17, 2015 at 07:03:33PM +1100, Chris Dunlop wrote:
>> Hi,
>>
>> XFS error handling on linux 3.18.21 looks to be "suboptimal".
>>
>> I had an XFS disk start returning read errors, then disappear from the
>> controller altogether (only to come back under a different /dev/sdXX name).
>> XFS is now endlessly flooding these messages into kern.log (55,000 copies
>> and counting...):
>
> Note that disks returning errors and/or dropping offline and back online is
> outside the realm of the filesystem. There isn't much the fs can do in
> that case.
What?! The fs can't fix broken disks?! But I want magic ponies! :-)
>> [5358213.926049] XFS (sdu1): metadata I/O error: block 0x2b163a0 ("xfs_trans_read_buf_map") error 5 numblks 16
>> [5358213.926141] XFS (sdu1): xfs_imap_to_bp: xfs_trans_read_buf() returned error -5.
>>
>> More info below, but some questions:
>>
>> Is this a known issue? If so, has it been fixed, and in which commit?
>>
>
> This is expected and the most likely scenario is that the filesystem
> eventually shuts itself down to avoid further damage. At this point,
> most operations will return -EIO (-5) back to the user.
>
>> I guess I'm going to have to hard boot the machine to get out of this,
>> right?
>
> Normally you should be able to unmount a filesystem that has shut
> down...
But not in this case...
>> I tried to umount the filesystem but the umount is now hung and unkillable:
>>
>> # ps -ostat,wchan='WCHAN-xxxxxxxxxxxxxxxxxx',cmd -C umount
>> STAT WCHAN-xxxxxxxxxxxxxxxxxx CMD
>> D+ xfs_ail_push_all_sync umount /var/lib/ceph/osd/ceph-18
>
> ... but it appears you still have something pending in the AIL which is
> holding everything up. The most likely case is an EFI/EFD item hanging
> around from an extent free operation, as this was a known issue,
> particularly on fs shutdowns. Fixes for this went into the v4.3 kernel.
Any chance of these fixes getting into -stable, or are they too intrusive
and/or depend on other intrusive changes?
> Note that this won't address whatever is wrong with the disk in the
> first place,
Bwaaaaahhhh, magic ponies!
> just (hopefully) the ability to unmount when it ultimately fails.
Yes, that's what I was hoping for.
> Does the umount process actually appear to be doing anything? E.g., are
> you seeing noticeable CPU load or I/O errors continue to the logs, or
> has everything pretty much locked up? You could also enable tracepoints
> (trace-cmd start -e "xfs:*"; cat /sys/kernel/debug/tracing/trace_pipe)
> to get a quick idea of what's going on.
No, the umount hasn't done anything noticeable in the past 6.5 hours:
b2# date; ps -opid,lstart,time,stat,wchan='WCHAN-xxxxxxxxxxxxxxxxxx',cmd -C umount
Wed Nov 18 03:08:30 AEDT 2015
PID STARTED TIME STAT WCHAN-xxxxxxxxxxxxxxxxxx CMD
23946 Tue Nov 17 17:30:41 2015 00:00:00 D+ xfs_ail_push_all_sync umount /var/lib/ceph/osd/ceph-18
I don't know how to interpret the tracepoints, but there are other busy
XFS filesystems on the box so that's cluttering things up. For what it's
worth, it appears the original device (/dev/sdu1) was previously dev
65:65:
b2# ls -l /dev/sd[tuv]{,1}
brw-rw---T 1 root disk 65, 48 Sep 16 17:11 /dev/sdt
brw-rw---T 1 root disk 65, 49 Sep 16 17:11 /dev/sdt1
brw-rw---T 1 root disk 65, 80 Oct 30 15:38 /dev/sdv
brw-rw---T 1 root disk 65, 81 Oct 30 15:40 /dev/sdv1
...and in 10 seconds of /sys/kernel/debug/tracing/trace_pipe we have:
# grep 'dev 65:65' /sys/kernel/debug/tracing/trace_pipe > /tmp/x1 & sleep 10; kill $!
# wc -l /tmp/x1
181953 /tmp/x1
# head /tmp/x1
<...>-7702 [012] .... 5392362.786946: xfs_buf_item_iodone_async: dev 65:65 bno 0x1828eed18 nblks 0x8 hold 2 pincount 0 lock 0 flags ASYNC|DONE|PAGES caller xfs_buf_ioend [xfs]
<...>-7702 [012] .... 5392362.786946: xfs_buf_ioerror: dev 65:65 bno 0x1828eed18 len 0x1000 hold 2 pincount 0 lock 0 error 0 flags ASYNC|DONE|PAGES caller xfs_buf_iodone_callbacks [xfs]
xfsaild/sdu1-7991 [005] .N.. 5392363.647059: xfs_buf_submit: dev 65:65 bno 0x27ffffff8 nblks 0x8 hold 2 pincount 0 lock 0 flags WRITE|ASYNC|DONE|PAGES caller __xfs_buf_delwri_submit [xfs]
xfsaild/sdu1-7991 [005] .N.. 5392363.647059: xfs_buf_hold: dev 65:65 bno 0x27ffffff8 nblks 0x8 hold 2 pincount 0 lock 0 flags WRITE|ASYNC|DONE|PAGES caller xfs_buf_submit [xfs]
xfsaild/sdu1-7991 [005] .N.. 5392363.647061: xfs_buf_rele: dev 65:65 bno 0x27ffffff8 nblks 0x8 hold 3 pincount 0 lock 0 flags WRITE|ASYNC|DONE|PAGES caller xfs_buf_submit [xfs]
xfsaild/sdu1-7991 [005] .N.. 5392363.647061: xfs_buf_submit: dev 65:65 bno 0x280006398 nblks 0x8 hold 2 pincount 0 lock 0 flags WRITE|ASYNC|DONE|PAGES caller __xfs_buf_delwri_submit [xfs]
xfsaild/sdu1-7991 [005] .N.. 5392363.647061: xfs_buf_hold: dev 65:65 bno 0x280006398 nblks 0x8 hold 2 pincount 0 lock 0 flags WRITE|ASYNC|DONE|PAGES caller xfs_buf_submit [xfs]
xfsaild/sdu1-7991 [005] .N.. 5392363.647063: xfs_buf_rele: dev 65:65 bno 0x280006398 nblks 0x8 hold 3 pincount 0 lock 0 flags WRITE|ASYNC|DONE|PAGES caller xfs_buf_submit [xfs]
xfsaild/sdu1-7991 [005] .N.. 5392363.647063: xfs_buf_submit: dev 65:65 bno 0x2800063f8 nblks 0x8 hold 2 pincount 0 lock 0 flags WRITE|ASYNC|DONE|PAGES caller __xfs_buf_delwri_submit [xfs]
xfsaild/sdu1-7991 [005] .N.. 5392363.647064: xfs_buf_hold: dev 65:65 bno 0x2800063f8 nblks 0x8 hold 2 pincount 0 lock 0 flags WRITE|ASYNC|DONE|PAGES caller xfs_buf_submit [xfs]
In the 181953 lines there's a set of 188 different block numbers
appearing, ranging from 0x8 to 0x2900ffbd8:
# sed -rn 's/.*bno (0x[[:xdigit:]]+).*/\1/p' /tmp/x1 | sort -u > /tmp/x2
# wc -l /tmp/x2
188 /tmp/x2
# perl -pe 's/(.*)/hex($1)/e' /tmp/x2 | sort -n | perl -ne 'printf "0x%x\n",$_' > /tmp/x3
# head /tmp/x3
0x8
0x10
0x20
0x28
0x3188
0x4a40
0x4a68
0x2b163a0
0x31fecd0
0x426e8f8
b2# tail /tmp/x3
0x2900ffb78
0x2900ffb88
0x2900ffb98
0x2900ffba8
0x2900ffbb0
0x2900ffbb8
0x2900ffbc0
0x2900ffbc8
0x2900ffbd0
0x2900ffbd8
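The sed/sort/perl round-trip above can also be collapsed into one pipeline;
a sketch against a few inlined sample lines rather than /tmp/x1:

```shell
# Extract each bno, dedupe, and sort numerically while keeping the hex form.
trace='xfs_buf_submit: dev 65:65 bno 0x27ffffff8 nblks 0x8 hold 2
xfs_buf_hold: dev 65:65 bno 0x8 nblks 0x8 hold 2
xfs_buf_rele: dev 65:65 bno 0x27ffffff8 nblks 0x8 hold 3
xfs_buf_submit: dev 65:65 bno 0x2b163a0 nblks 0x10 hold 2'
bnos=$(printf '%s\n' "$trace" \
  | sed -rn 's/.*bno (0x[[:xdigit:]]+).*/\1/p' \
  | sort -u \
  | while read -r h; do printf '%d %s\n' "$h" "$h"; done \
  | sort -n | cut -d' ' -f2)
printf '%s\n' "$bnos"    # 0x8, then 0x2b163a0, then 0x27ffffff8
```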
>> As previously mentioned, the disk has actually reappeared under a different
>> /dev/sdXX name (it was sdu, now sdbh). Trying to mount the disk (read only)
>> results in:
>>
>> # mkdir /mnt/xfs && mount -ologdev=/dev/md8p5,ro /dev/sdbh1 /mnt/xfs
>> mount: /dev/sdbh1 already mounted or /mnt/xfs busy
>
> Probably due to either a uuid check or blocking on access to the
> external log device. You'll probably need to clean up the stale mount
> before this will work.
>
> As it is, something is clearly wrong with the drive. I can't really
> interpret the I/O errors and whatnot (linux-scsi?), but you probably
> want to look into health assessment tools (e.g., smart) to get an idea
> of what's wrong and/or replace the device and restore from backups (or
> perhaps heal via the ceph cluster, in your case).
Sure. It's not the disk that's concerning me; disks are expected to die.
But it looks like the disk error has put XFS in a state where the only
solution is a hard power cycle (after quiescing and cleaning up what I
can). I haven't tried it yet, but I expect I won't be able to reboot
cleanly.
>
> Brian
Cheers,
Chris
* Re: Disk error, then endless loop
From: Brian Foster @ 2015-11-17 17:37 UTC
To: Chris Dunlop; +Cc: xfs
On Wed, Nov 18, 2015 at 03:28:02AM +1100, Chris Dunlop wrote:
> G'day Brian,
>
> Thanks for your response...
>
> On Tue, Nov 17, 2015 at 07:41:48AM -0500, Brian Foster wrote:
> > On Tue, Nov 17, 2015 at 07:03:33PM +1100, Chris Dunlop wrote:
> >> Hi,
> >>
...
>
> >> I tried to umount the filesystem but the umount is now hung and unkillable:
> >>
> >> # ps -ostat,wchan='WCHAN-xxxxxxxxxxxxxxxxxx',cmd -C umount
> >> STAT WCHAN-xxxxxxxxxxxxxxxxxx CMD
> >> D+ xfs_ail_push_all_sync umount /var/lib/ceph/osd/ceph-18
> >
> > ... but it appears you still have something pending in the AIL which is
> > holding everything up. The most likely case is an EFI/EFD item hanging
> > around from an extent free operation, as this was a known issue,
> > particularly on fs shutdowns. Fixes for this went into the v4.3 kernel.
>
> Any chance of these fixes getting into -stable, or are they too intrusive
> and/or depend on other intrusive changes?
>
I don't think so... it was a multi-patch series and a rework of the
EFI/EFD reference counting as opposed to an isolated bug fix. For
reference, it was commits 5e4b538 through f0b2efa or so.
> > Note that this won't address whatever is wrong with the disk in the
> > first place,
>
> Bwaaaaahhhh, magic ponies!
>
> > just (hopefully) the ability to unmount when it ultimately fails.
>
> Yes, that's what I was hoping for.
>
> > Does the umount process actually appear to be doing anything? E.g., are
> > you seeing noticeable CPU load, do I/O errors continue in the logs, or
> > has everything pretty much locked up? You could also enable tracepoints
> > (trace-cmd start -e "xfs:*"; cat /sys/kernel/debug/tracing/trace_pipe)
> > to get a quick idea of what's going on.
>
> No, the umount hasn't done anything noticeable in the past 6.5 hours:
>
> b2# date; ps -opid,lstart,time,stat,wchan='WCHAN-xxxxxxxxxxxxxxxxxx',cmd -C umount
> Wed Nov 18 03:08:30 AEDT 2015
> PID STARTED TIME STAT WCHAN-xxxxxxxxxxxxxxxxxx CMD
> 23946 Tue Nov 17 17:30:41 2015 00:00:00 D+ xfs_ail_push_all_sync umount /var/lib/ceph/osd/ceph-18
>
> I don't know how to interpret the tracepoints, but there are other busy
> XFS filesystems on the box so that's cluttering things up. For what it's
> worth, it appears the original device (/dev/sdu1) was previously dev
> 65:65:
>
> b2# ls -l /dev/sd[tuv]{,1}
> brw-rw---T 1 root disk 65, 48 Sep 16 17:11 /dev/sdt
> brw-rw---T 1 root disk 65, 49 Sep 16 17:11 /dev/sdt1
> brw-rw---T 1 root disk 65, 80 Oct 30 15:38 /dev/sdv
> brw-rw---T 1 root disk 65, 81 Oct 30 15:40 /dev/sdv1
>
> ...and in 10 seconds of /sys/kernel/debug/tracing/trace_pipe we have:
>
> # grep 'dev 65:65' /sys/kernel/debug/tracing/trace_pipe > /tmp/x1 & sleep 10; kill $!
> # wc -l /tmp/x1
> 181953 /tmp/x1
> # head /tmp/x1
> <...>-7702 [012] .... 5392362.786946: xfs_buf_item_iodone_async: dev 65:65 bno 0x1828eed18 nblks 0x8 hold 2 pincount 0 lock 0 flags ASYNC|DONE|PAGES caller xfs_buf_ioend [xfs]
> <...>-7702 [012] .... 5392362.786946: xfs_buf_ioerror: dev 65:65 bno 0x1828eed18 len 0x1000 hold 2 pincount 0 lock 0 error 0 flags ASYNC|DONE|PAGES caller xfs_buf_iodone_callbacks [xfs]
> xfsaild/sdu1-7991 [005] .N.. 5392363.647059: xfs_buf_submit: dev 65:65 bno 0x27ffffff8 nblks 0x8 hold 2 pincount 0 lock 0 flags WRITE|ASYNC|DONE|PAGES caller __xfs_buf_delwri_submit [xfs]
> xfsaild/sdu1-7991 [005] .N.. 5392363.647059: xfs_buf_hold: dev 65:65 bno 0x27ffffff8 nblks 0x8 hold 2 pincount 0 lock 0 flags WRITE|ASYNC|DONE|PAGES caller xfs_buf_submit [xfs]
> xfsaild/sdu1-7991 [005] .N.. 5392363.647061: xfs_buf_rele: dev 65:65 bno 0x27ffffff8 nblks 0x8 hold 3 pincount 0 lock 0 flags WRITE|ASYNC|DONE|PAGES caller xfs_buf_submit [xfs]
> xfsaild/sdu1-7991 [005] .N.. 5392363.647061: xfs_buf_submit: dev 65:65 bno 0x280006398 nblks 0x8 hold 2 pincount 0 lock 0 flags WRITE|ASYNC|DONE|PAGES caller __xfs_buf_delwri_submit [xfs]
> xfsaild/sdu1-7991 [005] .N.. 5392363.647061: xfs_buf_hold: dev 65:65 bno 0x280006398 nblks 0x8 hold 2 pincount 0 lock 0 flags WRITE|ASYNC|DONE|PAGES caller xfs_buf_submit [xfs]
> xfsaild/sdu1-7991 [005] .N.. 5392363.647063: xfs_buf_rele: dev 65:65 bno 0x280006398 nblks 0x8 hold 3 pincount 0 lock 0 flags WRITE|ASYNC|DONE|PAGES caller xfs_buf_submit [xfs]
> xfsaild/sdu1-7991 [005] .N.. 5392363.647063: xfs_buf_submit: dev 65:65 bno 0x2800063f8 nblks 0x8 hold 2 pincount 0 lock 0 flags WRITE|ASYNC|DONE|PAGES caller __xfs_buf_delwri_submit [xfs]
> xfsaild/sdu1-7991 [005] .N.. 5392363.647064: xfs_buf_hold: dev 65:65 bno 0x2800063f8 nblks 0x8 hold 2 pincount 0 lock 0 flags WRITE|ASYNC|DONE|PAGES caller xfs_buf_submit [xfs]
>
Hmm, that is notably more activity than I recall when reproducing the
original AIL issue. Do we know whether the filesystem had actually shut
down or is in some intermediate state looping on errors? The fact that
it continues to try and submit I/O suggests that perhaps it hasn't shut
down for whatever reason.
If the device has already dropped and reconnected as a new dev node,
it's probably harmless at this point to just try to forcibly shut down
the fs on the old one. Could you try the following?
xfs_io -x -c shutdown <mnt>
Can you unmount the fs after that? If not, is there still any tracepoint
activity on the old device?
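[Editor's note: the suggested sequence can be sketched as a small dry-run script. It only prints the commands rather than executing them, since a forced shutdown against the wrong filesystem would be destructive; the mount point and dev numbers are the ones from this thread.]

```shell
#!/bin/sh
# Dry-run sketch of the suggested recovery sequence: it only prints the
# plan rather than executing it, since a forced shutdown on the wrong
# filesystem would be destructive. The mount point and dev numbers are
# the ones from this thread.
recovery_plan() {
    mnt=$1; olddev=$2
    echo "xfs_io -x -c shutdown $mnt"   # force the fs to shut down
    echo "umount $mnt"                  # then retry the unmount
    # finally, check whether the old device still shows trace activity:
    echo "grep 'dev $olddev' /sys/kernel/debug/tracing/trace_pipe"
}

recovery_plan /var/lib/ceph/osd/ceph-18 65:65
```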
Brian
> In the 181953 lines there's a set of 188 different block numbers
> appearing, ranging from 0x8 to 0x2900ffbd8:
>
> # sed -rn 's/.*bno (0x[[:xdigit:]]+).*/\1/p' /tmp/x1 | sort -u > /tmp/x2
> # wc -l /tmp/x2
> 188 /tmp/x2
> # perl -pe 's/(.*)/hex($1)/e' /tmp/x2 | sort -n | perl -ne 'printf "0x%x\n",$_' > /tmp/x3
> # head /tmp/x3
> 0x8
> 0x10
> 0x20
> 0x28
> 0x3188
> 0x4a40
> 0x4a68
> 0x2b163a0
> 0x31fecd0
> 0x426e8f8
> b2# tail /tmp/x3
> 0x2900ffb78
> 0x2900ffb88
> 0x2900ffb98
> 0x2900ffba8
> 0x2900ffbb0
> 0x2900ffbb8
> 0x2900ffbc0
> 0x2900ffbc8
> 0x2900ffbd0
> 0x2900ffbd8
>
> >> As previously mentioned, the disk has actually reappeared under a different
> >> /dev/sdXX name (it was sdu, now sdbh). Trying to mount the disk (read only)
> >> results in:
> >>
> >> # mkdir /mnt/xfs && mount -ologdev=/dev/md8p5,ro /dev/sdbh1 /mnt/xfs
> >> mount: /dev/sdbh1 already mounted or /mnt/xfs busy
> >
> > Probably due to either a uuid check or blocking on access to the
> > external log device. You'll probably need to clean up the stale mount
> > before this will work.
> >
> > As it is, something is clearly wrong with the drive. I can't really
> > interpret the I/O errors and whatnot (linux-scsi?), but you probably
> > want to look into health assessment tools (e.g., smart) to get an idea
> > of what's wrong and/or replace the device and restore from backups (or
> > perhaps heal via the ceph cluster, in your case).
>
> Sure. It's not the disk that's concerning me, they're expected to die,
> but it looks like the disk error has put XFS in a state where the only
> solution is a hard power cycle (after quiescing and cleaning up what I
> can): I haven't tried it yet, but I expect I won't be able to reboot
> cleanly.
>
> >
> > Brian
>
> Cheers,
>
> Chris
>
> _______________________________________________
> xfs mailing list
> xfs@oss.sgi.com
> http://oss.sgi.com/mailman/listinfo/xfs
* Re: Disk error, then endless loop
2015-11-17 17:37 ` Brian Foster
@ 2015-11-17 19:35 ` Chris Dunlop
2015-11-17 20:21 ` Brian Foster
0 siblings, 1 reply; 8+ messages in thread
From: Chris Dunlop @ 2015-11-17 19:35 UTC (permalink / raw)
To: Brian Foster; +Cc: xfs
On Tue, Nov 17, 2015 at 12:37:24PM -0500, Brian Foster wrote:
> On Wed, Nov 18, 2015 at 03:28:02AM +1100, Chris Dunlop wrote:
>> On Tue, Nov 17, 2015 at 07:41:48AM -0500, Brian Foster wrote:
>>> On Tue, Nov 17, 2015 at 07:03:33PM +1100, Chris Dunlop wrote:
>>>> I tried to umount the filesystem but the umount is now hung and unkillable:
>>>>
>>>> # ps -ostat,wchan='WCHAN-xxxxxxxxxxxxxxxxxx',cmd -C umount
>>>> STAT WCHAN-xxxxxxxxxxxxxxxxxx CMD
>>>> D+ xfs_ail_push_all_sync umount /var/lib/ceph/osd/ceph-18
>>>
>>> ... but it appears you still have something pending in the AIL which is
>>> holding everything up. The most likely case is an EFI/EFD item hanging
>>> around from an extent free operation, as this was a known issue,
>>> particularly on fs shutdowns. Fixes for this went into the v4.3 kernel.
>>
>> Any chance of these fixes getting into -stable, or are they too intrusive
>> and/or depend on other intrusive changes?
>
> I don't think so... it was a multi-patch series and a rework of the
> EFI/EFD reference counting as opposed to an isolated bug fix. For
> reference, it was commits 5e4b538 through f0b2efa or so.
...
>>> Does the umount process actually appear to be doing anything? E.g., are
>>> you seeing noticeable CPU load or I/O errors continue to the logs, or
>>> has everything pretty much locked up? You could also enable tracepoints
>>> (trace-cmd start -e "xfs:*"; cat /sys/kernel/debug/tracing/trace_pipe)
>>> to get a quick idea of what's going on.
>>
>> No, the umount hasn't done anything noticeable in the past 6.5 hours:
>>
>> b2# date; ps -opid,lstart,time,stat,wchan='WCHAN-xxxxxxxxxxxxxxxxxx',cmd -C umount
>> Wed Nov 18 03:08:30 AEDT 2015
>> PID STARTED TIME STAT WCHAN-xxxxxxxxxxxxxxxxxx CMD
>> 23946 Tue Nov 17 17:30:41 2015 00:00:00 D+ xfs_ail_push_all_sync umount /var/lib/ceph/osd/ceph-18
>>
>> I don't know how to interpret the tracepoints, but there are other busy
>> XFS filesystems on the box so that's cluttering things up. For what it's
>> worth, it appears the original device (/dev/sdu1) was previously dev
>> 65:65:
>>
>> b2# ls -l /dev/sd[tuv]{,1}
>> brw-rw---T 1 root disk 65, 48 Sep 16 17:11 /dev/sdt
>> brw-rw---T 1 root disk 65, 49 Sep 16 17:11 /dev/sdt1
>> brw-rw---T 1 root disk 65, 80 Oct 30 15:38 /dev/sdv
>> brw-rw---T 1 root disk 65, 81 Oct 30 15:40 /dev/sdv1
>>
>> ..and in 10 seconds of /sys/kernel/debug/tracing/trace_pipe we have:
>>
>> # grep 'dev 65:65' /sys/kernel/debug/tracing/trace_pipe > /tmp/x1 & sleep 10; kill $!
>> # wc -l /tmp/x1
>> 181953 /tmp/x1
>> # head /tmp/x1
>> <...>-7702 [012] .... 5392362.786946: xfs_buf_item_iodone_async: dev 65:65 bno 0x1828eed18 nblks 0x8 hold 2 pincount 0 lock 0 flags ASYNC|DONE|PAGES caller xfs_buf_ioend [xfs]
>> <...>-7702 [012] .... 5392362.786946: xfs_buf_ioerror: dev 65:65 bno 0x1828eed18 len 0x1000 hold 2 pincount 0 lock 0 error 0 flags ASYNC|DONE|PAGES caller xfs_buf_iodone_callbacks [xfs]
>> xfsaild/sdu1-7991 [005] .N.. 5392363.647059: xfs_buf_submit: dev 65:65 bno 0x27ffffff8 nblks 0x8 hold 2 pincount 0 lock 0 flags WRITE|ASYNC|DONE|PAGES caller __xfs_buf_delwri_submit [xfs]
>> xfsaild/sdu1-7991 [005] .N.. 5392363.647059: xfs_buf_hold: dev 65:65 bno 0x27ffffff8 nblks 0x8 hold 2 pincount 0 lock 0 flags WRITE|ASYNC|DONE|PAGES caller xfs_buf_submit [xfs]
>> xfsaild/sdu1-7991 [005] .N.. 5392363.647061: xfs_buf_rele: dev 65:65 bno 0x27ffffff8 nblks 0x8 hold 3 pincount 0 lock 0 flags WRITE|ASYNC|DONE|PAGES caller xfs_buf_submit [xfs]
>> xfsaild/sdu1-7991 [005] .N.. 5392363.647061: xfs_buf_submit: dev 65:65 bno 0x280006398 nblks 0x8 hold 2 pincount 0 lock 0 flags WRITE|ASYNC|DONE|PAGES caller __xfs_buf_delwri_submit [xfs]
>> xfsaild/sdu1-7991 [005] .N.. 5392363.647061: xfs_buf_hold: dev 65:65 bno 0x280006398 nblks 0x8 hold 2 pincount 0 lock 0 flags WRITE|ASYNC|DONE|PAGES caller xfs_buf_submit [xfs]
>> xfsaild/sdu1-7991 [005] .N.. 5392363.647063: xfs_buf_rele: dev 65:65 bno 0x280006398 nblks 0x8 hold 3 pincount 0 lock 0 flags WRITE|ASYNC|DONE|PAGES caller xfs_buf_submit [xfs]
>> xfsaild/sdu1-7991 [005] .N.. 5392363.647063: xfs_buf_submit: dev 65:65 bno 0x2800063f8 nblks 0x8 hold 2 pincount 0 lock 0 flags WRITE|ASYNC|DONE|PAGES caller __xfs_buf_delwri_submit [xfs]
>> xfsaild/sdu1-7991 [005] .N.. 5392363.647064: xfs_buf_hold: dev 65:65 bno 0x2800063f8 nblks 0x8 hold 2 pincount 0 lock 0 flags WRITE|ASYNC|DONE|PAGES caller xfs_buf_submit [xfs]
>>
>
> Hmm, that is notably more activity than I recall when reproducing the
> original AIL issue. Do we know whether the filesystem had actually shut
> down or is in some intermediate state looping on errors? The fact that
> it continues to try and submit I/O suggests that perhaps it hasn't shut
> down for whatever reason.
>
> If the device has already dropped and reconnected as a new dev node,
> it's probably harmless at this point to just try to forcibly shut down
> the fs on the old one. Could you try the following?
>
> xfs_io -x -c shutdown <mnt>
# xfs_io -x -c shutdown /var/lib/ceph/osd/ceph-18
foreign file active, shutdown command is for XFS filesystems only
# grep ceph-18 /etc/mtab
<<< crickets >>>
I don't know when the fs disappeared from mtab; I guess it could have been
when I first ran the umount, but I didn't think to check at the time. The
umount is still there, though:
# date; ps -opid,lstart,time,stat,wchan='WCHAN-xxxxxxxxxxxxxxxxxx',cmd -C umount
Wed Nov 18 06:23:21 AEDT 2015
PID STARTED TIME STAT WCHAN-xxxxxxxxxxxxxxxxxx CMD
23946 Tue Nov 17 17:30:41 2015 00:00:00 D+ xfs_ail_push_all_sync umount /var/lib/ceph/osd/ceph-18
> Can you unmount the fs after that? If not, is there still any tracepoint
> activity on the old device?
>
> Brian
The activity is still ongoing on the old device:
# trace-cmd start -e "xfs:*"
/sys/kernel/debug/tracing/events/xfs/*/filter
# grep 'dev 65:65' /sys/kernel/debug/tracing/trace_pipe > /tmp/y1 & sleep 10; kill $!; wc -l /tmp/y1
129573 /tmp/y1
Doing my little sed and perl dance shows it's still the same set of 188
blocks as mentioned before.
Chris
>> In the 181953 lines there's a set of 188 different block numbers
>> appearing, ranging from 0x8 to 0x2900ffbd8:
>>
>> # sed -rn 's/.*bno (0x[[:xdigit:]]+).*/\1/p' /tmp/x1 | sort -u > /tmp/x2
>> # wc -l /tmp/x2
>> 188 /tmp/x2
>> # perl -pe 's/(.*)/hex($1)/e' /tmp/x2 | sort -n | perl -ne 'printf "0x%x\n",$_' > /tmp/x3
>> # head /tmp/x3
>> 0x8
>> 0x10
>> 0x20
>> 0x28
>> 0x3188
>> 0x4a40
>> 0x4a68
>> 0x2b163a0
>> 0x31fecd0
>> 0x426e8f8
>> b2# tail /tmp/x3
>> 0x2900ffb78
>> 0x2900ffb88
>> 0x2900ffb98
>> 0x2900ffba8
>> 0x2900ffbb0
>> 0x2900ffbb8
>> 0x2900ffbc0
>> 0x2900ffbc8
>> 0x2900ffbd0
>> 0x2900ffbd8
>>
>>>> As previously mentioned, the disk has actually reappeared under a different
>>>> /dev/sdXX name (it was sdu, now sdbh). Trying to mount the disk (read only)
>>>> results in:
>>>>
>>>> # mkdir /mnt/xfs && mount -ologdev=/dev/md8p5,ro /dev/sdbh1 /mnt/xfs
>>>> mount: /dev/sdbh1 already mounted or /mnt/xfs busy
>>>
>>> Probably due to either a uuid check or blocking on access to the
>>> external log device. You'll probably need to clean up the stale mount
>>> before this will work.
>>>
>>> As it is, something is clearly wrong with the drive. I can't really
>>> interpret the I/O errors and whatnot (linux-scsi?), but you probably
>>> want to look into health assessment tools (e.g., smart) to get an idea
>>> of what's wrong and/or replace the device and restore from backups (or
>>> perhaps heal via the ceph cluster, in your case).
>>
>> Sure. It's not the disk that's concerning me, they're expected to die,
>> but it looks like the disk error has put XFS in a state where the only
>> solution is a hard power cycle (after quiescing and cleaning up what I
>> can): I haven't tried it yet, but I expect I won't be able to reboot
>> cleanly.
>>
>>>
>>> Brian
>>
>> Cheers,
>>
>> Chris
* Re: Disk error, then endless loop
2015-11-17 19:35 ` Chris Dunlop
@ 2015-11-17 20:21 ` Brian Foster
2015-11-17 20:34 ` Brian Foster
0 siblings, 1 reply; 8+ messages in thread
From: Brian Foster @ 2015-11-17 20:21 UTC (permalink / raw)
To: Chris Dunlop; +Cc: xfs
On Wed, Nov 18, 2015 at 06:35:34AM +1100, Chris Dunlop wrote:
> On Tue, Nov 17, 2015 at 12:37:24PM -0500, Brian Foster wrote:
> > On Wed, Nov 18, 2015 at 03:28:02AM +1100, Chris Dunlop wrote:
> >> On Tue, Nov 17, 2015 at 07:41:48AM -0500, Brian Foster wrote:
> >>> On Tue, Nov 17, 2015 at 07:03:33PM +1100, Chris Dunlop wrote:
> >>>> I tried to umount the filesystem but the umount is now hung and unkillable:
> >>>>
> >>>> # ps -ostat,wchan='WCHAN-xxxxxxxxxxxxxxxxxx',cmd -C umount
> >>>> STAT WCHAN-xxxxxxxxxxxxxxxxxx CMD
> >>>> D+ xfs_ail_push_all_sync umount /var/lib/ceph/osd/ceph-18
> >>>
> >>> ... but it appears you still have something pending in the AIL which is
> >>> holding everything up. The most likely case is an EFI/EFD item hanging
> >>> around from an extent free operation, as this was a known issue,
> >>> particularly on fs shutdowns. Fixes for this went into the v4.3 kernel.
> >>
> >> Any chance of these fixes getting into -stable, or are they too intrusive
> >> and/or depend on other intrusive changes?
> >
> > I don't think so... it was a multi-patch series and a rework of the
> > EFI/EFD reference counting as opposed to an isolated bug fix. For
> > reference, it was commits 5e4b538 through f0b2efa or so.
>
> ...
>
> >>> Does the umount process actually appear to be doing anything? E.g., are
> >>> you seeing noticeable CPU load or I/O errors continue to the logs, or
> >>> has everything pretty much locked up? You could also enable tracepoints
> >>> (trace-cmd start -e "xfs:*"; cat /sys/kernel/debug/tracing/trace_pipe)
> >>> to get a quick idea of what's going on.
> >>
> >> No, the umount hasn't done anything noticeable in the past 6.5 hours:
> >>
> >> b2# date; ps -opid,lstart,time,stat,wchan='WCHAN-xxxxxxxxxxxxxxxxxx',cmd -C umount
> >> Wed Nov 18 03:08:30 AEDT 2015
> >> PID STARTED TIME STAT WCHAN-xxxxxxxxxxxxxxxxxx CMD
> >> 23946 Tue Nov 17 17:30:41 2015 00:00:00 D+ xfs_ail_push_all_sync umount /var/lib/ceph/osd/ceph-18
> >>
> >> I don't know how to interpret the tracepoints, but there are other busy
> >> XFS filesystems on the box so that's cluttering things up. For what it's
> >> worth, it appears the original device (/dev/sdu1) was previously dev
> >> 65:65:
> >>
> >> b2# ls -l /dev/sd[tuv]{,1}
> >> brw-rw---T 1 root disk 65, 48 Sep 16 17:11 /dev/sdt
> >> brw-rw---T 1 root disk 65, 49 Sep 16 17:11 /dev/sdt1
> >> brw-rw---T 1 root disk 65, 80 Oct 30 15:38 /dev/sdv
> >> brw-rw---T 1 root disk 65, 81 Oct 30 15:40 /dev/sdv1
> >>
> >> ..and in 10 seconds of /sys/kernel/debug/tracing/trace_pipe we have:
> >>
> >> # grep 'dev 65:65' /sys/kernel/debug/tracing/trace_pipe > /tmp/x1 & sleep 10; kill $!
> >> # wc -l /tmp/x1
> >> 181953 /tmp/x1
> >> # head /tmp/x1
> >> <...>-7702 [012] .... 5392362.786946: xfs_buf_item_iodone_async: dev 65:65 bno 0x1828eed18 nblks 0x8 hold 2 pincount 0 lock 0 flags ASYNC|DONE|PAGES caller xfs_buf_ioend [xfs]
> >> <...>-7702 [012] .... 5392362.786946: xfs_buf_ioerror: dev 65:65 bno 0x1828eed18 len 0x1000 hold 2 pincount 0 lock 0 error 0 flags ASYNC|DONE|PAGES caller xfs_buf_iodone_callbacks [xfs]
> >> xfsaild/sdu1-7991 [005] .N.. 5392363.647059: xfs_buf_submit: dev 65:65 bno 0x27ffffff8 nblks 0x8 hold 2 pincount 0 lock 0 flags WRITE|ASYNC|DONE|PAGES caller __xfs_buf_delwri_submit [xfs]
> >> xfsaild/sdu1-7991 [005] .N.. 5392363.647059: xfs_buf_hold: dev 65:65 bno 0x27ffffff8 nblks 0x8 hold 2 pincount 0 lock 0 flags WRITE|ASYNC|DONE|PAGES caller xfs_buf_submit [xfs]
> >> xfsaild/sdu1-7991 [005] .N.. 5392363.647061: xfs_buf_rele: dev 65:65 bno 0x27ffffff8 nblks 0x8 hold 3 pincount 0 lock 0 flags WRITE|ASYNC|DONE|PAGES caller xfs_buf_submit [xfs]
> >> xfsaild/sdu1-7991 [005] .N.. 5392363.647061: xfs_buf_submit: dev 65:65 bno 0x280006398 nblks 0x8 hold 2 pincount 0 lock 0 flags WRITE|ASYNC|DONE|PAGES caller __xfs_buf_delwri_submit [xfs]
> >> xfsaild/sdu1-7991 [005] .N.. 5392363.647061: xfs_buf_hold: dev 65:65 bno 0x280006398 nblks 0x8 hold 2 pincount 0 lock 0 flags WRITE|ASYNC|DONE|PAGES caller xfs_buf_submit [xfs]
> >> xfsaild/sdu1-7991 [005] .N.. 5392363.647063: xfs_buf_rele: dev 65:65 bno 0x280006398 nblks 0x8 hold 3 pincount 0 lock 0 flags WRITE|ASYNC|DONE|PAGES caller xfs_buf_submit [xfs]
> >> xfsaild/sdu1-7991 [005] .N.. 5392363.647063: xfs_buf_submit: dev 65:65 bno 0x2800063f8 nblks 0x8 hold 2 pincount 0 lock 0 flags WRITE|ASYNC|DONE|PAGES caller __xfs_buf_delwri_submit [xfs]
> >> xfsaild/sdu1-7991 [005] .N.. 5392363.647064: xfs_buf_hold: dev 65:65 bno 0x2800063f8 nblks 0x8 hold 2 pincount 0 lock 0 flags WRITE|ASYNC|DONE|PAGES caller xfs_buf_submit [xfs]
> >>
> >
> > Hmm, that is notably more activity than I recall when reproducing the
> > original AIL issue. Do we know whether the filesystem had actually shut
> > down or is in some intermediate state looping on errors? The fact that
> > it continues to try and submit I/O suggests that perhaps it hasn't shut
> > down for whatever reason.
> >
> > If the device has already dropped and reconnected as a new dev node,
> > it's probably harmless at this point to just try to forcibly shut down
> > the fs on the old one. Could you try the following?
> >
> > xfs_io -x -c shutdown <mnt>
>
> # xfs_io -x -c shutdown /var/lib/ceph/osd/ceph-18
> foreign file active, shutdown command is for XFS filesystems only
>
> # grep ceph-18 /etc/mtab
> <<< crickets >>>
>
> I don't know when the fs disappeared from mtab; I guess it could have been
> when I first ran the umount, but I didn't think to check at the time. The
> umount is still there, though:
>
> # date; ps -opid,lstart,time,stat,wchan='WCHAN-xxxxxxxxxxxxxxxxxx',cmd -C umount
> Wed Nov 18 06:23:21 AEDT 2015
> PID STARTED TIME STAT WCHAN-xxxxxxxxxxxxxxxxxx CMD
> 23946 Tue Nov 17 17:30:41 2015 00:00:00 D+ xfs_ail_push_all_sync umount /var/lib/ceph/osd/ceph-18
>
Ah, so it's already been removed from the namespace. Apparently it's
stuck at some point after the mount is made inaccessible and before it
actually finishes with I/O. I'm not sure we have any option other than
a reset at this point, unfortunately. :/
Brian
> > Can you unmount the fs after that? If not, is there still any tracepoint
> > activity on the old device?
> >
> > Brian
>
> The activity is still ongoing on the old device:
>
> # trace-cmd start -e "xfs:*"
> /sys/kernel/debug/tracing/events/xfs/*/filter
> # grep 'dev 65:65' /sys/kernel/debug/tracing/trace_pipe > /tmp/y1 & sleep 10; kill $!; wc -l /tmp/y1
> 129573 /tmp/y1
>
> Doing my little sed and perl dance shows it's still the same set of 188
> blocks as mentioned before.
>
> Chris
>
> >> In the 181953 lines there's a set of 188 different block numbers
> >> appearing, ranging from 0x8 to 0x2900ffbd8:
> >>
> >> # sed -rn 's/.*bno (0x[[:xdigit:]]+).*/\1/p' /tmp/x1 | sort -u > /tmp/x2
> >> # wc -l /tmp/x2
> >> 188 /tmp/x2
> >> # perl -pe 's/(.*)/hex($1)/e' /tmp/x2 | sort -n | perl -ne 'printf "0x%x\n",$_' > /tmp/x3
> >> # head /tmp/x3
> >> 0x8
> >> 0x10
> >> 0x20
> >> 0x28
> >> 0x3188
> >> 0x4a40
> >> 0x4a68
> >> 0x2b163a0
> >> 0x31fecd0
> >> 0x426e8f8
> >> b2# tail /tmp/x3
> >> 0x2900ffb78
> >> 0x2900ffb88
> >> 0x2900ffb98
> >> 0x2900ffba8
> >> 0x2900ffbb0
> >> 0x2900ffbb8
> >> 0x2900ffbc0
> >> 0x2900ffbc8
> >> 0x2900ffbd0
> >> 0x2900ffbd8
> >>
> >>>> As previously mentioned, the disk has actually reappeared under a different
> >>>> /dev/sdXX name (it was sdu, now sdbh). Trying to mount the disk (read only)
> >>>> results in:
> >>>>
> >>>> # mkdir /mnt/xfs && mount -ologdev=/dev/md8p5,ro /dev/sdbh1 /mnt/xfs
> >>>> mount: /dev/sdbh1 already mounted or /mnt/xfs busy
> >>>
> >>> Probably due to either a uuid check or blocking on access to the
> >>> external log device. You'll probably need to clean up the stale mount
> >>> before this will work.
> >>>
> >>> As it is, something is clearly wrong with the drive. I can't really
> >>> interpret the I/O errors and whatnot (linux-scsi?), but you probably
> >>> want to look into health assessment tools (e.g., smart) to get an idea
> >>> of what's wrong and/or replace the device and restore from backups (or
> >>> perhaps heal via the ceph cluster, in your case).
> >>
> >> Sure. It's not the disk that's concerning me, they're expected to die,
> >> but it looks like the disk error has put XFS in a state where the only
> >> solution is a hard power cycle (after quiescing and cleaning up what I
> >> can): I haven't tried it yet, but I expect I won't be able to reboot
> >> cleanly.
> >>
> >>>
> >>> Brian
> >>
> >> Cheers,
> >>
> >> Chris
>
* Re: Disk error, then endless loop
2015-11-17 20:21 ` Brian Foster
@ 2015-11-17 20:34 ` Brian Foster
2015-11-17 22:16 ` Chris Dunlop
0 siblings, 1 reply; 8+ messages in thread
From: Brian Foster @ 2015-11-17 20:34 UTC (permalink / raw)
To: Chris Dunlop; +Cc: xfs
On Tue, Nov 17, 2015 at 03:21:31PM -0500, Brian Foster wrote:
> On Wed, Nov 18, 2015 at 06:35:34AM +1100, Chris Dunlop wrote:
> > On Tue, Nov 17, 2015 at 12:37:24PM -0500, Brian Foster wrote:
> > > On Wed, Nov 18, 2015 at 03:28:02AM +1100, Chris Dunlop wrote:
> > >> On Tue, Nov 17, 2015 at 07:41:48AM -0500, Brian Foster wrote:
> > >>> On Tue, Nov 17, 2015 at 07:03:33PM +1100, Chris Dunlop wrote:
> > >>>> I tried to umount the filesystem but the umount is now hung and unkillable:
> > >>>>
> > >>>> # ps -ostat,wchan='WCHAN-xxxxxxxxxxxxxxxxxx',cmd -C umount
> > >>>> STAT WCHAN-xxxxxxxxxxxxxxxxxx CMD
> > >>>> D+ xfs_ail_push_all_sync umount /var/lib/ceph/osd/ceph-18
> > >>>
> > >>> ... but it appears you still have something pending in the AIL which is
> > >>> holding everything up. The most likely case is an EFI/EFD item hanging
> > >>> around from an extent free operation, as this was a known issue,
> > >>> particularly on fs shutdowns. Fixes for this went into the v4.3 kernel.
> > >>
> > >> Any chance of these fixes getting into -stable, or are they too intrusive
> > >> and/or depend on other intrusive changes?
> > >
> > > I don't think so... it was a multi-patch series and a rework of the
> > > EFI/EFD reference counting as opposed to an isolated bug fix. For
> > > reference, it was commits 5e4b538 through f0b2efa or so.
> >
> > ...
> >
> > >>> Does the umount process actually appear to be doing anything? E.g., are
> > >>> you seeing noticeable CPU load or I/O errors continue to the logs, or
> > >>> has everything pretty much locked up? You could also enable tracepoints
> > >>> (trace-cmd start -e "xfs:*"; cat /sys/kernel/debug/tracing/trace_pipe)
> > >>> to get a quick idea of what's going on.
> > >>
> > >> No, the umount hasn't done anything noticeable in the past 6.5 hours:
> > >>
> > >> b2# date; ps -opid,lstart,time,stat,wchan='WCHAN-xxxxxxxxxxxxxxxxxx',cmd -C umount
> > >> Wed Nov 18 03:08:30 AEDT 2015
> > >> PID STARTED TIME STAT WCHAN-xxxxxxxxxxxxxxxxxx CMD
> > >> 23946 Tue Nov 17 17:30:41 2015 00:00:00 D+ xfs_ail_push_all_sync umount /var/lib/ceph/osd/ceph-18
> > >>
> > >> I don't know how to interpret the tracepoints, but there are other busy
> > >> XFS filesystems on the box so that's cluttering things up. For what it's
> > >> worth, it appears the original device (/dev/sdu1) was previously dev
> > >> 65:65:
> > >>
> > >> b2# ls -l /dev/sd[tuv]{,1}
> > >> brw-rw---T 1 root disk 65, 48 Sep 16 17:11 /dev/sdt
> > >> brw-rw---T 1 root disk 65, 49 Sep 16 17:11 /dev/sdt1
> > >> brw-rw---T 1 root disk 65, 80 Oct 30 15:38 /dev/sdv
> > >> brw-rw---T 1 root disk 65, 81 Oct 30 15:40 /dev/sdv1
> > >>
> > >> ..and in 10 seconds of /sys/kernel/debug/tracing/trace_pipe we have:
> > >>
> > >> # grep 'dev 65:65' /sys/kernel/debug/tracing/trace_pipe > /tmp/x1 & sleep 10; kill $!
> > >> # wc -l /tmp/x1
> > >> 181953 /tmp/x1
> > >> # head /tmp/x1
> > >> <...>-7702 [012] .... 5392362.786946: xfs_buf_item_iodone_async: dev 65:65 bno 0x1828eed18 nblks 0x8 hold 2 pincount 0 lock 0 flags ASYNC|DONE|PAGES caller xfs_buf_ioend [xfs]
> > >> <...>-7702 [012] .... 5392362.786946: xfs_buf_ioerror: dev 65:65 bno 0x1828eed18 len 0x1000 hold 2 pincount 0 lock 0 error 0 flags ASYNC|DONE|PAGES caller xfs_buf_iodone_callbacks [xfs]
> > >> xfsaild/sdu1-7991 [005] .N.. 5392363.647059: xfs_buf_submit: dev 65:65 bno 0x27ffffff8 nblks 0x8 hold 2 pincount 0 lock 0 flags WRITE|ASYNC|DONE|PAGES caller __xfs_buf_delwri_submit [xfs]
> > >> xfsaild/sdu1-7991 [005] .N.. 5392363.647059: xfs_buf_hold: dev 65:65 bno 0x27ffffff8 nblks 0x8 hold 2 pincount 0 lock 0 flags WRITE|ASYNC|DONE|PAGES caller xfs_buf_submit [xfs]
> > >> xfsaild/sdu1-7991 [005] .N.. 5392363.647061: xfs_buf_rele: dev 65:65 bno 0x27ffffff8 nblks 0x8 hold 3 pincount 0 lock 0 flags WRITE|ASYNC|DONE|PAGES caller xfs_buf_submit [xfs]
> > >> xfsaild/sdu1-7991 [005] .N.. 5392363.647061: xfs_buf_submit: dev 65:65 bno 0x280006398 nblks 0x8 hold 2 pincount 0 lock 0 flags WRITE|ASYNC|DONE|PAGES caller __xfs_buf_delwri_submit [xfs]
> > >> xfsaild/sdu1-7991 [005] .N.. 5392363.647061: xfs_buf_hold: dev 65:65 bno 0x280006398 nblks 0x8 hold 2 pincount 0 lock 0 flags WRITE|ASYNC|DONE|PAGES caller xfs_buf_submit [xfs]
> > >> xfsaild/sdu1-7991 [005] .N.. 5392363.647063: xfs_buf_rele: dev 65:65 bno 0x280006398 nblks 0x8 hold 3 pincount 0 lock 0 flags WRITE|ASYNC|DONE|PAGES caller xfs_buf_submit [xfs]
> > >> xfsaild/sdu1-7991 [005] .N.. 5392363.647063: xfs_buf_submit: dev 65:65 bno 0x2800063f8 nblks 0x8 hold 2 pincount 0 lock 0 flags WRITE|ASYNC|DONE|PAGES caller __xfs_buf_delwri_submit [xfs]
> > >> xfsaild/sdu1-7991 [005] .N.. 5392363.647064: xfs_buf_hold: dev 65:65 bno 0x2800063f8 nblks 0x8 hold 2 pincount 0 lock 0 flags WRITE|ASYNC|DONE|PAGES caller xfs_buf_submit [xfs]
> > >>
> > >
> > > Hmm, that is notably more activity than I recall when reproducing the
> > > original AIL issue. Do we know whether the filesystem had actually shut
> > > down or is in some intermediate state looping on errors? The fact that
> > > it continues to try and submit I/O suggests that perhaps it hasn't shut
> > > down for whatever reason.
> > >
> > > If the device has already dropped and reconnected as a new dev node,
> > > it's probably harmless at this point to just try to forcibly shut down
> > > the fs on the old one. Could you try the following?
> > >
> > > xfs_io -x -c shutdown <mnt>
> >
> > # xfs_io -x -c shutdown /var/lib/ceph/osd/ceph-18
> > foreign file active, shutdown command is for XFS filesystems only
> >
> > # grep ceph-18 /etc/mtab
> > <<< crickets >>>
> >
> > I don't know when the fs disappeared from mtab; I guess it could have been
> > when I first ran the umount, but I didn't think to check at the time. The
> > umount is still there, though:
> >
> > # date; ps -opid,lstart,time,stat,wchan='WCHAN-xxxxxxxxxxxxxxxxxx',cmd -C umount
> > Wed Nov 18 06:23:21 AEDT 2015
> > PID STARTED TIME STAT WCHAN-xxxxxxxxxxxxxxxxxx CMD
> > 23946 Tue Nov 17 17:30:41 2015 00:00:00 D+ xfs_ail_push_all_sync umount /var/lib/ceph/osd/ceph-18
> >
>
> Ah, so it's already been removed from the namespace. Apparently it's
> stuck at some point after the mount is made inaccessible and before it
> actually finishes with I/O. I'm not sure we have any option other than
> a reset at this point, unfortunately. :/
>
> Brian
>
One last thought... it occurred to me that SCSI devices have a delete
control under sysfs. Does the old/stale device still exist under
/sys/block/<dev>? If so, perhaps an 'echo 1 >
/sys/block/<dev>/device/delete' would move things along..?
Note that I have no idea what effect that will have beyond removing the
device node (so if it is still accessible now, it probably won't be
after that command). I just tried it while doing I/O to a test device
and it looked like it caused an fs shutdown, so it could be worth a try
as a last resort before a system restart.
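[Editor's note: a sketch of that last-resort sysfs delete. The device name is a placeholder, and the sysfs root is parameterized so the logic can be exercised against a fake tree instead of the live /sys.]

```shell
#!/bin/sh
# Sketch of dropping a stale SCSI device via its sysfs delete control.
# The sysfs root defaults to /sys but is parameterized so the logic can
# be tried against a fake tree; the device name is a placeholder.
drop_scsi_dev() {
    dev=$1
    sysroot=${2:-/sys}
    ctl="$sysroot/block/$dev/device/delete"
    if [ -e "$ctl" ]; then
        # Ask the SCSI layer to delete the (stale) device.
        echo 1 > "$ctl"
        echo "deleted $dev"
    else
        # Already gone, as turned out to be the case in this thread.
        echo "no such device: $dev"
    fi
}
```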
Brian
> > > Can you unmount the fs after that? If not, is there still any tracepoint
> > > activity on the old device?
> > >
> > > Brian
> >
> > The activity is still ongoing on the old device:
> >
> > # trace-cmd start -e "xfs:*"
> > /sys/kernel/debug/tracing/events/xfs/*/filter
> > # grep 'dev 65:65' /sys/kernel/debug/tracing/trace_pipe > /tmp/y1 & sleep 10; kill $!; wc -l /tmp/y1
> > 129573 /tmp/y1
> >
> > Doing my little sed and perl dance shows it's still the same set of 188
> > blocks as mentioned before.
> >
> > Chris
> >
> > >> In the 181953 lines there's a set of 188 different block numbers
> > >> appearing, ranging from 0x8 to 0x2900ffbd8:
> > >>
> > >> # sed -rn 's/.*bno (0x[[:xdigit:]]+).*/\1/p' /tmp/x1 | sort -u > /tmp/x2
> > >> # wc -l /tmp/x2
> > >> 188 /tmp/x2
> > >> # perl -pe 's/(.*)/hex($1)/e' /tmp/x2 | sort -n | perl -ne 'printf "0x%x\n",$_' > /tmp/x3
> > >> # head /tmp/x3
> > >> 0x8
> > >> 0x10
> > >> 0x20
> > >> 0x28
> > >> 0x3188
> > >> 0x4a40
> > >> 0x4a68
> > >> 0x2b163a0
> > >> 0x31fecd0
> > >> 0x426e8f8
> > >> b2# tail /tmp/x3
> > >> 0x2900ffb78
> > >> 0x2900ffb88
> > >> 0x2900ffb98
> > >> 0x2900ffba8
> > >> 0x2900ffbb0
> > >> 0x2900ffbb8
> > >> 0x2900ffbc0
> > >> 0x2900ffbc8
> > >> 0x2900ffbd0
> > >> 0x2900ffbd8
> > >>
> > >>>> As previously mentioned, the disk has actually reappeared under a different
> > >>>> /dev/sdXX name (it was sdu, now sdbh). Trying to mount the disk (read only)
> > >>>> results in:
> > >>>>
> > >>>> # mkdir /mnt/xfs && mount -ologdev=/dev/md8p5,ro /dev/sdbh1 /mnt/xfs
> > >>>> mount: /dev/sdbh1 already mounted or /mnt/xfs busy
> > >>>
> > >>> Probably due to either a uuid check or blocking on access to the
> > >>> external log device. You'll probably need to clean up the stale mount
> > >>> before this will work.
> > >>>
> > >>> As it is, something is clearly wrong with the drive. I can't really
> > >>> interpret the I/O errors and whatnot (linux-scsi?), but you probably
> > >>> want to look into health assessment tools (e.g., smart) to get an idea
> > >>> of what's wrong and/or replace the device and restore from backups (or
> > >>> perhaps heal via the ceph cluster, in your case).
> > >>
> > >> Sure. It's not the disk that's concerning me, they're expected to die,
> > >> but it looks like the disk error has put XFS in a state where the only
> > >> solution is a hard power cycle (after quiescing and cleaning up what I
> > >> can): I haven't tried it yet, but I expect I won't be able to reboot
> > >> cleanly.
> > >>
> > >>>
> > >>> Brian
> > >>
> > >> Cheers,
> > >>
> > >> Chris
> >
* Re: Disk error, then endless loop
2015-11-17 20:34 ` Brian Foster
@ 2015-11-17 22:16 ` Chris Dunlop
0 siblings, 0 replies; 8+ messages in thread
From: Chris Dunlop @ 2015-11-17 22:16 UTC (permalink / raw)
To: Brian Foster; +Cc: xfs
On Tue, Nov 17, 2015 at 03:34:55PM -0500, Brian Foster wrote:
> On Tue, Nov 17, 2015 at 03:21:31PM -0500, Brian Foster wrote:
>> On Wed, Nov 18, 2015 at 06:35:34AM +1100, Chris Dunlop wrote:
>>> On Tue, Nov 17, 2015 at 12:37:24PM -0500, Brian Foster wrote:
>>>> If the device has already dropped and reconnected as a new dev node,
>>>> it's probably harmless at this point to just try to forcibly shut down
>>>> the fs on the old one. Could you try the following?
>>>>
>>>> xfs_io -x -c shutdown <mnt>
>>>
>>> # xfs_io -x -c shutdown /var/lib/ceph/osd/ceph-18
>>> foreign file active, shutdown command is for XFS filesystems only
>>>
>>> # grep ceph-18 /etc/mtab
>>> <<< crickets >>>
>>>
>>> I don't know when the fs disappeared from mtab, it could have been when I
>>> first did the umount I guess, I didn't think to check at the time. But the
>>> umount is still there:
>>>
>>> # date; ps -opid,lstart,time,stat,wchan='WCHAN-xxxxxxxxxxxxxxxxxx',cmd -C umount
>>> Wed Nov 18 06:23:21 AEDT 2015
>>> PID STARTED TIME STAT WCHAN-xxxxxxxxxxxxxxxxxx CMD
>>> 23946 Tue Nov 17 17:30:41 2015 00:00:00 D+ xfs_ail_push_all_sync umount /var/lib/ceph/osd/ceph-18
>>
>> Ah, so it's already been removed from the namespace. Apparently it's
>> stuck at some point after the mount is made inaccessible and before it
>> actually finishes its I/O. I'm not sure we have any option other
>> than a reset at this point, unfortunately. :/
Yes, I thought this would likely be the case.
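For what it's worth, the "foreign file active" failure earlier can be anticipated by checking /proc/mounts before invoking xfs_io. This is only a sketch of that guard, not anything from the thread itself: the helper names (is_xfs_mount, try_shutdown) are mine, and the mountpoint is the one from this thread.

```shell
#!/bin/sh
# Guard for the forced-shutdown step: xfs_io refuses with "foreign file
# active" once the mount is gone from the namespace, so check
# /proc/mounts first before attempting the shutdown.

is_xfs_mount() {
    # Fields in /proc/mounts: device, mountpoint, fstype, ...
    awk -v m="$1" '$2 == m && $3 == "xfs" {found=1} END {exit !found}' /proc/mounts
}

try_shutdown() {
    if is_xfs_mount "$1"; then
        xfs_io -x -c shutdown "$1"
    else
        echo "$1 is not an xfs mount any more; nothing to shut down"
    fi
}

try_shutdown /var/lib/ceph/osd/ceph-18
```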
> One last thought... it occurred to me that SCSI devices have a delete
> knob under sysfs. Does the old/stale device still exist under
> /sys/block/<dev>? If so, perhaps an 'echo 1 >
> /sys/block/<dev>/device/delete' would move things along..?
Unfortunately, no, it's not there.
> Note that I have no idea what effect that will have beyond removing the
> device node (so if it is still accessible now, it probably won't be
> after that command). I just tried it while doing I/O to a test device
> and it looked like it caused an fs shutdown, so it could be worth a try
> as a last resort before a system restart.
>
> Brian
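For anyone hitting the same wall, the sysfs removal Brian describes can be sketched like this. Purely illustrative: the force_remove helper and the sysfs override variable are mine, and "sdu" is the stale device name from earlier in the thread.

```shell
#!/bin/sh
# Sketch of forcing removal of a stale SCSI device via its sysfs
# delete knob. sysfs is a variable (normally /sys) only so the logic
# can be exercised without touching a real device.
sysfs="${sysfs:-/sys}"

force_remove() {
    knob="$sysfs/block/$1/device/delete"
    if [ ! -e "$knob" ]; then
        # The device node has already gone away (our case here).
        echo "no delete knob for $1 (device already gone?)"
        return 1
    fi
    # Tearing the device down fails its outstanding I/O, which in
    # Brian's quick test appeared to trigger an XFS shutdown on the
    # stuck filesystem.
    echo 1 > "$knob"
    echo "requested removal of $1"
}

# On the affected box this would have been: force_remove sdu
```

In this particular case /sys/block/sdu was already gone, so the helper would just report the missing knob and the reset remained the only way out.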
Thanks again,
Chris
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
Thread overview: 8+ messages
2015-11-17 8:03 Disk error, then endless loop Chris Dunlop
2015-11-17 12:41 ` Brian Foster
2015-11-17 16:28 ` Chris Dunlop
2015-11-17 17:37 ` Brian Foster
2015-11-17 19:35 ` Chris Dunlop
2015-11-17 20:21 ` Brian Foster
2015-11-17 20:34 ` Brian Foster
2015-11-17 22:16 ` Chris Dunlop