From: "Darrick J. Wong" <darrick.wong@oracle.com>
To: xfs.pkoch@dfgh.net
Cc: linux-xfs@vger.kernel.org
Subject: Re: Growing RAID10 with active XFS filesystem
Date: Mon, 8 Jan 2018 11:26:07 -0800
Message-ID: <20180108192607.GS5602@magnolia>
In-Reply-To: <f289da8f-96ec-7db4-abb1-b151d553c088@gmail.com>
On Mon, Jan 08, 2018 at 08:08:09PM +0100, xfs.pkoch@dfgh.net wrote:
> Dear Linux-Raid and Linux-XFS experts:
>
> I'm posting this on both the linux-raid and linux-xfs
> mailing lists as it's not clear at this point whether
> this is an MD or an XFS problem.
>
> I have described my problem in a recent posting on
> linux-raid and Wol's conclusion was:
>
> >In other words, one or more of the following three are true :-
> >1) The OP has been caught by some random act of God
> >2) There's a serious flaw in "mdadm --grow"
> >3) There's a serious flaw in xfs
> >
> >Cheers,
> >Wol
>
> There's very important data on our RAID10 device, but I doubt
> it's important enough for God to take a hand in our storage.
>
> But let me first summarize what happened and why I believe
> this is an XFS problem:
>
> The machine is running Linux 3.14.69 with no kernel patches.
>
> The XFS filesystem was created with xfsprogs 3.1.11.
> I did a fresh compile of xfsprogs 4.9.0 yesterday when
> I realized that the 3.1.11 xfs_repair did not help.
>
> mdadm is v3.3.
>
> /dev/md5 is a RAID10 device that was created in Feb 2013
> with 10 2TB disks and an ext3 filesystem on it. Once in a
> while I added two more 2TB disks. Reshaping was done
> while the ext3 filesystem was mounted. Then the ext3
> filesystem was unmounted, resized, and mounted again. That
> worked until I resized the RAID10 from 16 to 20 disks and
> realized that ext3 does not support filesystems >16TB.
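>
> (For each expansion, the steps were roughly the following; device
> names and the disk count are illustrative, not the exact commands:
>
> # mdadm /dev/md5 --add /dev/sdX /dev/sdY
> # mdadm --grow /dev/md5 --raid-devices=N
> # umount /data
> # resize2fs /dev/md5
> # mount /data )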
>
> I switched to XFS and created a 20TB filesystem. Here are
> the details:
>
> # xfs_info /dev/md5
> meta-data=/dev/md5               isize=256    agcount=32, agsize=152608128 blks
>          =                       sectsz=512   attr=2
> data     =                       bsize=4096   blocks=4883457280, imaxpct=5
>          =                       sunit=128    swidth=1280 blks
> naming   =version 2              bsize=4096   ascii-ci=0
> log      =internal               bsize=4096   blocks=521728, version=2
>          =                       sectsz=512   sunit=8 blks, lazy-count=1
> realtime =none                   extsz=4096   blocks=0, rtextents=0
>
> Please note: this XFS filesystem has a size of
> 4883457280 * 4K = 19,533,829,120K.
>
> On Saturday I tried to add two more 2TB disks to the RAID10
> while the XFS filesystem was mounted (and in medium use).
> The commands were:
>
> # mdadm /dev/md5 --add /dev/sdo
> # mdadm --grow /dev/md5 --raid-devices=21
>
> # mdadm -D /dev/md5
> /dev/md5:
> Version : 1.2
> Creation Time : Sun Feb 10 16:58:10 2013
> Raid Level : raid10
> Array Size : 19533829120 (18628.91 GiB 20002.64 GB)
> Used Dev Size : 1953382912 (1862.89 GiB 2000.26 GB)
> Raid Devices : 21
> Total Devices : 21
> Persistence : Superblock is persistent
>
> Update Time : Sat Jan 6 15:08:37 2018
> State : clean, reshaping
> Active Devices : 21
> Working Devices : 21
> Failed Devices : 0
> Spare Devices : 0
>
> Layout : near=2
> Chunk Size : 512K
>
> Reshape Status : 1% complete
> Delta Devices : 1, (20->21)
>
> Name : backup:5 (local to host backup)
> UUID : 9030ff07:6a292a3c:26589a26:8c92a488
> Events : 86002
>
> Number Major Minor RaidDevice State
> 0 8 16 0 active sync /dev/sdb
> 1 65 48 1 active sync /dev/sdt
> 2 8 64 2 active sync /dev/sde
> 3 65 96 3 active sync /dev/sdw
> 4 8 112 4 active sync /dev/sdh
> 5 65 144 5 active sync /dev/sdz
> 6 8 160 6 active sync /dev/sdk
> 7 65 192 7 active sync /dev/sdac
> 8 8 208 8 active sync /dev/sdn
> 9 65 240 9 active sync /dev/sdaf
> 10 65 0 10 active sync /dev/sdq
> 11 66 32 11 active sync /dev/sdai
> 12 8 32 12 active sync /dev/sdc
> 13 65 64 13 active sync /dev/sdu
> 14 8 80 14 active sync /dev/sdf
> 15 65 112 15 active sync /dev/sdx
> 16 8 128 16 active sync /dev/sdi
> 17 65 160 17 active sync /dev/sdaa
> 18 8 176 18 active sync /dev/sdl
> 19 65 208 19 active sync /dev/sdad
> 20 8 224 20 active sync /dev/sdo
>
> Please note: this RAID10 device has a size of 19,533,829,120K,
> exactly the same size as the contained XFS filesystem.
>
> Immediately after the RAID10 reshape operation started, the
> XFS filesystem reported I/O errors and was severely damaged.
> I waited for the reshape operation to finish and tried to repair
> the filesystem with xfs_repair (version 3.1.11), but xfs_repair
> crashed, so I tried the 4.9.0 version of xfs_repair, with no
> luck either.
>
> /dev/md5 is now mounted ro,norecovery with an overlay filesystem
> on top of it (thanks very much to Andreas for that idea) and I have
> set up a new server today. Rsyncing the data to the new server will
> take a while and I'm sure I will stumble on lots of corrupted files.
> I moved straight from XFS to ZFS (skipping YFS) so lengthy reshape
> operations won't happen anymore.
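>
> (For the record, an XFS read-only mount without log recovery looks
> roughly like the following; the mount point is just a placeholder:
>
> # mount -t xfs -o ro,norecovery /dev/md5 /mnt/old-md5 )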
>
> Here are the relevant log messages:
>
> >Jan 6 14:45:00 backup kernel: md: reshape of RAID array md5
> >Jan 6 14:45:00 backup kernel: md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
> >Jan 6 14:45:00 backup kernel: md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for reshape.
> >Jan 6 14:45:00 backup kernel: md: using 128k window, over a total of 19533829120k.
> >Jan 6 14:45:00 backup kernel: XFS (md5): metadata I/O error: block 0x12c08f360 ("xfs_trans_read_buf_map") error 5 numblks 16
> >Jan 6 14:45:00 backup kernel: XFS (md5): xfs_imap_to_bp: xfs_trans_read_buf() returned error 5.
> >Jan 6 14:45:00 backup kernel: XFS (md5): metadata I/O error: block 0x12c08f360 ("xfs_trans_read_buf_map") error 5 numblks 16
> >Jan 6 14:45:00 backup kernel: XFS (md5): xfs_imap_to_bp: xfs_trans_read_buf() returned error 5.
> >... hundreds of the above XFS-messages deleted
> >Jan 6 14:45:00 backup kernel: XFS (md5): Log I/O Error Detected. Shutting down filesystem
> >Jan 6 14:45:00 backup kernel: XFS (md5): Please umount the filesystem and rectify the problem(s)
>
> Please note: there are no error messages about hardware problems.
> All 21 disks are fine, and the next messages from the
> md driver were:
>
> >Jan 7 02:28:02 backup kernel: md: md5: reshape done.
> >Jan 7 02:28:03 backup kernel: md5: detected capacity change from 20002641018880 to 21002772807680
>
> I'm wondering about one thing: the first xfs message is about a
> metadata I/O error on block 0x12c08f360. Since the xfs filesystem
I'm sure Dave will have more to say about this, but...

"block 0x12c08f360" is in units of 512-byte sectors, not fs blocks.
IOWs, this I/O error happened at byte offset 2,577,280,712,704 (~2.6TB),
which is well within the original ~20TB device.
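(Quick sanity check with shell arithmetic, assuming the usual 512-byte
sector size:)

$ echo $((0x12c08f360 * 512))
2577280712704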
XFS doesn't change the fs size until you tell it to (via growfs);
even if the underlying storage geometry changes, XFS won't act on it
until the admin tells it to.
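For reference, the extra space only becomes visible to the filesystem
after an explicit grow run against the mount point, e.g. (mount point
is just a placeholder here):

# xfs_growfs /mnt/backup

and that's a separate step from, and after, the mdadm reshape.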
What did xfs_repair do?
--D
> has a block size of 4K, this block would be located at position
> 20,135,005,568K, which is beyond the end of the RAID10 device.
> No wonder the xfs driver receives an I/O error, and no wonder
> the filesystem is severely corrupted right now.
>
> Question 1: How did the xfs driver know on Jan 6 that the RAID10
> device was about to be increased from 20TB to 21TB on Jan 7?
>
> Question 2: Why did the xfs driver start to use additional
> space that was not yet there, without me running xfs_growfs?
>
> This looks like a severe XFS problem to me.
>
> But my hope is that all the data that was in the filesystem
> before Jan 6 14:45 is not involved in the corruption. If xfs
> started to use space beyond the end of the underlying raid
> device, this should have affected only data that was created,
> modified, or deleted after Jan 6 14:45.
>
> If that were true, we could clearly distinguish between data
> that we must dump and data that we can keep. The machine is
> our backup system (as you may have guessed from its name)
> and I would like to keep the old backup files.
>
> I remember that mkfs.xfs is clever enough to adapt the
> filesystem parameters to the geometry of the underlying
> block device that the xfs filesystem is created on. Hence,
> from the xfs driver's point of view, the underlying block
> device is not just a sequence of data blocks; the xfs
> driver knows something about the layout of the underlying
> hardware.
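>
> To illustrate (the exact mkfs invocation isn't recorded here): the
> sunit/swidth values in the xfs_info output above match the md
> geometry at mkfs time, i.e. 512K chunk / 4K block = 128 blocks of
> sunit, and 128 * 10 data disks = 1280 blocks of swidth. mkfs.xfs
> picks that up automatically, or it could have been given by hand:
>
> # mkfs.xfs -d su=512k,sw=10 /dev/md5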
>
> If that is true, how does the xfs driver react if the
> information about the layout of the underlying hardware
> changes while the xfs filesystem is mounted?
>
> Seems to be an interesting problem.
>
> Kind regards
>
> Peter Koch
>