Re: Growing RAID10 with active XFS filesystem

From: "Darrick J. Wong" <darrick.wong@oracle.com>
To: xfs.pkoch@dfgh.net
Cc: linux-xfs@vger.kernel.org
Subject: Re: Growing RAID10 with active XFS filesystem
Date: Mon, 8 Jan 2018 11:26:07 -0800
Message-ID: <20180108192607.GS5602@magnolia>
In-Reply-To: <f289da8f-96ec-7db4-abb1-b151d553c088@gmail.com>

On Mon, Jan 08, 2018 at 08:08:09PM +0100, xfs.pkoch@dfgh.net wrote:
> Dear Linux-Raid and Linux-XFS experts:
> 
> I'm posting this on both the linux-raid and linux-xfs
> mailing lists as it's not clear at this point whether
> this is an MD or XFS problem.
> 
> I have described my problem in a recent posting on
> linux-raid and Wol's conclusion was:
> 
> >In other words, one or more of the following three are true :-
> >1) The OP has been caught by some random act of God
> >2) There's a serious flaw in "mdadm --grow"
> >3) There's a serious flaw in xfs
> >
> >Cheers,
> >Wol
> 
> There's very important data on our RAID10 device but I doubt
> it's important enough for God to take a hand into our storage.
> 
> But let me first summarize what happened and why I believe that
> this is an XFS-problem:
> 
> Machine running Linux 3.14.69 with no kernel-patches.
> 
> XFS filesystem was created with XFS userutils 3.1.11.
> I did a fresh compile of xfsprogs-4.9.0 yesterday when
> I realized that the 3.1.11 xfs_repair did not help.
> 
> mdadm is V3.3
> 
> /dev/md5 is a RAID10-device that was created in Feb 2013
> with 10 2TB disks and an ext3 filesystem on it. Once in a
> while I added two more 2TB disks. Reshaping was done
> while the ext3 filesystem was mounted. Then the ext3
> filesystem was unmounted resized and mounted again. That
> worked until I resized the RAID10 from 16 to 20 disks and
> realized that ext3 does not support filesystems >16TB.
> 
> I switched to XFS and created a 20TB filesystem. Here are
> the details:
> 
> # xfs_info /dev/md5
> meta-data=/dev/md5               isize=256    agcount=32, agsize=152608128 blks
>           =                       sectsz=512   attr=2
> data     =                       bsize=4096   blocks=4883457280, imaxpct=5
>           =                       sunit=128    swidth=1280 blks
> naming   =version 2              bsize=4096   ascii-ci=0
> log      =internal               bsize=4096   blocks=521728, version=2
>           =                       sectsz=512   sunit=8 blks, lazy-count=1
> realtime =none                   extsz=4096   blocks=0, rtextents=0
> 
> Please notice: This XFS-filesystem has a size of
> 4883457280*4K = 19,533,829,120K
> 
> On Saturday I tried to add two more 2TB disks to the RAID10
> while the XFS filesystem was mounted (and in medium use).
> Commands were:
> 
> # mdadm /dev/md5 --add /dev/sdo
> # mdadm --grow /dev/md5 --raid-devices=21
> 
> # mdadm -D /dev/md5
> /dev/md5:
>          Version : 1.2
>    Creation Time : Sun Feb 10 16:58:10 2013
>       Raid Level : raid10
>       Array Size : 19533829120 (18628.91 GiB 20002.64 GB)
>    Used Dev Size : 1953382912 (1862.89 GiB 2000.26 GB)
>     Raid Devices : 21
>    Total Devices : 21
>      Persistence : Superblock is persistent
> 
>      Update Time : Sat Jan  6 15:08:37 2018
>            State : clean, reshaping
>   Active Devices : 21
> Working Devices : 21
>   Failed Devices : 0
>    Spare Devices : 0
> 
>           Layout : near=2
>       Chunk Size : 512K
> 
>   Reshape Status : 1% complete
>    Delta Devices : 1, (20->21)
> 
>             Name : backup:5  (local to host backup)
>             UUID : 9030ff07:6a292a3c:26589a26:8c92a488
>           Events : 86002
> 
>      Number   Major   Minor   RaidDevice State
>         0       8       16        0      active sync   /dev/sdb
>         1      65       48        1      active sync   /dev/sdt
>         2       8       64        2      active sync   /dev/sde
>         3      65       96        3      active sync   /dev/sdw
>         4       8      112        4      active sync   /dev/sdh
>         5      65      144        5      active sync   /dev/sdz
>         6       8      160        6      active sync   /dev/sdk
>         7      65      192        7      active sync   /dev/sdac
>         8       8      208        8      active sync   /dev/sdn
>         9      65      240        9      active sync   /dev/sdaf
>        10      65        0       10      active sync   /dev/sdq
>        11      66       32       11      active sync   /dev/sdai
>        12       8       32       12      active sync   /dev/sdc
>        13      65       64       13      active sync   /dev/sdu
>        14       8       80       14      active sync   /dev/sdf
>        15      65      112       15      active sync   /dev/sdx
>        16       8      128       16      active sync   /dev/sdi
>        17      65      160       17      active sync   /dev/sdaa
>        18       8      176       18      active sync   /dev/sdl
>        19      65      208       19      active sync   /dev/sdad
>        20       8      224       20      active sync   /dev/sdo
> 
> Please notice: This RAID10-device has a size of 19,533,829,120K.
> That's exactly the same size as the contained XFS-filesystem.
> 
> Immediately after the RAID10 reshape operation started the
> XFS-filesystem reported I/O-errors and was severely damaged.
> I waited for the reshape operation to finish and tried to repair
> the filesystem with xfs_repair (version 3.1.11) but xfs_repair
> crashed, so I tried the 4.9.0 version of xfs_repair with no luck
> either.
> 
> /dev/md5 is now mounted ro,norecovery with an overlay filesystem
> on top of it (thanks very much to Andreas for that idea) and I have
> set up a new server today. Rsyncing the data to the new server will
> take a while and I'm sure I will stumble on lots of corrupted files.
> I proceeded from XFS to ZFS (skipped YFS) so lengthy reshape
> operations won't happen in the future anymore.
> 
> Here are the relevant log messages:
> 
> >Jan  6 14:45:00 backup kernel: md: reshape of RAID array md5
> >Jan  6 14:45:00 backup kernel: md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
> >Jan  6 14:45:00 backup kernel: md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for reshape.
> >Jan  6 14:45:00 backup kernel: md: using 128k window, over a total of 19533829120k.
> >Jan  6 14:45:00 backup kernel: XFS (md5): metadata I/O error: block 0x12c08f360 ("xfs_trans_read_buf_map") error 5 numblks 16
> >Jan  6 14:45:00 backup kernel: XFS (md5): xfs_imap_to_bp: xfs_trans_read_buf() returned error 5.
> >Jan  6 14:45:00 backup kernel: XFS (md5): metadata I/O error: block 0x12c08f360 ("xfs_trans_read_buf_map") error 5 numblks 16
> >Jan  6 14:45:00 backup kernel: XFS (md5): xfs_imap_to_bp: xfs_trans_read_buf() returned error 5.
> >... hundreds of the above XFS-messages deleted
> >Jan  6 14:45:00 backup kernel: XFS (md5): Log I/O Error Detected.  Shutting down filesystem
> >Jan  6 14:45:00 backup kernel: XFS (md5): Please umount the filesystem and rectify the problem(s)
> 
> Please notice: no error message about hardware-problems.
> All 21 disks are fine and the next messages from the
> md-driver were:
> 
> >Jan  7 02:28:02 backup kernel: md: md5: reshape done.
> >Jan  7 02:28:03 backup kernel: md5: detected capacity change from 20002641018880 to 21002772807680
> 
> I'm wondering about one thing: the first xfs message is about a
> metadata I/O error on block 0x12c08f360. Since the xfs filesystem

I'm sure Dave will have more to say about this, but...

The "block 0x12c08f360" in that message is in units of 512-byte
sectors, not fs blocks.

IOWs, this IO error happened at byte offset 2,577,280,712,704 (~2.6TB),
which is well inside the 20TB device.
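
As a rough check of that arithmetic, here is a minimal Python sketch. It
assumes the block number in the log line counts 512-byte sectors; the
variable names are illustrative only, and the sizes are taken from the
xfs_info and mdadm output quoted earlier in this mail.

```python
SECTOR_SIZE = 512                  # bytes; assumed unit of the logged block number
FS_BLOCK_SIZE = 4096               # bytes; bsize from the quoted xfs_info output
DEVICE_BYTES = 19533829120 * 1024  # Array Size (in K) from the quoted mdadm -D output

logged_block = 0x12c08f360         # block number from the kernel log message

# Reading the number as sectors gives the real byte offset of the failed IO.
offset_as_sectors = logged_block * SECTOR_SIZE

# Reading it as 4K fs blocks (the misinterpretation) lands past the device end.
offset_as_fs_blocks = logged_block * FS_BLOCK_SIZE

print(f"as sectors:   {offset_as_sectors:,} bytes")          # 2,577,280,712,704 bytes
print(f"as fs blocks: {offset_as_fs_blocks // 1024:,}K")     # 20,135,005,568K
print("sector offset inside device:", offset_as_sectors < DEVICE_BYTES)
print("fs-block offset inside device:", offset_as_fs_blocks < DEVICE_BYTES)
```

Read as sectors, the offset falls inside the array as it was before the
reshape; only the 4K-block reading puts it beyond the end of the device.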

XFS doesn't change the fs size until the admin tells it to (via
growfs), even if the underlying storage geometry changes.
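
To put numbers on that, the following sketch (Python, with hypothetical
variable names) compares the filesystem size implied by the quoted
xfs_info output with the md device size before and after the reshape, as
reported in the quoted kernel log. It only does arithmetic on values from
this thread; it does not query a real filesystem or device.

```python
FS_BLOCKS = 4883457280            # "blocks=" on the data line of xfs_info
FS_BLOCK_SIZE = 4096              # "bsize=" on the data line of xfs_info

DEV_BYTES_BEFORE = 20002641018880 # from "detected capacity change from ..."
DEV_BYTES_AFTER = 21002772807680  # "... to ..." in the same log line

# Size the filesystem believes it has; this stays fixed until growfs is run.
fs_bytes = FS_BLOCKS * FS_BLOCK_SIZE

# Space on the device that lies beyond the end of the filesystem.
unclaimed_before = DEV_BYTES_BEFORE - fs_bytes
unclaimed_after = DEV_BYTES_AFTER - fs_bytes

print(f"filesystem size:          {fs_bytes:,} bytes")
print(f"unclaimed before reshape: {unclaimed_before:,} bytes")
print(f"unclaimed after reshape:  {unclaimed_after:,} bytes")
```

With these values the filesystem exactly filled the device before the
reshape, and the added capacity sits past the end of the filesystem until
xfs_growfs is run on it.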

What did xfs_repair do?

--D

> has a blocksize of 4K this block is located at position 20135005568K
> which is beyond the end of the RAID10 device. No wonder that the
> xfs driver receives an I/O error. And also no wonder that the
> filesystem is severely corrupted right now.
> 
> Question 1: How did the xfs driver know on Jan 6 that the RAID10
> device was about to be increased from 20TB to 21TB on Jan 7?
> 
> Question 2: Why did the xfs driver start to use the additional
> space that was not yet there without me executing xfs_growfs?
> 
> This looks like a severe XFS-problem to me.
> 
> But my hope is that all the data that was within the filesystem
> before Jan 6 14:45 is not involved in the corruption. If xfs
> started to use space beyond the end of the underlying raid
> device this should have affected only data that was created,
> modified or deleted after Jan 6 14:45.
> 
> If that was true we could clearly distinguish between data
> that we must dump and data that we can keep. The machine is
> our backup system (as you may have guessed from its name)
> and I would like to keep old backup-files.
> 
> I remember that mkfs.xfs is clever enough to adapt the
> filesystem parameters to the underlying hardware of the
> block device that the xfs filesystem is created on. Hence
> from the xfs driver's point of view the underlying block
> device is not just a sequence of data blocks, but the xfs
> driver knows something about the layout of the underlying
> hardware.
> 
> If that was true - how does the xfs driver react if that
> information about the layout of the underlying hardware
> changes while the xfs-filesystem is mounted?
> 
> Seems to be an interesting problem
> 
> Kind regards
> 
> Peter Koch
> 
