From: Wolfgang Mader <Wolfgang_Mader@brain-frog.de>
To: BTRFS <linux-btrfs@vger.kernel.org>
Subject: Read i/o errs and disk replacement
Date: Tue, 18 Feb 2014 14:19:47 +0100

Hi all,

I have hit the first incident where I really have to work on my btrfs setup. To get things straight, I want to double-check here so that I do not screw things up right from the start. We are talking about a home server; there is no time or user pressure involved, and there are backups, too.

Software
--------
Linux 3.13.3
Btrfs v3.12

Hardware
--------
Five 1 TB hard drives configured as RAID 10 for both data and metadata:

    Data, RAID10: total=282.00GiB, used=273.33GiB
    System, RAID10: total=64.00MiB, used=36.00KiB
    Metadata, RAID10: total=1.00GiB, used=660.48MiB

Error
-----
This is not btrfs's fault but due to a hard drive error. I saw in the system logs

    btrfs: bdev /dev/sdb errs: wr 0, rd 2, flush 0, corrupt 0, gen 0

and a subsequent check of the device statistics showed

    [/dev/sdb].write_io_errs 0
    [/dev/sdb].read_io_errs 2
    [/dev/sdb].flush_io_errs 0
    [/dev/sdb].corruption_errs 0
    [/dev/sdb].generation_errs 0

So I have read errors on sdb.

Questions
---------
1) Do I have to take action immediately (shut down the system, unmount the file system)? Or can I even ignore the error? Unfortunately, I cannot access SMART information through the SATA interface of the enclosure which hosts the drives.

2) I can only replace the disk, not add a new one and then swap over, because there is no free slot left in the disk enclosure I am using. I also cannot guarantee that, if I remove sdb and start the system up again, all the other disks will keep their current names, or that the newly added disk will be named sdb again. Is this an issue?

3) I know that btrfs can handle disks of different sizes. Is there a downside if I go for a 3 TB disk and add it to the 1 TB disks? Is there, e.g., more data stored on the 3 TB disk, so that if this one fails I lose redundancy? Is a soft transition to 3 TB, where I replace every dying 1 TB disk with a 3 TB disk, advisable?

Proposed solution for the current issue
---------------------------------------
1) Delete the faulted drive: btrfs device delete /dev/sdb /path/to/pool
2) Format the new disk with mkfs.btrfs
3) Add the new disk to the filesystem: btrfs device add /dev/newdiskname /path/to/pool
4) Balance the file system: btrfs filesystem balance /path/to/pool

Is this the proper way to deal with the situation? (The same sequence is written out as one transcript at the end of this mail.)

Thank you for your advice.

Best,
Wolfgang
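
P.S. For reference, here is the proposed sequence as a single transcript. This is only a sketch of the steps listed above; /path/to/pool and /dev/newdiskname are the placeholders used in this mail, not the actual mount point or device name.

    # 1) Remove the faulted drive from the RAID 10 filesystem
    btrfs device delete /dev/sdb /path/to/pool

    # 2) Create a btrfs filesystem on the replacement disk
    mkfs.btrfs /dev/newdiskname

    # 3) Add the replacement disk to the existing filesystem
    btrfs device add /dev/newdiskname /path/to/pool

    # 4) Rebalance so data and metadata are spread onto the new device
    btrfs filesystem balance /path/to/pool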