From: Phil Turmel <philip@turmel.org>
To: Julie Ashworth <ashworth@berkeley.edu>
Cc: linux-raid@vger.kernel.org
Subject: Re: request help with RAID1 array that endlessly attempts to sync
Date: Tue, 21 Jan 2014 08:23:31 -0500
Message-ID: <52DE74D3.10706@turmel.org>
In-Reply-To: <20140121063808.GA27520@ssh2.neuro.berkeley.edu>

Good morning Julie,

On 01/21/2014 01:38 AM, Julie Ashworth wrote:
> On 18-12-2013 07.08 -0500, Phil Turmel wrote:
>> I'd let the sync continue until it fails or completes. And if it
>> completes, exercise the array to see if it stays flaky. If it does not
>> complete, start swapping parts in the system.
> ---end quoted text---
>
> I'm responding to an old thread, but current problem. I started a RAID1 rebuild in mid-December, and it's still running - now with 2712 read errors - and counting. (I enclosed smartctl output).
The smartctl report says the drive is relatively healthy (3 total
relocations after 30,000 hours of operation). That implies all of your
read errors are transient. Or is it the other drive? (Show the other
drive's smartctl output, too, perhaps.)
> # cat /proc/mdstat
> Personalities : [raid1]
> md0 : active raid1 sda1[0] sdb1[1]
> 521984 blocks [2/2] [UU]
>
> md1 : active raid1 sda2[2] sdb2[1]
> 976237824 blocks [2/1] [_U]
> [==============>......] recovery = 70.9% (692700480/976237824) finish=68.5min speed=68956K/sec
I would *not* disturb the rebuild (yet). You have a better option.
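For what it's worth, the finish estimate in that mdstat line can be
cross-checked by hand: remaining blocks divided by the current speed.
A minimal sketch, assuming the block counts are in 1 KiB units and the
speed is in K/sec as printed; remaining_minutes is only a hypothetical
helper name:

```shell
#!/bin/sh
# Hypothetical helper: estimate minutes left in an md recovery.
#   $1 = blocks already recovered, $2 = total blocks, $3 = speed in K/sec
# Assumes the block counts and the speed use the same 1 KiB unit.
remaining_minutes() {
    awk -v completed="$1" -v total="$2" -v speed="$3" \
        'BEGIN { printf "%.1f\n", (total - completed) / speed / 60 }'
}

# Figures taken from the md1 recovery line quoted above.
remaining_minutes 692700480 976237824 68956   # prints 68.5
```

That agrees with the finish=68.5min shown. The figure only holds while
the speed stays constant, and the read errors suggest it may not.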
> md0 is a boot partition, and md1 is the operating system.
> I configured LVM on md1, and allocated 68GB (of 1TB):
>
> # vgdisplay /dev/VolGroup00
> VG Name VolGroup00
> VG Size 931.00 GB
> Alloc PE / Size 2176 / 68.00 GB
> Free PE / Size 27616 / 863.00 GB
>
> Currently, only ~5GB of the 1TB disk is used.
>
> At this point, what is my best option for limiting downtime of the server (i.e. avoiding a rebuild)?
> I added a drive (/dev/sdc) with identical geometry, and consider using dd, i.e.
>
> # dd if=/dev/sdb of=/dev/sdc bs=4096 conv=sync,noerror
No, this would also duplicate the raid metadata, confusing MD if you had
an unexpected reboot in the middle.
> This may not be the most efficient method of transferring data, since .5% of the disk is used. But obviously, I'm not in a hurry.
No hurry is good, as I suggest you take advantage of LVM to establish a
new raid under your volume group. This can be done on the fly, but
involves several steps.
> Please excuse my ignorance, but after it's cloned, is it possible to add /dev/sdc2 to md1 while it's syncing (to /dev/sda2)? Or do I need to wait until /dev/sdb fails to replace it with /dev/sdc?
Using LVM can achieve the equivalent.
Here's what I recommend:

1) Partition sdc to match the old drives

2) Expand /dev/md0 onto sdc1:
    mdadm --add /dev/md0 /dev/sdc1
    mdadm --grow /dev/md0 -n 3

3) Create a new, degraded raid1 on sdc2
    mdadm --create --level=1 -n 2 /dev/md2 /dev/sdc2 missing

4) Update mdadm.conf and initramfs to include /dev/md2

5) Add the new array to your volume group
    pvcreate /dev/md2
    vgextend VolGroup00 /dev/md2

6) Convert your logical volume(s) into mirrors across both PVs
    lvconvert -m1 --mirrorlog=mirrored /dev/VolGroup00/lvname
    {wait for this background task to complete}

7) Fail the rebuilding drive out of /dev/md1 and add it to /dev/md2
    mdadm /dev/md1 --fail /dev/sd?2
    mdadm /dev/md1 --remove /dev/sd?2
    mdadm /dev/md2 --add /dev/sd?2
    {wait for the background rebuild to complete}

8) Unmirror your logical volume(s), dropping the /dev/md1 copy
    lvconvert -m0 /dev/VolGroup00/lvname /dev/md1

9) Drop the empty /dev/md1 from the volume group
    vgreduce -a VolGroup00
    pvremove /dev/md1

10) Update mdadm.conf and initramfs to omit /dev/md1

11) Destroy /dev/md1 and add its device to the new array
    mdadm --stop /dev/md1
    mdadm --add /dev/md2 /dev/sd?2
    mdadm --grow /dev/md2 -n 3
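
The "wait for the background rebuild" pauses in the steps above could
be scripted rather than watched by eye. A rough sketch, not something
tested on your system: md_sync_active and wait_for_md_sync are
hypothetical helper names, and the match assumes the progress line
looks like the "recovery = 70.9%" line in your mdstat output.

```shell
#!/bin/sh
# Sketch only: both function names are hypothetical helpers.

# Succeeds (exit 0) while the given mdstat-format file shows a resync or
# recovery line. Defaults to /proc/mdstat when no file is given.
# Note: reshape and check operations are not matched by this pattern.
md_sync_active() {
    grep -Eq '(resync|recovery) *=' "${1:-/proc/mdstat}"
}

# Polls until no resync/recovery line remains.
#   $1 = mdstat-format file (optional), $2 = poll interval in seconds (optional)
wait_for_md_sync() {
    while md_sync_active "$1"; do
        sleep "${2:-60}"
    done
}
```

Calling wait_for_md_sync with no arguments before moving to the next
step would block until /proc/mdstat shows no array syncing. This checks
all arrays at once, not just /dev/md2. mdadm's own --wait option is
probably the simpler choice where your version supports it.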
Enjoy your triple redundancy.
Phil