* LVM RAID1 syncing component
@ 2014-11-25 4:07 Joe Lawrence
2014-11-26 5:42 ` Chris Murphy
2014-11-26 20:41 ` NeilBrown
0 siblings, 2 replies; 10+ messages in thread
From: Joe Lawrence @ 2014-11-25 4:07 UTC (permalink / raw)
To: linux-raid; +Cc: Joe Lawrence
Does anyone know how it's possible to determine which side of an LVM RAID 1
is the stale partner during RAID resync?
In ordinary MD RAID, I believe you can check
/sys/block/md0/md/dev-XXX/state, but LVM RAID seems to hide those files
when leveraging the MD code. I've looked through the pvs/vgs/lvs manpages, but
can't figure anything out there either.
Thanks,
-- Joe
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: LVM RAID1 syncing component
2014-11-25 4:07 LVM RAID1 syncing component Joe Lawrence
@ 2014-11-26 5:42 ` Chris Murphy
2014-11-26 13:20 ` Joe Lawrence
2014-11-26 20:41 ` NeilBrown
1 sibling, 1 reply; 10+ messages in thread
From: Chris Murphy @ 2014-11-26 5:42 UTC (permalink / raw)
To: linux-raid
On Mon, Nov 24, 2014 at 9:07 PM, Joe Lawrence <joe.lawrence@stratus.com> wrote:
> Does anyone know how it's possible to determine which side of an LVM RAID 1
> is the stale partner during RAID resync?
>
> In ordinary MD RAID, I believe you can check
> /sys/block/md0/md/dev-XXX/state, but LVM RAID seems to hide those files
> when leveraging the MD code. I've looked through the pvs/vgs/lvs manpages, but
> can't figure anything out there either.
Rather indirectly: iotop, which will show you which devices are mostly
being read from and written to.
# lvs -a -o copy_percent
Anything less than 100% is syncing. I think.
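If you want to script that, something along these lines could flag LVs still below 100% (a sketch; the lvs output below is hard-coded sample data and the LV name is made up):

```shell
# Sketch: flag any LV whose copy_percent is below 100, i.e. still syncing.
# Sample data hard-coded; in practice it would come from:
#   lvs -a --noheadings -o lv_name,copy_percent
lvs_output='  lvraid0            42.50
  [lvraid0_rimage_0] 100.00'

syncing=$(printf '%s\n' "$lvs_output" |
    awk 'NF == 2 && $2 + 0 < 100 { printf "%s %s\n", $1, $2 }')
echo "$syncing"
```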
--
Chris Murphy
* Re: LVM RAID1 syncing component
2014-11-26 5:42 ` Chris Murphy
@ 2014-11-26 13:20 ` Joe Lawrence
0 siblings, 0 replies; 10+ messages in thread
From: Joe Lawrence @ 2014-11-26 13:20 UTC (permalink / raw)
To: Chris Murphy; +Cc: linux-raid, linux-lvm
On Tue, 25 Nov 2014 22:42:38 -0700
Chris Murphy <lists@colorremedies.com> wrote:
> On Mon, Nov 24, 2014 at 9:07 PM, Joe Lawrence <joe.lawrence@stratus.com> wrote:
> > Does anyone know how it's possible to determine which side of an LVM RAID 1
> > is the stale partner during RAID resync?
> >
> > In ordinary MD RAID, I believe you can check
> > /sys/block/md0/md/dev-XXX/state, but LVM RAID seems to hide those files
> > when leveraging the MD code. I've looked through the pvs/vgs/lvs manpages, but
> > can't figure anything out there either.
>
> Rather indirectly: iotop, which will show you which devices are mostly
> being read from and written to.
>
> # lvs -a -o copy_percent
> Anything less than 100% is syncing. I think.
>
From the manpages I see the following attribute bits:
* lvs, lv_attr bit Volume Health: (p)artial
* vgs, vg_attr bit (p)artial: one or more physical volumes belonging
to the volume group are missing from the system
* pvs, pv_attr bit (m)issing
along with the lvs copy_percent (is this similar to sync_percent?) that
you mentioned. That's about it.
Since there seems to be no real underlying MD device, I'm assuming that
ioctls are out of the question as well.
-- Joe
* Re: LVM RAID1 syncing component
2014-11-25 4:07 LVM RAID1 syncing component Joe Lawrence
2014-11-26 5:42 ` Chris Murphy
@ 2014-11-26 20:41 ` NeilBrown
2014-11-29 15:26 ` Peter Grandi
2014-12-01 21:19 ` Joe Lawrence
1 sibling, 2 replies; 10+ messages in thread
From: NeilBrown @ 2014-11-26 20:41 UTC (permalink / raw)
To: Joe Lawrence; +Cc: linux-raid
On Mon, 24 Nov 2014 23:07:32 -0500 Joe Lawrence <joe.lawrence@stratus.com>
wrote:
> Does anyone know how it's possible to determine which side of an LVM RAID 1
> is the stale partner during RAID resync?
>
> In ordinary MD RAID, I believe you can check
> /sys/block/md0/md/dev-XXX/state,
Why do you believe that?
During a resync (after an unclean shutdown) the devices are indistinguishable.
RAID1 reads all drives and if there is a difference it chooses one data block
to write to the others - always the one with the lowest index number.
So with md or LVM it is the same: the "first" is copied to the "second".
NeilBrown
> but LVM RAID seems to hide those files
> when leveraging the MD code. I've looked through the pvs/vgs/lvs manpages, but
> can't figure anything out there either.
>
> Thanks,
>
> -- Joe
* Re: LVM RAID1 syncing component
2014-11-26 20:41 ` NeilBrown
@ 2014-11-29 15:26 ` Peter Grandi
2014-12-01 23:28 ` NeilBrown
2014-12-01 21:19 ` Joe Lawrence
1 sibling, 1 reply; 10+ messages in thread
From: Peter Grandi @ 2014-11-29 15:26 UTC (permalink / raw)
To: Linux RAID
[ ... ]
> During a resync (after an unclean shutdown) the devices are
> indistinguishable. RAID1 reads all drives and if there is a
> difference it chooses one data block to write to the others -
> always the one with the lowest index number.
Uhhhhh "indistinguishable" and "lowest index number"?
Shouldn't that be "lowest index number among those with the
highest event count"?
Put another way, couldn't it happen that in a 5-way RAID1, for
example, an unclean shutdown results in 2 drives with the same
highest event count and 3 drives with lower event counts, and
then the data page to write is taken from whichever of the 2 has
the lowest index number and is written only to the 3 with the
lower event counts?
Also, in case of an «unclean shutdown» resulting in all members
of a RAID1 set having the same event count, is the resync still
done? Is it necessary? Or is «unclean shutdown» used here as an
alias for "not all event counts are the same"?
I am mostly asking about what RAID1 actually does, but also
perhaps about what it ought to be doing.
* Re: LVM RAID1 syncing component
2014-11-26 20:41 ` NeilBrown
2014-11-29 15:26 ` Peter Grandi
@ 2014-12-01 21:19 ` Joe Lawrence
2014-12-01 21:27 ` Joe Lawrence
2014-12-01 21:41 ` NeilBrown
1 sibling, 2 replies; 10+ messages in thread
From: Joe Lawrence @ 2014-12-01 21:19 UTC (permalink / raw)
To: NeilBrown; +Cc: linux-raid
On Thu, 27 Nov 2014 07:41:58 +1100
NeilBrown <neilb@suse.de> wrote:
> On Mon, 24 Nov 2014 23:07:32 -0500 Joe Lawrence <joe.lawrence@stratus.com>
> wrote:
>
> > Does anyone know how it's possible to determine which side of an LVM RAID 1
> > is the stale partner during RAID resync?
> >
> > In ordinary MD RAID, I believe you can check
> > /sys/block/md0/md/dev-XXX/state,
>
> Why do you believe that?
>
> During a resync (after an unclean shutdown) the devices are indistinguishable.
> RAID1 reads all drives and if there is a difference it chooses one data block
> to write to the others - always the one with the lowest index number.
>
> So with md or LVM it is the same: the "first" is copied to the "second".
Hi Neil,
Here's a quick example of my thought-process, where md3 is an in-sync
RAID1 of sdq2 and sdr2 with an internal write bitmap:
% mdadm --fail /dev/md3 /dev/sdr2
% mdadm --remove /dev/md3 /dev/sdr2
[ ... File I/O to /dev/md3 ... ]
% mdadm -X /dev/sd[qr]2
Filename : /dev/sdq2
Magic : 6d746962
Version : 4
UUID : 073511ee:0b0c20e0:662ae8da:b53c7979
Events : 8526 << ECq
Events Cleared : 8498
State : OK
Chunksize : 64 MB
Daemon : 5s flush period
Write Mode : Normal
Sync Size : 16768896 (15.99 GiB 17.17 GB)
Bitmap : 256 bits (chunks), 5 dirty (2.0%)
Filename : /dev/sdr2
Magic : 6d746962
Version : 4
UUID : 073511ee:0b0c20e0:662ae8da:b53c7979
Events : 8513 << ECr
Events Cleared : 8498
State : OK
Chunksize : 64 MB
Daemon : 5s flush period
Write Mode : Normal
Sync Size : 16768896 (15.99 GiB 17.17 GB)
Bitmap : 256 bits (chunks), 5 dirty (2.0%)
[ Note that ECq > ECr, which makes sense since sdq was the remaining
disk standing in the RAID. ]
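[ That comparison is easy to script; here's a sketch with the Events
values hard-coded from the -X output above (in practice they'd be
grepped out of `mdadm -X`): ]

```shell
# Sketch: decide which member is stale by comparing the bitmap Events
# counts. Values hard-coded from the mdadm -X output above.
events_sdq2=8526
events_sdr2=8513

if [ "$events_sdq2" -ge "$events_sdr2" ]; then
    stale=/dev/sdr2
else
    stale=/dev/sdq2
fi
echo "stale member: $stale"
```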
% mdadm --add /dev/md3 /dev/sdr2
% mdadm --detail /dev/md3
/dev/md3:
Version : 1.2
Creation Time : Thu Nov 13 15:47:19 2014
Raid Level : raid1
Array Size : 16768896 (15.99 GiB 17.17 GB)
Used Dev Size : 16768896 (15.99 GiB 17.17 GB)
Raid Devices : 2
Total Devices : 2
Persistence : Superblock is persistent
Intent Bitmap : Internal
Update Time : Mon Dec 1 16:07:55 2014
State : active, degraded, recovering
Active Devices : 1
Working Devices : 2
Failed Devices : 0
Spare Devices : 1
Rebuild Status : 0% complete
Name : dhcp-linux-2192-2025:3
UUID : 073511ee:0b0c20e0:662ae8da:b53c7979
Events : 8528
Number Major Minor RaidDevice State
0 65 2 0 active sync /dev/sdq2
1 65 18 1 spare rebuilding /dev/sdr2
% head /sys/block/md3/md/dev-sd*/state
==> /sys/block/md3/md/dev-sdq2/state <==
in_sync
==> /sys/block/md3/md/dev-sdr2/state <==
spare
In this scenario, sdr was re-added to the RAID with a lower
events-cleared count. I assume that MD will only need to read the data
represented by the dirty bitmap bits from the "active sync" disk to the
"spare rebuilding" disk. Is this not the case?
Regards,
-- Joe
* Re: LVM RAID1 syncing component
2014-12-01 21:19 ` Joe Lawrence
@ 2014-12-01 21:27 ` Joe Lawrence
2014-12-01 21:41 ` NeilBrown
1 sibling, 0 replies; 10+ messages in thread
From: Joe Lawrence @ 2014-12-01 21:27 UTC (permalink / raw)
To: Joe Lawrence; +Cc: NeilBrown, linux-raid
On Mon, 1 Dec 2014 16:19:47 -0500
Joe Lawrence <joe.lawrence@stratus.com> wrote:
> [ ... ]
>
> Hi Neil,
>
> Here's a quick example of my thought-process, where md3 is an in-sync
> RAID1 of sdq2 and sdr2 with an internal write bitmap:
>
> % mdadm --fail /dev/md3 /dev/sdr2
> % mdadm --remove /dev/md3 /dev/sdr2
>
> [ ... File I/O to /dev/md3 ... ]
>
> % mdadm -X /dev/sd[qr]2
> Filename : /dev/sdq2
> Magic : 6d746962
> Version : 4
> UUID : 073511ee:0b0c20e0:662ae8da:b53c7979
> Events : 8526 << ECq
> Events Cleared : 8498
> State : OK
> Chunksize : 64 MB
> Daemon : 5s flush period
> Write Mode : Normal
> Sync Size : 16768896 (15.99 GiB 17.17 GB)
> Bitmap : 256 bits (chunks), 5 dirty (2.0%)
> Filename : /dev/sdr2
> Magic : 6d746962
> Version : 4
> UUID : 073511ee:0b0c20e0:662ae8da:b53c7979
> Events : 8513 << ECr
> Events Cleared : 8498
> State : OK
> Chunksize : 64 MB
> Daemon : 5s flush period
> Write Mode : Normal
> Sync Size : 16768896 (15.99 GiB 17.17 GB)
> Bitmap : 256 bits (chunks), 5 dirty (2.0%)
>
> [ Note that ECq > ECr, which makes sense since sdq was the remaining
> disk standing in the RAID. ]
>
> [ ... ]
> In this scenario, sdr was re-added to the RAID with a lower
> events-cleared count. I assume that MD will only need to read the data
> represented by the dirty bitmap bits from the "active sync" disk to the
> "spare rebuilding" disk. Is this not the case?
D'oh! Sorry for the edit, I meant the "events" count and not the "events
cleared" count to determine sync direction. I put the "<<" arrows in
the right place but then used the wrong term everywhere else.
-- Joe
* Re: LVM RAID1 syncing component
2014-12-01 21:19 ` Joe Lawrence
2014-12-01 21:27 ` Joe Lawrence
@ 2014-12-01 21:41 ` NeilBrown
2014-12-02 19:05 ` Joe Lawrence
1 sibling, 1 reply; 10+ messages in thread
From: NeilBrown @ 2014-12-01 21:41 UTC (permalink / raw)
To: Joe Lawrence; +Cc: linux-raid
On Mon, 1 Dec 2014 16:19:47 -0500 Joe Lawrence <joe.lawrence@stratus.com>
wrote:
> [ ... ]
>
> Hi Neil,
>
> > Here's a quick example of my thought-process, where md3 is an in-sync
> > RAID1 of sdq2 and sdr2 with an internal write bitmap:
>
> % mdadm --fail /dev/md3 /dev/sdr2
> % mdadm --remove /dev/md3 /dev/sdr2
You are referring to what I would call "recovery", not "resync"
(which is why I put "(after an unclean shutdown)" in my answer to make it
clear what circumstances I was talking about).
resync: fixing things after an unclean shutdown
recovery: restoring data after a device has been removed and another
(or possibly the same) added.
I think
dmsetup status
should provide the info you want.
One of the fields is a sequence of letters 'D', 'a', 'A'.
* Status characters:
* 'D' = Dead/Failed device
* 'a' = Alive but not in-sync
* 'A' = Alive and in-sync
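For example, to pull those characters out of a raid1 line (the status
line below is hard-coded sample data, and I'm assuming the health
characters sit in the sixth field of the raid target's status output):

```shell
# Sketch: extract the per-device health characters from a dm-raid
# status line. Sample line hard-coded for illustration.
status_line='0 18857984 raid raid1 2 aA 8519680/18857984 recover 0'

# Field 6 holds one health character per raid image, in device order.
health=$(printf '%s\n' "$status_line" | awk '{ print $6 }')
echo "health characters: $health"
```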
Does that provide the information you wanted?
NeilBrown
* Re: LVM RAID1 syncing component
2014-11-29 15:26 ` Peter Grandi
@ 2014-12-01 23:28 ` NeilBrown
0 siblings, 0 replies; 10+ messages in thread
From: NeilBrown @ 2014-12-01 23:28 UTC (permalink / raw)
To: Peter Grandi; +Cc: Linux RAID
On Sat, 29 Nov 2014 15:26:52 +0000 pg@lxra2.for.sabi.co.UK (Peter Grandi)
wrote:
> [ ... ]
>
> > During a resync (after an unclean shutdown) the devices are
> > indistinguishable. RAID1 reads all drives and if there is a
> > difference it chooses one data block to write to the others -
> > always the one with the lowest index number.
>
> Uhhhhh "indistinguishable" and "lowest index number"?
>
> Shouldn't that be "lowest index number among those with the
> highest event count"?
>
> Put another way, couldn't it happen that in a 5-way RAID1 for
> example an unclean shutdown results in 2 drives with the same
> highest event count and 3 drives with lower event counts, and
> then the data page to write is that from the one of the 2 with
> the lowest index number and is written only to the 3 with the
> lower event count?
>
> Also, in case of an «unclean shutdown» resulting in all members
> of a RAID1 set having the same event count, is the resync still
> done? Is it necessary? Or is «unclean shutdown» used here as an
> alias for "not all event counts are the same".
>
> I am asking as to what RAID1 actually does mostly, but also
> perhaps as to what it ought to be doing.
I've told you what it actually does. I think that is what it ought to do.
If you think that maybe it ought to do something differently from what it
does, I suggest you try to come up with a specific scenario where what
actually happens is not optimal, and give clear reasons for why you think
something else is optimal.
To be specific, after an unclean shutdown, the array is assembled from all
devices which have an uptodate event count, and then all blocks are compared;
where a difference is found, data is copied from the lowest-index-number
block to the others.
"uptodate" in the context of event counts means the event count is equal to,
or one less than, the highest event count found.
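A sketch of that selection rule (device names and event counts invented
for illustration):

```shell
# Sketch: pick the members whose event count is "uptodate", i.e. equal
# to, or one less than, the highest count found. Data below is invented.
counts='sda1 100
sdb1 100
sdc1 99
sdd1 97'

uptodate=$(printf '%s\n' "$counts" |
    awk '{ ev[$1] = $2; if ($2 > max) max = $2 }
         END { for (d in ev) if (ev[d] >= max - 1) print d }' | sort)
echo "$uptodate"
```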
NeilBrown
* Re: LVM RAID1 syncing component
2014-12-01 21:41 ` NeilBrown
@ 2014-12-02 19:05 ` Joe Lawrence
0 siblings, 0 replies; 10+ messages in thread
From: Joe Lawrence @ 2014-12-02 19:05 UTC (permalink / raw)
To: NeilBrown; +Cc: linux-raid
On Tue, 2 Dec 2014 08:41:11 +1100
NeilBrown <neilb@suse.de> wrote:
> [ ... ]
> >
> > Hi Neil,
> >
> > Here's a quick example of my thought-process, where md2 is an in-sync
> > RAID1 of sdq2 and sdr2 with an internal write bitmap:
> >
> > % mdadm --fail /dev/md3 /dev/sdr2
> > % mdadm --remove /dev/md3 /dev/sdr2
>
> You are referring to what I would call "recovery", not "resync"
> (which is why I put "(after an unclean shutdown)" in my answer to make it
> clear what circumstances I was talking about).
>
> resync: fixing things after an unclean shutdown
> recovery: restoring data after a device has been removed and another
> (or possibly the same) added.
>
> I think
>
> dmsetup status
>
> should provide the info you want.
> One of the fields is a sequence of letters 'D', 'a', 'A'.
>
> * Status characters:
> * 'D' = Dead/Failed device
> * 'a' = Alive but not in-sync
> * 'A' = Alive and in-sync
>
> Does that provide the information you wanted?
Yes! When I add a disk back to the array, I see the status characters
you mentioned during _recovery_:
% while [ true ]
do
dmsetup status vg0-lvraid0
sleep 10s
done
0 18857984 raid raid1 2 DA 18857984/18857984 idle 0
0 18857984 raid raid1 2 DA 18857984/18857984 idle 0
0 18857984 raid raid1 2 DA 18857984/18857984 idle 0
0 18857984 raid raid1 2 aA 0/18857984 recover 0
0 18857984 raid raid1 2 aA 0/18857984 recover 0
0 18857984 raid raid1 2 aA 256/18857984 recover 0
0 18857984 raid raid1 2 aA 8519680/18857984 recover 0
0 18857984 raid raid1 2 AA 18857984/18857984 idle 0
So now to determine which disk is which in the raid_set: can I use a
command like lvs to tie the n-th status character back to a device?
% lvs -a -o name,devices vg0
LV Devices
lvraid0 lvraid0_rimage_0(0),lvraid0_rimage_1(0)
[lvraid0_rimage_0] /dev/sdr1(1)
[lvraid0_rimage_1] /dev/sdt1(1)
[lvraid0_rmeta_0] /dev/sdr1(0)
[lvraid0_rmeta_1] /dev/sdt1(0)
Presumably the first character represents [lvraid0_rimage_0] and the
second [lvraid0_rimage_1].
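If that positional mapping holds (it's my assumption here), tying each
character to its device could look like this, with the health string
and device list hard-coded from the output above:

```shell
# Sketch: pair the n-th health character with the n-th rimage device.
# Sample data hard-coded from the dmsetup/lvs output above.
health='aA'
devices='/dev/sdr1 /dev/sdt1'

mapping=''
i=1
for dev in $devices; do
    c=$(printf '%s' "$health" | cut -c "$i")
    mapping="$mapping$dev=$c;"
    i=$((i + 1))
done
echo "$mapping"
```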
Thanks,
-- Joe
Thread overview: 10+ messages
2014-11-25 4:07 LVM RAID1 syncing component Joe Lawrence
2014-11-26 5:42 ` Chris Murphy
2014-11-26 13:20 ` Joe Lawrence
2014-11-26 20:41 ` NeilBrown
2014-11-29 15:26 ` Peter Grandi
2014-12-01 23:28 ` NeilBrown
2014-12-01 21:19 ` Joe Lawrence
2014-12-01 21:27 ` Joe Lawrence
2014-12-01 21:41 ` NeilBrown
2014-12-02 19:05 ` Joe Lawrence