All kinds of things on RAID/mdadm

* All kinds of things on RAID/mdadm
@ 2011-11-07 12:46 Martijn
  2011-11-07 13:05 ` Miles Fidelman
  2011-11-09 21:56 ` NeilBrown
  0 siblings, 2 replies; 4+ messages in thread
From: Martijn @ 2011-11-07 12:46 UTC (permalink / raw)
  To: linux-raid

Dear readers, perhaps Neil and/or fellow mdadm developers,

Over the last couple of weeks, I've been spending quite some time with 
mdadm and looking at a good way to use RAID on Linux for one of our 
servers. My colleagues (friends) and I are designing a new software 
platform for our company service, and RAID is an important base layer in 
the system.

To give you some context: our company was started not only to provide 
paid services, but also for learning purposes. We were all students 
when we started our company over nine years ago, and we still like to 
learn things related to all kinds of (IT) subjects. In that same 
direction, because we want to learn, we fancy a certain kind of 
'thoroughness' when creating and documenting something. At least, when 
we're given the time to do that, such as in this case.

Our current platform was set up early in 2005, when a colleague and I 
spent an evening finding out if we *really* couldn't just mirror full 
/dev/hda with /dev/hdb ;-) After reading it wasn't possible many, many 
times, we ended up manually copying partition tables from hda to hdb 
(eek!), mirroring partitions instead of drives (huh?), using some LVM 
(every time you think you know how it works, it's different), installing 
grub on both drives ("will this work on failover?"), and.. it worked.

I slept really badly after that, though: we needed the reliability cheap, 
but it was so different from what I had imagined upfront and knew from 
hardware RAID. The extra complexity was a big deal for me too. Bigger 
than necessary, I think. I wanted to be more sure I could wrap my 
head around a problem if someone called me in the middle of the 
night to fix it.

So this time around, I chose to be more thorough on the important 
aspects, and one of those aspects is recovery: what to do if 
something goes wrong. While mdadm is a tool that's pretty clear in its 
usage, supported by a good manual, I've come across some things I 
cannot document to my full satisfaction after reading the manual. 
raid.wiki.kernel.org is down as well, and ironically the contents aren't 
'mirrored' anywhere. Google Cache may have it, but I can't find it: the 
results are littered with unimportant meta pages from the wiki.
I also quickly searched through the mdadm code, but didn't see comments 
that cleared up my questions.

Searching for possible states of an array, I discovered that there are 
all sorts of combinations for states. The basics are clean, degraded and 
dirty. But what does 'clean, no-errors' mean? And 'dirty, no-errors'? 
Searching through the code, I even found a point where a label 'Dirty 
State' could be listed as 'clean'. Is it a good idea to add a list with 
explanations of possible states, basic and exotic, to the manual? Much 
in the same way all monitor events are listed. I can imagine not 
everyone knowing the difference between dirty and degraded for example. 
It's a basic thing that is skipped in most cases.

Perhaps the same could be done for individual disk states. Of course we 
all know "active sync", and based on what I've seen elsewhere the states 
"removed", "spare" and "faulty spare" exist. But having a list of all 
possible states would help prepare documentation for the things we 
really don't want to happen. Takes off the pressure a bit :-)

I'm not voting for mdadm to become a tool that even babies can use to 
create their arrays, but with this info others may be able to act with 
confidence based on their own knowledge, instead of searching for articles 
on the web that happen to list the state of the array they're searching 
for. A lot of those articles do not teach anything. They just make you 
brainlessly copy and paste commands and fill in the block device 
files. Some of them are just plain wrong and may result in data being lost.
I also vote on articles giving partitionable devices a good kick over 
using partitions for RAID, but that's outside the scope of this post ;-)

Where do you think that important things, such as 'how to organize 
failover' and questions like 'do I benefit from putting swap on a RAID 
block device', should be documented? Is it the currently unreachable 
raid.wiki.kernel.org? Would it be better to provide the info that leads 
to the answers in the mdadm manual so that it is always available?

Are there any sources you would recommend reading if someone is 
interested in how mdadm/software-RAID 'works'? I'm not sure if RAID has 
an actual spec somewhere on which mdadm is based.

Looking forward to your replies and maybe a conversation leading to 
improvement where necessary :-)

Kind regards,
Martijn


* Re: All kinds of things on RAID/mdadm
  2011-11-07 12:46 All kinds of things on RAID/mdadm Martijn
@ 2011-11-07 13:05 ` Miles Fidelman
  2011-11-09 18:50   ` Keith Keller
  2011-11-09 21:56 ` NeilBrown
  1 sibling, 1 reply; 4+ messages in thread
From: Miles Fidelman @ 2011-11-07 13:05 UTC (permalink / raw)
  To: linux-raid

Martijn wrote:
>
> Where do you think that important things, such as 'how to organize 
> failover' and questions like 'do I benefit from putting swap on a RAID 
> block device', should be documented? Is it the currently unreachable 
> raid.wiki.kernel.org? Would it be better to provide the info that 
> leads to the answers in the mdadm manual so that it is always available?
>
> Are there any sources you would recommend reading if someone is 
> interested in how mdadm/software-RAID 'works'? I'm not sure if RAID 
> has an actual spec somewhere on which mdadm is based.
>

I'd like to echo this.  The unavailability of raid.wiki.kernel.org (as 
well as the rest of *.wiki.kernel.org) is a rather critical outage.

I've been hard pressed to find any other definitive sources of 
information about md, btrfs, etc.

Anybody have some suggestions?

Miles Fidelman


-- 
In theory, there is no difference between theory and practice.
In<fnord>  practice, there is.   .... Yogi Berra




* Re: All kinds of things on RAID/mdadm
  2011-11-07 13:05 ` Miles Fidelman
@ 2011-11-09 18:50   ` Keith Keller
  0 siblings, 0 replies; 4+ messages in thread
From: Keith Keller @ 2011-11-09 18:50 UTC (permalink / raw)
  To: linux-raid

On 2011-11-07, Miles Fidelman <mfidelman@meetinghouse.net> wrote:
>
> I'd like to echo this.  The unavailability of raid.wiki.kernel.org (as 
> well as the rest of *.wiki.kernel.org) is a rather critical outage.

As of this morning, raid.wiki.kernel.org seems to be back.

I've been looking it over, from the standpoint of a month-long user of
mdraid, and have found some minor points I'd like to change.  Is there
an accepted process for making modifications, or can anyone simply
create an account and update the wiki?  (In particular, the
Reconstruction page talks about powering down a system, but should also
mention hot-swap as an alternative; also, it mentions raidhotadd, which
IIRC is no longer part of the md tools; I believe mdadm /dev/mdX --add
/dev/sdXX is now the correct syntax?)
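
As a rough illustration of the syntax Keith mentions, the sketch below only
builds and prints mdadm manage-mode command lines for swapping out a failed
member; it does not execute anything. The function name and the device paths
/dev/md0, /dev/sdb1 and /dev/sdc1 are placeholders invented for this example,
and the fail/remove/add order is a common convention, not something stated in
this thread.

```python
import shlex


def replacement_commands(array, failed, new):
    """Return argv lists for swapping a failed member of an md array.

    Nothing is executed here; the caller decides whether to run them.
    All device paths passed in are hypothetical placeholders.
    """
    return [
        ["mdadm", array, "--fail", failed],    # mark the old member faulty
        ["mdadm", array, "--remove", failed],  # detach it from the array
        ["mdadm", array, "--add", new],        # add the replacement device
    ]


if __name__ == "__main__":
    for argv in replacement_commands("/dev/md0", "/dev/sdb1", "/dev/sdc1"):
        # Quote each argument so the printed line is safe to paste in a shell.
        print(" ".join(shlex.quote(arg) for arg in argv))
```

Printing the commands instead of running them keeps the example harmless on a
machine that has no md arrays, and makes it easy to review each step first.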

--keith

-- 
kkeller@wombat.san-francisco.ca.us


* Re: All kinds of things on RAID/mdadm
  2011-11-07 12:46 All kinds of things on RAID/mdadm Martijn
  2011-11-07 13:05 ` Miles Fidelman
@ 2011-11-09 21:56 ` NeilBrown
  1 sibling, 0 replies; 4+ messages in thread
From: NeilBrown @ 2011-11-09 21:56 UTC (permalink / raw)
  To: mailinglist; +Cc: linux-raid

On Mon, 07 Nov 2011 13:46:56 +0100 Martijn <mailinglist@mindconnect.nl> wrote:

> Searching for possible states of an array, I discovered that there are 
> all sorts of combinations for states. The basics are clean, degraded and 
> dirty. But what does 'clean, no-errors' mean? And 'dirty, no-errors'? 
> Searching through the code, I even found a point where a label 'Dirty 
> State' could be listed as 'clean'. Is it a good idea to add a list with 
> explanations of possible states, basic and exotic, to the manual? Much 
> in the same way all monitor events are listed. I can imagine not 
> everyone knowing the difference between dirty and degraded for example. 
> It's a basic thing that is skipped in most cases.

The basics are really:

 - clean or dirty  (where 'dirty' is sometimes called 'active')
 - optimal or degraded or failed

These two sets are independent, though a RAID4,5,6 array which is both dirty
and degraded cannot be started without "--force" as there could be corruption.

Where are you getting the "no-errors" messages from?
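
The two independent sets of array states described above, and the "--force"
rule for RAID4/5/6, can be sketched as a small model. This is only an
illustration of the reply: the class names, the level strings and the
needs_force() helper are made up for this example and are not taken from
mdadm or the kernel.

```python
from enum import Enum
from itertools import product


class Consistency(Enum):
    CLEAN = "clean"
    DIRTY = "dirty"  # sometimes called "active"


class Health(Enum):
    OPTIMAL = "optimal"
    DEGRADED = "degraded"
    FAILED = "failed"


# Levels for which dirty + degraded needs "--force", per the reply above.
PARITY_LEVELS = {"raid4", "raid5", "raid6"}


def needs_force(level, consistency, health):
    """True if, per the rule above, the array cannot start without --force."""
    return (
        level in PARITY_LEVELS
        and consistency is Consistency.DIRTY
        and health is Health.DEGRADED
    )


if __name__ == "__main__":
    # The two axes are independent, so the model allows every pairing.
    for consistency, health in product(Consistency, Health):
        flag = needs_force("raid5", consistency, health)
        print(f"{consistency.value:5} {health.value:8} needs --force: {flag}")
```

Keeping the two axes as separate types makes the independence explicit: a
state such as "clean, degraded" is simply one value from each set.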


> 
> Perhaps the same could be done for individual disk states. Of course we 
> all know "active sync", and based on what I've seen elsewhere the states 
> "removed", "spare" and "faulty spare" exist. But having a list of all 
> possible states would help prepare documentation for the things we 
> really don't want to happen. Takes off the pressure a bit :-)

I don't think "faulty spare" is a meaningful state. Where did you see that?

A device can be:
 faulty or missing or removed
 spare
 active, but not yet fully in-sync
 active, sync
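
The device states listed above can be tied back to the optimal/degraded/failed
axis with a small sketch. Everything here is an assumption made for
illustration: the enum names, the array_health() helper, and the max_failures
parameter (how many members the RAID level can lose, e.g. 1 for RAID5 and 2
for RAID6) are not taken from mdadm or the kernel.

```python
from enum import Enum


class DeviceState(Enum):
    FAULTY = "faulty"                  # stands in for faulty/missing/removed
    SPARE = "spare"
    ACTIVE_RECOVERING = "active, not yet fully in-sync"
    ACTIVE_SYNC = "active sync"


def array_health(states, raid_disks, max_failures):
    """Classify an array from its member states (illustrative model only).

    raid_disks   -- number of members the array is meant to have
    max_failures -- members the level can lose and keep running (assumed input)
    """
    # Only fully in-sync members count; spares and recovering ones do not.
    in_sync = sum(1 for s in states if s is DeviceState.ACTIVE_SYNC)
    missing = raid_disks - in_sync
    if missing <= 0:
        return "optimal"
    if missing <= max_failures:
        return "degraded"
    return "failed"


if __name__ == "__main__":
    # A two-disk mirror with one faulty member and a spare standing by.
    mirror = [DeviceState.ACTIVE_SYNC, DeviceState.FAULTY, DeviceState.SPARE]
    print(array_health(mirror, raid_disks=2, max_failures=1))
```

In this model a spare or a still-recovering member does not make the array
optimal again; only a member that has finished syncing does.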

> 
> I'm not voting for mdadm to become a tool that even babies can use to 
> create their arrays, but with this info others may be able to act with 
> confidence based on their own knowledge, instead of searching for articles 
> on the web that happen to list the state of the array they're searching 
> for. A lot of those articles do not teach anything. They just make you 
> brainlessly copy and paste commands and fill in the block device 
> files. Some of them are just plain wrong and may result in data being lost.
> I also vote on articles giving partitionable devices a good kick over 
> using partitions for RAID, but that's outside the scope of this post ;-)
> 
> Where do you think that important things, such as 'how to organize 
> failover' and questions like 'do I benefit from putting swap on a RAID 
> block device', should be documented? Is it the currently unreachable 
> raid.wiki.kernel.org? Would it be better to provide the info that leads 
> to the answers in the mdadm manual so that it is always available?

I'm not sure man pages are the right place for some of this, though there is
certainly room for improvement in the man pages and I'm happy to take
contributions.

If we wanted a document that talked about best-practice and swap and so forth
I would suggest an 'info' document would be the right sort of format.


> 
> Are there any sources you would recommend reading if someone is 
> interested in how mdadm/software-RAID 'works'? I'm not sure if RAID has 
> an actual spec somewhere on which mdadm is based.

The wikipedia entry isn't bad.

> 
> Looking forward to your replies and maybe a conversation leading to 
> improvement where necessary :-)
> 
> Kind regards,
> Martijn

NeilBrown

