linux-raid.vger.kernel.org archive mirror
* mdadm mail option configuration
@ 2002-05-28 16:04   ` Jeff Hill
  2002-05-29 10:46     ` Neil Brown
  0 siblings, 1 reply; 11+ messages in thread
From: Jeff Hill @ 2002-05-28 16:04 UTC (permalink / raw)
  To: linux-raid

I'm having problems configuring mdadm to send e-mail alerts. From past list 
discussions I configured it with:

devo:/#  cat /etc/mdadm.conf

DEVICE /dev/hd*[0-9] /dev/sd*[0-9]
ARRAY /dev/md1 level=raid1 num-devices=2 
UUID=8e0114bb:e5b3a608:048bed2d:5fe33724
ARRAY /dev/md2 level=raid1 num-devices=2 
UUID=434dd79d:0c046e9b:867e8d98:86bdb5da
ARRAY /dev/md3 level=raid1 num-devices=2 
UUID=9a9a2078:44c22d83:25a32457:db7e3783
ARRAY /dev/md5 level=raid1 num-devices=2 
UUID=e3254469:34147651:c9298800:5a84cd06
ARRAY /dev/md6 level=raid1 num-devices=2 
UUID=80c78ac7:c0f4442a:3dd6517d:57edc79e
MAILADDR jhill@hrpost.com
PROGRAM /usr/local/sbin/raidalert

------
Then, I just have an init script that runs:

	/sbin/mdadm -Fs --delay=600 &

I thought having that running would be right for monitoring and e-mailing, 
but I've tried a few variations without success. When one of the component 
devices fails and is booted out of the array, no alert is mailed. The 
raidalert script is one I picked up from past discussions on the list, and 
I wouldn't think it would be causing any problems:

devo:/#  cat /usr/local/sbin/raidalert
#!/bin/bash
# Log Raid Problems
logger -p kern.crit -t RAID $*

--------

I know this should be simple. I must be missing the obvious, but hours of 
tinkering and searching the list archives have gotten me nowhere.

Thanks for any assistance,

Jeff Hill



* Re: mdadm mail option configuration
  2002-05-28 16:04   ` mdadm mail option configuration Jeff Hill
@ 2002-05-29 10:46     ` Neil Brown
  2002-05-29 12:29       ` Jeff Hill
  2002-05-29 14:04       ` Danilo Godec
  0 siblings, 2 replies; 11+ messages in thread
From: Neil Brown @ 2002-05-29 10:46 UTC (permalink / raw)
  To: Jeff Hill; +Cc: linux-raid

On Tuesday May 28, jhill@hronline.com wrote:
> I'm having problems configuring mdadm to send e-mail alerts. From past list 
> discussions I configured it with:
> 
> devo:/#  cat /etc/mdadm.conf
> 
> DEVICE /dev/hd*[0-9] /dev/sd*[0-9]
> ARRAY /dev/md1 level=raid1 num-devices=2 
> UUID=8e0114bb:e5b3a608:048bed2d:5fe33724
> ARRAY /dev/md2 level=raid1 num-devices=2 
> UUID=434dd79d:0c046e9b:867e8d98:86bdb5da
> ARRAY /dev/md3 level=raid1 num-devices=2 
> UUID=9a9a2078:44c22d83:25a32457:db7e3783
> ARRAY /dev/md5 level=raid1 num-devices=2 
> UUID=e3254469:34147651:c9298800:5a84cd06
> ARRAY /dev/md6 level=raid1 num-devices=2 
> UUID=80c78ac7:c0f4442a:3dd6517d:57edc79e
> MAILADDR jhill@hrpost.com
> PROGRAM /usr/local/sbin/raidalert

You need white-space in front of the UUID= bits, as they are a continuation
of the previous line... or did you suffer a mailer word wrap??
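
To illustrate the continuation rule described above: a minimal sketch that
flags lines in an mdadm.conf-style file which start with "UUID=" in column 0,
i.e. lines that were probably meant to continue the preceding ARRAY line.
The name check_conf is made up for this example; it is not part of mdadm and
it checks nothing else about the file.

```shell
#!/bin/sh
# check_conf FILE: hypothetical helper, not part of mdadm.
# Prints a warning for every line that starts with "UUID=" in column 0
# and returns non-zero if any such line is found.
check_conf() {
    awk '
        /^UUID=/ {
            printf "line %d: UUID= needs leading white-space to continue the ARRAY line\n", NR
            bad = 1
        }
        END { exit bad + 0 }
    ' "$1"
}
```

It would be run as "check_conf /etc/mdadm.conf"; a file whose UUID= words are
indented (or sit on the ARRAY line itself) produces no output.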

Do you have a "/usr/lib/sendmail" or is it somewhere else on your
system?
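
The question above suggests that mdadm hands its alert mail to
/usr/lib/sendmail. A small sketch for checking which of a few candidate
paths actually resolves to an executable follows. The helper name
first_executable and the second candidate path are assumptions for
illustration only.

```shell
#!/bin/sh
# first_executable PATH...: hypothetical helper.
# Prints the first argument that is an executable regular file
# (symlinks are followed); returns non-zero if none qualifies.
first_executable() {
    for candidate in "$@"; do
        if [ -f "$candidate" ] && [ -x "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

# Example:
#   first_executable /usr/lib/sendmail /usr/sbin/sendmail ||
#       echo "no sendmail binary found"
```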

> 
> ------
> Then, I just have an init script that runs:
> 
> 	/sbin/mdadm -Fs --delay=600 &

Why 600 (10 minutes)?? I would suggest 60 seconds for normal operation
and 1 second for testing.

NeilBrown


* Re: mdadm mail option configuration
  2002-05-29 10:46     ` Neil Brown
@ 2002-05-29 12:29       ` Jeff Hill
  2002-05-29 22:53         ` Neil Brown
  2002-05-29 14:04       ` Danilo Godec
  1 sibling, 1 reply; 11+ messages in thread
From: Jeff Hill @ 2002-05-29 12:29 UTC (permalink / raw)
  To: Neil Brown; +Cc: linux-raid

At 08:46 PM 29/05/2002 +1000, Neil Brown wrote:

>You need white-space in front of the UUID= bits, as they are a continuation
>of the previous line... or did you suffer a mailer word wrap??

Yep, space was there (file generated by mdadm [using 1.0.0 src]). 
(Not used to using Windoze Eudora word wrap ;-) (haven't had time 
to install Debian):

DEVICE /dev/hd*[0-9] /dev/sd*[0-9]
ARRAY /dev/md1 level=raid1 num-devices=2 UUID=8e0114bb:e5b3a608:048bed2d:5fe33724
ARRAY /dev/md2 level=raid1 num-devices=2 UUID=434dd79d:0c046e9b:867e8d98:86bdb5da
ARRAY /dev/md3 level=raid1 num-devices=2 UUID=9a9a2078:44c22d83:25a32457:db7e3783
ARRAY /dev/md5 level=raid1 num-devices=2 UUID=e3254469:34147651:c9298800:5a84cd06
ARRAY /dev/md6 level=raid1 num-devices=2 UUID=80c78ac7:c0f4442a:3dd6517d:57edc79e
MAILADDR jhill@hrpost.com
PROGRAM /usr/local/sbin/raidalert


>Do you have a "/usr/lib/sendmail" or is it somewhere else on your
>system?

/usr/lib/sendmail is a link to /usr/sbin/sendmail (qmail)

>> ------
>> Then, I just have an init script that runs:
>> 
>>       /sbin/mdadm -Fs --delay=600 &
>
>Why 600 (10 minutes)?? I would suggest 60 seconds for normal operation
>and 1 second for testing.

Okay, changed. My choice was arbitrary, just assumed it would be 
reasonable -- I'm moving it on to a heavily loaded production server 
and the e-mail isn't constantly monitored so I assumed . . . .

Appreciate the help. As my config is apparently okay, I'll look 
elsewhere for the problem. 

Thanks for a really good program. mdadm seems like a big step up 
for Linux software raid.


>NeilBrown

Jeff Hill



* Re: mdadm mail option configuration
  2002-05-29 10:46     ` Neil Brown
  2002-05-29 12:29       ` Jeff Hill
@ 2002-05-29 14:04       ` Danilo Godec
  2002-05-29 23:10         ` Neil Brown
  1 sibling, 1 reply; 11+ messages in thread
From: Danilo Godec @ 2002-05-29 14:04 UTC (permalink / raw)
  To: Neil Brown; +Cc: Jeff Hill, linux-raid

On Wed, 29 May 2002, Neil Brown wrote:

> > Then, I just have an init script that runs:
> >
> > 	/sbin/mdadm -Fs --delay=600 &
>
> Why 600 (10 minutes)?? I would suggest 60 seconds for normal operation
> and 1 second for testing.

I've tried this and maybe I'm missing something. I've set a 5 second
interval for checking and I only got one mail - notifying me about
a failure.

Now I'm not sure whether mdadm is supposed to send mail on every check? Or
is it sending mail just when a BAD thing happens?

  D.

PS: The PROGRAM was started more often (I think three times).




* Re: mdadm mail option configuration
  2002-05-29 12:29       ` Jeff Hill
@ 2002-05-29 22:53         ` Neil Brown
  2002-05-30 14:11           ` Jeff Hill
  0 siblings, 1 reply; 11+ messages in thread
From: Neil Brown @ 2002-05-29 22:53 UTC (permalink / raw)
  To: Jeff Hill; +Cc: linux-raid

On Wednesday May 29, jhill@hrpost.com wrote:
> >> Then, I just have an init script that runs:
> >> 
> >>       /sbin/mdadm -Fs --delay=600 &
> >
> >Why 600 (10 minutes)?? I would suggest 60 seconds for normal operation
> >and 1 second for testing.
> 
> Okay, changed. My choice was arbitrary, just assumed it would be 
> reasonable -- I'm moving it on to a heavily loaded production server 
> and the e-mail isn't constantly monitored so I assumed . . . .

You assumed what?  Are you thinking that it will send mail every $delay
seconds if there is a problem, and you didn't want to be spammed?

Mdadm only sends mail when it notices a drive fail, not when it
notices that a drive is failed.
i.e. if on one poll the drive is working, and on the next poll the
drive is not working, then it sends mail saying "The drive just
failed".
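
The behaviour described above is edge-triggered: the alert depends on the
change between two polls, not on the current state alone. A minimal sketch
of that idea follows. It is not mdadm's actual code; the helper name
should_alert and the state strings "ok" and "failed" are invented for the
example.

```shell
#!/bin/sh
# should_alert PREV CUR: hypothetical helper illustrating edge-triggered alerts.
# PREV and CUR are the drive state seen on the previous and the current poll
# ("ok" or "failed"). Succeeds only on the ok -> failed transition.
should_alert() {
    [ "$1" = "ok" ] && [ "$2" = "failed" ]
}

# Walk through a sequence of polls; only the third one raises an alert.
prev=ok
for cur in ok ok failed failed failed; do
    if should_alert "$prev" "$cur"; then
        echo "alert: the drive just failed"
    fi
    prev=$cur
done
```

One consequence is that a monitor started after the drive has already failed
never sees the transition, so under this model it stays silent.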

So when you were testing, was "mdadm -Fs" actually running at the
moment when you simulated a drive failure?
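
One way to answer that question on a live system is to look for a monitor
process in the process list. The sketch below reads a listing of command
lines on stdin; the helper name monitor_running is made up, and the pattern
only covers the -F, --follow and --monitor spellings of monitor mode.

```shell
#!/bin/sh
# monitor_running: hypothetical helper.
# Reads one command line per line on stdin and succeeds if any of them
# looks like mdadm started in monitor mode (-F, --follow or --monitor).
monitor_running() {
    grep -Eq '(^|/)mdadm( .*)? (-F|--follow|--monitor)'
}

# Example:
#   ps -eo args | monitor_running && echo "mdadm monitor is running"
```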

NeilBrown


* Re: mdadm mail option configuration
  2002-05-29 14:04       ` Danilo Godec
@ 2002-05-29 23:10         ` Neil Brown
  2002-05-30  6:25           ` Danilo Godec
  2002-05-31 14:09           ` Jakob Østergaard
  0 siblings, 2 replies; 11+ messages in thread
From: Neil Brown @ 2002-05-29 23:10 UTC (permalink / raw)
  To: Danilo Godec; +Cc: Jeff Hill, linux-raid

On Wednesday May 29, danci@agenda.si wrote:
> On Wed, 29 May 2002, Neil Brown wrote:
> 
> > > Then, I just have an init script that runs:
> > >
> > > 	/sbin/mdadm -Fs --delay=600 &
> >
> > Why 600 (10 minutes)?? I would suggest 60 seconds for normal operation
> > and 1 second for testing.
> 
> I've tried this and maybe I'm missing something. I've set a 5 second
> interval for checking and I only got one mail - notifying me about
> a failure.

Yes, that's right.  One failure, one email.  I'm not in the business
of spam.

> 
> Now I'm not sure whether mdadm is supposed to send mail on every check? Or
> is it sending mail just when a BAD thing happens?

Exactly.

It has occurred to me that it could be useful to send mail at startup
if there appear to be any abnormalities, but I think I would prefer
that sort of functionality to be external.  A sysadmin might want that
mail at reboot, or every night, or every week, or never.  A simple:
  grep -s "$magic_pattern" /proc/mdstat > /dev/null &&
       mail -s "Raid problem on `hostname`" root <<-END
	Possible RAID problem, please check.
	`hostname`
	`cat /proc/mdstat`
	END

is all that is needed.
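
The value of $magic_pattern is left open above. One possible choice, assuming
/proc/mdstat marks a missing member with an underscore in the status field
(for example [U_] instead of [UU]), is sketched below. The helper name
mdstat_degraded is made up, and the commented-out mail command assumes a
working mail(1) on the system.

```shell
#!/bin/sh
# mdstat_degraded [FILE]: hypothetical helper.
# Succeeds if FILE (default /proc/mdstat) contains a member status field
# such as [U_] or [_U], i.e. at least one "_" between the brackets.
mdstat_degraded() {
    grep -Eq '\[[U_]*_[U_]*\]' "${1:-/proc/mdstat}"
}

# Intended use, e.g. from cron or a boot script:
#   if mdstat_degraded; then
#       { echo "Possible RAID problem, please check."
#         hostname
#         cat /proc/mdstat; } | mail -s "Raid problem on $(hostname)" root
#   fi
```

Taking the file name as an optional argument keeps the check easy to try
against a saved copy of /proc/mdstat before it is relied on.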
> 
>   D.
> 
> PS: The PROGRAM was started more often (I think three times).

That's quite possible. 
   Once for  Fail
   Once for  RebuildStarted
   maybe some RebuildNN
   Once for  SpareActive
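
Since the PROGRAM is run once per event, a raidalert-style script can treat
the events differently. The sketch below assumes that the event name arrives
as the first argument and the array device as the second, which is how the
mdadm manual page describes it; check this against the version in use. The
helper name event_priority and the choice of priorities are invented for the
example.

```shell
#!/bin/sh
# event_priority EVENT: hypothetical helper for a raidalert-style script.
# Maps an mdadm event name to a syslog facility.priority pair.
event_priority() {
    case "$1" in
        Fail)                          echo kern.crit ;;
        RebuildStarted|Rebuild[0-9]*)  echo kern.notice ;;
        SpareActive)                   echo kern.info ;;
        *)                             echo kern.warning ;;
    esac
}

# In the script named by PROGRAM one might then write:
#   logger -p "$(event_priority "$1")" -t RAID "$@"
```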

NeilBrown


* Re: mdadm mail option configuration
  2002-05-29 23:10         ` Neil Brown
@ 2002-05-30  6:25           ` Danilo Godec
  2002-05-31 14:09           ` Jakob Østergaard
  1 sibling, 0 replies; 11+ messages in thread
From: Danilo Godec @ 2002-05-30  6:25 UTC (permalink / raw)
  To: linux-raid

On Thu, 30 May 2002, Neil Brown wrote:

> > I've tried this and maybe I'm missing something. I've set a 5 second
> > interval for checking and I only got one mail - notifying me about
> > a failure.
>
> Yes, that's right.  One failure, one email.  I'm not in the business
> of spam.

I see, I thought it would mail on every status change - as in 'rebuild
started', 'spare active', etc.

> > Now I'm not sure whether mdadm is supposed to send mail on every check? Or
> > is it sending mail just when a BAD thing happens?
>
> Exactly.

OK; that's fine.

> > PS: The PROGRAM was started more often (I think three times).
>
> That's quite possible.
>    Once for  Fail
>    Once for  RebuildStarted
>    maybe some RebuildNN
>    Once for  SpareActive

Yup, that's exactly how it was. :)

Thanks, D.



* Re: mdadm mail option configuration
  2002-05-29 22:53         ` Neil Brown
@ 2002-05-30 14:11           ` Jeff Hill
  0 siblings, 0 replies; 11+ messages in thread
From: Jeff Hill @ 2002-05-30 14:11 UTC (permalink / raw)
  To: Neil Brown; +Cc: linux-raid

At 08:53 AM 30/05/2002 +1000, Neil Brown wrote:
>On Wednesday May 29, jhill@hrpost.com wrote:
> > >> Then, I just have an init script that runs:
> > >>       /sbin/mdadm -Fs --delay=600 &
> > >Why 600 (10 minutes)?? I would suggest 60 seconds for normal operation
> > >and 1 second for testing.
> > Okay, changed. My choice was arbitrary, just assumed it would be
> > reasonable -- I'm moving it on to a heavily loaded production server
> > and the e-mail isn't constantly monitored so I assumed . . . .
>
>You assumed what?  Are you thinking that it will send mail every $delay
>seconds if there is a problem, and you didn't want to be spammed?

No, I assumed there was no reason to add additional polling. If there is no 
one around to read the e-mail, there seemed no reason to add to the number 
of processes being run on a heavily loaded server. I expect the load of 
polling is minuscule, but it all adds something.


>Mdadm only sends mail when it notices a drive fail, not when it
>notices that a drive is failed.
>i.e. if on one poll the drive is working, and on the next poll the
>drive is not working, then it sends mail saying "The drive just
>failed".
>
>So when you were testing, was "mdadm -Fs" actually running at the
>moment when you simulated a drive failure?

I certainly thought it was, but unfortunately I wasn't simulating. When I 
saw that all of the partitions on sdb had been kicked out of the array, I 
saw that mdadm -Fs was running, but the drive could have failed before a 
reboot when mdadm might not have been running. When you confirmed that my 
configuration was generally correct, I assumed that mdadm wasn't running or 
that I had done something else wrong. When I can get the system running 
again (lilo is hosed; looking at grub for boot raid?), I'll test and 
promise to e-mail back.

Regards,

Jeff Hill



* Re: mdadm mail option configuration
  2002-05-29 23:10         ` Neil Brown
  2002-05-30  6:25           ` Danilo Godec
@ 2002-05-31 14:09           ` Jakob Østergaard
  1 sibling, 0 replies; 11+ messages in thread
From: Jakob Østergaard @ 2002-05-31 14:09 UTC (permalink / raw)
  To: Neil Brown; +Cc: Danilo Godec, Jeff Hill, linux-raid

On Thu, May 30, 2002 at 09:10:41AM +1000, Neil Brown wrote:
> On Wednesday May 29, danci@agenda.si wrote:
> > On Wed, 29 May 2002, Neil Brown wrote:
> > 
> > > > Then, I just have an init script that runs:
> > > >
> > > > 	/sbin/mdadm -Fs --delay=600 &
> > >
> > > Why 600 (10 minutes)?? I would suggest 60 seconds for normal operation
> > > and 1 second for testing.
> > 
> > I've tried this and maybe I'm missing something. I've set a 5 second
> > interval for checking and I only got one mail - notifying me about
> > a failure.
> 
> Yes, that's right.  One failure, one email.  I'm not in the business
> of spam.

What we do with SysOrb (blatant plug: http://sysorb.com) is to send out an
e-mail immediately when the RAID degrades, and then a new mail every N seconds.
The RAID may be checked every 10 seconds, and the user may configure N to be,
say, 1800 seconds.  So the failure is detected almost immediately, while the
alert will only be sent every half hour for example.
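
The scheme described above separates the check interval from the reminder
interval. A minimal sketch of that throttling logic follows. It is not
SysOrb's implementation; the helper name reminder_due and the simulated
clock are assumptions made for the example.

```shell
#!/bin/sh
# reminder_due NOW LAST_SENT INTERVAL: hypothetical helper.
# Succeeds if no alert has been sent yet (LAST_SENT empty) or if at least
# INTERVAL seconds have passed since the last one.
reminder_due() {
    now=$1 last_sent=$2 interval=$3
    [ -z "$last_sent" ] || [ $((now - last_sent)) -ge "$interval" ]
}

# Simulate one hour of a degraded array: check every 10 seconds,
# remind at most every 1800 seconds.
last=""
t=0
while [ "$t" -le 3600 ]; do
    if reminder_due "$t" "$last" 1800; then
        echo "t=$t: RAID degraded"
        last=$t
    fi
    t=$((t + 10))
done
```

In this simulation the first alert goes out at the first check and the
reminders follow at t=1800 and t=3600, although the array is checked 361
times.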

We've found that this repetition is useful as a reminder.   It also motivates
people to either fix the problem, or schedule downtime for the check saying
that it will be down for another 24 hours for example.

Once you are administering more than a few machines, one alert can get lost in
the occasional heap.

...
> It has occurred to me that it could be useful to send mail at startup
> if there appear to be any abnormalities, but I think I would prefer
> that sort of functionality to be external.  A sysadmin might want that
> mail at reboot, or every night, or every week, or never.  A simple:
>   grep -s "$magic_pattern" /proc/mdstat > /dev/null &&
>        mail -s "Raid problem on `hostname`" root <<-END
> 	Possible RAID problem, please check.
> 	`hostname`
> 	`cat /proc/mdstat`
> 	END
> 
> is all that is needed.

In general, I think that these small scripts are really nice and all, if that
is "good enough" for you.   Once they are no longer good enough, start looking
into real monitoring systems.

NetSaint could be hacked into supporting RAID I'm sure.   And if you want to
save the hackery and can accept a commercial solution, well, then I plugged one
just above  ;)

-- 
................................................................
:   jakob@unthought.net   : And I see the elder races,         :
:.........................: putrid forms of man                :
:   Jakob Østergaard      : See him rise and claim the earth,  :
:        OZ9ABN           : his downfall is at hand.           :
:.........................:............{Konkhra}...............:

* RE: mdadm mail option configuration
@ 2002-06-01  1:43 Haofeng Kou
  2002-06-01 11:56 ` Jakob Østergaard
  0 siblings, 1 reply; 11+ messages in thread
From: Haofeng Kou @ 2002-06-01  1:43 UTC (permalink / raw)
  To: linux-raid

I have finished a Linux driver and want to test its full feature set.
Does anybody know of tools for this kind of testing? It is a Linux SCSI
driver for a RAID controller.

Regards,
haofeng


* Re: mdadm mail option configuration
  2002-06-01  1:43 Haofeng Kou
@ 2002-06-01 11:56 ` Jakob Østergaard
  0 siblings, 0 replies; 11+ messages in thread
From: Jakob Østergaard @ 2002-06-01 11:56 UTC (permalink / raw)
  To: Haofeng Kou; +Cc: linux-raid

On Fri, May 31, 2002 at 06:43:26PM -0700, Haofeng Kou wrote:
> I have finished a Linux driver and want to test its full feature set.
> Does anybody know of tools for this kind of testing? It is a Linux SCSI
> driver for a RAID controller.

You could run some disk benchmarks (tiotest, bonnie etc.)

The best test is of course when you get the driver into the kernel (it is
GPL right?) - then people will do all the testing for you   :)

-- 
................................................................
:   jakob@unthought.net   : And I see the elder races,         :
:.........................: putrid forms of man                :
:   Jakob Østergaard      : See him rise and claim the earth,  :
:        OZ9ABN           : his downfall is at hand.           :
:.........................:............{Konkhra}...............:
