public inbox for linux-kernel@vger.kernel.org
From: Nick Warne <nick@linicks.net>
To: linux-kernel@vger.kernel.org
Subject: Black Friday
Date: Fri, 15 Oct 2004 20:09:49 +0100
Message-ID: <200410152009.49873.nick@linicks.net>

Hi all,

Here is a story that happened to me today, and a slight warning for any
sysadmins that read here.

At 9:50 am on the 1st Oct (2 weeks ago), the main file server at work crashed
big time (a Snapserver 4100).  It trashed the RAID (240GB - 170GB usable [50GB
free]), but the OS (BSD based, I believe) is pretty good and rebuilt it -
took 8 hours - no data lost out of 120+ GB.

I also have a second Snapserver that runs Quantum's own synchronisation
software, so that at any point in time, server 'b' is an exact copy of server
'a' - this I set to sync at 1:00 am each day.  The idea being you can
recover/swap from server to server in real time.

The first crash exposed problems I had never thought of in our disaster
recovery options.  The Snapserver synchronisation software doesn't 'sync'
directory share or file permissions - just the actual binary data.  So the
'copy' server 'b' is not identical to server 'a'.

OK, so since that 'black Friday' two weeks ago, I hacked a way to get the file
permissions from box 'a' to box 'b' replicated manually, so at least a quick
swap over from box 'a' to box 'b' would be possible, the change over would be
invisible to the users, and all file/share permissions would be correct.

Today at 9:50 Snapserver box 'a' crashed again.  I suspected that it was dying
now, and the better move would be to push everybody to the backup box 'b'
and retire the dodgy box 'a' until I could replace it.

Except box 'b' was AWOL as well (I didn't scream exactly...).  That crashed 
too at 9:50 (yes, in sync with box 'a').  The 'sync' software is bloody 
good ;)

After a lengthy discussion with Snap engineers, it turns out the OS does a
KERNEL PANIC after a certain number of file opens/shares get accessed on the
OS version I was running.  It only does this once the server reaches whatever
threshold triggers it - i.e. on a growing, expanding fileserver - they didn't
really tell me what it is, nor elaborate.

But because only one server crashed two weeks ago, I put it down to the
gremlins... but 'a' had not sync'ed with 'b' yet, which is why only one
crashed then, and as both have been sync'ed since, today both of them did.

Two weeks later, both hit the same threshold that causes the kernel crash.

Snap guys gave me new OS firmware to flash - I have done one server, and will
do the main server tomorrow.

The gist of this mail?  Never, ever think you are safe.

Nick
P.S. I still have the DLT backup anyway - but users want to USE NOW, not wait :(

-- 
"When you're chewing on life's gristle,
Don't grumble, Give a whistle..."
