From: Neil Brown <neilb@suse.de>
To: jmolina@tgen.org
Cc: linux-raid@vger.kernel.org
Subject: Re: Failed RAID5 array grow after reboot interruption; mdadm: Failed to restore critical section for reshape, sorry.
Date: Thu, 19 Jun 2008 14:25:34 +1000
Message-ID: <18521.57278.92951.935685@notabene.brown>
In-Reply-To: message from jmolina@tgen.org on Monday June 16

On Monday June 16, jmolina@tgen.org wrote:
> 
> During the grow process, this system slowly went unresponsive, and I
> was forced to reboot it after about 30 hours.  At first I was not
> able to run any mdadm commands to see the status of the grow (about
> 30 minutes after starting), then I was not able to log in with a new
> shell, then after about 24 hours I was able to use a previously
> opened shell to see that tons of CRON jobs and other work had backed
> up, however during all of this time the system was still acting as
> an IP router doing NAT.  Finally, after about 30 hours, the dhcpd
> daemon stopped giving out leases and then finally traffic stopped
> and I could not ping the host any longer (not a lease problem). 

This is a bit of a worry.  It sounds like the system was running out
of memory.  It would seem to suggest that either the reshape process
was leaking memory, or that it was blocking writeout somehow so that
other memory wasn't getting freed.
However I cannot measure it doing either of these things.

If you can reproduce this, I'd love to see the content of
   /proc/meminfo
   /proc/slabinfo
   /proc/slab_allocators

at 5-minute intervals.  But I don't expect you'll want to try that
experiment :-)
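
For anyone who does want to try it, a minimal sketch of such a logger
follows.  It is not from the original message: the function names, the
output directory and the file naming are all assumptions made for
illustration.  It copies the three files into timestamped snapshots
every 300 seconds and skips any file that cannot be read
(/proc/slab_allocators may not exist on every kernel, and
/proc/slabinfo may need root).

```python
import time
from pathlib import Path

# The three files named above.
PROC_FILES = ("/proc/meminfo", "/proc/slabinfo", "/proc/slab_allocators")


def snapshot(out_dir, files=PROC_FILES, stamp=None):
    """Copy each readable file to out_dir/<name>.<stamp>; return the new paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    stamp = stamp or time.strftime("%Y%m%d-%H%M%S")
    written = []
    for src in files:
        try:
            data = Path(src).read_bytes()
        except OSError:
            # Missing or unreadable file: skip it rather than abort the run.
            continue
        dest = out_dir / f"{Path(src).name}.{stamp}"
        dest.write_bytes(data)
        written.append(dest)
    return written


def run(out_dir, interval_s=300, count=None):
    """Take a snapshot every interval_s seconds; count=None means run until killed."""
    taken = 0
    while count is None or taken < count:
        snapshot(out_dir)
        taken += 1
        if count is None or taken < count:
            time.sleep(interval_s)


# Example (hypothetical path): run("/var/tmp/reshape-memlog")
```

It is probably wise to point the output directory at a disk that is
not part of the array being reshaped, so the log survives if writeout
to the array stalls.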


NeilBrown