linux-raid.vger.kernel.org archive mirror
From: Bas van Schaik <bas@tuxes.nl>
To: Neil Brown <neilb@suse.de>
Cc: linux-raid@vger.kernel.org
Subject: Re: raid5 hang on get_active_stripe
Date: Tue, 10 Oct 2006 12:27:51 +0200
Message-ID: <452B75A7.4060705@tuxes.nl>
In-Reply-To: <17706.58512.374925.103556@cse.unsw.edu.au>

Hi all,

Neil Brown wrote:
> On Tuesday October 10, chris@cjx.com wrote:
>   
>> Very happy to. Let me know what you'd like me to do.
>>     
>
> Cool thanks.
> (snip)
>   
I don't know if this is useful information, but I'm encountering the same
problem here, in a totally different setup. I'm using Peter Breuer's
ENBD (you probably know him, since he started a discussion a while ago
about request retries with exponential timeouts and a communication
channel to raid) to import a total of 12 devices from other machines and
combine those disks into 3 RAID5 arrays. Those 3 arrays are combined
in one VG with one LV, running CryptoLoop on top. Last but not least, a
ReiserFS is created on the loopback device. I'm using the Debian Etch
stock 2.6.17 kernel, by the way.

When doing a lot of I/O on the ReiserFS (like a "reiserfsck
--rebuild-tree"), the machine suddenly gets stuck, I think after filling
its memory with buffers. I've been doing a lot of debugging with Peter;
attached you'll find a "ps -axl" with a widened WCHAN column, which shows
that some of the enbd-client processes get stuck in the RAID code. We've
not been able to find out how ENBD gets into the RAID code, but I don't
think that's really relevant right now. Here's the relevant part of ps:

ps ax -o f,uid,pid,ppid,pri,ni,vsz,rss,wchan:30,stat,tty,time,command
(only the relevant rows)

> F   UID   PID  PPID PRI  NI   VSZ  RSS WCHAN                          STAT TT           TIME COMMAND
> (snip)
> 5     0 26523     1  23   0  2140 1052 -                              Ss   ?        00:00:00 enbd-client iss01 1300 -i iss01-hdd -n 2 -e -m -b 4096 -p 30 /dev/ndi
> 5     0 26540     1  23   0  2140 1048 get_active_stripe              Ds   ?        00:00:00 enbd-client iss04 1300 -i iss04-hdd -n 2 -e -m -b 4096 -p 30 /dev/ndl
> 5     0 26552     1  23   0  2140 1044 -                              Ss   ?        00:00:00 enbd-client iss02 1200 -i iss02-hdc5 -n 2 -e -m -b 4096 -p 30 /dev/ndf
> 5     0 26556     1  23   0  2140 1048 -                              Ss   ?        00:00:00 enbd-client iss01 1100 -i iss01-hda5 -n 2 -e -m -b 4096 -p 30 /dev/nda
> 5     0 26561     1  23   0  2140 1052 get_active_stripe              Ds   ?        00:00:00 enbd-client iss02 1100 -i iss02-hda5 -n 2 -e -m -b 4096 -p 30 /dev/ndb
> 5     0 26564     1  23   0  2144 1052 -                              Ss   ?        00:00:00 enbd-client iss03 1200 -i iss03-hdc5 -n 2 -e -m -b 4096 -p 30 /dev/ndg
> 5     0 26568     1  23   0  2144 1052 -                              Ss   ?        00:00:00 enbd-client iss04 1200 -i iss04-hdc5 -n 2 -e -m -b 4096 -p 30 /dev/ndh
> 5     0 26581     1  23   0  2144 1052 -                              Ss   ?        00:00:00 enbd-client iss03 1100 -i iss03-hda5 -n 2 -e -m -b 4096 -p 30 /dev/ndc
> 5     0 26590     1  23   0  2140 1048 -                              Ss   ?        00:00:00 enbd-client iss01 1200 -i iss01-hdc5 -n 2 -e -m -b 4096 -p 30 /dev/nde
> 5     0 26606     1  23   0  2144 1052 -                              Ss   ?        00:00:00 enbd-client iss02 1300 -i iss02-hdd -n 2 -e -m -b 4096 -p 30 /dev/ndj
> 5     0 26614     1  23   0  2144 1052 -                              Ss   ?        00:00:00 enbd-client iss03 1300 -i iss03-hdd -n 2 -e -m -b 4096 -p 30 /dev/ndk
> 5     0 26616     1  23   0  2144 1056 -                              Ss   ?        00:00:00 enbd-client iss04 1100 -i iss04-hda5 -n 2 -e -m -b 4096 -p 30 /dev/ndd
> 5     0 26617 26523  24   0  2140  948 enbd_get_req                   S    ?        00:00:00 enbd-client iss01 1300 -i iss01-hdd -n 2 -e -m -b 4096 -p 30 /dev/ndi
> 5     0 26618 26523  24   0  2140  948 enbd_get_req                   S    ?        00:00:00 enbd-client iss01 1300 -i iss01-hdd -n 2 -e -m -b 4096 -p 30 /dev/ndi
> 5     0 26619 26540  24   0  2140  948 enbd_get_req                   S    ?        00:00:01 enbd-client iss04 1300 -i iss04-hdd -n 2 -e -m -b 4096 -p 30 /dev/ndl
> 5     0 26620 26540  24   0  2140  948 enbd_get_req                   S    ?        00:00:01 enbd-client iss04 1300 -i iss04-hdd -n 2 -e -m -b 4096 -p 30 /dev/ndl
> 5     0 26621 26552  24   0  2140  948 get_active_stripe              D    ?        00:32:11 enbd-client iss02 1200 -i iss02-hdc5 -n 2 -e -m -b 4096 -p 30 /dev/ndf
> 5     0 26622 26552  24   0  2140  948 get_active_stripe              D    ?        00:32:18 enbd-client iss02 1200 -i iss02-hdc5 -n 2 -e -m -b 4096 -p 30 /dev/ndf
> 5     0 26623 26564  23   0  2144  956 enbd_get_req                   S    ?        00:32:27 enbd-client iss03 1200 -i iss03-hdc5 -n 2 -e -m -b 4096 -p 30 /dev/ndg
> 5     0 26624 26564  24   0  2144  956 enbd_get_req                   S    ?        00:32:37 enbd-client iss03 1200 -i iss03-hdc5 -n 2 -e -m -b 4096 -p 30 /dev/ndg
> 5     0 26625 26568  24   0  2144  956 enbd_get_req                   S    ?        00:35:35 enbd-client iss04 1200 -i iss04-hdc5 -n 2 -e -m -b 4096 -p 30 /dev/ndh
> 5     0 26626 26561  24   0  2140  948 enbd_get_req                   S    ?        00:00:00 enbd-client iss02 1100 -i iss02-hda5 -n 2 -e -m -b 4096 -p 30 /dev/ndb
> 5     0 26627 26561  24   0  2140  948 enbd_get_req                   S    ?        00:00:00 enbd-client iss02 1100 -i iss02-hda5 -n 2 -e -m -b 4096 -p 30 /dev/ndb
> 5     0 26628 26568  24   0  2144  956 enbd_get_req                   S    ?        00:35:37 enbd-client iss04 1200 -i iss04-hdc5 -n 2 -e -m -b 4096 -p 30 /dev/ndh
> 5     0 26629 26556  24   0  2140  948 enbd_get_req                   S    ?        00:00:00 enbd-client iss01 1100 -i iss01-hda5 -n 2 -e -m -b 4096 -p 30 /dev/nda
> 5     0 26630 26556  24   0  2140  948 enbd_get_req                   S    ?        00:00:00 enbd-client iss01 1100 -i iss01-hda5 -n 2 -e -m -b 4096 -p 30 /dev/nda
> 5     0 26631 26581  24   0  2144  948 enbd_get_req                   S    ?        00:00:00 enbd-client iss03 1100 -i iss03-hda5 -n 2 -e -m -b 4096 -p 30 /dev/ndc
> 5     0 26632 26581  24   0  2144  948 enbd_get_req                   S    ?        00:00:00 enbd-client iss03 1100 -i iss03-hda5 -n 2 -e -m -b 4096 -p 30 /dev/ndc
> 5     0 26633 26590  24   0  2140  952 enbd_get_req                   S    ?        00:36:58 enbd-client iss01 1200 -i iss01-hdc5 -n 2 -e -m -b 4096 -p 30 /dev/nde
> 5     0 26634 26590  24   0  2140  952 enbd_get_req                   S    ?        00:36:50 enbd-client iss01 1200 -i iss01-hdc5 -n 2 -e -m -b 4096 -p 30 /dev/nde
> 5     0 26635 26606  24   0  2144  948 enbd_get_req                   S    ?        00:00:00 enbd-client iss02 1300 -i iss02-hdd -n 2 -e -m -b 4096 -p 30 /dev/ndj
> 5     0 26636 26606  24   0  2144  948 enbd_get_req                   S    ?        00:00:00 enbd-client iss02 1300 -i iss02-hdd -n 2 -e -m -b 4096 -p 30 /dev/ndj
> 5     0 26637 26616  24   0  2144  952 enbd_get_req                   S    ?        00:00:00 enbd-client iss04 1100 -i iss04-hda5 -n 2 -e -m -b 4096 -p 30 /dev/ndd
> 5     0 26638 26616  23   0  2144  952 enbd_get_req                   S    ?        00:00:00 enbd-client iss04 1100 -i iss04-hda5 -n 2 -e -m -b 4096 -p 30 /dev/ndd
> 5     0 26639 26614  23   0  2144  948 enbd_get_req                   S    ?        00:00:00 enbd-client iss03 1300 -i iss03-hdd -n 2 -e -m -b 4096 -p 30 /dev/ndk
> 5     0 26640 26614  24   0  2144  948 enbd_get_req                   S    ?        00:00:00 enbd-client iss03 1300 -i iss03-hdd -n 2 -e -m -b 4096 -p 30 /dev/ndk
>   
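For anyone who wants to pull the stuck tasks out of a listing like this
one, here is a minimal Python sketch. It was not part of the original
debugging; the function name blocked_tasks is made up for illustration,
and it assumes the exact 13-column layout produced by the ps command
above. On the full listing it should pick out PIDs 26540, 26561, 26621
and 26622, the four tasks in state D waiting in get_active_stripe.

```python
def blocked_tasks(ps_output, wchan="get_active_stripe"):
    """Return (pid, ppid, stat) for tasks in uninterruptible sleep on `wchan`.

    Assumes the column order F UID PID PPID PRI NI VSZ RSS WCHAN STAT TT
    TIME COMMAND, optionally prefixed by mail quote markers ("> ").
    """
    rows = []
    for line in ps_output.splitlines():
        # Drop quote markers, then split into at most 13 fields so the
        # command line (which contains spaces) stays in one piece.
        fields = line.lstrip("> ").split(None, 12)
        # Skip the header row, "(snip)" markers and empty lines.
        if len(fields) < 13 or not fields[2].isdigit():
            continue
        pid, ppid, wch, stat = fields[2], fields[3], fields[8], fields[9]
        if stat.startswith("D") and wch == wchan:
            rows.append((int(pid), int(ppid), stat))
    return rows


# Two rows from the listing above, with the command lines abbreviated.
sample = """\
> F   UID   PID  PPID PRI  NI   VSZ  RSS WCHAN                          STAT TT           TIME COMMAND
> 5     0 26523     1  23   0  2140 1052 -                              Ss   ?        00:00:00 enbd-client iss01 1300 -i iss01-hdd
> 5     0 26540     1  23   0  2140 1048 get_active_stripe              Ds   ?        00:00:00 enbd-client iss04 1300 -i iss04-hdd
"""
print(blocked_tasks(sample))  # [(26540, 1, 'Ds')]
```

The PPID is returned as well, because in this listing the worker
processes (PPID other than 1) and their parent sessions can both end up
blocked, and it helps to see which device they belong to.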

I've tried this "reiserfsck --rebuild-tree" a couple of times, and it keeps
hanging at the same point, when my memory gets filled with buffers. My
assumption is that reiserfs is writing out too fast, the network (ENBD)
can't keep up, and after a while there's no memory left for TCP
buffers. I've solved this problem by raising the value in
/proc/sys/vm/min_free_kbytes to force the kernel to leave some memory
for the TCP buffers and other interrupt handling.
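
As a rough illustration of that workaround, here is a small Python
sketch that raises the reserve without ever lowering it. The helper
names and the 16384 kB figure in the usage note are assumptions made for
the example, not the values used on this cluster; writing to the real
file needs root.

```python
from pathlib import Path

# The sysctl file mentioned above; the helpers take a path argument so
# they can be tried against an ordinary file first.
MIN_FREE = Path("/proc/sys/vm/min_free_kbytes")


def read_min_free_kbytes(path=MIN_FREE):
    """Return the current reserve in kB."""
    return int(Path(path).read_text().strip())


def raise_min_free_kbytes(target_kb, path=MIN_FREE):
    """Raise the reserve to at least target_kb and return the value in effect.

    An existing larger setting is left untouched.
    """
    current = read_min_free_kbytes(path)
    if target_kb > current:
        Path(path).write_text(f"{target_kb}\n")
        return target_kb
    return current
```

Calling raise_min_free_kbytes(16384) as root would set the reserve to
16384 kB if it is currently lower. The setting does not survive a
reboot unless it is also put in /etc/sysctl.conf as vm.min_free_kbytes.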

I'm not able to install a vanilla kernel with some patches, but I would
be happy to provide some extra details about the crash if you want me
to. I assume I can even reproduce it, though on another cluster, since
I've recreated an (ext3) filesystem on the cluster we're talking about.

Regards

  -- Bas van Schaik


