Re: Setting up md-raid5: observations, errors, questions


From: Michael Tokarev <mjt@tls.msk.ru>
To: Christian Pernegger <pernegger@gmail.com>
Cc: linux-raid@vger.kernel.org, linux-ide@vger.kernel.org
Subject: Re: Setting up md-raid5: observations, errors, questions
Date: Sun, 02 Mar 2008 21:33:51 +0300
Message-ID: <47CAF30F.7040502@msgid.tls.msk.ru>
In-Reply-To: <bb145bd20803020832x744c2674nd0a6a30e445b5453@mail.gmail.com>

Christian Pernegger wrote:
>>  > OK. Back to the fs again, same command, different device. Still
>>  > glacially slow (and still running), only now the whole box is at a
>>  > standstill, too. cat /proc/cpuinfo takes about 3 minutes (!) to
>>  > complete, I'm still waiting for top to launch (15min and counting).
>>  > I'll leave mke2fs running for now ...
>>
>>  What's the state of your array at this point - is it resyncing?
> 
> Yes. Didn't think it would matter (much). Never did before.

It does.  If everything worked as it should, it wouldn't matter much,
but that's not the case here ;)
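Whether a resync is running, and how fast, shows up in /proc/mdstat.
A minimal Python sketch that pulls the progress line out of it,
assuming the usual "resync = 12.3% ... speed=75000K/sec" format (the
function name resync_status is hypothetical):

```python
import re

# Progress line md prints while a resync/recovery/reshape/check runs, e.g.
# "[==>....]  resync = 12.3% (60123456/488383936) finish=95.2min speed=75000K/sec"
_PROGRESS = re.compile(
    r"(?P<action>resync|recovery|reshape|check)\s*=\s*(?P<pct>[\d.]+)%"
    r".*?speed=(?P<speed>\d+)K/sec"
)
# Header line of each array, e.g. "md0 : active raid5 sdd1[3] ..."
_ARRAY = re.compile(r"^(?P<name>md\S+)\s+:")


def resync_status(mdstat_text):
    """Return {array: (action, percent_done, speed_kib_per_sec)} for every
    array that currently shows a progress line in /proc/mdstat text."""
    status = {}
    current = None
    for line in mdstat_text.splitlines():
        m = _ARRAY.match(line)
        if m:
            current = m.group("name")
            continue
        m = _PROGRESS.search(line)
        if m and current is not None:
            status[current] = (
                m.group("action"),
                float(m.group("pct")),
                int(m.group("speed")),
            )
    return status
```

For example, resync_status(open("/proc/mdstat").read()) returns an
empty dict once no array is resyncing.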

>>   o how about making filesystem(s) on individual disks first, to see
>>     how that will work out?  Maybe on each of them in parallel? :)
> 
> Running. System is perfectly responsive during 4x mke2fs -j -q on raw devices.
> Done. Upper bound for duration is 8 minutes (probably much lower,
> forgot to let it beep on completion), which is much better than the 2
> hours with the syncing RAID.

Aha.  Excellent.

>  26:    1041479        267   IO-APIC-fasteoi   sata_promise
>  27:          0          0   IO-APIC-fasteoi   sata_promise

> procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
>  r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
>  0  4      0  12864 1769688  10000    0    0     0 146822  539  809  0 26 23 51

Ok.  Roughly 146 MB/sec.

> Cpu(s):  1.3%us,  8.1%sy,  0.0%ni, 41.6%id, 46.0%wa,  0.7%hi,  2.3%si,  0.0%st

46.0% in I/O wait.

> I hope you can interpret that :)

Some ;)

>>   o try --assume-clean when creating the array
> 
> mke2fs (same command as in first post) now running on fresh
> --assumed-clean array w/o crypto. System is only marginally less
> responsive than under idle load, if at all.

So the responsiveness problem is solved here, right?  I mean, if
there's no resync going on (the --assume-clean case), the rest of
the system works as expected, right?

> But inode table writing speed is only about 8-10/second. For the
> single disk case I couldn't read the numbers fast enough.

Note that mkfs now has to do 3x more work, too, since a 4-drive
raid5 device is 3x larger than a single disk.
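The 3x figure is plain RAID5 capacity arithmetic: one member's worth
of space goes to parity, so n equal disks give n-1 disks of usable
space.  A sketch, ignoring md superblock and chunk rounding; the
500 GB disk size is a made-up example:

```python
def raid5_usable(n_disks, disk_size):
    """Usable capacity of an md RAID5 built from n equal disks:
    one disk's worth of space is consumed by distributed parity."""
    return (n_disks - 1) * disk_size


disk = 500  # hypothetical per-disk size in GB
array = raid5_usable(4, disk)
print(array, array / disk)  # 1500 3.0
```

Since mke2fs sizes its inode tables in proportion to the filesystem
size, roughly 3x the space means roughly 3x the inode tables to write.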

> chris@jesus:~$ cat /proc/interrupts
>  26:    1211165        267   IO-APIC-fasteoi   sata_promise
>  27:          0          0   IO-APIC-fasteoi   sata_promise
> 
> procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
>  r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
>  0  1      0  11092 1813376  10804    0    0     0 13316  535 5201  0  9 51 40

That's about 13 MB/sec, roughly 10 times slower than with the 4
individual disks.

> Cpu(s):  0.0%us, 10.1%sy,  0.0%ni, 55.6%id, 33.7%wa,  0.2%hi,  0.3%si,  0.0%st

and only 33.7% waiting, which is probably due to the lack of
parallelism.
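For reference, vmstat's bi/bo columns count 1024-byte blocks per
second, so the two quoted samples convert as follows (a small
illustrative calculation using only the numbers above):

```python
VMSTAT_BLOCK = 1024  # bytes per "block" in vmstat's bi/bo columns


def bo_to_mb_s(bo):
    """Convert a vmstat 'bo' reading (blocks/s) to decimal MB/s."""
    return bo * VMSTAT_BLOCK / 1e6


raw_disks = 146822  # bo: mke2fs on the 4 raw disks in parallel
raid5 = 13316       # bo: mke2fs on the --assume-clean raid5

print(f"raw disks: {bo_to_mb_s(raw_disks):.1f} MB/s")  # 150.3 MB/s
print(f"raid5:     {bo_to_mb_s(raid5):.1f} MB/s")      # 13.6 MB/s
print(f"ratio:     {raw_disks / raid5:.1f}x")          # 11.0x
```

So the "10 times slower" estimate holds (closer to 11x), and 146822
blocks/s works out to about 150 MB/s, or 143 MiB/s.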

> From vmstat I gather that total write throughput is an order of
> magnitude slower than on the 4 raw disks in parallel. Naturally the
> mke2fs on the raid isn't parallelized but it should still be
> sequential enough to get the max for a single disk (~40-60MB/s),
> right?

Well, not really.  Mkfs is doing many small writes all over the
place, so each one is a seek plus a write.  And it's synchronous:
the next write isn't submitted until the current one completes.
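A back-of-the-envelope model of why that hurts: if every write pays
a full seek and rotational delay, and the next one isn't issued until
it completes, throughput is the write size divided by (access time +
transfer time).  The drive figures (8 ms access, 60 MB/s sequential)
and the 4 KiB write size are assumptions for illustration only:

```python
def sync_write_throughput(write_bytes, access_s, seq_bytes_s):
    """Bytes/s for strictly synchronous writes at queue depth 1:
    each write waits for a seek plus its own transfer, with no overlap."""
    per_write_s = access_s + write_bytes / seq_bytes_s
    return write_bytes / per_write_s


# Hypothetical drive: 8 ms average access time, 60 MB/s sequential rate.
tp = sync_write_throughput(4096, 0.008, 60e6)
print(f"{tp / 1e6:.2f} MB/s")  # 0.51 MB/s
```

The seek term dominates, which is why a single disk's sequential rate
is out of reach for this kind of workload.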

Ok.  For now I don't see a problem (other than that there IS a
problem somewhere, obviously).  Interrupts are ok.  System time
(10.1%) in the second case doesn't look right, but it was 8.1% before...

Only 2 guesses left.  And I really mean "guesses", because I can't
say for sure what's going on anyway.

First, try disabling the write-intent bitmap on the raid array and
see if it makes any difference.  For some reason I think it will... ;)
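With mdadm, the write-intent bitmap can be removed from a running
array with --grow and --bitmap=none, and put back later with
--bitmap=internal.  A small Python wrapper as a sketch; /dev/md0 and
the function name set_bitmap are placeholders, and it only builds the
command unless dry_run=False (which needs root):

```python
import subprocess


def set_bitmap(md_device, mode="none", dry_run=True):
    """Build, and optionally run, the mdadm command that changes an
    array's write-intent bitmap: mode="none" removes it,
    mode="internal" re-adds an internal one."""
    cmd = ["mdadm", "--grow", md_device, f"--bitmap={mode}"]
    if not dry_run:
        subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return cmd


print(" ".join(set_bitmap("/dev/md0")))  # mdadm --grow /dev/md0 --bitmap=none
```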

And second, the whole thing looks pretty much like a more general
problem discussed here and elsewhere over the last few days.  I mean
the handling of parallel reads and writes, where a single write may
stall reads for quite some time and vice versa.  I see it every day
on disks without NCQ/TCQ: the system is mostly single-tasking, sorta
like good ol' MS-DOG :)  Good TCQ-enabled drives survive very high
load while the system stays more-or-less responsive (and I can't
remember when I last saw a "bad" TCQ-enabled drive; even a 10 y/o 4GB
Seagate has excellent TCQ support ;).  And all modern SATA stuff works
pretty much like old IDE drives, which were designed "for personal
use", or "single-task only"; even ones that CLAIM to support NCQ
don't in reality...  But that's a long story, and your disks and/or
controllers (or the combination) don't even support NCQ...
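Whether command queuing is actually in use can be checked per disk in
sysfs: /sys/block/<disk>/device/queue_depth reports the depth the
SCSI/libata layer is using, and a value of 1 typically means no
NCQ/TCQ.  A sketch with the sysfs root as a parameter so it can be
pointed at a test directory; the function name and disk names are
placeholders:

```python
from pathlib import Path


def queue_depths(disks, sysfs_root="/sys"):
    """Return {disk: queue_depth} read from
    <sysfs_root>/block/<disk>/device/queue_depth.
    A depth of 1 means one command at a time (no NCQ/TCQ in use);
    None means the attribute isn't exposed for that device."""
    depths = {}
    for disk in disks:
        path = Path(sysfs_root, "block", disk, "device", "queue_depth")
        try:
            depths[disk] = int(path.read_text().strip())
        except FileNotFoundError:
            depths[disk] = None
    return depths
```

For example, queue_depths(["sda", "sdb", "sdc", "sdd"]) on the box
itself.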

/mjt
