* mdadm 2.1: command line option parsing bug?
@ 2005-11-18 19:34 Andreas Haumer
2005-11-21 23:21 ` Neil Brown
0 siblings, 1 reply; 33+ messages in thread
From: Andreas Haumer @ 2005-11-18 19:34 UTC (permalink / raw)
To: linux-raid
Hi!
I just upgraded to mdadm-2.1 from mdadm-1.12.0 and noticed
that the following command (which is even mentioned in the
manual page) doesn't work anymore:
root@tolstoi:~ {838} $ mdadm -Ebsc partitions
mdadm: cannot open partitions: No such file or directory
(I always use this command to create my mdadm.conf file,
so I found the problem very quickly after upgrading ;-)
After playing around a little with the available command-line
options, I found that the following commands do work:
root@tolstoi:~ {839} $ mdadm -Eb -sc partitions
ARRAY /dev/md/1 level=raid1 num-devices=3 UUID=c2c22b31:909b7973:0e4224bb:19cf529d
ARRAY /dev/md/0 level=raid5 num-devices=3 UUID=7abbd703:643ffe8e:d6ae4d0e:1b0908aa
as well as the following variations:
root@tolstoi:~ {840} $ mdadm -Eb -s -c partitions
ARRAY /dev/md/1 level=raid1 num-devices=3 UUID=c2c22b31:909b7973:0e4224bb:19cf529d
ARRAY /dev/md/0 level=raid5 num-devices=3 UUID=7abbd703:643ffe8e:d6ae4d0e:1b0908aa
root@tolstoi:~ {841} $ mdadm --examine --brief --scan --config=partitions
ARRAY /dev/md/1 level=raid1 num-devices=3 UUID=c2c22b31:909b7973:0e4224bb:19cf529d
ARRAY /dev/md/0 level=raid5 num-devices=3 UUID=7abbd703:643ffe8e:d6ae4d0e:1b0908aa
So it looks like the command-line option parsing algorithm
in mdadm.c (which is way too complicated for me to quickly
work out what is going wrong here... ;-) has some problem with
the "-Ebsc partitions" variant.
Seems to be a bug?
I also noticed that there is now an additional newline between
the "ARRAY" lines. Is that intentional? IMHO there should only
be one newline after each line here.
HTH
- andreas
--
Andreas Haumer | mailto:andreas@xss.co.at
*x Software + Systeme | http://www.xss.co.at/
Karmarschgasse 51/2/20 | Tel: +43-1-6060114-0
A-1100 Vienna, Austria | Fax: +43-1-6060114-71
* Re: mdadm 2.1: command line option parsing bug?
  2005-11-18 19:34 mdadm 2.1: command line option parsing bug? Andreas Haumer
@ 2005-11-21 23:21 ` Neil Brown
  2005-11-22 11:21   ` Michael Tokarev
                     ` (2 more replies)
  0 siblings, 3 replies; 33+ messages in thread
From: Neil Brown @ 2005-11-21 23:21 UTC (permalink / raw)
To: Andreas Haumer; +Cc: linux-raid

On Friday November 18, andreas@xss.co.at wrote:
> Hi!
>
> I just upgraded to mdadm-2.1 from mdadm-1.12.0 and noticed
> that the following command (which is even mentioned in the
> manual page) doesn't work anymore:
>
> root@tolstoi:~ {838} $ mdadm -Ebsc partitions
> mdadm: cannot open partitions: No such file or directory

No....  I guess I should fix the man page, and maybe the code :-(

In mdadm-2 we've added --bitmap= and use '-b' as a short version.
So sometimes -b takes an argument (--bitmap) and sometimes not
(--brief).  getopt is therefore told that it takes an optional
argument, which explains the observed behaviour.

Possibly this was a mistake....  I would like it to take an argument
in contexts where --bitmap is meaningful (Create, Assemble, Grow) and
not where --brief is meaningful (Examine, Detail), but I don't know
whether getopt_long will allow the 'short_opt' string to be changed
half way through processing...

At the very least, I can print a message if '-b' is being interpreted
as --brief but an option argument is present.

-a has the same problem (--add vs --auto).

I'll see what I can do.  Thanks.

> I also noticed that there is now an additional newline between
> the "ARRAY" lines. Is that intentional? IMHO there should only
> be one newline after each line here.

Oh yes.  That blank line gets filled with 'spares=' if there are any
spares, and 'devices=' if --verbose.  But I should remove it in the
other cases.  Thanks.

NeilBrown
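To see concretely why '-Ebsc partitions' misbehaves, here is a
minimal, self-contained sketch of glibc's getopt behaviour -- an
illustration, not mdadm's actual source; the option string "Eb::sc:"
is assumed for the demo.  With "b::" (an optional argument), getopt
glues the rest of the clustered element ("sc") onto -b as its
argument, so -c is never seen and "partitions" is left over as a
device name, which is exactly the "cannot open partitions" failure
reported above.

/* getopt-demo.c -- sketch of the optional-argument pitfall.
 * Build with: cc -o getopt-demo getopt-demo.c */
#include <stdio.h>
#include <unistd.h>
#include <getopt.h>

int main(int argc, char *argv[])
{
    int opt;

    /* "b::" declares -b with an OPTIONAL argument, as mdadm-2.1
     * effectively does to serve both --brief and --bitmap=. */
    while ((opt = getopt(argc, argv, "Eb::sc:")) != -1) {
        switch (opt) {
        case 'E': printf("-E (examine)\n"); break;
        case 'b': printf("-b, arg=%s\n", optarg ? optarg : "(none)"); break;
        case 's': printf("-s (scan)\n"); break;
        case 'c': printf("-c, config=%s\n", optarg); break;
        default:  return 1;
        }
    }
    /* Anything left over would be treated as a device name. */
    for (; optind < argc; optind++)
        printf("device: %s\n", argv[optind]);
    return 0;
}

Running './getopt-demo -Ebsc partitions' prints "-b, arg=sc" followed
by "device: partitions": the config file name never reaches -c.  With
'-Eb -sc partitions', -b ends its argv element and so gets no
argument, and -c picks up "partitions", matching the working
invocations above.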
* Re: mdadm 2.1: command line option parsing bug?
  2005-11-21 23:21 ` Neil Brown
@ 2005-11-22 11:21   ` Michael Tokarev
  2005-11-24  5:15     ` Neil Brown
  0 siblings, 1 reply; 33+ messages in thread
From: Michael Tokarev @ 2005-11-22 11:21 UTC (permalink / raw)
To: Neil Brown; +Cc: Andreas Haumer, linux-raid

Neil Brown wrote:
[]
> I would like it to take an argument in contexts where --bitmap is
> meaningful (Create, Assemble, Grow) and not where --brief is
> meaningful (Examine, Detail), but I don't know whether getopt_long
> will allow the 'short_opt' string to be changed half way through
> processing...

getopt allows you to change both the long and the short options set
before every call (provided argv and argc are intact).  But.

Please, pretty please, don't implement the same option with different
meanings.  It's confusing at best.  Assign short options to
frequently used commands, and leave only long options for the rest.
I dunno which of --brief or --bitmap is more frequent -- I'd say both
can be long-only -- but since -b already stands for --brief, don't
use it for --bitmap.

> At the very least, I can print a message if '-b' is being interpreted
> as --brief but an option argument is present.
>
> -a has the same problem (--add vs --auto).

And this is also bad.  In my opinion, anyway.

/mjt
* Re: mdadm 2.1: command line option parsing bug?
  2005-11-22 11:21 ` Michael Tokarev
@ 2005-11-24  5:15   ` Neil Brown
  0 siblings, 0 replies; 33+ messages in thread
From: Neil Brown @ 2005-11-24 5:15 UTC (permalink / raw)
To: Michael Tokarev; +Cc: Andreas Haumer, linux-raid

On Tuesday November 22, mjt@tls.msk.ru wrote:
> getopt allows you to change both the long and the short options set
> before every call (provided argv and argc are intact).  But.
>
> Please, pretty please, don't implement the same option with different
> meanings.  It's confusing at best.  Assign short options to
> frequently used commands, and leave only long options for the rest.
> I dunno which of --brief or --bitmap is more frequent -- I'd say both
> can be long-only -- but since -b already stands for --brief, don't
> use it for --bitmap.
>
> > At the very least, I can print a message if '-b' is being interpreted
> > as --brief but an option argument is present.
> >
> > -a has the same problem (--add vs --auto).
>
> And this is also bad.  In my opinion, anyway.

I'm afraid it is a bit late for that...

 -f == --force or --fail or --daemonise (think 'fork')
 -m == --super-minor or --mail
 -p == --parity or --program
 -c == --config or --chunk

Think of mdadm as a number of separate programs (A, B, C, D, E, G :-).
Each has an independent, though sometimes overlapping, set of options.

It looks like I can get getopt_long to work quite nicely so that,
e.g., '-b' takes an argument in some contexts and not in others, so
expect that to be fixed in 2.2.

NeilBrown
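To illustrate what Michael describes and the fix Neil settled on,
here is a hedged, self-contained sketch -- not mdadm's actual code;
the option tables are invented for the demo -- that swaps the
short-option string once a mode option has been seen, so '-b' only
demands an argument in Create/Assemble/Grow contexts.  glibc's getopt
re-reads the option string on every call, so this works as long as
argc/argv are left untouched between calls.

/* modeopts.c -- hedged sketch of per-mode option strings; the mode
 * letters mirror mdadm's, but the tables are illustrative only. */
#include <stdio.h>
#include <unistd.h>
#include <getopt.h>

int main(int argc, char *argv[])
{
    /* Default: -b is bare (--brief). */
    const char *shortopt = "ACDEGbsc:";
    int opt;

    while ((opt = getopt(argc, argv, shortopt)) != -1) {
        switch (opt) {
        case 'C': case 'A': case 'G':
            /* Create/Assemble/Grow: from here on, -b requires an
             * argument (--bitmap=FILE). */
            shortopt = "ACDEGb:sc:";
            printf("mode: %c\n", opt);
            break;
        case 'E': case 'D':
            /* Examine/Detail: -b stays bare (--brief). */
            shortopt = "ACDEGbsc:";
            printf("mode: %c\n", opt);
            break;
        case 'b':
            if (optarg)
                printf("-b bitmap=%s\n", optarg);
            else
                printf("-b brief\n");
            break;
        case 's': printf("-s\n"); break;
        case 'c': printf("-c %s\n", optarg); break;
        default:  return 1;
        }
    }
    return 0;
}

With this, '-Ebsc partitions' style clusters parse '-b' as --brief
again (so -c gets the config name), while '-Cb file.bitmap' can still
attach a bitmap, because with "b:" a required argument is taken from
the rest of the element or from the next one.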
* Re: mdadm 2.1: command line option parsing bug?
  2005-11-21 23:21 ` Neil Brown
  2005-11-22 11:21 ` Michael Tokarev
@ 2005-11-22 15:41 ` Molle Bestefich
  2005-11-24  5:25   ` Neil Brown
  0 siblings, 1 reply; 33+ messages in thread
From: Molle Bestefich @ 2005-11-22 15:41 UTC (permalink / raw)
To: Neil Brown; +Cc: linux-raid

Neil Brown wrote:
> I would like it to take an argument in contexts where --bitmap is
> meaningful (Create, Assemble, Grow) and not where --brief is
> meaningful (Examine, Detail), but I don't know whether getopt_long
> will allow the 'short_opt' string to be changed half way through
> processing...

Here's an honest opinion from a regular user.

mdadm's command-line arguments seem arcane, cryptic and unintuitive.
It's difficult to grasp which combinations will actually do something
worthwhile and which will just yield a 'you cannot do that' output.

I find myself spending 20 minutes with mdadm --help and experimenting
with different commands (which shouldn't be the case when doing RAID
work) just to do simple things like create an array or make MD
assemble the devices that compose an array.

I know -- not very constructive, but a point of view anyway.  Maybe I
just don't use MD enough and so shouldn't complain, because the
interface is really not designed for the absolute newbie.  If so,
then I apologize.

I don't have any constructive suggestions, except to say that the way
the classic Cisco interface does things works very nicely.

A lot of other manufacturers have also started doing things the Cisco
way.  If you don't have a Cisco router available, you can e.g. use a
Windows XP box: type 'netsh' in a command prompt, then 'help', or
alternatively 'netsh help'.  You get the idea :-).
* Re: mdadm 2.1: command line option parsing bug?
  2005-11-22 15:41 ` Molle Bestefich
@ 2005-11-24  5:25   ` Neil Brown
  2005-11-24  7:31     ` Ross Vandegrift
  2005-12-15  1:53     ` Molle Bestefich
  0 siblings, 2 replies; 33+ messages in thread
From: Neil Brown @ 2005-11-24 5:25 UTC (permalink / raw)
To: Molle Bestefich; +Cc: linux-raid

On Tuesday November 22, molle.bestefich@gmail.com wrote:
> Neil Brown wrote:
> > I would like it to take an argument in contexts where --bitmap is
> > meaningful [...]
>
> Here's an honest opinion from a regular user.

Honest is always nice!

> mdadm's command-line arguments seem arcane, cryptic and unintuitive.
> It's difficult to grasp which combinations will actually do something
> worthwhile and which will just yield a 'you cannot do that' output.
>
> I find myself spending 20 minutes with mdadm --help and experimenting
> with different commands (which shouldn't be the case when doing RAID
> work) just to do simple things like create an array or make MD
> assemble the devices that compose an array.

That's strange.  They seem very regular and intuitive to me (but I
find it very hard to be objective).  Maybe you have a mental model of
what the task involves that differs from how md actually does things.

To create an array, you need:
  the array to create,
  the list of devices to hold the data,
  the level,
  the number of devices (consistency check),
and maybe some other parameters like chunk size or parity mode.

So that is what you give 'mdadm --create'.  Providing you give the
array to be created first, the rest can come in any order:

  mdadm --create /dev/md0 /dev/sda /dev/sdb --level=5 --raid-disks=3 /dev/sdc

Can you say anything more about the sort of mistakes you find
yourself making?  That might help improve either the help pages or
the error messages (revamping all the diagnostic messages to make
them more helpful is slowly climbing to the top of my todo list).

> I know -- not very constructive, but a point of view anyway.  Maybe I
> just don't use MD enough and so shouldn't complain, because the
> interface is really not designed for the absolute newbie.  If so,
> then I apologize.
>
> I don't have any constructive suggestions, except to say that the way
> the classic Cisco interface does things works very nicely.
>
> A lot of other manufacturers have also started doing things the Cisco
> way.  If you don't have a Cisco router available, you can e.g. use a
> Windows XP box: type 'netsh' in a command prompt, then 'help', or
> alternatively 'netsh help'.  You get the idea :-).

Is this an interactive interface where you can hit 'tab' at any time
and it either completes the current word or lists the options?  (For
me, that is the 'kermit' interface, as kermit was the first program I
used which had it.)

This is certainly a nice interface when learning a system, but I
don't think it belongs in a tool like mdadm.  Rather, it might make
sense to create an 'mdassist' tool which works like this and builds
mdadm commands for you...

NeilBrown
* Re: mdadm 2.1: command line option parsing bug?
  2005-11-24  5:25 ` Neil Brown
@ 2005-11-24  7:31   ` Ross Vandegrift
  0 siblings, 0 replies; 33+ messages in thread
From: Ross Vandegrift @ 2005-11-24 7:31 UTC (permalink / raw)
To: Neil Brown; +Cc: Molle Bestefich, linux-raid

On Thu, Nov 24, 2005 at 04:25:01PM +1100, Neil Brown wrote:
> Can you say anything more about the sort of mistakes you find
> yourself making?  That might help improve either the help pages or
> the error messages (revamping all the diagnostic messages to make
> them more helpful is slowly climbing to the top of my todo list).

I've been using mdadm from the very beginning, but the one thing I
still don't have a great handle on is the difference between querying
devices vs. arrays, and when to query which to find out what I want
to know.

Partly, I'm not familiar with it because it's not like I run mdadm
every day.  It's the kind of thing that's run once, configured once,
put into a script somewhere and then forgotten about.

> > A lot of other manufacturers have also started doing things the
> > Cisco way. [...]
>
> Is this an interactive interface where you can hit 'tab' at any time
> and it either completes the current word or lists the options?

I'd have to vote with Neil on this.  mdadm is going to be used in
lots of scripts -- system startup/shutdown scripts especially.  Its
current form is awesome for that: throw the right args together and
you can do anything.

As soon as it's interactive, we might as well use a mouse... ::-P

--
Ross Vandegrift
ross@lug.udel.edu

"The good Christian should beware of mathematicians, and all those
who make empty prophecies. The danger already exists that the
mathematicians have made a covenant with the devil to darken the
spirit and to confine man in the bonds of Hell."
	--St. Augustine, De Genesi ad Litteram, Book II, xviii, 37
* Re: mdadm 2.1: command line option parsing bug?
  2005-11-24  5:25 ` Neil Brown
  2005-11-24  7:31 ` Ross Vandegrift
@ 2005-12-15  1:53 ` Molle Bestefich
  2005-12-15  4:19   ` Neil Brown
  1 sibling, 1 reply; 33+ messages in thread
From: Molle Bestefich @ 2005-12-15 1:53 UTC (permalink / raw)
To: Neil Brown; +Cc: linux-raid

> > mdadm's command-line arguments seem arcane, cryptic and unintuitive.
> > It's difficult to grasp which combinations will actually do something
> > worthwhile and which will just yield a 'you cannot do that' output.
> >
> > I find myself spending 20 minutes with mdadm --help and experimenting
> > with different commands (which shouldn't be the case when doing RAID
> > work) just to do simple things like create an array or make MD
> > assemble the devices that compose an array.
>
> That's strange.  They seem very regular and intuitive to me (but I
> find it very hard to be objective).  Maybe you have a mental model of
> what the task involves that differs from how md actually does things.
> Can you say anything more about the sort of mistakes you find
> yourself making?  That might help improve either the help pages or
> the error messages (revamping all the diagnostic messages to make
> them more helpful is slowly climbing to the top of my todo list).

I can, but I suck at putting these things down in writing, which is
why my initial description was intentionally vague and shallow :-).
Now I'll try anyway.  It'll be imprecise, I'll miss some things and
show up late with major points etc., but at least I will have tried =).

OK.  Starting with the usage output:

  $ mdadm
  Usage: mdadm --help
    for help

Nothing wrong with that.  Good, concise stuff.  Great.

The --help output, however...:

  $ mdadm --help
  Usage: mdadm --create device options...
         mdadm --assemble device options...
         mdadm --build device options...
  ... snip ...

... I find confusing.  Try to read the words create, assemble and
build as an outsider or a child would read them -- as regular English
words, not with MD development in mind.  All three words mean just
about the same thing in plain English (something like "taking smaller
parts and constructing something bigger with them").  That confuses
me.

To make things worse, I now have to type in 3 commands
(--<cmd> --help) and compare the output of each just to get a grasp
on what each of the 3 individual commands does.  The process is
laborious.  Well, I'll live, but it's annoying to have to scroll up
and down perhaps 5-6 screens of text whilst comparing options just to
figure out which command I would like to use.

I would much prefer it if the 'areas of utility' that mdadm commands
are divided into were self-explanatory.  I'm not saying that these
'areas of utility' (--create etc.) are not grouped in a logical
fashion, just that it's not always easy to comprehend what they
cover.  Perhaps it's enough to add a small description per line of
'mdadm --help' output.  After 'mdadm --create' it would e.g. say
'creates a new RAID array based on multiple devices', or so.  Just a
one-liner.

Moving right along:

  ... snip ...
  mdadm --manage device options...
  mdadm --misc options... devices
  mdadm --monitor options...
  ... snip ...

I find it very unhelpful to have a --misc section.  Every time I'm
looking for some command, besides having to guess which section it's
located in, I have to check --misc as well, since misc can cover
anything.

  ... snip ...
  mdadm device options...
  ... snip ...

Where does that syntax suddenly come from?  Is it a special
no-command mode?  What does it do?  Hmm.  Confusing.

And by the way, why mention "device" and "options" for each and every
section/command set/command when it's obvious that you need more
parameters to do something fruitful?  In my opinion it would be
better to rid the general --help screen of them and instead specify
in brief what functionality the sections are meant to cover.

  ... snip ...
  mdadm is used for building, managing, and monitoring
  Linux md devices (aka RAID arrays)
  ... snip ...

Probably belongs at the top of the output, but that's a small nit...

  ... snip ...
  For detailed help on the above major modes use --help after the mode
  e.g.
          mdadm --assemble --help
  ... snip ...

I keep typing mdadm --help --assemble, which unfortunately gets me
nowhere.  A lot of the time it's because the general help text has
scrolled out of view because I've just done an '--xxx --help' for
some other command.  Maybe it's just me, but a minor improvement
could be to allow '--help --xxx'.

  ... snip ...
  For general help on options use
          mdadm --help-options
  ... snip ...

Like misc, I think this is an odd section.  Let's explore its
contents:

  $ mdadm --help-options
  Any parameter that does not start with '-' is treated as a device name
  ... snip ...

To me, the above is very interesting information about mdadm's syntax
and thus should be presented at the start of the general --help
screen.

  ... snip ...
  The first such name is often the name of an md device.  Subsequent
  names are often names of component devices.
  ... snip ...

This too.

What's with the "often", by the way?  It is somewhat more helpful to
me if parameters are an exact science.  E.g. the above could say,
"for commands affecting whole arrays, the first device name is the
name of the MD device, while subsequent device names refer to
component devices".  Or preferably something a little more concise.

  ... snip ...
  Some common options are:
    --help        -h   : General help message or, after above option,
                         mode specific help message
    --help-options     : This help message
  ... snip ...

Those are not interesting at all; we've already used those to get
here.  And both are mentioned on the general --help screen.  Snip 'em
if you ask me.

  ... snip ...
    --version     -V   : Print version information for mdadm
  ... snip ...

I don't know where I'd put this.  Maybe I would rename 'monitor' to
'monitor/information', make the section description something like
'array monitoring, array/device queries and other information', and
then stuff --version in that section....

  ... snip ...
    --force       -f   : Override normal checks and be more forceful
  ... snip ...

Without a context, --force is terribly unusable to me, as is
mentioning it here.

  ... snip ...
    --verbose     -v   : Be more verbose about what is happening
    --brief       -b   : Be less verbose, more brief
  ... snip ...

No comments.  Not sure what they'd do, since they're mentioned out of
context.  I think most people shrug and think 'irrelevant' when they
read those two lines.

  ... snip ...
    --assemble    -A   : Assemble an array
    --build       -B   : Build a legacy array
    --create      -C   : Create a new array
    --detail      -D   : Display details of an array
  ... snip ...

Extremely useful information about basic commands is right there, but
very well hidden away in a section named "--help-options".  IMHO, the
above should go directly onto the front --help screen.

If I understand "--build" above correctly (I spotted the word
"legacy" upon reading it the third time), it should IMHO be renamed
from "--build" to "--assemble-legacy" right away!

  ... snip ...
    --examine     -E   : Examine superblock on an array component
    --monitor    -F   : monitor (follow) some arrays
    --query       -Q   : Display general information about how a
                         device relates to the md driver
  ... snip ...

Again, these seem very useful.  I wonder why they are hiding in odd
places like --misc and --help-options.

OK, enough about sections and grouping.

Next on the agenda: I find the messages that I get from mdadm when I
make syntactic blunders confusing.  I think it comes from the fact
that mdadm is run like this (for example): 'mdadm -f' instead of
'mdadm --manage -f'.  It means that the error message I get when I
try to do something is way off, because mdadm thinks I'm doing
something completely different from what I'm actually trying to do.
I would much prefer having to specify an extra word (e.g. --manage)
to fire an actual mdadm command if that would give me useful error
messages.  Having to look through a bundle of documentation for
commands I'm not even interested in, just to try and figure out what
mdadm thinks I'm trying to make it do, is not funny.

There you go -- initial thoughts.  Sorry for all the negativism; I
hope at least it's useful for something constructive!

> Is this an interactive interface where you can hit 'tab' at any time
> and it either completes the current word or lists the options?  (For
> me, that is the 'kermit' interface, as kermit was the first program
> I used which had it.)

Yes.  I'm not saying this is good as a replacement for mdadm's
command-line arguments.  Most programs that have a Kermit-like CLI
can also be executed with the exact same commands that the CLI
accepts, raw on the command line.  So the CLI would be an added
feature to mdadm and would not replace or remove anything, although
"--create" would have to be renamed "create" and so forth.

Maybe the CLI is a bad idea.  It could be that the CLI is a solution
to a symptom, not the problem itself.  Maybe I'm just struggling with
having to type the 'mdadm --help' part of 'mdadm --help blah' far too
many times, because the help screens are not helping me as much as I
would like (and because I reverse the order of --help and --<cmd>).
Maybe I'm just tired of typing a lot of similar
'mdadm <various commands and parameters>' invocations because I don't
understand the syntactic error messages mdadm is giving me, because
it's giving me the wrong messages on account of thinking that I want
to fire a different command than the one I'm really trying to.

I dunno.  The CLI is probably overkill; better help screens can
probably do the trick for me.  And if not, BASH can be extended to
show help screens and complete mdadm commands/parameters :-).
* Re: mdadm 2.1: command line option parsing bug?
  2005-12-15  1:53 ` Molle Bestefich
@ 2005-12-15  4:19   ` Neil Brown
  2005-12-15 10:37     ` Molle Bestefich
  0 siblings, 1 reply; 33+ messages in thread
From: Neil Brown @ 2005-12-15 4:19 UTC (permalink / raw)
To: Molle Bestefich; +Cc: linux-raid

On Thursday December 15, molle.bestefich@gmail.com wrote:
>
> I can, but I suck at putting these things down in writing, which is
> why my initial description was intentionally vague and shallow :-).
> Now I'll try anyway.  It'll be imprecise, I'll miss some things and
> show up late with major points etc., but at least I will have tried =).

Thanks for trying.

> OK.  Starting with the usage output:
>
>   $ mdadm
>   Usage: mdadm --help
>     for help
>
> Nothing wrong with that.  Good, concise stuff.  Great.
>
> The --help output, however...:
>
>   $ mdadm --help
>   Usage: mdadm --create device options...
>          mdadm --assemble device options...
>          mdadm --build device options...
>   ... snip ...
>
> ... I find confusing.

I like the suggestion of adding one-line descriptions to this.
How about:

Usage:
  mdadm --create device options...
            Create a new array from unused devices.
  mdadm --assemble device options...
            reassemble a previously created array
  mdadm --build device options...
            create or assemble an array without metadata
  mdadm --manage device options...
            Make changes to an active array
  mdadm --misc options... devices
            report information or perform miscellaneous tasks.
  mdadm --monitor options...
            monitor one or more arrays and report any changes in status
  mdadm device options...
            same as --manage

> I find it very unhelpful to have a --misc section.  Every time I'm
> looking for some command, besides having to guess which section it's
> located in, I have to check --misc as well, since misc can cover
> anything.

Yes, I can see how that is confusing.
The difference between 'manage' and 'misc' is that manage will only
apply to a single array, while misc can apply to multiple arrays.
This is really a distinction that is mostly relevant in the
implementation.  I should try to hide it in the documentation.

>   ... snip ...
>   mdadm device options...
>   ... snip ...
>
> Where does that syntax suddenly come from?
> Is it a special no-command mode?  What does it do?  Hmm.  Confusing.
>
> And by the way, why mention "device" and "options" for each and every
> section/command set/command when it's obvious that you need more
> parameters to do something fruitful?  In my opinion it would be
> better to rid the general --help screen of them and instead specify
> in brief what functionality the sections are meant to cover.

Well, it is "device options" for all but misc, which has
"options ... devices...", but I agree that it is probably unnecessary
repetition.

> I keep typing mdadm --help --assemble, which unfortunately gets me
> nowhere.  A lot of the time it's because the general help text has
> scrolled out of view because I've just done an '--xxx --help' for
> some other command.  Maybe it's just me, but a minor improvement
> could be to allow '--help --xxx'.

That's a fair comment.  I currently print the help message as soon as
I see the --help option.  But that could change to: if I see
'--help', set a flag, and then for every option print the appropriate
help.  Then

  mdadm --help --size

could even give something useful...

>   ... snip ...
>   For general help on options use
>           mdadm --help-options
>   ... snip ...
>
> Like misc, I think this is an odd section.  Let's explore its
> contents:
>
>   $ mdadm --help-options
>   Any parameter that does not start with '-' is treated as a device name
>   ... snip ...
>
> To me, the above is very interesting information about mdadm's syntax
> and thus should be presented at the start of the general --help
> screen.
>
>   ... snip ...
>   The first such name is often the name of an md device.  Subsequent
>   names are often names of component devices.
>   ... snip ...
>
> This too.
>
> What's with the "often", by the way?

  mdadm --examine /dev/md1 /dev/md2 /dev/md3

Subsequent names are names of other md devices, not component
devices.

> It is somewhat more helpful to me if parameters are an exact science.
> E.g. the above could say, "for commands affecting whole arrays, the
> first device name is the name of the MD device, while subsequent
> device names refer to component devices".  Or preferably something a
> little more concise.
>
>   ... snip ...
>   Some common options are:
>     --help        -h   : General help message or, after above option,
>                          mode specific help message
>     --help-options     : This help message
>   ... snip ...
>
> Those are not interesting at all; we've already used those to get
> here.  And both are mentioned on the general --help screen.  Snip 'em
> if you ask me.

Fair comment.  They are there for completeness, but not really
needed.

> Next on the agenda: I find the messages that I get from mdadm when I
> make syntactic blunders confusing.  I think it comes from the fact
> that mdadm is run like this (for example): 'mdadm -f' instead of
> 'mdadm --manage -f'.  It means that the error message I get when I
> try to do something is way off, because mdadm thinks I'm doing
> something completely different from what I'm actually trying to do.
> I would much prefer having to specify an extra word (e.g. --manage)
> to fire an actual mdadm command if that would give me useful error
> messages.  Having to look through a bundle of documentation for
> commands I'm not even interested in, just to try and figure out what
> mdadm thinks I'm trying to make it do, is not funny.

I'm not sure exactly what you are thinking of here, but I will try
giving mdadm some erroneous arguments and try to improve what it
says.

> There you go -- initial thoughts.
>
> Sorry for all the negativism; I hope at least it's useful for
> something constructive!

Yes, very useful, thanks.

> The CLI is probably overkill; better help screens can probably do
> the trick for me.

Well, let's see if we can improve the help screens first...

(Note: there are other comments in the email that I haven't directly
responded to.  That doesn't mean I disagree or anything, just that I
didn't respond to them.)

Thanks,
NeilBrown
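Neil's "set a flag and print per-option help" idea can be sketched in
a few lines of C.  This is an assumed illustration, not mdadm's
implementation, and it glosses over one wrinkle: options that
normally require an argument (like --size) would have to be relaxed
to optional-argument in help mode so that 'mdadm --help --size'
parses at all.

/* helpflag.c -- hedged sketch of deferred "--help --<option>" help. */
#include <stdio.h>
#include <getopt.h>

static const char *describe(int opt)
{
    switch (opt) {
    case 'A': return "--assemble: reassemble a previously created array";
    case 'C': return "--create: create a new array from unused devices";
    case 's': return "--scan: gather missing details from the config file";
    default:  return "(no help text)";
    }
}

int main(int argc, char *argv[])
{
    static const struct option longopts[] = {
        { "help",     no_argument, NULL, 'h' },
        { "assemble", no_argument, NULL, 'A' },
        { "create",   no_argument, NULL, 'C' },
        { "scan",     no_argument, NULL, 's' },
        { NULL, 0, NULL, 0 }
    };
    int opt, want_help = 0;

    while ((opt = getopt_long(argc, argv, "hACs", longopts, NULL)) != -1) {
        if (opt == 'h') {
            want_help = 1;         /* remember it; keep parsing */
            continue;
        }
        if (want_help)
            puts(describe(opt));   /* "--help --assemble" now works */
        else
            printf("would act on -%c\n", opt);
    }
    return 0;
}

With this scheme, '--help --assemble' and '--assemble --help' both
print the --assemble description, which addresses Molle's complaint
about the argument order.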
* Re: mdadm 2.1: command line option parsing bug?
  2005-12-15  4:19 ` Neil Brown
@ 2005-12-15 10:37   ` Molle Bestefich
  0 siblings, 0 replies; 33+ messages in thread
From: Molle Bestefich @ 2005-12-15 10:37 UTC (permalink / raw)
To: Neil Brown; +Cc: linux-raid

I found myself typing "IMHO" after just about every comment I wrote
up.  I've dropped that, so you'll just have to know that all of this
is IMHO and not an attack on your ways if they happen to be different
^_^.

Neil Brown wrote:
> I like the suggestion of adding one-line descriptions to this.
> How about:

I'll first comment on each command/description (general feedback
later):

> Usage:
>   mdadm --create device options...
>             Create a new array from unused devices.
>   mdadm --assemble device options...
>             reassemble a previously created array

I like 'em!

>   mdadm --build device options...
>             create or assemble an array without metadata

Oh, that's what it's for -- operating without metadata.  Good to
know.  It would help a lot (for me) if the above also had a brief
note about why I would ever want to use --build.  Otherwise I'll just
be confused about when I should use assemble, when I should use
build, and why it even exists.  What's the logic behind having split
create/assemble commands, but a joined command for
creating/assembling when there's no metadata?  (I'm sure there is
one; I'm just confused, as always.)

>   mdadm --manage device options...
>             Make changes to an active array

Nice.

>   mdadm --misc options... devices
>             report information or perform miscellaneous tasks.

Get rid of the misc section, or rename it to something meaningful..

>   mdadm --monitor options...
>             monitor one or more arrays and report any changes in status

Nice.

>   mdadm device options...
>             same as --manage

Oh, so that's what it does.  A bit confusing for a newbie like me
that we're not sticking to one syntax.  Since the above is a side
note about a convenient syntax hack, I think that (in the event that
you find it not confusing [which I do :->] and decide to keep it)
there should at least be a blank line between the very important
--<cmd> descriptions and this rarely relevant note.

General stuff: the note

  * use --help in combination with --<cmd> for further help

is missing.  I think it would be good to retain it.

> > I find it very unhelpful to have a --misc section.  Every time I'm
> > looking for some command, besides having to guess which section it's
> > located in, I have to check --misc as well, since misc can cover
> > anything.
>
> Yes, I can see how that is confusing.
> The difference between 'manage' and 'misc' is that manage will only
> apply to a single array, while misc can apply to multiple arrays.
> This is really a distinction that is mostly relevant in the
> implementation.  I should try to hide it in the documentation.

That would be good :-).  Renaming --misc to something which relates
to the fact that this is about "multi md device" commands, and giving
it an appropriate description line (I don't think your --misc
description was any good, sorry, hehe), would also be *a lot*
better..

> > And by the way, why mention "device" and "options" for each and every
> > section/command set/command when it's obvious that you need more
> > parameters to do something fruitful? [...]
>
> Well, it is "device options" for all but misc, which has
> "options ... devices..."

Yes, OK -- I think that would be explained better by a description
line for misc saying that it's about multiple devices (by the way,
are we talking multiple MD devices or multiple component devices?).

> but I agree that it is probably unnecessary repetition.

It is also completely useless, since there's no mention of what any
of the options /are/, no?  It just adds to confusion, so better to
snip it.

> > I keep typing mdadm --help --assemble, which unfortunately gets me
> > nowhere. [...] Maybe it's just me, but a minor improvement
> > could be to allow '--help --xxx'.
>
> That's a fair comment.  I currently print the help message as soon as
> I see the --help option.  But that could change to: if I see
> '--help', set a flag, and then for every option print the appropriate
> help.

That would be really swell!

> Then
>   mdadm --help --size
> could even give something useful...

Yup :-).  Hmm.  Not bad at all.  Would be good for the extreme newbie
and confused people like me ;).

> > What's with the "often", by the way?
>
>   mdadm --examine /dev/md1 /dev/md2 /dev/md3
>
> Subsequent names are names of other md devices, not component
> devices.

OK.  In that case, I think that if this note were made exact enough
to actually become useful, it would also become too broad to be of
any use to the help reader anyway.  Thus it should be dropped in
favor of similar (but obviously simpler) notes, one per individual
command.

> > Those are not interesting at all; we've already used those to get
> > here.  And both are mentioned on the general --help screen.  Snip 'em
> > if you ask me.
>
> Fair comment.  They are there for completeness, but not really
> needed.

Yes, OK, it's a good principle to document everything.  But stuffing
a lot of unrelated options into --help-options (or --misc, for that
matter) tastes like "not enough thought has gone into designing the
categories and commands".

Don't take it as a criticism of your work -- making this stuff right
takes a lot of hard thought that could otherwise be used for
something immediately productive.  And even when a lot of thought has
gone into it, it still takes:
 * a bunch of other clever people to look at it and bring up fresh
   suggestions, and
 * real tests with real users trying to do real things while someone
   makes a transcript
before it becomes really good.  So don't take it like I'm criticizing
your hard work -- I'm not!

Dropping --help-options entirely and moving the useful parts to the
front --help page would be right, in my mind.  If you want
suggestions as to where the rest of the commands should go, I can try
to suggest something helpful.

> > Next on the agenda: I find the messages that I get from mdadm when
> > I make syntactic blunders confusing. [...]
>
> I'm not sure exactly what you are thinking of here, but I will try
> giving mdadm some erroneous arguments and try to improve what it
> says.

I can't remember -- it's been a while since I last used it.  I tend
to use it a lot, but only for brief periods at a time.

I think I was having trouble making mdadm output useful error
messages for --run.  I'm not sure whether I gave it devices which
didn't exist, wrong options, or what I did.  Sorry I can't be more
helpful here.
* Re: mdadm 2.1: command line option parsing bug?
  2005-11-21 23:21 ` Neil Brown
  2005-11-22 11:21 ` Michael Tokarev
  2005-11-22 15:41 ` Molle Bestefich
@ 2005-11-22 22:05 ` Andre Noll
  2005-11-26 14:04   ` RAID0 performance question JaniD++
  2 siblings, 1 reply; 33+ messages in thread
From: Andre Noll @ 2005-11-22 22:05 UTC (permalink / raw)
To: Neil Brown; +Cc: Andreas Haumer, linux-raid

On 10:21, Neil Brown wrote:
> -a has the same problem (--add vs --auto).
>
> I'll see what I can do.

gengetopt?  (http://www.gnu.org/software/gengetopt/)

Andre
--
Jesus not only saves, he also frequently makes backups
* RAID0 performance question
  2005-11-22 22:05 ` Andre Noll
@ 2005-11-26 14:04 ` JaniD++
  2005-11-26 15:56   ` Raz Ben-Jehuda(caro)
  2005-11-26 23:27   ` Neil Brown
  0 siblings, 2 replies; 33+ messages in thread
From: JaniD++ @ 2005-11-26 14:04 UTC (permalink / raw)
To: linux-raid

Hello list,

I have been searching for the bottleneck of my system, and found
something I can't quite understand.

I use NBD with 4 disk nodes (the raidtab is at the bottom of this
mail).

cat /dev/nb# >/dev/null makes ~350 Mbit/s on each node.
cat /dev/nb0 + nb1 + nb2 + nb3 at the same time, in parallel, makes
~780-800 Mbit/s -- I think this is my network bottleneck.

But cat /dev/md31 >/dev/null (RAID0, the sum of the 4 nodes) only
makes ~450-490 Mbit/s, and I don't know why....

Does somebody have an idea? :-)

(nb31, 30, 29 and 28 are only possible future mirrors)

Thanks,
Janos

raiddev /dev/md1
        raid-level              1
        nr-raid-disks           2
        chunk-size              32
        persistent-superblock   1
        device                  /dev/nb0
        raid-disk               0
        device                  /dev/nb31
        raid-disk               1
        failed-disk             /dev/nb31

raiddev /dev/md2
        raid-level              1
        nr-raid-disks           2
        chunk-size              32
        persistent-superblock   1
        device                  /dev/nb1
        raid-disk               0
        device                  /dev/nb30
        raid-disk               1
        failed-disk             /dev/nb30

raiddev /dev/md3
        raid-level              1
        nr-raid-disks           2
        chunk-size              32
        persistent-superblock   1
        device                  /dev/nb2
        raid-disk               0
        device                  /dev/nb29
        raid-disk               1
        failed-disk             /dev/nb29

raiddev /dev/md4
        raid-level              1
        nr-raid-disks           2
        chunk-size              32
        persistent-superblock   1
        device                  /dev/nb3
        raid-disk               0
        device                  /dev/nb28
        raid-disk               1
        failed-disk             /dev/nb28

raiddev /dev/md31
        raid-level              0
        nr-raid-disks           4
        chunk-size              32
        persistent-superblock   1
        device                  /dev/md1
        raid-disk               0
        device                  /dev/md2
        raid-disk               1
        device                  /dev/md3
        raid-disk               2
        device                  /dev/md4
        raid-disk               3
* Re: RAID0 performance question
  2005-11-26 14:04 ` RAID0 performance question JaniD++
@ 2005-11-26 15:56   ` Raz Ben-Jehuda(caro)
  2005-11-26 16:08     ` JaniD++
  0 siblings, 1 reply; 33+ messages in thread
From: Raz Ben-Jehuda(caro) @ 2005-11-26 15:56 UTC (permalink / raw)
To: JaniD++; +Cc: linux-raid

Look at the CPU consumption.

On 11/26/05, JaniD++ <djani22@dynamicweb.hu> wrote:
> Hello list,
>
> I have been searching for the bottleneck of my system, and found
> something I can't quite understand.
>
> I use NBD with 4 disk nodes (the raidtab is at the bottom of this
> mail).
>
> cat /dev/nb# >/dev/null makes ~350 Mbit/s on each node.
> cat /dev/nb0 + nb1 + nb2 + nb3 at the same time, in parallel, makes
> ~780-800 Mbit/s -- I think this is my network bottleneck.
>
> But cat /dev/md31 >/dev/null (RAID0, the sum of the 4 nodes) only
> makes ~450-490 Mbit/s, and I don't know why....
>
> Does somebody have an idea? :-)
[... quoted raidtab snipped ...]

--
Raz
* Re: RAID0 performance question
  2005-11-26 15:56 ` Raz Ben-Jehuda(caro)
@ 2005-11-26 16:08   ` JaniD++
  2005-11-26 17:11     ` Lajber Zoltan
  0 siblings, 1 reply; 33+ messages in thread
From: JaniD++ @ 2005-11-26 16:08 UTC (permalink / raw)
To: Raz Ben-Jehuda(caro); +Cc: linux-raid

Hello, Raz,

I don't think this is a CPU usage problem. :-)
The system is divided into 4 cpusets, and each cpuset uses only one
disk node.  (CPU0->nb0, CPU1->nb1, ...)

This top output was taken under cat /dev/md31 (RAID0):

Thanks,
Janos

 17:16:01  up 14:19,  4 users,  load average: 7.74, 5.03, 4.20
305 processes: 301 sleeping, 4 running, 0 zombie, 0 stopped
CPU0 states: 33.1% user 47.0% system 0.0% nice 0.0% iowait 18.0% idle
CPU1 states: 21.0% user 52.0% system 0.0% nice 6.0% iowait 19.0% idle
CPU2 states:  2.0% user 74.0% system 0.0% nice 3.0% iowait 18.0% idle
CPU3 states: 10.0% user 57.0% system 0.0% nice 5.0% iowait 26.0% idle
Mem: 4149412k av, 3961084k used, 188328k free, 0k shrd, 557032k buff
     911068k active, 2881680k inactive
Swap: 0k av, 0k used, 0k free                        2779388k cached

  PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME CPU COMMAND
 2410 root       0 -19  1584  108    36 S <  48.3  0.0  21:57   3 nbd-client
16191 root      25   0  4832  820   664 R    48.3  0.0   3:04   0 grep
 2408 root       0 -19  1588  112    36 S <  47.3  0.0  24:05   2 nbd-client
 2406 root       0 -19  1584  108    36 S <  40.8  0.0  22:56   1 nbd-client
18126 root      18   0  5780 1604   508 D    38.0  0.0   0:12   1 dd
 2404 root       0 -19  1588  112    36 S <  36.2  0.0  22:56   0 nbd-client
  294 root      15   0     0    0     0 SW    7.4  0.0   3:22   1 kswapd0
 2284 root      16   0 13500 5376  3040 S     7.4  0.1   8:53   2 httpd
18307 root      16   0  6320 2232  1432 S     4.6  0.0   0:00   2 sendmail
16789 root      16   0  5472 1552   952 R     3.7  0.0   0:03   3 top
 2431 root      10  -5     0    0     0 SW<   2.7  0.0   7:32   2 md2_raid1
29076 root      17   0  4776  772   680 S     2.7  0.0   1:09   3 xfs_fsr
 6955 root      15   0  1588  108    36 S     2.7  0.0   0:56   2 nbd-client

----- Original Message -----
From: "Raz Ben-Jehuda(caro)" <raziebe@gmail.com>
To: "JaniD++" <djani22@dynamicweb.hu>
Cc: <linux-raid@vger.kernel.org>
Sent: Saturday, November 26, 2005 4:56 PM
Subject: Re: RAID0 performance question

> Look at the CPU consumption.
>
> On 11/26/05, JaniD++ <djani22@dynamicweb.hu> wrote:
> > Hello list,
> >
> > I have been searching for the bottleneck of my system, and found
> > something I can't quite understand.
[... rest of quoted message and raidtab snipped ...]
* Re: RAID0 performance question
  2005-11-26 16:08 ` JaniD++
@ 2005-11-26 17:11   ` Lajber Zoltan
  2005-11-26 17:34     ` JaniD++
  0 siblings, 1 reply; 33+ messages in thread
From: Lajber Zoltan @ 2005-11-26 17:11 UTC (permalink / raw)
To: JaniD++; +Cc: linux-raid

On Sat, 26 Nov 2005, JaniD++ wrote:

> Hello, Raz,
>
> I don't think this is a CPU usage problem. :-)
> The system is divided into 4 cpusets, and each cpuset uses only one
> disk node.  (CPU0->nb0, CPU1->nb1, ...)

It seems to be a CPU problem.  Which kind of NIC do you have?

> CPU2 states:  2.0% user 74.0% system 0.0% nice 3.0% iowait 18.0% idle
> CPU3 states: 10.0% user 57.0% system 0.0% nice 5.0% iowait 26.0% idle

Do you have 4 CPUs, or 2 HT CPUs?

Bye,
-=Lajbi=----------------------------------------------------------------
 LAJBER Zoltan   Szent Istvan Egyetem, Informatika Hivatal
 Most of the time, if you think you are in trouble, crank that throttle!
* Re: RAID0 performance question
  2005-11-26 17:11 ` Lajber Zoltan
@ 2005-11-26 17:34   ` JaniD++
  2005-11-26 19:47     ` Lajber Zoltan
  0 siblings, 1 reply; 33+ messages in thread
From: JaniD++ @ 2005-11-26 17:34 UTC (permalink / raw)
To: Lajber Zoltan; +Cc: linux-raid

Hello, Zoltán!

----- Original Message -----
From: "Lajber Zoltan" <lajbi@lajli.gau.hu>
To: "JaniD++" <djani22@dynamicweb.hu>
Cc: <linux-raid@vger.kernel.org>
Sent: Saturday, November 26, 2005 6:11 PM
Subject: Re: RAID0 performance question

> It seems to be a CPU problem.  Which kind of NIC do you have?

An Intel Xeon motherboard with two Intel e1000s (64-bit).

But as I already wrote: if I cut out the RAID layer and start the 4
cats at the same time, the traffic rises to 780-800 Mbit!  :-)

This is not a hardware-related problem, only a tuning or
misconfiguration problem -- I think...

> Do you have 4 CPUs, or 2 HT CPUs?

2x HT :-)
But the previous top was taken on a system under real use!

Thanks,
Janos
* Re: RAID0 performance question
  2005-11-26 17:34 ` JaniD++
@ 2005-11-26 19:47   ` Lajber Zoltan
  0 siblings, 0 replies; 33+ messages in thread
From: Lajber Zoltan @ 2005-11-26 19:47 UTC (permalink / raw)
To: JaniD++; +Cc: linux-raid

Hi,

If you don't speak Hungarian, forget this sentence: do you speak
Hungarian?  Then we can continue that way, too.

On Sat, 26 Nov 2005, JaniD++ wrote:

> An Intel Xeon motherboard with two Intel e1000s (64-bit).
> But as I already wrote: if I cut out the RAID layer and start the 4
> cats at the same time, the traffic rises to 780-800 Mbit!  :-)
>
> This is not a hardware-related problem, only a tuning or
> misconfiguration problem -- I think...

What is in /proc/interrupts?  Are the interrupts distributed over the
CPUs, or do all IRQs go to one CPU?  What about if you switch off HT?

Bye,
-=Lajbi=----------------------------------------------------------------
 LAJBER Zoltan   Szent Istvan Egyetem, Informatika Hivatal
 Most of the time, if you think you are in trouble, crank that throttle!
* Re: RAID0 performance question
  2005-11-26 14:04 ` RAID0 performance question JaniD++
  2005-11-26 15:56 ` Raz Ben-Jehuda(caro)
@ 2005-11-26 23:27 ` Neil Brown
  2005-11-26 23:37   ` JaniD++
  2005-11-27 15:39   ` Al Boldi
  1 sibling, 2 replies; 33+ messages in thread
From: Neil Brown @ 2005-11-26 23:27 UTC (permalink / raw)
To: JaniD++; +Cc: linux-raid

On Saturday November 26, djani22@dynamicweb.hu wrote:
> Hello list,
>
> I have been searching for the bottleneck of my system, and found
> something I can't quite understand.
>
> I use NBD with 4 disk nodes (the raidtab is at the bottom of this
> mail).
>
> cat /dev/nb# >/dev/null makes ~350 Mbit/s on each node.
> cat /dev/nb0 + nb1 + nb2 + nb3 at the same time, in parallel, makes
> ~780-800 Mbit/s -- I think this is my network bottleneck.
>
> But cat /dev/md31 >/dev/null (RAID0, the sum of the 4 nodes) only
> makes ~450-490 Mbit/s, and I don't know why....
>
> Does somebody have an idea? :-)

Try increasing the read-ahead setting on /dev/md31 using 'blockdev'.
Network block devices are likely to have latency issues and would
benefit from a large read-ahead.

NeilBrown
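For what it's worth, 'blockdev --getra' and 'blockdev --setra' are
thin wrappers around the block-layer read-ahead ioctls, so the same
tuning can be done programmatically.  A minimal sketch follows; the
device path and the 8192-sector value are only examples, and setting
read-ahead requires root:

/* setra.c -- sketch of what 'blockdev --getra/--setra' do underneath. */
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/fs.h>           /* BLKRAGET, BLKRASET */

int main(void)
{
    long ra = 0;
    int fd = open("/dev/md31", O_RDONLY);   /* example device */

    if (fd < 0) { perror("open"); return 1; }
    if (ioctl(fd, BLKRAGET, &ra) == 0)
        printf("current read-ahead: %ld sectors\n", ra);
    /* Read-ahead is counted in 512-byte sectors: 8192 = 4 MiB. */
    if (ioctl(fd, BLKRASET, 8192UL) != 0)
        perror("BLKRASET");
    close(fd);
    return 0;
}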
* Re: RAID0 performance question
  2005-11-26 23:27 ` Neil Brown
@ 2005-11-26 23:37   ` JaniD++
  0 siblings, 0 replies; 33+ messages in thread
From: JaniD++ @ 2005-11-26 23:37 UTC (permalink / raw)
To: Neil Brown; +Cc: linux-raid

Hello Neil,

This is one part of my init script:

blockdev --setra 2048 /dev/nb0
blockdev --setra 2048 /dev/nb1
blockdev --setra 2048 /dev/nb2
blockdev --setra 2048 /dev/nb3

blockdev --setra 2048 /dev/md1
blockdev --setra 2048 /dev/md2
blockdev --setra 2048 /dev/md3
blockdev --setra 2048 /dev/md4

blockdev --setra 4096 /dev/md31

:-)

This is the "default" for me; the test was made with these settings.
The problem is somewhere else....

Thanks,
Janos

----- Original Message -----
From: "Neil Brown" <neilb@suse.de>
To: "JaniD++" <djani22@dynamicweb.hu>
Cc: <linux-raid@vger.kernel.org>
Sent: Sunday, November 27, 2005 12:27 AM
Subject: Re: RAID0 performance question

> Try increasing the read-ahead setting on /dev/md31 using 'blockdev'.
> Network block devices are likely to have latency issues and would
> benefit from a large read-ahead.
* Re: RAID0 performance question
  2005-11-26 23:27 ` Neil Brown
  2005-11-26 23:37 ` JaniD++
@ 2005-11-27 15:39 ` Al Boldi
  2005-11-27 16:21   ` JaniD++
  0 siblings, 1 reply; 33+ messages in thread
From: Al Boldi @ 2005-11-27 15:39 UTC (permalink / raw)
To: JaniD++; +Cc: linux-raid, Neil Brown

Neil Brown wrote:
> On Saturday November 26, djani22@dynamicweb.hu wrote:
> >
> > cat /dev/nb# >/dev/null makes ~350 Mbit/s on each node.

Why is this so slow?  Or is this the max node-HD throughput?
What's the node HW config?

> > cat /dev/nb0 + nb1 + nb2 + nb3 at the same time, in parallel,
> > makes ~780-800 Mbit/s -- I think this is my network bottleneck.

How much do you get w/ nb0+1,2,3 and nb0+1+2,3 respectively?

> > But cat /dev/md31 >/dev/null (RAID0, the sum of the 4 nodes) only
> > makes ~450-490 Mbit/s, and I don't know why....
> >
> > Does somebody have an idea? :-)
>
> Try increasing the read-ahead setting on /dev/md31 using 'blockdev'.
> Network block devices are likely to have latency issues and would
> benefit from a large read-ahead.

Also try a larger chunk size, ~4 MB.

--
Al
* Re: RAID0 performance question
  2005-11-27 15:39 ` Al Boldi
@ 2005-11-27 16:21   ` JaniD++
  2005-11-27 17:40     ` Al Boldi
  0 siblings, 1 reply; 33+ messages in thread
From: JaniD++ @ 2005-11-27 16:21 UTC (permalink / raw)
To: Al Boldi; +Cc: linux-raid

Hi,

----- Original Message -----
From: "Al Boldi" <a1426z@gawab.com>
To: "JaniD++" <djani22@dynamicweb.hu>
Cc: <linux-raid@vger.kernel.org>; "Neil Brown" <neilb@suse.de>
Sent: Sunday, November 27, 2005 4:39 PM
Subject: Re: RAID0 performance question

> > > cat /dev/nb# >/dev/null makes ~350 Mbit/s on each node.
>
> Why is this so slow?

Yes. :-)
This system tries to serve about 500-800 downloads; the final goal
is 1k!  ;-)

And this is exactly why I am asking the list: the hardware
performance is much higher without the RAID0 layer. :-/
The 780-800 Mbit is "almost" enough for 1k downloaders.

> Or is this the max node-HD throughput?

Yes.

> What's the node HW config?

P4 3 GHz with HT, 12x 200 GB HDD (10 IDE + 2 SATA), 2 GB RAM,
Realtek GigE.  RAID5 inside!  ;-)

> > > cat /dev/nb0 + nb1 + nb2 + nb3 at the same time, in parallel,
> > > makes ~780-800 Mbit/s -- I think this is my network bottleneck.
>
> How much do you get w/ nb0+1,2,3 and nb0+1+2,3 respectively?

I'm unable to test writes, because this is a production system.
I think the value is about 75-80 per node and ~200-250 on md31.

> > Try increasing the read-ahead setting on /dev/md31 using 'blockdev'.
> > Network block devices are likely to have latency issues and would
> > benefit from a large read-ahead.
>
> Also try a larger chunk size, ~4 MB.

Ahh.  This is what I can't do. :-(
I don't know how to back up 8 TB!  ;-)

Thanks,
Janos
* Re: RAID0 performance question
  2005-11-27 16:21                 ` JaniD++
@ 2005-11-27 17:40                   ` Al Boldi
  2005-11-27 19:02                     ` JaniD++
  2005-11-30 23:13                     ` JaniD++
  0 siblings, 2 replies; 33+ messages in thread
From: Al Boldi @ 2005-11-27 17:40 UTC (permalink / raw)
  To: JaniD++; +Cc: linux-raid

JaniD++ wrote:
> Al Boldi wrote:
> > Neil Brown wrote:
> > > On Saturday November 26, djani22@dynamicweb.hu wrote:
> > > > cat /dev/nb# >/dev/null makes ~350 Mbit/s on each node.
> > > > cat on /dev/nb0 + nb1 + nb2 + nb3 in parallel makes ~780-800
> > > > Mbit/s - I think this is my network bottleneck.
> >
> > How much do you get w/ nb0+1,2,3 and nb0+1+2,3 respectively?
>
> I am unable to test writes, because this is a production system.
> I think this value is about 75-80 per node and ~200-250 on md31.

How much do you get with:
  cat nb# + nb# > /dev/null
  cat nb# + nb# + nb# > /dev/null
respectively?

> > > > But cat /dev/md31 >/dev/null (RAID0, the sum of the 4 nodes) only
> > > > makes ~450-490 Mbit/s, and I don't know why...
> > > >
> > > > Does somebody have an idea? :-)
> > >
> > > Try increasing the read-ahead setting on /dev/md31 using 'blockdev'.
> > > Network block devices are likely to have latency issues and would
> > > benefit from large read-ahead.
> >
> > Also try a larger chunk-size, ~4MB.
>
> Ahh.
> This is what I can't do. :-(
> I don't know how to back up 8TB! ;-)

Maybe you could use your mirror!?

--
Al

^ permalink raw reply	[flat|nested] 33+ messages in thread
* Re: RAID0 performance question
  2005-11-27 17:40                   ` Al Boldi
@ 2005-11-27 19:02                     ` JaniD++
  0 siblings, 0 replies; 33+ messages in thread
From: JaniD++ @ 2005-11-27 19:02 UTC (permalink / raw)
  To: Al Boldi; +Cc: linux-raid, Neil Brown

----- Original Message -----
From: "Al Boldi" <a1426z@gawab.com>
To: "JaniD++" <djani22@dynamicweb.hu>
Cc: <linux-raid@vger.kernel.org>
Sent: Sunday, November 27, 2005 6:40 PM
Subject: Re: RAID0 performance question

> JaniD++ wrote:
> > Al Boldi wrote:
> > > Neil Brown wrote:
> > > > On Saturday November 26, djani22@dynamicweb.hu wrote:
> > > > > cat /dev/nb# >/dev/null makes ~350 Mbit/s on each node.
> > > > > cat on /dev/nb0 + nb1 + nb2 + nb3 in parallel makes ~780-800
> > > > > Mbit/s - I think this is my network bottleneck.
> > >
> > > How much do you get w/ nb0+1,2,3 and nb0+1+2,3 respectively?
> >
> > I am unable to test writes, because this is a production system.
> > I think this value is about 75-80 per node and ~200-250 on md31.
>
> How much do you get with:
>   cat nb# + nb# > /dev/null
>   cat nb# + nb# + nb# > /dev/null
> respectively?

md1 = 280-291 Mbit
md1+md2 = 450-480 Mbit
md1+md2+md3 = 615-630 Mbit
md1+md2+md3+md4 = now the peak is 674 Mbit...

...on a lightly used online system. (~44 Mbit download + ~60 Mbit upload)
This time I used dd if=/dev/md1 of=/dev/null bs=1M count=4096.
I think this is normal.

I have tried md31 with different readahead settings:

1. nb0,1,2,3 + md1,2,3,4 readahead = 0 and md31 readahead = 4096
   result: 380 Mbit
2. all readahead = 0
   result: 88-98 Mbit
3. nb0,1,2,3 + md1,2,3,4 readahead = 2048 and md31 readahead = 0
   result: 88-96 Mbit - I wonder! :-O
4. nb# + md# readahead = 0 and md31 readahead = 8192
   result: 96-114 Mbit

The winner is my default profile :-D
(all at 2048 and md31 = 4096)
result: 403-423 Mbit

Neil! What do you say? :-)

> > > > > But cat /dev/md31 >/dev/null (RAID0, the sum of the 4 nodes) only
> > > > > makes ~450-490 Mbit/s, and I don't know why...
> > > > >
> > > > > Does somebody have an idea? :-)
> > > >
> > > > Try increasing the read-ahead setting on /dev/md31 using 'blockdev'.
> > > > Network block devices are likely to have latency issues and would
> > > > benefit from large read-ahead.
> > >
> > > Also try a larger chunk-size, ~4MB.
> >
> > Ahh.
> > This is what I can't do. :-(
> > I don't know how to back up 8TB! ;-)
>
> Maybe you could use your mirror!?

There are no mirrors! That is only a future option, to make it easy to
replace or repair one node...

Thanks,
Janos

> --
> Al

^ permalink raw reply	[flat|nested] 33+ messages in thread
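A readahead sweep like the one above can be automated; a sketch, assuming
a GNU dd that prints a throughput summary on stderr (and a quiet moment on
the production load while each pass runs):

    for ra in 0 2048 4096 8192; do
        blockdev --setra $ra /dev/md31
        echo "md31 readahead = $ra sectors:"
        dd if=/dev/md31 of=/dev/null bs=1M count=4096 2>&1 | tail -n 1
    done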
* Re: RAID0 performance question
  2005-11-27 17:40                   ` Al Boldi
  2005-11-27 19:02                     ` JaniD++
@ 2005-11-30 23:13                     ` JaniD++
  2005-12-02 19:53                       ` Al Boldi
  1 sibling, 1 reply; 33+ messages in thread
From: JaniD++ @ 2005-11-30 23:13 UTC (permalink / raw)
  To: Al Boldi; +Cc: linux-raid

Hello,

> > > > > But cat /dev/md31 >/dev/null (RAID0, the sum of the 4 nodes) only
> > > > > makes ~450-490 Mbit/s, and I don't know why...
> > > > >
> > > > > Does somebody have an idea? :-)
> > > >
> > > > Try increasing the read-ahead setting on /dev/md31 using 'blockdev'.
> > > > Network block devices are likely to have latency issues and would
> > > > benefit from large read-ahead.
> > >
> > > Also try a larger chunk-size, ~4MB.
> >
> > Ahh.
> > This is what I can't do. :-(
> > I don't know how to back up 8TB! ;-)
>
> Maybe you could use your mirror!?

I have one idea! :-)
I can use the spare drives in the disk nodes! :-)

But I don't know exactly what to try:
increase or decrease the chunk size?
In the top layer raid (md31, raid0), in the middle layer raids (md1-4,
raid1), or in both?

Can somebody help me find the source of the performance problem?

Thanks,
Janos

> --
> Al

^ permalink raw reply	[flat|nested] 33+ messages in thread
* Re: RAID0 performance question
  2005-11-30 23:13                     ` JaniD++
@ 2005-12-02 19:53                       ` Al Boldi
  2005-12-18  0:13                         ` JaniD++
  0 siblings, 1 reply; 33+ messages in thread
From: Al Boldi @ 2005-12-02 19:53 UTC (permalink / raw)
  To: JaniD++; +Cc: linux-raid

JaniD++ wrote:
> > > > > > But cat /dev/md31 >/dev/null (RAID0, the sum of the 4 nodes)
> > > > > > only makes ~450-490 Mbit/s, and I don't know why...
> > > > > >
> > > > > > Does somebody have an idea? :-)
> > > > >
> > > > > Try increasing the read-ahead setting on /dev/md31 using
> > > > > 'blockdev'. Network block devices are likely to have latency
> > > > > issues and would benefit from large read-ahead.
> > > >
> > > > Also try a larger chunk-size, ~4MB.
>
> But I don't know exactly what to try:
> increase or decrease the chunk size?
> In the top layer raid (md31, raid0), in the middle layer raids (md1-4,
> raid1), or in both?

What I found is that raid over nbd is highly max-chunksize dependent, due
to nbd running over TCP. But increasing the chunksize does not necessarily
mean better system utilization. Much depends on your application request
size.

Tuning performance to maximize cat/dd /dev/md# throughput is only suitable
as a synthetic indication of overall performance in system comparisons.

If your aim is to increase system utilization, then look for a good
benchmark specific to your application requirements which would mimic a
realistic load.

--
Al

^ permalink raw reply	[flat|nested] 33+ messages in thread
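For a download server, a closer approximation than one linear cat is many
concurrent readers at scattered offsets. A sketch (reader count, block
size and spacing are made-up values; skip counts in units of bs, so each
reader here starts roughly 6.5GB apart):

    for i in $(seq 0 31); do
        dd if=/dev/md31 of=/dev/null bs=64k count=4096 skip=$((i * 100000)) &
    done
    wait    # 32 parallel sequential streams, ~256MB each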
* Re: RAID0 performance question
  2005-12-02 19:53                       ` Al Boldi
@ 2005-12-18  0:13                         ` JaniD++
  2005-12-19 11:16                           ` Al Boldi
  2005-12-21  1:40                           ` Neil Brown
  0 siblings, 2 replies; 33+ messages in thread
From: JaniD++ @ 2005-12-18  0:13 UTC (permalink / raw)
  To: Al Boldi; +Cc: linux-raid

----- Original Message -----
From: "Al Boldi" <a1426z@gawab.com>
To: "JaniD++" <djani22@dynamicweb.hu>
Cc: <linux-raid@vger.kernel.org>
Sent: Friday, December 02, 2005 8:53 PM
Subject: Re: RAID0 performance question

> JaniD++ wrote:
> > > > > > > But cat /dev/md31 >/dev/null (RAID0, the sum of the 4 nodes)
> > > > > > > only makes ~450-490 Mbit/s, and I don't know why...
> > > > > > >
> > > > > > > Does somebody have an idea? :-)
> > > > > >
> > > > > > Try increasing the read-ahead setting on /dev/md31 using
> > > > > > 'blockdev'. Network block devices are likely to have latency
> > > > > > issues and would benefit from large read-ahead.
> > > > >
> > > > > Also try a larger chunk-size, ~4MB.
> >
> > But I don't know exactly what to try:
> > increase or decrease the chunk size?
> > In the top layer raid (md31, raid0), in the middle layer raids (md1-4,
> > raid1), or in both?
>
> What I found is that raid over nbd is highly max-chunksize dependent, due
> to nbd running over TCP. But increasing the chunksize does not
> necessarily mean better system utilization. Much depends on your
> application request size.
>
> Tuning performance to maximize cat/dd /dev/md# throughput is only
> suitable as a synthetic indication of overall performance in system
> comparisons.

Yes, you are right! I already knew that. ;-)
But the bottleneck effect is visible with dd/cat too.
(and I am a little bit lazy :-)

Now I have tried the system with my spare drives, with the bigger chunk
size (=4096K on the RAID0 and all the RAID1s), and the slowness is still
here. :(
The problem is _exactly_ the same as before.
I think it is unnecessary to try a smaller chunk size, because 32k is
already small for a 2.5-8MB readahead.

The problem is somewhere else... :-/

I have got one (or more) questions for the raid list!

Why doesn't the raid (md) device have a scheduler in sysfs?
And if it has a scheduler, where can I tune it?

Can raid0 handle multiple requests at one time?

For me, the performance bottleneck is clearly in the RAID0 layer, used
exactly as a "concentrator" to join the 4x2TB into 1x8TB.
But it is only software, and I can't believe it is not fixable or
tunable. ;-)

Cheers,
Janos

> If your aim is to increase system utilization, then look for a good
> benchmark specific to your application requirements which would mimic a
> realistic load.
>
> --
> Al
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 33+ messages in thread
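On the scheduler question: I/O scheduler tunables live on the underlying
disk queues, not on md devices. A sketch of where to look on one of the
disk nodes (the device name is an example; switching elevators at runtime
needs a kernel that supports it):

    cat /sys/block/hda/queue/scheduler     # e.g. "noop [anticipatory] deadline cfq"
    echo deadline > /sys/block/hda/queue/scheduler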
* Re: RAID0 performance question
  2005-12-18  0:13                         ` JaniD++
@ 2005-12-19 11:16                           ` Al Boldi
  2005-11-22  1:14                             ` JaniD++
  2005-11-23 10:48                             ` JaniD++
  1 sibling, 2 replies; 33+ messages in thread
From: Al Boldi @ 2005-12-19 11:16 UTC (permalink / raw)
  To: JaniD++; +Cc: linux-raid

JaniD++ wrote:
> For me, the performance bottleneck is clearly in the RAID0 layer, used
> exactly as a "concentrator" to join the 4x2TB into 1x8TB.

Did you try running RAID0 over nbd directly and find it to be faster?

IIRC, stacking raid modules does need a considerable amount of tuning, and
even then it does not scale linearly.

Maybe NeilBrown can help?

--
Al

^ permalink raw reply	[flat|nested] 33+ messages in thread
* Re: RAID0 performance question
  2005-12-19 11:16                           ` Al Boldi
@ 2005-11-22  1:14                             ` JaniD++
  0 siblings, 0 replies; 33+ messages in thread
From: JaniD++ @ 2005-11-22  1:14 UTC (permalink / raw)
  To: Al Boldi; +Cc: linux-raid

----- Original Message -----
From: "Al Boldi" <a1426z@gawab.com>
To: "JaniD++" <djani22@dynamicweb.hu>
Cc: <linux-raid@vger.kernel.org>
Sent: Monday, December 19, 2005 12:16 PM
Subject: Re: RAID0 performance question

> JaniD++ wrote:
> > For me, the performance bottleneck is clearly in the RAID0 layer, used
> > exactly as a "concentrator" to join the 4x2TB into 1x8TB.
>
> Did you try running RAID0 over nbd directly and find it to be faster?

At this time I cannot test it, because the system is loaded and the result
would be false.
Anyway, I will try this...

> IIRC, stacking raid modules does need a considerable amount of tuning,
> and even then it does not scale linearly.
>
> Maybe NeilBrown can help?

Maybe, but it looks like Neil is not interested. :-(
And the bigger problem is, I plan to modify the entire system structure,
and there will not be spare drives for testing soon... :-/

Cheers,
Janos

^ permalink raw reply	[flat|nested] 33+ messages in thread
* Re: RAID0 performance question
  2005-12-19 11:16                           ` Al Boldi
  2005-11-22  1:14                             ` JaniD++
@ 2005-11-23 10:48                             ` JaniD++
  0 siblings, 0 replies; 33+ messages in thread
From: JaniD++ @ 2005-11-23 10:48 UTC (permalink / raw)
  To: Al Boldi; +Cc: linux-raid

----- Original Message -----
From: "Al Boldi" <a1426z@gawab.com>
To: "JaniD++" <djani22@dynamicweb.hu>
Cc: <linux-raid@vger.kernel.org>
Sent: Monday, December 19, 2005 12:16 PM
Subject: Re: RAID0 performance question

> JaniD++ wrote:
> > For me, the performance bottleneck is clearly in the RAID0 layer, used
> > exactly as a "concentrator" to join the 4x2TB into 1x8TB.
>
> Did you try running RAID0 over nbd directly and find it to be faster?

Now I am trying NBD + RAID0 without the middle layer raid1.
Surprisingly, the speed is much better!

If I use dd on the raid0 while the raid1 layer is active, the traffic is
350-400 Mbit/s.
If I use dd on the raid0 while the raid1 layer is inactive, the traffic is
512-620 Mbit/s!
If I use parallel dd on all the NBD devices, the traffic is 650-720 Mbit/s.
(at my system's current minimal load)

I also find it very interesting that the kernel never reports timing
information for raids in the /sys/block/mdX/stat files!
I cannot understand that...

Cheers,
Janos

^ permalink raw reply	[flat|nested] 33+ messages in thread
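The stat file format is the generic block-layer one; md fills in the I/O
and sector counts but, as observed above, leaves the timing fields at
zero. A sketch of reading it (field meanings as documented in the kernel's
Documentation/block tree):

    cat /sys/block/md31/stat
    # fields: reads, read merges, sectors read, read ticks (ms),
    #         writes, write merges, sectors written, write ticks (ms),
    #         in-flight, io ticks (ms), time in queue (ms)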
* Re: RAID0 performance question
  2005-12-18  0:13                         ` JaniD++
  2005-12-19 11:16                           ` Al Boldi
@ 2005-12-21  1:40                           ` Neil Brown
  2005-11-22  1:56                             ` JaniD++
  1 sibling, 1 reply; 33+ messages in thread
From: Neil Brown @ 2005-12-21  1:40 UTC (permalink / raw)
  To: JaniD++; +Cc: Al Boldi, linux-raid

On Sunday December 18, djani22@dynamicweb.hu wrote:
>
> Why doesn't the raid (md) device have a scheduler in sysfs?
> And if it has a scheduler, where can I tune it?

raid0 doesn't do any scheduling.
All it does is take requests from the filesystem, decide which device
they should go to (possibly splitting them if needed) and forward them
on to that device. That is all.

> Can raid0 handle multiple requests at one time?

Yes. But raid0 doesn't exactly 'handle' requests. It 'directs'
requests for other devices to 'handle'.

> For me, the performance bottleneck is clearly in the RAID0 layer, used
> exactly as a "concentrator" to join the 4x2TB into 1x8TB.
> But it is only software, and I can't believe it is not fixable or
> tunable.

There is really nothing to tune apart from chunksize.

You can tune the way the filesystem/vm accesses the device by setting
readahead (readahead on component devices of a raid0 has exactly 0
effect).

You can tune the underlying devices by choosing a scheduler (for a
disk drive) or a packet size (for over-the-network devices) or
whatever.

But there is nothing to tune in raid0.

Also, rather than doing measurements on the block devices (/dev/mdX),
do measurements on a filesystem created on that device.
I have often found that the filesystem goes faster than the block
device.

NeilBrown

^ permalink raw reply	[flat|nested] 33+ messages in thread
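A sketch of the filesystem-level measurement Neil suggests, for a scratch
array only (the mkfs step destroys data; the unmount/remount empties the
page cache so the read really hits the array):

    mkfs.xfs -f /dev/md31               # DESTRUCTIVE - scratch arrays only
    mount /dev/md31 /mnt/test
    dd if=/dev/zero of=/mnt/test/big bs=1M count=4096
    umount /mnt/test && mount /dev/md31 /mnt/test
    dd if=/mnt/test/big of=/dev/null bs=1M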
* Re: RAID0 performance question
  2005-12-21  1:40                           ` Neil Brown
@ 2005-11-22  1:56                             ` JaniD++
  2005-12-22  4:49                               ` Neil Brown
  0 siblings, 1 reply; 33+ messages in thread
From: JaniD++ @ 2005-11-22  1:56 UTC (permalink / raw)
  To: Neil Brown; +Cc: linux-raid

----- Original Message -----
From: "Neil Brown" <neilb@suse.de>
To: "JaniD++" <djani22@dynamicweb.hu>
Cc: "Al Boldi" <a1426z@gawab.com>; <linux-raid@vger.kernel.org>
Sent: Wednesday, December 21, 2005 2:40 AM
Subject: Re: RAID0 performance question

> On Sunday December 18, djani22@dynamicweb.hu wrote:
> >
> > Why doesn't the raid (md) device have a scheduler in sysfs?
> > And if it has a scheduler, where can I tune it?
>
> raid0 doesn't do any scheduling.
> All it does is take requests from the filesystem, decide which device
> they should go to (possibly splitting them if needed) and forward them
> on to that device. That is all.
>
> > Can raid0 handle multiple requests at one time?
>
> Yes. But raid0 doesn't exactly 'handle' requests. It 'directs'
> requests for other devices to 'handle'.
>
> > For me, the performance bottleneck is clearly in the RAID0 layer, used
> > exactly as a "concentrator" to join the 4x2TB into 1x8TB.
> > But it is only software, and I can't believe it is not fixable or
> > tunable.
>
> There is really nothing to tune apart from chunksize.
>
> You can tune the way the filesystem/vm accesses the device by setting
> readahead (readahead on component devices of a raid0 has exactly 0
> effect).

First I want to apologize for the "Neil is not interested" remark in a
previous mail... :-(

I have already tried all the available options, including readahead on
every layer (results in earlier mails) and the chunk size.
But with these settings I cannot work around this.
And the result is incomprehensible to me!
The raid0 performance is not equal to one component, nor to the sum of
all components, nor even to the slowest component!

> You can tune the underlying devices by choosing a scheduler (for a
> disk drive) or a packet size (for over-the-network devices) or
> whatever.

NBD has a scheduler, and this is already tuned for really top performance;
for the components it is really great! :-)
(I had planned to set NBD to 4KB packets, but this is hard because my
NICs do not support jumbo packets...)

> But there is nothing to tune in raid0.
>
> Also, rather than doing measurements on the block devices (/dev/mdX),
> do measurements on a filesystem created on that device.
> I have often found that the filesystem goes faster than the block
> device.

I use XFS, and the two performances are almost equal; it depends on the
kind of load, but in the most frequent case it is almost equal.

Thanks,
Janos

> NeilBrown

^ permalink raw reply	[flat|nested] 33+ messages in thread
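On the packet-size point: carrying a 4KB NBD payload in a single Ethernet
frame needs jumbo frames on every hop. A sketch of how that would be set,
assuming NIC, driver and switch all support it (which is not the case on
this hardware):

    ifconfig eth0 mtu 9000       # fails on hardware without jumbo support
    ifconfig eth0 | grep -i mtu  # verify the active MTU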
* Re: RAID0 performance question
  2005-11-22  1:56                             ` JaniD++
@ 2005-12-22  4:49                               ` Neil Brown
  2005-11-23  9:44                                 ` JaniD++
  0 siblings, 1 reply; 33+ messages in thread
From: Neil Brown @ 2005-12-22  4:49 UTC (permalink / raw)
  To: JaniD++; +Cc: linux-raid

On Tuesday November 22, djani22@dynamicweb.hu wrote:
> I have already tried all the available options, including readahead on
> every layer (results in earlier mails) and the chunk size.
> But with these settings I cannot work around this.
> And the result is incomprehensible to me!
> The raid0 performance is not equal to one component, nor to the sum of
> all components, nor even to the slowest component!

This is quite perplexing.

My next step would probably be to watch the network traffic with
tcpdump or ethereal. I would look for any differences between when it
is going quickly (without raid0) and when slowly (with raid0).

Rather than tcpdump, it might be easier to instrument the nbd server
to print out requests and timestamps.

Sorry I cannot be more helpful, and do have a Merry Christmas anyway
:-)

NeilBrown

^ permalink raw reply	[flat|nested] 33+ messages in thread
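A sketch of such a capture on the NBD client (the port is whatever the nbd
servers were started on - 2000 is only an example - and the -tttt
timestamp format depends on the tcpdump version):

    tcpdump -i eth0 -s 96 -w nbd-md31.pcap port 2000 &
    TDPID=$!
    dd if=/dev/md31 of=/dev/null bs=1M count=1024    # the "slow" case
    kill $TDPID
    tcpdump -r nbd-md31.pcap -tttt | head -50        # look for request gaps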
* Re: RAID0 performance question
  2005-12-22  4:49                               ` Neil Brown
@ 2005-11-23  9:44                                 ` JaniD++
  0 siblings, 0 replies; 33+ messages in thread
From: JaniD++ @ 2005-11-23  9:44 UTC (permalink / raw)
  To: Neil Brown; +Cc: linux-raid

----- Original Message -----
From: "Neil Brown" <neilb@suse.de>
To: "JaniD++" <djani22@dynamicweb.hu>
Cc: <linux-raid@vger.kernel.org>
Sent: Thursday, December 22, 2005 5:49 AM
Subject: Re: RAID0 performance question

> On Tuesday November 22, djani22@dynamicweb.hu wrote:
> > I have already tried all the available options, including readahead on
> > every layer (results in earlier mails) and the chunk size.
> > But with these settings I cannot work around this.
> > And the result is incomprehensible to me!
> > The raid0 performance is not equal to one component, nor to the sum of
> > all components, nor even to the slowest component!
>
> This is quite perplexing.
>
> My next step would probably be to watch the network traffic with
> tcpdump or ethereal. I would look for any differences between when it
> is going quickly (without raid0) and when slowly (with raid0).
>
> Rather than tcpdump, it might be easier to instrument the nbd server
> to print out requests and timestamps.

Yes, that is a good idea!
I will try it, thanks!

> Sorry I cannot be more helpful, and do have a Merry Christmas anyway
> :-)

The same to you, and to everybody on this list! :-)

Cheers,
Janos

> NeilBrown
> -
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 33+ messages in thread
end of thread, other threads:[~2005-12-22  4:49 UTC | newest]

Thread overview: 33+ messages (links below jump to the message on this page):
2005-11-18 19:34 mdadm 2.1: command line option parsing bug? Andreas Haumer
2005-11-21 23:21 ` Neil Brown
2005-11-22 11:21   ` Michael Tokarev
2005-11-24  5:15     ` Neil Brown
2005-11-22 15:41   ` Molle Bestefich
2005-11-24  5:25     ` Neil Brown
2005-11-24  7:31       ` Ross Vandegrift
2005-12-15  1:53       ` Molle Bestefich
2005-12-15  4:19         ` Neil Brown
2005-12-15 10:37           ` Molle Bestefich
2005-11-22 22:05   ` Andre Noll
2005-11-26 14:04 ` RAID0 performance question JaniD++
2005-11-26 15:56   ` Raz Ben-Jehuda(caro)
2005-11-26 16:08     ` JaniD++
2005-11-26 17:11       ` Lajber Zoltan
2005-11-26 17:34         ` JaniD++
2005-11-26 19:47           ` Lajber Zoltan
2005-11-26 23:27       ` Neil Brown
2005-11-26 23:37         ` JaniD++
2005-11-27 15:39         ` Al Boldi
2005-11-27 16:21           ` JaniD++
2005-11-27 17:40             ` Al Boldi
2005-11-27 19:02               ` JaniD++
2005-11-30 23:13               ` JaniD++
2005-12-02 19:53                 ` Al Boldi
2005-12-18  0:13                   ` JaniD++
2005-12-19 11:16                     ` Al Boldi
2005-11-22  1:14                       ` JaniD++
2005-11-23 10:48                       ` JaniD++
2005-12-21  1:40                     ` Neil Brown
2005-11-22  1:56                       ` JaniD++
2005-12-22  4:49                         ` Neil Brown
2005-11-23  9:44                           ` JaniD++