From: Piergiorgio Sartor <piergiorgio.sartor@nexgo.de>
To: Guoqing Jiang <guoqing.jiang@cloud.ionos.com>
Cc: Piergiorgio Sartor <piergiorgio.sartor@nexgo.de>,
	Wolfgang Denk <wd@denx.de>,
	linux-raid@vger.kernel.org
Subject: Re: raid6check extremely slow ?
Date: Tue, 12 May 2020 20:32:51 +0200
Message-ID: <20200512183251.GA11548@lazy.lzy>
In-Reply-To: <e24b0703-a599-45ef-f6b6-0a713cfa414c@cloud.ionos.com>

On Tue, May 12, 2020 at 08:16:27PM +0200, Guoqing Jiang wrote:
> On 5/12/20 6:07 PM, Piergiorgio Sartor wrote:
> > On Mon, May 11, 2020 at 11:07:31PM +0200, Guoqing Jiang wrote:
> > > On 5/11/20 6:14 PM, Piergiorgio Sartor wrote:
> > > > On Mon, May 11, 2020 at 10:58:07AM +0200, Guoqing Jiang wrote:
> > > > > Hi Wolfgang,
> > > > > 
> > > > > 
> > > > > On 5/11/20 8:40 AM, Wolfgang Denk wrote:
> > > > > > Dear Guoqing Jiang,
> > > > > > 
> > > > > > In message <2cf55e5f-bdfb-9fef-6255-151e049ac0a1@cloud.ionos.com> you wrote:
> > > > > > > Seems raid6check is in 'D' state, what are the output of 'cat
> > > > > > > /proc/19719/stack' and /proc/mdstat?
> > > > > > # for i in 1 2 3 4 ; do  cat /proc/19719/stack; sleep 2; echo ; done
> > > > > > [<0>] __wait_rcu_gp+0x10d/0x110
> > > > > > [<0>] synchronize_rcu+0x47/0x50
> > > > > > [<0>] mddev_suspend+0x4a/0x140
> > > > > > [<0>] suspend_lo_store+0x50/0xa0
> > > > > > [<0>] md_attr_store+0x86/0xe0
> > > > > > [<0>] kernfs_fop_write+0xce/0x1b0
> > > > > > [<0>] vfs_write+0xb6/0x1a0
> > > > > > [<0>] ksys_write+0x4f/0xc0
> > > > > > [<0>] do_syscall_64+0x5b/0xf0
> > > > > > [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
> > > > > > 
> > > > > > [<0>] __wait_rcu_gp+0x10d/0x110
> > > > > > [<0>] synchronize_rcu+0x47/0x50
> > > > > > [<0>] mddev_suspend+0x4a/0x140
> > > > > > [<0>] suspend_lo_store+0x50/0xa0
> > > > > > [<0>] md_attr_store+0x86/0xe0
> > > > > > [<0>] kernfs_fop_write+0xce/0x1b0
> > > > > > [<0>] vfs_write+0xb6/0x1a0
> > > > > > [<0>] ksys_write+0x4f/0xc0
> > > > > > [<0>] do_syscall_64+0x5b/0xf0
> > > > > > [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
> > > > > > 
> > > > > > [<0>] __wait_rcu_gp+0x10d/0x110
> > > > > > [<0>] synchronize_rcu+0x47/0x50
> > > > > > [<0>] mddev_suspend+0x4a/0x140
> > > > > > [<0>] suspend_hi_store+0x44/0x90
> > > > > > [<0>] md_attr_store+0x86/0xe0
> > > > > > [<0>] kernfs_fop_write+0xce/0x1b0
> > > > > > [<0>] vfs_write+0xb6/0x1a0
> > > > > > [<0>] ksys_write+0x4f/0xc0
> > > > > > [<0>] do_syscall_64+0x5b/0xf0
> > > > > > [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
> > > > > > 
> > > > > > [<0>] __wait_rcu_gp+0x10d/0x110
> > > > > > [<0>] synchronize_rcu+0x47/0x50
> > > > > > [<0>] mddev_suspend+0x4a/0x140
> > > > > > [<0>] suspend_hi_store+0x44/0x90
> > > > > > [<0>] md_attr_store+0x86/0xe0
> > > > > > [<0>] kernfs_fop_write+0xce/0x1b0
> > > > > > [<0>] vfs_write+0xb6/0x1a0
> > > > > > [<0>] ksys_write+0x4f/0xc0
> > > > > > [<0>] do_syscall_64+0x5b/0xf0
> > > > > > [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
> > > > > Looks like raid6check keeps writing the suspend_lo/hi nodes, which
> > > > > causes mddev_suspend to be called, meaning synchronize_rcu and other
> > > > > synchronization mechanisms are triggered in that path ...
> > > > > 
> > > > > > Interesting, why is it in ksys_write / vfs_write / kernfs_fop_write
> > > > > > all the time?  I thought it was _reading_ the disks only?
> > > > > I hadn't read raid6check before; I just found that check_stripes has
> > > > > 
> > > > > 
> > > > >       while (length > 0) {
> > > > >               lock_stripe -> write suspend_lo/hi node
> > > > >               ...
> > > > >               unlock_all_stripes -> write suspend_lo/hi node
> > > > >       }
> > > > > 
> > > > > I think that explains the stack trace of raid6check, and maybe it is
> > > > > just the way raid6check works: lock the stripe, check it, then unlock
> > > > > it. Just my guess ...
> > > > Hi again!
> > > > 
> > > > I made a quick test.
> > > > I disabled the lock / unlock in raid6check.
> > > > 
> > > > With lock / unlock, I get around 1.2MB/sec
> > > > per device component, with ~13% CPU load.
> > > > Without lock / unlock, I get around 15.5MB/sec
> > > > per device component, with ~30% CPU load.
> > > > 
> > > > So, it seems the lock / unlock mechanism is
> > > > quite expensive.
> > > Yes, since mddev_suspend/resume are triggered by the lock/unlock stripe.
> > > 
> > > > I'm not sure what the best solution is, since
> > > > we still need to avoid race conditions.
> > > I guess there are two possible ways:
> > > 
> > > 1. Per your previous reply, only call raid6check when the array is RO;
> > > then we don't need the lock.
> > > 
> > > 2. Investigate whether it is possible to acquire the stripe lock in
> > > suspend_lo/hi_store to avoid the race between raid6check and writes to
> > > the same stripe. IOW, try fine-grained protection instead of calling the
> > > expensive suspend/resume in suspend_lo/hi_store. But I am not sure yet
> > > whether that is doable.
> > Could you please elaborate on the
> > "fine grained protection" thing?
> 
> Even though raid6check checks and locks stripes one by one, things are
> different in kernel space: locking one stripe triggers mddev_suspend
> and mddev_resume, which affect all stripes ...
> 
> If the kernel could expose an interface to actually lock a single
> stripe, then raid6check could use it to lock only that one stripe
> (this is what I call fine grained) instead of triggering
> suspend/resume, which are time consuming.

I see, you mean we need a different
interface to this lock / unlock thing.

> > > BTW, seems there are build problems for raid6check ...
> > > 
> > > mdadm$ make raid6check
> > > gcc -Wall -Werror -Wstrict-prototypes -Wextra -Wno-unused-parameter
> > > -Wimplicit-fallthrough=0 -O2 -DSendmail=\""/usr/sbin/sendmail -t"\"
> > > -DCONFFILE=\"/etc/mdadm.conf\" -DCONFFILE2=\"/etc/mdadm/mdadm.conf\"
> > > -DMAP_DIR=\"/run/mdadm\" -DMAP_FILE=\"map\" -DMDMON_DIR=\"/run/mdadm\"
> > > -DFAILED_SLOTS_DIR=\"/run/mdadm/failed-slots\" -DNO_COROSYNC -DNO_DLM
> > > -DVERSION=\"4.1-74-g5cfb79d\" -DVERS_DATE="\"2020-04-27\"" -DUSE_PTHREADS
> > > -DBINDIR=\"/sbin\"  -o sysfs.o -c sysfs.c
> > > gcc -O2  -o raid6check raid6check.o restripe.o sysfs.o maps.o lib.o
> > > xmalloc.o dlink.o
> > > sysfs.o: In function `sysfsline':
> > > sysfs.c:(.text+0x2adb): undefined reference to `parse_uuid'
> > > sysfs.c:(.text+0x2aee): undefined reference to `uuid_zero'
> > > sysfs.c:(.text+0x2af5): undefined reference to `uuid_zero'
> > > collect2: error: ld returned 1 exit status
> > > Makefile:220: recipe for target 'raid6check' failed
> > > make: *** [raid6check] Error 1
> > I cannot see this problem.
> > I could compile without issue.
> > Maybe some library is missing somewhere,
> > but I'm not sure where.
> 
> Did you try with the latest mdadm tree? But it could be an environment issue ...

I'm using Fedora, so I downloaded
the .srpm package, installed it, enabled
raid6check, patched, and rebuilt...

My background idea was to have the
mdadm rpm *with* raid6check, but I
did not get that far...

Sorry...

bye,

pg
 
> Thanks,
> Guoqing

-- 

piergiorgio
