From: Piergiorgio Sartor <piergiorgio.sartor@nexgo.de>
To: Guoqing Jiang <guoqing.jiang@cloud.ionos.com>
Cc: Wolfgang Denk <wd@denx.de>, linux-raid@vger.kernel.org
Subject: Re: raid6check extremely slow ?
Date: Mon, 11 May 2020 18:14:15 +0200
Message-ID: <20200511161415.GA8049@lazy.lzy>
In-Reply-To: <f003a8c7-e96d-ddc3-6d1d-42a13b70e0b6@cloud.ionos.com>
On Mon, May 11, 2020 at 10:58:07AM +0200, Guoqing Jiang wrote:
> Hi Wolfgang,
>
>
> On 5/11/20 8:40 AM, Wolfgang Denk wrote:
> > Dear Guoqing Jiang,
> >
> > In message <2cf55e5f-bdfb-9fef-6255-151e049ac0a1@cloud.ionos.com> you wrote:
> > > Seems raid6check is in 'D' state; what is the output of 'cat
> > > /proc/19719/stack' and /proc/mdstat?
> > # for i in 1 2 3 4 ; do cat /proc/19719/stack; sleep 2; echo ; done
> > [<0>] __wait_rcu_gp+0x10d/0x110
> > [<0>] synchronize_rcu+0x47/0x50
> > [<0>] mddev_suspend+0x4a/0x140
> > [<0>] suspend_lo_store+0x50/0xa0
> > [<0>] md_attr_store+0x86/0xe0
> > [<0>] kernfs_fop_write+0xce/0x1b0
> > [<0>] vfs_write+0xb6/0x1a0
> > [<0>] ksys_write+0x4f/0xc0
> > [<0>] do_syscall_64+0x5b/0xf0
> > [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
> >
> > [<0>] __wait_rcu_gp+0x10d/0x110
> > [<0>] synchronize_rcu+0x47/0x50
> > [<0>] mddev_suspend+0x4a/0x140
> > [<0>] suspend_lo_store+0x50/0xa0
> > [<0>] md_attr_store+0x86/0xe0
> > [<0>] kernfs_fop_write+0xce/0x1b0
> > [<0>] vfs_write+0xb6/0x1a0
> > [<0>] ksys_write+0x4f/0xc0
> > [<0>] do_syscall_64+0x5b/0xf0
> > [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
> >
> > [<0>] __wait_rcu_gp+0x10d/0x110
> > [<0>] synchronize_rcu+0x47/0x50
> > [<0>] mddev_suspend+0x4a/0x140
> > [<0>] suspend_hi_store+0x44/0x90
> > [<0>] md_attr_store+0x86/0xe0
> > [<0>] kernfs_fop_write+0xce/0x1b0
> > [<0>] vfs_write+0xb6/0x1a0
> > [<0>] ksys_write+0x4f/0xc0
> > [<0>] do_syscall_64+0x5b/0xf0
> > [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
> >
> > [<0>] __wait_rcu_gp+0x10d/0x110
> > [<0>] synchronize_rcu+0x47/0x50
> > [<0>] mddev_suspend+0x4a/0x140
> > [<0>] suspend_hi_store+0x44/0x90
> > [<0>] md_attr_store+0x86/0xe0
> > [<0>] kernfs_fop_write+0xce/0x1b0
> > [<0>] vfs_write+0xb6/0x1a0
> > [<0>] ksys_write+0x4f/0xc0
> > [<0>] do_syscall_64+0x5b/0xf0
> > [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
>
> It looks like raid6check keeps writing the suspend_lo/hi nodes, which
> causes mddev_suspend to be called, which means synchronize_rcu and other
> synchronization mechanisms are triggered in that path ...
>
> > Interesting, why is it in ksys_write / vfs_write / kernfs_fop_write
> > all the time? I thought it was _reading_ the disks only?
>
> I hadn't read raid6check before, but I see that check_stripes has
>
> while (length > 0) {
>         lock_stripe        -> write suspend_lo/hi node
>         ...
>         unlock_all_stripes -> write suspend_lo/hi node
> }
>
> I think that explains the stack of raid6check, and maybe it is simply how
> raid6check works: lock the stripe, check the stripe, then unlock the
> stripe. Just my guess ...
Hi again!
I made a quick test.
I disabled the lock / unlock in raid6check.
With lock / unlock, I get around 1.2MB/sec
per device component, with ~13% CPU load.
Without lock / unlock, I get around 15.5MB/sec
per device component, with ~30% CPU load.
So, it seems the lock / unlock mechanism is
quite expensive.
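Just to illustrate what each lock / unlock costs: as Guoqing describes
above, locking a stripe boils down to writing the stripe's range to the
array's suspend_lo / suspend_hi sysfs attributes, and (per the stack
traces) every such write goes through mddev_suspend() and
synchronize_rcu(). Below is a minimal sketch of that pattern, not the
actual raid6check code; the sysfs path, the helper names and the offset
units are my assumptions.

/*
 * Minimal sketch (NOT the actual raid6check code): "lock" a range of the
 * array by writing it to the md suspend_lo/suspend_hi sysfs nodes and
 * "unlock" it by resetting them.  Each write ends up in
 * suspend_lo_store()/suspend_hi_store() -> mddev_suspend() ->
 * synchronize_rcu(), which is what the stack traces above show.
 */
#include <stdio.h>

/* Write one numeric value to /sys/block/<md>/md/<attr> (assumed path). */
static int sysfs_set_num(const char *md, const char *attr,
			 unsigned long long val)
{
	char path[256];
	FILE *f;

	snprintf(path, sizeof(path), "/sys/block/%s/md/%s", md, attr);
	f = fopen(path, "w");
	if (!f)
		return -1;
	fprintf(f, "%llu\n", val);
	return fclose(f);
}

/* Suspend array I/O on [lo, hi) so one stripe can be read consistently. */
static int lock_region(const char *md, unsigned long long lo,
		       unsigned long long hi)
{
	if (sysfs_set_num(md, "suspend_lo", lo))    /* -> mddev_suspend() */
		return -1;
	return sysfs_set_num(md, "suspend_hi", hi); /* -> mddev_suspend() */
}

/* Resume normal I/O again. */
static int unlock_region(const char *md)
{
	if (sysfs_set_num(md, "suspend_hi", 0))     /* -> mddev_suspend() */
		return -1;
	return sysfs_set_num(md, "suspend_lo", 0);  /* -> mddev_suspend() */
}

That is a handful of suspend/resume cycles per checked stripe. With the
16k chunks shown in /proc/mdstat below, ~1.2MB/sec per component comes to
roughly 13ms per stripe, versus about 1ms without the locking, so the
suspend/resume overhead is on the order of 10ms per stripe.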
I'm not sure what the best solution is, since
we still need to avoid race conditions.
Any suggestion is welcome!
bye,
pg
> > And iostat does not report any writes either?
>
> Because the CPU is busy in mddev_suspend, I think.
>
> > # iostat /dev/sd[efhijklm] | cat
> > Linux 5.6.8-300.fc32.x86_64 (atlas.denx.de) 2020-05-11 _x86_64_ (8 CPU)
> >
> > avg-cpu:  %user   %nice %system %iowait  %steal   %idle
> >            0.18    0.00    1.07    0.17    0.00   98.58
> >
> > Device             tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd
> > sde              20.30       368.76         0.10         0.00  277022327      75178          0
> > sdf              20.28       368.77         0.10         0.00  277030081      75170          0
> > sdh              20.30       368.74         0.10         0.00  277007903      74854          0
> > sdi              20.30       368.79         0.10         0.00  277049113      75246          0
> > sdj              20.82       368.76         0.10         0.00  277022363      74986          0
> > sdk              20.30       368.73         0.10         0.00  277002179      76322          0
> > sdl              20.29       368.78         0.10         0.00  277039743      74982          0
> > sdm              20.29       368.75         0.10         0.00  277018163      74958          0
> >
> >
> > # cat /proc/mdstat
> > Personalities : [raid1] [raid10] [raid6] [raid5] [raid4]
> > md3 : active raid10 sdc1[0] sdd1[1]
> > 234878976 blocks 512K chunks 2 far-copies [2/2] [UU]
> > bitmap: 0/2 pages [0KB], 65536KB chunk
> >
> > md0 : active raid6 sdm[15] sdl[14] sdi[8] sde[12] sdj[9] sdk[10] sdh[13] sdf[11]
> > 11720301024 blocks super 1.2 level 6, 16k chunk, algorithm 2 [8/8] [UUUUUUUU]
> >
> > md1 : active raid1 sdb3[0] sda3[1]
> > 484118656 blocks [2/2] [UU]
> >
> > md2 : active raid1 sdb1[0] sda1[1]
> > 255936 blocks [2/2] [UU]
> >
> > unused devices: <none>
> >
> > > > 3 days later:
> > > Is raid6check still in 'D' state as before?
> > Yes, nothing changed, still running:
> >
> > top - 08:39:30 up 8 days, 16:41, 3 users, load average: 1.00, 1.00, 1.00
> > Tasks: 243 total, 1 running, 242 sleeping, 0 stopped, 0 zombie
> > %Cpu0 : 0.0 us, 0.3 sy, 0.0 ni, 99.3 id, 0.0 wa, 0.3 hi, 0.0 si, 0.0 st
> > %Cpu1 : 1.0 us, 5.4 sy, 0.0 ni, 92.2 id, 0.7 wa, 0.3 hi, 0.3 si, 0.0 st
> > %Cpu2 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
> > %Cpu3 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
> > %Cpu4 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
> > %Cpu5 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
> > %Cpu6 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
> > %Cpu7 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
> > MiB Mem : 24034.6 total, 10920.6 free, 1883.0 used, 11231.1 buff/cache
> > MiB Swap: 7828.5 total, 7828.5 free, 0.0 used. 21756.5 avail Mem
> >
> > PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
> > 19719 root 20 0 2852 2820 2020 D 7.6 0.0 679:04.39 raid6check
>
> I think the stack of raid6check is pretty much the same as before.
>
> Since the estimated time for the 12TB array is about 57 days, and assuming
> the time is linear in the number of stripes on the same machine, then this
> is simply how raid6check works, as I guessed.
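A rough cross-check of that estimate, using only the numbers in this
thread (the arithmetic is mine): the array is ~11.7TB in 16k chunks over
6 data disks, i.e. about 122 million stripes; at the ~370kB/sec per
component that iostat shows, each 16k chunk takes ~44ms, which works out
to ~5.3 million seconds, or roughly 60 days, the same ballpark as the
57-day estimate.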
>
> Thanks,
> Guoqing
--
piergiorgio