linux-raid.vger.kernel.org archive mirror
From: Yu Kuai <yukuai1@huaweicloud.com>
To: Ali Gholami Rudi <aligrudi@gmail.com>, Yu Kuai <yukuai1@huaweicloud.com>
Cc: Xiao Ni <xni@redhat.com>,
	linux-raid@vger.kernel.org, song@kernel.org,
	"yukuai (C)" <yukuai3@huawei.com>
Subject: Re: Unacceptably Poor RAID1 Performance with Many CPU Cores
Date: Mon, 19 Jun 2023 09:22:52 +0800	[thread overview]
Message-ID: <c9fbb419-ec62-8673-ded2-cc11d3a11d7f@huaweicloud.com> (raw)
In-Reply-To: <20231906000051@laper.mirepesht>

Hi,

On 2023/06/19 4:30, Ali Gholami Rudi wrote:
> Hi,
> 
> I tested raid10 with NVMe disks on Debian 12 (Linux 6.1.0), which
> includes your first patch only.
> 
> The speed was very bad:
> 
> READ:  IOPS=360K BW=1412MiB/s
> WRITE: IOPS=154K BW= 606MiB/s
> 

Can you try testing with --bitmap=none and --assume-clean (or echo
frozen to sync_action)? Let's see how performance looks when the
spin_lock from wait_barrier() and from md_bitmap_startwrite() is
bypassed.
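
For reference, a rough sketch of how this could be tried (the array and
device names below are placeholders, not the exact setup from this
thread):

  # create a test array with no write-intent bitmap and skip the
  # initial resync entirely
  mdadm --create /dev/md/test --level=10 --raid-devices=8 \
        --bitmap=none --assume-clean /dev/nvme[0-7]n1p5

  # or, on an existing array: drop the bitmap and freeze the resync
  mdadm --grow /dev/md3 --bitmap=none
  echo frozen > /sys/block/md3/md/sync_action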

Thanks,
Kuai

> Perf's output:
> 
> +   98.90%     0.00%  fio      [unknown]               [.] 0xffffffffffffffff
> +   98.71%     0.00%  fio      fio                     [.] 0x0000563ae0f62117
> +   97.69%     0.02%  fio      [kernel.kallsyms]       [k] entry_SYSCALL_64_after_hwframe
> +   97.66%     0.02%  fio      [kernel.kallsyms]       [k] do_syscall_64
> +   97.29%     0.00%  fio      fio                     [.] 0x0000563ae0f5fceb
> +   97.29%     0.05%  fio      fio                     [.] td_io_queue
> +   97.20%     0.01%  fio      fio                     [.] td_io_commit
> +   97.20%     0.02%  fio      libc.so.6               [.] syscall
> +   96.94%     0.05%  fio      libaio.so.1.0.2         [.] io_submit
> +   96.94%     0.00%  fio      fio                     [.] 0x0000563ae0f84e5e
> +   96.50%     0.02%  fio      [kernel.kallsyms]       [k] __x64_sys_io_submit
> -   96.44%     0.03%  fio      [kernel.kallsyms]       [k] io_submit_one
>     - 96.41% io_submit_one
>        - 65.16% aio_read
>           - 65.07% xfs_file_read_iter
>              - 65.06% xfs_file_dio_read
>                 - 60.21% iomap_dio_rw
>                    - 60.21% __iomap_dio_rw
>                       - 49.84% iomap_dio_bio_iter
>                          - 49.39% submit_bio_noacct_nocheck
>                             - 49.08% __submit_bio
>                                - 48.80% md_handle_request
>                                   - 48.40% raid10_make_request
>                                      - 48.14% raid10_read_request
>                                         - 47.63% regular_request_wait
>                                            - 47.62% wait_barrier
>                                               - 44.17% _raw_spin_lock_irq
>                                                    44.14% native_queued_spin_lock_slowpath
>                                               - 2.39% schedule
>                                                  - 2.38% __schedule
>                                                     + 1.99% pick_next_task_fair
>                       - 9.78% iomap_iter
>                          - 9.77% xfs_read_iomap_begin
>                             - 9.30% xfs_ilock_for_iomap
>                                - 9.29% down_read
>                                   - 9.18% rwsem_down_read_slowpath
>                                      - 4.67% schedule_preempt_disabled
>                                         - 4.67% schedule
>                                            - 4.67% __schedule
>                                               - 4.08% pick_next_task_fair
>                                                  - 4.08% newidle_balance
>                                                     - 3.94% load_balance
>                                                        - 3.60% find_busiest_group
>                                                             3.59% update_sd_lb_stats.constprop.0
>                                      - 4.12% _raw_spin_lock_irq
>                                           4.11% native_queued_spin_lock_slowpath
>                 + 4.56% touch_atime
>        - 31.12% aio_write
>           - 31.06% xfs_file_write_iter
>              - 31.00% xfs_file_dio_write_aligned
>                 - 27.41% iomap_dio_rw
>                    - 27.40% __iomap_dio_rw
>                       - 23.29% iomap_dio_bio_iter
>                          - 23.14% submit_bio_noacct_nocheck
>                             - 23.11% __submit_bio
>                                - 23.02% md_handle_request
>                                   - 22.85% raid10_make_request
>                                      - 20.45% regular_request_wait
>                                         - 20.44% wait_barrier
>                                            - 18.97% _raw_spin_lock_irq
>                                                 18.96% native_queued_spin_lock_slowpath
>                                            - 1.02% schedule
>                                               - 1.02% __schedule
>                                                  - 0.85% pick_next_task_fair
>                                                     + 0.84% newidle_balance
>                                      + 1.85% md_bitmap_startwrite
>                       - 3.20% iomap_iter
>                          - 3.19% xfs_direct_write_iomap_begin
>                             - 3.00% xfs_ilock_for_iomap
>                                - 2.99% down_read
>                                   - 2.95% rwsem_down_read_slowpath
>                                      + 1.70% schedule_preempt_disabled
>                                      + 1.13% _raw_spin_lock_irq
>                       + 0.81% blk_finish_plug
>                 + 3.47% xfs_file_write_checks
> +   87.62%     0.01%  fio      [kernel.kallsyms]       [k] iomap_dio_rw
> +   87.61%     0.14%  fio      [kernel.kallsyms]       [k] __iomap_dio_rw
> +   74.85%    74.85%  fio      [kernel.kallsyms]       [k] native_queued_spin_lock_slowpath
> +   73.13%     0.10%  fio      [kernel.kallsyms]       [k] iomap_dio_bio_iter
> +   72.99%     0.11%  fio      [kernel.kallsyms]       [k] _raw_spin_lock_irq
> +   72.76%     0.02%  fio      [kernel.kallsyms]       [k] submit_bio_noacct_nocheck
> +   72.20%     0.01%  fio      [kernel.kallsyms]       [k] __submit_bio
> +   71.82%     0.43%  fio      [kernel.kallsyms]       [k] md_handle_request
> +   71.25%     0.15%  fio      [kernel.kallsyms]       [k] raid10_make_request
> +   68.08%     0.02%  fio      [kernel.kallsyms]       [k] regular_request_wait
> +   68.06%     0.57%  fio      [kernel.kallsyms]       [k] wait_barrier
> +   65.16%     0.01%  fio      [kernel.kallsyms]       [k] aio_read
> +   65.07%     0.01%  fio      [kernel.kallsyms]       [k] xfs_file_read_iter
> +   65.06%     0.01%  fio      [kernel.kallsyms]       [k] xfs_file_dio_read
> +   48.14%     0.12%  fio      [kernel.kallsyms]       [k] raid10_read_request
> 
> Note that in the ramdisk tests, I gave whole ramdisks or raid devices
> to fio.  Here I used files on the filesystem.
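
(As an aside, a minimal fio invocation of the kind being compared here
might look as follows; the file path, size and queue depths are
placeholders, not the exact job behind the numbers above:)

  # direct random read/write through libaio against a file on xfs
  fio --name=test --filename=/mnt/md3/testfile --size=10G \
      --direct=1 --ioengine=libaio --iodepth=64 --numjobs=16 \
      --rw=randrw --bs=4k --runtime=60 --time_based --group_reporting

  # the ramdisk tests instead pointed --filename at the block device,
  # e.g. --filename=/dev/md127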
> 
> Thanks,
> Ali
> 
> # cat /proc/mdstat:
> Personalities : [raid10] [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4]
> md127 : active raid10 ram1[1] ram0[0]
>        1046528 blocks super 1.2 2 near-copies [2/2] [UU]
> 
> md3 : active raid10 nvme0n1p5[0] nvme1n1p5[1] nvme3n1p5[3] nvme4n1p5[4] nvme6n1p5[6] nvme5n1p5[5] nvme7n1p5[7] nvme2n1p5[2]
>        14887084032 blocks super 1.2 512K chunks 2 near-copies [8/8] [UUUUUUUU]
>        [=======>.............]  resync = 37.2% (5549960960/14887084032) finish=754.4min speed=206272K/sec
>        bitmap: 70/111 pages [280KB], 65536KB chunk
> 
> md1 : active raid10 nvme1n1p3[1] nvme3n1p3[3] nvme0n1p3[0] nvme4n1p3[4] nvme5n1p3[5] nvme6n1p3[6] nvme7n1p3[7] nvme2n1p3[2]
>        41906176 blocks super 1.2 512K chunks 2 near-copies [8/8] [UUUUUUUU]
> 
> md0 : active raid10 nvme1n1p2[1] nvme3n1p2[3] nvme0n1p2[0] nvme6n1p2[6] nvme4n1p2[4] nvme5n1p2[5] nvme7n1p2[7] nvme2n1p2[2]
>        2084864 blocks super 1.2 512K chunks 2 near-copies [8/8] [UUUUUUUU]
> 
> md2 : active (auto-read-only) raid10 nvme4n1p4[4] nvme1n1p4[1] nvme3n1p4[3] nvme0n1p4[0] nvme6n1p4[6] nvme7n1p4[7] nvme5n1p4[5] nvme2n1p4[2]
>        67067904 blocks super 1.2 512K chunks 2 near-copies [8/8] [UUUUUUUU]
>          resync=PENDING
> 
> unused devices: <none>
> 
> # lspci | grep NVM
> 01:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller PM9A1/PM9A3/980PRO
> 02:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller PM9A1/PM9A3/980PRO
> 03:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller PM9A1/PM9A3/980PRO
> 04:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller PM9A1/PM9A3/980PRO
> 61:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller PM9A1/PM9A3/980PRO
> 62:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller PM9A1/PM9A3/980PRO
> 83:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller PM9A1/PM9A3/980PRO
> 84:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller PM9A1/PM9A3/980PRO
> #
> 
> .
> 


Thread overview: 30+ messages
2023-06-15  7:54 Unacceptably Poor RAID1 Performance with Many CPU Cores Ali Gholami Rudi
2023-06-15  9:16 ` Xiao Ni
2023-06-15 17:08   ` Ali Gholami Rudi
2023-06-15 17:36     ` Ali Gholami Rudi
2023-06-16  1:53       ` Xiao Ni
2023-06-16  5:20         ` Ali Gholami Rudi
2023-06-15 14:02 ` Yu Kuai
2023-06-16  2:14   ` Xiao Ni
2023-06-16  2:34     ` Yu Kuai
2023-06-16  5:52     ` Ali Gholami Rudi
     [not found]     ` <20231606091224@laper.mirepesht>
2023-06-16  7:31       ` Ali Gholami Rudi
2023-06-16  7:42         ` Yu Kuai
2023-06-16  8:21           ` Ali Gholami Rudi
2023-06-16  8:34             ` Yu Kuai
2023-06-16  8:52               ` Ali Gholami Rudi
2023-06-16  9:17                 ` Yu Kuai
2023-06-16 11:51                 ` Ali Gholami Rudi
2023-06-16 12:27                   ` Yu Kuai
2023-06-18 20:30                     ` Ali Gholami Rudi
2023-06-19  1:22                       ` Yu Kuai [this message]
2023-06-19  5:19                       ` Ali Gholami Rudi
2023-06-19  6:53                         ` Yu Kuai
2023-06-21  8:05                     ` Xiao Ni
2023-06-21  8:26                       ` Yu Kuai
2023-06-21  8:55                         ` Xiao Ni
2023-07-01 11:17                         ` Ali Gholami Rudi
2023-07-03 12:39                           ` Yu Kuai
2023-07-05  7:59                             ` Ali Gholami Rudi
2023-06-21 19:34                       ` Wols Lists
2023-06-23  0:52                         ` Xiao Ni
