From: Ali Gholami Rudi <aligrudi@gmail.com>
To: Yu Kuai <yukuai1@huaweicloud.com>
Cc: Xiao Ni <xni@redhat.com>,
linux-raid@vger.kernel.org, song@kernel.org,
"yukuai (C)" <yukuai3@huawei.com>
Subject: Re: Unacceptably Poor RAID1 Performance with Many CPU Cores
Date: Sat, 01 Jul 2023 14:47:43 +0330
Message-ID: <20230107144743@tare.nit>
In-Reply-To: <2311bff8-232c-916b-98b6-7543bd48ecfa@huaweicloud.com>
Hi,
I repeated the test on a larger array (14TB instead of 40GB):
$ cat /proc/mdstat
Personalities : [raid10] [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4]
md3 : active raid10 nvme1n1p5[1] nvme3n1p5[3] nvme0n1p5[0] nvme2n1p5[2]
14889424896 blocks super 1.2 512K chunks 2 near-copies [4/4] [UUUU]
md0 : active raid10 nvme2n1p2[2] nvme3n1p2[3] nvme0n1p2[0] nvme1n1p2[1]
2091008 blocks super 1.2 512K chunks 2 near-copies [4/4] [UUUU]
..
I get the following results.
On md0 (40GB):
READ: IOPS=1563K BW=6109MiB/s
WRITE: IOPS= 670K BW=2745MiB/s
On md3 (14TB):
READ: IOPS=1177K BW=4599MiB/s
WRITE: IOPS= 505K BW=1972MiB/s
On md3 with the mdadm bitmap disabled (mdadm --grow --bitmap=none /dev/md3):
READ: IOPS=1351K BW=5278MiB/s
WRITE: IOPS= 579K BW=2261MiB/s
The tests were performed on Debian 12 (kernel version 6.1).
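For reference, the fio job was roughly of the following shape. This is only a sketch: the exact job file was posted earlier in the thread, and the directory, size, iodepth, numjobs and runtime here are illustrative placeholders rather than the values actually used (libaio, O_DIRECT and 4k random I/O on ext4 are what the profile below points to).

$ fio --name=bench --directory=/mnt/md3 --size=10G \
      --ioengine=libaio --direct=1 --rw=randrw --bs=4k \
      --iodepth=64 --numjobs=16 --runtime=60 --time_based \
      --group_reporting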
Any room for improvement?
Thanks,
Ali
This is perf's output; it shows lock contention.
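The profile was collected roughly like this (an assumption about the capture step; only the report itself is quoted below):

$ perf record -a -g -- sleep 30   # system-wide, with call graphs, while fio is running
$ perf report -g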
+ 95.25% 0.00% fio [unknown] [k] 0xffffffffffffffff
+ 95.00% 0.00% fio fio [.] 0x000055e073fcd117
+ 93.68% 0.13% fio [kernel.kallsyms] [k] entry_SYSCALL_64_after_hwframe
+ 93.54% 0.03% fio [kernel.kallsyms] [k] do_syscall_64
+ 92.38% 0.03% fio libc.so.6 [.] syscall
+ 92.18% 0.00% fio fio [.] 0x000055e073fcaceb
+ 92.18% 0.08% fio fio [.] td_io_queue
+ 92.04% 0.02% fio fio [.] td_io_commit
+ 91.76% 0.00% fio fio [.] 0x000055e073fefe5e
- 91.76% 0.05% fio libaio.so.1.0.2 [.] io_submit
- 91.71% io_submit
- 91.69% syscall
- 91.58% entry_SYSCALL_64_after_hwframe
- 91.55% do_syscall_64
- 91.06% __x64_sys_io_submit
- 90.93% io_submit_one
- 48.85% aio_write
- 48.77% ext4_file_write_iter
- 39.86% iomap_dio_rw
- 39.85% __iomap_dio_rw
- 22.55% blk_finish_plug
- 22.55% __blk_flush_plug
- 21.67% raid10_unplug
- 16.54% submit_bio_noacct_nocheck
- 16.44% blk_mq_submit_bio
- 16.17% __rq_qos_throttle
- 16.01% wbt_wait
- 15.93% rq_qos_wait
- 14.52% prepare_to_wait_exclusive
- 11.50% _raw_spin_lock_irqsave
11.49% native_queued_spin_lock_slowpath
- 3.01% _raw_spin_unlock_irqrestore
- 2.29% asm_common_interrupt
- 2.29% common_interrupt
- 2.28% __common_interrupt
- 2.28% handle_edge_irq
- 2.26% handle_irq_event
- 2.26% __handle_irq_event_percpu
- nvme_irq
- 2.23% blk_mq_end_request_batch
- 1.41% __rq_qos_done
- wbt_done
- 1.38% __wake_up_common_lock
- 1.36% _raw_spin_lock_irqsave
native_queued_spin_lock_slowpath
0.51% raid10_end_read_request
- 0.61% asm_sysvec_apic_timer_interrupt
- sysvec_apic_timer_interrupt
- 0.57% __irq_exit_rcu
- 0.56% __softirqentry_text_start
- 0.55% asm_common_interrupt
- common_interrupt
- 0.55% __common_interrupt
- 0.55% handle_edge_irq
- 0.54% handle_irq_event
- 0.54% __handle_irq_event_percpu
- nvme_irq
0.54% blk_mq_end_request_batch
- 0.87% io_schedule
- 0.85% schedule
- 0.84% __schedule
- 0.62% pick_next_task_fair
- 0.61% newidle_balance
- 0.60% load_balance
0.50% find_busiest_group
- 3.98% __wake_up_common_lock
- 3.21% _raw_spin_lock_irqsave
3.02% native_queued_spin_lock_slowpath
- 0.77% _raw_spin_unlock_irqrestore
- 0.64% asm_common_interrupt
- common_interrupt
- 0.63% __common_interrupt
- 0.63% handle_edge_irq
- 0.60% handle_irq_event
- 0.59% __handle_irq_event_percpu
- nvme_irq
0.58% blk_mq_end_request_batch
0.84% blk_mq_flush_plug_list
- 12.50% iomap_dio_bio_iter
- 10.79% submit_bio_noacct_nocheck
- 10.73% __submit_bio
- 9.77% md_handle_request
- 7.14% raid10_make_request
- 2.98% raid10_write_one_disk
- 0.52% asm_common_interrupt
- common_interrupt
- 0.51% __common_interrupt
0.51% handle_edge_irq
1.16% wait_blocked_dev
- 0.83% regular_request_wait
0.82% wait_barrier
0.95% md_submit_bio
- 3.54% iomap_iter
- 3.52% ext4_iomap_overwrite_begin
- 3.52% ext4_iomap_begin
1.80% ext4_set_iomap
- 2.19% file_modified_flags
2.16% inode_needs_update_time.part.0
1.82% up_read
1.61% down_read
- 0.88% ext4_generic_write_checks
0.57% generic_write_checks
- 41.78% aio_read
- 41.64% ext4_file_read_iter
- 31.62% iomap_dio_rw
- 31.61% __iomap_dio_rw
- 19.92% iomap_dio_bio_iter
- 16.26% submit_bio_noacct_nocheck
- 15.60% __submit_bio
- 13.31% md_handle_request
- 7.50% raid10_make_request
- 6.14% raid10_read_request
- 1.94% regular_request_wait
1.92% wait_barrier
1.12% read_balance
- 0.53% asm_common_interrupt
- 0.53% common_interrupt
- 0.53% __common_interrupt
- 0.53% handle_edge_irq
0.50% handle_irq_event
- 1.14% asm_common_interrupt
- 1.14% common_interrupt
- 1.13% __common_interrupt
- 1.13% handle_edge_irq
- 1.08% handle_irq_event
- 1.07% __handle_irq_event_percpu
- nvme_irq
1.05% blk_mq_end_request_batch
2.22% md_submit_bio
0.52% blk_mq_submit_bio
- 0.67% asm_common_interrupt
- common_interrupt
- 0.66% __common_interrupt
- 0.66% handle_edge_irq
- 0.62% handle_irq_event
- 0.62% __handle_irq_event_percpu
- nvme_irq
0.61% blk_mq_end_request_batch
- 8.90% iomap_iter
- 8.86% ext4_iomap_begin
- 4.24% ext4_set_iomap
- 0.86% asm_common_interrupt
- 0.86% common_interrupt
- 0.85% __common_interrupt
- 0.85% handle_edge_irq
- 0.81% handle_irq_event
- 0.80% __handle_irq_event_percpu
- nvme_irq
0.79% blk_mq_end_request_batch
- 0.88% ext4_map_blocks
0.68% ext4_es_lookup_extent
- 0.81% asm_common_interrupt
- 0.81% common_interrupt
- 0.81% __common_interrupt
- 0.81% handle_edge_irq
- 0.77% handle_irq_event
- 0.76% __handle_irq_event_percpu
- nvme_irq
0.75% blk_mq_end_request_batch
- 4.53% down_read
- 0.87% asm_common_interrupt
- 0.86% common_interrupt
- 0.86% __common_interrupt
- 0.86% handle_edge_irq
- 0.81% handle_irq_event
- 0.81% __handle_irq_event_percpu
- nvme_irq
0.79% blk_mq_end_request_batch
- 4.42% up_read
- 0.82% asm_common_interrupt
- 0.82% common_interrupt
- 0.81% __common_interrupt
- 0.81% handle_edge_irq
- 0.77% handle_irq_event
- 0.77% __handle_irq_event_percpu
- nvme_irq
0.75% blk_mq_end_request_batch
- 0.86% ext4_dio_alignment
0.83% ext4_inode_journal_mode
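Most of the write-side time above sits under __rq_qos_throttle -> wbt_wait -> rq_qos_wait, i.e. the block layer's writeback throttling taking the wait-queue spinlock from raid10_unplug. One quick experiment would be to disable WBT on the member devices (device names taken from the mdstat output above) and re-run the test:

$ echo 0 | sudo tee /sys/block/nvme{0,1,2,3}n1/queue/wbt_lat_usec

Writing -1 to the same files restores the kernel's default latency target.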
Thread overview (30+ messages):
2023-06-15 7:54 Unacceptably Poor RAID1 Performance with Many CPU Cores Ali Gholami Rudi
2023-06-15 9:16 ` Xiao Ni
2023-06-15 17:08 ` Ali Gholami Rudi
2023-06-15 17:36 ` Ali Gholami Rudi
2023-06-16 1:53 ` Xiao Ni
2023-06-16 5:20 ` Ali Gholami Rudi
2023-06-15 14:02 ` Yu Kuai
2023-06-16 2:14 ` Xiao Ni
2023-06-16 2:34 ` Yu Kuai
2023-06-16 5:52 ` Ali Gholami Rudi
[not found] ` <20231606091224@laper.mirepesht>
2023-06-16 7:31 ` Ali Gholami Rudi
2023-06-16 7:42 ` Yu Kuai
2023-06-16 8:21 ` Ali Gholami Rudi
2023-06-16 8:34 ` Yu Kuai
2023-06-16 8:52 ` Ali Gholami Rudi
2023-06-16 9:17 ` Yu Kuai
2023-06-16 11:51 ` Ali Gholami Rudi
2023-06-16 12:27 ` Yu Kuai
2023-06-18 20:30 ` Ali Gholami Rudi
2023-06-19 1:22 ` Yu Kuai
2023-06-19 5:19 ` Ali Gholami Rudi
2023-06-19 6:53 ` Yu Kuai
2023-06-21 8:05 ` Xiao Ni
2023-06-21 8:26 ` Yu Kuai
2023-06-21 8:55 ` Xiao Ni
2023-07-01 11:17 ` Ali Gholami Rudi [this message]
2023-07-03 12:39 ` Yu Kuai
2023-07-05 7:59 ` Ali Gholami Rudi
2023-06-21 19:34 ` Wols Lists
2023-06-23 0:52 ` Xiao Ni