linux-raid.vger.kernel.org archive mirror
* Unacceptably Poor RAID1 Performance with Many CPU Cores
@ 2023-06-15  7:54 Ali Gholami Rudi
  2023-06-15  9:16 ` Xiao Ni
  2023-06-15 14:02 ` Yu Kuai
  0 siblings, 2 replies; 30+ messages in thread
From: Ali Gholami Rudi @ 2023-06-15  7:54 UTC (permalink / raw)
  To: linux-raid; +Cc: song

Hi,

This simple experiment reproduces the problem.

Create a RAID1 array using two ramdisks of size 1G:

  mdadm --create /dev/md/test --level=1 --raid-devices=2 /dev/ram0 /dev/ram1
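
(The ramdisks come from the brd module; if it is not already loaded,
something like the following should create two 1G devices, rd_size
being given in KiB:)

  modprobe brd rd_nr=2 rd_size=1048576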

Then use fio to test disk performance (iodepth=64 and numjobs=400;
details at the end of this email).  This is what we get on our machine
(two AMD EPYC 7002 CPUs, each with 64 cores, and 2TB of RAM; Linux v5.10.0):

Without RAID (writing to /dev/ram0)
READ:  IOPS=14391K BW=56218MiB/s
WRITE: IOPS= 6167K BW=24092MiB/s

RAID1 (writing to /dev/md/test)
READ:  IOPS=  542K BW= 2120MiB/s
WRITE: IOPS=  232K BW=  935MiB/s

The difference, even for reads, is huge.

I profiled with perf to see where the problem is; the results are
included at the end of this email.

Any ideas?

We actually run hundreds of VMs on our hosts.  The problem is that
when we use RAID1 on our enterprise NVMe disks, performance degrades
severely compared to using them directly; we seem to hit the same
bottleneck as in the test described above.

Thanks,
Ali

Perf output:

Samples: 1M of event 'cycles', Event count (approx.): 1158425235997
  Children      Self  Command  Shared Object           Symbol
+   97.98%     0.01%  fio      fio                     [.] fio_libaio_commit
+   97.95%     0.01%  fio      libaio.so.1.0.1         [.] io_submit
+   97.85%     0.01%  fio      [kernel.kallsyms]       [k] __x64_sys_io_submit
-   97.82%     0.01%  fio      [kernel.kallsyms]       [k] io_submit_one
   - 97.81% io_submit_one
      - 54.62% aio_write
         - 54.60% blkdev_write_iter
            - 36.30% blk_finish_plug
               - flush_plug_callbacks
                  - 36.29% raid1_unplug
                     - flush_bio_list
                        - 18.44% submit_bio_noacct
                           - 18.40% brd_submit_bio
                              - 18.13% raid1_end_write_request
                                 - 17.94% raid_end_bio_io
                                    - 17.82% __wake_up_common_lock
                                       + 17.79% _raw_spin_lock_irqsave
                        - 17.79% __wake_up_common_lock
                           + 17.76% _raw_spin_lock_irqsave
            + 18.29% __generic_file_write_iter
      - 43.12% aio_read
         - 43.07% blkdev_read_iter
            - generic_file_read_iter
               - 43.04% blkdev_direct_IO
                  - 42.95% submit_bio_noacct
                     - 42.23% brd_submit_bio
                        - 41.91% raid1_end_read_request
                           - 41.70% raid_end_bio_io
                              - 41.43% __wake_up_common_lock
                                 + 41.36% _raw_spin_lock_irqsave
                     - 0.68% md_submit_bio
                          0.61% md_handle_request
+   94.90%     0.00%  fio      [kernel.kallsyms]       [k] __wake_up_common_lock
+   94.86%     0.22%  fio      [kernel.kallsyms]       [k] _raw_spin_lock_irqsave
+   94.64%    94.64%  fio      [kernel.kallsyms]       [k] native_queued_spin_lock_slowpath
+   79.63%     0.02%  fio      [kernel.kallsyms]       [k] submit_bio_noacct


FIO configuration file:

[global] 
name=random reads and writes
ioengine=libaio 
direct=1
readwrite=randrw 
rwmixread=70 
iodepth=64 
buffered=0 
#filename=/dev/ram0
filename=/dev/md/test
size=1G
runtime=30 
time_based 
randrepeat=0 
norandommap 
refill_buffers 
ramp_time=10
bs=4k
numjobs=400
group_reporting=1
[job1]



* Re: Unacceptably Poor RAID1 Performance with Many CPU Cores
  2023-06-15  7:54 Unacceptably Poor RAID1 Performance with Many CPU Cores Ali Gholami Rudi
@ 2023-06-15  9:16 ` Xiao Ni
  2023-06-15 17:08   ` Ali Gholami Rudi
  2023-06-15 14:02 ` Yu Kuai
  1 sibling, 1 reply; 30+ messages in thread
From: Xiao Ni @ 2023-06-15  9:16 UTC (permalink / raw)
  To: Ali Gholami Rudi; +Cc: linux-raid, song

On Thu, Jun 15, 2023 at 4:04 PM Ali Gholami Rudi <aligrudi@gmail.com> wrote:
>
> Hi,
>
> This simple experiment reproduces the problem.
>
> Create a RAID1 array using two ramdisks of size 1G:
>
>   mdadm --create /dev/md/test --level=1 --raid-devices=2 /dev/ram0 /dev/ram1
>
> Then use fio to test disk performance (iodepth=64 and numjobs=40;
> details at the end of this email).  This is what we get in our machine
> (two AMD EPYC 7002 CPUs each with 64 cores and 2TB of RAM; Linux v5.10.0):
>
> Without RAID (writing to /dev/ram0)
> READ:  IOPS=14391K BW=56218MiB/s
> WRITE: IOPS= 6167K BW=24092MiB/s
>
> RAID1 (writing to /dev/md/test)
> READ:  IOPS=  542K BW= 2120MiB/s
> WRITE: IOPS=  232K BW=  935MiB/s
>
> The difference, even for reading is huge.
>
> I tried perf to see what is the problem; results are included at the
> end of this email.
>
> Any ideas?

Hello Ali

Since it can be reproduced easily in your environment, can you try
the latest upstream kernel?  If the problem doesn't exist with the
latest upstream kernel, you can use git bisect to find which patch
fixes it.
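
A rough sketch of the bisect (with the terms swapped, since here we are
looking for the commit that fixes the problem rather than the one that
introduced it):

  git bisect start --term-old=slow --term-new=fast
  git bisect slow v5.10     # kernel that shows the problem
  git bisect fast master    # latest upstream, assuming it behaves well
  # then build, boot and rerun the fio test at each step, marking it
  # slow or fast, until bisect reports the first fast commit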

>
> We are actually executing hundreds of VMs on our hosts.  The problem
> is that when we use RAID1 for our enterprise NVMe disks, the
> performance degrades very much compared to using them directly; it
> seems we have the same bottleneck as the test described above.

So those hundreds of VMs run on the raid1, and the raid1 is created
with NVMe disks.  What does /proc/mdstat show?

Regards
Xiao
>
> Thanks,
> Ali
>
> Perf output:
>
> Samples: 1M of event 'cycles', Event count (approx.): 1158425235997
>   Children      Self  Command  Shared Object           Symbol
> +   97.98%     0.01%  fio      fio                     [.] fio_libaio_commit
> +   97.95%     0.01%  fio      libaio.so.1.0.1         [.] io_submit
> +   97.85%     0.01%  fio      [kernel.kallsyms]       [k] __x64_sys_io_submit
> -   97.82%     0.01%  fio      [kernel.kallsyms]       [k] io_submit_one
>    - 97.81% io_submit_one
>       - 54.62% aio_write
>          - 54.60% blkdev_write_iter
>             - 36.30% blk_finish_plug
>                - flush_plug_callbacks
>                   - 36.29% raid1_unplug
>                      - flush_bio_list
>                         - 18.44% submit_bio_noacct
>                            - 18.40% brd_submit_bio
>                               - 18.13% raid1_end_write_request
>                                  - 17.94% raid_end_bio_io
>                                     - 17.82% __wake_up_common_lock
>                                        + 17.79% _raw_spin_lock_irqsave
>                         - 17.79% __wake_up_common_lock
>                            + 17.76% _raw_spin_lock_irqsave
>             + 18.29% __generic_file_write_iter
>       - 43.12% aio_read
>          - 43.07% blkdev_read_iter
>             - generic_file_read_iter
>                - 43.04% blkdev_direct_IO
>                   - 42.95% submit_bio_noacct
>                      - 42.23% brd_submit_bio
>                         - 41.91% raid1_end_read_request
>                            - 41.70% raid_end_bio_io
>                               - 41.43% __wake_up_common_lock
>                                  + 41.36% _raw_spin_lock_irqsave
>                      - 0.68% md_submit_bio
>                           0.61% md_handle_request
> +   94.90%     0.00%  fio      [kernel.kallsyms]       [k] __wake_up_common_lock
> +   94.86%     0.22%  fio      [kernel.kallsyms]       [k] _raw_spin_lock_irqsave
> +   94.64%    94.64%  fio      [kernel.kallsyms]       [k] native_queued_spin_lock_slowpath
> +   79.63%     0.02%  fio      [kernel.kallsyms]       [k] submit_bio_noacct
>
>
> FIO configuration file:
>
> [global]
> name=random reads and writes
> ioengine=libaio
> direct=1
> readwrite=randrw
> rwmixread=70
> iodepth=64
> buffered=0
> #filename=/dev/ram0
> filename=/dev/dm/test
> size=1G
> runtime=30
> time_based
> randrepeat=0
> norandommap
> refill_buffers
> ramp_time=10
> bs=4k
> numjobs=400
> group_reporting=1
> [job1]
>



* Re: Unacceptably Poor RAID1 Performance with Many CPU Cores
  2023-06-15  7:54 Unacceptably Poor RAID1 Performance with Many CPU Cores Ali Gholami Rudi
  2023-06-15  9:16 ` Xiao Ni
@ 2023-06-15 14:02 ` Yu Kuai
  2023-06-16  2:14   ` Xiao Ni
  1 sibling, 1 reply; 30+ messages in thread
From: Yu Kuai @ 2023-06-15 14:02 UTC (permalink / raw)
  To: Ali Gholami Rudi, linux-raid; +Cc: song, yukuai (C)

Hi,

On 2023/06/15 15:54, Ali Gholami Rudi wrote:
> Perf output:
> 
> Samples: 1M of event 'cycles', Event count (approx.): 1158425235997
>    Children      Self  Command  Shared Object           Symbol
> +   97.98%     0.01%  fio      fio                     [.] fio_libaio_commit
> +   97.95%     0.01%  fio      libaio.so.1.0.1         [.] io_submit
> +   97.85%     0.01%  fio      [kernel.kallsyms]       [k] __x64_sys_io_submit
> -   97.82%     0.01%  fio      [kernel.kallsyms]       [k] io_submit_one
>     - 97.81% io_submit_one
>        - 54.62% aio_write
>           - 54.60% blkdev_write_iter
>              - 36.30% blk_finish_plug
>                 - flush_plug_callbacks
>                    - 36.29% raid1_unplug
>                       - flush_bio_list
>                          - 18.44% submit_bio_noacct
>                             - 18.40% brd_submit_bio
>                                - 18.13% raid1_end_write_request
>                                   - 17.94% raid_end_bio_io
>                                      - 17.82% __wake_up_common_lock
>                                         + 17.79% _raw_spin_lock_irqsave
>                          - 17.79% __wake_up_common_lock
>                             + 17.76% _raw_spin_lock_irqsave
>              + 18.29% __generic_file_write_iter
>        - 43.12% aio_read
>           - 43.07% blkdev_read_iter
>              - generic_file_read_iter
>                 - 43.04% blkdev_direct_IO
>                    - 42.95% submit_bio_noacct
>                       - 42.23% brd_submit_bio
>                          - 41.91% raid1_end_read_request
>                             - 41.70% raid_end_bio_io
>                                - 41.43% __wake_up_common_lock
>                                   + 41.36% _raw_spin_lock_irqsave
>                       - 0.68% md_submit_bio
>                            0.61% md_handle_request
> +   94.90%     0.00%  fio      [kernel.kallsyms]       [k] __wake_up_common_lock
> +   94.86%     0.22%  fio      [kernel.kallsyms]       [k] _raw_spin_lock_irqsave
> +   94.64%    94.64%  fio      [kernel.kallsyms]       [k] native_queued_spin_lock_slowpath
> +   79.63%     0.02%  fio      [kernel.kallsyms]       [k] submit_bio_noacct

This looks familiar... Perhaps you can try testing raid10 with the
latest mainline kernel? I previously optimized spin_lock usage for
raid10, but I haven't done this for raid1 yet... I can do the same
thing for raid1 if it's valuable.

> 
> 
> FIO configuration file:
> 
> [global]
> name=random reads and writes
> ioengine=libaio
> direct=1
> readwrite=randrw
> rwmixread=70
> iodepth=64
> buffered=0
> #filename=/dev/ram0
> filename=/dev/dm/test
> size=1G
> runtime=30
> time_based
> randrepeat=0
> norandommap
> refill_buffers
> ramp_time=10
> bs=4k
> numjobs=400

400 is too aggressive; I think the spin_lock in the fast path is
probably causing the problem, the same issue I ran into before for raid10...

Thanks,
Kuai

> group_reporting=1
> [job1]
> 
> .
> 



* Re: Unacceptably Poor RAID1 Performance with Many CPU Cores
  2023-06-15  9:16 ` Xiao Ni
@ 2023-06-15 17:08   ` Ali Gholami Rudi
  2023-06-15 17:36     ` Ali Gholami Rudi
  0 siblings, 1 reply; 30+ messages in thread
From: Ali Gholami Rudi @ 2023-06-15 17:08 UTC (permalink / raw)
  To: Xiao Ni; +Cc: linux-raid, song

Xiao Ni <xni@redhat.com> wrote:
> Because it can be reproduced easily in your environment. Can you try
> with the latest upstream kernel? If the problem doesn't exist with
> latest upstream kernel. You can use git bisect to find which patch can
> fix this problem.

I just tried the upstream.  I get almost the same result with 1G ramdisks.

Without RAID (writing to /dev/ram0)
READ:  IOPS=15.8M BW=60.3GiB/s
WRITE: IOPS= 6.8M BW=27.7GiB/s

RAID1 (writing to /dev/md/test)
READ:  IOPS=518K BW=2028MiB/s
WRITE: IOPS=222K BW= 912MiB/s

> > We are actually executing hundreds of VMs on our hosts.  The problem
> > is that when we use RAID1 for our enterprise NVMe disks, the
> > performance degrades very much compared to using them directly; it
> > seems we have the same bottleneck as the test described above.
> 
> So those hundreds VMs run on the raid1, and the raid1 is created with
> nvme disks. What's /proc/mdstat?

At the moment we do not use raid1 because of this performance issue.
Since the machines are in production, I cannot change their disk
layout.  If I find the opportunity, I will set up raid1 on real
disks and report the contents of /proc/mdstat.

Thanks,
Ali



* Re: Unacceptably Poor RAID1 Performance with Many CPU Cores
  2023-06-15 17:08   ` Ali Gholami Rudi
@ 2023-06-15 17:36     ` Ali Gholami Rudi
  2023-06-16  1:53       ` Xiao Ni
  0 siblings, 1 reply; 30+ messages in thread
From: Ali Gholami Rudi @ 2023-06-15 17:36 UTC (permalink / raw)
  To: Xiao Ni; +Cc: linux-raid, song


Ali Gholami Rudi <aligrudi@gmail.com> wrote:
> Xiao Ni <xni@redhat.com> wrote:
> > Because it can be reproduced easily in your environment. Can you try
> > with the latest upstream kernel? If the problem doesn't exist with
> > latest upstream kernel. You can use git bisect to find which patch can
> > fix this problem.
> 
> I just tried the upstream.  I get almost the same result with 1G ramdisks.
> 
> Without RAID (writing to /dev/ram0)
> READ:  IOPS=15.8M BW=60.3GiB/s
> WRITE: IOPS= 6.8M BW=27.7GiB/s
> 
> RAID1 (writing to /dev/md/test)
> READ:  IOPS=518K BW=2028MiB/s
> WRITE: IOPS=222K BW= 912MiB/s

And this is perf's output:

+   98.73%     0.01%  fio      [kernel.kallsyms]       [k] entry_SYSCALL_64_after_hwframe
+   98.63%     0.01%  fio      [kernel.kallsyms]       [k] do_syscall_64
+   97.28%     0.01%  fio      [kernel.kallsyms]       [k] __x64_sys_io_submit
-   97.09%     0.01%  fio      [kernel.kallsyms]       [k] io_submit_one
   - 97.08% io_submit_one
      - 53.58% aio_write
         - 53.42% blkdev_write_iter
            - 35.28% blk_finish_plug
               - flush_plug_callbacks
                  - 35.27% raid1_unplug
                     - flush_bio_list
                        - 17.88% submit_bio_noacct_nocheck
                           - 17.88% __submit_bio
                              - 17.61% raid1_end_write_request
                                 - 17.47% raid_end_bio_io
                                    - 17.41% __wake_up_common_lock
                                       - 17.38% _raw_spin_lock_irqsave
                                            native_queued_spin_lock_slowpath
                        - 17.35% __wake_up_common_lock
                           - 17.31% _raw_spin_lock_irqsave
                                native_queued_spin_lock_slowpath
            + 18.07% __generic_file_write_iter
      - 43.00% aio_read
         - 42.64% blkdev_read_iter
            - 42.37% __blkdev_direct_IO_async
               - 41.40% submit_bio_noacct_nocheck
                  - 41.34% __submit_bio
                     - 40.68% raid1_end_read_request
                        - 40.55% raid_end_bio_io
                           - 40.35% __wake_up_common_lock
                              - 40.28% _raw_spin_lock_irqsave
                                   native_queued_spin_lock_slowpath
+   95.19%     0.32%  fio      fio                     [.] thread_main
+   95.08%     0.00%  fio      [unknown]               [.] 0xffffffffffffffff
+   95.03%     0.00%  fio      fio                     [.] run_threads
+   94.77%     0.00%  fio      fio                     [.] do_io (inlined)
+   94.65%     0.16%  fio      fio                     [.] td_io_queue
+   94.65%     0.11%  fio      libc-2.31.so            [.] syscall
+   94.54%     0.07%  fio      fio                     [.] fio_libaio_commit
+   94.53%     0.05%  fio      fio                     [.] td_io_commit
+   94.50%     0.00%  fio      fio                     [.] io_u_submit (inlined)
+   94.47%     0.04%  fio      libaio.so.1.0.1         [.] io_submit
+   92.48%     0.02%  fio      [kernel.kallsyms]       [k] _raw_spin_lock_irqsave
+   92.48%     0.00%  fio      [kernel.kallsyms]       [k] __wake_up_common_lock
+   92.46%    92.32%  fio      [kernel.kallsyms]       [k] native_queued_spin_lock_slowpath
+   76.85%     0.03%  fio      [kernel.kallsyms]       [k] submit_bio_noacct_nocheck
+   76.76%     0.00%  fio      [kernel.kallsyms]       [k] __submit_bio
+   60.25%     0.06%  fio      [kernel.kallsyms]       [k] __blkdev_direct_IO_async
+   58.12%     0.11%  fio      [kernel.kallsyms]       [k] raid_end_bio_io
..

Thanks,
Ali



* Re: Unacceptably Poor RAID1 Performance with Many CPU Cores
  2023-06-15 17:36     ` Ali Gholami Rudi
@ 2023-06-16  1:53       ` Xiao Ni
  2023-06-16  5:20         ` Ali Gholami Rudi
  0 siblings, 1 reply; 30+ messages in thread
From: Xiao Ni @ 2023-06-16  1:53 UTC (permalink / raw)
  To: Ali Gholami Rudi; +Cc: linux-raid, song

On Fri, Jun 16, 2023 at 1:38 AM Ali Gholami Rudi <aligrudi@gmail.com> wrote:
>
>
> Ali Gholami Rudi <aligrudi@gmail.com> wrote:
> > Xiao Ni <xni@redhat.com> wrote:
> > > Because it can be reproduced easily in your environment. Can you try
> > > with the latest upstream kernel? If the problem doesn't exist with
> > > latest upstream kernel. You can use git bisect to find which patch can
> > > fix this problem.
> >
> > I just tried the upstream.  I get almost the same result with 1G ramdisks.
> >
> > Without RAID (writing to /dev/ram0)
> > READ:  IOPS=15.8M BW=60.3GiB/s
> > WRITE: IOPS= 6.8M BW=27.7GiB/s
> >
> > RAID1 (writing to /dev/md/test)
> > READ:  IOPS=518K BW=2028MiB/s
> > WRITE: IOPS=222K BW= 912MiB/s

Hi Ali

I can reproduce this with the upstream kernel too.

RAID1
READ: bw=3699MiB/s (3879MB/s)
WRITE: bw=1586MiB/s (1663MB/s)

ram disk:
READ: bw=5720MiB/s (5997MB/s)
WRITE: bw=2451MiB/s (2570MB/s)

There is a performance problem, but not like your result; yours shows
a huge gap.  I'm not sure of the reason.  Any thoughts?


>
> And this is perf's output:

I'm not familiar with perf; what command can I use to see the same
output?

Regards
Xiao
>
> +   98.73%     0.01%  fio      [kernel.kallsyms]       [k] entry_SYSCALL_64_after_hwframe
> +   98.63%     0.01%  fio      [kernel.kallsyms]       [k] do_syscall_64
> +   97.28%     0.01%  fio      [kernel.kallsyms]       [k] __x64_sys_io_submit
> -   97.09%     0.01%  fio      [kernel.kallsyms]       [k] io_submit_one
>    - 97.08% io_submit_one
>       - 53.58% aio_write
>          - 53.42% blkdev_write_iter
>             - 35.28% blk_finish_plug
>                - flush_plug_callbacks
>                   - 35.27% raid1_unplug
>                      - flush_bio_list
>                         - 17.88% submit_bio_noacct_nocheck
>                            - 17.88% __submit_bio
>                               - 17.61% raid1_end_write_request
>                                  - 17.47% raid_end_bio_io
>                                     - 17.41% __wake_up_common_lock
>                                        - 17.38% _raw_spin_lock_irqsave
>                                             native_queued_spin_lock_slowpath
>                         - 17.35% __wake_up_common_lock
>                            - 17.31% _raw_spin_lock_irqsave
>                                 native_queued_spin_lock_slowpath
>             + 18.07% __generic_file_write_iter
>       - 43.00% aio_read
>          - 42.64% blkdev_read_iter
>             - 42.37% __blkdev_direct_IO_async
>                - 41.40% submit_bio_noacct_nocheck
>                   - 41.34% __submit_bio
>                      - 40.68% raid1_end_read_request
>                         - 40.55% raid_end_bio_io
>                            - 40.35% __wake_up_common_lock
>                               - 40.28% _raw_spin_lock_irqsave
>                                    native_queued_spin_lock_slowpath
> +   95.19%     0.32%  fio      fio                     [.] thread_main
> +   95.08%     0.00%  fio      [unknown]               [.] 0xffffffffffffffff
> +   95.03%     0.00%  fio      fio                     [.] run_threads
> +   94.77%     0.00%  fio      fio                     [.] do_io (inlined)
> +   94.65%     0.16%  fio      fio                     [.] td_io_queue
> +   94.65%     0.11%  fio      libc-2.31.so            [.] syscall
> +   94.54%     0.07%  fio      fio                     [.] fio_libaio_commit
> +   94.53%     0.05%  fio      fio                     [.] td_io_commit
> +   94.50%     0.00%  fio      fio                     [.] io_u_submit (inlined)
> +   94.47%     0.04%  fio      libaio.so.1.0.1         [.] io_submit
> +   92.48%     0.02%  fio      [kernel.kallsyms]       [k] _raw_spin_lock_irqsave
> +   92.48%     0.00%  fio      [kernel.kallsyms]       [k] __wake_up_common_lock
> +   92.46%    92.32%  fio      [kernel.kallsyms]       [k] native_queued_spin_lock_slowpath
> +   76.85%     0.03%  fio      [kernel.kallsyms]       [k] submit_bio_noacct_nocheck
> +   76.76%     0.00%  fio      [kernel.kallsyms]       [k] __submit_bio
> +   60.25%     0.06%  fio      [kernel.kallsyms]       [k] __blkdev_direct_IO_async
> +   58.12%     0.11%  fio      [kernel.kallsyms]       [k] raid_end_bio_io
> ..
>
> Thanks,
> Ali
>



* Re: Unacceptably Poor RAID1 Performance with Many CPU Cores
  2023-06-15 14:02 ` Yu Kuai
@ 2023-06-16  2:14   ` Xiao Ni
  2023-06-16  2:34     ` Yu Kuai
                       ` (2 more replies)
  0 siblings, 3 replies; 30+ messages in thread
From: Xiao Ni @ 2023-06-16  2:14 UTC (permalink / raw)
  To: Yu Kuai; +Cc: Ali Gholami Rudi, linux-raid, song, yukuai (C)

On Thu, Jun 15, 2023 at 10:06 PM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>
> Hi,
>
> On 2023/06/15 15:54, Ali Gholami Rudi wrote:
> > Perf output:
> >
> > Samples: 1M of event 'cycles', Event count (approx.): 1158425235997
> >    Children      Self  Command  Shared Object           Symbol
> > +   97.98%     0.01%  fio      fio                     [.] fio_libaio_commit
> > +   97.95%     0.01%  fio      libaio.so.1.0.1         [.] io_submit
> > +   97.85%     0.01%  fio      [kernel.kallsyms]       [k] __x64_sys_io_submit
> > -   97.82%     0.01%  fio      [kernel.kallsyms]       [k] io_submit_one
> >     - 97.81% io_submit_one
> >        - 54.62% aio_write
> >           - 54.60% blkdev_write_iter
> >              - 36.30% blk_finish_plug
> >                 - flush_plug_callbacks
> >                    - 36.29% raid1_unplug
> >                       - flush_bio_list
> >                          - 18.44% submit_bio_noacct
> >                             - 18.40% brd_submit_bio
> >                                - 18.13% raid1_end_write_request
> >                                   - 17.94% raid_end_bio_io
> >                                      - 17.82% __wake_up_common_lock
> >                                         + 17.79% _raw_spin_lock_irqsave
> >                          - 17.79% __wake_up_common_lock
> >                             + 17.76% _raw_spin_lock_irqsave
> >              + 18.29% __generic_file_write_iter
> >        - 43.12% aio_read
> >           - 43.07% blkdev_read_iter
> >              - generic_file_read_iter
> >                 - 43.04% blkdev_direct_IO
> >                    - 42.95% submit_bio_noacct
> >                       - 42.23% brd_submit_bio
> >                          - 41.91% raid1_end_read_request
> >                             - 41.70% raid_end_bio_io
> >                                - 41.43% __wake_up_common_lock
> >                                   + 41.36% _raw_spin_lock_irqsave
> >                       - 0.68% md_submit_bio
> >                            0.61% md_handle_request
> > +   94.90%     0.00%  fio      [kernel.kallsyms]       [k] __wake_up_common_lock
> > +   94.86%     0.22%  fio      [kernel.kallsyms]       [k] _raw_spin_lock_irqsave
> > +   94.64%    94.64%  fio      [kernel.kallsyms]       [k] native_queued_spin_lock_slowpath
> > +   79.63%     0.02%  fio      [kernel.kallsyms]       [k] submit_bio_noacct
>
> This looks familiar... Perhaps can you try to test with raid10 with
> latest mainline kernel? I used to optimize spin_lock for raid10, and I
> don't do this for raid1 yet... I can try to do the same thing for raid1
> if it's valuable.

Hi Kuai

Which patch?

I have a try on raid10. The results are:

raid10
READ: bw=3711MiB/s (3892MB/s)
WRITE: bw=1590MiB/s (1667MB/s)

raid0
READ: bw=5610MiB/s (5882MB/s)
WRITE: bw=2405MiB/s (2521MB/s)

ram0
READ: bw=5468MiB/s (5734MB/s)
WRITE: bw=2343MiB/s (2457MB/s)

Because raid10 has a code path similar to raid0, I ran a test on raid0
too.  There is a performance gap between raid10 and the ram disk as
well.  The strange thing is that raid0 doesn't show a big improvement
over the plain ram disk.

Regards
Xiao



>
> >
> >
> > FIO configuration file:
> >
> > [global]
> > name=random reads and writes
> > ioengine=libaio
> > direct=1
> > readwrite=randrw
> > rwmixread=70
> > iodepth=64
> > buffered=0
> > #filename=/dev/ram0
> > filename=/dev/dm/test
> > size=1G
> > runtime=30
> > time_based
> > randrepeat=0
> > norandommap
> > refill_buffers
> > ramp_time=10
> > bs=4k
> > numjobs=400
>
> 400 is too aggressive, I think spin_lock from fast path is probably
> causing the problem, same as I met before for raid10...
>
> Thanks,
> Kuai
>
> > group_reporting=1
> > [job1]
> >
> > .
> >
>



* Re: Unacceptably Poor RAID1 Performance with Many CPU Cores
  2023-06-16  2:14   ` Xiao Ni
@ 2023-06-16  2:34     ` Yu Kuai
  2023-06-16  5:52     ` Ali Gholami Rudi
       [not found]     ` <20231606091224@laper.mirepesht>
  2 siblings, 0 replies; 30+ messages in thread
From: Yu Kuai @ 2023-06-16  2:34 UTC (permalink / raw)
  To: Xiao Ni, Yu Kuai; +Cc: Ali Gholami Rudi, linux-raid, song, yukuai (C)

Hi,

On 2023/06/16 10:14, Xiao Ni wrote:
> On Thu, Jun 15, 2023 at 10:06 PM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>>
>> Hi,
>>
>> On 2023/06/15 15:54, Ali Gholami Rudi wrote:
>>> Perf output:
>>>
>>> Samples: 1M of event 'cycles', Event count (approx.): 1158425235997
>>>     Children      Self  Command  Shared Object           Symbol
>>> +   97.98%     0.01%  fio      fio                     [.] fio_libaio_commit
>>> +   97.95%     0.01%  fio      libaio.so.1.0.1         [.] io_submit
>>> +   97.85%     0.01%  fio      [kernel.kallsyms]       [k] __x64_sys_io_submit
>>> -   97.82%     0.01%  fio      [kernel.kallsyms]       [k] io_submit_one
>>>      - 97.81% io_submit_one
>>>         - 54.62% aio_write
>>>            - 54.60% blkdev_write_iter
>>>               - 36.30% blk_finish_plug
>>>                  - flush_plug_callbacks
>>>                     - 36.29% raid1_unplug
>>>                        - flush_bio_list
>>>                           - 18.44% submit_bio_noacct
>>>                              - 18.40% brd_submit_bio
>>>                                 - 18.13% raid1_end_write_request
>>>                                    - 17.94% raid_end_bio_io
>>>                                       - 17.82% __wake_up_common_lock
>>>                                          + 17.79% _raw_spin_lock_irqsave
>>>                           - 17.79% __wake_up_common_lock
>>>                              + 17.76% _raw_spin_lock_irqsave
>>>               + 18.29% __generic_file_write_iter
>>>         - 43.12% aio_read
>>>            - 43.07% blkdev_read_iter
>>>               - generic_file_read_iter
>>>                  - 43.04% blkdev_direct_IO
>>>                     - 42.95% submit_bio_noacct
>>>                        - 42.23% brd_submit_bio
>>>                           - 41.91% raid1_end_read_request
>>>                              - 41.70% raid_end_bio_io
>>>                                 - 41.43% __wake_up_common_lock
>>>                                    + 41.36% _raw_spin_lock_irqsave
>>>                        - 0.68% md_submit_bio
>>>                             0.61% md_handle_request
>>> +   94.90%     0.00%  fio      [kernel.kallsyms]       [k] __wake_up_common_lock
>>> +   94.86%     0.22%  fio      [kernel.kallsyms]       [k] _raw_spin_lock_irqsave
>>> +   94.64%    94.64%  fio      [kernel.kallsyms]       [k] native_queued_spin_lock_slowpath
>>> +   79.63%     0.02%  fio      [kernel.kallsyms]       [k] submit_bio_noacct
>>
>> This looks familiar... Perhaps can you try to test with raid10 with
>> latest mainline kernel? I used to optimize spin_lock for raid10, and I
>> don't do this for raid1 yet... I can try to do the same thing for raid1
>> if it's valuable.
> 
> Hi Kuai
> 
> Which patch?

https://lore.kernel.org/lkml/fbf58ad3-2eff-06df-6426-3b4629408e94@linux.dev/T/

> 
> I have a try on raid10. The results are:

Did you create the array with 4 ramdisks?
> 
> raid10
> READ: bw=3711MiB/s (3892MB/s)
> WRITE: bw=1590MiB/s (1667MB/s)
> 
> raid0
> READ: bw=5610MiB/s (5882MB/s)
> WRITE: bw=2405MiB/s (2521MB/s)
> 
> ram0
> READ: bw=5468MiB/s (5734MB/s)
> WRITE: bw=2343MiB/s (2457MB/s)

I think the spin_lock in the fast path probably doesn't matter much on
your machine.  It would be helpful if Ali could try testing raid10...

Thanks,
Kuai
> 
> Because raid10 has a function like raid0. So I did a test on raid0
> too. There is a performance gap between raid10 and ram disk too. The
> strange thing is that raid0 doesn't have a big performance
> improvement.
> 
> Regards
> Xiao
> 
> 
> 
>>
>>>
>>>
>>> FIO configuration file:
>>>
>>> [global]
>>> name=random reads and writes
>>> ioengine=libaio
>>> direct=1
>>> readwrite=randrw
>>> rwmixread=70
>>> iodepth=64
>>> buffered=0
>>> #filename=/dev/ram0
>>> filename=/dev/dm/test
>>> size=1G
>>> runtime=30
>>> time_based
>>> randrepeat=0
>>> norandommap
>>> refill_buffers
>>> ramp_time=10
>>> bs=4k
>>> numjobs=400
>>
>> 400 is too aggressive, I think spin_lock from fast path is probably
>> causing the problem, same as I met before for raid10...
>>
>> Thanks,
>> Kuai
>>
>>> group_reporting=1
>>> [job1]
>>>
>>> .
>>>
>>
> 
> .
> 



* Re: Unacceptably Poor RAID1 Performance with Many CPU Cores
  2023-06-16  1:53       ` Xiao Ni
@ 2023-06-16  5:20         ` Ali Gholami Rudi
  0 siblings, 0 replies; 30+ messages in thread
From: Ali Gholami Rudi @ 2023-06-16  5:20 UTC (permalink / raw)
  To: Xiao Ni; +Cc: linux-raid, song

Hi Xiao,

Xiao Ni <xni@redhat.com> wrote:
> > Ali Gholami Rudi <aligrudi@gmail.com> wrote:
> > > Xiao Ni <xni@redhat.com> wrote:
> > > > Because it can be reproduced easily in your environment. Can you try
> > > > with the latest upstream kernel? If the problem doesn't exist with
> > > > latest upstream kernel. You can use git bisect to find which patch can
> > > > fix this problem.
> > >
> > > I just tried the upstream.  I get almost the same result with 1G ramdisks.
> > >
> > > Without RAID (writing to /dev/ram0)
> > > READ:  IOPS=15.8M BW=60.3GiB/s
> > > WRITE: IOPS= 6.8M BW=27.7GiB/s
> > >
> > > RAID1 (writing to /dev/md/test)
> > > READ:  IOPS=518K BW=2028MiB/s
> > > WRITE: IOPS=222K BW= 912MiB/s
> 
> I can reproduce this with upstream kernel too.
> 
> RAID1
> READ: bw=3699MiB/s (3879MB/s)
> WRITE: bw=1586MiB/s (1663MB/s)
> 
> ram disk:
> READ: bw=5720MiB/s (5997MB/s)
> WRITE: bw=2451MiB/s (2570MB/s)
> 
> There is a performance problem. But not like your result. Your result
> has a huge gap. I'm not sure the reason. Any thoughts?

It may be the number of cores; in my setup there are 128 cores (256
threads).  If I understand it correctly, the problem is that
wake_up(...->wait_barrier) is called unconditionally at the same time
by many cores in functions like allow_barrier and flush_bio_list,
while no task is waiting on the wait queue.  This results in lock
contention on wq_head->lock in __wake_up_common_lock().
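
Something like the following (an untested sketch, using the existing
wq_has_sleeper() helper) is what I have in mind: skip the wake-up, and
therefore the lock, when the wait queue is empty.

	/* current code: every completion path wakes the barrier
	 * unconditionally, taking wq_head->lock even when nobody
	 * is sleeping on it */
	wake_up(&conf->wait_barrier);

	/* guarded: only take the lock when there is a waiter */
	if (wq_has_sleeper(&conf->wait_barrier))
		wake_up(&conf->wait_barrier);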

> > And this is perf's output:
> 
> I'm not familiar with perf, what's your command that I can use to see
> the same output?

perf record --call-graph dwarf fio fio.test

perf report

Thanks,
Ali



* Re: Unacceptably Poor RAID1 Performance with Many CPU Cores
  2023-06-16  2:14   ` Xiao Ni
  2023-06-16  2:34     ` Yu Kuai
@ 2023-06-16  5:52     ` Ali Gholami Rudi
       [not found]     ` <20231606091224@laper.mirepesht>
  2 siblings, 0 replies; 30+ messages in thread
From: Ali Gholami Rudi @ 2023-06-16  5:52 UTC (permalink / raw)
  To: Yu Kuai; +Cc: Xiao Ni, linux-raid, song, yukuai (C)


On Thu, Jun 15, 2023 at 10:06 PM Yu Kuai <yukuai1@huaweicloud.com> wrote:
> > > FIO configuration file:
> > >
> > > [global]
> > > name=random reads and writes
> > > ioengine=libaio
> > > direct=1
> > > readwrite=randrw
> > > rwmixread=70
> > > iodepth=64
> > > buffered=0
> > > #filename=/dev/ram0
> > > filename=/dev/dm/test
> > > size=1G
> > > runtime=30
> > > time_based
> > > randrepeat=0
> > > norandommap
> > > refill_buffers
> > > ramp_time=10
> > > bs=4k
> > > numjobs=400
> >
> > 400 is too aggressive, I think spin_lock from fast path is probably
> > causing the problem, same as I met before for raid10...

In our workload, we run about this many KVM guests on one machine, and
when many of the VMs use their disks we see almost the same problem
with raid1.

Thanks,
Ali



* Re: Unacceptably Poor RAID1 Performance with Many CPU Cores
       [not found]     ` <20231606091224@laper.mirepesht>
@ 2023-06-16  7:31       ` Ali Gholami Rudi
  2023-06-16  7:42         ` Yu Kuai
  0 siblings, 1 reply; 30+ messages in thread
From: Ali Gholami Rudi @ 2023-06-16  7:31 UTC (permalink / raw)
  To: Yu Kuai; +Cc: Xiao Ni, linux-raid, song, yukuai (C)


Ali Gholami Rudi <aligrudi@gmail.com> wrote:
> Xiao Ni <xni@redhat.com> wrote:
> > On Thu, Jun 15, 2023 at 10:06 PM Yu Kuai <yukuai1@huaweicloud.com> wrote:
> > > This looks familiar... Perhaps can you try to test with raid10 with
> > > latest mainline kernel? I used to optimize spin_lock for raid10, and I
> > > don't do this for raid1 yet... I can try to do the same thing for raid1
> > > if it's valuable.
> 
> I do get improvements with raid10:
> 
> Without RAID (writing to /dev/ram0)
> READ:  IOPS=15.8M BW=60.3GiB/s
> WRITE: IOPS= 6.8M BW=27.7GiB/s
> 
> RAID1 (writing to /dev/md/test)
> READ:  IOPS=518K BW=2028MiB/s
> WRITE: IOPS=222K BW= 912MiB/s
> 
> RAID10 (writing to /dev/md/test)
> READ:  IOPS=2033k BW=8329MB/s
> WRITE: IOPS= 871k BW=3569MB/s
> 
> raid10 is about four times faster than raid1.

And this is perf's output for raid10:

+   97.33%     0.04%  fio      [kernel.kallsyms]       [k] entry_SYSCALL_64_after_hwframe
+   96.96%     0.02%  fio      [kernel.kallsyms]       [k] do_syscall_64
+   94.43%     0.03%  fio      [kernel.kallsyms]       [k] __x64_sys_io_submit
-   93.71%     0.04%  fio      [kernel.kallsyms]       [k] io_submit_one
   - 93.67% io_submit_one
      - 76.03% aio_write
         - 75.53% blkdev_write_iter
            - 68.95% blk_finish_plug
               - flush_plug_callbacks
                  - 68.93% raid10_unplug
                     - 64.31% __wake_up_common_lock
                        - 64.17% _raw_spin_lock_irqsave
                             native_queued_spin_lock_slowpath
                     - 4.43% submit_bio_noacct_nocheck
                        - 4.42% __submit_bio
                           - 2.28% raid10_end_write_request
                              - 0.82% raid_end_bio_io
                                   0.82% allow_barrier
                             2.09% brd_submit_bio
            - 6.41% __generic_file_write_iter
               - 6.08% generic_file_direct_write
                  - 5.64% __blkdev_direct_IO_async
                     - 4.72% submit_bio_noacct_nocheck
                        - 4.69% __submit_bio
                           - 4.67% md_handle_request
                              - 4.66% raid10_make_request
                                   2.59% raid10_write_one_disk
      - 16.14% aio_read
         - 15.07% blkdev_read_iter
            - 14.16% __blkdev_direct_IO_async
               - 11.36% submit_bio_noacct_nocheck
                  - 11.17% __submit_bio
                     - 5.89% md_handle_request
                        - 5.84% raid10_make_request
                           + 4.18% raid10_read_request
                     - 3.74% raid10_end_read_request
                        - 2.04% raid_end_bio_io
                             1.46% allow_barrier
                          0.55% mempool_free
                       1.39% brd_submit_bio
               - 1.33% bio_iov_iter_get_pages
                  - 1.00% iov_iter_get_pages
                     - __iov_iter_get_pages_alloc
                        - 0.85% get_user_pages_fast
                             0.75% internal_get_user_pages_fast
                 0.93% bio_alloc_bioset
              0.65% filemap_write_and_wait_range
+   88.31%     0.86%  fio      fio                     [.] thread_main
+   87.69%     0.00%  fio      [unknown]               [k] 0xffffffffffffffff
+   87.60%     0.00%  fio      fio                     [.] run_threads
+   87.31%     0.00%  fio      fio                     [.] do_io (inlined)
+   86.60%     0.32%  fio      libc-2.31.so            [.] syscall
+   85.87%     0.52%  fio      fio                     [.] td_io_queue
+   85.49%     0.18%  fio      fio                     [.] fio_libaio_commit
+   85.45%     0.14%  fio      fio                     [.] td_io_commit
+   85.37%     0.11%  fio      libaio.so.1.0.1         [.] io_submit
+   85.35%     0.00%  fio      fio                     [.] io_u_submit (inlined)
+   76.04%     0.01%  fio      [kernel.kallsyms]       [k] aio_write
+   75.54%     0.01%  fio      [kernel.kallsyms]       [k] blkdev_write_iter
+   68.96%     0.00%  fio      [kernel.kallsyms]       [k] blk_finish_plug
+   68.95%     0.00%  fio      [kernel.kallsyms]       [k] flush_plug_callbacks
+   68.94%     0.13%  fio      [kernel.kallsyms]       [k] raid10_unplug
+   64.41%     0.03%  fio      [kernel.kallsyms]       [k] _raw_spin_lock_irqsave
+   64.32%     0.01%  fio      [kernel.kallsyms]       [k] __wake_up_common_lock
+   64.05%    63.85%  fio      [kernel.kallsyms]       [k] native_queued_spin_lock_slowpath
+   21.05%     1.51%  fio      [kernel.kallsyms]       [k] submit_bio_noacct_nocheck
+   20.97%     1.18%  fio      [kernel.kallsyms]       [k] __blkdev_direct_IO_async
+   20.29%     0.03%  fio      [kernel.kallsyms]       [k] __submit_bio
+   16.15%     0.02%  fio      [kernel.kallsyms]       [k] aio_read
..

Thanks,
Ali



* Re: Unacceptably Poor RAID1 Performance with Many CPU Cores
  2023-06-16  7:31       ` Ali Gholami Rudi
@ 2023-06-16  7:42         ` Yu Kuai
  2023-06-16  8:21           ` Ali Gholami Rudi
  0 siblings, 1 reply; 30+ messages in thread
From: Yu Kuai @ 2023-06-16  7:42 UTC (permalink / raw)
  To: Ali Gholami Rudi, Yu Kuai; +Cc: Xiao Ni, linux-raid, song, yukuai (C)

Hi,

On 2023/06/16 15:31, Ali Gholami Rudi wrote:
> 
> Ali Gholami Rudi <aligrudi@gmail.com> wrote:
>> Xiao Ni <xni@redhat.com> wrote:
>>> On Thu, Jun 15, 2023 at 10:06 PM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>>>> This looks familiar... Perhaps can you try to test with raid10 with
>>>> latest mainline kernel? I used to optimize spin_lock for raid10, and I
>>>> don't do this for raid1 yet... I can try to do the same thing for raid1
>>>> if it's valuable.
>>
>> I do get improvements with raid10:
>>
>> Without RAID (writing to /dev/ram0)
>> READ:  IOPS=15.8M BW=60.3GiB/s
>> WRITE: IOPS= 6.8M BW=27.7GiB/s
>>
>> RAID1 (writing to /dev/md/test)
>> READ:  IOPS=518K BW=2028MiB/s
>> WRITE: IOPS=222K BW= 912MiB/s
>>
>> RAID10 (writing to /dev/md/test)
>> READ:  IOPS=2033k BW=8329MB/s
>> WRITE: IOPS= 871k BW=3569MB/s
>>
>> raid10 is about four times faster than raid1.
> 
> And this is perf's output for raid10:
> 
> +   97.33%     0.04%  fio      [kernel.kallsyms]       [k] entry_SYSCALL_64_after_hwframe
> +   96.96%     0.02%  fio      [kernel.kallsyms]       [k] do_syscall_64
> +   94.43%     0.03%  fio      [kernel.kallsyms]       [k] __x64_sys_io_submit
> -   93.71%     0.04%  fio      [kernel.kallsyms]       [k] io_submit_one
>     - 93.67% io_submit_one
>        - 76.03% aio_write
>           - 75.53% blkdev_write_iter
>              - 68.95% blk_finish_plug
>                 - flush_plug_callbacks
>                    - 68.93% raid10_unplug
>                       - 64.31% __wake_up_common_lock
>                          - 64.17% _raw_spin_lock_irqsave
>                               native_queued_spin_lock_slowpath

This is unexpected; can you check whether your kernel contains the
following patch?

commit 460af1f9d9e62acce4a21f9bd00b5bcd5963bcd4
Author: Yu Kuai <yukuai3@huawei.com>
Date:   Mon May 29 21:11:06 2023 +0800

     md/raid1-10: limit the number of plugged bio

If so, can you retest with the following patch?

diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index d0de8c9fb3cf..6fdd99c3e59a 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -911,7 +911,7 @@ static void flush_pending_writes(struct r10conf *conf)

                 blk_start_plug(&plug);
                 raid1_prepare_flush_writes(conf->mddev->bitmap);
-               wake_up(&conf->wait_barrier);
+               wake_up_barrier(conf);

                 while (bio) { /* submit pending writes */
                         struct bio *next = bio->bi_next;
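
(For reference, wake_up_barrier() in raid10.c is roughly the following;
the point is that it only takes the waitqueue lock when a task is
actually sleeping on it:)

static void wake_up_barrier(struct r10conf *conf)
{
	if (wq_has_sleeper(&conf->wait_barrier))
		wake_up(&conf->wait_barrier);
}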

Thanks,
Kuai
>                       - 4.43% submit_bio_noacct_nocheck
>                          - 4.42% __submit_bio
>                             - 2.28% raid10_end_write_request
>                                - 0.82% raid_end_bio_io
>                                     0.82% allow_barrier
>                               2.09% brd_submit_bio
>              - 6.41% __generic_file_write_iter
>                 - 6.08% generic_file_direct_write
>                    - 5.64% __blkdev_direct_IO_async
>                       - 4.72% submit_bio_noacct_nocheck
>                          - 4.69% __submit_bio
>                             - 4.67% md_handle_request
>                                - 4.66% raid10_make_request
>                                     2.59% raid10_write_one_disk
>        - 16.14% aio_read
>           - 15.07% blkdev_read_iter
>              - 14.16% __blkdev_direct_IO_async
>                 - 11.36% submit_bio_noacct_nocheck
>                    - 11.17% __submit_bio
>                       - 5.89% md_handle_request
>                          - 5.84% raid10_make_request
>                             + 4.18% raid10_read_request
>                       - 3.74% raid10_end_read_request
>                          - 2.04% raid_end_bio_io
>                               1.46% allow_barrier
>                            0.55% mempool_free
>                         1.39% brd_submit_bio
>                 - 1.33% bio_iov_iter_get_pages
>                    - 1.00% iov_iter_get_pages
>                       - __iov_iter_get_pages_alloc
>                          - 0.85% get_user_pages_fast
>                               0.75% internal_get_user_pages_fast
>                   0.93% bio_alloc_bioset
>                0.65% filemap_write_and_wait_range
> +   88.31%     0.86%  fio      fio                     [.] thread_main
> +   87.69%     0.00%  fio      [unknown]               [k] 0xffffffffffffffff
> +   87.60%     0.00%  fio      fio                     [.] run_threads
> +   87.31%     0.00%  fio      fio                     [.] do_io (inlined)
> +   86.60%     0.32%  fio      libc-2.31.so            [.] syscall
> +   85.87%     0.52%  fio      fio                     [.] td_io_queue
> +   85.49%     0.18%  fio      fio                     [.] fio_libaio_commit
> +   85.45%     0.14%  fio      fio                     [.] td_io_commit
> +   85.37%     0.11%  fio      libaio.so.1.0.1         [.] io_submit
> +   85.35%     0.00%  fio      fio                     [.] io_u_submit (inlined)
> +   76.04%     0.01%  fio      [kernel.kallsyms]       [k] aio_write
> +   75.54%     0.01%  fio      [kernel.kallsyms]       [k] blkdev_write_iter
> +   68.96%     0.00%  fio      [kernel.kallsyms]       [k] blk_finish_plug
> +   68.95%     0.00%  fio      [kernel.kallsyms]       [k] flush_plug_callbacks
> +   68.94%     0.13%  fio      [kernel.kallsyms]       [k] raid10_unplug
> +   64.41%     0.03%  fio      [kernel.kallsyms]       [k] _raw_spin_lock_irqsave
> +   64.32%     0.01%  fio      [kernel.kallsyms]       [k] __wake_up_common_lock
> +   64.05%    63.85%  fio      [kernel.kallsyms]       [k] native_queued_spin_lock_slowpath
> +   21.05%     1.51%  fio      [kernel.kallsyms]       [k] submit_bio_noacct_nocheck
> +   20.97%     1.18%  fio      [kernel.kallsyms]       [k] __blkdev_direct_IO_async
> +   20.29%     0.03%  fio      [kernel.kallsyms]       [k] __submit_bio
> +   16.15%     0.02%  fio      [kernel.kallsyms]       [k] aio_read
> ..
> 
> Thanks,
> Ali
> 
> .
> 



* Re: Unacceptably Poor RAID1 Performance with Many CPU Cores
  2023-06-16  7:42         ` Yu Kuai
@ 2023-06-16  8:21           ` Ali Gholami Rudi
  2023-06-16  8:34             ` Yu Kuai
  0 siblings, 1 reply; 30+ messages in thread
From: Ali Gholami Rudi @ 2023-06-16  8:21 UTC (permalink / raw)
  To: Yu Kuai; +Cc: Xiao Ni, linux-raid, song, yukuai (C)

Hi,

Yu Kuai <yukuai1@huaweicloud.com> wrote:
> > And this is perf's output for raid10:
> > 
> > +   97.33%     0.04%  fio      [kernel.kallsyms]       [k] entry_SYSCALL_64_after_hwframe
> > +   96.96%     0.02%  fio      [kernel.kallsyms]       [k] do_syscall_64
> > +   94.43%     0.03%  fio      [kernel.kallsyms]       [k] __x64_sys_io_submit
> > -   93.71%     0.04%  fio      [kernel.kallsyms]       [k] io_submit_one
> >     - 93.67% io_submit_one
> >        - 76.03% aio_write
> >           - 75.53% blkdev_write_iter
> >              - 68.95% blk_finish_plug
> >                 - flush_plug_callbacks
> >                    - 68.93% raid10_unplug
> >                       - 64.31% __wake_up_common_lock
> >                          - 64.17% _raw_spin_lock_irqsave
> >                               native_queued_spin_lock_slowpath
> 
> This is unexpected, can you check if your kernel contain following
> patch?
> 
> commit 460af1f9d9e62acce4a21f9bd00b5bcd5963bcd4
> Author: Yu Kuai <yukuai3@huawei.com>
> Date:   Mon May 29 21:11:06 2023 +0800
> 
>      md/raid1-10: limit the number of plugged bio
> 
> If so, can you retest with following patch?
> 
> diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
> index d0de8c9fb3cf..6fdd99c3e59a 100644
> --- a/drivers/md/raid10.c
> +++ b/drivers/md/raid10.c
> @@ -911,7 +911,7 @@ static void flush_pending_writes(struct r10conf *conf)
> 
>                  blk_start_plug(&plug);
>                  raid1_prepare_flush_writes(conf->mddev->bitmap);
> -               wake_up(&conf->wait_barrier);
> +               wake_up_barrier(conf);
> 
>                  while (bio) { /* submit pending writes */
>                          struct bio *next = bio->bi_next;

No, this patch was not present.  I applied this one:

diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index 4fcfcb350d2b..52f0c24128ff 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -905,7 +905,7 @@ static void flush_pending_writes(struct r10conf *conf)
 		/* flush any pending bitmap writes to disk
 		 * before proceeding w/ I/O */
 		md_bitmap_unplug(conf->mddev->bitmap);
-		wake_up(&conf->wait_barrier);
+		wake_up_barrier(conf);
 
 		while (bio) { /* submit pending writes */
 			struct bio *next = bio->bi_next;

Nevertheless, I get almost the same result as before:

Without the patch:
READ:  IOPS=2033k BW=8329MB/s
WRITE: IOPS= 871k BW=3569MB/s

With the patch:
READ:  IOPS=2027K BW=7920MiB/s
WRITE: IOPS= 869K BW=3394MiB/s

Perf:

+   96.23%     0.04%  fio      [kernel.kallsyms]       [k] entry_SYSCALL_64_after_hwframe
+   95.86%     0.02%  fio      [kernel.kallsyms]       [k] do_syscall_64
+   94.30%     0.03%  fio      [kernel.kallsyms]       [k] __x64_sys_io_submit
-   93.63%     0.04%  fio      [kernel.kallsyms]       [k] io_submit_one
   - 93.58% io_submit_one
      - 76.44% aio_write
         - 75.97% blkdev_write_iter
            - 70.17% blk_finish_plug
               - flush_plug_callbacks
                  - 70.15% raid10_unplug
                     - 66.12% __wake_up_common_lock
                        - 65.97% _raw_spin_lock_irqsave
                             65.57% native_queued_spin_lock_slowpath
                     - 3.85% submit_bio_noacct_nocheck
                        - 3.84% __submit_bio
                           - 2.09% raid10_end_write_request
                              - 0.83% raid_end_bio_io
                                   0.82% allow_barrier
                             1.70% brd_submit_bio
            + 5.59% __generic_file_write_iter
      + 15.71% aio_read
+   88.38%     0.71%  fio      fio                     [.] thread_main
+   87.89%     0.00%  fio      [unknown]               [k] 0xffffffffffffffff
+   87.81%     0.00%  fio      fio                     [.] run_threads
+   87.54%     0.00%  fio      fio                     [.] do_io (inlined)
+   86.79%     0.31%  fio      libc-2.31.so            [.] syscall
+   86.19%     0.54%  fio      fio                     [.] td_io_queue
+   85.79%     0.18%  fio      fio                     [.] fio_libaio_commit
+   85.76%     0.14%  fio      fio                     [.] td_io_commit
+   85.69%     0.14%  fio      libaio.so.1.0.1         [.] io_submit
+   85.66%     0.00%  fio      fio                     [.] io_u_submit (inlined)
+   76.45%     0.01%  fio      [kernel.kallsyms]       [k] aio_write
..



* Re: Unacceptably Poor RAID1 Performance with Many CPU Cores
  2023-06-16  8:21           ` Ali Gholami Rudi
@ 2023-06-16  8:34             ` Yu Kuai
  2023-06-16  8:52               ` Ali Gholami Rudi
  0 siblings, 1 reply; 30+ messages in thread
From: Yu Kuai @ 2023-06-16  8:34 UTC (permalink / raw)
  To: Ali Gholami Rudi, Yu Kuai; +Cc: Xiao Ni, linux-raid, song, yukuai (C)

Hi,

On 2023/06/16 16:21, Ali Gholami Rudi wrote:
> Hi,
> 
> Yu Kuai <yukuai1@huaweicloud.com> wrote:
>>> And this is perf's output for raid10:
>>>
>>> +   97.33%     0.04%  fio      [kernel.kallsyms]       [k] entry_SYSCALL_64_after_hwframe
>>> +   96.96%     0.02%  fio      [kernel.kallsyms]       [k] do_syscall_64
>>> +   94.43%     0.03%  fio      [kernel.kallsyms]       [k] __x64_sys_io_submit
>>> -   93.71%     0.04%  fio      [kernel.kallsyms]       [k] io_submit_one
>>>      - 93.67% io_submit_one
>>>         - 76.03% aio_write
>>>            - 75.53% blkdev_write_iter
>>>               - 68.95% blk_finish_plug
>>>                  - flush_plug_callbacks
>>>                     - 68.93% raid10_unplug
>>>                        - 64.31% __wake_up_common_lock
>>>                           - 64.17% _raw_spin_lock_irqsave
>>>                                native_queued_spin_lock_slowpath
>>
>> This is unexpected, can you check if your kernel contain following
>> patch?
>>
>> commit 460af1f9d9e62acce4a21f9bd00b5bcd5963bcd4
>> Author: Yu Kuai <yukuai3@huawei.com>
>> Date:   Mon May 29 21:11:06 2023 +0800
>>
>>       md/raid1-10: limit the number of plugged bio
>>
>> If so, can you retest with following patch?
>>
>> diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
>> index d0de8c9fb3cf..6fdd99c3e59a 100644
>> --- a/drivers/md/raid10.c
>> +++ b/drivers/md/raid10.c
>> @@ -911,7 +911,7 @@ static void flush_pending_writes(struct r10conf *conf)
>>
>>                   blk_start_plug(&plug);
>>                   raid1_prepare_flush_writes(conf->mddev->bitmap);
>> -               wake_up(&conf->wait_barrier);
>> +               wake_up_barrier(conf);
>>
>>                   while (bio) { /* submit pending writes */
>>                           struct bio *next = bio->bi_next;
> 
> No, this patch was not present.  I applied this one:
> 
> diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
> index 4fcfcb350d2b..52f0c24128ff 100644
> --- a/drivers/md/raid10.c
> +++ b/drivers/md/raid10.c
> @@ -905,7 +905,7 @@ static void flush_pending_writes(struct r10conf *conf)
>   		/* flush any pending bitmap writes to disk
>   		 * before proceeding w/ I/O */
>   		md_bitmap_unplug(conf->mddev->bitmap);
> -		wake_up(&conf->wait_barrier);
> +		wake_up_barrier(conf);
>   
>   		while (bio) { /* submit pending writes */
>   			struct bio *next = bio->bi_next;

Thanks for the testing; sorry that I missed one place... Can you try
changing wake_up() to wake_up_barrier() in raid10_unplug() as well and
test again?

Thanks,
Kuai

> 
> I get almost the same result as before nevertheless:
> 
> Without the patch:
> READ:  IOPS=2033k BW=8329MB/s
> WRITE: IOPS= 871k BW=3569MB/s
> 
> With the patch:
> READ:  IOPS=2027K BW=7920MiB/s
> WRITE: IOPS= 869K BW=3394MiB/s
> 
> Perf:
> 
> +   96.23%     0.04%  fio      [kernel.kallsyms]       [k] entry_SYSCALL_64_after_hwframe
> +   95.86%     0.02%  fio      [kernel.kallsyms]       [k] do_syscall_64
> +   94.30%     0.03%  fio      [kernel.kallsyms]       [k] __x64_sys_io_submit
> -   93.63%     0.04%  fio      [kernel.kallsyms]       [k] io_submit_one
>     - 93.58% io_submit_one
>        - 76.44% aio_write
>           - 75.97% blkdev_write_iter
>              - 70.17% blk_finish_plug
>                 - flush_plug_callbacks
>                    - 70.15% raid10_unplug
>                       - 66.12% __wake_up_common_lock
>                          - 65.97% _raw_spin_lock_irqsave
>                               65.57% native_queued_spin_lock_slowpath
>                       - 3.85% submit_bio_noacct_nocheck
>                          - 3.84% __submit_bio
>                             - 2.09% raid10_end_write_request
>                                - 0.83% raid_end_bio_io
>                                     0.82% allow_barrier
>                               1.70% brd_submit_bio
>              + 5.59% __generic_file_write_iter
>        + 15.71% aio_read
> +   88.38%     0.71%  fio      fio                     [.] thread_main
> +   87.89%     0.00%  fio      [unknown]               [k] 0xffffffffffffffff
> +   87.81%     0.00%  fio      fio                     [.] run_threads
> +   87.54%     0.00%  fio      fio                     [.] do_io (inlined)
> +   86.79%     0.31%  fio      libc-2.31.so            [.] syscall
> +   86.19%     0.54%  fio      fio                     [.] td_io_queue
> +   85.79%     0.18%  fio      fio                     [.] fio_libaio_commit
> +   85.76%     0.14%  fio      fio                     [.] td_io_commit
> +   85.69%     0.14%  fio      libaio.so.1.0.1         [.] io_submit
> +   85.66%     0.00%  fio      fio                     [.] io_u_submit (inlined)
> +   76.45%     0.01%  fio      [kernel.kallsyms]       [k] aio_write
> ..
> 
> .
> 



* Re: Unacceptably Poor RAID1 Performance with Many CPU Cores
  2023-06-16  8:34             ` Yu Kuai
@ 2023-06-16  8:52               ` Ali Gholami Rudi
  2023-06-16  9:17                 ` Yu Kuai
  2023-06-16 11:51                 ` Ali Gholami Rudi
  0 siblings, 2 replies; 30+ messages in thread
From: Ali Gholami Rudi @ 2023-06-16  8:52 UTC (permalink / raw)
  To: Yu Kuai; +Cc: Xiao Ni, linux-raid, song, yukuai (C)


Yu Kuai <yukuai1@huaweicloud.com> wrote:
> > diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
> > index 4fcfcb350d2b..52f0c24128ff 100644
> > --- a/drivers/md/raid10.c
> > +++ b/drivers/md/raid10.c
> > @@ -905,7 +905,7 @@ static void flush_pending_writes(struct r10conf *conf)
> >   		/* flush any pending bitmap writes to disk
> >   		 * before proceeding w/ I/O */
> >   		md_bitmap_unplug(conf->mddev->bitmap);
> > -		wake_up(&conf->wait_barrier);
> > +		wake_up_barrier(conf);
> >   
> >   		while (bio) { /* submit pending writes */
> >   			struct bio *next = bio->bi_next;
> 
> Thanks for the testing, sorry that I missed one place... Can you try to
> change wake_up() to wake_up_barrier() from raid10_unplug() and test
> again?

OK.  I replaced only the second occurrence of wake_up() in raid10_unplug().

> > Without the patch:
> > READ:  IOPS=2033k BW=8329MB/s
> > WRITE: IOPS= 871k BW=3569MB/s
> > 
> > With the patch:
> > READ:  IOPS=2027K BW=7920MiB/s
> > WRITE: IOPS= 869K BW=3394MiB/s

With the second patch:
READ:  IOPS=3642K BW=13900MiB/s
WRITE: IOPS=1561K BW= 6097MiB/s

That is impressive.  Great job.

I shall test it more.

Thanks,
Ali


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Unacceptably Poor RAID1 Performance with Many CPU Cores
  2023-06-16  8:52               ` Ali Gholami Rudi
@ 2023-06-16  9:17                 ` Yu Kuai
  2023-06-16 11:51                 ` Ali Gholami Rudi
  1 sibling, 0 replies; 30+ messages in thread
From: Yu Kuai @ 2023-06-16  9:17 UTC (permalink / raw)
  To: Ali Gholami Rudi, Yu Kuai; +Cc: Xiao Ni, linux-raid, song, yukuai (C)

Hi,

On 2023/06/16 16:52, Ali Gholami Rudi wrote:
> 
> Yu Kuai <yukuai1@huaweicloud.com> wrote:
>>> diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
>>> index 4fcfcb350d2b..52f0c24128ff 100644
>>> --- a/drivers/md/raid10.c
>>> +++ b/drivers/md/raid10.c
>>> @@ -905,7 +905,7 @@ static void flush_pending_writes(struct r10conf *conf)
>>>    		/* flush any pending bitmap writes to disk
>>>    		 * before proceeding w/ I/O */
>>>    		md_bitmap_unplug(conf->mddev->bitmap);
>>> -		wake_up(&conf->wait_barrier);
>>> +		wake_up_barrier(conf);
>>>    
>>>    		while (bio) { /* submit pending writes */
>>>    			struct bio *next = bio->bi_next;
>>
>> Thanks for the testing, sorry that I missed one place... Can you try to
>> change wake_up() to wake_up_barrier() from raid10_unplug() and test
>> again?
> 
> OK.  I replaced only the second occurrence of wake_up() in raid10_unplug().

I think it's better to change them together.
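
Roughly like the below (just an untested sketch of what I mean; the hunk
context is from memory, so the surrounding lines may differ a bit on your
tree) — both call sites in raid10_unplug() get converted:

diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ static void raid10_unplug(struct blk_plug_cb *cb, bool from_schedule)
 	if (from_schedule || current->bio_list) {
 		spin_lock_irq(&conf->device_lock);
 		bio_list_merge(&conf->pending_bio_list, &plug->pending);
 		spin_unlock_irq(&conf->device_lock);
-		wake_up(&conf->wait_barrier);
+		wake_up_barrier(conf);
 		md_wakeup_thread(mddev->thread);
 		kfree(plug);
 		return;
 	}
@@
 	/* we aren't scheduling, so we can do the write-out directly. */
 	bio = bio_list_get(&plug->pending);
 	md_bitmap_unplug(mddev->bitmap);
-	wake_up(&conf->wait_barrier);
+	wake_up_barrier(conf);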

> 
>>> Without the patch:
>>> READ:  IOPS=2033k BW=8329MB/s
>>> WRITE: IOPS= 871k BW=3569MB/s
>>>
>>> With the patch:
>>> READ:  IOPS=2027K BW=7920MiB/s
>>> WRITE: IOPS= 869K BW=3394MiB/s
> 
> With the second patch:
> READ:  IOPS=3642K BW=13900MiB/s
> WRITE: IOPS=1561K BW= 6097MiB/s
> 
> That is impressive.  Great job.

Good, thanks for testing. Can you please share the perf results as well? I'd
like to check if there are other obvious bottlenecks.

By the way, I think raid1 can definitely benefit from the same optimizations;
I'll look into raid1.

Thanks,
Kuai

> 
> I shall test it more.
> 
> Thanks,
> Ali
> 
> .
> 


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Unacceptably Poor RAID1 Performance with Many CPU Cores
  2023-06-16  8:52               ` Ali Gholami Rudi
  2023-06-16  9:17                 ` Yu Kuai
@ 2023-06-16 11:51                 ` Ali Gholami Rudi
  2023-06-16 12:27                   ` Yu Kuai
  1 sibling, 1 reply; 30+ messages in thread
From: Ali Gholami Rudi @ 2023-06-16 11:51 UTC (permalink / raw)
  To: Yu Kuai; +Cc: Xiao Ni, linux-raid, song, yukuai (C)


Ali Gholami Rudi <aligrudi@gmail.com> wrote:
> Yu Kuai <yukuai1@huaweicloud.com> wrote:
> > > diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
> > > index 4fcfcb350d2b..52f0c24128ff 100644
> > > --- a/drivers/md/raid10.c
> > > +++ b/drivers/md/raid10.c
> > > @@ -905,7 +905,7 @@ static void flush_pending_writes(struct r10conf *conf)
> > >   		/* flush any pending bitmap writes to disk
> > >   		 * before proceeding w/ I/O */
> > >   		md_bitmap_unplug(conf->mddev->bitmap);
> > > -		wake_up(&conf->wait_barrier);
> > > +		wake_up_barrier(conf);
> > >   
> > >   		while (bio) { /* submit pending writes */
> > >   			struct bio *next = bio->bi_next;
> > 
> > Thanks for the testing, sorry that I missed one place... Can you try to
> > change wake_up() to wake_up_barrier() from raid10_unplug() and test
> > again?
> 
> OK.  I replaced the second occurrence of wake_up().
> 
> > > Without the patch:
> > > READ:  IOPS=2033k BW=8329MB/s
> > > WRITE: IOPS= 871k BW=3569MB/s
> > > 
> > > With the patch:
> > > READ:  IOPS=2027K BW=7920MiB/s
> > > WRITE: IOPS= 869K BW=3394MiB/s
> 
> With the second patch:
> READ:  IOPS=3642K BW=13900MiB/s
> WRITE: IOPS=1561K BW= 6097MiB/s
> 
> That is impressive.  Great job.

Perf's output:

+   93.79%     0.09%  fio      [kernel.kallsyms]       [k] entry_SYSCALL_64_after_hwframe
+   92.89%     0.05%  fio      [kernel.kallsyms]       [k] do_syscall_64
+   86.59%     0.07%  fio      [kernel.kallsyms]       [k] __x64_sys_io_submit
-   85.61%     0.10%  fio      [kernel.kallsyms]       [k] io_submit_one
   - 85.51% io_submit_one
      - 47.98% aio_read
         - 46.18% blkdev_read_iter
            - 44.90% __blkdev_direct_IO_async
               - 41.68% submit_bio_noacct_nocheck
                  - 41.50% __submit_bio
                     - 18.76% md_handle_request
                        - 18.71% raid10_make_request
                           - 18.54% raid10_read_request
                                16.54% read_balance
                                1.80% wait_barrier_nolock
                     - 14.18% raid10_end_read_request
                        - 8.16% raid_end_bio_io
                             7.44% allow_barrier
                     - 8.40% brd_submit_bio
                          2.49% __radix_tree_lookup
               - 1.39% bio_iov_iter_get_pages
                  - 1.04% iov_iter_get_pages
                     - __iov_iter_get_pages_alloc
                        - 0.93% get_user_pages_fast
                             0.79% internal_get_user_pages_fast
               - 1.17% bio_alloc_bioset
                    0.59% mempool_alloc
              0.93% filemap_write_and_wait_range
           0.65% security_file_permission
      - 35.21% aio_write
         - 34.53% blkdev_write_iter
            - 18.25% __generic_file_write_iter
               - 17.84% generic_file_direct_write
                  - 17.35% __blkdev_direct_IO_async
                     - 16.34% submit_bio_noacct_nocheck
                        - 16.31% __submit_bio
                           - 16.28% md_handle_request
                              - 16.26% raid10_make_request
                                   9.26% raid10_write_one_disk
                                   2.11% wait_blocked_dev
                                   0.58% wait_barrier_nolock
            - 16.02% blk_finish_plug
               - 16.01% flush_plug_callbacks
                  - 16.00% raid10_unplug
                     - 15.89% submit_bio_noacct_nocheck
                        - 15.84% __submit_bio
                           - 8.66% raid10_end_write_request
                              - 3.55% raid_end_bio_io
                                   3.54% allow_barrier
                                0.72% find_bio_disk.isra.0
                           - 7.04% brd_submit_bio
                                1.38% __radix_tree_lookup
        0.61% kmem_cache_alloc
+   84.41%     0.99%  fio      fio                     [.] thread_main
+   83.79%     0.00%  fio      [unknown]               [.] 0xffffffffffffffff
+   83.60%     0.00%  fio      fio                     [.] run_threads
+   83.32%     0.00%  fio      fio                     [.] do_io (inlined)
+   81.12%     0.43%  fio      libc-2.31.so            [.] syscall
+   76.23%     0.69%  fio      fio                     [.] td_io_queue
+   76.16%     4.66%  fio      [kernel.kallsyms]       [k] submit_bio_noacct_nocheck
+   75.63%     0.25%  fio      fio                     [.] fio_libaio_commit
+   75.57%     0.17%  fio      fio                     [.] td_io_commit
+   75.54%     0.00%  fio      fio                     [.] io_u_submit (inlined)
+   75.33%     0.17%  fio      libaio.so.1.0.1         [.] io_submit
+   73.66%     0.07%  fio      [kernel.kallsyms]       [k] __submit_bio
+   67.30%     5.07%  fio      [kernel.kallsyms]       [k] __blkdev_direct_IO_async
+   48.02%     0.03%  fio      [kernel.kallsyms]       [k] aio_read
+   46.22%     0.05%  fio      [kernel.kallsyms]       [k] blkdev_read_iter
+   35.71%     3.88%  fio      [kernel.kallsyms]       [k] raid10_make_request
+   35.23%     0.02%  fio      [kernel.kallsyms]       [k] aio_write
+   35.08%     0.06%  fio      [kernel.kallsyms]       [k] md_handle_request
+   34.55%     0.02%  fio      [kernel.kallsyms]       [k] blkdev_write_iter
+   20.16%     1.65%  fio      [kernel.kallsyms]       [k] raid10_read_request
+   18.27%     0.01%  fio      [kernel.kallsyms]       [k] __generic_file_write_iter
+   18.02%     3.63%  fio      [kernel.kallsyms]       [k] brd_submit_bio
+   17.86%     0.02%  fio      [kernel.kallsyms]       [k] generic_file_direct_write
+   17.08%    11.16%  fio      [kernel.kallsyms]       [k] read_balance
+   16.24%     0.33%  fio      [kernel.kallsyms]       [k] raid10_unplug
+   16.02%     0.01%  fio      [kernel.kallsyms]       [k] blk_finish_plug
+   16.02%     0.01%  fio      [kernel.kallsyms]       [k] flush_plug_callbacks
+   14.25%     0.26%  fio      [kernel.kallsyms]       [k] raid10_end_read_request
+   12.77%     1.98%  fio      [kernel.kallsyms]       [k] allow_barrier
+   11.74%     0.40%  fio      [kernel.kallsyms]       [k] raid_end_bio_io
+   10.21%     1.99%  fio      [kernel.kallsyms]       [k] raid10_write_one_disk
+    8.85%     1.52%  fio      [kernel.kallsyms]       [k] raid10_end_write_request
     8.06%     6.43%  fio      [kernel.kallsyms]       [k] wait_barrier_nolock
..

Thanks,
Ali


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Unacceptably Poor RAID1 Performance with Many CPU Cores
  2023-06-16 11:51                 ` Ali Gholami Rudi
@ 2023-06-16 12:27                   ` Yu Kuai
  2023-06-18 20:30                     ` Ali Gholami Rudi
  2023-06-21  8:05                     ` Xiao Ni
  0 siblings, 2 replies; 30+ messages in thread
From: Yu Kuai @ 2023-06-16 12:27 UTC (permalink / raw)
  To: Ali Gholami Rudi, Yu Kuai; +Cc: Xiao Ni, linux-raid, song, yukuai (C)

Hi,

On 2023/06/16 19:51, Ali Gholami Rudi wrote:
> 

Thanks for testing!

> Perf's output:
> 
> +   93.79%     0.09%  fio      [kernel.kallsyms]       [k] entry_SYSCALL_64_after_hwframe
> +   92.89%     0.05%  fio      [kernel.kallsyms]       [k] do_syscall_64
> +   86.59%     0.07%  fio      [kernel.kallsyms]       [k] __x64_sys_io_submit
> -   85.61%     0.10%  fio      [kernel.kallsyms]       [k] io_submit_one
>     - 85.51% io_submit_one
>        - 47.98% aio_read
>           - 46.18% blkdev_read_iter
>              - 44.90% __blkdev_direct_IO_async
>                 - 41.68% submit_bio_noacct_nocheck
>                    - 41.50% __submit_bio
>                       - 18.76% md_handle_request
>                          - 18.71% raid10_make_request
>                             - 18.54% raid10_read_request
>                                  16.54% read_balance

There is no spin_lock left in the fast path anymore. Now it looks like
the main cost is the raid10 io path itself (read_balance looks worth
investigating; 16.54% is too much), and for a real device with ms
io latency, I think latency in the io path may not matter.

Thanks,
Kuai
>                                  1.80% wait_barrier_nolock
>                       - 14.18% raid10_end_read_request
>                          - 8.16% raid_end_bio_io
>                               7.44% allow_barrier
>                       - 8.40% brd_submit_bio
>                            2.49% __radix_tree_lookup
>                 - 1.39% bio_iov_iter_get_pages
>                    - 1.04% iov_iter_get_pages
>                       - __iov_iter_get_pages_alloc
>                          - 0.93% get_user_pages_fast
>                               0.79% internal_get_user_pages_fast
>                 - 1.17% bio_alloc_bioset
>                      0.59% mempool_alloc
>                0.93% filemap_write_and_wait_range
>             0.65% security_file_permission
>        - 35.21% aio_write
>           - 34.53% blkdev_write_iter
>              - 18.25% __generic_file_write_iter
>                 - 17.84% generic_file_direct_write
>                    - 17.35% __blkdev_direct_IO_async
>                       - 16.34% submit_bio_noacct_nocheck
>                          - 16.31% __submit_bio
>                             - 16.28% md_handle_request
>                                - 16.26% raid10_make_request
>                                     9.26% raid10_write_one_disk
>                                     2.11% wait_blocked_dev
>                                     0.58% wait_barrier_nolock
>              - 16.02% blk_finish_plug
>                 - 16.01% flush_plug_callbacks
>                    - 16.00% raid10_unplug
>                       - 15.89% submit_bio_noacct_nocheck
>                          - 15.84% __submit_bio
>                             - 8.66% raid10_end_write_request
>                                - 3.55% raid_end_bio_io
>                                     3.54% allow_barrier
>                                  0.72% find_bio_disk.isra.0
>                             - 7.04% brd_submit_bio
>                                  1.38% __radix_tree_lookup
>          0.61% kmem_cache_alloc
> +   84.41%     0.99%  fio      fio                     [.] thread_main
> +   83.79%     0.00%  fio      [unknown]               [.] 0xffffffffffffffff
> +   83.60%     0.00%  fio      fio                     [.] run_threads
> +   83.32%     0.00%  fio      fio                     [.] do_io (inlined)
> +   81.12%     0.43%  fio      libc-2.31.so            [.] syscall
> +   76.23%     0.69%  fio      fio                     [.] td_io_queue
> +   76.16%     4.66%  fio      [kernel.kallsyms]       [k] submit_bio_noacct_nocheck
> +   75.63%     0.25%  fio      fio                     [.] fio_libaio_commit
> +   75.57%     0.17%  fio      fio                     [.] td_io_commit
> +   75.54%     0.00%  fio      fio                     [.] io_u_submit (inlined)
> +   75.33%     0.17%  fio      libaio.so.1.0.1         [.] io_submit
> +   73.66%     0.07%  fio      [kernel.kallsyms]       [k] __submit_bio
> +   67.30%     5.07%  fio      [kernel.kallsyms]       [k] __blkdev_direct_IO_async
> +   48.02%     0.03%  fio      [kernel.kallsyms]       [k] aio_read
> +   46.22%     0.05%  fio      [kernel.kallsyms]       [k] blkdev_read_iter
> +   35.71%     3.88%  fio      [kernel.kallsyms]       [k] raid10_make_request
> +   35.23%     0.02%  fio      [kernel.kallsyms]       [k] aio_write
> +   35.08%     0.06%  fio      [kernel.kallsyms]       [k] md_handle_request
> +   34.55%     0.02%  fio      [kernel.kallsyms]       [k] blkdev_write_iter
> +   20.16%     1.65%  fio      [kernel.kallsyms]       [k] raid10_read_request
> +   18.27%     0.01%  fio      [kernel.kallsyms]       [k] __generic_file_write_iter
> +   18.02%     3.63%  fio      [kernel.kallsyms]       [k] brd_submit_bio
> +   17.86%     0.02%  fio      [kernel.kallsyms]       [k] generic_file_direct_write
> +   17.08%    11.16%  fio      [kernel.kallsyms]       [k] read_balance
> +   16.24%     0.33%  fio      [kernel.kallsyms]       [k] raid10_unplug
> +   16.02%     0.01%  fio      [kernel.kallsyms]       [k] blk_finish_plug
> +   16.02%     0.01%  fio      [kernel.kallsyms]       [k] flush_plug_callbacks
> +   14.25%     0.26%  fio      [kernel.kallsyms]       [k] raid10_end_read_request
> +   12.77%     1.98%  fio      [kernel.kallsyms]       [k] allow_barrier
> +   11.74%     0.40%  fio      [kernel.kallsyms]       [k] raid_end_bio_io
> +   10.21%     1.99%  fio      [kernel.kallsyms]       [k] raid10_write_one_disk
> +    8.85%     1.52%  fio      [kernel.kallsyms]       [k] raid10_end_write_request
>       8.06%     6.43%  fio      [kernel.kallsyms]       [k] wait_barrier_nolock
> ..
> 
> Thanks,
> Ali
> 
> .
> 


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Unacceptably Poor RAID1 Performance with Many CPU Cores
  2023-06-16 12:27                   ` Yu Kuai
@ 2023-06-18 20:30                     ` Ali Gholami Rudi
  2023-06-19  1:22                       ` Yu Kuai
  2023-06-19  5:19                       ` Ali Gholami Rudi
  2023-06-21  8:05                     ` Xiao Ni
  1 sibling, 2 replies; 30+ messages in thread
From: Ali Gholami Rudi @ 2023-06-18 20:30 UTC (permalink / raw)
  To: Yu Kuai; +Cc: Xiao Ni, linux-raid, song, yukuai (C)

Hi,

I tested raid10 with NVMe disks on Debian 12 (Linux 6.1.0), which
includes your first patch only.

The speed was very bad:

READ:  IOPS=360K BW=1412MiB/s
WRITE: IOPS=154K BW= 606MiB/s

Perf's output:

+   98.90%     0.00%  fio      [unknown]               [.] 0xffffffffffffffff
+   98.71%     0.00%  fio      fio                     [.] 0x0000563ae0f62117
+   97.69%     0.02%  fio      [kernel.kallsyms]       [k] entry_SYSCALL_64_after_hwframe
+   97.66%     0.02%  fio      [kernel.kallsyms]       [k] do_syscall_64
+   97.29%     0.00%  fio      fio                     [.] 0x0000563ae0f5fceb
+   97.29%     0.05%  fio      fio                     [.] td_io_queue
+   97.20%     0.01%  fio      fio                     [.] td_io_commit
+   97.20%     0.02%  fio      libc.so.6               [.] syscall
+   96.94%     0.05%  fio      libaio.so.1.0.2         [.] io_submit
+   96.94%     0.00%  fio      fio                     [.] 0x0000563ae0f84e5e
+   96.50%     0.02%  fio      [kernel.kallsyms]       [k] __x64_sys_io_submit
-   96.44%     0.03%  fio      [kernel.kallsyms]       [k] io_submit_one
   - 96.41% io_submit_one
      - 65.16% aio_read
         - 65.07% xfs_file_read_iter
            - 65.06% xfs_file_dio_read
               - 60.21% iomap_dio_rw
                  - 60.21% __iomap_dio_rw
                     - 49.84% iomap_dio_bio_iter
                        - 49.39% submit_bio_noacct_nocheck
                           - 49.08% __submit_bio
                              - 48.80% md_handle_request
                                 - 48.40% raid10_make_request
                                    - 48.14% raid10_read_request
                                       - 47.63% regular_request_wait
                                          - 47.62% wait_barrier
                                             - 44.17% _raw_spin_lock_irq
                                                  44.14% native_queued_spin_lock_slowpath
                                             - 2.39% schedule
                                                - 2.38% __schedule
                                                   + 1.99% pick_next_task_fair
                     - 9.78% iomap_iter
                        - 9.77% xfs_read_iomap_begin
                           - 9.30% xfs_ilock_for_iomap
                              - 9.29% down_read
                                 - 9.18% rwsem_down_read_slowpath
                                    - 4.67% schedule_preempt_disabled
                                       - 4.67% schedule
                                          - 4.67% __schedule
                                             - 4.08% pick_next_task_fair
                                                - 4.08% newidle_balance
                                                   - 3.94% load_balance
                                                      - 3.60% find_busiest_group
                                                           3.59% update_sd_lb_stats.constprop.0
                                    - 4.12% _raw_spin_lock_irq
                                         4.11% native_queued_spin_lock_slowpath
               + 4.56% touch_atime
      - 31.12% aio_write
         - 31.06% xfs_file_write_iter
            - 31.00% xfs_file_dio_write_aligned
               - 27.41% iomap_dio_rw
                  - 27.40% __iomap_dio_rw
                     - 23.29% iomap_dio_bio_iter
                        - 23.14% submit_bio_noacct_nocheck
                           - 23.11% __submit_bio
                              - 23.02% md_handle_request
                                 - 22.85% raid10_make_request
                                    - 20.45% regular_request_wait
                                       - 20.44% wait_barrier
                                          - 18.97% _raw_spin_lock_irq
                                               18.96% native_queued_spin_lock_slowpath
                                          - 1.02% schedule
                                             - 1.02% __schedule
                                                - 0.85% pick_next_task_fair
                                                   + 0.84% newidle_balance
                                    + 1.85% md_bitmap_startwrite
                     - 3.20% iomap_iter
                        - 3.19% xfs_direct_write_iomap_begin
                           - 3.00% xfs_ilock_for_iomap
                              - 2.99% down_read
                                 - 2.95% rwsem_down_read_slowpath
                                    + 1.70% schedule_preempt_disabled
                                    + 1.13% _raw_spin_lock_irq
                     + 0.81% blk_finish_plug
               + 3.47% xfs_file_write_checks
+   87.62%     0.01%  fio      [kernel.kallsyms]       [k] iomap_dio_rw
+   87.61%     0.14%  fio      [kernel.kallsyms]       [k] __iomap_dio_rw
+   74.85%    74.85%  fio      [kernel.kallsyms]       [k] native_queued_spin_lock_slowpath
+   73.13%     0.10%  fio      [kernel.kallsyms]       [k] iomap_dio_bio_iter
+   72.99%     0.11%  fio      [kernel.kallsyms]       [k] _raw_spin_lock_irq
+   72.76%     0.02%  fio      [kernel.kallsyms]       [k] submit_bio_noacct_nocheck
+   72.20%     0.01%  fio      [kernel.kallsyms]       [k] __submit_bio
+   71.82%     0.43%  fio      [kernel.kallsyms]       [k] md_handle_request
+   71.25%     0.15%  fio      [kernel.kallsyms]       [k] raid10_make_request
+   68.08%     0.02%  fio      [kernel.kallsyms]       [k] regular_request_wait
+   68.06%     0.57%  fio      [kernel.kallsyms]       [k] wait_barrier
+   65.16%     0.01%  fio      [kernel.kallsyms]       [k] aio_read
+   65.07%     0.01%  fio      [kernel.kallsyms]       [k] xfs_file_read_iter
+   65.06%     0.01%  fio      [kernel.kallsyms]       [k] xfs_file_dio_read
+   48.14%     0.12%  fio      [kernel.kallsyms]       [k] raid10_read_request

Note that in the ramdisk tests, I gave whole ramdisks or raid devices
to fio.  Here I used files on the filesystem.

Thanks,
Ali

# cat /proc/mdstat:
Personalities : [raid10] [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4]
md127 : active raid10 ram1[1] ram0[0]
      1046528 blocks super 1.2 2 near-copies [2/2] [UU]

md3 : active raid10 nvme0n1p5[0] nvme1n1p5[1] nvme3n1p5[3] nvme4n1p5[4] nvme6n1p5[6] nvme5n1p5[5] nvme7n1p5[7] nvme2n1p5[2]
      14887084032 blocks super 1.2 512K chunks 2 near-copies [8/8] [UUUUUUUU]
      [=======>.............]  resync = 37.2% (5549960960/14887084032) finish=754.4min speed=206272K/sec
      bitmap: 70/111 pages [280KB], 65536KB chunk

md1 : active raid10 nvme1n1p3[1] nvme3n1p3[3] nvme0n1p3[0] nvme4n1p3[4] nvme5n1p3[5] nvme6n1p3[6] nvme7n1p3[7] nvme2n1p3[2]
      41906176 blocks super 1.2 512K chunks 2 near-copies [8/8] [UUUUUUUU]

md0 : active raid10 nvme1n1p2[1] nvme3n1p2[3] nvme0n1p2[0] nvme6n1p2[6] nvme4n1p2[4] nvme5n1p2[5] nvme7n1p2[7] nvme2n1p2[2]
      2084864 blocks super 1.2 512K chunks 2 near-copies [8/8] [UUUUUUUU]

md2 : active (auto-read-only) raid10 nvme4n1p4[4] nvme1n1p4[1] nvme3n1p4[3] nvme0n1p4[0] nvme6n1p4[6] nvme7n1p4[7] nvme5n1p4[5] nvme2n1p4[2]
      67067904 blocks super 1.2 512K chunks 2 near-copies [8/8] [UUUUUUUU]
        resync=PENDING

unused devices: <none>

# lspci | grep NVM
01:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller PM9A1/PM9A3/980PRO
02:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller PM9A1/PM9A3/980PRO
03:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller PM9A1/PM9A3/980PRO
04:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller PM9A1/PM9A3/980PRO
61:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller PM9A1/PM9A3/980PRO
62:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller PM9A1/PM9A3/980PRO
83:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller PM9A1/PM9A3/980PRO
84:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller PM9A1/PM9A3/980PRO
#


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Unacceptably Poor RAID1 Performance with Many CPU Cores
  2023-06-18 20:30                     ` Ali Gholami Rudi
@ 2023-06-19  1:22                       ` Yu Kuai
  2023-06-19  5:19                       ` Ali Gholami Rudi
  1 sibling, 0 replies; 30+ messages in thread
From: Yu Kuai @ 2023-06-19  1:22 UTC (permalink / raw)
  To: Ali Gholami Rudi, Yu Kuai; +Cc: Xiao Ni, linux-raid, song, yukuai (C)

Hi,

On 2023/06/19 4:30, Ali Gholami Rudi wrote:
> Hi,
> 
> I tested raid10 with NVMe disks on Debian 12 (Linux 6.1.0), which
> includes your first patch only.
> 
> The speed was very bad:
> 
> READ:  IOPS=360K BW=1412MiB/s
> WRITE: IOPS=154K BW= 606MiB/s
> 

Can you try testing with --bitmap=none and --assume-clean (or echo
frozen to sync_action)?  Let's see how performance looks once the spin_lock
from wait_barrier() and from md_bitmap_startwrite() is bypassed.
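
For example (just a sketch; the device names are taken from the mdstat you
posted, adjust to your setup):

  # pause the in-progress resync (set it back to "idle" when done)
  echo frozen > /sys/block/md3/md/sync_action

  # drop the write-intent bitmap from the existing array
  mdadm --grow --bitmap=none /dev/md3

  # or, for a throwaway test array, skip the bitmap and the initial
  # sync from the start
  mdadm --create /dev/md/test --level=10 --raid-devices=8 \
        --bitmap=none --assume-clean /dev/nvme[0-7]n1p5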

Thanks,
Kuai

> Perf's output:
> 
> +   98.90%     0.00%  fio      [unknown]               [.] 0xffffffffffffffff
> +   98.71%     0.00%  fio      fio                     [.] 0x0000563ae0f62117
> +   97.69%     0.02%  fio      [kernel.kallsyms]       [k] entry_SYSCALL_64_after_hwframe
> +   97.66%     0.02%  fio      [kernel.kallsyms]       [k] do_syscall_64
> +   97.29%     0.00%  fio      fio                     [.] 0x0000563ae0f5fceb
> +   97.29%     0.05%  fio      fio                     [.] td_io_queue
> +   97.20%     0.01%  fio      fio                     [.] td_io_commit
> +   97.20%     0.02%  fio      libc.so.6               [.] syscall
> +   96.94%     0.05%  fio      libaio.so.1.0.2         [.] io_submit
> +   96.94%     0.00%  fio      fio                     [.] 0x0000563ae0f84e5e
> +   96.50%     0.02%  fio      [kernel.kallsyms]       [k] __x64_sys_io_submit
> -   96.44%     0.03%  fio      [kernel.kallsyms]       [k] io_submit_one
>     - 96.41% io_submit_one
>        - 65.16% aio_read
>           - 65.07% xfs_file_read_iter
>              - 65.06% xfs_file_dio_read
>                 - 60.21% iomap_dio_rw
>                    - 60.21% __iomap_dio_rw
>                       - 49.84% iomap_dio_bio_iter
>                          - 49.39% submit_bio_noacct_nocheck
>                             - 49.08% __submit_bio
>                                - 48.80% md_handle_request
>                                   - 48.40% raid10_make_request
>                                      - 48.14% raid10_read_request
>                                         - 47.63% regular_request_wait
>                                            - 47.62% wait_barrier
>                                               - 44.17% _raw_spin_lock_irq
>                                                    44.14% native_queued_spin_lock_slowpath
>                                               - 2.39% schedule
>                                                  - 2.38% __schedule
>                                                     + 1.99% pick_next_task_fair
>                       - 9.78% iomap_iter
>                          - 9.77% xfs_read_iomap_begin
>                             - 9.30% xfs_ilock_for_iomap
>                                - 9.29% down_read
>                                   - 9.18% rwsem_down_read_slowpath
>                                      - 4.67% schedule_preempt_disabled
>                                         - 4.67% schedule
>                                            - 4.67% __schedule
>                                               - 4.08% pick_next_task_fair
>                                                  - 4.08% newidle_balance
>                                                     - 3.94% load_balance
>                                                        - 3.60% find_busiest_group
>                                                             3.59% update_sd_lb_stats.constprop.0
>                                      - 4.12% _raw_spin_lock_irq
>                                           4.11% native_queued_spin_lock_slowpath
>                 + 4.56% touch_atime
>        - 31.12% aio_write
>           - 31.06% xfs_file_write_iter
>              - 31.00% xfs_file_dio_write_aligned
>                 - 27.41% iomap_dio_rw
>                    - 27.40% __iomap_dio_rw
>                       - 23.29% iomap_dio_bio_iter
>                          - 23.14% submit_bio_noacct_nocheck
>                             - 23.11% __submit_bio
>                                - 23.02% md_handle_request
>                                   - 22.85% raid10_make_request
>                                      - 20.45% regular_request_wait
>                                         - 20.44% wait_barrier
>                                            - 18.97% _raw_spin_lock_irq
>                                                 18.96% native_queued_spin_lock_slowpath
>                                            - 1.02% schedule
>                                               - 1.02% __schedule
>                                                  - 0.85% pick_next_task_fair
>                                                     + 0.84% newidle_balance
>                                      + 1.85% md_bitmap_startwrite
>                       - 3.20% iomap_iter
>                          - 3.19% xfs_direct_write_iomap_begin
>                             - 3.00% xfs_ilock_for_iomap
>                                - 2.99% down_read
>                                   - 2.95% rwsem_down_read_slowpath
>                                      + 1.70% schedule_preempt_disabled
>                                      + 1.13% _raw_spin_lock_irq
>                       + 0.81% blk_finish_plug
>                 + 3.47% xfs_file_write_checks
> +   87.62%     0.01%  fio      [kernel.kallsyms]       [k] iomap_dio_rw
> +   87.61%     0.14%  fio      [kernel.kallsyms]       [k] __iomap_dio_rw
> +   74.85%    74.85%  fio      [kernel.kallsyms]       [k] native_queued_spin_lock_slowpath
> +   73.13%     0.10%  fio      [kernel.kallsyms]       [k] iomap_dio_bio_iter
> +   72.99%     0.11%  fio      [kernel.kallsyms]       [k] _raw_spin_lock_irq
> +   72.76%     0.02%  fio      [kernel.kallsyms]       [k] submit_bio_noacct_nocheck
> +   72.20%     0.01%  fio      [kernel.kallsyms]       [k] __submit_bio
> +   71.82%     0.43%  fio      [kernel.kallsyms]       [k] md_handle_request
> +   71.25%     0.15%  fio      [kernel.kallsyms]       [k] raid10_make_request
> +   68.08%     0.02%  fio      [kernel.kallsyms]       [k] regular_request_wait
> +   68.06%     0.57%  fio      [kernel.kallsyms]       [k] wait_barrier
> +   65.16%     0.01%  fio      [kernel.kallsyms]       [k] aio_read
> +   65.07%     0.01%  fio      [kernel.kallsyms]       [k] xfs_file_read_iter
> +   65.06%     0.01%  fio      [kernel.kallsyms]       [k] xfs_file_dio_read
> +   48.14%     0.12%  fio      [kernel.kallsyms]       [k] raid10_read_request
> 
> Note that in the ramdisk tests, I gate whole ramdisks or raid devices
> to fio.  Here I used files on the filesystem.
> 
> Thanks,
> Ali
> 
> # cat /proc/mdstat:
> Personalities : [raid10] [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4]
> md127 : active raid10 ram1[1] ram0[0]
>        1046528 blocks super 1.2 2 near-copies [2/2] [UU]
> 
> md3 : active raid10 nvme0n1p5[0] nvme1n1p5[1] nvme3n1p5[3] nvme4n1p5[4] nvme6n1p5[6] nvme5n1p5[5] nvme7n1p5[7] nvme2n1p5[2]
>        14887084032 blocks super 1.2 512K chunks 2 near-copies [8/8] [UUUUUUUU]
>        [=======>.............]  resync = 37.2% (5549960960/14887084032) finish=754.4min speed=206272K/sec
>        bitmap: 70/111 pages [280KB], 65536KB chunk
> 
> md1 : active raid10 nvme1n1p3[1] nvme3n1p3[3] nvme0n1p3[0] nvme4n1p3[4] nvme5n1p3[5] nvme6n1p3[6] nvme7n1p3[7] nvme2n1p3[2]
>        41906176 blocks super 1.2 512K chunks 2 near-copies [8/8] [UUUUUUUU]
> 
> md0 : active raid10 nvme1n1p2[1] nvme3n1p2[3] nvme0n1p2[0] nvme6n1p2[6] nvme4n1p2[4] nvme5n1p2[5] nvme7n1p2[7] nvme2n1p2[2]
>        2084864 blocks super 1.2 512K chunks 2 near-copies [8/8] [UUUUUUUU]
> 
> md2 : active (auto-read-only) raid10 nvme4n1p4[4] nvme1n1p4[1] nvme3n1p4[3] nvme0n1p4[0] nvme6n1p4[6] nvme7n1p4[7] nvme5n1p4[5] nvme2n1p4[2]
>        67067904 blocks super 1.2 512K chunks 2 near-copies [8/8] [UUUUUUUU]
>          resync=PENDING
> 
> unused devices: <none>
> 
> # lspci | grep NVM
> 01:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller PM9A1/PM9A3/980PRO
> 02:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller PM9A1/PM9A3/980PRO
> 03:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller PM9A1/PM9A3/980PRO
> 04:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller PM9A1/PM9A3/980PRO
> 61:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller PM9A1/PM9A3/980PRO
> 62:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller PM9A1/PM9A3/980PRO
> 83:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller PM9A1/PM9A3/980PRO
> 84:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller PM9A1/PM9A3/980PRO
> #
> 
> .
> 


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Unacceptably Poor RAID1 Performance with Many CPU Cores
  2023-06-18 20:30                     ` Ali Gholami Rudi
  2023-06-19  1:22                       ` Yu Kuai
@ 2023-06-19  5:19                       ` Ali Gholami Rudi
  2023-06-19  6:53                         ` Yu Kuai
  1 sibling, 1 reply; 30+ messages in thread
From: Ali Gholami Rudi @ 2023-06-19  5:19 UTC (permalink / raw)
  To: Yu Kuai; +Cc: Xiao Ni, linux-raid, song, yukuai (C)

Hi,

Yu Kuai <yukuai1@huaweicloud.com> wrote:
> Can you try to test with --bitmap=none, and --assume-clean(or echo
> frozen to sync_action), let's see if spin_lock from wait_barrier() and
> from md_bitmap_startwrite is bypassed, how performance will be.

I did not notice that it was syncing disks in the background.
It is much better now:

READ:  IOPS=1748K BW=6830MiB/s
WRITE: IOPS= 749K BW=2928MiB/s

Perf's output:

+   98.04%     0.00%  fio      [unknown]               [k] 0xffffffffffffffff
+   97.86%     0.00%  fio      fio                     [.] 0x000055ed33e52117
+   94.64%     0.03%  fio      libc.so.6               [.] syscall
+   94.54%     0.09%  fio      [kernel.kallsyms]       [k] entry_SYSCALL_64_after_hwframe
+   94.44%     0.03%  fio      [kernel.kallsyms]       [k] do_syscall_64
+   94.31%     0.00%  fio      fio                     [.] 0x000055ed33e4fceb
+   94.31%     0.07%  fio      fio                     [.] td_io_queue
+   94.15%     0.03%  fio      fio                     [.] td_io_commit
+   93.77%     0.05%  fio      libaio.so.1.0.2         [.] io_submit
+   93.77%     0.00%  fio      fio                     [.] 0x000055ed33e74e5e
+   92.03%     0.04%  fio      [kernel.kallsyms]       [k] __x64_sys_io_submit
-   91.86%     0.05%  fio      [kernel.kallsyms]       [k] io_submit_one
   - 91.81% io_submit_one
      - 52.43% aio_read
         - 51.94% ext4_file_read_iter
            - 36.03% iomap_dio_rw
               - 36.02% __iomap_dio_rw
                  - 21.56% iomap_dio_bio_iter
                     - 17.72% submit_bio_noacct_nocheck
                        - 17.09% __submit_bio
                           - 14.81% md_handle_request
                              - 8.34% raid10_make_request
                                 - 6.96% raid10_read_request
                                    - 2.76% regular_request_wait
                                         2.58% wait_barrier
                                      0.76% read_balance
                              - 1.07% asm_common_interrupt
                                 - 1.07% common_interrupt
                                    - 1.06% __common_interrupt
                                       + 1.06% handle_edge_irq
                             2.22% md_submit_bio
                          0.51% blk_mq_submit_bio
                  - 10.68% iomap_iter
                     - 10.64% ext4_iomap_begin
                        - 4.07% ext4_map_blocks
                           - 3.80% ext4_es_lookup_extent
                                1.77% _raw_read_lock
                                1.68% _raw_read_unlock
                          3.51% ext4_set_iomap
                        + 0.54% asm_common_interrupt
                  - 1.21% blk_finish_plug
                     - 1.20% __blk_flush_plug
                        - 1.19% blk_mq_flush_plug_list
                           - 1.15% nvme_queue_rqs
                              + 1.01% nvme_prep_rq.part.0
            - 6.04% down_read
               - 1.73% asm_common_interrupt
                  - 1.73% common_interrupt
                     - 1.72% __common_interrupt
                        - 1.71% handle_edge_irq
                           - 1.58% handle_irq_event
                              - 1.57% __handle_irq_event_percpu
                                 - nvme_irq
                                    - 1.46% blk_mq_end_request_batch
                                         0.80% raid10_end_write_request
                                         0.55% raid10_end_read_request
            - 5.43% up_read
               + 0.99% asm_common_interrupt
            - 3.99% touch_atime
               - 3.87% atime_needs_update
                  + 0.68% asm_common_interrupt
      - 39.00% aio_write
         - 38.78% ext4_file_write_iter
            - 28.16% iomap_dio_rw
               - 28.15% __iomap_dio_rw
                  - 14.49% iomap_dio_bio_iter
                     - 12.65% submit_bio_noacct_nocheck
                        - 12.61% __submit_bio
                           - 11.54% md_handle_request
                              - 7.91% raid10_make_request
                                   3.24% raid10_write_one_disk
                                 - 1.21% regular_request_wait
                                      1.13% wait_barrier
                                   1.14% wait_blocked_dev
                              + 0.57% asm_common_interrupt
                             1.04% md_submit_bio
                  - 8.05% blk_finish_plug
                     - 8.04% __blk_flush_plug
                        - 6.42% raid10_unplug
                           - 4.19% __wake_up_common_lock
                              - 3.60% _raw_spin_lock_irqsave
                                   3.40% native_queued_spin_lock_slowpath
                                0.59% _raw_spin_unlock_irqrestore
                           - 1.00% submit_bio_noacct_nocheck
                              + 0.93% blk_mq_submit_bio
                        - 1.59% blk_mq_flush_plug_list
                           - 1.02% blk_mq_sched_insert_requests
                              + 0.98% blk_mq_try_issue_list_directly
                  - 4.53% iomap_iter
                     - 4.51% ext4_iomap_overwrite_begin
                        - 4.51% ext4_iomap_begin
                           - 1.69% ext4_map_blocks
                              - 1.58% ext4_es_lookup_extent
                                   0.72% _raw_read_lock
                                   0.71% _raw_read_unlock
                             1.54% ext4_set_iomap
            - 2.83% up_read
               + 0.92% asm_common_interrupt
            - 2.26% down_read
               + 0.57% asm_common_interrupt
            - 1.81% ext4_map_blocks
               - 1.66% ext4_es_lookup_extent
                    0.81% _raw_read_lock
                    0.70% _raw_read_unlock
            - 1.67% file_modified_flags
                 1.43% inode_needs_update_time.part.0
+   64.19%     0.02%  fio      [kernel.kallsyms]       [k] iomap_dio_rw
+   64.17%     2.41%  fio      [kernel.kallsyms]       [k] __iomap_dio_rw
+   52.43%     0.03%  fio      [kernel.kallsyms]       [k] aio_read
+   51.94%     0.04%  fio      [kernel.kallsyms]       [k] ext4_file_read_iter

Thanks,
Ali


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Unacceptably Poor RAID1 Performance with Many CPU Cores
  2023-06-19  5:19                       ` Ali Gholami Rudi
@ 2023-06-19  6:53                         ` Yu Kuai
  0 siblings, 0 replies; 30+ messages in thread
From: Yu Kuai @ 2023-06-19  6:53 UTC (permalink / raw)
  To: Ali Gholami Rudi, Yu Kuai; +Cc: Xiao Ni, linux-raid, song, yukuai (C)

Hi,

On 2023/06/19 13:19, Ali Gholami Rudi wrote:
> Hi,
> 
> Yu Kuai <yukuai1@huaweicloud.com> wrote:
>> Can you try to test with --bitmap=none, and --assume-clean(or echo
>> frozen to sync_action), let's see if spin_lock from wait_barrier() and
>> from md_bitmap_startwrite is bypassed, how performance will be.
> 
> I did not notice that it was syncing disks in the background.
> It is much better now:
> 
> READ:  IOPS=1748K BW=6830MiB/s
> WRITE: IOPS= 749K BW=2928MiB/s
> 

This looks good. The cost of the spin_lock() from raid10_unplug() is about
4%; this can also be avoided, although it probably won't improve
performance noticeably.

Thanks,
Kuai
> Perf's output:
> 
> +   98.04%     0.00%  fio      [unknown]               [k] 0xffffffffffffffff
> +   97.86%     0.00%  fio      fio                     [.] 0x000055ed33e52117
> +   94.64%     0.03%  fio      libc.so.6               [.] syscall
> +   94.54%     0.09%  fio      [kernel.kallsyms]       [k] entry_SYSCALL_64_after_hwframe
> +   94.44%     0.03%  fio      [kernel.kallsyms]       [k] do_syscall_64
> +   94.31%     0.00%  fio      fio                     [.] 0x000055ed33e4fceb
> +   94.31%     0.07%  fio      fio                     [.] td_io_queue
> +   94.15%     0.03%  fio      fio                     [.] td_io_commit
> +   93.77%     0.05%  fio      libaio.so.1.0.2         [.] io_submit
> +   93.77%     0.00%  fio      fio                     [.] 0x000055ed33e74e5e
> +   92.03%     0.04%  fio      [kernel.kallsyms]       [k] __x64_sys_io_submit
> -   91.86%     0.05%  fio      [kernel.kallsyms]       [k] io_submit_one
>     - 91.81% io_submit_one
>        - 52.43% aio_read
>           - 51.94% ext4_file_read_iter
>              - 36.03% iomap_dio_rw
>                 - 36.02% __iomap_dio_rw
>                    - 21.56% iomap_dio_bio_iter
>                       - 17.72% submit_bio_noacct_nocheck
>                          - 17.09% __submit_bio
>                             - 14.81% md_handle_request
>                                - 8.34% raid10_make_request
>                                   - 6.96% raid10_read_request
>                                      - 2.76% regular_request_wait
>                                           2.58% wait_barrier
>                                        0.76% read_balance
>                                - 1.07% asm_common_interrupt
>                                   - 1.07% common_interrupt
>                                      - 1.06% __common_interrupt
>                                         + 1.06% handle_edge_irq
>                               2.22% md_submit_bio
>                            0.51% blk_mq_submit_bio
>                    - 10.68% iomap_iter
>                       - 10.64% ext4_iomap_begin
>                          - 4.07% ext4_map_blocks
>                             - 3.80% ext4_es_lookup_extent
>                                  1.77% _raw_read_lock
>                                  1.68% _raw_read_unlock
>                            3.51% ext4_set_iomap
>                          + 0.54% asm_common_interrupt
>                    - 1.21% blk_finish_plug
>                       - 1.20% __blk_flush_plug
>                          - 1.19% blk_mq_flush_plug_list
>                             - 1.15% nvme_queue_rqs
>                                + 1.01% nvme_prep_rq.part.0
>              - 6.04% down_read
>                 - 1.73% asm_common_interrupt
>                    - 1.73% common_interrupt
>                       - 1.72% __common_interrupt
>                          - 1.71% handle_edge_irq
>                             - 1.58% handle_irq_event
>                                - 1.57% __handle_irq_event_percpu
>                                   - nvme_irq
>                                      - 1.46% blk_mq_end_request_batch
>                                           0.80% raid10_end_write_request
>                                           0.55% raid10_end_read_request
>              - 5.43% up_read
>                 + 0.99% asm_common_interrupt
>              - 3.99% touch_atime
>                 - 3.87% atime_needs_update
>                    + 0.68% asm_common_interrupt
>        - 39.00% aio_write
>           - 38.78% ext4_file_write_iter
>              - 28.16% iomap_dio_rw
>                 - 28.15% __iomap_dio_rw
>                    - 14.49% iomap_dio_bio_iter
>                       - 12.65% submit_bio_noacct_nocheck
>                          - 12.61% __submit_bio
>                             - 11.54% md_handle_request
>                                - 7.91% raid10_make_request
>                                     3.24% raid10_write_one_disk
>                                   - 1.21% regular_request_wait
>                                        1.13% wait_barrier
>                                     1.14% wait_blocked_dev
>                                + 0.57% asm_common_interrupt
>                               1.04% md_submit_bio
>                    - 8.05% blk_finish_plug
>                       - 8.04% __blk_flush_plug
>                          - 6.42% raid10_unplug
>                             - 4.19% __wake_up_common_lock
>                                - 3.60% _raw_spin_lock_irqsave
>                                     3.40% native_queued_spin_lock_slowpath
>                                  0.59% _raw_spin_unlock_irqrestore
>                             - 1.00% submit_bio_noacct_nocheck
>                                + 0.93% blk_mq_submit_bio
>                          - 1.59% blk_mq_flush_plug_list
>                             - 1.02% blk_mq_sched_insert_requests
>                                + 0.98% blk_mq_try_issue_list_directly
>                    - 4.53% iomap_iter
>                       - 4.51% ext4_iomap_overwrite_begin
>                          - 4.51% ext4_iomap_begin
>                             - 1.69% ext4_map_blocks
>                                - 1.58% ext4_es_lookup_extent
>                                     0.72% _raw_read_lock
>                                     0.71% _raw_read_unlock
>                               1.54% ext4_set_iomap
>              - 2.83% up_read
>                 + 0.92% asm_common_interrupt
>              - 2.26% down_read
>                 + 0.57% asm_common_interrupt
>              - 1.81% ext4_map_blocks
>                 - 1.66% ext4_es_lookup_extent
>                      0.81% _raw_read_lock
>                      0.70% _raw_read_unlock
>              - 1.67% file_modified_flags
>                   1.43% inode_needs_update_time.part.0
> +   64.19%     0.02%  fio      [kernel.kallsyms]       [k] iomap_dio_rw
> +   64.17%     2.41%  fio      [kernel.kallsyms]       [k] __iomap_dio_rw
> +   52.43%     0.03%  fio      [kernel.kallsyms]       [k] aio_read
> +   51.94%     0.04%  fio      [kernel.kallsyms]       [k] ext4_file_read_iter
> 
> Thanks,
> Ali
> 
> .
> 


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Unacceptably Poor RAID1 Performance with Many CPU Cores
  2023-06-16 12:27                   ` Yu Kuai
  2023-06-18 20:30                     ` Ali Gholami Rudi
@ 2023-06-21  8:05                     ` Xiao Ni
  2023-06-21  8:26                       ` Yu Kuai
  2023-06-21 19:34                       ` Wols Lists
  1 sibling, 2 replies; 30+ messages in thread
From: Xiao Ni @ 2023-06-21  8:05 UTC (permalink / raw)
  To: Yu Kuai; +Cc: Ali Gholami Rudi, linux-raid, song, yukuai (C)

On Fri, Jun 16, 2023 at 8:27 PM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>
> Hi,
>
> On 2023/06/16 19:51, Ali Gholami Rudi wrote:
> >
>
> Thanks for testing!
>
> > Perf's output:
> >
> > +   93.79%     0.09%  fio      [kernel.kallsyms]       [k] entry_SYSCALL_64_after_hwframe
> > +   92.89%     0.05%  fio      [kernel.kallsyms]       [k] do_syscall_64
> > +   86.59%     0.07%  fio      [kernel.kallsyms]       [k] __x64_sys_io_submit
> > -   85.61%     0.10%  fio      [kernel.kallsyms]       [k] io_submit_one
> >     - 85.51% io_submit_one
> >        - 47.98% aio_read
> >           - 46.18% blkdev_read_iter
> >              - 44.90% __blkdev_direct_IO_async
> >                 - 41.68% submit_bio_noacct_nocheck
> >                    - 41.50% __submit_bio
> >                       - 18.76% md_handle_request
> >                          - 18.71% raid10_make_request
> >                             - 18.54% raid10_read_request
> >                                  16.54% read_balance
>
> There is not any spin_lock in fast path anymore. Now, looks like
> main cost is raid10 io path now(read_balance looks worth
> investigation, 16.54% is too much), and for a real device with ms
> io latency, I think latency in io path may not matter.

Hi Kuai

Cool. And I noticed you mentioned 'fast path' in many places. What's
the meaning of 'fast path'? Does it mean the path through which I/Os are
submitted?

Regards
Xiao


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Unacceptably Poor RAID1 Performance with Many CPU Cores
  2023-06-21  8:05                     ` Xiao Ni
@ 2023-06-21  8:26                       ` Yu Kuai
  2023-06-21  8:55                         ` Xiao Ni
  2023-07-01 11:17                         ` Ali Gholami Rudi
  2023-06-21 19:34                       ` Wols Lists
  1 sibling, 2 replies; 30+ messages in thread
From: Yu Kuai @ 2023-06-21  8:26 UTC (permalink / raw)
  To: Xiao Ni, Yu Kuai; +Cc: Ali Gholami Rudi, linux-raid, song, yukuai (C)

Hi,

On 2023/06/21 16:05, Xiao Ni wrote:
> On Fri, Jun 16, 2023 at 8:27 PM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>>
>> Hi,
>>
>> On 2023/06/16 19:51, Ali Gholami Rudi wrote:
>>>
>>
>> Thanks for testing!
>>
>>> Perf's output:
>>>
>>> +   93.79%     0.09%  fio      [kernel.kallsyms]       [k] entry_SYSCALL_64_after_hwframe
>>> +   92.89%     0.05%  fio      [kernel.kallsyms]       [k] do_syscall_64
>>> +   86.59%     0.07%  fio      [kernel.kallsyms]       [k] __x64_sys_io_submit
>>> -   85.61%     0.10%  fio      [kernel.kallsyms]       [k] io_submit_one
>>>      - 85.51% io_submit_one
>>>         - 47.98% aio_read
>>>            - 46.18% blkdev_read_iter
>>>               - 44.90% __blkdev_direct_IO_async
>>>                  - 41.68% submit_bio_noacct_nocheck
>>>                     - 41.50% __submit_bio
>>>                        - 18.76% md_handle_request
>>>                           - 18.71% raid10_make_request
>>>                              - 18.54% raid10_read_request
>>>                                   16.54% read_balance
>>
>> There is not any spin_lock in fast path anymore. Now, looks like
>> main cost is raid10 io path now(read_balance looks worth
>> investigation, 16.54% is too much), and for a real device with ms
>> io latency, I think latency in io path may not matter.
> 
> Hi Kuai
> 
> Cool. And I noticed you mentioned 'fast path' in many places. What's
> the meaning of 'fast path'? Does it mean the path that i/os are
> submitting?

Yes, and fast path means the case where all resources are available and io can
be submitted to the device without blocking.

There should be no spin_lock or atomic ops in the fast path, otherwise io
performance will be affected.
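
As a rough illustration of why (hypothetical code, not what md actually
does): if the per-io bookkeeping is kept per-CPU, the hot path never
touches a shared cacheline, and only the rare slow path pays for a
serialized view:

struct pending_ctr {
	long __percpu *cnt;	/* from alloc_percpu(long) */
	spinlock_t lock;	/* taken on the slow path only */
};

/* fast path: no lock, no shared atomic, only this CPU's own slot */
static inline void pending_inc(struct pending_ctr *p)
{
	this_cpu_inc(*p->cnt);
}

/* slow path (e.g. freeze/drain): serialize and sum over all CPUs */
static long pending_sum(struct pending_ctr *p)
{
	long sum = 0;
	int cpu;

	spin_lock_irq(&p->lock);
	for_each_possible_cpu(cpu)
		sum += *per_cpu_ptr(p->cnt, cpu);
	spin_unlock_irq(&p->lock);
	return sum;
}

With this many cores, a shared spinlock (or even a shared atomic counter)
bounces one cacheline across all sockets on every io; that is essentially
what the native_queued_spin_lock_slowpath samples in the earlier profiles
were showing.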

Thanks,
Kuai
> 
> Regards
> Xiao
> 
> 
> .
> 


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Unacceptably Poor RAID1 Performance with Many CPU Cores
  2023-06-21  8:26                       ` Yu Kuai
@ 2023-06-21  8:55                         ` Xiao Ni
  2023-07-01 11:17                         ` Ali Gholami Rudi
  1 sibling, 0 replies; 30+ messages in thread
From: Xiao Ni @ 2023-06-21  8:55 UTC (permalink / raw)
  To: Yu Kuai; +Cc: Ali Gholami Rudi, linux-raid, song, yukuai (C)

On Wed, Jun 21, 2023 at 4:27 PM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>
> Hi,
>
> On 2023/06/21 16:05, Xiao Ni wrote:
> > On Fri, Jun 16, 2023 at 8:27 PM Yu Kuai <yukuai1@huaweicloud.com> wrote:
> >>
> >> Hi,
> >>
> >> On 2023/06/16 19:51, Ali Gholami Rudi wrote:
> >>>
> >>
> >> Thanks for testing!
> >>
> >>> Perf's output:
> >>>
> >>> +   93.79%     0.09%  fio      [kernel.kallsyms]       [k] entry_SYSCALL_64_after_hwframe
> >>> +   92.89%     0.05%  fio      [kernel.kallsyms]       [k] do_syscall_64
> >>> +   86.59%     0.07%  fio      [kernel.kallsyms]       [k] __x64_sys_io_submit
> >>> -   85.61%     0.10%  fio      [kernel.kallsyms]       [k] io_submit_one
> >>>      - 85.51% io_submit_one
> >>>         - 47.98% aio_read
> >>>            - 46.18% blkdev_read_iter
> >>>               - 44.90% __blkdev_direct_IO_async
> >>>                  - 41.68% submit_bio_noacct_nocheck
> >>>                     - 41.50% __submit_bio
> >>>                        - 18.76% md_handle_request
> >>>                           - 18.71% raid10_make_request
> >>>                              - 18.54% raid10_read_request
> >>>                                   16.54% read_balance
> >>
> >> There is not any spin_lock in fast path anymore. Now, looks like
> >> main cost is raid10 io path now(read_balance looks worth
> >> investigation, 16.54% is too much), and for a real device with ms
> >> io latency, I think latency in io path may not matter.
> >
> > Hi Kuai
> >
> > Cool. And I noticed you mentioned 'fast path' in many places. What's
> > the meaning of 'fast path'? Does it mean the path that i/os are
> > submitting?
>
> Yes, and fast path means the case all resources is available and io can
> be submitted to device without blocking.
>
> There should be no spin_lock or atomic ops in fast path, otherwise io
> performance will be affected.
>
> Thanks,
> Kuai

I see. Thanks for the explanation.

Regards
Xiao


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Unacceptably Poor RAID1 Performance with Many CPU Cores
  2023-06-21  8:05                     ` Xiao Ni
  2023-06-21  8:26                       ` Yu Kuai
@ 2023-06-21 19:34                       ` Wols Lists
  2023-06-23  0:52                         ` Xiao Ni
  1 sibling, 1 reply; 30+ messages in thread
From: Wols Lists @ 2023-06-21 19:34 UTC (permalink / raw)
  To: Xiao Ni; +Cc: linux-raid

On 21/06/2023 09:05, Xiao Ni wrote:
> Cool. And I noticed you mentioned 'fast path' in many places. What's
> the meaning of 'fast path'? Does it mean the path that i/os are
> submitting?

It's a pretty generic kernel term, used everywhere. It's intended to be 
the normal route for whatever is going on, but it must ALWAYS ALWAYS 
ALWAYS be optimised for speed.

If it hits a problem, it must back out and use the "slow path", which 
can wait, block, whatever.

So the idea is that all your operations normally complete straight away,
but if they can't they go into a different path that guarantees they
complete but doesn't block the normal operation of the system.
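
In code the shape is usually something like this (purely hypothetical
names, not taken from md or any real driver):

int submit_thing(struct thing_dev *dev, struct thing *t)
{
	/* fast path: the common case; no locks taken, no sleeping */
	if (likely(thing_try_submit_lockless(dev, t)))
		return 0;

	/*
	 * slow path: something got in the way (device frozen, resources
	 * exhausted, ...).  It may take locks and sleep, but it must not
	 * slow down callers that stay on the fast path.
	 */
	mutex_lock(&dev->slow_lock);
	wait_event(dev->wait, thing_can_submit(dev));
	thing_submit_locked(dev, t);
	mutex_unlock(&dev->slow_lock);
	return 0;
}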

Cheers,
Wol

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Unacceptably Poor RAID1 Performance with Many CPU Cores
  2023-06-21 19:34                       ` Wols Lists
@ 2023-06-23  0:52                         ` Xiao Ni
  0 siblings, 0 replies; 30+ messages in thread
From: Xiao Ni @ 2023-06-23  0:52 UTC (permalink / raw)
  To: Wols Lists; +Cc: linux-raid

On Thu, Jun 22, 2023 at 3:34 AM Wols Lists <antlists@youngman.org.uk> wrote:
>
> On 21/06/2023 09:05, Xiao Ni wrote:
> > Cool. And I noticed you mentioned 'fast path' in many places. What's
> > the meaning of 'fast path'? Does it mean the path that i/os are
> > submitting?
>
> It's a pretty generic kernel term, used everywhere. It's intended to be
> the normal route for whatever is going on, but it must ALWAYS ALWAYS
> ALWAYS be optimised for speed.
>
> If it hits a problem, it must back out and use the "slow path", which
> can wait, block, whatever.
>
> So the idea is that all your operations normally complete straight away,
> but if they can't they go into a different path that guarantees they
> complete, but don't block the normal operation of the system.
>
> Cheers,
> Wol
>

Hi Wol

Thanks for the explanation!

Regards
Xiao



* Re: Unacceptably Poor RAID1 Performance with Many CPU Cores
  2023-06-21  8:26                       ` Yu Kuai
  2023-06-21  8:55                         ` Xiao Ni
@ 2023-07-01 11:17                         ` Ali Gholami Rudi
  2023-07-03 12:39                           ` Yu Kuai
  1 sibling, 1 reply; 30+ messages in thread
From: Ali Gholami Rudi @ 2023-07-01 11:17 UTC (permalink / raw)
  To: Yu Kuai; +Cc: Xiao Ni, linux-raid, song, yukuai (C)


Hi,

I repeated the test on a large array (14TB instead of 40GB):

$ cat /proc/mdstat
Personalities : [raid10] [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] 
md3 : active raid10 nvme1n1p5[1] nvme3n1p5[3] nvme0n1p5[0] nvme2n1p5[2]
      14889424896 blocks super 1.2 512K chunks 2 near-copies [4/4] [UUUU]

md0 : active raid10 nvme2n1p2[2] nvme3n1p2[3] nvme0n1p2[0] nvme1n1p2[1]
      2091008 blocks super 1.2 512K chunks 2 near-copies [4/4] [UUUU]

..

I get these results.

On md0 (40GB):
READ:  IOPS=1563K BW=6109MiB/s
WRITE: IOPS= 670K BW=2745MiB/s

On md3 (14TB):
READ:  IOPS=1177K BW=4599MiB/s
WRITE: IOPS= 505K BW=1972MiB/s

On md3 but disabling mdadm bitmap (mdadm --grow --bitmap=none /dev/md3):
READ:  IOPS=1351K BW=5278MiB/s
WRITE: IOPS= 579K BW=2261MiB/s

The tests are performed on Debian-12 (kernel version 6.1).

Any room for improvement?

Thanks,
Ali

This is perf's output; there is still lock contention.

+   95.25%     0.00%  fio      [unknown]               [k] 0xffffffffffffffff
+   95.00%     0.00%  fio      fio                     [.] 0x000055e073fcd117
+   93.68%     0.13%  fio      [kernel.kallsyms]       [k] entry_SYSCALL_64_after_hwframe
+   93.54%     0.03%  fio      [kernel.kallsyms]       [k] do_syscall_64
+   92.38%     0.03%  fio      libc.so.6               [.] syscall
+   92.18%     0.00%  fio      fio                     [.] 0x000055e073fcaceb
+   92.18%     0.08%  fio      fio                     [.] td_io_queue
+   92.04%     0.02%  fio      fio                     [.] td_io_commit
+   91.76%     0.00%  fio      fio                     [.] 0x000055e073fefe5e
-   91.76%     0.05%  fio      libaio.so.1.0.2         [.] io_submit
   - 91.71% io_submit
      - 91.69% syscall
         - 91.58% entry_SYSCALL_64_after_hwframe
            - 91.55% do_syscall_64
               - 91.06% __x64_sys_io_submit
                  - 90.93% io_submit_one
                     - 48.85% aio_write
                        - 48.77% ext4_file_write_iter
                           - 39.86% iomap_dio_rw
                              - 39.85% __iomap_dio_rw
                                 - 22.55% blk_finish_plug
                                    - 22.55% __blk_flush_plug
                                       - 21.67% raid10_unplug
                                          - 16.54% submit_bio_noacct_nocheck
                                             - 16.44% blk_mq_submit_bio
                                                - 16.17% __rq_qos_throttle
                                                   - 16.01% wbt_wait
                                                      - 15.93% rq_qos_wait
                                                         - 14.52% prepare_to_wait_exclusive
                                                            - 11.50% _raw_spin_lock_irqsave
                                                                 11.49% native_queued_spin_lock_slowpath
                                                            - 3.01% _raw_spin_unlock_irqrestore
                                                               - 2.29% asm_common_interrupt
                                                                  - 2.29% common_interrupt
                                                                     - 2.28% __common_interrupt
                                                                        - 2.28% handle_edge_irq
                                                                           - 2.26% handle_irq_event
                                                                              - 2.26% __handle_irq_event_percpu
                                                                                 - nvme_irq
                                                                                    - 2.23% blk_mq_end_request_batch
                                                                                       - 1.41% __rq_qos_done
                                                                                          - wbt_done
                                                                                             - 1.38% __wake_up_common_lock
                                                                                                - 1.36% _raw_spin_lock_irqsave
                                                                                                     native_queued_spin_lock_slowpath
                                                                                         0.51% raid10_end_read_request
                                                               - 0.61% asm_sysvec_apic_timer_interrupt
                                                                  - sysvec_apic_timer_interrupt
                                                                     - 0.57% __irq_exit_rcu
                                                                        - 0.56% __softirqentry_text_start
                                                                           - 0.55% asm_common_interrupt
                                                                              - common_interrupt
                                                                                 - 0.55% __common_interrupt
                                                                                    - 0.55% handle_edge_irq
                                                                                       - 0.54% handle_irq_event
                                                                                          - 0.54% __handle_irq_event_percpu
                                                                                             - nvme_irq
                                                                                                  0.54% blk_mq_end_request_batch
                                                         - 0.87% io_schedule
                                                            - 0.85% schedule
                                                               - 0.84% __schedule
                                                                  - 0.62% pick_next_task_fair
                                                                     - 0.61% newidle_balance
                                                                        - 0.60% load_balance
                                                                             0.50% find_busiest_group
                                          - 3.98% __wake_up_common_lock
                                             - 3.21% _raw_spin_lock_irqsave
                                                  3.02% native_queued_spin_lock_slowpath
                                             - 0.77% _raw_spin_unlock_irqrestore
                                                - 0.64% asm_common_interrupt
                                                   - common_interrupt
                                                      - 0.63% __common_interrupt
                                                         - 0.63% handle_edge_irq
                                                            - 0.60% handle_irq_event
                                                               - 0.59% __handle_irq_event_percpu
                                                                  - nvme_irq
                                                                       0.58% blk_mq_end_request_batch
                                         0.84% blk_mq_flush_plug_list
                                 - 12.50% iomap_dio_bio_iter
                                    - 10.79% submit_bio_noacct_nocheck
                                       - 10.73% __submit_bio
                                          - 9.77% md_handle_request
                                             - 7.14% raid10_make_request
                                                - 2.98% raid10_write_one_disk
                                                   - 0.52% asm_common_interrupt
                                                      - common_interrupt
                                                         - 0.51% __common_interrupt
                                                              0.51% handle_edge_irq
                                                  1.16% wait_blocked_dev
                                                - 0.83% regular_request_wait
                                                     0.82% wait_barrier
                                            0.95% md_submit_bio
                                 - 3.54% iomap_iter
                                    - 3.52% ext4_iomap_overwrite_begin
                                       - 3.52% ext4_iomap_begin
                                            1.80% ext4_set_iomap
                           - 2.19% file_modified_flags
                                2.16% inode_needs_update_time.part.0
                             1.82% up_read
                             1.61% down_read
                           - 0.88% ext4_generic_write_checks
                                0.57% generic_write_checks
                     - 41.78% aio_read
                        - 41.64% ext4_file_read_iter
                           - 31.62% iomap_dio_rw
                              - 31.61% __iomap_dio_rw
                                 - 19.92% iomap_dio_bio_iter
                                    - 16.26% submit_bio_noacct_nocheck
                                       - 15.60% __submit_bio
                                          - 13.31% md_handle_request
                                             - 7.50% raid10_make_request
                                                - 6.14% raid10_read_request
                                                   - 1.94% regular_request_wait
                                                        1.92% wait_barrier
                                                     1.12% read_balance
                                                   - 0.53% asm_common_interrupt
                                                      - 0.53% common_interrupt
                                                         - 0.53% __common_interrupt
                                                            - 0.53% handle_edge_irq
                                                                 0.50% handle_irq_event
                                             - 1.14% asm_common_interrupt
                                                - 1.14% common_interrupt
                                                   - 1.13% __common_interrupt
                                                      - 1.13% handle_edge_irq
                                                         - 1.08% handle_irq_event
                                                            - 1.07% __handle_irq_event_percpu
                                                               - nvme_irq
                                                                    1.05% blk_mq_end_request_batch
                                            2.22% md_submit_bio
                                         0.52% blk_mq_submit_bio
                                    - 0.67% asm_common_interrupt
                                       - common_interrupt
                                          - 0.66% __common_interrupt
                                             - 0.66% handle_edge_irq
                                                - 0.62% handle_irq_event
                                                   - 0.62% __handle_irq_event_percpu
                                                      - nvme_irq
                                                           0.61% blk_mq_end_request_batch
                                 - 8.90% iomap_iter
                                    - 8.86% ext4_iomap_begin
                                       - 4.24% ext4_set_iomap
                                          - 0.86% asm_common_interrupt
                                             - 0.86% common_interrupt
                                                - 0.85% __common_interrupt
                                                   - 0.85% handle_edge_irq
                                                      - 0.81% handle_irq_event
                                                         - 0.80% __handle_irq_event_percpu
                                                            - nvme_irq
                                                                 0.79% blk_mq_end_request_batch
                                       - 0.88% ext4_map_blocks
                                            0.68% ext4_es_lookup_extent
                                       - 0.81% asm_common_interrupt
                                          - 0.81% common_interrupt
                                             - 0.81% __common_interrupt
                                                - 0.81% handle_edge_irq
                                                   - 0.77% handle_irq_event
                                                      - 0.76% __handle_irq_event_percpu
                                                         - nvme_irq
                                                              0.75% blk_mq_end_request_batch
                           - 4.53% down_read
                              - 0.87% asm_common_interrupt
                                 - 0.86% common_interrupt
                                    - 0.86% __common_interrupt
                                       - 0.86% handle_edge_irq
                                          - 0.81% handle_irq_event
                                             - 0.81% __handle_irq_event_percpu
                                                - nvme_irq
                                                     0.79% blk_mq_end_request_batch
                           - 4.42% up_read
                              - 0.82% asm_common_interrupt
                                 - 0.82% common_interrupt
                                    - 0.81% __common_interrupt
                                       - 0.81% handle_edge_irq
                                          - 0.77% handle_irq_event
                                             - 0.77% __handle_irq_event_percpu
                                                - nvme_irq
                                                     0.75% blk_mq_end_request_batch
                           - 0.86% ext4_dio_alignment
                                0.83% ext4_inode_journal_mode



* Re: Unacceptably Poor RAID1 Performance with Many CPU Cores
  2023-07-01 11:17                         ` Ali Gholami Rudi
@ 2023-07-03 12:39                           ` Yu Kuai
  2023-07-05  7:59                             ` Ali Gholami Rudi
  0 siblings, 1 reply; 30+ messages in thread
From: Yu Kuai @ 2023-07-03 12:39 UTC (permalink / raw)
  To: Ali Gholami Rudi, Yu Kuai; +Cc: Xiao Ni, linux-raid, song, yukuai (C)

Hi,

On 2023/07/01 19:17, Ali Gholami Rudi wrote:
> 
> Hi,
> 
> I repeated the test on a large array (14TB instead of 40GB):
> 
> $ cat /proc/mdstat
> Personalities : [raid10] [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4]
> md3 : active raid10 nvme1n1p5[1] nvme3n1p5[3] nvme0n1p5[0] nvme2n1p5[2]
>        14889424896 blocks super 1.2 512K chunks 2 near-copies [4/4] [UUUU]
> 
> md0 : active raid10 nvme2n1p2[2] nvme3n1p2[3] nvme0n1p2[0] nvme1n1p2[1]
>        2091008 blocks super 1.2 512K chunks 2 near-copies [4/4] [UUUU]
> 
> ..
> 
> I get these results.
> 
> On md0 (40GB):
> READ:  IOPS=1563K BW=6109MiB/s
> WRITE: IOPS= 670K BW=2745MiB/s
> 
> On md3 (14TB):
> READ:  IOPS=1177K BW=4599MiB/s
> WRITE: IOPS= 505K BW=1972MiB/s
> 
> On md3 but disabling mdadm bitmap (mdadm --grow --bitmap=none /dev/md3):
> READ:  IOPS=1351K BW=5278MiB/s
> WRITE: IOPS= 579K BW=2261MiB/s
> 

Currently, if the bitmap is enabled, a bitmap-level spinlock is grabbed
for each write, and sadly improving this will require a huge refactor.
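
For completeness, a rough sketch of the userspace knobs available in the
meantime (the 128M chunk size below is only an example value, and whether
a coarser bitmap chunk actually reduces the per-write locking is something
to measure rather than a given):

  # drop the write-intent bitmap entirely; a crash then requires a full resync
  mdadm --grow --bitmap=none /dev/md3

  # or, after removing the old bitmap as above, re-add one with a much
  # coarser chunk so the bitmap is updated less often
  mdadm --grow --bitmap=internal --bitmap-chunk=128M /dev/md3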

> The tests are performed on Debian-12 (kernel version 6.1).
> 
> Any room for improvement?
> 
> Thanks,
> Ali
> 
> This is perf's output; there is still lock contention.
> 
> +   95.25%     0.00%  fio      [unknown]               [k] 0xffffffffffffffff
> +   95.00%     0.00%  fio      fio                     [.] 0x000055e073fcd117
> +   93.68%     0.13%  fio      [kernel.kallsyms]       [k] entry_SYSCALL_64_after_hwframe
> +   93.54%     0.03%  fio      [kernel.kallsyms]       [k] do_syscall_64
> +   92.38%     0.03%  fio      libc.so.6               [.] syscall
> +   92.18%     0.00%  fio      fio                     [.] 0x000055e073fcaceb
> +   92.18%     0.08%  fio      fio                     [.] td_io_queue
> +   92.04%     0.02%  fio      fio                     [.] td_io_commit
> +   91.76%     0.00%  fio      fio                     [.] 0x000055e073fefe5e
> -   91.76%     0.05%  fio      libaio.so.1.0.2         [.] io_submit
>     - 91.71% io_submit
>        - 91.69% syscall
>           - 91.58% entry_SYSCALL_64_after_hwframe
>              - 91.55% do_syscall_64
>                 - 91.06% __x64_sys_io_submit
>                    - 90.93% io_submit_one
>                       - 48.85% aio_write
>                          - 48.77% ext4_file_write_iter
>                             - 39.86% iomap_dio_rw
>                                - 39.85% __iomap_dio_rw
>                                   - 22.55% blk_finish_plug
>                                      - 22.55% __blk_flush_plug
>                                         - 21.67% raid10_unplug
>                                            - 16.54% submit_bio_noacct_nocheck
>                                               - 16.44% blk_mq_submit_bio
>                                                  - 16.17% __rq_qos_throttle
>                                                     - 16.01% wbt_wait

You can disable wbt to prevent overhead here.

Thanks,
Kuai
>                                                        - 15.93% rq_qos_wait
>                                                           - 14.52% prepare_to_wait_exclusive
>                                                              - 11.50% _raw_spin_lock_irqsave
>                                                                   11.49% native_queued_spin_lock_slowpath
>                                                              - 3.01% _raw_spin_unlock_irqrestore
>                                                                 - 2.29% asm_common_interrupt
>                                                                    - 2.29% common_interrupt
>                                                                       - 2.28% __common_interrupt
>                                                                          - 2.28% handle_edge_irq
>                                                                             - 2.26% handle_irq_event
>                                                                                - 2.26% __handle_irq_event_percpu
>                                                                                   - nvme_irq
>                                                                                      - 2.23% blk_mq_end_request_batch
>                                                                                         - 1.41% __rq_qos_done
>                                                                                            - wbt_done
>                                                                                               - 1.38% __wake_up_common_lock
>                                                                                                  - 1.36% _raw_spin_lock_irqsave
>                                                                                                       native_queued_spin_lock_slowpath
>                                                                                           0.51% raid10_end_read_request
>                                                                 - 0.61% asm_sysvec_apic_timer_interrupt
>                                                                    - sysvec_apic_timer_interrupt
>                                                                       - 0.57% __irq_exit_rcu
>                                                                          - 0.56% __softirqentry_text_start
>                                                                             - 0.55% asm_common_interrupt
>                                                                                - common_interrupt
>                                                                                   - 0.55% __common_interrupt
>                                                                                      - 0.55% handle_edge_irq
>                                                                                         - 0.54% handle_irq_event
>                                                                                            - 0.54% __handle_irq_event_percpu
>                                                                                               - nvme_irq
>                                                                                                    0.54% blk_mq_end_request_batch
>                                                           - 0.87% io_schedule
>                                                              - 0.85% schedule
>                                                                 - 0.84% __schedule
>                                                                    - 0.62% pick_next_task_fair
>                                                                       - 0.61% newidle_balance
>                                                                          - 0.60% load_balance
>                                                                               0.50% find_busiest_group
>                                            - 3.98% __wake_up_common_lock
>                                               - 3.21% _raw_spin_lock_irqsave
>                                                    3.02% native_queued_spin_lock_slowpath
>                                               - 0.77% _raw_spin_unlock_irqrestore
>                                                  - 0.64% asm_common_interrupt
>                                                     - common_interrupt
>                                                        - 0.63% __common_interrupt
>                                                           - 0.63% handle_edge_irq
>                                                              - 0.60% handle_irq_event
>                                                                 - 0.59% __handle_irq_event_percpu
>                                                                    - nvme_irq
>                                                                         0.58% blk_mq_end_request_batch
>                                           0.84% blk_mq_flush_plug_list
>                                   - 12.50% iomap_dio_bio_iter
>                                      - 10.79% submit_bio_noacct_nocheck
>                                         - 10.73% __submit_bio
>                                            - 9.77% md_handle_request
>                                               - 7.14% raid10_make_request
>                                                  - 2.98% raid10_write_one_disk
>                                                     - 0.52% asm_common_interrupt
>                                                        - common_interrupt
>                                                           - 0.51% __common_interrupt
>                                                                0.51% handle_edge_irq
>                                                    1.16% wait_blocked_dev
>                                                  - 0.83% regular_request_wait
>                                                       0.82% wait_barrier
>                                              0.95% md_submit_bio
>                                   - 3.54% iomap_iter
>                                      - 3.52% ext4_iomap_overwrite_begin
>                                         - 3.52% ext4_iomap_begin
>                                              1.80% ext4_set_iomap
>                             - 2.19% file_modified_flags
>                                  2.16% inode_needs_update_time.part.0
>                               1.82% up_read
>                               1.61% down_read
>                             - 0.88% ext4_generic_write_checks
>                                  0.57% generic_write_checks
>                       - 41.78% aio_read
>                          - 41.64% ext4_file_read_iter
>                             - 31.62% iomap_dio_rw
>                                - 31.61% __iomap_dio_rw
>                                   - 19.92% iomap_dio_bio_iter
>                                      - 16.26% submit_bio_noacct_nocheck
>                                         - 15.60% __submit_bio
>                                            - 13.31% md_handle_request
>                                               - 7.50% raid10_make_request
>                                                  - 6.14% raid10_read_request
>                                                     - 1.94% regular_request_wait
>                                                          1.92% wait_barrier
>                                                       1.12% read_balance
>                                                     - 0.53% asm_common_interrupt
>                                                        - 0.53% common_interrupt
>                                                           - 0.53% __common_interrupt
>                                                              - 0.53% handle_edge_irq
>                                                                   0.50% handle_irq_event
>                                               - 1.14% asm_common_interrupt
>                                                  - 1.14% common_interrupt
>                                                     - 1.13% __common_interrupt
>                                                        - 1.13% handle_edge_irq
>                                                           - 1.08% handle_irq_event
>                                                              - 1.07% __handle_irq_event_percpu
>                                                                 - nvme_irq
>                                                                      1.05% blk_mq_end_request_batch
>                                              2.22% md_submit_bio
>                                           0.52% blk_mq_submit_bio
>                                      - 0.67% asm_common_interrupt
>                                         - common_interrupt
>                                            - 0.66% __common_interrupt
>                                               - 0.66% handle_edge_irq
>                                                  - 0.62% handle_irq_event
>                                                     - 0.62% __handle_irq_event_percpu
>                                                        - nvme_irq
>                                                             0.61% blk_mq_end_request_batch
>                                   - 8.90% iomap_iter
>                                      - 8.86% ext4_iomap_begin
>                                         - 4.24% ext4_set_iomap
>                                            - 0.86% asm_common_interrupt
>                                               - 0.86% common_interrupt
>                                                  - 0.85% __common_interrupt
>                                                     - 0.85% handle_edge_irq
>                                                        - 0.81% handle_irq_event
>                                                           - 0.80% __handle_irq_event_percpu
>                                                              - nvme_irq
>                                                                   0.79% blk_mq_end_request_batch
>                                         - 0.88% ext4_map_blocks
>                                              0.68% ext4_es_lookup_extent
>                                         - 0.81% asm_common_interrupt
>                                            - 0.81% common_interrupt
>                                               - 0.81% __common_interrupt
>                                                  - 0.81% handle_edge_irq
>                                                     - 0.77% handle_irq_event
>                                                        - 0.76% __handle_irq_event_percpu
>                                                           - nvme_irq
>                                                                0.75% blk_mq_end_request_batch
>                             - 4.53% down_read
>                                - 0.87% asm_common_interrupt
>                                   - 0.86% common_interrupt
>                                      - 0.86% __common_interrupt
>                                         - 0.86% handle_edge_irq
>                                            - 0.81% handle_irq_event
>                                               - 0.81% __handle_irq_event_percpu
>                                                  - nvme_irq
>                                                       0.79% blk_mq_end_request_batch
>                             - 4.42% up_read
>                                - 0.82% asm_common_interrupt
>                                   - 0.82% common_interrupt
>                                      - 0.81% __common_interrupt
>                                         - 0.81% handle_edge_irq
>                                            - 0.77% handle_irq_event
>                                               - 0.77% __handle_irq_event_percpu
>                                                  - nvme_irq
>                                                       0.75% blk_mq_end_request_batch
>                             - 0.86% ext4_dio_alignment
>                                  0.83% ext4_inode_journal_mode
> 
> .
> 


* Re: Unacceptably Poor RAID1 Performance with Many CPU Cores
  2023-07-03 12:39                           ` Yu Kuai
@ 2023-07-05  7:59                             ` Ali Gholami Rudi
  0 siblings, 0 replies; 30+ messages in thread
From: Ali Gholami Rudi @ 2023-07-05  7:59 UTC (permalink / raw)
  To: Yu Kuai; +Cc: Yu Kuai, Xiao Ni, linux-raid, song

Hi,

Yu Kuai <yukuai3@huawei.com> wrote:
> > 
> > On md0 (40GB):
> > READ:  IOPS=1563K BW=6109MiB/s
> > WRITE: IOPS= 670K BW=2745MiB/s
> > 
> > On md3 (14TB):
> > READ:  IOPS=1177K BW=4599MiB/s
> > WRITE: IOPS= 505K BW=1972MiB/s
> > 
> > On md3 but disabling mdadm bitmap (mdadm --grow --bitmap=none /dev/md3):
> > READ:  IOPS=1351K BW=5278MiB/s
> > WRITE: IOPS= 579K BW=2261MiB/s
> 
> Currently, if the bitmap is enabled, a bitmap-level spinlock is grabbed
> for each write, and sadly improving this will require a huge refactor.

OK.

> > +   95.25%     0.00%  fio      [unknown]               [k] 0xffffffffffffffff
> > +   95.00%     0.00%  fio      fio                     [.] 0x000055e073fcd117
> > +   93.68%     0.13%  fio      [kernel.kallsyms]       [k] entry_SYSCALL_64_after_hwframe
> > +   93.54%     0.03%  fio      [kernel.kallsyms]       [k] do_syscall_64
> > +   92.38%     0.03%  fio      libc.so.6               [.] syscall
> > +   92.18%     0.00%  fio      fio                     [.] 0x000055e073fcaceb
> > +   92.18%     0.08%  fio      fio                     [.] td_io_queue
> > +   92.04%     0.02%  fio      fio                     [.] td_io_commit
> > +   91.76%     0.00%  fio      fio                     [.] 0x000055e073fefe5e
> > -   91.76%     0.05%  fio      libaio.so.1.0.2         [.] io_submit
> >     - 91.71% io_submit
> >        - 91.69% syscall
> >           - 91.58% entry_SYSCALL_64_after_hwframe
> >              - 91.55% do_syscall_64
> >                 - 91.06% __x64_sys_io_submit
> >                    - 90.93% io_submit_one
> >                       - 48.85% aio_write
> >                          - 48.77% ext4_file_write_iter
> >                             - 39.86% iomap_dio_rw
> >                                - 39.85% __iomap_dio_rw
> >                                   - 22.55% blk_finish_plug
> >                                      - 22.55% __blk_flush_plug
> >                                         - 21.67% raid10_unplug
> >                                            - 16.54% submit_bio_noacct_nocheck
> >                                               - 16.44% blk_mq_submit_bio
> >                                                  - 16.17% __rq_qos_throttle
> >                                                     - 16.01% wbt_wait
> 
> You can disable wbt to prevent overhead here.

Very good.  I will give it a try.  And thanks for your time.
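
For reference, this is what I take "disable wbt" to mean in practice --
a minimal sketch, assuming the per-queue wbt_lat_usec sysfs attribute is
the right knob and that the whole-disk devices nvme0n1..nvme3n1 (not the
partitions) carry the queue settings:

  # 0 disables writeback throttling for the queue; -1 would restore the
  # default.  The setting is not persistent across reboots.
  for dev in nvme0n1 nvme1n1 nvme2n1 nvme3n1; do
          echo 0 > /sys/block/$dev/queue/wbt_lat_usec
  done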

Thanks,
Ali



end of thread, other threads:[~2023-07-05  8:02 UTC | newest]

Thread overview: 30+ messages
2023-06-15  7:54 Unacceptably Poor RAID1 Performance with Many CPU Cores Ali Gholami Rudi
2023-06-15  9:16 ` Xiao Ni
2023-06-15 17:08   ` Ali Gholami Rudi
2023-06-15 17:36     ` Ali Gholami Rudi
2023-06-16  1:53       ` Xiao Ni
2023-06-16  5:20         ` Ali Gholami Rudi
2023-06-15 14:02 ` Yu Kuai
2023-06-16  2:14   ` Xiao Ni
2023-06-16  2:34     ` Yu Kuai
2023-06-16  5:52     ` Ali Gholami Rudi
     [not found]     ` <20231606091224@laper.mirepesht>
2023-06-16  7:31       ` Ali Gholami Rudi
2023-06-16  7:42         ` Yu Kuai
2023-06-16  8:21           ` Ali Gholami Rudi
2023-06-16  8:34             ` Yu Kuai
2023-06-16  8:52               ` Ali Gholami Rudi
2023-06-16  9:17                 ` Yu Kuai
2023-06-16 11:51                 ` Ali Gholami Rudi
2023-06-16 12:27                   ` Yu Kuai
2023-06-18 20:30                     ` Ali Gholami Rudi
2023-06-19  1:22                       ` Yu Kuai
2023-06-19  5:19                       ` Ali Gholami Rudi
2023-06-19  6:53                         ` Yu Kuai
2023-06-21  8:05                     ` Xiao Ni
2023-06-21  8:26                       ` Yu Kuai
2023-06-21  8:55                         ` Xiao Ni
2023-07-01 11:17                         ` Ali Gholami Rudi
2023-07-03 12:39                           ` Yu Kuai
2023-07-05  7:59                             ` Ali Gholami Rudi
2023-06-21 19:34                       ` Wols Lists
2023-06-23  0:52                         ` Xiao Ni
