From: Jonathan Derrick <jonathan.derrick@linux.dev>
To: Xiao Ni <xni@redhat.com>
Cc: Song Liu <song@kernel.org>,
	linux-raid@vger.kernel.org,
	Reindl Harald <h.reindl@thelounge.net>,
	Christoph Hellwig <hch@lst.de>,
	Christoph Hellwig <hch@infradead.org>,
	Paul Menzel <pmenzel@molgen.mpg.de>,
	Sushma Kalakota <sushma.kalakota@intel.com>
Subject: Re: [PATCH v5 3/3] md: Use optimal I/O size for last bitmap page
Date: Tue, 28 Feb 2023 16:09:56 -0700	[thread overview]
Message-ID: <2f2ba1cb-a053-7494-ce42-4670b66baacf@linux.dev> (raw)
In-Reply-To: <CALTww2_P6TaV7C5i2k5sUeHOpnqTxjFB-ZA98Y2re+17J5d7Kw@mail.gmail.com>

Hi Xiao

On 2/26/2023 6:56 PM, Xiao Ni wrote:
> Hi Jonathan
> 
> I did a test in my environment, but I didn't see such a big
> performance difference.
> 
> The first environment:
> All NVMe devices have a 512-byte logical block size, a 512-byte physical
> block size, and an optimal I/O size of 0, so I rebuilt the kernel the way
> you described:
> /sys/block/nvme0n1/queue/logical_block_size: 512
> /sys/block/nvme0n1/queue/physical_block_size: 512
> /sys/block/nvme0n1/queue/optimal_io_size: 4096
> 
> without the patch set
> write: IOPS=68.0k, BW=266MiB/s (279MB/s)(15.6GiB/60001msec); 0 zone resets
> with the patch set
> write: IOPS=69.1k, BW=270MiB/s (283MB/s)(15.8GiB/60001msec); 0 zone resets
> 
> The second environment:
> The NVMe devices' optimal I/O size is already 4096, so I don't need to
> rebuild the kernel.
> /sys/block/nvme0n1/queue/logical_block_size
> /sys/block/nvme0n1/queue/physical_block_size
> /sys/block/nvme0n1/queue/optimal_io_size
> 
> without the patch set
> write: IOPS=51.6k, BW=202MiB/s (212MB/s)(11.8GiB/60001msec); 0 zone resets
> with the patch set
> write: IOPS=53.5k, BW=209MiB/s (219MB/s)(12.2GiB/60001msec); 0 zone resets
> 
It sounds like your devices may not have a latency penalty at sub-optimal write sizes.
Can you provide biosnoop traces with and without the patches?
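
For reference, a capture along these lines should show the per-bio latencies
while the fio job runs. This is only a sketch, assuming the bcc-tools biosnoop
(commonly installed under /usr/share/bcc/tools); adjust the path and the md
device for your setup:

$ /usr/share/bcc/tools/biosnoop > biosnoop.out &
$ fio --name=test --direct=1 --filename=/dev/md0 --rw=randwrite --bs=4k --runtime=60
$ kill %1

The writes to look for are the small ones at the low LBAs (the bitmap region),
like the 3584-byte writes at LBA 24 in the traces quoted below.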

Still, 'works fine for me' is generally not a reason to reject the patches.

> Best Regards
> Xiao
> 
> On Sat, Feb 25, 2023 at 2:34 AM Jonathan Derrick
> <jonathan.derrick@linux.dev> wrote:
>>
>> From: Jon Derrick <jonathan.derrick@linux.dev>
>>
>> If the bitmap space has enough room, size the I/O for the last bitmap
>> page write to the optimal I/O size for the storage device. The expanded
>> write is checked to ensure it does not overrun the data or metadata.
>>
>> The drive this was tested against has higher latencies for sub-4k writes
>> due to device-side read-modify-writes of its atomic 4k write unit. This
>> change helps increase performance by sizing the last bitmap page I/O to
>> the device's preferred write unit, if one is reported.
>>
>> Example: Intel/Solidigm P5520
>> RAID10, chunk size 64M, bitmap size 57228 bits
>>
>> $ mdadm --create /dev/md0 --level=10 --raid-devices=4 /dev/nvme{0,1,2,3}n1 \
>>         --assume-clean --bitmap=internal --bitmap-chunk=64M
>> $ fio --name=test --direct=1 --filename=/dev/md0 --rw=randwrite --bs=4k --runtime=60
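>>
>> (Rough arithmetic, assuming the 256-byte bitmap superblock shares page 0
>> with the bitmap bits: 57228 bits ~= 7154 bytes, plus 256 bytes of superblock
>> ~= 7410 bytes, so the last page holds ~3314 bytes. Rounded up to the 512-byte
>> logical block size that is the 3584-byte write seen below; rounded up to the
>> 4096-byte optimal size it becomes a single 4096-byte write.)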
>>
>> Without patch:
>>   write: IOPS=1676, BW=6708KiB/s (6869kB/s)(393MiB/60001msec); 0 zone resets
>>
>> With patch:
>>   write: IOPS=15.7k, BW=61.4MiB/s (64.4MB/s)(3683MiB/60001msec); 0 zone resets
>>
>> Biosnoop:
>> Without patch:
>> Time        Process        PID     Device      LBA        Size      Lat(ms)
>> 1.410377    md0_raid10     6900    nvme0n1   W 16         4096      0.02
>> 1.410387    md0_raid10     6900    nvme2n1   W 16         4096      0.02
>> 1.410374    md0_raid10     6900    nvme3n1   W 16         4096      0.01
>> 1.410381    md0_raid10     6900    nvme1n1   W 16         4096      0.02
>> 1.410411    md0_raid10     6900    nvme1n1   W 115346512  4096      0.01
>> 1.410418    md0_raid10     6900    nvme0n1   W 115346512  4096      0.02
>> 1.410915    md0_raid10     6900    nvme2n1   W 24         3584      0.43 <--
>> 1.410935    md0_raid10     6900    nvme3n1   W 24         3584      0.45 <--
>> 1.411124    md0_raid10     6900    nvme1n1   W 24         3584      0.64 <--
>> 1.411147    md0_raid10     6900    nvme0n1   W 24         3584      0.66 <--
>> 1.411176    md0_raid10     6900    nvme3n1   W 2019022184 4096      0.01
>> 1.411189    md0_raid10     6900    nvme2n1   W 2019022184 4096      0.02
>>
>> With patch:
>> Time        Process        PID     Device      LBA        Size      Lat(ms)
>> 5.747193    md0_raid10     727     nvme0n1   W 16         4096      0.01
>> 5.747192    md0_raid10     727     nvme1n1   W 16         4096      0.02
>> 5.747195    md0_raid10     727     nvme3n1   W 16         4096      0.01
>> 5.747202    md0_raid10     727     nvme2n1   W 16         4096      0.02
>> 5.747229    md0_raid10     727     nvme3n1   W 1196223704 4096      0.02
>> 5.747224    md0_raid10     727     nvme0n1   W 1196223704 4096      0.01
>> 5.747279    md0_raid10     727     nvme0n1   W 24         4096      0.01 <--
>> 5.747279    md0_raid10     727     nvme1n1   W 24         4096      0.02 <--
>> 5.747284    md0_raid10     727     nvme3n1   W 24         4096      0.02 <--
>> 5.747291    md0_raid10     727     nvme2n1   W 24         4096      0.02 <--
>> 5.747314    md0_raid10     727     nvme2n1   W 2234636712 4096      0.01
>> 5.747317    md0_raid10     727     nvme1n1   W 2234636712 4096      0.02
>>
>> Reviewed-by: Christoph Hellwig <hch@lst.de>
>> Signed-off-by: Jon Derrick <jonathan.derrick@linux.dev>
>> ---
>>  drivers/md/md-bitmap.c | 33 +++++++++++++++++++++++++++++----
>>  1 file changed, 29 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/md/md-bitmap.c b/drivers/md/md-bitmap.c
>> index bf250a5e3a86..920bb68156d2 100644
>> --- a/drivers/md/md-bitmap.c
>> +++ b/drivers/md/md-bitmap.c
>> @@ -209,6 +209,28 @@ static struct md_rdev *next_active_rdev(struct md_rdev *rdev, struct mddev *mdde
>>         return NULL;
>>  }
>>
>> +static unsigned int optimal_io_size(struct block_device *bdev,
>> +                                   unsigned int last_page_size,
>> +                                   unsigned int io_size)
>> +{
>> +       if (bdev_io_opt(bdev) > bdev_logical_block_size(bdev))
>> +               return roundup(last_page_size, bdev_io_opt(bdev));
>> +       return io_size;
>> +}
>> +
>> +static unsigned int bitmap_io_size(unsigned int io_size, unsigned int opt_size,
>> +                                  sector_t start, sector_t boundary)
>> +{
>> +       if (io_size != opt_size &&
>> +           start + opt_size / SECTOR_SIZE <= boundary)
>> +               return opt_size;
>> +       if (start + io_size / SECTOR_SIZE <= boundary)
>> +               return io_size;
>> +
>> +       /* Overflows boundary */
>> +       return 0;
>> +}
>> +
>>  static int __write_sb_page(struct md_rdev *rdev, struct bitmap *bitmap,
>>                            struct page *page)
>>  {
>> @@ -218,6 +240,7 @@ static int __write_sb_page(struct md_rdev *rdev, struct bitmap *bitmap,
>>         sector_t offset = mddev->bitmap_info.offset;
>>         sector_t ps, sboff, doff;
>>         unsigned int size = PAGE_SIZE;
>> +       unsigned int opt_size = PAGE_SIZE;
>>
>>         bdev = (rdev->meta_bdev) ? rdev->meta_bdev : rdev->bdev;
>>         if (page->index == store->file_pages - 1) {
>> @@ -225,8 +248,8 @@ static int __write_sb_page(struct md_rdev *rdev, struct bitmap *bitmap,
>>
>>                 if (last_page_size == 0)
>>                         last_page_size = PAGE_SIZE;
>> -               size = roundup(last_page_size,
>> -                              bdev_logical_block_size(bdev));
>> +               size = roundup(last_page_size, bdev_logical_block_size(bdev));
>> +               opt_size = optimal_io_size(bdev, last_page_size, size);
>>         }
>>
>>         ps = page->index * PAGE_SIZE / SECTOR_SIZE;
>> @@ -241,7 +264,8 @@ static int __write_sb_page(struct md_rdev *rdev, struct bitmap *bitmap,
>>                         return -EINVAL;
>>         } else if (offset < 0) {
>>                 /* DATA  BITMAP METADATA  */
>> -               if (offset + ps + size / SECTOR_SIZE > 0)
>> +               size = bitmap_io_size(size, opt_size, offset + ps, 0);
>> +               if (size == 0)
>>                         /* bitmap runs in to metadata */
>>                         return -EINVAL;
>>
>> @@ -250,7 +274,8 @@ static int __write_sb_page(struct md_rdev *rdev, struct bitmap *bitmap,
>>                         return -EINVAL;
>>         } else if (rdev->sb_start < rdev->data_offset) {
>>                 /* METADATA BITMAP DATA */
>> -               if (sboff + ps + size / SECTOR_SIZE > doff)
>> +               size = bitmap_io_size(size, opt_size, sboff + ps, doff);
>> +               if (size == 0)
>>                         /* bitmap runs in to data */
>>                         return -EINVAL;
>>         } else {
>> --
>> 2.27.0
>>
> 
