From: Jonathan Derrick <jonathan.derrick@linux.dev>
To: Song Liu <song@kernel.org>, <linux-raid@vger.kernel.org>
Cc: Paul Menzel <pmenzel@molgen.mpg.de>,
Sushma Kalakota <sushma.kalakota@intel.com>,
Jon Derrick <jonathan.derrick@linux.dev>
Subject: [PATCH v2] md: Use optimal I/O size for last bitmap page
Date: Fri, 17 Feb 2023 11:31:39 -0700 [thread overview]
Message-ID: <20230217183139.106-1-jonathan.derrick@linux.dev> (raw)
From: Jon Derrick <jonathan.derrick@linux.dev>
If the bitmap space has enough room, size the I/O for the last bitmap
page write to the optimal I/O size for the storage device. The expanded
write is checked that it won't overrun the data or metadata.
This change helps increase performance by preventing unnecessary
device-side read-mod-writes due to non-atomic write unit sizes.
Example Intel/Solidigm P5520
Raid10, Chunk-size 64M, bitmap-size 57228 bits
$ mdadm --create /dev/md0 --level=10 --raid-devices=4 /dev/nvme{0,1,2,3}n1 --assume-clean --bitmap=internal --bitmap-chunk=64M
$ fio --name=test --direct=1 --filename=/dev/md0 --rw=randwrite --bs=4k --runtime=60
Without patch:
write: IOPS=1676, BW=6708KiB/s (6869kB/s)(393MiB/60001msec); 0 zone resets
With patch:
write: IOPS=15.7k, BW=61.4MiB/s (64.4MB/s)(3683MiB/60001msec); 0 zone resets
Biosnoop:
Without patch:
Time Process PID Device LBA Size Lat
1.410377 md0_raid10 6900 nvme0n1 W 16 4096 0.02
1.410387 md0_raid10 6900 nvme2n1 W 16 4096 0.02
1.410374 md0_raid10 6900 nvme3n1 W 16 4096 0.01
1.410381 md0_raid10 6900 nvme1n1 W 16 4096 0.02
1.410411 md0_raid10 6900 nvme1n1 W 115346512 4096 0.01
1.410418 md0_raid10 6900 nvme0n1 W 115346512 4096 0.02
1.410915 md0_raid10 6900 nvme2n1 W 24 3584 0.43
1.410935 md0_raid10 6900 nvme3n1 W 24 3584 0.45
1.411124 md0_raid10 6900 nvme1n1 W 24 3584 0.64
1.411147 md0_raid10 6900 nvme0n1 W 24 3584 0.66
1.411176 md0_raid10 6900 nvme3n1 W 2019022184 4096 0.01
1.411189 md0_raid10 6900 nvme2n1 W 2019022184 4096 0.02
With patch:
Time Process PID Device LBA Size Lat
5.747193 md0_raid10 727 nvme0n1 W 16 4096 0.01
5.747192 md0_raid10 727 nvme1n1 W 16 4096 0.02
5.747195 md0_raid10 727 nvme3n1 W 16 4096 0.01
5.747202 md0_raid10 727 nvme2n1 W 16 4096 0.02
5.747229 md0_raid10 727 nvme3n1 W 1196223704 4096 0.02
5.747224 md0_raid10 727 nvme0n1 W 1196223704 4096 0.01
5.747279 md0_raid10 727 nvme0n1 W 24 4096 0.01
5.747279 md0_raid10 727 nvme1n1 W 24 4096 0.02
5.747284 md0_raid10 727 nvme3n1 W 24 4096 0.02
5.747291 md0_raid10 727 nvme2n1 W 24 4096 0.02
5.747314 md0_raid10 727 nvme2n1 W 2234636712 4096 0.01
5.747317 md0_raid10 727 nvme1n1 W 2234636712 4096 0.02
Signed-off-by: Jon Derrick <jonathan.derrick@linux.dev>
---
v1->v2: Add more information to commit message
drivers/md/md-bitmap.c | 27 ++++++++++++++++++---------
1 file changed, 18 insertions(+), 9 deletions(-)
diff --git a/drivers/md/md-bitmap.c b/drivers/md/md-bitmap.c
index e7cc6ba1b657..569297ea9b99 100644
--- a/drivers/md/md-bitmap.c
+++ b/drivers/md/md-bitmap.c
@@ -220,6 +220,7 @@ static int write_sb_page(struct bitmap *bitmap, struct page *page, int wait)
rdev = NULL;
while ((rdev = next_active_rdev(rdev, mddev)) != NULL) {
int size = PAGE_SIZE;
+ int optimal_size = PAGE_SIZE;
loff_t offset = mddev->bitmap_info.offset;
bdev = (rdev->meta_bdev) ? rdev->meta_bdev : rdev->bdev;
@@ -228,9 +229,14 @@ static int write_sb_page(struct bitmap *bitmap, struct page *page, int wait)
int last_page_size = store->bytes & (PAGE_SIZE-1);
if (last_page_size == 0)
last_page_size = PAGE_SIZE;
- size = roundup(last_page_size,
- bdev_logical_block_size(bdev));
+ size = roundup(last_page_size, bdev_logical_block_size(bdev));
+ if (bdev_io_opt(bdev) > bdev_logical_block_size(bdev))
+ optimal_size = roundup(last_page_size, bdev_io_opt(bdev));
+ else
+ optimal_size = size;
}
+
+
/* Just make sure we aren't corrupting data or
* metadata
*/
@@ -246,9 +252,11 @@ static int write_sb_page(struct bitmap *bitmap, struct page *page, int wait)
goto bad_alignment;
} else if (offset < 0) {
/* DATA BITMAP METADATA */
- if (offset
- + (long)(page->index * (PAGE_SIZE/512))
- + size/512 > 0)
+ loff_t off = offset + (long)(page->index * (PAGE_SIZE/512));
+ if (size != optimal_size &&
+ off + optimal_size/512 <= 0)
+ size = optimal_size;
+ else if (off + size/512 > 0)
/* bitmap runs in to metadata */
goto bad_alignment;
if (rdev->data_offset + mddev->dev_sectors
@@ -257,10 +265,11 @@ static int write_sb_page(struct bitmap *bitmap, struct page *page, int wait)
goto bad_alignment;
} else if (rdev->sb_start < rdev->data_offset) {
/* METADATA BITMAP DATA */
- if (rdev->sb_start
- + offset
- + page->index*(PAGE_SIZE/512) + size/512
- > rdev->data_offset)
+ loff_t off = rdev->sb_start + offset + page->index*(PAGE_SIZE/512);
+ if (size != optimal_size &&
+ off + optimal_size/512 <= rdev->data_offset)
+ size = optimal_size;
+ else if (off + size/512 > rdev->data_offset)
/* bitmap runs in to data */
goto bad_alignment;
} else {
--
2.27.0
next reply other threads:[~2023-02-17 18:34 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-02-17 18:31 Jonathan Derrick [this message]
2023-02-21 2:38 ` [PATCH v2] md: Use optimal I/O size for last bitmap page Xiao Ni
2023-02-21 3:31 ` Jonathan Derrick
2023-02-21 5:35 ` Xiao Ni
2023-02-21 7:34 ` Jonathan Derrick
2023-02-21 14:39 ` Christoph Hellwig
2023-02-21 14:44 ` Christoph Hellwig
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20230217183139.106-1-jonathan.derrick@linux.dev \
--to=jonathan.derrick@linux.dev \
--cc=linux-raid@vger.kernel.org \
--cc=pmenzel@molgen.mpg.de \
--cc=song@kernel.org \
--cc=sushma.kalakota@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.