From: Lukas Straub <lukasstraub2@web.de>
To: Mike Snitzer <snitzer@redhat.com>
Cc: linux-kernel <linux-kernel@vger.kernel.org>,
Alasdair Kergon <agk@redhat.com>, dm-devel <dm-devel@redhat.com>,
Mikulas Patocka <mpatocka@redhat.com>
Subject: Re: [PATCH v2] dm-integrity: Prevent RMW for full metadata buffer writes
Date: Thu, 26 Mar 2020 12:24:18 +0100 [thread overview]
Message-ID: <20200326122418.741b2ffc@luklap> (raw)
In-Reply-To: <20200324191148.GA2921@redhat.com>
[-- Attachment #1: Type: text/plain, Size: 6489 bytes --]
On Tue, 24 Mar 2020 15:11:49 -0400
Mike Snitzer <snitzer@redhat.com> wrote:
> On Tue, Mar 24 2020 at 2:59pm -0400,
> Lukas Straub <lukasstraub2@web.de> wrote:
>
> > On Tue, 24 Mar 2020 13:18:22 -0400
> > Mike Snitzer <snitzer@redhat.com> wrote:
> >
> > > On Thu, Feb 27 2020 at 8:26am -0500,
> > > Lukas Straub <lukasstraub2@web.de> wrote:
> > >
> > > > If a full metadata buffer is being written, don't read it first. This
> > > > prevents a read-modify-write cycle and increases performance on HDDs
> > > > considerably.
> > > >
> > > > To do this we now calculate the checksums for all sectors in the bio in one
> > > > go in integrity_metadata and then pass the result to dm_integrity_rw_tag,
> > > > which now checks if we overwrite the whole buffer.
> > > >
> > > > Benchmarking with a 5400RPM HDD with bitmap mode:
> > > > integritysetup format --no-wipe --batch-mode --interleave-sectors $((64*1024)) -t 4 -s 512 -I crc32c -B /dev/sdc
> > > > integritysetup open --buffer-sectors=1 -I crc32c -B /dev/sdc hdda_integ
> > > > dd if=/dev/zero of=/dev/mapper/hdda_integ bs=64K count=$((16*1024*4)) conv=fsync oflag=direct status=progress
> > > >
> > > > Without patch:
> > > > 4294967296 bytes (4.3 GB, 4.0 GiB) copied, 400.326 s, 10.7 MB/s
> > > >
> > > > With patch:
> > > > 4294967296 bytes (4.3 GB, 4.0 GiB) copied, 41.2057 s, 104 MB/s
> > > >
> > > > Signed-off-by: Lukas Straub <lukasstraub2@web.de>
> > > > ---
> > > > Hello Everyone,
> > > > So here is v2, now checking if we overwrite a whole metadata buffer instead
> > > > of the (buggy) check if we overwrite a whole tag area before.
> > > > Performance stayed the same (with --buffer-sectors=1).
> > > >
> > > > The only quirk now is that it advertises a very big optimal io size in the
> > > > standard configuration (where buffer_sectors=128). Is this a Problem?
> > > >
> > > > v2:
> > > > -fix dm_integrity_rw_tag to check if we overwrite a whole metadat buffer
> > > > -fix optimal io size to check if we overwrite a whole metadata buffer
> > > >
> > > > Regards,
> > > > Lukas Straub
> > >
> > >
> > > Not sure why you didn't cc Mikulas but I just checked with him and he
> > > thinks:
> > >
> > > You're only seeing a boost because you're using 512-byte sector and
> > > 512-byte buffer. Patch helps that case but hurts in most other cases
> > > (due to small buffers).
> >
> > Hmm, buffer-sectors is still user configurable. If the user wants fast
> > write performance on slow HDDs he can set a small buffer-sector and
> > benefit from this patch. With the default buffer-sectors=128 it
> > shouldn't have any impact.
>
> OK, if you'd be willing to conduct tests to prove there is no impact
> with larger buffers that'd certainly help reinforce your position and
> make it easier for me to take your patch.
>
> But if you're just testing against slow HDD testbeds then the test
> coverage isn't wide enough.
>
> Thanks,
> Mike
>
Sure,
This time testing with an Samsung 850 EVO SSD:
without patch:
root@teststore:/home/lukas# integritysetup format --no-wipe --batch-mode -I crc32c -B /dev/sdc1
root@teststore:/home/lukas# integritysetup open -I crc32c -B /dev/sdc1 ssda_integ
root@teststore:/home/lukas# dd if=/dev/zero of=/dev/mapper/ssda_integ bs=64K count=$((16*1024*4)) conv=fsync oflag=direct status=progress
4177264640 bytes (4.2 GB, 3.9 GiB) copied, 28 s, 149 MB/s
65536+0 records in
65536+0 records out
4294967296 bytes (4.3 GB, 4.0 GiB) copied, 28.8946 s, 149 MB/s
root@teststore:/home/lukas# dd if=/dev/zero of=/dev/mapper/ssda_integ bs=4M count=$((256*4)) conv=fsync oflag=direct status=progress
4211081216 bytes (4.2 GB, 3.9 GiB) copied, 22 s, 191 MB/s
1024+0 records in
1024+0 records out
4294967296 bytes (4.3 GB, 4.0 GiB) copied, 22.6355 s, 190 MB/s
root@teststore:/home/lukas# time mkfs.ext4 /dev/mapper/ssda_integ 4g
mke2fs 1.45.5 (07-Jan-2020)
Creating filesystem with 1048576 4k blocks and 262144 inodes
Filesystem UUID: 679c4808-d549-4576-a8ef-4c46c78c6070
Superblock backups stored on blocks:
32768, 98304, 163840, 229376, 294912, 819200, 884736
Allocating group tables: done
Writing inode tables: done
Creating journal (16384 blocks): done
Writing superblocks and filesystem accounting information: done
real 0m0.318s
user 0m0.016s
sys 0m0.001s
root@teststore:/home/lukas# mount /dev/mapper/ssda_integ /mnt/
root@teststore:/home/lukas# cd /mnt/
root@teststore:/mnt# time tar -xJf /home/lukas/linux-5.5.4.tar.xz
real 0m18.708s
user 0m14.351s
sys 0m8.585s
root@teststore:/mnt# time rm -rf linux-5.5.4/
real 0m3.226s
user 0m0.117s
sys 0m2.958s
with patch:
root@teststore:/home/lukas# integritysetup format --no-wipe --batch-mode -I crc32c -B /dev/sdc1
root@teststore:/home/lukas# integritysetup open -I crc32c -B /dev/sdc1 ssda_integ
root@teststore:/home/lukas# dd if=/dev/zero of=/dev/mapper/ssda_integ bs=64K count=$((16*1024*4)) conv=fsync oflag=direct status=progress
4169007104 bytes (4.2 GB, 3.9 GiB) copied, 27 s, 154 MB/s
65536+0 records in
65536+0 records out
4294967296 bytes (4.3 GB, 4.0 GiB) copied, 27.9165 s, 154 MB/s
root@teststore:/home/lukas# dd if=/dev/zero of=/dev/mapper/ssda_integ bs=4M count=$((256*4)) conv=fsync oflag=direct status=progress
4169138176 bytes (4.2 GB, 3.9 GiB) copied, 22 s, 189 MB/s
1024+0 records in
1024+0 records out
4294967296 bytes (4.3 GB, 4.0 GiB) copied, 22.9181 s, 187 MB/s
root@teststore:/home/lukas# time mkfs.ext4 /dev/mapper/ssda_integ 4g
mke2fs 1.45.5 (07-Jan-2020)
Creating filesystem with 1048576 4k blocks and 262144 inodes
Filesystem UUID: 9a9decad-19e2-46a4-8cc5-ce27238829f2
Superblock backups stored on blocks:
32768, 98304, 163840, 229376, 294912, 819200, 884736
Allocating group tables: done
Writing inode tables: done
Creating journal (16384 blocks): done
Writing superblocks and filesystem accounting information: done
real 0m0.341s
user 0m0.000s
sys 0m0.016s
root@teststore:/home/lukas# mount /dev/mapper/ssda_integ /mnt/
root@teststore:/home/lukas# cd /mnt/
root@teststore:/mnt# time tar -xJf /home/lukas/linux-5.5.4.tar.xz
real 0m18.409s
user 0m14.191s
sys 0m8.585s
root@teststore:/mnt# time rm -rf linux-5.5.4/
real 0m3.200s
user 0m0.133s
sys 0m2.979s
Regards,
Lukas Straub
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
next prev parent reply other threads:[~2020-03-26 11:24 UTC|newest]
Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-02-27 13:26 [PATCH v2] dm-integrity: Prevent RMW for full metadata buffer writes Lukas Straub
2020-03-24 17:18 ` Mike Snitzer
2020-03-24 18:59 ` Lukas Straub
2020-03-24 19:11 ` Mike Snitzer
2020-03-26 11:24 ` Lukas Straub [this message]
2020-03-30 16:34 ` [dm-devel] " Mikulas Patocka
2020-03-31 12:06 ` Lukas Straub
2020-04-03 11:58 ` Lukas Straub
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20200326122418.741b2ffc@luklap \
--to=lukasstraub2@web.de \
--cc=agk@redhat.com \
--cc=dm-devel@redhat.com \
--cc=linux-kernel@vger.kernel.org \
--cc=mpatocka@redhat.com \
--cc=snitzer@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.