* [Drbd-dev] Checksum based resync block size
@ 2019-06-22 0:03 Eric Wheeler
2019-06-24 15:49 ` Lars Ellenberg
From: Eric Wheeler @ 2019-06-22 0:03 UTC
To: drbd-dev
Hello all,
Can someone help explain how checksum-based sync and verify are
implemented on the sender and receiver sides? It looks like the hashes are
per-sector (looking at read_for_csum?), and I am interested in making the
csum chunk size configurable, or at least hacking in some test code to see
whether checksumming multiple sectors at once would improve performance.
I'm also trying to understand what iterates over the lldev and where the
csum takes place for each chunk of data.
Any direction would be helpful. Thank you.
--
Eric Wheeler
* Re: [Drbd-dev] Checksum based resync block size
2019-06-22 0:03 [Drbd-dev] Checksum based resync block size Eric Wheeler
@ 2019-06-24 15:49 ` Lars Ellenberg
2019-06-26 19:20 ` Eric Wheeler
From: Lars Ellenberg @ 2019-06-24 15:49 UTC
To: drbd-dev
On Sat, Jun 22, 2019 at 12:03:55AM +0000, Eric Wheeler wrote:
> Hello all,
>
> Can someone help explain how checksum-based sync and verify are
> implemented on the sender and receiver sides? It looks like the hashes are
> per-sector (looking at read_for_csum?) and I am interested in making the
> csum chunk size configurable, or at least hack in some test code to see if
> it would provide a performance benefit to csum multiple sectors.
>
> I'm also trying to understand what iterates over the lldev and understand
> where the csum takes place for each chunk of data.
>
> Any direction would be helpful. Thank you.
As our in-sync/out-of-sync bitmap tracks 4k blocks,
we want to compare 4k checksums.
Yes, that generates "a lot" of requests, and if these are not merged by
some IO scheduler on the lower layers, that may seriously suck.
make_ov_request() is what generates the online-verify requests.
What we potentially could do is issue the requests to the backends in
larger chunks (1 MiB, say), then calculate and communicate the checksum
for each 4k block, as well as the result.
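
A minimal userspace sketch of that idea, assuming a 1 MiB chunk has already been read from each backend in one request. It is not DRBD code: the names `block_digests` and `out_of_sync_blocks` are hypothetical, SHA-256 stands in for whatever csums-alg is configured, and both buffers live in memory here, whereas the real peers would exchange the digests over the network.

```python
import hashlib
from typing import List

BLOCK_SIZE = 4096           # bitmap granularity named in this thread
CHUNK_SIZE = 1024 * 1024    # larger backend read size suggested above


def block_digests(chunk: bytes, block_size: int = BLOCK_SIZE) -> List[bytes]:
    """Return one digest per block_size slice of an already-read chunk."""
    return [hashlib.sha256(chunk[off:off + block_size]).digest()
            for off in range(0, len(chunk), block_size)]


def out_of_sync_blocks(local_chunk: bytes, peer_digests: List[bytes]) -> List[int]:
    """Return the indices of 4k blocks whose local digest differs from the peer's."""
    local = block_digests(local_chunk)
    return [i for i, (a, b) in enumerate(zip(local, peer_digests)) if a != b]


# One large read per side, but the comparison still happens per 4k block.
primary = bytes(CHUNK_SIZE)
secondary = bytearray(primary)
secondary[5 * BLOCK_SIZE + 17] ^= 0xFF      # corrupt one byte in block 5
mismatched = out_of_sync_blocks(primary, block_digests(bytes(secondary)))
print(mismatched)
```

Only the I/O size changes in this sketch; the result still maps one-to-one onto 4k bitmap bits.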
--
: Lars Ellenberg
: LINBIT | Keeping the Digital World Running
: DRBD -- Heartbeat -- Corosync -- Pacemaker
: R&D, Integration, Ops, Consulting, Support
DRBD® and LINBIT® are registered trademarks of LINBIT
* Re: [Drbd-dev] Checksum based resync block size
2019-06-24 15:49 ` Lars Ellenberg
@ 2019-06-26 19:20 ` Eric Wheeler
2019-06-27 10:22 ` Robert Altnoeder
From: Eric Wheeler @ 2019-06-26 19:20 UTC
To: Lars Ellenberg; +Cc: drbd-dev
On Mon, 24 Jun 2019, Lars Ellenberg wrote:
> On Sat, Jun 22, 2019 at 12:03:55AM +0000, Eric Wheeler wrote:
> > Hello all,
> >
> > Can someone help explain how checksum-based sync and verify are
> > implemented on the sender and receiver sides? It looks like the hashes are
> > per-sector (looking at read_for_csum?) and I am interested in making the
> > csum chunk size configurable, or at least hack in some test code to see if
> > it would provide a performance benefit to csum multiple sectors.
> >
> > I'm also trying to understand what iterates over the lldev and understand
> > where the csum takes place for each chunk of data.
> >
> > Any direction would be helpful. Thank you.
>
> As our in-sync/out-of-sync bitmap tracks 4k blocks,
> we want to compare 4k checksums.
>
> Yes, that generates "a lot" of requests, and if these are not merged by
> some IO scheduler on the lower layers, that may seriously suck.
>
> make_ov_request() is what generates the online-verify requests.
>
> What we potentially could do is issue the requests in larger chunks,
> like (1 MiB) to the backends, then calculate and communicate the
> checksum per each 4k, as well as the result.
What if it were to calculate checksums over 1 MiB chunks (configurable)
and then invalidate all 4k bitmap entries in that 1 MiB range if the hash
mismatches?
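
A rough sketch of what I mean, in userspace Python rather than DRBD code. The function name `mark_mismatched_chunks` is made up, SHA-256 is only a stand-in for the configured hash, and the bitmap is a plain list of booleans; both copies sit in memory here, where the real thing would ship one digest per chunk to the peer.

```python
import hashlib
from typing import List

BLOCK_SIZE = 4096
CHUNK_SIZE = 1024 * 1024                    # assumed configurable csum chunk size
BLOCKS_PER_CHUNK = CHUNK_SIZE // BLOCK_SIZE


def mark_mismatched_chunks(local: bytes, peer: bytes, bitmap: List[bool]) -> int:
    """Compare one digest per chunk; on mismatch set every 4k bit in that chunk.

    Returns the number of bitmap entries covered by mismatching chunks.
    """
    marked = 0
    for off in range(0, len(local), CHUNK_SIZE):
        local_sum = hashlib.sha256(local[off:off + CHUNK_SIZE]).digest()
        peer_sum = hashlib.sha256(peer[off:off + CHUNK_SIZE]).digest()
        if local_sum != peer_sum:
            first = off // BLOCK_SIZE
            last = min(first + BLOCKS_PER_CHUNK, len(bitmap))
            for bit in range(first, last):
                bitmap[bit] = True          # invalidate the whole chunk
            marked += last - first
    return marked


local = bytes(4 * CHUNK_SIZE)
peer = bytearray(local)
peer[2 * CHUNK_SIZE + 100] ^= 0x01          # a single differing byte in chunk 2
bitmap = [False] * (len(local) // BLOCK_SIZE)
marked = mark_mismatched_chunks(local, bytes(peer), bitmap)
print(marked)
```

The cost is visible in the example: one differing byte invalidates all 256 bitmap entries of its chunk, so the saving in checksums is paid for with extra resync traffic whenever a chunk mismatches.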
--
Eric Wheeler
* Re: [Drbd-dev] Checksum based resync block size
2019-06-26 19:20 ` Eric Wheeler
@ 2019-06-27 10:22 ` Robert Altnoeder
2019-06-27 17:59 ` Eric Wheeler
From: Robert Altnoeder @ 2019-06-27 10:22 UTC
To: drbd-dev
On 6/26/19 9:20 PM, Eric Wheeler wrote:
> On Mon, 24 Jun 2019, Lars Ellenberg wrote:
>
>> As our in-sync/out-of-sync bitmap tracks 4k blocks,
>> we want to compare 4k checksums.
>>
>> Yes, that generates "a lot" of requests, and if these are not merged by
>> some IO scheduler on the lower layers, that may seriously suck.
>>
>> make_ov_request() is what generates the online-verify requests.
>>
>> What we potentially could do is issue the requests in larger chunks,
>> like (1 MiB) to the backends, then calculate and communicate the
>> checksum per each 4k, as well as the result.
> What if it were to calculate 1MiB chunks (configurable) and then
> invalidate all 4k bitmap entries in that 1MiB range if the hash
> mismatches?
Is your intention to reduce the number of packets with checksums that
are being sent, and/or the number of checksum comparisons for the same
amount of data?
Both could have a positive impact on performance, but the question is,
whether the difference is big enough to be relevant. On the other hand,
hashing more data per checksum increases the chance of hash collisions.
br,
Robert
* Re: [Drbd-dev] Checksum based resync block size
2019-06-27 10:22 ` Robert Altnoeder
@ 2019-06-27 17:59 ` Eric Wheeler
From: Eric Wheeler @ 2019-06-27 17:59 UTC
To: Robert Altnoeder; +Cc: drbd-dev
On Thu, 27 Jun 2019, Robert Altnoeder wrote:
> On 6/26/19 9:20 PM, Eric Wheeler wrote:
> > On Mon, 24 Jun 2019, Lars Ellenberg wrote:
> >
> >> As our in-sync/out-of-sync bitmap tracks 4k blocks,
> >> we want to compare 4k checksums.
> >>
> >> Yes, that generates "a lot" of requests, and if these are not merged by
> >> some IO scheduler on the lower layers, that may seriously suck.
> >>
> >> make_ov_request() is what generates the online-verify requests.
> >>
> >> What we potentially could do is issue the requests in larger chunks,
> >> like (1 MiB) to the backends, then calculate and communicate the
> >> checksum per each 4k, as well as the result.
>
> > What if it were to calculate 1MiB chunks (configurable) and then
> > invalidate all 4k bitmap entries in that 1MiB range if the hash
> > mismatches?
This could also help resync by checksumming contiguous dirty bitmap entries
(up to a chunk size limit) and resyncing the whole series instead of each
4k block.
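
To illustrate the grouping step only, here is a small sketch that walks a bitmap and emits runs of contiguous dirty entries, splitting any run that exceeds a chunk limit. `dirty_runs` and `max_blocks` are names invented for this example, and the bitmap is a plain Python list rather than DRBD's bitmap.

```python
from typing import List, Sequence, Tuple


def dirty_runs(bitmap: Sequence[bool], max_blocks: int) -> List[Tuple[int, int]]:
    """Return (first_block, block_count) runs of contiguous dirty entries.

    No run is longer than max_blocks; longer dirty stretches are split.
    """
    runs = []
    start = None                            # first block of the run being built
    for i, dirty in enumerate(bitmap):
        if dirty:
            if start is None:
                start = i
            elif i - start == max_blocks:   # run is full, begin a new one
                runs.append((start, max_blocks))
                start = i
        elif start is not None:             # a clean block ends the run
            runs.append((start, i - start))
            start = None
    if start is not None:                   # run reaching the end of the bitmap
        runs.append((start, len(bitmap) - start))
    return runs


example = [False, True, True, True, True, True, False, True]
runs = dirty_runs(example, max_blocks=4)
print(runs)
```

Each resulting run would then get one checksum and, on mismatch, one resync request covering the whole run.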
> Is your intention to reduce the number of packets with checksums that
> are being sent, and/or the number of checksum comparisons for the same
> amount of data?
Reduce the number of packets, but also, crypto transforms perform better
on larger data chunks. You make another good point: fewer hash comparisons
will help too.
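
For a sense of scale, some back-of-the-envelope arithmetic, assuming a 16-byte digest per checksum and ignoring packet headers entirely (both are assumptions for illustration, not measurements):

```python
def checksum_count(region_bytes: int, chunk_bytes: int) -> int:
    """Number of checksums needed to cover a region (ceiling division)."""
    return -(-region_bytes // chunk_bytes)


GIB = 1024 ** 3
DIGEST_BYTES = 16                                # assumed 128-bit digest

per_4k = checksum_count(GIB, 4096)               # checksums per GiB at 4 KiB
per_1m = checksum_count(GIB, 1024 * 1024)        # checksums per GiB at 1 MiB
print(per_4k, per_4k * DIGEST_BYTES)
print(per_1m, per_1m * DIGEST_BYTES)
```

Under those assumptions, a 1 MiB chunk size means 256 times fewer checksums to compute, send, and compare per GiB.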
> Both could have a positive impact on performance, but the question is,
> whether the difference is big enough to be relevant. On the other hand,
> hashing more data per checksum increases the chance of hash collisions.
I'm not too concerned about hash collisions. That might be a problem with
small CRC32-based sums, but assuming crypto hashes with a 128-bit digest
(2^128 possible values), the birthday paradox says we should expect a
collision between some two hashes only after roughly 2^64 hashes. That is
any two hashes, not specifically the two being compared. The probability
that two specific hashes collide is far lower still, about 1/(2^128).
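
The estimate can be checked numerically with the usual birthday approximation, p ~= 1 - exp(-n(n-1)/2 / 2^bits), which assumes the hash behaves like a uniformly random function. The function name and the 1 TiB example below are mine, chosen only for illustration.

```python
import math


def any_collision_probability(num_hashes: int, digest_bits: int) -> float:
    """Approximate probability that any two of num_hashes digests collide."""
    pairs = num_hashes * (num_hashes - 1) // 2
    # expm1 keeps precision when the probability is extremely small
    return -math.expm1(-pairs / 2 ** digest_bits)


# About 2^64 digests of 128 bits: a collision somewhere becomes likely.
p_birthday = any_collision_probability(2 ** 64, 128)

# All 4k blocks of a 1 TiB device (2^28 blocks), 128-bit digests.
p_device = any_collision_probability(2 ** 28, 128)

# Two specific digests, i.e. the local and peer checksum of one chunk.
p_pair = 2.0 ** -128

print(p_birthday, p_device, p_pair)
```

With 2^64 digests the chance of a collision somewhere is already around 39%, while for the two digests actually being compared it stays at 2^-128.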
-Eric