Distributed Replicated Block Device (DRBD) development
* [Drbd-dev] Checksum based resync block size
@ 2019-06-22  0:03 Eric Wheeler
  2019-06-24 15:49 ` Lars Ellenberg
  0 siblings, 1 reply; 5+ messages in thread
From: Eric Wheeler @ 2019-06-22  0:03 UTC (permalink / raw)
  To: drbd-dev

Hello all,

Can someone help explain how checksum-based sync and verify are
implemented on the sender and receiver sides?  It looks like the hashes are
per-sector (judging from read_for_csum?), and I am interested in making the
csum chunk size configurable, or at least hacking in some test code to see
whether checksumming multiple sectors at once would improve performance.

I'm also trying to understand what iterates over the lldev and where the
csum is computed for each chunk of data.

Any direction would be helpful.  Thank you.

--
Eric Wheeler

* Re: [Drbd-dev] Checksum based resync block size
  2019-06-22  0:03 [Drbd-dev] Checksum based resync block size Eric Wheeler
@ 2019-06-24 15:49 ` Lars Ellenberg
  2019-06-26 19:20   ` Eric Wheeler
  0 siblings, 1 reply; 5+ messages in thread
From: Lars Ellenberg @ 2019-06-24 15:49 UTC (permalink / raw)
  To: drbd-dev

On Sat, Jun 22, 2019 at 12:03:55AM +0000, Eric Wheeler wrote:
> Hello all,
> 
> Can someone help explain how checksum-based sync and verify are
> implemented on the sender and receiver sides?  It looks like the hashes are
> per-sector (judging from read_for_csum?), and I am interested in making the
> csum chunk size configurable, or at least hacking in some test code to see
> whether checksumming multiple sectors at once would improve performance.
> 
> I'm also trying to understand what iterates over the lldev and where the
> csum is computed for each chunk of data.
> 
> Any direction would be helpful.  Thank you.

Since our in-sync/out-of-sync bitmap tracks 4k blocks,
we want to compare 4k checksums.

Yes, that generates "a lot" of requests, and if these are not merged by
some IO scheduler on the lower layers, that may seriously suck.

make_ov_request() is what generates the online-verify requests.

What we potentially could do is issue the requests to the backends in
larger chunks (say, 1 MiB), then calculate and communicate the checksum
for each 4k block, as well as the result.
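
To illustrate the idea, here is a minimal user-space sketch in Python, not
DRBD kernel code. It assumes the backend is read in 1 MiB chunks while
checksums are still computed and compared per 4k block, so the result maps
directly onto the bitmap. The names block_checksums and out_of_sync_blocks
and the choice of SHA-256 are hypothetical, picked only for illustration.

```python
import hashlib
from typing import List

BLOCK_SIZE = 4096           # granularity of the in-sync/out-of-sync bitmap
CHUNK_SIZE = 1024 * 1024    # assumed size of one larger backend read (1 MiB)


def block_checksums(chunk: bytes) -> List[bytes]:
    """Return one digest per 4k block of a larger chunk."""
    return [
        hashlib.sha256(chunk[off:off + BLOCK_SIZE]).digest()
        for off in range(0, len(chunk), BLOCK_SIZE)
    ]


def out_of_sync_blocks(local_chunk: bytes, peer_sums: List[bytes]) -> List[int]:
    """Return the indices of 4k blocks whose digest differs from the peer's."""
    local_sums = block_checksums(local_chunk)
    return [i for i, (a, b) in enumerate(zip(local_sums, peer_sums)) if a != b]


# One large read on each side, with a single differing byte in block 5.
peer_chunk = bytes(CHUNK_SIZE)
local_chunk = bytearray(CHUNK_SIZE)
local_chunk[5 * BLOCK_SIZE + 17] = 0xFF

mismatched = out_of_sync_blocks(bytes(local_chunk), block_checksums(peer_chunk))
print(mismatched)  # [5]
```

Only the I/O size changes in this variant; the number of checksums exchanged
stays at one per 4k block.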

-- 
: Lars Ellenberg
: LINBIT | Keeping the Digital World Running
: DRBD -- Heartbeat -- Corosync -- Pacemaker
: R&D, Integration, Ops, Consulting, Support

DRBD® and LINBIT® are registered trademarks of LINBIT

* Re: [Drbd-dev] Checksum based resync block size
  2019-06-24 15:49 ` Lars Ellenberg
@ 2019-06-26 19:20   ` Eric Wheeler
  2019-06-27 10:22     ` Robert Altnoeder
  0 siblings, 1 reply; 5+ messages in thread
From: Eric Wheeler @ 2019-06-26 19:20 UTC (permalink / raw)
  To: Lars Ellenberg; +Cc: drbd-dev

On Mon, 24 Jun 2019, Lars Ellenberg wrote:

> On Sat, Jun 22, 2019 at 12:03:55AM +0000, Eric Wheeler wrote:
> > Hello all,
> > 
> > Can someone help explain how checksum-based sync and verify are
> > implemented on the sender and receiver sides?  It looks like the hashes are
> > per-sector (judging from read_for_csum?), and I am interested in making the
> > csum chunk size configurable, or at least hacking in some test code to see
> > whether checksumming multiple sectors at once would improve performance.
> > 
> > I'm also trying to understand what iterates over the lldev and where the
> > csum is computed for each chunk of data.
> > 
> > Any direction would be helpful.  Thank you.
> 
> Since our in-sync/out-of-sync bitmap tracks 4k blocks,
> we want to compare 4k checksums.
> 
> Yes, that generates "a lot" of requests, and if these are not merged by
> some IO scheduler on the lower layers, that may seriously suck.
> 
> make_ov_request() is what generates the online-verify requests.
> 
> What we potentially could do is issue the requests to the backends in
> larger chunks (say, 1 MiB), then calculate and communicate the checksum
> for each 4k block, as well as the result.

What if it instead calculated a single checksum over each 1 MiB chunk
(configurable) and then invalidated all 4k bitmap entries in that 1 MiB
range if the hash mismatches?
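
A rough Python sketch of that proposal follows, under the assumption that
one digest covers a whole 1 MiB chunk and that a mismatch marks all 256
corresponding 4k bitmap entries. The function name mark_chunk_if_mismatch,
the list-of-bools bitmap, and SHA-256 are hypothetical stand-ins, not DRBD
interfaces.

```python
import hashlib

BLOCK_SIZE = 4096
CHUNK_SIZE = 1024 * 1024                       # assumed, would be configurable
BLOCKS_PER_CHUNK = CHUNK_SIZE // BLOCK_SIZE    # 256 bitmap entries per chunk


def mark_chunk_if_mismatch(bitmap, chunk_index, local_chunk, peer_digest):
    """Mark every 4k entry of the chunk out of sync if the chunk digests differ.

    Returns the number of bitmap entries that were marked.
    """
    if hashlib.sha256(local_chunk).digest() == peer_digest:
        return 0
    first = chunk_index * BLOCKS_PER_CHUNK
    for bit in range(first, first + BLOCKS_PER_CHUNK):
        bitmap[bit] = True
    return BLOCKS_PER_CHUNK


# A 4 MiB toy device; chunk 2 differs from the peer in a single byte.
bitmap = [False] * (4 * BLOCKS_PER_CHUNK)
peer_chunk = bytes(CHUNK_SIZE)
local_chunk = bytearray(CHUNK_SIZE)
local_chunk[0] = 1

marked = mark_chunk_if_mismatch(
    bitmap, 2, bytes(local_chunk), hashlib.sha256(peer_chunk).digest()
)
print(marked, sum(bitmap))  # 256 256
```

The trade-off is visible in the example: a single differing byte causes the
whole 1 MiB range to be resynced rather than one 4k block.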


--
Eric Wheeler


> _______________________________________________
> drbd-dev mailing list
> drbd-dev@lists.linbit.com
> http://lists.linbit.com/mailman/listinfo/drbd-dev
> 

* Re: [Drbd-dev] Checksum based resync block size
  2019-06-26 19:20   ` Eric Wheeler
@ 2019-06-27 10:22     ` Robert Altnoeder
  2019-06-27 17:59       ` Eric Wheeler
  0 siblings, 1 reply; 5+ messages in thread
From: Robert Altnoeder @ 2019-06-27 10:22 UTC (permalink / raw)
  To: drbd-dev

On 6/26/19 9:20 PM, Eric Wheeler wrote:
> On Mon, 24 Jun 2019, Lars Ellenberg wrote:
>
>> Since our in-sync/out-of-sync bitmap tracks 4k blocks,
>> we want to compare 4k checksums.
>>
>> Yes, that generates "a lot" of requests, and if these are not merged by
>> some IO scheduler on the lower layers, that may seriously suck.
>>
>> make_ov_request() is what generates the online-verify requests.
>>
>> What we potentially could do is issue the requests to the backends in
>> larger chunks (say, 1 MiB), then calculate and communicate the checksum
>> for each 4k block, as well as the result.
> What if it instead calculated a single checksum over each 1 MiB chunk
> (configurable) and then invalidated all 4k bitmap entries in that 1 MiB
> range if the hash mismatches?

Is your intention to reduce the number of packets with checksums that
are being sent, and/or the number of checksum comparisons for the same
amount of data?

Both could have a positive impact on performance, but the question is
whether the difference is big enough to be relevant. On the other hand,
hashing more data per checksum increases the chance of hash collisions.

br,
Robert


* Re: [Drbd-dev] Checksum based resync block size
  2019-06-27 10:22     ` Robert Altnoeder
@ 2019-06-27 17:59       ` Eric Wheeler
  0 siblings, 0 replies; 5+ messages in thread
From: Eric Wheeler @ 2019-06-27 17:59 UTC (permalink / raw)
  To: Robert Altnoeder; +Cc: drbd-dev

On Thu, 27 Jun 2019, Robert Altnoeder wrote:

> On 6/26/19 9:20 PM, Eric Wheeler wrote:
> > On Mon, 24 Jun 2019, Lars Ellenberg wrote:
> >
> >> Since our in-sync/out-of-sync bitmap tracks 4k blocks,
> >> we want to compare 4k checksums.
> >>
> >> Yes, that generates "a lot" of requests, and if these are not merged by
> >> some IO scheduler on the lower layers, that may seriously suck.
> >>
> >> make_ov_request() is what generates the online-verify requests.
> >>
> >> What we potentially could do is issue the requests to the backends in
> >> larger chunks (say, 1 MiB), then calculate and communicate the checksum
> >> for each 4k block, as well as the result.
>
> > What if it instead calculated a single checksum over each 1 MiB chunk
> > (configurable) and then invalidated all 4k bitmap entries in that 1 MiB
> > range if the hash mismatches?

This could also help resync by checksumming runs of contiguous dirty bitmap
entries (up to a chunk size limit) and resyncing each run as a whole instead
of one 4k block at a time.
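
As a sketch of the grouping step only, assuming the bitmap is available as a
sequence of per-4k dirty flags: the hypothetical helper dirty_runs below
collects contiguous dirty entries into (start, length) runs and splits any
run that exceeds an assumed chunk limit. Each run would then get one
checksum and, on mismatch, one resync request.

```python
def dirty_runs(bitmap, max_blocks):
    """Group contiguous dirty entries into (start, length) runs.

    Runs longer than max_blocks are split so that no run exceeds the limit.
    """
    runs = []
    start = None
    for i, dirty in enumerate(bitmap):
        if dirty:
            if start is None:
                start = i                        # a new run begins here
            elif i - start == max_blocks:
                runs.append((start, max_blocks))  # limit reached, split the run
                start = i
        elif start is not None:
            runs.append((start, i - start))       # a clean entry ends the run
            start = None
    if start is not None:
        runs.append((start, len(bitmap) - start))  # run extends to the end
    return runs


# Entries 1-3, 6-10, and 12 are dirty; the limit is 4 blocks per run.
example_bitmap = [0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0, 1]
runs = dirty_runs(example_bitmap, max_blocks=4)
print(runs)  # [(1, 3), (6, 4), (10, 1), (12, 1)]
```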
 
> Is your intention to reduce the number of packets with checksums that
> are being sent, and/or the number of checksum comparisons for the same
> amount of data?

Reduce the number of packets, but also, crypto transforms perform better 
on larger data chunks. You make another good point: fewer hash comparisons 
will help too.

> Both could have a positive impact on performance, but the question is
> whether the difference is big enough to be relevant. On the other hand,
> hashing more data per checksum increases the chance of hash collisions.

I'm not too concerned about hash collisions.  That might be a problem with
small CRC32-based sums, but assuming crypto hashes with 128-bit digests
(2^128 possible values), the birthday paradox gives us a likely collision
between some two hashes only after roughly 2^64 hashes; that is, between any
two hashes, not specifically the two being compared.  The probability of two
chosen hashes colliding is far lower than 1/(2^64): about 1/(2^128).
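
The numbers can be checked with a short Python calculation. It assumes an
idealized hash whose digests are uniformly random, and the 1 TiB device size
is a made-up example. The function names are hypothetical. The last function
uses a simple union bound: the chance of at least one false match is at most
the number of comparisons times the per-comparison probability.

```python
import math


def pair_collision_probability(bits):
    """Probability that two specific inputs share a digest (ideal hash)."""
    return 2.0 ** -bits


def birthday_collision_probability(n_hashes, bits):
    """Approximate probability that any two of n digests collide.

    Uses the standard approximation 1 - exp(-n(n-1) / (2 * 2^bits)).
    """
    space = 2.0 ** bits
    return -math.expm1(-n_hashes * (n_hashes - 1) / (2.0 * space))


def false_match_bound(n_comparisons, bits):
    """Union bound on at least one false match over n pairwise comparisons."""
    return n_comparisons * pair_collision_probability(bits)


# Any-pair collision among 2^64 digests of 128 bits: roughly 39%.
p_birthday = birthday_collision_probability(2 ** 64, 128)

# Verify pass over an assumed 1 TiB device: 4k blocks vs. 1 MiB chunks.
device_size = 2 ** 40
p_4k = false_match_bound(device_size // 4096, 128)         # 2^28 comparisons
p_1m = false_match_bound(device_size // (1 << 20), 128)    # 2^20 comparisons

print(f"{p_birthday:.4f} {p_4k:.3e} {p_1m:.3e}")
```

Under these assumptions, larger chunks mean fewer comparisons, so the bound
for 1 MiB chunks comes out smaller than for 4k blocks, not larger.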

-Eric

