[dm-crypt] Understanding of --integrity* parameters for LUKS-independent dm-integrity setup

* [dm-crypt] Understanding of --integrity* parameters for LUKS-independent dm-integrity setup
From: asmqb7 @ 2020-05-24 11:45 UTC (permalink / raw)
  To: dm-crypt

Hi,

I'm looking to identify efficient and correct reference parameters for
integritysetup that will reasonably defend against hardware failure,
generally considering scenarios that won't necessarily benefit from
the affordances (and overheads) of FDE.

Approaching the dm-integrity/integritysetup documentation with a
(presently) poor/untrustworthy fundamental understanding of
cryptography, I'm not currently able to conclusively evaluate the
significance and semantic impact of the choices available for the
--integrity, --integrity-key-file and --integrity-key-size parameters.
Instead of copy-pasting random commands I've found on the Internet
that just happen to work, I don't want to proceed until I
comprehensively (if not fundamentally) understand the options
available and what scenarios each might be appropriate for.

A first-principle comparison of all available crypto algorithms would
of course be beyond the scope of the integritysetup documentation, but
because of cryptography's highly context- and use-case-specific
nature, I still think that a storage-integrity-focused overview/guide,
even if slightly biased, would be extremely useful. Cohesive and
authoritative reference documentation may also effectively mitigate
superstitious, catastrophically underinformed bikeshedding, and for
this reason my queries err toward a pedantic level of detail.

*** HMAC-SHA* key configuration

- Tracing from integritysetup.c through dm-integrity.c to
hmac_setkey() in /crypto/hmac.c in the kernel reveals that the
keyfiles used by dm-integrity appear to be used as initialization
seeds for the HMAC function. My current working theory/assumption for
why the user is allowed/required to supply this data is that the
uniqueness and secrecy associated with a user-specifiable HMAC seed
holds defensive cryptographic value. Is this view correct, and are
there any non-secrecy-related reasons I might want to specify my own
HMAC seed value?

- What pathological issues might arise from providing HMAC seed values
of all-0x00, all-0xFF, 0x00..0x63, a repeating pattern, etc? Could
such highly-deterministic, low/zero-entropy keyfiles be considered
universally sane defaults, including for the purposes of automated
system installation, in scenarios where good data integrity is
strictly the only consideration? (I would be particularly partial to
all-0x00, because then I could do --integrity-key-file=/dev/zero.)

- Apparently correct key lengths are 32 for SHA256 (256/8) and 64 for
SHA512 (512/8). Do I understand correctly that longer keys will be
truncated, while shorter keys will be zero-padded (specifically
-suffixed)? (Newly-formatted, untouched volumes using HMAC-SHA512 with
each of 1-, 32-, 64- and 2048-byte all-'\0' keys all present the same
checksum, suggesting this is true.)

*** Integrity algorithm selection

I was curious if the "crc32c" and "hmac-sha256" options noted in the
manpage (as of May 2020) represented the full list of algorithms
accepted by the integritysetup --integrity parameter, so after loading
all modules in /lib/modules/.../crypto/ (using the stock 4.19.0 kernel
in Debian 10.4), I iterated over all entries presented in
/proc/crypto. In my case, the following algorithms resulted in
successful volume creation (omitting --integrity-key-{file,size}):
tgr128, tgr160, tgr192, wp256, wp384, wp512, rmd128, rmd160, rmd256,
rmd320, poly1305, md4, md5, sha1, sha224, sha384, sha256, sha512,
crct10dif, crc32, and crc32c.
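
The enumeration step described above can be sketched in Python as follows. This is only a minimal illustration: the function name list_hash_algorithms and the SAMPLE text are hypothetical, and the parser assumes the usual /proc/crypto layout of blank-line-separated records made of "key : value" lines. Appearing in this list does not mean integritysetup will accept an algorithm; that still has to be tried per algorithm, as was done for the list above.

```python
def list_hash_algorithms(proc_crypto_text):
    """Return sorted names of /proc/crypto entries of type shash or ahash."""
    names = set()
    fields = {}
    # The trailing empty string flushes the final record.
    for line in proc_crypto_text.splitlines() + [""]:
        if line.strip():
            key, sep, value = line.partition(":")
            if sep:
                fields[key.strip()] = value.strip()
            continue
        # Blank line: the current record is complete.
        if fields.get("type") in ("shash", "ahash") and "name" in fields:
            names.add(fields["name"])
        fields = {}
    return sorted(names)


# Illustrative sample only, not real output from any particular kernel.
SAMPLE = """\
name         : crc32c
driver       : crc32c-generic
type         : shash
digestsize   : 4

name         : cbc(aes)
driver       : cbc-aes-aesni
type         : skcipher

name         : sha256
driver       : sha256-generic
type         : shash
digestsize   : 32
"""

if __name__ == "__main__":
    print(list_hash_algorithms(SAMPLE))
```

On a real system the input would be the contents of /proc/crypto, for example list_hash_algorithms(open("/proc/crypto").read()).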

In the interests of disambiguation and thoroughness I offer an
**indeterminate/unverified** commentary on the algorithms my kernel
currently offers. Any critique/correction (NAK) where there is wrong
understanding, and agreement/consensus (ACK) where there is soundness,
will equally be appreciated.

- CRC32 (which is weaker than CRC32C) and "crct10dif" (T.10 DIF CRC16,
basically half of CRC32) are probably universally unsuitable;
- CRC32C may be useful where a low/rudimentary level of protection is
wanted in low-power/embedded contexts (where acceleration may be
unavailable), and maybe even for many general-purpose COTS/mainstream
setups;
- SHA-1 (vulnerable to both the SHAttered attack from 2017, and the
more recent chosen-prefix SHA-mbles attack from 2019), MD5
(widely-known to be vulnerable to collision attacks), and MD4
(significantly compromised since 1995) all suffer from significant
well-known vulnerabilities and attacks and ARE NOT cryptographically
sound, but may offer a stronger level of defense than CRC32C, with
SHA-1 a relative order of magnitude stronger than MD5 and MD4;
- "wp"/Whirlpool (as used in the original TrueCrypt), "rmd"/RIPEMD
(specifically the strengthened versions released in 1996), and
"tgr"/Tiger (which OpenPGP abandoned for RIPEMD-160) are (grossly)
variously cryptographically controversial ("uncertain") but not
concretely broken, may offer a stronger level of protection than
SHA-1, MD5, et al for the purposes of integrity preservation, and may
become subject to future catastrophic attacks that break their current
cryptographic guarantees;
- SHA-224 (the truncation of SHA-256) and SHA-384 (the truncation of
SHA-512) may offer adequate collision resistance despite limited
entropy in (admittedly hard to imagine) scenarios where there is
insufficient space for 32- or 64-byte hashes;
- HMAC-SHA1 may be more acceptable than it first appears, thanks to
the HMAC construction's resistance to the length attacks plain SHA* is
susceptible to (see next point), and also because HMAC does not demand
strong security guarantees
(https://security.stackexchange.com/questions/187866/why-aren-t-collisions-important-with-hmac,
although a comment mentions HMAC-MD4 is compromised);
- Plain SHA-256 and SHA-512 are probably acceptable for data integrity
protection, because the fixed-length interleaved blocks used by
dm-integrity render them immune to the length attacks SHA* is
vulnerable to (https://security.stackexchange.com/questions/79577/whats-the-difference-between-hmac-sha256key-data-and-sha256key-data);
- Poly1305 (as used for message authentication in TLS 1.3),
HMAC-SHA256 and HMAC-SHA512 are probably good sane defaults.

I also have some further general questions in addition to the above
purely cryptographically-focused analysis.

- How does Poly1305 generally compare with HMAC-SHA{1,256,512} within
an integrity-preserving checksumming context? Where/why might I pick
one over the other?
- In what real-world scenarios might I accept CRC32C? I've read a lot
of anecdata about its (poor) collision resistance, mostly consisting
of ambiguous (to me), handwavy/inconclusive field reports and
discussion like https://news.ycombinator.com/item?id=13853110 (root
article link: https://news.ycombinator.com/item?id=13851349)
- In what scenarios might I find plain SHA-* acceptable over HMAC-SHA*?
- Where might it be acceptable to choose HMAC-SHA1 over HMAC-SHA256 or
HMAC-SHA512?
- Am I correct in theorizing(/presuming) that dm-integrity's
fixed-length block structure does indeed make SHA-256 and SHA-512 safe
to use?
- What inputs exactly are provided to the hash functions? I found an
interesting HMAC-SHA512 vs Poly1305 comparison at
https://crypto.stackexchange.com/questions/56429/which-algorithm-has-better-performance-hmac-umac-and-poly1305,
which incidentally highlights the criticality of supplying correctly
unique inputs to Poly1305 to ensure secure output. I have no reason to
believe this detail was not well known during dm-integrity's design,
and that it was handled correctly; I just can't find any explicit
references in Authenticated and Resilient Disk Encryption (final.pdf).
(That paper does document in Table 4.5 on page 39 that "random"
integrity IVs are used for AEAD modes, but there is no specific
connection made to Poly1305, nor to HMAC-SHA* for that matter.)
- Very tangentially, could it be reasonable to propose "hmac-crc32c"
for data integrity protection? I found
https://news.ycombinator.com/item?id=16750767 which vaguely implies
such a construction would be silly, but the comment doesn't clarify
why. I also found
https://www.spinics.net/lists/linux-crypto/msg25086.html which notes
the removal of an apparently mistaken hmac-crc32 capability. I get the
impression this is an interesting but wrongheaded idea, but am yet
without explanation.
- Do any modes other than HMAC-SHA* require a keyfile? (I tried all
entries in /proc/crypto with a keyfile, and all failed except
"digest_null" (heh).)

*** Miscellaneous findings

integritysetup and dm-integrity currently report invalid algorithm
selections in a somewhat inscrutable way: integritysetup --debug
reports "device-mapper: reload ioctl on   failed: No such file or
directory" (with two spaces between "on" and "failed", as presented),
while dm-integrity squirrels "device-mapper: table: 254:0: integrity:
Invalid internal hash" or "device-mapper: table: 254:0: integrity:
Error setting internal hash key" into dmesg. I first ran into this
problem when experimenting with different integrity algorithms and
trying "aead" and "cmac-aes" after finding references to those
algorithms in integritysetup.c (I now realize these modes only make
sense for dm-crypt). I later found when checking all algorithm
possibilities listed in /proc/crypto that this was how invalid
algorithms were reported.

I wonder how "integrity-only" and "integrity-with-encryption" might be
distinguished in the code, so integritysetup can properly bail out if
asked to use an irrelevant mode. FWIW, my first instinct on seeing
that reference to a zero-length filename was to momentarily fear
something had been deleted upon cleanup, until I remembered I really
wasn't dealing with a flaky bash script :)

As a small aside, the fact that AEAD is not usable for integrity-only
contexts makes logical sense; "Authenticated and Resilient Disk
Encryption" (final.pdf) describes how AEAD implements both encryption
and integrity protection (I am yet to understand how it does so in a
length-preserving manner :) ) so obviously trying to use that
algorithm in an integrity-only context would fail. I observe that
searching the same PDF for "cmac-aes" returns 0 results, perhaps
because this mode was implemented after the paper was published.

*** References consulted before emailing, in order of descending relevance:

- https://www.kernel.org/doc/Documentation/device-mapper/dm-integrity.txt
- https://gitlab.com/cryptsetup/cryptsetup/-/wikis/DMIntegrity
- https://is.muni.cz/th/vesfr/final.pdf (this was very interesting to read)
- https://archive.fosdem.org/2018/schedule/event/cryptsetup/attachments/slides/2506/export/events/attachments/cryptsetup/slides/2506/fosdem18_cryptsetup_aead.pdf
- https://crypto.stackexchange.com/questions/56429/which-algorithm-has-better-performance-hmac-umac-and-poly1305
- https://securitypitfalls.wordpress.com/2018/05/08/raid-doesnt-work/
- https://gist.github.com/MawKKe/caa2bbf7edcc072129d73b61ae7815fb
- https://github.com/torvalds/linux/blob/master/drivers/md/dm-integrity.c
- https://wiki.gentoo.org/wiki/Device-mapper#Integrity (this is
currently terribly out of date)
- https://dm-devel.redhat.narkive.com/3zjEiVPz/dmitry-kasatkin-huawei-com
- https://security.stackexchange.com/questions/190670/luks2-dm-integrity
- https://en.wikipedia.org/wiki/HMAC (this is very inscrutable and
doesn't enlighten much...)
- https://security.stackexchange.com/questions/79577/whats-the-difference-between-hmac-sha256key-data-and-sha256key-data/79581
- https://security.stackexchange.com/questions/135936/finding-hash-collision
- https://github.com/torvalds/linux/blob/master/crypto/hmac.c
- https://github.com/mbroz/cryptsetup/blob/master/src/integritysetup.c
- https://github.com/mbroz/cryptsetup/blob/master/lib/integrity/integrity.c
- https://github.com/mbroz/cryptsetup/blob/master/lib/utils.c
- https://github.com/mbroz/cryptsetup/blob/master/lib/utils_crypt.c
- https://github.com/mbroz/cryptsetup/blob/master/src/utils_tools.c
- https://github.com/mbroz/cryptsetup/blob/master/lib/setup.c
- https://github.com/mbroz/cryptsetup/blob/master/lib/libcryptsetup.h
- https://github.com/torvalds/linux/blob/master/drivers/md/dm-crypt.c

***

Thanks very much for implementing the dm-integrity target, and for
introducing novel, universal silent data corruption detection to
Linux's block layer in a pluggable way and making it usable
independently of LUKS; it allows virtually all current and future
Linux filesystems, dm-raid and LVM configurations, and software that
works with block devices, to be informed of offline tampering and
hardware failure by simply checking for -EILSEQ. I look forward to the
day major distribution installers offer zero-effort opt-in to
integrity protection and enable mainstream, enterprise and embedded
Linux users everywhere to regularly contribute significantly to the
"storage media reports healthy but just returned wrong data"
statistics. Ideally, real and significant progress will be made in
this area within the next 10 years.

David Lindsay


* Re: [dm-crypt] Understanding of --integrity* parameters for LUKS-independent dm-integrity setup
From: Milan Broz @ 2020-05-31 11:18 UTC (permalink / raw)
  To: asmqb7, dm-crypt

Hi,

I'll try to answer some questions here, but I should probably
write a blog post about it later.

The whole idea of dm-integrity was to show that we NEED data integrity
protection. Waiting for a superior solution in the fs does not work.
It has limits - but unlike many solutions proposed in academic papers,
it is in the mainline kernel.

But my combined academic/practical planned approach was one thing,
and the reality is slightly different now.

The basic idea was that we have two uses for dm-integrity:

- authenticated sector encryption (but without sector replay protection)
  (dm-integrity just provides additional space for sector authentication tag
  processed in dm-crypt above it; => cryptsetup/LUKS2 controls it)

- non-cryptographic data integrity protection (dm-integrity standalone,
  data checksums are calculated in dm-integrity; => integritysetup controls it)

The reality is that dm-integrity itself can use its own cryptography, such as
keyed hash checksums (HMAC), although in the beginning this was not an intended use
(Mikulas, as the author of this part, convinced me and we support it in integritysetup).

So my suggestion is - if you do not need encryption (confidentiality protection),
and all you need is to check for hw/random failures, just use dm-integrity/integritysetup
with some fast checksum - default is CRC32C and it works quite fine.

Do not mess with slow hashes or even HMAC etc until you have some valid threat model
where it helps to protect <something>.

On 24/05/2020 13:45, asmqb7 wrote:
> Approaching the dm-integrity/integritysetup documentation with a
> (presently) poor/untrustworthy fundamental understanding of
> cryptography, I'm not currently able to conclusively evaluate the
> significance and semantic impact of the choices available for the
> --integrity, --integrity-key-file and --integrity-key-size parameters.
> Instead of copy-pasting random commands I've found on the Internet
> that just happen to work, I don't want to proceed until I
> comprehensively (if not fundamentally) understand the options
> available and what scenarios each might be appropriate for.

As said above, if you want only detection of random data corruption,
do not use these. Just use a non-cryptographic checksum like crc32.

...
> *** HMAC-SHA* key configuration
> 
> - Tracing from integritysetup.c through dm-integrity.c to
> hmac_setkey() in /crypto/hmac.c in the kernel reveals that the
> keyfiles used by dm-integrity appear to be used as initialization
> seeds for the HMAC function. My current working theory/assumption for
> why the user is allowed/required to supply this data is that the
> uniqueness and secrecy associated with a user-specifiable HMAC seed
> holds defensive cryptographic value. Is this view correct, and are
> there any non-secrecy-related reasons I might want to specify my own
> HMAC seed value?

It is just a keyed hash, so nobody without the key can calculate or change
the data integrity tag. Think of it as your secret key; it must be protected.
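
To illustrate the difference between a plain and a keyed tag, here is a minimal Python sketch using the standard hashlib and hmac modules. It is a simplification under stated assumptions: only the raw sector bytes are hashed, the key values are placeholders, and the helper names are hypothetical. What dm-integrity actually feeds into the hash is not reproduced here.

```python
import hashlib
import hmac


def plain_tag(sector):
    """Unkeyed tag: anyone who can read the data can recompute it."""
    return hashlib.sha256(sector).digest()


def keyed_tag(key, sector):
    """Keyed tag (HMAC-SHA256): recomputing it requires the secret key."""
    return hmac.new(key, sector, hashlib.sha256).digest()


def verify_plain(sector, tag):
    return hmac.compare_digest(plain_tag(sector), tag)


def verify_keyed(key, sector, tag):
    return hmac.compare_digest(keyed_tag(key, sector), tag)


secret_key = b"k" * 32        # placeholder for the real, protected key
attacker_key = b"\x00" * 32   # an attacker's guess, not the real key
original = b"A" * 512
tampered = b"B" * 512

# With a plain hash the attacker rewrites the data and the tag together.
forged_plain_accepted = verify_plain(tampered, plain_tag(tampered))

# With HMAC the attacker cannot produce a tag that verifies under the real key.
forged_keyed_accepted = verify_keyed(
    secret_key, tampered, keyed_tag(attacker_key, tampered)
)

if __name__ == "__main__":
    print("forged plain tag accepted:", forged_plain_accepted)
    print("forged keyed tag accepted:", forged_keyed_accepted)
```

Both variants detect random corruption equally well in this sketch; the key only matters when someone deliberately rewrites data and tag together.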

> - What pathological issues might arise from providing HMAC seed values
> of all-0x00, all-0xFF, 0x00..0x63, a repeating pattern, etc? Could
> such highly-deterministic, low/zero-entropy keyfiles be considered
> universally sane defaults, including for the purposes of automated
> system installation, in scenarios where good data integrity is
> strictly the only consideration? (I would be particularly partial to
> all-0x00, because then I could do --integrity-key-file=/dev/zero.)

Do not do that. If you do not need HMAC, do not use it at all.

In this case just use plain hash (--integrity sha256 for example; note
you need to set it in each open command - this is a misconception though).
But as said above, crc32 is usually ok, it takes
only 4 additional bytes and it is faster.
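
As a rough illustration of the space cost, the following Python sketch computes the tag size as a percentage of the sector size. It is deliberately simplified: the journal, the superblock and any alignment of the metadata area are ignored, so real on-disk overhead will be higher. The table and function names are hypothetical; the tag sizes are just the digest sizes of the named algorithms.

```python
# Digest sizes in bytes for a few algorithms mentioned in this thread.
TAG_BYTES = {"crc32c": 4, "sha1": 20, "sha256": 32, "sha512": 64}


def tag_overhead_percent(tag_bytes, sector_bytes=512):
    """Tag size as a percentage of the data sector size (tags only)."""
    return 100.0 * tag_bytes / sector_bytes


if __name__ == "__main__":
    for name, size in TAG_BYTES.items():
        for sector in (512, 4096):
            pct = tag_overhead_percent(size, sector)
            print(f"{name:8s} {sector:5d}-byte sectors: {pct:.3f}%")
```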

It really depends on what you are trying to protect - if all you need is to
detect random bit/byte changes, using cryptographic hashes is overkill.

> - Apparently correct key lengths are 32 for SHA256 (256/8) and 64 for
> SHA512 (512/8). Do I understand correctly that longer keys will be
> truncated, while shorter keys will be zero-padded (specifically
> -suffixed)?

In integritysetup, you can set the key size as you want; HMAC padding is done
by the crypto API inside the kernel. See the kernel crypto API documentation
(it is basically RFC 2104 padding).
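
The RFC 2104 key handling can be observed with Python's standard hmac module, which follows the same rules: a key shorter than the hash block size is zero-padded on the right, and a key longer than the block size is first hashed. The sketch below is only an illustration of those rules in user space, with a made-up message; it does not model how many bytes integritysetup reads from a key file, so it does not by itself explain the 2048-byte observation quoted above.

```python
import hashlib
import hmac


def hmac_sha512_hex(key, message):
    return hmac.new(key, message, hashlib.sha512).hexdigest()


message = b"example sector contents"            # arbitrary test input
block_size = hashlib.sha512().block_size        # 128 bytes for SHA-512

# All-zero keys up to the block size are indistinguishable after padding.
short_key_tags = {
    n: hmac_sha512_hex(b"\x00" * n, message) for n in (1, 32, 64, block_size)
}
short_keys_equivalent = len(set(short_key_tags.values())) == 1

# A key longer than the block size is replaced by its SHA-512 digest,
# which is not all zeros, so the resulting tag differs.
long_key_tag = hmac_sha512_hex(b"\x00" * 2048, message)
long_key_differs = long_key_tag != short_key_tags[1]

if __name__ == "__main__":
    print("zero keys of 1..128 bytes equivalent:", short_keys_equivalent)
    print("2048-byte zero key gives a different tag:", long_key_differs)
```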

> *** Integrity algorithm selection
> 
> I was curious if the "crc32c" and "hmac-sha256" options noted in the
> manpage (as of May 2020) represented the full list of algorithms
> accepted by the integritysetup

For algorithms, integritysetup should accept all algorithms available
in the running kernel. There is actually no dynamic listing of all variants.

(Note, for LUKS2 we hardcode and allow only some algorithms, but integritysetup
is just direct interface to dm-integrity/crypto API.)

> - CRC32 (which is weaker than CRC32C) and "crct10dif" (T.10 DIF CRC16,
> basically half of CRC32) are probably universally unsuitable;

Why? CRC32C just uses a different polynomial IIRC, and we set it as the default
because it is hw accelerated on Intel CPUs.

crct10dif is meant for HW that allows native data integrity fields
(drives with 512 + 8 byte sectors). Do not use it outside of this context
(even though it probably works).
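
The polynomial difference can be shown with a small bitwise Python sketch. The constants are the reflected forms of the standard CRC-32 (IEEE 802.3) and CRC-32C (Castagnoli) polynomials; the function name is hypothetical, and this slow loop is for illustration only, not a model of the accelerated kernel implementation.

```python
import zlib

CRC32_POLY = 0xEDB88320    # reflected form of 0x04C11DB7 (IEEE 802.3)
CRC32C_POLY = 0x82F63B78   # reflected form of 0x1EDC6F41 (Castagnoli)


def crc32_bitwise(data, poly):
    """Reflected CRC-32 with initial value and final XOR of 0xFFFFFFFF."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # Shift out one bit; apply the polynomial if that bit was set.
            crc = (crc >> 1) ^ (poly if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF


if __name__ == "__main__":
    data = b"123456789"
    print(hex(crc32_bitwise(data, CRC32_POLY)))    # same algorithm as zlib.crc32
    print(hex(crc32_bitwise(data, CRC32C_POLY)))
```

Both variants produce a 4-byte tag; only the polynomial constant changes.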

Again, I think you need to define what you want to protect (and against what/whom),
and also avoid over-engineering it.

If you really need a cryptographic hash, I would say stay with SHA256.
But this is for a different discussion. (SHA1 could be disabled later,
many others are terribly slow, etc.)

If not keyed, collisions do not matter (attacker can recalculate it anyway).

[I'll stop here, nobody reads so long mails :]

Anyway, we will check the error message reports later, there should be more sensible
checking and error reporting than just syslog messages.
Thanks for these comments!

Milan
