From: Arno Wagner <arno@wagner.name>
To: dm-crypt@saout.de
Subject: Re: [dm-crypt] LUKS header recovery attempt from apparently healthy SSD
Date: Sat, 22 Apr 2017 02:25:48 +0200
Message-ID: <20170422002548.GA23882@tansi.org>
In-Reply-To: <f11f16b9-bbcd-4484-bc4b-403d25dc00b5@depressiverobots.com>
Hi Protagonist,
this is an impressive analysis and I basically agree with
all of it.
Personally, I strongly suspect your option "I". This design
is 5 years old and uses MLC flash. MLC requires the firmware
to do regular scanning, error correction and rewrites in
order to be reliable. 5 years ago the state of the firmware
for that was more "experimental" than "stable".
For example, I have one old SSD from back then (OCZ trash)
that has a silent single-bit error in, on average, one out of
5 full reads. If such a bit error happens during scrubbing,
garbage collection or regular writes to part of an internal
(very large) sector, parts of the LUKS header may get rewritten
with a permanent bit error, even if the LUKS header itself was
never written from outside at all.
Such corruption can of course also be due to a failing SSD
controller, bad RAM in the SSD, bus problems, etc. In
particular, single-bit errors in an MLC design will not
result from corrupted FLASH, but from other problems.
Now, are there any recovery options?
Assume 1 bit has been corrupted in a random place.
A key-slot is 256kB, i.e. about 2 Mbit. That means trying
every possibility (flip one bit, do an unlock attempt, at
roughly 1 second per attempt) would take about 2 million
seconds on the original PC, i.e. about 23 days.
This can maybe be brought down by a factor of 5 or so
with the fastest available CPU (the iteration count of
150k is pretty low), i.e. still roughly 5 days.
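The estimate above can be reproduced from the luksDump values quoted further down (512-bit master key, 4000 AF stripes, 512-byte sectors). A small sketch; the rates of about 1 second per unlock attempt on the original machine and 0.2 seconds on a 5x faster CPU are assumptions, not measurements:

```python
# Back-of-the-envelope cost of a single-bit-flip search over one LUKS1 keyslot.
# Header values come from the luksDump output; the attempt rates are assumptions.

SECTOR_SIZE = 512          # bytes
MK_BYTES = 512 // 8        # "MK bits: 512"
AF_STRIPES = 4000          # "AF stripes: 4000"
SECONDS_PER_DAY = 86_400


def keyslot_material_bytes(key_bytes: int, stripes: int, sector: int = SECTOR_SIZE) -> int:
    """Size of the AF-split key material, rounded up to whole sectors."""
    raw = key_bytes * stripes
    return -(-raw // sector) * sector  # ceiling division


def search_days(candidates: int, seconds_per_attempt: float) -> float:
    """Worst-case wall-clock days to try every candidate once."""
    return candidates * seconds_per_attempt / SECONDS_PER_DAY


material = keyslot_material_bytes(MK_BYTES, AF_STRIPES)
candidates = material * 8                        # one candidate per flippable bit

print(material, candidates)                      # 256000 2048000
print(round(search_days(candidates, 1.0), 1))    # assumed ~1 s/attempt: 23.7
print(round(search_days(candidates, 0.2), 1))    # assumed 5x faster CPU: 4.7
```

The 256,000-byte size agrees with the keyslot_checker range quoted below: 0x07e800 - 0x040000 = 0x3e800 = 256,000 bytes.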
This may be worth giving it a try, but it requires some
serious coding with libcryptsetup and it will only
help on a single bit-error.
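The search loop itself is simple; the serious coding lies in the unlock check. A minimal sketch in which `check` is a hypothetical stand-in for a real unlock attempt (for example one built on libcryptsetup); nothing here calls libcryptsetup, and the function names are illustrative:

```python
from typing import Callable, Iterator, Optional, Tuple


def single_bit_flips(data: bytes) -> Iterator[Tuple[int, int, bytes]]:
    """Yield (byte_index, bit_index, candidate) for every single-bit flip of data."""
    buf = bytearray(data)
    for byte_index in range(len(buf)):
        for bit in range(8):
            buf[byte_index] ^= 1 << bit        # flip one bit
            yield byte_index, bit, bytes(buf)
            buf[byte_index] ^= 1 << bit        # restore before the next flip


def find_single_bit_error(
    keyslot: bytes, check: Callable[[bytes], bool]
) -> Optional[Tuple[int, int]]:
    """Return the (byte, bit) whose flip makes check() succeed, or None."""
    for byte_index, bit, candidate in single_bit_flips(keyslot):
        if check(candidate):
            return byte_index, bit
    return None


# Toy usage: the "check" here just compares against known-good data.
good = bytes(range(32))
bad = bytearray(good)
bad[5] ^= 1 << 3
print(find_single_bit_error(bytes(bad), lambda c: c == good))  # (5, 3)
```

Copying the buffer for every candidate is wasteful, but with a real unlock check costing on the order of a second per attempt, that overhead does not matter.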
It may of course be a more complex error, especially
when ECC in the disk has corrected an error to the
wrong value, because the original was too corrupted.
A sane design prevents this by using a second,
independent checksum on the ECC result, but as I said,
5 years ago SSD design was pretty experimental and
beginner's mistakes were made.
The keyslot checker is no help here; it is intended
to find gross localized corruption, for example a
new MBR being written right into a keyslot. Checksums
on LUKS level were not implemented because they are
not really needed, as classical HDDs are very good at
detecting read errors. Unless you go to ZFS or the like,
filesystems do not do this either, for the same reason.
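For illustration, the idea behind such a checker can be sketched as a per-sector entropy scan: sectors of keyslot material should look random, while an overwritten MBR or a run of zeros does not. This is a simplified sketch, not the actual keyslot_checker code; scaling the byte entropy by 8 bits so that a threshold like 0.9 applies is an assumption about how that tool normalizes its value:

```python
import math
from collections import Counter
from typing import List


def normalized_entropy(block: bytes) -> float:
    """Shannon entropy of the byte histogram, scaled to 0..1 (8 bits = 1.0)."""
    if not block:
        return 0.0
    n = len(block)
    h = 0.0
    for count in Counter(block).values():
        p = count / n
        h -= p * math.log2(p)
    return h / 8.0


def low_entropy_sectors(data: bytes, threshold: float = 0.9, sector: int = 512) -> List[int]:
    """Indices of sectors whose normalized entropy falls below threshold."""
    return [
        i // sector
        for i in range(0, len(data), sector)
        if normalized_entropy(data[i:i + sector]) < threshold
    ]


# One perfectly uniform sector followed by one all-zero sector.
print(low_entropy_sectors(bytes(range(256)) * 2 + bytes(512)))  # [1]
```

With only 512 samples per sector, even truly random data scores noticeably below 1.0 in this estimate, which is presumably why thresholds around 0.9 are used. A single flipped bit changes the score by almost nothing, so this kind of scan cannot see it.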
There is one global "checksum" in LUKS though, exactly
the one that now tells you that there is no matching
keyslot; with a passphrase known to be good, that means
the keyslot is corrupted.
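In LUKS1 that check is the master-key digest: a candidate master key is run through PBKDF2 (with the header's hash spec) using the MK salt and MK iterations from the header, and the first 20 bytes are compared against the stored MK digest. A sketch using Python's hashlib; the function name is illustrative, and in a real recovery the salt, iteration count and digest would be the luksDump values quoted below:

```python
import hashlib
import hmac


def mk_digest_matches(candidate_mk: bytes, mk_salt: bytes, mk_iterations: int,
                      mk_digest: bytes, hash_spec: str = "sha1") -> bool:
    """LUKS1-style master-key check: PBKDF2(candidate, salt, iters) vs stored digest."""
    derived = hashlib.pbkdf2_hmac(hash_spec, candidate_mk, mk_salt,
                                  mk_iterations, dklen=len(mk_digest))
    return hmac.compare_digest(derived, mk_digest)


# Toy values with a low iteration count (the real header uses 35125).
mk = bytes(64)
salt = bytes(range(32))
digest = hashlib.pbkdf2_hmac("sha1", mk, salt, 1000, dklen=20)
print(mk_digest_matches(mk, salt, 1000, digest))                  # True
print(mk_digest_matches(b"\x01" + mk[1:], salt, 1000, digest))    # False
```

Because the digest is a salted PBKDF2 output, it only says "match" or "no match"; it gives no hint where a corrupted bit might be.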
My take is that apart from making absolutely sure
the passphrase is correct (it sounds very much like it
is, though) and running the manufacturer's diagnostic
tools on the SSD, there is not much more you can do.
Regards,
Arno
On Fri, Apr 21, 2017 at 16:26:30 CEST, protagonist wrote:
> Hello all,
> someone found his way into our local hackerspace looking for help and
> advice with recovering his OS partition from a LUKS-encrypted INTEL SSD
> (SSDSC2CT240A4), and I've decided to get onto the case. Obviously,
> there is no backup, and he's aware of the consequences of this basic
> mistake by now.
>
> The disk refused to unlock on boot in the original machine from one day
> to the next. Attempts to open it from several other machines with
> different versions of Ubuntu/Debian, including Debian Stretch with a
> recent version of cryptsetup, have been completely unsuccessful,
> indicating an MK digest mismatch and therefore "wrong password". The
> password is fairly simple, contains no special or locale-sensitive
> characters and had been written down. Therefore I assume it is known
> correctly and the header must be partially faulty.
>
> After reading the header specification, the FAQs, relevant recovery
> threads on here as well as going through the header with a hex editor
> and deducing some of its contents by hand, it is obvious to me that
> losing any significant portion (more than a few bytes) of the relevant
> LUKS header sections, either the critical parts of the meta-area or the
> actual key slot, would make the device contents provably irrecoverable,
> as even brute forcing becomes exponentially hard with the number of
> missing pseudo-randomly distributed bits.
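The growth is easy to quantify for flipped bits at unknown positions inside the 256,000-byte (2,048,000-bit) key material of keyslot 1: k flipped bits give C(n, k) candidates. A short sketch; the rate of roughly one unlock attempt per second is an assumption:

```python
import math

KEY_MATERIAL_BITS = 64 * 4000 * 8   # 512-bit MK x 4000 AF stripes, in bits
SECONDS_PER_YEAR = 365 * 86_400


def candidates_for_k_flips(n_bits: int, k: int) -> int:
    """Number of ways to place k flipped bits among n_bits positions."""
    return math.comb(n_bits, k)


for k in (1, 2, 3):
    c = candidates_for_k_flips(KEY_MATERIAL_BITS, k)
    # assumes ~1 unlock attempt per second
    print(k, c, f"{c / SECONDS_PER_YEAR:.3g} years")
```

One flipped bit is a matter of weeks at that rate; two flipped bits already push the search into tens of thousands of years.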
>
> Normally, one would move directly to grief stage number five -
> "Acceptance" - if the storage device in question was known to have data
> loss.
>
> However, upon closer inspection, I can detect no obvious signs of
> multiple-byte data loss. There had been no intentional changes to the
> LUKS header, linux system upgrade or any other (known) relevant event to
> the system between it booting one day and refusing to unlock the day
> after. I realize that, for *some* reason related to anti-forensics,
> the LUKS header specification contains no checksum over the actual raw
> byte fields at all, making it very hard to detect minor defects in the
> header or to pinpoint their location.
>
> Looking for major defects with the keyslot_checker reveals no obvious
> problems:
>
> parameters (commandline and LUKS header):
> sector size: 512
> threshold: 0.900000
>
> - processing keyslot 0: keyslot not in use
> - processing keyslot 1: start: 0x040000 end: 0x07e800
> - processing keyslot 2: keyslot not in use
> - processing keyslot 3: keyslot not in use
> - processing keyslot 4: keyslot not in use
> - processing keyslot 5: keyslot not in use
> - processing keyslot 6: keyslot not in use
> - processing keyslot 7: keyslot not in use
>
> this is also the case if we raise the entropy threshold to -t 0.935:
>
> parameters (commandline and LUKS header):
> sector size: 512
> threshold: 0.935000
>
> - processing keyslot 0: keyslot not in use
> - processing keyslot 1: start: 0x040000 end: 0x07e800
> - processing keyslot 2: keyslot not in use
> [...]
>
> Going through the sectors reported with -v at a higher -t value, I'm
> unable to find any suspicious groupings, for example unusual numbers of
> 00 00 or FF FF. Multi-byte substitution with a non-randomized pattern
> seems unlikely.
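Scanning a dump of the keyslot area for such groupings by hand is tedious. A sketch that lists runs of 0x00 or 0xFF bytes at least `min_len` long; the function name is hypothetical and the `min_len` default of 4 is an arbitrary choice:

```python
from typing import Iterable, List, Tuple


def find_fill_runs(data: bytes, min_len: int = 4,
                   fill_values: Iterable[int] = (0x00, 0xFF)) -> List[Tuple[int, int, int]]:
    """Return (offset, length, value) for each run of a fill byte >= min_len."""
    fills = set(fill_values)
    runs = []
    i = 0
    while i < len(data):
        value = data[i]
        j = i
        while j < len(data) and data[j] == value:
            j += 1                      # extend the run of identical bytes
        if value in fills and j - i >= min_len:
            runs.append((i, j - i, value))
        i = j                           # continue after this run
    return runs


sample = b"\x01\x02" + b"\x00" * 6 + b"\x03" + b"\xff" * 4
print(find_fill_runs(sample))  # [(2, 6, 0), (9, 4, 255)]
```

For keyslot 1 the input would be the 256,000 bytes starting at offset 0x040000 of a raw copy of the partition.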
>
> ------------------
>
> The luksDump header information looks sane as well. The encryption had
> been created by the Mint 17.1 installation in the second half of 2014 on
> a fairly weak laptop and its password later changed to a better one,
> which accounts for the use of keyslot #1 and fairly low iteration counts.
>
> LUKS header information for /dev/sda5
>
> Version:        1
> Cipher name:    aes
> Cipher mode:    xts-plain64
> Hash spec:      sha1
> Payload offset: 4096
> MK bits:        512
> MK digest:      ff 5c 64 48 bc 1f b2 f2 66 23 d3 66 38 41 c9 60 8a 7e de 0a
> MK salt:        04 e3 04 8c 51 fd 07 ee d1 f3 4a 5e c1 8c b9 88
>                 ab 0d cf dc 55 7c fa bc ca 1a b7 02 5a 55 ac 2c
> MK iterations:  35125
> UUID:           24e05704-f8ed-4391-9a3d-a59330a919d2
>
> Key Slot 0: DISABLED
> Key Slot 1: ENABLED
>         Iterations:             144306
>         Salt:                   b8 6f 20 a7 fe 8b 6a 9a 21 58 92 13 ce 1a 43 12
>                                 9c 4e a0 bf 7c 51 5e a1 78 47 05 ca b6 32 da a4
>         Key material offset:    512
>         AF stripes:             4000
> Key Slot 2: DISABLED
> Key Slot 3: DISABLED
> Key Slot 4: DISABLED
> Key Slot 5: DISABLED
> Key Slot 6: DISABLED
> Key Slot 7: DISABLED
>
> The disabled key slot #0 salt is correctly filled up with nulls, making
> it unusable for any recovery attempt. All magic bytes of the key slots,
> including 2 to 7 look good. The uuid is "version: 4 (random data based)"
> according to uuid -d output and therefore not of much help.
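The fields shown by luksDump can also be decoded straight from a raw copy of the first sectors, which helps when cross-checking with a hex editor. A sketch following the LUKS1 on-disk format as I understand it (big-endian, a 208-byte fixed part followed by eight 48-byte key slot records; 0x00AC71F3 marks an enabled slot, 0x0000DEAD a disabled one). The function and field names are illustrative:

```python
import struct
from typing import Any, Dict

LUKS_MAGIC = b"LUKS\xba\xbe"
PHDR = struct.Struct(">6sH32s32s32sII20s32sI40s")   # 208 bytes
KEYSLOT = struct.Struct(">II32sII")                  # 48 bytes each
SLOT_ENABLED, SLOT_DISABLED = 0x00AC71F3, 0x0000DEAD


def _cstr(raw: bytes) -> str:
    """Decode a NUL-padded fixed-width string field."""
    return raw.split(b"\0", 1)[0].decode("ascii", "replace")


def parse_luks1_header(buf: bytes) -> Dict[str, Any]:
    """Decode a LUKS1 header from the first 592 bytes of a device image."""
    (magic, version, cipher, mode, hash_spec, payload_offset, key_bytes,
     mk_digest, mk_salt, mk_iters, uuid) = PHDR.unpack_from(buf, 0)
    if magic != LUKS_MAGIC:
        raise ValueError("not a LUKS header")
    slots = []
    for i in range(8):
        active, iters, salt, km_offset, stripes = KEYSLOT.unpack_from(
            buf, PHDR.size + i * KEYSLOT.size)
        slots.append({
            "active": active, "iterations": iters, "salt": salt,
            "key_material_offset": km_offset, "stripes": stripes,
            # flag anything that is neither of the two valid markers
            "marker_ok": active in (SLOT_ENABLED, SLOT_DISABLED),
        })
    return {
        "version": version, "cipher": _cstr(cipher), "mode": _cstr(mode),
        "hash": _cstr(hash_spec), "payload_offset": payload_offset,
        "mk_bits": key_bytes * 8, "mk_digest": mk_digest, "mk_salt": mk_salt,
        "mk_iterations": mk_iters, "uuid": _cstr(uuid), "slots": slots,
    }
```

Usage would be along the lines of reading the first 592 bytes of a copy of /dev/sda5 (taken with dd, for example) and passing them to `parse_luks1_header`; the result should reproduce the luksDump values above exactly.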
> ------------------
>
> smartctl indicates fairly standard use for a 240GB desktop SSD, with
> about 3.7TB written at 2650h runtime, 1 reallocated sector and 0
> "Reported Uncorrectable Errors". The firmware version 335u seems to be
> the latest available, from what I've read. Self-tests run with
> "-t short", "-t offline" and "-t long" show no errors:
> Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
> # 1  Extended offline    Completed without error       00%      2648         -
> # 2  Offline             Completed without error       00%      2646         -
> # 3  Short offline       Completed without error       00%      2572         -
> The device also shows no behaviour during idle or read operations that
> would hint at physical problems.
>
> Checksumming the 240GB of data read blockwise from the device by dd with
> sha512sum led to identical results on three runs, so the device isn't
> mixing up sectors or returning different content each time we ask for
> data.
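The repeated-read test can be scripted as well. A sketch that hashes a device or image file in fixed-size blocks several times and reports whether all passes agree; the block size and pass count are placeholders, and reading a raw device such as /dev/sda needs root:

```python
import hashlib
from typing import List


def sha512_of(path: str, block_size: int = 1 << 20) -> str:
    """SHA-512 of a file or block device, read sequentially in block_size chunks."""
    h = hashlib.sha512()
    with open(path, "rb") as f:
        while True:
            chunk = f.read(block_size)
            if not chunk:
                break
            h.update(chunk)
    return h.hexdigest()


def repeated_reads_agree(path: str, passes: int = 3) -> bool:
    """Hash the same source several times; True if every pass gives the same digest."""
    digests: List[str] = [sha512_of(path) for _ in range(passes)]
    return len(set(digests)) == 1
```

On Linux, later passes may be served from the page cache instead of being re-read from the SSD. For a source much larger than RAM, like the full 240GB here, that matters little; for a small region such as the header, caches should be dropped (or direct I/O used) between passes.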
>
> All in all, the failure mode is still a mystery to me. I can think of
> mainly three explanations:
>
> I. silent data corruption events that have gone undetected by the
> SSD-internal sector-wide checksumming, namely bit/byte level changes on
> * MK salt / digest
> * key slot #1 iterations count / salt
> * key slot #1 AF stripe data
>
> II. actual passphrase mistakes
> * "constant" mistake or layout mismatch
> This seems quite unlikely, as none of the characters change between a US
> layout and the DE layout that was used. There are also no characters
> that can be easily confused such as O/0.
>
> III. some failure I've overlooked, like an OS-level bug or devilish
> malware causing "intentional" writes to the first 2M of the drive.
>
> Failure case #I is still the most likely, but from my understanding, a
> four-digit number of system boots and associated read events over the
> lifetime of the header shouldn't be able to cause any kind of flash
> wear-out, let alone silent data corruption, unless the firmware is broken
> in a subtle way. Assuming it is - what to do besides bruteforcing the
> AF section for bit flips?
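On the question of bruteforcing the AF section, it helps to see how LUKS1 turns the 4000 stripes back into the master key. A sketch of the AF-merge step as I understand it from the LUKS1 specification and cryptsetup's af.c: XOR each stripe into a buffer and diffuse the buffer with the header's hash, except after the last stripe. It assumes the key material has already been decrypted (aes-xts-plain64 with the passphrase-derived key), which this sketch leaves out; `af_split` is included only to test the round trip:

```python
import hashlib
import os


def _diffuse(buf: bytes, hash_name: str) -> bytes:
    """LUKS1 diffusion: hash each digest-sized block prefixed by its big-endian index."""
    digest_size = hashlib.new(hash_name).digest_size
    out = bytearray()
    for index, start in enumerate(range(0, len(buf), digest_size)):
        block = buf[start:start + digest_size]
        h = hashlib.new(hash_name)
        h.update(index.to_bytes(4, "big"))
        h.update(block)
        out += h.digest()[:len(block)]   # last block may be shorter
    return bytes(out)


def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def af_merge(material: bytes, key_bytes: int, stripes: int, hash_name: str = "sha1") -> bytes:
    """Recombine decrypted AF-split material into the master key."""
    d = bytes(key_bytes)
    for i in range(stripes - 1):
        d = _diffuse(_xor(d, material[i * key_bytes:(i + 1) * key_bytes]), hash_name)
    last = material[(stripes - 1) * key_bytes:stripes * key_bytes]
    return _xor(d, last)


def af_split(key: bytes, stripes: int, hash_name: str = "sha1") -> bytes:
    """Inverse of af_merge with random filler stripes (used here only for testing)."""
    n = len(key)
    parts, d = [], bytes(n)
    for _ in range(stripes - 1):
        s = os.urandom(n)
        parts.append(s)
        d = _diffuse(_xor(d, s), hash_name)
    parts.append(_xor(d, key))
    return b"".join(parts)
```

With the sha1 hash spec and 4000 stripes from this header, a flipped bit in the last stripe changes exactly one master-key bit, while a flip in any earlier stripe is spread across the whole key by the diffusion step; either way the MK digest check fails.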
>
> I would be delighted about any advice or idea for further tests to
> narrow down whatever happened to this header.
> Regards,
> protagonist
> _______________________________________________
> dm-crypt mailing list
> dm-crypt@saout.de
> http://www.saout.de/mailman/listinfo/dm-crypt
--
Arno Wagner, Dr. sc. techn., Dipl. Inform., Email: arno@wagner.name
GnuPG: ID: CB5D9718 FP: 12D6 C03B 1B30 33BB 13CF B774 E35C 5FA1 CB5D 9718
----
A good decision is based on knowledge and not on numbers. -- Plato
If it's in the news, don't worry about it. The very definition of
"news" is "something that hardly ever happens." -- Bruce Schneier