linux-btrfs.vger.kernel.org archive mirror
* bad key ordering - repairable?
@ 2018-01-22 21:06 Claes Fransson
  2018-01-22 21:22 ` Hugo Mills
  2018-01-23  2:35 ` Chris Murphy
  0 siblings, 2 replies; 17+ messages in thread
From: Claes Fransson @ 2018-01-22 21:06 UTC (permalink / raw)
  To: linux-btrfs

Hi!

I really like the features of BTRFS, especially deduplication,
snapshotting and checksumming. However, while using it on my laptop over
the last couple of years, it has become corrupted a lot of times.
Sometimes I have managed to fix the problems (at least enough that I
could continue to use the filesystem) with check --repair, but several
times I had to recreate the file system and reinstall the operating
system.

I am guessing the corruptions might be the result of unclean
shutdowns, mostly after system hangs, but also sometimes because of
running out of battery.
Furthermore, the power LED has recently started blinking (also when
the power cable is plugged in), I guess because of an old and failing
battery. Maybe the current corruption has something to do with this as
well? However, I have almost always run with the power cable plugged in
over the last year, and have only been on battery for a few seconds a
few times while moving the laptop.

Currently, I can only mount the filesystem readonly, it goes readonly
automatically if I try to mount it normally.

When booting an OpenSUSE Tumbleweed-20180119 live-iso:
localhost:~ # uname -r
4.14.13-1-default
localhost:~ # btrfs --version
btrfs-progs v4.14.1

localhost:~ # btrfs check -p /dev/sda12
Checking filesystem on /dev/sda12
UUID: d2819d5a-fd69-484b-bf34-f2b5692cbe1f
bad key ordering 159 160
bad block 690436964352
ERROR: errors found in extent allocation tree or chunk allocation
checking free space cache [.]
checking fs roots [o]
checking csums
bad key ordering 159 160
Error looking up extent record -1
Right section didn't have a record
There are no extents for csum range 22732550144-24923615232
Csum exists for 16303538176-24923615232 but there is no extent record
ERROR: errors found in csum tree
found 344063430663 bytes used, error(s) found
total csum bytes: 0
total tree bytes: 453410816
total fs tree bytes: 0
total extent tree bytes: 452952064
btree space waste bytes: 140165932
file data blocks allocated: 108462080
 referenced 108462080

localhost:~ # btrfs inspect-internal dump-tree -b 690436964352
/dev/sda12
btrfs-progs v4.14.1
leaf 690436964352 items 170 free space 1811 generation 196864 owner 2
leaf 690436964352 flags 0x1(WRITTEN) backref revision 1
fs uuid d2819d5a-fd69-484b-bf34-f2b5692cbe1f
chunk uuid 52f81fe6-893b-4432-9336-895057ee81e1
.
.
.
        item 157 key (22732500992 EXTENT_ITEM 16384) itemoff 6538 itemsize 53
                refs 1 gen 821 flags DATA
                extent data backref root 287 objectid 51665 offset 0 count 1
        item 158 key (22732517376 EXTENT_ITEM 16384) itemoff 6485 itemsize 53
                refs 1 gen 821 flags DATA
                extent data backref root 287 objectid 51666 offset 0 count 1
        item 159 key (22732533760 EXTENT_ITEM 16384) itemoff 6485 itemsize 0
print-tree.c:428: print_extent_item: BUG_ON `item_size != sizeof(*ei0)` triggered, value 1
btrfs(+0x365c6)[0x55bdfaada5c6]
btrfs(print_extent_item+0x424)[0x55bdfaadb284]
btrfs(btrfs_print_leaf+0x94e)[0x55bdfaadbc1e]
btrfs(btrfs_print_tree+0x295)[0x55bdfaadcf05]
btrfs(cmd_inspect_dump_tree+0x734)[0x55bdfab1b024]
btrfs(main+0x7d)[0x55bdfaac7d4d]
/lib64/libc.so.6(__libc_start_main+0xea)[0x7ff42100ff4a]
btrfs(_start+0x2a)[0x55bdfaac7e5a]
Aborted (core dumped)


check --repair hangs after reporting "bad key ordering 159 160", with
no disk activity but constant high CPU usage.

localhost:~ # smartctl -a /dev/sda
smartctl 6.5 2016-05-07 r4318 [x86_64-linux-4.14.13-1-default] (SUSE RPM)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:     SanDisk SD8SB8U1T001122
Serial Number:    163076421231
LU WWN Device Id: 5 001b44 4a4dde388
Firmware Version: X4140000
User Capacity:    1,024,209,543,168 bytes [1.02 TB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Form Factor:      2.5 inches
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ACS-2 T13/2015-D revision 3
SATA Version is:  SATA 3.2, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Mon Jan 22 15:28:46 2018 EST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (    0) seconds.
Offline data collection
capabilities:                    (0x11) SMART execute Offline immediate.
                                        No Auto Offline data collection support.
                                        Suspend Offline collection upon new
                                        command.
                                        No Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        No Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        (  10) minutes.

SMART Attributes Data Structure revision number: 4
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0032   100   100   ---    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   100   100   ---    Old_age   Always       -       7692
 12 Power_Cycle_Count       0x0032   100   100   ---    Old_age   Always       -       496
165 Unknown_Attribute       0x0032   100   100   ---    Old_age   Always       -       1112516724361
166 Unknown_Attribute       0x0032   100   100   ---    Old_age   Always       -       1
167 Unknown_Attribute       0x0032   100   100   ---    Old_age   Always       -       25
168 Unknown_Attribute       0x0032   100   100   ---    Old_age   Always       -       44
169 Unknown_Attribute       0x0032   100   100   ---    Old_age   Always       -       753
170 Unknown_Attribute       0x0032   100   100   ---    Old_age   Always       -       0
171 Unknown_Attribute       0x0032   100   100   ---    Old_age   Always       -       0
172 Unknown_Attribute       0x0032   100   100   ---    Old_age   Always       -       0
173 Unknown_Attribute       0x0032   100   100   ---    Old_age   Always       -       18
174 Unknown_Attribute       0x0032   100   100   ---    Old_age   Always       -       57
184 End-to-End_Error        0x0032   100   100   ---    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   ---    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   ---    Old_age   Always       -       1
194 Temperature_Celsius     0x0022   061   062   ---    Old_age   Always       -       39 (Min/Max 9/62)
199 UDMA_CRC_Error_Count    0x0032   100   100   ---    Old_age   Always       -       0
230 Unknown_SSD_Attribute   0x0032   100   100   ---    Old_age   Always       -       4733091251278
232 Available_Reservd_Space 0x0033   100   100   004    Pre-fail  Always       -       100
233 Media_Wearout_Indicator 0x0032   100   100   ---    Old_age   Always       -       19202
234 Unknown_Attribute       0x0032   100   100   ---    Old_age   Always       -       32167
241 Total_LBAs_Written      0x0030   253   253   ---    Old_age   Offline      -       22520
242 Total_LBAs_Read         0x0030   253   253   ---    Old_age   Offline      -       183882
244 Unknown_Attribute       0x0032   000   100   ---    Old_age   Always       -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%      7570         -
# 2  Extended offline    Completed without error       00%      7395         -
# 3  Extended offline    Completed without error       00%      6253         -
# 4  Short offline       Completed without error       00%      4030         -
# 5  Extended offline    Completed without error       00%      1568         -
# 6  Extended offline    Completed without error       00%      1434         -

Selective Self-tests/Logging not supported

localhost:~ # btrfs fi usage /mnt
Overall:
    Device size:                 450.00GiB
    Device allocated:            424.04GiB
    Device unallocated:           25.96GiB
    Device missing:                  0.00B
    Used:                        420.38GiB
    Free (estimated):             27.39GiB      (min: 27.39GiB)
    Data ratio:                       1.00
    Metadata ratio:                   1.00
    Global reserve:              512.00MiB      (used: 0.00B)

Data,single: Size:411.98GiB, Used:410.55GiB
   /dev/sda12    411.98GiB

Metadata,single: Size:12.00GiB, Used:9.83GiB
   /dev/sda12     12.00GiB

System,single: Size:64.00MiB, Used:64.00KiB
   /dev/sda12     64.00MiB

Unallocated:
   /dev/sda12     25.96GiB

The filesystem had become pretty full; I had planned to increase the
size of the Btrfs partition before it became corrupted.

Active kernel when the filesystem went read only: OpenSUSE Linux
4.14.14-1.geef6178-default, from the
http://download.opensuse.org/repositories/Kernel:/stable/standard/stable
repository.

Fstab mount options: noatime,autodefrag (for a period in the past I
used the nossd option on this filesystem with older kernels).

If it matters, I have been running duperemove many times on the
filesystem since creation.

To test the RAM, I ran the mprime Blend test for 24 hours after the
corruption, without any errors or warnings.

Is there a way I can try to repair this filesystem without having to
recreate it and reinstall the operating system? A reinstall, including
all currently installed packages and restoring all current system
settings, would probably take me quite some time.
If it is not currently repairable, it would be nice if this kind of
corruption could be repaired in the future, even at the cost of losing
a few files. Or if the corruptions could be avoided in the first place.

Laptop: Asus N56JR-S4075H, bought new in 2014
Hard drive: for the last 14 months a SanDisk X400 SD8SB8U1T001122 1TB
SSD; originally a Seagate ST750LM000 SSHD
RAM: lshw:-memory
          description: System Memory
          physical id: c
          slot: System board or motherboard
          size: 12GiB
        *-bank:0
             description: SODIMM DDR3 Synchronous 1600 MHz (0,6 ns)
             product: ASU16D3LS1KBG/4G
             vendor: Kingston
             physical id: 0
             serial: C32D5655
             slot: ChannelA-DIMM0
             size: 4GiB
             width: 64 bits
             clock: 1600MHz (0.6ns)
        *-bank:1
             description: DIMM [empty]
             product: [Empty]
             vendor: [Empty]
             physical id: 1
             serial: [Empty]
             slot: ChannelA-DIMM1
        *-bank:2
             description: SODIMM DDR3 Synchronous 1600 MHz (0,6 ns)
             product: M471B1G73QH0-YK0
             vendor: Samsung
             physical id: 2
             serial: 1519AD27
             slot: ChannelB-DIMM0
             size: 8GiB
             width: 64 bits
             clock: 1600MHz (0.6ns)
        *-bank:3
             description: DIMM [empty]
             product: [Empty]
             vendor: [Empty]
             physical id: 3
             serial: [Empty]
             slot: ChannelB-DIMM1
CPU: Intel(R) Core(TM) i7-4700HQ CPU @ 2.40GHz
BIOS version: N56JRH.202
SSD Partitions (among others): Btrfs with OpenSUSE Tumbleweed
installation, NTFS with Windows 10, Ext4 with Fedora installation.

I have never noticed any corruptions on the NTFS and Ext4 file systems
on the laptop, only on the Btrfs file systems.

Best regards,
Claes Fransson


* Re: bad key ordering - repairable?
  2018-01-22 21:06 bad key ordering - repairable? Claes Fransson
@ 2018-01-22 21:22 ` Hugo Mills
  2018-01-23 13:06   ` Claes Fransson
  2018-01-23  2:35 ` Chris Murphy
  1 sibling, 1 reply; 17+ messages in thread
From: Hugo Mills @ 2018-01-22 21:22 UTC (permalink / raw)
  To: Claes Fransson; +Cc: linux-btrfs


On Mon, Jan 22, 2018 at 10:06:58PM +0100, Claes Fransson wrote:
> Hi!
> 
> I really like the features of BTRFS, especially deduplication,
> snapshotting and checksumming. However, when using it on my laptop the
> last couple of years, it has became corrupted a lot of times.
> Sometimes I have managed to fix the problems (at least so much that I
> can continue to use the filesystem) with check --repair, but several
> times I had to recreate the file system and reinstall the operating
> system.
> 
> I am guessing the corruptions might be the results of unclean
> shutdowns, mostly after system hangs, but also because of running out
> of battery sometimes?
> Furthermore, the power-led has recently started blinking (also when
> the power-cable is plugged in), I guess because of an old and bad
> battery. Maybe the current corruption also can have something to do
> with this? However I almost always run with power cable plugged in in
> last year, only on battery a few seconds a few times when moving the
> laptop.
> 
> Currently, I can only mount the filesystem readonly, it goes readonly
> automatically if I try to mount it normally.
> 
> When booting an OpenSUSE Tumbleweed-20180119 live-iso:
> localhost:~ # uname -r
> 4.14.13-1-default
> localhost:~ # btrfs --version
> btrfs-progs v4.14.1
> 
> localhost:~ # btrfs check -p /dev/sda12
> Checking filesystem on /dev/sda12

[fixing up bad paste]

> UUID: d2819d5a-fd69-484b-bf34-f2b5692cbe1f
> bad key ordering 159 160 bad block 690436964352
> ERROR: errors found in extent allocation tree or chunk allocation
> checking free space cache [.]
> checking fs roots [o]
> checking csums
> bad key ordering 159 160
> Error looking up extent record -1

[snip]

> localhost:~ # btrfs inspect-internal dump-tree -b 690436964352
> /dev/sda12
> btrfs-progs v4.14.1
>     leaf 690436964352 items 170 free space 1811 generation 196864 owner 2
>     leaf 690436964352 flags 0x1(WRITTEN) backref revision 1
>     fs uuid d2819d5a-fd69-484b-bf34-f2b5692cbe1f
>     chunk uuid 52f81fe6-893b-4432-9336-895057ee81e1
> .
> .
> .
>         item 157 key (22732500992 EXTENT_ITEM 16384) itemoff 6538 itemsize 53
>                 refs 1 gen 821 flags DATA
>                 extent data backref root 287 objectid 51665 offset 0 count 1
>         item 158 key (22732517376 EXTENT_ITEM 16384) itemoff 6485 itemsize 53
>                 refs 1 gen 821 flags DATA
>                 extent data backref root 287 objectid 51666 offset 0 count 1
>         item 159 key (22732533760 EXTENT_ITEM 16384) itemoff 6485 itemsize 0
> print-tree.c:428: print_extent_item: BUG_ON `item_size != sizeof(*ei0)` triggered, value 1
> btrfs(+0x365c6)[0x55bdfaada5c6]
> btrfs(print_extent_item+0x424)[0x55bdfaadb284]
> btrfs(btrfs_print_leaf+0x94e)[0x55bdfaadbc1e]
> btrfs(btrfs_print_tree+0x295)[0x55bdfaadcf05]
> btrfs(cmd_inspect_dump_tree+0x734)[0x55bdfab1b024]
> btrfs(main+0x7d)[0x55bdfaac7d4d]
> /lib64/libc.so.6(__libc_start_main+0xea)[0x7ff42100ff4a]
> btrfs(_start+0x2a)[0x55bdfaac7e5a]
> Aborted (core dumped)

   Wow, I've never seen it do that before. It's the next thing I'd
have asked for, so it's good you've preempted it.

   The main thing is that bad key ordering is almost always due to RAM
corruption. That's either bad RAM, or dodgy power regulation -- the
latter could be the PSU, or capacitors on the motherboard. (In this
case, it might also be something funny with the battery).

   I would definitely recommend a long run of memtest86. At least 8
hours, preferably 24. If you get errors repeatedly in the same place,
it's the RAM. If they appear randomly, it's probably the power
regulation.

[snip]

> 
> The filesystem had become pretty full, I had planned to increase the
> Btrfs-partition size before it became corrupt.
> 
> Active kernel when the filesystem went read only: OpenSUSE Linux
> 4.14.14-1.geef6178-default, from the
> http://download.opensuse.org/repositories/Kernel:/stable/standard/stable
> repository.
> 
> Fstab mount options: noatime,autodefrag (I have been using the option
> nossd with older kernels one period in the past on the filesystem).
> 
> If it matters, I have been running duperemove many times on the
> filesystem since creation.
> 
> To test the RAM, I have been running mprime Blend-test for 24 hours
> after the corruption without any error or warning.

   Of all of the bad key order errors I've seen (dozens), I think
there were a whole two which turned out not to be obviously related to
corrupt RAM. I still say that it's most likely the hardware.

> Is there a way I can try to repair this filesystem without the need to
> recreate it and reinstall the operating system? A reinstall including
> all currently installed packages, and restoring all current system
> settings, would probably take some time for me to do.
> If it is currently not repairable, it would be nice if this kind of
> corruption could be repaired in the future, even if losing a few
> files. Or if the corruptions could be avoided in the first place.

   Given that the current tools crash, the answer's a definite
no. However, if you can get a developer interested, they may be able
to write a fix for it, given an image of the FS (using btrfs-image).
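
   For reference, creating such an image is normally a single
btrfs-image invocation on the unmounted device. A minimal sketch,
assuming /dev/sda12 and a placeholder output path; -c9 sets maximum
compression and -s sanitizes file names if you prefer not to include
them:

    # sketch only: /tmp/sda12.img is a placeholder output path
    btrfs-image -c9 -s /dev/sda12 /tmp/sda12.img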

[snip]
> I have never noticed any corruptions on the NTFS and Ext4 file systems
> on the laptop, only on the Btrfs file systems.

   You've never _noticed_ them. :)

   Hugo.

-- 
Hugo Mills             | ... one ping(1) to rule them all, and in the
hugo@... carfax.org.uk | darkness bind(2) them.
http://carfax.org.uk/  |
PGP: E2AB1DE4          |                                                Illiad



* Re: bad key ordering - repairable?
  2018-01-22 21:06 bad key ordering - repairable? Claes Fransson
  2018-01-22 21:22 ` Hugo Mills
@ 2018-01-23  2:35 ` Chris Murphy
  2018-01-23 12:51   ` Austin S. Hemmelgarn
  2018-01-23 13:17   ` Claes Fransson
  1 sibling, 2 replies; 17+ messages in thread
From: Chris Murphy @ 2018-01-23  2:35 UTC (permalink / raw)
  To: Claes Fransson; +Cc: Btrfs BTRFS

On Mon, Jan 22, 2018 at 2:06 PM, Claes Fransson
<claes.v.fransson@gmail.com> wrote:
> Hi!
>
> I really like the features of BTRFS, especially deduplication,
> snapshotting and checksumming. However, when using it on my laptop the
> last couple of years, it has became corrupted a lot of times.
> Sometimes I have managed to fix the problems (at least so much that I
> can continue to use the filesystem) with check --repair, but several
> times I had to recreate the file system and reinstall the operating
> system.
>
> I am guessing the corruptions might be the results of unclean
> shutdowns, mostly after system hangs, but also because of running out
> of battery sometimes?

I think it's something else, because I intentionally and
unintentionally do unclean shutdowns (I'm really impatient and I'm a
saboteur) on my laptop and I never get corruptions. That's 18 months with
an HP Spectre which doesn't even have ECC memory and has an NVMe
drive, *and*, really remarkably, for almost half this time I used the
discard mount option, which pretty much instantly obliterates unused
roots, even when referenced in the super block as backup roots - and
yet still zero corruption. No complaints on mount, scrub, or readonly
checks. *shrug*

Anyway I suspect hardware or power issue. Or even SSD firmware issue.

> Furthermore, the power-led has recently started blinking (also when
> the power-cable is plugged in), I guess because of an old and bad
> battery. Maybe the current corruption also can have something to do
> with this? However I almost always run with power cable plugged in in
> last year, only on battery a few seconds a few times when moving the
> laptop.
>
> Currently, I can only mount the filesystem readonly, it goes readonly
> automatically if I try to mount it normally.

Btrfs is confused and doesn't want to make the corruption worse.




>
> Fstab mount options: noatime,autodefrag (I have been using the option
> nossd with older kernels one period in the past on the filesystem).
>
> If it matters, I have been running duperemove many times on the
> filesystem since creation.

I don't think it's related.


>
> To test the RAM, I have been running mprime Blend-test for 24 hours
> after the corruption without any error or warning.

I'm not familiar with it, pretty sure you want this for UEFI:

https://www.memtest86.com/download.htm

You can use that, or memtest86+ if the firmware is BIOS based.


> I have never noticed any corruptions on the NTFS and Ext4 file systems
> on the laptop, only on the Btrfs file systems.

NTFS and ext4 likely won't notice such corruptions either (although
new ext4 volumes any day now will have checksummed metadata by
default), as they weren't designed with such detection in mind.


-- 
Chris Murphy


* Re: bad key ordering - repairable?
  2018-01-23  2:35 ` Chris Murphy
@ 2018-01-23 12:51   ` Austin S. Hemmelgarn
  2018-01-23 13:29     ` Claes Fransson
  2018-01-24  0:44     ` Chris Murphy
  2018-01-23 13:17   ` Claes Fransson
  1 sibling, 2 replies; 17+ messages in thread
From: Austin S. Hemmelgarn @ 2018-01-23 12:51 UTC (permalink / raw)
  To: Chris Murphy, Claes Fransson; +Cc: Btrfs BTRFS

On 2018-01-22 21:35, Chris Murphy wrote:
> On Mon, Jan 22, 2018 at 2:06 PM, Claes Fransson
> <claes.v.fransson@gmail.com> wrote:
>> Hi!
>>
>> I really like the features of BTRFS, especially deduplication,
>> snapshotting and checksumming. However, when using it on my laptop the
>> last couple of years, it has became corrupted a lot of times.
>> Sometimes I have managed to fix the problems (at least so much that I
>> can continue to use the filesystem) with check --repair, but several
>> times I had to recreate the file system and reinstall the operating
>> system.
>>
>> I am guessing the corruptions might be the results of unclean
>> shutdowns, mostly after system hangs, but also because of running out
>> of battery sometimes?
> 
> I think it's something else because I intentionally and
> unintentionally do unclean shutdowns (I'm really impatient and I'm a
> saboteur) on my laptop and I never get corruptions. In 18 months with
> an HP Spectre which doesn't even have ECC memory, and has an NVMe
> drive, *and* really remarkable for almost half this time I used the
> discard mount option which pretty much instantly obliterates unused
> roots, even when referenced in the super block as backup roots - and
> yet still zero corruption. No complaints on mount, scrub, or readonly
> checks. *shrug*
> 
> Anyway I suspect hardware or power issue. Or even SSD firmware issue.
I would tend to agree here, with one caveat: if it's a laptop that's
less than 3 years old, you can probably rule out power issues.  Some
more info on the particular system might help identify what's wrong.
> 
>> Furthermore, the power-led has recently started blinking (also when
>> the power-cable is plugged in), I guess because of an old and bad
>> battery. Maybe the current corruption also can have something to do
>> with this? However I almost always run with power cable plugged in in
>> last year, only on battery a few seconds a few times when moving the
>> laptop.
>>
>> Currently, I can only mount the filesystem readonly, it goes readonly
>> automatically if I try to mount it normally.
> 
> Btrfs is confused and doesn't want to make the corruption worse.
>
>>
>> Fstab mount options: noatime,autodefrag (I have been using the option
>> nossd with older kernels one period in the past on the filesystem).
>>
>> If it matters, I have been running duperemove many times on the
>> filesystem since creation.
> 
> I don't think it's related.
> 
> 
>>
>> To test the RAM, I have been running mprime Blend-test for 24 hours
>> after the corruption without any error or warning.
> 
> I'm not familiar with it, pretty sure you want this for UEFI:
> 
> https://www.memtest86.com/download.htm
> 
> Where you can use that or memtest86+ if the firmware is BIOS based.
Do keep in mind that just because it passes memory checks does not mean 
it's not an issue with the RAM.  Memory testers rarely throw false 
positives, but it's pretty common to get false negatives from them.
>
>> I have never noticed any corruptions on the NTFS and Ext4 file systems
>> on the laptop, only on the Btrfs file systems.
> 
> NTFS and ext4 likely won't notice such corruptions either (although
> new ext4 volumes any day now will have checksummed metadata by
> default) as they're weren't designed with such detection in mind.
This is extremely important to understand.  BTRFS and ZFS are 
essentially the only filesystems available on Linux that actually 
validate things enough to notice this reliably (ReFS on Windows probably 
does, and I think whatever Apple is calling their new FS does too). 
Even if ext4 did notice it, it would just mark the filesystem for a 
check and then keep going without doing anything else about it 
(seriously, the default behavior for internal errors on ext4 is to just 
continue like nothing happened and mark the FS for fsck).


* Re: bad key ordering - repairable?
  2018-01-22 21:22 ` Hugo Mills
@ 2018-01-23 13:06   ` Claes Fransson
  2018-01-23 18:13     ` Claes Fransson
  2018-01-27 14:54     ` Claes Fransson
  0 siblings, 2 replies; 17+ messages in thread
From: Claes Fransson @ 2018-01-23 13:06 UTC (permalink / raw)
  To: linux-btrfs

2018-01-22 22:22 GMT+01:00 Hugo Mills <hugo@carfax.org.uk>:
> On Mon, Jan 22, 2018 at 10:06:58PM +0100, Claes Fransson wrote:
>> Hi!
>>
>> I really like the features of BTRFS, especially deduplication,
>> snapshotting and checksumming. However, when using it on my laptop the
>> last couple of years, it has became corrupted a lot of times.
>> Sometimes I have managed to fix the problems (at least so much that I
>> can continue to use the filesystem) with check --repair, but several
>> times I had to recreate the file system and reinstall the operating
>> system.
>>
>> I am guessing the corruptions might be the results of unclean
>> shutdowns, mostly after system hangs, but also because of running out
>> of battery sometimes?
>> Furthermore, the power-led has recently started blinking (also when
>> the power-cable is plugged in), I guess because of an old and bad
>> battery. Maybe the current corruption also can have something to do
>> with this? However I almost always run with power cable plugged in in
>> last year, only on battery a few seconds a few times when moving the
>> laptop.
>>
>> Currently, I can only mount the filesystem readonly, it goes readonly
>> automatically if I try to mount it normally.
>>
>> When booting an OpenSUSE Tumbleweed-20180119 live-iso:
>> localhost:~ # uname -r
>> 4.14.13-1-default
>> localhost:~ # btrfs --version
>> btrfs-progs v4.14.1
>>
>> localhost:~ # btrfs check -p /dev/sda12
>> Checking filesystem on /dev/sda12
>
> [fixing up bad paste]
>
>> UUID: d2819d5a-fd69-484b-bf34-f2b5692cbe1f
>> bad key ordering 159 160 bad block 690436964352
>> ERROR: errors found in extent allocation tree or chunk allocation
>> checking free space cache [.]
>> checking fs roots [o]
>> checking csums
>> bad key ordering 159 160
>> Error looking up extent record -1
>
> [snip]
>
>> localhost:~ # btrfs inspect-internal dump-tree -b 690436964352
>> /dev/sda12
>> btrfs-progs v4.14.1
>>     leaf 690436964352 items 170 free space 1811 generation 196864 owner 2
>>     leaf 690436964352 flags 0x1(WRITTEN) backref revision 1
>>     fs uuid d2819d5a-fd69-484b-bf34-f2b5692cbe1f
>>     chunk uuid 52f81fe6-893b-4432-9336-895057ee81e1
>> .
>> .
>> .
>>         item 157 key (22732500992 EXTENT_ITEM 16384) itemoff 6538 itemsize 53
>>                 refs 1 gen 821 flags DATA
>>                 extent data backref root 287 objectid 51665 offset 0 count 1
>>         item 158 key (22732517376 EXTENT_ITEM 16384) itemoff 6485 itemsize 53
>>                 refs 1 gen 821 flags DATA
>>                 extent data backref root 287 objectid 51666 offset 0 count 1
>>         item 159 key (22732533760 EXTENT_ITEM 16384) itemoff 6485 itemsize 0
>> print-tree.c:428: print_extent_item: BUG_ON `item_size != sizeof(*ei0)` triggered, value 1
>> btrfs(+0x365c6)[0x55bdfaada5c6]
>> btrfs(print_extent_item+0x424)[0x55bdfaadb284]
>> btrfs(btrfs_print_leaf+0x94e)[0x55bdfaadbc1e]
>> btrfs(btrfs_print_tree+0x295)[0x55bdfaadcf05]
>> btrfs(cmd_inspect_dump_tree+0x734)[0x55bdfab1b024]
>> btrfs(main+0x7d)[0x55bdfaac7d4d]
>> /lib64/libc.so.6(__libc_start_main+0xea)[0x7ff42100ff4a]
>> btrfs(_start+0x2a)[0x55bdfaac7e5a]
>> Aborted (core dumped)
>
>    Wow, I've never seen it do that before. It's the next thing I'd
> have asked for, so it's good you've preempted it.
>
>    The main thing is that bad key ordering is almost always due to RAM
> corruption. That's either bad RAM, or dodgy power regulation -- the
> latter could be the PSU, or capacitors on the motherboard. (In this
> case, it might also be something funny with the battery).
>
>    I would definitely recommend a long run of memtest86. At least 8
> hours, preferably 24. If you get errors repeatedly in the sme place,
> it's the RAM. If they appear randomly, it's probably the power
> regulation.
>
Thanks for the suggestion, I will try to do this in the next few days.

> [snip]
>
>>
>> The filesystem had become pretty full, I had planned to increase the
>> Btrfs-partition size before it became corrupt.
>>
>> Active kernel when the filesystem went read only: OpenSUSE Linux
>> 4.14.14-1.geef6178-default, from the
>> http://download.opensuse.org/repositories/Kernel:/stable/standard/stable
>> repository.
>>
>> Fstab mount options: noatime,autodefrag (I have been using the option
>> nossd with older kernels one period in the past on the filesystem).
>>
>> If it matters, I have been running duperemove many times on the
>> filesystem since creation.
>>
>> To test the RAM, I have been running mprime Blend-test for 24 hours
>> after the corruption without any error or warning.
>
>    Of all of the bad key order errors I've seen (dozens), I think
> there were a whole two which turned out not to be obviously related to
> corrupt RAM. I still say that it's most likely the hardware.

Okay, thank you for sharing your experience with me.

>
>> Is there a way I can try to repair this filesystem without the need to
>> recreate it and reinstall the operating system? A reinstall including
>> all currently installed packages, and restoring all current system
>> settings, would probably take some time for me to do.
>> If it is currently not repairable, it would be nice if this kind of
>> corruption could be repaired in the future, even if losing a few
>> files. Or if the corruptions could be avoided in the first place.
>
>    Given that the current tools crash, the answer's a definite
> no. However, if you can get a developer interested, they may be able
> to write a fix for it, given an image of the FS (using btrfs-image).
>
Okay, will try to produce and upload an image within the next week.


> [snip]
>> I have never noticed any corruptions on the NTFS and Ext4 file systems
>> on the laptop, only on the Btrfs file systems.
>
>    You've never _noticed_ them. :)
>
>    Hugo.
>
> --
> Hugo Mills             | ... one ping(1) to rule them all, and in the
> hugo@... carfax.org.uk | darkness bind(2) them.
> http://carfax.org.uk/  |
> PGP: E2AB1DE4          |                                                Illiad

Thank you for your answers.

Claes


* Re: bad key ordering - repairable?
  2018-01-23  2:35 ` Chris Murphy
  2018-01-23 12:51   ` Austin S. Hemmelgarn
@ 2018-01-23 13:17   ` Claes Fransson
  1 sibling, 0 replies; 17+ messages in thread
From: Claes Fransson @ 2018-01-23 13:17 UTC (permalink / raw)
  To: Btrfs BTRFS

2018-01-23 3:35 GMT+01:00 Chris Murphy <lists@colorremedies.com>:
> On Mon, Jan 22, 2018 at 2:06 PM, Claes Fransson
> <claes.v.fransson@gmail.com> wrote:
>> Hi!
>>
>> I really like the features of BTRFS, especially deduplication,
>> snapshotting and checksumming. However, when using it on my laptop the
>> last couple of years, it has became corrupted a lot of times.
>> Sometimes I have managed to fix the problems (at least so much that I
>> can continue to use the filesystem) with check --repair, but several
>> times I had to recreate the file system and reinstall the operating
>> system.
>>
>> I am guessing the corruptions might be the results of unclean
>> shutdowns, mostly after system hangs, but also because of running out
>> of battery sometimes?
>
> I think it's something else because I intentionally and
> unintentionally do unclean shutdowns (I'm really impatient and I'm a
> saboteur) on my laptop and I never get corruptions. In 18 months with
> an HP Spectre which doesn't even have ECC memory, and has an NVMe
> drive, *and* really remarkable for almost half this time I used the
> discard mount option which pretty much instantly obliterates unused
> roots, even when referenced in the super block as backup roots - and
> yet still zero corruption. No complaints on mount, scrub, or readonly
> checks. *shrug*
>
Okay, thank you for sharing your experience.


> Anyway I suspect hardware or power issue. Or even SSD firmware issue.
>


>> Furthermore, the power-led has recently started blinking (also when
>> the power-cable is plugged in), I guess because of an old and bad
>> battery. Maybe the current corruption also can have something to do
>> with this? However I almost always run with power cable plugged in in
>> last year, only on battery a few seconds a few times when moving the
>> laptop.
>>
>> Currently, I can only mount the filesystem readonly, it goes readonly
>> automatically if I try to mount it normally.
>
> Btrfs is confused and doesn't want to make the corruption worse.
>
>
>
>
>>
>> Fstab mount options: noatime,autodefrag (I have been using the option
>> nossd with older kernels one period in the past on the filesystem).
>>
>> If it matters, I have been running duperemove many times on the
>> filesystem since creation.
>
> I don't think it's related.
>
>
>>
>> To test the RAM, I have been running mprime Blend-test for 24 hours
>> after the corruption without any error or warning.
>
> I'm not familiar with it, pretty sure you want this for UEFI:
>
> https://www.memtest86.com/download.htm
>
Thanks, I will try this within the next few days (I boot my laptop in UEFI mode).


> Where you can use that or memtest86+ if the firmware is BIOS based.
>
>
>> I have never noticed any corruptions on the NTFS and Ext4 file systems
>> on the laptop, only on the Btrfs file systems.
>
> NTFS and ext4 likely won't notice such corruptions either (although
> new ext4 volumes any day now will have checksummed metadata by
> default) as they're weren't designed with such detection in mind.
>
>
> --
> Chris Murphy


* Re: bad key ordering - repairable?
  2018-01-23 12:51   ` Austin S. Hemmelgarn
@ 2018-01-23 13:29     ` Claes Fransson
  2018-01-24  0:44     ` Chris Murphy
  1 sibling, 0 replies; 17+ messages in thread
From: Claes Fransson @ 2018-01-23 13:29 UTC (permalink / raw)
  To: Btrfs BTRFS

2018-01-23 13:51 GMT+01:00 Austin S. Hemmelgarn <ahferroin7@gmail.com>:
> On 2018-01-22 21:35, Chris Murphy wrote:
>>
>> On Mon, Jan 22, 2018 at 2:06 PM, Claes Fransson
>> <claes.v.fransson@gmail.com> wrote:
>>>
>>> Hi!
>>>
>>> I really like the features of BTRFS, especially deduplication,
>>> snapshotting and checksumming. However, when using it on my laptop the
>>> last couple of years, it has became corrupted a lot of times.
>>> Sometimes I have managed to fix the problems (at least so much that I
>>> can continue to use the filesystem) with check --repair, but several
>>> times I had to recreate the file system and reinstall the operating
>>> system.
>>>
>>> I am guessing the corruptions might be the results of unclean
>>> shutdowns, mostly after system hangs, but also because of running out
>>> of battery sometimes?
>>
>>
>> I think it's something else because I intentionally and
>> unintentionally do unclean shutdowns (I'm really impatient and I'm a
>> saboteur) on my laptop and I never get corruptions. In 18 months with
>> an HP Spectre which doesn't even have ECC memory, and has an NVMe
>> drive, *and* really remarkable for almost half this time I used the
>> discard mount option which pretty much instantly obliterates unused
>> roots, even when referenced in the super block as backup roots - and
>> yet still zero corruption. No complaints on mount, scrub, or readonly
>> checks. *shrug*
>>
>> Anyway I suspect hardware or power issue. Or even SSD firmware issue.
>
> I would tend to agree here, with one caveat, if it's a laptop that's less
> than 3 years old, you can probably rule out power issues.  Some more info on
> the particular system might help identify what's wrong.

Hi,

I bought the laptop new in July 2014, but I have had corruption issues
with btrfs for about as long as I have been trying it, since the end of
2014 I think. You can find additional info about my laptop in my
original post; please let me know if you want some more info.

>>
>>
>>> Furthermore, the power-led has recently started blinking (also when
>>> the power-cable is plugged in), I guess because of an old and bad
>>> battery. Maybe the current corruption also can have something to do
>>> with this? However I almost always run with power cable plugged in in
>>> last year, only on battery a few seconds a few times when moving the
>>> laptop.
>>>
>>> Currently, I can only mount the filesystem readonly, it goes readonly
>>> automatically if I try to mount it normally.
>>
>>
>> Btrfs is confused and doesn't want to make the corruption worse. >
>>>
>>>
>>> Fstab mount options: noatime,autodefrag (I have been using the option
>>> nossd with older kernels one period in the past on the filesystem).
>>>
>>> If it matters, I have been running duperemove many times on the
>>> filesystem since creation.
>>
>>
>> I don't think it's related.
>>
>>
>>>
>>> To test the RAM, I have been running mprime Blend-test for 24 hours
>>> after the corruption without any error or warning.
>>
>>
>> I'm not familiar with it, pretty sure you want this for UEFI:
>>
>> https://www.memtest86.com/download.htm
>>
>> Where you can use that or memtest86+ if the firmware is BIOS based.
>
> Do keep in mind that just because it passes memory checks does not mean it's
> not an issue with the RAM.  Memory testers rarely throw false positives, but
> it's pretty common to get false negatives from them.>

Okay, thanks for telling me.

>>>
>>> I have never noticed any corruptions on the NTFS and Ext4 file systems
>>> on the laptop, only on the Btrfs file systems.
>>
>>
>> NTFS and ext4 likely won't notice such corruptions either (although
>> new ext4 volumes any day now will have checksummed metadata by
>> default) as they're weren't designed with such detection in mind.
>
> This is extremely important to understand.  BTRFS and ZFS are essentially
> the only filesystems available on Linux that actually validate things enough
> to notice this reliably (ReFS on Windows probably does, and I think whatever
> Apple is calling their new FS does too). Even if ext4 did notice it, it
> would just mark the filesystem for a check and then keep going without doing
> anything else about it (seriously, the default behavior for internal errors
> on ext4 is to just continue like nothing happened and mark the FS for fsck).

Well, personally I think it would be great if I (optionally) could do
that with Btrfs too. Even if it notified me of corruption and I might
even lose a few files, I think it would be good if I could continue to
use the filesystem with normal read/write capabilities, so I wouldn't
need to reinstall the operating system.

Best regards,

Claes


* Re: bad key ordering - repairable?
  2018-01-23 13:06   ` Claes Fransson
@ 2018-01-23 18:13     ` Claes Fransson
  2018-01-24  0:31       ` Chris Murphy
  2018-01-27 14:54     ` Claes Fransson
  1 sibling, 1 reply; 17+ messages in thread
From: Claes Fransson @ 2018-01-23 18:13 UTC (permalink / raw)
  To: Btrfs BTRFS

2018-01-23 14:06 GMT+01:00 Claes Fransson <claes.v.fransson@gmail.com>:
> 2018-01-22 22:22 GMT+01:00 Hugo Mills <hugo@carfax.org.uk>:
>> On Mon, Jan 22, 2018 at 10:06:58PM +0100, Claes Fransson wrote:
>>> Hi!
>>>
>>> I really like the features of BTRFS, especially deduplication,
>>> snapshotting and checksumming. However, when using it on my laptop the
>>> last couple of years, it has became corrupted a lot of times.
>>> Sometimes I have managed to fix the problems (at least so much that I
>>> can continue to use the filesystem) with check --repair, but several
>>> times I had to recreate the file system and reinstall the operating
>>> system.
>>>
>>> I am guessing the corruptions might be the results of unclean
>>> shutdowns, mostly after system hangs, but also because of running out
>>> of battery sometimes?
>>> Furthermore, the power-led has recently started blinking (also when
>>> the power-cable is plugged in), I guess because of an old and bad
>>> battery. Maybe the current corruption also can have something to do
>>> with this? However I almost always run with power cable plugged in in
>>> last year, only on battery a few seconds a few times when moving the
>>> laptop.
>>>
>>> Currently, I can only mount the filesystem readonly, it goes readonly
>>> automatically if I try to mount it normally.
>>>
>>> When booting an OpenSUSE Tumbleweed-20180119 live-iso:
>>> localhost:~ # uname -r
>>> 4.14.13-1-default
>>> localhost:~ # btrfs --version
>>> btrfs-progs v4.14.1
>>>
>>> localhost:~ # btrfs check -p /dev/sda12
>>> Checking filesystem on /dev/sda12
>>
>> [fixing up bad paste]
>>
>>> UUID: d2819d5a-fd69-484b-bf34-f2b5692cbe1f
>>> bad key ordering 159 160 bad block 690436964352
>>> ERROR: errors found in extent allocation tree or chunk allocation
>>> checking free space cache [.]
>>> checking fs roots [o]
>>> checking csums
>>> bad key ordering 159 160
>>> Error looking up extent record -1
>>
>> [snip]
>>
>>> localhost:~ # btrfs inspect-internal dump-tree -b 690436964352
>>> /dev/sda12
>>> btrfs-progs v4.14.1
>>>     leaf 690436964352 items 170 free space 1811 generation 196864 owner 2
>>>     leaf 690436964352 flags 0x1(WRITTEN) backref revision 1
>>>     fs uuid d2819d5a-fd69-484b-bf34-f2b5692cbe1f
>>>     chunk uuid 52f81fe6-893b-4432-9336-895057ee81e1
>>> .
>>> .
>>> .
>>>         item 157 key (22732500992 EXTENT_ITEM 16384) itemoff 6538 itemsize 53
>>>                 refs 1 gen 821 flags DATA
>>>                 extent data backref root 287 objectid 51665 offset 0 count 1
>>>         item 158 key (22732517376 EXTENT_ITEM 16384) itemoff 6485 itemsize 53
>>>                 refs 1 gen 821 flags DATA
>>>                 extent data backref root 287 objectid 51666 offset 0 count 1
>>>         item 159 key (22732533760 EXTENT_ITEM 16384) itemoff 6485 itemsize 0
>>> print-tree.c:428: print_extent_item: BUG_ON `item_size != sizeof(*ei0)` triggered, value 1
>>> btrfs(+0x365c6)[0x55bdfaada5c6]
>>> btrfs(print_extent_item+0x424)[0x55bdfaadb284]
>>> btrfs(btrfs_print_leaf+0x94e)[0x55bdfaadbc1e]
>>> btrfs(btrfs_print_tree+0x295)[0x55bdfaadcf05]
>>> btrfs(cmd_inspect_dump_tree+0x734)[0x55bdfab1b024]
>>> btrfs(main+0x7d)[0x55bdfaac7d4d]
>>> /lib64/libc.so.6(__libc_start_main+0xea)[0x7ff42100ff4a]
>>> btrfs(_start+0x2a)[0x55bdfaac7e5a]
>>> Aborted (core dumped)
>>
>>    Wow, I've never seen it do that before. It's the next thing I'd
>> have asked for, so it's good you've preempted it.
>>
>>    The main thing is that bad key ordering is almost always due to RAM
>> corruption. That's either bad RAM, or dodgy power regulation -- the
>> latter could be the PSU, or capacitors on the motherboard. (In this
>> case, it might also be something funny with the battery).
>>
>>    I would definitely recommend a long run of memtest86. At least 8
>> hours, preferably 24. If you get errors repeatedly in the sme place,
>> it's the RAM. If they appear randomly, it's probably the power
>> regulation.
>>
> Thanks for the suggestion, I will try to do this in the next days.
>

I hadn't noticed before that there are actually RAM modules from
different vendors in the laptop: one 8GB by Samsung, and one 4GB by
Kingston! Maybe that is a source of the corruptions.
I also found that there was indeed a new firmware version for my
SSD, so I have now updated its firmware to the newest version.
Unfortunately, I couldn't find any information about what issues it was
supposed to fix. The laptop already has the latest BIOS version
provided by ASUS for this model.
I have not yet run memtest86.

Claes

>> [snip]
>>
>>>
>>> The filesystem had become pretty full, I had planned to increase the
>>> Btrfs-partition size before it became corrupt.
>>>
>>> Active kernel when the filesystem went read only: OpenSUSE Linux
>>> 4.14.14-1.geef6178-default, from the
>>> http://download.opensuse.org/repositories/Kernel:/stable/standard/stable
>>> repository.
>>>
>>> Fstab mount options: noatime,autodefrag (I have been using the option
>>> nossd with older kernels one period in the past on the filesystem).
>>>
>>> If it matters, I have been running duperemove many times on the
>>> filesystem since creation.
>>>
>>> To test the RAM, I have been running mprime Blend-test for 24 hours
>>> after the corruption without any error or warning.
>>
>>    Of all of the bad key order errors I've seen (dozens), I think
>> there were a whole two which turned out not to be obviously related to
>> corrupt RAM. I still say that it's most likely the hardware.
>
> Okay, thank you for sharing your experience with me.
>
>>
>>> Is there a way I can try to repair this filesystem without the need to
>>> recreate it and reinstall the operating system? A reinstall including
>>> all currently installed packages, and restoring all current system
>>> settings, would probably take some time for me to do.
>>> If it is currently not repairable, it would be nice if this kind of
>>> corruption could be repaired in the future, even if losing a few
>>> files. Or if the corruptions could be avoided in the first place.
>>
>>    Given that the current tools crash, the answer's a definite
>> no. However, if you can get a developer interested, they may be able
>> to write a fix for it, given an image of the FS (using btrfs-image).
>>
> Okay, will try to produce and upload an image within the next week.
>
>
>> [snip]
>>> I have never noticed any corruptions on the NTFS and Ext4 file systems
>>> on the laptop, only on the Btrfs file systems.
>>
>>    You've never _noticed_ them. :)
>>
>>    Hugo.
>>
>> --
>> Hugo Mills             | ... one ping(1) to rule them all, and in the
>> hugo@... carfax.org.uk | darkness bind(2) them.
>> http://carfax.org.uk/  |
>> PGP: E2AB1DE4          |                                                Illiad
>
> Thank you for your answers.
>
> Claes


* Re: bad key ordering - repairable?
  2018-01-23 18:13     ` Claes Fransson
@ 2018-01-24  0:31       ` Chris Murphy
  2018-01-24 19:44         ` Claes Fransson
  0 siblings, 1 reply; 17+ messages in thread
From: Chris Murphy @ 2018-01-24  0:31 UTC (permalink / raw)
  To: Claes Fransson; +Cc: Btrfs BTRFS

On Tue, Jan 23, 2018 at 11:13 AM, Claes Fransson
<claes.v.fransson@gmail.com> wrote:

> I haven't noticed before that there is actually RAM-modules from
> different vendors in the laptop. One 8GB by Samsung, and one 4GB by
> Kingston!

If they have the correct tolerances, I don't think it's a problem.
Some memory controllers use a kind of interleaving if the module sizes
are the same, so worst case you might be leaving a bit of a
performance improvement on the table because they aren't the same
size.

If the memory testing doesn't pan out, you could go down a bit of a
rabbit hole and run each module in production for twice the length of
time you figure you should see a corruption appear.

> I also found that there indeed was a new firmware version for my
> SSD-disk, so I have now updated it's firmware to the newest version.
> Unfortunately I couldn't find any information of what possible issues
> it was supposed to fix. The laptop has already the latest BIOS version
> provided by ASUS for the model.

I don't know enough about the bad key ordering error and its cause. If
that corruption can happen only in memory then the SSD firmware update
may change nothing. If there's some possibility the corruption can be
the result of SSD firmware bugs, then it might make sense to use DUP
metadata in the short term, even on an SSD. Any memory corruption
would affect both copies. Any SSD induced corruption *might* affect
both copies, depending on whether the SSD deduplicates or colocates
the two copies of metadata...but I'd like to think that there's at
least a pretty decent chance one of the copies would be good in which
case you'd get Btrfs self-healing for metadata only.
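
A minimal sketch of that conversion, assuming the filesystem can be
mounted read-write again (here at /mnt); it is an online balance with
a metadata convert filter:

    # sketch: convert existing metadata block groups to the DUP profile
    btrfs balance start -mconvert=dup /mnt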

Anyway, it's a tedious search.

As for Btrfs getting better at handling these kinds of cases: yeah,
it's a valid question. What we know about other file systems is that
they can become unrepairable because they don't detect corruption soon
enough. Whereas Btrfs has detected a problem early on, yet it's still
damaged enough now that effectively you can no longer mount it rw.
From a data integrity point of view, at least you can ro mount and get
your data off the volume with a normal file copy operation, not
something that's certain with other file systems.
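
As a sketch, that recovery path is just an ordinary read-only mount
plus a normal copy; the destination and the rsync flags below are only
an example:

    # sketch: /backup is a placeholder destination with enough free space
    mount -o ro /dev/sda12 /mnt
    rsync -aHAX /mnt/ /backup/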

If you were to try another file system, I'd look at XFS; tools and
kernels from the past couple of years support metadata checksumming
with the V5 format.
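
For reference, recent xfsprogs versions default to the V5 (CRC) format
anyway; making it explicit would look roughly like the sketch below,
with the device name as a placeholder:

    mkfs.xfs -m crc=1 /dev/sdXN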


-- 
Chris Murphy


* Re: bad key ordering - repairable?
  2018-01-23 12:51   ` Austin S. Hemmelgarn
  2018-01-23 13:29     ` Claes Fransson
@ 2018-01-24  0:44     ` Chris Murphy
  2018-01-24 12:30       ` Austin S. Hemmelgarn
  1 sibling, 1 reply; 17+ messages in thread
From: Chris Murphy @ 2018-01-24  0:44 UTC (permalink / raw)
  To: Austin S. Hemmelgarn; +Cc: Chris Murphy, Claes Fransson, Btrfs BTRFS

On Tue, Jan 23, 2018 at 5:51 AM, Austin S. Hemmelgarn
<ahferroin7@gmail.com> wrote:

> This is extremely important to understand.  BTRFS and ZFS are essentially
> the only filesystems available on Linux that actually validate things enough
> to notice this reliably (ReFS on Windows probably does, and I think whatever
> Apple is calling their new FS does too).

ReFS always checksums metadata, and can optionally checksum data.

APFS is really vague on this front; it may be checksumming metadata,
but it's not checksumming data, and there is no option to. Apple's
position is that their branded storage devices do not return bogus
data. OK, so then why checksum the metadata?

>Even if ext4 did notice it, it
> would just mark the filesystem for a check and then keep going without doing
> anything else about it (seriously, the default behavior for internal errors
> on ext4 is to just continue like nothing happened and mark the FS for fsck).

I haven't used ext4 with metadata checksumming enabled, and have no
idea how it behaves when it starts encountering checksum errors during
normal use. For sure XFS will complain a lot and will go read only
when it gets confused. I'd expect any file system going to the trouble
of checksumming would have to have some means of bailing out, rather
than just continuing on.
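
For anyone who wants to test that, enabling ext4 metadata checksumming
at mkfs time is roughly the following sketch (placeholder device,
e2fsprogs 1.43 or newer assumed):

    mkfs.ext4 -O metadata_csum /dev/sdXN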

Btrfs (and maybe ZFS) COW everything except supers. So ostensibly a
future feature might let them continue on with a kind of
integrated/single volume variation on seed/sprout device. I'd like to
see something like this just for undoable and testable offline
repairs, rather than offline repair only being predicated on
overwriting metadata.

-- 
Chris Murphy


* Re: bad key ordering - repairable?
  2018-01-24  0:44     ` Chris Murphy
@ 2018-01-24 12:30       ` Austin S. Hemmelgarn
  2018-01-24 23:54         ` Chris Murphy
  0 siblings, 1 reply; 17+ messages in thread
From: Austin S. Hemmelgarn @ 2018-01-24 12:30 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Claes Fransson, Btrfs BTRFS

On 2018-01-23 19:44, Chris Murphy wrote:
> On Tue, Jan 23, 2018 at 5:51 AM, Austin S. Hemmelgarn
> <ahferroin7@gmail.com> wrote:
> 
>> This is extremely important to understand.  BTRFS and ZFS are essentially
>> the only filesystems available on Linux that actually validate things enough
>> to notice this reliably (ReFS on Windows probably does, and I think whatever
>> Apple is calling their new FS does too).
> 
> ReFS always checksums metadata, optionally can checksum data.
Good to know, I've not actually dealt with ReFS myself yet (we're mostly 
a Linux shop where I work, and the two Windows servers we do have aren't 
using ReFS simply because it wasn't beyond the technology preview level 
when we installed them and we don't want to screw anything up).
> 
> APFS is really vague on this front, it may be checksumming metadata,
> it's not checksumming data and with no option to. Apple proposes their
> branded storage devices do not return bogus data. OK so then why
> checksum the metadata?
Even aside from the fact that it might be checksumming data, Apple's 
storage engineers are still smoking something pretty damn strong if they 
think that they can claim their storage devices _never_ return bogus 
data.  Either they're running some kind of checksumming _and_ 
replication below the block layer in the storage device itself (which 
actually might explain the insane cost of at least one piece of their 
hardware), or they think they've come up with some fail-safe way to 
detect corruption and return errors reliably, and in either case things 
can still fail.  I smell a potential future lawsuit in the works...
> 
>> Even if ext4 did notice it, it
>> would just mark the filesystem for a check and then keep going without doing
>> anything else about it (seriously, the default behavior for internal errors
>> on ext4 is to just continue like nothing happened and mark the FS for fsck).
> 
> I haven't used ext4 with metadata checksumming enabled, and have no
> idea how it behaves when it starts encountering checksum errors during
> normal use. For sure XFS will complain a lot and will go read only
> when it gets confused. I'd expect any file system going to the trouble
> of checksumming would have to have some means of bailing out, rather
> than just continuing on.
Actually, I forgot about the (newer) metadata checksumming feature in 
ext4, and was just basing my statement on behavior the last time I used 
it for anything serious.  Having just checked mkfs.ext4, it appears that 
the metadata in the SB that tells the kernel what to do when it runs 
into an error for the FS still defaults to continuing on as if nothing 
happens, even if you enable metadata checksumming (which still seems to 
be disabled by default).  Whether or not that actually is honored by 
modern kernels, I don't know, but I've seen no evidence to suggest that 
it isn't.
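
For reference, the current setting and the checksumming feature can be
inspected and changed with e2fsprogs, roughly like this (device name is
a placeholder):

# tune2fs -l /dev/sdXn | grep -i 'errors behavior'  # usually reports "Continue"
# tune2fs -e remount-ro /dev/sdXn                   # or panic; the default is continue
# mkfs.ext4 -O metadata_csum /dev/sdXn              # enable metadata checksums at mkfs time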
> 
> Btrfs (and maybe ZFS) COW everything except supers. So ostensibly a
> future feature might let them continue on with a kind of
> integrated/single volume variation on seed/sprout device. I'd like to
> see something like this just for undoable and testable offline
> repairs, rather than offline repair only being predicated on
> overwriting metadata.
Agreed.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: bad key ordering - repairable?
  2018-01-24  0:31       ` Chris Murphy
@ 2018-01-24 19:44         ` Claes Fransson
  2018-01-24 23:15           ` Duncan
       [not found]           ` <CAEY8F1pVrZnf3M6mGJaxogx14ZrJ5CV3++_-y13sTniJ3ds4ww@mail.gmail.com>
  0 siblings, 2 replies; 17+ messages in thread
From: Claes Fransson @ 2018-01-24 19:44 UTC (permalink / raw)
  Cc: Btrfs BTRFS

On Jan 24, 2018 01:31, "Chris Murphy" <lists@colorremedies.com> wrote:

On Tue, Jan 23, 2018 at 11:13 AM, Claes Fransson
<claes.v.fransson@gmail.com> wrote:

> I haven't noticed before that there are actually RAM modules from
> different vendors in the laptop. One 8GB by Samsung, and one 4GB by
> Kingston!

If they have the correct tolerances, I don't think it's a problem.
Some memory controllers use a kind of interleaving if the module sizes
are the same, so worst case you might be leaving a bit of a
performance improvement on the table because they aren't the same
size.

If the memory testing doesn't pan out, you could go down a bit of a
rabbit hole and run each module in production for twice the length of
time in which you'd expect a corruption to appear.


So, I have now some results from the PassMark Memtest86! I let the
default automatic tests run for about 19 hours and 16 passes. It
reported zero "Errors", but 4 lines of "[Note] RAM may be vulnerable
to high frequency row hammer bit flips". If I understand it correctly,
it means that some errors were detected when the RAM was tested at
higher rates than guaranteed accurate by the vendors. I am not sure
what that may indicate regarding the performance of the RAM for my
Btrfs filesystem. I "only" got irreparable corruptions maybe once
every couple of months or half a year.

I also forgot to mention that I have been trying Zswap for the last
couple of months, with OpenSUSE on the Btrfs filesystem (and also
Fedora on the Ext4 partition). Maybe that is a source of the last
corruption (I am pretty sure I was not using Zswap during previous
corruptions, of which I think at least one reported "transid verify
failed" or similar). Sometimes, but not when the filesystem went
readonly, the computer has been freezing almost completely (mouse
pointer moving only extremely slowly) when running out of RAM in the
last months. I have sometimes waited many hours for the operating
system to swap out less important memory to the swap partition, but
ended up having to force a reboot. I suspect that Zswap might not be
working optimally; maybe it also affects Btrfs? I have used pretty low
swappiness values, 1 or 10.
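
If it helps, I understand zswap can be toggled at runtime to rule it
out; something along these lines (as root):

# cat /sys/module/zswap/parameters/enabled        # Y if zswap is currently active
# echo 0 > /sys/module/zswap/parameters/enabled   # disable it without rebooting
# sysctl vm.swappiness=10                         # the swappiness value mentioned above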

I might try using only one of the RAM modules in the future if nothing
else works. I usually use most of my available 12 GB RAM though (and
often even more :) ) when using my laptop.



> I also found that there indeed was a new firmware version for my
> SSD disk, so I have now updated its firmware to the newest version.
> Unfortunately I couldn't find any information about what issues it
> was supposed to fix. The laptop already has the latest BIOS version
> provided by ASUS for the model.

I don't know enough about the bad key ordering error and its cause. If
that corruption can happen only in memory then the SSD firmware update
may change nothing. If there's some possibility the corruption can be
the result of SSD firmware bugs, then it might make sense to use DUP
metadata in the short term, even on an SSD. Any memory corruption
would affect both copies. Any SSD induced corruption *might* affect
both copies, depending on whether the SSD deduplicates or colocates
the two copies of metadata...but I'd like to think that there's at
least a pretty decent chance one of the copies would be good in which
case you'd get Btrfs self-healing for metadata only.

Thanks, I might try metadata DUP in the future.
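
For reference, converting existing metadata to DUP is done with a
balance filter; a rough sketch, with the mountpoint as a placeholder:

# btrfs balance start -mconvert=dup /mnt
# btrfs filesystem df /mnt   # should afterwards show "Metadata, DUP"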

Anyway, it's a tedious search.

As for Btrfs getting better at handling these kinds of cases: yeah,
it's a valid question. What we know about other file systems is that
they can become unrepairable because they don't detect corruption soon
enough. Whereas Btrfs has detected a problem early on, yet it's still
damaged enough now that effectively you can no longer mount it rw.
From a data integrity point of view, at least you can ro mount and get
your data off the volume with a normal file copy operation, not
something that's certain with other file systems.

If you were to try another file system, I'd look at XFS; tools and
kernels from the past couple of years support metadata checksumming
with the V5 format.
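
A rough sketch of that, with a placeholder device and mountpoint
(recent mkfs.xfs versions enable it by default):

# mkfs.xfs -m crc=1 /dev/sdXn      # V5 format with metadata checksums
# xfs_info /mnt | grep crc         # verify crc=1 on a mounted filesystem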


Yes, XFS should also have deduplication as an experimental feature.
I don't know how stable it is yet; I might try it. In the future it is
also supposed to get a snapshot feature.

Thanks for all your tips and thoughts.

Claes





^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: bad key ordering - repairable?
  2018-01-24 19:44         ` Claes Fransson
@ 2018-01-24 23:15           ` Duncan
       [not found]           ` <CAEY8F1pVrZnf3M6mGJaxogx14ZrJ5CV3++_-y13sTniJ3ds4ww@mail.gmail.com>
  1 sibling, 0 replies; 17+ messages in thread
From: Duncan @ 2018-01-24 23:15 UTC (permalink / raw)
  To: linux-btrfs

Claes Fransson posted on Wed, 24 Jan 2018 20:44:33 +0100 as excerpted:

> So, I have now some results from the PassMark Memtest86! I let the
> default automatic tests run for about 19 hours and 16 passes. It
> reported zero "Errors", but 4 lines of "[Note] RAM may be vulnerable to
> high frequency row hammer bit flips". If I understand it correctly,
> it means that some errors were detected when the RAM was tested at
> higher rates than guaranteed accurate by the vendors.

From Wikipedia:

Row hammer (also written as rowhammer) is an unintended side effect in 
dynamic random-access memory (DRAM) that causes memory cells to leak 
their charges and interact electrically between themselves, possibly 
altering the contents of nearby memory rows that were not addressed in 
the original memory access. This circumvention of the isolation between 
DRAM memory cells results from the high cell density in modern DRAM, and 
can be triggered by specially crafted memory access patterns that rapidly 
activate the same memory rows numerous times.[1][2][3]

The row hammer effect has been used in some privilege escalation computer 
security exploits.

https://en.wikipedia.org/wiki/Row_hammer

So it has nothing to do with (generic) testing the RAM at higher rates 
than guaranteed by the vendors, but rather, with deliberate rapid 
repeated access (at normal clock rates) of the same cell rows in order 
to trigger a bitflip in nearby memory cells that could not normally be 
accessed due to process separation and insufficient privileges.

IOW, it's unlikely to be accidentally tripped, and thus is exceedingly 
unlikely to be relevant here, unless you're being hacked, of course.


That said, and entirely unrelated to rowhammer, I know the problem of 
memory-test false negatives from experience.

In my case, I was even running ECC RAM.  But the memory I had purchased 
(back in the day when memory was far more expensive and sub-GB memory was 
the norm) was cheap, and as it happened, marked as stable at slightly 
higher clock rates than it actually was.  But I couldn't afford more (or 
I'd have procured less dodgy RAM in the first place) and had little 
recourse but to live with it for awhile.  A year or so later there was a 
BIOS update that added better memory clocking control, and I was able to 
declock the RAM slightly from its rating (IIRC to PC-3000 level, it was 
PC3200 rated, this was DDR1 era), after which it was /entirely/ stable, 
even after reducing some of the wait-state settings somewhat to try to 
claw back some of what I lost due to the underclocking.

I run gentoo, and nearly all of my problems occurred when I was doing 
updates, building packages at 100% CPU with multiple cores accessing the 
same RAM.  FWIW, the most frequent /detected/ problem was bunzip checksum 
errors as it decompressed and verified the data in memory (before writing 
out)... that would move or go away if I tried again.  Occasionally I'd 
get machine-check errors (MCEs), but not frequently, and the ECC RAM 
subsystem /never/ reported errors.

But the memory tests gave that memory an all-clear.

The problem with the memory tests in this case is that they tend to work 
on an otherwise unloaded system, and test the retention of the memory 
cells, /not/ so much the speed and reliability at which they are accessed 
under fully loaded system stress -- and how could they when memory speed 
is normally set by the BIOS and not something the memory tester has 
access to?

But my memory problems weren't with the memory cells themselves -- they 
retained their data just fine and indeed it was ECC RAM so would have 
triggered ECC errors if they didn't -- but with the precision timing of 
memory IO -- it wasn't quite up to the specs it claimed to support and 
would occasionally produce in-transit errors (the ECC would have detected 
and possibly corrected errors in storage), and the memory testers simply 
didn't test that like a fully loaded system doing unpacks of sources and 
builds from them did.

As mentioned, once I got a BIOS update that let me declock the RAM a bit, 
everything was fine, and it remained fine when I did upgrade the RAM some 
years later, after prices had fallen, as well.

(The system was first-gen AMD Opteron, on a server-grade Tyan board, that 
I ran from purchase in late 2003 for over eight years, maxing out the 
pair of CPUs to dual-core Opteron 290s and the RAM to 8 gigs, over time, 
until the board finally died in 2012 due to burst capacitors.  Which 
reminds me, I'm still running the replacement, a Gigabyte with an fx6100 
overclocked a bit to 3.9 GHz and 16 gig RAM, and it's now nearing six 
years old, so I suppose I better start planning for the next upgrade...  
I've spent that six years upgrading to big-screen TVs as monitors, with a 
65inch/165cm 4K as my primary now and a 48inch/122cm as a secondary to 
put youtube or whatever on fullscreen, and to now my second generation of 
ssds, a pair of 1 TB samsung evos, but this reminds me that at nearing 
six years old the main system's aging too, so I better start thinking of 
replacing it again...)

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: bad key ordering - repairable?
  2018-01-24 12:30       ` Austin S. Hemmelgarn
@ 2018-01-24 23:54         ` Chris Murphy
  2018-01-25 12:41           ` Austin S. Hemmelgarn
  0 siblings, 1 reply; 17+ messages in thread
From: Chris Murphy @ 2018-01-24 23:54 UTC (permalink / raw)
  To: Austin S. Hemmelgarn; +Cc: Chris Murphy, Claes Fransson, Btrfs BTRFS

On Wed, Jan 24, 2018 at 5:30 AM, Austin S. Hemmelgarn
<ahferroin7@gmail.com> wrote:

>> APFS is really vague on this front, it may be checksumming metadata,
>> it's not checksumming data and with no option to. Apple proposes their
>> branded storage devices do not return bogus data. OK so then why
>> checksum the metadata?
>
> Even aside from the fact that it might be checksumming data, Apple's storage
> engineers are still smoking something pretty damn strong if they think that
> they can claim their storage devices _never_ return bogus data.  Either
> they're running some kind of checksumming _and_ replication below the block
> layer in the storage device itself (which actually might explain the insane
> cost of at least one piece of their hardware), or they think they've come up
> with some fail-safe way to detect corruption and return errors reliably, and
> in either case things can still fail.  I smell a potential future lawsuit in
> the works.


I read somewhere the hardware (or more correctly their flash firmware)
supposedly uses 128 bytes of checksum per 4KB data. That's a lot, I
wonder if it's actually some kind of parity. But regardless, this kind
of in-hardware checksumming won't account for things like misdirected
or torn writes or literally any sort of corruption happening prior to
the flash firmware computing those checksums.

On flash storage, maybe they're just concerned about bit rot or even
the most superficial bit flips, and having just enough information to
detect and correct for 1 or 2 flips per 4KB, not totally dissimilar to
ECC memory. But the fact that they don't use ECC memory leaves them open
to corruption in the storage stack that happens outside the literal
storage device.


> Actually, I forgot about the (newer) metadata checksumming feature in ext4,
> and was just basing my statement on behavior the last time I used it for
> anything serious.  Having just checked mkfs.ext4, it appears that the
> metadata in the SB that tells the kernel what to do when it runs into an
> error for the FS still defaults to continuing on as if nothing happens, even
> if you enable metadata checksumming (which still seems to be disabled by
> default).  Whether or not that actually is honored by modern kernels, I
> don't know, but I've seen no evidence to suggest that it isn't.


Depending on the corruption, Btrfs continues as well. If I corrupt a
deadend leaf that contains file metadata (like names or security
contexts), I just get some complaints of corruption. The file system
remains rw mounted though. I don't know the metric by which metadata
can be damaged and Btrfs says "whoooaa!!" and puts on the brakes by
going read only. XFS certainly has its limits and goes read only when
it detects certain metadata corruption via checksum fail. I'd guess
ext4 will do the same thing, otherwise what's the point if it's going
to knowingly eat itself alive?


-- 
Chris Murphy

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: bad key ordering - repairable?
  2018-01-24 23:54         ` Chris Murphy
@ 2018-01-25 12:41           ` Austin S. Hemmelgarn
  0 siblings, 0 replies; 17+ messages in thread
From: Austin S. Hemmelgarn @ 2018-01-25 12:41 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Claes Fransson, Btrfs BTRFS

On 2018-01-24 18:54, Chris Murphy wrote:
> On Wed, Jan 24, 2018 at 5:30 AM, Austin S. Hemmelgarn
> <ahferroin7@gmail.com> wrote:
> 
>>> APFS is really vague on this front, it may be checksumming metadata,
>>> it's not checksumming data and with no option to. Apple proposes their
>>> branded storage devices do not return bogus data. OK so then why
>>> checksum the metadata?
>>
>> Even aside from the fact that it might be checksumming data, Apple's storage
>> engineers are still smoking something pretty damn strong if they think that
>> they can claim their storage devices _never_ return bogus data.  Either
>> they're running some kind of checksumming _and_ replication below the block
>> layer in the storage device itself (which actually might explain the insane
>> cost of at least one piece of their hardware), or they think they've come up
>> with some fail-safe way to detect corruption and return errors reliably, and
>> in either case things can still fail.  I smell a potential future lawsuit in
>> the works.
> 
> 
> I read somewhere the hardware (or more correctly their flash firmware)
> supposedly uses 128 bytes of checksum per 4KB data. That's a lot, I
> wonder if it's actually some kind of parity. But regardless, this kind
> of in-hardware checksumming won't account for things like misdirected
> or torn writes or literally any sort of corruption happening prior to
> the flash firmware computing those checksums.
It's most likely more generic erasure coding (parity as most people 
think of it in the storage sense, i.e. RAID5 and RAID6, is a special 
case of (n, n-1) or (n, n-2) erasure coding that happens to be 
optimal). In theory they could correct up to 1024 bits of errors, which 
is all well and good, but as you say it doesn't really protect against 
much (more specifically, it only protects reliably against cell 
discharges from various sources, or more generic read-disturb errors).
> 
> On flash storage, maybe they're just concerned about bit rot or even
> the most superficial bit flips, and having just enough information to
> detect and correct for 1 or 2 flips per 4KB, not totally dissimilar to
> ECC memory. But the fact that they don't use ECC memory leaves them open
> to corruption in the storage stack that happens outside the literal
> storage device.
They also don't appear to use T.10 DIF (or the T.13 equivalent, whose 
name I can never remember), which means that even if they did use ECC 
RAM they would still have a period of time where the data is 
unprotected.
> 
>> Actually, I forgot about the (newer) metadata checksumming feature in ext4,
>> and was just basing my statement on behavior the last time I used it for
>> anything serious.  Having just checked mkfs.ext4, it appears that the
>> metadata in the SB that tells the kernel what to do when it runs into an
>> error for the FS still defaults to continuing on as if nothing happens, even
>> if you enable metadata checksumming (which still seems to be disabled by
>> default).  Whether or not that actually is honored by modern kernels, I
>> don't know, but I've seen no evidence to suggest that it isn't.
> 
> 
> Depending on the corruption, Btrfs continues as well. If I corrupt a
> deadend leaf that contains file metadata (like names or security
> contexts), I just get some complaints of corruption. The file system
> remains rw mounted though. I don't know the metric by which metadata
> can be damaged and Btrfs says "whoooaa!!" and puts on the brakes by
> going read only. XFS certainly has its limits and goes read only when
> it detects certain metadata corruption via checksum fail. I'd guess
> ext4 will do the same thing, otherwise what's the point if it's going
> to knowingly eat itself alive?
I'm pretty sure the ext4 behavior is a hold-over from the original ext 
filesystem, and I think it goes even as far back as the version of the 
MINIX filesystem that Linux originally used (which ext evolved out of).  
At a minimum, all three error behaviors (panic, go read-only, or flag 
and ignore) have been around since the early days of ext2.

FWIW, there are some cases where it does make sense to just not care and 
ignore the errors.  As a pretty specific example, one of the last 
remaining places I still use ext4 is on top of compressed ramdisks when 
I need some quick ephemeral storage that I want to be more memory 
efficient than tmpfs.  In such cases, the FS gets mounted exactly once, 
and is usually used only for a very short period of time, and as a 
result, the 'on-disk' data doesn't really matter much, so there's not 
much point in worrying about it.
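
A concrete sketch of one way to set that up (using zram; size, device
and mountpoint are just examples):

# modprobe zram num_devices=1
# echo 4G > /sys/block/zram0/disksize
# mkfs.ext4 -O ^has_journal /dev/zram0   # no point journalling throwaway data
# mkdir -p /mnt/scratch && mount /dev/zram0 /mnt/scratch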

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: bad key ordering - repairable?
  2018-01-23 13:06   ` Claes Fransson
  2018-01-23 18:13     ` Claes Fransson
@ 2018-01-27 14:54     ` Claes Fransson
  1 sibling, 0 replies; 17+ messages in thread
From: Claes Fransson @ 2018-01-27 14:54 UTC (permalink / raw)
  To: Btrfs BTRFS

2018-01-23 14:06 GMT+01:00 Claes Fransson <claes.v.fransson@gmail.com>:
> 2018-01-22 22:22 GMT+01:00 Hugo Mills <hugo@carfax.org.uk>:
>> On Mon, Jan 22, 2018 at 10:06:58PM +0100, Claes Fransson wrote:
>>> Hi!
>>>
>>> I really like the features of BTRFS, especially deduplication,
>>> snapshotting and checksumming. However, when using it on my laptop the
>>> last couple of years, it has became corrupted a lot of times.
>>> Sometimes I have managed to fix the problems (at least so much that I
>>> can continue to use the filesystem) with check --repair, but several
>>> times I had to recreate the file system and reinstall the operating
>>> system.
>>>
>>> I am guessing the corruptions might be the results of unclean
>>> shutdowns, mostly after system hangs, but also because of running out
>>> of battery sometimes?
>>> Furthermore, the power-led has recently started blinking (also when
>>> the power-cable is plugged in), I guess because of an old and bad
>>> battery. Maybe the current corruption also can have something to do
>>> with this? However I almost always run with power cable plugged in in
>>> last year, only on battery a few seconds a few times when moving the
>>> laptop.
>>>
>>> Currently, I can only mount the filesystem readonly, it goes readonly
>>> automatically if I try to mount it normally.
>>>
>>> When booting an OpenSUSE Tumbleweed-20180119 live-iso:
>>> localhost:~ # uname -r
>>> 4.14.13-1-default
>>> localhost:~ # btrfs --version
>>> btrfs-progs v4.14.1
>>>
>>> localhost:~ # btrfs check -p /dev/sda12
>>> Checking filesystem on /dev/sda12
>>
>> [fixing up bad paste]
>>
>>> UUID: d2819d5a-fd69-484b-bf34-f2b5692cbe1f
>>> bad key ordering 159 160 bad block 690436964352
>>> ERROR: errors found in extent allocation tree or chunk allocation
>>> checking free space cache [.]
>>> checking fs roots [o]
>>> checking csums
>>> bad key ordering 159 160
>>> Error looking up extent record -1
>>
>> [snip]
>>
>>> localhost:~ # btrfs inspect-internal dump-tree -b 690436964352
>>> /dev/sda12
>>> btrfs-progs v4.14.1
>>>     leaf 690436964352 items 170 free space 1811 generation 196864 owner 2
>>>     leaf 690436964352 flags 0x1(WRITTEN) backref revision 1
>>>     fs uuid d2819d5a-fd69-484b-bf34-f2b5692cbe1f
>>>     chunk uuid 52f81fe6-893b-4432-9336-895057ee81e1
>>> .
>>> .
>>> .
>>>         item 157 key (22732500992 EXTENT_ITEM 16384) itemoff 6538 itemsize 53
>>>                 refs 1 gen 821 flags DATA
>>>                 extent data backref root 287 objectid 51665 offset 0 count 1
>>>         item 158 key (22732517376 EXTENT_ITEM 16384) itemoff 6485 itemsize 53
>>>                 refs 1 gen 821 flags DATA
>>>                 extent data backref root 287 objectid 51666 offset 0 count 1
>>>         item 159 key (22732533760 EXTENT_ITEM 16384) itemoff 6485 itemsize 0
>>> print-tree.c:428: print_extent_item: BUG_ON `item_size != sizeof(*ei0)` triggered, value 1
>>> btrfs(+0x365c6)[0x55bdfaada5c6]
>>> btrfs(print_extent_item+0x424)[0x55bdfaadb284]
>>> btrfs(btrfs_print_leaf+0x94e)[0x55bdfaadbc1e]
>>> btrfs(btrfs_print_tree+0x295)[0x55bdfaadcf05]
>>> btrfs(cmd_inspect_dump_tree+0x734)[0x55bdfab1b024]
>>> btrfs(main+0x7d)[0x55bdfaac7d4d]
>>> /lib64/libc.so.6(__libc_start_main+0xea)[0x7ff42100ff4a]
>>> btrfs(_start+0x2a)[0x55bdfaac7e5a]
>>> Aborted (core dumped)
>>
>>    Wow, I've never seen it do that before. It's the next thing I'd
>> have asked for, so it's good you've preempted it.
>>
>>    The main thing is that bad key ordering is almost always due to RAM
>> corruption. That's either bad RAM, or dodgy power regulation -- the
>> latter could be the PSU, or capacitors on the motherboard. (In this
>> case, it might also be something funny with the battery).
>>
>>    I would definitely recommend a long run of memtest86. At least 8
>> hours, preferably 24. If you get errors repeatedly in the same place,
>> it's the RAM. If they appear randomly, it's probably the power
>> regulation.
>>
> Thanks for the suggestion, I will try to do this in the next days.
>
>> [snip]
>>
>>>
>>> The filesystem had become pretty full, I had planned to increase the
>>> Btrfs-partition size before it became corrupt.
>>>
>>> Active kernel when the filesystem went read only: OpenSUSE Linux
>>> 4.14.14-1.geef6178-default, from the
>>> http://download.opensuse.org/repositories/Kernel:/stable/standard/stable
>>> repository.
>>>
>>> Fstab mount options: noatime,autodefrag (I have been using the option
>>> nossd with older kernels one period in the past on the filesystem).
>>>
>>> If it matters, I have been running duperemove many times on the
>>> filesystem since creation.
>>>
>>> To test the RAM, I have been running mprime Blend-test for 24 hours
>>> after the corruption without any error or warning.
>>
>>    Of all of the bad key order errors I've seen (dozens), I think
>> there were a whole two which turned out not to be obviously related to
>> corrupt RAM. I still say that it's most likely the hardware.
>
> Okay, thank you for sharing your experience with me.
>
>>
>>> Is there a way I can try to repair this filesystem without the need to
>>> recreate it and reinstall the operating system? A reinstall including
>>> all currently installed packages, and restoring all current system
>>> settings, would probably take some time for me to do.
>>> If it is currently not repairable, it would be nice if this kind of
>>> corruption could be repaired in the future, even if losing a few
>>> files. Or if the corruptions could be avoided in the first place.
>>
>>    Given that the current tools crash, the answer's a definite
>> no. However, if you can get a developer interested, they may be able
>> to write a fix for it, given an image of the FS (using btrfs-image).
>>
> Okay, will try to produce and upload an image within the next week.
>
>

I have now uploaded a btrfs-image of the file system to the cloud:
https://drive.google.com/file/d/1r2nesQy_W4wVb00BdZc5o2wqS8mHtOkK/view?usp=sharing
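
For anyone wanting to reproduce this: such an image is created with
btrfs-image, for example (output filename arbitrary; -c sets the
compression level and -t the number of threads):

# btrfs-image -c 9 -t 4 /dev/sda12 sda12.btrfs-image

Adding -s (or -ss) would additionally sanitize file names if needed.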


Claes


>> [snip]
>>> I have never noticed any corruptions on the NTFS and Ext4 file systems
>>> on the laptop, only on the Btrfs file systems.
>>
>>    You've never _noticed_ them. :)
>>
>>    Hugo.
>>
>> --
>> Hugo Mills             | ... one ping(1) to rule them all, and in the
>> hugo@... carfax.org.uk | darkness bind(2) them.
>> http://carfax.org.uk/  |
>> PGP: E2AB1DE4          |                                                Illiad
>
> Thank you for your answers.
>
> Claes

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: bad key ordering - repairable?
       [not found]           ` <CAEY8F1pVrZnf3M6mGJaxogx14ZrJ5CV3++_-y13sTniJ3ds4ww@mail.gmail.com>
@ 2018-01-27 17:42             ` Claes Fransson
  0 siblings, 0 replies; 17+ messages in thread
From: Claes Fransson @ 2018-01-27 17:42 UTC (permalink / raw)
  To: Btrfs BTRFS

2018-01-27 18:32 GMT+01:00 Claes Fransson <claes.v.fransson@gmail.com>:
>
> Duncan Wed, 24 Jan 2018 15:18:25 -0800
>
> Claes Fransson posted on Wed, 24 Jan 2018 20:44:33 +0100 as excerpted:
>
> > So, I have now some results from the PassMark Memtest86! I let the
> > default automatic tests run for about 19 hours and 16 passes. It
> > reported zero "Errors", but 4 lines of "[Note] RAM may be vulnerable to
> > high frequency row hammer bit flips". If I understand it correctly,
> > it means that some errors were detected when the RAM was tested at
> > higher rates than guaranteed accurate by the vendors.
>
> From Wikipedia:
>
>> Row hammer (also written as rowhammer) is an unintended side effect in
>> dynamic random-access memory (DRAM) that causes memory cells to leak
>> their charges and interact electrically between themselves, possibly
>> altering the contents of nearby memory rows that were not addressed in
>> the original memory access. This circumvention of the isolation between
>> DRAM memory cells results from the high cell density in modern DRAM, and
>> can be triggered by specially crafted memory access patterns that rapidly
>> activate the same memory rows numerous times.[1][2][3]
>>
>> The row hammer effect has been used in some privilege escalation computer
>> security exploits.
>>
>> https://en.wikipedia.org/wiki/Row_hammer
>>
>> So it has nothing to do with (generic) testing the RAM at higher rates
>> than guaranteed by the vendors, but rather, with deliberate rapid
>> repeated access (at normal clock rates) of the same cell rows in order
>> to trigger a bitflip in nearby memory cells that could not normally be
>> accessed due to process separation and insufficient privileges.
>
>
Well, I was thinking of the specific error message from Memtest86.
See the PassMark website,
https://www.memtest86.com/troubleshooting.htm, under "Why am I only
getting errors during Test 13 Hammer Test?", second paragraph.
Thanks for the Wikipedia explanation though.
>
>> IOW, it's unlikely to be accidentally tripped, and thus is exceedingly
>> unlikely to be relevant here, unless you're being hacked, of course.
>
>
Okay, thanks for your conclusion.
>
>>
> That said, and entirely unrelated to rowhammer, I know the problem of
> memory-test false negatives from experience.
>
> In my case, I was even running ECC RAM.  But the memory I had purchased
> (back in the day when memory was far more expensive and sub-GB memory was
> the norm) was cheap, and as it happened, marked as stable at slightly
> higher clock rates than it actually was.  But I couldn't afford more (or
> I'd have procured less dodgy RAM in the first place) and had little
> recourse but to live with it for awhile.  A year or so later there was a
> BIOS update that added better memory clocking control, and I was able to
> declock the RAM slightly from its rating (IIRC to PC-3000 level, it was
> PC3200 rated, this was DDR1 era), after which it was /entirely/ stable,
> even after reducing some of the wait-state settings somewhat to try to
> claw back some of what I lost due to the underclocking.
>
> I run gentoo, and nearly all of my problems occurred when I was doing
> updates, building packages at 100% CPU with multiple cores accessing the
> same RAM.  FWIW, the most frequent /detected/ problem was bunzip checksum
> errors as it decompressed and verified the data in memory (before writing
> out)... that would move or go away if I tried again.  Occasionally I'd
> get machine-check errors (MCEs), but not frequently, and the ECC RAM
> subsystem /never/ reported errors.
>
My filesystem went readonly just after I updated a lot of packages (I
think it was thousands of packages :) ), so massive disk I/O for me,
but possibly also some CPU and RAM load...
>
>> But the memory tests gave that memory an all-clear.
>
>
>>> The problem with the memory tests in this case is that they tend to work
>>> on an otherwise unloaded system, and test the retention of the memory
>>> cells, /not/ so much the speed and reliability at which they are accessed
>>> under fully loaded system stress -- and how could they when memory speed
>>> is normally set by the BIOS and not something the memory tester has
>>> access to?
>>>
>>> But my memory problems weren't with the memory cells themselves -- they
>>> retained their data just fine and indeed it was ECC RAM so would have
>>> triggered ECC errors if they didn't -- but with the precision timing of
>>> memory IO -- it wasn't quite up to the specs it claimed to support and
>>> would occasionally produce in-transit errors (the ECC would have detected
>>> and possibly corrected errors in storage), and the memory testers simply
>>> didn't test that like a fully loaded system doing unpacks of sources and
>>> builds from them did.
>>>
>>> As mentioned, once I got a BIOS update that let me declock the RAM a bit,
>>> everything was fine, and it remained fine when I did upgrade the RAM some
>>> years later, after prices had fallen, as well.
>
>
Thanks for sharing, but unfortunately I do not have any setting to
change the clocking of the RAM on my laptop when booting into the
BIOS settings menus.

Claes
>
>> (The system was first-gen AMD Opteron, on a server-grade Tyan board, that
>> I ran from purchase in late 2003 for over eight years, maxing out the
>>
>> pair of CPUs to dual-core Opteron 290s and the RAM to 8 gigs, over time,
>> until the board finally died in 2012 due to burst capacitors.  Which
>> reminds me, I'm still running the replacement, a Gigabyte with an fx6100
>> overclocked a bit to 3.9 GHz and 16 gig RAM, and it's now nearing six
>> years old, so I suppose I better start planning for the next upgrade...
>> I've spent that six years upgrading to big-screen TVs as monitors, with a
>> 65inch/165cm 4K as my primary now and a 48inch/122cm as a secondary to
>> put youtube or whatever on fullscreen, and to now my second generation of
>> ssds, a pair of 1 TB samsung evos, but this reminds me that at nearing
>> six years old the main system's aging too, so I better start thinking of
>> replacing it again...)
>>
>> --
>> Duncan - List replies preferred.   No HTML msgs.
>> "Every nonfree program has a lord, a master --
>> and if you use the program, he is your master."  Richard Stallman
>>
>> --

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2018-01-27 17:42 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-01-22 21:06 bad key ordering - repairable? Claes Fransson
2018-01-22 21:22 ` Hugo Mills
2018-01-23 13:06   ` Claes Fransson
2018-01-23 18:13     ` Claes Fransson
2018-01-24  0:31       ` Chris Murphy
2018-01-24 19:44         ` Claes Fransson
2018-01-24 23:15           ` Duncan
     [not found]           ` <CAEY8F1pVrZnf3M6mGJaxogx14ZrJ5CV3++_-y13sTniJ3ds4ww@mail.gmail.com>
2018-01-27 17:42             ` Claes Fransson
2018-01-27 14:54     ` Claes Fransson
2018-01-23  2:35 ` Chris Murphy
2018-01-23 12:51   ` Austin S. Hemmelgarn
2018-01-23 13:29     ` Claes Fransson
2018-01-24  0:44     ` Chris Murphy
2018-01-24 12:30       ` Austin S. Hemmelgarn
2018-01-24 23:54         ` Chris Murphy
2018-01-25 12:41           ` Austin S. Hemmelgarn
2018-01-23 13:17   ` Claes Fransson

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).