* Block corruption
@ 2004-06-08 13:36 Andrew Snare
  2004-06-08 14:12 ` Vladimir Saveliev
  0 siblings, 1 reply; 14+ messages in thread
From: Andrew Snare @ 2004-06-08 13:36 UTC (permalink / raw)
  To: reiserfs-list; +Cc: Paul Wagland

Hi,

After a few 2.6.6 oopsen and --rebuild-trees on some filesystems, we've
still got some bad-block errors that are proving troublesome. In
particular, we appear to have 3 bad blocks:

vs-5150: search_by_key: invalid format found in block 5963192. Fsck?
vs-13070: reiserfs_read_locked_inode: i/o failure occurred trying to
find stat data of [52570 52653 0x0 SD]
vs-5150: search_by_key: invalid format found in block 5946544. Fsck?
vs-13070: reiserfs_read_locked_inode: i/o failure occurred trying to
find stat data of [46770 46771 0x0 SD]
vs-5150: search_by_key: invalid format found in block 10058201. Fsck?
vs-13070: reiserfs_read_locked_inode: i/o failure occurred trying to
find stat data of [111684 111687 0x0 SD]

It's a hardware RAID-5 configuration that has no errors, so we know
it's not a hardware problem; this points to corruption in one or more of
our 11 reiserfs filesystems.

Is it possible to:

1) Work out which of the 11 filesystems these errors are referring to?
Preferably while the filesystems are mounted. I see a block number, but
several of our filesystems are large enough to have a block number that
big.
2) Work out which directory/file is corrupted? In the past whenever we
did a --rebuild-tree we ended up with quite a lot of stuff in
lost+found, so we'd like to try and work out where it's coming from if
possible.

Thanks in advance,

 - Andrew Snare


* Re: Block corruption
  2004-06-08 13:36 Block corruption Andrew Snare
@ 2004-06-08 14:12 ` Vladimir Saveliev
  0 siblings, 0 replies; 14+ messages in thread
From: Vladimir Saveliev @ 2004-06-08 14:12 UTC (permalink / raw)
  To: Andrew Snare; +Cc: reiserfs-list, Paul Wagland

Hello

On Tue, 2004-06-08 at 17:36, Andrew Snare wrote:
> Hi,
> 
> After a few 2.6.6 oopsen and --rebuild-trees on some filesystems, we've
> still got some bad-block errors that are proving troublesome. In
> particular, we appear to have 3 bad blocks:
> 
> vs-5150: search_by_key: invalid format found in block 5963192. Fsck?
> vs-13070: reiserfs_read_locked_inode: i/o failure occurred trying to
> find stat data of [52570 52653 0x0 SD]
> vs-5150: search_by_key: invalid format found in block 5946544. Fsck?
> vs-13070: reiserfs_read_locked_inode: i/o failure occurred trying to
> find stat data of [46770 46771 0x0 SD]
> vs-5150: search_by_key: invalid format found in block 10058201. Fsck?
> vs-13070: reiserfs_read_locked_inode: i/o failure occurred trying to
> find stat data of [111684 111687 0x0 SD]
> 
> It's a hardware RAID-5 configuration that has no errors, so we know
> it's not a hardware problem; this points to corruption in one or more of
> our 11 reiserfs filesystems.
> 
> Is it possible to:
> 
> 1) Work out which of the 11 filesystems these errors are referring to?
> Preferably while the system is mounted. I see a block number, but
> several of our filesystems are large enough to have a block number that
> big.

Well, reiserfs ought to include the device name in its warnings.
Until that is done, you might want to run

find /mountpoint -inum 52653
for each of your reiserfs filesystems.
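
The per-filesystem search can also be scripted. What follows is only a rough sketch, assuming Python is available on the machine; the function name find_by_inode and the mount points in the usage comment are made up for illustration. It walks one mount point, stays on that filesystem, and collects every path whose st_ino matches.

```python
import os


def find_by_inode(mount_point, inode_number):
    """Return paths below mount_point whose st_ino equals inode_number.

    Entries that live on a different device are skipped, because inode
    numbers are only meaningful within a single filesystem.
    """
    root_dev = os.lstat(mount_point).st_dev
    matches = []
    for dirpath, dirnames, filenames in os.walk(mount_point):
        dir_set = set(dirnames)
        same_fs_dirs = []
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)
            except OSError:
                # The entry cannot be examined, so it cannot be compared.
                continue
            if st.st_dev != root_dev:
                continue
            if st.st_ino == inode_number:
                matches.append(path)
            if name in dir_set:
                same_fs_dirs.append(name)
        # Only descend into directories that are on the same filesystem.
        dirnames[:] = same_fs_dirs
    return matches


# Hypothetical usage, one call per reiserfs mount point:
#   for mp in ["/mnt/fs1", "/mnt/fs2"]:
#       print(mp, find_by_inode(mp, 52653))
```

A non-empty result for exactly one mount point would suggest which filesystem the warning refers to, although a corrupted entry may also simply fail to be examined and so not show up at all.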

> 2) Work out which directory/file is corrupted? In the past whenever we
> did a --rebuild-tree we ended up with quite a lot of stuff in
> lost+found, so we'd like to try and work out where it's coming from if
> possible.
> 

The same find command should find those files.

Even better, find /mountpoint will report "Permission denied" for every
file that is unreachable on a filesystem.

Also, reiserfs may store more than one file in a node, so a few files may
be unreachable.

> Thanks in advance,
> 
>  - Andrew Snare
> 



* Re: Block corruption
@ 2004-06-08 14:31 Andrew Snare
  2004-06-08 15:55 ` Vladimir Saveliev
  0 siblings, 1 reply; 14+ messages in thread
From: Andrew Snare @ 2004-06-08 14:31 UTC (permalink / raw)
  To: vs; +Cc: Paul Wagland, reiserfs-list

Vladimir Saveliev suggested:
> Well, reiserfs needs to include device name into its warnings
> Until it is not done you might want to do
> 
> find /mountpoint -inum 52653 
> for each of your reiserfs filesystem.

That number comes from the vs-13070 messages. They're of the form:
> vs-13070: reiserfs_read_locked_inode: i/o failure occurred trying to
> find stat data of [52570 52653 0x0 SD]

What are these numbers near the end? You've referred to the second as
the inode, but we'd like to understand more clearly what's going on.

Cheers,

 - Andrew


* Re: Block corruption
  2004-06-08 14:31 Block corruption Andrew Snare
@ 2004-06-08 15:55 ` Vladimir Saveliev
  0 siblings, 0 replies; 14+ messages in thread
From: Vladimir Saveliev @ 2004-06-08 15:55 UTC (permalink / raw)
  To: Andrew Snare; +Cc: Paul Wagland, reiserfs-list

Hello

On Tue, 2004-06-08 at 18:31, Andrew Snare wrote:
> Vladimir Saveliev suggested:
> > Well, reiserfs needs to include device name into its warnings
> > Until it is not done you might want to do
> > 
> > find /mountpoint -inum 52653 
> > for each of your reiserfs filesystem.
> 
> That number comes from the vs-13070 messages. They're of the form:
> > vs-13070: reiserfs_read_locked_inode: i/o failure occurred trying to
> > find stat data of [52570 52653 0x0 SD]
> 
> What are these numbers near the end? You've referred to the second as
> the inode, but we'd like to understand more clearly what's going on.
> 
reiserfs builds files out of items. Items are stored in an on-disk tree.
Each file has at least one item, the stat data item. Each item has a key,
and a key has 4 components.
[52570 52653 0x0 SD] is the key of a stat data item. The key of the stat
data was taken from the directory entry.
The first key component is the unique identifier of the directory in which
the file was created.
The second key component is the unique identifier of the file itself. This
is what is reported in the st_ino field of struct stat.
The third key component is the offset within the file. It is 0 for items of
stat data type.
The fourth key component is the item type.
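
As an illustration of the four components, here is a minimal sketch that pulls the key out of a vs-13070 message and labels its parts. The field names (dir_id, object_id, offset, item_type) are labels chosen for this example, and the regular expression only assumes the bracketed form shown in the messages above.

```python
import re
from typing import NamedTuple


class ReiserfsKey(NamedTuple):
    dir_id: int      # identifier of the directory the file was created in
    object_id: int   # identifier of the file itself (reported as st_ino)
    offset: int      # offset within the file; 0 for stat data items
    item_type: str   # item type, e.g. "SD" for stat data


# Matches keys printed as "[52570 52653 0x0 SD]".
KEY_RE = re.compile(r"\[(\d+) (\d+) (0x[0-9a-fA-F]+) (\w+)\]")


def parse_key(message):
    """Extract the first bracketed key from a log message."""
    match = KEY_RE.search(message)
    if match is None:
        raise ValueError("no key found in message")
    dir_id, object_id, offset, item_type = match.groups()
    return ReiserfsKey(int(dir_id), int(object_id), int(offset, 16), item_type)
```

The object_id field of the result is the number to pass to find -inum.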


> Cheers,
> 
>  - Andrew
> 



* block corruption
@ 2019-08-31 18:19 Rann Bar-On
  2019-08-31 20:26 ` Rann Bar-On
  0 siblings, 1 reply; 14+ messages in thread
From: Rann Bar-On @ 2019-08-31 18:19 UTC (permalink / raw)
  To: linux-btrfs

Hi,

I have a btrfs filesystem with a corrupt block. After finding errors
such as 

BTRFS critical (device nvme0n1p6): corrupt leaf: root=5
block=154977681408 slot=19 ino=8708079, invalid mode: has 00 expect
valid S_IF* bit(s)

spamming my dmesg, I examined the block with 

btrfs-debug-tree -b 154977681408 /dev/nvme0n1p6

I found the source of the error in this line:

item 19 key (8708079 INODE_ITEM 0) itemoff 14769 itemsize 160   inode 
generation 292430 transid 292449 size 0 block group 0 mode 0 links 1
uid 0 gid 0 rdev 0 flags 0x0

I then ran 

find <mount point> -type f -exec cp {} /dev/null \;

to look for corrupt files. Almost all files that appeared in the list
also appeared in the output of the btrfs-debug-tree command above.
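
For reference, the same read-everything pass can be sketched in Python so that the failing paths end up in a list instead of scrolling by. This is only an approximation of the find command above; the function name unreadable_files and the chunk size are arbitrary choices for the example.

```python
import os
import stat


def unreadable_files(mount_point, chunk_size=1 << 20):
    """Read every regular file below mount_point and report failures.

    Returns a list of (path, errno) pairs for files that could not be
    examined or read to the end.
    """
    failures = []
    for dirpath, _dirnames, filenames in os.walk(mount_point):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # Mirror "-type f": skip anything that is not a regular file.
                if not stat.S_ISREG(os.lstat(path).st_mode):
                    continue
                with open(path, "rb") as handle:
                    # Discard the data; only the I/O result matters.
                    while handle.read(chunk_size):
                        pass
            except OSError as err:
                failures.append((path, err.errno))
    return failures
```

Note that this only exercises files that can be reached through a directory listing; it says nothing about metadata that is never visited.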

I conclude that this block is corrupt.

Two questions:

1. Given that none of the files in the list are critical, I'd like to
remove the block, or at least the files. Is this possible? How?

2. Is this an indicator of a problem with the drive? smartctl does not
give errors, nor does scrubbing the file system.

More info below:

# uname -a
Linux debian-x1yoga 5.2.0-2-amd64 #1 SMP Debian 5.2.9-2 (2019-08-21)
x86_64 GNU/Linux

#   btrfs --version
Btrfs v3.17

#   btrfs fi show
Label: none  uuid: 91624ec9-49ef-469e-a949-7699dc681c52
        Total devices 1 FS bytes used 13.27GiB
        devid    1 size 19.26GiB used 16.03GiB path /dev/nvme0n1p2

Label: none  uuid: 4133c951-4327-4040-83ed-9e8a71270cc2
        Total devices 1 FS bytes used 6.85GiB
        devid    1 size 9.93GiB used 9.51GiB path /dev/nvme0n1p3

Label: none  uuid: be5b72e2-5cd1-498e-b38c-d83b10548ef3
        Total devices 1 FS bytes used 174.09GiB
        devid    1 size 182.99GiB used 182.99GiB path /dev/nvme0n1p6

Label: none  uuid: bbba344c-b256-4509-ac49-4b69b1a73607
        Total devices 1 FS bytes used 1.93MiB
        devid    1 size 381.00MiB used 381.00MiB path /dev/nvme0n1p5

Btrfs v3.17

# btrfs fi df /home
Data, single: total=180.98GiB, used=173.41GiB
System, single: total=4.00MiB, used=48.00KiB
Metadata, single: total=2.01GiB, used=698.38MiB
GlobalReserve, single: total=512.00MiB, used=0.00B

The only error in dmesg is the one referred to above.

I appreciate any help! 

Cheers,
Rann



* Re: block corruption
  2019-08-31 18:19 block corruption Rann Bar-On
@ 2019-08-31 20:26 ` Rann Bar-On
  2019-08-31 23:04   ` Chris Murphy
  0 siblings, 1 reply; 14+ messages in thread
From: Rann Bar-On @ 2019-08-31 20:26 UTC (permalink / raw)
  To: linux-btrfs

I just downgraded to kernel 4.19, and the supposed corruption vanished.
This may be related to

https://www.spinics.net/lists/linux-btrfs/msg91046.html

If I can provide information that would help fix this issue, I'd be
glad to, but I cannot upgrade back to kernel 5.2, as I can't risk this
system.

Rann
-- 
Rann Bar-On
Senior Lecturer
Dept of Mathematics
Duke University

Pronouns: he/him/his

On Sat, 2019-08-31 at 14:19 -0400, Rann Bar-On wrote:
> Hi,
> 
> I have a btrfs filesystem with a corrupt block. After finding errors
> such as 
> 
> BTRFS critical (device nvme0n1p6): corrupt leaf: root=5
> block=154977681408 slot=19 ino=8708079, invalid mode: has 00 expect
> valid S_IF* bit(s)
> 
> spamming my dmesg, I examined the block with 
> 
> btrfs-debug-tree -b 154977681408 /dev/nvme0n1p6
> 
> I found the source of the error in this line:
> 
> item 19 key (8708079 INODE_ITEM 0) itemoff 14769 itemsize
> 160   inode 
> generation 292430 transid 292449 size 0 block group 0 mode 0 links 1
> uid 0 gid 0 rdev 0 flags 0x0
> 
> I then ran 
> 
> find <mount point> -type f -exec cp {} /dev/null \;
> 
> to look for corrupt files. Almost all files that appeared in the list
> also appeared in the btrfs-debug-tree command above.
> 
> I conclude that this block is corrupt.
> 
> Two questions:
> 
> 1. Given that none of the files in the list are critical, I'd like to
> remove the block, or at least the files. Is this possible? How?
> 
> 2. Is this an indicator of a problem with the drive? smartctl does
> not
> give errors, nor does scrubbing the file system.
> 
> More info below:
> 
> # uname -a
> Linux debian-x1yoga 5.2.0-2-amd64 #1 SMP Debian 5.2.9-2 (2019-08-21)
> x86_64 GNU/Linux
> 
> #   btrfs --version
> Btrfs v3.17
> 
> #   btrfs fi show
> Label: none  uuid: 91624ec9-49ef-469e-a949-7699dc681c52
>         Total devices 1 FS bytes used 13.27GiB
>         devid    1 size 19.26GiB used 16.03GiB path /dev/nvme0n1p2
> 
> Label: none  uuid: 4133c951-4327-4040-83ed-9e8a71270cc2
>         Total devices 1 FS bytes used 6.85GiB
>         devid    1 size 9.93GiB used 9.51GiB path /dev/nvme0n1p3
> 
> Label: none  uuid: be5b72e2-5cd1-498e-b38c-d83b10548ef3
>         Total devices 1 FS bytes used 174.09GiB
>         devid    1 size 182.99GiB used 182.99GiB path /dev/nvme0n1p6
> 
> Label: none  uuid: bbba344c-b256-4509-ac49-4b69b1a73607
>         Total devices 1 FS bytes used 1.93MiB
>         devid    1 size 381.00MiB used 381.00MiB path /dev/nvme0n1p5
> 
> Btrfs v3.17
> 
> # btrfs fi df /home
> Data, single: total=180.98GiB, used=173.41GiB
> System, single: total=4.00MiB, used=48.00KiB
> Metadata, single: total=2.01GiB, used=698.38MiB
> GlobalReserve, single: total=512.00MiB, used=0.00B
> 
> The only error in dmesg is the one referred to above.
> 
> I appreciate any help! 
> 
> Cheers,
> Rann



* Re: block corruption
  2019-08-31 20:26 ` Rann Bar-On
@ 2019-08-31 23:04   ` Chris Murphy
  2019-08-31 23:39     ` Rann Bar-On
  0 siblings, 1 reply; 14+ messages in thread
From: Chris Murphy @ 2019-08-31 23:04 UTC (permalink / raw)
  To: Rann Bar-On; +Cc: linux-btrfs

On Sat, Aug 31, 2019 at 2:26 PM Rann Bar-On <rann@math.duke.edu> wrote:
>
> I just downgraded to kernel 4.19, and the supposed corruption vanished.
> This may be related to
>
> https://www.spinics.net/lists/linux-btrfs/msg91046.html
>
> If I can provide information that would help fix this issue, I'd be
> glad to, but I cannot upgrade back to kernel 5.2, as I can't risk this
> system.

5.2 has stricter checks for corruption, exposing the rare case
where metadata in a leaf is corrupt but the checksum was properly
computed.
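
To illustrate what "invalid mode: has 00 expect valid S_IF* bit(s)" means, here is a small sketch of that kind of check. It is not the kernel's tree-checker code, just the same idea expressed with Python's stat module: the file-type bits of the mode must name one of the known file types, and a mode of 0 names none.

```python
import stat

# The file types a mode can legitimately describe.
VALID_TYPES = {
    stat.S_IFREG,
    stat.S_IFDIR,
    stat.S_IFLNK,
    stat.S_IFCHR,
    stat.S_IFBLK,
    stat.S_IFIFO,
    stat.S_IFSOCK,
}


def has_valid_file_type(mode):
    """Return True if the file-type bits of mode name a known type."""
    return stat.S_IFMT(mode) in VALID_TYPES
```

With this sketch, has_valid_file_type(0) is False (the "mode 0" inode from the report), while has_valid_file_type(0o100644), an ordinary regular file, is True. The block checksum cannot catch this, because the checksum was computed over the already-wrong contents.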

> > Btrfs v3.17

This is old. I suggest finding a newer version of btrfs-progs, ideally
the latest stable version, which is 5.2.1. Definitely don't use --repair
with this version. It's safe to use check --readonly (which is the
default) with this version, but it's probably not that helpful to devs.



-- 
Chris Murphy


* Re: block corruption
  2019-08-31 23:04   ` Chris Murphy
@ 2019-08-31 23:39     ` Rann Bar-On
  2019-08-31 23:48       ` Qu Wenruo
  0 siblings, 1 reply; 14+ messages in thread
From: Rann Bar-On @ 2019-08-31 23:39 UTC (permalink / raw)
  To: Chris Murphy; +Cc: linux-btrfs


On Sat, 2019-08-31 at 17:04 -0600, Chris Murphy wrote:
> On Sat, Aug 31, 2019 at 2:26 PM Rann Bar-On <rann@math.duke.edu>
> wrote:
> > I just downgraded to kernel 4.19, and the supposed corruption
> > vanished.
> > This may be related to
> > 
> > https://www.spinics.net/lists/linux-btrfs/msg91046.html
> > 
> > If I can provide information that would help fix this issue, I'd be
> > glad to, but I cannot upgrade back to kernel 5.2, as I can't risk
> > this
> > system.
> 
> 5.2 has more strict checks for corruption, exposing the rare case
> where metadata in a leaf is corrupt but the checksum was properly
> computed.
> 
> > > Btrfs v3.17
> 
> This is old. I suggest finding a newer version of btrfs-progs,
> ideally
> latest stable version is 5.2.1. Definitely don't use --repair with
> this version. It's safe to use check --readonly (which is the
> default)
> with this version but probably not that helpful to devs.
> 

Not really sure why that said 3.17:

$ btrfs --version
btrfs-progs v5.2.1 

Anyway, running btrfs check --repair on the file system didn't do anything
to fix the above errors.

I removed at least one of the corrupt files (the only one that was mode
0) using kernel 4.19.

Happy to help further if I can. What would you suggest as far as fixing
this or reporting it usefully? If you believe 5.2 isn't causing the
corruption, but rather, just exposing it, I'm more than happy to
experiment with it.

Rann


* Re: block corruption
  2019-08-31 23:39     ` Rann Bar-On
@ 2019-08-31 23:48       ` Qu Wenruo
  2019-09-01 17:39         ` Rann Bar-On
  0 siblings, 1 reply; 14+ messages in thread
From: Qu Wenruo @ 2019-08-31 23:48 UTC (permalink / raw)
  To: Rann Bar-On, Chris Murphy; +Cc: linux-btrfs





On 2019/9/1 7:39 AM, Rann Bar-On wrote:
> 
> On Sat, 2019-08-31 at 17:04 -0600, Chris Murphy wrote:
>> On Sat, Aug 31, 2019 at 2:26 PM Rann Bar-On <rann@math.duke.edu>
>> wrote:
>>> I just downgraded to kernel 4.19, and the supposed corruption
>>> vanished.
>>> This may be related to
>>>
>>> https://www.spinics.net/lists/linux-btrfs/msg91046.html
>>>
>>> If I can provide information that would help fix this issue, I'd be
>>> glad to, but I cannot upgrade back to kernel 5.2, as I can't risk
>>> this
>>> system.
>>
>> 5.2 has more strict checks for corruption, exposing the rare case
>> where metadata in a leaf is corrupt but the checksum was properly
>> computed.

Exactly.

Although in your case, it was some older kernel doing something bad.

The same problem has been reported once before: some older kernel didn't
set the mode member properly.

>>
>>>> Btrfs v3.17
>>
>> This is old. I suggest finding a newer version of btrfs-progs,
>> ideally
>> latest stable version is 5.2.1. Definitely don't use --repair with
>> this version. It's safe to use check --readonly (which is the
>> default)
>> with this version but probably not that helpful to devs.
>>
> 
> Not really sure why that said 3.17:
> 
> $ btrfs --version
> btrfs-progs v5.2.1 
> 
> Anyway, running btrfs --repair on the file system didn't do anything to
> fix the above errors.

That's what we need to enhance next.

> 
> I removed at least one of the corrupt files (the only one that was mode
> 0) using kernel 4.19.
> 
> Happy to help further if I can. What would you suggest as far as fixing
> this or reporting it usefully? If you believe 5.2 isn't causing the
> corruption, but rather, just exposing it, I'm more than happy to
> experiment with it.

Deleting the offending inodes would be enough to fix the alert.

Thanks,
Qu


* Re: block corruption
  2019-08-31 23:48       ` Qu Wenruo
@ 2019-09-01 17:39         ` Rann Bar-On
  2019-09-01 20:09           ` Chris Murphy
  0 siblings, 1 reply; 14+ messages in thread
From: Rann Bar-On @ 2019-09-01 17:39 UTC (permalink / raw)
  To: Qu Wenruo, Chris Murphy; +Cc: linux-btrfs

On Sun, 2019-09-01 at 07:48 +0800, Qu Wenruo wrote:
> 
> On 2019/9/1 上午7:39, Rann Bar-On wrote:
> > On Sat, 2019-08-31 at 17:04 -0600, Chris Murphy wrote:
> > > On Sat, Aug 31, 2019 at 2:26 PM Rann Bar-On <rann@math.duke.edu>
> > > wrote:
> > > > I just downgraded to kernel 4.19, and the supposed corruption
> > > > vanished.
> > > > This may be related to
> > > > 
> > > > https://www.spinics.net/lists/linux-btrfs/msg91046.html
> > > > 
> > > > If I can provide information that would help fix this issue,
> > > > I'd be
> > > > glad to, but I cannot upgrade back to kernel 5.2, as I can't
> > > > risk
> > > > this
> > > > system.
> > > 
> > > 5.2 has more strict checks for corruption, exposing the rare case
> > > where metadata in a leaf is corrupt but the checksum was properly
> > > computed.
> 
> Exactly.
> 
> Although for your case, it's some older kernel doing something bad.
> 
> It's also reported once for the same problem, some older kernel
> doesn't
> set the mode member properly.

> > > > > Btrfs v3.17
> > > 
> > > This is old. I suggest finding a newer version of btrfs-progs,
> > > ideally
> > > latest stable version is 5.2.1. Definitely don't use --repair
> > > with
> > > this version. It's safe to use check --readonly (which is the
> > > default)
> > > with this version but probably not that helpful to devs.
> > > 
> > 
> > Not really sure why that said 3.17:
> > 
> > $ btrfs --version
> > btrfs-progs v5.2.1 
> > 
> > Anyway, running btrfs --repair on the file system didn't do
> > anything to
> > fix the above errors.
> 
> That's what we need to enhance next.
> 
> > I removed at least one of the corrupt files (the only one that was
> > mode
> > 0) using kernel 4.19.
> > 
> > Happy to help further if I can. What would you suggest as far as
> > fixing
> > this or reporting it usefully? If you believe 5.2 isn't causing the
> > corruption, but rather, just exposing it, I'm more than happy to
> > experiment with it.
> 
> Deleting the offending inodes would be enough to fix the alert.
> 

I deleted the file using the older kernel. I rebooted into the new
kernel, and things seem good for now.

Note: The newer one wouldn't let me access the file to delete it, nor
did any btrfs repair tool do anything at all. This is a big problem
IMO!

> Thanks,
> Qu
> 
> > Rann
> > 
-- 
Rann Bar-On
Senior Lecturer
Dept of Mathematics
Duke University

Pronouns: he/him/his



* Re: block corruption
  2019-09-01 17:39         ` Rann Bar-On
@ 2019-09-01 20:09           ` Chris Murphy
  2019-09-01 20:35             ` Rann Bar-On
  0 siblings, 1 reply; 14+ messages in thread
From: Chris Murphy @ 2019-09-01 20:09 UTC (permalink / raw)
  To: Rann Bar-On; +Cc: Qu Wenruo, Chris Murphy, linux-btrfs

On Sun, Sep 1, 2019 at 11:39 AM Rann Bar-On <rann@math.duke.edu> wrote:
>
> On Sun, 2019-09-01 at 07:48 +0800, Qu Wenruo wrote:
> >
> > On 2019/9/1 上午7:39, Rann Bar-On wrote:
> > > On Sat, 2019-08-31 at 17:04 -0600, Chris Murphy wrote:
> > > > On Sat, Aug 31, 2019 at 2:26 PM Rann Bar-On <rann@math.duke.edu>
> > > > wrote:
> > > > > I just downgraded to kernel 4.19, and the supposed corruption
> > > > > vanished.
> > > > > This may be related to
> > > > >
> > > > > https://www.spinics.net/lists/linux-btrfs/msg91046.html
> > > > >
> > > > > If I can provide information that would help fix this issue,
> > > > > I'd be
> > > > > glad to, but I cannot upgrade back to kernel 5.2, as I can't
> > > > > risk
> > > > > this
> > > > > system.
> > > >
> > > > 5.2 has more strict checks for corruption, exposing the rare case
> > > > where metadata in a leaf is corrupt but the checksum was properly
> > > > computed.
> >
> > Exactly.
> >
> > Although for your case, it's some older kernel doing something bad.
> >
> > It's also reported once for the same problem, some older kernel
> > doesn't
> > set the mode member properly.
>
> > > > > > Btrfs v3.17
> > > >
> > > > This is old. I suggest finding a newer version of btrfs-progs,
> > > > ideally
> > > > latest stable version is 5.2.1. Definitely don't use --repair
> > > > with
> > > > this version. It's safe to use check --readonly (which is the
> > > > default)
> > > > with this version but probably not that helpful to devs.
> > > >
> > >
> > > Not really sure why that said 3.17:
> > >
> > > $ btrfs --version
> > > btrfs-progs v5.2.1
> > >
> > > Anyway, running btrfs --repair on the file system didn't do
> > > anything to
> > > fix the above errors.
> >
> > That's what we need to enhance next.
> >
> > > I removed at least one of the corrupt files (the only one that was
> > > mode
> > > 0) using kernel 4.19.
> > >
> > > Happy to help further if I can. What would you suggest as far as
> > > fixing
> > > this or reporting it usefully? If you believe 5.2 isn't causing the
> > > corruption, but rather, just exposing it, I'm more than happy to
> > > experiment with it.
> >
> > Deleting the offending inodes would be enough to fix the alert.
> >
>
> I deleted the file using the older kernel. I rebooted into the new
> kernel, and things seem good for now.
>
> Note: The newer one wouldn't let me access the file to delete it, nor
> did any btrfs repair tool do anything at all. This is a big problem
> IMO!

The current behavior is an improvement over propagating corruption and
never detecting it because the leaf is assumed to be correct only
because the checksum matches. The next step is figuring out ways to
work around such rare detected corruptions, hopefully automatically
and while online.

I don't consider it user responsibility to have to do this, but I'm
vaguely curious if it's possible to delete the offending file in a
snapshot, then delete the original subvolume. i.e.

1. snapshot the subvolume containing the file (default rw snapshot)
2. delete the bad file(s) in the snapshot
3. delete the original subvolume (snapshot's parent)

I'm curious if either 2 or 3 are permitted.
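
Purely as a sketch of the three steps above, and without any claim that the kernel permits steps 2 or 3 on a filesystem in this state, the commands could be laid out like this. The helper only builds the command lines and runs nothing; its name, the paths, and the use of rm for the deletion are assumptions made for the example.

```python
import os


def snapshot_workaround_plan(subvolume, snapshot, bad_files):
    """Build, but do not run, the commands for the snapshot workaround.

    subvolume: path of the subvolume containing the bad file(s)
    snapshot:  path where the writable snapshot would be created
    bad_files: paths of the bad files, relative to the subvolume root
    """
    # Step 1: writable snapshot of the affected subvolume.
    plan = [["btrfs", "subvolume", "snapshot", subvolume, snapshot]]
    # Step 2: delete the bad file(s) inside the snapshot.
    for rel_path in bad_files:
        plan.append(["rm", "--", os.path.join(snapshot, rel_path)])
    # Step 3: delete the original subvolume.
    plan.append(["btrfs", "subvolume", "delete", subvolume])
    return plan


# Hypothetical usage; review each command before running anything:
#   for cmd in snapshot_workaround_plan("/mnt/top/home", "/mnt/top/home-new",
#                                       ["user/bad.file"]):
#       print(" ".join(cmd))
```

Keeping it as a dry run is deliberate: whether the deletion in step 2 trips the same check, and whether the original subvolume can be deleted while in use, is exactly the open question.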


-- 
Chris Murphy


* Re: block corruption
  2019-09-01 20:09           ` Chris Murphy
@ 2019-09-01 20:35             ` Rann Bar-On
  2019-09-02  5:33               ` Qu Wenruo
  0 siblings, 1 reply; 14+ messages in thread
From: Rann Bar-On @ 2019-09-01 20:35 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Qu Wenruo, linux-btrfs


-- 
Rann Bar-On
Senior Lecturer
Dept of Mathematics
Duke University

Pronouns: he/him/his

On Sun, 2019-09-01 at 14:09 -0600, Chris Murphy wrote:
> On Sun, Sep 1, 2019 at 11:39 AM Rann Bar-On <rann@math.duke.edu>
> wrote:
> > On Sun, 2019-09-01 at 07:48 +0800, Qu Wenruo wrote:
> > > On 2019/9/1 上午7:39, Rann Bar-On wrote:
> > > > On Sat, 2019-08-31 at 17:04 -0600, Chris Murphy wrote:
> > > > > On Sat, Aug 31, 2019 at 2:26 PM Rann Bar-On <
> > > > > rann@math.duke.edu>
> > > > > wrote:
> > > > > > I just downgraded to kernel 4.19, and the supposed
> > > > > > corruption
> > > > > > vanished.
> > > > > > This may be related to
> > > > > > 
> > > > > > https://www.spinics.net/lists/linux-btrfs/msg91046.html
> > > > > > 
> > > > > > If I can provide information that would help fix this
> > > > > > issue,
> > > > > > I'd be
> > > > > > glad to, but I cannot upgrade back to kernel 5.2, as I
> > > > > > can't
> > > > > > risk
> > > > > > this
> > > > > > system.
> > > > > 
> > > > > 5.2 has more strict checks for corruption, exposing the rare
> > > > > case
> > > > > where metadata in a leaf is corrupt but the checksum was
> > > > > properly
> > > > > computed.
> > > 
> > > Exactly.
> > > 
> > > Although for your case, it's some older kernel doing something
> > > bad.
> > > 
> > > It's also reported once for the same problem, some older kernel
> > > doesn't
> > > set the mode member properly.
> > > > > > > Btrfs v3.17
> > > > > 
> > > > > This is old. I suggest finding a newer version of btrfs-
> > > > > progs,
> > > > > ideally
> > > > > latest stable version is 5.2.1. Definitely don't use --repair
> > > > > with
> > > > > this version. It's safe to use check --readonly (which is the
> > > > > default)
> > > > > with this version but probably not that helpful to devs.
> > > > > 
> > > > 
> > > > Not really sure why that said 3.17:
> > > > 
> > > > $ btrfs --version
> > > > btrfs-progs v5.2.1
> > > > 
> > > > Anyway, running btrfs --repair on the file system didn't do
> > > > anything to
> > > > fix the above errors.
> > > 
> > > That's what we need to enhance next.
> > > 
> > > > I removed at least one of the corrupt files (the only one that
> > > > was
> > > > mode
> > > > 0) using kernel 4.19.
> > > > 
> > > > Happy to help further if I can. What would you suggest as far
> > > > as
> > > > fixing
> > > > this or reporting it usefully? If you believe 5.2 isn't causing
> > > > the
> > > > corruption, but rather, just exposing it, I'm more than happy
> > > > to
> > > > experiment with it.
> > > 
> > > Deleting the offending inodes would be enough to fix the alert.
> > > 
> > 
> > I deleted the file using the older kernel. I rebooted into the new
> > kernel, and things seem good for now.
> > 
> > Note: The newer one wouldn't let me access the file to delete it,
> > nor
> > did any btrfs repair tool do anything at all. This is a big problem
> > IMO!
> 
> The current behavior is an improvement over propagating corruption
> and
> never detecting it because the leaf is assumed to be correct only
> because the checksum matches. The next step is figuring out ways to
> work around such rare detected corruptions, hopefully automatically
> and while online.
> 
> I don't consider it user responsibility to have to do this, but I'm
> vaguely curious if it's possible to delete the offending file in a
> snapshot, then delete the original subvolume. i.e.
> 
> 1.
> snapshot the subvolume containing the file (default rw snapshot)
> 2.
> delete the bad file(s) in the snapshot
> 3.
> delete the original subvolume (snapshot's parent)
> 
> I'm curious if either 2 or 3 are permitted.
> 
> 

Wish I could help, but I already deleted the file. If there's something
I can do to move this forward, I'd be glad to.



* Re: block corruption
  2019-09-01 20:35             ` Rann Bar-On
@ 2019-09-02  5:33               ` Qu Wenruo
  2019-09-02  6:27                 ` Qu Wenruo
  0 siblings, 1 reply; 14+ messages in thread
From: Qu Wenruo @ 2019-09-02  5:33 UTC (permalink / raw)
  To: Rann Bar-On, Chris Murphy; +Cc: linux-btrfs



[...]
>>
>> I'm curious if either 2 or 3 are permitted.
>>
>>
> 
> Wish I could help, but I already deleted the file. If there's something
> I can do to move this forward, I'd be glad to.
> 

Maybe it's too late to mention, but "btrfs check" is in fact able to
find such problems.

"btrfs check" has two modes, original mode (the default one) and lowmem
mode.
The latter was mostly written from scratch, so it has stricter check
rules (it also uses less memory, though it is much slower because
it causes more IO).

If you're not sure whether those inodes are the only offending ones, you
can try "btrfs check --mode=lowmem --readonly" to verify.

Support for this check in original mode will come very soon.

Thanks,
Qu



* Re: block corruption
  2019-09-02  5:33               ` Qu Wenruo
@ 2019-09-02  6:27                 ` Qu Wenruo
  0 siblings, 0 replies; 14+ messages in thread
From: Qu Wenruo @ 2019-09-02  6:27 UTC (permalink / raw)
  To: Rann Bar-On, Chris Murphy; +Cc: linux-btrfs





On 2019/9/2 1:33 PM, Qu Wenruo wrote:
> [...]
>>>
>>> I'm curious if either 2 or 3 are permitted.
>>>
>>>
>>
>> Wish I could help, but I already deleted the file. If there's something
>> I can do to move this forward, I'd be glad to.
>>
> 
> Maybe it's too late to mention, in fact "btrfs check" has the ability to
> find such problem.
> 
> "btrfs check" has two modes, original mode (the default one) and lowmem
> mode.
> The latter is mostly written from scratch, thus it has more strict check
> rules (not to mention it uses less memory while will be much slower as
> it causes more IO).
> 
> If you're not sure if those inodes are the only offending ones, then you
> can try "btrfs check --mode=lowmem --readonly" to verify.
> 
> The support for original mode will come very soon.

Wait a minute, that report and repair functionality was already added
in btrfs-progs v5.1.

Are you using a btrfs-progs that is too old?

Thanks,
Qu

> 
> Thanks,
> Qu
> 


