linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Re: [PATCH 4.14 024/110] btrfs: use proper endianness accessors for super_copy
       [not found]   ` <1521139304@msgid.manchmal.in-ulm.de>
@ 2018-03-16 12:30     ` Greg Kroah-Hartman
  2018-03-16 13:22       ` David Sterba
  2018-03-17 17:27       ` Christoph Biedl
  0 siblings, 2 replies; 7+ messages in thread
From: Greg Kroah-Hartman @ 2018-03-16 12:30 UTC (permalink / raw)
  To: Christoph Biedl, linux-btrfs
  Cc: linux-kernel, stable, Anand Jain, Liu Bo, David Sterba

On Thu, Mar 15, 2018 at 07:55:42PM +0100, Christoph Biedl wrote:
> Greg Kroah-Hartman wrote...
> 
> > 4.14-stable review patch.  If anyone has any objections, please let me know.
> 
> > commit 3c181c12c431fe33b669410d663beb9cceefcd1b upstream.
> (...)
> 
> > If the filesystem is always used on a same endian host, this will not
> > be a problem.
> 
> >From my observations I cannot quite subscribe to that.
> 
> On big-endian systems, this change intruduces severe corruption,
> resulting in complete loss of the data on the used block device.
> 
> Steps to reproduce (tested on ppc/powerpc and parisc/hppa):
> 
> # mkfs.btrfs $DEV
> # mount $DEV /mnt/tmp/
> # umount /mnt/tmp/
> 
> This simple umount corrupts the file system:
> 
> # mount $DEV /mnt/tmp/
> mount: /mnt/tmp: wrong fs type, bad option, bad superblock on $DEV, missing codepage or helper program, or other error.
> 
> # dmesg:
> BTRFS critical (device <dev>): unable to find logical 4294967296 length 4096
> BTRFS critical (device <dev>): unable to find logical 4294967296 length 4096
> BTRFS critical (device <dev>): unable to find logical 18102363734671360 length 16384
> BTRFS error (device <dev>): failed to read chunk root
> BTRFS error (device <dev>): open_ctree failed
> 
> Also fsck is of no help:
> 
> # btrfsck $DEV
> Couldn't map the block 18102363734671360
> No mapping for 18102363734671360-18102363734687744
> Couldn't map the block 18102363734671360
> bytenr mismatch, want=18102363734671360, have=0
> ERROR: cannot read chunk root
> ERROR: cannot open file system
> 
> 
> Trying mount or fsck on a little-endian system does not help either. So
> I consider the data on that device lost - luckily I use btrfs only for
> files where a backup exists all the time.
> 
> 
> Reverting that change restored the previous error-free behaviour. I
> didn't check HEAD, i.e. v4.16-rc5, since the upstream commt was the last
> that affected these files. Still I could give this a try if anybody
> wishes so.

That sucks.  Can you test Linus's tree to verify the problem is there?
I'll gladly revert this if Linus's tree also gets the revert, I don't
want you to hit this when you upgrade to a newer kernel.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 4.14 024/110] btrfs: use proper endianness accessors for super_copy
  2018-03-16 12:30     ` [PATCH 4.14 024/110] btrfs: use proper endianness accessors for super_copy Greg Kroah-Hartman
@ 2018-03-16 13:22       ` David Sterba
  2018-03-16 14:02         ` Greg Kroah-Hartman
  2018-03-17 17:27       ` Christoph Biedl
  1 sibling, 1 reply; 7+ messages in thread
From: David Sterba @ 2018-03-16 13:22 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Christoph Biedl, linux-btrfs, linux-kernel, stable, Anand Jain,
	Liu Bo, David Sterba

On Fri, Mar 16, 2018 at 01:30:49PM +0100, Greg Kroah-Hartman wrote:
> On Thu, Mar 15, 2018 at 07:55:42PM +0100, Christoph Biedl wrote:
> > Greg Kroah-Hartman wrote...
> > 
> > > 4.14-stable review patch.  If anyone has any objections, please let me know.
> > 
> > > commit 3c181c12c431fe33b669410d663beb9cceefcd1b upstream.
> > (...)
> > 
> > > If the filesystem is always used on a same endian host, this will not
> > > be a problem.
> > 
> > >From my observations I cannot quite subscribe to that.
> > 
> > On big-endian systems, this change intruduces severe corruption,
> > resulting in complete loss of the data on the used block device.
> > 
> > Steps to reproduce (tested on ppc/powerpc and parisc/hppa):
> > 
> > # mkfs.btrfs $DEV
> > # mount $DEV /mnt/tmp/
> > # umount /mnt/tmp/
> > 
> > This simple umount corrupts the file system:
> > 
> > # mount $DEV /mnt/tmp/
> > mount: /mnt/tmp: wrong fs type, bad option, bad superblock on $DEV, missing codepage or helper program, or other error.
> > 
> > # dmesg:
> > BTRFS critical (device <dev>): unable to find logical 4294967296 length 4096
> > BTRFS critical (device <dev>): unable to find logical 4294967296 length 4096
> > BTRFS critical (device <dev>): unable to find logical 18102363734671360 length 16384
> > BTRFS error (device <dev>): failed to read chunk root
> > BTRFS error (device <dev>): open_ctree failed
> > 
> > Also fsck is of no help:
> > 
> > # btrfsck $DEV
> > Couldn't map the block 18102363734671360
> > No mapping for 18102363734671360-18102363734687744
> > Couldn't map the block 18102363734671360
> > bytenr mismatch, want=18102363734671360, have=0
> > ERROR: cannot read chunk root
> > ERROR: cannot open file system
> > 
> > 
> > Trying mount or fsck on a little-endian system does not help either. So
> > I consider the data on that device lost - luckily I use btrfs only for
> > files where a backup exists all the time.
> > 
> > 
> > Reverting that change restored the previous error-free behaviour. I
> > didn't check HEAD, i.e. v4.16-rc5, since the upstream commt was the last
> > that affected these files. Still I could give this a try if anybody
> > wishes so.
> 
> That sucks.  Can you test Linus's tree to verify the problem is there?
> I'll gladly revert this if Linus's tree also gets the revert, I don't
> want you to hit this when you upgrade to a newer kernel.

I'll push a fix for the upcoming rc but I think it would be better to
remove the broken patch from stable kernels ASAP, so I'd recommend to
revert it now.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 4.14 024/110] btrfs: use proper endianness accessors for super_copy
  2018-03-16 13:22       ` David Sterba
@ 2018-03-16 14:02         ` Greg Kroah-Hartman
  2018-03-16 16:21           ` Anand Jain
  0 siblings, 1 reply; 7+ messages in thread
From: Greg Kroah-Hartman @ 2018-03-16 14:02 UTC (permalink / raw)
  To: dsterba, Christoph Biedl, linux-btrfs, linux-kernel, stable,
	Anand Jain, Liu Bo, David Sterba

On Fri, Mar 16, 2018 at 02:22:02PM +0100, David Sterba wrote:
> On Fri, Mar 16, 2018 at 01:30:49PM +0100, Greg Kroah-Hartman wrote:
> > On Thu, Mar 15, 2018 at 07:55:42PM +0100, Christoph Biedl wrote:
> > > Greg Kroah-Hartman wrote...
> > > 
> > > > 4.14-stable review patch.  If anyone has any objections, please let me know.
> > > 
> > > > commit 3c181c12c431fe33b669410d663beb9cceefcd1b upstream.
> > > (...)
> > > 
> > > > If the filesystem is always used on a same endian host, this will not
> > > > be a problem.
> > > 
> > > >From my observations I cannot quite subscribe to that.
> > > 
> > > On big-endian systems, this change intruduces severe corruption,
> > > resulting in complete loss of the data on the used block device.
> > > 
> > > Steps to reproduce (tested on ppc/powerpc and parisc/hppa):
> > > 
> > > # mkfs.btrfs $DEV
> > > # mount $DEV /mnt/tmp/
> > > # umount /mnt/tmp/
> > > 
> > > This simple umount corrupts the file system:
> > > 
> > > # mount $DEV /mnt/tmp/
> > > mount: /mnt/tmp: wrong fs type, bad option, bad superblock on $DEV, missing codepage or helper program, or other error.
> > > 
> > > # dmesg:
> > > BTRFS critical (device <dev>): unable to find logical 4294967296 length 4096
> > > BTRFS critical (device <dev>): unable to find logical 4294967296 length 4096
> > > BTRFS critical (device <dev>): unable to find logical 18102363734671360 length 16384
> > > BTRFS error (device <dev>): failed to read chunk root
> > > BTRFS error (device <dev>): open_ctree failed
> > > 
> > > Also fsck is of no help:
> > > 
> > > # btrfsck $DEV
> > > Couldn't map the block 18102363734671360
> > > No mapping for 18102363734671360-18102363734687744
> > > Couldn't map the block 18102363734671360
> > > bytenr mismatch, want=18102363734671360, have=0
> > > ERROR: cannot read chunk root
> > > ERROR: cannot open file system
> > > 
> > > 
> > > Trying mount or fsck on a little-endian system does not help either. So
> > > I consider the data on that device lost - luckily I use btrfs only for
> > > files where a backup exists all the time.
> > > 
> > > 
> > > Reverting that change restored the previous error-free behaviour. I
> > > didn't check HEAD, i.e. v4.16-rc5, since the upstream commt was the last
> > > that affected these files. Still I could give this a try if anybody
> > > wishes so.
> > 
> > That sucks.  Can you test Linus's tree to verify the problem is there?
> > I'll gladly revert this if Linus's tree also gets the revert, I don't
> > want you to hit this when you upgrade to a newer kernel.
> 
> I'll push a fix for the upcoming rc but I think it would be better to
> remove the broken patch from stable kernels ASAP, so I'd recommend to
> revert it now.

Now reverted, thanks.

greg k-h

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 4.14 024/110] btrfs: use proper endianness accessors for super_copy
  2018-03-16 14:02         ` Greg Kroah-Hartman
@ 2018-03-16 16:21           ` Anand Jain
  0 siblings, 0 replies; 7+ messages in thread
From: Anand Jain @ 2018-03-16 16:21 UTC (permalink / raw)
  To: Greg Kroah-Hartman, dsterba, Christoph Biedl, linux-btrfs,
	linux-kernel, stable, Liu Bo, David Sterba



On 03/16/2018 10:02 PM, Greg Kroah-Hartman wrote:
> On Fri, Mar 16, 2018 at 02:22:02PM +0100, David Sterba wrote:
>> On Fri, Mar 16, 2018 at 01:30:49PM +0100, Greg Kroah-Hartman wrote:
>>> On Thu, Mar 15, 2018 at 07:55:42PM +0100, Christoph Biedl wrote:
>>>> Greg Kroah-Hartman wrote...
>>>>
>>>>> 4.14-stable review patch.  If anyone has any objections, please let me know.
>>>>
>>>>> commit 3c181c12c431fe33b669410d663beb9cceefcd1b upstream.
>>>> (...)
>>>>
>>>>> If the filesystem is always used on a same endian host, this will not
>>>>> be a problem.
>>>>
>>>> >From my observations I cannot quite subscribe to that.
>>>>
>>>> On big-endian systems, this change intruduces severe corruption,
>>>> resulting in complete loss of the data on the used block device.
>>>>
>>>> Steps to reproduce (tested on ppc/powerpc and parisc/hppa):
>>>>
>>>> # mkfs.btrfs $DEV
>>>> # mount $DEV /mnt/tmp/
>>>> # umount /mnt/tmp/
>>>>
>>>> This simple umount corrupts the file system:
>>>>
>>>> # mount $DEV /mnt/tmp/
>>>> mount: /mnt/tmp: wrong fs type, bad option, bad superblock on $DEV, missing codepage or helper program, or other error.
>>>>
>>>> # dmesg:
>>>> BTRFS critical (device <dev>): unable to find logical 4294967296 length 4096
>>>> BTRFS critical (device <dev>): unable to find logical 4294967296 length 4096
>>>> BTRFS critical (device <dev>): unable to find logical 18102363734671360 length 16384
>>>> BTRFS error (device <dev>): failed to read chunk root
>>>> BTRFS error (device <dev>): open_ctree failed
>>>>
>>>> Also fsck is of no help:
>>>>
>>>> # btrfsck $DEV
>>>> Couldn't map the block 18102363734671360
>>>> No mapping for 18102363734671360-18102363734687744
>>>> Couldn't map the block 18102363734671360
>>>> bytenr mismatch, want=18102363734671360, have=0
>>>> ERROR: cannot read chunk root
>>>> ERROR: cannot open file system
>>>>
>>>>
>>>> Trying mount or fsck on a little-endian system does not help either. So
>>>> I consider the data on that device lost - luckily I use btrfs only for
>>>> files where a backup exists all the time.
>>>>
>>>>
>>>> Reverting that change restored the previous error-free behaviour. I
>>>> didn't check HEAD, i.e. v4.16-rc5, since the upstream commt was the last
>>>> that affected these files. Still I could give this a try if anybody
>>>> wishes so.
>>>
>>> That sucks.  Can you test Linus's tree to verify the problem is there?
>>> I'll gladly revert this if Linus's tree also gets the revert, I don't
>>> want you to hit this when you upgrade to a newer kernel.
>>
>> I'll push a fix for the upcoming rc but I think it would be better to
>> remove the broken patch from stable kernels ASAP, so I'd recommend to
>> revert it now.
> 
> Now reverted, thanks.

  Thanks ! Sorry for the mess.

-Anand

> greg k-h
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 4.14 024/110] btrfs: use proper endianness accessors for super_copy
  2018-03-16 12:30     ` [PATCH 4.14 024/110] btrfs: use proper endianness accessors for super_copy Greg Kroah-Hartman
  2018-03-16 13:22       ` David Sterba
@ 2018-03-17 17:27       ` Christoph Biedl
  2018-03-19 19:32         ` David Sterba
  1 sibling, 1 reply; 7+ messages in thread
From: Christoph Biedl @ 2018-03-17 17:27 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: linux-btrfs, linux-kernel, stable, Anand Jain, Liu Bo,
	David Sterba

Greg Kroah-Hartman wrote...

> On Thu, Mar 15, 2018 at 07:55:42PM +0100, Christoph Biedl wrote:

> > > commit 3c181c12c431fe33b669410d663beb9cceefcd1b upstream.

> > On big-endian systems, this change intruduces severe corruption,
> > resulting in complete loss of the data on the used block device.

> That sucks.  Can you test Linus's tree to verify the problem is there?
> I'll gladly revert this if Linus's tree also gets the revert, I don't
> want you to hit this when you upgrade to a newer kernel.

Confirmed: The problem is, err ... was in Linus' tree as well. The
rather recent commit 8f5fd927c3a7 reverted the change, after that
everything is as expected again.

Looking at the original commit, I don't have a clue why things go wrong
so horribly - otherwise don't be afraid of my data. I took this as a
chance to verify my data recovery procedure, with success.

    Christoph

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 4.14 024/110] btrfs: use proper endianness accessors for super_copy
  2018-03-17 17:27       ` Christoph Biedl
@ 2018-03-19 19:32         ` David Sterba
  2018-03-20  9:32           ` Anand Jain
  0 siblings, 1 reply; 7+ messages in thread
From: David Sterba @ 2018-03-19 19:32 UTC (permalink / raw)
  To: Christoph Biedl
  Cc: Greg Kroah-Hartman, Anand Jain, Liu Bo, linux-btrfs, linux-kernel,
	stable

On Sat, Mar 17, 2018 at 06:27:15PM +0100, Christoph Biedl wrote:
> Greg Kroah-Hartman wrote...
> 
> > On Thu, Mar 15, 2018 at 07:55:42PM +0100, Christoph Biedl wrote:
> 
> > > > commit 3c181c12c431fe33b669410d663beb9cceefcd1b upstream.
> 
> > > On big-endian systems, this change intruduces severe corruption,
> > > resulting in complete loss of the data on the used block device.
> 
> > That sucks.  Can you test Linus's tree to verify the problem is there?
> > I'll gladly revert this if Linus's tree also gets the revert, I don't
> > want you to hit this when you upgrade to a newer kernel.
> 
> Confirmed: The problem is, err ... was in Linus' tree as well. The
> rather recent commit 8f5fd927c3a7 reverted the change, after that
> everything is as expected again.

Thanks for checking.

> Looking at the original commit, I don't have a clue why things go wrong
> so horribly

It's a half endianness conversion. The plain in-memory structures are in
LE and has to be accessed via the helpers, super block copy and the
root item.  The commit adds only one half of the conversion, that
naturally does not exhibit on LE, because the macros are no-op.

Originally, the items were stored from the on-disk type to on-disk type,
regardless of the CPU:

  super->chunk_root = root_item->bytenr;

The patch should have added conversion of both values, like

  btrfs_set_super_chunk_root(super, btrfs_root_bytenr(root_item));

and not

  btrfs_set_super_chunk_root(super, root_item->bytenr);

It's not very obvious from the context though, typically there's one
structure that needs the accessors and is set from a value that's in CPU
byteorder. I think that the exception to this pattern confused all
involved developers.

The root_item members are annotated as __le64 that should be caught by
sparse/smatch checker in the buggy case, but we don't run the checkers
every the time.

> - otherwise don't be afraid of my data. I took this as a
> chance to verify my data recovery procedure, with success.

Should that not be the case, the damaged items in superblock can be
byteswapped back. That's 6 x u64 at most, I have a tool for that now.

Thanks again for the report and sorry for the trouble.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 4.14 024/110] btrfs: use proper endianness accessors for super_copy
  2018-03-19 19:32         ` David Sterba
@ 2018-03-20  9:32           ` Anand Jain
  0 siblings, 0 replies; 7+ messages in thread
From: Anand Jain @ 2018-03-20  9:32 UTC (permalink / raw)
  To: dsterba, Christoph Biedl, Greg Kroah-Hartman, Liu Bo, linux-btrfs,
	linux-kernel, stable



On 03/20/2018 03:32 AM, David Sterba wrote:
> On Sat, Mar 17, 2018 at 06:27:15PM +0100, Christoph Biedl wrote:
>> Greg Kroah-Hartman wrote...
>>
>>> On Thu, Mar 15, 2018 at 07:55:42PM +0100, Christoph Biedl wrote:
>>
>>>>> commit 3c181c12c431fe33b669410d663beb9cceefcd1b upstream.
>>
>>>> On big-endian systems, this change intruduces severe corruption,
>>>> resulting in complete loss of the data on the used block device.
>>
>>> That sucks.  Can you test Linus's tree to verify the problem is there?
>>> I'll gladly revert this if Linus's tree also gets the revert, I don't
>>> want you to hit this when you upgrade to a newer kernel.
>>
>> Confirmed: The problem is, err ... was in Linus' tree as well. The
>> rather recent commit 8f5fd927c3a7 reverted the change, after that
>> everything is as expected again.
> 
> Thanks for checking.
> 
>> Looking at the original commit, I don't have a clue why things go wrong
>> so horribly
> 
> It's a half endianness conversion. The plain in-memory structures are in
> LE and has to be accessed via the helpers, super block copy and the
> root item.  The commit adds only one half of the conversion, that
> naturally does not exhibit on LE, because the macros are no-op.
> 
> Originally, the items were stored from the on-disk type to on-disk type,
> regardless of the CPU:
> 
>    super->chunk_root = root_item->bytenr;
> 
> The patch should have added conversion of both values, like
> 
>    btrfs_set_super_chunk_root(super, btrfs_root_bytenr(root_item));
> 
> and not
> 
>    btrfs_set_super_chunk_root(super, root_item->bytenr);
> 
> It's not very obvious from the context though, typically there's one
> structure that needs the accessors and is set from a value that's in CPU
> byteorder. I think that the exception to this pattern confused all
> involved developers.
> 
> The root_item members are annotated as __le64 that should be caught by
> sparse/smatch checker in the buggy case, but we don't run the checkers
> every the time.

  Ah! the RC is at the other side of the equation.
  Makes sense to me. Thanks.

>> - otherwise don't be afraid of my data. I took this as a
>> chance to verify my data recovery procedure, with success.
> 
> Should that not be the case, the damaged items in superblock can be
> byteswapped back. That's 6 x u64 at most, I have a tool for that now.
> 
> Thanks again for the report and sorry for the trouble.

  It's entirely my mistake. My apologies for the inconvenience.

Thanks, Anand


> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2018-03-20  9:31 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <20180307191039.748351103@linuxfoundation.org>
     [not found] ` <20180307191042.810088712@linuxfoundation.org>
     [not found]   ` <1521139304@msgid.manchmal.in-ulm.de>
2018-03-16 12:30     ` [PATCH 4.14 024/110] btrfs: use proper endianness accessors for super_copy Greg Kroah-Hartman
2018-03-16 13:22       ` David Sterba
2018-03-16 14:02         ` Greg Kroah-Hartman
2018-03-16 16:21           ` Anand Jain
2018-03-17 17:27       ` Christoph Biedl
2018-03-19 19:32         ` David Sterba
2018-03-20  9:32           ` Anand Jain

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).