* xfs_repair: couldn't map inode 2089979520, err = 117 @ 2018-01-18 6:27 Christian Kujau 2018-01-18 14:18 ` Brian Foster 2018-01-18 18:36 ` Brian Foster 0 siblings, 2 replies; 7+ messages in thread From: Christian Kujau @ 2018-01-18 6:27 UTC (permalink / raw) To: linux-xfs Hi, after a(nother) power outage this disk enclosure (containing two separate disks, connected via USB) was acting up and while one of the disks seems to have died, the other one still works and no more hardware errors are reported for the enclosure or the disk. The XFS file system on this disk can be mounted (!) and data can be read, but an xfs_repair fails to complete: http://nerdbynature.de/bits/4.14/xfs/ I have (compressed) xfs_metadump images available if anyone is interested. A timeline of events: * disk enclosure[0] connected to a Raspberry Pi (aarch64) * power failure, and possible power spike after power came back * RPI and disk enclosure disconnected from power. * disk enclosure connected to an x86-64 machine with lots of RAM * xfs_repair (Fedora 27, xfsprogs-4.12) attempted, but the disk enclosure was still trying to handle the other (failing) disk and the repair failed after some USB resets. * failed disk was removed from the enclosure, no more hardware errors since, but still xfs_repair is unable to complete. After a chat on #xfs, Eric and Dave remarked: > error 117 means the inode is corrupted; probably shouldn't be at that > stage, probably indicates a repair bug? 
just looking at the first few > errors > bad magic # 0x49414233 in btbno block 28/134141 > bad magic # 0x46494233 in btcnt block 30/870600 > the first magic is IAB3 the 2nd is FIB3 those are magic numbers for > xfs, but not for the type of block it thought it was checking ...and also: > cross linked btrees does tend to indicate something went badly wrong > at the hardware level So, with all that (failed xfs_repair runs that were interrupted by hardware faults and also a possibly flaky USB controller[0]) - does anybody have an idea of how to convince xfs_repair to still clean up this mess? Or is there no other way than to restore from backup? Thanks, Christian. [0] When the disk enclosure is connected to the Raspberry Pi 3, the kernel usually recognizes it as follows: usb 1-1.4: new high-speed USB device number 4 using dwc2 usb 1-1.4: New USB device found, idVendor=7825, idProduct=a2a8 usb 1-1.4: New USB device strings: Mfr=1, Product=2, SerialNumber=5 usb 1-1.4: Product: ElitePro Dual U3FW usb 1-1.4: Manufacturer: OWC usb 1-1.4: SerialNumber: DB9876543211160 usb 1-1.4: The driver for the USB controller dwc2_hsotg does not support scatter-gather which is usb 1-1.4: required by the UAS driver. Please try an other USB controller if you wish to use UAS. usb 1-1.4: The driver for the USB controller dwc2_hsotg does not support scatter-gather which is usb 1-1.4: required by the UAS driver. Please try an other USB controller if you wish to use UAS. usb-storage 1-1.4:1.0: USB Mass Storage device detected scsi host0: usb-storage 1-1.4:1.0 scsi 0:0:0:0: Direct-Access ElitePro Dual U3FW-1 0006 PQ: 0 ANSI: 6 scsi 0:0:0:1: Direct-Access ElitePro Dual U3FW-2 0006 PQ: 0 ANSI: 6 sd 0:0:0:0: Attached scsi generic sg0 type 0 sd 0:0:0:0: [sda] Very big device. Trying to use READ CAPACITY(16). 
sd 0:0:0:0: [sda] 7814037168 512-byte logical blocks: (4.00 TB/3.64 TiB) sd 0:0:0:0: [sda] Write Protect is off sd 0:0:0:0: [sda] Mode Sense: 47 00 10 08 sd 0:0:0:0: [sda] No Caching mode page found sd 0:0:0:0: [sda] Assuming drive cache: write through sd 0:0:0:0: [sda] Very big device. Trying to use READ CAPACITY(16). [...] -- BOFH excuse #449: greenpeace free'd the mallocs ^ permalink raw reply [flat|nested] 7+ messages in thread
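[Editorial aside: the numbers quoted in the report above are easy to decode by hand. "err = 117" is EUCLEAN on Linux — the generic "Structure needs cleaning" code XFS uses for metadata that fails verification — and the on-disk magics are four ASCII bytes stored big-endian. A quick sketch; the struct-name comments are my reading of the v5 XFS magic table, not something stated in the thread:]

```python
import os
import struct

# errno 117 is EUCLEAN on Linux: "Structure needs cleaning", the code
# the kernel and xfs_repair report for metadata failing verification.
print(os.strerror(117))

# XFS v5 magic numbers are four ASCII characters stored big-endian.
for magic in (0x49414233, 0x46494233):
    print(hex(magic), "->", struct.pack(">I", magic).decode("ascii"))
# "IAB3" / "FIB3" are (as I read the on-disk format) the v5 inode btree
# and free inode btree magics -- neither belongs in the btbno/btcnt
# free-space btree blocks repair thought it was reading, which is what
# Dave's "cross linked btrees" remark refers to.
```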
* Re: xfs_repair: couldn't map inode 2089979520, err = 117 2018-01-18 6:27 xfs_repair: couldn't map inode 2089979520, err = 117 Christian Kujau @ 2018-01-18 14:18 ` Brian Foster 2018-01-18 18:36 ` Brian Foster 1 sibling, 0 replies; 7+ messages in thread From: Brian Foster @ 2018-01-18 14:18 UTC (permalink / raw) To: Christian Kujau; +Cc: linux-xfs On Wed, Jan 17, 2018 at 10:27:19PM -0800, Christian Kujau wrote: > Hi, > > after a(nother) power outage this disk enclosure (containing two seperate > disks, connected via USB) was acting up and while one of the disks seems > to have died, the other one still works and no more hardware errors are > reported for the enclosure or the disk. > > The XFS file system on this disk can be mounted (!) and data can be read, > but an xfs_repair fails to complete: http://nerdbynature.de/bits/4.14/xfs/ > > I have (compressed) xfs_metadump images available if anyone is interested. > > A timeline of events: > > * disk enclosure[0] connected to a Raspbery Pi (aarch64) > * power failure, and possible power spike after power came back > * RPI and disk enclosure disconnected from power. > * disk enclosure connected to an x86-64 machine with lots of RAM > * xfs_repair (Fedora 27, xfsprogs-4.12) attempted, but the disk enclosure > was still trying to handle the other (failing) disk and the repair > failed after some USB resets. > * failed disk was removed from the enclosure, no more hardware errors > since, but still xfs_repair is unable to complete. > > After a chat on #xfs, Eric and Dave remarked: > > > error 117 means the inode is corrupted; probably shouldn't be at that > > stage, probably indicates a repair bug? 
just looking at the first few > > errors > > bad magic # 0x49414233 in btbno block 28/134141 > > bad magic # 0x46494233 in btcnt block 30/870600 > > the first magic is IAB3 the 2nd is FIB3 those are magic numbers for > > xfs, but not for the type of block it thought it was checking > > ...and also: > > > cross linked btrees does tend to indicate something went badly wrong > > at the hardware level > > So, with all that (failed xfs_repair runs that were interrupted by > hardware faults and also possibly flaky USB controller[0]) - has anybody > an idea on how to convince xfs_repair to still clean up this mess? Or is > there no other way than to restore from backup? > I suspect, as intimated by the irc snippet above, there's a bug in xfs_repair where we've run into an on-disk corruption that was expected to have been resolved one way or another before phase 7. Note that xfs_repair is not a data recovery tool, so it has full license to simply throw objects away that are considered beyond repair or cannot be made sense of. For that reason, it's usually considered a bug for repair to exit/crash as shown in your logs. I think you'll need to make your metadump(s) available for anybody to make progress beyond that. Brian > Thanks, > Christian. > > [0] When the disk enclosure is connected to the Raspberry Pi 3, the kernel > usually recognizes it as follows: > > usb 1-1.4: new high-speed USB device number 4 using dwc2 > usb 1-1.4: New USB device found, idVendor=7825, idProduct=a2a8 > usb 1-1.4: New USB device strings: Mfr=1, Product=2, SerialNumber=5 > usb 1-1.4: Product: ElitePro Dual U3FW > usb 1-1.4: Manufacturer: OWC > usb 1-1.4: SerialNumber: DB9876543211160 > usb 1-1.4: The driver for the USB controller dwc2_hsotg does not support scatter-gather which is > usb 1-1.4: required by the UAS driver. Please try an other USB controller if you wish to use UAS. 
> usb 1-1.4: The driver for the USB controller dwc2_hsotg does not support scatter-gather which is > usb 1-1.4: required by the UAS driver. Please try an other USB controller if you wish to use UAS. > usb-storage 1-1.4:1.0: USB Mass Storage device detected > scsi host0: usb-storage 1-1.4:1.0 > scsi 0:0:0:0: Direct-Access ElitePro Dual U3FW-1 0006 PQ: 0 ANSI: 6 > scsi 0:0:0:1: Direct-Access ElitePro Dual U3FW-2 0006 PQ: 0 ANSI: 6 > sd 0:0:0:0: Attached scsi generic sg0 type 0 > sd 0:0:0:0: [sda] Very big device. Trying to use READ CAPACITY(16). > sd 0:0:0:0: [sda] 7814037168 512-byte logical blocks: (4.00 TB/3.64 TiB) > sd 0:0:0:0: [sda] Write Protect is off > sd 0:0:0:0: [sda] Mode Sense: 47 00 10 08 > sd 0:0:0:0: [sda] No Caching mode page found > sd 0:0:0:0: [sda] Assuming drive cache: write through > sd 0:0:0:0: [sda] Very big device. Trying to use READ CAPACITY(16). > [...] > > > -- > BOFH excuse #449: > > greenpeace free'd the mallocs > -- > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: xfs_repair: couldn't map inode 2089979520, err = 117 2018-01-18 6:27 xfs_repair: couldn't map inode 2089979520, err = 117 Christian Kujau 2018-01-18 14:18 ` Brian Foster @ 2018-01-18 18:36 ` Brian Foster 2018-01-18 18:55 ` Darrick J. Wong 2018-01-18 21:59 ` Christian Kujau 1 sibling, 2 replies; 7+ messages in thread From: Brian Foster @ 2018-01-18 18:36 UTC (permalink / raw) To: Christian Kujau; +Cc: linux-xfs On Wed, Jan 17, 2018 at 10:27:19PM -0800, Christian Kujau wrote: > Hi, > > after a(nother) power outage this disk enclosure (containing two seperate > disks, connected via USB) was acting up and while one of the disks seems > to have died, the other one still works and no more hardware errors are > reported for the enclosure or the disk. > > The XFS file system on this disk can be mounted (!) and data can be read, > but an xfs_repair fails to complete: http://nerdbynature.de/bits/4.14/xfs/ > > I have (compressed) xfs_metadump images available if anyone is interested. > > A timeline of events: > > * disk enclosure[0] connected to a Raspbery Pi (aarch64) > * power failure, and possible power spike after power came back > * RPI and disk enclosure disconnected from power. > * disk enclosure connected to an x86-64 machine with lots of RAM > * xfs_repair (Fedora 27, xfsprogs-4.12) attempted, but the disk enclosure > was still trying to handle the other (failing) disk and the repair > failed after some USB resets. > * failed disk was removed from the enclosure, no more hardware errors > since, but still xfs_repair is unable to complete. > > After a chat on #xfs, Eric and Dave remarked: > > > error 117 means the inode is corrupted; probably shouldn't be at that > > stage, probably indicates a repair bug? 
just looking at the first few > > errors > > bad magic # 0x49414233 in btbno block 28/134141 > > bad magic # 0x46494233 in btcnt block 30/870600 > > the first magic is IAB3 the 2nd is FIB3 those are magic numbers for > > xfs, but not for the type of block it thought it was checking > > ...and also: > > > cross linked btrees does tend to indicate something went badly wrong > > at the hardware level > > So, with all that (failed xfs_repair runs that were interrupted by > hardware faults and also possibly flaky USB controller[0]) - has anybody > an idea on how to convince xfs_repair to still clean up this mess? Or is > there no other way than to restore from backup? > After looking at one of Christian's metadumps, it looks like this is a possible regression as of the inline directory fork verification bits. I don't have the full cause, but xfs_repair explodes due to the parent inode validation in xfs_iformat_fork -> xfs_dir2_sf_verify() when processing directory inode 2089979520. A quick test without the verifier allows repair to complete. Christian, for the time being I suppose you could try a slightly older xfs_repair and see if that gets you anywhere. v4.10 or so appears to not include the associated commits. Brian > Thanks, > Christian. > > [0] When the disk enclosure is connected to the Raspberry Pi 3, the kernel > usually recognizes it as follows: > > usb 1-1.4: new high-speed USB device number 4 using dwc2 > usb 1-1.4: New USB device found, idVendor=7825, idProduct=a2a8 > usb 1-1.4: New USB device strings: Mfr=1, Product=2, SerialNumber=5 > usb 1-1.4: Product: ElitePro Dual U3FW > usb 1-1.4: Manufacturer: OWC > usb 1-1.4: SerialNumber: DB9876543211160 > usb 1-1.4: The driver for the USB controller dwc2_hsotg does not support scatter-gather which is > usb 1-1.4: required by the UAS driver. Please try an other USB controller if you wish to use UAS. 
> usb 1-1.4: The driver for the USB controller dwc2_hsotg does not support scatter-gather which is > usb 1-1.4: required by the UAS driver. Please try an other USB controller if you wish to use UAS. > usb-storage 1-1.4:1.0: USB Mass Storage device detected > scsi host0: usb-storage 1-1.4:1.0 > scsi 0:0:0:0: Direct-Access ElitePro Dual U3FW-1 0006 PQ: 0 ANSI: 6 > scsi 0:0:0:1: Direct-Access ElitePro Dual U3FW-2 0006 PQ: 0 ANSI: 6 > sd 0:0:0:0: Attached scsi generic sg0 type 0 > sd 0:0:0:0: [sda] Very big device. Trying to use READ CAPACITY(16). > sd 0:0:0:0: [sda] 7814037168 512-byte logical blocks: (4.00 TB/3.64 TiB) > sd 0:0:0:0: [sda] Write Protect is off > sd 0:0:0:0: [sda] Mode Sense: 47 00 10 08 > sd 0:0:0:0: [sda] No Caching mode page found > sd 0:0:0:0: [sda] Assuming drive cache: write through > sd 0:0:0:0: [sda] Very big device. Trying to use READ CAPACITY(16). > [...] > > > -- > BOFH excuse #449: > > greenpeace free'd the mallocs > -- > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 7+ messages in thread
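[Editorial aside: to make the failure mode Brian describes concrete — the inline-directory verifier rejects the whole inode when the recorded parent inode number is garbage, and at this point xfs_repair had no way to load such an inode in order to fix it. Below is a deliberately simplified, hypothetical model of that kind of check; the real code is C, in xfs_dir2_sf_verify() in libxfs, shared between the kernel and xfsprogs:]

```python
EFSCORRUPTED = 117  # surfaces in userspace as "err = 117"

def verify_sf_parent(parent_ino, max_ino=1 << 56):
    """Toy stand-in for the parent-pointer sanity check in the
    short-form directory verifier: a zeroed or out-of-range parent
    inode number marks the whole directory inode as corrupt."""
    if parent_ino == 0 or parent_ino >= max_ino:
        return EFSCORRUPTED
    return 0

print(verify_sf_parent(0))            # 117: fuzzed/zeroed parent rejected
print(verify_sf_parent(2089979520))   # 0: plausible inode number accepted
```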
* Re: xfs_repair: couldn't map inode 2089979520, err = 117 2018-01-18 18:36 ` Brian Foster @ 2018-01-18 18:55 ` Darrick J. Wong 2018-01-18 19:59 ` Brian Foster 2018-01-29 3:22 ` Christian Kujau 2018-01-18 21:59 ` Christian Kujau 1 sibling, 2 replies; 7+ messages in thread From: Darrick J. Wong @ 2018-01-18 18:55 UTC (permalink / raw) To: Brian Foster; +Cc: Christian Kujau, linux-xfs On Thu, Jan 18, 2018 at 01:36:37PM -0500, Brian Foster wrote: > On Wed, Jan 17, 2018 at 10:27:19PM -0800, Christian Kujau wrote: > > Hi, > > > > after a(nother) power outage this disk enclosure (containing two seperate > > disks, connected via USB) was acting up and while one of the disks seems > > to have died, the other one still works and no more hardware errors are > > reported for the enclosure or the disk. > > > > The XFS file system on this disk can be mounted (!) and data can be read, > > but an xfs_repair fails to complete: http://nerdbynature.de/bits/4.14/xfs/ > > > > I have (compressed) xfs_metadump images available if anyone is interested. > > > > A timeline of events: > > > > * disk enclosure[0] connected to a Raspbery Pi (aarch64) > > * power failure, and possible power spike after power came back > > * RPI and disk enclosure disconnected from power. > > * disk enclosure connected to an x86-64 machine with lots of RAM > > * xfs_repair (Fedora 27, xfsprogs-4.12) attempted, but the disk enclosure > > was still trying to handle the other (failing) disk and the repair > > failed after some USB resets. > > * failed disk was removed from the enclosure, no more hardware errors > > since, but still xfs_repair is unable to complete. > > > > After a chat on #xfs, Eric and Dave remarked: > > > > > error 117 means the inode is corrupted; probably shouldn't be at that > > > stage, probably indicates a repair bug? 
just looking at the first few > > > errors > > > bad magic # 0x49414233 in btbno block 28/134141 > > > bad magic # 0x46494233 in btcnt block 30/870600 > > > the first magic is IAB3 the 2nd is FIB3 those are magic numbers for > > > xfs, but not for the type of block it thought it was checking > > > > ...and also: > > > > > cross linked btrees does tend to indicate something went badly wrong > > > at the hardware level > > > > So, with all that (failed xfs_repair runs that were interrupted by > > hardware faults and also possibly flaky USB controller[0]) - has anybody > > an idea on how to convince xfs_repair to still clean up this mess? Or is > > there no other way than to restore from backup? > > > > After looking at one of Christian's metadumps, it looks like this is a > possible regression as of the inline directory fork verification bits. I > don't have the full cause, but xfs_repair explodes due to the parent > inode validation in xfs_iformat_fork -> xfs_dir2_sf_verify() when > processing directory inode 2089979520. A quick test without the verifier > allows repair to complete. > > Christian, for the time being I suppose you could try a slightly older > xfs_repair and see if that gets you anywhere. v4.10 or so appears to not > include the associated commits. Ahhhurrgh. Yes, right now xfsprogs is rather inflexible about the verifiers -- the directory repairer decides that it can simply reset the parent pointer, but then libxfs_iget & friends barf because the sf directory verifier fails, and there's no way to turn that off. Well, there /is/ a way -- refactor the sf verifiers such that they're (optionally) called by _iget so that repair can load the inode w/o verifiers, make the corrections, and write everything back out. That refactoring will appear in Linux 4.16, so I imagine xfs_repair 4.16 will get back on track with that. 
FWIW I think a reasonable reproducer is running xfs/384 with: SCRATCH_XFS_LIST_METADATA_FIELDS=u3.sfdir3.hdr.parent.i4 SCRATCH_XFS_LIST_FUZZ_VERBS=random set in the environment (assumes v5 filesystem, etc.) In the meantime, yeah, what Brian said. --D > Brian > > > Thanks, > > Christian. > > > > [0] When the disk enclosure is connected to the Raspberry Pi 3, the kernel > > usually recognizes it as follows: > > > > usb 1-1.4: new high-speed USB device number 4 using dwc2 > > usb 1-1.4: New USB device found, idVendor=7825, idProduct=a2a8 > > usb 1-1.4: New USB device strings: Mfr=1, Product=2, SerialNumber=5 > > usb 1-1.4: Product: ElitePro Dual U3FW > > usb 1-1.4: Manufacturer: OWC > > usb 1-1.4: SerialNumber: DB9876543211160 > > usb 1-1.4: The driver for the USB controller dwc2_hsotg does not support scatter-gather which is > > usb 1-1.4: required by the UAS driver. Please try an other USB controller if you wish to use UAS. > > usb 1-1.4: The driver for the USB controller dwc2_hsotg does not support scatter-gather which is > > usb 1-1.4: required by the UAS driver. Please try an other USB controller if you wish to use UAS. > > usb-storage 1-1.4:1.0: USB Mass Storage device detected > > scsi host0: usb-storage 1-1.4:1.0 > > scsi 0:0:0:0: Direct-Access ElitePro Dual U3FW-1 0006 PQ: 0 ANSI: 6 > > scsi 0:0:0:1: Direct-Access ElitePro Dual U3FW-2 0006 PQ: 0 ANSI: 6 > > sd 0:0:0:0: Attached scsi generic sg0 type 0 > > sd 0:0:0:0: [sda] Very big device. Trying to use READ CAPACITY(16). > > sd 0:0:0:0: [sda] 7814037168 512-byte logical blocks: (4.00 TB/3.64 TiB) > > sd 0:0:0:0: [sda] Write Protect is off > > sd 0:0:0:0: [sda] Mode Sense: 47 00 10 08 > > sd 0:0:0:0: [sda] No Caching mode page found > > sd 0:0:0:0: [sda] Assuming drive cache: write through > > sd 0:0:0:0: [sda] Very big device. Trying to use READ CAPACITY(16). > > [...] 
> > > > > > -- > > BOFH excuse #449: > > > > greenpeace free'd the mallocs > > -- > > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in > > the body of a message to majordomo@vger.kernel.org > > More majordomo info at http://vger.kernel.org/majordomo-info.html > -- > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 7+ messages in thread
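[Editorial aside: the refactoring Darrick describes boils down to making verification at iget time optional, so repair can load a known-bad inode, correct it, and write it back with the verifiers re-enabled. A conceptual, purely illustrative model — the real interface is C inside libxfs and every name below is made up for illustration:]

```python
EFSCORRUPTED = 117
DISK = {2089979520: {"parent": 0}}   # fake on-disk inode, zeroed parent ptr

class CorruptionError(Exception):
    pass

def verify(inode):
    return EFSCORRUPTED if inode["parent"] == 0 else 0

def iget(ino, run_verifiers=True):
    inode = dict(DISK[ino])
    # Pre-refactor behavior: verifiers always run at iget time, so a
    # repair pass that merely wants to *fix* the parent pointer can
    # never even load the inode -- it barfs here instead.
    if run_verifiers and verify(inode):
        raise CorruptionError("couldn't map inode %d, err = %d"
                              % (ino, EFSCORRUPTED))
    return inode

# Refactored flow: load without verifiers, correct, verify on write-out.
inode = iget(2089979520, run_verifiers=False)
inode["parent"] = 128                # hypothetical corrected parent ino
assert verify(inode) == 0            # passes verification again
DISK[2089979520] = inode
```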
* Re: xfs_repair: couldn't map inode 2089979520, err = 117 2018-01-18 18:55 ` Darrick J. Wong @ 2018-01-18 19:59 ` Brian Foster 2018-01-29 3:22 ` Christian Kujau 1 sibling, 0 replies; 7+ messages in thread From: Brian Foster @ 2018-01-18 19:59 UTC (permalink / raw) To: Darrick J. Wong; +Cc: Christian Kujau, linux-xfs On Thu, Jan 18, 2018 at 10:55:46AM -0800, Darrick J. Wong wrote: > On Thu, Jan 18, 2018 at 01:36:37PM -0500, Brian Foster wrote: > > On Wed, Jan 17, 2018 at 10:27:19PM -0800, Christian Kujau wrote: > > > Hi, > > > > > > after a(nother) power outage this disk enclosure (containing two seperate > > > disks, connected via USB) was acting up and while one of the disks seems > > > to have died, the other one still works and no more hardware errors are > > > reported for the enclosure or the disk. > > > > > > The XFS file system on this disk can be mounted (!) and data can be read, > > > but an xfs_repair fails to complete: http://nerdbynature.de/bits/4.14/xfs/ > > > > > > I have (compressed) xfs_metadump images available if anyone is interested. > > > > > > A timeline of events: > > > > > > * disk enclosure[0] connected to a Raspbery Pi (aarch64) > > > * power failure, and possible power spike after power came back > > > * RPI and disk enclosure disconnected from power. > > > * disk enclosure connected to an x86-64 machine with lots of RAM > > > * xfs_repair (Fedora 27, xfsprogs-4.12) attempted, but the disk enclosure > > > was still trying to handle the other (failing) disk and the repair > > > failed after some USB resets. > > > * failed disk was removed from the enclosure, no more hardware errors > > > since, but still xfs_repair is unable to complete. > > > > > > After a chat on #xfs, Eric and Dave remarked: > > > > > > > error 117 means the inode is corrupted; probably shouldn't be at that > > > > stage, probably indicates a repair bug? 
just looking at the first few > > > > errors > > > > bad magic # 0x49414233 in btbno block 28/134141 > > > > bad magic # 0x46494233 in btcnt block 30/870600 > > > > the first magic is IAB3 the 2nd is FIB3 those are magic numbers for > > > > xfs, but not for the type of block it thought it was checking > > > > > > ...and also: > > > > > > > cross linked btrees does tend to indicate something went badly wrong > > > > at the hardware level > > > > > > So, with all that (failed xfs_repair runs that were interrupted by > > > hardware faults and also possibly flaky USB controller[0]) - has anybody > > > an idea on how to convince xfs_repair to still clean up this mess? Or is > > > there no other way than to restore from backup? > > > > > > > After looking at one of Christian's metadumps, it looks like this is a > > possible regression as of the inline directory fork verification bits. I > > don't have the full cause, but xfs_repair explodes due to the parent > > inode validation in xfs_iformat_fork -> xfs_dir2_sf_verify() when > > processing directory inode 2089979520. A quick test without the verifier > > allows repair to complete. > > > > Christian, for the time being I suppose you could try a slightly older > > xfs_repair and see if that gets you anywhere. v4.10 or so appears to not > > include the associated commits. > > Ahhhurrgh. Yes, right now xfsprogs is rather inflexible about the > verifiers -- the directory repairer decides that it can simply reset the > parent pointer, but then libxfs_iget & friends barf because the sf > directory verifier fails, and there's no way to turn that off. > > Well, there /is/ a way -- refactor the sf verifiers such that they're > (optionally) called by _iget so that repair can load the inode w/o > verifiers, make the corrections, and write everything back out. That > refactoring will appear in Linux 4.16, so I imagine xfs_repair 4.16 will > get back on track with that. > Ah, right. 
I thought this whole problem sounded familiar and hadn't quite been able to put my finger on it yet. I recall some of the discussion around refactoring those bits for verification flexibility in userspace. It looks like that stuff just hasn't made it into userspace yet.. thanks! Brian > FWIW I think a reasonable reproducer is running xfs/384 with: > > SCRATCH_XFS_LIST_METADATA_FIELDS=u3.sfdir3.hdr.parent.i4 > SCRATCH_XFS_LIST_FUZZ_VERBS=random > > set in the environment (assumes v5 filesystem, etc.) > > In the meantime, yeah, what Brian said. > > --D > > > Brian > > > > > Thanks, > > > Christian. > > > > > > [0] When the disk enclosure is connected to the Raspberry Pi 3, the kernel > > > usually recognizes it as follows: > > > > > > usb 1-1.4: new high-speed USB device number 4 using dwc2 > > > usb 1-1.4: New USB device found, idVendor=7825, idProduct=a2a8 > > > usb 1-1.4: New USB device strings: Mfr=1, Product=2, SerialNumber=5 > > > usb 1-1.4: Product: ElitePro Dual U3FW > > > usb 1-1.4: Manufacturer: OWC > > > usb 1-1.4: SerialNumber: DB9876543211160 > > > usb 1-1.4: The driver for the USB controller dwc2_hsotg does not support scatter-gather which is > > > usb 1-1.4: required by the UAS driver. Please try an other USB controller if you wish to use UAS. > > > usb 1-1.4: The driver for the USB controller dwc2_hsotg does not support scatter-gather which is > > > usb 1-1.4: required by the UAS driver. Please try an other USB controller if you wish to use UAS. > > > usb-storage 1-1.4:1.0: USB Mass Storage device detected > > > scsi host0: usb-storage 1-1.4:1.0 > > > scsi 0:0:0:0: Direct-Access ElitePro Dual U3FW-1 0006 PQ: 0 ANSI: 6 > > > scsi 0:0:0:1: Direct-Access ElitePro Dual U3FW-2 0006 PQ: 0 ANSI: 6 > > > sd 0:0:0:0: Attached scsi generic sg0 type 0 > > > sd 0:0:0:0: [sda] Very big device. Trying to use READ CAPACITY(16). 
> > > sd 0:0:0:0: [sda] 7814037168 512-byte logical blocks: (4.00 TB/3.64 TiB) > > > sd 0:0:0:0: [sda] Write Protect is off > > > sd 0:0:0:0: [sda] Mode Sense: 47 00 10 08 > > > sd 0:0:0:0: [sda] No Caching mode page found > > > sd 0:0:0:0: [sda] Assuming drive cache: write through > > > sd 0:0:0:0: [sda] Very big device. Trying to use READ CAPACITY(16). > > > [...] > > > > > > > > > -- > > > BOFH excuse #449: > > > > > > greenpeace free'd the mallocs > > > -- > > > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in > > > the body of a message to majordomo@vger.kernel.org > > > More majordomo info at http://vger.kernel.org/majordomo-info.html > > -- > > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in > > the body of a message to majordomo@vger.kernel.org > > More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: xfs_repair: couldn't map inode 2089979520, err = 117 2018-01-18 18:55 ` Darrick J. Wong 2018-01-18 19:59 ` Brian Foster @ 2018-01-29 3:22 ` Christian Kujau 1 sibling, 0 replies; 7+ messages in thread From: Christian Kujau @ 2018-01-29 3:22 UTC (permalink / raw) To: Darrick J. Wong; +Cc: Brian Foster, linux-xfs On Thu, 18 Jan 2018, Darrick J. Wong wrote: > Ahhhurrgh. Yes, right now xfsprogs is rather inflexible about the > verifiers -- the directory repairer decides that it can simply reset the > parent pointer, but then libxfs_iget & friends barf because the sf > directory verifier fails, and there's no way to turn that off. > > Well, there /is/ a way -- refactor the sf verifiers such that they're > (optionally) called by _iget so that repair can load the inode w/o > verifiers, make the corrections, and write everything back out. That > refactoring will appear in Linux 4.16, so I imagine xfs_repair 4.16 will > get back on track with that. > > FWIW I think a reasonable reproducer is running xfs/384 with: > > SCRATCH_XFS_LIST_METADATA_FIELDS=u3.sfdir3.hdr.parent.i4 > SCRATCH_XFS_LIST_FUZZ_VERBS=random While the file system could be repaired eventually with xfsprogs-v4.10, it took me a while to find time to set up a VM and try to run xfs/384. 
I checked out djwong-xfs-linux/master and booted into a kernel with CONFIG_XFS_ONLINE_SCRUB=y and also built djwong-xfsprogs-dev/djwong-devel where xfs_scrub was available, and then: ubuntu0# export PATH=/opt/xfsprogs-dev/sbin:$PATH ubuntu0# export SCRATCH_XFS_LIST_METADATA_FIELDS=u3.sfdir3.hdr.parent.i4 SCRATCH_XFS_LIST_FUZZ_VERBS=random ubuntu0# ./check tests/xfs/384 FSTYP -- xfs (non-debug) PLATFORM -- Linux/x86_64 ubuntu0 4.15.0-rc9-00001-g0d665e7b109d MKFS_OPTIONS -- -f -bsize=4096 /dev/mapper/vg0-lv1 MOUNT_OPTIONS -- /dev/mapper/vg0-lv1 /mnt/scratch xfs/384 - output mismatch (see /opt/xfstests/xfstests/results//xfs/384.out.bad) --- tests/xfs/384.out 2018-01-19 17:10:21.382080009 -0800 +++ /opt/xfstests/xfstests/results//xfs/384.out.bad 2018-01-28 19:17:58.795130435 -0800 @@ -2,4 +2,8 @@ Format and populate Find inline-format dir inode Fuzz inline-format dir inode +offline repair failed (1) with u3.sfdir3.hdr.parent.i4 = random. +offline re-scrub (1) with u3.sfdir3.hdr.parent.i4 = random. +online re-scrub (1) with u3.sfdir3.hdr.parent.i4 = random. +re-repair failed (1) with u3.sfdir3.hdr.parent.i4 = random. ... (Run 'diff -u tests/xfs/384.out /opt/xfstests/xfstests/results//xfs/384.out.bad' to see the entire diff) _check_dmesg: something found in dmesg (see /opt/xfstests/xfstests/results//xfs/384.dmesg) Ran: xfs/384 Failures: xfs/384 Failed 1 of 1 tests If this means anything to you, I've put the result files here: http://nerdbynature.de/bits/4.14/xfs/ Thanks, Christian. -- BOFH excuse #209: Only people with names beginning with 'A' are getting mail this week (a la Microsoft) ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: xfs_repair: couldn't map inode 2089979520, err = 117 2018-01-18 18:36 ` Brian Foster 2018-01-18 18:55 ` Darrick J. Wong @ 2018-01-18 21:59 ` Christian Kujau 1 sibling, 0 replies; 7+ messages in thread From: Christian Kujau @ 2018-01-18 21:59 UTC (permalink / raw) To: Brian Foster, Darrick J. Wong; +Cc: linux-xfs On Thu, 18 Jan 2018, Brian Foster wrote: > Christian, for the time being I suppose you could try a slightly older > xfs_repair and see if that gets you anywhere. v4.10 or so appears to not > include the associated commits. OK, I've compiled v4.10.0 from source and gave it a try - and it succeeded: http://nerdbynature.de/bits/4.14/xfs/xfs_6.log This time it was able to complete phase 6 and 7 and moved 4.3 GB to lost+found (not bad for a 3.7 TB file system :)) I tried again with xfs_repair -n v4.12, but didn't dare running it w/o -n just yet: http://nerdbynature.de/bits/4.14/xfs/xfs_7.log Thanks for your help here! If there are patches to try, let me know... Christian. -- BOFH excuse #66: bit bucket overflow ^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2018-01-29 3:22 UTC | newest] Thread overview: 7+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2018-01-18 6:27 xfs_repair: couldn't map inode 2089979520, err = 117 Christian Kujau 2018-01-18 14:18 ` Brian Foster 2018-01-18 18:36 ` Brian Foster 2018-01-18 18:55 ` Darrick J. Wong 2018-01-18 19:59 ` Brian Foster 2018-01-29 3:22 ` Christian Kujau 2018-01-18 21:59 ` Christian Kujau