public inbox for linux-xfs@vger.kernel.org
* Problem with XFS on USB 2TB HD
@ 2010-12-18 11:26 Kevin Richter
  2010-12-19  2:04 ` Kevin Richter
                   ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: Kevin Richter @ 2010-12-18 11:26 UTC (permalink / raw)
  To: xfs

Hi,

I have an external USB 2TB harddisk with an XFS filesystem connected to
a Debian Lenny with a 2.6.33.4 kernel.

5 days ago I backed up my data to this external drive. During the backup
the USB port was reset, and now all the data on the XFS volume is lost:
I cannot mount the filesystem anymore.

home:/# mount /dev/mapper/backup /mnt/backup/
mount: wrong fs type, bad option, bad superblock on /dev/mapper/backup,
       missing codepage or helper program, or other error
       In some cases useful info is found in syslog - try
       dmesg | tail  or so

home:/# mount -t xfs /dev/mapper/backup /mnt/backup/
mount: wrong fs type, bad option, bad superblock on /dev/mapper/backup,
       missing codepage or helper program, or other error
       In some cases useful info is found in syslog - try
       dmesg | tail  or so


The XFS is in a LUKS volume, but the luksOpen works flawlessly
(cryptsetup luksOpen /dev/sdb4 backup).


The syslog at the time of the backup process:
Dec 13 22:34:26 home kernel: usb 1-5: reset high speed USB device using
ehci_hcd and address 5
Dec 13 22:34:37 home kernel: usb 1-5: reset high speed USB device using
ehci_hcd and address 5
Dec 13 22:35:15 home kernel: d2fd1000: c2 5b ed 04 29 c1 19 04 dd 51 f8
84 5c ca 33 79  .[..)....Q..\.3y
Dec 13 22:35:15 home kernel: Filesystem "dm-0": XFS internal error
xfs_da_do_buf(2) at line 2113 of file fs/xfs/xfs_da_btree.c.  Caller
0xc115d3cf
Dec 13 22:35:15 home kernel:
Dec 13 22:35:15 home kernel: Pid: 30113, comm: smbd Not tainted 2.6.33.4 #4
Dec 13 22:35:15 home kernel: Call Trace:
Dec 13 22:35:15 home kernel: [<c116720e>] xfs_error_report+0x2c/0x2e
Dec 13 22:35:15 home kernel: [<c1167245>] xfs_corruption_error+0x35/0x40
Dec 13 22:35:15 home kernel: [<c115d3cf>] ? xfs_da_read_buf+0x18/0x1d
Dec 13 22:35:15 home kernel: [<c115d2bc>] xfs_da_do_buf+0x571/0x629
Dec 13 22:35:15 home kernel: [<c115d3cf>] ? xfs_da_read_buf+0x18/0x1d
Dec 13 22:35:15 home kernel: [<c1372a76>] ? ip_output+0x78/0x7d
Dec 13 22:35:15 home kernel: [<c13724f7>] ? ip_queue_xmit+0x2ce/0x304
Dec 13 22:35:15 home kernel: [<c115d3cf>] xfs_da_read_buf+0x18/0x1d
Dec 13 22:35:15 home kernel: [<c11605b0>] ?
xfs_dir2_block_lookup_int+0x39/0x17c
Dec 13 22:35:15 home kernel: [<c11605b0>]
xfs_dir2_block_lookup_int+0x39/0x17c
Dec 13 22:35:15 home kernel: [<c10cb7fb>] ? __ext3_get_inode_loc+0xc6/0x260
Dec 13 22:35:15 home kernel: [<c114f48e>] ? xfs_bmap_last_offset+0xe8/0xfc
Dec 13 22:35:15 home kernel: [<c1160b34>] xfs_dir2_block_lookup+0x16/0xa2
Dec 13 22:35:15 home kernel: [<c115f993>] xfs_dir_lookup+0x98/0x100
Dec 13 22:35:15 home kernel: [<c118131b>] xfs_lookup+0x3d/0x94
Dec 13 22:35:15 home kernel: [<c1188736>] xfs_vn_lookup+0x38/0x70
Dec 13 22:35:15 home kernel: [<c10704d0>] do_lookup+0xd0/0x16f
Dec 13 22:35:15 home kernel: [<c1071e27>] link_path_walk+0x63b/0x9dc
Dec 13 22:35:15 home kernel: [<c10722f5>] path_walk+0x50/0xb2
Dec 13 22:35:15 home kernel: [<c1072400>] do_path_lookup+0x21/0x42
Dec 13 22:35:15 home kernel: [<c1072c27>] user_path_at+0x3c/0x67
Dec 13 22:35:15 home kernel: [<c106cc20>] vfs_fstatat+0x2d/0x54
Dec 13 22:35:15 home kernel: [<c106cd18>] vfs_stat+0x13/0x15
Dec 13 22:35:15 home kernel: [<c106cd2e>] sys_stat64+0x14/0x28
Dec 13 22:35:15 home kernel: [<c10025d0>] sysenter_do_call+0x12/0x26



The current XFS header bytes of the /dev/mapper/backup volume:
00000000  58 46 53 42 00 00 10 00  00 00 00 00 1c d8 c3 a4
|XFSB.........ØÃ¤|
00000010  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
|................|
00000020  ec 24 29 ae 9c 2f 4a 75  a8 59 58 14 b5 2d 5e ac
|ì$)®./Ju¨YX.µ-^¬|
00000030  00 00 00 00 10 00 00 04  00 00 00 00 00 00 00 80
|................|
00000040  98 55 e4 53 ef e6 e9 03  9a 71 4f 19 3f 8b 6f 14
|.UäSïæé..qO.?.o.|
00000050  75 d6 51 9a dd 84 53 3e  c4 80 ae a1 c2 83 53 5e
|uÖQ.Ý.S>Ä.®¡Â.S^|
00000060  69 c3 f8 1b 35 0b 15 f2  4f 15 46 42 79 6f 8b 13
|iÃø.5..òO.FByo..|
00000070  4c 65 64 ba 38 cd 51 8b  00 00 00 00 2c f9 ac 2c
|Ledº8ÍQ.....,ù¬,|
00000080  47 91 80 45 73 3e 93 77  8f 95 80 81 ab 8b b8 eb
|G..Es>.w....«.¸ë|
00000090  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
|................|
000000a0  ec 24 29 ae 9c 2f 4a 75  a8 59 58 14 b5 2d 5e ac
|ì$)®./Ju¨YX.µ-^¬|
000000b0  00 00 00 00 10 00 00 04  00 00 00 00 00 00 00 80
|................|
000000c0  98 55 e4 53 ef e6 e9 03  9a 71 4f 19 3f 8b 6f 14
|.UäSïæé..qO.?.o.|
000000d0  75 d6 51 9a dd 84 53 3e  c4 80 ae a1 c2 83 53 5e
|uÖQ.Ý.S>Ä.®¡Â.S^|
000000e0  69 c3 f8 1b 35 0b 15 f2  4f 15 46 42 79 6f 8b 13
|iÃø.5..òO.FByo..|
000000f0  4c 65 64 ba 38 cd 51 8b  00 00 00 00 2c f9 ac 2c
|Ledº8ÍQ.....,ù¬,|
00000100  47 91 80 45 73 3e 93 77  8f 95 80 81 ab 8b b8 eb
|G..Es>.w....«.¸ë|
00000110  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
|................|
00000120  ec 24 29 ae 9c 2f 4a 75  a8 59 58 14 b5 2d 5e ac
|ì$)®./Ju¨YX.µ-^¬|
00000130  00 00 00 00 10 00 00 04  00 00 00 00 00 00 00 80
|................|
00000140  98 55 e4 53 ef e6 e9 03  9a 71 4f 19 3f 8b 6f 14
|.UäSïæé..qO.?.o.|
00000150  75 d6 51 9a dd 84 53 3e  c4 80 ae a1 c2 83 53 5e
|uÖQ.Ý.S>Ä.®¡Â.S^|
00000160  69 c3 f8 1b 35 0b 15 f2  4f 15 46 42 79 6f 8b 13
|iÃø.5..òO.FByo..|
00000170  4c 65 64 ba 38 cd 51 8b  00 00 00 00 2c f9 ac 2c
|Ledº8ÍQ.....,ù¬,|
00000180  47 91 80 45 73 3e 93 77  8f 95 80 81 ab 8b b8 eb
|G..Es>.w....«.¸ë|
00000190  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
|................|
000001a0  ec 24 29 ae 9c 2f 4a 75  a8 59 58 14 b5 2d 5e ac
|ì$)®./Ju¨YX.µ-^¬|
000001b0  00 00 00 00 10 00 00 04  00 00 00 00 00 00 00 80
|................|
000001c0  98 55 e4 53 ef e6 e9 03  9a 71 4f 19 3f 8b 6f 14
|.UäSïæé..qO.?.o.|
000001d0  75 d6 51 9a dd 84 53 3e  c4 80 ae a1 c2 83 53 5e
|uÖQ.Ý.S>Ä.®¡Â.S^|
000001e0  69 c3 f8 1b 35 0b 15 f2  4f 15 46 42 79 6f 8b 13
|iÃø.5..òO.FByo..|
000001f0  4c 65 64 ba 38 cd 51 8b  00 00 00 00 2c f9 ac 2c
|Ledº8ÍQ.....,ù¬,|



xfs_repair has been running for a few hours now, but has only printed a
bunch of dots. The first thing xfs_repair printed was "bad or unsupported
version" and that it can't find a superblock.


Any ideas?
What happened?
Am I using a buggy version?
I have 2.9.8 (http://packages.debian.org/lenny/xfsprogs)
What can I do to avoid these problems in the future?
Is it a problem with the interaction of LUKS, XFS and USB?



Thanks in advance
Kevin

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Problem with XFS on USB 2TB HD
  2010-12-18 11:26 Problem with XFS on USB 2TB HD Kevin Richter
@ 2010-12-19  2:04 ` Kevin Richter
  2010-12-19  2:37   ` Stan Hoeppner
  2010-12-19 14:57   ` Emmanuel Florac
  2010-12-20  0:10 ` Dave Chinner
  2010-12-20  8:59 ` Michael Monnerie
  2 siblings, 2 replies; 14+ messages in thread
From: Kevin Richter @ 2010-12-19  2:04 UTC (permalink / raw)
  To: xfs

... the xfs_repair process did succeed, but in the end I still cannot
mount the volume. Below is a log of my actions with the corresponding
syslog messages.

xfs_check (see below) says that there is a bug in XFS, so I am
writing to this list a second time asking for help :)


Thanks,
Kevin



[xfs_repair: after 500 screens of dots...]

.................................................................found
candidate secondary superblock...
verified secondary superblock...
writing modified primary superblock
sb realtime bitmap inode 18446744073709551615 (NULLFSINO) inconsistent
with calculated value 129
resetting superblock realtime bitmap ino pointer to 129
sb realtime summary inode 18446744073709551615 (NULLFSINO) inconsistent
with calculated value 130
resetting superblock realtime summary ino pointer to 130
Phase 2 - using internal log
        - zero log...
ERROR: The filesystem has valuable metadata changes in a log which needs to
be replayed.  Mount the filesystem to replay the log, and unmount it before
re-running xfs_repair.  If you are unable to mount the filesystem, then use
the -L option to destroy the log and attempt a repair.
Note that destroying the log may cause corruption -- please attempt a mount
of the filesystem before doing this.



home:/# mount -t xfs /dev/mapper/backup /mnt/backup/
mount: Nicht genügend Hauptspeicher verfügbar (engl. out of memory)

syslog:
Dec 18 21:53:25 home kernel: XFS mounting filesystem dm-0
Dec 18 21:53:26 home kernel: XFS: nil uuid in log - IRIX style log
Dec 18 21:53:26 home kernel: Starting XFS recovery on filesystem: dm-0
(logdev: internal)
Dec 18 21:53:26 home kernel: XFS: Invalid block length (0x16d283) given
for buffer
Dec 18 21:53:26 home kernel: XFS: log mount/recovery failed: error 12
Dec 18 21:53:26 home kernel: XFS: log mount failed



home:/# mount -t xfs /dev/mapper/backup /mnt/backup/
mount: Die Struktur muss bereinigt werden (engl. structure needs cleaning)

syslog:
Dec 18 21:53:40 home kernel: XFS mounting filesystem dm-0
Dec 18 21:53:40 home kernel: XFS: nil uuid in log - IRIX style log
Dec 18 21:53:40 home kernel: Starting XFS recovery on filesystem: dm-0
(logdev: internal)
Dec 18 21:53:40 home kernel: XFS: dirty log written in incompatible
format - can't recover
Dec 18 21:53:40 home kernel: XFS: log mount/recovery failed: error 5
Dec 18 21:53:40 home kernel: XFS: log mount failed



home:/# mount -t xfs /dev/mapper/backup /mnt/backup/
mount: /dev/mapper/backup: can't read superblock

syslog:
Dec 18 21:55:40 home kernel: XFS mounting filesystem dm-0
Dec 18 21:55:41 home kernel: XFS: nil uuid in log - IRIX style log
Dec 18 21:55:41 home kernel: Starting XFS recovery on filesystem: dm-0
(logdev: internal)
Dec 18 21:55:41 home kernel: Filesystem "dm-0": corrupt inode 128
((a)extents = 316629094).  Unmount and run xfs_repair.
Dec 18 21:55:41 home kernel: c77f9000: 49 4e 41 ed 02 02 00 00 00 00 00
00 00 00 00 00  INA.............
Dec 18 21:55:41 home kernel: Filesystem "dm-0": XFS internal error
xfs_iformat_extents(1) at line 558 of file fs/xfs/xfs_inode.c.  Caller
0xc116f10b
Dec 18 21:55:41 home kernel:
Dec 18 21:55:41 home kernel: Pid: 23696, comm: mount Not tainted 2.6.33.4 #4
Dec 18 21:55:41 home kernel: Call Trace:
Dec 18 21:55:41 home kernel: [<c116720e>] xfs_error_report+0x2c/0x2e
Dec 18 21:55:41 home kernel: [<c1167245>] xfs_corruption_error+0x35/0x40
Dec 18 21:55:41 home kernel: [<c116f10b>] ? xfs_iformat+0x314/0x499
Dec 18 21:55:41 home kernel: [<c116eb50>] xfs_iformat_extents+0xba/0x1bc
Dec 18 21:55:41 home kernel: [<c116f10b>] ? xfs_iformat+0x314/0x499
Dec 18 21:55:41 home kernel: [<c116f10b>] xfs_iformat+0x314/0x499
Dec 18 21:55:41 home kernel: [<c116f333>] xfs_iread+0xa3/0x160
Dec 18 21:55:41 home kernel: [<c116b563>] xfs_iget+0x1a6/0x2b7
Dec 18 21:55:41 home kernel: [<c117a96e>] xfs_mountfs+0x35e/0x594
Dec 18 21:55:41 home kernel: [<c11832cb>] ? kmem_alloc+0x59/0xab
Dec 18 21:55:41 home kernel: [<c1183379>] ? kmem_zalloc+0x10/0x25
Dec 18 21:55:41 home kernel: [<c117b44a>] ? xfs_mru_cache_create+0xe9/0x121
Dec 18 21:55:41 home kernel: [<c118b4ec>] xfs_fs_fill_super+0x14e/0x292
Dec 18 21:55:41 home kernel: [<c106bde9>] get_sb_bdev+0xf9/0x137
Dec 18 21:55:41 home kernel: [<c1058acb>] ? kstrdup+0x29/0x3a
Dec 18 21:55:41 home kernel: [<c1189f30>] xfs_fs_get_sb+0x13/0x15
Dec 18 21:55:41 home kernel: [<c118b39e>] ? xfs_fs_fill_super+0x0/0x292
Dec 18 21:55:41 home kernel: [<c106ba9f>] vfs_kern_mount+0x86/0x11f
Dec 18 21:55:41 home kernel: [<c106bb7c>] do_kern_mount+0x32/0xbe
Dec 18 21:55:41 home kernel: [<c107c5c2>] do_mount+0x5a9/0x5fb
Dec 18 21:55:41 home kernel: [<c1058924>] ? strndup_user+0x48/0x67
Dec 18 21:55:41 home kernel: [<c107c675>] sys_mount+0x61/0x94
Dec 18 21:55:41 home kernel: [<c10025d0>] sysenter_do_call+0x12/0x26
Dec 18 21:55:41 home kernel: XFS: failed to read root inode



# SWAP SPACE ADDED:
Dec 19 02:32:26 home kernel: Adding 7815612k swap on /dev/sda2.
Priority:-2 extents:1 across:7815612k



xfs_repair executed



home:/# mount -t xfs /dev/mapper/backup /mnt/backup/
mount: Die Struktur muss bereinigt werden (engl. structure needs cleaning)

syslog:
Dec 19 02:33:23 home kernel: XFS mounting filesystem dm-0
Dec 19 02:33:24 home kernel: Starting XFS recovery on filesystem: dm-0
(logdev: internal)
Dec 19 02:33:24 home kernel: Filesystem "dm-0": corrupt inode 128
((a)extents = 316629094).  Unmount and run xfs_repair.
Dec 19 02:33:24 home kernel: d21fb000: 49 4e 41 ed 02 02 00 00 00 00 00
00 00 00 00 00  INA.............
Dec 19 02:33:24 home kernel: Filesystem "dm-0": XFS internal error
xfs_iformat_extents(1) at line 558 of file fs/xfs/xfs_inode.c.  Caller
0xc116f10b
Dec 19 02:33:24 home kernel:
Dec 19 02:33:24 home kernel: Pid: 24735, comm: mount Not tainted 2.6.33.4 #4
Dec 19 02:33:24 home kernel: Call Trace:
Dec 19 02:33:24 home kernel: [<c116720e>] xfs_error_report+0x2c/0x2e
Dec 19 02:33:24 home kernel: [<c1167245>] xfs_corruption_error+0x35/0x40
Dec 19 02:33:24 home kernel: [<c116f10b>] ? xfs_iformat+0x314/0x499
Dec 19 02:33:24 home kernel: [<c116eb50>] xfs_iformat_extents+0xba/0x1bc
Dec 19 02:33:24 home kernel: [<c116f10b>] ? xfs_iformat+0x314/0x499
Dec 19 02:33:24 home kernel: [<c116f10b>] xfs_iformat+0x314/0x499
Dec 19 02:33:24 home kernel: [<c116f333>] xfs_iread+0xa3/0x160
Dec 19 02:33:24 home kernel: [<c116b563>] xfs_iget+0x1a6/0x2b7
Dec 19 02:33:24 home kernel: [<c117a96e>] xfs_mountfs+0x35e/0x594
Dec 19 02:33:24 home kernel: [<c11832cb>] ? kmem_alloc+0x59/0xab
Dec 19 02:33:24 home kernel: [<c1183379>] ? kmem_zalloc+0x10/0x25
Dec 19 02:33:24 home kernel: [<c117b44a>] ? xfs_mru_cache_create+0xe9/0x121
Dec 19 02:33:24 home kernel: [<c118b4ec>] xfs_fs_fill_super+0x14e/0x292
Dec 19 02:33:24 home kernel: [<c106bde9>] get_sb_bdev+0xf9/0x137
Dec 19 02:33:24 home kernel: [<c1058acb>] ? kstrdup+0x29/0x3a
Dec 19 02:33:24 home kernel: [<c1189f30>] xfs_fs_get_sb+0x13/0x15
Dec 19 02:33:24 home kernel: [<c118b39e>] ? xfs_fs_fill_super+0x0/0x292
Dec 19 02:33:24 home kernel: [<c106ba9f>] vfs_kern_mount+0x86/0x11f
Dec 19 02:33:24 home kernel: [<c106bb7c>] do_kern_mount+0x32/0xbe
Dec 19 02:33:24 home kernel: [<c107c5c2>] do_mount+0x5a9/0x5fb
Dec 19 02:33:24 home kernel: [<c1058924>] ? strndup_user+0x48/0x67
Dec 19 02:33:24 home kernel: [<c107c675>] sys_mount+0x61/0x94
Dec 19 02:33:24 home kernel: [<c10025d0>] sysenter_do_call+0x12/0x26
Dec 19 02:33:24 home kernel: XFS: failed to read root inode



home:/# xfs_repair /dev/mapper/backup
Phase 1 - find and verify superblock...
sb realtime bitmap inode 18446744073709551615 (NULLFSINO) inconsistent
with calculated value 129
resetting superblock realtime bitmap ino pointer to 129
sb realtime summary inode 18446744073709551615 (NULLFSINO) inconsistent
with calculated value 130
resetting superblock realtime summary ino pointer to 130
Phase 2 - using internal log
        - zero log...
ERROR: The filesystem has valuable metadata changes in a log which needs to
be replayed.  Mount the filesystem to replay the log, and unmount it before
re-running xfs_repair.  If you are unable to mount the filesystem, then use
the -L option to destroy the log and attempt a repair.
Note that destroying the log may cause corruption -- please attempt a mount
of the filesystem before doing this.



home:/# xfs_check /dev/mapper/backup
corrupt inode 128 ((a)extents = 316629094).  This is a bug.
Please capture the filesystem metadata with xfs_metadump and
report it to xfs@oss.sgi.com.
cache_node_purge: refcount was 1, not zero (node=0x80dba08)
xfs_check: cannot read root inode (117)
ERROR: The filesystem has valuable metadata changes in a log which needs to
be replayed.  Mount the filesystem to replay the log, and unmount it before
re-running xfs_check.  If you are unable to mount the filesystem, then use
the xfs_repair -L option to destroy the log and attempt a repair.
Note that destroying the log may cause corruption -- please attempt a mount
of the filesystem before doing this.




* Re: Problem with XFS on USB 2TB HD
  2010-12-19  2:04 ` Kevin Richter
@ 2010-12-19  2:37   ` Stan Hoeppner
  2010-12-19 14:57   ` Emmanuel Florac
  1 sibling, 0 replies; 14+ messages in thread
From: Stan Hoeppner @ 2010-12-19  2:37 UTC (permalink / raw)
  To: xfs

Kevin Richter put forth on 12/18/2010 8:04 PM:
> ... the xfs_repair process did succeed.
> But in the end I still cannot mount the volume.
> In the following there is a log of my actions with the appropriate
> syslog messages.
> 
> The xfs_check (see below) says that there is a bug in xfs. So I am
> writing to this list a second time asking for help :)

I don't have a solution to your problem unfortunately.  Keep in mind you
posted this problem on a weekend, and one week before Christmas no less.
 Your timing is not optimal for receiving a prompt response.  It's
possible you may have to wait until Monday for help. :(

-- 
Stan


* Re: Problem with XFS on USB 2TB HD
  2010-12-19  2:04 ` Kevin Richter
  2010-12-19  2:37   ` Stan Hoeppner
@ 2010-12-19 14:57   ` Emmanuel Florac
  2010-12-19 17:30     ` Eric Sandeen
  1 sibling, 1 reply; 14+ messages in thread
From: Emmanuel Florac @ 2010-12-19 14:57 UTC (permalink / raw)
  To: xfs; +Cc: xfs

On Sun, 19 Dec 2010 03:04:21 +0100, you wrote:

> ERROR: The filesystem has valuable metadata changes in a log which
> needs to be replayed.

Given this message, you'll have to run xfs_repair with the zero log
option ( -L ). This is dangerous, but it can't get much worse anyway.

Run xfs_repair -L /device; you should be able to mount your filesystem
afterwards, but any data being changed at the time of the failure will
most probably be lost.
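The sequence described here (try a plain mount; only if that fails, zero the log with -L; then mount again) can be sketched as a small wrapper. The command names are the real ones from this thread, but the recover_xfs helper and its injectable run parameter are hypothetical, added only so the decision logic can be exercised without a real device:

```python
import subprocess

def recover_xfs(device, mountpoint, run=subprocess.run):
    """Mount first; only if that fails, zero the log and try again.

    xfs_repair -L throws away in-flight metadata, so it stays a last
    resort, exactly as the xfs_repair error message warns.
    """
    if run(["mount", "-t", "xfs", device, mountpoint]).returncode == 0:
        return "mounted"                     # log replayed normally
    run(["xfs_repair", "-L", device])        # destructive: zeroes the log
    if run(["mount", "-t", "xfs", device, mountpoint]).returncode == 0:
        return "mounted-after-repair"
    return "failed"
```

With the real subprocess.run this has to be executed as root against the opened LUKS mapping, e.g. recover_xfs("/dev/mapper/backup", "/mnt/backup").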

-- 
------------------------------------------------------------------------
Emmanuel Florac     |   Direction technique
                    |   Intellique
                    |	<eflorac@intellique.com>
                    |   +33 1 78 94 84 02
------------------------------------------------------------------------


* Re: Problem with XFS on USB 2TB HD
  2010-12-19 14:57   ` Emmanuel Florac
@ 2010-12-19 17:30     ` Eric Sandeen
  0 siblings, 0 replies; 14+ messages in thread
From: Eric Sandeen @ 2010-12-19 17:30 UTC (permalink / raw)
  To: Emmanuel Florac; +Cc: xfs, xfs

On 12/19/10 8:57 AM, Emmanuel Florac wrote:
> On Sun, 19 Dec 2010 03:04:21 +0100, you wrote:
> 
>> ERROR: The filesystem has valuable metadata changes in a log which
>> needs to be replayed.
> 
> Given this message, you'll have to run xfs_repair with the zero log
> option ( -L ). This is dangerous, but it can't get much worse anyway.

I agree with that approach.

> Run xfs_repair -L /device; you should be able to mount your filesystem
> afterwards, but any data being changed at the time of the failure will
> most probably be lost.

And something seems to have really whacked your filesystem; odds are 
the USB transport was lying to xfs one way or another about completed
writes, and when it went away, things were not consistent on disk
as expected.

-Eric


* Re: Problem with XFS on USB 2TB HD
  2010-12-18 11:26 Problem with XFS on USB 2TB HD Kevin Richter
  2010-12-19  2:04 ` Kevin Richter
@ 2010-12-20  0:10 ` Dave Chinner
  2010-12-20  2:56   ` Kevin Richter
  2010-12-20  8:59 ` Michael Monnerie
  2 siblings, 1 reply; 14+ messages in thread
From: Dave Chinner @ 2010-12-20  0:10 UTC (permalink / raw)
  To: Kevin Richter; +Cc: xfs

On Sat, Dec 18, 2010 at 12:26:07PM +0100, Kevin Richter wrote:
> Hi,
> 
> I have an external USB 2TB harddisk with an XFS filesystem connected to
> a Debian Lenny with a 2.6.33.4 kernel.
> 
> 5 days ago I backed up my data to this external drive. During the backup
> the USB port was reset and now all the data on the XFS is lost.

There's been garbage written all over this superblock:

> The current XFS header bytes of the /dev/mapper/backup volume:
> 00000000  58 46 53 42 00 00 10 00  00 00 00 00 1c d8 c3 a4 |XFSB.........ØÃ¤|
> 00000010  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00 |................|
> 00000020  ec 24 29 ae 9c 2f 4a 75  a8 59 58 14 b5 2d 5e ac |ì$)®./Ju¨YX.µ-^¬|
> 00000030  00 00 00 00 10 00 00 04  00 00 00 00 00 00 00 80 |................|

Ok up to here (first 64 bytes)
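For reference, the judgement above can be checked mechanically: the first 64 bytes quoted in the report decode into sane-looking fields. This is a sketch; the offsets are assumed to follow the classic v4 on-disk superblock layout (magic, block size, data blocks, realtime blocks/extents, UUID, log start, root inode):

```python
import struct

# First 64 bytes of the superblock, transcribed from the hexdump above.
SB = bytes.fromhex(
    "58465342" "00001000" "000000001cd8c3a4"   # magic, blocksize, dblocks
    "0000000000000000" "0000000000000000"      # rblocks, rextents (no rt dev)
    "ec2429ae9c2f4a75a8595814b52d5eac"         # filesystem UUID
    "0000000010000004" "0000000000000080")     # log start, root inode

(magic, blocksize, dblocks, rblocks, rextents,
 uuid, logstart, rootino) = struct.unpack(">4sIQQQ16sQQ", SB)

print(magic, blocksize)            # b'XFSB' 4096: a valid-looking header
print(dblocks * blocksize)         # 1982332551168 (~1.98 TB, fits a 2 TB disk)
print(rootino)                     # 128: the inode later reported corrupt
```

The decoded size (~1.98 TB) and root inode number (128, the same inode the kernel later reports as corrupt) are both consistent with the rest of the thread, which supports the reading that these first 64 bytes survived intact.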

> 00000040  98 55 e4 53 ef e6 e9 03  9a 71 4f 19 3f 8b 6f 14 |.UäSïæé..qO.?.o.|
> 00000050  75 d6 51 9a dd 84 53 3e  c4 80 ae a1 c2 83 53 5e |uÖQ.Ý.S>Ä.®¡Â.S^|
> 00000060  69 c3 f8 1b 35 0b 15 f2  4f 15 46 42 79 6f 8b 13 |iÃø.5..òO.FByo..|
> 00000070  4c 65 64 ba 38 cd 51 8b  00 00 00 00 2c f9 ac 2c |Ledº8ÍQ.....,ù¬,|
> 00000080  47 91 80 45 73 3e 93 77  8f 95 80 81 ab 8b b8 eb |G..Es>.w....«.¸ë|

This is all garbage (second 64 bytes)

> 00000090  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00 |................|

Might be ok.

> 000000a0  ec 24 29 ae 9c 2f 4a 75  a8 59 58 14 b5 2d 5e ac |ì$)®./Ju¨YX.µ-^¬|

Garbage

> 000000b0  00 00 00 00 10 00 00 04  00 00 00 00 00 00 00 80 |................|

Might be ok

> 000000c0  98 55 e4 53 ef e6 e9 03  9a 71 4f 19 3f 8b 6f 14 |.UäSïæé..qO.?.o.|
> 000000d0  75 d6 51 9a dd 84 53 3e  c4 80 ae a1 c2 83 53 5e |uÖQ.Ý.S>Ä.®¡Â.S^|
> 000000e0  69 c3 f8 1b 35 0b 15 f2  4f 15 46 42 79 6f 8b 13 |iÃø.5..òO.FByo..|
> 000000f0  4c 65 64 ba 38 cd 51 8b  00 00 00 00 2c f9 ac 2c |Ledº8ÍQ.....,ù¬,|
> 00000100  47 91 80 45 73 3e 93 77  8f 95 80 81 ab 8b b8 eb |G..Es>.w....«.¸ë|

Garbage.

> 00000110  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00 |................|
> 00000120  ec 24 29 ae 9c 2f 4a 75  a8 59 58 14 b5 2d 5e ac |ì$)®./Ju¨YX.µ-^¬|
> 00000130  00 00 00 00 10 00 00 04  00 00 00 00 00 00 00 80 |................|
> 00000140  98 55 e4 53 ef e6 e9 03  9a 71 4f 19 3f 8b 6f 14 |.UäSïæé..qO.?.o.|
> 00000150  75 d6 51 9a dd 84 53 3e  c4 80 ae a1 c2 83 53 5e |uÖQ.Ý.S>Ä.®¡Â.S^|
> 00000160  69 c3 f8 1b 35 0b 15 f2  4f 15 46 42 79 6f 8b 13 |iÃø.5..òO.FByo..|
> 00000170  4c 65 64 ba 38 cd 51 8b  00 00 00 00 2c f9 ac 2c |Ledº8ÍQ.....,ù¬,|
> 00000180  47 91 80 45 73 3e 93 77  8f 95 80 81 ab 8b b8 eb |G..Es>.w....«.¸ë|
> 00000190  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00 |................|
> 000001a0  ec 24 29 ae 9c 2f 4a 75  a8 59 58 14 b5 2d 5e ac |ì$)®./Ju¨YX.µ-^¬|
> 000001b0  00 00 00 00 10 00 00 04  00 00 00 00 00 00 00 80 |................|
> 000001c0  98 55 e4 53 ef e6 e9 03  9a 71 4f 19 3f 8b 6f 14 |.UäSïæé..qO.?.o.|
> 000001d0  75 d6 51 9a dd 84 53 3e  c4 80 ae a1 c2 83 53 5e |uÖQ.Ý.S>Ä.®¡Â.S^|
> 000001e0  69 c3 f8 1b 35 0b 15 f2  4f 15 46 42 79 6f 8b 13 |iÃø.5..òO.FByo..|
> 000001f0  4c 65 64 ba 38 cd 51 8b  00 00 00 00 2c f9 ac 2c |Ledº8ÍQ.....,ù¬,|

These should all be zero.

Basically, whatever happened to cause the USB reset has resulted in
corruption of various sectors of the hard drive. If you can't get
your usb drive to work reliably, then don't expect the filesystem to
stay intact.  I'd start by replacing the USB cable and probably the
enclosure, writing a pattern to the entire harddrive (e.g.
0xa5a55a5a) and verifying you can read it back without corruption or
USB resets. If you can't get your USB drive to work reliably, then
it's not a good medium for storing backups....
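The burn-in test described above can be sketched as follows. verify_pattern is a hypothetical helper; against a raw device it is completely destructive by design, so it is demonstrated against an ordinary file:

```python
import os

PATTERN = bytes.fromhex("a5a55a5a")      # the test pattern suggested above

def verify_pattern(path, size, chunk=1 << 20):
    """Write `size` bytes of repeating PATTERN to `path`, read it back,
    and return the offset of the first bad chunk, or -1 if all is well."""
    block = PATTERN * (chunk // len(PATTERN))
    with open(path, "wb") as f:          # DESTRUCTIVE on a raw device
        written = 0
        while written < size:
            n = min(chunk, size - written)
            f.write(block[:n])
            written += n
        f.flush()
        os.fsync(f.fileno())             # push it through the USB layer
    with open(path, "rb") as f:
        offset = 0
        while offset < size:
            n = min(chunk, size - offset)
            if f.read(n) != block[:n]:
                return offset            # corruption or a short read
            offset += n
    return -1
```

A clean pass over the whole drive returns -1; any USB reset or silent corruption shows up as a mismatch offset. Only run it against a device whose contents are expendable.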

If that is fine, mkfs the filesystem on it and redo your backup
again.

> Is it a problem with the interaction of LUKS, XFS and USB?

You are encrypting the external drive? That would explain the
garbage then - a single bit error in a sector will render it
completely incorrect....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: Problem with XFS on USB 2TB HD
  2010-12-20  0:10 ` Dave Chinner
@ 2010-12-20  2:56   ` Kevin Richter
  2010-12-20  4:51     ` Dave Chinner
  0 siblings, 1 reply; 14+ messages in thread
From: Kevin Richter @ 2010-12-20  2:56 UTC (permalink / raw)
  To: xfs

Thanks a lot for your responses.

> I don't have a solution to your problem unfortunately.  Keep in mind you
> posted this problem on a weekend, and one week before Christmas no less.
>  Your timing is not optimal for receiving a prompt response.  It's
> possible you may have to wait until Monday for help. :(

No problem at all. It can wait. My backup drive holds no really
important stuff - just 100% backups from other disks.
I'll just use this issue as an opportunity to try out the XFS repair
functionality.


>> Is it a problem with the interaction of LUKS, XFS and USB?
>
> You are encrypting the external drive? That would explain the
> garbage then - a single bit error in a sector will render it
> completely incorrect....

Yes, I do: aes-cbc-essiv:sha256.
If the connection breaks while writing a block, only that block should
be garbled - and that should be 256 bits in this case, shouldn't it?


>> ERROR: The filesystem has valuable metadata changes in a log which
>> > needs to be replayed.
> Given this message, you'll have to run xfs_repair with the zero log
> option ( -L ). This is dangerous, but it can't get much worse anyway.
> 
> Run xfs_repair -L /device , you should be able to mount your filesystem
> afterwards, however any data under change at the time of failure will
> most probably be lost.

OK, if there is no way around it...

xfs_repair -L /dev/mapper/backup ran for a few minutes and printed a
lot of screens. After lines like

disconnected inode 4083355215, moving to lost+found
disconnected dir inode 4083355216, moving to lost+found

or

Phase 7 - verify and correct link counts...
resetting inode 128 nlinks from 2 to 3
...
b767c6f0: Badness in key lookup (length)
bp=(bno 0, len 4096 bytes) key=(bno 0, len 512 bytes)
done

it has succeeded to the extent that it is now at least possible to
mount the volume. But there is only the lost+found folder in there,
which in turn contains a lot of folders and files named with numbers.
Looking deeper into these directories, all the names of further files
and directories are preserved. Phew! Only the fs structure at the
first level seems to be garbled.

Checking the size of the lost+found folder, (nearly) everything seems to
be there.
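A check like this can be scripted. The summarize helper below is hypothetical; it totals what actually landed under lost+found so the result can be compared against the size of the source disks:

```python
import os

def summarize(root):
    """Return (file_count, total_bytes) for everything under `root`,
    e.g. /mnt/backup/lost+found after an xfs_repair run."""
    files = total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                total += os.lstat(path).st_size   # lstat: don't follow symlinks
                files += 1
            except OSError:
                pass                              # unreadable orphan, skip it
    return files, total
```

If the totals roughly match the source disks, the data survived and only the top-level directory structure was lost.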

Now I am asking myself: if only one 256-bit block is garbled (e.g.
because of a terminated USB connection), all the directory and file
names at the first level get garbled? Wicked!



Cheers,
Kevin


* Re: Problem with XFS on USB 2TB HD
  2010-12-20  2:56   ` Kevin Richter
@ 2010-12-20  4:51     ` Dave Chinner
  2010-12-20  9:55       ` Emmanuel Florac
  0 siblings, 1 reply; 14+ messages in thread
From: Dave Chinner @ 2010-12-20  4:51 UTC (permalink / raw)
  To: Kevin Richter; +Cc: xfs

On Mon, Dec 20, 2010 at 03:56:05AM +0100, Kevin Richter wrote:
> Thanks a lot for your responses.
> 
> > I don't have a solution to your problem unfortunately.  Keep in mind you
> > posted this problem on a weekend, and one week before Christmas no less.
> >  Your timing is not optimal for receiving a prompt response.  It's
> > possible you may have to wait until Monday for help. :(
> 
> No problem at all. It can wait. My backup drive holds no really
> important stuff - just 100% backups from other disks.
> I'll just use this issue as an opportunity to try out the XFS repair
> functionality.
> 
> 
> >> Is it a problem with the interaction of LUKS, XFS and USB?
> >
> > You are encrypting the external drive? That would explain the
> > garbage then - a single bit error in a sector will render it
> > completely incorrect....
> 
> Yes, I do: aes-cbc-essiv:sha256.
> If the connection breaks while writing a block, only that block should
> be garbled - and that should be 256 bits in this case, shouldn't it?

<shrug>

Who knows how the encryption algorithm (sha256) is encrypting
blocks, or what internal block size it is using. All I know is that
it has to ensure that sectors (512 bytes) are written and read
atomically to/from the storage device.

Hence, at minimum, it will be encoding individual sectors (512
bytes), so any single bit error in the sector will likely result in
512 bytes of garbage.
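The propagation question can be illustrated with a toy model. The cipher below is NOT AES and NOT dm-crypt - it is a throwaway Feistel construction standing in for a 16-byte block cipher - but CBC chaining behaves the same way: one flipped ciphertext bit garbles the entire 16-byte block containing it plus a single bit of the next block, while a torn or lost sector write leaves the whole 512-byte sector undecipherable:

```python
import hashlib, hmac

BS = 16                                  # toy cipher block size (like AES)

def _f(key, rnd, half):                  # round function: keyed hash, 8 bytes
    return hmac.new(key + bytes([rnd]), half, hashlib.sha256).digest()[:8]

def _xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def enc_block(key, b):                   # 4-round Feistel network
    l, r = b[:8], b[8:]
    for i in range(4):
        l, r = r, _xor(l, _f(key, i, r))
    return l + r

def dec_block(key, b):                   # inverse rounds in reverse order
    l, r = b[:8], b[8:]
    for i in reversed(range(4)):
        l, r = _xor(r, _f(key, i, l)), l
    return l + r

def cbc_encrypt(key, iv, data):
    out, prev = b"", iv
    for i in range(0, len(data), BS):
        prev = enc_block(key, _xor(data[i:i + BS], prev))
        out += prev
    return out

def cbc_decrypt(key, iv, data):
    out, prev = b"", iv
    for i in range(0, len(data), BS):
        c = data[i:i + BS]
        out += _xor(dec_block(key, c), prev)
        prev = c
    return out

key, iv = b"k" * 16, b"\x00" * 16
sector = bytes(range(256)) * 2           # one 512-byte sector of plaintext
ct = bytearray(cbc_encrypt(key, iv, sector))
ct[100] ^= 0x01                          # one flipped bit "on the wire"
bad = cbc_decrypt(key, iv, bytes(ct))
damaged = [i for i in range(512) if bad[i] != sector[i]]
# Cipher block 6 (bytes 96-111) decrypts to garbage; block 7 gets exactly
# one flipped bit (byte 116). Everything else survives.
```

So the "only this block?" intuition is close for a clean single-bit error, but a reset mid-write typically tears or drops whole sectors of ciphertext, and each affected 512-byte sector then decrypts to noise - the case described above.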

> it has succeeded to the extent that it is now at least possible to
> mount the volume. But there is only the lost+found folder in there,
> which in turn contains a lot of folders and files named with numbers.
> Looking deeper into these directories, all the names of further files
> and directories are preserved. Phew! Only the fs structure at the
> first level seems to be garbled.
> 
> Checking the size of the lost+found folder, (nearly) everything seems to
> be there.
> 
> Now I am asking myself: if only one 256-bit block is garbled (e.g.
> because of a terminated USB connection), all the directory and file
> names at the first level get garbled? Wicked!

That's because you had the misfortune of garbling the root directory
inode along with the superblock. That's a very specific corruption;
if the corruption had occurred a few blocks away, in a data extent,
you wouldn't even know about it until you restored from backup and
realised the file contents in the backup were corrupted. Indeed, you
should consider that entire backup corrupted and redo it from
scratch.

FWIW, encryption makes any sort of corruption below the encrypted
layer much, much worse as it turns things like single bit media
errors into undecipherable, unrecoverable blocks of noise. No
filesystem can recover from such corruption of an encrypted device.
Hence if you had the same problem on ext3, ext4, JFS, etc., you would
end up with the same mess (or worse).

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: Problem with XFS on USB 2TB HD
  2010-12-18 11:26 Problem with XFS on USB 2TB HD Kevin Richter
  2010-12-19  2:04 ` Kevin Richter
  2010-12-20  0:10 ` Dave Chinner
@ 2010-12-20  8:59 ` Michael Monnerie
  2 siblings, 0 replies; 14+ messages in thread
From: Michael Monnerie @ 2010-12-20  8:59 UTC (permalink / raw)
  To: xfs, xfs


On Samstag, 18. Dezember 2010 Kevin Richter wrote:
> I have 2.9.8 (http://packages.debian.org/lenny/xfsprogs)

Current is 3.1.x, which has a *lot* of enhancements, especially for speed
and memory consumption, and also fixes some additional errors which the
older versions couldn't repair.

You've fixed your problem now, but it's really worth getting xfs_repair
updated.

-- 
with kind regards,
Michael Monnerie, Ing. BSc

it-management Internet Services: Protéger
http://proteger.at [gesprochen: Prot-e-schee]
Tel: +43 660 / 415 6531

// ****** Radio interview on the subject of spam ******
// http://www.it-podcast.at/archiv.html#podcast-100716
// 
// House for sale: http://zmi.at/langegg/


* Re: Problem with XFS on USB 2TB HD
  2010-12-20  4:51     ` Dave Chinner
@ 2010-12-20  9:55       ` Emmanuel Florac
  2011-01-12 23:37         ` Kevin Richter
  0 siblings, 1 reply; 14+ messages in thread
From: Emmanuel Florac @ 2010-12-20  9:55 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs, Kevin Richter

On Mon, 20 Dec 2010 15:51:26 +1100, you wrote:
> <shrug>
> 
> Who knows how the encryption algorithm (sha256) is encrypting
> blocks.

According to Wikipedia, the block size is 512 bits.

> FWIW, encryption makes any sort of corruption below the encrypted
> layer much, much worse as it turns things like single bit media
> errors into undecipherable, unrecoverable blocks of noise. No
> filesystem can recover from such corruption of an encrypted device.
> Hence if you had the same problem on ext3, ext4, JFS, etc you will
> end up with the same mess (or worse).

That's why I have an unencrypted local backup and an encrypted remote
one :)

-- 
------------------------------------------------------------------------
Emmanuel Florac     |   Direction technique
                    |   Intellique
                    |	<eflorac@intellique.com>
                    |   +33 1 78 94 84 02
------------------------------------------------------------------------

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Problem with XFS on USB 2TB HD
  2010-12-20  9:55       ` Emmanuel Florac
@ 2011-01-12 23:37         ` Kevin Richter
  2011-01-13 13:15           ` Extreme fragmentation when backing up via NFS Phil Karn
  2011-01-13 21:52           ` Problem with XFS on USB 2TB HD Geoffrey Wehrman
  0 siblings, 2 replies; 14+ messages in thread
From: Kevin Richter @ 2011-01-12 23:37 UTC (permalink / raw)
  Cc: xfs

> That's because you had the misfortune of garbling the root directory
> inode along with the superblock. That's a very specific corruption,
> but if the corruption occurred a few blocks away in a data extent
> you wouldn't even know about it until you restore from backup and
> realised the file content in the backup is corrupted. Indeed - you
> should consider that entire backup as corrupted and redo it from
> scratch.

I am wondering if there is a simple way to back up and restore the inode
table (the "inode <-> filename" relation).

With "ls -aliR" I get a listing which I am now saving every few days.
The "-i" option prints the inode number, so that I can reconstruct the
filename from the inode if this garbling error occurs a second time.

The reconstruction would probably be a simple "grep | cut" affair.

Is there perhaps a ready-made script that does exactly this? Or any
other ideas?
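
For what it's worth, the lookup itself could look roughly like this (the
listing and inode number below are made up for illustration; note that
"ls -aliR" groups entries under directory headers, which this naive
version ignores, so it only recovers the basename, not the full path):

```shell
# Fake "ls -aliR" snapshot standing in for the real saved listing.
cat > /tmp/listing.txt <<'EOF'
/mnt/backup:
total 8
131 drwxr-xr-x 2 root root 4096 Jan  1 00:00 .
128 drwxr-xr-x 3 root root 4096 Jan  1 00:00 ..
142 -rw-r--r-- 1 root root    0 Jan  1 00:00 notes.txt
EOF

# Field 1 is the inode number; the file name starts at field 10 and may
# contain spaces, so the remaining fields are re-joined rather than
# printing $10 alone.
lookup_inode() {
    awk -v ino="$1" '$1 == ino {
        for (i = 10; i <= NF; i++) printf "%s%s", $i, (i < NF ? " " : "\n")
    }' /tmp/listing.txt
}

lookup_inode 142    # prints: notes.txt
```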


Bye,
Kevin

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Extreme fragmentation when backing up via NFS
  2011-01-12 23:37         ` Kevin Richter
@ 2011-01-13 13:15           ` Phil Karn
  2011-01-14  4:51             ` Dave Chinner
  2011-01-13 21:52           ` Problem with XFS on USB 2TB HD Geoffrey Wehrman
  1 sibling, 1 reply; 14+ messages in thread
From: Phil Karn @ 2011-01-13 13:15 UTC (permalink / raw)
  To: xfs

I have been backing up my main Linux server onto a secondary machine via
NFS. I use xfsdump like this:

xfsdump -l 9 -f /machine/backups/fs.9.xfsdump /

Over on the server machine, xfs_bmap shows an *extreme* amount of
fragmentation in the backup file. 20,000+ extents are not uncommon, with
many extents consisting of a single allocation block (8x 512B sectors or
4KB).

I do notice while the backup file is being written that holes often
appear in the extent map towards the end of the file. I theorize that
somehow the individual writes are going to the file system out of order,
and this causes both the temporary holes and the extreme fragmentation.

I'm able to work around the fragmentation manually by looking at the
estimate from xfsdump of the size of the backup and then using the
fallocate command locally on the file server to allocate more than that
amount of space to the backup file. When the backup is done, I look at
xfsdump's report of the actual size of the backup file and use the
truncate command locally on the server to trim off the excess.
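
In script form, that workaround looks something like the sketch below
(the paths and sizes are placeholders, and the xfsdump invocation itself
is commented out so only the preallocate-then-trim dance is shown):

```shell
DUMPFILE=$(mktemp /tmp/fs.9.xfsdump.XXXXXX)

# 1. Preallocate more space than xfsdump's printed estimate (1 MiB
#    stands in for "estimate plus safety margin" here).
fallocate -l $((1024 * 1024)) "$DUMPFILE"

# 2. Run the dump into the preallocated file.
#    xfsdump -l 9 -f "$DUMPFILE" /

# 3. Trim back to the actual size xfsdump reports when it finishes
#    (64 KiB stands in for that size here).
truncate -s $((64 * 1024)) "$DUMPFILE"

stat -c '%s' "$DUMPFILE"    # prints the trimmed size: 65536
```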

Is fragmentation on XFS via NFS a known problem?

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Problem with XFS on USB 2TB HD
  2011-01-12 23:37         ` Kevin Richter
  2011-01-13 13:15           ` Extreme fragmentation when backing up via NFS Phil Karn
@ 2011-01-13 21:52           ` Geoffrey Wehrman
  1 sibling, 0 replies; 14+ messages in thread
From: Geoffrey Wehrman @ 2011-01-13 21:52 UTC (permalink / raw)
  To: Kevin Richter; +Cc: xfs

On Thu, Jan 13, 2011 at 12:37:28AM +0100, Kevin Richter wrote:
| > That's because you had the misfortune of garbling the root directory
| > inode along with the superblock. That's a very specific corruption,
| > but if the corruption occurred a few blocks away in a data extent
| > you wouldn't even know about it until you restore from backup and
| > realised the file content in the backup is corrupted. Indeed - you
| > should consider that entire backup as corrupted and redo it from
| > scratch.
| 
| I am wondering if there is a simple solution to backup/restore the inode
| table (the relation "inode <-> filename").
| 
| With "ls -aliR" I get a list which I am now saving every few days.
| The parameter "-i" displays the inode, that I can reconstruct the
| filename from the inode, if this garbling error occurs a second time.
| 
| The reconstruction process probably would be a simple "grep | cut" thing.
| 
| Is there perhaps a finished script doing exactly this? Or any other ideas?

xfs_ncheck(8) will generate the pathname/inode mapping, but does not
provide a restore service.


-- 
Geoffrey Wehrman

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Extreme fragmentation when backing up via NFS
  2011-01-13 13:15           ` Extreme fragmentation when backing up via NFS Phil Karn
@ 2011-01-14  4:51             ` Dave Chinner
  0 siblings, 0 replies; 14+ messages in thread
From: Dave Chinner @ 2011-01-14  4:51 UTC (permalink / raw)
  To: karn; +Cc: xfs

On Thu, Jan 13, 2011 at 05:15:52AM -0800, Phil Karn wrote:
> I have been backing up my main Linux server onto a secondary machine via
> NFS. I use xfsdump like this:
> 
> xfsdump -l 9 -f /machine/backups/fs.9.xfsdump /
> 
> Over on the server machine, xfs_bmap shows an *extreme* amount of
> fragmentation in the backup file. 20,000+ extents are not uncommon, with
> many extents consisting of a single allocation block (8x 512B sectors or
> 4KB).
> 
> I do notice while the backup file is being written that holes often
> appear in the extent map towards the end of the file. I theorize that
> somehow the individual writes are going to the file system out of order,
> and this causes both the temporary holes and the extreme fragmentation.
> 
> I'm able to work around the fragmentation manually by looking at the
> estimate from xfsdump of the size of the backup and then using the
> fallocate command locally on the file server to allocate more than that
> amount of space to the backup file. When the backup is done, I look at
> xfsdump's report of the actual size of the backup file and use the
> truncate command locally on the server to trim off the excess.
> 
> Is fragmentation on XFS via NFS a known problem?

Yes, and it's caused by the way the NFS server uses the VFS. These
commits that have just hit mainline in the 2.6.38-rc1 merge window:

6e85756 xfs: don't truncate prealloc from frequently accessed inodes
055388a xfs: dynamic speculative EOF preallocation

Should mostly fix the problem. It would be good to know if they
really do fix your problem or not, because you are suffering from
exactly the problem they are supposed to fix. I've copied the
commit messages below so I don't have to spend time explaining the
problem or the fix. :)

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

commit 6e857567dbbfe14dd6cc3f7414671b047b1ff5c7
Author: Dave Chinner <dchinner@redhat.com>
Date:   Thu Dec 23 12:02:31 2010 +1100

    xfs: don't truncate prealloc from frequently accessed inodes
    
    A long standing problem for streaming writes through the NFS server
    has been that the NFS server opens and closes file descriptors on an
    inode for every write. The result of this behaviour is that the
    ->release() function is called on every close and that results in
    XFS truncating speculative preallocation beyond the EOF.  This has
    an adverse effect on file layout when multiple files are being
    written at the same time - they interleave their extents and can
    result in severe fragmentation.
    
    To avoid this problem, keep track of ->release calls made on a dirty
    inode. For most cases, an inode is only going to be opened once for
    writing and then closed again during its lifetime in cache. Hence
    if there are multiple ->release calls when the inode is dirty, there
    is a good chance that the inode is being accessed by the NFS server.
    Hence set a flag the first time ->release is called while there are
    delalloc blocks still outstanding on the inode.

    If this flag is set when ->release is next called, then do not
    truncate away the speculative preallocation - leave it there so that
    subsequent writes do not need to reallocate the delalloc space. This
    will prevent interleaving of extents of different inodes written
    concurrently to the same AG.
    
    If we get this wrong, it is not a big deal as we truncate
    speculative allocation beyond EOF anyway in xfs_inactive() when the
    inode is thrown out of the cache.
    
    Signed-off-by: Dave Chinner <dchinner@redhat.com>
    Reviewed-by: Christoph Hellwig <hch@lst.de>


commit 055388a3188f56676c21e92962fc366ac8b5cb72
Author: Dave Chinner <dchinner@redhat.com>
Date:   Tue Jan 4 11:35:03 2011 +1100

    xfs: dynamic speculative EOF preallocation
    
    Currently the size of the speculative preallocation during delayed
    allocation is fixed by either the allocsize mount option or a
    default size. We are seeing a lot of cases where we need to
    recommend using the allocsize mount option to prevent fragmentation
    when buffered writes land in the same AG.
    
    Rather than using a fixed preallocation size by default (up to 64k),
    make it dynamic by basing it on the current inode size. That way the
    EOF preallocation will increase as the file size increases.  Hence
    for streaming writes we are much more likely to get large
    preallocations exactly when we need them to reduce fragmentation.
    
    For default settings, the size of the initial extents is determined
    by the number of parallel writers and the amount of memory in the
    machine. For 4GB RAM and 4 concurrent 32GB file writes:
    
    EXT: FILE-OFFSET           BLOCK-RANGE          AG AG-OFFSET                 TOTAL
       0: [0..1048575]:         1048672..2097247      0 (1048672..2097247)      1048576
       1: [1048576..2097151]:   5242976..6291551      0 (5242976..6291551)      1048576
       2: [2097152..4194303]:   12583008..14680159    0 (12583008..14680159)    2097152
       3: [4194304..8388607]:   25165920..29360223    0 (25165920..29360223)    4194304
       4: [8388608..16777215]:  58720352..67108959    0 (58720352..67108959)    8388608
       5: [16777216..33554423]: 117440584..134217791  0 (117440584..134217791) 16777208
       6: [33554424..50331511]: 184549056..201326143  0 (184549056..201326143) 16777088
       7: [50331512..67108599]: 251657408..268434495  0 (251657408..268434495) 16777088
    
    and for 16 concurrent 16GB file writes:
    
     EXT: FILE-OFFSET           BLOCK-RANGE          AG AG-OFFSET                 TOTAL
       0: [0..262143]:          2490472..2752615      0 (2490472..2752615)       262144
       1: [262144..524287]:     6291560..6553703      0 (6291560..6553703)       262144
       2: [524288..1048575]:    13631592..14155879    0 (13631592..14155879)     524288
       3: [1048576..2097151]:   30408808..31457383    0 (30408808..31457383)    1048576
       4: [2097152..4194303]:   52428904..54526055    0 (52428904..54526055)    2097152
       5: [4194304..8388607]:   104857704..109052007  0 (104857704..109052007)  4194304
       6: [8388608..16777215]:  209715304..218103911  0 (209715304..218103911)  8388608
       7: [16777216..33554423]: 452984848..469762055  0 (452984848..469762055) 16777208
    
    Because it is hard to take back speculative preallocation, cases
    where there are large slow growing log files on a nearly full
    filesystem may cause premature ENOSPC. Hence as the filesystem nears
    full, the maximum dynamic prealloc size is reduced according to this
    table (based on 4k block size):
    
    freespace       max prealloc size
      >5%             full extent (8GB)
      4-5%             2GB (8GB >> 2)
      3-4%             1GB (8GB >> 3)
      2-3%           512MB (8GB >> 4)
      1-2%           256MB (8GB >> 5)
      <1%            128MB (8GB >> 6)
    
    This should reduce the amount of space held in speculative
    preallocation for such cases.
    
    The allocsize mount option turns off the dynamic behaviour and fixes
    the prealloc size to whatever the mount option specifies. i.e. the
    behaviour is unchanged.
    
    Signed-off-by: Dave Chinner <dchinner@redhat.com>
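
The free-space back-off table in the commit message above can be
expressed as a small function. This is a sketch of the table's intent,
not the kernel code, and the handling of exact boundary percentages is
my guess:

```shell
# Max speculative preallocation in MB for a given free-space percentage,
# following the table above: an 8GB full extent, shifted right as free
# space shrinks (note the table skips a >>1 step).
prealloc_mb() {
    pct=$1
    full=$((8 * 1024))               # 8GB expressed in MB
    if   [ "$pct" -gt 5 ]; then s=0  # >5%:  full extent
    elif [ "$pct" -ge 4 ]; then s=2  # 4-5%: 2GB
    elif [ "$pct" -ge 3 ]; then s=3  # 3-4%: 1GB
    elif [ "$pct" -ge 2 ]; then s=4  # 2-3%: 512MB
    elif [ "$pct" -ge 1 ]; then s=5  # 1-2%: 256MB
    else                        s=6  # <1%:  128MB
    fi
    echo $(( full >> s ))
}

prealloc_mb 50   # prints 8192 (8GB: plenty of free space)
prealloc_mb 4    # prints 2048 (2GB)
prealloc_mb 0    # prints 128  (128MB: nearly full)
```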

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2011-01-14  4:49 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-12-18 11:26 Problem with XFS on USB 2TB HD Kevin Richter
2010-12-19  2:04 ` Kevin Richter
2010-12-19  2:37   ` Stan Hoeppner
2010-12-19 14:57   ` Emmanuel Florac
2010-12-19 17:30     ` Eric Sandeen
2010-12-20  0:10 ` Dave Chinner
2010-12-20  2:56   ` Kevin Richter
2010-12-20  4:51     ` Dave Chinner
2010-12-20  9:55       ` Emmanuel Florac
2011-01-12 23:37         ` Kevin Richter
2011-01-13 13:15           ` Extreme fragmentation when backing up via NFS Phil Karn
2011-01-14  4:51             ` Dave Chinner
2011-01-13 21:52           ` Problem with XFS on USB 2TB HD Geoffrey Wehrman
2010-12-20  8:59 ` Michael Monnerie

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox