* XFS on RBD crash
@ 2017-12-09 21:01 Alex Gorbachev
2017-12-09 21:04 ` Alex Gorbachev
2017-12-11 21:06 ` Dave Chinner
0 siblings, 2 replies; 3+ messages in thread
From: Alex Gorbachev @ 2017-12-09 21:01 UTC (permalink / raw)
To: linux-xfs
I have experienced a crash today (in a sense of filesystem going
offline) of a 25TB XFS filesystem. Tried searching the list and
google, and not much specific info I can use, so very much appreciate
any insight:
System: Ubuntu 16.04, kernel 4.10.17-041017-generic
Mount info:
/dev/rbd0 on /srv/exports/sclun63 type xfs
(rw,relatime,attr2,inode64,logbsize=256k,sunit=8192,swidth=8192,noquota)
Failure trace:
Dec 9 14:05:45 roc01r-scd224 kernel: [3455811.769776] XFS (rbd1):
Internal error XFS_WANT_CORRUPTED_GOTO at line 3505 of file
/home/kernel/COD/linux/fs/xfs/libxfs/xfs_btree.c. Caller
xfs_free_ag_extent+0x352/0x740 [xfs]
Dec 9 14:05:45 roc01r-scd224 kernel: [3455811.769890] CPU: 20 PID:
724783 Comm: kworker/20:4 Tainted: G OE
4.10.17-041017-generic #201705201051
Dec 9 14:05:45 roc01r-scd224 kernel: [3455811.769963] Hardware name:
Dell Inc. Precision Rack 7910/01J90F, BIOS 1.1.4 11/04/2014
Dec 9 14:05:45 roc01r-scd224 kernel: [3455811.770058] Workqueue:
xfs-conv/rbd1 xfs_end_io [xfs]
Dec 9 14:05:45 roc01r-scd224 kernel: [3455811.770098] Call Trace:
Dec 9 14:05:45 roc01r-scd224 kernel: [3455811.770136] dump_stack+0x63/0x81
Dec 9 14:05:45 roc01r-scd224 kernel: [3455811.770198]
xfs_error_report+0x3c/0x40 [xfs]
Dec 9 14:05:45 roc01r-scd224 kernel: [3455811.770257] ?
xfs_free_ag_extent+0x352/0x740 [xfs]
Dec 9 14:05:45 roc01r-scd224 kernel: [3455811.770317]
xfs_btree_insert+0x16d/0x1c0 [xfs]
Dec 9 14:05:45 roc01r-scd224 kernel: [3455811.770375]
xfs_free_ag_extent+0x352/0x740 [xfs]
Dec 9 14:05:45 roc01r-scd224 kernel: [3455811.770433]
xfs_free_extent+0xa8/0x140 [xfs]
Dec 9 14:05:45 roc01r-scd224 kernel: [3455811.770501]
xfs_trans_free_extent+0x43/0x110 [xfs]
Dec 9 14:05:45 roc01r-scd224 kernel: [3455811.770568]
xfs_extent_free_finish_item+0x26/0x40 [xfs]
Dec 9 14:05:45 roc01r-scd224 kernel: [3455811.770630]
xfs_defer_finish+0x151/0x430 [xfs]
Dec 9 14:05:45 roc01r-scd224 kernel: [3455811.770694]
xfs_iomap_write_unwritten+0xd3/0x2e0 [xfs]
Dec 9 14:05:45 roc01r-scd224 kernel: [3455811.770737] ?
propagate_entity_cfs_rq.isra.66+0x271/0xa30
Dec 9 14:05:45 roc01r-scd224 kernel: [3455811.770802]
xfs_end_io+0xa0/0xb0 [xfs]
Dec 9 14:05:45 roc01r-scd224 kernel: [3455811.770840]
process_one_work+0x1fc/0x4b0
Dec 9 14:05:45 roc01r-scd224 kernel: [3455811.770877] worker_thread+0x4b/0x500
Dec 9 14:05:45 roc01r-scd224 kernel: [3455811.770914] kthread+0x109/0x140
Dec 9 14:05:45 roc01r-scd224 kernel: [3455811.770949] ?
process_one_work+0x4b0/0x4b0
Dec 9 14:05:45 roc01r-scd224 kernel: [3455811.770987] ?
kthread_create_on_node+0x60/0x60
Dec 9 14:05:45 roc01r-scd224 kernel: [3455811.771028] ret_from_fork+0x2c/0x40
Dec 9 14:05:45 roc01r-scd224 kernel: [3455811.771093] XFS (rbd1):
xfs_do_force_shutdown(0x8) called from line 236 of file
/home/kernel/COD/linux/fs/xfs/libxfs/xfs_defer.c. Return address =
0xffffffffc0f0d366
Dec 9 14:05:45 roc01r-scd224 kernel: [3455811.772151] XFS (rbd1):
Corruption of in-memory data detected. Shutting down filesystem
Dec 9 14:05:45 roc01r-scd224 kernel: [3455811.772219] XFS (rbd1):
Please umount the filesystem and rectify the problem(s)
Dec 9 14:05:45 roc01r-scd224 kernel: [3455811.772292] Buffer I/O
error on dev rbd1, logical block 5078005778, lost async page write
Dec 9 14:05:45 roc01r-scd224 kernel: [3455811.772362] Buffer I/O
error on dev rbd1, logical block 5078005779, lost async page write
Dec 9 14:05:45 roc01r-scd224 kernel: [3455811.777755] Buffer I/O
error on dev rbd1, logical block 5078005780, lost async page write
xfs_repair (had to do -L):
root@roc01r-scd224:~# xfs_repair -L /dev/rbd0
Phase 1 - find and verify superblock...
- reporting progress in intervals of 15 minutes
Phase 2 - using internal log
- zero log...
ALERT: The filesystem has valuable metadata changes in a log which is being
destroyed because the -L option was used.
- scan filesystem freespace and inode maps...
freeblk count 3 != flcount 4 in ag 47
freeblk count 3 != flcount 4 in ag 45
freeblk count 3 != flcount 4 in ag 46
freeblk count 3 != flcount 4 in ag 53
freeblk count 3 != flcount 4 in ag 49
freeblk count 3 != flcount 4 in ag 52
freeblk count 3 != flcount 4 in ag 56
freeblk count 3 != flcount 4 in ag 51
freeblk count 3 != flcount 4 in ag 48
freeblk count 3 != flcount 4 in ag 57
freeblk count 3 != flcount 4 in ag 54
freeblk count 3 != flcount 4 in ag 50
freeblk count 3 != flcount 4 in ag 58
freeblk count 3 != flcount 4 in ag 55
freeblk count 3 != flcount 4 in ag 59
freeblk count 3 != flcount 4 in ag 61
freeblk count 3 != flcount 4 in ag 62
freeblk count 3 != flcount 4 in ag 63
freeblk count 3 != flcount 4 in ag 60
freeblk count 3 != flcount 4 in ag 70
freeblk count 3 != flcount 4 in ag 65
freeblk count 3 != flcount 4 in ag 66
freeblk count 3 != flcount 4 in ag 71
freeblk count 3 != flcount 4 in ag 68
freeblk count 3 != flcount 4 in ag 69
freeblk count 3 != flcount 4 in ag 64
freeblk count 3 != flcount 4 in ag 73
freeblk count 3 != flcount 4 in ag 67
freeblk count 3 != flcount 4 in ag 78
freeblk count 3 != flcount 4 in ag 76
freeblk count 3 != flcount 4 in ag 72
freeblk count 3 != flcount 4 in ag 80
freeblk count 3 != flcount 4 in ag 83
freeblk count 3 != flcount 4 in ag 84
freeblk count 3 != flcount 4 in ag 75
freeblk count 3 != flcount 4 in ag 74
freeblk count 3 != flcount 4 in ag 85
freeblk count 3 != flcount 4 in ag 79
freeblk count 3 != flcount 4 in ag 81
freeblk count 3 != flcount 4 in ag 77
freeblk count 3 != flcount 4 in ag 82
freeblk count 3 != flcount 4 in ag 88
freeblk count 3 != flcount 4 in ag 91
freeblk count 3 != flcount 4 in ag 86
freeblk count 3 != flcount 4 in ag 87
freeblk count 3 != flcount 4 in ag 93
freeblk count 3 != flcount 4 in ag 92
freeblk count 3 != flcount 4 in ag 90
freeblk count 3 != flcount 4 in ag 94
freeblk count 3 != flcount 4 in ag 89
freeblk count 3 != flcount 4 in ag 97
freeblk count 3 != flcount 4 in ag 95
freeblk count 3 != flcount 4 in ag 96
freeblk count 3 != flcount 4 in ag 98
freeblk count 3 != flcount 4 in ag 100
freeblk count 3 != flcount 4 in ag 99
freeblk count 3 != flcount 4 in ag 103
freeblk count 3 != flcount 4 in ag 101
freeblk count 3 != flcount 4 in ag 104
freeblk count 3 != flcount 4 in ag 102
freeblk count 3 != flcount 4 in ag 105
freeblk count 3 != flcount 4 in ag 110
freeblk count 3 != flcount 4 in ag 106
freeblk count 3 != flcount 4 in ag 113
freeblk count 3 != flcount 4 in ag 115
freeblk count 3 != flcount 4 in ag 111
freeblk count 3 != flcount 4 in ag 109
freeblk count 3 != flcount 4 in ag 107
freeblk count 3 != flcount 4 in ag 116
freeblk count 3 != flcount 4 in ag 114
freeblk count 3 != flcount 4 in ag 108
freeblk count 3 != flcount 4 in ag 117
freeblk count 3 != flcount 4 in ag 124
freeblk count 3 != flcount 4 in ag 119
freeblk count 3 != flcount 4 in ag 118
freeblk count 3 != flcount 4 in ag 120
freeblk count 3 != flcount 4 in ag 121
freeblk count 3 != flcount 4 in ag 122
freeblk count 3 != flcount 4 in ag 126
freeblk count 3 != flcount 4 in ag 125
freeblk count 3 != flcount 4 in ag 127
freeblk count 3 != flcount 4 in ag 131
freeblk count 3 != flcount 4 in ag 123
freeblk count 3 != flcount 4 in ag 130
freeblk count 3 != flcount 4 in ag 133
freeblk count 3 != flcount 4 in ag 128
freeblk count 3 != flcount 4 in ag 137
freeblk count 3 != flcount 4 in ag 136
freeblk count 3 != flcount 4 in ag 138
freeblk count 3 != flcount 4 in ag 135
freeblk count 3 != flcount 4 in ag 132
freeblk count 3 != flcount 4 in ag 139
freeblk count 3 != flcount 4 in ag 134
freeblk count 3 != flcount 4 in ag 129
freeblk count 3 != flcount 4 in ag 145
freeblk count 3 != flcount 4 in ag 147
freeblk count 3 != flcount 4 in ag 144
freeblk count 3 != flcount 4 in ag 140
freeblk count 3 != flcount 4 in ag 149
freeblk count 3 != flcount 4 in ag 143
freeblk count 3 != flcount 4 in ag 142
freeblk count 3 != flcount 4 in ag 141
freeblk count 3 != flcount 4 in ag 152
freeblk count 3 != flcount 4 in ag 148
freeblk count 3 != flcount 4 in ag 151
freeblk count 3 != flcount 4 in ag 146
freeblk count 3 != flcount 4 in ag 154
freeblk count 3 != flcount 4 in ag 150
freeblk count 3 != flcount 4 in ag 155
freeblk count 3 != flcount 4 in ag 157
freeblk count 3 != flcount 4 in ag 156
freeblk count 3 != flcount 4 in ag 160
freeblk count 3 != flcount 4 in ag 158
freeblk count 3 != flcount 4 in ag 153
freeblk count 3 != flcount 4 in ag 159
sb_ifree 667, counted 615
sb_fdblocks 3930698117, counted 1111093367
- 15:06:10: scanning filesystem freespace - 193 of 193
allocation groups done
- found root inode chunk
Phase 3 - for each AG...
- scan and clear agi unlinked lists...
- 15:06:10: scanning agi unlinked lists - 193 of 193
allocation groups done
- process known inodes and perform inode discovery...
- agno = 0
- agno = 30
- agno = 60
- agno = 15
- agno = 75
- agno = 45
- agno = 150
- agno = 31
- agno = 105
- agno = 61
- agno = 46
- agno = 135
- agno = 76
- agno = 120
- agno = 151
- agno = 90
- agno = 180
- agno = 165
- agno = 91
- agno = 32
- agno = 62
- agno = 106
- agno = 121
- agno = 166
- agno = 136
- agno = 47
- agno = 92
- agno = 16
- agno = 1
- agno = 181
- agno = 33
- agno = 167
- agno = 93
- agno = 122
- agno = 63
- agno = 152
- agno = 77
- agno = 2
- agno = 168
- agno = 48
- agno = 34
- agno = 182
- agno = 123
- agno = 107
- agno = 78
- agno = 153
- agno = 17
- agno = 3
- agno = 137
- agno = 35
- agno = 64
- agno = 79
- agno = 169
- agno = 138
- agno = 183
- agno = 108
- agno = 94
- agno = 65
- agno = 124
- agno = 36
- agno = 154
- agno = 80
- agno = 49
- agno = 184
- agno = 109
- agno = 37
- agno = 95
- agno = 125
- agno = 170
- agno = 50
- agno = 66
- agno = 81
- agno = 110
- agno = 139
- agno = 155
- agno = 185
- agno = 96
- agno = 126
- agno = 38
- agno = 171
- agno = 127
- agno = 140
- agno = 97
- agno = 51
- agno = 128
- agno = 67
- agno = 111
- agno = 82
- agno = 156
- agno = 186
- agno = 141
- agno = 172
- agno = 98
- agno = 129
- agno = 52
- agno = 130
- agno = 39
- agno = 142
- agno = 173
- agno = 53
- agno = 99
- agno = 157
- agno = 68
- agno = 112
- agno = 40
- agno = 83
- agno = 131
- agno = 54
- agno = 187
- agno = 113
- agno = 69
- agno = 143
- agno = 174
- agno = 84
- agno = 158
- agno = 41
- agno = 100
- agno = 159
- agno = 175
- agno = 144
- agno = 188
- agno = 42
- agno = 85
- agno = 70
- agno = 132
- agno = 101
- agno = 114
- agno = 55
- agno = 176
- agno = 160
- agno = 145
- agno = 86
- agno = 71
- agno = 189
- agno = 102
- agno = 43
- agno = 115
- agno = 161
- agno = 177
- agno = 56
- agno = 133
- agno = 72
- agno = 103
- agno = 44
- agno = 87
- agno = 190
- agno = 134
- agno = 73
- agno = 162
- agno = 116
- agno = 57
- agno = 146
- agno = 74
- agno = 191
- agno = 104
- agno = 117
- agno = 147
- agno = 163
- agno = 88
- agno = 178
- agno = 58
- agno = 192
- agno = 118
- agno = 148
- agno = 89
- agno = 164
- agno = 59
- agno = 179
- agno = 119
- agno = 149
- agno = 4
- agno = 5
- agno = 18
- agno = 6
- agno = 7
- agno = 8
- agno = 9
- agno = 10
- agno = 11
- agno = 12
- agno = 13
- agno = 14
- agno = 19
- agno = 20
- agno = 21
- agno = 22
- agno = 23
- agno = 24
- agno = 25
- agno = 26
- agno = 27
- agno = 28
- agno = 29
- 15:10:19: process known inodes and inode discovery - 896 of
896 inodes done
- process newly discovered inodes...
- 15:10:19: process newly discovered inodes - 193 of 193
allocation groups done
Phase 4 - check for duplicate blocks...
- setting up duplicate extent list...
- 15:10:19: setting up duplicate extent list - 193 of 193
allocation groups done
- check for inodes claiming duplicate blocks...
- agno = 1
- agno = 0
- agno = 2
- agno = 3
- agno = 9
- agno = 12
- agno = 6
- agno = 8
- agno = 4
- agno = 10
- agno = 7
- agno = 13
- agno = 31
- agno = 29
- agno = 35
- agno = 38
- agno = 40
- agno = 42
- agno = 43
- agno = 45
- agno = 18
- agno = 19
- agno = 21
- agno = 22
- agno = 23
- agno = 24
- agno = 26
- agno = 51
- agno = 52
- agno = 25
- agno = 53
- agno = 54
- agno = 55
- agno = 32
- agno = 14
- agno = 33
- agno = 61
- agno = 62
- agno = 30
- agno = 5
- agno = 65
- agno = 66
- agno = 37
- agno = 16
- agno = 39
- agno = 68
- agno = 69
- agno = 70
- agno = 17
- agno = 44
- agno = 73
- agno = 74
- agno = 75
- agno = 76
- agno = 77
- agno = 78
- agno = 79
- agno = 50
- agno = 27
- agno = 81
- agno = 28
- agno = 11
- agno = 34
- agno = 85
- agno = 86
- agno = 59
- agno = 87
- agno = 60
- agno = 63
- agno = 90
- agno = 91
- agno = 64
- agno = 93
- agno = 94
- agno = 95
- agno = 41
- agno = 71
- agno = 72
- agno = 36
- agno = 20
- agno = 46
- agno = 47
- agno = 102
- agno = 48
- agno = 104
- agno = 105
- agno = 106
- agno = 108
- agno = 84
- agno = 110
- agno = 56
- agno = 112
- agno = 113
- agno = 115
- agno = 116
- agno = 117
- agno = 118
- agno = 96
- agno = 97
- agno = 98
- agno = 99
- agno = 100
- agno = 101
- agno = 103
- agno = 49
- agno = 126
- agno = 127
- agno = 107
- agno = 82
- agno = 83
- agno = 131
- agno = 109
- agno = 133
- agno = 111
- agno = 134
- agno = 57
- agno = 58
- agno = 89
- agno = 92
- agno = 67
- agno = 142
- agno = 143
- agno = 119
- agno = 120
- agno = 145
- agno = 146
- agno = 123
- agno = 148
- agno = 149
- agno = 150
- agno = 151
- agno = 152
- agno = 129
- agno = 154
- agno = 130
- agno = 157
- agno = 135
- agno = 136
- agno = 88
- agno = 114
- agno = 137
- agno = 162
- agno = 164
- agno = 139
- agno = 140
- agno = 166
- agno = 141
- agno = 168
- agno = 122
- agno = 171
- agno = 124
- agno = 173
- agno = 174
- agno = 128
- agno = 176
- agno = 177
- agno = 155
- agno = 156
- agno = 132
- agno = 180
- agno = 158
- agno = 159
- agno = 160
- agno = 161
- agno = 186
- agno = 138
- agno = 188
- agno = 189
- agno = 15
- agno = 144
- agno = 191
- agno = 192
- agno = 170
- agno = 147
- agno = 172
- agno = 125
- agno = 80
- agno = 175
- agno = 153
- agno = 178
- agno = 179
- agno = 181
- agno = 182
- agno = 183
- agno = 184
- agno = 185
- agno = 163
- agno = 187
- agno = 165
- agno = 167
- agno = 190
- agno = 169
- agno = 121
- 15:10:22: check for inodes claiming duplicate blocks - 896
of 896 inodes done
Phase 5 - rebuild AG headers and trees...
- 15:10:22: rebuild AG headers and trees - 193 of 193
allocation groups done
- reset superblock...
Phase 6 - check inode connectivity...
- resetting contents of realtime bitmap and summary inodes
- traversing filesystem ...
- traversal finished ...
- moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...
Maximum metadata LSN (792:1294080) is ahead of log (1:64).
Format log to cycle 795.
done
root@roc01r-scd224:~# mount -t xfs /dev/rbd0 /mnt
No other errors in logs, Ceph or hardware.
Thank you in advance!
--
Alex Gorbachev
Storcium
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: XFS on RBD crash
2017-12-09 21:01 XFS on RBD crash Alex Gorbachev
@ 2017-12-09 21:04 ` Alex Gorbachev
2017-12-11 21:06 ` Dave Chinner
1 sibling, 0 replies; 3+ messages in thread
From: Alex Gorbachev @ 2017-12-09 21:04 UTC (permalink / raw)
To: linux-xfs
On Sat, Dec 9, 2017 at 4:01 PM, Alex Gorbachev <ag@iss-integration.com> wrote:
> I have experienced a crash today (in a sense of filesystem going
> offline) of a 25TB XFS filesystem. Tried searching the list and
> google, and not much specific info I can use, so very much appreciate
> any insight:
>
> System: Ubuntu 16.04, kernel 4.10.17-041017-generic
>
> Mount info:
>
> /dev/rbd0 on /srv/exports/sclun63 type xfs
> (rw,relatime,attr2,inode64,logbsize=256k,sunit=8192,swidth=8192,noquota)
>
> Failure trace:
>
> Dec 9 14:05:45 roc01r-scd224 kernel: [3455811.769776] XFS (rbd1):
> Internal error XFS_WANT_CORRUPTED_GOTO at line 3505 of file
> /home/kernel/COD/linux/fs/xfs/libxfs/xfs_btree.c. Caller
> xfs_free_ag_extent+0x352/0x740 [xfs]
> Dec 9 14:05:45 roc01r-scd224 kernel: [3455811.769890] CPU: 20 PID:
> 724783 Comm: kworker/20:4 Tainted: G OE
> 4.10.17-041017-generic #201705201051
> Dec 9 14:05:45 roc01r-scd224 kernel: [3455811.769963] Hardware name:
> Dell Inc. Precision Rack 7910/01J90F, BIOS 1.1.4 11/04/2014
> Dec 9 14:05:45 roc01r-scd224 kernel: [3455811.770058] Workqueue:
> xfs-conv/rbd1 xfs_end_io [xfs]
> Dec 9 14:05:45 roc01r-scd224 kernel: [3455811.770098] Call Trace:
> Dec 9 14:05:45 roc01r-scd224 kernel: [3455811.770136] dump_stack+0x63/0x81
> Dec 9 14:05:45 roc01r-scd224 kernel: [3455811.770198]
> xfs_error_report+0x3c/0x40 [xfs]
> Dec 9 14:05:45 roc01r-scd224 kernel: [3455811.770257] ?
> xfs_free_ag_extent+0x352/0x740 [xfs]
> Dec 9 14:05:45 roc01r-scd224 kernel: [3455811.770317]
> xfs_btree_insert+0x16d/0x1c0 [xfs]
> Dec 9 14:05:45 roc01r-scd224 kernel: [3455811.770375]
> xfs_free_ag_extent+0x352/0x740 [xfs]
> Dec 9 14:05:45 roc01r-scd224 kernel: [3455811.770433]
> xfs_free_extent+0xa8/0x140 [xfs]
> Dec 9 14:05:45 roc01r-scd224 kernel: [3455811.770501]
> xfs_trans_free_extent+0x43/0x110 [xfs]
> Dec 9 14:05:45 roc01r-scd224 kernel: [3455811.770568]
> xfs_extent_free_finish_item+0x26/0x40 [xfs]
> Dec 9 14:05:45 roc01r-scd224 kernel: [3455811.770630]
> xfs_defer_finish+0x151/0x430 [xfs]
> Dec 9 14:05:45 roc01r-scd224 kernel: [3455811.770694]
> xfs_iomap_write_unwritten+0xd3/0x2e0 [xfs]
> Dec 9 14:05:45 roc01r-scd224 kernel: [3455811.770737] ?
> propagate_entity_cfs_rq.isra.66+0x271/0xa30
> Dec 9 14:05:45 roc01r-scd224 kernel: [3455811.770802]
> xfs_end_io+0xa0/0xb0 [xfs]
> Dec 9 14:05:45 roc01r-scd224 kernel: [3455811.770840]
> process_one_work+0x1fc/0x4b0
> Dec 9 14:05:45 roc01r-scd224 kernel: [3455811.770877] worker_thread+0x4b/0x500
> Dec 9 14:05:45 roc01r-scd224 kernel: [3455811.770914] kthread+0x109/0x140
> Dec 9 14:05:45 roc01r-scd224 kernel: [3455811.770949] ?
> process_one_work+0x4b0/0x4b0
> Dec 9 14:05:45 roc01r-scd224 kernel: [3455811.770987] ?
> kthread_create_on_node+0x60/0x60
> Dec 9 14:05:45 roc01r-scd224 kernel: [3455811.771028] ret_from_fork+0x2c/0x40
> Dec 9 14:05:45 roc01r-scd224 kernel: [3455811.771093] XFS (rbd1):
> xfs_do_force_shutdown(0x8) called from line 236 of file
> /home/kernel/COD/linux/fs/xfs/libxfs/xfs_defer.c. Return address =
> 0xffffffffc0f0d366
> Dec 9 14:05:45 roc01r-scd224 kernel: [3455811.772151] XFS (rbd1):
> Corruption of in-memory data detected. Shutting down filesystem
> Dec 9 14:05:45 roc01r-scd224 kernel: [3455811.772219] XFS (rbd1):
> Please umount the filesystem and rectify the problem(s)
> Dec 9 14:05:45 roc01r-scd224 kernel: [3455811.772292] Buffer I/O
> error on dev rbd1, logical block 5078005778, lost async page write
> Dec 9 14:05:45 roc01r-scd224 kernel: [3455811.772362] Buffer I/O
> error on dev rbd1, logical block 5078005779, lost async page write
> Dec 9 14:05:45 roc01r-scd224 kernel: [3455811.777755] Buffer I/O
> error on dev rbd1, logical block 5078005780, lost async page write
>
> xfs_repair (had to do -L):
>
> root@roc01r-scd224:~# xfs_repair -L /dev/rbd0
> Phase 1 - find and verify superblock...
> - reporting progress in intervals of 15 minutes
> Phase 2 - using internal log
> - zero log...
> ALERT: The filesystem has valuable metadata changes in a log which is being
> destroyed because the -L option was used.
> - scan filesystem freespace and inode maps...
> freeblk count 3 != flcount 4 in ag 47
> freeblk count 3 != flcount 4 in ag 45
> freeblk count 3 != flcount 4 in ag 46
> freeblk count 3 != flcount 4 in ag 53
> freeblk count 3 != flcount 4 in ag 49
> freeblk count 3 != flcount 4 in ag 52
> freeblk count 3 != flcount 4 in ag 56
> freeblk count 3 != flcount 4 in ag 51
> freeblk count 3 != flcount 4 in ag 48
> freeblk count 3 != flcount 4 in ag 57
> freeblk count 3 != flcount 4 in ag 54
> freeblk count 3 != flcount 4 in ag 50
> freeblk count 3 != flcount 4 in ag 58
> freeblk count 3 != flcount 4 in ag 55
> freeblk count 3 != flcount 4 in ag 59
> freeblk count 3 != flcount 4 in ag 61
> freeblk count 3 != flcount 4 in ag 62
> freeblk count 3 != flcount 4 in ag 63
> freeblk count 3 != flcount 4 in ag 60
> freeblk count 3 != flcount 4 in ag 70
> freeblk count 3 != flcount 4 in ag 65
> freeblk count 3 != flcount 4 in ag 66
> freeblk count 3 != flcount 4 in ag 71
> freeblk count 3 != flcount 4 in ag 68
> freeblk count 3 != flcount 4 in ag 69
> freeblk count 3 != flcount 4 in ag 64
> freeblk count 3 != flcount 4 in ag 73
> freeblk count 3 != flcount 4 in ag 67
> freeblk count 3 != flcount 4 in ag 78
> freeblk count 3 != flcount 4 in ag 76
> freeblk count 3 != flcount 4 in ag 72
> freeblk count 3 != flcount 4 in ag 80
> freeblk count 3 != flcount 4 in ag 83
> freeblk count 3 != flcount 4 in ag 84
> freeblk count 3 != flcount 4 in ag 75
> freeblk count 3 != flcount 4 in ag 74
> freeblk count 3 != flcount 4 in ag 85
> freeblk count 3 != flcount 4 in ag 79
> freeblk count 3 != flcount 4 in ag 81
> freeblk count 3 != flcount 4 in ag 77
> freeblk count 3 != flcount 4 in ag 82
> freeblk count 3 != flcount 4 in ag 88
> freeblk count 3 != flcount 4 in ag 91
> freeblk count 3 != flcount 4 in ag 86
> freeblk count 3 != flcount 4 in ag 87
> freeblk count 3 != flcount 4 in ag 93
> freeblk count 3 != flcount 4 in ag 92
> freeblk count 3 != flcount 4 in ag 90
> freeblk count 3 != flcount 4 in ag 94
> freeblk count 3 != flcount 4 in ag 89
> freeblk count 3 != flcount 4 in ag 97
> freeblk count 3 != flcount 4 in ag 95
> freeblk count 3 != flcount 4 in ag 96
> freeblk count 3 != flcount 4 in ag 98
> freeblk count 3 != flcount 4 in ag 100
> freeblk count 3 != flcount 4 in ag 99
> freeblk count 3 != flcount 4 in ag 103
> freeblk count 3 != flcount 4 in ag 101
> freeblk count 3 != flcount 4 in ag 104
> freeblk count 3 != flcount 4 in ag 102
> freeblk count 3 != flcount 4 in ag 105
> freeblk count 3 != flcount 4 in ag 110
> freeblk count 3 != flcount 4 in ag 106
> freeblk count 3 != flcount 4 in ag 113
> freeblk count 3 != flcount 4 in ag 115
> freeblk count 3 != flcount 4 in ag 111
> freeblk count 3 != flcount 4 in ag 109
> freeblk count 3 != flcount 4 in ag 107
> freeblk count 3 != flcount 4 in ag 116
> freeblk count 3 != flcount 4 in ag 114
> freeblk count 3 != flcount 4 in ag 108
> freeblk count 3 != flcount 4 in ag 117
> freeblk count 3 != flcount 4 in ag 124
> freeblk count 3 != flcount 4 in ag 119
> freeblk count 3 != flcount 4 in ag 118
> freeblk count 3 != flcount 4 in ag 120
> freeblk count 3 != flcount 4 in ag 121
> freeblk count 3 != flcount 4 in ag 122
> freeblk count 3 != flcount 4 in ag 126
> freeblk count 3 != flcount 4 in ag 125
> freeblk count 3 != flcount 4 in ag 127
> freeblk count 3 != flcount 4 in ag 131
> freeblk count 3 != flcount 4 in ag 123
> freeblk count 3 != flcount 4 in ag 130
> freeblk count 3 != flcount 4 in ag 133
> freeblk count 3 != flcount 4 in ag 128
> freeblk count 3 != flcount 4 in ag 137
> freeblk count 3 != flcount 4 in ag 136
> freeblk count 3 != flcount 4 in ag 138
> freeblk count 3 != flcount 4 in ag 135
> freeblk count 3 != flcount 4 in ag 132
> freeblk count 3 != flcount 4 in ag 139
> freeblk count 3 != flcount 4 in ag 134
> freeblk count 3 != flcount 4 in ag 129
> freeblk count 3 != flcount 4 in ag 145
> freeblk count 3 != flcount 4 in ag 147
> freeblk count 3 != flcount 4 in ag 144
> freeblk count 3 != flcount 4 in ag 140
> freeblk count 3 != flcount 4 in ag 149
> freeblk count 3 != flcount 4 in ag 143
> freeblk count 3 != flcount 4 in ag 142
> freeblk count 3 != flcount 4 in ag 141
> freeblk count 3 != flcount 4 in ag 152
> freeblk count 3 != flcount 4 in ag 148
> freeblk count 3 != flcount 4 in ag 151
> freeblk count 3 != flcount 4 in ag 146
> freeblk count 3 != flcount 4 in ag 154
> freeblk count 3 != flcount 4 in ag 150
> freeblk count 3 != flcount 4 in ag 155
> freeblk count 3 != flcount 4 in ag 157
> freeblk count 3 != flcount 4 in ag 156
> freeblk count 3 != flcount 4 in ag 160
> freeblk count 3 != flcount 4 in ag 158
> freeblk count 3 != flcount 4 in ag 153
> freeblk count 3 != flcount 4 in ag 159
> sb_ifree 667, counted 615
> sb_fdblocks 3930698117, counted 1111093367
> - 15:06:10: scanning filesystem freespace - 193 of 193
> allocation groups done
> - found root inode chunk
> Phase 3 - for each AG...
> - scan and clear agi unlinked lists...
> - 15:06:10: scanning agi unlinked lists - 193 of 193
> allocation groups done
> - process known inodes and perform inode discovery...
> - agno = 0
> - agno = 30
> - agno = 60
> - agno = 15
> - agno = 75
> - agno = 45
> - agno = 150
> - agno = 31
> - agno = 105
> - agno = 61
> - agno = 46
> - agno = 135
> - agno = 76
> - agno = 120
> - agno = 151
> - agno = 90
> - agno = 180
> - agno = 165
> - agno = 91
> - agno = 32
> - agno = 62
> - agno = 106
> - agno = 121
> - agno = 166
> - agno = 136
> - agno = 47
> - agno = 92
> - agno = 16
> - agno = 1
> - agno = 181
> - agno = 33
> - agno = 167
> - agno = 93
> - agno = 122
> - agno = 63
> - agno = 152
> - agno = 77
> - agno = 2
> - agno = 168
> - agno = 48
> - agno = 34
> - agno = 182
> - agno = 123
> - agno = 107
> - agno = 78
> - agno = 153
> - agno = 17
> - agno = 3
> - agno = 137
> - agno = 35
> - agno = 64
> - agno = 79
> - agno = 169
> - agno = 138
> - agno = 183
> - agno = 108
> - agno = 94
> - agno = 65
> - agno = 124
> - agno = 36
> - agno = 154
> - agno = 80
> - agno = 49
> - agno = 184
> - agno = 109
> - agno = 37
> - agno = 95
> - agno = 125
> - agno = 170
> - agno = 50
> - agno = 66
> - agno = 81
> - agno = 110
> - agno = 139
> - agno = 155
> - agno = 185
> - agno = 96
> - agno = 126
> - agno = 38
> - agno = 171
> - agno = 127
> - agno = 140
> - agno = 97
> - agno = 51
> - agno = 128
> - agno = 67
> - agno = 111
> - agno = 82
> - agno = 156
> - agno = 186
> - agno = 141
> - agno = 172
> - agno = 98
> - agno = 129
> - agno = 52
> - agno = 130
> - agno = 39
> - agno = 142
> - agno = 173
> - agno = 53
> - agno = 99
> - agno = 157
> - agno = 68
> - agno = 112
> - agno = 40
> - agno = 83
> - agno = 131
> - agno = 54
> - agno = 187
> - agno = 113
> - agno = 69
> - agno = 143
> - agno = 174
> - agno = 84
> - agno = 158
> - agno = 41
> - agno = 100
> - agno = 159
> - agno = 175
> - agno = 144
> - agno = 188
> - agno = 42
> - agno = 85
> - agno = 70
> - agno = 132
> - agno = 101
> - agno = 114
> - agno = 55
> - agno = 176
> - agno = 160
> - agno = 145
> - agno = 86
> - agno = 71
> - agno = 189
> - agno = 102
> - agno = 43
> - agno = 115
> - agno = 161
> - agno = 177
> - agno = 56
> - agno = 133
> - agno = 72
> - agno = 103
> - agno = 44
> - agno = 87
> - agno = 190
> - agno = 134
> - agno = 73
> - agno = 162
> - agno = 116
> - agno = 57
> - agno = 146
> - agno = 74
> - agno = 191
> - agno = 104
> - agno = 117
> - agno = 147
> - agno = 163
> - agno = 88
> - agno = 178
> - agno = 58
> - agno = 192
> - agno = 118
> - agno = 148
> - agno = 89
> - agno = 164
> - agno = 59
> - agno = 179
> - agno = 119
> - agno = 149
> - agno = 4
> - agno = 5
> - agno = 18
> - agno = 6
> - agno = 7
> - agno = 8
> - agno = 9
> - agno = 10
> - agno = 11
> - agno = 12
> - agno = 13
> - agno = 14
> - agno = 19
> - agno = 20
> - agno = 21
> - agno = 22
> - agno = 23
> - agno = 24
> - agno = 25
> - agno = 26
> - agno = 27
> - agno = 28
> - agno = 29
> - 15:10:19: process known inodes and inode discovery - 896 of
> 896 inodes done
> - process newly discovered inodes...
> - 15:10:19: process newly discovered inodes - 193 of 193
> allocation groups done
> Phase 4 - check for duplicate blocks...
> - setting up duplicate extent list...
> - 15:10:19: setting up duplicate extent list - 193 of 193
> allocation groups done
> - check for inodes claiming duplicate blocks...
> - agno = 1
> - agno = 0
> - agno = 2
> - agno = 3
> - agno = 9
> - agno = 12
> - agno = 6
> - agno = 8
> - agno = 4
> - agno = 10
> - agno = 7
> - agno = 13
> - agno = 31
> - agno = 29
> - agno = 35
> - agno = 38
> - agno = 40
> - agno = 42
> - agno = 43
> - agno = 45
> - agno = 18
> - agno = 19
> - agno = 21
> - agno = 22
> - agno = 23
> - agno = 24
> - agno = 26
> - agno = 51
> - agno = 52
> - agno = 25
> - agno = 53
> - agno = 54
> - agno = 55
> - agno = 32
> - agno = 14
> - agno = 33
> - agno = 61
> - agno = 62
> - agno = 30
> - agno = 5
> - agno = 65
> - agno = 66
> - agno = 37
> - agno = 16
> - agno = 39
> - agno = 68
> - agno = 69
> - agno = 70
> - agno = 17
> - agno = 44
> - agno = 73
> - agno = 74
> - agno = 75
> - agno = 76
> - agno = 77
> - agno = 78
> - agno = 79
> - agno = 50
> - agno = 27
> - agno = 81
> - agno = 28
> - agno = 11
> - agno = 34
> - agno = 85
> - agno = 86
> - agno = 59
> - agno = 87
> - agno = 60
> - agno = 63
> - agno = 90
> - agno = 91
> - agno = 64
> - agno = 93
> - agno = 94
> - agno = 95
> - agno = 41
> - agno = 71
> - agno = 72
> - agno = 36
> - agno = 20
> - agno = 46
> - agno = 47
> - agno = 102
> - agno = 48
> - agno = 104
> - agno = 105
> - agno = 106
> - agno = 108
> - agno = 84
> - agno = 110
> - agno = 56
> - agno = 112
> - agno = 113
> - agno = 115
> - agno = 116
> - agno = 117
> - agno = 118
> - agno = 96
> - agno = 97
> - agno = 98
> - agno = 99
> - agno = 100
> - agno = 101
> - agno = 103
> - agno = 49
> - agno = 126
> - agno = 127
> - agno = 107
> - agno = 82
> - agno = 83
> - agno = 131
> - agno = 109
> - agno = 133
> - agno = 111
> - agno = 134
> - agno = 57
> - agno = 58
> - agno = 89
> - agno = 92
> - agno = 67
> - agno = 142
> - agno = 143
> - agno = 119
> - agno = 120
> - agno = 145
> - agno = 146
> - agno = 123
> - agno = 148
> - agno = 149
> - agno = 150
> - agno = 151
> - agno = 152
> - agno = 129
> - agno = 154
> - agno = 130
> - agno = 157
> - agno = 135
> - agno = 136
> - agno = 88
> - agno = 114
> - agno = 137
> - agno = 162
> - agno = 164
> - agno = 139
> - agno = 140
> - agno = 166
> - agno = 141
> - agno = 168
> - agno = 122
> - agno = 171
> - agno = 124
> - agno = 173
> - agno = 174
> - agno = 128
> - agno = 176
> - agno = 177
> - agno = 155
> - agno = 156
> - agno = 132
> - agno = 180
> - agno = 158
> - agno = 159
> - agno = 160
> - agno = 161
> - agno = 186
> - agno = 138
> - agno = 188
> - agno = 189
> - agno = 15
> - agno = 144
> - agno = 191
> - agno = 192
> - agno = 170
> - agno = 147
> - agno = 172
> - agno = 125
> - agno = 80
> - agno = 175
> - agno = 153
> - agno = 178
> - agno = 179
> - agno = 181
> - agno = 182
> - agno = 183
> - agno = 184
> - agno = 185
> - agno = 163
> - agno = 187
> - agno = 165
> - agno = 167
> - agno = 190
> - agno = 169
> - agno = 121
> - 15:10:22: check for inodes claiming duplicate blocks - 896
> of 896 inodes done
> Phase 5 - rebuild AG headers and trees...
> - 15:10:22: rebuild AG headers and trees - 193 of 193
> allocation groups done
> - reset superblock...
> Phase 6 - check inode connectivity...
> - resetting contents of realtime bitmap and summary inodes
> - traversing filesystem ...
> - traversal finished ...
> - moving disconnected inodes to lost+found ...
> Phase 7 - verify and correct link counts...
> Maximum metadata LSN (792:1294080) is ahead of log (1:64).
> Format log to cycle 795.
> done
> root@roc01r-scd224:~# mount -t xfs /dev/rbd0 /mnt
>
>
> No other errors in logs, Ceph or hardware.
>
>
> Thank you in advance!
Forgot to include xfs_info output:
xfs_info /srv/exports/sclun63
meta-data=/dev/rbd0 isize=512 agcount=193, agsize=33553408 blks
= sectsz=512 attr=2, projid32bit=1
= crc=1 finobt=1 spinodes=0
data = bsize=4096 blocks=6442450944, imaxpct=5
= sunit=1024 swidth=1024 blks
naming =version 2 bsize=4096 ascii-ci=0 ftype=1
log =internal bsize=4096 blocks=521728, version=2
= sectsz=512 sunit=8 blks, lazy-count=1
realtime =none extsz=4096 blocks=0, rtextents=0
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: XFS on RBD crash
2017-12-09 21:01 XFS on RBD crash Alex Gorbachev
2017-12-09 21:04 ` Alex Gorbachev
@ 2017-12-11 21:06 ` Dave Chinner
1 sibling, 0 replies; 3+ messages in thread
From: Dave Chinner @ 2017-12-11 21:06 UTC (permalink / raw)
To: Alex Gorbachev; +Cc: linux-xfs
On Sat, Dec 09, 2017 at 04:01:34PM -0500, Alex Gorbachev wrote:
> I have experienced a crash today (in a sense of filesystem going
> offline) of a 25TB XFS filesystem. Tried searching the list and
> google, and not much specific info I can use, so very much appreciate
> any insight:
freespace tree corruption. No idea what the cause might have been.
> System: Ubuntu 16.04, kernel 4.10.17-041017-generic
>
> Mount info:
>
> /dev/rbd0 on /srv/exports/sclun63 type xfs
> (rw,relatime,attr2,inode64,logbsize=256k,sunit=8192,swidth=8192,noquota)
^^^^^^^^^^^^^^^^^^^^^^
Why?
> xfs_repair (had to do -L):
Because the corrupted metadata was in the log, causing mount to
fail, and that's why you zeroed the log?
> root@roc01r-scd224:~# xfs_repair -L /dev/rbd0
> Phase 1 - find and verify superblock...
> - reporting progress in intervals of 15 minutes
> Phase 2 - using internal log
> - zero log...
> ALERT: The filesystem has valuable metadata changes in a log which is being
> destroyed because the -L option was used.
> - scan filesystem freespace and inode maps...
> freeblk count 3 != flcount 4 in ag 47
Those are from trashing the log, I think.
> sb_ifree 667, counted 615
> sb_fdblocks 3930698117, counted 1111093367
That's a major discrepancy - superblock said ~16TB free, repair
counted only 5TB free. And the inode count is off, too - had you
been removing files recently?
> - 15:06:10: scanning filesystem freespace - 193 of 193
Lots of AGs for a small filesystem - this filesystem has been grown
several times?
> - 15:10:22: check for inodes claiming duplicate blocks - 896
> of 896 inodes done
Only ~900 files in the filesystem?
> No other errors in logs, Ceph or hardware.
And no error reported from xfs_repair, either. So the corruption
occurred in memory and was captured by the log, which you zeroed
to run xfs_repair.
So there's really nothing left for us to analyse and debug.....
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2017-12-11 21:06 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2017-12-09 21:01 XFS on RBD crash Alex Gorbachev
2017-12-09 21:04 ` Alex Gorbachev
2017-12-11 21:06 ` Dave Chinner
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox