public inbox for linux-xfs@vger.kernel.org
* Re: xfs resize: primary superblock is not updated immediately
       [not found] <3685DFAD20214109878873CF81232704@alyakaslap>
@ 2016-02-22 21:20 ` Dave Chinner
  2016-02-22 22:38   ` Alex Lyakas
  0 siblings, 1 reply; 14+ messages in thread
From: Dave Chinner @ 2016-02-22 21:20 UTC (permalink / raw)
  To: Alex Lyakas; +Cc: hch, Danny Shavit, xfs

On Mon, Feb 22, 2016 at 09:08:06PM +0200, Alex Lyakas wrote:
> Greetings XFS developers,
> 
> I am seeing the following issue with XFS on kernel 3.18.19.
> 
> When resizing, XFS adds new AGs and eventually updates the primary
> superblock with the new “sb_agcount” value. However, this happens a few
> seconds after the resize operation returns to user-space. As a result,
> if a block-level snapshot is taken of the underlying block device while
> “sb_agcount” still has the old value, the subsequent XFS mount crashes
> with a stack like [1].

The primary superblock change is logged, so it doesn't need to be
written back immediately. That means it is in the journal...

> Some debugging shows that _xfs_buf_find is called with agno that has
> been added during the resize, but appropriate "pag" has not been
> created for this agno during mount.

The new per-ag structures are created during growfs, after the
growfs transaction has committed. If you are mounting a snapshot
that has the wrong agcount in it, then lots of things will go wrong
if there is metadata that already uses the expanded space.

> I have found the patch by Christoph Hellwig:
> http://oss.sgi.com/archives/xfs/2015-01/msg00391.html
> which sets the resize transaction to be synchronous, and applied it,
> but it still doesn’t help.
> 
> Right after the resize completes, I am issuing:
> xfs_db -r -c "sb 0" -c "p"   <device>
> and for a few seconds still get the old value of “sb_agcount”.
> 
> Can anybody advise what I am missing? What needs to be done so that
> the primary superblock will get the new value of “sb_agcount”
> promptly?

Are you freezing the filesystem before taking a block level
snapshot?
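For reference, the freeze/snapshot/thaw sequence can be sketched as follows; this is a minimal sketch assuming an LVM-backed device, and the mount point and volume names are examples, not anything from this thread:

```shell
# Quiesce XFS around a block-level snapshot. xfs_freeze -f flushes the
# journal and dirty metadata and blocks new writes; -u thaws.
# The mount point and LVM volume names below are examples only.
snapshot_xfs() {
    local mnt=$1 vg=$2 lv=$3 rc
    xfs_freeze -f "$mnt" || return 1                # flush + block writes
    lvcreate -s -n "${lv}-snap" -L 1G "$vg/$lv"     # take the snapshot
    rc=$?
    xfs_freeze -u "$mnt"                            # thaw even if lvcreate failed
    return $rc
}
```

With the filesystem frozen, the snapshot sees a consistent on-disk image, so mounting it does not depend on log recovery.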

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: xfs resize: primary superblock is not updated immediately
  2016-02-22 21:20 ` xfs resize: primary superblock is not updated immediately Dave Chinner
@ 2016-02-22 22:38   ` Alex Lyakas
  2016-02-22 23:56     ` Dave Chinner
  0 siblings, 1 reply; 14+ messages in thread
From: Alex Lyakas @ 2016-02-22 22:38 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Christoph Hellwig, Danny Shavit, xfs

Hi Dave,
Thanks for your response.

I am not freezing the filesystem before the snapshot.

However, let's assume that somebody resized the XFS, and it completed
and got back to user-space. At this moment the primary superblock
on-disk is not updated yet with the new agcount. And at this same
moment there is a power-out. After the power comes back and the
machine boots, if we mount the XFS, the same problem would happen, I
believe. Because the primary superblock on-disk still has old agcount.
So the in-memory pag structures will not be created for the new AGs
during mount, but replaying the log might try to use them.

Taking a block-level snapshot is exactly like a power-out from XFS
perspective. And XFS should, in principle, be able to recover from
that. The snapshot will come up as a new block device, which exhibits
identical content as the original block device had at the moment when
the snapshot was taken (like a boot after power-out).

I will try to reproduce the problem by crashing the machine at the
problematic moment, when the primary on-disk superblock still has the
old value. Without the snapshot thing.

Thanks,
Alex.




On Mon, Feb 22, 2016 at 11:20 PM, Dave Chinner <david@fromorbit.com> wrote:
> On Mon, Feb 22, 2016 at 09:08:06PM +0200, Alex Lyakas wrote:
>> Greetings XFS developers,
>>
>> I am seeing the following issue with XFS on kernel 3.18.19.
>>
>> When resizing, XFS adds new AGs and eventually updates the primary
>> superblock with the new “sb_agcount” value. However, this happens a few
>> seconds after the resize operation returns to user-space. As a result,
>> if a block-level snapshot is taken of the underlying block device while
>> “sb_agcount” still has the old value, the subsequent XFS mount crashes
>> with a stack like [1].
>
> The primary superblock change is logged, so it doesn't need to be
> written back immediately. That means it is in the journal...
>
>> Some debugging shows that _xfs_buf_find is called with agno that has
>> been added during the resize, but appropriate "pag" has not been
>> created for this agno during mount.
>
> The new per-ag structures are created during growfs, after the
> growfs transaction has committed. If you are mounting a snapshot
> that has the wrong agcount in it, then lots of things will go wrong
> if there is metadata that already uses the expanded space.
>
>> I have found the patch by Christoph Hellwig:
>> http://oss.sgi.com/archives/xfs/2015-01/msg00391.html
>> which sets the resize transaction to be synchronous, and applied it,
>> but it still doesn’t help.
>>
>> Right after the resize completes, I am issuing:
>> xfs_db -r -c "sb 0" -c "p"   <device>
>> and for a few seconds still get the old value of “sb_agcount”.
>>
>> Can anybody advise what I am missing? What needs to be done so that
>> the primary superblock will get the new value of “sb_agcount”
>> promptly?
>
> Are you freezing the filesystem before taking a block level
> snapshot?
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: xfs resize: primary superblock is not updated immediately
  2016-02-22 22:38   ` Alex Lyakas
@ 2016-02-22 23:56     ` Dave Chinner
  2016-02-23 12:25       ` Alex Lyakas
  0 siblings, 1 reply; 14+ messages in thread
From: Dave Chinner @ 2016-02-22 23:56 UTC (permalink / raw)
  To: Alex Lyakas; +Cc: Christoph Hellwig, Danny Shavit, xfs

On Tue, Feb 23, 2016 at 12:38:48AM +0200, Alex Lyakas wrote:
> Hi Dave,
> Thanks for your response.
> 
> I am not freezing the filesystem before the snapshot.

There's your problem. A mounted filesystem is not consistent on disk
without flushing the entire journal and all the dirty metadata to
disk.

> However, let's assume that somebody resized the XFS, and it completed
> and got back to user-space. At this moment the primary superblock
> on-disk is not updated yet with the new agcount. And at this same
> moment there is a power-out. After the power comes back and the
> machine boots, if we mount the XFS, the same problem would happen, I
> believe.

Log recovery will run and update the superblock buffer with the correct
values. But the in-memory superblock that log recovery is working
with does not change, and so if there were accesses beyond the
current superblock ag/block count you'd see messages like this:

XFS (sda1): _xfs_buf_find: Block out of range: block 0xnnnnn EOFS 0xmmmmm

and log recovery should fail at that point because it can't pull in
a buffer it needs for recovery to make further progress. At which
point, you have an unmountable filesystem.

If log recovery succeeds, then yes, I can see that there is a
problem here because the per-ag tree is not reinitialised after the
superblock is re-read. That's a pretty easy fix, though (3-4 lines
of code in xlog_do_recover() to detect a change in filesystem block
count and call xfs_initialize_perag() again).

> Taking a block-level snapshot is exactly like a power-out from XFS
> perspective.

It's similar, but it's not the same; e.g. there are no issues like
volatile storage cache loss that have to be handled.

> And XFS should, in principle, be able to recover from
> that.

For some definition of recover. There is no guarantee that any of
the async transactions in memory will make it to disk, so the point
to which XFS can recover is undefined.

> The snapshot will come up as a new block device, which exhibits
> identical content as the original block device had at the moment when
> the snapshot was taken (like a boot after power-out).

The block device might be identical, but it's not identical to what
the filesystem is presenting the user. Any user dirty data cached in
memory, or metadata changes staged in the CIL will not be in the
snapshot. Hence the snapshot block device is not identical to the
original user visible state and data. You only get that if you
freeze the filesystem before taking the snapshot.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: xfs resize: primary superblock is not updated immediately
  2016-02-22 23:56     ` Dave Chinner
@ 2016-02-23 12:25       ` Alex Lyakas
  2016-02-23 22:59         ` Dave Chinner
  0 siblings, 1 reply; 14+ messages in thread
From: Alex Lyakas @ 2016-02-23 12:25 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Christoph Hellwig, Danny Shavit, Shyam Kaushik, Yair Hershko, xfs

Hi Dave,

Below is a detailed reproduction scenario of the problem. No snapshots 
involved, only XFS. The scenario is performed on a VM, running kernel 
3.18.19.

1) Use 100 MB block device for XFS. In my case, this is achieved by:
# dmsetup create xfs_base --table "0 204800 linear /dev/vdd 0"

2) Create XFS on the block device:
# mkfs.xfs -f -K /dev/mapper/xfs_base -d agsize=25690112 -l size=10485760 -p /etc/zadara/xfs.protofile
The protofile is [1].

Output:
meta-data=/dev/mapper/xfs_base   isize=256    agcount=4, agsize=6272 blks
         =                       sectsz=512   attr=2, projid32bit=0
data     =                       bsize=4096   blocks=25088, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0
log      =internal log           bsize=4096   blocks=2560, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0

So we have 4 AGs right now. We are not using the full 100 MB.

3) Mount the XFS:
# mount -o sync /dev/mapper/xfs_base /mnt/xfs/

4) Verify the primary superblock on disk:
# xfs_db -r -c "sb 0" -c "p"   /dev/mapper/xfs_base  | grep agc
agcount = 4

5) Resize to full 100MB:
# xfs_growfs -d /mnt/xfs

Output:
meta-data=/dev/mapper/xfs_base   isize=256    agcount=4, agsize=6272 blks
         =                       sectsz=512   attr=2
data     =                       bsize=4096   blocks=25088, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0
log      =internal               bsize=4096   blocks=2560, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
data blocks changed from 25088 to 25600

6) Verify that the primary superblock still has 4 AGs:
# xfs_db -r -c "sb 0" -c "p"   /dev/mapper/xfs_base  | grep agc
agcount = 4

7) Immediately crash the VM

8) After VM reboots, re-create the device mapper
# dmsetup create xfs_base --table "0 204800 linear /dev/vdd 0"

9) mount
# mount -o sync /dev/mapper/xfs_base /mnt/xfs/

The kernel panics with [2]. Note that I added some prints to xfs_perag_get()
for the case where the pag is not found, and also to xfs_initialize_perag()
when adding a new pag. The prints indicate that after mount, XFS did not
create a pag for agno 4. But during log mount/replay it needs this pag and
crashes.
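The steps above can be condensed into one script. This is only a sketch of the repro described here; it uses the same /dev/vdd device and sizes, it is destructive, and it should only be run in a disposable VM (set DRY_RUN=1 to print the commands instead of executing them):

```shell
# Sketch of the repro above. Destructive: run only in a throwaway VM.
# With DRY_RUN=1 set, each command is printed instead of executed.
repro_growfs_crash() {
    local disk=${1:-/dev/vdd} mnt=${2:-/mnt/xfs} run=${DRY_RUN:+echo}
    $run dmsetup create xfs_base --table "0 204800 linear $disk 0"     # step 1
    $run mkfs.xfs -f -K -d agsize=25690112 -l size=10485760 \
        -p /etc/zadara/xfs.protofile /dev/mapper/xfs_base              # step 2
    $run mount -o sync /dev/mapper/xfs_base "$mnt"                     # step 3
    $run xfs_db -r -c "sb 0" -c p /dev/mapper/xfs_base                 # agcount = 4
    $run xfs_growfs -d "$mnt"                                          # step 5
    $run xfs_db -r -c "sb 0" -c p /dev/mapper/xfs_base                 # still 4
    # step 7: crash the VM right here, e.g.:
    # echo c > /proc/sysrq-trigger
}
```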

Does this repro align with what you expect currently?

Thanks,
Alex.


[1]
# cat /etc/zadara/xfs.protofile
dummy                   : bootfilename, not used, backward compatibility
0 0                             : numbers of blocks and inodes, not used, 
backward compatibility
d--777 0 0              : set 777 perms for the root dir
$
$

[2]
[   53.506307] [2392]xfs*[xfs_perag_get:130] XFS(dm-0): pag[0]: not found!
[   53.506323] [2392]xfs [xfs_initialize_perag:239] XFS(dm-0): Add pag[0]
[   53.506326] [2392]xfs*[xfs_perag_get:130] XFS(dm-0): pag[1]: not found!
[   53.506332] [2392]xfs [xfs_initialize_perag:239] XFS(dm-0): Add pag[1]
[   53.506336] [2392]xfs*[xfs_perag_get:130] XFS(dm-0): pag[2]: not found!
[   53.506348] [2392]xfs [xfs_initialize_perag:239] XFS(dm-0): Add pag[2]
[   53.506358] [2392]xfs*[xfs_perag_get:130] XFS(dm-0): pag[3]: not found!
[   53.506392] [2392]xfs [xfs_initialize_perag:239] XFS(dm-0): Add pag[3]
[   53.506397] XFS (dm-0): Mounting V4 Filesystem
[   53.562231] XFS (dm-0): Starting recovery (logdev: internal)
[   53.567501] [2392]xfs*[xfs_perag_get:130] XFS(dm-0): pag[4]: not found!
[   53.567574] BUG: unable to handle kernel NULL pointer dereference at 
00000000000000a0
[   53.568464] IP: [<ffffffff81717436>] _raw_spin_lock+0x16/0x60
[   53.568464] PGD 7b446067 PUD 35299067 PMD 0
[   53.568464] Oops: 0002 [#1] PREEMPT SMP
[   53.568464] CPU: 3 PID: 2392 Comm: mount Tainted: G           OE 
3.18.19-zadara05 #1
[   53.568464] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
[   53.568464] task: ffff88007b698a20 ti: ffff880076fdc000 task.ti: 
ffff880076fdc000
[   53.568464] RIP: 0010:[<ffffffff81717436>]  [<ffffffff81717436>] 
_raw_spin_lock+0x16/0x60
[   53.568464] RSP: 0018:ffff880076fdfa48  EFLAGS: 00010282
[   53.568464] RAX: 0000000000020000 RBX: ffff880035331900 RCX: 
0000000000000000
[   53.568464] RDX: ffff88007fd8f238 RSI: ffff88007fd8d318 RDI: 
00000000000000a0
[   53.568464] RBP: ffff880076fdfa48 R08: 0000000000000096 R09: 
0000000000000000
[   53.568464] R10: 00000000000002de R11: ffff880076fdf64e R12: 
0000000000000001
[   53.568464] R13: 0000000000000001 R14: 0000000000000000 R15: 
0000000000000000
[   53.568464] FS:  00007ff569ee7880(0000) GS:ffff88007fd80000(0000) 
knlGS:0000000000000000
[   53.568464] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   53.568464] CR2: 00000000000000a0 CR3: 0000000076f07000 CR4: 
00000000000406e0
[   53.568464] Stack:
[   53.568464]  ffff880076fdfa98 ffffffffc0989247 00000000000000a0 
0000000000031001
[   53.568464]  ffff880076fdfac8 ffff880035331900 0000000000000001 
0000000000000001
[   53.568464]  ffff880076fdfbb8 0000000000000001 ffff880076fdfae8 
ffffffffc09894ba
[   53.568464] Call Trace:
[   53.568464]  [<ffffffffc0989247>] _xfs_buf_find+0x97/0x2e0 [xfs]
[   53.568464]  [<ffffffffc09894ba>] xfs_buf_get_map+0x2a/0x210 [xfs]
[   53.568464]  [<ffffffffc098a1d3>] ? _xfs_buf_read+0x23/0x40 [xfs]
[   53.568464]  [<ffffffffc098a21c>] xfs_buf_read_map+0x2c/0x190 [xfs]
[   53.568464]  [<ffffffffc09bb969>] xfs_trans_read_buf_map+0x1e9/0x490 
[xfs]
[   53.568464]  [<ffffffffc094ae04>] xfs_read_agf+0x84/0x110 [xfs]
[   53.568464]  [<ffffffffc094aedb>] xfs_alloc_read_agf+0x4b/0x150 [xfs]
[   53.568464]  [<ffffffffc094affa>] xfs_alloc_pagf_init+0x1a/0x40 [xfs]
[   53.568464]  [<ffffffffc097e690>] xfs_initialize_perag_data+0xa0/0x120 
[xfs]
[   53.568464]  [<ffffffffc09a4972>] xfs_mountfs+0x5d2/0x7b0 [xfs]
[   53.568464]  [<ffffffffc09a810a>] xfs_fs_fill_super+0x2ca/0x360 [xfs]
[   53.568464]  [<ffffffff811eb220>] mount_bdev+0x1b0/0x1f0
[   53.568464]  [<ffffffffc09a7e40>] ? xfs_parseargs+0xbe0/0xbe0 [xfs]
[   53.568464]  [<ffffffffc09a5dd5>] xfs_fs_mount+0x15/0x20 [xfs]
[   53.568464]  [<ffffffff811ebb79>] mount_fs+0x39/0x1b0
[   53.568464]  [<ffffffff81192fc5>] ? __alloc_percpu+0x15/0x20
[   53.568464]  [<ffffffff812070db>] vfs_kern_mount+0x6b/0x120
[   53.568464]  [<ffffffff8120a032>] do_mount+0x222/0xca0
[   53.568464]  [<ffffffff8120adab>] SyS_mount+0x8b/0xe0
[   53.568464]  [<ffffffff817179cd>] system_call_fastpath+0x16/0x1b




^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: xfs resize: primary superblock is not updated immediately
  2016-02-23 12:25       ` Alex Lyakas
@ 2016-02-23 22:59         ` Dave Chinner
  2016-02-29  9:47           ` Alex Lyakas
  0 siblings, 1 reply; 14+ messages in thread
From: Dave Chinner @ 2016-02-23 22:59 UTC (permalink / raw)
  To: Alex Lyakas
  Cc: Christoph Hellwig, Danny Shavit, Shyam Kaushik, Yair Hershko, xfs

On Tue, Feb 23, 2016 at 02:25:38PM +0200, Alex Lyakas wrote:
> Hi Dave,
> 
> Below is a detailed reproduction scenario of the problem.

Can you package this for xfstests and send to
fstests@vger.kernel.org, please?

> No
> snapshots involved, only XFS. The scenario is performed on a VM,
> running kernel 3.18.19.

And a current kernel (e.g. 4.5-rc5) behaves how?

> 9) mount
> # mount -o sync /dev/mapper/xfs_base /mnt/xfs/
> 
> Kernel panics with [2].

It tried to read a buffer beyond the current end of filesystem that
log recovery knows about. Given the on-disk superblock had not been
updated by the growfs operation, this should have been detected
by _xfs_buf_find() and errored out, not tried to look up a per-ag
structure that is beyond the current end of filesystem.

i.e. the code I pointed out in my previous email failed to detect
the situation it is supposed to be protecting against.  Why did that
"block beyond the end of the filesystem" detection fail?

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: xfs resize: primary superblock is not updated immediately
  2016-02-23 22:59         ` Dave Chinner
@ 2016-02-29  9:47           ` Alex Lyakas
  2016-02-29 21:16             ` Dave Chinner
  0 siblings, 1 reply; 14+ messages in thread
From: Alex Lyakas @ 2016-02-29  9:47 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Christoph Hellwig, Danny Shavit, Shyam Kaushik, Yair Hershko, xfs

Hello Dave,
I have tried the same scenario with a 4.5 kernel from about a week ago, the
latest commit being [1]. The same crash happens; the stack trace is [2].

I am not proficient with xfstests, unfortunately. I tried running them 
several times, but I am not sure I was doing that properly.

As for your question why the "block beyond the end of the filesystem"
detection fails: I tried to debug it further and added a print into
_xfs_buf_find. What happens is that at some point the sb_dblocks value is
updated to the new value, but the in-memory pag object is not created. So
the test:

eofs = XFS_FSB_TO_BB(btp->bt_mount, btp->bt_mount->m_sb.sb_dblocks);
if (blkno < 0 || blkno >= eofs) {
...

still holds, but the needed pag does not exist.

Here are the results of the prints that I added:
Feb 29 11:47:09 vc-00-00-350-dev kernel: [   91.546542] SGI XFS with ACLs, 
security attributes, realtime, no debug enabled
Feb 29 11:47:09 vc-00-00-350-dev kernel: [   91.551243] XFS (dm-0): Mounting 
V4 Filesystem
Feb 29 11:47:09 vc-00-00-350-dev kernel: [   91.576677] XFS (dm-0): Starting 
recovery (logdev: internal)
Feb 29 11:47:09 vc-00-00-350-dev kernel: [   91.577866] _xfs_buf_find: 
blkno=0 eofs=200704 >m_sb.sb_dblocks=25088
Feb 29 11:47:09 vc-00-00-350-dev kernel: [   91.577872] _xfs_buf_find: 
blkno=0 eofs=200704 >m_sb.sb_dblocks=25088
Feb 29 11:47:09 vc-00-00-350-dev kernel: [   91.577882] _xfs_buf_find: 
blkno=0 eofs=200704 >m_sb.sb_dblocks=25088
===> Here we start seeing the new value of "sb_dblocks" and hence the new 
value of "eofs":
Feb 29 11:47:09 vc-00-00-350-dev kernel: [   91.606796] _xfs_buf_find: 
blkno=1 eofs=204800 >m_sb.sb_dblocks=25600
Feb 29 11:47:09 vc-00-00-350-dev kernel: [   91.606804] _xfs_buf_find: 
blkno=1 eofs=204800 >m_sb.sb_dblocks=25600
Feb 29 11:47:09 vc-00-00-350-dev kernel: [   91.606988] _xfs_buf_find: 
blkno=2 eofs=204800 >m_sb.sb_dblocks=25600
Feb 29 11:47:09 vc-00-00-350-dev kernel: [   91.606991] _xfs_buf_find: 
blkno=2 eofs=204800 >m_sb.sb_dblocks=25600
Feb 29 11:47:09 vc-00-00-350-dev kernel: [   91.607205] _xfs_buf_find: 
blkno=50177 eofs=204800 >m_sb.sb_dblocks=25600
Feb 29 11:47:09 vc-00-00-350-dev kernel: [   91.607210] _xfs_buf_find: 
blkno=50177 eofs=204800 >m_sb.sb_dblocks=25600
Feb 29 11:47:09 vc-00-00-350-dev kernel: [   91.607375] _xfs_buf_find: 
blkno=50178 eofs=204800 >m_sb.sb_dblocks=25600
Feb 29 11:47:09 vc-00-00-350-dev kernel: [   91.607378] _xfs_buf_find: 
blkno=50178 eofs=204800 >m_sb.sb_dblocks=25600
Feb 29 11:47:09 vc-00-00-350-dev kernel: [   91.607512] _xfs_buf_find: 
blkno=100353 eofs=204800 >m_sb.sb_dblocks=25600
Feb 29 11:47:09 vc-00-00-350-dev kernel: [   91.607515] _xfs_buf_find: 
blkno=100353 eofs=204800 >m_sb.sb_dblocks=25600
Feb 29 11:47:09 vc-00-00-350-dev kernel: [   91.607652] _xfs_buf_find: 
blkno=100354 eofs=204800 >m_sb.sb_dblocks=25600
Feb 29 11:47:09 vc-00-00-350-dev kernel: [   91.607655] _xfs_buf_find: 
blkno=100354 eofs=204800 >m_sb.sb_dblocks=25600
Feb 29 11:47:09 vc-00-00-350-dev kernel: [   91.607829] _xfs_buf_find: 
blkno=150529 eofs=204800 >m_sb.sb_dblocks=25600
Feb 29 11:47:09 vc-00-00-350-dev kernel: [   91.607832] _xfs_buf_find: 
blkno=150529 eofs=204800 >m_sb.sb_dblocks=25600
Feb 29 11:47:09 vc-00-00-350-dev kernel: [   91.607980] _xfs_buf_find: 
blkno=150530 eofs=204800 >m_sb.sb_dblocks=25600
Feb 29 11:47:09 vc-00-00-350-dev kernel: [   91.607982] _xfs_buf_find: 
blkno=150530 eofs=204800 >m_sb.sb_dblocks=25600
Feb 29 11:47:09 vc-00-00-350-dev kernel: [   91.608073] _xfs_buf_find: 
blkno=200705 eofs=204800 >m_sb.sb_dblocks=25600
===> and here we crash, but as you see, blkno is valid WRT eofs value.
Feb 29 11:47:09 vc-00-00-350-dev kernel: [   91.608120] BUG: unable to 
handle kernel NULL pointer dereference at 0000000000000098

Just for reference, immediately after the resize, the relevant superblock 
values are:

root@vc-00-00-350-dev:~# xfs_db -r -c "sb 0" -c "p"   /dev/mapper/xfs_base 
| egrep "agc|blocks"
blocksize = 4096
dblocks = 25088
rblocks = 0
agblocks = 6272
agcount = 4
rbmblocks = 0
logblocks = 2560
fdblocks = 22508

These values are identical to the values before the resize.
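One way to observe when the on-disk primary superblock actually catches up is to poll it after the growfs. A sketch; it assumes xfs_db prints the field as "agcount = N", and the device path is just the one from this repro:

```shell
# Poll the on-disk primary superblock until agcount reaches "want",
# or give up after "tries" seconds. Assumes xfs_db output "agcount = N".
wait_agcount() {
    local dev=$1 want=$2 tries=${3:-30} cur
    while [ "$tries" -gt 0 ]; do
        cur=$(xfs_db -r -c "sb 0" -c "p agcount" "$dev" | awk '{print $3}')
        [ "$cur" = "$want" ] && return 0
        sleep 1
        tries=$((tries - 1))
    done
    return 1
}
```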

Thanks,
Alex.



[1]
commit 84e54c46b2f440a365a5224f1e5f173a462b7cca
Merge: 0ecdcd3 4328daa
Author: Linus Torvalds <torvalds@linux-foundation.org>
Date:   Tue Feb 23 19:03:43 2016 -0800

    Merge tag 'dm-4.5-fix' of 
git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm

    Pull device mapper fix from Mike Snitzer:
     "Fix a 112 byte leak for each IO request that is requeued while DM
      multipath is handling faults due to path failures.

      This leak does not happen if blk-mq DM multipath is used.  It only
      occurs if .request_fn DM multipath is stacked ontop of blk-mq paths
      (e.g. scsi-mq devices)"

    * tag 'dm-4.5-fix' of 
git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
      dm: fix dm_rq_target_io leak on faults with .request_fn DM w/ blk-mq 
paths

[2]
Feb 29 11:26:05 vc-00-00-350-dev kernel: [  100.333515] SGI XFS with ACLs, 
security attributes, realtime, no debug enabled
Feb 29 11:26:05 vc-00-00-350-dev kernel: [  100.339593] XFS (dm-0): Mounting 
V4 Filesystem
Feb 29 11:26:05 vc-00-00-350-dev kernel: [  100.355170] XFS (dm-0): Starting 
recovery (logdev: internal)
Feb 29 11:26:05 vc-00-00-350-dev kernel: [  100.356997] BUG: unable to 
handle kernel NULL pointer dereference at 0000000000000098
Feb 29 11:26:05 vc-00-00-350-dev kernel: [  100.357761] IP: 
[<ffffffff816f705c>] _raw_spin_lock+0xc/0x30
Feb 29 11:26:05 vc-00-00-350-dev kernel: [  100.358289] PGD 36675067 PUD 
7a47a067 PMD 0
Feb 29 11:26:05 vc-00-00-350-dev kernel: [  100.358721] Oops: 0002 [#1] SMP
Feb 29 11:26:05 vc-00-00-350-dev kernel: [  100.359050] Modules linked in: 
xfs libcrc32c nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack 
ipt_REJECT nf_reject_ipv4 xt_CHECKSUM iptable_mangle xt_tcpudp bridge stp 
llc ip6table_filter ip6_tables iptable_filter ip_tables x_tables deflate ctr 
twofish_generic twofish_x86_64_3way xts twofish_x86_64 twofish_common 
camellia_generic serpent_generic blowfish_generic blowfish_x86_64 
blowfish_common cast5_generic cast_common des_generic nfsd cmac auth_rpcgss 
oid_registry nfs_acl nfs xcbc lockd grace sunrpc fscache rmd160 
sha512_generic af_key xfrm_algo ppdev dm_multipath kvm irqbypass 
ghash_clmulni_intel aesni_intel aes_x86_64 ablk_helper cryptd lrw gf128mul 
glue_helper mac_hid i2c_piix4 parport_pc serio_raw tpm_tis i6300esb lp 
parport psmouse floppy
Feb 29 11:26:05 vc-00-00-350-dev kernel: [  100.360932] CPU: 0 PID: 3067 
Comm: mount Not tainted 4.5.0-555-generic #1456324867
Feb 29 11:26:05 vc-00-00-350-dev kernel: [  100.360932] Hardware name: Bochs 
Bochs, BIOS Bochs 01/01/2007
Feb 29 11:26:05 vc-00-00-350-dev kernel: [  100.360932] task: 
ffff88007bc9d580 ti: ffff880079530000 task.ti: ffff880079530000
Feb 29 11:26:05 vc-00-00-350-dev kernel: [  100.360932] RIP: 
0010:[<ffffffff816f705c>]  [<ffffffff816f705c>] _raw_spin_lock+0xc/0x30
Feb 29 11:26:05 vc-00-00-350-dev kernel: [  100.360932] RSP: 
0018:ffff880079533ac0  EFLAGS: 00010246
Feb 29 11:26:05 vc-00-00-350-dev kernel: [  100.360932] RAX: 
0000000000000000 RBX: ffff880079533bf0 RCX: 0000000000000000
Feb 29 11:26:05 vc-00-00-350-dev kernel: [  100.360932] RDX: 
0000000000000001 RSI: 0000000000000004 RDI: 0000000000000098
Feb 29 11:26:05 vc-00-00-350-dev kernel: [  100.360932] RBP: 
ffff880079533b00 R08: 0000000000000001 R09: ffff880079fa56d0
Feb 29 11:26:05 vc-00-00-350-dev kernel: [  100.360932] R10: 
ffff880079fa5718 R11: 0000000000000000 R12: 0000000000000001
Feb 29 11:26:05 vc-00-00-350-dev kernel: [  100.360932] R13: 
ffff8800365d93c0 R14: 0000000000000001 R15: 0000000000031001
Feb 29 11:26:05 vc-00-00-350-dev kernel: [  100.360932] FS: 
00007fc7b2fd2880(0000) GS:ffff88007fc00000(0000) knlGS:0000000000000000
Feb 29 11:26:05 vc-00-00-350-dev kernel: [  100.360932] CS:  0010 DS: 0000 
ES: 0000 CR0: 0000000080050033
Feb 29 11:26:05 vc-00-00-350-dev kernel: [  100.360932] CR2: 
0000000000000098 CR3: 00000000360ed000 CR4: 00000000000406f0
Feb 29 11:26:05 vc-00-00-350-dev kernel: [  100.360932] Stack:
Feb 29 11:26:05 vc-00-00-350-dev kernel: [  100.360932]  ffffffffa03e2656 
0000000000000000 0000000000000000 ffff880079533bf0
Feb 29 11:26:05 vc-00-00-350-dev kernel: [  100.360932]  ffff88007a6a6000 
0000000000000001 ffff8800365d93c0 0000000000000001
Feb 29 11:26:05 vc-00-00-350-dev kernel: [  100.360932]  ffff880079533b40 
ffffffffa03e291a ffff880079533bf8 0000000000000001
Feb 29 11:26:05 vc-00-00-350-dev kernel: [  100.360932] Call Trace:
 [<ffffffffa03e2656>] ? _xfs_buf_find+0x96/0x330 [xfs]
 [<ffffffffa03e291a>] xfs_buf_get_map+0x2a/0x270 [xfs]
 [<ffffffffa03e353d>] xfs_buf_read_map+0x2d/0x180 [xfs]
 [<ffffffffa040f3c1>] xfs_trans_read_buf_map+0xf1/0x300 [xfs]
 [<ffffffffa03a5697>] xfs_read_agf+0x87/0x100 [xfs]
 [<ffffffffa03a575b>] xfs_alloc_read_agf+0x4b/0x130 [xfs]
 [<ffffffffa03a5c1a>] xfs_alloc_pagf_init+0x1a/0x40 [xfs]
 [<ffffffffa03d7eb9>] xfs_initialize_perag_data+0x99/0x110 [xfs]
 [<ffffffffa03f9408>] xfs_mountfs+0x608/0x820 [xfs]
 [<ffffffffa03fc1fe>] xfs_fs_fill_super+0x3be/0x4d0 [xfs]
 [<ffffffff811c1a77>] mount_bdev+0x187/0x1c0
 [<ffffffffa03fbe40>] ? xfs_parseargs+0xa70/0xa70 [xfs]
 [<ffffffffa03fa885>] xfs_fs_mount+0x15/0x20 [xfs]
 [<ffffffff811c2739>] mount_fs+0x39/0x160
 [<ffffffff81172d55>] ? __alloc_percpu+0x15/0x20
 [<ffffffff811dc817>] vfs_kern_mount+0x67/0x110
 [<ffffffff811df345>] do_mount+0x225/0xdb0
 [<ffffffff811c06fd>] ? __fput+0x17d/0x1e0
 [<ffffffff811aaea4>] ? __kmalloc_track_caller+0x54/0x190
 [<ffffffff8116e341>] ? strndup_user+0x41/0xa0
 [<ffffffff8116e262>] ? memdup_user+0x42/0x70
 [<ffffffff811e01b3>] SyS_mount+0x83/0xd0
 [<ffffffff816f7397>] entry_SYSCALL_64_fastpath+0x12/0x6a
Feb 29 11:26:05 vc-00-00-350-dev kernel: [  100.360932] Code: 5d c3 ba 01 00 00 00 f0 0f b1 17 85 c0 75 ef b0 01 5d c3 66 66 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 31 c0 ba 01 00 00 00 <f0> 0f b1 17 85 c0 75 01 c3 55 89 c6 48 89 e5 e8 10 f7 9a ff 5d
Feb 29 11:26:05 vc-00-00-350-dev kernel: [  100.360932] RIP  [<ffffffff816f705c>] _raw_spin_lock+0xc/0x30


-----Original Message----- 
From: Dave Chinner
Sent: 24 February, 2016 12:59 AM
To: Alex Lyakas
Cc: xfs@oss.sgi.com ; Christoph Hellwig ; Danny Shavit ; Yair Hershko ; 
Shyam Kaushik
Subject: Re: xfs resize: primary superblock is not updated immediately

On Tue, Feb 23, 2016 at 02:25:38PM +0200, Alex Lyakas wrote:
> Hi Dave,
>
> Below is a detailed reproduction scenario of the problem.

Can you package this for xfstests and send to
fstests@vger.kernel.org, please?

> No
> snapshots involved, only XFS. The scenario is performed on a VM,
> running kernel 3.18.19.

And a current kernel (e.g. 4.5-rc5) behaves how?

> 9) mount
> # mount -o sync /dev/mapper/xfs_base /mnt/xfs/
>
> Kernel panics with [2].

It tried to read a buffer beyond the current end of the filesystem
that log recovery knows about. Given that the on-disk superblock had
not been updated by the growfs operation, this should have been
detected by _xfs_buf_find() and errored out, rather than trying to
look up a per-ag structure beyond the current end of the filesystem.

i.e. the code I pointed out in my previous email failed to detect
the situation it is supposed to be protecting against.  Why did that
"block beyond the end of the filesystem" detection fail?

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com 

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs


* Re: xfs resize: primary superblock is not updated immediately
  2016-02-29  9:47           ` Alex Lyakas
@ 2016-02-29 21:16             ` Dave Chinner
  2016-03-01  7:20               ` Dave Chinner
  0 siblings, 1 reply; 14+ messages in thread
From: Dave Chinner @ 2016-02-29 21:16 UTC (permalink / raw)
  To: Alex Lyakas
  Cc: Christoph Hellwig, Danny Shavit, Shyam Kaushik, Yair Hershko, xfs

On Mon, Feb 29, 2016 at 11:47:54AM +0200, Alex Lyakas wrote:
> Hello Dave,
> I have tried the same scenario with the 4.5 kernel from about a week
> ago, latest commit being [1].
> The same crash is happening, stack trace being [2].
> 
> I am not proficient with xfstests, unfortunately. I tried running
> them several times, but I am not sure I was doing that properly.
> 
> As for your question why the "block beyond end of the filesystem
> fails". I tried to debug it further and added a print into
> _xfs_buf_find. What happens is that at some point, the sb_dblocks
> value is updated to the new value, but the in-memory pag object is
> not created. So the test:
> 
> eofs = XFS_FSB_TO_BB(btp->bt_mount, btp->bt_mount->m_sb.sb_dblocks);
> if (blkno < 0 || blkno >= eofs) {
> ...
> 
> still holds, but the needed pag does not exist.
> 
> Here are the results of the prints that I added:
> Feb 29 11:47:09 vc-00-00-350-dev kernel: [   91.546542] SGI XFS with ACLs, security attributes, realtime, no debug enabled
> Feb 29 11:47:09 vc-00-00-350-dev kernel: [   91.551243] XFS (dm-0): Mounting V4 Filesystem
> Feb 29 11:47:09 vc-00-00-350-dev kernel: [   91.576677] XFS (dm-0): Starting recovery (logdev: internal)
> Feb 29 11:47:09 vc-00-00-350-dev kernel: [   91.577866] _xfs_buf_find: blkno=0 eofs=200704 >m_sb.sb_dblocks=25088
> Feb 29 11:47:09 vc-00-00-350-dev kernel: [   91.577872] _xfs_buf_find: blkno=0 eofs=200704 >m_sb.sb_dblocks=25088
> Feb 29 11:47:09 vc-00-00-350-dev kernel: [   91.577882] _xfs_buf_find: blkno=0 eofs=200704 >m_sb.sb_dblocks=25088
> ===> Here we start seeing the new value of "sb_dblocks" and hence
> the new value of "eofs":

We shouldn't see sb_dblocks change until the first phase of log
recovery completes and the in-core superblock is re-read from disk
in xlog_do_recover().

> Feb 29 11:47:09 vc-00-00-350-dev kernel: [   91.606796] _xfs_buf_find: blkno=1 eofs=204800 >m_sb.sb_dblocks=25600
> Feb 29 11:47:09 vc-00-00-350-dev kernel: [   91.606804] _xfs_buf_find: blkno=1 eofs=204800 >m_sb.sb_dblocks=25600

looking up AGF 0.

> Feb 29 11:47:09 vc-00-00-350-dev kernel: [   91.606988] _xfs_buf_find: blkno=2 eofs=204800 >m_sb.sb_dblocks=25600
> Feb 29 11:47:09 vc-00-00-350-dev kernel: [   91.606991] _xfs_buf_find: blkno=2 eofs=204800 >m_sb.sb_dblocks=25600

Now AGI 0.

> Feb 29 11:47:09 vc-00-00-350-dev kernel: [   91.607205] _xfs_buf_find: blkno=50177 eofs=204800 >m_sb.sb_dblocks=25600
> Feb 29 11:47:09 vc-00-00-350-dev kernel: [   91.607210] _xfs_buf_find: blkno=50177 eofs=204800 >m_sb.sb_dblocks=25600
> Feb 29 11:47:09 vc-00-00-350-dev kernel: [   91.607375] _xfs_buf_find: blkno=50178 eofs=204800 >m_sb.sb_dblocks=25600
> Feb 29 11:47:09 vc-00-00-350-dev kernel: [   91.607378] _xfs_buf_find: blkno=50178 eofs=204800 >m_sb.sb_dblocks=25600

AGF/AGI 1.

> Feb 29 11:47:09 vc-00-00-350-dev kernel: [   91.607512] _xfs_buf_find: blkno=100353 eofs=204800 >m_sb.sb_dblocks=25600
> Feb 29 11:47:09 vc-00-00-350-dev kernel: [   91.607515] _xfs_buf_find: blkno=100353 eofs=204800 >m_sb.sb_dblocks=25600
> Feb 29 11:47:09 vc-00-00-350-dev kernel: [   91.607652] _xfs_buf_find: blkno=100354 eofs=204800 >m_sb.sb_dblocks=25600
> Feb 29 11:47:09 vc-00-00-350-dev kernel: [   91.607655] _xfs_buf_find: blkno=100354 eofs=204800 >m_sb.sb_dblocks=25600

AGF/AGI 2.

> Feb 29 11:47:09 vc-00-00-350-dev kernel: [   91.607829] _xfs_buf_find: blkno=150529 eofs=204800 >m_sb.sb_dblocks=25600
> Feb 29 11:47:09 vc-00-00-350-dev kernel: [   91.607832] _xfs_buf_find: blkno=150529 eofs=204800 >m_sb.sb_dblocks=25600
> Feb 29 11:47:09 vc-00-00-350-dev kernel: [   91.607980] _xfs_buf_find: blkno=150530 eofs=204800 >m_sb.sb_dblocks=25600
> Feb 29 11:47:09 vc-00-00-350-dev kernel: [   91.607982] _xfs_buf_find: blkno=150530 eofs=204800 >m_sb.sb_dblocks=25600

AGF/AGI 3.

> Feb 29 11:47:09 vc-00-00-350-dev kernel: [   91.608073] _xfs_buf_find: blkno=200705 eofs=204800 >m_sb.sb_dblocks=25600
> ===> and here we crash, but as you see, blkno is valid WRT the eofs value.
> Feb 29 11:47:09 vc-00-00-350-dev kernel: [   91.608120] BUG: unable to handle kernel NULL pointer dereference at 0000000000000098

AGF 4 goes splat.

Which means it's through the first phase of log recovery and it's
not failing in log recovery. i.e. we are now running
xfs_initialize_perag_data() after log recovery. So, as I said a
couple of posts back up this thread:

| If log recovery succeeds, then yes, I can see that there is a
| problem here because the per-ag tree is not reinitialised after
| the superblock is re-read. That's a pretty easy fix, though (3-4
| lines of code in xlog_do_recover() to detect a change in
| filesystem block count and call xfs_initialize_perag() again..

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: xfs resize: primary superblock is not updated immediately
  2016-02-29 21:16             ` Dave Chinner
@ 2016-03-01  7:20               ` Dave Chinner
  2016-03-02 13:14                 ` Fanael Linithien
  2016-03-03  9:18                 ` Alex Lyakas
  0 siblings, 2 replies; 14+ messages in thread
From: Dave Chinner @ 2016-03-01  7:20 UTC (permalink / raw)
  To: Alex Lyakas
  Cc: Christoph Hellwig, Danny Shavit, Shyam Kaushik, Yair Hershko, xfs

On Tue, Mar 01, 2016 at 08:16:28AM +1100, Dave Chinner wrote:
> On Mon, Feb 29, 2016 at 11:47:54AM +0200, Alex Lyakas wrote:
> Which means it's through the first phase of log recovery and it's
> not failing in log recovery. i.e. we are now running
> xfs_initialize_perag_data() after log recovery. So, as I said a
> couple of posts back up this thread:
> 
> | If log recovery succeeds, then yes, I can see that there is a
> | problem here because the per-ag tree is not reinitialised after
> | the superblock is re-read. That's a pretty easy fix, though (3-4
> | lines of code in xlog_do_recover() to detect a change in
> | filesystem block count and call xfs_initialize_perag() again..

Patch below.

-- 
Dave Chinner
david@fromorbit.com


xfs: reinitialise per-AG structures if geometry changes during recovery

From: Dave Chinner <dchinner@redhat.com>

If a crash occurs immediately after a filesystem grow operation, the
updated superblock geometry is found only in the log. After we
recover the log, the superblock is reread and re-initialised and so
has the new geometry in memory. If the new geometry has more AGs
than prior to the grow operation, then the new AGs will not have
in-memory xfs_perag structures associated with them.

This will result in an oops when the first metadata buffer from a
new AG is looked up in the buffer cache, as the block lies within
the new geometry but then fails to find a perag structure on lookup.
This is easily fixed by simply re-initialising the perag structure
after re-reading the superblock at the conclusion of the first phase
of log recovery.

This, however, does not fix the case of log recovery requiring
access to metadata in the newly grown space. Fortunately for us,
because the in-core superblock has not been updated, this will
result in detection of access beyond the end of the filesystem
and so recovery will fail at that point. If this proves to be
a problem, then we can address it separately to the current
reported issue.

Reported-by: Alex Lyakas <alex@zadarastorage.com>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_log_recover.c | 22 +++++++++++++---------
 1 file changed, 13 insertions(+), 9 deletions(-)

diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
index 1dc0e14..520471b 100644
--- a/fs/xfs/xfs_log_recover.c
+++ b/fs/xfs/xfs_log_recover.c
@@ -4898,6 +4898,7 @@ xlog_do_recover(
 	xfs_daddr_t	head_blk,
 	xfs_daddr_t	tail_blk)
 {
+	struct xfs_mount *mp = log->l_mp;
 	int		error;
 	xfs_buf_t	*bp;
 	xfs_sb_t	*sbp;
@@ -4912,7 +4913,7 @@ xlog_do_recover(
 	/*
 	 * If IO errors happened during recovery, bail out.
 	 */
-	if (XFS_FORCED_SHUTDOWN(log->l_mp)) {
+	if (XFS_FORCED_SHUTDOWN(mp)) {
 		return -EIO;
 	}
 
@@ -4925,13 +4926,13 @@ xlog_do_recover(
 	 * or iunlinks they will have some entries in the AIL; so we look at
 	 * the AIL to determine how to set the tail_lsn.
 	 */
-	xlog_assign_tail_lsn(log->l_mp);
+	xlog_assign_tail_lsn(mp);
 
 	/*
 	 * Now that we've finished replaying all buffer and inode
 	 * updates, re-read in the superblock and reverify it.
 	 */
-	bp = xfs_getsb(log->l_mp, 0);
+	bp = xfs_getsb(mp, 0);
 	bp->b_flags &= ~(XBF_DONE | XBF_ASYNC);
 	ASSERT(!(bp->b_flags & XBF_WRITE));
 	bp->b_flags |= XBF_READ;
@@ -4939,7 +4940,7 @@ xlog_do_recover(
 
 	error = xfs_buf_submit_wait(bp);
 	if (error) {
-		if (!XFS_FORCED_SHUTDOWN(log->l_mp)) {
+		if (!XFS_FORCED_SHUTDOWN(mp)) {
 			xfs_buf_ioerror_alert(bp, __func__);
 			ASSERT(0);
 		}
@@ -4948,14 +4949,17 @@ xlog_do_recover(
 	}
 
 	/* Convert superblock from on-disk format */
-	sbp = &log->l_mp->m_sb;
+	sbp = &mp->m_sb;
 	xfs_sb_from_disk(sbp, XFS_BUF_TO_SBP(bp));
-	ASSERT(sbp->sb_magicnum == XFS_SB_MAGIC);
-	ASSERT(xfs_sb_good_version(sbp));
-	xfs_reinit_percpu_counters(log->l_mp);
-
 	xfs_buf_relse(bp);
 
+	/* re-initialise in-core superblock and geometry structures */
+	xfs_reinit_percpu_counters(mp);
+	error = xfs_initialize_perag(mp, sbp->sb_agcount, &mp->m_maxagi);
+	if (error) {
+		xfs_warn(mp, "Failed post-recovery per-ag init: %d", error);
+		return error;
+	}
 
 	xlog_recover_check_summary(log);
 


* Re: xfs resize: primary superblock is not updated immediately
  2016-03-01  7:20               ` Dave Chinner
@ 2016-03-02 13:14                 ` Fanael Linithien
  2016-03-03  9:18                 ` Alex Lyakas
  1 sibling, 0 replies; 14+ messages in thread
From: Fanael Linithien @ 2016-03-02 13:14 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Danny Shavit, Shyam Kaushik, Yair Hershko, xfs, Christoph Hellwig,
	Alex Lyakas

2016-03-01 8:20 GMT+01:00 Dave Chinner <david@fromorbit.com>:
> than prior to the grow operation, then the new AGs will not have
> in-memory xfs_perag structurea associated with them.

                      ^^^^^^^^^^
structures

> This is easily fixed by simply re-initialising the perag structure
> after re-reading the superblock at the conclusion of the first pahse
> of log recovery.

                                                                 ^^^^^
phase


* Re: xfs resize: primary superblock is not updated immediately
  2016-03-01  7:20               ` Dave Chinner
  2016-03-02 13:14                 ` Fanael Linithien
@ 2016-03-03  9:18                 ` Alex Lyakas
  2016-03-03 21:31                   ` Dave Chinner
  1 sibling, 1 reply; 14+ messages in thread
From: Alex Lyakas @ 2016-03-03  9:18 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Christoph Hellwig, Danny Shavit, Shyam Kaushik, Yair Hershko, xfs

Hello Dave,
Thanks for the patch! I confirm that it fixes the scenario.

At [1] please find all the blknos that are being used during the log 
recovery (if that's of any interest).

Thanks,
Alex.

[1]
Mar  3 11:17:41 vc-00-00-350-dev kernel: [   68.068647] XFS (dm-0): Mounting V4 Filesystem
Mar  3 11:17:41 vc-00-00-350-dev kernel: [   68.099664] XFS (dm-0): Starting recovery (logdev: internal)
Mar  3 11:17:41 vc-00-00-350-dev kernel: [   68.102539] _xfs_buf_find: blkno=0 eofs=200704 >m_sb.sb_dblocks=25088
Mar  3 11:17:41 vc-00-00-350-dev kernel: [   68.102549] _xfs_buf_find: blkno=0 eofs=200704 >m_sb.sb_dblocks=25088
Mar  3 11:17:41 vc-00-00-350-dev kernel: [   68.102580] _xfs_buf_find: blkno=0 eofs=200704 >m_sb.sb_dblocks=25088
Mar  3 11:17:41 vc-00-00-350-dev kernel: [   68.126231] _xfs_buf_find: blkno=1 eofs=204800 >m_sb.sb_dblocks=25600
Mar  3 11:17:41 vc-00-00-350-dev kernel: [   68.126238] _xfs_buf_find: blkno=1 eofs=204800 >m_sb.sb_dblocks=25600
Mar  3 11:17:41 vc-00-00-350-dev kernel: [   68.126753] _xfs_buf_find: blkno=2 eofs=204800 >m_sb.sb_dblocks=25600
Mar  3 11:17:41 vc-00-00-350-dev kernel: [   68.126762] _xfs_buf_find: blkno=2 eofs=204800 >m_sb.sb_dblocks=25600
Mar  3 11:17:41 vc-00-00-350-dev kernel: [   68.127234] _xfs_buf_find: blkno=50177 eofs=204800 >m_sb.sb_dblocks=25600
Mar  3 11:17:41 vc-00-00-350-dev kernel: [   68.127244] _xfs_buf_find: blkno=50177 eofs=204800 >m_sb.sb_dblocks=25600
Mar  3 11:17:41 vc-00-00-350-dev kernel: [   68.127751] _xfs_buf_find: blkno=50178 eofs=204800 >m_sb.sb_dblocks=25600
Mar  3 11:17:41 vc-00-00-350-dev kernel: [   68.127760] _xfs_buf_find: blkno=50178 eofs=204800 >m_sb.sb_dblocks=25600
Mar  3 11:17:41 vc-00-00-350-dev kernel: [   68.128224] _xfs_buf_find: blkno=100353 eofs=204800 >m_sb.sb_dblocks=25600
Mar  3 11:17:41 vc-00-00-350-dev kernel: [   68.128234] _xfs_buf_find: blkno=100353 eofs=204800 >m_sb.sb_dblocks=25600
Mar  3 11:17:41 vc-00-00-350-dev kernel: [   68.128638] _xfs_buf_find: blkno=100354 eofs=204800 >m_sb.sb_dblocks=25600
Mar  3 11:17:41 vc-00-00-350-dev kernel: [   68.128646] _xfs_buf_find: blkno=100354 eofs=204800 >m_sb.sb_dblocks=25600
Mar  3 11:17:41 vc-00-00-350-dev kernel: [   68.129054] _xfs_buf_find: blkno=150529 eofs=204800 >m_sb.sb_dblocks=25600
Mar  3 11:17:41 vc-00-00-350-dev kernel: [   68.129062] _xfs_buf_find: blkno=150529 eofs=204800 >m_sb.sb_dblocks=25600
Mar  3 11:17:41 vc-00-00-350-dev kernel: [   68.129449] _xfs_buf_find: blkno=150530 eofs=204800 >m_sb.sb_dblocks=25600
Mar  3 11:17:41 vc-00-00-350-dev kernel: [   68.129459] _xfs_buf_find: blkno=150530 eofs=204800 >m_sb.sb_dblocks=25600
Mar  3 11:17:41 vc-00-00-350-dev kernel: [   68.129739] _xfs_buf_find: blkno=200705 eofs=204800 >m_sb.sb_dblocks=25600
Mar  3 11:17:41 vc-00-00-350-dev kernel: [   68.129746] _xfs_buf_find: blkno=200705 eofs=204800 >m_sb.sb_dblocks=25600
Mar  3 11:17:41 vc-00-00-350-dev kernel: [   68.130051] _xfs_buf_find: blkno=200706 eofs=204800 >m_sb.sb_dblocks=25600
Mar  3 11:17:41 vc-00-00-350-dev kernel: [   68.130061] _xfs_buf_find: blkno=200706 eofs=204800 >m_sb.sb_dblocks=25600
Mar  3 11:17:41 vc-00-00-350-dev kernel: [   68.130317] _xfs_buf_find: blkno=64 eofs=204800 >m_sb.sb_dblocks=25600
Mar  3 11:17:41 vc-00-00-350-dev kernel: [   68.130324] _xfs_buf_find: blkno=64 eofs=204800 >m_sb.sb_dblocks=25600
Mar  3 11:17:41 vc-00-00-350-dev kernel: [   68.130548] _xfs_buf_find: blkno=64 eofs=204800 >m_sb.sb_dblocks=25600
Mar  3 11:17:41 vc-00-00-350-dev kernel: [   68.130560] _xfs_buf_find: blkno=64 eofs=204800 >m_sb.sb_dblocks=25600
Mar  3 11:17:41 vc-00-00-350-dev kernel: [   68.130571] _xfs_buf_find: blkno=2 eofs=204800 >m_sb.sb_dblocks=25600
Mar  3 11:17:41 vc-00-00-350-dev kernel: [   68.130581] _xfs_buf_find: blkno=50178 eofs=204800 >m_sb.sb_dblocks=25600
Mar  3 11:17:41 vc-00-00-350-dev kernel: [   68.130591] _xfs_buf_find: blkno=100354 eofs=204800 >m_sb.sb_dblocks=25600
Mar  3 11:17:41 vc-00-00-350-dev kernel: [   68.130610] _xfs_buf_find: blkno=150530 eofs=204800 >m_sb.sb_dblocks=25600
Mar  3 11:17:41 vc-00-00-350-dev kernel: [   68.130613] _xfs_buf_find: blkno=200706 eofs=204800 >m_sb.sb_dblocks=25600
Mar  3 11:17:41 vc-00-00-350-dev kernel: [   68.130618] XFS (dm-0): Ending recovery (logdev: internal)



* Re: xfs resize: primary superblock is not updated immediately
  2016-03-03  9:18                 ` Alex Lyakas
@ 2016-03-03 21:31                   ` Dave Chinner
  2016-03-06  9:46                     ` Alex Lyakas
  0 siblings, 1 reply; 14+ messages in thread
From: Dave Chinner @ 2016-03-03 21:31 UTC (permalink / raw)
  To: Alex Lyakas
  Cc: Christoph Hellwig, Danny Shavit, Shyam Kaushik, Yair Hershko, xfs

On Thu, Mar 03, 2016 at 11:18:43AM +0200, Alex Lyakas wrote:
> Hello Dave,
> Thanks for the patch! I confirm that it fixes the scenario.
> 
> At [1] please find all the blknos that are being used during the log
> recovery (if that's of any interest).
....
> Mar  3 11:17:41 vc-00-00-350-dev kernel: [   68.129739]
> _xfs_buf_find: blkno=200705 eofs=204800 >m_sb.sb_dblocks=25600
> Mar  3 11:17:41 vc-00-00-350-dev kernel: [   68.129746]
> _xfs_buf_find: blkno=200705 eofs=204800 >m_sb.sb_dblocks=25600

Where is the warning that this block is out of range?

And why didn't recovery fail at this point because the block
requested is out of range and so the buffer lookup should have
failed?

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: xfs resize: primary superblock is not updated immediately
  2016-03-03 21:31                   ` Dave Chinner
@ 2016-03-06  9:46                     ` Alex Lyakas
  2016-03-06 15:46                       ` Eric Sandeen
  2016-03-06 20:49                       ` Dave Chinner
  0 siblings, 2 replies; 14+ messages in thread
From: Alex Lyakas @ 2016-03-06  9:46 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Christoph Hellwig, Danny Shavit, xfs

Hello Dave,

On Thu, Mar 3, 2016 at 11:31 PM, Dave Chinner <david@fromorbit.com> wrote:
> On Thu, Mar 03, 2016 at 11:18:43AM +0200, Alex Lyakas wrote:
>> Hello Dave,
>> Thanks for the patch! I confirm that it fixes the scenario.
>>
>> At [1] please find all the blknos that are being used during the log
>> recovery (if that's of any interest).
> ....
>> Mar  3 11:17:41 vc-00-00-350-dev kernel: [   68.129739]
>> _xfs_buf_find: blkno=200705 eofs=204800 >m_sb.sb_dblocks=25600
>> Mar  3 11:17:41 vc-00-00-350-dev kernel: [   68.129746]
>> _xfs_buf_find: blkno=200705 eofs=204800 >m_sb.sb_dblocks=25600
>
> Where is the warning that this block is out of range?
Perhaps you are being confused by the ">" mark that appears in the
prints? It was added by mistake and appears on every print. I
apologize for that.
If not, then my understanding is that 200705 is still less than
204800, so this block number is not out of range. And since we have
added the new pag structure, the issue is now fixed.

Otherwise, I can provide an XFS metadump for you to analyze.

Thanks,
Alex.

>
> And why didn't recovery fail at this point because the block
> requested is out of range and so the buffer lookup should have
> failed?
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com


* Re: xfs resize: primary superblock is not updated immediately
  2016-03-06  9:46                     ` Alex Lyakas
@ 2016-03-06 15:46                       ` Eric Sandeen
  2016-03-06 20:49                       ` Dave Chinner
  1 sibling, 0 replies; 14+ messages in thread
From: Eric Sandeen @ 2016-03-06 15:46 UTC (permalink / raw)
  To: Alex Lyakas; +Cc: Christoph Hellwig, Danny Shavit, xfs


> On Mar 6, 2016, at 3:46 AM, Alex Lyakas <alex@zadarastorage.com> wrote:
> 
> Hello Dave,
> 
>> On Thu, Mar 3, 2016 at 11:31 PM, Dave Chinner <david@fromorbit.com> wrote:
>>> On Thu, Mar 03, 2016 at 11:18:43AM +0200, Alex Lyakas wrote:
>>> Hello Dave,
>>> Thanks for the patch! I confirm that it fixes the scenario.
>>> 
>>> At [1] please find all the blknos that are being used during the log
>>> recovery (if that's of any interest).
>> ....
>>> Mar  3 11:17:41 vc-00-00-350-dev kernel: [   68.129739]
>>> _xfs_buf_find: blkno=200705 eofs=204800 >m_sb.sb_dblocks=25600
>>> Mar  3 11:17:41 vc-00-00-350-dev kernel: [   68.129746]
>>> _xfs_buf_find: blkno=200705 eofs=204800 >m_sb.sb_dblocks=25600
>> 
>> Where is the warning that this block is out of range?
> Perhaps you are being confused by the ">" mark that appears in the
> prints? This was definitely added by mistake, it appears on every
> print. I apologize for that.
> If not, then my understanding is that 200705 is still less than
> 204800, so this block number is not out of range. And since we have
> added the new pag structure, the issue is now fixed.

Block units in printks are never clear; 204800 sectors is 25600 4K blocks, and yes, the buffer at sector 200705 looks to be in range of the filesystem.

Eric

> Otherwise, I can provide an XFS metadump for you to analyze.
> 
> Thanks,
> Alex.
> 
>> 
>> And why didn't recovery fail at this point because the block
>> requested is out of range and so the buffer lookup should have
>> failed?
>> 
>> Cheers,
>> 
>> Dave.
>> --
>> Dave Chinner
>> david@fromorbit.com
> 


* Re: xfs resize: primary superblock is not updated immediately
  2016-03-06  9:46                     ` Alex Lyakas
  2016-03-06 15:46                       ` Eric Sandeen
@ 2016-03-06 20:49                       ` Dave Chinner
  1 sibling, 0 replies; 14+ messages in thread
From: Dave Chinner @ 2016-03-06 20:49 UTC (permalink / raw)
  To: Alex Lyakas; +Cc: Christoph Hellwig, Danny Shavit, xfs

On Sun, Mar 06, 2016 at 11:46:58AM +0200, Alex Lyakas wrote:
> Hello Dave,
> 
> On Thu, Mar 3, 2016 at 11:31 PM, Dave Chinner <david@fromorbit.com> wrote:
> > On Thu, Mar 03, 2016 at 11:18:43AM +0200, Alex Lyakas wrote:
> >> Hello Dave,
> >> Thanks for the patch! I confirm that it fixes the scenario.
> >>
> >> At [1] please find all the blknos that are being used during the log
> >> recovery (if that's of any interest).
> > ....
> >> Mar  3 11:17:41 vc-00-00-350-dev kernel: [   68.129739]
> >> _xfs_buf_find: blkno=200705 eofs=204800 >m_sb.sb_dblocks=25600
> >> Mar  3 11:17:41 vc-00-00-350-dev kernel: [   68.129746]
> >> _xfs_buf_find: blkno=200705 eofs=204800 >m_sb.sb_dblocks=25600
> >
> > Where is the warning that this block is out of range?
> Perhaps you are being confused by the ">" mark that appears in the
> prints? This was definitely added by mistake, it appears on every
> print. I apologize for that.
> If not, then my understanding is that 200705 is still less than
> 204800, so this block number is not out of range. And since we have
> added the new pag structure, the issue is now fixed.

Sorry, I misread it as 200480, not 204800. My fault, too much to do,
brain mostly fried by other stuff. So the patch works.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


end of thread, other threads:[~2016-03-06 20:49 UTC | newest]

Thread overview: 14+ messages
     [not found] <3685DFAD20214109878873CF81232704@alyakaslap>
2016-02-22 21:20 ` xfs resize: primary superblock is not updated immediately Dave Chinner
2016-02-22 22:38   ` Alex Lyakas
2016-02-22 23:56     ` Dave Chinner
2016-02-23 12:25       ` Alex Lyakas
2016-02-23 22:59         ` Dave Chinner
2016-02-29  9:47           ` Alex Lyakas
2016-02-29 21:16             ` Dave Chinner
2016-03-01  7:20               ` Dave Chinner
2016-03-02 13:14                 ` Fanael Linithien
2016-03-03  9:18                 ` Alex Lyakas
2016-03-03 21:31                   ` Dave Chinner
2016-03-06  9:46                     ` Alex Lyakas
2016-03-06 15:46                       ` Eric Sandeen
2016-03-06 20:49                       ` Dave Chinner
