public inbox for linux-xfs@vger.kernel.org
* XFS crashes on VMs
@ 2015-05-27 23:06 Shrinand Javadekar
  2015-05-27 23:27 ` Eric Sandeen
  0 siblings, 1 reply; 9+ messages in thread
From: Shrinand Javadekar @ 2015-05-27 23:06 UTC (permalink / raw)
  To: xfs

Hi,

I am running OpenStack Swift in a VM with XFS as the underlying
filesystem. This generates a metadata-heavy workload on XFS:
essentially, it creates a new directory and a new file (256 KB) in
that directory, and the file gets extended attributes of 243 bytes.

I am seeing the following two crashes of the machine:

http://pastie.org/pastes/10210974/text?key=xdmfvaocvawnyfmkb06zg

AND

http://pastie.org/pastes/10210975/text?key=rkiljsdaucrk7frprzgqq

I have only seen these when running in a VM. We have run several tests
on physical servers but have never seen these problems.

Are there any known issues with XFS running on VMs?

Thanks in advance.
-Shri

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: XFS crashes on VMs
  2015-05-27 23:06 XFS crashes on VMs Shrinand Javadekar
@ 2015-05-27 23:27 ` Eric Sandeen
  2015-05-28  0:03   ` Shrinand Javadekar
  0 siblings, 1 reply; 9+ messages in thread
From: Eric Sandeen @ 2015-05-27 23:27 UTC (permalink / raw)
  To: Shrinand Javadekar; +Cc: xfs@oss.sgi.com

That's not a crash. That is XFS detecting on-disk corruption which likely happened at some earlier time. You should unmount and run xfs_repair, possibly with -n first if you would like a dry run to see what it would do. If you get fresh corruption after a full repair, then that becomes more interesting. It's possible that you have a problem with the underlying block layer, or it's possible that it is an XFS bug, but I don't think this is something we have seen before.

Eric 

> On May 27, 2015, at 6:06 PM, Shrinand Javadekar <shrinand@maginatics.com> wrote:
> 
> Hi,
> 
> I am running OpenStack Swift in a VM with XFS as the underlying
> filesystem. This generates a metadata-heavy workload on XFS:
> essentially, it creates a new directory and a new file (256 KB) in
> that directory, and the file gets extended attributes of 243 bytes.
> 
> I am seeing the following two crashes of the machine:
> 
> http://pastie.org/pastes/10210974/text?key=xdmfvaocvawnyfmkb06zg
> 
> AND
> 
> http://pastie.org/pastes/10210975/text?key=rkiljsdaucrk7frprzgqq
> 
> I have only seen these when running in a VM. We have run several tests
> on physical servers but have never seen these problems.
> 
> Are there any known issues with XFS running on VMs?
> 
> Thanks in advance.
> -Shri
> 


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: XFS crashes on VMs
  2015-05-27 23:27 ` Eric Sandeen
@ 2015-05-28  0:03   ` Shrinand Javadekar
  2015-05-28  0:52     ` Eric Sandeen
  0 siblings, 1 reply; 9+ messages in thread
From: Shrinand Javadekar @ 2015-05-28  0:03 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: xfs@oss.sgi.com

Thanks Eric,

We ran xfs_repair and were able to get it back into a running state.
That's fine for test & dev, but it won't be acceptable in production.
What other data do we need to get to the bottom of this?

On Wed, May 27, 2015 at 4:27 PM, Eric Sandeen <sandeen@sandeen.net> wrote:
> That's not a crash. That is XFS detecting on-disk corruption which likely happened at some earlier time. You should unmount and run xfs_repair, possibly with -n first if you would like a dry run to see what it would do. If you get fresh corruption after a full repair, then that becomes more interesting. It's possible that you have a problem with the underlying block layer, or it's possible that it is an XFS bug, but I don't think this is something we have seen before.
>
> Eric


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: XFS crashes on VMs
  2015-05-28  0:03   ` Shrinand Javadekar
@ 2015-05-28  0:52     ` Eric Sandeen
  2015-05-28  0:53       ` Eric Sandeen
  0 siblings, 1 reply; 9+ messages in thread
From: Eric Sandeen @ 2015-05-28  0:52 UTC (permalink / raw)
  To: Shrinand Javadekar; +Cc: xfs@oss.sgi.com

You'll need to try to narrow down how it happened.

The hexdumps in the logs show what data was in the buffer; in one case it was ASCII, and definitely not XFS metadata.

Either:

a) xfs wrote the wrong metadata - almost impossible, because we verify the data on write in the same way as we do on read

b) xfs read the wrong block due to other metadata corruption.

c) something corrupted the storage after it was written

d) the storage returned the wrong data on a read request ...

e) ???

Did you save the xfs_repair output?  That might offer more clues.

Unless you can reproduce it, it'll be hard to come up with a definitive root cause... can you try?

-Eric


On 5/27/15 7:03 PM, Shrinand Javadekar wrote:
> Thanks Eric,
> 
> We ran xfs_repair and were able to get it back into a running state.
> That's fine for test & dev, but it won't be acceptable in production.
> What other data do we need to get to the bottom of this?
> 


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: XFS crashes on VMs
  2015-05-28  0:52     ` Eric Sandeen
@ 2015-05-28  0:53       ` Eric Sandeen
  2015-05-28 18:08         ` Shrinand Javadekar
  0 siblings, 1 reply; 9+ messages in thread
From: Eric Sandeen @ 2015-05-28  0:53 UTC (permalink / raw)
  To: Shrinand Javadekar; +Cc: xfs@oss.sgi.com

And did anything else "interesting" happen prior to the detection?

> On May 27, 2015, at 7:52 PM, Eric Sandeen <sandeen@sandeen.net> wrote:
> 
> You'll need to try to narrow down how it happened.
> 
> The hexdumps in the logs show what data was in the buffer; in one case it was ascii, and was definitely not xfs metadata.
> 
> Either:
> 
> a) xfs wrote the wrong metadata - almost impossible, because we verify the data on write in the same way as we do on read
> 
> b) xfs read the wrong block due to other metadata corruption.
> 
> c) something corrupted the storage after it was written
> 
> d) the storage returned the wrong data on a read request ...
> 
> e) ???
> 
> Did you save the xfs_repair output?  That might offer more clues.
> 
> Unless you can reproduce it, it'll be hard to come up with a definitive root cause... can you try?
> 
> -Eric
> 


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: XFS crashes on VMs
  2015-05-28  0:53       ` Eric Sandeen
@ 2015-05-28 18:08         ` Shrinand Javadekar
  2015-06-19 18:34           ` Shrinand Javadekar
  0 siblings, 1 reply; 9+ messages in thread
From: Shrinand Javadekar @ 2015-05-28 18:08 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: xfs@oss.sgi.com

We'll try to reproduce this and capture the output of xfs_repair when
it happens next, and will keep an eye on what else was happening in
the infrastructure at the time.

FWIW, we've seen this in a local VMware environment as well as when we
were running on Amazon EC2 instances, so it doesn't seem
hypervisor-specific.

On Wed, May 27, 2015 at 5:53 PM, Eric Sandeen <sandeen@sandeen.net> wrote:
> And did anything else "interesting" happen prior to the detection?


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: XFS crashes on VMs
  2015-05-28 18:08         ` Shrinand Javadekar
@ 2015-06-19 18:34           ` Shrinand Javadekar
  2015-06-19 19:37             ` Eric Sandeen
  0 siblings, 1 reply; 9+ messages in thread
From: Shrinand Javadekar @ 2015-06-19 18:34 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: xfs@oss.sgi.com

[-- Attachment #1: Type: text/plain, Size: 4067 bytes --]

I hit this problem again and captured the output of all the steps
while repairing the filesystem. Here's the crash:
http://pastie.org/private/prift1xjcc38s0jcvehvew

And the output of the xfs_repair steps (also attached if needed):
http://pastie.org/private/gvq3aiisudfhy69ezagw

Hope this can provide some insights.

-Shri

On Thu, May 28, 2015 at 11:08 AM, Shrinand Javadekar
<shrinand@maginatics.com> wrote:
> We'll try and reproduce this and capture the output of xfs_repair when
> it happens next. Will keep an eye on what else was happening in the
> infrastructure when it happens.
>
> FWIW, we've seen this in local VMware environment as well as when we
> were running on Amazon EC2 instances. So it doesn't seem hypervisor
> specific.
>
> On Wed, May 27, 2015 at 5:53 PM, Eric Sandeen <sandeen@sandeen.net> wrote:
>> And did anything else "interesting" happen prior to the detection?

[-- Attachment #2: xfs_crash --]
[-- Type: application/octet-stream, Size: 2074 bytes --]

root@foods-12:/home/maginatics# xfs_repair /dev/mapper/TrollGroup-TrollVolume
Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
        ERROR: The filesystem has valuable metadata changes in a log which needs to
        be replayed.  Mount the filesystem to replay the log, and unmount it before
        re-running xfs_repair.  If you are unable to mount the filesystem, then use
        the -L option to destroy the log and attempt a repair.
        Note that destroying the log may cause corruption -- please attempt a mount
        of the filesystem before doing this.


root@foods-12:/home/maginatics# mount /dev/mapper/TrollGroup-TrollVolume
root@foods-12:/home/maginatics# mount
...
/dev/mapper/TrollGroup-TrollVolume on /lvm type xfs (rw,noexec,nosuid,nodev,noatime,nodiratime,nobarrier,logbufs=8)

root@foods-12:/home/maginatics# umount /lvm

root@foods-12:/home/maginatics# mount
<no /lvm>

root@foods-12:/home/maginatics# xfs_repair /dev/mapper/TrollGroup-TrollVolume
root@foods-12:/home/maginatics# tail -f repair_output
Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
Phase 5 - rebuild AG headers and trees...
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - traversing filesystem ...
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...
done


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: XFS crashes on VMs
  2015-06-19 18:34           ` Shrinand Javadekar
@ 2015-06-19 19:37             ` Eric Sandeen
  2015-08-03 19:11               ` Shrinand Javadekar
  0 siblings, 1 reply; 9+ messages in thread
From: Eric Sandeen @ 2015-06-19 19:37 UTC (permalink / raw)
  To: Shrinand Javadekar; +Cc: xfs@oss.sgi.com

On 6/19/15 1:34 PM, Shrinand Javadekar wrote:
> I hit this problem again and captured the output of all the steps
> while repairing the filesystem. Here's the crash:
> http://pastie.org/private/prift1xjcc38s0jcvehvew

that starts with:

Jun 18 18:40:19 foods-12 kernel: [3639696.006884] ffff8801740f8000: 00 00 00 00 00 00 00 00 fb ee 00 00 00 00 00 00  ................
Jun 18 18:40:19 foods-12 kernel: [3639696.007056] ffff8801740f8010: 10 00 00 00 00 20 0f e0 00 00 00 00 00 00 00 00  ..... ..........
Jun 18 18:40:19 foods-12 kernel: [3639696.007140] ffff8801740f8020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Jun 18 18:40:19 foods-12 kernel: [3639696.007230] ffff8801740f8030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................

I think there should have been other interesting bits prior to that line; can you check and provide them, please?  Full dmesg in a pastebin would be just fine.

xfs_attr3_leaf_write_verify at line 216 of file /build/buildd/linux-lts-trusty-3.13.0/fs/xfs/xfs_attr_leaf.c.  Caller 0xffffffffa00a193a

which is ... interesting; something went wrong on the way _to_ disk?

OK, what is wrong, then?  Here are the first 64 bytes of the buffer;
they contain:

typedef struct xfs_attr_leafblock {
        xfs_attr_leaf_hdr_t     hdr;    /* constant-structure header block */

where

typedef struct xfs_attr_leaf_hdr {      /* constant-structure header block */
        xfs_da_blkinfo_t info;          /* block type, links, etc. */
        __be16  count;                  /* count of active leaf_entry's */
        __be16  usedbytes;              /* num bytes of names/values stored */
        __be16  firstused;              /* first used byte in name area */
        __u8    holes;                  /* != 0 if blk needs compaction */
        __u8    pad1;
        xfs_attr_leaf_map_t freemap[XFS_ATTR_LEAF_MAPSIZE];
                                        /* N largest free regions */
} xfs_attr_leaf_hdr_t;

and

typedef struct xfs_da_blkinfo {
        __be32          forw;                   /* previous block in list */
        __be32          back;                   /* following block in list */
        __be16          magic;                  /* validity check on block */
        __be16          pad;                    /* unused */
} xfs_da_blkinfo_t;

so:

00 00 00 00 00 00 00 00 fb ee 00 00 00 00 00 00
|   forw   |    back   |magic| pad |count|used|
10 00 00 00 00 20 0f e0 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

and the only thing the verifier checks on non-crc is
magic (which is good), and count (which is what tripped here)

        if (xfs_sb_version_hascrc(&mp->m_sb)) {
		<snip>
        } else {
                if (ichdr.magic != XFS_ATTR_LEAF_MAGIC)
                        return false;
        }
        if (ichdr.count == 0)
                return false;

so this failed to verify because count was 0.

> And the output of the xfs_repair steps (also attached if needed):
> http://pastie.org/private/gvq3aiisudfhy69ezagw

OK, no on-disk corruption; that's good.

Can you please provide as much info as possible about your system
and setup?

http://xfs.org/index.php/XFS_FAQ#Q:_What_information_should_I_include_when_reporting_a_problem.3F

-Eric

> Hope this can provide some insights.
> 
> -Shri


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: XFS crashes on VMs
  2015-06-19 19:37             ` Eric Sandeen
@ 2015-08-03 19:11               ` Shrinand Javadekar
  0 siblings, 0 replies; 9+ messages in thread
From: Shrinand Javadekar @ 2015-08-03 19:11 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: xfs@oss.sgi.com

We hit this again on one of our VMs, this one running the 3.13 kernel.
So we have now seen this crash on both 3.13 and 3.16 kernels. We had
another setup with a 3.8 kernel that hasn't seen this problem in
several months. Is there a way to narrow down what changed between 3.8
and 3.13 and get to the bottom of this?

I had provided info about the workload on a different thread:
http://oss.sgi.com/archives/xfs/2015-06/msg00108.html

If that doesn't work, let me know and I can get it again.

-Shri




^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2015-08-03 19:11 UTC | newest]

Thread overview: 9+ messages
2015-05-27 23:06 XFS crashes on VMs Shrinand Javadekar
2015-05-27 23:27 ` Eric Sandeen
2015-05-28  0:03   ` Shrinand Javadekar
2015-05-28  0:52     ` Eric Sandeen
2015-05-28  0:53       ` Eric Sandeen
2015-05-28 18:08         ` Shrinand Javadekar
2015-06-19 18:34           ` Shrinand Javadekar
2015-06-19 19:37             ` Eric Sandeen
2015-08-03 19:11               ` Shrinand Javadekar

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox