* XFS crashes on VMs
@ 2015-05-27 23:06 Shrinand Javadekar
2015-05-27 23:27 ` Eric Sandeen
0 siblings, 1 reply; 9+ messages in thread
From: Shrinand Javadekar @ 2015-05-27 23:06 UTC (permalink / raw)
To: xfs
Hi,
I am running OpenStack Swift in a VM with XFS as the underlying
filesystem. This is generating a metadata-heavy workload on XFS.
Essentially, it is creating a new directory and a new file (256KB) in
that directory. This file has extended attributes of size 243 bytes.
I am seeing the following two crashes of the machine:
http://pastie.org/pastes/10210974/text?key=xdmfvaocvawnyfmkb06zg
AND
http://pastie.org/pastes/10210975/text?key=rkiljsdaucrk7frprzgqq
I have only seen these when running in a VM. We have run several tests
on physical servers but have never seen these problems.
Are there any known issues with XFS running on VMs?
Thanks in advance.
-Shri
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: XFS crashes on VMs
2015-05-27 23:06 XFS crashes on VMs Shrinand Javadekar
@ 2015-05-27 23:27 ` Eric Sandeen
2015-05-28 0:03 ` Shrinand Javadekar
0 siblings, 1 reply; 9+ messages in thread
From: Eric Sandeen @ 2015-05-27 23:27 UTC (permalink / raw)
To: Shrinand Javadekar; +Cc: xfs@oss.sgi.com
That's not a crash. That is xfs detecting on-disk corruption which likely happened at some time prior. You should unmount and run xfs_repair, possibly with -n first if you would like to do a dry run to see what it might do. If you get fresh corruption after a full repair, then that becomes more interesting. It's possible that you have a problem with the underlying block layer or it's possible that it is an xfs bug - but I think this is not something that we have seen before.
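The unmount + dry-run sequence described above can be sketched as a short shell session; the device path here is a hypothetical example, not one from this thread.

```shell
# Sketch of the dry-run repair procedure described above.
# DEV is a hypothetical example device; substitute your own.
DEV=/dev/sdb1
if [ -b "$DEV" ]; then
    umount "$DEV" 2>/dev/null     # xfs_repair requires the filesystem unmounted
    xfs_repair -n "$DEV"          # -n: report what would be fixed, change nothing
    # xfs_repair "$DEV"           # real repair, after reviewing the -n output
else
    echo "no block device at $DEV; set DEV for your system"
fi
```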
Eric
> On May 27, 2015, at 6:06 PM, Shrinand Javadekar <shrinand@maginatics.com> wrote:
>
> Hi,
>
> I am running OpenStack Swift in a VM with XFS as the underlying
> filesystem. This is generating a metadata-heavy workload on XFS.
> Essentially, it is creating a new directory and a new file (256KB) in
> that directory. This file has extended attributes of size 243 bytes.
>
> I am seeing the following two crashes of the machine:
>
> http://pastie.org/pastes/10210974/text?key=xdmfvaocvawnyfmkb06zg
>
> AND
>
> http://pastie.org/pastes/10210975/text?key=rkiljsdaucrk7frprzgqq
>
> I have only seen these when running in a VM. We have run several tests
> on physical servers but have never seen these problems.
>
> Are there any known issues with XFS running on VMs?
>
> Thanks in advance.
> -Shri
>
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: XFS crashes on VMs
2015-05-27 23:27 ` Eric Sandeen
@ 2015-05-28 0:03 ` Shrinand Javadekar
2015-05-28 0:52 ` Eric Sandeen
0 siblings, 1 reply; 9+ messages in thread
From: Shrinand Javadekar @ 2015-05-28 0:03 UTC (permalink / raw)
To: Eric Sandeen; +Cc: xfs@oss.sgi.com
Thanks Eric,
We ran xfs_repair and were able to get it back into a running state.
This is fine for test & dev, but in production it won't be
acceptable. What other data do we need to get to the bottom of this?
On Wed, May 27, 2015 at 4:27 PM, Eric Sandeen <sandeen@sandeen.net> wrote:
> That's not a crash. That is xfs detecting on-disk corruption which likely happened at some time prior. You should unmount and run xfs_repair, possibly with -n first if you would like to do a dry run to see what it might do. If you get fresh corruption after a full repair, then that becomes more interesting. It's possible that you have a problem with the underlying block layer or it's possible that it is an xfs bug - but I think this is not something that we have seen before.
>
> Eric
>
>> On May 27, 2015, at 6:06 PM, Shrinand Javadekar <shrinand@maginatics.com> wrote:
>>
>> Hi,
>>
>> I am running OpenStack Swift in a VM with XFS as the underlying
>> filesystem. This is generating a metadata-heavy workload on XFS.
>> Essentially, it is creating a new directory and a new file (256KB) in
>> that directory. This file has extended attributes of size 243 bytes.
>>
>> I am seeing the following two crashes of the machine:
>>
>> http://pastie.org/pastes/10210974/text?key=xdmfvaocvawnyfmkb06zg
>>
>> AND
>>
>> http://pastie.org/pastes/10210975/text?key=rkiljsdaucrk7frprzgqq
>>
>> I have only seen these when running in a VM. We have run several tests
>> on physical servers but have never seen these problems.
>>
>> Are there any known issues with XFS running on VMs?
>>
>> Thanks in advance.
>> -Shri
>>
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: XFS crashes on VMs
2015-05-28 0:03 ` Shrinand Javadekar
@ 2015-05-28 0:52 ` Eric Sandeen
2015-05-28 0:53 ` Eric Sandeen
0 siblings, 1 reply; 9+ messages in thread
From: Eric Sandeen @ 2015-05-28 0:52 UTC (permalink / raw)
To: Shrinand Javadekar; +Cc: xfs@oss.sgi.com
You'll need to try to narrow down how it happened.
The hexdumps in the logs show what data was in the buffer; in one case it was ASCII, and was definitely not xfs metadata.
Either:
a) xfs wrote the wrong metadata - almost impossible, because we verify the data on write in the same way as we do on read
b) xfs read the wrong block due to other metadata corruption.
c) something corrupted the storage after it was written
d) the storage returned the wrong data on a read request ...
e) ???
Did you save the xfs_repair output? That might offer more clues.
Unless you can reproduce it, it'll be hard to come up with a definitive root cause... can you try?
-Eric
On 5/27/15 7:03 PM, Shrinand Javadekar wrote:
> Thanks Eric,
>
> We ran xfs_repair and were able to get it back into a running state.
> This is fine for test & dev, but in production it won't be
> acceptable. What other data do we need to get to the bottom of this?
>
> On Wed, May 27, 2015 at 4:27 PM, Eric Sandeen <sandeen@sandeen.net> wrote:
>> That's not a crash. That is xfs detecting on-disk corruption which likely happened at some time prior. You should unmount and run xfs_repair, possibly with -n first if you would like to do a dry run to see what it might do. If you get fresh corruption after a full repair, then that becomes more interesting. It's possible that you have a problem with the underlying block layer or it's possible that it is an xfs bug - but I think this is not something that we have seen before.
>>
>> Eric
>>
>>> On May 27, 2015, at 6:06 PM, Shrinand Javadekar <shrinand@maginatics.com> wrote:
>>>
>>> Hi,
>>>
>>> I am running OpenStack Swift in a VM with XFS as the underlying
>>> filesystem. This is generating a metadata-heavy workload on XFS.
>>> Essentially, it is creating a new directory and a new file (256KB) in
>>> that directory. This file has extended attributes of size 243 bytes.
>>>
>>> I am seeing the following two crashes of the machine:
>>>
>>> http://pastie.org/pastes/10210974/text?key=xdmfvaocvawnyfmkb06zg
>>>
>>> AND
>>>
>>> http://pastie.org/pastes/10210975/text?key=rkiljsdaucrk7frprzgqq
>>>
>>> I have only seen these when running in a VM. We have run several tests
>>> on physical servers but have never seen these problems.
>>>
>>> Are there any known issues with XFS running on VMs?
>>>
>>> Thanks in advance.
>>> -Shri
>>>
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: XFS crashes on VMs
2015-05-28 0:52 ` Eric Sandeen
@ 2015-05-28 0:53 ` Eric Sandeen
2015-05-28 18:08 ` Shrinand Javadekar
0 siblings, 1 reply; 9+ messages in thread
From: Eric Sandeen @ 2015-05-28 0:53 UTC (permalink / raw)
To: Shrinand Javadekar; +Cc: xfs@oss.sgi.com
And did anything else "interesting" happen prior to the detection?
> On May 27, 2015, at 7:52 PM, Eric Sandeen <sandeen@sandeen.net> wrote:
>
> You'll need to try to narrow down how it happened.
>
> The hexdumps in the logs show what data was in the buffer; in one case it was ASCII, and was definitely not xfs metadata.
>
> Either:
>
> a) xfs wrote the wrong metadata - almost impossible, because we verify the data on write in the same way as we do on read
>
> b) xfs read the wrong block due to other metadata corruption.
>
> c) something corrupted the storage after it was written
>
> d) the storage returned the wrong data on a read request ...
>
> e) ???
>
> Did you save the xfs_repair output? That might offer more clues.
>
> Unless you can reproduce it, it'll be hard to come up with a definitive root cause... can you try?
>
> -Eric
>
>
>> On 5/27/15 7:03 PM, Shrinand Javadekar wrote:
>> Thanks Eric,
>>
>> We ran xfs_repair and were able to get it back into a running state.
>> This is fine for test & dev, but in production it won't be
>> acceptable. What other data do we need to get to the bottom of this?
>>
>>> On Wed, May 27, 2015 at 4:27 PM, Eric Sandeen <sandeen@sandeen.net> wrote:
>>> That's not a crash. That is xfs detecting on-disk corruption which likely happened at some time prior. You should unmount and run xfs_repair, possibly with -n first if you would like to do a dry run to see what it might do. If you get fresh corruption after a full repair, then that becomes more interesting. It's possible that you have a problem with the underlying block layer or it's possible that it is an xfs bug - but I think this is not something that we have seen before.
>>>
>>> Eric
>>>
>>>> On May 27, 2015, at 6:06 PM, Shrinand Javadekar <shrinand@maginatics.com> wrote:
>>>>
>>>> Hi,
>>>>
>>>> I am running OpenStack Swift in a VM with XFS as the underlying
>>>> filesystem. This is generating a metadata-heavy workload on XFS.
>>>> Essentially, it is creating a new directory and a new file (256KB) in
>>>> that directory. This file has extended attributes of size 243 bytes.
>>>>
>>>> I am seeing the following two crashes of the machine:
>>>>
>>>> http://pastie.org/pastes/10210974/text?key=xdmfvaocvawnyfmkb06zg
>>>>
>>>> AND
>>>>
>>>> http://pastie.org/pastes/10210975/text?key=rkiljsdaucrk7frprzgqq
>>>>
>>>> I have only seen these when running in a VM. We have run several tests
>>>> on physical servers but have never seen these problems.
>>>>
>>>> Are there any known issues with XFS running on VMs?
>>>>
>>>> Thanks in advance.
>>>> -Shri
>>>>
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: XFS crashes on VMs
2015-05-28 0:53 ` Eric Sandeen
@ 2015-05-28 18:08 ` Shrinand Javadekar
2015-06-19 18:34 ` Shrinand Javadekar
0 siblings, 1 reply; 9+ messages in thread
From: Shrinand Javadekar @ 2015-05-28 18:08 UTC (permalink / raw)
To: Eric Sandeen; +Cc: xfs@oss.sgi.com
We'll try to reproduce this and capture the output of xfs_repair when
it happens next. We'll also keep an eye on what else is happening in
the infrastructure at the time.
FWIW, we've seen this in a local VMware environment as well as when we
were running on Amazon EC2 instances, so it doesn't seem
hypervisor-specific.
On Wed, May 27, 2015 at 5:53 PM, Eric Sandeen <sandeen@sandeen.net> wrote:
> And did anything else "interesting" happen prior to the detection?
>
>> On May 27, 2015, at 7:52 PM, Eric Sandeen <sandeen@sandeen.net> wrote:
>>
>> You'll need to try to narrow down how it happened.
>>
>> The hexdumps in the logs show what data was in the buffer; in one case it was ASCII, and was definitely not xfs metadata.
>>
>> Either:
>>
>> a) xfs wrote the wrong metadata - almost impossible, because we verify the data on write in the same way as we do on read
>>
>> b) xfs read the wrong block due to other metadata corruption.
>>
>> c) something corrupted the storage after it was written
>>
>> d) the storage returned the wrong data on a read request ...
>>
>> e) ???
>>
>> Did you save the xfs_repair output? That might offer more clues.
>>
>> Unless you can reproduce it, it'll be hard to come up with a definitive root cause... can you try?
>>
>> -Eric
>>
>>
>>> On 5/27/15 7:03 PM, Shrinand Javadekar wrote:
>>> Thanks Eric,
>>>
>>> We ran xfs_repair and were able to get it back into a running state.
>>> This is fine for test & dev, but in production it won't be
>>> acceptable. What other data do we need to get to the bottom of this?
>>>
>>>> On Wed, May 27, 2015 at 4:27 PM, Eric Sandeen <sandeen@sandeen.net> wrote:
>>>> That's not a crash. That is xfs detecting on-disk corruption which likely happened at some time prior. You should unmount and run xfs_repair, possibly with -n first if you would like to do a dry run to see what it might do. If you get fresh corruption after a full repair, then that becomes more interesting. It's possible that you have a problem with the underlying block layer or it's possible that it is an xfs bug - but I think this is not something that we have seen before.
>>>>
>>>> Eric
>>>>
>>>>> On May 27, 2015, at 6:06 PM, Shrinand Javadekar <shrinand@maginatics.com> wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> I am running OpenStack Swift in a VM with XFS as the underlying
>>>>> filesystem. This is generating a metadata-heavy workload on XFS.
>>>>> Essentially, it is creating a new directory and a new file (256KB) in
>>>>> that directory. This file has extended attributes of size 243 bytes.
>>>>>
>>>>> I am seeing the following two crashes of the machine:
>>>>>
>>>>> http://pastie.org/pastes/10210974/text?key=xdmfvaocvawnyfmkb06zg
>>>>>
>>>>> AND
>>>>>
>>>>> http://pastie.org/pastes/10210975/text?key=rkiljsdaucrk7frprzgqq
>>>>>
>>>>> I have only seen these when running in a VM. We have run several tests
>>>>> on physical servers but have never seen these problems.
>>>>>
>>>>> Are there any known issues with XFS running on VMs?
>>>>>
>>>>> Thanks in advance.
>>>>> -Shri
>>>>>
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: XFS crashes on VMs
2015-05-28 18:08 ` Shrinand Javadekar
@ 2015-06-19 18:34 ` Shrinand Javadekar
2015-06-19 19:37 ` Eric Sandeen
0 siblings, 1 reply; 9+ messages in thread
From: Shrinand Javadekar @ 2015-06-19 18:34 UTC (permalink / raw)
To: Eric Sandeen; +Cc: xfs@oss.sgi.com
[-- Attachment #1: Type: text/plain, Size: 4067 bytes --]
I hit this problem again and captured the output of all the steps
while repairing the filesystem. Here's the crash:
http://pastie.org/private/prift1xjcc38s0jcvehvew
And the output of the xfs_repair steps (also attached if needed):
http://pastie.org/private/gvq3aiisudfhy69ezagw
Hope this can provide some insights.
-Shri
On Thu, May 28, 2015 at 11:08 AM, Shrinand Javadekar
<shrinand@maginatics.com> wrote:
> We'll try to reproduce this and capture the output of xfs_repair when
> it happens next. We'll also keep an eye on what else is happening in
> the infrastructure at the time.
>
> FWIW, we've seen this in a local VMware environment as well as when we
> were running on Amazon EC2 instances, so it doesn't seem
> hypervisor-specific.
>
> On Wed, May 27, 2015 at 5:53 PM, Eric Sandeen <sandeen@sandeen.net> wrote:
>> And did anything else "interesting" happen prior to the detection?
>>
>>> On May 27, 2015, at 7:52 PM, Eric Sandeen <sandeen@sandeen.net> wrote:
>>>
>>> You'll need to try to narrow down how it happened.
>>>
>>> The hexdumps in the logs show what data was in the buffer; in one case it was ASCII, and was definitely not xfs metadata.
>>>
>>> Either:
>>>
>>> a) xfs wrote the wrong metadata - almost impossible, because we verify the data on write in the same way as we do on read
>>>
>>> b) xfs read the wrong block due to other metadata corruption.
>>>
>>> c) something corrupted the storage after it was written
>>>
>>> d) the storage returned the wrong data on a read request ...
>>>
>>> e) ???
>>>
>>> Did you save the xfs_repair output? That might offer more clues.
>>>
>>> Unless you can reproduce it, it'll be hard to come up with a definitive root cause... can you try?
>>>
>>> -Eric
>>>
>>>
>>>> On 5/27/15 7:03 PM, Shrinand Javadekar wrote:
>>>> Thanks Eric,
>>>>
>>>> We ran xfs_repair and were able to get it back into a running state.
>>>> This is fine for test & dev, but in production it won't be
>>>> acceptable. What other data do we need to get to the bottom of this?
>>>>
>>>>> On Wed, May 27, 2015 at 4:27 PM, Eric Sandeen <sandeen@sandeen.net> wrote:
>>>>> That's not a crash. That is xfs detecting on-disk corruption which likely happened at some time prior. You should unmount and run xfs_repair, possibly with -n first if you would like to do a dry run to see what it might do. If you get fresh corruption after a full repair, then that becomes more interesting. It's possible that you have a problem with the underlying block layer or it's possible that it is an xfs bug - but I think this is not something that we have seen before.
>>>>>
>>>>> Eric
>>>>>
>>>>>> On May 27, 2015, at 6:06 PM, Shrinand Javadekar <shrinand@maginatics.com> wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I am running OpenStack Swift in a VM with XFS as the underlying
>>>>>> filesystem. This is generating a metadata-heavy workload on XFS.
>>>>>> Essentially, it is creating a new directory and a new file (256KB) in
>>>>>> that directory. This file has extended attributes of size 243 bytes.
>>>>>>
>>>>>> I am seeing the following two crashes of the machine:
>>>>>>
>>>>>> http://pastie.org/pastes/10210974/text?key=xdmfvaocvawnyfmkb06zg
>>>>>>
>>>>>> AND
>>>>>>
>>>>>> http://pastie.org/pastes/10210975/text?key=rkiljsdaucrk7frprzgqq
>>>>>>
>>>>>> I have only seen these when running in a VM. We have run several tests
>>>>>> on physical servers but have never seen these problems.
>>>>>>
>>>>>> Are there any known issues with XFS running on VMs?
>>>>>>
>>>>>> Thanks in advance.
>>>>>> -Shri
>>>>>>
[-- Attachment #2: xfs_crash --]
[-- Type: application/octet-stream, Size: 2074 bytes --]
root@foods-12:/home/maginatics# xfs_repair /dev/mapper/TrollGroup-TrollVolume
Phase 1 - find and verify superblock...
Phase 2 - using internal log
- zero log...
ERROR: The filesystem has valuable metadata changes in a log which needs to
be replayed. Mount the filesystem to replay the log, and unmount it before
re-running xfs_repair. If you are unable to mount the filesystem, then use
the -L option to destroy the log and attempt a repair.
Note that destroying the log may cause corruption -- please attempt a mount
of the filesystem before doing this.
root@foods-12:/home/maginatics# mount /dev/mapper/TrollGroup-TrollVolume
root@foods-12:/home/maginatics# mount
...
/dev/mapper/TrollGroup-TrollVolume on /lvm type xfs (rw,noexec,nosuid,nodev,noatime,nodiratime,nobarrier,logbufs=8)
root@foods-12:/home/maginatics# umount /lvm
root@foods-12:/home/maginatics# mount
<no /lvm>
root@foods-12:/home/maginatics: xfs_repair /dev/mapper/TrollGroup-TrollVolume
root@foods-12:/home/maginatics# tail -f repair_output
Phase 1 - find and verify superblock...
Phase 2 - using internal log
- zero log...
- scan filesystem freespace and inode maps...
- found root inode chunk
Phase 3 - for each AG...
- scan and clear agi unlinked lists...
- process known inodes and perform inode discovery...
- agno = 0
- agno = 1
- agno = 2
- agno = 3
- process newly discovered inodes...
Phase 4 - check for duplicate blocks...
- setting up duplicate extent list...
- check for inodes claiming duplicate blocks...
- agno = 0
- agno = 1
- agno = 2
- agno = 3
Phase 5 - rebuild AG headers and trees...
- reset superblock...
Phase 6 - check inode connectivity...
- resetting contents of realtime bitmap and summary inodes
- traversing filesystem ...
- traversal finished ...
- moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...
done
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: XFS crashes on VMs
2015-06-19 18:34 ` Shrinand Javadekar
@ 2015-06-19 19:37 ` Eric Sandeen
2015-08-03 19:11 ` Shrinand Javadekar
0 siblings, 1 reply; 9+ messages in thread
From: Eric Sandeen @ 2015-06-19 19:37 UTC (permalink / raw)
To: Shrinand Javadekar; +Cc: xfs@oss.sgi.com
On 6/19/15 1:34 PM, Shrinand Javadekar wrote:
> I hit this problem again and captured the output of all the steps
> while repairing the filesystem. Here's the crash:
> http://pastie.org/private/prift1xjcc38s0jcvehvew
that starts with:
Jun 18 18:40:19 foods-12 kernel: [3639696.006884] ffff8801740f8000: 00 00 00 00 00 00 00 00 fb ee 00 00 00 00 00 00 ................
Jun 18 18:40:19 foods-12 kernel: [3639696.007056] ffff8801740f8010: 10 00 00 00 00 20 0f e0 00 00 00 00 00 00 00 00 ..... ..........
Jun 18 18:40:19 foods-12 kernel: [3639696.007140] ffff8801740f8020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
Jun 18 18:40:19 foods-12 kernel: [3639696.007230] ffff8801740f8030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
I think there should have been other interesting bits prior to that line; can you check and provide them, please? A full dmesg in a pastebin would be just fine.
xfs_attr3_leaf_write_verify at line 216 of file /build/buildd/linux-lts-trusty-3.13.0/fs/xfs/xfs_attr_leaf.c. Caller 0xffffffffa00a193a
which is ... interesting; something went wrong on the way _to_ disk?
Ok, what is wrong, then? Here's the first 64 bytes of the buffer;
it contains:
typedef struct xfs_attr_leafblock {
xfs_attr_leaf_hdr_t hdr; /* constant-structure header block */
where
typedef struct xfs_attr_leaf_hdr { /* constant-structure header block */
xfs_da_blkinfo_t info; /* block type, links, etc. */
__be16 count; /* count of active leaf_entry's */
__be16 usedbytes; /* num bytes of names/values stored */
__be16 firstused; /* first used byte in name area */
__u8 holes; /* != 0 if blk needs compaction */
__u8 pad1;
xfs_attr_leaf_map_t freemap[XFS_ATTR_LEAF_MAPSIZE];
/* N largest free regions */
} xfs_attr_leaf_hdr_t;
and
typedef struct xfs_da_blkinfo {
__be32 forw; /* previous block in list */
__be32 back; /* following block in list */
__be16 magic; /* validity check on block */
__be16 pad; /* unused */
} xfs_da_blkinfo_t;
so:
00 00 00 00 00 00 00 00 fb ee 00 00 00 00 00 00
| forw | back |magic| pad |count|used|
10 00 00 00 00 20 0f e0 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
and the only thing the verifier checks on non-crc is
magic (which is good), and count (which is what tripped here)
if (xfs_sb_version_hascrc(&mp->m_sb)) {
<snip>
} else {
if (ichdr.magic != XFS_ATTR_LEAF_MAGIC)
return false;
}
if (ichdr.count == 0)
return false;
so this failed to verify because count was 0.
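Eric's decode can be checked mechanically. Here is a small shell sketch parsing the first 16 bytes of the dumped buffer according to the big-endian layout quoted above:

```shell
# Parse the first 16 bytes of the hexdump above using the
# xfs_da_blkinfo + count/usedbytes layout (all fields big-endian).
set -- 00 00 00 00 00 00 00 00 fb ee 00 00 00 00 00 00
forw=$((0x$1$2$3$4))        # xfs_da_blkinfo.forw
back=$((0x$5$6$7$8))        # xfs_da_blkinfo.back
magic=$((0x$9${10}))        # xfs_da_blkinfo.magic
count=$((0x${13}${14}))     # xfs_attr_leaf_hdr.count
printf 'forw=%d back=%d magic=0x%04x count=%d\n' "$forw" "$back" "$magic" "$count"
# prints: forw=0 back=0 magic=0xfbee count=0
# magic 0xfbee is XFS_ATTR_LEAF_MAGIC, so the magic check passes;
# count == 0 is exactly what trips the verifier.
```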
> And the output of the xfs_repair steps (also attached if needed):
> http://pastie.org/private/gvq3aiisudfhy69ezagw
Ok, no on-disk corruption, that's good.
Can you please provide as much info as possible about your system
and setup?
http://xfs.org/index.php/XFS_FAQ#Q:_What_information_should_I_include_when_reporting_a_problem.3F
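A sketch of gathering some of the details that FAQ entry asks for; the mountpoint /lvm is taken from the repair transcript earlier in the thread, so adjust it for your system:

```shell
# Collect basic report information for the XFS list; commands are
# guarded so the script degrades gracefully on other systems.
kver=$(uname -r)                                         # kernel version
echo "kernel: $kver"
command -v xfs_repair >/dev/null 2>&1 && xfs_repair -V   # xfsprogs version
grep xfs /proc/mounts || echo "no xfs mounts visible"    # mount options in effect
command -v xfs_info >/dev/null 2>&1 && xfs_info /lvm 2>/dev/null || true
```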
-Eric
> Hope this can provide some insights.
>
> -Shri
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: XFS crashes on VMs
2015-06-19 19:37 ` Eric Sandeen
@ 2015-08-03 19:11 ` Shrinand Javadekar
0 siblings, 0 replies; 9+ messages in thread
From: Shrinand Javadekar @ 2015-08-03 19:11 UTC (permalink / raw)
To: Eric Sandeen; +Cc: xfs@oss.sgi.com
We hit this again on one of our VMs. This one is running the 3.13
kernel, so now we have seen this crash on both 3.16 and 3.13 kernels.
We had another setup with a 3.8 kernel and in several months we never
saw this problem. Is there a way to narrow down what changed between
3.8 and 3.13 and get to the bottom of this?
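One way to narrow it down, assuming the corruption can be reproduced on demand, is to bisect the kernel between the good and bad versions. A hypothetical sketch (KSRC is a placeholder for a clone of the kernel git tree):

```shell
# Hypothetical sketch: git bisect between v3.8 (good) and v3.13 (bad).
KSRC=${KSRC:-/path/to/linux}    # placeholder: a clone of the kernel tree
if [ -d "$KSRC/.git" ]; then
    cd "$KSRC"
    git log --oneline v3.8..v3.13 -- fs/xfs | wc -l   # XFS commits in the window
    git bisect start v3.13 v3.8                       # bad revision first, then good
    # at each step: build and boot the kernel, rerun the Swift workload, then
    #   git bisect bad    (corruption reproduced)
    #   git bisect good   (workload ran clean)
else
    echo "no kernel tree at $KSRC; clone one and set KSRC"
fi
```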
I had provided info about the workload on a different thread:
http://oss.sgi.com/archives/xfs/2015-06/msg00108.html
If that doesn't work, let me know and I can get it again.
-Shri
On Fri, Jun 19, 2015 at 12:37 PM, Eric Sandeen <sandeen@sandeen.net> wrote:
> On 6/19/15 1:34 PM, Shrinand Javadekar wrote:
>> I hit this problem again and captured the output of all the steps
>> while repairing the filesystem. Here's the crash:
>> http://pastie.org/private/prift1xjcc38s0jcvehvew
>
> that starts with:
>
> Jun 18 18:40:19 foods-12 kernel: [3639696.006884] ffff8801740f8000: 00 00 00 00 00 00 00 00 fb ee 00 00 00 00 00 00 ................
> Jun 18 18:40:19 foods-12 kernel: [3639696.007056] ffff8801740f8010: 10 00 00 00 00 20 0f e0 00 00 00 00 00 00 00 00 ..... ..........
> Jun 18 18:40:19 foods-12 kernel: [3639696.007140] ffff8801740f8020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
> Jun 18 18:40:19 foods-12 kernel: [3639696.007230] ffff8801740f8030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
>
> I think there should have been other interesting bits prior to that line; can you check and provide them, please? A full dmesg in a pastebin would be just fine.
>
> xfs_attr3_leaf_write_verify at line 216 of file /build/buildd/linux-lts-trusty-3.13.0/fs/xfs/xfs_attr_leaf.c. Caller 0xffffffffa00a193a
>
> which is ... interesting; something went wrong on the way _to_ disk?
>
> Ok, what is wrong, then? Here's the first 64 bytes of the buffer;
> it contains:
>
> typedef struct xfs_attr_leafblock {
> xfs_attr_leaf_hdr_t hdr; /* constant-structure header block */
>
> where
>
> typedef struct xfs_attr_leaf_hdr { /* constant-structure header block */
> xfs_da_blkinfo_t info; /* block type, links, etc. */
> __be16 count; /* count of active leaf_entry's */
> __be16 usedbytes; /* num bytes of names/values stored */
> __be16 firstused; /* first used byte in name area */
> __u8 holes; /* != 0 if blk needs compaction */
> __u8 pad1;
> xfs_attr_leaf_map_t freemap[XFS_ATTR_LEAF_MAPSIZE];
> /* N largest free regions */
> } xfs_attr_leaf_hdr_t;
>
> and
>
> typedef struct xfs_da_blkinfo {
> __be32 forw; /* previous block in list */
> __be32 back; /* following block in list */
> __be16 magic; /* validity check on block */
> __be16 pad; /* unused */
> } xfs_da_blkinfo_t;
>
> so:
>
> 00 00 00 00 00 00 00 00 fb ee 00 00 00 00 00 00
> | forw | back |magic| pad |count|used|
> 10 00 00 00 00 20 0f e0 00 00 00 00 00 00 00 00
> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>
> and the only thing the verifier checks on non-crc is
> magic (which is good), and count (which is what tripped here)
>
> if (xfs_sb_version_hascrc(&mp->m_sb)) {
> <snip>
> } else {
> if (ichdr.magic != XFS_ATTR_LEAF_MAGIC)
> return false;
> }
> if (ichdr.count == 0)
> return false;
>
> so this failed to verify because count was 0.
>
>> And the output of the xfs_repair steps (also attached if needed):
>> http://pastie.org/private/gvq3aiisudfhy69ezagw
>
> Ok, no on-disk corruption, that's good.
>
> Can you please provide as much info as possible about your system
> and setup?
>
> http://xfs.org/index.php/XFS_FAQ#Q:_What_information_should_I_include_when_reporting_a_problem.3F
>
> -Eric
>
>> Hope this can provide some insights.
>>
>> -Shri
>
^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2015-08-03 19:11 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-05-27 23:06 XFS crashes on VMs Shrinand Javadekar
2015-05-27 23:27 ` Eric Sandeen
2015-05-28 0:03 ` Shrinand Javadekar
2015-05-28 0:52 ` Eric Sandeen
2015-05-28 0:53 ` Eric Sandeen
2015-05-28 18:08 ` Shrinand Javadekar
2015-06-19 18:34 ` Shrinand Javadekar
2015-06-19 19:37 ` Eric Sandeen
2015-08-03 19:11 ` Shrinand Javadekar
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox