* XFS crashes on VMs
@ 2015-05-27 23:06 Shrinand Javadekar
  2015-05-27 23:27 ` Eric Sandeen
  0 siblings, 1 reply; 9+ messages in thread
From: Shrinand Javadekar @ 2015-05-27 23:06 UTC (permalink / raw)
  To: xfs

Hi,

I am running OpenStack Swift in a VM with XFS as the underlying
filesystem. This generates a metadata-heavy workload on XFS:
essentially, it creates a new directory and a new 256KB file in that
directory, and the file carries extended attributes totalling 243 bytes.

I am seeing the following two crashes of the machine:

http://pastie.org/pastes/10210974/text?key=xdmfvaocvawnyfmkb06zg

and

http://pastie.org/pastes/10210975/text?key=rkiljsdaucrk7frprzgqq

I have only seen these when running in a VM. We have run several tests
on physical servers but have never seen these problems.

Are there any known issues with XFS running on VMs?

Thanks in advance.
-Shri

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
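[Editor's note: the reported workload can be sketched roughly as below. The paths, the `user.swift.metadata` xattr name, and the helper function are illustrative inventions, not Swift's actual code; only the shape of the workload (new directory, 256KB file, ~243 bytes of xattrs) comes from the report.]

```python
import os
import tempfile

def create_object(base, name, data, xattr_value):
    """Create a fresh directory, write a small object file into it, and
    attach one extended attribute -- mirroring the metadata-heavy pattern
    described in the report."""
    obj_dir = os.path.join(base, name)
    os.mkdir(obj_dir)
    path = os.path.join(obj_dir, "object.data")
    with open(path, "wb") as f:
        f.write(data)
    # os.setxattr is Linux-only, and the backing filesystem must support
    # user.* xattrs (XFS does); skip gracefully elsewhere.
    setxattr = getattr(os, "setxattr", None)
    if setxattr is not None:
        try:
            setxattr(path, "user.swift.metadata", xattr_value)
        except OSError:
            pass
    return path

base = tempfile.mkdtemp()
path = create_object(base, "obj-0001", b"\0" * 256 * 1024, b"x" * 243)
```

Run in a tight loop across many directories, this is the kind of create-heavy pattern that stresses XFS directory and attr-leaf metadata paths.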
* Re: XFS crashes on VMs
From: Eric Sandeen @ 2015-05-27 23:27 UTC (permalink / raw)
  To: Shrinand Javadekar; +Cc: xfs@oss.sgi.com

That's not a crash. That is XFS detecting on-disk corruption which
likely happened at some time prior. You should unmount and run
xfs_repair, possibly with -n first if you would like to do a dry run to
see what it might do. If you get fresh corruption after a full repair,
then that becomes more interesting. It's possible that you have a
problem with the underlying block layer, or it's possible that it is an
XFS bug - but I think this is not something that we have seen before.

Eric

> On May 27, 2015, at 6:06 PM, Shrinand Javadekar <shrinand@maginatics.com> wrote:
>
> Hi,
>
> I am running Openstack Swift in a VM with XFS as the underlying
> filesystem. [...]
* Re: XFS crashes on VMs
From: Shrinand Javadekar @ 2015-05-28  0:03 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: xfs@oss.sgi.com

Thanks Eric,

We ran xfs_repair and were able to get the filesystem back into a
running state. That is fine for test & dev, but in production it won't
be acceptable. What other data do we need to get to the bottom of this?

On Wed, May 27, 2015 at 4:27 PM, Eric Sandeen <sandeen@sandeen.net> wrote:
> That's not a crash. That is xfs detecting on disk corruption which
> likely happened at some time prior. [...]
* Re: XFS crashes on VMs
From: Eric Sandeen @ 2015-05-28  0:52 UTC (permalink / raw)
  To: Shrinand Javadekar; +Cc: xfs@oss.sgi.com

You'll need to try to narrow down how it happened.

The hexdumps in the logs show what data was in the buffer; in one case
it was ascii, and was definitely not xfs metadata.

Either:

a) xfs wrote the wrong metadata - almost impossible, because we verify
   the data on write in the same way as we do on read

b) xfs read the wrong block due to other metadata corruption.

c) something corrupted the storage after it was written

d) the storage returned the wrong data on a read request ...

e) ???

Did you save the xfs_repair output? That might offer more clues.

Unless you can reproduce it, it'll be hard to come up with a definitive
root cause... can you try?

-Eric

On 5/27/15 7:03 PM, Shrinand Javadekar wrote:
> Thanks Eric,
>
> We ran xfs_repair and were able to get it back into a running state. [...]
* Re: XFS crashes on VMs
From: Eric Sandeen @ 2015-05-28  0:53 UTC (permalink / raw)
  To: Shrinand Javadekar; +Cc: xfs@oss.sgi.com

And did anything else "interesting" happen prior to the detection?

> On May 27, 2015, at 7:52 PM, Eric Sandeen <sandeen@sandeen.net> wrote:
>
> You'll need to try to narrow down how it happened. [...]
* Re: XFS crashes on VMs
From: Shrinand Javadekar @ 2015-05-28 18:08 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: xfs@oss.sgi.com

We'll try to reproduce this and capture the output of xfs_repair when it
happens next, and will keep an eye on what else is happening in the
infrastructure at the time.

FWIW, we've seen this in a local VMware environment as well as on Amazon
EC2 instances, so it doesn't seem hypervisor-specific.

On Wed, May 27, 2015 at 5:53 PM, Eric Sandeen <sandeen@sandeen.net> wrote:
> And did anything else "interesting" happen prior to the detection? [...]
* Re: XFS crashes on VMs
From: Shrinand Javadekar @ 2015-06-19 18:34 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: xfs@oss.sgi.com

[-- Attachment #1: Type: text/plain, Size: 4067 bytes --]

I hit this problem again and captured the output of all the steps while
repairing the filesystem. Here's the crash:

http://pastie.org/private/prift1xjcc38s0jcvehvew

And the output of the xfs_repair steps (also attached if needed):

http://pastie.org/private/gvq3aiisudfhy69ezagw

Hope this can provide some insights.

-Shri

On Thu, May 28, 2015 at 11:08 AM, Shrinand Javadekar
<shrinand@maginatics.com> wrote:
> We'll try and reproduce this and capture the output of xfs_repair when
> it happens next. [...]

[-- Attachment #2: xfs_crash --]
[-- Type: application/octet-stream, Size: 2074 bytes --]

root@foods-12:/home/maginatics# xfs_repair /dev/mapper/TrollGroup-TrollVolume
Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
ERROR: The filesystem has valuable metadata changes in a log which needs to
be replayed.  Mount the filesystem to replay the log, and unmount it before
re-running xfs_repair.  If you are unable to mount the filesystem, then use
the -L option to destroy the log and attempt a repair.
Note that destroying the log may cause corruption -- please attempt a mount
of the filesystem before doing this.
root@foods-12:/home/maginatics# mount /dev/mapper/TrollGroup-TrollVolume
root@foods-12:/home/maginatics# mount
...
/dev/mapper/TrollGroup-TrollVolume on /lvm type xfs (rw,noexec,nosuid,nodev,noatime,nodiratime,nobarrier,logbufs=8)
root@foods-12:/home/maginatics# umount /lvm
root@foods-12:/home/maginatics# mount
<no /lvm>
root@foods-12:/home/maginatics# xfs_repair /dev/mapper/TrollGroup-TrollVolume
root@foods-12:/home/maginatics# tail -f repair_output
Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
Phase 5 - rebuild AG headers and trees...
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - traversing filesystem ...
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...
done

[-- Attachment #3: Type: text/plain, Size: 121 bytes --]
* Re: XFS crashes on VMs
From: Eric Sandeen @ 2015-06-19 19:37 UTC (permalink / raw)
  To: Shrinand Javadekar; +Cc: xfs@oss.sgi.com

On 6/19/15 1:34 PM, Shrinand Javadekar wrote:
> I hit this problem again and captured the output of all the steps
> while repairing the filesystem. Here's the crash:
> http://pastie.org/private/prift1xjcc38s0jcvehvew

that starts with:

Jun 18 18:40:19 foods-12 kernel: [3639696.006884] ffff8801740f8000: 00 00 00 00 00 00 00 00 fb ee 00 00 00 00 00 00  ................
Jun 18 18:40:19 foods-12 kernel: [3639696.007056] ffff8801740f8010: 10 00 00 00 00 20 0f e0 00 00 00 00 00 00 00 00  ..... ..........
Jun 18 18:40:19 foods-12 kernel: [3639696.007140] ffff8801740f8020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Jun 18 18:40:19 foods-12 kernel: [3639696.007230] ffff8801740f8030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................

I think there should have been other interesting bits prior to that
line; can you check, and provide it please? Full dmesg in a pastebin
would be just fine.

xfs_attr3_leaf_write_verify at line 216 of file
/build/buildd/linux-lts-trusty-3.13.0/fs/xfs/xfs_attr_leaf.c.
Caller 0xffffffffa00a193a

which is ... interesting; something went wrong on the way _to_ disk?

Ok, what is wrong, then. Here's the first 64 bytes of the buffer; it
contains:

typedef struct xfs_attr_leafblock {
	xfs_attr_leaf_hdr_t	hdr;	/* constant-structure header block */

where

typedef struct xfs_attr_leaf_hdr {	/* constant-structure header block */
	xfs_da_blkinfo_t info;		/* block type, links, etc. */
	__be16	count;			/* count of active leaf_entry's */
	__be16	usedbytes;		/* num bytes of names/values stored */
	__be16	firstused;		/* first used byte in name area */
	__u8	holes;			/* != 0 if blk needs compaction */
	__u8	pad1;
	xfs_attr_leaf_map_t freemap[XFS_ATTR_LEAF_MAPSIZE];
					/* N largest free regions */
} xfs_attr_leaf_hdr_t;

and

typedef struct xfs_da_blkinfo {
	__be32	forw;			/* previous block in list */
	__be32	back;			/* following block in list */
	__be16	magic;			/* validity check on block */
	__be16	pad;			/* unused */
} xfs_da_blkinfo_t;

so:

00 00 00 00 00 00 00 00 fb ee 00 00 00 00 00 00
|   forw    |   back    |magic| pad |count|used|
10 00 00 00 00 20 0f e0 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

and the only thing the verifier checks on non-crc filesystems is the
magic (which is good), and the count (which is what tripped here):

	if (xfs_sb_version_hascrc(&mp->m_sb)) {
		<snip>
	} else {
		if (ichdr.magic != XFS_ATTR_LEAF_MAGIC)
			return false;
	}
	if (ichdr.count == 0)
		return false;

so this failed to verify because count was 0.

> And the output of the xfs_repair steps (also attached if needed):
> http://pastie.org/private/gvq3aiisudfhy69ezagw

Ok, no on-disk corruption, that's good.

Can you please provide as much info as possible about your system and
setup?

http://xfs.org/index.php/XFS_FAQ#Q:_What_information_should_I_include_when_reporting_a_problem.3F

-Eric
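[Editor's note: Eric's hand decode of the header bytes can be reproduced mechanically. A small sketch follows; the hex string is transcribed from the dump above, and 0xFBEE is XFS_ATTR_LEAF_MAGIC, the non-CRC attr-leaf magic from the kernel headers.]

```python
import struct

# First 16 bytes of the buffer from the log: xfs_da_blkinfo (forw, back,
# magic, pad) followed by the leaf header's count field. All on-disk
# fields are big-endian, hence the ">" format prefix.
buf = bytes.fromhex("0000000000000000fbee000000000000")
forw, back, magic, pad, count = struct.unpack(">IIHHH", buf[:14])

XFS_ATTR_LEAF_MAGIC = 0xFBEE  # non-CRC attr leaf magic number

# The write verifier accepts the magic...
assert magic == XFS_ATTR_LEAF_MAGIC
# ...but rejects the buffer because it claims zero active leaf entries.
assert count == 0
```

This matches the conclusion in the message: magic is valid, count is zero, so the buffer fails the `ichdr.count == 0` check on its way to disk.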
* Re: XFS crashes on VMs
From: Shrinand Javadekar @ 2015-08-03 19:11 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: xfs@oss.sgi.com

We hit this again on one of our VMs. This one is running the 3.13
kernel, so we have now seen this crash on both 3.16 and 3.13 kernels.
We had another setup with a 3.8 kernel, and for several months we
haven't seen this problem there. Is there a way to narrow down what
changed between 3.8 and 3.13 and get to the bottom of this?

I had provided info about the workload in a different thread:

http://oss.sgi.com/archives/xfs/2015-06/msg00108.html

If that doesn't work, let me know and I can get it again.

-Shri

On Fri, Jun 19, 2015 at 12:37 PM, Eric Sandeen <sandeen@sandeen.net> wrote:
> On 6/19/15 1:34 PM, Shrinand Javadekar wrote:
>> I hit this problem again and captured the output of all the steps
>> while repairing the filesystem. [...]
end of thread, other threads: [~2015-08-03 19:11 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-05-27 23:06 XFS crashes on VMs Shrinand Javadekar
2015-05-27 23:27 ` Eric Sandeen
2015-05-28  0:03   ` Shrinand Javadekar
2015-05-28  0:52     ` Eric Sandeen
2015-05-28  0:53       ` Eric Sandeen
2015-05-28 18:08         ` Shrinand Javadekar
2015-06-19 18:34           ` Shrinand Javadekar
2015-06-19 19:37             ` Eric Sandeen
2015-08-03 19:11               ` Shrinand Javadekar