From: Stan Hoeppner <stan@hardwarefreak.com>
To: Bob Mastors <bob.mastors@solidfire.com>, xfs@oss.sgi.com
Subject: Re: xfs umount with i/o error hang/memory corruption
Date: Fri, 04 Apr 2014 13:50:47 -0500 [thread overview]
Message-ID: <533EFF07.6050308@hardwarefreak.com> (raw)
In-Reply-To: <CALjwKZAJ-R8dS13Rsj3+K3hM9p0z08qvi4ZVTYbDWKT1Biu=-Q@mail.gmail.com>
On 4/4/2014 1:15 PM, Bob Mastors wrote:
> Greetings,
>
> I am new to xfs and am running into a problem
> and would appreciate any guidance on how to proceed.
>
> After an i/o error from the block device that xfs is using,
> an umount results in a message like:
> [ 370.636473] XFS (sdx): Log I/O Error Detected. Shutting down filesystem
> [ 370.644073] XFS (h ���h"h ���H#h ���bsg):
> Please umount the filesystem and rectify the problem(s)
> Note the garbage on the previous line which suggests memory corruption.
> About half the time I get the garbled log message. About half the time
> umount hangs.
>
> And then I get this kind of error and the system is unresponsive:
> Message from syslogd@debian at Apr 4 09:27:40 ...
> kernel:[ 680.080022] BUG: soft lockup - CPU#2 stuck for 22s! [umount:2849]
>
> The problem appears to be similar to this issue:
> http://www.spinics.net/lists/linux-xfs/msg00061.html
>
> I can reproduce the problem easily using open-iscsi to create
> the block device with an iscsi initiator.
> I use lio to create an iscsi target.
>
> The problem is triggered by doing an iscsi logout which causes
> the block device to return i/o errors to xfs.
> Steps to reproduce the problem are below.
This is not a bug but expected behavior. XFS is designed to shut down
in this situation to prevent filesystem corruption. Logging out of a LUN
is no different from pulling the power plug on a direct-attached disk
drive. Surely you would not do that to a running filesystem.
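For what it's worth, the same "disk yanked out from under the filesystem"
condition can be simulated without iscsi at all, using device-mapper's
error target. This is only a sketch (untested here; the loop device,
image path, and mount point are placeholders), but it is essentially the
trick xfstests uses for its shutdown tests:

```shell
# Build a small XFS filesystem on a loop device (placeholder paths).
dd if=/dev/zero of=/tmp/xfs.img bs=1M count=512
losetup /dev/loop0 /tmp/xfs.img

# Wrap it in a linear dm device so the table can be swapped later.
dmsetup create xfstest --table \
    "0 $(blockdev --getsz /dev/loop0) linear /dev/loop0 0"
mkfs.xfs /dev/mapper/xfstest
mount /dev/mapper/xfstest /mnt

# Now make every I/O fail, as if the disk were pulled:
dmsetup suspend xfstest
dmsetup reload xfstest --table "0 $(blockdev --getsz /dev/loop0) error"
dmsetup resume xfstest

# A write plus a log flush should now trigger the XFS shutdown message.
touch /mnt/foo; sync
umount /mnt
```

Unlike an iscsi logout, this keeps the whole error path inside the local
block layer, which takes the VM's storage emulation out of the picture.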
> Using VirtualBox, I can reproduce it with two processors but not one.
> I first saw this on a 3.8 kernel and most recently reproduced it with 3.14+.
...
The only problem I see here is that XFS should be shutting down every
time the disk device disappears. In the test cases where it does not,
your VM environment isn't passing the I/O errors up the stack when it
should be, which means your VM environment is broken.
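A quick way to check whether the VM's block layer is actually returning
errors is a direct read against the dead device. A sketch (/dev/sdx is a
placeholder for the logged-out iscsi disk):

```shell
# O_DIRECT bypasses the page cache, so this read hits the device itself.
# On a properly failing device, dd should report an I/O error and exit
# non-zero almost immediately rather than hanging or succeeding.
dd if=/dev/sdx of=/dev/null bs=4096 count=1 iflag=direct
echo "dd exit status: $?"

# The kernel log should show corresponding I/O error entries:
dmesg | tail -n 20 | grep -i "i/o error"
```

If that read succeeds or hangs after the logout, the errors are being
swallowed below XFS, and no filesystem can be expected to react to them.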
Cheers,
Stan
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
Thread overview: 6+ messages
2014-04-04 18:15 xfs umount with i/o error hang/memory corruption Bob Mastors
2014-04-04 18:50 ` Stan Hoeppner [this message]
2014-04-04 19:47 ` Bob Mastors
2014-04-04 21:20 ` Dave Chinner
2014-04-04 21:40 ` Bob Mastors
2014-04-04 21:57 ` Dave Chinner