* xfs umount with i/o error hang/memory corruption
@ 2014-04-04 18:15 Bob Mastors
2014-04-04 18:50 ` Stan Hoeppner
2014-04-04 21:20 ` Dave Chinner
0 siblings, 2 replies; 6+ messages in thread
From: Bob Mastors @ 2014-04-04 18:15 UTC (permalink / raw)
To: xfs
Greetings,
I am new to xfs and am running into a problem
and would appreciate any guidance on how to proceed.
After an i/o error from the block device that xfs is using,
an umount results in a message like:
[ 370.636473] XFS (sdx): Log I/O Error Detected. Shutting down filesystem
[ 370.644073] XFS (h ���h"h ���H#h ���bsg):
Please umount the filesystem and rectify the problem(s)
Note the garbage on the previous line which suggests memory corruption.
About half the time I get the garbled log message. About half the time
umount hangs.
And then I get this kind of error and the system is unresponsive:
Message from syslogd@debian at Apr 4 09:27:40 ...
kernel:[ 680.080022] BUG: soft lockup - CPU#2 stuck for 22s! [umount:2849]
The problem appears to be similar to this issue:
http://www.spinics.net/lists/linux-xfs/msg00061.html
I can reproduce the problem easily using open-iscsi to create
the block device with an iscsi initiator.
I use lio to create an iscsi target.
The problem is triggered by doing an iscsi logout which causes
the block device to return i/o errors to xfs.
Steps to reproduce the problem are below.
Using VirtualBox, I can reproduce it with two processors but not one.
I first saw this on a 3.8 kernel and most recently reproduced it with 3.14+.
Thanks,
Bob
virtual box setup
-----------------
1 GB ram
8 GB disk
2 processors
Operating System: Linux
Version: Debian (64 bit)
Bridged networking
iso: debian-7.4.0-amd64-netinst.iso
Software selection
select SSH server
select Standard system utilities
deselect everything else
dhcp ip address: 172.17.5.129
upgrade linux to 3.14.0+
------------------------
ssh 172.17.5.129
su
cd /mnt/ksrc/linux
make modules_install
make install
reboot
install tools
-------------
ssh 172.17.5.129
su
apt-get install xfsprogs
apt-get install open-iscsi
apt-get install --no-install-recommends targetcli python-urwid
reboot
reproduce bug
-------------
ssh 172.17.5.129
su
targetcli
cd backstores
fileio/ create name=file_backend file_or_dev=my_disk size=100M
/iscsi create
portals/ create 172.17.5.129
cd <
luns/ create /backstores/fileio/file_backend
cd <
set attribute authentication=0 demo_mode_write_protect=0
generate_node_acls=1 cache_dynamic_acls=1
exit
iscsiadm -m discovery -t sendtargets -p 172.17.5.129
iscsiadm -m node --login --target iqn.from_previous_iscsiadm_cmd_output
fdisk -l
mkfs -t xfs -f /dev/sdXX
mkdir /mnt/vola
mount -t xfs /dev/sdXX /mnt/vola
iscsiadm -m node -U all
umount /mnt/vola
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
* Re: xfs umount with i/o error hang/memory corruption
2014-04-04 18:15 xfs umount with i/o error hang/memory corruption Bob Mastors
@ 2014-04-04 18:50 ` Stan Hoeppner
2014-04-04 19:47 ` Bob Mastors
2014-04-04 21:20 ` Dave Chinner
1 sibling, 1 reply; 6+ messages in thread
From: Stan Hoeppner @ 2014-04-04 18:50 UTC (permalink / raw)
To: Bob Mastors, xfs
On 4/4/2014 1:15 PM, Bob Mastors wrote:
> Greetings,
>
> I am new to xfs and am running into a problem
> and would appreciate any guidance on how to proceed.
>
> After an i/o error from the block device that xfs is using,
> an umount results in a message like:
> [ 370.636473] XFS (sdx): Log I/O Error Detected. Shutting down filesystem
> [ 370.644073] XFS (h ���h"h ���H#h ���bsg):
> Please umount the filesystem and rectify the problem(s)
> Note the garbage on the previous line which suggests memory corruption.
> About half the time I get the garbled log message. About half the time
> umount hangs.
>
> And then I get this kind of error and the system is unresponsive:
> Message from syslogd@debian at Apr 4 09:27:40 ...
> kernel:[ 680.080022] BUG: soft lockup - CPU#2 stuck for 22s! [umount:2849]
>
> The problem appears to be similar to this issue:
> http://www.spinics.net/lists/linux-xfs/msg00061.html
>
> I can reproduce the problem easily using open-iscsi to create
> the block device with an iscsi initiator.
> I use lio to create an iscsi target.
>
> The problem is triggered by doing an iscsi logout which causes
> the block device to return i/o errors to xfs.
> Steps to reproduce the problem are below.
This is not a problem but the expected behavior. XFS is designed to do
this to prevent filesystem corruption. Logging out of a LUN is no
different than pulling the power plug on a direct attached disk drive.
Surely you would not do that to a running filesystem.
> Using VirtualBox, I can reproduce it with two processors but not one.
> I first saw this on a 3.8 kernel and most recently reproduced it with 3.14+.
...
The only problem I see here is that XFS should be shutting down every
time the disk device disappears. Which means in your test cases where
it does not, your VM environment isn't passing the IO errors up the
stack, and it should be. Which means your VM environment is broken.
Cheers,
Stan
* Re: xfs umount with i/o error hang/memory corruption
2014-04-04 18:50 ` Stan Hoeppner
@ 2014-04-04 19:47 ` Bob Mastors
0 siblings, 0 replies; 6+ messages in thread
From: Bob Mastors @ 2014-04-04 19:47 UTC (permalink / raw)
To: stan; +Cc: xfs
On Fri, Apr 4, 2014 at 12:50 PM, Stan Hoeppner <stan@hardwarefreak.com> wrote:
> On 4/4/2014 1:15 PM, Bob Mastors wrote:
> > Greetings,
> >
> > I am new to xfs and am running into a problem
> > and would appreciate any guidance on how to proceed.
> >
> > After an i/o error from the block device that xfs is using,
> > an umount results in a message like:
> > [ 370.636473] XFS (sdx): Log I/O Error Detected. Shutting down
> filesystem
> > [ 370.644073] XFS (h ���h"h ���H#h ���bsg):
> > Please umount the filesystem and rectify the problem(s)
> > Note the garbage on the previous line which suggests memory corruption.
> > About half the time I get the garbled log message. About half the time
> > umount hangs.
> >
> > And then I get this kind of error and the system is unresponsive:
> > Message from syslogd@debian at Apr 4 09:27:40 ...
> > kernel:[ 680.080022] BUG: soft lockup - CPU#2 stuck for 22s!
> [umount:2849]
> >
> > The problem appears to be similar to this issue:
> > http://www.spinics.net/lists/linux-xfs/msg00061.html
> >
> > I can reproduce the problem easily using open-iscsi to create
> > the block device with an iscsi initiator.
> > I use lio to create an iscsi target.
> >
> > The problem is triggered by doing an iscsi logout which causes
> > the block device to return i/o errors to xfs.
> > Steps to reproduce the problem are below.
>
> This is not a problem but the expected behavior. XFS is designed to do
> this to prevent filesystem corruption. Logging out of a LUN is no
> different than pulling the power plug on a direct attached disk drive.
> Surely you would not do that to a running filesystem.
>
Sorry, I don't think I was clear on the nature of the problem and its
wide-ranging effects.
Using iscsi to access block storage on another server is the goal.
A failure on the other server is possible, which could result in the iscsi
initiator returning i/o errors to xfs.
The behavior that I would like from xfs is to put the filesystem in some
kind of offline state on an i/o error from the block device. Today xfs
usually has this desirable behavior.
But there is a corner case where instead xfs hangs the entire server,
forcing a hard reboot.
On a large file server with many filesystems using iscsi to talk to block
storage on multiple block servers, I would like the failure of a single
block server to only impact the filesystems that depend on that block
server, and to not impact the other filesystems.
Bob
> Using VirtualBox, I can reproduce it with two processors but not one.
> > I first saw this on a 3.8 kernel and most recently reproduced it with
> 3.14+.
> ...
>
> The only problem I see here is that XFS should be shutting down every
> time the disk device disappears. Which means in your test cases where
> it does not, your VM environment isn't passing the IO errors up the
> stack, and it should be. Which means your VM environment is broken.
>
> Cheers,
>
> Stan
>
>
>
* Re: xfs umount with i/o error hang/memory corruption
2014-04-04 18:15 xfs umount with i/o error hang/memory corruption Bob Mastors
2014-04-04 18:50 ` Stan Hoeppner
@ 2014-04-04 21:20 ` Dave Chinner
2014-04-04 21:40 ` Bob Mastors
1 sibling, 1 reply; 6+ messages in thread
From: Dave Chinner @ 2014-04-04 21:20 UTC (permalink / raw)
To: Bob Mastors; +Cc: xfs
On Fri, Apr 04, 2014 at 12:15:23PM -0600, Bob Mastors wrote:
> Greetings,
>
> I am new to xfs and am running into a problem
> and would appreciate any guidance on how to proceed.
>
> After an i/o error from the block device that xfs is using,
> an umount results in a message like:
> [ 370.636473] XFS (sdx): Log I/O Error Detected. Shutting down filesystem
> [ 370.644073] XFS (h ���h"h ���H#h ���bsg):
> Please umount the filesystem and rectify the problem(s)
> Note the garbage on the previous line which suggests memory corruption.
> About half the time I get the garbled log message. About half the time
> umount hangs.
I got an email about this last night with a different trigger - thin
provisioning failing log IO in the unmount path. I know what the
problem is, I just don't have a fix for it yet.
To confirm it's the same problem, can you post the entirety of the
dmesg where the error occurs?
In essence, the log IO failure is triggering a shutdown, and as part
of the shutdown process it wakes anyone waiting on a log force.
The log quiesce code that waits for log completion during unmount
uses a log force to ensure the log is idle before tearing down all
the log structures and finishing the unmount. Unfortunately, the log
force the unmount blocks on is woken prematurely by the shutdown,
and hence it runs before the log IO processing is completed. Hence
the use after free.
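The ordering bug Dave describes can be modeled in a few lines. This is a
conceptual Python sketch, not XFS code; all class and method names here are
illustrative inventions, chosen only to mirror the sequence he outlines
(shutdown wakes the log-force waiter before log I/O completion has run, so
unmount teardown races ahead of the completion handler):

```python
# Conceptual model of the premature-wakeup ordering bug (not XFS code).
# In the kernel, "torn_down" corresponds to the log structures having
# been freed, so the final access is a use-after-free.

class Log:
    def __init__(self):
        self.torn_down = False
        self.force_waiter_woken = False

    def shutdown(self):
        # The shutdown path wakes anyone waiting on a log force...
        self.force_waiter_woken = True
        # ...but log I/O completion processing has NOT finished yet.

    def unmount_wait_for_log_force(self):
        # The unmount path assumes that being woken means the log is
        # idle, and proceeds to tear down the log structures.
        assert self.force_waiter_woken
        self.torn_down = True

    def log_io_completion(self):
        # Runs after teardown in the buggy ordering; in the kernel this
        # touches freed memory.
        return "use-after-free" if self.torn_down else "ok"

log = Log()
log.shutdown()                    # log I/O error triggers shutdown
log.unmount_wait_for_log_force()  # waiter woken prematurely; tears down log
print(log.log_io_completion())    # prints "use-after-free"
```

The fix direction implied by the explanation is to make the unmount-path
wakeup happen only after I/O completion processing, i.e. to forbid the
`shutdown -> teardown -> completion` ordering this sketch demonstrates.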
> And then I get this kind of error and the system is unresponsive:
> Message from syslogd@debian at Apr 4 09:27:40 ...
> kernel:[ 680.080022] BUG: soft lockup - CPU#2 stuck for 22s! [umount:2849]
>
> The problem appears to be similar to this issue:
> http://www.spinics.net/lists/linux-xfs/msg00061.html
Similar symptoms, but not the same bug.
> The problem is triggered by doing an iscsi logout which causes
> the block device to return i/o errors to xfs.
> Steps to reproduce the problem are below.
Seems similar to the thinp ENOSPC issue I mentioned above - data IO
errors occur, then you do an unmount, which causes a log IO error
writing the superblock, and then this happens....
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
* Re: xfs umount with i/o error hang/memory corruption
2014-04-04 21:20 ` Dave Chinner
@ 2014-04-04 21:40 ` Bob Mastors
2014-04-04 21:57 ` Dave Chinner
0 siblings, 1 reply; 6+ messages in thread
From: Bob Mastors @ 2014-04-04 21:40 UTC (permalink / raw)
To: Dave Chinner; +Cc: xfs
On Fri, Apr 4, 2014 at 3:20 PM, Dave Chinner <david@fromorbit.com> wrote:
> On Fri, Apr 04, 2014 at 12:15:23PM -0600, Bob Mastors wrote:
> > Greetings,
> >
> > I am new to xfs and am running into a problem
> > and would appreciate any guidance on how to proceed.
> >
> > After an i/o error from the block device that xfs is using,
> > an umount results in a message like:
> > [ 370.636473] XFS (sdx): Log I/O Error Detected. Shutting down
> filesystem
> > [ 370.644073] XFS (h ���h"h ���H#h ���bsg):
> > Please umount the filesystem and rectify the problem(s)
> > Note the garbage on the previous line which suggests memory corruption.
> > About half the time I get the garbled log message. About half the time
> > umount hangs.
>
> I got an email about this last night with a different trigger - thin
> provisioning failing log IO in the unmount path. I know what the
> problem is, I just don't have a fix for it yet.
>
> > To confirm it's the same problem, can you post the entirety of the
> dmesg where the error occurs?
>
Here is the console output from putting a BUG() call in
xfs_do_force_shutdown after the "Please umount ..." message.
Let me know if you want something else.
I don't know how much help I can be on this, but I can run more tests
or try out fixes.
Thanks,
Bob
[ 111.004665] XFS (sdb): Mounting Filesystem
[ 111.099442] XFS (sdb): Ending clean mount
[ 129.548644] XFS (sdb): metadata I/O error: block 0x19022 ("xlog_iodone")
error 19 numblks 64
[ 129.618739] XFS (sdb): xfs_do_force_shutdown(0x2) called from line 1170
of file fs/xfs/xfs_log.c. Return address = 0xffffffffa04c8001
[ 129.699768] XFS (sdb): Log I/O Error Detected. Shutting down filesystem
[ 129.699924] XFS (sdb): Unable to update superblock counters. Freespace
may not be correct on next mount.
[ 129.699927] XFS (sdb): xfs_log_force: error 5 returned.
[ 129.716031] XFS (P専/): Please umount the filesystem and rectify the
problem(s)
[ 129.718411] ------------[ cut here ]------------
[ 129.719120] Kernel BUG at ffffffffa04726ac [verbose debug info
unavailable]
[ 129.720016] invalid opcode: 0000 [#1] SMP
[ 129.720016] Modules linked in: xfs(F) libcrc32c ib_srpt tcm_qla2xxx
qla2xxx tcm_loop tcm_fc libfc scsi_transport_fc scsi_tgt iscsi_target_mod
target_core_pscsi target_core_file target_core_iblock target_core_mod
configfs ib_iser rdma_cm iw_cm ib_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp
libiscsi_tcp libiscsi scsi_transport_iscsi nfsd nfs_acl nfs lockd fscache
sunrpc snd_intel8x0 snd_ac97_codec parport_pc parport ac97_bus snd_pcm
psmouse snd_timer serio_raw snd joydev evbug mac_hid pcspkr i2c_piix4
soundcore hid_generic usbhid hid e1000 ahci libahci
[ 129.720016] CPU: 1 PID: 234 Comm: kworker/1:1H Tainted: GF
3.14.0+ #1
[ 129.720016] Hardware name: innotek GmbH VirtualBox, BIOS VirtualBox
12/01/2006
[ 129.720016] Workqueue: xfslogd xfs_buf_iodone_work [xfs]
[ 129.720016] task: ffff88002f598000 ti: ffff88002fecc000 task.ti:
ffff88002fecc000
[ 129.720016] RIP: 0010:[<ffffffffa04726ac>] [<ffffffffa04726ac>]
xfs_do_force_shutdown+0x9c/0x170 [xfs]
[ 129.720016] RSP: 0018:ffff88002fecdd68 EFLAGS: 00010296
[ 129.720016] RAX: 0000000000000043 RBX: 0000000000000002 RCX:
0000000000000000
[ 129.720016] RDX: ffff88003fd0efb0 RSI: ffff88003fd0d3d8 RDI:
ffff88003fd0d3d8
[ 129.720016] RBP: ffff88002fecdd98 R08: 0000000000000000 R09:
0000000000000000
[ 129.720016] R10: 00000000000001cf R11: 00000000000001ce R12:
ffff880028865000
[ 129.720016] R13: 0000000000000002 R14: 0000000000000000 R15:
0000000000000000
[ 129.720016] FS: 0000000000000000(0000) GS:ffff88003fd00000(0000)
knlGS:0000000000000000
[ 129.720016] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 129.720016] CR2: 00000000006105e8 CR3: 0000000001c0e000 CR4:
00000000000006e0
[ 129.720016] Stack:
[ 129.720016] ffffffffa04c8001 0000000000000000 ffff880030140300
ffff88003026dc00
[ 129.720016] ffff88002fa13980 0000000000000000 ffff88002fecddc8
ffffffffa04c8001
[ 129.720016] ffff88003653cf48 ffff88002fa13a20 ffff88002fa13980
ffffe8ffffd01e00
[ 129.720016] Call Trace:
[ 129.720016] [<ffffffffa04c8001>] ? xlog_iodone+0x91/0xa0 [xfs]
[ 129.720016] [<ffffffffa04c8001>] xlog_iodone+0x91/0xa0 [xfs]
[ 129.720016] [<ffffffffa046ac3e>] xfs_buf_iodone_work+0x5e/0xc0 [xfs]
[ 129.720016] [<ffffffff8106ad04>] process_one_work+0x174/0x4a0
[ 129.720016] [<ffffffff8106c0ab>] worker_thread+0x11b/0x370
[ 129.720016] [<ffffffff8106bf90>] ? manage_workers.isra.20+0x2d0/0x2d0
[ 129.720016] [<ffffffff81072c29>] kthread+0xc9/0xe0
[ 129.720016] [<ffffffff81072b60>] ? flush_kthread_worker+0xb0/0xb0
[ 129.720016] [<ffffffff816d6aec>] ret_from_fork+0x7c/0xb0
[ 129.720016] [<ffffffff81072b60>] ? flush_kthread_worker+0xb0/0xb0
[ 129.720016] Code: d0 45 85 ed 0f 85 a5 00 00 00 f6 c3 20 66 90 75 7e 83
e3 10 0f 84 b5 00 00 00 48 c7 c6 18 8c 4e a0 4c 89 e7 31 c0 e8 54 92 00 00
<0f> 0b 66 90 48 8b 45 08 49 89 d1 41 89 c8 48 c7 c2 d0 4f 4e a0
[ 129.720016] RIP [<ffffffffa04726ac>] xfs_do_force_shutdown+0x9c/0x170
[xfs]
[ 129.720016] RSP <ffff88002fecdd68>
[ 129.777164] ---[ end trace d335180e3c345156 ]---
[ 129.777953] BUG: unable to handle kernel paging request at
ffffffffffffffd8
[ 129.779087] IP: [<ffffffff81072c80>] kthread_data+0x10/0x20
[ 129.779960] PGD 1c0f067 PUD 1c11067 PMD 0
[ 129.780900] Oops: 0000 [#2] SMP
[ 129.781150] Modules linked in: xfs(F) libcrc32c ib_srpt tcm_qla2xxx
qla2xxx tcm_loop tcm_fc libfc scsi_transport_fc scsi_tgt iscsi_target_mod
target_core_pscsi target_core_file target_core_iblock target_core_mod
configfs ib_iser rdma_cm iw_cm ib_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp
libiscsi_tcp libiscsi scsi_transport_iscsi nfsd nfs_acl nfs lockd fscache
sunrpc snd_intel8x0 snd_ac97_codec parport_pc parport ac97_bus snd_pcm
psmouse snd_timer serio_raw snd joydev evbug mac_hid pcspkr i2c_piix4
soundcore hid_generic usbhid hid e1000 ahci libahci
[ 129.781150] CPU: 1 PID: 234 Comm: kworker/1:1H Tainted: GF D
3.14.0+ #1
[ 129.781150] Hardware name: innotek GmbH VirtualBox, BIOS VirtualBox
12/01/2006
[ 129.781150] task: ffff88002f598000 ti: ffff88002fecc000 task.ti:
ffff88002fecc000
[ 129.781150] RIP: 0010:[<ffffffff81072c80>] [<ffffffff81072c80>]
kthread_data+0x10/0x20
[ 129.781150] RSP: 0018:ffff88002fecd9f8 EFLAGS: 00010046
[ 129.781150] RAX: 0000000000000000 RBX: 0000000000000000 RCX:
ffffffff81e75380
[ 129.781150] RDX: 0000000000000001 RSI: 0000000000000001 RDI:
ffff88002f598000
[ 129.781150] RBP: ffff88002fecd9f8 R08: 0000000000000000 R09:
0000000000000164
[ 129.781150] R10: ffffffff8104c8b4 R11: 0000000000000000 R12:
0000000000000001
[ 129.781150] R13: ffff88002f5984e0 R14: ffff88003d9f8000 R15:
ffff88002f5983d8
[ 129.781150] FS: 0000000000000000(0000) GS:ffff88003fd00000(0000)
knlGS:0000000000000000
[ 129.781150] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 129.781150] CR2: 0000000000000028 CR3: 0000000001c0e000 CR4:
00000000000006e0
[ 129.781150] Stack:
[ 129.781150] ffff88002fecda18 ffffffff8106c526 ffff88003fd13140
0000000000000001
[ 129.781150] ffff88002fecda98 ffffffff816ca7cf 0000000000000001
ffff88002f598000
[ 129.781150] ffff88002f598000 0000000000013140 ffff88002fecdfd8
0000000000013140
[ 129.781150] Call Trace:
[ 129.781150] [<ffffffff8106c526>] wq_worker_sleeping+0x16/0xa0
[ 129.781150] [<ffffffff816ca7cf>] __schedule+0x56f/0x7c0
[ 129.781150] [<ffffffff816caaf9>] schedule+0x29/0x70
[ 129.781150] [<ffffffff81052075>] do_exit+0x6e5/0xa70
[ 129.781150] [<ffffffff816c2303>] ? printk+0x4d/0x4f
[ 129.781150] [<ffffffff816cf0e1>] oops_end+0xa1/0xf0
[ 129.781150] [<ffffffff81006208>] die+0x58/0x90
[ 129.781150] [<ffffffff816ce93b>] do_trap+0xcb/0x170
[ 129.781150] [<ffffffff8100308d>] do_invalid_op+0xad/0x100
[ 129.781150] [<ffffffffa04726ac>] ? xfs_do_force_shutdown+0x9c/0x170
[xfs]
[ 129.781150] [<ffffffff816c2303>] ? printk+0x4d/0x4f
[ 129.781150] [<ffffffff816d8088>] invalid_op+0x18/0x20
[ 129.781150] [<ffffffffa04726ac>] ? xfs_do_force_shutdown+0x9c/0x170
[xfs]
[ 129.781150] [<ffffffffa04726ac>] ? xfs_do_force_shutdown+0x9c/0x170
[xfs]
[ 129.781150] [<ffffffffa04c8001>] ? xlog_iodone+0x91/0xa0 [xfs]
[ 129.781150] [<ffffffffa04c8001>] xlog_iodone+0x91/0xa0 [xfs]
[ 129.781150] [<ffffffffa046ac3e>] xfs_buf_iodone_work+0x5e/0xc0 [xfs]
[ 129.781150] [<ffffffff8106ad04>] process_one_work+0x174/0x4a0
[ 129.781150] [<ffffffff8106c0ab>] worker_thread+0x11b/0x370
[ 129.781150] [<ffffffff8106bf90>] ? manage_workers.isra.20+0x2d0/0x2d0
[ 129.781150] [<ffffffff81072c29>] kthread+0xc9/0xe0
[ 129.781150] [<ffffffff81072b60>] ? flush_kthread_worker+0xb0/0xb0
[ 129.781150] [<ffffffff816d6aec>] ret_from_fork+0x7c/0xb0
[ 129.781150] [<ffffffff81072b60>] ? flush_kthread_worker+0xb0/0xb0
[ 129.781150] Code: 00 48 89 e5 5d 48 8b 40 c8 48 c1 e8 02 83 e0 01 c3 66
2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 8b 87 88 04 00 00 55 48 89 e5
<48> 8b 40 d8 5d c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00
[ 129.781150] RIP [<ffffffff81072c80>] kthread_data+0x10/0x20
[ 129.781150] RSP <ffff88002fecd9f8>
[ 129.781150] CR2: ffffffffffffffd8
[ 129.781150] ---[ end trace d335180e3c345157 ]---
[ 129.781150] Fixing recursive fault but reboot is needed!
> In essence, the log IO failure is triggering a shutdown, and as part
> of the shutdown process it wakes anyone waiting on a log force.
> The log quiesce code that waits for log completion during unmount
> uses a log force to ensure the log is idle before tearing down all
> the log structures and finishing the unmount. Unfortunately, the log
> force the unmount blocks on is woken prematurely by the shutdown,
> and hence it runs before the log IO processing is completed. Hence
> the use after free.
>
> > And then I get this kind of error and the system is unresponsive:
> > Message from syslogd@debian at Apr 4 09:27:40 ...
> > kernel:[ 680.080022] BUG: soft lockup - CPU#2 stuck for 22s!
> [umount:2849]
> >
> > The problem appears to be similar to this issue:
> > http://www.spinics.net/lists/linux-xfs/msg00061.html
>
> Similar symptoms, but not the same bug.
>
> > The problem is triggered by doing an iscsi logout which causes
> > the block device to return i/o errors to xfs.
> > Steps to reproduce the problem are below.
>
> Seems similar to the thinp ENOSPC issue I mentioned above - data IO
> errors occur, then you do an unmount, which causes a log IO error
> writing the superblock, and then this happens....
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com
>
* Re: xfs umount with i/o error hang/memory corruption
2014-04-04 21:40 ` Bob Mastors
@ 2014-04-04 21:57 ` Dave Chinner
0 siblings, 0 replies; 6+ messages in thread
From: Dave Chinner @ 2014-04-04 21:57 UTC (permalink / raw)
To: Bob Mastors; +Cc: xfs
On Fri, Apr 04, 2014 at 03:40:38PM -0600, Bob Mastors wrote:
> On Fri, Apr 4, 2014 at 3:20 PM, Dave Chinner <david@fromorbit.com> wrote:
>
> > On Fri, Apr 04, 2014 at 12:15:23PM -0600, Bob Mastors wrote:
> > > Greetings,
> > >
> > > I am new to xfs and am running into a problem
> > > and would appreciate any guidance on how to proceed.
> > >
> > > After an i/o error from the block device that xfs is using,
> > > an umount results in a message like:
> > > [ 370.636473] XFS (sdx): Log I/O Error Detected. Shutting down
> > filesystem
> > > [ 370.644073] XFS (h ���h"h ���H#h ���bsg):
> > > Please umount the filesystem and rectify the problem(s)
> > > Note the garbage on the previous line which suggests memory corruption.
> > > About half the time I get the garbled log message. About half the time
> > > umount hangs.
> >
> > I got an email about this last night with a different trigger - thin
> > provisioning failing log IO in the unmount path. I know what the
> > problem is, I just don't have a fix for it yet.
> >
> > To confirm it's the same problem, can you post the entirety of the
> > dmesg where the error occurs?
> >
> Here is the console output from putting a BUG() call in
> xfs_do_force_shutdown after the "Please umount ..." message.
> Let me know if you want something else.
>
> I don't know how much help I can be on this, but I can run more tests
> or try out fixes.
I'll let you know when I have a fix for you to try - probably early
next week as it's Saturday morning right now...
> [ 129.548644] XFS (sdb): metadata I/O error: block 0x19022 ("xlog_iodone")
> error 19 numblks 64
> [ 129.618739] XFS (sdb): xfs_do_force_shutdown(0x2) called from line 1170
> of file fs/xfs/xfs_log.c. Return address = 0xffffffffa04c8001
> [ 129.699768] XFS (sdb): Log I/O Error Detected. Shutting down filesystem
> [ 129.699924] XFS (sdb): Unable to update superblock counters. Freespace
> may not be correct on next mount.
> [ 129.699927] XFS (sdb): xfs_log_force: error 5 returned.
> [ 129.716031] XFS (P専/): Please umount the filesystem and rectify the
Yup, that is identical to the bug report I got from the dm-thinp
folks. Thanks for the quick followup!
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com