From: Dave Chinner <david@fromorbit.com>
To: Austin Schuh <austin@peloton-tech.com>
Cc: xfs@oss.sgi.com
Subject: Re: XFS crash?
Date: Thu, 6 Mar 2014 10:35:51 +1100	[thread overview]
Message-ID: <20140305233551.GK6851@dastard> (raw)
In-Reply-To: <CANGgnMYPLF+8616Rs9eQOXUc9He2NSgFnNrvHvepV-x+pWS6oQ@mail.gmail.com>

On Wed, Mar 05, 2014 at 03:08:16PM -0800, Austin Schuh wrote:
> Howdy,
> 
> I'm running a config_preempt_rt patched version of the 3.10.11 kernel,
> and I'm seeing a couple lockups and crashes which I think are related
> to XFS.

I think they are more likely related to RT issues....

> After the first lockup, I booted into single user mode, mounted the
> filesystem readonly, and then ran xfs_repair -d.  When I rebooted
> again, I initiated a large copy to a USB drive (formated EXT3) and
> continued editing source files and compiling.  After a couple minutes,
> my laptop started locking up and after 2 minutes, the following showed
> up in the kernel.
> 
> Austin
> 
> $ uname -a
> Linux vpc5 3.10-3-rt-amd64 #5 SMP PREEMPT RT Debian 3.10.11-2
> (2013-09-10) x86_64 GNU/Linux
> 
> [ 1186.363397] usb 2-1: USB disconnect, device number 2
> [ 1200.895262] INFO: task kworker/u16:5:250 blocked for more than 120 seconds.
> [ 1200.895267] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> disables this message.
> [ 1200.895270] kworker/u16:5   D ffff88040dc62cc0     0   250      2 0x00000000
> [ 1200.895281] Workqueue: writeback bdi_writeback_workfn (flush-8:0)
....
> [ 1200.895294] Call Trace:
> [ 1200.895301]  [<ffffffff813a10ef>] ? console_conditional_schedule+0xf/0xf
> [ 1200.895304]  [<ffffffff813a1f93>] ? schedule+0x6b/0x7c
> [ 1200.895307]  [<ffffffff813a111b>] ? schedule_timeout+0x2c/0x123
> [ 1200.895310]  [<ffffffff813a5c20>] ? add_preempt_count+0xb7/0xe0
> [ 1200.895313]  [<ffffffff81065cfd>] ? migrate_enable+0x1cd/0x1dd
> [ 1200.895317]  [<ffffffff810651ab>] ? get_parent_ip+0x9/0x1b
> [ 1200.895319]  [<ffffffff813a5c20>] ? add_preempt_count+0xb7/0xe0
> [ 1200.895322]  [<ffffffff813a188b>] ? __wait_for_common+0x78/0xd6
> [ 1200.895356]  [<ffffffffa01a5032>] ? xfs_bmapi_allocate+0x92/0x9e [xfs]
> [ 1200.895371]  [<ffffffffa01a535d>] ? xfs_bmapi_write+0x31f/0x558 [xfs]
> [ 1200.895375]  [<ffffffff81109680>] ? kmem_cache_alloc+0x7c/0x17d
> [ 1200.895390]  [<ffffffffa01a2e6e>] ? __xfs_bmapi_allocate+0x22b/0x22b [xfs]
> [ 1200.895402]  [<ffffffffa018b899>] ? xfs_iomap_write_allocate+0x1bc/0x2c8 [xfs]
> [ 1200.895414]  [<ffffffffa017edc5>] ? xfs_map_blocks+0x125/0x1f5 [xfs]
> [ 1200.895424]  [<ffffffffa017fc87>] ? xfs_vm_writepage+0x266/0x48f [xfs]
> [ 1200.895428]  [<ffffffff810d3d14>] ? __writepage+0xd/0x2a
> [ 1200.895430]  [<ffffffff810d4790>] ? write_cache_pages+0x207/0x302
> [ 1200.895432]  [<ffffffff810d3d07>] ? page_index+0x14/0x14
> [ 1200.895435]  [<ffffffff810d48c6>] ? generic_writepages+0x3b/0x57
> [ 1200.895438]  [<ffffffff81134698>] ? __writeback_single_inode+0x72/0x225
> [ 1200.895441]  [<ffffffff8113550b>] ? writeback_sb_inodes+0x215/0x36d
> [ 1200.895444]  [<ffffffff811356cc>] ? __writeback_inodes_wb+0x69/0xab
> [ 1200.895446]  [<ffffffff81135844>] ? wb_writeback+0x136/0x2a7

That's waiting for allocation of blocks to complete during writeback
of dirty data via background writeback.

> [ 1200.895517] Workqueue: xfs-data/sda7 xfs_end_io [xfs]
....
> [ 1200.895526] Call Trace:
> [ 1200.895531]  [<ffffffff813a1f93>] ? schedule+0x6b/0x7c
> [ 1200.895534]  [<ffffffff813a2438>] ? __rt_mutex_slowlock+0x7b/0xb4
> [ 1200.895537]  [<ffffffff813a2577>] ? rt_mutex_slowlock+0xe5/0x150
> [ 1200.895542]  [<ffffffff8100c02f>] ? load_TLS+0x7/0xa
> [ 1200.895552]  [<ffffffffa017eadb>] ? xfs_setfilesize+0x48/0x120 [xfs]
> [ 1200.895555]  [<ffffffff81063d25>] ? finish_task_switch+0x80/0xc6
> [ 1200.895565]  [<ffffffffa017f62f>] ? xfs_end_io+0x7a/0x8e [xfs]
> [ 1200.895568]  [<ffffffff81055a49>] ? process_one_work+0x19b/0x2b2
> [ 1200.895570]  [<ffffffff81055f41>] ? worker_thread+0x12b/0x1f6
> [ 1200.895572]  [<ffffffff81055e16>] ? rescuer_thread+0x28f/0x28f
> [ 1200.895574]  [<ffffffff8105a909>] ? kthread+0x81/0x89
> [ 1200.895576]  [<ffffffff8105a888>] ? __kthread_parkme+0x5c/0x5c
> [ 1200.895579]  [<ffffffff813a75fc>] ? ret_from_fork+0x7c/0xb0
> [ 1200.895581]  [<ffffffff8105a888>] ? __kthread_parkme+0x5c/0x5c

IO completion waiting on a lock. No idea which lock, because the
RT code replaces various types of locks with rt_mutexes.

> [ 1200.895630] BrowserBlocking D ffff88040dc62cc0     0  5547      1 0x00000000

fsync waiting for allocation completion during data writeback.

> [ 1200.895801] BrowserBlocking D ffff88040dde2cc0     0  5558      1 0x00000000

fsync waiting for IO completion of a data page during data
writeback.

> [ 1200.895876] BrowserBlocking D ffff88040dd62cc0     0  5575      1 0x00000000

ditto.

> [ 1200.895979] xterm           D ffff88040dc62cc0     0  6032      1 0x00000000

Blocked waiting on a workqueue flush. Completely unrelated to
filesystems and IO.

> [ 1200.896060] tup             D ffff88040dc62cc0     0 12846   7061 0x00000000

Blocked on an inode lock (rt_mutex, again) during timestamp updates
on a write(2) syscall.

> [ 1200.896162] ld              D ffff8802d3c4a180     0 12961      1 0x00000004

FUSE filesystem write blocked on ?something? in its IO path.

> [ 1200.896228] ld              D ffff88040dce2cc0     0 12970      1 0x00000006

FUSE filesystem write blocked on ?something? in its IO path after
receiving a signal in a read(2) call.

> [ 1320.927338] khubd           D ffff88040dd62cc0     0   210      2 0x00000000

This is the smoking gun:

> [ 1320.927364]  [<ffffffff813a10ef>] ? console_conditional_schedule+0xf/0xf
> [ 1320.927367]  [<ffffffff813a1f93>] ? schedule+0x6b/0x7c
> [ 1320.927370]  [<ffffffff813a111b>] ? schedule_timeout+0x2c/0x123
> [ 1320.927374]  [<ffffffff8102c27b>] ? _flat_send_IPI_mask+0x68/0x78
> [ 1320.927378]  [<ffffffff810651ab>] ? get_parent_ip+0x9/0x1b
> [ 1320.927381]  [<ffffffff813a5c20>] ? add_preempt_count+0xb7/0xe0
> [ 1320.927382]  [<ffffffff813a188b>] ? __wait_for_common+0x78/0xd6
> [ 1320.927394]  [<ffffffff810549bb>] ? flush_work+0xf6/0x119
> [ 1320.927396]  [<ffffffff8105344b>] ? create_and_start_worker+0x5e/0x5e
> [ 1320.927400]  [<ffffffff810d7309>] ? __pagevec_release+0x20/0x20
> [ 1320.927402]  [<ffffffff810564cc>] ? schedule_on_each_cpu+0x9e/0xd5
> [ 1320.927422]  [<ffffffff8113f5a0>] ? invalidate_bdev+0x1d/0x2e
> [ 1320.927425]  [<ffffffff8113f5f4>] ? __invalidate_device+0x43/0x4b
> [ 1320.927427]  [<ffffffff811bf94d>] ? invalidate_partition+0x27/0x41
> [ 1320.927429]  [<ffffffff811c09f9>] ? del_gendisk+0x86/0x1bb
> [ 1320.927435]  [<ffffffffa00de643>] ? sd_remove+0x5f/0x98 [sd_mod]
> [ 1320.927443]  [<ffffffff8128d2a0>] ? __device_release_driver+0x7f/0xd5
> [ 1320.927444]  [<ffffffff8128d53a>] ? device_release_driver+0x1a/0x25
> [ 1320.927446]  [<ffffffff8128cee7>] ? bus_remove_device+0xe7/0xfc
> [ 1320.927448]  [<ffffffff8128ada0>] ? device_del+0x120/0x176
> [ 1320.927455]  [<ffffffffa0029a0f>] ? __scsi_remove_device+0x4d/0xaf [scsi_mod]
> [ 1320.927461]  [<ffffffffa002898c>] ? scsi_forget_host+0x48/0x68 [scsi_mod]
> [ 1320.927467]  [<ffffffffa002025c>] ? scsi_remove_host+0x85/0x101 [scsi_mod]
> [ 1320.927472]  [<ffffffffa00d1f7c>] ? usb_stor_disconnect+0x63/0xae
> [usb_storage]
> [ 1320.927479]  [<ffffffffa0071d31>] ? usb_unbind_interface+0x5e/0x135 [usbcore]
> [ 1320.927488]  [<ffffffff8128d2a0>] ? __device_release_driver+0x7f/0xd5
> [ 1320.927490]  [<ffffffff8128d53a>] ? device_release_driver+0x1a/0x25
> [ 1320.927491]  [<ffffffff8128cee7>] ? bus_remove_device+0xe7/0xfc
> [ 1320.927493]  [<ffffffff8128ada0>] ? device_del+0x120/0x176
> [ 1320.927501]  [<ffffffffa0070025>] ? usb_disable_device+0x6a/0x180 [usbcore]
> [ 1320.927509]  [<ffffffffa0069507>] ? usb_disconnect+0x79/0x151 [usbcore]
> [ 1320.927515]  [<ffffffffa006abee>] ? hub_thread+0x60d/0xea2 [usbcore]
> [ 1320.927518]  [<ffffffff8105b309>] ? abort_exclusive_wait+0x7f/0x7f
> [ 1320.927524]  [<ffffffffa006a5e1>] ? hub_port_debounce+0xcf/0xcf [usbcore]
> [ 1320.927526]  [<ffffffff8105a909>] ? kthread+0x81/0x89
> [ 1320.927528]  [<ffffffff8105a888>] ? __kthread_parkme+0x5c/0x5c
> [ 1320.927530]  [<ffffffff813a75fc>] ? ret_from_fork+0x7c/0xb0
> [ 1320.927532]  [<ffffffff8105a888>] ? __kthread_parkme+0x5c/0x5c

Your USB device has disconnected and gone down the device
removal/invalidate-partition route. That's trying to flush the
device, which is stuck on IO completion, which is in turn stuck
waiting for the device error handling to error those IOs out.

So, this is a block device error handling problem: device unplug
gets stuck because it asks the filesystem to complete operations
that can't complete until the device error handling progresses far
enough to error out the IOs that the filesystem is waiting on.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

