From mboxrd@z Thu Jan 1 00:00:00 1970
From: bugzilla-daemon@bugzilla.kernel.org
Subject: [Bug 46031] New: kswapd0 moving to uninterruptible sleep (STAT D)
Date: Thu, 16 Aug 2012 12:40:13 +0000 (UTC)
Message-ID:
Mime-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Return-path:
Received: from mail.kernel.org ([198.145.19.201]:44285 "EHLO mail.kernel.org"
rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
id S1754724Ab2HPMkS (ORCPT );
Thu, 16 Aug 2012 08:40:18 -0400
Received: from mail.kernel.org (localhost [127.0.0.1])
by mail.kernel.org (Postfix) with ESMTP id CC5332025B
for ; Thu, 16 Aug 2012 12:40:16 +0000 (UTC)
Received: from bugzilla.kernel.org (unknown [198.145.19.217])
by mail.kernel.org (Postfix) with ESMTP id 0B8C420223
for ; Thu, 16 Aug 2012 12:40:14 +0000 (UTC)
Sender: linux-scsi-owner@vger.kernel.org
List-Id: linux-scsi@vger.kernel.org
To: linux-scsi@vger.kernel.org
https://bugzilla.kernel.org/show_bug.cgi?id=46031
Summary: kswapd0 moving to uninterruptible sleep (STAT D)
Product: IO/Storage
Version: 2.5
Kernel Version: 3.5.2
Platform: All
OS/Version: Linux
Tree: Mainline
Status: NEW
Severity: normal
Priority: P1
Component: SCSI
AssignedTo: linux-scsi@vger.kernel.org
ReportedBy: Markus.Hetzmannseder@jku.at
Regression: No
Hi,
I have a problem with my little server hanging. The hardware is a Dell PowerEdge
SC1430 with mirrored hard drives connected to the PERC 5/i adapter, which uses
the megaraid/megasas SCSI driver.
The problem occurs especially under heavy disk I/O, such as an update of the
file name database.
The system is running in x86 PAE mode with 8GB RAM installed. So far I have
tried kernels 3.1.4 and 3.6.0-rc1, and I am now running 3.5.2.
According to kern.log it is always the kswapd0 process that starts to hang in
STAT D mode. After that, more and more processes enter STAT D and the
system becomes practically unusable. In that state a login over the network
is still possible. A normal reboot no longer works (it keeps waiting to
kill some processes); only a reboot -f does the job.
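To see which processes are currently in STAT D, the state field of
/proc/<pid>/stat can be read directly. The following is only a minimal sketch:
the function names parse_stat_state and list_d_state_tasks are made up for
illustration, and it assumes the usual "pid (comm) state ..." layout of that
file.

```python
import os


def parse_stat_state(stat_line):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the first '(' and the last ')'.
    left = stat_line.index("(")
    right = stat_line.rindex(")")
    pid = int(stat_line[:left].strip())
    comm = stat_line[left + 1:right]
    state = stat_line[right + 1:].split()[0]
    return pid, comm, state


def list_d_state_tasks(proc_root="/proc"):
    """Return a sorted list of (pid, comm) for tasks in uninterruptible sleep."""
    if not os.path.isdir(proc_root):
        return []
    tasks = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat_state(f.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited meanwhile, or the file was unreadable
        if state == "D":
            tasks.append((pid, comm))
    return sorted(tasks)


if __name__ == "__main__":
    for pid, comm in list_d_state_tasks():
        print(pid, comm)
```

Run periodically while the problem builds up, this should show kswapd0 first
and then the growing list of blocked processes.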
When the error occurs, /proc/sys/kernel/tainted has the value 512.
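The tainted value is a bitmask, so 512 can be decoded bit by bit. A minimal
sketch, assuming the standard kernel taint flag letters; the table below is an
illustrative subset covering bits 0 to 12 only, and the names TAINT_FLAGS and
decode_taint are made up for this example.

```python
# Bit position -> (flag letter, short meaning). Illustrative subset only.
TAINT_FLAGS = {
    0: ("P", "proprietary module loaded"),
    1: ("F", "module force-loaded"),
    2: ("S", "SMP or out-of-specification hardware condition"),
    3: ("R", "module force-unloaded"),
    4: ("M", "machine check exception occurred"),
    5: ("B", "bad page referenced"),
    6: ("U", "taint requested by userspace"),
    7: ("D", "kernel died recently (OOPS or BUG)"),
    8: ("A", "ACPI table overridden"),
    9: ("W", "kernel issued a warning"),
    10: ("C", "staging driver loaded"),
    11: ("I", "workaround for platform firmware bug applied"),
    12: ("O", "out-of-tree module loaded"),
}


def decode_taint(value):
    """Return a list of (bit, letter, meaning) for every bit set in value."""
    flags = []
    bit = 0
    while value >> bit:
        if value & (1 << bit):
            # Bits not in the table are reported with a placeholder letter.
            letter, meaning = TAINT_FLAGS.get(bit, ("?", "not in this table"))
            flags.append((bit, letter, meaning))
        bit += 1
    return flags


if __name__ == "__main__":
    for bit, letter, meaning in decode_taint(512):
        print(bit, letter, meaning)
```

For 512 only bit 9 is set, which corresponds to the W flag that the kernel sets
when it emits a WARNING. That would match the jbd warning in the log, and would
mean the taint value itself says nothing beyond that warning.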
In the attachment I have added all the kern.log output I have collected so far.
In kern.log I see something like this:
-----------------------------------------------------------------
Aug 16 11:49:57 servername kernel: [ 7361.062388] WARNING: at
fs/jbd/journal.c:469 __log_start_commit+0x6b/0x7e()
Aug 16 11:49:57 servername kernel: [ 7361.062391] Hardware name: PowerEdge
SC1430
Aug 16 11:49:57 servername kernel: [ 7361.062393] jbd: bad log_start_commit:
2168023832 2168023832 0 0
Aug 16 11:49:57 servername kernel: [ 7361.062395] Modules linked in: ppdev lp
bluetooth rfkill mperf cpufreq_conservative cpufreq_userspace cpufreq_powersave
cpufreq_stats nfsd exportfs nfs lockd fscache auth_rpcgss nfs_acl sunrpc fuse
loop psmouse lpc_ich mfd_core i5000_edac edac_core serio_raw evdev tpm_tis
pcspkr tpm shpchp hid_generic coretemp rng_core dcdbas tpm_bios i5k_amb
pci_hotplug microcode parport_pc processor button parport thermal_sys usbhid
hid uhci_hcd sg sr_mod tg3 cdrom ehci_hcd libphy usbcore usb_common sd_mod
crc_t10dif [last unloaded: scsi_wait_scan]
Aug 16 11:49:57 servername kernel: [ 7361.062454] Pid: 46, comm: kswapd0 Not
tainted 3.5.2 #1
Aug 16 11:49:57 servername kernel: [ 7361.062456] Call Trace:
Aug 16 11:49:57 servername kernel: [ 7361.062464] [] ?
warn_slowpath_common+0x6a/0x7b
Aug 16 11:49:57 servername kernel: [ 7361.062468] [] ?
__log_start_commit+0x6b/0x7e
Aug 16 11:49:57 servername kernel: [ 7361.062472] [] ?
warn_slowpath_fmt+0x28/0x2c
Aug 16 11:49:57 servername kernel: [ 7361.062476] [] ?
__log_start_commit+0x6b/0x7e
Aug 16 11:49:57 servername kernel: [ 7361.062480] [] ?
log_start_commit+0x1b/0x22
Aug 16 11:49:57 servername kernel: [ 7361.062484] [] ?
ext3_evict_inode+0xbe/0x1cc
Aug 16 11:49:57 servername kernel: [ 7361.062489] [] ?
evict+0x8a/0x126
Aug 16 11:49:57 servername kernel: [ 7361.062492] [] ?
dispose_list+0x2e/0x37
Aug 16 11:49:57 servername kernel: [ 7361.062496] [] ?
prune_icache_sb+0x27f/0x287
Aug 16 11:49:57 servername kernel: [ 7361.062501] [] ?
prune_super+0xa2/0xf5
Aug 16 11:49:57 servername kernel: [ 7361.062506] [] ?
shrink_slab+0x1b7/0x254
Aug 16 11:49:57 servername kernel: [ 7361.062509] [] ?
kswapd+0x54f/0x805
Aug 16 11:49:57 servername kernel: [ 7361.062515] [] ?
wake_up_bit+0x56/0x56
Aug 16 11:49:57 servername kernel: [ 7361.062519] [] ?
try_to_free_pages+0xd5/0xd5
Aug 16 11:49:57 servername kernel: [ 7361.062522] [] ?
kthread+0x68/0x6d
Aug 16 11:49:57 servername kernel: [ 7361.062526] [] ?
kthread_freezable_should_stop+0x45/0x45
Aug 16 11:49:57 servername kernel: [ 7361.062531] [] ?
kernel_thread_helper+0x6/0xd
Aug 16 11:49:57 servername kernel: [ 7361.062534] ---[ end trace
7f2284fed89c7a03 ]---
Aug 16 12:33:17 servername kernel: [ 9960.684081] INFO: task acroread:3117
blocked for more than 120 seconds.
Aug 16 12:33:17 servername kernel: [ 9960.684116] "echo 0 >
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 16 12:33:17 servername kernel: [ 9960.684162] acroread D 00000000
0 3117 3115 0x00000000
Aug 16 12:33:17 servername kernel: [ 9960.684179] f0ef69a0 00200082 00000001
00000000 c6b6ddac 00000002 39abe377 c1514dc0
Aug 16 12:33:17 servername kernel: [ 9960.684186] c6b6ddac c2c0dd38 c1514dc0
c1514dc0 f0ef69a0 c1514dc0 0101b7ba 00000020
Aug 16 12:33:17 servername kernel: [ 9960.684192] c10d7899 c2c0ddb0 009e8d67
00000000 da7ff09c c6b6ddac 0000000b ce221700
Aug 16 12:33:17 servername kernel: [ 9960.684199] Call Trace:
Aug 16 12:33:17 servername kernel: [ 9960.684210] [] ?
mntput_no_expire+0x15/0xf1
Aug 16 12:33:17 servername kernel: [ 9960.684215] [] ?
search_dirblock+0x5f/0x93
Aug 16 12:33:17 servername kernel: [ 9960.684221] [] ?
prepare_to_wait+0x14/0x52
Aug 16 12:33:17 servername kernel: [ 9960.684225] [] ?
__wait_on_freeing_inode+0x6e/0x88
Aug 16 12:33:17 servername kernel: [ 9960.684229] [] ?
autoremove_wake_function+0x29/0x29
Aug 16 12:33:17 servername kernel: [ 9960.684232] [] ?
find_inode_fast+0x35/0x6d
Aug 16 12:33:17 servername kernel: [ 9960.684236] [] ?
iget_locked+0x2f/0xd5
Aug 16 12:33:17 servername kernel: [ 9960.684240] [] ?
ext3_iget+0x18/0x332
Aug 16 12:33:17 servername kernel: [ 9960.684243] [] ?
ext3_lookup+0x5d/0x9b
Aug 16 12:33:17 servername kernel: [ 9960.684248] [] ?
__lookup_hash+0x8f/0xa8
Aug 16 12:33:17 servername kernel: [ 9960.684251] [] ?
lookup_slow+0x2c/0x78
Aug 16 12:33:17 servername kernel: [ 9960.684255] [] ?
walk_component+0x48/0xe8
Aug 16 12:33:17 servername kernel: [ 9960.684259] [] ?
path_lookupat+0xa4/0x2a6
Aug 16 12:33:17 servername kernel: [ 9960.684264] [] ?
free_hot_cold_page_list+0x4a/0x60
Aug 16 12:33:17 servername kernel: [ 9960.684268] [] ?
do_path_lookup+0x1b/0x85
Aug 16 12:33:17 servername kernel: [ 9960.684271] [] ?
user_path_at_empty+0x3d/0x65
Aug 16 12:33:17 servername kernel: [ 9960.684277] [] ?
handle_mm_fault+0x118/0x129
Aug 16 12:33:17 servername kernel: [ 9960.684281] [] ?
user_path_at+0xb/0xe
Aug 16 12:33:17 servername kernel: [ 9960.684284] [] ?
vfs_fstatat+0x3d/0x63
Aug 16 12:33:17 servername kernel: [ 9960.684287] [] ?
vfs_stat+0x10/0x12
Aug 16 12:33:17 servername kernel: [ 9960.684290] [] ?
sys_stat64+0xf/0x23
Aug 16 12:33:17 servername kernel: [ 9960.684295] [] ?
spurious_fault+0xe5/0xe5
Aug 16 12:33:17 servername kernel: [ 9960.684299] [] ?
sysenter_do_call+0x12/0x22
Aug 16 12:35:17 servername kernel: [10080.684102] INFO: task acroread:3117
blocked for more than 120 seconds.
Aug 16 12:35:17 servername kernel: [10080.684138] "echo 0 >
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 16 12:35:17 servername kernel: [10080.684183] acroread D 00000000
0 3117 3115 0x00000000
Aug 16 12:35:17 servername kernel: [10080.684200] f0ef69a0 00200082 00000001
00000000 c6b6ddac 00000002 39abe377 c1514dc0
Aug 16 12:35:17 servername kernel: [10080.684207] c6b6ddac c2c0dd38 c1514dc0
c1514dc0 f0ef69a0 c1514dc0 0101b7ba 00000020
Aug 16 12:35:17 servername kernel: [10080.684214] c10d7899 c2c0ddb0 009e8d67
00000000 da7ff09c c6b6ddac 0000000b ce221700
Aug 16 12:35:17 servername kernel: [10080.684220] Call Trace:
Aug 16 12:35:17 servername kernel: [10080.684231] [] ?
mntput_no_expire+0x15/0xf1
Aug 16 12:35:17 servername kernel: [10080.684237] [] ?
search_dirblock+0x5f/0x93
Aug 16 12:35:17 servername kernel: [10080.684243] [] ?
prepare_to_wait+0x14/0x52
Aug 16 12:35:17 servername kernel: [10080.684247] [] ?
__wait_on_freeing_inode+0x6e/0x88
Aug 16 12:35:17 servername kernel: [10080.684251] [] ?
autoremove_wake_function+0x29/0x29
Aug 16 12:35:17 servername kernel: [10080.684254] [] ?
find_inode_fast+0x35/0x6d
Aug 16 12:35:17 servername kernel: [10080.684258] [] ?
iget_locked+0x2f/0xd5
Aug 16 12:35:17 servername kernel: [10080.684261] [] ?
ext3_iget+0x18/0x332
Aug 16 12:35:17 servername kernel: [10080.684265] [] ?
ext3_lookup+0x5d/0x9b
Aug 16 12:35:17 servername kernel: [10080.684269] [] ?
__lookup_hash+0x8f/0xa8
Aug 16 12:35:17 servername kernel: [10080.684273] [] ?
lookup_slow+0x2c/0x78
Aug 16 12:35:17 servername kernel: [10080.684276] [] ?
walk_component+0x48/0xe8
Aug 16 12:35:17 servername kernel: [10080.684280] [] ?
path_lookupat+0xa4/0x2a6
Aug 16 12:35:17 servername kernel: [10080.684285] [] ?
free_hot_cold_page_list+0x4a/0x60
Aug 16 12:35:17 servername kernel: [10080.684289] [] ?
do_path_lookup+0x1b/0x85
Aug 16 12:35:17 servername kernel: [10080.684292] [] ?
user_path_at_empty+0x3d/0x65
Aug 16 12:35:17 servername kernel: [10080.684298] [] ?
handle_mm_fault+0x118/0x129
Aug 16 12:35:17 servername kernel: [10080.684302] [] ?
user_path_at+0xb/0xe
Aug 16 12:35:17 servername kernel: [10080.684305] [] ?
vfs_fstatat+0x3d/0x63
Aug 16 12:35:17 servername kernel: [10080.684308] [] ?
vfs_stat+0x10/0x12
Aug 16 12:35:17 servername kernel: [10080.684311] [] ?
sys_stat64+0xf/0x23
Aug 16 12:35:17 servername kernel: [10080.684316] [] ?
spurious_fault+0xe5/0xe5
Aug 16 12:35:17 servername kernel: [10080.684320] [] ?
sysenter_do_call+0x12/0x22
--------------------------------------------------------------
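Since the same hung-task report repeats every 120 seconds, it can help to
summarize which tasks are being reported and how often. A minimal sketch,
assuming the message wording shown in the excerpt above; hung_tasks and
HUNG_TASK_RE are names chosen for this example. The pattern tolerates line
breaks inside a message, because the excerpt above is wrapped by the mailer.

```python
import re

# Matches "INFO: task <comm>:<pid> blocked for more than <n> seconds",
# allowing any whitespace (including a line break) between the words.
HUNG_TASK_RE = re.compile(
    r"INFO:\s+task\s+(?P<comm>.+?):(?P<pid>\d+)\s+blocked\s+for\s+more\s+than\s+"
    r"(?P<secs>\d+)\s+seconds"
)


def hung_tasks(log_text):
    """Return {(comm, pid): number of hung-task reports} found in log_text."""
    counts = {}
    for match in HUNG_TASK_RE.finditer(log_text):
        key = (match.group("comm"), int(match.group("pid")))
        counts[key] = counts.get(key, 0) + 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage: python hung_tasks.py /var/log/kern.log  (path is an example)
    with open(sys.argv[1], errors="replace") as log_file:
        for (comm, pid), count in sorted(hung_tasks(log_file.read()).items()):
            print(comm, pid, count)
```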
Any hints on how to get the system back into a stable state?
Markus
--
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.