From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753969AbaIKJxG (ORCPT ); Thu, 11 Sep 2014 05:53:06 -0400
Received: from setup.z80.it ([83.103.80.66]:44235 "EHLO setup.z80.it" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753516AbaIKJxC (ORCPT ); Thu, 11 Sep 2014 05:53:02 -0400
X-Greylist: delayed 841 seconds by postgrey-1.27 at vger.kernel.org; Thu, 11 Sep 2014 05:53:01 EDT
From: AndreaML
Reply-To: andreaml@z80.it
Organization: z80.it
To: linux-kernel@vger.kernel.org
Subject: IRQ problem with the AACRAID driver with debian wheezy kernel 3.2.60-1+deb7u3 x86_64
Date: Thu, 11 Sep 2014 11:38:58 +0200
User-Agent: KMail/1.13.7 (Linux/3.2.0-4-amd64; KDE/4.8.4; x86_64; ; )
MIME-Version: 1.0
Content-Type: Text/Plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Message-Id: <201409111138.58874.andreaml@z80.it>
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

Hello,

I'm writing here after a very long time spent trying to debug a problem I have on a host I use for backup purposes. My google-fu hasn't helped in this case. The problem has been there *at least* since the start of the year (I don't remember exactly when it started).

This is a PC I assembled in a 19" rack case and put in my small server farm, connected to the tape libraries, to be used as a backup server with rsync and bacula.

The server is equipped with an Adaptec 2100S storage RAID controller with 4 SATA disks for storage, plus 2 SATA disks connected directly to the motherboard in an md mirror configuration for the system to reside on. I plan to attach a second 2100S I have around to get more disk space.

The OS I have installed is Debian wheezy, completely updated as of today, with kernel version "Linux frodo 3.2.0-4-amd64 #1 SMP Debian 3.2.60-1+deb7u3 x86_64 GNU/Linux".

The problem is that at least once a day the kernel complains that it received an interrupt on IRQ #18, which corresponds to the 2100S storage controller, that wasn't serviced by any driver... even though, as confirmed by lspci -nnvv, the device is attached to aacraid (see below).

When the kernel complains about IRQ #18, my system becomes sloooowww... to the point that even a sync on the command line becomes unresponsive, and I need to resort to the magic SysRq trick (echo b > /proc/sysrq-trigger) or the physical reset button to regain access to the system.

I'm really, really out of ideas on how to resolve this. Can you folks give me some hints? And yes, I have tried the irqpoll option as suggested in the kernel message, with no results.

The data I have follows, in compact form.

-- Andrea --

The motherboard BIOS and the 2100S BIOS are at the latest images available from the manufacturers.

The kernel complains in this manner:

Sep 10 22:01:11 frodo kernel: [215125.931642] irq 18: nobody cared (try booting with the "irqpoll" option)
Sep 10 22:01:11 frodo kernel: [215125.931703] Pid: 0, comm: swapper/0 Not tainted 3.2.0-4-amd64 #1 Debian 3.2.60-1+deb7u3
Sep 10 22:01:11 frodo kernel: [215125.931706] Call Trace:
Sep 10 22:01:11 frodo kernel: [215125.931708] [] ? __report_bad_irq+0x2c/0xb5
Sep 10 22:01:11 frodo kernel: [215125.931720] [] ? note_interrupt+0x170/0x1f2
Sep 10 22:01:11 frodo kernel: [215125.931725] [] ? handle_irq_event_percpu+0x15f/0x17d
Sep 10 22:01:11 frodo kernel: [215125.931729] [] ? handle_irq_event+0x34/0x52
Sep 10 22:01:11 frodo kernel: [215125.931734] [] ? arch_local_irq_save+0x11/0x17
Sep 10 22:01:11 frodo kernel: [215125.931738] [] ? handle_fasteoi_irq+0x7c/0xaf
Sep 10 22:01:11 frodo kernel: [215125.931744] [] ? handle_irq+0x1d/0x21
Sep 10 22:01:11 frodo kernel: [215125.931748] [] ? do_IRQ+0x42/0x98
Sep 10 22:01:11 frodo kernel: [215125.931753] [] ? common_interrupt+0x6e/0x6e
Sep 10 22:01:11 frodo kernel: [215125.931755] [] ? ktime_get+0x50/0x86
Sep 10 22:01:11 frodo kernel: [215125.931763] [] ? intel_idle+0xea/0x119
Sep 10 22:01:11 frodo kernel: [215125.931767] [] ? intel_idle+0xc9/0x119
Sep 10 22:01:11 frodo kernel: [215125.931771] [] ? cpuidle_idle_call+0xec/0x179
Sep 10 22:01:11 frodo kernel: [215125.931775] [] ? cpu_idle+0xa5/0xf2
Sep 10 22:01:11 frodo kernel: [215125.931780] [] ? start_kernel+0x3b8/0x3c3
Sep 10 22:01:11 frodo kernel: [215125.931785] [] ? early_idt_handlers+0x140/0x140
Sep 10 22:01:11 frodo kernel: [215125.931789] [] ? x86_64_start_kernel+0x104/0x111
Sep 10 22:01:11 frodo kernel: [215125.931792] handlers:
Sep 10 22:01:11 frodo kernel: [215125.931824] [] aac_rx_intr_message
Sep 10 22:01:11 frodo kernel: [215125.931864] Disabling IRQ #18

and after a while (obviously):

Sep 11 00:44:13 frodo kernel: [224892.536925] INFO: task rsync:7383 blocked for more than 120 seconds.
Sep 11 00:44:13 frodo kernel: [224892.536971] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Sep 11 00:44:13 frodo kernel: [224892.537023] rsync D ffff88011ed93780 0 7383 7363 0x00000000
Sep 11 00:44:13 frodo kernel: [224892.537028] ffff8801169150c0 0000000000000082 ffffffff00000000 ffff880119f55100
Sep 11 00:44:13 frodo kernel: [224892.537033] 0000000000013780 ffff88002d907fd8 ffff88002d907fd8 ffff8801169150c0
Sep 11 00:44:13 frodo kernel: [224892.537038] ffff88002d907c74 0000000100000004 ffffffff8135049f ffff88010dca2210
Sep 11 00:44:13 frodo kernel: [224892.537042] Call Trace:
Sep 11 00:44:13 frodo kernel: [224892.537050] [] ? _raw_spin_unlock_irqrestore+0xe/0xf
Sep 11 00:44:13 frodo kernel: [224892.537055] [] ? __mutex_lock_common.isra.5+0xff/0x164
Sep 11 00:44:13 frodo kernel: [224892.537059] [] ? mutex_lock+0x1a/0x2d
Sep 11 00:44:13 frodo kernel: [224892.537063] [] ? walk_component+0x1f4/0x406
Sep 11 00:44:13 frodo kernel: [224892.537069] [] ? generic_file_aio_read+0x570/0x5cf
Sep 11 00:44:13 frodo kernel: [224892.537072] [] ? path_lookupat+0x7c/0x2bd
Sep 11 00:44:13 frodo kernel: [224892.537076] [] ? should_resched+0x5/0x23
Sep 11 00:44:13 frodo kernel: [224892.537079] [] ? _cond_resched+0x7/0x1c
Sep 11 00:44:13 frodo kernel: [224892.537083] [] ? do_path_lookup+0x1c/0x87
Sep 11 00:44:13 frodo kernel: [224892.537087] [] ? user_path_at_empty+0x47/0x7b
Sep 11 00:44:13 frodo kernel: [224892.537091] [] ? do_sync_read+0xb4/0xec
Sep 11 00:44:13 frodo kernel: [224892.537095] [] ? force_quiescent_state+0x19/0x178
Sep 11 00:44:13 frodo kernel: [224892.537098] [] ? vfs_fstatat+0x32/0x60
Sep 11 00:44:13 frodo kernel: [224892.537101] [] ? fput+0x17a/0x1a1
Sep 11 00:44:13 frodo kernel: [224892.537104] [] ? sys_newlstat+0x12/0x2b
Sep 11 00:44:13 frodo kernel: [224892.537107] [] ? mntput_no_expire+0x1e/0xc9
Sep 11 00:44:13 frodo kernel: [224892.537111] [] ? filp_close+0x62/0x6a
Sep 11 00:44:13 frodo kernel: [224892.537114] [] ? sys_close+0x8e/0xcb
Sep 11 00:44:13 frodo kernel: [224892.537117] [] ? system_call_fastpath+0x16/0x1b

--- kernel driver version

root@frodo:~# cat /sys/bus/pci/drivers/aacraid/module/version
1.1-7[28000]-ms

--- excerpt from hwinfo

str1: "ASUSTeK COMPUTER INC."
str2: "P8H61"
str2: "Intel"
str3: "Intel(R) Core(TM) i3-2100T CPU @ 2.50GHz"

--- excerpt from lspci -nnvv

05:02.0 RAID bus controller [0104]: Adaptec AAC-RAID [9005:0285] (rev 01)
	Subsystem: Adaptec AAR-2410SA PCI SATA 4ch (Jaguar II) [9005:0290]
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=slow >TAbort- SERR-