From mboxrd@z Thu Jan 1 00:00:00 1970
From: Anton Cook
Subject: Possible SMP bug: Linux kernel 2.4.18, Xeon
Date: Wed, 11 Jun 2003 11:03:22 +1000
Sender: linux-smp-owner@vger.kernel.org
Message-ID: <01C33009.13ADC4E0@ACOOK>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 8BIT
Return-path:
List-Id:
To: "'linux-smp@vger.kernel.org.'"

We are having an intermittent 'hang' with Linux 2.4.18 on a dual Xeon CPU SMP enabled host:

[1] The hang occurs on a production web server/database application system approximately once a week. During a hang there is no HDD activity. One console dump (see below) showed 'Divide error: 0000' followed by 'Kernel panic: Aiee, killing interrupt handler'. In some cases the host is still 'pingable' after the hang.

Examination of the console dump that we obtained points to the DIV instruction in smp_local_timer_interrupt, called from smp_apic_timer_interrupt. This is the divide at the line:

    __setup_APIC_LVTT(calibration_result/prof_counter[cpu]);

It seems likely that the problem is triggered by high HDD activity. We have seen it most often when we run 'rsync' for daily backups.

[2] Kernel version - 2.4.18. Slackware 8

[3] Kernel build options include:
    CONFIG_M586=y
    CONFIG_SMP=y
    CONFIG_HIGHMEM4G=y (There is 2GB RAM)

[4] Environment.
    Processors: 2 x Intel Xeon 2.0GHz
    Mainboard: Intel SE7500CW
    Disk drive configuration: 4 x IDE ST312028A
    - Software RAID1 array of 4 partitions on one pair of HDDs
    - the remaining 2 HDDs are not currently used

Any advice greatly appreciated.

Anton Cook

The console dump (copied from a handwritten transcription, for what it is worth) is reproduced below:

Divide error: 0000
CPU - 1070834432
EIP - 0010[cc0110f30] Not tainted
E flag: 00010046
eax: 000f35eb ebx: 00b17400 ecx: 00000000 edx: 00000000
esi: 0558ba00 edi: 00000000 ebp: c02c5d00 esp: c195b758
ds: 0018 es: 0018 ss: 0018
Process (pid:0 stackpage:d953b00)
(Too much indistinct handwritten data for stack and call trace to reproduce.)
Code: f7 f1 50 e8 dc ?? ??
8b 83 00 56 20 c0 83 c4 82 83 80
<0> Kernel panic: Aiee, killing interrupt handler
In interrupt handler - not syncing

This email and any attachments are confidential and may be subject to copyright, legal or some other professional privilege. They are intended solely for the attention and use of the named addressee(s). They may only be copied, distributed or disclosed with the consent of the copyright owner. If you have received this email by mistake or by breach of the confidentiality clause, please notify the sender immediately by return email and delete or destroy all copies of the email. Any confidentiality, privilege or copyright is not waived or lost because this email has been sent to you by mistake.
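
A note on reading the dump above: the first two bytes on the Code: line, f7 f1, can be decoded by hand to see which register the failing divide used. The sketch below is illustrative only. It assumes the Code: bytes start at the faulting instruction and that the handwritten register values were copied correctly; neither is certain here. The helper names div_operand and div_would_fault are made up for this example and are not kernel functions.

```c
#include <stddef.h>
#include <stdint.h>

/* 32-bit x86 register names, indexed by the 3-bit r/m field of ModRM. */
static const char *const reg32[8] = {
    "eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"
};

/*
 * Decode a two-byte "F7 /6" instruction (unsigned DIV r/m32) whose
 * operand is a register. Returns the divisor register name, or NULL
 * if the bytes are not a register-form DIV.
 */
const char *div_operand(uint8_t opcode, uint8_t modrm)
{
    unsigned mod = modrm >> 6;        /* 3 means register operand     */
    unsigned ext = (modrm >> 3) & 7;  /* opcode extension, 6 is DIV   */
    unsigned rm  = modrm & 7;         /* register number              */

    if (opcode != 0xF7 || ext != 6 || mod != 3)
        return NULL;
    return reg32[rm];
}

/*
 * DIV r/m32 divides the 64-bit value edx:eax by the operand. The CPU
 * raises a divide error if the operand is zero or if the quotient
 * does not fit in 32 bits.
 */
int div_would_fault(uint32_t edx, uint32_t eax, uint32_t divisor)
{
    uint64_t dividend;

    if (divisor == 0)
        return 1;
    dividend = ((uint64_t)edx << 32) | eax;
    return dividend / divisor > UINT32_MAX;
}
```

With the values from the dump, div_operand(0xF7, 0xF1) names ecx as the divisor, and div_would_fault(0, 0x000f35eb, 0) reports a fault because ecx is recorded as 00000000. If the transcription is accurate, that would be consistent with prof_counter[cpu] being zero when the divide ran, though the dump alone does not show how it got that value.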