From mboxrd@z Thu Jan 1 00:00:00 1970
From: Lee Howard
Subject: Re: BUG: soft lockup - CPU#0 stuck for 10s! [md2_raid1:358]
Date: Tue, 20 Oct 2009 22:24:32 -0700
Message-ID: <4ADE9B10.2030204@howardsilvan.com>
References: <70ed7c3e0910202201g13ffa18di7eddd625ffca52fc@mail.gmail.com> <70ed7c3e0910202202y53231834y639db36af6e964db@mail.gmail.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Return-path:
In-Reply-To: <70ed7c3e0910202202y53231834y639db36af6e964db@mail.gmail.com>
Sender: linux-raid-owner@vger.kernel.org
To: "Majed B."
Cc: Steven Haigh, linux-raid@vger.kernel.org
List-Id: linux-raid.ids

I've been deliberately monitoring the kernel via the git web interfaces, and I can't yet see the patch committed that supposedly fixed this. (Please correct me if it was actually committed.)

While a single 10s stuck CPU may not be serious, it *is* serious when it happens over and over and over again consecutively (like it does in my case).

Thanks,

Lee.

Majed B. wrote:
> And it's not serious.
>
> On Wed, Oct 21, 2009 at 8:01 AM, Majed B. wrote:
>
>> Hello,
>>
>> I believe this has been fixed in 2.6.30 or 2.6.31.
>>
>> On Wed, Oct 21, 2009 at 5:46 AM, Steven Haigh wrote:
>>
>>> When trying to run a check using:
>>> echo check > /sys/block/md2/md/sync_action
>>>
>>> I got the following errors printed to the console:
>>>
>>> Oct 21 13:31:03 wireless kernel: md: syncing RAID array md2
>>> Oct 21 13:31:03 wireless kernel: md: minimum _guaranteed_ reconstruction
>>> speed: 1000 KB/sec/disc.
>>> Oct 21 13:31:03 wireless kernel: md: using maximum available idle IO
>>> bandwidth (but not more than 20000 KB/sec) for reconstruction.
>>> Oct 21 13:31:03 wireless kernel: md: using 128k window, over a total of
>>> 300511808 blocks.
>>> BUG: soft lockup - CPU#0 stuck for 10s! [md2_raid1:358]
>>>
>>> Pid: 358, comm: md2_raid1
>>> EIP: 0060:[] CPU: 0
>>> EIP is at memcmp+0xd/0x22
>>> EFLAGS: 00000202 Not tainted (2.6.18-164.el5 #1)
>>> EAX: 00000000 EBX: e2826fe0 ECX: d15f3fe0 EDX: 00000000
>>> ESI: 00000020 EDI: 00000090 EBP: f70b8e40 DS: 007b ES: 007b
>>> CR0: 8005003b CR2: 0806af70 CR3: 37872000 CR4: 000006d0
>>> [] raid1d+0x270/0xbea [raid1]
>>> [] schedule+0x9cc/0xa55
>>> [] schedule_timeout+0x13/0x8c
>>> [] md_thread+0xdf/0xf5
>>> [] autoremove_wake_function+0x0/0x2d
>>> [] md_thread+0x0/0xf5
>>> [] kthread+0xc0/0xeb
>>> [] kthread+0x0/0xeb
>>> [] kernel_thread_helper+0x7/0x10
>>> =======================
>>> Oct 21 13:37:50 wireless kernel: BUG: soft lockup - CPU#0 stuck for 10s!
>>> [md2_raid1:358]
>>> Oct 21 13:37:50 wireless kernel:
>>> Oct 21 13:37:50 wireless kernel: Pid: 358, comm: md2_raid1
>>> Oct 21 13:37:50 wireless kernel: EIP: 0060:[] CPU: 0
>>> Oct 21 13:37:50 wireless kernel: EIP is at memcmp+0xd/0x22
>>> Oct 21 13:37:50 wireless kernel: EFLAGS: 00000202 Not tainted
>>> (2.6.18-164.el5 #1)
>>> Oct 21 13:37:50 wireless kernel: EAX: 00000000 EBX: e2826fe0 ECX: d15f3fe0
>>> EDX: 00000000
>>> Oct 21 13:37:50 wireless kernel: ESI: 00000020 EDI: 00000090 EBP: f70b8e40
>>> DS: 007b ES: 007b
>>> Oct 21 13:37:50 wireless kernel: CR0: 8005003b CR2: 0806af70 CR3: 37872000
>>> CR4: 000006d0
>>> Oct 21 13:37:50 wireless kernel: [] raid1d+0x270/0xbea [raid1]
>>> Oct 21 13:37:50 wireless kernel: [] schedule+0x9cc/0xa55
>>> Oct 21 13:37:50 wireless kernel: [] schedule_timeout+0x13/0x8c
>>> Oct 21 13:37:50 wireless kernel: [] md_thread+0xdf/0xf5
>>> Oct 21 13:37:51 wireless kernel: []
>>> autoremove_wake_function+0x0/0x2d
>>> Oct 21 13:37:51 wireless kernel: [] md_thread+0x0/0xf5
>>> Oct 21 13:37:51 wireless kernel: [] kthread+0xc0/0xeb
>>> Oct 21 13:37:51 wireless kernel: [] kthread+0x0/0xeb
>>> Oct 21 13:37:51 wireless kernel: [] kernel_thread_helper+0x7/0x10
>>> Oct 21 13:37:51 wireless kernel: =======================
>>>
>>> This is using CentOS 5.3 with Kernel 2.6.18-164.el5 on an i686.
>>>
>>> Is this a serious type error? Is there anything else I can supply to
>>> diagnose things more?
>>>
>>> # mdadm --detail /dev/md2
>>> /dev/md2:
>>>         Version : 00.90.03
>>>   Creation Time : Mon Feb 23 17:15:41 2009
>>>      Raid Level : raid1
>>>      Array Size : 300511808 (286.59 GiB 307.72 GB)
>>>   Used Dev Size : 300511808 (286.59 GiB 307.72 GB)
>>>    Raid Devices : 2
>>>   Total Devices : 2
>>> Preferred Minor : 2
>>>     Persistence : Superblock is persistent
>>>
>>>     Update Time : Wed Oct 21 13:46:28 2009
>>>           State : clean, resyncing
>>>  Active Devices : 2
>>> Working Devices : 2
>>>  Failed Devices : 0
>>>   Spare Devices : 0
>>>
>>>  Rebuild Status : 5% complete
>>>
>>>            UUID : fed99e3d:d08fdcc9:b9593a45:2cc09736
>>>          Events : 0.30584
>>>
>>>     Number   Major   Minor   RaidDevice State
>>>        0       3        3        0      active sync   /dev/hda3
>>>        1      22        3        1      active sync   /dev/hdc3
>>>
>>>
>>> --
>>> Steven Haigh
>>>
>>> Email: netwiz@crc.id.au
>>> Web: http://www.crc.id.au
>>> Phone: (03) 9001 6090 - 0412 935 897
>>>
>>>
>>>
>>>
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
>>> the body of a message to majordomo@vger.kernel.org
>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>>
>>>
>>
>> --
>> Majed B.
>>
>>
>
>
>
>