* BUG: soft lockup - CPU#0 stuck for 10s! [md2_raid1:358]
From: Steven Haigh @ 2009-10-21 2:46 UTC
To: linux-raid
When trying to run a check using:
echo check > /sys/block/md2/md/sync_action
I got the following errors printed to the console:
Oct 21 13:31:03 wireless kernel: md: syncing RAID array md2
Oct 21 13:31:03 wireless kernel: md: minimum _guaranteed_ reconstruction speed: 1000 KB/sec/disc.
Oct 21 13:31:03 wireless kernel: md: using maximum available idle IO bandwidth (but not more than 20000 KB/sec) for reconstruction.
Oct 21 13:31:03 wireless kernel: md: using 128k window, over a total of 300511808 blocks.
BUG: soft lockup - CPU#0 stuck for 10s! [md2_raid1:358]
Pid: 358, comm: md2_raid1
EIP: 0060:[<c04ec1bc>] CPU: 0
EIP is at memcmp+0xd/0x22
EFLAGS: 00000202 Not tainted (2.6.18-164.el5 #1)
EAX: 00000000 EBX: e2826fe0 ECX: d15f3fe0 EDX: 00000000
ESI: 00000020 EDI: 00000090 EBP: f70b8e40 DS: 007b ES: 007b
CR0: 8005003b CR2: 0806af70 CR3: 37872000 CR4: 000006d0
[<f8843c64>] raid1d+0x270/0xbea [raid1]
[<c0616870>] schedule+0x9cc/0xa55
[<c0616f33>] schedule_timeout+0x13/0x8c
[<c05a6b5e>] md_thread+0xdf/0xf5
[<c0434907>] autoremove_wake_function+0x0/0x2d
[<c05a6a7f>] md_thread+0x0/0xf5
[<c0434845>] kthread+0xc0/0xeb
[<c0434785>] kthread+0x0/0xeb
[<c0405c53>] kernel_thread_helper+0x7/0x10
=======================
Oct 21 13:37:50 wireless kernel: BUG: soft lockup - CPU#0 stuck for 10s! [md2_raid1:358]
Oct 21 13:37:50 wireless kernel:
Oct 21 13:37:50 wireless kernel: Pid: 358, comm: md2_raid1
Oct 21 13:37:50 wireless kernel: EIP: 0060:[<c04ec1bc>] CPU: 0
Oct 21 13:37:50 wireless kernel: EIP is at memcmp+0xd/0x22
Oct 21 13:37:50 wireless kernel: EFLAGS: 00000202 Not tainted (2.6.18-164.el5 #1)
Oct 21 13:37:50 wireless kernel: EAX: 00000000 EBX: e2826fe0 ECX: d15f3fe0 EDX: 00000000
Oct 21 13:37:50 wireless kernel: ESI: 00000020 EDI: 00000090 EBP: f70b8e40 DS: 007b ES: 007b
Oct 21 13:37:50 wireless kernel: CR0: 8005003b CR2: 0806af70 CR3: 37872000 CR4: 000006d0
Oct 21 13:37:50 wireless kernel: [<f8843c64>] raid1d+0x270/0xbea [raid1]
Oct 21 13:37:50 wireless kernel: [<c0616870>] schedule+0x9cc/0xa55
Oct 21 13:37:50 wireless kernel: [<c0616f33>] schedule_timeout+0x13/0x8c
Oct 21 13:37:50 wireless kernel: [<c05a6b5e>] md_thread+0xdf/0xf5
Oct 21 13:37:51 wireless kernel: [<c0434907>] autoremove_wake_function+0x0/0x2d
Oct 21 13:37:51 wireless kernel: [<c05a6a7f>] md_thread+0x0/0xf5
Oct 21 13:37:51 wireless kernel: [<c0434845>] kthread+0xc0/0xeb
Oct 21 13:37:51 wireless kernel: [<c0434785>] kthread+0x0/0xeb
Oct 21 13:37:51 wireless kernel: [<c0405c53>] kernel_thread_helper+0x7/0x10
Oct 21 13:37:51 wireless kernel: =======================
This is using CentOS 5.3 with Kernel 2.6.18-164.el5 on an i686.
Is this a serious type of error? Is there anything else I can supply to
help diagnose it?
# mdadm --detail /dev/md2
/dev/md2:
        Version : 00.90.03
  Creation Time : Mon Feb 23 17:15:41 2009
     Raid Level : raid1
     Array Size : 300511808 (286.59 GiB 307.72 GB)
  Used Dev Size : 300511808 (286.59 GiB 307.72 GB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 2
    Persistence : Superblock is persistent

    Update Time : Wed Oct 21 13:46:28 2009
          State : clean, resyncing
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0

 Rebuild Status : 5% complete

           UUID : fed99e3d:d08fdcc9:b9593a45:2cc09736
         Events : 0.30584

    Number   Major   Minor   RaidDevice State
       0       3        3        0      active sync   /dev/hda3
       1      22        3        1      active sync   /dev/hdc3
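Beyond the mdadm output, the state of the running check can usually be read from the same sysfs directory that the check was started through. The following is a minimal sketch, assuming the kernel exposes the md attributes named in the loop (which of them exist depends on the kernel version, so unreadable ones are skipped). The helper name md_report and its optional second argument are illustrative only and are not part of mdadm or the kernel.

```shell
#!/bin/sh
# Print the md sysfs attributes that are readable for one array.
# Usage: md_report <array> [sysfs block root]
# md_report is a hypothetical helper; the second argument exists only so the
# function can be pointed at a directory other than /sys/block.
md_report() {
    md_dir="${2:-/sys/block}/$1/md"
    for attr in sync_action mismatch_cnt sync_speed_min sync_speed_max; do
        # Older kernels may lack some attributes, so skip anything unreadable.
        if [ -r "$md_dir/$attr" ]; then
            printf '%s: %s\n' "$attr" "$(cat "$md_dir/$attr")"
        fi
    done
}

# Example: md_report md2
```

Running `md_report md2` alongside `cat /proc/mdstat` while the check is in progress would show the current action and the configured speed limits, which may be useful context for a report like this one.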
--
Steven Haigh
Email: netwiz@crc.id.au
Web: http://www.crc.id.au
Phone: (03) 9001 6090 - 0412 935 897
* Re: BUG: soft lockup - CPU#0 stuck for 10s! [md2_raid1:358]
From: Majed B. @ 2009-10-21 5:01 UTC
To: Steven Haigh; +Cc: linux-raid
Hello,
I believe this has been fixed in 2.6.30 or 2.6.31.
--
Majed B.
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
* Re: BUG: soft lockup - CPU#0 stuck for 10s! [md2_raid1:358]
From: Majed B. @ 2009-10-21 5:02 UTC
To: Steven Haigh; +Cc: linux-raid
And it's not serious.
On Wed, Oct 21, 2009 at 8:01 AM, Majed B. <majedb@gmail.com> wrote:
> Hello,
>
> I believe this has been fixed in 2.6.30 or 2.6.31.
--
Majed B.
* Re: BUG: soft lockup - CPU#0 stuck for 10s! [md2_raid1:358]
From: Lee Howard @ 2009-10-21 5:24 UTC
To: Majed B.; +Cc: Steven Haigh, linux-raid
I've been deliberately monitoring the kernel via the git web interfaces,
and I can't yet see the patch committed that supposedly fixed this.
(Please correct me if it was actually committed.)
While a single 10s stuck CPU may not be serious, it *is* serious when it
happens over and over and over again consecutively (like it does in my
case).
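Whether the lockup repeats can be checked roughly by counting the reports in the system log. This is only a sketch: the helper name count_soft_lockups is hypothetical, and /var/log/messages is an assumed default path that may differ on other systems.

```shell
#!/bin/sh
# Count "BUG: soft lockup" reports in a syslog-style file.
# Usage: count_soft_lockups [logfile]
# count_soft_lockups is a hypothetical helper; the default path is an assumption.
count_soft_lockups() {
    # Each report starts with one line containing this marker, so counting
    # matching lines counts reports. "n + 0" prints 0 when nothing matched.
    awk '/BUG: soft lockup/ { n++ } END { print n + 0 }' "${1:-/var/log/messages}"
}

# Example: count_soft_lockups /var/log/messages
```

A count that keeps growing while a check or resync runs would match the "over and over" pattern described above; a single report would not.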
Thanks,
Lee.
Majed B. wrote:
> And it's not serious.
>
> On Wed, Oct 21, 2009 at 8:01 AM, Majed B. <majedb@gmail.com> wrote:
>
>> Hello,
>>
>> I believe this has been fixed in 2.6.30 or 2.6.31.
* Re: BUG: soft lockup - CPU#0 stuck for 10s! [md2_raid1:358]
From: Majed B. @ 2009-10-21 8:44 UTC
To: Lee Howard; +Cc: linux-raid
If you search the linux-raid mailing list archive for emails during
October, I think you'll find a related thread with an answer (or
more).
On Wed, Oct 21, 2009 at 8:24 AM, Lee Howard <faxguy@howardsilvan.com> wrote:
> I've been deliberately monitoring the kernel via the git web interfaces, and
> I can't yet see the patch committed that supposedly fixed this. (Please
> correct me if it was actually committed.)
>
> While a single 10s stuck CPU may not be serious, it *is* serious when it
> happens over and over and over again consecutively (like it does in my
> case).
>
> Thanks,
>
> Lee.
>
--
Majed B.