* 2.6.14-mm1 RAID-1 in D< state
@ 2005-11-09 13:32 Chris Boot
  2005-11-09 21:12 ` J.A. Magallon
  2005-11-09 22:23 ` Neil Brown
  0 siblings, 2 replies; 9+ messages in thread
From: Chris Boot @ 2005-11-09 13:32 UTC (permalink / raw)
  To: Linux Kernel

Hi all,

I didn't notice this until today, but my load average has been
skyrocketing past 3.00 since Monday, which is when I upgraded to
2.6.14-mm1. I've got 3 Software RAID-1 arrays across 4 SATA disks, and
all 3 md processes are locked in an uninterruptible sleep.

What's interesting, though, is I haven't noticed a degradation of 
performance at all, and all the arrays work absolutely fine. They aren't 
rebuilding or doing anything strange that I can see.

Any ideas?

Cheers,
Chris

-- 
Chris Boot
bootc@bootc.net
http://www.bootc.net/

* Re: 2.6.14-mm1 RAID-1 in D< state
  2005-11-09 13:32 2.6.14-mm1 RAID-1 in D< state Chris Boot
@ 2005-11-09 21:12 ` J.A. Magallon
  2005-11-09 22:23 ` Neil Brown
  1 sibling, 0 replies; 9+ messages in thread
From: J.A. Magallon @ 2005-11-09 21:12 UTC (permalink / raw)
  To: Chris Boot; +Cc: Linux Kernel

On Wed, 09 Nov 2005 13:32:11 +0000, Chris Boot <bootc@bootc.net> wrote:

> Hi all,
> 
> I haven't noticed this until today...but my load average has been 
> skyrocketing past 3.00 since Monday, which is when I upgraded to 
> 2.6.14-mm1. I've got 3 Software RAID-1 arrays across 4 SATA disks, and 
> all 3 processes are locked in an uninterruptible sleep.
> 
> What's interesting, though, is I haven't noticed a degradation of 
> performance at all, and all the arrays work absolutely fine. They aren't 
> rebuilding or doing anything strange that I can see.
> 
> Any ideas?
> 

Try this:

http://marc.theaimsgroup.com/?l=linux-scsi&m=113145981728205&w=2

My raid 5 was oopsing till I applied this.

--
J.A. Magallon <jamagallon()able!es>     \               Software is like sex:
werewolf!able!es                         \         It's better when it's free
Mandriva Linux release 2006.1 (Cooker) for i586
Linux 2.6.14-jam1 (gcc 4.0.2 (4.0.2-1mdk for Mandriva Linux release 2006.1))

* Re: 2.6.14-mm1 RAID-1 in D< state
  2005-11-09 13:32 2.6.14-mm1 RAID-1 in D< state Chris Boot
  2005-11-09 21:12 ` J.A. Magallon
@ 2005-11-09 22:23 ` Neil Brown
  2005-11-09 23:15   ` Chris Boot
  1 sibling, 1 reply; 9+ messages in thread
From: Neil Brown @ 2005-11-09 22:23 UTC (permalink / raw)
  To: Chris Boot; +Cc: Linux Kernel

On Wednesday November 9, bootc@bootc.net wrote:
> Hi all,
> 
> I haven't noticed this until today...but my load average has been 
> skyrocketing past 3.00 since Monday, which is when I upgraded to 
> 2.6.14-mm1. I've got 3 Software RAID-1 arrays across 4 SATA disks, and 
> all 3 processes are locked in an uninterruptible sleep.
> 
> What's interesting, though, is I haven't noticed a degradation of 
> performance at all, and all the arrays work absolutely fine. They aren't 
> rebuilding or doing anything strange that I can see.
> 
> Any ideas?

Can you
  echo t > /proc/sysrq-trigger
  dmesg > /tmp/log
and post the log created, possibly removing everything before
   SysRq : Show State

If you can't find the 'Show State', then maybe your log buffer isn't
big enough.  Use 'dmesg -s ...' to make it bigger and try again.

NeilBrown
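
For anyone repeating the procedure above, the trimming step can be scripted. This is only a sketch and not part of the original message: the function name `trim_to_show_state` is made up for illustration, and it assumes the log contains the marker text exactly as quoted above.

```shell
# Print a saved kernel log from the last "SysRq : Show State" marker onward.
# trim_to_show_state is a hypothetical helper name; the marker text is taken
# from the message above and may differ on other kernel versions.
trim_to_show_state() {
    awk '
        /SysRq : Show State/ { buf = ""; found = 1 }   # restart at each marker
        found                { buf = buf $0 "\n" }      # keep marker and what follows
        END                  { printf "%s", buf }       # empty output if no marker
    ' "$1"
}

# Typical use, as root, following the steps in the message:
#   echo t > /proc/sysrq-trigger
#   dmesg > /tmp/log
#   trim_to_show_state /tmp/log
```

If the function prints nothing, the marker was not in the log, which is the "log buffer isn't big enough" case described in the message.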

* Re: 2.6.14-mm1 RAID-1 in D< state
  2005-11-09 22:23 ` Neil Brown
@ 2005-11-09 23:15   ` Chris Boot
  2005-11-10  5:40     ` Neil Brown
  0 siblings, 1 reply; 9+ messages in thread
From: Chris Boot @ 2005-11-09 23:15 UTC (permalink / raw)
  To: Neil Brown; +Cc: Linux Kernel

On 9 Nov 2005, at 22:23, Neil Brown wrote:

> On Wednesday November 9, bootc@bootc.net wrote:
>> Hi all,
>>
>> I haven't noticed this until today...but my load average has been
>> skyrocketing past 3.00 since Monday, which is when I upgraded to
>> 2.6.14-mm1. I've got 3 Software RAID-1 arrays across 4 SATA disks,  
>> and
>> all 3 processes are locked in an uninterruptible sleep.
>>
>> What's interesting, though, is I haven't noticed a degradation of
>> performance at all, and all the arrays work absolutely fine. They  
>> aren't
>> rebuilding or doing anything strange that I can see.
>>
>> Any ideas?
>
> Can you
>   echo t > /proc/sysrq-trigger
>   dmesg > /tmp/log
> and post the log created, possibly removing everything before
>    SysRq : Show State

So that's what the sysrq-trigger is for... :-) Certainly easier that  
way when your system still works!

> If you can't find the 'Show State', then maybe your log buffer isn't
> big enough.  use 'dmesg -s ...' to make it bigger and try again

It was too small, but the serial console got it:

[4329954.200000] md2_raid1     D F7D776E0     0   809      6           810   799 (L-TLB)
[4329954.200000] f7db7f30 f7d2ba8c c02809e0 f7d776e0 c02c14f2 e9924580 c1b48b60 c1b8e200
[4329954.200000]        f7c5bd40 7fffffff f7db7f88 00000000 23c37e00 000f6206 f7d6fa50 f7d6fb78
[4329954.200000]        7fffffff 7fffffff f7db7f88 f7db6000 c0338098 c1b8e200 f7db7f94 f7db7f88
[4329954.200000] Call Trace:
[4329954.200000]  [<c02809e0>] generic_unplug_device+0x10/0x20
[4329954.200000]  [<c02c14f2>] unplug_slaves+0xd2/0xe0
[4329954.200000]  [<c0338098>] schedule_timeout+0x98/0xa0
[4329954.200000]  [<c01295a9>] finish_wait+0x39/0x50
[4329954.200000]  [<c02c9309>] md_thread+0xc9/0x100
[4329954.200000]  [<c01295c0>] autoremove_wake_function+0x0/0x50
[4329954.200000]  [<c01142d7>] __wake_up_common+0x37/0x60
[4329954.200000]  [<c01295c0>] autoremove_wake_function+0x0/0x50
[4329954.200000]  [<c02c9240>] md_thread+0x0/0x100
[4329954.200000]  [<c0129174>] kthread+0xa4/0xe0
[4329954.200000]  [<c01290d0>] kthread+0x0/0xe0
[4329954.200000]  [<c0100f35>] kernel_thread_helper+0x5/0x10
[4329954.200000] md0_raid1     D F7D774A0     0   810      6           812   809 (L-TLB)
[4329954.200000] f7db5f30 f7d2b79c c02809e0 f7d774a0 c02c14f2 c0383bc0 c1b48ae0 c1b8e400
[4329954.200000]        f7c5bb60 7fffffff f7db5f88 00000000 9bd42ec0 000f6211 f7d69090 f7d691b8
[4329954.200000]        7fffffff 7fffffff f7db5f88 f7db4000 c0338098 c1b8e400 00000002 f7db4000
[4329954.200000] Call Trace:
[4329954.200000]  [<c02809e0>] generic_unplug_device+0x10/0x20
[4329954.200000]  [<c02c14f2>] unplug_slaves+0xd2/0xe0
[4329954.200000]  [<c0338098>] schedule_timeout+0x98/0xa0
[4329954.200000]  [<c0129501>] prepare_to_wait+0x41/0x50
[4329954.200000]  [<c02c9309>] md_thread+0xc9/0x100
[4329954.200000]  [<c01295c0>] autoremove_wake_function+0x0/0x50
[4329954.200000]  [<c01142d7>] __wake_up_common+0x37/0x60
[4329954.200000]  [<c01295c0>] autoremove_wake_function+0x0/0x50
[4329954.200000]  [<c02c9240>] md_thread+0x0/0x100
[4329954.200000]  [<c0129174>] kthread+0xa4/0xe0
[4329954.200000]  [<c01290d0>] kthread+0x0/0xe0
[4329954.200000]  [<c0100f35>] kernel_thread_helper+0x5/0x10
[4329954.200000] md1_raid1     D F7D77860     0   812      6           813   810 (L-TLB)
[4329954.200000] f7dbbf30 f7d2bc04 c02809e0 f7d77860 c02c14f2 e9924580 c1b48a60 c1b8e000
[4329954.200000]        f7c5f920 7fffffff f7dbbf88 00000000 2358ae40 000f6206 f7d5b5c0 f7d5b6e8
[4329954.200000]        7fffffff 7fffffff f7dbbf88 f7dba000 c0338098 c1b8e000 f7dbbf88 f7dba000
[4329954.200000] Call Trace:
[4329954.200000]  [<c02809e0>] generic_unplug_device+0x10/0x20
[4329954.200000]  [<c02c14f2>] unplug_slaves+0xd2/0xe0
[4329954.200000]  [<c0338098>] schedule_timeout+0x98/0xa0
[4329954.200000]  [<c02c29ba>] raid1d+0x32a/0x350
[4329954.200000]  [<c02c9309>] md_thread+0xc9/0x100
[4329954.200000]  [<c01295c0>] autoremove_wake_function+0x0/0x50
[4329954.200000]  [<c01142d7>] __wake_up_common+0x37/0x60
[4329954.200000]  [<c01295c0>] autoremove_wake_function+0x0/0x50
[4329954.200000]  [<c02c9240>] md_thread+0x0/0x100
[4329954.200000]  [<c0129174>] kthread+0xa4/0xe0
[4329954.200000]  [<c01290d0>] kthread+0x0/0xe0
[4329954.200000]  [<c0100f35>] kernel_thread_helper+0x5/0x10

Let me know if you need dumps of any other processes.

> NeilBrown

Cheers,
Chris

-- 
Chris Boot
bootc@bootc.net
http://www.bootc.net/



* Re: 2.6.14-mm1 RAID-1 in D< state
  2005-11-09 23:15   ` Chris Boot
@ 2005-11-10  5:40     ` Neil Brown
  2005-11-10  9:37       ` Chris Boot
  2005-11-10  9:37       ` J.A. Magallon
  0 siblings, 2 replies; 9+ messages in thread
From: Neil Brown @ 2005-11-10  5:40 UTC (permalink / raw)
  To: Chris Boot; +Cc: Linux Kernel


Thanks for the trace.  I see what is happening.
I changed
  wait_event_timeout_interruptible 
in md.c(md_thread) to
  wait_event_timeout

as the thread no longer needs to be able to respond to signals.
However, that has the side effect of putting the process in the 'D'
state and adding to the load average shown by 'uptime'.

I guess I'll put that back...

NeilBrown


Signed-off-by: Neil Brown <neilb@suse.de>

### Diffstat output
 ./drivers/md/md.c |    9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff ./drivers/md/md.c~current~ ./drivers/md/md.c
--- ./drivers/md/md.c~current~	2005-11-10 16:39:04.000000000 +1100
+++ ./drivers/md/md.c	2005-11-10 16:39:28.000000000 +1100
@@ -3439,10 +3439,11 @@ static int md_thread(void * arg)
 	allow_signal(SIGKILL);
 	while (!kthread_should_stop()) {
 
-		wait_event_timeout(thread->wqueue,
-				   test_bit(THREAD_WAKEUP, &thread->flags)
-				   || kthread_should_stop(),
-				   thread->timeout);
+		wait_event_timeout_interruptible
+			(thread->wqueue,
+			 test_bit(THREAD_WAKEUP, &thread->flags)
+			 || kthread_should_stop(),
+			 thread->timeout);
 		try_to_freeze();
 
 		clear_bit(THREAD_WAKEUP, &thread->flags);
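
Whether the md threads are sitting in uninterruptible sleep can be checked from user space before and after a change like this. On Linux, tasks in 'D' state count toward the load average even though they use no CPU, which is consistent with three waiting threads and a load just above 3.00 as reported at the start of the thread. A rough sketch follows; `count_dstate` is an illustrative name, and the `ps` invocation in the comment assumes a procps-style `ps`.

```shell
# Read "STATE PID COMMAND" lines on stdin, print those whose state starts
# with D (uninterruptible sleep, e.g. "D" or "D<"), then print a count.
# count_dstate is a hypothetical helper name, not an existing tool.
count_dstate() {
    awk '
        $1 ~ /^D/ { n++; print }
        END       { printf "%d task(s) in D state\n", n + 0 }
    '
}

# Typical use (the trailing "=" suppresses the column headers):
#   ps -eo stat=,pid=,comm= | count_dstate
```

Reading from stdin keeps the filter independent of the exact `ps` flags available on a given system.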

* Re: 2.6.14-mm1 RAID-1 in D< state
  2005-11-10  5:40     ` Neil Brown
@ 2005-11-10  9:37       ` Chris Boot
  2005-11-10  9:39         ` Neil Brown
  2005-11-10  9:37       ` J.A. Magallon
  1 sibling, 1 reply; 9+ messages in thread
From: Chris Boot @ 2005-11-10  9:37 UTC (permalink / raw)
  To: Neil Brown; +Cc: Linux Kernel

On 10 Nov 2005, at 5:40, Neil Brown wrote:

>
> Thanks for the trace.  I see what is happening.
> I changed
>   wait_event_timeout_interruptible
> in md.c(md_thread) to
>   wait_event_timeout
>
> as the thread no longer needs to be able to respond to signals.
> However, that has the side effect of putting the process in the 'D'
> state and adding to the load average shown by 'uptime'.
>
> I guess I'll put that back...
>
> NeilBrown
>
>
> Signed-off-by: Neil Brown <neilb@suse.de>
>
> ### Diffstat output
>  ./drivers/md/md.c |    9 +++++----
>  1 file changed, 5 insertions(+), 4 deletions(-)
>
> diff ./drivers/md/md.c~current~ ./drivers/md/md.c
> --- ./drivers/md/md.c~current~	2005-11-10 16:39:04.000000000 +1100
> +++ ./drivers/md/md.c	2005-11-10 16:39:28.000000000 +1100
> @@ -3439,10 +3439,11 @@ static int md_thread(void * arg)
>  	allow_signal(SIGKILL);
>  	while (!kthread_should_stop()) {
>
> -		wait_event_timeout(thread->wqueue,
> -				   test_bit(THREAD_WAKEUP, &thread->flags)
> -				   || kthread_should_stop(),
> -				   thread->timeout);
> +		wait_event_timeout_interruptible
> +			(thread->wqueue,
> +			 test_bit(THREAD_WAKEUP, &thread->flags)
> +			 || kthread_should_stop(),
> +			 thread->timeout);
>  		try_to_freeze();
>
>  		clear_bit(THREAD_WAKEUP, &thread->flags);

Sounds about right but...

drivers/md/md.c: In function `md_thread':
drivers/md/md.c:3441: warning: implicit declaration of function `wait_event_timeout_interruptible'
[...]
   LD      .tmp_vmlinux1
drivers/built-in.o(.text+0x9904f): In function `md_thread':
: undefined reference to `wait_event_timeout_interruptible'
drivers/built-in.o(.text+0x9908f): In function `md_thread':
: undefined reference to `wait_event_timeout_interruptible'
make: *** [.tmp_vmlinux1] Error 1

HTH,
Chris

-- 
Chris Boot
bootc@bootc.net
http://www.bootc.net/



* Re: 2.6.14-mm1 RAID-1 in D< state
  2005-11-10  5:40     ` Neil Brown
  2005-11-10  9:37       ` Chris Boot
@ 2005-11-10  9:37       ` J.A. Magallon
  1 sibling, 0 replies; 9+ messages in thread
From: J.A. Magallon @ 2005-11-10  9:37 UTC (permalink / raw)
  To: Neil Brown; +Cc: Chris Boot, Linux Kernel

On Thu, 10 Nov 2005 16:40:13 +1100, Neil Brown <neilb@suse.de> wrote:

> 
> Thanks for the trace.  I see what is happening.
> I changed
>   wait_event_timeout_interruptible 
> in md.c(md_thread) to
>   wait_event_timeout
> 
> as the thread no longer needs to be able to respond to signals.
> However, that has the side effect of putting the process in the 'D'
> state and adding to the load average shown by 'uptime'.
> 
> I guess I'll put that back...
> 
> NeilBrown
> 
> 
> Signed-off-by: Neil Brown <neilb@suse.de>
> 
> ### Diffstat output
>  ./drivers/md/md.c |    9 +++++----
>  1 file changed, 5 insertions(+), 4 deletions(-)
> 
> diff ./drivers/md/md.c~current~ ./drivers/md/md.c
> --- ./drivers/md/md.c~current~	2005-11-10 16:39:04.000000000 +1100
> +++ ./drivers/md/md.c	2005-11-10 16:39:28.000000000 +1100
> @@ -3439,10 +3439,11 @@ static int md_thread(void * arg)
>  	allow_signal(SIGKILL);
>  	while (!kthread_should_stop()) {
>  
> -		wait_event_timeout(thread->wqueue,
> -				   test_bit(THREAD_WAKEUP, &thread->flags)
> -				   || kthread_should_stop(),
> -				   thread->timeout);
> +		wait_event_timeout_interruptible
> +			(thread->wqueue,
> +			 test_bit(THREAD_WAKEUP, &thread->flags)
> +			 || kthread_should_stop(),
> +			 thread->timeout);
>  		try_to_freeze();
>  
>  		clear_bit(THREAD_WAKEUP, &thread->flags);

s/wait_event_timeout_interruptible/wait_event_interruptible_timeout/

;)

--
J.A. Magallon <jamagallon()able!es>     \               Software is like sex:
werewolf!able!es                         \         It's better when it's free
Mandriva Linux release 2006.1 (Cooker) for i586
Linux 2.6.14-jam1 (gcc 4.0.2 (4.0.2-1mdk for Mandriva Linux release 2006.1))

* Re: 2.6.14-mm1 RAID-1 in D< state
  2005-11-10  9:37       ` Chris Boot
@ 2005-11-10  9:39         ` Neil Brown
  2005-11-10  9:51           ` Chris Boot
  0 siblings, 1 reply; 9+ messages in thread
From: Neil Brown @ 2005-11-10  9:39 UTC (permalink / raw)
  To: Chris Boot; +Cc: Linux Kernel

On Thursday November 10, bootc@bootc.net wrote:
> 
> Sounds about right but...
> 
> drivers/md/md.c: In function `md_thread':
> drivers/md/md.c:3441: warning: implicit declaration of function  
> `wait_event_timeout_interruptible'

should be
  wait_event_interruptible_timeout

Sorry.
NeilBrown
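
The rename can be applied to a saved copy of the earlier patch mechanically, using the same substitution J.A. Magallon posted. A small sketch, assuming the patch was saved to a file; the function name `fix_wait_macro` and the file names in the comment are hypothetical.

```shell
# Rewrite the misspelled macro name to the one that exists in the kernel,
# reading a patch (or source file) on stdin and writing the result to stdout.
# fix_wait_macro is an illustrative helper name.
fix_wait_macro() {
    sed 's/wait_event_timeout_interruptible/wait_event_interruptible_timeout/g'
}

# Typical use (both file names are placeholders):
#   fix_wait_macro < md-fix.patch > md-fix-corrected.patch
```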

* Re: 2.6.14-mm1 RAID-1 in D< state
  2005-11-10  9:39         ` Neil Brown
@ 2005-11-10  9:51           ` Chris Boot
  0 siblings, 0 replies; 9+ messages in thread
From: Chris Boot @ 2005-11-10  9:51 UTC (permalink / raw)
  To: Neil Brown; +Cc: Linux Kernel

On 10 Nov 2005, at 9:39, Neil Brown wrote:

> On Thursday November 10, bootc@bootc.net wrote:
>>
>> Sounds about right but...
>>
>> drivers/md/md.c: In function `md_thread':
>> drivers/md/md.c:3441: warning: implicit declaration of function
>> `wait_event_timeout_interruptible'
>
> should be
>   wait_event_interruptible_timeout
>
> Sorry.
> NeilBrown

No problem. Builds, boots, and fixes the problem.

Cheers,
Chris

-- 
Chris Boot
bootc@bootc.net
http://www.bootc.net/


