sw raid5 hungs on resync and high IO load, 2.6.32.23


* sw raid5 hungs on resync and high IO load, 2.6.32.23
@ 2010-10-27  7:35 Martin Hamrle
  2010-10-27  8:01 ` Neil Brown
  2010-10-27  8:50 ` Mikael Abrahamsson
  0 siblings, 2 replies; 5+ messages in thread
From: Martin Hamrle @ 2010-10-27  7:35 UTC (permalink / raw)
  To: linux-raid

Hi,

I'm seeing this issue on several boxes with several different configurations.
One of them is a box with 8 drives attached to an ARC-1160 in pass-through
mode, with a sw raid5 built from these drives. There is also one drive for the OS.

During a resync or check under heavy IO load, the process tscpd (tscpd is the
IO load generator) hangs. The machine is still alive, but there are many
blocked processes.
After tscpd hangs, IO load is generated only by the resync. In the traceback you
can see blocked processes (ps, htop, cat) accessing tscpd's cmdline in
proc. Some tscpd threads are blocked while writing files to the fs on the
raid5. Reading these files also blocks; reading other files in the
filesystem is as fast as usual. This state lasted 110 minutes. After that,
all blocked processes continued their work.

I am not sure why the weird state ended. I think
it was triggered by starting to copy the kernel source onto the array.

Note that this is the first time the hung processes woke up; I had never
waited this long before.

I think it is related to sw raid, because I do not see this issue on
hw raid or on sw raid without a resync.
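
To correlate a hang with background array activity, it can help to record what md is doing at that moment. The following is a minimal sketch, assuming the array exposes the standard md sysfs attribute `sync_action` under `/sys/block/<name>/md/`. The array name `md0` and the helper names are hypothetical and do not come from the report.

```python
from pathlib import Path

# States in which md is doing background work on the array.
# "idle" and "frozen" mean no sync activity is running.
BACKGROUND_STATES = {"resync", "recover", "check", "repair", "reshape"}


def read_sync_action(md_name, sysfs_root="/sys/block"):
    """Return the contents of <sysfs_root>/<md_name>/md/sync_action,
    or None if the file cannot be read (no such array, no md sysfs)."""
    path = Path(sysfs_root) / md_name / "md" / "sync_action"
    try:
        return path.read_text().strip()
    except OSError:
        return None


def is_background_sync(action):
    """True if the given sync_action value means a resync, check, etc."""
    return action in BACKGROUND_STATES


if __name__ == "__main__":
    # "md0" is a placeholder array name for illustration only.
    action = read_sync_action("md0")
    print("sync_action:", action, "background sync:", is_background_sync(action))
```

Logging this value alongside the hung-task messages would show whether the stalls only appear while a resync or check is running, as described above.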

kern.log contains the initial "INFO: task collectd:2577 blocked for more
than 120 seconds" message
   and two dumps from
echo w > /proc/sysrq-trigger

The log is located at http://files.nangu.tv/kernel/kern.log
Let me know if you need more info.
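
Besides the sysrq dump, the set of blocked tasks can also be sampled from userspace. This is a rough sketch, assuming the usual `/proc/<pid>/task/<tid>/stat` layout (`pid (comm) state ...`). It deliberately reads only `stat` and never `cmdline`, since the report says readers of tscpd's cmdline blocked. The function names are illustrative.

```python
import os


def parse_stat(stat_text):
    """Parse '<pid> (<comm>) <state> ...' into (pid, comm, state).

    comm may itself contain spaces or parentheses, so the name is taken
    between the first '(' and the last ')'.
    """
    open_paren = stat_text.index("(")
    close_paren = stat_text.rindex(")")
    pid = int(stat_text[:open_paren])
    comm = stat_text[open_paren + 1:close_paren]
    state = stat_text[close_paren + 1:].split()[0]
    return pid, comm, state


def blocked_tasks(proc_root="/proc"):
    """Return [(tid, comm)] for every thread in uninterruptible sleep ('D')."""
    blocked = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        task_dir = os.path.join(proc_root, entry, "task")
        try:
            tids = os.listdir(task_dir)
        except OSError:
            continue  # process exited while we were scanning
        for tid in tids:
            try:
                with open(os.path.join(task_dir, tid, "stat")) as f:
                    task_id, comm, state = parse_stat(f.read())
            except (OSError, ValueError, IndexError):
                continue  # thread vanished or stat line was unparsable
            if state == "D":
                blocked.append((task_id, comm))
    return blocked


if __name__ == "__main__":
    for task_id, comm in blocked_tasks():
        print(task_id, comm)
```

Running this periodically during a resync would give a timeline of which threads are stuck, independent of the kernel log buffer.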

Martin


* Re: sw raid5 hungs on resync and high IO load, 2.6.32.23
  2010-10-27  7:35 sw raid5 hungs on resync and high IO load, 2.6.32.23 Martin Hamrle
@ 2010-10-27  8:01 ` Neil Brown
  2010-10-27 10:48   ` Martin Hamrle
  2010-10-27  8:50 ` Mikael Abrahamsson
  1 sibling, 1 reply; 5+ messages in thread
From: Neil Brown @ 2010-10-27  8:01 UTC (permalink / raw)
  To: Martin Hamrle; +Cc: linux-raid

On Wed, 27 Oct 2010 09:35:17 +0200
Martin Hamrle <martin.hamrle@nangu.tv> wrote:

> log is located http://files.nangu.tv/kernel/kern.log
> Let me know if you need more info.
> 

When I try to access your kern.log I get 

403 - Forbidden


Just include it in-line in the email.

NeilBrown


* Re: sw raid5 hungs on resync and high IO load, 2.6.32.23
  2010-10-27  7:35 sw raid5 hungs on resync and high IO load, 2.6.32.23 Martin Hamrle
  2010-10-27  8:01 ` Neil Brown
@ 2010-10-27  8:50 ` Mikael Abrahamsson
  1 sibling, 0 replies; 5+ messages in thread
From: Mikael Abrahamsson @ 2010-10-27  8:50 UTC (permalink / raw)
  To: Martin Hamrle; +Cc: linux-raid

On Wed, 27 Oct 2010, Martin Hamrle wrote:

> I am not sure what is the reason of the end of the weird state. I think 
> the end was caused by starting copying kernel source into array.

It might be a 2.6.32 problem. I booted Ubuntu 10.04 LTS live off a USB 
stick two days ago, mounted an external USB drive and started 
dd:ing my laptop drive to the external drive. To check the progress/speed 
I then ran "apt-get install sysstat" (to get iostat). This install 
didn't complete until the dd was over. About 40 gigs into the dd I also ran 
"sync", which likewise blocked until the dd was over.

So basically, dd:ing an internal 80 gig drive to an external USB hd made two 
commands ("apt-get install" and sync) block and not complete until the 
write pressure from dd was over. There might be something rotten here...

This was on a Thinkpad X200 laptop with 4 gigs of ram.
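
One way to put a number on "write pressure" is to watch how much dirty data is waiting for writeback. A minimal sketch, assuming the `Dirty` and `Writeback` fields of `/proc/meminfo` (reported in kB); the helper names are made up for illustration, and this is not something that was run in the test above.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field_name: integer_value}."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields and fields[0].isdigit():
            values[name.strip()] = int(fields[0])
    return values


def pending_writeback_kib(text):
    """Sum of Dirty and Writeback, i.e. data not yet safely on disk (kB)."""
    info = parse_meminfo(text)
    return info.get("Dirty", 0) + info.get("Writeback", 0)


if __name__ == "__main__":
    with open("/proc/meminfo") as f:
        print("pending writeback (kB):", pending_writeback_kib(f.read()))
```

Sampling this once a second while the dd runs would show whether the blocked commands are waiting behind a large backlog of dirty pages.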

-- 
Mikael Abrahamsson    email: swmike@swm.pp.se


* Re: sw raid5 hungs on resync and high IO load, 2.6.32.23
  2010-10-27  8:01 ` Neil Brown
@ 2010-10-27 10:48   ` Martin Hamrle
  2010-11-15  1:51     ` Neil Brown
  0 siblings, 1 reply; 5+ messages in thread
From: Martin Hamrle @ 2010-10-27 10:48 UTC (permalink / raw)
  To: linux-raid


On 27.10.2010 10:01, Neil Brown wrote:
> On Wed, 27 Oct 2010 09:35:17 +0200
> Martin Hamrle <martin.hamrle@nangu.tv> wrote:
>
>> log is located http://files.nangu.tv/kernel/kern.log
>> Let me know if you need more info.
>>
> When I try to access your kern.log I get
>
> 403 - Forbidden
Sorry about that, it is fixed now

> Just include it in-line in the email.
>
> NeilBrown





* Re: sw raid5 hungs on resync and high IO load, 2.6.32.23
  2010-10-27 10:48   ` Martin Hamrle
@ 2010-11-15  1:51     ` Neil Brown
  0 siblings, 0 replies; 5+ messages in thread
From: Neil Brown @ 2010-11-15  1:51 UTC (permalink / raw)
  To: Martin Hamrle; +Cc: linux-raid

On Wed, 27 Oct 2010 12:48:13 +0200
Martin Hamrle <martin.hamrle@nangu.tv> wrote:

> 
> On 27.10.2010 10:01, Neil Brown wrote:
> > On Wed, 27 Oct 2010 09:35:17 +0200
> > Martin Hamrle <martin.hamrle@nangu.tv> wrote:
> >
> >> log is located http://files.nangu.tv/kernel/kern.log
> >> Let me know if you need more info.
> >>
> > When I try to access your kern.log I get
> >
> > 403 - Forbidden
> Sorry about that, it is fixed now

Thanks.

Unfortunately it doesn't really show anything interesting.  Just lots of
threads waiting on locks and such, nothing that even points to a problem with
md.

However some of the back traces are missing.  Notice the lines:

Oct 19 13:15:01 osn02 kernel: [72048.851702] md: using 128k window, over a total of 244198464 blocks.
Oct 19 13:38:54 osn02 kernel: 009]  [<ffffffff810c7c32>] ? congestion_wait+0x66/0x80

Between those there should be quite a lot of other stack trace info, but the
kernel log buffer wasn't big enough to hold everything so some got lost.
If you boot with
   log_buf_len=1M

it will make the log buffer larger so you won't lose anything.  That *might*
be more helpful, but I cannot promise anything.
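
Gaps like the one above can be spotted mechanically. The following is a small sketch, assuming syslog-formatted lines where every kernel message starts with a printk timestamp such as `[72048.851702]`, as in this log. A line whose text after `kernel: ` does not begin with such a timestamp is treated as the tail of a message whose start was overwritten in the ring buffer. The function name is illustrative.

```python
import re

# A complete kernel message begins with "[seconds.microseconds]".
STAMP = re.compile(r"^\[\s*\d+\.\d{6}\]")


def truncated_lines(lines):
    """Return 1-based numbers of kernel log lines missing their timestamp."""
    damaged = []
    for number, line in enumerate(lines, start=1):
        _, sep, message = line.partition(" kernel: ")
        if sep and not STAMP.match(message):
            damaged.append(number)
    return damaged


if __name__ == "__main__":
    sample = [
        "Oct 19 13:15:01 osn02 kernel: [72048.851702] md: using 128k window, "
        "over a total of 244198464 blocks.",
        "Oct 19 13:38:54 osn02 kernel: 009]  [<ffffffff810c7c32>] "
        "? congestion_wait+0x66/0x80",
    ]
    print(truncated_lines(sample))
```

Each flagged line marks a point where earlier output was lost, so a log with no flagged lines after booting with the larger buffer is more likely to contain the complete traces.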

NeilBrown





