linux-raid.vger.kernel.org archive mirror
* resync hangs
From: Randall Smith @ 2009-06-06 17:42 UTC
  To: linux-raid

On occasions that my raid5 array needs to resync (power outage, etc), 
the resync stops progressing early on.  This is on Debian Lenny.  I've 
tried the stock 2.6.26 kernel as well as the 2.6.29 kernel from Sid.

On boot:

[   10.022020] md: md2: raid array is not clean -- starting background 
reconstruction
[   10.032896] raid5: allocated 3226kB for md2
[   10.032936] raid5: raid level 5 set md2 active with 3 out of 3 
devices, algorithm 2
[   10.033685]  md2: unknown partition table
[   20.492246] md: resync of RAID array md2


~$ cat /proc/mdstat

md2 : active raid5 sda3[0] sdf3[2] sdc3[1]
       488279424 blocks level 5, 64k chunk, algorithm 2 [3/3] [UUU]
       [>....................]  resync =  0.0% (128/244139712) 
finish=1220697.9min speed=0K/sec
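
For anyone who wants to detect this condition from a script, here is a minimal Python sketch that pulls the progress figures out of a /proc/mdstat entry and flags a zero sync speed. The function names and dictionary keys are made up for illustration, and the assumption that the (done/total) pair is counted in 1K blocks comes from the usual reading of mdstat output, not from anything stated in this thread.

```python
import re

# Matches the progress part of an md sync line, e.g.
# "resync =  0.0% (128/244139712) finish=1220697.9min speed=0K/sec"
PROGRESS_RE = re.compile(
    r"(?P<action>resync|recovery|reshape|check)\s*=\s*(?P<pct>[\d.]+)%\s*"
    r"\((?P<done>\d+)/(?P<total>\d+)\)\s*"
    r"finish=(?P<finish>[\d.]+)min\s*speed=(?P<speed>\d+)K/sec"
)

# Sample copied from the report above, including the mail client's line wrap.
SAMPLE = """md2 : active raid5 sda3[0] sdf3[2] sdc3[1]
       488279424 blocks level 5, 64k chunk, algorithm 2 [3/3] [UUU]
       [>....................]  resync =  0.0% (128/244139712)
finish=1220697.9min speed=0K/sec
"""


def parse_progress(text):
    """Return a dict describing sync progress, or None if no sync line is found."""
    # Collapse all whitespace so a wrapped progress line still matches.
    match = PROGRESS_RE.search(" ".join(text.split()))
    if match is None:
        return None
    return {
        "action": match.group("action"),
        "percent": float(match.group("pct")),
        "done_kib": int(match.group("done")),    # assumed to be 1K blocks
        "total_kib": int(match.group("total")),  # assumed to be 1K blocks
        "finish_min": float(match.group("finish")),
        "speed_kib_s": int(match.group("speed")),
    }


def looks_stalled(progress):
    """A sync that is in progress but reports 0K/sec is treated as stalled."""
    return progress is not None and progress["speed_kib_s"] == 0
```

Calling parse_progress() on the contents of /proc/mdstat and passing the result to looks_stalled() would be enough for a simple periodic check.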


It resyncs fine when using a live CD.  Any ideas what's causing it to 
hang?

--Randall




* Re: resync hangs
From: Randall Smith @ 2009-06-09 21:20 UTC
  To: linux-raid

Maybe I should have used "resync stalls" in the subject.

Any hints about this?  What kind of things might cause it to stall?

Randall




* Re: resync hangs
From: NeilBrown @ 2009-06-09 21:55 UTC
  To: Randall Smith; +Cc: linux-raid

On Wed, June 10, 2009 7:20 am, Randall Smith wrote:
> Maybe I should have used "resync stalls" in the subject.
>
> Any hints about this?  What kind of things might cause it to stall?

I cannot think of anything that would cause a stall like that.

The "128" suggests that md_do_sync has scheduled one "window" of
IO and is in the section of code that calculates the speed and
makes sure we aren't going too fast.

'currspeed' will almost certainly be '1' by this point, so it seems
to imply that min_speed and max_speed are both zero.  Seems unlikely.
You could confirm or deny that with

   grep . /sys/block/md2/md/*

if you ever see the problem again.
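
The speed check described above can be modelled in a few lines of Python. This is a simplified sketch based on one reading of md_do_sync in 2.6-era kernels, not a copy of the kernel code; the function names and the way elapsed time is passed in are assumptions made for illustration.

```python
def current_speed(io_sectors, mark_sectors, elapsed_seconds):
    """Approximate KiB/s since the last mark.

    Sectors are halved to get KiB, and the trailing +1 means the result
    is never below 1, even when no IO has completed yet.
    """
    return (io_sectors - mark_sectors) // 2 // (elapsed_seconds + 1) + 1


def must_wait(currspeed, speed_min, speed_max, array_idle):
    """True when the resync thread would sleep and re-check instead of issuing IO.

    Below speed_min the resync always proceeds. Above it, the resync waits
    if it exceeds speed_max or if the array is busy with other IO.
    """
    if currspeed > speed_min:
        if currspeed > speed_max or not array_idle:
            return True
    return False
```

With both limits at zero, must_wait(1, 0, 0, True) is True on every pass, so in this model the thread keeps sleeping and never issues the next window, even on an idle array. With the usual limits of 1000 and 200000, must_wait(1, 1000, 200000, True) is False and the resync proceeds.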

Just to clarify:  You have seen this only occasionally, not every time
a resync is needed.  But you have seen it both on 2.6.26 and 2.6.29
(Debian versions).  Correct?

Thanks,
NeilBrown





* Re: resync hangs
From: Randall Smith @ 2009-06-10  4:07 UTC
  To: linux-raid


NeilBrown wrote:
> 
> Just to clarify:  You have seen this only occasionally, not every time
> a resync is needed.  But you have seen it both on 2.6.26 and 2.6.29
> (Debian versions).  Correct?
> 
> Thanks,
> NeilBrown


It happens every time on both 2.6.26 and 2.6.29.  But it seems to be 
somehow specific to my system setup, because I can boot Ubuntu 8.04 
(2.6.24?) from a live CD and it resyncs fine.

I'll probably have more opportunities to check this since I'm 
troubleshooting a problem with my system in which the power 
instantaneously shuts off.

Thanks for your reply.

Randall



* Re: resync hangs
From: Randall Smith @ 2009-06-22  3:18 UTC
  To: linux-raid

NeilBrown wrote:
> On Wed, June 10, 2009 7:20 am, Randall Smith wrote:
>> Maybe I should have used "resync stalls" in the subject.
>>
>> Any hints about this?  What kind of things might cause it to stall?
> 
> I cannot think of anything that would cause a stall like that.
> 
> The "128" suggests that md_do_sync has scheduled one "window" of
> IO and is in the section of code that calculates the speed and
> makes sure we aren't going too fast.
> 
> 'currspeed' will almost certainly be '1' by this point, so it seems
> to imply that min_speed and max_speed are both zero.  Seems unlikely.
> You could confirm or deny that with
> 
>    grep . /sys/block/md2/md/*
> 
> if you ever see the problem again.


Happened again.

md2 : active raid5 sda3[0] sdf3[2] sdc3[1]
       488279424 blocks level 5, 64k chunk, algorithm 2 [3/3] [UUU]
       [>....................]  resync =  0.3% (787584/244139712) 
finish=1257319.3min speed=0K/sec


~$ grep . /sys/block/md2/md/*
/sys/block/md2/md/array_state:active
grep: /sys/block/md2/md/bitmap_set_bits: Permission denied
/sys/block/md2/md/chunk_size:65536
/sys/block/md2/md/component_size:244139712
/sys/block/md2/md/degraded:0
/sys/block/md2/md/layout:2
/sys/block/md2/md/level:raid5
/sys/block/md2/md/metadata_version:0.90
/sys/block/md2/md/mismatch_cnt:0
grep: /sys/block/md2/md/new_dev: Permission denied
/sys/block/md2/md/preread_bypass_threshold:1
/sys/block/md2/md/raid_disks:3
/sys/block/md2/md/reshape_position:none
/sys/block/md2/md/resync_start:0
/sys/block/md2/md/safe_mode_delay:0.204
/sys/block/md2/md/stripe_cache_active:0
/sys/block/md2/md/stripe_cache_size:256
/sys/block/md2/md/suspend_hi:0
/sys/block/md2/md/suspend_lo:0
/sys/block/md2/md/sync_action:resync
/sys/block/md2/md/sync_completed:1575168 / 488279424
/sys/block/md2/md/sync_force_parallel:0
/sys/block/md2/md/sync_max:max
/sys/block/md2/md/sync_min:0
/sys/block/md2/md/sync_speed:0
/sys/block/md2/md/sync_speed_max:0 (system)
/sys/block/md2/md/sync_speed_min:0 (system)


> 
> Just to clarify:  You have seen this only occasionally, not every time
> a resync is needed.  But you have seen it both on 2.6.26 and 2.6.29
> (Debian versions).  Correct?


Happens every time a resync is needed.  I have to boot a rescue cd to 
rebuild it.


Randall



* Re: resync hangs
From: NeilBrown @ 2009-06-22  3:46 UTC
  To: Randall Smith; +Cc: linux-raid

On Mon, June 22, 2009 1:18 pm, Randall Smith wrote:
> NeilBrown wrote:
>> On Wed, June 10, 2009 7:20 am, Randall Smith wrote:
>>> Maybe I should have used "resync stalls" in the subject.
>>>
>>> Any hints about this?  What kind of things might cause it to stall?
>>
>> I cannot think of anything that would cause a stall like that.
>>
>> The "128" suggests that md_do_sync has scheduled one "window" of
>> IO and is in the section of code that calculates the speed and
>> makes sure we aren't going too fast.
>>
>> 'currspeed' will almost certainly be '1' by this point, so it seems
>> to imply that min_speed and max_speed are both zero.  Seems unlikely.
>> You could confirm or deny that with
>>
>>    grep . /sys/block/md2/md/*
>>
>> if you ever see the problem again.
>
>
> Happened again.
>
> md2 : active raid5 sda3[0] sdf3[2] sdc3[1]
>        488279424 blocks level 5, 64k chunk, algorithm 2 [3/3] [UUU]
>        [>....................]  resync =  0.3% (787584/244139712)
> finish=1257319.3min speed=0K/sec

This is a little different.  Instead of stopping at 128K, it
stopped at 787M.  So it didn't stop straight away.
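
The two progress figures in the report can be cross-checked against each other. The short Python sketch below assumes that sync_completed in sysfs is counted in 512-byte sectors while the mdstat pair is in 1K blocks; the factor of two between the quoted numbers is consistent with that, but the units are an assumption here and the helper names are hypothetical.

```python
SECTOR_BYTES = 512  # assumed unit of the sysfs sync_completed counters


def sectors_to_kib(sectors):
    """Convert a count of 512-byte sectors to KiB."""
    return sectors * SECTOR_BYTES // 1024


def parse_sync_completed(value):
    """Split a 'done / total' string from sysfs into two integers."""
    done, total = (int(part) for part in value.split("/"))
    return done, total


# Value quoted from /sys/block/md2/md/sync_completed in the report.
done_sectors, total_sectors = parse_sync_completed("1575168 / 488279424")

done_kib = sectors_to_kib(done_sectors)    # 787584, the mdstat "done" figure
total_kib = sectors_to_kib(total_sectors)  # 244139712, the mdstat "total" figure
done_mib = done_kib / 1024                 # 769.125
```

So the resync had covered roughly 769 MiB of each member device when it stopped, rather than a single 128K window.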

>
>
> ~$ grep . /sys/block/md2/md/*
...
> /sys/block/md2/md/sync_speed:0
> /sys/block/md2/md/sync_speed_max:0 (system)
> /sys/block/md2/md/sync_speed_min:0 (system)

This looks like the culprit.  The sync_speed has been limited to
0.  The "(system)" means that it is using the value from
  /proc/sys/dev/raid/speed_limit_max

It seems that value has been set to 0 somehow.
The default value is 200000.

Could something be setting that?  Can you set it back?
 .../speed_limit_min should be 1000 by default.
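
As a rough illustration of checking and restoring these limits, here is a minimal Python sketch. The defaults are the ones quoted above; the helper names and the base-directory parameter (added so the logic can be tried against a scratch directory) are assumptions, and writing to the real files needs root.

```python
from pathlib import Path

RAID_SYSCTL_DIR = Path("/proc/sys/dev/raid")

# Default values as quoted in this thread.
DEFAULTS = {"speed_limit_min": 1000, "speed_limit_max": 200000}


def read_limits(base=RAID_SYSCTL_DIR):
    """Return the current system-wide resync speed limits as integers."""
    return {
        name: int((Path(base) / name).read_text().strip())
        for name in DEFAULTS
    }


def restore_zero_limits(base=RAID_SYSCTL_DIR):
    """Rewrite any limit that reads 0 with its default; return the names changed."""
    changed = []
    for name, value in read_limits(base).items():
        if value == 0:
            (Path(base) / name).write_text(f"{DEFAULTS[name]}\n")
            changed.append(name)
    return changed
```

Run as root, restore_zero_limits() touches only values that read exactly 0 and leaves any deliberate non-zero tuning alone. It does not explain how the values became 0 in the first place.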

NeilBrown




* Re: resync hangs
From: Randall Smith @ 2009-06-22 16:53 UTC
  To: linux-raid

NeilBrown wrote:
>>
>> ~$ grep . /sys/block/md2/md/*
> ...
>> /sys/block/md2/md/sync_speed:0
>> /sys/block/md2/md/sync_speed_max:0 (system)
>> /sys/block/md2/md/sync_speed_min:0 (system)
> 
> This looks like the culprit.  The sync_speed has been limited to
> 0.  The "(system)" means that it is using the value from
>   /proc/sys/dev/raid/speed_limit_max
> 
> It seems that value has been set to 0 somehow.
> The default value is 200000.
> 
> Could something be setting that?  Can you set it back?
>  .../speed_limit_min should be 1000 by default.
> 
> NeilBrown

I'll set it back next time it happens.  Any guess what might set that 
value to 0 and why? mdadm is the only thing I know of that knows about 
raid.  It starts up late in the init process with this command:

/sbin/mdadm --monitor --pid-file /var/run/mdadm/monitor.pid --daemonise 
--scan --syslog

Thanks for your help.

Randall



* Re: resync hangs
From: Neil Brown @ 2009-06-26  2:15 UTC
  To: Randall Smith; +Cc: linux-raid

On Monday June 22, randall@tnr.cc wrote:
> NeilBrown wrote:
> >>
> >> ~$ grep . /sys/block/md2/md/*
> > ...
> >> /sys/block/md2/md/sync_speed:0
> >> /sys/block/md2/md/sync_speed_max:0 (system)
> >> /sys/block/md2/md/sync_speed_min:0 (system)
> > 
> > This looks like the culprit.  The sync_speed has been limited to
> > 0.  The "(system)" means that it is using the value from
> >   /proc/sys/dev/raid/speed_limit_max
> > 
> > > It seems that value has been set to 0 somehow.
> > The default value is 200000.
> > 
> > Could something be setting that?  Can you set it back?
> >  .../speed_limit_min should be 1000 by default.
> > 
> > NeilBrown
> 
> I'll set it back next time it happens.  Any guess what might set that 
> value to 0 and why? mdadm is the only thing I know of that knows about 
> raid.  It starts up late in the init process with this command:

No, I don't know of anything that would change these values.  mdadm
certainly does not.  My guess would be some errant script, maybe part
of a cron job.
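
One way to hunt for such a script is to search the usual configuration locations for any mention of the speed limits. The Python sketch below is only a starting point: the list of locations is a guess at where a Debian system might carry such a setting, not something established in this thread, and the helper names are hypothetical. Searching for "speed_limit" catches both the /proc path form and the dotted sysctl form (dev.raid.speed_limit_max).

```python
from pathlib import Path

# Hypothetical search locations; adjust for the system at hand.
SEARCH_ROOTS = [
    "/etc/sysctl.conf",
    "/etc/sysctl.d",
    "/etc/cron.d",
    "/etc/init.d",
    "/etc/rc.local",
]


def iter_files(root):
    """Yield root itself if it is a file, or every regular file beneath it."""
    root = Path(root)
    if root.is_file():
        yield root
    elif root.is_dir():
        for path in sorted(root.rglob("*")):
            if path.is_file():
                yield path


def find_mentions(roots, needle="speed_limit"):
    """Return (path, line number, line) for every line containing needle."""
    hits = []
    for root in roots:
        for path in iter_files(root):
            try:
                text = path.read_text(errors="replace")
            except OSError:
                continue  # unreadable file: skip rather than abort the search
            for lineno, line in enumerate(text.splitlines(), start=1):
                if needle in line:
                    hits.append((str(path), lineno, line.strip()))
    return hits
```

Running find_mentions(SEARCH_ROOTS) as root lists candidate lines to inspect by hand. An empty result does not rule out a script stored somewhere else.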


NeilBrown

