* constant array_state active after specific jobs
  From: pdi @ 2017-03-23  8:46 UTC
  To: linux-raid

Greetings all,

The problem in a nutshell is that an array is clean after boot, until
some specific jobs switch it to active, where it remains until reboot.

A similar problem was discussed, and solved, in
https://www.spinics.net/lists/raid/msg46450.html. However, AFAICT, it is
not the same issue.

I would be grateful for any insights as to why this happens and/or how
to prevent it. The relevant info follows; please let me know if anything
further might help. Many thanks in advance.

- uname -a
  Linux hostname 4.4.38 #1 SMP Sun Dec 11 16:03:41 CST 2016 x86_64
  Intel(R) Core(TM)2 Duo CPU E8400 @ 3.00GHz GenuineIntel GNU/Linux
- mdadm -V
  mdadm - v3.3.4 - 3rd August 2015
- Desktop drives without SCT/ERC, with timeout mismatch correction as per
  https://raid.wiki.kernel.org/index.php/Timeout_Mismatch
- /dev/md9 is a RAID10 array, 4 devices, far=2, with various dirs used
  as samba and nfs shares
- The array is in *constant* array_state active:
  - mdadm -D /dev/md9 | grep 'State :'
      State : active
  - cat /sys/block/md9/md/array_state
      active
- watch -d 'grep md9 /proc/diskstats'
    counters remain unchanged
- uptime
    load average: 0.00, 0.00, 0.00
- cat /sys/block/md9/md/safe_mode_delay
    0.201
- echo 0.1 > /sys/block/md9/md/safe_mode_delay
    array_state remains active
- echo clean > /sys/block/md9/md/array_state
    echo: write error: Device or resource busy
- reboot (with or without a prior check)
    array_state clean
- After reboot, the array remains clean until some specific jobs put it
  in the constant active state. Such jobs identified so far:
  - echo check > /sys/block/md9/md/sync_action
  - run an rsnapshot job
  - start a qemu/kvm vm
- Other jobs, like text/doc editing, multimedia playback, etc., leave
  array_state clean
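The sysfs reads above can be bundled into a tiny helper for repeated
checking. This is only a sketch: `md_state` and the `SYSFS` override are
names I made up, and the demo at the end runs against a throwaway fake
tree so the script is self-contained; point `SYSFS` at the real `/sys`
to use it on a live array.

```shell
#!/bin/sh
# Snapshot the md sysfs knobs used in the diagnosis above for one array.
# SYSFS can be overridden for testing; it defaults to the real /sys tree.
SYSFS="${SYSFS:-/sys}"

md_state() {
    d="$SYSFS/block/$1/md"
    printf 'array_state:     %s\n' "$(cat "$d/array_state")"
    printf 'safe_mode_delay: %s\n' "$(cat "$d/safe_mode_delay")"
}

# Self-contained demo against a fake tree; on a real host, leave SYSFS
# alone and just call:  md_state md9
SYSFS=$(mktemp -d)
mkdir -p "$SYSFS/block/md9/md"
echo active > "$SYSFS/block/md9/md/array_state"
echo 0.201  > "$SYSFS/block/md9/md/safe_mode_delay"
md_state md9
```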
* Re: constant array_state active after specific jobs
  From: NeilBrown <neilb@suse.com> @ 2017-03-24  5:25 UTC
  To: pdi, linux-raid; Cc: Shaohua Li

On Thu, Mar 23 2017, pdi wrote:
> The problem in a nutshell is that an array is clean after boot, until
> some specific jobs switch it to active, where it remains until reboot.
> [...]
> - After reboot, the array remains clean until some specific jobs put it
>   in the constant active state. Such jobs identified so far:
>   - echo check > /sys/block/md9/md/sync_action
>   - run an rsnapshot job
>   - start a qemu/kvm vm

This bug was introduced by
  commit 20d0189b1012 ("block: Introduce new bio_split()")
in 3.14, and fixed by
  commit 9b622e2bbcf0 ("raid10: increment write counter after bio is split")
in 4.8.

Maybe the latter patch should be sent to -stable?

NeilBrown
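Neil's version pointers (bug introduced in 3.14, fixed in 4.8) allow a
quick first-pass check of whether a given kernel version is new enough
to carry the fix. A sketch only: `has_fix` is my own name, and a pure
version compare cannot see that a distro or -stable kernel may carry the
backport at an older version number.

```shell
#!/bin/sh
# Heuristic: does a mainline kernel version carry the 4.8 raid10 fix?
# (Version-only check; backports to older -stable trees are invisible
# to it.)
has_fix() {
    v="$1"                     # e.g. "4.4.38", as printed by uname -r
    major=${v%%.*}
    rest=${v#*.}
    minor=${rest%%.*}
    [ "$major" -gt 4 ] || { [ "$major" -eq 4 ] && [ "$minor" -ge 8 ]; }
}

for k in 3.14.0 4.4.38 4.8.0 4.9.1; do
    if has_fix "$k"; then
        echo "$k: fix expected"
    else
        echo "$k: fix missing (version < 4.8)"
    fi
done
```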
* Re: constant array_state active after specific jobs
  From: pdi @ 2017-03-24  7:04 UTC
  To: NeilBrown; Cc: linux-raid, Shaohua Li

On Fri, 24 Mar 2017 16:25:35 +1100 NeilBrown <neilb@suse.com> wrote:
> This bug was introduced by
>   commit 20d0189b1012 ("block: Introduce new bio_split()")
> in 3.14, and fixed by
>   commit 9b622e2bbcf0 ("raid10: increment write counter after bio is split")
> in 4.8.
>
> Maybe the latter patch should be sent to -stable?

NeilBrown, thank you for your swift and concise answer.

I gather you are referring to kernel version numbers. The described
behaviour was first noticed many months ago with kernel 2.6.37.6, and it
persisted after a system upgrade to kernel 4.4.38. However, two things
were corrected after the upgrade: the timeout mismatch, and a
Current_Pending_Sector on one of the drives. That may, or may not,
explain the occurrence with the older kernel.

Is this constant active state on the data array something to worry
about, so that I should try a kernel >= 4.8, or shall I let it be?

pdi
* Re: constant array_state active after specific jobs
  From: NeilBrown <neilb@suse.com> @ 2017-03-26 22:42 UTC
  To: pdi; Cc: linux-raid, Shaohua Li

On Fri, Mar 24 2017, pdi wrote:
> [...]
> Is this constant active state on the data array something to worry
> about, so that I should try a kernel >= 4.8, or shall I let it be?

The only important consequence of the constant active state is that if
your machine crashes at a moment when the array would otherwise have
been idle, a resync will be needed after reboot. Without the constant
active state, that resync would not have been needed.

If you have a write-intent bitmap, this is not particularly relevant.

I cannot say how important it is to you to avoid a resync after a crash,
so I don't know whether you should just let it be or not.

NeilBrown
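Whether a write-intent bitmap is present can be read from sysfs: md
exposes a `bitmap/location` attribute that reads "none" when no bitmap
is configured (my reading of kernels of that era). A sketch —
`has_bitmap` and the `SYSFS` override are made-up names, and the demo
uses a fake tree so it runs anywhere; on a real array an internal bitmap
can be added with `mdadm --grow --bitmap=internal /dev/md9`.

```shell
#!/bin/sh
# Does an md array have a write-intent bitmap?  md exposes
# md/bitmap/location in sysfs; it reads "none" when no bitmap is set.
SYSFS="${SYSFS:-/sys}"

has_bitmap() {
    loc="$SYSFS/block/$1/md/bitmap/location"
    [ -r "$loc" ] && [ "$(cat "$loc")" != "none" ]
}

# Self-contained demo against a fake tree; on a real host leave SYSFS
# at /sys and call e.g.:
#   has_bitmap md9 || echo "no bitmap: a crash while active means a full resync"
SYSFS=$(mktemp -d)
mkdir -p "$SYSFS/block/md9/md/bitmap"
echo none > "$SYSFS/block/md9/md/bitmap/location"
if has_bitmap md9; then echo "md9: bitmap present"; else echo "md9: no bitmap"; fi
```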
* Re: constant array_state active after specific jobs
  From: pdi @ 2017-03-28 13:44 UTC
  To: NeilBrown; Cc: linux-raid, Shaohua Li

On Mon, 27 Mar 2017 09:42:29 +1100 NeilBrown <neilb@suse.com> wrote:
> [...]
> I cannot say how important it is to you to avoid a resync after a
> crash, so I don't know whether you should just let it be or not.

NeilBrown,

Thank you for your clear explanation.

Best regards,
pdi
* Re: constant array_state active after specific jobs
  From: Shaohua Li @ 2017-03-27 18:08 UTC
  To: NeilBrown; Cc: pdi, linux-raid

On Fri, Mar 24, 2017 at 04:25:35PM +1100, Neil Brown wrote:
> This bug was introduced by
>   commit 20d0189b1012 ("block: Introduce new bio_split()")
> in 3.14, and fixed by
>   commit 9b622e2bbcf0 ("raid10: increment write counter after bio is split")
> in 4.8.
>
> Maybe the latter patch should be sent to -stable?

Sure, looks suitable, will do it now.

Thanks,
Shaohua