* Spin down
From: Vincent Pelletier @ 2011-12-03 11:23 UTC
To: linux-raid
Hi.
Short system description:
up-to-date Debian sid on a NAS-ish machine (iSCSI via open-iscsi's
kernel module, NFS, dnsmasq for netboot), with ext4 over RAID1 on
both the system partition and the exported data partitions.
I would like the disks holding the RAID arrays to spin down, but
periodic disk writes prevent this.
The writes do not show up in iotop (although it does show write
activity on the total line at the top).
Enabling /proc/sys/vm/block_dump generates the following output:
<7>[ 918.078079] md0_raid1(304): WRITE block 8 on sda1 (2 sectors)
<7>[ 918.078169] md0_raid1(304): WRITE block 8 on sdc1 (2 sectors)
<7>[ 918.092124] md1_raid1(311): WRITE block 8 on sda5 (2 sectors)
<7>[ 918.092184] md1_raid1(311): WRITE block 8 on sdc5 (2 sectors)
<7>[ 918.292627] md0_raid1(304): WRITE block 8 on sda1 (2 sectors)
<7>[ 918.292714] md0_raid1(304): WRITE block 8 on sdc1 (2 sectors)
<7>[ 918.308500] md1_raid1(311): WRITE block 8 on sda5 (2 sectors)
<7>[ 918.308588] md1_raid1(311): WRITE block 8 on sdc5 (2 sectors)
<7>[ 923.113309] md0_raid1(304): WRITE block 8 on sda1 (2 sectors)
<7>[ 923.113397] md0_raid1(304): WRITE block 8 on sdc1 (2 sectors)
<7>[ 923.129330] md1_raid1(311): WRITE block 8 on sda5 (2 sectors)
<7>[ 923.129388] md1_raid1(311): WRITE block 8 on sdc5 (2 sectors)
<7>[ 923.332489] md0_raid1(304): WRITE block 8 on sda1 (2 sectors)
<7>[ 923.332545] md0_raid1(304): WRITE block 8 on sdc1 (2 sectors)
<7>[ 923.348507] md1_raid1(311): WRITE block 8 on sda5 (2 sectors)
<7>[ 923.348595] md1_raid1(311): WRITE block 8 on sdc5 (2 sectors)
Note the 5-second pause between the two halves of this output.
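For reference, a minimal sketch of how such a trace can be captured (run as
root; block_dump makes the kernel log every block-level read and write):

  echo 1 > /proc/sys/vm/block_dump    # start logging block I/O
  tail -f /var/log/kern.log           # watch the messages (dmesg works too)
  echo 0 > /proc/sys/vm/block_dump    # stop logging when done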
Block 8 appears to be the md superblock (judging by a hexdump against the
md superblock format description), and "mdadm --examine" shows that the
"Update time" field keeps advancing.
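For example, a quick way to watch that field on one of the member devices
above (the timestamp advances with every superblock write):

  mdadm --examine /dev/sda1 | grep 'Update Time'
  sleep 10
  mdadm --examine /dev/sda1 | grep 'Update Time'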
The arrays are in a clean state; no resync is running and, as far as I
know, no check either (I would expect a check to cause a lot of visible
disk activity).
Why are these writes happening?
I can imagine a mechanism like periodic filesystem-superblock flushing,
but AFAIK that only happens when something has actually changed (even if
the changes themselves have not been flushed yet), and I would expect the
same to apply here.
Is there a knob to control these writes?
Regards,
--
Vincent Pelletier
* Re: Spin down
From: NeilBrown @ 2011-12-05 6:11 UTC
To: Vincent Pelletier; +Cc: linux-raid
On Sat, 3 Dec 2011 12:23:37 +0100 Vincent Pelletier <plr.vincent@gmail.com>
wrote:
> Short system description:
> up-to-date Debian sid on a NAS-ish machine (iSCSI via open-iscsi's
> kernel module, NFS, dnsmasq for netboot), with ext4 over RAID1 on
> both the system partition and the exported data partitions.
>
> I would like the disks holding the RAID arrays to spin down, but
> periodic disk writes prevent this.
>
> <7>[ 918.078079] md0_raid1(304): WRITE block 8 on sda1 (2 sectors)
> <7>[ 918.078169] md0_raid1(304): WRITE block 8 on sdc1 (2 sectors)
> [...]
>
> Note the 5-second pause between the two halves of this output.
> Block 8 appears to be the md superblock, and "mdadm --examine" shows
> that the "Update time" field keeps advancing.
>
> Why are these writes happening?
> Is there a knob to control these writes?
5 seconds suggests a bitmap update.  Do you have an internal bitmap?
Try removing it:
  mdadm --grow /dev/md0 --bitmap=none
and see if the writes stop.
Then try adding it back with a longer delay:
  mdadm --grow /dev/md0 --bitmap=internal --delay=60
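Either way, the resulting bitmap state can be double-checked, e.g.:

  mdadm --detail /dev/md0 | grep -i bitmap    # shows 'Intent Bitmap : Internal' when one is present
  cat /proc/mdstat                            # arrays with a bitmap list a 'bitmap:' line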
What kernel version are you running?
NeilBrown
* Re: Spin down
From: Vincent Pelletier @ 2011-12-05 7:49 UTC
To: NeilBrown; +Cc: linux-raid
On Mon, Dec 5, 2011 at 7:11 AM, NeilBrown <neilb@suse.de> wrote:
> 5 seconds suggests a bitmap update. Do you have an internal bitmap?
I don't, but I did enable one on those arrays earlier, with the default
delay.  I removed it when I suspected (with block_dump's help) that it was
causing spin-ups, and I noticed an improvement.  Recently the machine was
rebooted (once on a 3.0 kernel, and again to update to 3.1), and after
both reboots the 5-second periodic writes were back.
> Then try adding it back with a longer delay
> mdadm --grow /dev/md0 --bitmap=internal --delay=60
With bitmaps added back on both arrays (md0 and md1) as suggested, the
5-second periodic writes disappear.
Removing the bitmaps again seems to preserve the improvement.
Could the delay setting have survived on disk, been used when the array
was assembled, and then been cleared when mdadm later disabled the bitmap?
> What kernel version are you running?
3.1.4, according to dpkg:
ii  linux-image-3.1.0-1-amd64  3.1.4-1  Linux 3.1 for 64-bit PCs
Regards,
--
Vincent Pelletier
* Re: Spin down
From: NeilBrown @ 2011-12-06 4:39 UTC
To: Vincent Pelletier; +Cc: linux-raid
On Mon, 5 Dec 2011 08:49:03 +0100 Vincent Pelletier <plr.vincent@gmail.com>
wrote:
> > 5 seconds suggests a bitmap update.  Do you have an internal bitmap?
>
> I don't, but I did enable one on those arrays earlier, with the
> default delay. [...] After both reboots the 5-second periodic writes
> were back.
>
> Could the delay setting have survived on disk, been used when the
> array was assembled, and then been cleared when mdadm later disabled
> the bitmap?
>
> > What kernel version are you running?
>
> 3.1.4, according to dpkg:
> ii  linux-image-3.1.0-1-amd64  3.1.4-1  Linux 3.1 for 64-bit PCs
Hmm... I cannot reproduce this, which makes it harder.
Can you enable dynamic debugging on md?  Make sure CONFIG_DYNAMIC_DEBUG is
set, mount debugfs on /sys/kernel/debug, and run
  echo module md_mod +p > /sys/kernel/debug/dynamic_debug/control
then look for extra messages in the kernel logs, most likely starting with "md:".
You'll almost certainly see something like
[ 1851.559629] md: waking up MD thread md0_raid1.
[ 1851.625819] md: updating md0 RAID superblock on device (in sync 1)
[ 1851.656878] md: (write) sdb's sb offset: 8
[ 1851.656915] md: (write) sda's sb offset: 8
every 5 seconds.  If you could confirm that, and report anything else that
shows up in the logs, it might help.
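Putting the steps together (a sketch; assumes debugfs is not mounted yet):

  mount -t debugfs none /sys/kernel/debug
  echo 'module md_mod +p' > /sys/kernel/debug/dynamic_debug/control
  dmesg | grep 'md:'    # the extra messages described above
  echo 'module md_mod -p' > /sys/kernel/debug/dynamic_debug/control    # turn it back off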
NeilBrown
* Re: Spin down
From: Vincent Pelletier @ 2011-12-06 8:21 UTC
To: NeilBrown; +Cc: linux-raid, kobras
On Tuesday 6 December 2011 at 05:39, NeilBrown wrote:
> Hmm... I cannot reproduce this which makes it harder.
Rebooting into a CONFIG_DYNAMIC_DEBUG-enabled kernel and ssh'ing in soon
enough, I think I found the culprit: the noflushd process.
(cc'ing its author)
<noflushd intro>
noflushd is a nice daemon that helps disks spin down by tracking disk
*read* activity.  When enough time has passed without reads, it disables
automatic flushing (echo 0 > /proc/sys/vm/dirty_writeback_centisecs) and
takes over flushing individual block devices itself, flushing only devices
that are already spinning.
This way writes suffer no extra delay while the disk is spinning, and only
an explicit flush (or a read) will cause a spun-down disk to spin back up.
</noflushd intro>
I enabled block_dump again to check whether the problem was still there,
and there was no activity for 10+ seconds.  Then traces from the noflushd
process appeared, and the previously-reported writes started ticking every
5 seconds.  5 seconds is the default writeback period (at least on my
machine), so it is what noflushd uses when it takes over per-device flushing.
I stopped the noflushd process, and tada, problem solved.
FWIW, noflushd performs the individual flushes by opening the block device
for writing and fsync()'ing it.
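A rough shell equivalent of that per-device flush (hypothetical device
name, and not noflushd's actual code):

  blockdev --flushbufs /dev/sda    # write out and invalidate this device's cached buffers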
Sorry for the noise.
Regards,
--
Vincent Pelletier