* Sleepy drives and MD RAID 6
@ 2014-08-12 1:03 Adam Talbot
2014-08-12 1:21 ` Larkin Lowrey
2014-08-12 3:29 ` Roman Mamedov
0 siblings, 2 replies; 17+ messages in thread
From: Adam Talbot @ 2014-08-12 1:03 UTC (permalink / raw)
To: linux-raid
I need help from the Linux RAID pros.
To make a very long story short: I have a 7-disk RAID 6 array. I
put the drives to sleep after 7 minutes of inactivity. When I go to
use this array, the spin-up time causes applications to hang. The
current spin-up time is 50 seconds, and it will only get worse as I
add drives.
Here is the MUCH longer description including more specs (DingbatCA):
http://forums.gentoo.org/viewtopic-p-7599010.html
Any help would be greatly appreciated. I think this would make a
great wiki article.
More details below:
root@nas:/data# smartctl -a /dev/sdd | grep Spin_Up
  3 Spin_Up_Time          0x0027   150   137   021    Pre-fail  Always       -       9608
root@nas:/data# time (touch foo ; sync)
real 0m49.004s
user 0m0.000s
sys 0m0.004s
root@nas:/data# time (touch foo ; sync)
real 0m50.647s
user 0m0.000s
sys 0m0.008s
root@nas:/data# df -h /data
Filesystem Size Used Avail Use% Mounted on
/dev/md125 9.1T 3.8T 5.4T 42% /data
root@nas:/data# mdadm -D /dev/md125
/dev/md125:
Version : 1.2
Creation Time : Wed Jun 18 07:54:38 2014
Raid Level : raid6
Array Size : 9766909440 (9314.45 GiB 10001.32 GB)
Used Dev Size : 1953381888 (1862.89 GiB 2000.26 GB)
Raid Devices : 7
Total Devices : 7
Persistence : Superblock is persistent
Update Time : Mon Aug 11 16:30:16 2014
State : clean
Active Devices : 7
Working Devices : 7
Failed Devices : 0
Spare Devices : 0
Layout : left-symmetric
Chunk Size : 512K
Name : nas:data (local to host nas)
UUID : 74f9ce7a:df1c2698:c8ec7259:5fdb2618
Events : 1038642
Number Major Minor RaidDevice State
0 8 17 0 active sync /dev/sdb1
1 8 33 1 active sync /dev/sdc1
3 8 49 2 active sync /dev/sdd1
4 8 65 3 active sync /dev/sde1
5 8 81 4 active sync /dev/sdf1
7 8 145 5 active sync /dev/sdj1
6 8 129 6 active sync /dev/sdi1
^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Sleepy drives and MD RAID 6
  2014-08-12 1:03 Sleepy drives and MD RAID 6 Adam Talbot
@ 2014-08-12 1:21 ` Larkin Lowrey
  2014-08-12 1:31   ` Adam Talbot
  2014-08-12 5:55   ` Can Jeuleers
  2014-08-12 3:29 ` Roman Mamedov
  1 sibling, 2 replies; 17+ messages in thread
From: Larkin Lowrey @ 2014-08-12 1:21 UTC (permalink / raw)
To: Adam Talbot, linux-raid

I was in the same boat and decided to solve the problem with some code.

I wrote a daemon that monitors /sys/block/sd?/stat for each member of
the array. If all the drives have been idle for X seconds, the daemon
sends a spindown command to each member in parallel. If the array is
spun down, the daemon watches for any change in the aforementioned stat
file, and if there is one it spins up all members in parallel.

The effect of this is that the array spin-up time is only as long as
that of the slowest drive, and all the drives spin down at the same
time. My experience has been that leaving spindown up to the drives is
a bad idea: different models and different manufacturers have varying
notions of what 10 minutes means. Leaving spin-up to the controller is
also not so hot, since some controllers spin up the drives sequentially
rather than in parallel.

I'd be happy to share the code and even happier if someone wrote
something better!

--Larkin

On 8/11/2014 8:03 PM, Adam Talbot wrote:
> I need help from the Linux RAID pros.
>
> To make a very long story short; I have a 7 disk in a RAID 6 array. I
> put the drives to sleep after 7 minutes of inactivity. When I go to
> use this array the spin up time is causing applications to hang.
> Current spin up time is 50 seconds, but will be getting worse as I add
> drives.

^ permalink raw reply	[flat|nested] 17+ messages in thread
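The stat-watching primitive Larkin describes can be sketched in a few lines of shell. This is an illustrative sketch, not his actual daemon: the field positions follow the kernel's block-layer stat documentation (field 1 = reads completed, field 5 = writes completed), the names DISKS and POLL are hypothetical, and the demo parses a captured sample line instead of a live /sys file.

```shell
#!/bin/sh
# Idle detection for one array member: if the sum of completed reads and
# completed writes in /sys/block/<dev>/stat is unchanged between two
# samples, the disk saw no I/O in that interval.
activity_count() {
    # $1 = path to a stat file (normally /sys/block/sdX/stat)
    awk '{ print $1 + $5 }' "$1"
}

# Demonstrate against a captured sample rather than live sysfs:
tmp=$(mktemp)
printf '120 30 5000 800 45 10 2000 300 0 900 1100\n' > "$tmp"
count=$(activity_count "$tmp")
echo "$count"
rm -f "$tmp"

# The daemon's main loop would then look roughly like:
#   while sleep "$POLL"; do
#     for d in $DISKS; do compare activity_count against the last sample; done
#     # all idle long enough -> spin everything down in parallel:
#     #   for d in $DISKS; do hdparm -y /dev/$d & done; wait
#   done
```

`hdparm -y` issues the ATA standby-immediate command; how the daemon tracks the spun-down state so it can trigger the parallel wake-up, as Larkin's does, is left out of this sketch.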
* Re: Sleepy drives and MD RAID 6
  2014-08-12 1:21 ` Larkin Lowrey
@ 2014-08-12 1:31   ` Adam Talbot
  2014-08-12 5:55   ` Can Jeuleers
  1 sibling, 0 replies; 17+ messages in thread
From: Adam Talbot @ 2014-08-12 1:31 UTC (permalink / raw)
To: Larkin Lowrey; +Cc: linux-raid

Larkin,
I would be happy to take a look at your code and improve it, if I can.
I am good with C, C++ and shell scripts. I can read almost everything
else. I like your solution of going after /sys/block/sd?/stat. Would
it be better if we looked at including something like this directly in
the MD RAID stack, or at least added an option for parallel spin-up
(/sys/block/md125/md/parallel_spinup)?
Adam

On Mon, Aug 11, 2014 at 6:21 PM, Larkin Lowrey <llowrey@nuclearwinter.com> wrote:
> I was in the same boat and decided to solve the problem with some code.
>
> I wrote a daemon that monitors /sys/block/sd?/stat for each member of
> the array. If all the drives have been idle for X seconds the daemon
> sends a spindown command to each member in parallel. If the array is
> spun down the daemon watches for any change in the aforementioned stat
> file and if there is it spins up all members in parallel.

^ permalink raw reply	[flat|nested] 17+ messages in thread
* Re: Sleepy drives and MD RAID 6
  2014-08-12 1:21 ` Larkin Lowrey
  2014-08-12 1:31   ` Adam Talbot
@ 2014-08-12 5:55   ` Can Jeuleers
  2014-08-12 9:46     ` Wilson, Jonathan
  1 sibling, 1 reply; 17+ messages in thread
From: Can Jeuleers @ 2014-08-12 5:55 UTC (permalink / raw)
To: linux-raid

On 08/12/2014 03:21 AM, Larkin Lowrey wrote:
> Also, leaving spin-up to the controller is
> also not so hot since some controllers spin-up the drives sequentially
> rather than in parallel.

Sequential spin-up is a feature to some, because it avoids large power
spikes.

^ permalink raw reply	[flat|nested] 17+ messages in thread
* Re: Sleepy drives and MD RAID 6
  2014-08-12 5:55   ` Can Jeuleers
@ 2014-08-12 9:46     ` Wilson, Jonathan
  2014-08-12 15:26       ` Adam Talbot
       [not found]       ` <CAH_2GhfBKiJAT=+zCFH0_xkmyTfWFi=c-F+z8rwL1x++UwU+KA@mail.gmail.com>
  0 siblings, 2 replies; 17+ messages in thread
From: Wilson, Jonathan @ 2014-08-12 9:46 UTC (permalink / raw)
To: Can Jeuleers; +Cc: linux-raid

On Tue, 2014-08-12 at 07:55 +0200, Can Jeuleers wrote:
> On 08/12/2014 03:21 AM, Larkin Lowrey wrote:
> > Also, leaving spin-up to the controller is
> > also not so hot since some controllers spin-up the drives sequentially
> > rather than in parallel.
>
> Sequential spin-up is a feature to some, because it avoids large power
> spikes.

I vaguely recall older drives had a jumper to set a delayed spin-up, so
they stayed in a low-power (possibly not-spun-up) mode when power was
applied and only woke up when a command was received (I think any
command, not a specific "wake up" one).

Also, as mentioned, some controllers may only wake drives one after
the other. Likewise, mdraid does not care about the underlying
hardware/driver stack, only that it eventually responds, and even then I
believe it will happily wait till the end of time if no response or
error is propagated up the stack; hence the timeout lives in the
scsi_device stack, not in mdraid.

^ permalink raw reply	[flat|nested] 17+ messages in thread
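Since, as Wilson says, the timeout lives in the scsi_device stack, one practical mitigation (not suggested in the thread itself, so treat it as an editor's hedged aside) is to raise the per-device SCSI command timeout so a slow staggered spin-up completes late rather than erroring out. The sysfs knob is /sys/block/<dev>/device/timeout, in seconds. A small helper, exercised here against a scratch directory standing in for sysfs:

```shell
#!/bin/sh
# Write a new SCSI command timeout (in seconds) into each given sysfs
# "device" directory. Real targets would be /sys/block/sdX/device;
# the device names in the usage comment are assumptions.
set_scsi_timeout() {
    secs=$1; shift
    for dev in "$@"; do
        echo "$secs" > "$dev/timeout"
    done
}

# Real use (as root): set_scsi_timeout 120 /sys/block/sd[b-j]/device
# Demonstrated here against a mock directory:
mock=$(mktemp -d)
: > "$mock/timeout"
set_scsi_timeout 120 "$mock"
val=$(cat "$mock/timeout")
echo "$val"
rm -rf "$mock"
```

The default is typically 30 seconds, which is shorter than the ~50-second array spin-up Adam measures.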
* Re: Sleepy drives and MD RAID 6
  2014-08-12 9:46     ` Wilson, Jonathan
@ 2014-08-12 15:26       ` Adam Talbot
       [not found]       ` <CAH_2GhfBKiJAT=+zCFH0_xkmyTfWFi=c-F+z8rwL1x++UwU+KA@mail.gmail.com>
  1 sibling, 0 replies; 17+ messages in thread
From: Adam Talbot @ 2014-08-12 15:26 UTC (permalink / raw)
To: Wilson, Jonathan, rm; +Cc: Can Jeuleers, linux-raid

Thank you all for the input. At this point I think I am going to write
a simple daemon to do dm power management. I still think this would be
a good feature set to roll into the driver stack, or the mdadm tools.

As far as wear and tear on the disks: yes, starting and stopping the
drives shortens their life span. I don't trust my disks, regardless of
starting/stopping; that is why I run RAID 6. Let's say I use my NAS
with its 7 disks for 2 hours a day, 7 days a week, at 10 watts per
drive. The current price for power in my area is $0.11 per
kilowatt-hour. That comes out to $5.62 per year to run my drives for 2
hours daily, but if I run my drives 24/7 it would cost me $67.45/year.
Basically, it would cost me an extra $61.83/year to run the drives
24/7. The 2TB 5400RPM SATA drives I have been picking up from local
surplus or auction websites are costing me $40~$50, including shipping
and tax. In other words, I could buy a new disk every 8~10 months to
replace failures and it would be the same cost. Drives don't fail that
fast, even if I were starting/stopping them 10 times daily. This is
also completely ignoring the fact that drive prices are falling. Sorry
to disappoint, but I am going to spin down my array and save some
money.

On Tue, Aug 12, 2014 at 2:46 AM, Wilson, Jonathan
<piercing_male@hotmail.com> wrote:
> Also as mentioned some controllers may also only wake drives one after
> the other, likewise mdriad does not care about the underlying
> hardware/driver stack, only that it eventually responds, and even then I
> believe it will happily wait till the end of time if no response or
> error is propagated up the stack; hence the time out in scsi_device
> stack not in the mdraid.

^ permalink raw reply	[flat|nested] 17+ messages in thread
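Adam's arithmetic above checks out; as a sanity check, the estimate (7 drives at 10 W each, $0.11/kWh, 365 days, all figures taken from his message) can be reproduced directly:

```shell
#!/bin/sh
# Recompute the yearly power cost of 7 drives at 10 W each, $0.11/kWh.
costs=$(awk 'BEGIN {
    watts = 7 * 10                              # total draw while spinning
    rate  = 0.11                                # dollars per kWh
    two_hr = watts * 2  * 365 / 1000 * rate     # 2 hours/day usage
    always = watts * 24 * 365 / 1000 * rate     # spinning 24/7
    printf "2h/day: $%.2f  24/7: $%.2f  delta: $%.2f", two_hr, always, always - two_hr
}')
echo "$costs"
```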
[parent not found: <CAH_2GhfBKiJAT=+zCFH0_xkmyTfWFi=c-F+z8rwL1x++UwU+KA@mail.gmail.com>]
* Re: Sleepy drives and MD RAID 6
       [not found] ` <CAH_2GhfBKiJAT=+zCFH0_xkmyTfWFi=c-F+z8rwL1x++UwU+KA@mail.gmail.com>
@ 2014-08-13 16:07 ` Adam Talbot
  2014-08-14 16:50   ` Adam Talbot
  0 siblings, 1 reply; 17+ messages in thread
From: Adam Talbot @ 2014-08-13 16:07 UTC (permalink / raw)
To: Wilson, Jonathan; +Cc: Can Jeuleers, linux-raid, rm

Arg!! Am I hitting some kind of blocking at the Linux kernel?? No
matter what I do, I can't seem to get the drives to spin up in
parallel. Any ideas?

A simple test case trying to get two drives to spin up at once.
root@nas:~# hdparm -C /dev/sdh /dev/sdg
/dev/sdh:
 drive state is:  standby

/dev/sdg:
 drive state is:  standby

#Two terminal windows dd'ing sdg and sdh at the same time.
root@nas:~/dm_drive_sleeper# time dd if=/dev/sdh of=/dev/null bs=4096 count=1 iflag=direct
1+0 records in
1+0 records out
4096 bytes (4.1 kB) copied, 14.371 s, 0.3 kB/s

real    0m28.139s   ############# WHY?! ################
user    0m0.000s
sys     0m0.000s

#A single drive spin-up
root@nas:~/dm_drive_sleeper# time dd if=/dev/sdh of=/dev/null bs=4096 count=1 iflag=direct
1+0 records in
1+0 records out
4096 bytes (4.1 kB) copied, 14.4212 s, 0.3 kB/s

real    0m14.424s
user    0m0.000s
sys     0m0.000s

On Tue, Aug 12, 2014 at 8:23 AM, Adam Talbot <ajtalbot1@gmail.com> wrote:
> Thank you all for the input. At this point I think I am going to write a
> simple daemon to do dm power management. I still think this would be a good
> feature set to roll into the driver stack, or madam-tools.

^ permalink raw reply	[flat|nested] 17+ messages in thread
* Re: Sleepy drives and MD RAID 6
  2014-08-13 16:07 ` Adam Talbot
@ 2014-08-14 16:50   ` Adam Talbot
  2014-08-14 17:00     ` Larkin Lowrey
  0 siblings, 1 reply; 17+ messages in thread
From: Adam Talbot @ 2014-08-14 16:50 UTC (permalink / raw)
To: Wilson, Jonathan; +Cc: Can Jeuleers, linux-raid, rm

I am running out of ideas. Does anyone know how to wake a disk with a
non-blocking, non-caching method? I have tried the following commands:
dd if=/dev/sdh of=/dev/null bs=4096 count=1 iflag=direct,nonblock
hdparm --dco-identify /dev/sdh   (This gets cached after the 3~10th time running)
hdparm --read-sector 48059863 /dev/sdh

Any ideas?

On Wed, Aug 13, 2014 at 9:07 AM, Adam Talbot <ajtalbot1@gmail.com> wrote:
> Arg!! Am I hitting some kind of blocking at the Linux kernel?? No
> matter what I do, I can't seem to get the drives to spin up in
> parallel. Any ideas?

^ permalink raw reply	[flat|nested] 17+ messages in thread
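One hedged workaround for the caching Adam is fighting (an editor's sketch, not something proposed in the thread): read a randomly chosen 4 KiB block with O_DIRECT each time, so neither the page cache nor the drive's cache can satisfy a repeat of the request without the platters. The device name and size below are assumptions, and the demo call at the end uses a harmless target so the sketch runs anywhere.

```shell
#!/bin/sh
# Read one 4 KiB block at a random offset, bypassing the page cache.
random_read() {
    # $1 = device (or any readable file), $2 = its size in 4 KiB blocks
    off=$(( $(od -An -N2 -tu2 /dev/urandom | tr -d ' ') % $2 ))
    dd if="$1" of=/dev/null bs=4096 count=1 skip="$off" iflag=direct 2>/dev/null
    echo "read block $off"
}

# Real use might be: random_read /dev/sdh 488378646   # 2 TB / 4 KiB (assumed)
random_read /dev/null 4096
```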
* Re: Sleepy drives and MD RAID 6
  2014-08-14 16:50   ` Adam Talbot
@ 2014-08-14 17:00     ` Larkin Lowrey
  2014-08-14 17:37       ` Adam Talbot
  0 siblings, 1 reply; 17+ messages in thread
From: Larkin Lowrey @ 2014-08-14 17:00 UTC (permalink / raw)
To: Adam Talbot, Wilson, Jonathan; +Cc: Can Jeuleers, linux-raid, rm

Have you tried the dd command w/o nonblock, putting it in the
background via &? You could then use the 'wait' command to wait for
them to finish.

I did dust off some old memories and recalled that one of my SAS
controllers (LSI) does the spin-ups serially no matter what, and I
ended up moving those low-duty-cycle drives to my other SAS controller
(Marvell) and put my always-spinning drives on the LSI. I've never seen
this behavior from any of my AHCI SATA controllers.

--Larkin

On 8/14/2014 11:50 AM, Adam Talbot wrote:
> I am running out of ideas. Does anyone know how to wake a disk with a
> non-blocking, and non-caching method?

^ permalink raw reply	[flat|nested] 17+ messages in thread
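Larkin's background-and-wait suggestion can be sketched as a helper: one direct read per member is launched in the background, and `wait` returns once the batch finishes, so total spin-up time is bounded by the slowest drive rather than the sum. The device names are assumptions; the demo call at the end uses harmless targets so the sketch runs anywhere.

```shell
#!/bin/sh
# Wake every given device in parallel: one backgrounded O_DIRECT read
# each, then wait for the whole batch.
wake_all() {
    for dev in "$@"; do
        dd if="$dev" of=/dev/null bs=4096 count=1 iflag=direct 2>/dev/null &
    done
    wait    # returns once every background dd has exited
    echo "issued wake reads to $# device(s)"
}

# Real use would be something like: wake_all /dev/sd[b-j]
wake_all /dev/null /dev/zero
```

As the thread goes on to show, this only helps if the controller itself is willing to spin drives up concurrently.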
* Re: Sleepy drives and MD RAID 6 2014-08-14 17:00 ` Larkin Lowrey @ 2014-08-14 17:37 ` Adam Talbot 2014-08-14 18:05 ` Larkin Lowrey 0 siblings, 1 reply; 17+ messages in thread From: Adam Talbot @ 2014-08-14 17:37 UTC (permalink / raw) To: Larkin Lowrey; +Cc: Wilson, Jonathan, Can Jeuleers, linux-raid, rm For testing I use two windows, just to make sure they are run independent. My shell script uses "(setsid put_some_command_here /dev/$i > /dev/null 2>&1 &)" to make sure the command is forced into the background. Hummm... A controller issue? lspci | grep LSI 07:00.0 SCSI storage controller: LSI Logic / Symbios Logic SAS1068E PCI-Express Fusion-MPT SAS (rev 02) 09:00.0 SCSI storage controller: LSI Logic / Symbios Logic SAS1068E PCI-Express Fusion-MPT SAS (rev 08) 0b:00.0 SCSI storage controller: LSI Logic / Symbios Logic SAS1068E PCI-Express Fusion-MPT SAS (rev 02) lspci | grep -i sata (On-board) 00:1f.2 IDE interface: Intel Corporation 631xESB/632xESB/3100 Chipset SATA IDE Controller (rev 09) All but 1 of my drives are run through my 3X 4-port LSI cards. /dev/sdb is running through the onboard Intel SATA controller. Each drive takes 10 secounds to spin up. With a 7 disk RAID 6, I would expect a read/write to succeed 50 seconds (5 drives) after the request. But on my system it always takes 40 seconds?! Quick test. sdb & sdc at the same time (Intel + LSI): root@nas:~/dm_drive_sleeper# time (dd if=/dev/sdc of=/dev/null bs=512k count=16 iflag=direct) 16+0 records in 16+0 records out 8388608 bytes (8.4 MB) copied, 10.2006 s, 822 kB/s real 0m10.202s user 0m0.000s sys 0m0.000s sdf & sde at the same time (LSI + LSI):root@nas:~/dm_drive_sleeper# time (dd if=/dev/sdf of=/dev/null bs=512k count=16 iflag=direct) 16+0 records in 16+0 records out 8388608 bytes (8.4 MB) copied, 10.2417 s, 819 kB/s real 0m20.208s user 0m0.000s sys 0m0.000s I blame the LSI cards!??!? I have been looking for an excuse to upgrade, and now I have it! 
Any clue where I can find a dumb/cheap/used 12-port card (or 2X 8-port)?
My drive cage has 15 ports, standard SATA/SAS connections, so I will have
to pick up some adapter cables regardless of the new card type.

In other news, Larkin, I owe you a beer/coffee/tea.

On Thu, Aug 14, 2014 at 10:00 AM, Larkin Lowrey <llowrey@nuclearwinter.com> wrote:
> Have you tried the dd command w/o nonblock and putting it in the
> background via &? You could then use the 'wait' command to wait for them
> to finish.
>
> I did dust off some old memories and recalled that one of my SAS
> controllers (LSI) does the spin-ups serially no matter what, and I ended
> up moving these low duty cycle drives to my other SAS controller
> (Marvell) and put my always-spinning drives on the LSI. I've never seen
> this behavior from any of my AHCI SATA controllers.
>
> --Larkin
>
> On 8/14/2014 11:50 AM, Adam Talbot wrote:
>> I am running out of ideas. Does anyone know how to wake a disk with a
>> non-blocking, non-caching method?
>> I have tried the following commands:
>> dd if=/dev/sdh of=/dev/null bs=4096 count=1 iflag=direct,nonblock
>> hdparm --dco-identify /dev/sdh   (This gets cached after the 3rd~10th
>> time running)
>> hdparm --read-sector 48059863 /dev/sdh
>>
>> Any ideas?
>>
>> On Wed, Aug 13, 2014 at 9:07 AM, Adam Talbot <ajtalbot1@gmail.com> wrote:
>>> Arg!! Am I hitting some kind of blocking at the Linux kernel?? No
>>> matter what I do, I can't seem to get the drives to spin up in
>>> parallel. Any ideas?
>>>
>>> A simple test case trying to get two drives to spin up at once.
>>> root@nas:~# hdparm -C /dev/sdh /dev/sdg
>>> /dev/sdh:
>>> drive state is: standby
>>>
>>> /dev/sdg:
>>> drive state is: standby
>>>
>>> # Two terminal windows dd'ing sdg and sdh at the same time.
>>> root@nas:~/dm_drive_sleeper# time dd if=/dev/sdh of=/dev/null bs=4096 count=1 iflag=direct
>>> 1+0 records in
>>> 1+0 records out
>>> 4096 bytes (4.1 kB) copied, 14.371 s, 0.3 kB/s
>>>
>>> real 0m28.139s ############# WHY?! ################
>>> user 0m0.000s
>>> sys 0m0.000s
>>>
>>> # A single drive spin-up
>>> root@nas:~/dm_drive_sleeper# time dd if=/dev/sdh of=/dev/null bs=4096 count=1 iflag=direct
>>> 1+0 records in
>>> 1+0 records out
>>> 4096 bytes (4.1 kB) copied, 14.4212 s, 0.3 kB/s
>>>
>>> real 0m14.424s
>>> user 0m0.000s
>>> sys 0m0.000s
>>>
>>> On Tue, Aug 12, 2014 at 8:23 AM, Adam Talbot <ajtalbot1@gmail.com> wrote:
>>>> Thank you all for the input. At this point I think I am going to write a
>>>> simple daemon to do dm power management. I still think this would be a
>>>> good feature set to roll into the driver stack, or mdadm-tools.
>>>>
>>>> As far as wear and tear on the disks: yes, starting and stopping the
>>>> drives shortens their life span. I don't trust my disks, regardless of
>>>> starting/stopping; that is why I run RAID 6. Let's say I use my NAS with
>>>> its 7 disks for 2 hours a day, 7 days a week @ 10 watts per drive. The
>>>> current price for power in my area is $0.11 per kilowatt-hour. That comes
>>>> out to $5.62 per year to run my drives for 2 hours, daily. But if I run
>>>> my drives 24/7 it would cost me $67.45/year. Basically it would cost me
>>>> an extra $61.83/year to run the drives 24/7. The 2TB 5400RPM SATA drives
>>>> I have been picking up from local surplus, or auction websites, are
>>>> costing me $40~$50, including shipping and tax. In other words I could
>>>> buy a new disk every 8~10 months to replace failures and it would be the
>>>> same cost. Drives don't fail that fast, even if I was start/stopping them
>>>> 10 times daily. This is also completely ignoring the fact that drive
>>>> prices are falling. Sorry to disappoint, but I am going to spin down my
>>>> array and save some money.
>>>>
>>>> On Tue, Aug 12, 2014 at 2:46 AM, Wilson, Jonathan
>>>> <piercing_male@hotmail.com> wrote:
>>>>> On Tue, 2014-08-12 at 07:55 +0200, Can Jeuleers wrote:
>>>>>> On 08/12/2014 03:21 AM, Larkin Lowrey wrote:
>>>>>>> Also, leaving spin-up to the controller is
>>>>>>> also not so hot since some controllers spin up the drives sequentially
>>>>>>> rather than in parallel.
>>>>>> Sequential spin-up is a feature to some, because it avoids large power
>>>>>> spikes.
>>>>> I vaguely recall older drives had a jumper to set a delayed spin-up so
>>>>> they stayed in a low-power (possibly un-spun-up) mode when power was
>>>>> applied and only woke up when a command was received (I think any
>>>>> command, not a specific "wake up" one).
>>>>>
>>>>> Also, as mentioned, some controllers may only wake drives one after
>>>>> the other; likewise mdraid does not care about the underlying
>>>>> hardware/driver stack, only that it eventually responds, and even then I
>>>>> believe it will happily wait till the end of time if no response or
>>>>> error is propagated up the stack; hence the timeout is in the
>>>>> scsi_device stack, not in mdraid.
>>>>>
>>>>> --
>>>>> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
>>>>> the body of a message to majordomo@vger.kernel.org
>>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 17+ messages in thread
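Larkin's suggestion quoted above — plain `dd` in the background plus the
shell's `wait`, instead of `setsid` — can be sketched as a tiny helper.
(The function name and device list here are illustrative, not taken from
anyone's actual script.)

```shell
# wake_all: issue one small O_DIRECT read to every given device in
# parallel, then wait, so total spin-up time is bounded by the slowest
# drive instead of the sum of all of them.
wake_all() {
    for dev in "$@"; do
        # plain & (no setsid) keeps each dd a child of this shell,
        # so the 'wait' below really does block until all reads finish
        dd if="$dev" of=/dev/null bs=4096 count=1 iflag=direct \
            >/dev/null 2>&1 &
    done
    wait    # returns once the slowest drive has answered
}
```

Usage would be something like `wake_all /dev/sd[b-j]`. Whether the reads
actually overlap on the platters still depends on the controller, which is
exactly what the timing tests in this thread are probing.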
* Re: Sleepy drives and MD RAID 6
  2014-08-14 17:37 ` Adam Talbot
@ 2014-08-14 18:05 ` Larkin Lowrey
  2014-08-18 15:41 ` Adam Talbot
  0 siblings, 1 reply; 17+ messages in thread
From: Larkin Lowrey @ 2014-08-14 18:05 UTC (permalink / raw)
To: Adam Talbot; +Cc: Wilson, Jonathan, Can Jeuleers, linux-raid, rm

My LSI SAS controller (SAS2008) is newer and may behave differently, but
I'm guessing this is your problem.

I've been very happy with my HighPoint controllers (difficult to say in
public). I have an 8-port Rocket 2720SGL ($150) and a 16-port RocketRaid
2740 ($400+). Both have worked flawlessly and performance has been
excellent. The 16-port card actually has two 8-port controllers on it
bridged together; I think you're better off with two 8-port cards.

The 8-port RocketRaid 2680 is slower (3Gb/s) but should be fine for
spinning rust and is about $100. I don't have any experience with those.
I found one on ebay for $45, so there may be some good deals on that one
since it's a generation older.

--Larkin

On 8/14/2014 12:37 PM, Adam Talbot wrote:
> For testing I use two windows, just to make sure they run independently.
> My shell script uses "(setsid put_some_command_here /dev/$i > /dev/null
> 2>&1 &)" to make sure the command is forced into the background.
> [...quoted text trimmed; see Adam's message above...]

^ permalink raw reply	[flat|nested] 17+ messages in thread
* Re: Sleepy drives and MD RAID 6
  2014-08-14 18:05 ` Larkin Lowrey
@ 2014-08-18 15:41 ` Adam Talbot
  2014-08-18 16:16 ` Larkin Lowrey
  0 siblings, 1 reply; 17+ messages in thread
From: Adam Talbot @ 2014-08-18 15:41 UTC (permalink / raw)
To: Larkin Lowrey; +Cc: Wilson, Jonathan, Can Jeuleers, linux-raid, rm

Can you confirm the LSI 2008 SAS controller is, or is not, affected by
this problem?

On Thu, Aug 14, 2014 at 11:05 AM, Larkin Lowrey <llowrey@nuclearwinter.com> wrote:
> My LSI SAS controller (SAS2008) is newer and may behave differently, but
> I'm guessing this is your problem.
> [...quoted text trimmed; see Larkin's message above...]

^ permalink raw reply	[flat|nested] 17+ messages in thread
* Re: Sleepy drives and MD RAID 6
  2014-08-18 15:41 ` Adam Talbot
@ 2014-08-18 16:16 ` Larkin Lowrey
  [not found] ` <CAH_2GhcwHszr75bbDq2aWRnnF8-ttpGERKE=CKt=Nri3ue9row@mail.gmail.com>
  0 siblings, 1 reply; 17+ messages in thread
From: Larkin Lowrey @ 2014-08-18 16:16 UTC (permalink / raw)
To: Adam Talbot; +Cc: Wilson, Jonathan, Can Jeuleers, linux-raid, rm

Yes, the SAS2008 will not spin up the drives in parallel no matter what
I try.

--Larkin

On 8/18/2014 10:41 AM, Adam Talbot wrote:
> Can you confirm the LSI 2008 SAS controller is, or is not, affected by
> this problem?
> [...quoted text trimmed; see the messages above...]

^ permalink raw reply	[flat|nested] 17+ messages in thread
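The daemon approach Larkin describes earlier in the thread — poll each
member's /sys/block/sd?/stat, spin everything down together after a quiet
period, and wake everything in parallel at the first sign of activity —
might look roughly like the following. This is a minimal sketch, not his
actual code; the device names, the `hdparm -y` spin-down, and the
dd-based wake are all assumptions layered on what the thread describes.

```python
import subprocess
import time
from pathlib import Path

def read_stats(drives, base="/sys/block"):
    """Snapshot each member's I/O counter line; any change means activity."""
    return {d: (Path(base) / d / "stat").read_text() for d in drives}

def spin_down(drives):
    """Put every member into standby at the same moment (hdparm -y)."""
    procs = [subprocess.Popen(["hdparm", "-y", f"/dev/{d}"]) for d in drives]
    for p in procs:
        p.wait()

def spin_up(drives):
    """One small direct read per drive, all in parallel, as in the thread."""
    procs = [subprocess.Popen(["dd", f"if=/dev/{d}", "of=/dev/null",
                               "bs=4096", "count=1", "iflag=direct"])
             for d in drives]
    for p in procs:
        p.wait()

def monitor(drives, idle_secs=420, poll=10, base="/sys/block"):
    """Main loop: track idleness from the stat files and react."""
    last, idle, asleep = read_stats(drives, base), 0, False
    while True:
        time.sleep(poll)
        cur = read_stats(drives, base)
        if cur != last:               # I/O happened since the last poll
            if asleep:
                spin_up(drives)       # wake all members in parallel
                asleep = False
            idle = 0
        elif not asleep:
            idle += poll
            if idle >= idle_secs:     # quiet long enough: all down at once
                spin_down(drives)
                asleep = True
        last = cur
```

This sidesteps the firmware-timer problem Adam hits later (drives
interpreting `hdparm -S` differently), since one process decides when the
whole array sleeps and wakes.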
[parent not found: <CAH_2GhcwHszr75bbDq2aWRnnF8-ttpGERKE=CKt=Nri3ue9row@mail.gmail.com>]
* Re: Sleepy drives and MD RAID 6
  [not found] ` <CAH_2GhcwHszr75bbDq2aWRnnF8-ttpGERKE=CKt=Nri3ue9row@mail.gmail.com>
@ 2014-08-28 21:04 ` Adam Talbot
  0 siblings, 0 replies; 17+ messages in thread
From: Adam Talbot @ 2014-08-28 21:04 UTC (permalink / raw)
To: Larkin Lowrey; +Cc: Can Jeuleers, rm, linux-raid, Wilson, Jonathan

Solved. The below link to the Gentoo forums is my formal write-up. I hope
this info can help those who follow me in the sleepy drive adventures.
Now I am off to take a nap.
https://forums.gentoo.org/viewtopic-t-997086.html

The raw text, just in case the above link does not work:

Success!!

Why? I wanted to put my DM software-based RAID6 to sleep when not in use.
At 10 watts per drive, it adds up! I did not want to wait 10 seconds per
drive, in series, for the array to come to life. I was tired of my
Windows desktop hanging while waiting for a simple directory lookup on my
NAS.

Disclaimer: Do not come crying to me when you destroy a hard drive, lose
all your data, fry a power supply, or cause a small country to be erased
from the face of the Earth.

The key points covered below:
Drive Controller
Bcache
Inotify

Drive Controller

My server/NAS was running 3X LSI SAS 1068e controllers to control my
7 drive RAID 6. Turns out that the cards are hard-coded to spin up in
series. No way to get around it; it just is. This applies to ANY card
running the LSI 1068e chipset, such as a Dell Perc 6/i or HP P400. It may
even apply to all LSI based cards. To make matters worse, the cards are
smart and will only spin up one drive at a time across all 3 cards. My
7 disk RAID 6 was taking 50 seconds to spin up (10 seconds per drive).
This dropped to 40 seconds when I moved 1 drive to the onboard SATA
controller. That was my first clue. Thanks to the Linux-Raid group
mailing list for the help isolating this one.

So I was on the Internets looking for a new, cheap, 12~16 port SATA II
controller card. I found a very strange card on ebay. A "Ciprico Inc.
RAIDCore" 16-port card. I can't even find any good pictures or links to add to this post so you can see it. It basically has 4 Marvell controllers and a PCIe bridge strapped onto a single card. No brains, no nothing. Just a pure, dumb controller without any spin-up stupidity. Same chipset (88SE6445) found on some RocketRAID cards. It was EXACTLY what I was looking for. At a cost of $60 I was thrilled. In Linux it shows up as a bridge + controller chips:

Code:
07:00.0 PCI bridge: Integrated Device Technology, Inc. PES24T6 PCI Express Switch (rev 0d)
08:02.0 PCI bridge: Integrated Device Technology, Inc. PES24T6 PCI Express Switch (rev 0d)
08:03.0 PCI bridge: Integrated Device Technology, Inc. PES24T6 PCI Express Switch (rev 0d)
08:04.0 PCI bridge: Integrated Device Technology, Inc. PES24T6 PCI Express Switch (rev 0d)
08:05.0 PCI bridge: Integrated Device Technology, Inc. PES24T6 PCI Express Switch (rev 0d)
09:00.0 SCSI storage controller: Marvell Technology Group Ltd. 88SE6440 SAS/SATA PCIe controller (rev 02)
0a:00.0 SCSI storage controller: Marvell Technology Group Ltd. 88SE6440 SAS/SATA PCIe controller (rev 02)
0b:00.0 SCSI storage controller: Marvell Technology Group Ltd. 88SE6440 SAS/SATA PCIe controller (rev 02)
0c:00.0 SCSI storage controller: Marvell Technology Group Ltd. 88SE6440 SAS/SATA PCIe controller (rev 02)

Bcache

https://www.kernel.org/doc/Documentation/bcache.txt

Now that I had the total spin-up time down from 50 seconds ((number_of_drives * 10) - 2) to 10 seconds, I was able to address the remaining 10 seconds using caching. In this case I am using bcache. My operating system disks are 2x OCZ Deneva 240GB SSDs set up in a basic mirror. I partitioned these drives out and used 24GB as a caching device for my RAID. I quickly found out that bcache is unstable on the 3.16 kernel and was forced back to the 3.14 LTS kernel. After I landed on the 3.14.15 kernel everything is running great.
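For completeness, the write-up skips how the bcache devices were created in the first place. Per the bcache.txt documentation linked above, the steps look roughly like the sketch below. The device names are examples (not the author's exact ones), and formatting a device for bcache destroys its existing contents:

```shell
# Example only: /dev/md125 = the RAID 6 array (backing device),
#               /dev/sda5  = a 24GB SSD partition (cache device).
make-bcache -B /dev/md125      # format the backing device; /dev/bcache0 appears
make-bcache -C /dev/sda5       # format the cache device
# udev normally registers both automatically; then attach the cache set
# to the backing device using the cache set's UUID:
bcache-super-show /dev/sda5 | grep cset.uuid
echo <cset-uuid> > /sys/block/bcache0/bcache/attach
```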
The basic bcache settings work, but I wanted more:

Code:
#Setup bcache just the way I like it, hun-hun, hun-hun
#Get involved in read and write activities
echo "writeback" > /sys/block/bcache0/bcache/cache_mode
#Allow bcache to put data in the cache, but get it out as fast as possible
echo "0" > /sys/block/bcache0/bcache/writeback_percent
echo "0" > /sys/block/bcache0/bcache/writeback_delay
echo $((16*1024)) > /sys/block/bcache0/bcache/writeback_rate
#Clean up jerky read performance on files that have never been cached
echo "16M" > /sys/block/bcache0/bcache/readahead

I put all the above code in rc.local so my system picks it up on boot. Writes still need to wake the array, but reads from cache don't even wake up the drives.

Code:
root@nas:/data# time (dd if=/dev/zero of=foo.dd bs=4096k count=16 ; sync)
16+0 records in
16+0 records out
67108864 bytes (67 MB) copied, 0.0963405 s, 697 MB/s

real 0m10.656s #######Array spin up time#########
user 0m0.000s
sys 0m0.128s
root@nas:~# ./sleeping_raid_status.sh
/dev/sdc standby
...
/dev/sdd standby
root@nas:/data# time (dd if=foo.dd of=/dev/null iflag=direct)
131072+0 records in
131072+0 records out
67108864 bytes (67 MB) copied, 0.118975 s, 564 MB/s

real 0m0.121s ########Array never even woke up#########
user 0m0.024s
sys 0m0.096s
root@nas:~# ./sleeping_raid_status.sh
/dev/sdc standby
/dev/sdj standby
...

Inotify

Wait... The array did not spin up because it read from cache?! Working exactly as expected, but not good enough: I have the file metadata in cache, but what happens when I want to read the file... 10 seconds later... Normally when I find a media file, I want to read/watch/listen to it. I accessed the metadata; why not spin up preemptively? Time for a fun script using inotify. I actually took this script one step further than just preemptive spin-up and have it do all drive power management. Turns out different drive manufacturers interpret `hdparm -S 84 $DRIVE` (go to sleep in 7m) differently.
This whole NAS was built on the cheap and I have 4 different types of drives in my array.

Code:
#!/bin/bash
WATCH_PATH="/data"
ARRAY_NAME="data"
SLEEPING_TIME_S="600"

# Resolve /dev/md/data to its mdXXX name, then list the member disks
# (strip partition numbers: sdb1 -> sdb).
ARRAY=$(basename "$(readlink -f "/dev/md/$ARRAY_NAME")")
PARTS=$(ls "/sys/block/$ARRAY/slaves" | sed 's/[0-9]*$//')

set -m
while true; do
    if inotifywait "$WATCH_PATH" -qq -t "$SLEEPING_TIME_S"; then
        # Filesystem activity seen: kick every member awake in parallel.
        for i in $PARTS; do
            (hdparm -S 0 "/dev/$i") &
        done
        wait
    else
        # Timed out with no activity: put any spinning member into standby.
        for i in $PARTS; do
            STATE=$(hdparm -C "/dev/$i" | awk '/drive state is/ {print $4}')
            # Really should also check that the array is not doing something
            # block related, like a check or rebuild.
            if [ "$STATE" != "standby" ]; then
                hdparm -y "/dev/$i" > /dev/null 2>&1
            fi
        done
    fi
    sleep 1
done

A few other key points have been addressed in this thread. There is much greater detail in the posts below:

Spinning drives up/down puts wear on them, but it is more cost effective to sleep the drives and wear them out than it is to pay for the power.
Spinning up X drives at once puts a huge load on the PSU (Power Supply Unit). According to Western Digital, their 7200RPM drives spike at 30 watts during spin-up. You have been warned.
Warning: formatting a drive for bcache will remove ALL your data. There is no way to remove bcache without reformatting the device.
5400RPM drives take about 10 seconds to spin up. 7200RPM drives take about 14 seconds.

^ permalink raw reply [flat|nested] 17+ messages in thread
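Larkin's daemon, described earlier in the thread, uses a different trigger than inotify: it polls each member's /sys/block/sdX/stat and treats an unchanged snapshot as "idle for the whole poll interval". A minimal sketch of that comparison logic, shown here against hard-coded sample snapshots rather than a live /sys (the values are illustrative):

```shell
#!/bin/sh
# A drive is considered idle if its I/O counters did not change between
# two polls. On a real system each snapshot would come from:
#   cat /sys/block/sdb/stat
prev="8320 123 210290 4410 512 9 4096 30 0 1200 4440"
curr="8320 123 210290 4410 512 9 4096 30 0 1200 4440"
if [ "$prev" = "$curr" ]; then
    state="idle"    # no reads or writes since the last poll
else
    state="busy"
fi
echo "$state"
```

In a real daemon this check would run once per poll interval for every member, spinning all drives down only when every one of them reports idle.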
* Re: Sleepy drives and MD RAID 6 2014-08-12 1:03 Sleepy drives and MD RAID 6 Adam Talbot 2014-08-12 1:21 ` Larkin Lowrey @ 2014-08-12 3:29 ` Roman Mamedov 2014-08-14 18:10 ` Bill Davidsen 1 sibling, 1 reply; 17+ messages in thread From: Roman Mamedov @ 2014-08-12 3:29 UTC (permalink / raw) To: Adam Talbot; +Cc: linux-raid

[-- Attachment #1: Type: text/plain, Size: 709 bytes --]

On Mon, 11 Aug 2014 18:03:17 -0700 Adam Talbot <ajtalbot1@gmail.com> wrote:

> I need help from the Linux RAID pros.
>
> To make a very long story short; I have a 7 disk in a RAID 6 array. I
> put the drives to sleep after 7 minutes of inactivity.

It is well known that repeatedly spinning a drive down/up is absolutely the worst possible thing you can do to it, from a long-term reliability standpoint. So my personal suggestion would be to reconsider whether you really want this. The power consumption from 7 spinning drives with no access should be no higher than 60-70 watts; IMHO saving that amount is not worth risking your disks and data for.

--
With respect, Roman

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

^ permalink raw reply [flat|nested] 17+ messages in thread
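Roman's 60-70 watt estimate is easy to sanity-check. The per-drive idle wattage below is an assumed figure for illustration, not a measured one:

```shell
#!/bin/sh
# 7 drives spinning idle at ~9 W each (assumed), running 24/7.
watts=$((7 * 9))
kwh_year=$((watts * 24 * 365 / 1000))
echo "${watts} W continuous, ~${kwh_year} kWh/year"
```

Multiply the kWh figure by a local electricity rate to see the actual dollar stakes of the spindown decision.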
* Re: Sleepy drives and MD RAID 6 2014-08-12 3:29 ` Roman Mamedov @ 2014-08-14 18:10 ` Bill Davidsen 2014-08-14 19:33 ` Adam Talbot 0 siblings, 1 reply; 17+ messages in thread From: Bill Davidsen @ 2014-08-14 18:10 UTC (permalink / raw) To: linux-raid

Roman Mamedov wrote:
> On Mon, 11 Aug 2014 18:03:17 -0700
> Adam Talbot <ajtalbot1@gmail.com> wrote:
>
>> I need help from the Linux RAID pros.
>>
>> To make a very long story short; I have a 7 disk in a RAID 6 array. I
>> put the drives to sleep after 7 minutes of inactivity.
>
> It is well known that repeatedly spinning a drive down/up is absolutely the
> worst possible thing you can do to it, from a long term reliability standpoint.
> So my personal suggestion would be to reconsider if you really want this. The
> power consumption from 7 spinning drives with no access should be no higher
> than 60-70 watt; IMHO saving that amount, is not something that's worth risking
> your disks and data for.
>
Unless you live someplace really cold, there's the cost of pumping that heat out of the room. Running A/C can almost double your power cost, and running the room hot shortens your component life. In other words there's more cost than the power for most people. Even if my hardware can run at 90F, I can't.

There appears to be a partial solution: get a small SSD (<$100) and put the journal on it. Get two and run RAID 1 if you must. Then configure the system to write the journal with all data (data=journal), and writes will be really fast, even if they don't fully complete for a minute or so. Doesn't help reads, of course. Turns the journal into cache, sort of.

I have the feeling that sequential spin-up is an option in a driver, but I can't remember or quickly find where. I say this because I had to set it on one machine I had: spinning up the whole array at once caused the power supply to overload, and until I could get a bigger one that would fit I set an option. That was long enough ago that I can't remember where I found it.
Turn it off and all seven drives will ask for power at once, which probably isn't a great thing, but it's not my system.

--
Bill Davidsen <davidsen@tmr.com>
"We have more to fear from the bungling of the incompetent than from the machinations of the wicked." - from Slashdot

^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: Sleepy drives and MD RAID 6 2014-08-14 18:10 ` Bill Davidsen @ 2014-08-14 19:33 ` Adam Talbot 0 siblings, 0 replies; 17+ messages in thread From: Adam Talbot @ 2014-08-14 19:33 UTC (permalink / raw) To: Bill Davidsen; +Cc: linux-raid

I live in Portland, OR, USA. My basement runs about 68F (20C) peak, and that is with my NAS at 100% activity for 40+ hours. That is why cooling was not included in my calculation. One could also include 2-3 year drive warranties in the spin up/down equation.

Wow. This whole topic is becoming a really good wiki article.

I remember a spin-up staggering option somewhere in the firmware of the cards. I am going to check that tonight, before I spend any money.

I have two SSDs (240GB) in RAID 1. I never knew I could break the journal out onto other drives?! Got any links/information on how to do this?

On Thu, Aug 14, 2014 at 11:10 AM, Bill Davidsen <davidsen@tmr.com> wrote:
> Roman Mamedov wrote:
>>
>> On Mon, 11 Aug 2014 18:03:17 -0700
>> Adam Talbot <ajtalbot1@gmail.com> wrote:
>>
>>> I need help from the Linux RAID pros.
>>>
>>> To make a very long story short; I have a 7 disk in a RAID 6 array. I
>>> put the drives to sleep after 7 minutes of inactivity.
>>
>>
>> It is well known that repeatedly spinning a drive down/up is absolutely
>> the
>> worst possible thing you can do to it, from a long term reliability
>> standpoint.
>> So my personal suggestion would be to reconsider if you really want this.
>> The
>> power consumption from 7 spinning drives with no access should be no
>> higher
>> than 60-70 watt; IMHO saving that amount, is not something that's worth
>> risking
>> your disks and data for.
>>
> Unless you live in someplace really cold, there's the cost of pumping that
> heat out of the room. Running A/C can almost double your power cost, and
> running the room hot shortens your component life. In other words there's
> more cost than the power for most people. Even if my hardware can run at
> 90F, I can't.
> > There appears to be a partial solution, get a small SDD (<$100) and put the > journal on it. Get two, run RAID1 if you must. Then configure the system to > write the journal with all data (data=journal), and the write will be really > fast, even if they don't fully complete for a minute or so. Doesn't help > reads, of course. Turns the journal into cache, sort of. > > I have the feeling that sequential spin is an option in a driver, but I > can't remember or quickly find where. I say this because I had to set it on > one machine I had, spinning up the whole array at one caused the power > supply to overload, and until I could get a bigger one which would fit I set > an option. That was long enough that I can't remember where I found that. > Turn it off and all seven drives will ask for power at once, which probably > isn't a great thing, but not my system. > > -- > Bill Davidsen <davidsen@tmr.com> > "We have more to fear from the bungling of the incompetent than from > the machinations of the wicked." - from Slashdot > > -- > To unsubscribe from this list: send the line "unsubscribe linux-raid" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 17+ messages in thread
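To answer the external-journal question raised above: this is an ext3/ext4 feature, not an md one. A rough sketch of the setup Bill describes, using hypothetical device names (/dev/md1 as the SSD mirror, /dev/md125 as the data filesystem) — check the mke2fs(8) and tune2fs(8) man pages before trying this on real data:

```shell
# Turn the SSD mirror into a dedicated journal device (destroys its contents).
mke2fs -O journal_dev /dev/md1
# With the data filesystem unmounted, drop its internal journal...
tune2fs -O ^has_journal /dev/md125
# ...and attach the external one instead.
tune2fs -j -J device=/dev/md1 /dev/md125
# Mount with full data journaling so writes hit the SSD journal first.
mount -o data=journal /dev/md125 /data
```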
end of thread, other threads:[~2014-08-28 21:04 UTC | newest]
Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-08-12 1:03 Sleepy drives and MD RAID 6 Adam Talbot
2014-08-12 1:21 ` Larkin Lowrey
2014-08-12 1:31 ` Adam Talbot
2014-08-12 5:55 ` Can Jeuleers
2014-08-12 9:46 ` Wilson, Jonathan
2014-08-12 15:26 ` Adam Talbot
[not found] ` <CAH_2GhfBKiJAT=+zCFH0_xkmyTfWFi=c-F+z8rwL1x++UwU+KA@mail.gmail.com>
2014-08-13 16:07 ` Adam Talbot
2014-08-14 16:50 ` Adam Talbot
2014-08-14 17:00 ` Larkin Lowrey
2014-08-14 17:37 ` Adam Talbot
2014-08-14 18:05 ` Larkin Lowrey
2014-08-18 15:41 ` Adam Talbot
2014-08-18 16:16 ` Larkin Lowrey
[not found] ` <CAH_2GhcwHszr75bbDq2aWRnnF8-ttpGERKE=CKt=Nri3ue9row@mail.gmail.com>
2014-08-28 21:04 ` Adam Talbot
2014-08-12 3:29 ` Roman Mamedov
2014-08-14 18:10 ` Bill Davidsen
2014-08-14 19:33 ` Adam Talbot
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox