linux-raid.vger.kernel.org archive mirror
* Hardware advice for software raid
@ 2014-04-01 19:09 Barrett Lewis
  2014-04-01 19:28 ` Matt Garman
  2014-04-02  9:42 ` Stan Hoeppner
  0 siblings, 2 replies; 8+ messages in thread
From: Barrett Lewis @ 2014-04-01 19:09 UTC (permalink / raw)
  To: linux-raid@vger.kernel.org

I have a dedicated, consumer-hardware media/file server with a 6-drive
RAID6 of 2TB drives, all plugged directly into the SATA ports on my
motherboard, an ASRock Z77.  A while back I had a problem that seemed
like a cascade failure of drives, but Stan Hoeppner and Phil Turmel
helped me figure out it was a PSU that had gone bad and was delivering
dirty power.

After replacing the PSU things worked fine, or so I thought.  At some
point I noticed I was having quite a bit of trouble making it through a
resync without the machine locking up.  When I realized it wasn't tied
to a resync in particular but to any extended heavy I/O, I lowered
sync_speed_max to 10,000, and I was able to get through a repair (no
mismatches found!).
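For reference, a sketch of how I throttled the array and kicked off the repair (md0 stands in for my array name; the sysfs values are in KB/s, and these need root):

```shell
# Cap resync/repair bandwidth for just this array (value in KB/s, so 10000 = ~10 MB/s)
echo 10000 > /sys/block/md0/md/sync_speed_max

# ...or globally for all md arrays
echo 10000 > /proc/sys/dev/raid/speed_limit_max

# Start a repair pass and watch its progress
echo repair > /sys/block/md0/md/sync_action
cat /proc/mdstat
```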

I'm guessing that the motherboard has some problem (perhaps
originating from the bad PSU?), and I want to switch to a dedicated
HBA card to make this more modular.

Stan had suggested the LSI SATA/SAS 9211-8i in many threads in the
archives.  If I use this card as my HBA, is there any particular
motherboard which would be better suited than others?

thanks

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Hardware advice for software raid
  2014-04-01 19:09 Hardware advice for software raid Barrett Lewis
@ 2014-04-01 19:28 ` Matt Garman
  2014-04-01 19:43   ` Scott D'Vileskis
  2014-04-02  9:42 ` Stan Hoeppner
  1 sibling, 1 reply; 8+ messages in thread
From: Matt Garman @ 2014-04-01 19:28 UTC (permalink / raw)
  To: Barrett Lewis; +Cc: linux-raid@vger.kernel.org

On Tue, Apr 1, 2014 at 2:09 PM, Barrett Lewis
<barrett.lewis.mitsi@gmail.com> wrote:
> I'm guessing that the motherboard has some problem (perhaps
> originating from the bad PSU?), and I want to switch to a dedicated
> HBA card to make this more modular.

I had a problem with a different motherboard: my system would randomly
reboot from time to time.  The motherboard was a Biostar nm70i-847,
and I found other people were having the problem too:
    http://ubuntuforums.org/showthread.php?t=2094859
The solution was trivial: just add "i915.i915_enable_rc6=0" to the
kernel command line.  Now that system appears to be completely stable.
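For the record, on a GRUB-based distro the persistent version of that fix looks roughly like this (file path and update command are per Debian/Ubuntu; adjust for your distro):

```shell
# /etc/default/grub -- append the i915 workaround to the kernel command line
GRUB_CMDLINE_LINUX_DEFAULT="quiet i915.i915_enable_rc6=0"

# Then regenerate the grub config and reboot:
#   sudo update-grub
# Verify after reboot that the option actually took:
#   grep -o 'i915.i915_enable_rc6=0' /proc/cmdline
```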

I mention this only to suggest that maybe there is some other, easily
fixable problem with your motherboard.  IOW, maybe there is a cheaper
solution that doesn't require new hardware.

You might also play with things like AHCI vs IDE mode in your BIOS...
these days, I think AHCI is generally the way to go, but it seems some
boards still ship with SATA mode set to "native" or "IDE", rather than
AHCI.
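If you're not sure which mode the board is currently in, you can usually tell from a running Linux system without rebooting into the BIOS (a quick sketch; exact strings vary by chipset):

```shell
# An AHCI controller identifies itself in lspci's class description,
# e.g. "SATA controller ... (AHCI mode)" vs "IDE interface ..."
lspci | grep -i sata

# The kernel driver that bound to it is another clue (ahci vs ata_piix)
dmesg | grep -iE 'ahci|ata_piix'
```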

> Stan had suggested the LSI SATA/SAS 9211-8i in many threads in the
> archives.  If I use this card as my HBA, is there any particular
> motherboard which would be better suited than others?

For home use, I use the IBM ServeRAID M1015, which I believe is a
re-branded LSI 9220-8i.  I use this because they can be purchased
fairly cheaply on eBay.  In fact, I flashed mine to "IT" mode (as opposed
to "IR" mode; IT mode removes all RAID features, and the card becomes a
truly dumb, non-bootable HBA).

If you care about power consumption at all, note that this card will
add about 5 to 10 watts of power draw to your system (depending on
your PSU's efficiency).  It will also add a little bit of heat
(possibly a concern if you have a small case, bad airflow, or live in a
hot climate with no air conditioning).

With Linux software RAID, I think you're already "modular"; that is,
your array is already "portable" across other Linux systems with
different hardware, regardless of HBA or onboard SATA.
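Concretely, since the md superblocks live on the member disks themselves, moving the disks to a new box is usually just this (device names below are placeholders for your members):

```shell
# Inspect the superblocks to confirm the new system sees the members
mdadm --examine /dev/sd[b-g]

# Let mdadm find and assemble the array from those superblocks
mdadm --assemble --scan

# Then persist it in the new system's config
mdadm --detail --scan >> /etc/mdadm/mdadm.conf
```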

The other thing I'd look out for: *some* cheap consumer motherboards
have crippled PCIe slots that only allow graphics cards to be
installed in them.  I haven't seen this in a long time, but many years
ago, I found out the hard way that my non-graphics cards wouldn't work
in a (supposedly standard) PCIe slot.  Hopefully the situation has
improved, as even that cheap Biostar Celeron board accepts the IBM
M1015.  But, if in doubt, confirm with the manufacturer before
purchasing.


* Re: Hardware advice for software raid
  2014-04-01 19:28 ` Matt Garman
@ 2014-04-01 19:43   ` Scott D'Vileskis
  0 siblings, 0 replies; 8+ messages in thread
From: Scott D'Vileskis @ 2014-04-01 19:43 UTC (permalink / raw)
  To: Matt Garman; +Cc: Barrett Lewis, linux-raid@vger.kernel.org

I would take a close look at your motherboard's capacitors and make
sure none are bulging.

Additionally, it might be worth running a memtest overnight, although
bad memory would only be responsible if your heavy IO also came with
heavy memory usage.
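Overnight memtest86+ from boot media is the thorough option; as a quicker first pass you can also exercise a chunk of RAM from userspace with the memtester package (the size and loop count here are just examples):

```shell
# Test 1 GiB of RAM, 3 passes; root is needed so it can lock the memory
sudo memtester 1024M 3
```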

I run an Intel i7 desktop board (model number doesn't come to mind)
w/ 6 SATA ports/drives and bought a $40 SATA3 PCIe card for my SSD and
Blu-ray burner.  All of this runs pretty well with a ~500W power supply.

Recently I purchased a Lenovo IX4-300D NAS appliance for under $200 at
Amazon and migrated 4 drives and a bunch of data to it.  It runs a
flavor of Debian on ARM, uses Linux md, is pretty quiet, and only
uses a few watts.  The nicest part is that if the hardware quits, I know
I can stick the disks in my PC.

On Tue, Apr 1, 2014 at 3:28 PM, Matt Garman <matthew.garman@gmail.com> wrote:
> On Tue, Apr 1, 2014 at 2:09 PM, Barrett Lewis
> <barrett.lewis.mitsi@gmail.com> wrote:
>> I'm guessing that the motherboard has some problem (perhaps
>> originating from the bad PSU?), and I want to switch to a dedicated
>> HBA card to make this more modular.
>
> I had a problem with a different motherboard: my system would randomly
> reboot from time to time.  The motherboard was a Biostar nm70i-847,
> and I found other people were having the problem too:
>     http://ubuntuforums.org/showthread.php?t=2094859
> The solution was trivial: just add "i915.i915_enable_rc6=0" to the
> kernel command line.  Now that system appears to be completely stable.
>
> I mention this only to suggest that maybe there is some other, easily
> fixable problem with your motherboard.  IOW, maybe there is a cheaper
> solution that doesn't require new hardware.
>
> You might also play with things like AHCI vs IDE mode in your BIOS...
> these days, I think AHCI is generally the way to go, but it seems some
> boards still ship with SATA mode set to "native" or "IDE", rather than
> AHCI.
>
>> Stan had suggested the LSI SATA/SAS 9211-8i in many threads in the
>> archives.  If I use this card as my HBA, is there any particular
>> motherboard which would be better suited than others?
>
> For home use, I use the IBM ServeRAID M1015, which I believe is a
> re-branded LSI 9220-8i.  I use this because they can be purchased
> fairly cheaply on eBay.  In fact, I flashed mine to "IT" mode (as opposed
> to "IR" mode; IT mode removes all RAID features, and the card becomes a
> truly dumb, non-bootable HBA).
>
> If you care about power consumption at all, note that this card will
> add about 5 to 10 watts of power draw to your system (depending on
> your PSU's efficiency).  It will also add a little bit of heat
> (possibly a concern if you have a small case, bad airflow, or live in a
> hot climate with no air conditioning).
>
> With Linux software RAID, I think you're already "modular"; that is,
> your array is already "portable" across other Linux systems with
> different hardware, regardless of HBA or onboard SATA.
>
> The other thing I'd look out for: *some* cheap consumer motherboards
> have crippled PCIe slots that only allow graphics cards to be
> installed in them.  I haven't seen this in a long time, but many years
> ago, I found out the hard way that my non-graphics cards wouldn't work
> in a (supposedly standard) PCIe slot.  Hopefully the situation has
> improved, as even that cheap Biostar Celeron board accepts the IBM
> M1015.  But, if in doubt, confirm with the manufacturer before
> purchasing.


* Re: Hardware advice for software raid
  2014-04-01 19:09 Hardware advice for software raid Barrett Lewis
  2014-04-01 19:28 ` Matt Garman
@ 2014-04-02  9:42 ` Stan Hoeppner
  2014-04-07 19:28   ` Barrett Lewis
  1 sibling, 1 reply; 8+ messages in thread
From: Stan Hoeppner @ 2014-04-02  9:42 UTC (permalink / raw)
  To: Barrett Lewis, linux-raid@vger.kernel.org

On 4/1/2014 2:09 PM, Barrett Lewis wrote:
> I have a dedicated, consumer-hardware media/file server with a 6-drive
> RAID6 of 2TB drives, all plugged directly into the SATA ports on my
> motherboard, an ASRock Z77.

There are 6 models of the Asrock z77.  All but one contain a PCH
heatsink designed to look cool rather than properly cool the chip.  The
Asrock z77 Extreme 11 has a fan so is an exception, and also has an
onboard 8 port LSI SAS controller (9211-8i), so I assume you do not have
the Extreme 11.

> A while back I had a problem that seemed
> like a cascade failure of drives, but Stan Hoeppner and Phil Turmel
> helped me figure out it was a PSU that had gone bad and was delivering
> dirty power.
> 
> After replacing the PSU things worked fine, or so I thought.  At some
> point I noticed I was having quite a bit of trouble making it through a
> resync without the machine locking up.  When I realized it wasn't tied
> to a resync in particular but to any extended heavy I/O, I lowered
> sync_speed_max to 10,000, and I was able to get through a repair (no
> mismatches found!).

With consumer PC hardware random lockups occurring only under heavy disk
IO are most often the result of thermal buildup in the PCH (Northbridge)
chip.  This can occur when all the drives are connected to its SATA
ports as in your case, but it can also occur when using one or more
SAS/SATA HBAs if the PCIe slots are connected through the PCH.  The odds
are very good that your lockups are a result of the poor PCH heatsink
design on the Asrock boards exacerbated by insufficient case airflow
across the heatsink.  What case is this z77 board in?  Be specific
please so I can pull up the schematic.
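If you want numbers before buying anything, lm-sensors can sometimes help, though many consumer boards (this ASRock possibly included) don't expose a PCH temperature sensor at all:

```shell
# One-time probe for available sensor chips (answer the prompts)
sudo sensors-detect

# Then watch temperatures while a resync runs in another terminal
watch -n 2 sensors
```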

Regardless of case, the solution is straightforward and inexpensive:
install a low-profile solid copper active cooler, such as this one:

http://www.frozencpu.com/products/6717/vid-102/Enzotech_SLF-1_Forged_Copper_Northbridge_Southbridge_Low-Profile_Heatsink.html?tl=g40c16s501

The SLF-1 has 53-59mm hole spacing.  Asrock doesn't provide such
information in their manual and after 30 minutes I can't find forum
posts or other sources presenting this info.  Measure your PCH heatsink
mounting hole spacing before ordering.  If it's less than 53mm
center-to-center you need the SLF-30, and if it's more than 59mm you
need the SLF-40.  If you think your case airflow over the PCH is
actually greater than zero you can go with the CNB-R1 passive unit which
has 3 mounting rings to fit all hole spacings.  But with it you lose two
expansion slots.  Here's the product lineup:

http://www.enzotechnology.com/air_cooling.htm

There are other brands.  Enzo products are solid copper and compact,
with these 3 fan models fitting under your PCIe cards.  You lose no PCI
slots, as you would with nearly all other chipset coolers.  I recommend them
because they are high quality and work well, which is why they are also
more expensive than most others.  That being the case, ~$35 including
shipping is a small sum to part with to eliminate the lockups.

> I'm guessing that the motherboard has some problem (perhaps
> originating from the bad PSU?), and I want to switch to a dedicated
> HBA card to make this more modular.

The one glaring problem is the woefully inadequate PCH heatsink.
Replacing it as suggested will very likely eliminate the lockups, for
about 1/8th the cost of a discrete LSI HBA.  And if it doesn't you will
still have increased the lifespan of the PCH chip by at least a couple
of years due to lowering operating temperature by 10-15°C or more.

> Stan had suggested the LSI SATA/SAS 9211-8i in many threads in the
> archives.  If I use this card as my HBA, is there any particular
> motherboard which would be better suited than others?

Wait and cross this bridge later.  If it turns out this board has other
problems that we can't identify and fix, there's a micro-ATX Intel
server board with 6 SATA-2 ports on the PCH, socket LGA 1155, dual Intel
GbE ports, integrated video, etc for ~$160 at Newegg.  Your CPU,  RAM,
and drives will drop right in, and you won't have to spend another $200
on the LSI.  It'll save you ~$150 overall compared to a consumer board+LSI.

Cheers,

Stan


--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: Hardware advice for software raid
  2014-04-02  9:42 ` Stan Hoeppner
@ 2014-04-07 19:28   ` Barrett Lewis
  2014-04-08  3:53     ` Stan Hoeppner
  0 siblings, 1 reply; 8+ messages in thread
From: Barrett Lewis @ 2014-04-07 19:28 UTC (permalink / raw)
  To: stan, linux-raid@vger.kernel.org

Sorry for the delay, I didn't want to reply until I got time to get
down into the machine and do some research.

On Tue, Apr 1, 2014 at 2:43 PM, Scott D'Vileskis <sdvileskis@gmail.com> wrote:
> I would take a close look at your motherboard's capacitors and make
> sure none are bulging.

I checked and none that I can see are visibly bulging.

> Additionally, it might be worth running a memtest overnight, although
> bad memory would only be responsible if you heavy IO also came with
> heavy memory usage.

I have actually run memtest for 48 hours since this started happening
and not found any problem with the memory.
That was my first thought too though.



On Wed, Apr 2, 2014 at 4:42 AM, Stan Hoeppner <stan@hardwarefreak.com> wrote:

> There are 6 models of the Asrock z77.  All but one contain a PCH
> heatsink designed to look cool rather than properly cool the chip.  The
> Asrock z77 Extreme 11 has a fan so is an exception, and also has an
> onboard 8 port LSI SAS controller (9211-8i), so I assume you do not have
> the Extreme 11.

 Mine is the Z77 Extreme4, picture below.


> With consumer PC hardware random lockups occurring only under heavy disk
> IO are most often the result of thermal buildup in the PCH (Northbridge)
> chip.  This can occur when all the drives are connected to its SATA
> ports as in your case, but it can also occur when using one or more
> SAS/SATA HBAs if the PCIe slots are connected through the PCH.  The odds
> are very good that your lockups are a result of the poor PCH heatsink
> design on the Asrock boards exacerbated by insufficient case airflow
> across the heatsink.  What case is this z77 board in?  Be specific
> please so I can pull up the schematic.

The case is an NZXT H2.  It has all (and only) the stock fans running.

Overheating would fit: the machine is fine until a long, heavy
operation, and even then it doesn't crash until some random point well
into the operation.
Much of the hardware stuff is outside of my domain of knowledge which
is why I was leaning towards buying new equipment.

Is the PCH the part I circled in yellow?  http://i.imgur.com/safg5iW.jpg

I've been doing a lot of googling and see that the northbridge is
usually between the PCIe slot and the CPU, but there doesn't seem to be
any large object in that place on this board.
Can you confirm this is the proper part for me to measure for a new heatsink?

Thanks


* Re: Hardware advice for software raid
  2014-04-07 19:28   ` Barrett Lewis
@ 2014-04-08  3:53     ` Stan Hoeppner
  2014-04-09 20:21       ` Barrett Lewis
  0 siblings, 1 reply; 8+ messages in thread
From: Stan Hoeppner @ 2014-04-08  3:53 UTC (permalink / raw)
  To: Barrett Lewis, linux-raid@vger.kernel.org

On 4/7/2014 2:28 PM, Barrett Lewis wrote:
> Sorry for the delay, I didn't want to reply until I got time to get
> down into the machine and do some research.
...
> On Wed, Apr 2, 2014 at 4:42 AM, Stan Hoeppner <stan@hardwarefreak.com> wrote:
> 
>> There are 6 models of the Asrock z77.  All but one contain a PCH
>> heatsink designed to look cool rather than properly cool the chip.  The
>> Asrock z77 Extreme 11 has a fan so is an exception, and also has an
>> onboard 8 port LSI SAS controller (9211-8i), so I assume you do not have
>> the Extreme 11.
> 
>  Mine is the Z77 Extreme4, picture below.

Yep, same as all Z77s but for the Extreme 11.

>> With consumer PC hardware random lockups occurring only under heavy disk
>> IO are most often the result of thermal buildup in the PCH (Northbridge)
>> chip.  This can occur when all the drives are connected to its SATA
>> ports as in your case, but it can also occur when using one or more
>> SAS/SATA HBAs if the PCIe slots are connected through the PCH.  The odds
>> are very good that your lockups are a result of the poor PCH heatsink
>> design on the Asrock boards exacerbated by insufficient case airflow
>> across the heatsink.  What case is this z77 board in?  Be specific
>> please so I can pull up the schematic.
> 
> The case is an NZXT H2.  It has all (and only) the stock fans running.

This case seems to have a decent airflow design.

> Overheating would fit: the machine is fine until a long, heavy
> operation, and even then it doesn't crash until some random point well
> into the operation.
> Much of the hardware stuff is outside of my domain of knowledge which
> is why I was leaning towards buying new equipment.

Replacing the heatsink is low cost, low risk.  If it doesn't fix the
problem and you end up replacing the mobo you can likely use it on the
new board's PCH as well.

> Is the PCH the part I circled in yellow?  http://i.imgur.com/safg5iW.jpg

Yep, that's it.  Notice the aesthetic cover attached over the heat sink
fins?  The aluminum heatsink under it has high thermal resistance due to
being aluminum and having a small fin surface area.  That cover
increases the thermal resistance further by preventing airflow from
reaching the fins.  Couple this with the fact that low quality thermal
interface material (TIM, paste, tape) is used on factory installed mobo
chipset heatsinks, and this demonstrates why the chip is likely getting
too hot under IO load.

> I've been doing a lot of googling and see that the northbridge is
> usually between the PCIe slot and the CPU, but there doesn't seem to be
> any large object in that place on this board.

Due to ever increasing integration, most mobos today have a single
system support chip in place of the previous north/south bridge duo.
Aftermarket heatsinks are typically sized such that larger units are for
the "northbridge" and smaller units for the "southbridge".
"Northbridge" heatsinks are typically used for single chip systems as
the mounting footprint and thermal output are similar.

> Can you confirm this is the proper part for me to measure for a new heatsink?

I can.  It is.

Do not attempt this with the system running.  Power down, remove all
external cables and sit the chassis on a table.  Ground yourself by
touching the chassis or a metal table leg, etc, to discharge any static
from your body.

Measure between the approximate centers of the two spring loaded plastic
mounting tabs.  You don't need an exact measurement to 1mm, but a
ballpark.  The hole spacing is fairly standardized by the industry.
Your measurement should fall into one of 3 ranges, and this will dictate
which heatsink you buy:

47.5 - 53mm
53   - 59mm
59   - 63mm

Let me know the measurement and I'll recommend the best unit for your
application.  It seems you won't be using all of your expansion slots
any time soon so going with a taller passive unit shouldn't be a
problem.  A taller/larger passive unit in a case with good airflow is
preferable to a low profile unit w/fan due to 2-3x greater mass, no fan
to fail, and no noise.  After you select the heatsink I'll give you tips on
removing the current one and installing the new one.  Proper
installation is more important than which heatsink you install, as doing
it wrong may result in higher temperatures than what you have now.

As always, the devil is in the details.

Cheers,

Stan



* Re: Hardware advice for software raid
  2014-04-08  3:53     ` Stan Hoeppner
@ 2014-04-09 20:21       ` Barrett Lewis
  2014-04-11 10:31         ` Stan Hoeppner
  0 siblings, 1 reply; 8+ messages in thread
From: Barrett Lewis @ 2014-04-09 20:21 UTC (permalink / raw)
  To: linux-raid@vger.kernel.org

On Mon, Apr 7, 2014 at 10:53 PM, Stan Hoeppner <stan@hardwarefreak.com> wrote:
> Replacing the heatsink is low cost, low risk.  If it doesn't fix the
> problem and you end up replacing the mobo you can likely use it on the
> new board's PCH as well.

Good point, I'm down for just owning a good chipset heatsink for this
or any other board.

> Due to ever increasing integration, most mobos today have a single
> system support chip in place of the previous north/south bridge duo.
> Aftermarket heatsinks are typically sized such that larger units are for
> the "northbridge" and smaller units for the "southbridge".
> "Northbridge" heatsinks are typically used for single chip systems as
> the mounting footprint and thermal output are similar.

Thanks for explaining this, I was getting really confused.

> Measure between the approximate centers of the two spring loaded plastic
> mounting tabs.  You don't need an exact measurement to 1mm, but a
> ballpark.  The hole spacing is fairly standardized by the industry.
> Your measurement should fall into one of 3 ranges, and this will dictate
> which heatsink you buy:
>
> 47.5 - 53mm
> 53   - 59mm
> 59   - 63mm
>

I ran home during lunch and tried to get a quick measurement.  I kept
coming up with 53mm, which makes me nervous since it straddles two of
your ranges, but I measured twice.  Since you say the bigger passive
cooler is better anyway, maybe I should get that CNB-R1, with the
multiple mounting rings?

My only hesitation is that if I ever decide to go with a discrete HBA
(and potentially need more than one if I had more than 8 drives) I
would be using more of those expansion slots.

On Mon, Apr 7, 2014 at 10:53 PM, Stan Hoeppner <stan@hardwarefreak.com> wrote:
> On 4/7/2014 2:28 PM, Barrett Lewis wrote:
>> Sorry for the delay, I didn't want to reply until I got time to get
>> down into the machine and do some research.
> ...
>> On Wed, Apr 2, 2014 at 4:42 AM, Stan Hoeppner <stan@hardwarefreak.com> wrote:
>>
>>> There are 6 models of the Asrock z77.  All but one contain a PCH
>>> heatsink designed to look cool rather than properly cool the chip.  The
>>> Asrock z77 Extreme 11 has a fan so is an exception, and also has an
>>> onboard 8 port LSI SAS controller (9211-8i), so I assume you do not have
>>> the Extreme 11.
>>
>>  Mine is the Z77 Extreme4, picture below.
>
> Yep, same as all Z77s but for the Extreme 11.
>
>>> With consumer PC hardware random lockups occurring only under heavy disk
>>> IO are most often the result of thermal buildup in the PCH (Northbridge)
>>> chip.  This can occur when all the drives are connected to its SATA
>>> ports as in your case, but it can also occur when using one or more
>>> SAS/SATA HBAs if the PCIe slots are connected through the PCH.  The odds
>>> are very good that your lockups are a result of the poor PCH heatsink
>>> design on the Asrock boards exacerbated by insufficient case airflow
>>> across the heatsink.  What case is this z77 board in?  Be specific
>>> please so I can pull up the schematic.
>>
>> The case is an NZXT H2.  It has all (and only) the stock fans running.
>
> This case seems to have a decent airflow design.
>
>> Overheating would fit: the machine is fine until a long, heavy
>> operation, and even then it doesn't crash until some random point well
>> into the operation.
>> Much of the hardware stuff is outside of my domain of knowledge which
>> is why I was leaning towards buying new equipment.
>
> Replacing the heatsink is low cost, low risk.  If it doesn't fix the
> problem and you end up replacing the mobo you can likely use it on the
> new board's PCH as well.
>
>> Is the PCH the part I circled in yellow?  http://i.imgur.com/safg5iW.jpg
>
> Yep, that's it.  Notice the aesthetic cover attached over the heat sink
> fins?  The aluminum heatsink under it has high thermal resistance due to
> being aluminum and having a small fin surface area.  That cover
> increases the thermal resistance further by preventing airflow from
> reaching the fins.  Couple this with the fact that low quality thermal
> interface material (TIM, paste, tape) is used on factory installed mobo
> chipset heatsinks, and this demonstrates why the chip is likely getting
> too hot under IO load.
>
>> I've been doing a lot of googling and see that the northbridge is
>> usually between the PCIe slot and the CPU, but there doesn't seem to be
>> any large object in that place on this board.
>
> Due to ever increasing integration, most mobos today have a single
> system support chip in place of the previous north/south bridge duo.
> Aftermarket heatsinks are typically sized such that larger units are for
> the "northbridge" and smaller units for the "southbridge".
> "Northbridge" heatsinks are typically used for single chip systems as
> the mounting footprint and thermal output are similar.
>
>> Can you confirm this is the proper part for me to measure for a new heatsink?
>
> I can.  It is.
>
> Do not attempt this with the system running.  Power down, remove all
> external cables and sit the chassis on a table.  Ground yourself by
> touching the chassis or a metal table leg, etc, to discharge any static
> from your body.
>
> Measure between the approximate centers of the two spring loaded plastic
> mounting tabs.  You don't need an exact measurement to 1mm, but a
> ballpark.  The hole spacing is fairly standardized by the industry.
> Your measurement should fall into one of 3 ranges, and this will dictate
> which heatsink you buy:
>
> 47.5 - 53mm
> 53   - 59mm
> 59   - 63mm
>
> Let me know the measurement and I'll recommend the best unit for your
> application.  It seems you won't be using all of your expansion slots
> any time soon so going with a taller passive unit shouldn't be a
> problem.  A taller/larger passive unit in a case with good airflow is
> preferable to a low profile unit w/fan due to 2-3x greater mass, no fan
> to fail, and no noise.  After you select the heatsink I'll give you tips on
> removing the current one and installing the new one.  Proper
> installation is more important than which heatsink you install, as doing
> it wrong may result in higher temperatures than what you have now.
>
> As always, the devil is in the details.
>
> Cheers,
>
> Stan
>


* Re: Hardware advice for software raid
  2014-04-09 20:21       ` Barrett Lewis
@ 2014-04-11 10:31         ` Stan Hoeppner
  0 siblings, 0 replies; 8+ messages in thread
From: Stan Hoeppner @ 2014-04-11 10:31 UTC (permalink / raw)
  To: Barrett Lewis, linux-raid@vger.kernel.org

On 4/9/2014 3:21 PM, Barrett Lewis wrote:
> On Mon, Apr 7, 2014 at 10:53 PM, Stan Hoeppner <stan@hardwarefreak.com> wrote:
>> Replacing the heatsink is low cost, low risk.  If it doesn't fix the
>> problem and you end up replacing the mobo you can likely use it on the
>> new board's PCH as well.
> 
> Good point, I'm down for just owning a good chipset heatsink for this
> or any other board.
> 
>> Due to ever increasing integration, most mobos today have a single
>> system support chip in place of the previous north/south bridge duo.
>> Aftermarket heatsinks are typically sized such that larger units are for
>> the "northbridge" and smaller units for the "southbridge".
>> "Northbridge" heatsinks are typically used for single chip systems as
>> the mounting footprint and thermal output are similar.
> 
> Thanks for explaining this, I was getting really confused.
> 
>> Measure between the approximate centers of the two spring loaded plastic
>> mounting tabs.  You don't need an exact measurement to 1mm, but a
>> ballpark.  The hole spacing is fairly standardized by the industry.
>> Your measurement should fall into one of 3 ranges, and this will dictate
>> which heatsink you buy:
>>
>> 47.5 - 53mm
>> 53   - 59mm
>> 59   - 63mm
>>
> 
> I ran home during lunch and tried to get a quick measurement.  I kept
> coming up with 53mm which makes me nervous since it straddles two of
> your ranges, but I measured twice. 

Don't be nervous.  Enzo makes only one model to fit 47.5-53.5mm, the
SLF-30 for Southbridge chips.  They make 5 models to fit 53-59mm
Northbridge or single chip footprints.  You want one of these.

> Since you say the bigger passive cooler is better anyway, 

Only in a chassis with good airflow over the motherboard.  If you don't
have this an active cooler is better.  But I'd guess your server does
have decent airflow over this area of the board.

> maybe I should get that CNB-R1, with the multiple mounting rings?

Overkill, lose slots.

> My only hesitation is if I do ever decide to go with a discrete HBA
> (and potentially needed more than one if I had more than 8 drives) I
> would be using more of those expansion slots.

Just go with the EnzoTech CNB-S1L, which is tailored to your application.
It has a mass of 50 grams, measures 1.41 x 1.41 x 0.46", and will not
interfere with expansion cards.  Its thermal performance is roughly
equivalent to an aluminum heatsink 2-3x its dimensions and fin area.
The base is polished to .0004" so with correct application of Arctic
thermal interface material your thermal junction between heatsink and
chip will be 100 times better than the current heatsink, which may not
even have any TIM--factory NB/SB heatsinks often have no TIM, or they
have cheap thermal tape.

Get:
http://www.frozencpu.com/products/5520/vid-84/Enzotech_Forged_Copper_Northbridge_Low-Profile_Heatsink_CNB-S1L_-_36mm_x_36mm_x_116mm.html?tl=g40c16s500&id=ThjnNJ9H

http://www.frozencpu.com/products/3769/thr-02/Arctic_Alumina_Premium_Ceramic_Thermal_Compound_-_175_Grams_AA-175G.html?tl=g8c127s533

Pick up a small can of acetone and q-tips.  Nail polish remover will
work as well.  These will be used to delicately remove any TIM/tape film
stuck to the chip after removing the stock heatsink, if there is any.
May want to pull it and check first.  Once you have the chip completely
clean, apply a tiny dab of Arctic TIM to the center of the chip, about
the size of the lead on the end of a sharp pencil.  Use a razor blade or
credit card, any rigid plastic card, to spread the TIM into an even film
across the chip.  You want a layer across the entire surface so thin you
can just see through it.  Place the new heatsink on top and push the
retaining pins through the board.  You're done.

Cheers,

Stan



> On Mon, Apr 7, 2014 at 10:53 PM, Stan Hoeppner <stan@hardwarefreak.com> wrote:
>> On 4/7/2014 2:28 PM, Barrett Lewis wrote:
>>> Sorry for the delay, I didn't want to reply until I got time to get
>>> down into the machine and do some research.
>> ...
>>> On Wed, Apr 2, 2014 at 4:42 AM, Stan Hoeppner <stan@hardwarefreak.com> wrote:
>>>
>>>> There are 6 models of the Asrock z77.  All but one contain a PCH
>>>> heatsink designed to look cool rather than properly cool the chip.  The
>>>> Asrock z77 Extreme 11 has a fan so is an exception, and also has an
>>>> onboard 8 port LSI SAS controller (9211-8i), so I assume you do not have
>>>> the Extreme 11.
>>>
>>>  Mine is the Z77 Extreme4, picture below.
>>
>> Yep, same as all Z77s but for the Extreme 11.
>>
>>>> With consumer PC hardware random lockups occurring only under heavy disk
>>>> IO are most often the result of thermal buildup in the PCH (Northbridge)
>>>> chip.  This can occur when all the drives are connected to its SATA
>>>> ports as in your case, but it can also occur when using one or more
>>>> SAS/SATA HBAs if the PCIe slots are connected through the PCH.  The odds
>>>> are very good that your lockups are a result of the poor PCH heatsink
>>>> design on the Asrock boards exacerbated by insufficient case airflow
>>>> across the heatsink.  What case is this z77 board in?  Be specific
>>>> please so I can pull up the schematic.
>>>
>>> The case is an NZXT H2.  It has all (and only) the stock fans running.
>>
>> This case seems to have a decent airflow design.
>>
>>> Overheating would fit: the machine is fine until a long heavy operation,
>>> and even then doesn't crash until some random point well into the
>>> operation.
>>> Much of the hardware stuff is outside of my domain of knowledge which
>>> is why I was leaning towards buying new equipment.
>>
>> Replacing the heatsink is low cost, low risk.  If it doesn't fix the
>> problem and you end up replacing the mobo you can likely use it on the
>> new board's PCH as well.
>>
>>> Is the PCH the part I circled in yellow?  http://i.imgur.com/safg5iW.jpg
>>
>> Yep, that's it.  Notice the aesthetic cover attached over the heat sink
>> fins?  The aluminum heatsink under it has high thermal resistance due to
>> being aluminum and having a small fin surface area.  That cover
>> increases the thermal resistance further by preventing airflow from
>> reaching the fins.  Couple this with the fact that low quality thermal
>> interface material (TIM, paste, tape) is used on factory installed mobo
>> chipset heatsinks, and this demonstrates why the chip is likely getting
>> too hot under IO load.
>>
>>> I've been doing a lot of googling and see that the northbridge is
>>> usually between the PCIe slot and the CPU, but there doesn't seem to be
>>> any large object in that place on this board.
>>
>> Due to ever increasing integration, most mobos today have a single
>> system support chip in place of the previous north/south bridge duo.
>> Aftermarket heatsinks are typically sized such that larger units are for
>> the "northbridge" and smaller units for the "southbridge".
>> "Northbridge" heatsinks are typically used for single chip systems as
>> the mounting footprint and thermal output are similar.
>>
>>> Can you confirm this is the proper part for me to measure for a new heatsink?
>>
>> I can.  It is.
>>
>> Do not attempt this with the system running.  Power down, remove all
>> external cables and sit the chassis on a table.  Ground yourself by
>> touching the chassis or a metal table leg, etc, to discharge any static
>> from your body.
>>
>> Measure between the approximate centers of the two spring loaded plastic
>> mounting tabs.  You don't need an exact measurement to 1mm, but a
>> ballpark.  The hole spacing is fairly standardized by the industry.
>> Your measurement should fall into one of 3 ranges, and this will dictate
>> which heatsink you buy:
>>
>> 47.5 - 53mm
>> 53   - 59mm
>> 59   - 63mm
>>
>> Let me know the measurement and I'll recommend the best unit for your
>> application.  It seems you won't be using all of your expansion slots
>> any time soon so going with a taller passive unit shouldn't be a
>> problem.  A taller/larger passive unit in a case with good airflow is
>> preferable to a low-profile unit w/fan due to 2-3:1 greater mass, no fan
>> to fail, no noise.  After you select the heatsink I'll give you tips on
>> removing the current one and installing the new one.  Proper
>> installation is more important than which heatsink you install, as doing
>> it wrong may result in higher temperatures than what you have now.
>>
>> As always, the devil is in the details.
>>
>> Cheers,
>>
>> Stan
>>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2014-04-11 10:31 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-04-01 19:09 Hardware advice for software raid Barrett Lewis
2014-04-01 19:28 ` Matt Garman
2014-04-01 19:43   ` Scott D'Vileskis
2014-04-02  9:42 ` Stan Hoeppner
2014-04-07 19:28   ` Barrett Lewis
2014-04-08  3:53     ` Stan Hoeppner
2014-04-09 20:21       ` Barrett Lewis
2014-04-11 10:31         ` Stan Hoeppner
