From: Corey Coughlin <corey.coughlin.cc3@gmail.com>
To: "Austin S. Hemmelgarn" <ahferroin7@gmail.com>,
	Tomasz Kusmierz <tom.kusmierz@gmail.com>
Cc: Btrfs BTRFS <linux-btrfs@vger.kernel.org>
Subject: Re: raid1 has failing disks, but smart is clear
Date: Fri, 8 Jul 2016 22:13:50 -0700	[thread overview]
Message-ID: <5780880E.1030004@gmail.com> (raw)
In-Reply-To: <8d9f3f42-4a96-6791-790d-89ce02e5caaf@gmail.com>

Hi all,
     One thing I may not have made clear: this wasn't a system that had
been running for a month and then corrupted out of nowhere.  The
corruption showed up the first time I tried to run a filesystem
balance, basically the day after I set up the filesystem and copied
files over.  I was hoping to get it stable and then add some more
disks, but since it wasn't stable right from the start, I'm assuming
the problem is bigger than some bad memory.  I ran stress.sh on two
disks connected to ports on the motherboard, and that seemed to work
fine.  I'm using a pair of WD Green drives, in case there's an issue
with those; I did order some WD Red NAS drives, which I hope arrive
soon.  I'm running the stress test now with the drives connected to
the SAS card, letting them run for a day to see if something bad
happens.  It's a 4-port card, so if there's a problem with a specific
port or cable it could take me a while to find it; I'm hoping it shows
up in a somewhat obvious way.  Thanks for all the help, the stress.sh
runs give me a clear way to try to debug this, and thanks again for
that tip.
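
     In case it helps anyone following along, here's a simple
double-read check I can run alongside the stress to spot a flaky link
(a rough sketch; the device names are just examples for my setup):

    # Read each disk on the SAS card twice and compare the hashes.
    # The on-disk data doesn't change between passes, so a mismatch
    # points at the path (port, cable, HBA), not the platters.
    for dev in /dev/sdb /dev/sdc; do
        h1=$(sha1sum < "$dev" | cut -d' ' -f1)
        h2=$(sha1sum < "$dev" | cut -d' ' -f1)
        [ "$h1" = "$h2" ] || echo "$dev: read mismatch"
    done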

     ------- Corey

On 07/08/2016 05:14 AM, Austin S. Hemmelgarn wrote:
> On 2016-07-08 07:14, Tomasz Kusmierz wrote:
>>>
>>> Well, I was able to run memtest on the system last night, and that
>>> passed with flying colors, so I'm now leaning toward the problem
>>> being in the SAS card.  But I'll have to run some more tests.
>>>
>>
>> Seriously, use the stress.sh for a couple of days.  When I was
>> running memtest it ran continuously for 3 days without an error; one
>> day of stress.sh and the errors started showing up.
>> Be VERY careful about trusting any tool of that sort; modern CPUs
>> lie to you continuously!!!
>> 1. You may think you've written the best cache-bypassing code on
>> the planet, but in reality, since CPUs are multicore, you can end up
>> with overzealous MPMD machinery trapping you inside your cache: all
>> your testing does is write a page (trapped in cache) and read it
>> back from cache (via the coherency mechanism, not the hit/miss one),
>> so you stay inside L3 with no clue that you never touched the RAM;
>> then the CPU just dumps your page to RAM and the "job is done".
>> 2. Because of coherency issues and the real problems of
>> non-blocking MPMD, you can have a DMA controller sucking pages out
>> of your own cache: with the RAM marked dirty, the CPU will try to
>> save time and accelerate the operation by pushing the DMA straight
>> out of L3 to somewhere else (which is why some testers use the crazy
>> trick of forcing RAM access via DMA to another device and back,
>> just to force pages to drop out of L3).
>> 3. This one is actually funny: some testers didn't claim the pages
>> for their process, so for some reason the pages they were using
>> never showed up as used / dirty etc., and all the testing was done
>> in 32 kB of L1 ... the tests were fast, though :)
>>
>> stress.sh will test the operation of the whole system!!! It shifts
>> a lot of data, so the disks are engaged; the CPU keeps pumping out
>> CRC32s all the time, so it's busy; and the RAM gets hit nicely as
>> well due to the heavy DMA.
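>>
>> If you want to roll your own, the shape of it is something like
>> this (a rough sketch of the idea, not the actual stress.sh;
>> /mnt/test is wherever your test filesystem lives, and it needs
>> root for the cache drop):
>>
>>     # Write a chunk of random data, hash it, drop the page cache,
>>     # then hash it again straight off the disk.  Any mismatch means
>>     # something between CPU, RAM, controller and platter lied.
>>     dd if=/dev/urandom of=/mnt/test/chunk bs=1M count=4096
>>     before=$(sha1sum /mnt/test/chunk | cut -d' ' -f1)
>>     sync && echo 3 > /proc/sys/vm/drop_caches
>>     after=$(sha1sum /mnt/test/chunk | cut -d' ' -f1)
>>     [ "$before" = "$after" ] || echo "MISMATCH on re-read"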
> Agreed, never just trust memtest86 or memtest86+.
>
> FWIW, here's the routine I go through to test new RAM:
> 1. Run regular memtest86 for at least 3 full cycles in full SMP mode 
> (F2 while starting up to force SMP).  On some systems this may hang, 
> but that's an issue in the BIOS's setup of the CPU and the memory
> controller, not the RAM, and generally isn't indicative of a system
> that will have issues.
> 2. Run regular memtest86 for at least 3 full cycles in regular UP mode 
> (the default on most non-NUMA hardware).
> 3. Repeat 1 and 2 with memtest86+.  It's diverged enough from regular 
> memtest86 that it's functionally a separate tool, and I've seen RAM 
> that passes one but not the other on multiple occasions before.
> 4. Boot SystemRescueCD, download a copy of the Linux sources, and
> run as many allmodconfig builds in parallel as I have CPUs, each with
> a number of make jobs equal to twice the number of CPUs (so each CPU
> ends up running at least two threads); there's a sketch of this after
> the list.  This forces enough context switching to completely trash
> even the L3 cache on almost any modern processor, which means it
> forces things out to RAM.  It won't hit all your RAM, but I've found
> it to be a relatively reliable way to verify that the memory bus and
> the memory controller work properly.
> 5. Still from SystemRescueCD, use a tool called memtester (essentially 
> memtest86, but run from userspace) to check the RAM.
> 6. Still from SystemRescueCD, use sha1sum to compute SHA-1 hashes of 
> all the disks in the system, using at least 8 instances of sha1sum per 
> CPU core, and make sure that all the sums for a disk match.
> 7. Do 6 again, but using cat to compute the sum of a concatenation of 
> all the disks in the system (so the individual commands end up being 
> `cat /dev/sd? | sha1sum`).  This will rapidly use all available memory 
> on the system and keep it in use for quite a while.
> 8. If I'm using my home server system, I also have a special virtual
> runlevel set up where I spin up 4 times as many VMs as I have CPU
> cores (so on my current 8 core system, I spin up 32), all assigned a
> part of the RAM not used by the host (which I shrink to the minimum
> usable size of about 500MB), all running steps 1-3 in parallel.
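>
> For reference, steps 4 and 6 boil down to something like this (a
> rough sketch; it assumes kernel trees are already unpacked in
> build1..buildN, and the device names are whatever your system has):
>
>     N=$(nproc)
>     # Step 4: N parallel allmodconfig builds with 2*N make jobs
>     # each, to thrash the caches and force real traffic out to RAM.
>     for i in $(seq 1 "$N"); do
>         ( cd "build$i" && make allmodconfig && make -j"$((2 * N))" ) &
>     done
>     wait
>
>     # Step 6: several concurrent sha1sum passes over each disk; all
>     # of the hashes for a given disk must match, or something in the
>     # read path is corrupting data.
>     for dev in /dev/sd?; do
>         for i in $(seq 1 8); do
>             sha1sum "$dev" > "/tmp/$(basename "$dev").$i" &
>         done
>     done
>     wait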
>
> It may also be worth mentioning that I've seen very poorly behaved 
> HBAs that produce symptoms that look like bad RAM, including issues
> not related to the disks themselves, yet show no issues when regular 
> memory testing is run.
>>
>> Come to think of it, if your device nodes change during operation
>> of the system it might be the LSI card dying -> reinitializing ->
>> rediscovering the drives -> drives showing up at different nodes.
>> On my system I can hot-swap SATA and a drive will come up with a
>> different dev node even though it was connected to the same place
>> on the controller.
> Barring a few odd controllers I've seen which support hot-plug but not 
> hot-remove, that shouldn't happen unless the device is in use, and in 
> that case it only happens because of the existing open references to 
> the device being held by whatever is using it.
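>
> If the nodes do move around on you, /dev/disk/by-id gives you names
> that stick to the physical drive regardless of discovery order, so
> you can tell which disk is which after a re-enumeration:
>
>     # Map the stable IDs to whatever sdX node each drive got today.
>     ls -l /dev/disk/by-id/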
>>
>> I think most important: I presume you run non-ECC RAM?
> And if not, how well shielded is your system?  You can often get by 
> with non-ECC RAM if you have good EMI shielding and reboot regularly.  
> Most servers actually do have good EMI shielding, and many pre-built 
> desktops do, but a lot of DIY systems don't (especially if it's a
> gaming case; the polycarbonate windows many of them have in the side
> panel are a _huge_ hole in the EMI shielding).
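>
> A quick way to check what you're actually running (needs root, and
> some boards don't report it accurately):
>
>     # "None" here means non-ECC; ECC modules typically report
>     # something like "Single-bit ECC".
>     dmidecode --type memory | grep -i 'error correction'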

