From mboxrd@z Thu Jan 1 00:00:00 1970
From: Roger Heflin
Subject: Re: RAID1 seems not to be able to scrub pending sectors shown by smart
Date: Fri, 23 Dec 2011 16:26:39 -0600
Message-ID: <4EF5001F.8050409@gmail.com>
References: <87hb0r2kvq.fsf@poker.hands.com> <878vm32dan.fsf@poker.hands.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Return-path:
In-Reply-To: <878vm32dan.fsf@poker.hands.com>
Sender: linux-raid-owner@vger.kernel.org
To: Philip Hands
Cc: 'LinuxRaid'
List-Id: linux-raid.ids

On 12/23/2011 03:22 PM, Philip Hands wrote:
> On Fri, 23 Dec 2011 13:59:21 -0600, Roger Heflin wrote:
>> On Fri, Dec 23, 2011 at 12:39 PM, Philip Hands wrote:
> ...
>> I had 4 1.5tb seagate drives from 2009 (bought at different times in
>> 2009) and 3 of those 4 started getting lots of bad sectors all within a
>> 2 month period and all 3 finally officially failed smart. And when the
>> sectors (one after another...lucky they failed out over 2-3 weeks so
>> I had got the replacements in before I lost data; I was down to no
>> redundancy for several days in the middle) were failing and being
>> rewritten the performance was just ugly--so even if raid1 was
>> rewriting the drives it does not do anything for performance when the
>> drives are going bad...the only thing that solved my performance was
>> getting all of the failing devices to finally fail smart so they could
>> be RMAed and replaced at minimal cost..
>
> Well, I suppose that's to some extent the reason I mentioned this.
>
> It seems to me that if a disk is throwing _loads_ of read errors, and
> running dreadfully slowly, one could react to that by favouring
> different disk(s), and only occasionally throwing a read at the duff
> disk, until it either sorts itself out or dies.
>
> My performance went from rubbish to fine simply by removing the
> 360-pending-sector disk from the RAID.
> OK, so if the problem is that
> writes are being delayed by the dodgy disk, that's not easy to deal
> with, but looking at the logs makes it look like the reads quite often
> keep targeting the same disk even when several reads just failed and
> got redirected.  This seems suboptimal to me.
>
> Cheers, Phil.

In my case I am pretty sure that the delayed reads were causing the
problems.

I wonder whether a patch might be possible that lets one put an array
into a mode (or switches it into that mode once a bad-block condition
has occurred) in which each read is issued to at least 2 possible data
sources and whichever result arrives first is returned. In the raid1
case it would also read from another mirror (especially if one of the
data sources was known to be flaky); in the raid5/6 case it would need
to read one of the parity disks (plus the other data disks in the
stripe) and calculate the correct data. That would appear to help in
this sort of situation. In all other situations the extra reads would
appear to hurt, but it may cause fewer performance problems when these
sorts of things happen.

No idea how hard this would be to implement, and it won't help with the
case where the writes are getting delayed because the reads are having
serious issues with bad sectors. In that case the reads would continue
to go through, but eventually I would think that enough writes would
back up to cause things to stop anyway.

Recent disk quality does appear to have gone downhill. With the
previous 160-250 GB drives and the later 500 GB drives I had not seen
many issues, but the 1-2 TB drives appear to be a mess and certainly
don't appear to be aging well, nor does the initial quality appear to
be that good either.
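
To make the "read from at least 2 sources and return whichever gets
there first" idea more concrete, here is a rough user-space sketch in
Python. It is not md code and says nothing about how the kernel would
do this; the names read_first, healthy_mirror, flaky_mirror and DATA
are all made up for illustration, and the "disks" are just functions
that sleep to simulate latency.

```python
import concurrent.futures as cf
import time

# Hypothetical in-memory "disk contents": block number -> data.
DATA = {7: b"payload"}


def healthy_mirror(block):
    """Simulated mirror that answers quickly."""
    time.sleep(0.01)
    return DATA[block]


def flaky_mirror(block):
    """Simulated mirror stuck retrying a pending sector, then failing."""
    time.sleep(0.3)
    raise OSError("unrecoverable read error")


def read_first(sources, block):
    """Issue the same read to every source at once and return
    (source_name, data) from the first source that succeeds.

    sources maps a name to a callable taking a block number.
    Raises OSError only if every source fails.
    """
    pool = cf.ThreadPoolExecutor(max_workers=len(sources))
    futures = {pool.submit(fn, block): name for name, fn in sources.items()}
    errors = {}
    try:
        # as_completed yields futures in the order they finish,
        # so a slow source never delays a fast one.
        for fut in cf.as_completed(futures):
            name = futures[fut]
            try:
                return name, fut.result()
            except OSError as exc:
                errors[name] = exc
        raise OSError(f"all sources failed for block {block}: {errors}")
    finally:
        # Do not wait for the losers; their results are simply dropped.
        pool.shutdown(wait=False)


if __name__ == "__main__":
    name, data = read_first({"sda": flaky_mirror, "sdb": healthy_mirror}, 7)
    print(name, data)
```

The cost mentioned above shows up directly: every read is issued to
every source, so a healthy array does extra I/O for nothing. For
raid5/6 one of the sources would be a function that reads the rest of
the stripe and computes the block from parity, which is not shown here.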