From mboxrd@z Thu Jan 1 00:00:00 1970
From: Neil Brown
Subject: Re: nonzero mismatch_cnt with no earlier error
Date: Mon, 26 Feb 2007 15:36:04 +1100
Message-ID: <17890.25524.465403.130119@notabene.brown>
References: <45DF859B.2050108@eyal.emu.id.au> <45DF8DE3.2050903@eyal.emu.id.au>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Return-path:
In-Reply-To: message from Eyal Lebedinsky on Saturday February 24
Sender: linux-raid-owner@vger.kernel.org
To: Eyal Lebedinsky
Cc: Justin Piszcz, linux-raid list
List-Id: linux-raid.ids

On Saturday February 24, eyal@eyal.emu.id.au wrote:
> But is this not a good opportunity to repair the bad stripe for a very
> low cost (no complete resync required)?

In this case, 'md' knew nothing about an error.  The SCSI layer
detected something and thought it had fixed it itself.  Nothing for
md to do.

>
> At time of error we actually know which disk failed and can re-write
> it, something we do not know at resync time, so I assume we always
> write to the parity disk.

md only knows of a 'problem' if the lower level driver reports one.
If it reports a problem for a write request, md will fail the device.
If it reports a problem for a read request, md will try to over-write
the failed block with correct data.
But if the driver doesn't report the failure, there is nothing md can
do.

When performing a check/repair, md looks for inconsistencies and fixes
them 'arbitrarily'.  For raid5/6, it just 'corrects' the parity.  For
raid1/10, it chooses one block and over-writes the other(s) with it.

Mapping these corrections back to blocks in files in the filesystem is
extremely non-trivial.

NeilBrown
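
To illustrate why a check/repair pass can only fix inconsistencies
'arbitrarily', here is a toy model in Python.  It is not md's code:
all the function names are made up for illustration, the raid5 case
uses a single XOR parity block (raid6's second syndrome is ignored),
and picking the first copy in the raid1 case is an assumption, since
all that is said above is that md chooses one block.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def check_raid5_stripe(data_blocks, parity):
    """Return True if the stored parity matches the data blocks.

    A mismatch says the stripe is inconsistent, but not which block
    (data or parity) is the wrong one.
    """
    return xor_blocks(data_blocks) == parity


def repair_raid5_parity(data_blocks):
    """'Correct' the parity: recompute it from the data as found on disk.

    The data blocks are taken on trust, so a silently corrupted data
    block stays corrupted; only the parity is rewritten.
    """
    return xor_blocks(data_blocks)


def repair_raid1(copies):
    """Choose one copy (here the first, an assumption) and over-write the rest."""
    return [copies[0]] * len(copies)


# A stripe with three data blocks and a matching parity block.
good = [b"\x01\x02", b"\x10\x20", b"\x0f\xf0"]
parity = xor_blocks(good)

# One data block is silently corrupted; no driver error was reported.
bad = [good[0], b"\xff\x20", good[2]]

mismatch = not check_raid5_stripe(bad, parity)   # the check notices it
new_parity = repair_raid5_parity(bad)            # the repair rewrites parity
consistent_again = check_raid5_stripe(bad, new_parity)
data_still_wrong = bad != good
```

After the repair the stripe is consistent again, yet the corrupted
data block is still there.  Without an error report from the driver
there is no way to tell from the stripe alone which block was bad.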