From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from smtpauth03.csee.onr.siteprotect.com ([64.26.60.137])
	by bombadil.infradead.org with esmtp (Exim 4.68 #1 (Red Hat Linux))
	id 1KVCHn-0004VR-US
	for linux-mtd@lists.infradead.org; Mon, 18 Aug 2008 21:29:12 +0000
Message-ID: <48A9E99E.7070302@boundarydevices.com>
Date: Mon, 18 Aug 2008 14:29:02 -0700
From: Troy Kisky <troy.kisky@boundarydevices.com>
MIME-Version: 1.0
To: Frans Meulenbroeks <fransmeulenbroeks@gmail.com>
Subject: Re: [RESUBMIT] [PATCH] [MTD] NAND nand_ecc.c: rewrite for improved
	performance
References: <alpine.DEB.0.99.0807311031390.6774@frans-desktop>	
	<ac9c93b10808150223h6fb6032co61ffa7babba57884@mail.gmail.com>	
	<1218793271.3184.77.camel@pmac.infradead.org>	
	<ac9c93b10808150304k14f054faw402a7f67e868d2a@mail.gmail.com>	
	<1218795140.3184.84.camel@pmac.infradead.org>	
	<48A5D154.2000409@boundarydevices.com>	
	<alpine.DEB.0.99.0808152300300.9220@frans-desktop>	
	<48A8937D.1010007@boundarydevices.com>	
	<ac9c93b10808172333l1f6771b2w1404f9ac771d3058@mail.gmail.com>	
	<48A9AF73.2040105@boundarydevices.com>
	<ac9c93b10808181409uca8dc59ja62f43a217f4a969@mail.gmail.com>
In-Reply-To: <ac9c93b10808181409uca8dc59ja62f43a217f4a969@mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Cc: linux-mtd@lists.infradead.org, David Woodhouse <dwmw2@infradead.org>
List-Id: Linux MTD discussion mailing list <linux-mtd.lists.infradead.org>
List-Unsubscribe: <http://lists.infradead.org/mailman/listinfo/linux-mtd>,
	<mailto:linux-mtd-request@lists.infradead.org?subject=unsubscribe>
List-Archive: <http://lists.infradead.org/pipermail/linux-mtd>
List-Post: <mailto:linux-mtd@lists.infradead.org>
List-Help: <mailto:linux-mtd-request@lists.infradead.org?subject=help>
List-Subscribe: <http://lists.infradead.org/mailman/listinfo/linux-mtd>,
	<mailto:linux-mtd-request@lists.infradead.org?subject=subscribe>

Frans Meulenbroeks wrote:
> 2008/8/18 Troy Kisky <troy.kisky@boundarydevices.com>:
>> Frans Meulenbroeks wrote:
>>> Yes, the NSLU2 had a filesystem that was created before the patch was applied.
>>> But actually I think the filesystem is irrelevant.
>>> I verified the proper operation by comparing the values generated by
>>> the original code with the values generated by my code over a set of
>>> input blocks.
>>> Guess there is no endianness dependency and that if the data is big
>>> endian the ecc is too.
>> Does that make logical sense to you? The correction routine
>> accesses the data as a byte and flips a bit. If it accessed it as
>> an uint32 and flipped the bit, then I can see that there would be
>> no endianness dependency. I'm not suggesting you do that, as it would be
>> incompatible with current ecc, just explaining my logic. I'd very
>> much appreciate an explanation of why I'm wrong. I would expect
>> big endian ecc to have 4 bits differences whenever the entire
>> block parity is odd. These would be the bits that select the byte
>> within the uint32.
> 
> Troy, did a further investigation.
> Your explanation is correct. My test program had a flaw causing this
> case to be undetected.
> Indeed in case of odd parity the 4 bits selecting the byte are flipped
> on big endian systems.
> (little endian is ok).

I appreciate you digging into it, as I don't have a big endian system.
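
The effect can be modelled on any host by doing the 32-bit loads explicitly in both byte orders. What follows is only a sketch under that assumption: the helper names (load_le32, load_be32, byte_select_parity, lane_parity) are made up for illustration and none of this is the nand_ecc.c code. It computes just the four parity bits that select a byte within a uint32.

```c
#include <stddef.h>
#include <stdint.h>

/* Assemble four bytes the way a little endian CPU would load a uint32_t. */
static uint32_t load_le32(const uint8_t *p)
{
	return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
	       (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Assemble four bytes the way a big endian CPU would load a uint32_t. */
static uint32_t load_be32(const uint8_t *p)
{
	return (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 |
	       (uint32_t)p[2] << 8 | (uint32_t)p[3];
}

/* Parity (0 or 1) of all 32 bits of v. */
static unsigned int parity32(uint32_t v)
{
	v ^= v >> 16;
	v ^= v >> 8;
	v ^= v >> 4;
	v ^= v >> 2;
	v ^= v >> 1;
	return v & 1;
}

/*
 * Four parity bits for the two address bits that select a byte within
 * a word, taken from the xor of all words and treating bits 0..7 as
 * byte 0 (hypothetical layout, chosen for this illustration only):
 *   bit 0: bytes 0 and 2    bit 1: bytes 1 and 3
 *   bit 2: bytes 0 and 1    bit 3: bytes 2 and 3
 */
static unsigned int byte_select_parity(uint32_t par)
{
	return parity32(par & 0x00ff00ffu) |
	       parity32(par & 0xff00ff00u) << 1 |
	       parity32(par & 0x0000ffffu) << 2 |
	       parity32(par & 0xffff0000u) << 3;
}

/* Xor the buffer word by word in the requested byte order. */
static unsigned int lane_parity(const uint8_t *buf, size_t len, int big_endian)
{
	uint32_t par = 0;
	size_t i;

	for (i = 0; i + 4 <= len; i += 4)
		par ^= big_endian ? load_be32(buf + i) : load_le32(buf + i);
	return byte_select_parity(par);
}
```

With a single set bit in the buffer (odd block parity) the two byte orders disagree in all four bits; with two set bits (even block parity) they agree.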

> 
> Still looking at what the best way to fix it. In the code you posted
> before you used __cpu_to_le64s.
> Not sure why you are using the 64 variant. As it is an uint32_t, I
> would expect __cpu_to_le32s to suffice.
My bug.
> 
> Then again I am not too eager to use that function as it generates
> some overhead. I'd rather use the builtin gcc macro __BIG_ENDIAN__ (in
> that case I can just use an #ifdef to distinguish the two cases and in
> case of BE no byte swapping is needed.
> What is your opinion on this?

I agree.

> 
> Frans.
> 
> PS: wrt the 11 bits check for the other message. Can't really envision
> why this fails, but maybe it is just too late.
> If you have an ecc and a faulty 256 byte data block that would be
> erroneously accepted by my code and that would be rightfully rejected
> by the original code, I'll be more than happy to change it.
> Performancewise the difference is very small and it is a rare
> situation anyway. The original test is definitely more rigid than just
> the nr of bits test.
> 

(ignoring inversions)
Example: You have a block of all zeros.

The ecc stored in the spare bytes for this block is also 0.
Now, upon reading this block of zeros, a two-bit error occurs. The bits that happen to be
read incorrectly are bit 0 and bit 0x3f of the block.
The hardware calculated ecc will be
0:0 ^ 0:fff = 0:fff after bit 0
0:fff ^ 3f:fc0 = 3f:3f after bit 3f

Now, when your algorithm counts bits it gets 12, and decides
it is a single-bit ecc error.

The old way, however, will xor the high and low 12 bits: 3f ^ 3f = 0, and 0 != fff, so it
decides this is a multi-bit ecc error and returns an error.

Note that both approaches would have decided it was a single-bit error if the second
error hadn't happened.


So, try a block of zeros and flip bits 0 and 0x3f.
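
That experiment can be sketched in the high:low notation used above, without touching real ecc bytes. This is a model only: the struct and function names are hypothetical, the 12-bit width simply follows the numbers in the example, and inversions are ignored as before.

```c
#include <assert.h>
#include <stdint.h>

#define ADDR_BITS 12u
#define ADDR_MASK ((1u << ADDR_BITS) - 1u)	/* 0xfff */

/* hi: parities over the "address bit = 1" halves, lo: over the "= 0" halves */
struct syndrome {
	uint32_t hi;
	uint32_t lo;
};

/* Toggle the parity bits affected by one flipped data bit at address 'bit'. */
static void flip_bit(struct syndrome *s, uint32_t bit)
{
	assert(bit <= ADDR_MASK);
	s->hi ^= bit;
	s->lo ^= ~bit & ADDR_MASK;
}

static unsigned int count_bits(uint32_t v)
{
	unsigned int n = 0;

	for (; v; v &= v - 1)	/* clear the lowest set bit each round */
		n++;
	return n;
}

/* Bit counting test: call it a single-bit error if ADDR_BITS bits are set. */
static int single_by_count(const struct syndrome *s)
{
	return count_bits(s->hi) + count_bits(s->lo) == ADDR_BITS;
}

/* Old-style test: every hi/lo pair must differ, so hi ^ lo must be all ones. */
static int single_by_xor(const struct syndrome *s)
{
	return (s->hi ^ s->lo) == ADDR_MASK;
}
```

Starting from a zeroed syndrome, flip_bit(&s, 0) gives 0:fff and both tests accept it. A further flip_bit(&s, 0x3f) gives 3f:3f, which single_by_count() still accepts and single_by_xor() rejects.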

Troy