Re: FYI: i.MX8MP ISP (RKISP1) MI registers corruption

From: Stefan Klug <stefan.klug@ideasonboard.com>
To: "Dafna Hirschfeld" <dafna@fastmail.com>,
	"Krzysztof Hałasa" <khalasa@piap.pl>
Cc: Laurent Pinchart <laurent.pinchart@ideasonboard.com>,
	Heiko Stuebner <heiko@sntech.de>,
	Paul Elder <paul.elder@ideasonboard.com>,
	Jacopo Mondi <jacopo.mondi@ideasonboard.com>,
	Ondrej Jirman <megi@xff.cz>,
	linux-media@vger.kernel.org, linux-rockchip@lists.infradead.org,
	linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org
Subject: Re: FYI: i.MX8MP ISP (RKISP1) MI registers corruption
Date: Mon, 21 Jul 2025 10:46:23 +0200
Message-ID: <175308758352.3134829.9472501038683860006@localhost>
In-Reply-To: <m38qknx939.fsf@t19.piap.pl>

Hi Krzysztof,

Thanks for the investigation. This is all quite worrisome, to say the
least.

Quoting Krzysztof Hałasa (2025-07-17 15:03:54)
> > The "reference" (NXP) VVCam driver simply does the operations twice
> > (i.e., reads twice with the first result discarded, and writes twice).
> > This fixes the problem on most accesses, but the problems still persist.
> 
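For readers following along, here is a minimal sketch of the double-access
pattern described above, as I understand it from the description. The helper
names are hypothetical and not taken from VVCam or the rkisp1 driver, and
plain volatile pointers stand in for readl()/writel() so that it compiles in
user space:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: read a register twice and use only the second value. */
static inline uint32_t mi_read_twice(const volatile uint32_t *reg)
{
	assert(reg);
	(void)*reg;	/* first read, result discarded */
	return *reg;	/* second read, value actually used */
}

/* Hypothetical helper: write the same value to a register twice. */
static inline void mi_write_twice(volatile uint32_t *reg, uint32_t val)
{
	assert(reg);
	*reg = val;	/* first write */
	*reg = val;	/* repeated write of the same value */
}
```

As the quoted text says, this only lowers the failure rate; it does not
eliminate the corruptions.
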
> It appears the corruptions are quite frequent, though.
> Using "ldp qXX, qYY, [x0]" (2 * 128-bit load pair) I get results like
> this (each data row is a result of a single ldp):
> 
> addr:  32E21400 32E21404 32E21408 32E2140C 32E21410 32E21414 32E21418 32E2141C
> --------------------------------------------------------------------------------
> values 3D000007       20 3C300000   1FA400        0        0        0 3C380000 count 99993097
> values        0       20 3C300000   1FA400        0        0        0 3C380000 count 5930
> values    7E900       20 3C300000   1FA400        0        0        0 3C380000 count 338
> values    FD200       20 3C300000   1FA400        0        0        0 3C380000 count 223
> values 3C380000       20 3C300000   1FA400        0        0        0 3C380000 count 220
> values       40       20 3C300000   1FA400        0        0        0 3C380000 count 192
> 
> The valid value (in 0x32E21400 register) is 3D000007 only, the rest are
> corruptions: ca. 6 errors per 100k reads. With other registers, 15 errors
> per 100k reads etc.
> 
> I also got this:
> addr:  32E213F0 32E213F4 32E213F8 32E213FC 32E21400 32E21404 32E21408 32E2140C
> --------------------------------------------------------------------------------
> values        0        0        0        0 3D000007       20 3C300000   1FA400 count 98638773
> values        0        0        0        0        0       20 3C300000   1FA400 count 1330227
> values        0        0        0        0       40       20 3C300000   1FA400 count 3721
> values        0        0        0        0 3C380000       20 3C300000   1FA400 count 314
> values        0        0        0        0    7E900       20 3C300000   1FA400 count 572
> values        0        0        0        0    FD200       20 3C300000   1FA400 count 428
> values        0        0        0        0    4C010       20 3C300000   1FA400 count 25965
> 
> which is ca. 14 errors per 1k reads, though maybe it's special -
> non-MI/MI boundary (at 0x32E21400), reserved addresses (0x32E213Fx) etc.
> 
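To make the rates in the two dumps easy to recompute, here is a small sketch
of the arithmetic. The function name is made up for illustration; it assumes
the row with the valid value is the only "good" row and every other row is a
corruption:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Errors per `scale` reads, given the count of good reads and the
 * per-pattern counts of corrupted reads (hypothetical helper).
 */
static double errors_per(uint64_t good, const uint64_t *bad, int n,
			 double scale)
{
	uint64_t errors = 0;

	for (int i = 0; i < n; i++)
		errors += bad[i];

	uint64_t total = good + errors;

	assert(total > 0);
	return (double)errors * scale / (double)total;
}
```

Feeding it the counts from the first dump (99993097 good; 5930, 338, 223,
220 and 192 bad) with scale 100000 gives about 6.9 errors per 100k reads.
The second dump (98638773 good; 1330227, 3721, 314, 572, 428 and 25965 bad)
with scale 1000 gives about 13.6 errors per 1k reads.
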
> > The problems show themselves maybe in 5% or 10% of boots.

How do you detect if the current boot was a "faulty" one?

> 
> Well, now it appears more like 20%: e.g. in 41 system runs (soft reboots
> only, no power-downs), I got problems 8 times.
> 
> Obviously I can post the tester source if anyone is interested.

I'd like to have a look at it. I'm doing a lot of work on the i.MX8MP at
the moment, and I haven't had many issues other than the ones I caused
myself. There are, however, a few start/stop issues still on my list for
further investigation. I haven't observed larger memory corruptions,
though, which ties into my earlier question: how do you observe that the
device is in a bad state?

I'm mostly using the debix boards right now, so I could test if I can
replicate the behavior there.

Can you also share the kernel you are using?

Best regards,
Stefan

> 
> 
> Generally ISP MI register read accesses which can be corrupted are:
> - first 32 bits read in a given transfer, and additionally
> - every 32 bits on a 32-byte boundary (addresses 0x....00, ...20 etc.).
> 
> This means, in practice, on i.MX8MP only, RKISP1_CIF_MI_CTRL and
> RKISP1_CIF_MI_MP_CB_SIZE_INIT (with the workaround).
> 
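The rule for corruptible reads stated above can be written as a small
predicate. This is only a sketch of the rule as described in the report, with
a hypothetical function name; it looks at the position of a 32-bit word
within a transfer and does not check whether the address is actually in the
MI register block:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Returns true if the 32-bit word at `addr` is in a position the report
 * describes as corruptible, for a transfer that starts at `start`:
 * either the first word of the transfer, or a word on a 32-byte boundary.
 */
static bool mi_read_suspect(uint32_t start, uint32_t addr)
{
	/* Both addresses must be 32-bit aligned and addr inside the transfer. */
	assert((start & 3) == 0 && (addr & 3) == 0 && addr >= start);

	return addr == start || (addr & 0x1f) == 0;
}
```

For the transfer starting at 0x32E213F0 in the second dump, this flags
0x32E213F0 (first word) and 0x32E21400 (32-byte boundary), and 0x32E21400 is
the column whose value varies in that dump.
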
> What is this 32-byte boundary?
> 
> Writing is a bigger problem, though.
> -- 
> Krzysztof "Chris" Hałasa
> 
> Sieć Badawcza Łukasiewicz
> Przemysłowy Instytut Automatyki i Pomiarów PIAP
> Al. Jerozolimskie 202, 02-486 Warszawa
>