From: Jesse Barnes
Subject: Re: [RFC] algorithm for handling bad cachelines
Date: Wed, 28 Mar 2012 10:26:52 -0700
To: Ben Widawsky
Cc: intel-gfx@lists.freedesktop.org

On Tue, 27 Mar 2012 07:19:43 -0700 Ben Widawsky wrote:
> I wanted to run this by folks before I start doing any actual work.
>
> This is primarily for GPGPU, or perhaps *really* accurate rendering
> requirements.
>
> IVB+ has an interrupt to tell us when a cacheline seems to be going bad.
> There is also a mechanism to remap the bad cachelines. The
> implementation details aren't quite clear to me yet, but I'd like to
> enable this feature for userspace.
>
> Here is my current plan, but it involves filesystem access, so it's
> probably going to get a lot of flames.
>
> 1. Handle cache line going bad interrupt.
>
> 2. send a uevent
> 2.5 reset the GPU (docs tell us to)
>
> 3. Read a module parameter with a path in the filesystem
> of the list of bad lines.
> It's not clear to me yet exactly what I need
> to store, but it should be a relatively simple list.
> 4. Parse list on driver load, and handle as necessary.
> 5. goto 1.
>
> Probably the biggest unanswered question is exactly when in the HW
> loading do we have to finish remapping. If it can happen at any time
> while the card is running, I don't need the filesystem stuff, but I
> believe I need to remap the lines quite early in the device bootstrap.
>
> The only alternative I have is a huge comma separated string for a
> module parameter, but I kind of like reading the file better.
>
> Any feedback is highly appreciated. I couldn't really find much
> precedent for doing this in other drivers, so pointers to similar
> things would also be highly welcome.

I think the main thing here is to make sure we handle the L3 parity
check interrupts. I don't think "lines going bad" will be very common
in practice (maybe if you really abuse your CPU by putting it in the
freezer and then into an oven or something), so having a fancy
interface for it probably isn't too important.

Also, the behavior should be configurable. For the vast majority of
users, an L3 parity interrupt is no big deal, and resetting the GPU
when we see one is more than we want. So it should probably be off by
default and be controlled with a module parameter and maybe a config
option.

As for the interface for feeding in bad lines (useful for testing if
nothing else), I'd prefer an ioctl over a module parameter. Some
as-yet-unwritten userspace service can load a set based on previously
collected uevents at boot time, before any real GPU work runs.

Reading a file at load time is definitely a non-starter; we have no
idea whether the filesystem will be available to the module, given
mount points, hiding, chroots, etc.
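The suggested flow (always report a parity error, make the GPU reset opt-in, and let a boot-time userspace service push previously seen bad lines back through an ioctl) can be sketched as a small userspace simulation in C. Everything here is hypothetical: `struct bad_line`, `struct remap_request`, `l3_remap_ioctl()`, `l3_parity_irq()`, `MAX_BAD_LINES` and the `reset_on_parity` flag are illustrative names, not i915 code, and the row/bank fields are placeholders because the exact data the hardware needs was still unclear. Counters stand in for sending a uevent and resetting the GPU.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical identifier for one bad L3 cacheline; the fields the
 * hardware actually needs were not yet known. */
struct bad_line {
	unsigned int row;
	unsigned int bank;
};

#define MAX_BAD_LINES 16 /* hypothetical remap table size */

/* Hypothetical ioctl argument: a boot-time userspace service passes back
 * the lines it collected from earlier uevents. */
struct remap_request {
	size_t count;
	const struct bad_line *lines;
};

/* Simulated driver state. reset_on_parity stands in for a module
 * parameter and is meant to default to false. */
struct l3_state {
	bool reset_on_parity;
	struct bad_line table[MAX_BAD_LINES];
	size_t nr_lines;
	unsigned int uevents_sent;
	unsigned int gpu_resets;
	struct bad_line last_reported;
};

static bool line_equal(const struct bad_line *a, const struct bad_line *b)
{
	return a->row == b->row && a->bank == b->bank;
}

/* Simulated ioctl handler: validate the request, skip lines already in
 * the table (including repeats within the request) and append new ones.
 * Returns 0 or a negative errno; on -ENOSPC, lines accepted before the
 * table filled stay in it. */
int l3_remap_ioctl(struct l3_state *s, const struct remap_request *req)
{
	if (req->count > 0 && req->lines == NULL)
		return -EINVAL;

	for (size_t i = 0; i < req->count; i++) {
		bool known = false;

		for (size_t j = 0; j < s->nr_lines; j++)
			if (line_equal(&s->table[j], &req->lines[i]))
				known = true;
		if (known)
			continue;
		if (s->nr_lines == MAX_BAD_LINES)
			return -ENOSPC;
		s->table[s->nr_lines++] = req->lines[i];
	}
	assert(s->nr_lines <= MAX_BAD_LINES);
	return 0;
}

/* Simulated parity interrupt: always record and report the line (the
 * counter stands in for a uevent), but count a GPU reset only when the
 * reset policy is enabled. */
void l3_parity_irq(struct l3_state *s, struct bad_line where)
{
	s->last_reported = where;
	s->uevents_sent++;
	if (s->reset_on_parity)
		s->gpu_resets++;
}
```

Keeping the reset behind a flag that defaults to false reflects the point that most users should not get a GPU reset for a parity interrupt, while the report still reaches userspace. The remap table is only filled through the ioctl, so a service can replay the lines it saved from earlier uevents at boot, before any real GPU work runs.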
-- 
Jesse Barnes, Intel Open Source Technology Center