From: Ben Widawsky
Subject: [RFC] algorithm for handling bad cachelines
Date: Tue, 27 Mar 2012 07:19:43 -0700
Message-ID: <20120327071943.061bba40@bwidawsk.net>
To: intel-gfx@lists.freedesktop.org

I wanted to run this by folks before I start doing any actual work. This is primarily for GPGPU, or perhaps for *really* accurate rendering requirements.

IVB+ has an interrupt that tells us when a cacheline seems to be going bad, and there is also a mechanism to remap bad cachelines. The implementation details aren't quite clear to me yet, but I'd like to enable this feature for userspace.

Here is my current plan. It involves filesystem access, so it will probably draw a lot of flames.

1. Handle the "cacheline going bad" interrupt.
2. Send a uevent.
2.5. Reset the GPU (the docs tell us to).
3. Read a module parameter holding the filesystem path of the bad-line list. It's not clear to me yet exactly what I need to store, but it should be a relatively simple list.
4. Parse the list on driver load, and handle it as necessary.
5. goto 1.

Probably the biggest unanswered question is exactly when during HW initialization the remapping has to be finished. If it can happen at any time while the card is running, I don't need the filesystem stuff, but I believe I need to remap the lines quite early in the device bootstrap.
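Steps 3 and 4 boil down to turning a small text file into a de-duplicated list of lines to remap. The sketch below is standalone userspace C under loudly hypothetical assumptions: each bad line is identified by a single hexadecimal index (the real hardware encoding is not known yet), the file holds one index per line with `#` comments allowed, and `struct bad_line_list`, `bad_line_add()`, `bad_line_parse()` and `MAX_BAD_LINES` are made-up names. How the driver would actually obtain the file contents is deliberately left out.

```c
#include <ctype.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical capacity: how many lines the hardware can remap is unknown. */
#define MAX_BAD_LINES 32

/* Hypothetical in-memory list of bad cachelines, one index per entry. */
struct bad_line_list {
	unsigned long lines[MAX_BAD_LINES];
	size_t count;
};

/* Record an index unless it is already present.
 * Returns 0 on success (including duplicates), -1 if the list is full. */
static int bad_line_add(struct bad_line_list *list, unsigned long line)
{
	for (size_t i = 0; i < list->count; i++)
		if (list->lines[i] == line)
			return 0; /* already known, nothing to do */
	if (list->count == MAX_BAD_LINES)
		return -1;
	list->lines[list->count++] = line;
	return 0;
}

/* Parse the file contents: one hex index per line, blank lines and
 * '#' comment lines ignored. Returns the number of distinct entries,
 * or -1 on malformed input or overflow. */
static int bad_line_parse(struct bad_line_list *list, const char *text)
{
	const char *p = text;

	list->count = 0;
	while (*p != '\0') {
		while (*p == ' ' || *p == '\t')
			p++; /* skip leading blanks */
		if (*p == '#') {
			while (*p != '\0' && *p != '\n')
				p++; /* skip comment to end of line */
		} else if (*p != '\n' && *p != '\0') {
			char *end;
			unsigned long v;

			if (!isxdigit((unsigned char)*p))
				return -1; /* not a hex index */
			errno = 0;
			v = strtoul(p, &end, 16);
			if (errno == ERANGE)
				return -1; /* value too large */
			while (*end == ' ' || *end == '\t')
				end++; /* allow trailing blanks */
			if (*end != '\n' && *end != '\0')
				return -1; /* junk after the index */
			if (bad_line_add(list, v) < 0)
				return -1; /* more entries than we can hold */
			p = end;
		}
		if (*p == '\n')
			p++; /* move to the next line */
	}
	return (int)list->count;
}
```

Keeping the list de-duplicated means a line that is reported again after a reset (which the "goto 1" loop makes possible) does not occupy a second entry.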
The only alternative I see is a huge comma-separated string passed as a module parameter, but I prefer reading the list from a file.

Any feedback is highly appreciated. I couldn't find much precedent for doing this in other drivers, so pointers to anything similar would also be very welcome.

Ben
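If the module-parameter route wins instead, the Linux kernel already provides `module_param_array()` (declared in `<linux/moduleparam.h>`), which splits a comma-separated parameter into a fixed-size array and reports how many elements were given. For an entry format it does not cover, the parsing is still short. This standalone userspace C sketch assumes, hypothetically, that each entry is a hexadecimal line index; `parse_bad_line_param()` is an illustrative name, not an existing kernel function.

```c
#include <ctype.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Parse a comma-separated list of hex indices such as "1a,2b,0x3c"
 * into out[]. Returns the number of entries, or -1 on an empty field,
 * malformed input, or more than max entries. */
static int parse_bad_line_param(const char *s, unsigned long *out, size_t max)
{
	size_t n = 0;

	if (*s == '\0')
		return 0; /* empty parameter: no bad lines */
	for (;;) {
		char *end;
		unsigned long v;

		if (!isxdigit((unsigned char)*s))
			return -1; /* empty field or garbage */
		errno = 0;
		v = strtoul(s, &end, 16);
		if (errno == ERANGE)
			return -1; /* value too large */
		if (n == max)
			return -1; /* too many entries */
		out[n++] = v;
		if (*end == '\0')
			return (int)n; /* end of the list */
		if (*end != ',')
			return -1; /* junk after an entry */
		s = end + 1; /* continue after the comma */
	}
}
```

Rejecting empty fields and overlong lists, rather than silently truncating, keeps a typo in the parameter from quietly leaving a bad line unmapped.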