From: Ben Widawsky
Subject: Re: [PATCH 0/4] [RFC] use HW watchdog timer
Date: Tue, 17 Jul 2012 11:51:18 -0700
Message-ID: <20120717115118.00e3da0b@bwidawsk.net>
References: <1342464719-8790-1-git-send-email-ben@bwidawsk.net> <1342523579_7970@CP5-2952>
In-Reply-To: <1342523579_7970@CP5-2952>
To: Chris Wilson
Cc: Daniel Vetter, intel-gfx@lists.freedesktop.org
List-Id: intel-gfx@lists.freedesktop.org

On Tue, 17 Jul 2012 12:12:39 +0100
Chris Wilson wrote:

> On Mon, 16 Jul 2012 11:51:55 -0700, Ben Widawsky wrote:
> > Pros:
> > * Potential for per-batch or per-ring watchdog values. I believe when/if we
> >   get to GPGPU workloads, this is particularly interesting.
> > * Batch-granularity hang detection. This mostly just makes hang
> >   detection and recovery a bit easier IMO.
> >
> > Cons:
> > * Blit ring doesn't have an interrupt. This means we still need the
> >   software watchdog, and it makes hang detection more complex. I've been
> >   led to believe future HW *may* have this interrupt.
> > * Semaphores
>
> Replacing the black magic for INSTDONE hang detection does seem like a
> sensible plan, but as long as we require the hangcheck timer we are only
> adding code complexity. So there really needs to be a compelling
> advantage for the watchdog, something that we cannot achieve with the
> existing method.

Just to be clear, the INSTDONE-based detection can go away. I don't think it's valuable for the blitter.
>
> For me, the criterion is whether we ever miss a hang or falsely accuse
> the hw of stopping. If I understand the watchdog correctly, it basically
> ensures the batch completes within a certain interval, which we can
> codify into the existing hangcheck, so no USP.

Yeah. If we follow the Windows model, I think we just tweak the value until we find something "good" and always reset on the timeout instead of doing instdone-foo.

>
> Or is there more magic waiting in the wings?
> -Chris
>

The magic was only a more straightforward way of finding the batch to blame. As I said on IRC, when I started I was planning to gut the whole SW watchdog; that was the magic.

FWIW, I think we may see the interrupt in future products, so it may still be worth considering whether we want to move in this direction.

-- 
Ben Widawsky, Intel Open Source Technology Center