Message-ID: <53DFBD2E.5070001@amd.com>
Date: Mon, 4 Aug 2014 19:04:46 +0200
From: Christian König
To: Maarten Lankhorst
Subject: Re: [PATCH 09/19] drm/radeon: handle lockup in delayed work, v2
In-Reply-To: <53DFA210.2020603@canonical.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On 04.08.2014 at 17:09, Maarten Lankhorst wrote:
> On 04-08-14 at 17:04, Christian König wrote:
>> On 04.08.2014 at 16:58, Maarten Lankhorst wrote:
>>> On 04-08-14 at 16:45, Christian König wrote:
>>>> On 04.08.2014 at 16:40, Maarten Lankhorst wrote:
>>>>> On 04-08-14 at 16:37, Christian König wrote:
>>>>>>> It's a pain to deal with gpu reset.
>>>>>> Yeah, well that's nothing new.
>>>>>>
>>>>>>> I've now tried other solutions but that would mean reverting to the old style during gpu lockup recovery, and only running the delayed work when !lockup.
>>>>>>> But this meant that the timeout was useless to add. I think the cleanest is keeping the v2 patch, because potentially any waiting code can be called during lockup recovery.
>>>>>> The lockup code itself should never call any waiting code and V2 doesn't seem to handle a couple of cases correctly either.
>>>>>>
>>>>>> How about moving the fence waiting out of the reset code?
>>>>> What cases did I miss then?
>>>>>
>>>>> I'm curious how you want to move the fence waiting out of reset, when there are so many places that could potentially wait, like radeon_ib_get can call radeon_sa_bo_new which can do a wait, or radeon_ring_alloc that can wait on radeon_fence_wait_next, etc.
>>>> The IB test itself doesn't need to be protected by the exclusive lock. Only everything between radeon_save_bios_scratch_regs and radeon_ring_restore.
>>> I'm not sure about that, what do you want to do if the ring tests fail? Do you have to retake the exclusive lock?
>> Just set need_reset again and return -EAGAIN, that should have mostly the same effect as what we are doing right now.
> Yeah, except for locking the ttm delayed workqueue, but that bool should be easy to save/restore.
> I think this could work.

Actually you could activate the delayed workqueue much earlier as well.

Thinking more about it, that sounds like a bug in the current code, because we probably want the workqueue activated before waiting for the fence.

Christian.

>
> ~Maarten
>
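
The reset flow discussed above can be sketched roughly as follows. This is only a userspace model under stated assumptions, not radeon driver code: `struct fake_dev`, `fake_gpu_reset`, `hw_reset_locked` and `ib_test` are hypothetical names, and plain booleans stand in for the exclusive lock and the ttm delayed workqueue state. It shows the three ideas from the thread: only the hardware reset runs under the exclusive lock, the workqueue flag is saved and restored before anything may wait, and a failed IB test sets need_reset again and returns -EAGAIN.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for device state; NOT the real struct radeon_device. */
struct fake_dev {
    bool need_reset;        /* set when a lockup was detected */
    bool exclusive_locked;  /* models holding the exclusive lock */
    bool wq_enabled;        /* models the ttm delayed workqueue being active */
    bool ib_test_passes;    /* test knob: outcome of the post-reset IB test */
};

/* Models the span between radeon_save_bios_scratch_regs and
 * radeon_ring_restore: must run with the exclusive lock held. */
static void hw_reset_locked(struct fake_dev *d)
{
    assert(d->exclusive_locked);
    d->need_reset = false;
}

/* Models the IB test: it may wait, so it must run without the
 * exclusive lock and with the delayed workqueue active again. */
static int ib_test(const struct fake_dev *d)
{
    assert(!d->exclusive_locked);
    assert(d->wq_enabled);
    return d->ib_test_passes ? 0 : -1;
}

int fake_gpu_reset(struct fake_dev *d)
{
    bool saved_wq = d->wq_enabled;  /* save the workqueue state */

    d->wq_enabled = false;
    d->exclusive_locked = true;
    hw_reset_locked(d);
    d->exclusive_locked = false;
    d->wq_enabled = saved_wq;       /* restore before any waiting code runs */

    if (ib_test(d) != 0) {
        d->need_reset = true;       /* ask for another reset attempt */
        return -EAGAIN;
    }
    return 0;
}
```

A caller in this model would simply retry while the function returns -EAGAIN; whether the real driver should loop or defer to the next lockup check is not settled in this thread.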