From: Michel Dänzer
Date: Fri, 30 Jun 2023 17:11:38 +0200
Subject: Re: [PATCH v5 1/1] drm/doc: Document DRM device reset expectations
To: Alex Deucher, Sebastian Wick
Cc: pierre-eric.pelloux-prayer@amd.com, André Almeida, Marek Olšák,
    dri-devel@lists.freedesktop.org, Randy Dunlap,
    linux-kernel@vger.kernel.org, Samuel Pitoiset, Pekka Paalanen,
    Timur Kristóf, amd-gfx@lists.freedesktop.org, kernel-dev@igalia.com,
    alexander.deucher@amd.com, Pekka Paalanen, christian.koenig@amd.com
References: <20230627132323.115440-1-andrealmeid@igalia.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On 6/30/23 16:59, Alex Deucher wrote:
> On Fri, Jun 30, 2023 at 10:49 AM Sebastian Wick wrote:
>> On Tue, Jun 27, 2023 at 3:23 PM André Almeida wrote:
>>>
>>> +Robustness
>>> +----------
>>> +
>>> +The only way to try to keep an application working after a reset is if it
>>> +complies with the robustness aspects of the graphical API that it is using.
>>> +
>>> +Graphical APIs provide ways to applications to deal with device resets. However,
>>> +there is no guarantee that the app will use such features correctly, and the
>>> +UMD can implement policies to close the app if it is a repeating offender,
>>> +likely in a broken loop. This is done to ensure that it does not keep blocking
>>> +the user interface from being correctly displayed. This should be done even if
>>> +the app is correct but happens to trigger some bug in the hardware/driver.
>>
>> I still don't think it's good to let the kernel arbitrarily kill
>> processes that it thinks are not well-behaved based on some heuristics
>> and policy.
>>
>> Can't this be outsourced to user space?
>> Expose the information about
>> processes causing a device and let e.g. systemd deal with coming up
>> with a policy and with killing stuff.
>
> I don't think it's the kernel doing the killing, it would be the UMD.
> E.g., if the app is guilty and doesn't support robustness the UMD can
> just call exit().

It would be safer to just ignore API calls[0], similarly to what is done
until the application destroys the context with robustness. Calling
exit() likely results in losing any unsaved work, whereas at least some
applications might otherwise allow saving the work by other means.


[0] Possibly accompanied by a one-time message to stderr along the lines
of "GPU reset detected but robustness not enabled in context, ignoring
OpenGL API calls".

-- 
Earthling Michel Dänzer            |                  https://redhat.com
Libre software enthusiast          |         Mesa and Xwayland developer
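
To illustrate the "ignore API calls" idea from the reply above, here is a
minimal sketch in C. It is not taken from Mesa or any real UMD: the struct
umd_context, its fields, and the functions umd_should_ignore_call() and
umd_draw() are all hypothetical names made up for this example. It assumes
the driver has already detected the reset and set a per-context flag; how
that detection happens is out of scope here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical per-context state; not a real Mesa/UMD structure. */
struct umd_context {
    bool robust;              /* app enabled the API's robustness feature */
    bool reset_seen;          /* a GPU reset affecting this context was detected */
    bool warned;              /* the one-time stderr message was already printed */
    unsigned draws_submitted; /* stand-in for real work sent to the GPU */
};

/*
 * Returns true if the API call should be silently dropped.
 * After a reset every call is ignored; a context without robustness
 * additionally gets a single warning on stderr instead of exit().
 */
static bool umd_should_ignore_call(struct umd_context *ctx)
{
    assert(ctx != NULL);

    if (!ctx->reset_seen)
        return false;

    if (!ctx->robust && !ctx->warned) {
        fprintf(stderr,
                "GPU reset detected but robustness not enabled in context, "
                "ignoring OpenGL API calls\n");
        ctx->warned = true;
    }
    return true;
}

/* Hypothetical API entry point: becomes a no-op once the context is lost. */
static void umd_draw(struct umd_context *ctx)
{
    if (umd_should_ignore_call(ctx))
        return;
    ctx->draws_submitted++;
}
```

With this shape the process keeps running after the reset, so an
application that can save its state by other means still gets the chance
to do so, which is the point made against calling exit().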