From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id CEDBDC7619A for ; Wed, 5 Apr 2023 15:38:25 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238794AbjDEPiY (ORCPT ); Wed, 5 Apr 2023 11:38:24 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:41614 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S237845AbjDEPiV (ORCPT ); Wed, 5 Apr 2023 11:38:21 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id C400549F1 for ; Wed, 5 Apr 2023 08:37:36 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1680709055; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=PAoSWbkGrjcqGy1ebcwtm3AyobUtd3KUJObW/aFOL0w=; b=aFGzp+QjTh0wubo3/hdEY1PHMxvJJPyI1X4gsfw5jQNa4o/0/EgpRjcbXF7BHc6PgGCQqn S4DfJ3l6j4lO16nAfAvikyw9Bbi+oNXHp/TmXsH9D2Fy6/YkGqOpB2CXPlqD0iScdEv/WC avvcdTZmrxlwT5nvNoLfIqX23wOpGyQ= Received: from mail-pf1-f200.google.com (mail-pf1-f200.google.com [209.85.210.200]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-255-T_ZCm0htMLOJwGEJVndOpg-1; Wed, 05 Apr 2023 11:37:22 -0400 X-MC-Unique: T_ZCm0htMLOJwGEJVndOpg-1 Received: by mail-pf1-f200.google.com with SMTP id t67-20020a628146000000b0062d6d838243so14145349pfd.21 for ; Wed, 05 Apr 2023 08:37:04 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; t=1680709021; h=content-transfer-encoding:in-reply-to:cc:organization:from :references:to:content-language:subject:user-agent:mime-version:date :message-id:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=PAoSWbkGrjcqGy1ebcwtm3AyobUtd3KUJObW/aFOL0w=; b=fdSnmK7NGnkOE3D0U8y7zfuCUTbIOR+ViOhcIGCahL23BrCCx69UQor3h7VXAsfmpM MxbI+eZWbRUFMGZkun3BDTUzAtUkwTPScpQhnD+tUa6c6MLRJGm8qJGLdXuH5CuS8Zad 2O6cm8dW4B0gY95ClTsPMN9WJBpkQM1t35q8Eds6walJRlcXtQDj4e9zp9Z2vDr2IAIN 3RoAYAN7vXUdL11CM72DL4cncJU32g1tvH1QRWH/7z5Tpn0jejBYtjOhL7uwfynFrNWn ieqAtkZc9jdV++U0kjGF2f8QCYUpOUkVxItmNeYIR2frX05x9Oww3pyAm3gGt7dQEooy Eh8g== X-Gm-Message-State: AAQBX9euDNk8bAYiY1oujdidlN+ec4hhX2PzOTvYoIAU5y2EzPcqVJ02 5IpXiQ/w+/nP1QyV2Nv2P01mewqxm6xzxNNXdQuE9QUw6TiT1YkYwbThfavYyCKYmpuvTd9N8YT vSyx8nqbKQxYav1PMDnDMQmwV X-Received: by 2002:a17:902:fb86:b0:1a1:cce7:94ed with SMTP id lg6-20020a170902fb8600b001a1cce794edmr5875815plb.67.1680709020723; Wed, 05 Apr 2023 08:37:00 -0700 (PDT) X-Google-Smtp-Source: AKy350Ysj1lHSmWHRrIOh5XGakc+yhw2pI8tXqwln7bNPtMABRm3Ds3ebM09ogVIeJ4YwQVaZ5dCsg== X-Received: by 2002:a17:902:fb86:b0:1a1:cce7:94ed with SMTP id lg6-20020a170902fb8600b001a1cce794edmr5875793plb.67.1680709020353; Wed, 05 Apr 2023 08:37:00 -0700 (PDT) Received: from ?IPV6:2a02:810d:4b3f:de78:642:1aff:fe31:a15c? ([2a02:810d:4b3f:de78:642:1aff:fe31:a15c]) by smtp.gmail.com with ESMTPSA id jj21-20020a170903049500b001a19196af48sm10262513plb.64.2023.04.05.08.36.53 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Wed, 05 Apr 2023 08:37:00 -0700 (PDT) Message-ID: <3004a2bf-e725-643e-82af-8a217784e796@redhat.com> Date: Wed, 5 Apr 2023 16:05:53 +0200 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.8.0 Subject: Re: [Regression] drm/scheduler: track GPU active time per entity Content-Language: en-US To: Luben Tuikov , Lucas Stach , daniel@ffwll.ch, Dave Airlie , Bagas Sanjaya , andrey.grodzovsky@amd.com, =?UTF-8?Q?Christian_K=c3=b6nig?= , tvrtko.ursulin@linux.intel.com, Matthew Brost , yuq825@gmail.com, Boris Brezillon , lina@asahilina.net References: <3e00d8a9-b6c4-8202-4f2d-5a659c61d094@redhat.com> <2a84875dde6565842aa07ddb96245b7d939cb4fd.camel@pengutronix.de> <8b28151c-f2db-af3f-8dff-87dd5d57417b@amd.com> From: Danilo Krummrich Organization: RedHat Cc: dri-devel@lists.freedesktop.org, linux-kernel@vger.kernel.org In-Reply-To: <8b28151c-f2db-af3f-8dff-87dd5d57417b@amd.com> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 4/4/23 06:31, Luben Tuikov wrote: > On 2023-03-28 04:54, Lucas Stach wrote: >> Hi Danilo, >> >> Am Dienstag, dem 28.03.2023 um 02:57 +0200 schrieb Danilo Krummrich: >>> Hi all, >>> >>> Commit df622729ddbf ("drm/scheduler: track GPU active time per entity") >>> tries to track the accumulated time that a job was active on the GPU >>> writing it to the entity through which the job was deployed to the >>> scheduler originally. This is done within drm_sched_get_cleanup_job() >>> which fetches a job from the schedulers pending_list. >>> >>> Doing this can result in a race condition where the entity is already >>> freed, but the entity's newly added elapsed_ns field is still accessed >>> once the job is fetched from the pending_list. >>> >>> After drm_sched_entity_destroy() being called it should be safe to free >>> the structure that embeds the entity. However, a job originally handed >>> over to the scheduler by this entity might still reside in the >>> schedulers pending_list for cleanup after drm_sched_entity_destroy() >>> already being called and the entity being freed. Hence, we can run into >>> a UAF. >>> >> Sorry about that, I clearly didn't properly consider this case. >> >>> In my case it happened that a job, as explained above, was just picked >>> from the schedulers pending_list after the entity was freed due to the >>> client application exiting. Meanwhile this freed up memory was already >>> allocated for a subsequent client applications job structure again. >>> Hence, the new jobs memory got corrupted. Luckily, I was able to >>> reproduce the same corruption over and over again by just using >>> deqp-runner to run a specific set of VK test cases in parallel. >>> >>> Fixing this issue doesn't seem to be very straightforward though (unless >>> I miss something), which is why I'm writing this mail instead of sending >>> a fix directly. >>> >>> Spontaneously, I see three options to fix it: >>> >>> 1. Rather than embedding the entity into driver specific structures >>> (e.g. tied to file_priv) we could allocate the entity separately and >>> reference count it, such that it's only freed up once all jobs that were >>> deployed through this entity are fetched from the schedulers pending list. >>> >> My vote is on this or something in similar vain for the long term. I >> have some hope to be able to add a GPU scheduling algorithm with a bit >> more fairness than the current one sometime in the future, which >> requires execution time tracking on the entities. > > Danilo, > > Using kref is preferable, i.e. option 1 above. I think the only real motivation for doing that would be for generically tracking job statistics within the entity a job was deployed through. If we all agree on tracking job statistics this way I am happy to prepare a patch for this option and drop this one: https://lore.kernel.org/all/20230331000622.4156-1-dakr@redhat.com/T/#u Christian mentioned amdgpu tried something similar to what Lucas tried running into similar trouble, backed off and implemented it in another way - a driver specific way I guess? > > Lucas, can you shed some light on, > > 1. In what way the current FIFO scheduling is unfair, and > 2. shed some details on this "scheduling algorithm with a bit > more fairness than the current one"? > > Regards, > Luben > >> >>> 2. Somehow make sure drm_sched_entity_destroy() does block until all >>> jobs deployed through this entity were fetched from the schedulers >>> pending list. Though, I'm pretty sure that this is not really desirable. >>> >>> 3. Just revert the change and let drivers implement tracking of GPU >>> active times themselves. >>> >> Given that we are already pretty late in the release cycle and etnaviv >> being the only driver so far making use of the scheduler elapsed time >> tracking I think the right short term solution is to either move the >> tracking into etnaviv or just revert the change for now. I'll have a >> look at this. >> >> Regards, >> Lucas >> >>> In the case of just reverting the change I'd propose to also set a jobs >>> entity pointer to NULL once the job was taken from the entity, such >>> that in case of a future issue we fail where the actual issue resides >>> and to make it more obvious that the field shouldn't be used anymore >>> after the job was taken from the entity. >>> >>> I'm happy to implement the solution we agree on. However, it might also >>> make sense to revert the change until we have a solution in place. I'm >>> also happy to send a revert with a proper description of the problem. >>> Please let me know what you think. >>> >>> - Danilo >>> >> >