Message-ID: <0dd4b5ac-a9c4-4aff-a67a-ae43de10857e@linux.dev>
Date: Fri, 20 Mar 2026 12:25:31 +0100
Subject: Re: [PATCH] drm/sched: Add new KUnit test suite for concurrent job submission
From: Marco Pagani
To: Tvrtko Ursulin, Philipp Stanner
Cc: dri-devel@lists.freedesktop.org, linux-kernel@vger.kernel.org,
 Matthew Brost, Danilo Krummrich, Christian König, Maarten Lankhorst,
 Maxime Ripard, Thomas Zimmermann, David Airlie, Simona Vetter
References: <20260219140711.3296237-1-marco.pagani@linux.dev>
 <4793f2a5-9b08-4cd8-8e1e-d8a3a7125299@linux.dev>
 <3ac5825a-95e3-4b9a-aa7e-9b0107fa7b6b@ursulin.net>
 <7281dcf3-5c1a-47fa-81d1-ebf701d2cf81@linux.dev>

On 19/03/2026 09:50, Tvrtko Ursulin wrote:
>
>
> On 18/03/2026 17:23, Marco Pagani wrote:
>
> 8><
>
>>>>> Alternative approach could be to set a per test time budget and just
>>>>> keep the workers submitting until over. It would be simpler to
>>>>> understand and there would be more submit/complete overlap.
>>>>
>>>> I agree. Using a test time budget and having workers continuously
>>>> submit jobs until it expires would make better use of the test time.
>>>> I'm thinking that the simplest and most straightforward approach would
>>>> be to cyclically distribute periods among workers until they reach the
>>>> largest possible value below test duration, which would coincide
>>>> with the hyperperiod. This would also solve the issue of selecting a
>>>> suitable periods_cycle parameter that you mentioned earlier.
>>>> In practice, something like this:
>>>>
>>>> drm_sched_interleaved_params [...]
>>>> {
>>>>         .num_workers = N
>>>>         .test_max_time = T
>>>>         .job_base_period_us = P /* Small GPU job, 100 us */
>>>> }
>>>>
>>>> period_us = job_base_period_us;
>>>> for (i = 0; i < params->num_workers; i++) {
>>>>         workers[i].period_us = period_us;
>>>>         period_us *= 2;
>>>>         if (period_us > test_max_time)
>>>>                 period_us = job_base_period_us;
>>>> }
>>>>
>>>>
>>>> What do you think?
>>>
>>>
>>> Again some time has passed so rather than going to re-read your patch I
>>> will go from memory. IIRC I was thinking something really dumb and 100%
>>> time bound with no need to think when coding and reviewing. Each thread
>>> simply does:
>>>
>>> ktime_t start = ktime_get();
>>>
>>> do {
>>>         .. thread doing its submission pattern thing ..
>>> } while (ktime_to_ms(ktime_sub(ktime_get(), start)) < test->time_ms);
>>>
>>> May miss the time target by a job period_us but who cares.
>>
>> Sorry for the delay. I got pulled into other things. I left out the worker
>> execution part since we already agreed on that. Instead, I've replied with
>> some pseudocode describing a new strategy for period assignments from test
>> parameters that takes into account your comments.
>
> Sorry I misread when I saw test_max_time clamping I thought it was about
> runtime control. I guess it makes sense to clamp it to avoid
> over-shooting by too much. You removed the cyclical nature so I guess in
> practice this will not happen? I mean number of workers vs base period
> you don't expect more than one of them to get clamped?

I haven't removed the cyclical nature of workers submitting jobs. I omitted
that part because I thought we already agreed on it.

Anyway, I realized that unfortunately the strategy of using harmonic periods
to overlap submissions makes no sense given how the mock scheduler serializes
jobs into a single "execution" line. I'm now thinking that using a narrower
range of (multiple) submission periods would be more effective to stress
concurrent submissions.

I'm also thinking that splitting the single execution time budget into equal
shares among workers, and then computing the number of jobs that fit into
that share, is simpler and better suited for a test case compared to a
time-based approach.

Let me share some pseudocode for this new approach:

/* Parameters (test_duration must be larger than base_period) */
drm_sched_interleaved_params [...]
{
        .num_workers = ... /* 8 to 32 */
        .test_duration = ... /* Few seconds */
        .base_period = ... /* 100 us, small GPU job */
}

/* Setup phase common for all workers. */
workers_share = params->test_duration / params->num_workers;

/* Worker */
drm_sched_interleaved_worker()
{
        period = (worker->id + 1) * base_period;
        num_jobs = workers_share / period;
        for (i = 0; i < num_jobs; i++) {
                drm_mock_sched_job_set_duration_us(period);
                /* submit and wait for the job to complete */
        }
}

What do you think?

Thanks,
Marco
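
P.S. To get a feel for the numbers, here is a minimal userspace sketch (plain
C, not kernel code) of the share and job-count arithmetic from the pseudocode
above. The struct and helper names are made up for illustration and are not
part of the patch; all times are assumed to be in microseconds.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical userspace mirror of the proposed test parameters. */
struct interleaved_params {
	unsigned int num_workers;
	uint64_t test_duration_us;
	uint64_t base_period_us;
};

/* Equal slice of the overall time budget handed to every worker. */
static uint64_t workers_share_us(const struct interleaved_params *p)
{
	assert(p->num_workers > 0);
	return p->test_duration_us / p->num_workers;
}

/* Period (and mock job duration) of a worker: (id + 1) * base period. */
static uint64_t worker_period_us(const struct interleaved_params *p,
				 unsigned int id)
{
	return (uint64_t)(id + 1) * p->base_period_us;
}

/* Number of jobs of that period that fit into the worker's share. */
static uint64_t worker_num_jobs(const struct interleaved_params *p,
				unsigned int id)
{
	return workers_share_us(p) / worker_period_us(p, id);
}

/* Sum of all job durations across workers; cannot exceed the test duration. */
static uint64_t total_job_time_us(const struct interleaved_params *p)
{
	uint64_t total = 0;

	for (unsigned int id = 0; id < p->num_workers; id++)
		total += worker_num_jobs(p, id) * worker_period_us(p, id);

	return total;
}
```

As an example with assumed values (8 workers, a 2 s budget, a 100 us base
period), each share is 250000 us, worker 0 gets 2500 jobs of 100 us and
worker 7 gets 312 jobs of 800 us. The summed job time is 1999000 us, so even
if the jobs ran fully serialized they would still fit in the budget.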