Date: Mon, 5 Feb 2024 13:07:08 +0530
From: Vishal Chourasia
To: Ze Gao
Cc: Peter Zijlstra, Ben Segall, Daniel Bristot de Oliveira, Dietmar Eggemann, Ingo Molnar, Juri Lelli, Mel Gorman, Steven Rostedt, Valentin Schneider, Vincent Guittot, linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH] sched/eevdf: Use tunable knob sysctl_sched_base_slice as explicit time quanta
References: <20240111115745.62813-2-zegao@tencent.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Sun, Feb 04, 2024 at 11:05:22AM +0800, Ze Gao wrote:
> On Fri, Feb 2, 2024 at 7:50 PM Vishal Chourasia wrote:
> >
> > On Wed, Jan 24, 2024 at 10:32:08AM +0800, Ze Gao wrote:
> > > > Hi, How are you setting custom request values for process A and B?
> > >
> > > I cherry-picked Peter's commit[1], and added a SCHED_QUANTA feature control
> > > for testing w/o my patch. You can check out [2] to see how it works.
> > >
> > Thank you for sharing your setup.
> >
> > Built the kernel according to [2] keeping v6.8.0-rc1 as base
> >
> > // NO_SCHED_QUANTA
> > # perf script -i perf.data.old -s perf-latency.py
> > PID 355045: Average Delta = 87.72726154385964 ms, Max Delta = 110.015044 ms, Count = 57
> > PID 355044: Average Delta = 92.2655679245283 ms, Max Delta = 110.017182 ms, Count = 53
> >
> > // SCHED_QUANTA
> > # perf script -i perf.data -s perf-latency.py
> > PID 355065: Average Delta = 10.00 ms, Max Delta = 10.012708 ms, Count = 500
> > PID 355064: Average Delta = 9.959 ms, Max Delta = 10.023588 ms, Count = 501
> >
> > # cat /sys/kernel/debug/sched/base_slice_ns
> > 3000000
> >
> > base slice is not being enforced.
> >
> > Next, looking closely at the perf.data file
> >
> > # perf script -i perf.data -C 1 | grep switch
> > ...
> > stress-ng-cpu 355064 [001] 776706.003222: sched:sched_switch: stress-ng-cpu:355064 [120] R ==> stress-ng-cpu:355065 [120]
> > stress-ng-cpu 355065 [001] 776706.013218: sched:sched_switch: stress-ng-cpu:355065 [120] R ==> stress-ng-cpu:355064 [120]
> > stress-ng-cpu 355064 [001] 776706.023218: sched:sched_switch: stress-ng-cpu:355064 [120] R ==> stress-ng-cpu:355065 [120]
> > stress-ng-cpu 355065 [001] 776706.033218: sched:sched_switch: stress-ng-cpu:355065 [120] R ==> stress-ng-cpu:355064 [120]
> > ...
> >
> > Delta wait time is approx 0.01s or 10ms
>
> You can check your HZ, which should be 100 in your settings
> by my best guess. That explains your results.
Yes. How much is it in your case? If I may ask.
>
> > So, the switch is not happening at the base_slice_ns boundary.
> >
> > But why? Is it possible base_slice_ns is not properly used in
> > arch != x86 ?
>
> The thing is, in my RFC the effective quantum is actually
>
>     max_t(u64, TICK_NSEC, sysctl_sched_base_slice)
>
> where sysctl_sched_base_slice is precisely a handy tunable knob
> for users (maybe I should make that clearer).
>
> See what I do in update_entity_lag(), you will understand.
Thanks. I will look into it.
>
> Note we have 3 time related concepts here:
>     1. TIME TICK: (schedule) accounting time unit.
>     2. TIME QUANTA (not necessarily the effective one): scheduling time unit
>     3. USER SLICE: time slice per request
To double check,

User slice is the request size submitted by a competing task for the
time-shared resource (here, the processor) against other competing tasks.

The scheduler allocates the time-shared resource (here, the processor) in
quanta of size `q`, which is our TIME QUANTA.

TIME TICK is the time period between two scheduler ticks.

Thanks,
-- vishal.c
>
> To implement latency-nice while being as fair as possible, we must
> carefully consider the size relationship between them, and especially
> the value range of USER SLICE, due to the cold fact that the lag
> (unfairness) is literally subject to both time quanta and user requested
> slices.
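
For reference, a rough sketch of the arithmetic behind the max_t() expression
quoted above, in Python purely for illustration. The kernel does this in C;
`tick_nsec` and `effective_quantum_ns` are names made up here, and HZ=100 is
only the guess from the exchange above, not a value read from any config.

```python
# Illustrative only: mirrors max_t(u64, TICK_NSEC, sysctl_sched_base_slice)
# as quoted from the RFC. Function names are hypothetical, not kernel symbols.
NSEC_PER_SEC = 1_000_000_000


def tick_nsec(hz: int) -> int:
    """Length of one scheduler tick in nanoseconds for a given HZ."""
    # The kernel rounds to nearest; for HZ values that divide one second
    # evenly (100, 250, 1000) plain integer division gives the same result.
    return NSEC_PER_SEC // hz


def effective_quantum_ns(hz: int, base_slice_ns: int) -> int:
    """The quantum can never be shorter than one tick."""
    return max(tick_nsec(hz), base_slice_ns)


if __name__ == "__main__":
    base_slice_ns = 3_000_000  # from /sys/kernel/debug/sched/base_slice_ns
    for hz in (100, 250, 1000):  # assumed HZ values, for comparison only
        q = effective_quantum_ns(hz, base_slice_ns)
        print(f"HZ={hz}: effective quantum = {q / 1e6:.1f} ms")
```

With base_slice_ns = 3000000 this gives 10 ms at HZ=100, which matches the
10 ms spacing between the sched_switch events; the 3 ms base slice would only
become the effective quantum once the tick is 3 ms or shorter.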
>
>
> Regards,
>         -- Ze
>
> > >
> > > echo NO_SCHED_QUANTA > /sys/kernel/debug/sched/features
> > > test
> > > sleep 2
> > > echo SCHED_QUANTA > /sys/kernel/debug/sched/features
> > > test
> > >
> > >
> > > [1]: https://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git/commit/kernel/sched?h=sched/eevdf&id=98866150f92f268a2f08eb1d884de9677eb4ec8f
> > > [2]: https://github.com/zegao96/linux/tree/sched-eevdf
> > >
> > >
> > > Regards,
> > >         -- Ze
> > >
> > > > >
> > > > >                      stress-ng-cpu:10705    stress-ng-cpu:10706
> > > > > ---------------------------------------------------------------------
> > > > > Slices(ms)           100                    0.1
> > > > > Runtime(ms)          4934.206               5025.048
> > > > > Switches             58                     67
> > > > > Average delay(ms)    87.074                 73.863
> > > > > Maximum delay(ms)    101.998                101.010
> > > > >
> > > > > In contrast, using sysctl_sched_base_slice as the size of a 'quantum'
> > > > > in this patch gives us better control of the allocation accuracy and
> > > > > the avg latency:
> > > > >
> > > > >                      stress-ng-cpu:10584    stress-ng-cpu:10583
> > > > > ---------------------------------------------------------------------
> > > > > Slices(ms)           100                    0.1
> > > > > Runtime(ms)          4980.309               4981.356
> > > > > Switches             1253                   1254
> > > > > Average delay(ms)    3.990                  3.990
> > > > > Maximum delay(ms)    5.001                  4.014
> > > > >
> > > > > Furthermore, with sysctl_sched_base_slice = 10ms, we might benefit from
> > > > > fewer switches at the cost of worse delay:
> > > > >
> > > > >                      stress-ng-cpu:11208    stress-ng-cpu:11207
> > > > > ---------------------------------------------------------------------
> > > > > Slices(ms)           100                    0.1
> > > > > Runtime(ms)          4983.722               4977.035
> > > > > Switches             456                    456
> > > > > Average delay(ms)    10.963                 10.939
> > > > > Maximum delay(ms)    19.002                 21.001
> > > > >
> > > > > By being able to tune the sysctl_sched_base_slice knob, we can achieve
> > > > > the goal of striking a good balance between throughput and latency by
> > > > > adjusting the frequency of context switches, and the conclusions are
> > > > > very close to what's covered in [1] with the explicit definition of
> > > > > a time quantum. It also gives more freedom to choose the eligible
> > > > > request length range (either through nice value or raw value)
> > > > > without worrying about overscheduling or underscheduling too much.
> > > > >
> > > > > Note this change should introduce no obvious regression because all
> > > > > processes have the same request length as sysctl_sched_base_slice, as
> > > > > in the status quo. And the benchmark results confirm this as well.
> > > > >
> > > > > schbench -m2 -F128 -n10 -r90        w/patch    tip/6.7-rc7
> > > > > Wakeup  (usec): 99.0th:             3028       95
> > > > > Request (usec): 99.0th:             14992      21984
> > > > > RPS    (count): 50.0th:             5864       5848
> > > > >
> > > > > hackbench -s 512 -l 200 -f 25 -P    w/patch    tip/6.7-rc7
> > > > > -g 10                               0.212      0.223
> > > > > -g 20                               0.415      0.432
> > > > > -g 30                               0.625      0.639
> > > > > -g 40                               0.852      0.858
> > > > >
> > > > > [1]: https://dl.acm.org/doi/10.5555/890606
> > > > > [2]: https://lore.kernel.org/all/20230420150537.GC4253@hirez.programming.kicks-ass.net/T/#u
> > > > >
> > > > > Signed-off-by: Ze Gao
> > > > > ---
> > > >
> > >
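
The perf-latency.py script used for the "Average Delta" numbers was not posted
in this thread, so the following is only a hypothetical stand-in showing how
the ~10 ms wait can be re-derived from the sched_switch lines quoted earlier.
It assumes the default `perf script` text layout shown in that trace (a
timestamp with six fractional digits, then `comm:pid [prio] state ==> comm:pid
[prio]`), and it only counts tasks that were switched out while still runnable
(state R). The name `wait_deltas_us` is made up for this sketch.

```python
import re
from collections import defaultdict

# Matches the sched_switch lines as they appear in the trace quoted above.
SWITCH_RE = re.compile(
    r"(?P<sec>\d+)\.(?P<usec>\d{6}): sched:sched_switch: "
    r".*?:(?P<prev>\d+) \[\d+\] (?P<state>\S+) ==> .*?:(?P<next>\d+) \[\d+\]"
)


def wait_deltas_us(lines):
    """Map PID -> list of waits in microseconds between being preempted
    (switched out in state R) and being switched back in."""
    switched_out_at = {}
    deltas = defaultdict(list)
    for line in lines:
        m = SWITCH_RE.search(line)
        if m is None:
            continue  # not a sched_switch line in the expected layout
        ts = int(m["sec"]) * 1_000_000 + int(m["usec"])
        prev_pid, next_pid = int(m["prev"]), int(m["next"])
        if next_pid in switched_out_at:
            deltas[next_pid].append(ts - switched_out_at.pop(next_pid))
        if m["state"].startswith("R"):
            switched_out_at[prev_pid] = ts
    return dict(deltas)


SAMPLE = """\
stress-ng-cpu 355064 [001] 776706.003222: sched:sched_switch: stress-ng-cpu:355064 [120] R ==> stress-ng-cpu:355065 [120]
stress-ng-cpu 355065 [001] 776706.013218: sched:sched_switch: stress-ng-cpu:355065 [120] R ==> stress-ng-cpu:355064 [120]
stress-ng-cpu 355064 [001] 776706.023218: sched:sched_switch: stress-ng-cpu:355064 [120] R ==> stress-ng-cpu:355065 [120]
stress-ng-cpu 355065 [001] 776706.033218: sched:sched_switch: stress-ng-cpu:355065 [120] R ==> stress-ng-cpu:355064 [120]
"""

if __name__ == "__main__":
    for pid, waits in wait_deltas_us(SAMPLE.splitlines()).items():
        avg_ms = sum(waits) / len(waits) / 1000
        print(f"PID {pid}: Average Delta = {avg_ms:.3f} ms, Count = {len(waits)}")
```

On the four quoted lines this yields waits of 9996 us and 10000 us for PID
355064 and 10000 us for PID 355065, i.e. one tick at HZ=100 rather than the
3 ms base slice.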