Date: Fri, 1 May 2026 14:05:21 +0100
From: David Laight
To: Peter Zijlstra
Cc: K Prateek Nayak, Zhan Xusheng, Ingo Molnar, Vincent Guittot,
	Juri Lelli, linux-kernel@vger.kernel.org, Zhan Xusheng,
	Dietmar Eggemann, Valentin Schneider, Ben Segall,
	Steven Rostedt, Mel Gorman
Subject: Re: [PATCH RESEND] sched/fair: Fix overflow in vruntime_eligible()
Message-ID: <20260501140521.27be883d@pumpkin>
In-Reply-To: <20260501104006.GA3102624@noisy.programming.kicks-ass.net>
References: <20260415145742.10359-1-zhanxusheng@xiaomi.com>
	<20260428144951.121765-1-zhanxusheng@xiaomi.com>
	<57e94d7e-b8fb-49bc-a914-8a7401e43af3@amd.com>
	<20260428173521.GK3126523@noisy.programming.kicks-ass.net>
	<4870ea19-78d4-44b2-9f18-14c3f8782726@amd.com>
	<20260501104006.GA3102624@noisy.programming.kicks-ass.net>

On Fri, 1 May 2026 12:40:06 +0200
Peter Zijlstra wrote:

> On Wed, Apr 29, 2026 at 10:33:48AM +0530, K Prateek Nayak wrote:
>
> > > Anyway... let me construct a worst case.
> > >
> > > So if you have this cgroup crap on, then you can have an entity of
> > > weight 2, and vlag should then be bounded by: (slice+TICK_NSEC) *
> > > NICE_0_LOAD, which is around 44 bits as per the comment on entity_key().
> > >
> > > The other end is 100*NICE_0_LOAD, so lets wake that, then you get:
> > >
> > >	{key, weight}[] := {
> > >		puny: { (slice + TICK_NSEC) * NICE_0_LOAD, 2 },
> > >		max:  { 0, 100*NICE_0_LOAD },
> > >	}
> >
> > So MAX_SHARES is (1UL << 18) and we then scale it up so the weight
> > can go up to (1 << 28) I suppose in a UP setting.
> >
> > So let us assume there is a single task with the largest custom slice on
> > each cgroup, making the min_slice equal to 100ms. Things start off in
> > such a way that puny, having a vruntime of say -1, is picked over max,
> > which had a vruntime of 0, for the sake of simplicity.
> >
> > Deadline then becomes:
> >
> >	puny->deadline = -1 + __calc_delta(100ms, NICE_0_LOAD, 2)
> >
> > which is 52428800000000 - 1.
> >
> > At 10HZ tick + RUN_TO_PARITY, we get 10ms error, so that puts the
> > entity_key() at 57671680000000 (46 bits)
> > >
> > > The avg_vruntime() would end up being very close to 0 (which is
> > > zero_vruntime), so no real help making that more accurate.
> > >
> > > vruntime_eligible(puny) ends up with:
> > >
> > >	avg = 2 * puny.key (+ 0)
> > >	weight = 2 + 100 * NICE_0_LOAD
> > >
> > >	avg >= puny.key * weight
> > >
> > > And that is: (slice + TICK_NSEC) * NICE_0_LOAD * NICE_0_LOAD * 100
> >
> > and yes, I suppose that can exceed 64bit :-(
> >
> > Then the worst case right side would be:
> >
> >	57671680000000 * load /* load = ((1 << 28) + 2) */
> >
> > and yes that goes beyond 64-bits at 15481123719086080000000.
> >
> > I'm running with these configurations:
> >
> >	echo 2 > /sys/fs/cgroup/cg0/cpu.weight
> >	echo 10000 > /sys/fs/cgroup/cg1/cpu.weight
> >
> > A weight of 2 actually gets scaled up to 20480 for the UP scenario, so I
> > try to make it even smaller with SMP and wait until it gets picked again.
>
> Make sure all of the actual activity of that cgroup happens 'elsewhere'
> and you can get it down to an actual 2 per calc_group_shares().
> > without any splat, so I'm not sure if there is something that prevents
> > a possible crash, since a weight of 104857600 should have definitely
> > made that entity_key() overflow.
>
> Right.
>
> Anyway, I had a poke around with godbolt, and the below seems to
> generate the best code for things like x86_64 and arm64.
>
> Specifically, the __builtin_mul_overflow() already has to compute the
> 128-bit product anyway for most architectures, so using that directly
> then leads to saner asm and easier to understand code.

Older versions of gcc (maybe before 10?) generate a call to the full
128-bit multiply (_muldid3 ?) on mips64, rather than just using the
instruction that generates the high part of the product.
The library function is now included but is slightly more complex
because it does the high*low multiplies as well.
The same happens on sparc64, and the asm multiply code looks strange.

>
> AFAICT HPPA64 is the only 64bit architecture that doesn't implement
> __int128 and will thus be demoted to doing what we do on 32bit.
>
> Now all I need to do is write a coherent Changelog to go with it ...
> *sigh*
>
> ---
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 728965851842..214fe9d99834 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -904,7 +904,7 @@ static int vruntime_eligible(struct cfs_rq *cfs_rq, u64 vruntime)
>  		load += weight;
>  	}
>
> -	return avg >= vruntime_op(vruntime, "-", cfs_rq->zero_vruntime) * load;
> +	return avg >= (sched_double_long_t)vruntime_op(vruntime, "-", cfs_rq->zero_vruntime) * load;
>  }
>
>  int entity_eligible(struct cfs_rq *cfs_rq, struct sched_entity *se)
> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> index 9f63b15d309d..f15f4468efe5 100644
> --- a/kernel/sched/sched.h
> +++ b/kernel/sched/sched.h
> @@ -146,7 +146,7 @@ extern struct list_head asym_cap_list;
>   * Really only required when CONFIG_FAIR_GROUP_SCHED=y is also set, but to
>   * increase coverage and consistency always enable it on 64-bit platforms.
>   */
> -#ifdef CONFIG_64BIT
> +#if defined(CONFIG_64BIT) && defined(__SIZEOF_INT128__)
>  # define NICE_0_LOAD_SHIFT	(SCHED_FIXEDPOINT_SHIFT + SCHED_FIXEDPOINT_SHIFT)
>  # define scale_load(w)		((w) << SCHED_FIXEDPOINT_SHIFT)
>  # define scale_load_down(w)	\
> @@ -157,10 +157,12 @@ extern struct list_head asym_cap_list;
>  	__w = max(2UL, __w >> SCHED_FIXEDPOINT_SHIFT);	\
>  	__w;	\
>  })
> +typedef __int128 sched_double_long_t;
>  #else
>  # define NICE_0_LOAD_SHIFT	(SCHED_FIXEDPOINT_SHIFT)
>  # define scale_load(w)		(w)
>  # define scale_load_down(w)	(w)
> +typedef s64 sched_double_long_t;

Seems strange that this is signed but the 64bit one is unsigned.

--
David

>  #endif
>
>  /*
>