From: Menglong Dong
Date: Fri, 8 Aug 2025 13:48:21 +0800
Subject: Re: Inlining migrate_disable/enable. Was: [PATCH bpf-next v2 02/18] x86,bpf: add bpf_global_caller for global trampoline
To: Alexei Starovoitov
Cc: Peter Zijlstra, Menglong Dong, Steven Rostedt, Jiri Olsa, bpf, Martin KaFai Lau, Eduard Zingerman, LKML, Network Development

On Fri, Aug 8, 2025 at 8:58 AM Alexei Starovoitov wrote:
>
> On Wed, Aug 6, 2025 at 1:44 AM Menglong Dong wrote:
> >
> > On Fri, Aug 1, 2025 at 12:15 AM Alexei Starovoitov wrote:
> > >
> > > On Mon, Jul 28, 2025 at 2:20 AM Menglong Dong wrote:
> > > >
> > > > On Thu, Jul 17, 2025 at 6:35 AM Alexei Starovoitov wrote:
> > > > >
> > > > > On Wed, Jul 16, 2025 at 11:24 AM Peter Zijlstra wrote:
> > > > > >
> > > > > > On Wed, Jul 16, 2025 at 09:56:11AM -0700, Alexei Starovoitov wrote:
> > > > > >
> > > > > > > Maybe Peter has better ideas ?
> > > > > >
> > > > > > Is it possible to express runqueues::nr_pinned as an alias?
> > > > > >
> > > > > >   extern unsigned int __attribute__((alias("runqueues.nr_pinned"))) this_nr_pinned;
> > > > > >
> > > > > > And use:
> > > > > >
> > > > > >   __this_cpu_inc(&this_nr_pinned);
> > > > > >
> > > > > > This syntax doesn't actually seem to work; but can we construct
> > > > > > something like that?
> > > > >
> > > > > Yeah. Iant is right. It's a string and not a pointer dereference.
> > > > > It never worked.
> > > > >
> > > > > Few options:
> > > > >
> > > > > 1.
> > > > > struct rq {
> > > > > +#ifdef CONFIG_SMP
> > > > > +       unsigned int            nr_pinned;
> > > > > +#endif
> > > > >         /* runqueue lock: */
> > > > >         raw_spinlock_t          __lock;
> > > > >
> > > > > @@ -1271,9 +1274,6 @@ struct rq {
> > > > >         struct cpuidle_state    *idle_state;
> > > > >  #endif
> > > > >
> > > > > -#ifdef CONFIG_SMP
> > > > > -       unsigned int            nr_pinned;
> > > > > -#endif
> > > > >
> > > > > but ugly...
> > > > >
> > > > > 2.
> > > > > static unsigned int nr_pinned_offset __ro_after_init __used;
> > > > > RUNTIME_CONST(nr_pinned_offset, nr_pinned_offset)
> > > > >
> > > > > overkill for what's needed
> > > > >
> > > > > 3.
> > > > > OFFSET(RQ_nr_pinned, rq, nr_pinned);
> > > > > then
> > > > > #include
> > > > >
> > > > > imo the best.
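
For readers not familiar with option 3: OFFSET()/DEFINE() come from include/linux/kbuild.h, and the trick is that the offsets file is only ever compiled to assembly, so the code that consumes the result never needs the struct definition. Roughly (a simplified sketch; the exact macros and the sed step live in kbuild, and the offset value below is purely illustrative):

/* include/linux/kbuild.h, roughly: emit "->SYM VAL" markers into the .s file */
#define DEFINE(sym, val) \
        asm volatile("\n.ascii \"->" #sym " %0 " #val "\"" : : "i" (val))

#define OFFSET(sym, str, mem) \
        DEFINE(sym, offsetof(struct str, mem))

/*
 * The generated rq-offsets.s then contains marker lines such as
 *     ->RQ_nr_pinned $4 offsetof(struct rq, nr_pinned)
 * and the filechk_offsets rule turns them into a plain header:
 *     #define RQ_nr_pinned 4
 * which can be #include-d without ever seeing the layout of struct rq.
 */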
> > > >
> > > > I had a try. The struct rq is not visible to asm-offsets.c, so we
> > > > can't define it in arch/xx/kernel/asm-offsets.c. Do you mean
> > > > to define a similar rq-offsets.c in kernel/sched/? It will be more
> > > > complex than way 2, and I think way 2 is easier :/
> > >
> > > 2 maybe easier, but it's an overkill.
> > > I still think asm-offset is cleaner.
> > > arch/xx shouldn't be used, of course, since this nr_pinned should
> > > be generic for all archs.
> > > We can do something similar to drivers/memory/emif-asm-offsets.c
> > > and do that within kernel/sched/.
> > > rq-offsets.c as you said.
> > > It will generate rq-offsets.h in a build dir that can be #include-d.
> > >
> > > I thought about another alternative (as a derivative of 1):
> > > split nr_pinned from 'struct rq' into its own per-cpu variable,
> > > but I don't think that will work, since rq_has_pinned_tasks()
> > > doesn't always operate on this_rq().
> > > So the acceptable choices are realistically 1 and 3 and
> > > rq-offsets.c seems cleaner.
> > > Pls give it another try.
> >
> > Generally speaking, way 3 works. The only problem is how
> > we handle this_rq(). I introduced the following code in
> > include/linux/sched.h:
> >
> > struct rq;
> > DECLARE_PER_CPU_SHARED_ALIGNED(struct rq, runqueues);
> > #define this_rq_ptr() arch_raw_cpu_ptr(&runqueues)
> >
> > The this_rq_ptr() is used in migrate_enable(). I have to use the
> > arch_raw_cpu_ptr() for it. this_cpu_ptr() can't be used here, as
> > it will fail on this_cpu_ptr -> raw_cpu_ptr -> __verify_pcpu_ptr:
> >
> > #define __verify_pcpu_ptr(ptr)                                          \
> > do {                                                                    \
> >         const void __percpu *__vpp_verify = (typeof((ptr) + 0))NULL;    \
> >         (void)__vpp_verify;                                             \
> > } while (0)
> >
> > The struct rq is not available here, which makes the typeof((ptr) + 0)
> > fail during compilation. What can we do here?
>
> Interesting.
> The comment says:
>  * + 0 is required in order to convert the pointer type from a
>  * potential array type to a pointer to a single item of the array.
>
> so maybe we can do some macro magic to avoid '+ 0'
> when the type is already a pointer,
> but for now let's proceed with arch_raw_cpu_ptr().

OK

>
> > According to my testing, the performance of fentry increased from
> > 111M/s to 121M/s with migrate_enable/disable inlined.
>
> Very nice.
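
As a side note on the '+ 0' macro magic mentioned above: the '+ 0' only exists to decay a possible array type into a pointer, and '&*' has the same decaying effect without doing pointer arithmetic, so it does not need a complete type. An untested sketch; __verify_pcpu_ptr_alt is a made-up name for illustration:

/* untested idea: decay with '&*' instead of '+ 0' so an incomplete type is enough */
#define __verify_pcpu_ptr_alt(ptr)                                      \
do {                                                                    \
        const void __percpu *__vpp_verify = (typeof(&*(ptr)))NULL;      \
        (void)__vpp_verify;                                             \
} while (0)

/*
 * '&*(ptr)' still decays an array to a pointer to its first element, but
 * unlike '(ptr) + 0' it never needs sizeof(*(ptr)), so 'struct rq' could
 * stay a forward declaration in include/linux/sched.h.
 */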
>
> > Following is the whole patch:
> > --------------------------------------------------------------------------------------------
> > diff --git a/Kbuild b/Kbuild
> > index f327ca86990c..13324b4bbe23 100644
> > --- a/Kbuild
> > +++ b/Kbuild
> > @@ -34,13 +34,24 @@ arch/$(SRCARCH)/kernel/asm-offsets.s: $(timeconst-file) $(bounds-file)
> >  $(offsets-file): arch/$(SRCARCH)/kernel/asm-offsets.s FORCE
> >         $(call filechk,offsets,__ASM_OFFSETS_H__)
> >
> > +# Generate rq-offsets.h
> > +
> > +rq-offsets-file := include/generated/rq-offsets.h
> > +
> > +targets += kernel/sched/rq-offsets.s
> > +
> > +kernel/sched/rq-offsets.s: $(offsets-file)
> > +
> > +$(rq-offsets-file): kernel/sched/rq-offsets.s FORCE
> > +       $(call filechk,offsets,__RQ_OFFSETS_H__)
> > +
> >  # Check for missing system calls
> >
> >  quiet_cmd_syscalls = CALL $<
> >        cmd_syscalls = $(CONFIG_SHELL) $< $(CC) $(c_flags) $(missing_syscalls_flags)
> >
> >  PHONY += missing-syscalls
> > -missing-syscalls: scripts/checksyscalls.sh $(offsets-file)
> > +missing-syscalls: scripts/checksyscalls.sh $(rq-offsets-file)
> >         $(call cmd,syscalls)
> >
> >  # Check the manual modification of atomic headers
> > diff --git a/include/linux/preempt.h b/include/linux/preempt.h
> > index 1fad1c8a4c76..3a1c08a75c09 100644
> > --- a/include/linux/preempt.h
> > +++ b/include/linux/preempt.h
> > @@ -369,64 +369,6 @@ static inline void preempt_notifier_init(struct preempt_notifier *notifier,
> >
> >  #endif
> >
> > -/*
> > - * Migrate-Disable and why it is undesired.
>
> Keep the comment where it is. It will keep the diff smaller.
> There is really no need to move it.

OK

> > - *
> > - * When a preempted task becomes elegible to run under the ideal model (IOW it
>
> but fix the typos.

OK

> > - * becomes one of the M highest priority tasks), it might still have to wait
> > - * for the preemptee's migrate_disable() section to complete. Thereby suffering
> > - * a reduction in bandwidth in the exact duration of the migrate_disable()
> > - * section.
> > - *
> > - * Per this argument, the change from preempt_disable() to migrate_disable()
> > - * gets us:
> > - *
> > - * - a higher priority tasks gains reduced wake-up latency; with preempt_disable()
> > - *   it would have had to wait for the lower priority task.
> > - *
> > - * - a lower priority tasks; which under preempt_disable() could've instantly
> > - *   migrated away when another CPU becomes available, is now constrained
> > - *   by the ability to push the higher priority task away, which might itself be
> > - *   in a migrate_disable() section, reducing it's available bandwidth.
> > - *
> > - * IOW it trades latency / moves the interference term, but it stays in the
> > - * system, and as long as it remains unbounded, the system is not fully
> > - * deterministic.
> > - *
> > - *
> > - * The reason we have it anyway.
> > - *
> > - * PREEMPT_RT breaks a number of assumptions traditionally held. By forcing a
> > - * number of primitives into becoming preemptible, they would also allow
> > - * migration. This turns out to break a bunch of per-cpu usage. To this end,
> > - * all these primitives employ migirate_disable() to restore this implicit
> > - * assumption.
> > - *
> > - * This is a 'temporary' work-around at best. The correct solution is getting
> > - * rid of the above assumptions and reworking the code to employ explicit
> > - * per-cpu locking or short preempt-disable regions.
> > - *
> > - * The end goal must be to get rid of migrate_disable(), alternatively we need
> > - * a schedulability theory that does not depend on abritrary migration.
>
> and this one.

OK

> > - *
> > - *
> > - * Notes on the implementation.
> > - *
> > - * The implementation is particularly tricky since existing code patterns
> > - * dictate neither migrate_disable() nor migrate_enable() is allowed to block.
> > - * This means that it cannot use cpus_read_lock() to serialize against hotplug,
> > - * nor can it easily migrate itself into a pending affinity mask change on
> > - * migrate_enable().
> > - *
> > - *
> > - * Note: even non-work-conserving schedulers like semi-partitioned depends on
> > - *       migration, so migrate_disable() is not only a problem for
> > - *       work-conserving schedulers.
> > - *
> > - */
> > -extern void migrate_disable(void);
> > -extern void migrate_enable(void);
> > -
> >  /**
> >   * preempt_disable_nested - Disable preemption inside a normally preempt disabled section
> >   *
> > @@ -471,7 +413,6 @@ static __always_inline void preempt_enable_nested(void)
> >
> >  DEFINE_LOCK_GUARD_0(preempt, preempt_disable(), preempt_enable())
> >  DEFINE_LOCK_GUARD_0(preempt_notrace, preempt_disable_notrace(), preempt_enable_notrace())
> > -DEFINE_LOCK_GUARD_0(migrate, migrate_disable(), migrate_enable())
>
> hmm. why?

Because we moved migrate_disable() and migrate_enable() to
include/linux/sched.h, they are no longer available in
include/linux/preempt.h, so this line has to move there as well.

> >  #ifdef CONFIG_PREEMPT_DYNAMIC
> >
> > diff --git a/include/linux/sched.h b/include/linux/sched.h
> > index 40d2fa90df42..365ac6d17504 100644
> > --- a/include/linux/sched.h
> > +++ b/include/linux/sched.h
> > @@ -48,6 +48,9 @@
> >  #include
> >  #include
> >  #include
> > +#ifndef COMPILE_OFFSETS
> > +#include
> > +#endif
> >
> >  /* task_struct member predeclarations (sorted alphabetically): */
> >  struct audit_context;
> > @@ -2299,4 +2302,127 @@ static __always_inline void alloc_tag_restore(struct alloc_tag *tag, struct allo
> >  #define alloc_tag_restore(_tag, _old) do {} while (0)
> >  #endif
> >
> > +#if defined(CONFIG_SMP) && !defined(COMPILE_OFFSETS)
> > +
> > +extern void __migrate_enable(void);
> > +
> > +struct rq;
> > +DECLARE_PER_CPU_SHARED_ALIGNED(struct rq, runqueues);
> > +#define this_rq_ptr() arch_raw_cpu_ptr(&runqueues)
> > +
> > +/*
> > + * Migrate-Disable and why it is undesired.
> > + *
> > + * When a preempted task becomes elegible to run under the ideal model (IOW it
> > + * becomes one of the M highest priority tasks), it might still have to wait
> > + * for the preemptee's migrate_disable() section to complete. Thereby suffering
> > + * a reduction in bandwidth in the exact duration of the migrate_disable()
> > + * section.
> > + *
> > + * Per this argument, the change from preempt_disable() to migrate_disable()
> > + * gets us:
> > + *
> > + * - a higher priority tasks gains reduced wake-up latency; with preempt_disable()
> > + *   it would have had to wait for the lower priority task.
> > + *
> > + * - a lower priority tasks; which under preempt_disable() could've instantly
> > + *   migrated away when another CPU becomes available, is now constrained
> > + *   by the ability to push the higher priority task away, which might itself be
> > + *   in a migrate_disable() section, reducing it's available bandwidth.
> > + *
> > + * IOW it trades latency / moves the interference term, but it stays in the
> > + * system, and as long as it remains unbounded, the system is not fully
> > + * deterministic.
> > + *
> > + *
> > + * The reason we have it anyway.
> > + *
> > + * PREEMPT_RT breaks a number of assumptions traditionally held. By forcing a
> > + * number of primitives into becoming preemptible, they would also allow
> > + * migration. This turns out to break a bunch of per-cpu usage. To this end,
> > + * all these primitives employ migirate_disable() to restore this implicit
> > + * assumption.
> > + *
> > + * This is a 'temporary' work-around at best. The correct solution is getting
> > + * rid of the above assumptions and reworking the code to employ explicit
> > + * per-cpu locking or short preempt-disable regions.
> > + *
> > + * The end goal must be to get rid of migrate_disable(), alternatively we need
> > + * a schedulability theory that does not depend on abritrary migration.
> > + *
> > + *
> > + * Notes on the implementation.
> > + *
> > + * The implementation is particularly tricky since existing code patterns
> > + * dictate neither migrate_disable() nor migrate_enable() is allowed to block.
> > + * This means that it cannot use cpus_read_lock() to serialize against hotplug,
> > + * nor can it easily migrate itself into a pending affinity mask change on
> > + * migrate_enable().
> > + *
> > + *
> > + * Note: even non-work-conserving schedulers like semi-partitioned depends on
> > + *       migration, so migrate_disable() is not only a problem for
> > + *       work-conserving schedulers.
> > + *
> > + */
> > +static inline void migrate_enable(void)
> > +{
> > +       struct task_struct *p = current;
> > +
> > +#ifdef CONFIG_DEBUG_PREEMPT
> > +       /*
> > +        * Check both overflow from migrate_disable() and superfluous
> > +        * migrate_enable().
> > +        */
> > +       if (WARN_ON_ONCE((s16)p->migration_disabled <= 0))
> > +               return;
> > +#endif
> > +
> > +       if (p->migration_disabled > 1) {
> > +               p->migration_disabled--;
> > +               return;
> > +       }
> > +
> > +       /*
> > +        * Ensure stop_task runs either before or after this, and that
> > +        * __set_cpus_allowed_ptr(SCA_MIGRATE_ENABLE) doesn't schedule().
> > +        */
> > +       guard(preempt)();
> > +       __migrate_enable();
>
> You're leaving performance on the table.
> In many case bpf is one and only user of migrate_enable/disable
> and it's not nested.
> So this call is likely hot.
> Move 'if (p->cpus_ptr != &p->cpus_mask)' check into .h
> and only keep slow path of __set_cpus_allowed_ptr() in .c

Oops, my mistake, I should do it this way :/

> Can probably wrap it with likely() too.
>
> > +       /*
> > +        * Mustn't clear migration_disabled() until cpus_ptr points back at the
> > +        * regular cpus_mask, otherwise things that race (eg.
> > +        * select_fallback_rq) get confused.
> > +        */
> > +       barrier();
> > +       p->migration_disabled = 0;
> > +       (*(unsigned int *)((void *)this_rq_ptr() + RQ_nr_pinned))--;
> > +}
> > +
> > +static inline void migrate_disable(void)
> > +{
> > +       struct task_struct *p = current;
> > +
> > +       if (p->migration_disabled) {
> > +#ifdef CONFIG_DEBUG_PREEMPT
> > +               /*
> > +                *Warn about overflow half-way through the range.
> > +                */
> > +               WARN_ON_ONCE((s16)p->migration_disabled < 0);
> > +#endif
> > +               p->migration_disabled++;
> > +               return;
> > +       }
> > +
> > +       guard(preempt)();
> > +       (*(unsigned int *)((void *)this_rq_ptr() + RQ_nr_pinned))++;
> > +       p->migration_disabled = 1;
> > +}
> > +#else
> > +static inline void migrate_disable(void) { }
> > +static inline void migrate_enable(void) { }
> > +#endif
> > +
> > +DEFINE_LOCK_GUARD_0(migrate, migrate_disable(), migrate_enable())
> > +
> >  #endif
> > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> > index 399f03e62508..75d5f145ca60 100644
> > --- a/kernel/bpf/verifier.c
> > +++ b/kernel/bpf/verifier.c
> > @@ -23853,8 +23853,7 @@ int bpf_check_attach_target(struct bpf_verifier_log *log,
> >  BTF_SET_START(btf_id_deny)
> >  BTF_ID_UNUSED
> >  #ifdef CONFIG_SMP
> > -BTF_ID(func, migrate_disable)
> > -BTF_ID(func, migrate_enable)
> > +BTF_ID(func, __migrate_enable)
> >  #endif
> >  #if !defined CONFIG_PREEMPT_RCU && !defined CONFIG_TINY_RCU
> >  BTF_ID(func, rcu_read_unlock_strict)
> > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > index 3ec00d08d46a..b521024c99ed 100644
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -119,6 +119,7 @@ EXPORT_TRACEPOINT_SYMBOL_GPL(sched_update_nr_running_tp);
> >  EXPORT_TRACEPOINT_SYMBOL_GPL(sched_compute_energy_tp);
> >
> >  DEFINE_PER_CPU_SHARED_ALIGNED(struct rq, runqueues);
> > +EXPORT_SYMBOL_GPL(runqueues);
>
> why?

Because runqueues is now referenced directly in migrate_enable() and
migrate_disable(), and those can be used from modules, we need to
export it. Isn't that right?

>
> >
> >  #ifdef CONFIG_SCHED_PROXY_EXEC
> >  DEFINE_STATIC_KEY_TRUE(__sched_proxy_exec);
> > @@ -2375,28 +2376,7 @@ static void migrate_disable_switch(struct rq *rq, struct task_struct *p)
> >         __do_set_cpus_allowed(p, &ac);
> >  }
> >
> > -void migrate_disable(void)
> > -{
> > -       struct task_struct *p = current;
> > -
> > -       if (p->migration_disabled) {
> > -#ifdef CONFIG_DEBUG_PREEMPT
> > -               /*
> > -                *Warn about overflow half-way through the range.
> > -                */
> > -               WARN_ON_ONCE((s16)p->migration_disabled < 0);
> > -#endif
> > -               p->migration_disabled++;
> > -               return;
> > -       }
> > -
> > -       guard(preempt)();
> > -       this_rq()->nr_pinned++;
> > -       p->migration_disabled = 1;
> > -}
> > -EXPORT_SYMBOL_GPL(migrate_disable);
> > -
> > -void migrate_enable(void)
> > +void __migrate_enable(void)
> >  {
> >         struct task_struct *p = current;
> >         struct affinity_context ac = {
> > @@ -2404,37 +2384,10 @@ void migrate_enable(void)
> >                 .flags          = SCA_MIGRATE_ENABLE,
> >         };
> >
> > -#ifdef CONFIG_DEBUG_PREEMPT
> > -       /*
> > -        * Check both overflow from migrate_disable() and superfluous
> > -        * migrate_enable().
> > -        */
> > -       if (WARN_ON_ONCE((s16)p->migration_disabled <= 0))
> > -               return;
> > -#endif
> > -
> > -       if (p->migration_disabled > 1) {
> > -               p->migration_disabled--;
> > -               return;
> > -       }
> > -
> > -       /*
> > -        * Ensure stop_task runs either before or after this, and that
> > -        * __set_cpus_allowed_ptr(SCA_MIGRATE_ENABLE) doesn't schedule().
> > -        */
> > -       guard(preempt)();
> >         if (p->cpus_ptr != &p->cpus_mask)
> >                 __set_cpus_allowed_ptr(p, &ac);
> > -       /*
> > -        * Mustn't clear migration_disabled() until cpus_ptr points back at the
> > -        * regular cpus_mask, otherwise things that race (eg.
> > -        * select_fallback_rq) get confused.
> > -        */
> > -       barrier();
> > -       p->migration_disabled = 0;
> > -       this_rq()->nr_pinned--;
> >  }
> > -EXPORT_SYMBOL_GPL(migrate_enable);
> > +EXPORT_SYMBOL_GPL(__migrate_enable);
> >
> >  static inline bool rq_has_pinned_tasks(struct rq *rq)
> >  {
> > diff --git a/kernel/sched/rq-offsets.c b/kernel/sched/rq-offsets.c
> > new file mode 100644
> > index 000000000000..a23747bbe25b
> > --- /dev/null
> > +++ b/kernel/sched/rq-offsets.c
> > @@ -0,0 +1,12 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +#define COMPILE_OFFSETS
> > +#include
> > +#include
> > +#include "sched.h"
> > +
> > +int main(void)
> > +{
> > +       DEFINE(RQ_nr_pinned, offsetof(struct rq, nr_pinned));
>
> This part looks nice and sweet. Not sure what you were concerned about.

The usage of arch_raw_cpu_ptr() looks ugly, and there is no existing
usage like it, which is what made me concerned.

> Respin it as a proper patch targeting tip tree.
>
> And explain the motivation in commit log with detailed
> 'perf report' before/after along with 111M/s to 121M/s speed up,
>
> I suspect with my other __set_cpus_allowed_ptr() suggestion
> the speed up should be even bigger.

It should be. I'll respin this patch and send it to the tip tree.

Thanks!
Menglong Dong

> > +       return 0;
> > +}
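
For completeness, the __set_cpus_allowed_ptr() suggestion above would split migrate_enable() roughly like this (a sketch of the direction only, not the respun patch; it reuses the names from the patch above):

/* include/linux/sched.h: inline fast path */
static inline void migrate_enable(void)
{
        struct task_struct *p = current;

#ifdef CONFIG_DEBUG_PREEMPT
        if (WARN_ON_ONCE((s16)p->migration_disabled <= 0))
                return;
#endif

        if (p->migration_disabled > 1) {
                p->migration_disabled--;
                return;
        }

        guard(preempt)();
        /* only the affinity-restore case needs the out-of-line call */
        if (unlikely(p->cpus_ptr != &p->cpus_mask))
                __migrate_enable();
        barrier();
        p->migration_disabled = 0;
        (*(unsigned int *)((void *)this_rq_ptr() + RQ_nr_pinned))--;
}

/* kernel/sched/core.c: only the slow path stays out of line */
void __migrate_enable(void)
{
        struct task_struct *p = current;
        struct affinity_context ac = {
                .new_mask       = &p->cpus_mask,
                .flags          = SCA_MIGRATE_ENABLE,
        };

        __set_cpus_allowed_ptr(p, &ac);
}
EXPORT_SYMBOL_GPL(__migrate_enable);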