From: Ankur Arora <ankur.a.arora@oracle.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Ankur Arora <ankur.a.arora@oracle.com>,
Peter Zijlstra <peterz@infradead.org>,
linux-kernel@vger.kernel.org, linux-mm@kvack.org, x86@kernel.org,
akpm@linux-foundation.org, luto@kernel.org, bp@alien8.de,
dave.hansen@linux.intel.com, hpa@zytor.com, mingo@redhat.com,
juri.lelli@redhat.com, vincent.guittot@linaro.org,
willy@infradead.org, mgorman@suse.de, rostedt@goodmis.org,
tglx@linutronix.de, jon.grimm@amd.com, bharata@amd.com,
raghavendra.kt@amd.com, boris.ostrovsky@oracle.com,
konrad.wilk@oracle.com
Subject: Re: [PATCH v2 7/9] sched: define TIF_ALLOW_RESCHED
Date: Sun, 10 Sep 2023 03:01:02 -0700 [thread overview]
Message-ID: <87zg1u1h5t.fsf@oracle.com> (raw)
In-Reply-To: <CAHk-=wi0bXpgULVVLc2AdJcta-fvQP7yyFQ_JtaoHUiPrqf--A@mail.gmail.com>
Linus Torvalds <torvalds@linux-foundation.org> writes:
> On Sat, 9 Sept 2023 at 20:49, Ankur Arora <ankur.a.arora@oracle.com> wrote:
>>
>> I think we can keep these checks, but with this fixed definition of
>> resched_allowed(). This might be better:
>>
>> --- a/include/linux/sched.h
>> +++ b/include/linux/sched.h
>> @@ -2260,7 +2260,8 @@ static inline void disallow_resched(void)
>>
>> static __always_inline bool resched_allowed(void)
>> {
>> - return unlikely(test_tsk_thread_flag(current, TIF_RESCHED_ALLOW));
>> + return unlikely(!preempt_count() &&
>> + test_tsk_thread_flag(current, TIF_RESCHED_ALLOW));
>> }
>
> I'm not convinced (at all) that the preempt count is the right thing.
>
> It works for interrupts, yes, because interrupts will increment the
> preempt count even on non-preempt kernels (since the preempt count is
> also the interrupt context level).
>
> But what about any synchronous trap handling?
>
> In other words, just something like a page fault? A page fault doesn't
> increment the preemption count (and in fact many page faults _will_
> obviously re-schedule as part of waiting for IO).
>
> A page fault can *itself* say "feel free to preempt me", and that's one thing.
>
> But a page fault can also *interupt* something that said "feel free to
> preempt me", and that's a completely *different* thing.
>
> So I feel like the "tsk_thread_flag" was sadly completely the wrong
> place to add this bit to, and the wrong place to test it in. What we
> really want is "current kernel entry context".
So, what we want allow_resched() to say is: feel free to reschedule
if in a reschedulable context.
The problem with doing that with an allow_resched tsk_thread_flag is
that the flag is really only valid while it is executing in the context
it was set.
And, trying to validate the flag by checking the preempt_count() makes
it pretty fragile, given that now we are tying it with the specifics of
whether the handling of arbitrary interrupts bumps up the
preempt_count() or not.
> So the right thing to do would basically be to put it in the stack
> frame at kernel entry - whether that kernel entry was a system call
> (which is doing some big copy that should be preemptible without us
> having to add "cond_resched()" in places), or is a page fault (which
> will also do things like big page clearings for hugepages)
Seems to me that associating an allow_resched flag with the stack also
has similar issue. Couldn't the context level change while we are on the
same stack?
I guess the problem is that allow_resched()/disallow_resched() really
need to demarcate a section of code having some property, but instead
set up state that has much wider scope.
Maybe code that allows resched can be in a new .section ".text.resched"
or whatever, and we could use something like this as a check:
int resched_allowed(regs) {
return !preempt_count() && in_resched_function(regs->rip);
}
(allow_resched()/disallow_resched() shouldn't be needed except for
debug checks.)
We still need the !preempt_count() check, but now both the conditions
in the test express two orthogonal ideas:
- !preempt_count(): preemption is safe in the current context
- in_resched_function(regs->rip): okay to reschedule here
So in this example, it should allow scheduling inside both the
clear_pages_reschedulable() calls:
-> page_fault()
clear_page_reschedulable();
-> page_fault()
clear_page_reschedulable();
Here though, rescheduling could happen only in the first call to
clear_page_reschedulable():
-> page_fault()
clear_page_reschedulable();
-> hardirq()
-> page_fault()
clear_page_reschedulable();
Does that make any sense, or I'm just talking through my hat?
--
ankur
next prev parent reply other threads:[~2023-09-10 10:01 UTC|newest]
Thread overview: 214+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-08-30 18:49 [PATCH v2 0/9] x86/clear_huge_page: multi-page clearing Ankur Arora
2023-08-30 18:49 ` [PATCH v2 1/9] mm/clear_huge_page: allow arch override for clear_huge_page() Ankur Arora
2023-08-30 18:49 ` [PATCH v2 2/9] mm/huge_page: separate clear_huge_page() and copy_huge_page() Ankur Arora
2023-08-30 18:49 ` [PATCH v2 3/9] mm/huge_page: cleanup clear_/copy_subpage() Ankur Arora
2023-09-08 13:09 ` Matthew Wilcox
2023-09-11 17:22 ` Ankur Arora
2023-08-30 18:49 ` [PATCH v2 4/9] x86/clear_page: extend clear_page*() for multi-page clearing Ankur Arora
2023-09-08 13:11 ` Matthew Wilcox
2023-08-30 18:49 ` [PATCH v2 5/9] x86/clear_page: add clear_pages() Ankur Arora
2023-08-30 18:49 ` [PATCH v2 6/9] x86/clear_huge_page: multi-page clearing Ankur Arora
2023-08-31 18:26 ` kernel test robot
2023-09-08 12:38 ` Peter Zijlstra
2023-09-13 6:43 ` Raghavendra K T
2023-08-30 18:49 ` [PATCH v2 7/9] sched: define TIF_ALLOW_RESCHED Ankur Arora
2023-09-08 7:02 ` Peter Zijlstra
2023-09-08 17:15 ` Linus Torvalds
2023-09-08 22:50 ` Peter Zijlstra
2023-09-09 5:15 ` Linus Torvalds
2023-09-09 6:39 ` Ankur Arora
2023-09-09 9:11 ` Peter Zijlstra
2023-09-09 20:04 ` Ankur Arora
2023-09-09 5:30 ` Ankur Arora
2023-09-09 9:12 ` Peter Zijlstra
2023-09-09 20:15 ` Ankur Arora
2023-09-09 21:16 ` Linus Torvalds
2023-09-10 3:48 ` Ankur Arora
2023-09-10 4:35 ` Linus Torvalds
2023-09-10 10:01 ` Ankur Arora [this message]
2023-09-10 18:32 ` Linus Torvalds
2023-09-11 15:04 ` Peter Zijlstra
2023-09-11 16:29 ` andrew.cooper3
2023-09-11 17:04 ` Ankur Arora
2023-09-12 8:26 ` Peter Zijlstra
2023-09-12 12:24 ` Phil Auld
2023-09-12 12:33 ` Matthew Wilcox
2023-09-18 23:42 ` Thomas Gleixner
2023-09-19 1:57 ` Linus Torvalds
2023-09-19 8:03 ` Ingo Molnar
2023-09-19 8:43 ` Ingo Molnar
2023-09-19 13:43 ` Thomas Gleixner
2023-09-19 13:25 ` Thomas Gleixner
2023-09-19 12:30 ` Thomas Gleixner
2023-09-19 13:00 ` Arches that don't support PREEMPT Matthew Wilcox
2023-09-19 13:00 ` Matthew Wilcox
2023-09-19 13:00 ` Matthew Wilcox
2023-09-19 13:34 ` Geert Uytterhoeven
2023-09-19 13:34 ` Geert Uytterhoeven
2023-09-19 13:34 ` Geert Uytterhoeven
2023-09-19 13:37 ` John Paul Adrian Glaubitz
2023-09-19 13:37 ` John Paul Adrian Glaubitz
2023-09-19 13:37 ` John Paul Adrian Glaubitz
2023-09-19 13:42 ` Peter Zijlstra
2023-09-19 13:42 ` Peter Zijlstra
2023-09-19 13:42 ` Peter Zijlstra
2023-09-19 13:48 ` John Paul Adrian Glaubitz
2023-09-19 13:48 ` John Paul Adrian Glaubitz
2023-09-19 13:48 ` John Paul Adrian Glaubitz
2023-09-19 14:16 ` Peter Zijlstra
2023-09-19 14:16 ` Peter Zijlstra
2023-09-19 14:16 ` Peter Zijlstra
2023-09-19 14:24 ` John Paul Adrian Glaubitz
2023-09-19 14:24 ` John Paul Adrian Glaubitz
2023-09-19 14:24 ` John Paul Adrian Glaubitz
2023-09-19 14:32 ` Matthew Wilcox
2023-09-19 14:32 ` Matthew Wilcox
2023-09-19 14:32 ` Matthew Wilcox
2023-09-19 15:31 ` Steven Rostedt
2023-09-19 15:31 ` Steven Rostedt
2023-09-19 15:31 ` Steven Rostedt
2023-09-20 14:38 ` Anton Ivanov
2023-09-20 14:38 ` Anton Ivanov
2023-09-20 14:38 ` Anton Ivanov
2023-09-21 12:20 ` Arnd Bergmann
2023-09-21 12:20 ` Arnd Bergmann
2023-09-21 12:20 ` Arnd Bergmann
2023-09-19 14:17 ` Thomas Gleixner
2023-09-19 14:17 ` Thomas Gleixner
2023-09-19 14:17 ` Thomas Gleixner
2023-09-19 14:50 ` H. Peter Anvin
2023-09-19 14:50 ` H. Peter Anvin
2023-09-19 14:50 ` H. Peter Anvin
2023-09-19 14:57 ` Matt Turner
2023-09-19 14:57 ` Matt Turner
2023-09-19 14:57 ` Matt Turner
2023-09-19 17:09 ` Ulrich Teichert
2023-09-19 17:09 ` Ulrich Teichert
2023-09-19 17:25 ` Linus Torvalds
2023-09-19 17:25 ` Linus Torvalds
2023-09-19 17:25 ` Linus Torvalds
2023-09-19 17:58 ` John Paul Adrian Glaubitz
2023-09-19 17:58 ` John Paul Adrian Glaubitz
2023-09-19 17:58 ` John Paul Adrian Glaubitz
2023-09-19 18:31 ` Thomas Gleixner
2023-09-19 18:31 ` Thomas Gleixner
2023-09-19 18:31 ` Thomas Gleixner
2023-09-19 18:38 ` Steven Rostedt
2023-09-19 18:38 ` Steven Rostedt
2023-09-19 18:38 ` Steven Rostedt
2023-09-19 18:52 ` Linus Torvalds
2023-09-19 18:52 ` Linus Torvalds
2023-09-19 18:52 ` Linus Torvalds
2023-09-19 19:53 ` Thomas Gleixner
2023-09-19 19:53 ` Thomas Gleixner
2023-09-19 19:53 ` Thomas Gleixner
2023-09-20 7:32 ` Ingo Molnar
2023-09-20 7:32 ` Ingo Molnar
2023-09-20 7:32 ` Ingo Molnar
2023-09-20 7:29 ` Ingo Molnar
2023-09-20 7:29 ` Ingo Molnar
2023-09-20 7:29 ` Ingo Molnar
2023-09-20 8:26 ` Thomas Gleixner
2023-09-20 8:26 ` Thomas Gleixner
2023-09-20 8:26 ` Thomas Gleixner
2023-09-20 10:37 ` David Laight
2023-09-20 10:37 ` David Laight
2023-09-20 10:37 ` David Laight
2023-09-19 14:21 ` Anton Ivanov
2023-09-19 14:21 ` Anton Ivanov
2023-09-19 14:21 ` Anton Ivanov
2023-09-19 15:17 ` Thomas Gleixner
2023-09-19 15:17 ` Thomas Gleixner
2023-09-19 15:17 ` Thomas Gleixner
2023-09-19 15:21 ` Anton Ivanov
2023-09-19 15:21 ` Anton Ivanov
2023-09-19 15:21 ` Anton Ivanov
2023-09-19 16:22 ` Richard Weinberger
2023-09-19 16:22 ` Richard Weinberger
2023-09-19 16:22 ` Richard Weinberger
2023-09-19 16:41 ` Anton Ivanov
2023-09-19 16:41 ` Anton Ivanov
2023-09-19 16:41 ` Anton Ivanov
2023-09-19 17:33 ` Thomas Gleixner
2023-09-19 17:33 ` Thomas Gleixner
2023-09-19 17:33 ` Thomas Gleixner
2023-10-06 14:51 ` Geert Uytterhoeven
2023-10-06 14:51 ` Geert Uytterhoeven
2023-09-20 14:22 ` [PATCH v2 7/9] sched: define TIF_ALLOW_RESCHED Ankur Arora
2023-09-20 20:51 ` Thomas Gleixner
2023-09-21 0:14 ` Thomas Gleixner
2023-09-21 0:58 ` Ankur Arora
2023-09-21 2:12 ` Thomas Gleixner
2023-09-20 23:58 ` Thomas Gleixner
2023-09-21 0:57 ` Ankur Arora
2023-09-21 2:02 ` Thomas Gleixner
2023-09-21 4:16 ` Ankur Arora
2023-09-21 13:59 ` Steven Rostedt
2023-09-21 16:00 ` Linus Torvalds
2023-09-21 22:55 ` Thomas Gleixner
2023-09-23 1:11 ` Thomas Gleixner
2023-10-02 14:15 ` Steven Rostedt
2023-10-02 16:13 ` Thomas Gleixner
2023-10-18 1:03 ` Paul E. McKenney
2023-10-18 12:09 ` Ankur Arora
2023-10-18 17:51 ` Paul E. McKenney
2023-10-18 22:53 ` Thomas Gleixner
2023-10-18 23:25 ` Paul E. McKenney
2023-10-18 13:16 ` Thomas Gleixner
2023-10-18 14:31 ` Steven Rostedt
2023-10-18 17:55 ` Paul E. McKenney
2023-10-18 18:00 ` Steven Rostedt
2023-10-18 18:13 ` Paul E. McKenney
2023-10-19 12:37 ` Daniel Bristot de Oliveira
2023-10-19 17:08 ` Paul E. McKenney
2023-10-18 17:19 ` Paul E. McKenney
2023-10-18 17:41 ` Steven Rostedt
2023-10-18 17:59 ` Paul E. McKenney
2023-10-18 20:15 ` Ankur Arora
2023-10-18 20:42 ` Paul E. McKenney
2023-10-19 0:21 ` Thomas Gleixner
2023-10-19 19:13 ` Paul E. McKenney
2023-10-20 21:59 ` Paul E. McKenney
2023-10-20 22:56 ` Ankur Arora
2023-10-20 23:36 ` Paul E. McKenney
2023-10-21 1:05 ` Ankur Arora
2023-10-21 2:08 ` Paul E. McKenney
2023-10-24 12:15 ` Thomas Gleixner
2023-10-24 18:59 ` Paul E. McKenney
2023-09-23 22:50 ` Thomas Gleixner
2023-09-24 0:10 ` Thomas Gleixner
2023-09-24 7:19 ` Matthew Wilcox
2023-09-24 7:55 ` Thomas Gleixner
2023-09-24 10:29 ` Matthew Wilcox
2023-09-25 0:13 ` Ankur Arora
2023-10-06 13:01 ` Geert Uytterhoeven
2023-09-19 7:21 ` Ingo Molnar
2023-09-19 19:05 ` Ankur Arora
2023-10-24 14:34 ` Steven Rostedt
2023-10-25 1:49 ` Steven Rostedt
2023-10-26 7:50 ` Sergey Senozhatsky
2023-10-26 12:48 ` Steven Rostedt
2023-09-11 16:48 ` Steven Rostedt
2023-09-11 20:50 ` Linus Torvalds
2023-09-11 21:16 ` Linus Torvalds
2023-09-12 7:20 ` Peter Zijlstra
2023-09-12 7:38 ` Ingo Molnar
2023-09-11 22:20 ` Steven Rostedt
2023-09-11 23:10 ` Ankur Arora
2023-09-11 23:16 ` Steven Rostedt
2023-09-12 16:30 ` Linus Torvalds
2023-09-12 3:27 ` Matthew Wilcox
2023-09-12 16:20 ` Linus Torvalds
2023-09-19 3:21 ` Andy Lutomirski
2023-09-19 9:20 ` Thomas Gleixner
2023-09-19 9:49 ` Ingo Molnar
2023-08-30 18:49 ` [PATCH v2 8/9] irqentry: define irqentry_exit_allow_resched() Ankur Arora
2023-09-08 12:42 ` Peter Zijlstra
2023-09-11 17:24 ` Ankur Arora
2023-08-30 18:49 ` [PATCH v2 9/9] x86/clear_huge_page: make clear_contig_region() preemptible Ankur Arora
2023-09-08 12:45 ` Peter Zijlstra
2023-09-03 8:14 ` [PATCH v2 0/9] x86/clear_huge_page: multi-page clearing Mateusz Guzik
2023-09-05 22:14 ` Ankur Arora
2023-09-08 2:18 ` Raghavendra K T
2023-09-05 1:06 ` Raghavendra K T
2023-09-05 19:36 ` Ankur Arora
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87zg1u1h5t.fsf@oracle.com \
--to=ankur.a.arora@oracle.com \
--cc=akpm@linux-foundation.org \
--cc=bharata@amd.com \
--cc=boris.ostrovsky@oracle.com \
--cc=bp@alien8.de \
--cc=dave.hansen@linux.intel.com \
--cc=hpa@zytor.com \
--cc=jon.grimm@amd.com \
--cc=juri.lelli@redhat.com \
--cc=konrad.wilk@oracle.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=luto@kernel.org \
--cc=mgorman@suse.de \
--cc=mingo@redhat.com \
--cc=peterz@infradead.org \
--cc=raghavendra.kt@amd.com \
--cc=rostedt@goodmis.org \
--cc=tglx@linutronix.de \
--cc=torvalds@linux-foundation.org \
--cc=vincent.guittot@linaro.org \
--cc=willy@infradead.org \
--cc=x86@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.