All of lore.kernel.org
 help / color / mirror / Atom feed
From: Ankur Arora <ankur.a.arora@oracle.com>
To: Shrikanth Hegde <sshegde@linux.ibm.com>
Cc: Ankur Arora <ankur.a.arora@oracle.com>,
	tglx@linutronix.de, peterz@infradead.org,
	torvalds@linux-foundation.org, paulmck@kernel.org,
	rostedt@goodmis.org, mark.rutland@arm.com, juri.lelli@redhat.com,
	joel@joelfernandes.org, raghavendra.kt@amd.com,
	boris.ostrovsky@oracle.com, konrad.wilk@oracle.com,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH v2 00/35] PREEMPT_AUTO: support lazy rescheduling
Date: Tue, 18 Jun 2024 19:40:23 -0700	[thread overview]
Message-ID: <871q4td59k.fsf@oracle.com> (raw)
In-Reply-To: <e7e2126f-40ca-44af-9287-888f4ec34b35@linux.ibm.com>


Shrikanth Hegde <sshegde@linux.ibm.com> writes:

> On 6/15/24 8:34 PM, Shrikanth Hegde wrote:
>>
>>
>> On 6/10/24 12:53 PM, Ankur Arora wrote:
>>>
>> _auto.
>>>>
>>>> 6.10-rc1:
>>>> =========
>>>> 10:09:50 AM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
>>>> 09:45:23 AM  all    4.14    0.00   77.57    0.00   16.92    0.00    0.00    0.00    0.00    1.37
>>>> 09:45:24 AM  all    4.42    0.00   77.62    0.00   16.76    0.00    0.00    0.00    0.00    1.20
>>>> 09:45:25 AM  all    4.43    0.00   77.45    0.00   16.94    0.00    0.00    0.00    0.00    1.18
>>>> 09:45:26 AM  all    4.45    0.00   77.87    0.00   16.68    0.00    0.00    0.00    0.00    0.99
>>>>
>>>> PREEMPT_AUTO:
>>>> ===========
>>>> 10:09:50 AM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
>>>> 10:09:56 AM  all    3.11    0.00   72.59    0.00   21.34    0.00    0.00    0.00    0.00    2.96
>>>> 10:09:57 AM  all    3.31    0.00   73.10    0.00   20.99    0.00    0.00    0.00    0.00    2.60
>>>> 10:09:58 AM  all    3.40    0.00   72.83    0.00   20.85    0.00    0.00    0.00    0.00    2.92
>>>> 10:10:00 AM  all    3.21    0.00   72.87    0.00   21.19    0.00    0.00    0.00    0.00    2.73
>>>> 10:10:01 AM  all    3.02    0.00   72.18    0.00   21.08    0.00    0.00    0.00    0.00    3.71
>>>>
>>>> Used bcc tools hardirq and softirq to see if irq are increasing. softirq implied there are more
>>>> timer,sched softirq. Numbers vary between different samples, but trend seems to be similar.
>>>
>>> Yeah, the %sys is lower and %irq, higher. Can you also see where the
>>> increased %irq is? For instance are the resched IPIs numbers greater?
>>
>> Hi Ankur,
>>
>>
>> Used mpstat -I ALL to capture this info for 20 seconds.
>>
>> HARDIRQ per second:
>> ===================
>> 6.10:
>> ===================
>> 18		19		22		23		48	49	50	51	LOC		BCT	LOC2	SPU	PMI	MCE	NMI	WDG	DBL
>> ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>> 417956.86	1114642.30	1712683.65	2058664.99	0.00	0.00	18.30	0.39	31978.37	0.00	0.35	351.98	0.00	0.00	0.00	6405.54	329189.45
>>
>> Preempt_auto:
>> ===================
>> 18		19		22		23		48	49	50	51	LOC		BCT	LOC2	SPU	PMI	MCE	NMI	WDG	DBL
>> ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>> 609509.69	1910413.99	1923503.52	2061876.33	0.00	0.00	19.14	0.30	31916.59	0.00	0.45	497.88	0.00	0.00	0.00	6825.49	88247.85
>>
>> 18,19,22,23 are called XIVE interrupts. These are IPI interrupts. I am not sure which type of IPI are these. will have to see why its increasing.
>>
>>
>> SOFTIRQ per second:
>> ===================
>> 6.10:
>> ===================
>> HI	TIMER	NET_TX	NET_RX	BLOCK	IRQ_POLL	TASKLET		SCHED		HRTIMER		RCU
>> 0.00	3966.47	0.00	18.25	0.59	0.00		0.34		12811.00	0.00		9693.95
>>
>> Preempt_auto:
>> ===================
>> HI	TIMER	NET_TX	NET_RX	BLOCK	IRQ_POLL	TASKLET		SCHED		HRTIMER		RCU
>> 0.00	4871.67	0.00	18.94	0.40	0.00		0.25		13518.66	0.00		15732.77
>>
>> Note: RCU softirq seems to increase significantly. Not sure which one triggers. still trying to figure out why.
>> It maybe irq triggering to softirq or softirq causing more IPI.
>>
>>
>>
>> Also, Noticed a below config difference which gets removed in preempt auto. This happens because PREEMPTION make them as N. Made the changes in kernel/Kconfig.locks to get them
>> enabled. I still see the same regression in hackbench. These configs still may need attention?
>>
>> 					6.10				       | 					preempt auto
>>   CONFIG_INLINE_SPIN_UNLOCK_IRQ=y                                              |  CONFIG_UNINLINE_SPIN_UNLOCK=y
>>   CONFIG_INLINE_READ_UNLOCK=y                                                  |  ----------------------------------------------------------------------------
>>   CONFIG_INLINE_READ_UNLOCK_IRQ=y                                              |  ----------------------------------------------------------------------------
>>   CONFIG_INLINE_WRITE_UNLOCK=y                                                 |  ----------------------------------------------------------------------------
>>   CONFIG_INLINE_WRITE_UNLOCK_IRQ=y                                             |  ----------------------------------------------------------------------------
>>
>>
>
> Did an experiment keeping the number of CPU constant, while changing the number of sockets they span across.
> When all CPU belong to same socket, there is no regression w.r.t to PREEMPT_AUTO. Regression starts when the CPUs start
> spanning across sockets.

Ah. That's really interesting. So, upto 160 CPUs was okay?

> Since Preempt auto by default enables preempt count, I think that may cause the regression. I see Powerpc uses generic implementation
> which may not scale well.

Yeah this would explain why I don't see similar behaviour on a 384 CPU
x86 box.

Also, IIRC the powerpc numbers on preempt=full were significantly worse
than preempt=none. That test might also be worth doing once you have the
percpu based method working.

> Will try to shift to percpu based method and see. will get back if I can get that done successfully.

Sounds good to me.


Thanks
Ankur

  reply	other threads:[~2024-06-19  2:41 UTC|newest]

Thread overview: 95+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-05-28  0:34 [PATCH v2 00/35] PREEMPT_AUTO: support lazy rescheduling Ankur Arora
2024-05-28  0:34 ` [PATCH v2 01/35] sched/core: Move preempt_model_*() helpers from sched.h to preempt.h Ankur Arora
2024-06-06 17:45   ` [tip: sched/core] " tip-bot2 for Sean Christopherson
2024-05-28  0:34 ` [PATCH v2 02/35] sched/core: Drop spinlocks on contention iff kernel is preemptible Ankur Arora
2024-05-28  0:34 ` [PATCH v2 03/35] sched: make test_*_tsk_thread_flag() return bool Ankur Arora
2024-05-28  0:34 ` [PATCH v2 04/35] preempt: introduce CONFIG_PREEMPT_AUTO Ankur Arora
2024-06-03 15:04   ` Shrikanth Hegde
2024-06-04 17:52     ` Ankur Arora
2024-05-28  0:34 ` [PATCH v2 05/35] thread_info: selector for TIF_NEED_RESCHED[_LAZY] Ankur Arora
2024-05-28 15:55   ` Peter Zijlstra
2024-05-30  9:07     ` Ankur Arora
2024-05-28  0:34 ` [PATCH v2 06/35] thread_info: define __tif_need_resched(resched_t) Ankur Arora
2024-05-28 16:03   ` Peter Zijlstra
2024-05-28  0:34 ` [PATCH v2 07/35] sched: define *_tsk_need_resched_lazy() helpers Ankur Arora
2024-05-28 16:09   ` Peter Zijlstra
2024-05-30  9:02     ` Ankur Arora
2024-05-29  8:25   ` Peter Zijlstra
2024-05-30  9:08     ` Ankur Arora
2024-05-28  0:34 ` [PATCH v2 08/35] entry: handle lazy rescheduling at user-exit Ankur Arora
2024-05-28 16:12   ` Peter Zijlstra
2024-05-28  0:34 ` [PATCH v2 09/35] entry/kvm: handle lazy rescheduling at guest-entry Ankur Arora
2024-05-28 16:13   ` Peter Zijlstra
2024-05-30  9:04     ` Ankur Arora
2024-05-28  0:34 ` [PATCH v2 10/35] entry: irqentry_exit only preempts for TIF_NEED_RESCHED Ankur Arora
2024-05-28 16:18   ` Peter Zijlstra
2024-05-30  9:03     ` Ankur Arora
2024-05-28  0:34 ` [PATCH v2 11/35] sched: __schedule_loop() doesn't need to check for need_resched_lazy() Ankur Arora
2024-05-28  0:34 ` [PATCH v2 12/35] sched: separate PREEMPT_DYNAMIC config logic Ankur Arora
2024-05-28 16:25   ` Peter Zijlstra
2024-05-30  9:30     ` Ankur Arora
2024-05-28  0:34 ` [PATCH v2 13/35] sched: allow runtime config for PREEMPT_AUTO Ankur Arora
2024-05-28 16:27   ` Peter Zijlstra
2024-05-30  9:29     ` Ankur Arora
2024-06-06 11:51       ` Peter Zijlstra
2024-06-06 15:11         ` Ankur Arora
2024-06-06 17:32           ` Peter Zijlstra
2024-06-09  0:46             ` Ankur Arora
2024-06-12 18:10               ` Paul E. McKenney
2024-05-28  0:35 ` [PATCH v2 14/35] rcu: limit PREEMPT_RCU to full preemption under PREEMPT_AUTO Ankur Arora
2024-05-28  0:35 ` [PATCH v2 15/35] rcu: fix header guard for rcu_all_qs() Ankur Arora
2024-05-28  0:35 ` [PATCH v2 16/35] preempt,rcu: warn on PREEMPT_RCU=n, preempt=full Ankur Arora
2024-05-29  8:14   ` Peter Zijlstra
2024-05-30 18:32     ` Paul E. McKenney
2024-05-30 23:05       ` Ankur Arora
2024-05-30 23:15         ` Paul E. McKenney
2024-05-30 23:04     ` Ankur Arora
2024-05-30 23:20       ` Paul E. McKenney
2024-06-06 11:53         ` Peter Zijlstra
2024-06-06 13:38           ` Paul E. McKenney
2024-06-17 15:54             ` Paul E. McKenney
2024-06-18 16:29               ` Paul E. McKenney
2024-05-28  0:35 ` [PATCH v2 17/35] rcu: handle quiescent states for PREEMPT_RCU=n, PREEMPT_COUNT=y Ankur Arora
2024-05-28  0:35 ` [PATCH v2 18/35] rcu: force context-switch " Ankur Arora
2024-05-28  0:35 ` [PATCH v2 19/35] x86/thread_info: define TIF_NEED_RESCHED_LAZY Ankur Arora
2024-05-28  0:35 ` [PATCH v2 20/35] powerpc: add support for PREEMPT_AUTO Ankur Arora
2024-05-28  0:35 ` [PATCH v2 21/35] sched: prepare for lazy rescheduling in resched_curr() Ankur Arora
2024-05-29  9:32   ` Peter Zijlstra
2024-05-28  0:35 ` [PATCH v2 22/35] sched: default preemption policy for PREEMPT_AUTO Ankur Arora
2024-05-28  0:35 ` [PATCH v2 23/35] sched: handle idle preemption " Ankur Arora
2024-05-28  0:35 ` [PATCH v2 24/35] sched: schedule eagerly in resched_cpu() Ankur Arora
2024-05-28  0:35 ` [PATCH v2 25/35] sched/fair: refactor update_curr(), entity_tick() Ankur Arora
2024-05-28  0:35 ` [PATCH v2 26/35] sched/fair: handle tick expiry under lazy preemption Ankur Arora
2024-05-28  0:35 ` [PATCH v2 27/35] sched: support preempt=none under PREEMPT_AUTO Ankur Arora
2024-05-28  0:35 ` [PATCH v2 28/35] sched: support preempt=full " Ankur Arora
2024-05-28  0:35 ` [PATCH v2 29/35] sched: handle preempt=voluntary " Ankur Arora
2024-06-17  3:20   ` Tianchen Ding
2024-06-21 18:58     ` Ankur Arora
2024-06-24  2:35       ` Tianchen Ding
2024-06-25  1:12         ` Ankur Arora
2024-06-26  2:43           ` Tianchen Ding
2024-05-28  0:35 ` [PATCH v2 30/35] sched: latency warn for TIF_NEED_RESCHED_LAZY Ankur Arora
2024-05-28  0:35 ` [PATCH v2 31/35] tracing: support lazy resched Ankur Arora
2024-05-28  0:35 ` [PATCH v2 32/35] Documentation: tracing: add TIF_NEED_RESCHED_LAZY Ankur Arora
2024-05-28  0:35 ` [PATCH v2 33/35] osnoise: handle quiescent states for PREEMPT_RCU=n, PREEMPTION=y Ankur Arora
2024-05-28 13:12   ` Daniel Bristot de Oliveira
2024-05-28  0:35 ` [PATCH v2 34/35] kconfig: decompose ARCH_NO_PREEMPT Ankur Arora
2024-05-28  0:35 ` [PATCH v2 35/35] arch: " Ankur Arora
2024-05-29  6:16 ` [PATCH v2 00/35] PREEMPT_AUTO: support lazy rescheduling Shrikanth Hegde
2024-06-01 11:47   ` Ankur Arora
2024-06-04  7:32     ` Shrikanth Hegde
2024-06-07 16:48       ` Shrikanth Hegde
2024-06-10  7:23         ` Ankur Arora
2024-06-15 15:04           ` Shrikanth Hegde
2024-06-18 18:27             ` Shrikanth Hegde
2024-06-19  2:40               ` Ankur Arora [this message]
2024-06-24 18:37                 ` Shrikanth Hegde
2024-06-27  2:50                   ` Ankur Arora
2024-06-27  5:56                     ` Michael Ellerman
2024-06-27 15:44                       ` Shrikanth Hegde
2024-07-03  5:27                         ` Ankur Arora
2024-08-12 17:32                           ` Shrikanth Hegde
2024-08-12 21:07                             ` Linus Torvalds
2024-08-13  5:40                               ` Ankur Arora
2024-06-05 15:44 ` Sean Christopherson
2024-06-05 17:45   ` Peter Zijlstra

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=871q4td59k.fsf@oracle.com \
    --to=ankur.a.arora@oracle.com \
    --cc=boris.ostrovsky@oracle.com \
    --cc=joel@joelfernandes.org \
    --cc=juri.lelli@redhat.com \
    --cc=konrad.wilk@oracle.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mark.rutland@arm.com \
    --cc=paulmck@kernel.org \
    --cc=peterz@infradead.org \
    --cc=raghavendra.kt@amd.com \
    --cc=rostedt@goodmis.org \
    --cc=sshegde@linux.ibm.com \
    --cc=tglx@linutronix.de \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.