* [PATCH 6.12.y-cip] mm: page_alloc: Add kernel parameter to select maximum PCP batch scale number
@ 2025-10-27 9:26 Claudiu
From: Claudiu @ 2025-10-27 9:26 UTC (permalink / raw)
To: nobuhiro1.iwamatsu, pavel; +Cc: claudiu.beznea, cip-dev
From: Claudiu Beznea <claudiu.beznea.uj@bp.renesas.com>
Commit 52166607ecc9 ("mm: restrict the pcp batch scale factor to avoid
too long latency") made the default PCP (Per-CPU Pageset) batch scale
factor a configuration option, CONFIG_PCP_BATCH_SCALE_MAX.
The ARM64 defconfig has CONFIG_PCP_BATCH_SCALE_MAX=5. This defconfig
is used by a wide range of SoCs.
The Renesas RZ/G3S SoC is a single-CPU SoC with L1$ (32 KiB I-cache,
32 KiB D-cache) and L3$ (256 KiB), but no L2$. It is currently used in
a configuration with 1 GiB of RAM. In this configuration, starting with
commit 52166607ecc9 ("mm: restrict the pcp batch scale factor to avoid too
long latency"), the "bonnie++ -d /mnt -u root" benchmark takes ~14 minutes,
while previously it took ~10 minutes. The /mnt directory is mounted on an SD
card. The same behavior is reproduced on similar Renesas single-core devices
(e.g., the Renesas RZ/G2UL). The bonnie++ version used was 1.04.
Add a new kernel parameter to allow systems like Renesas RZ/G3S to
continue to have the same performance numbers with the default mainline
ARM64 config. With pcp_batch_scale_max=5 (the default value) the bonnie++
benchmark takes ~14 minutes, while with pcp_batch_scale_max=0 it takes
~10 minutes.
Signed-off-by: Claudiu Beznea <claudiu.beznea.uj@bp.renesas.com>
---
Hi,
This patch was posted as RFC at [1] but got no feedback. Starting with
kernel v6.17 the behaviour was restored by commits:
18ebe55a9236 ("mm/readahead: terminate async readahead on natural boundary")
38b0ece6d763 ("mm/filemap: allow arch to request folio size for exec memory")
However, these commits are not fixes and depend on other updates and
cleanups done in the memory management subsystem between versions v6.9
and v6.17.
Please give your feedback.
Thank you,
Claudiu
[1] https://lore.kernel.org/all/20241126095138.1832464-1-claudiu.beznea.uj@bp.renesas.com/
.../admin-guide/kernel-parameters.txt | 6 +++++
mm/page_alloc.c | 26 ++++++++++++++-----
2 files changed, 26 insertions(+), 6 deletions(-)
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 8724c2c580b8..4fba1837d7ca 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -4734,6 +4734,12 @@
for debug and development, but should not be
needed on a platform with proper driver support.
+ pcp_batch_scale_max=n
+ Format: <integer>
+ Range: 0-6
+ Default: CONFIG_PCP_BATCH_SCALE_MAX
+ Sets the maximum scale factor for the PCP batch scaling algorithm.
+
pdcchassis= [PARISC,HW] Disable/Enable PDC Chassis Status codes at
boot time.
Format: { 0 | 1 }
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 752576749db9..468e63b8b979 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -163,6 +163,20 @@ static DEFINE_MUTEX(pcp_batch_high_lock);
#define pcp_spin_unlock(ptr) \
pcpu_spin_unlock(lock, ptr)
+static unsigned int pcp_batch_scale_max = CONFIG_PCP_BATCH_SCALE_MAX;
+#define MAX_PCP_BATCH 6
+
+static int __init setup_pcp_batch_scale_max(char *str)
+{
+ get_option(&str, (int *)&pcp_batch_scale_max);
+
+ if (pcp_batch_scale_max > MAX_PCP_BATCH)
+ pcp_batch_scale_max = MAX_PCP_BATCH;
+
+ return 1;
+}
+__setup("pcp_batch_scale_max=", setup_pcp_batch_scale_max);
+
#ifdef CONFIG_USE_PERCPU_NUMA_NODE_ID
DEFINE_PER_CPU(int, numa_node);
EXPORT_PER_CPU_SYMBOL(numa_node);
@@ -2376,7 +2390,7 @@ int decay_pcp_high(struct zone *zone, struct per_cpu_pages *pcp)
* control latency. This caps pcp->high decrement too.
*/
if (pcp->high > high_min) {
- pcp->high = max3(pcp->count - (batch << CONFIG_PCP_BATCH_SCALE_MAX),
+ pcp->high = max3(pcp->count - (batch << pcp_batch_scale_max),
pcp->high - (pcp->high >> 3), high_min);
if (pcp->high > high_min)
todo++;
@@ -2426,7 +2440,7 @@ static void drain_pages_zone(unsigned int cpu, struct zone *zone)
count = pcp->count;
if (count) {
int to_drain = min(count,
- pcp->batch << CONFIG_PCP_BATCH_SCALE_MAX);
+ pcp->batch << pcp_batch_scale_max);
free_pcppages_bulk(zone, to_drain, pcp, 0);
count -= to_drain;
@@ -2554,7 +2568,7 @@ static int nr_pcp_free(struct per_cpu_pages *pcp, int batch, int high, bool free
/* Free as much as possible if batch freeing high-order pages. */
if (unlikely(free_high))
- return min(pcp->count, batch << CONFIG_PCP_BATCH_SCALE_MAX);
+ return min(pcp->count, batch << pcp_batch_scale_max);
/* Check for PCP disabled or boot pageset */
if (unlikely(high < batch))
@@ -2586,7 +2600,7 @@ static int nr_pcp_high(struct per_cpu_pages *pcp, struct zone *zone,
return 0;
if (unlikely(free_high)) {
- pcp->high = max(high - (batch << CONFIG_PCP_BATCH_SCALE_MAX),
+ pcp->high = max(high - (batch << pcp_batch_scale_max),
high_min);
return 0;
}
@@ -2656,7 +2670,7 @@ static void free_unref_page_commit(struct zone *zone, struct per_cpu_pages *pcp,
} else if (pcp->flags & PCPF_PREV_FREE_HIGH_ORDER) {
pcp->flags &= ~PCPF_PREV_FREE_HIGH_ORDER;
}
- if (pcp->free_count < (batch << CONFIG_PCP_BATCH_SCALE_MAX))
+ if (pcp->free_count < (batch << pcp_batch_scale_max))
pcp->free_count += (1 << order);
high = nr_pcp_high(pcp, zone, batch, free_high);
if (pcp->count >= high) {
@@ -2999,7 +3013,7 @@ static int nr_pcp_alloc(struct per_cpu_pages *pcp, struct zone *zone, int order)
* subsequent allocation of order-0 pages without any freeing.
*/
if (batch <= max_nr_alloc &&
- pcp->alloc_factor < CONFIG_PCP_BATCH_SCALE_MAX)
+ pcp->alloc_factor < pcp_batch_scale_max)
pcp->alloc_factor++;
batch = min(batch, max_nr_alloc);
}
--
2.43.0
* Re: [PATCH 6.12.y-cip] mm: page_alloc: Add kernel parameter to select maximum PCP batch scale number
From: Pavel Machek @ 2025-10-29 9:24 UTC (permalink / raw)
To: Claudiu; +Cc: nobuhiro1.iwamatsu, cip-dev
Hi!
> Commit 52166607ecc9 ("mm: restrict the pcp batch scale factor to avoid
> too long latency") made the default PCP (Per-CPU Pageset) batch scale
> factor a configuration option, CONFIG_PCP_BATCH_SCALE_MAX.
>
> The ARM64 defconfig has CONFIG_PCP_BATCH_SCALE_MAX=5. This defconfig
> is used by a wide range of SoCs.
>
> The Renesas RZ/G3S SoC is a single-CPU SoC with L1$ (32 KiB I-cache,
> 32 KiB D-cache) and L3$ (256 KiB), but no L2$. It is currently used in
> a configuration with 1 GiB of RAM. In this configuration, starting with
> commit 52166607ecc9 ("mm: restrict the pcp batch scale factor to avoid too
> long latency"), the "bonnie++ -d /mnt -u root" benchmark takes ~14 minutes,
> while previously it took ~10 minutes. The /mnt directory is mounted on an SD
> card. The same behavior is reproduced on similar Renesas single-core devices
> (e.g., the Renesas RZ/G2UL). The bonnie++ version used was 1.04.
>
> Add a new kernel parameter to allow systems like Renesas RZ/G3S to
> continue to have the same performance numbers with the default mainline
...
> This patch was posted as RFC at [1] but got no feedback. Starting with
> kernel v6.17 the behaviour was restored by commits:
>
> 18ebe55a9236 ("mm/readahead: terminate async readahead on natural boundary")
> 38b0ece6d763 ("mm/filemap: allow arch to request folio size for exec
> memory")
Ok, so this is not mainline, because mainline has a different solution.
It is also not queued for stable.
It introduces a kernel parameter that will be special to -cip, seen
neither in mainline nor in -stable.
Presumably you'll be telling your customers to set the command line
parameter. Would it make sense to ask them to tune the config instead?
Best regards,
Pavel
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -4734,6 +4734,12 @@
> for debug and development, but should not be
> needed on a platform with proper driver support.
>
> + pcp_batch_scale_max=n
> + Format: <integer>
> + Range: 0-6
> + Default: CONFIG_PCP_BATCH_SCALE_MAX
> + Sets the maximum scale factor for the PCP batch scaling algorithm.
> +
> pdcchassis= [PARISC,HW] Disable/Enable PDC Chassis Status codes at
> boot time.
> Format: { 0 | 1 }
--
In cooperation with DENX Software Engineering GmbH, HRB 165235 Munich,
Office: Kirchenstr.5, D-82194 Groebenzell, Germany
* Re: [PATCH 6.12.y-cip] mm: page_alloc: Add kernel parameter to select maximum PCP batch scale number
From: claudiu beznea @ 2025-10-29 14:30 UTC (permalink / raw)
To: Pavel Machek; +Cc: nobuhiro1.iwamatsu, cip-dev
Hi, Pavel,
On 10/29/25 11:24, Pavel Machek wrote:
> Hi!
>
>> Commit 52166607ecc9 ("mm: restrict the pcp batch scale factor to avoid
>> too long latency") made the default PCP (Per-CPU Pageset) batch scale
>> factor a configuration option, CONFIG_PCP_BATCH_SCALE_MAX.
>>
>> The ARM64 defconfig has CONFIG_PCP_BATCH_SCALE_MAX=5. This defconfig
>> is used by a wide range of SoCs.
>>
>> The Renesas RZ/G3S SoC is a single-CPU SoC with L1$ (32 KiB I-cache,
>> 32 KiB D-cache) and L3$ (256 KiB), but no L2$. It is currently used in
>> a configuration with 1 GiB of RAM. In this configuration, starting with
>> commit 52166607ecc9 ("mm: restrict the pcp batch scale factor to avoid too
>> long latency"), the "bonnie++ -d /mnt -u root" benchmark takes ~14 minutes,
>> while previously it took ~10 minutes. The /mnt directory is mounted on an SD
>> card. The same behavior is reproduced on similar Renesas single-core devices
>> (e.g., the Renesas RZ/G2UL). The bonnie++ version used was 1.04.
>>
>> Add a new kernel parameter to allow systems like Renesas RZ/G3S to
>> continue to have the same performance numbers with the default mainline
> ...
>
>> This patch was posted as RFC at [1] but got no feedback. Starting with
>> kernel v6.17 the behaviour was restored by commits:
>>
>> 18ebe55a9236 ("mm/readahead: terminate async readahead on natural boundary")
>> 38b0ece6d763 ("mm/filemap: allow arch to request folio size for exec
>> memory")
>
> Ok, so this is not mainline, because mainline has a different solution.
>
> It is also not queued for stable.
>
> It introduces a kernel parameter that will be special to -cip, seen
> neither in mainline nor in -stable.
>
> Presumably you'll be telling your customers to set the command line
> parameter.
Yes, the idea was to have the generic built image tuned through the command
line parameter.
> Would it make sense to ask them to tune the config instead?
We wanted to avoid updating the generic kernel config so as not to affect
the more powerful devices sharing the same defconfig. We can update the
config in cip-kernel-config instead, but that may have a negative impact on
the more powerful devices. Please let me know if this approach fits better
for you.
The commit description specifies:
Although it is reasonable to use 5 as max batch scale factor for the
systems tested, there are also slower systems. Where smaller value should
be used to constrain the page allocation/freeing latency.
Thank you,
Claudiu