From: Nathan Lynch <nathanl@linux.ibm.com>
To: Michael Ellerman <mpe@ellerman.id.au>,
Nathan Lynch via B4 Submission Endpoint
<devnull+nathanl.linux.ibm.com@kernel.org>,
Nicholas Piggin <npiggin@gmail.com>,
Christophe Leroy <christophe.leroy@csgroup.eu>,
Kajol Jain <kjain@linux.ibm.com>,
Laurent Dufour <ldufour@linux.ibm.com>,
Mahesh J Salgaonkar <mahesh@linux.ibm.com>,
Andrew Donnellan <ajd@linux.ibm.com>,
Nick Child <nnac123@linux.ibm.com>
Cc: linuxppc-dev@lists.ozlabs.org
Subject: Re: [PATCH v2 11/19] powerpc/rtas: add work area allocator
Date: Wed, 08 Feb 2023 08:48:01 -0600 [thread overview]
Message-ID: <87r0v0nrha.fsf@linux.ibm.com> (raw)
In-Reply-To: <87o7q4wera.fsf@mpe.ellerman.id.au>
Michael Ellerman <mpe@ellerman.id.au> writes:
> Nathan Lynch via B4 Submission Endpoint
> <devnull+nathanl.linux.ibm.com@kernel.org> writes:
>> diff --git a/arch/powerpc/include/asm/rtas-work-area.h b/arch/powerpc/include/asm/rtas-work-area.h
>> new file mode 100644
>> index 000000000000..76ccb039cc37
>> --- /dev/null
>> +++ b/arch/powerpc/include/asm/rtas-work-area.h
>> @@ -0,0 +1,45 @@
>> +/* SPDX-License-Identifier: GPL-2.0-only */
>> +#ifndef POWERPC_RTAS_WORK_AREA_H
>> +#define POWERPC_RTAS_WORK_AREA_H
>
> The usual style would be _ASM_POWERPC_RTAS_WORK_AREA_H.
OK. (will change in all new headers)
>> +static struct rtas_work_area_allocator_state {
>> + struct gen_pool *gen_pool;
>> + char *arena;
>> + struct mutex mutex; /* serializes allocations */
>> + struct wait_queue_head wqh;
>> + mempool_t descriptor_pool;
>> + bool available;
>> +} rwa_state_ = {
>> + .mutex = __MUTEX_INITIALIZER(rwa_state_.mutex),
>> + .wqh = __WAIT_QUEUE_HEAD_INITIALIZER(rwa_state_.wqh),
>> +};
>> +static struct rtas_work_area_allocator_state *rwa_state = &rwa_state_;
>
> I assumed the pointer was so you could swap this out at runtime or
> something, but I don't think you do.
>
> Any reason not to drop the pointer and just use rwa_state.foo accessors?
> That would also allow the struct to be anonymous.
>
> Or if you have the pointer you can at least make it NULL prior to init
> and avoid the need for "available".
I think it's there because earlier versions of this that I never posted
had unit tests. I'll either resurrect those or reduce the indirection.
>> +/*
>> + * A single work area buffer and descriptor to serve requests early in
>> + * boot before the allocator is fully initialized.
>> + */
>> +static bool early_work_area_in_use __initdata;
>> +static char early_work_area_buf[SZ_4K] __initdata;
>
> That should be page aligned I think?
Yes. It happens to be safe in this version because ibm,get-system-parameter,
which has no alignment requirement, is the only function used early
enough to use the buffer. But that's too fragile.
>> +static struct rtas_work_area early_work_area __initdata = {
>> + .buf = early_work_area_buf,
>> + .size = sizeof(early_work_area_buf),
>> +};
>> +
>> +
>> +static struct rtas_work_area * __init rtas_work_area_alloc_early(size_t size)
>> +{
>> + WARN_ON(size > early_work_area.size);
>> + WARN_ON(early_work_area_in_use);
>> + early_work_area_in_use = true;
>> + memset(early_work_area.buf, 0, early_work_area.size);
>> + return &early_work_area;
>> +}
>> +
>> +static void __init rtas_work_area_free_early(struct rtas_work_area *work_area)
>> +{
>> + WARN_ON(work_area != &early_work_area);
>> + WARN_ON(!early_work_area_in_use);
>> + early_work_area_in_use = false;
>> +}
>> +
>> +struct rtas_work_area * __ref rtas_work_area_alloc(size_t size)
>> +{
>> + struct rtas_work_area *area;
>> + unsigned long addr;
>> +
>> + might_sleep();
>> +
>> + WARN_ON(size > RTAS_WORK_AREA_MAX_ALLOC_SZ);
>> + size = min_t(size_t, size, RTAS_WORK_AREA_MAX_ALLOC_SZ);
>
> This seems unsafe.
>
> If you return a buffer smaller than the caller asks for they're likely
> to read/write past the end of it and corrupt memory.
OK, let's figure out another way to handle this.
> AFAIK genalloc doesn't have guard pages or anything fancy to save us
> from that - but maybe I'm wrong, I've never used it.
Yeah we would have to build our own thing on top of it. And I don't
think it could be something that traps on access, it would have to be a
check in rtas_work_area_free(), after the fact.
> There's only three callers in the end, seems like we should just return
> NULL if the size is too large and have callers check the return value.
There are more conversions to do, and a property I hope to maintain is
that requests can't fail. Existing users of rtas_data_buf don't have
error paths for failure to acquire the buffer.
I believe the allocation size passed to rtas_work_area_alloc() can be
known at build time in all cases. Maybe we could prevent inappropriate
requests from being built with a compile-time assertion (untested):
/* rtas-work-area.h */
static inline struct rtas_work_area *rtas_work_area_alloc(size_t sz)
{
static_assert(sz < RTAS_WORK_AREA_MAX_ALLOC_SZ);
return __rtas_work_area_alloc(sz);
}
I think this would be OK? If I can't make it work I'll fall back to
returning NULL as you suggest, but it will make for more churn (and
risk) in the conversions.
>> + if (!rwa_state->available) {
>> + area = rtas_work_area_alloc_early(size);
>> + goto out;
>> + }
>> +
>> + /*
>> + * To ensure FCFS behavior and prevent a high rate of smaller
>> + * requests from starving larger ones, use the mutex to queue
>> + * allocations.
>> + */
>> + mutex_lock(&rwa_state->mutex);
>> + wait_event(rwa_state->wqh,
>> + (addr = gen_pool_alloc(rwa_state->gen_pool, size)) != 0);
>> + mutex_unlock(&rwa_state->mutex);
>> +
>> + area = mempool_alloc(&rwa_state->descriptor_pool, GFP_KERNEL);
>> + *area = (typeof(*area)){
>> + .size = size,
>> + .buf = (char *)addr,
>> + };
>
> That is an odd way to write that :)
yeah I'll change it.
>
>> +out:
>> + pr_devel("%ps -> %s() -> buf=%p size=%zu\n",
>> + (void *)_RET_IP_, __func__, area->buf, area->size);
>
> Can we drop those? They need a recompile to enable, so if someone needs
> debugging they can just rewrite them - or use some sort of tracing
> instead.
Sure.
>> +machine_arch_initcall(pseries, rtas_work_area_allocator_init);
>
> Should it live in platforms/pseries then?
Yeah it probably ought to. I am pretty sure the "work area" construct is
PAPR-specific, and I haven't found any evidence that it's used on
non-pseries.
>> +/**
>> + * rtas_work_area_reserve_arena() - reserve memory suitable for RTAS work areas.
>> + */
>> +int __init rtas_work_area_reserve_arena(const phys_addr_t limit)
>> +{
>> + const phys_addr_t align = RTAS_WORK_AREA_ARENA_ALIGN;
>> + const phys_addr_t size = RTAS_WORK_AREA_ARENA_SZ;
>> + const phys_addr_t min = MEMBLOCK_LOW_LIMIT;
>> + const int nid = NUMA_NO_NODE;
>
> This should probably also be restricted to pseries?
Yes.
next prev parent reply other threads:[~2023-02-08 14:49 UTC|newest]
Thread overview: 30+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-02-06 18:54 [PATCH v2 00/19] RTAS maintenance Nathan Lynch via B4 Submission Endpoint
2023-02-06 18:54 ` [PATCH v2 01/19] powerpc/rtas: handle extended delays safely in early boot Nathan Lynch via B4 Submission Endpoint
2023-02-08 11:20 ` Michael Ellerman
2023-02-08 13:14 ` Nathan Lynch
2023-02-10 5:54 ` Michael Ellerman
2023-02-06 18:54 ` [PATCH v2 02/19] powerpc/perf/hv-24x7: add missing RTAS retry status handling Nathan Lynch via B4 Submission Endpoint
2023-02-06 18:54 ` [PATCH v2 03/19] powerpc/pseries/lpar: " Nathan Lynch via B4 Submission Endpoint
2023-02-06 18:54 ` [PATCH v2 04/19] powerpc/pseries/lparcfg: " Nathan Lynch via B4 Submission Endpoint
2023-02-06 18:54 ` [PATCH v2 05/19] powerpc/pseries/setup: " Nathan Lynch via B4 Submission Endpoint
2023-02-06 18:54 ` [PATCH v2 06/19] powerpc/pseries: drop RTAS-based timebase synchronization Nathan Lynch via B4 Submission Endpoint
2023-02-06 18:54 ` [PATCH v2 07/19] powerpc/rtas: improve function information lookups Nathan Lynch via B4 Submission Endpoint
2023-02-08 11:57 ` Michael Ellerman
2023-02-08 13:16 ` Nathan Lynch
2023-02-06 18:54 ` [PATCH v2 08/19] powerpc/rtas: strengthen do_enter_rtas() type safety, drop inline Nathan Lynch via B4 Submission Endpoint
2023-02-06 18:54 ` [PATCH v2 09/19] powerpc/tracing: tracepoints for RTAS entry and exit Nathan Lynch via B4 Submission Endpoint
2023-02-06 18:54 ` [PATCH v2 10/19] powerpc/rtas: add tracepoints around RTAS entry Nathan Lynch via B4 Submission Endpoint
2023-02-06 18:54 ` [PATCH v2 11/19] powerpc/rtas: add work area allocator Nathan Lynch via B4 Submission Endpoint
2023-02-08 11:58 ` Michael Ellerman
2023-02-08 14:48 ` Nathan Lynch [this message]
2023-02-10 6:07 ` Michael Ellerman
2023-02-06 18:54 ` [PATCH v2 12/19] powerpc/pseries/dlpar: use RTAS work area API Nathan Lynch via B4 Submission Endpoint
2023-02-06 18:54 ` [PATCH v2 13/19] powerpc/pseries: PAPR system parameter API Nathan Lynch via B4 Submission Endpoint
2023-02-06 18:54 ` [PATCH v2 14/19] powerpc/pseries: convert CMO probe to papr_sysparm API Nathan Lynch via B4 Submission Endpoint
2023-02-06 18:54 ` [PATCH v2 15/19] powerpc/pseries/lparcfg: convert " Nathan Lynch via B4 Submission Endpoint
2023-02-06 18:54 ` [PATCH v2 16/19] powerpc/pseries/hv-24x7: " Nathan Lynch via B4 Submission Endpoint
2023-02-06 18:54 ` [PATCH v2 17/19] powerpc/pseries/lpar: " Nathan Lynch via B4 Submission Endpoint
2023-02-06 18:54 ` [PATCH v2 18/19] powerpc/rtas: introduce rtas_function_token() API Nathan Lynch via B4 Submission Endpoint
2023-02-08 12:09 ` Michael Ellerman
2023-02-08 15:44 ` Nathan Lynch
2023-02-06 18:54 ` [PATCH v2 19/19] powerpc/rtas: arch-wide function token lookup conversions Nathan Lynch via B4 Submission Endpoint
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87r0v0nrha.fsf@linux.ibm.com \
--to=nathanl@linux.ibm.com \
--cc=ajd@linux.ibm.com \
--cc=christophe.leroy@csgroup.eu \
--cc=devnull+nathanl.linux.ibm.com@kernel.org \
--cc=kjain@linux.ibm.com \
--cc=ldufour@linux.ibm.com \
--cc=linuxppc-dev@lists.ozlabs.org \
--cc=mahesh@linux.ibm.com \
--cc=mpe@ellerman.id.au \
--cc=nnac123@linux.ibm.com \
--cc=npiggin@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).