linuxppc-dev.lists.ozlabs.org archive mirror
 help / color / mirror / Atom feed
From: Nathan Lynch <nathanl@linux.ibm.com>
To: Michael Ellerman <mpe@ellerman.id.au>,
	Nathan Lynch via B4 Submission Endpoint
	<devnull+nathanl.linux.ibm.com@kernel.org>,
	Nicholas Piggin <npiggin@gmail.com>,
	Christophe Leroy <christophe.leroy@csgroup.eu>,
	Kajol Jain <kjain@linux.ibm.com>,
	Laurent Dufour <ldufour@linux.ibm.com>,
	Mahesh J Salgaonkar <mahesh@linux.ibm.com>,
	Andrew Donnellan <ajd@linux.ibm.com>,
	Nick Child <nnac123@linux.ibm.com>
Cc: linuxppc-dev@lists.ozlabs.org
Subject: Re: [PATCH v2 11/19] powerpc/rtas: add work area allocator
Date: Wed, 08 Feb 2023 08:48:01 -0600	[thread overview]
Message-ID: <87r0v0nrha.fsf@linux.ibm.com> (raw)
In-Reply-To: <87o7q4wera.fsf@mpe.ellerman.id.au>

Michael Ellerman <mpe@ellerman.id.au> writes:
> Nathan Lynch via B4 Submission Endpoint
> <devnull+nathanl.linux.ibm.com@kernel.org> writes:
>> diff --git a/arch/powerpc/include/asm/rtas-work-area.h b/arch/powerpc/include/asm/rtas-work-area.h
>> new file mode 100644
>> index 000000000000..76ccb039cc37
>> --- /dev/null
>> +++ b/arch/powerpc/include/asm/rtas-work-area.h
>> @@ -0,0 +1,45 @@
>> +/* SPDX-License-Identifier: GPL-2.0-only */
>> +#ifndef POWERPC_RTAS_WORK_AREA_H
>> +#define POWERPC_RTAS_WORK_AREA_H
>
> The usual style would be _ASM_POWERPC_RTAS_WORK_AREA_H.

OK. (will change in all new headers)

>> +static struct rtas_work_area_allocator_state {
>> +	struct gen_pool *gen_pool;
>> +	char *arena;
>> +	struct mutex mutex; /* serializes allocations */
>> +	struct wait_queue_head wqh;
>> +	mempool_t descriptor_pool;
>> +	bool available;
>> +} rwa_state_ = {
>> +	.mutex = __MUTEX_INITIALIZER(rwa_state_.mutex),
>> +	.wqh = __WAIT_QUEUE_HEAD_INITIALIZER(rwa_state_.wqh),
>> +};
>> +static struct rtas_work_area_allocator_state *rwa_state = &rwa_state_;
>
> I assumed the pointer was so you could swap this out at runtime or
> something, but I don't think you do.
>
> Any reason not to drop the pointer and just use rwa_state.foo accessors?
> That would also allow the struct to be anonymous.
>
> Or if you have the pointer you can at least make it NULL prior to init
> and avoid the need for "available".

I think it's there because earlier versions of this that I never posted
had unit tests. I'll either resurrect those or reduce the indirection.


>> +/*
>> + * A single work area buffer and descriptor to serve requests early in
>> + * boot before the allocator is fully initialized.
>> + */
>> +static bool early_work_area_in_use __initdata;
>> +static char early_work_area_buf[SZ_4K] __initdata;
>
> That should be page aligned I think?

Yes. It happens to be safe in this version because ibm,get-system-parameter,
which has no alignment requirement, is the only function used early
enough to use the buffer. But that's too fragile.


>> +static struct rtas_work_area early_work_area __initdata = {
>> +	.buf = early_work_area_buf,
>> +	.size = sizeof(early_work_area_buf),
>> +};
>> +
>> +
>> +static struct rtas_work_area * __init rtas_work_area_alloc_early(size_t size)
>> +{
>> +	WARN_ON(size > early_work_area.size);
>> +	WARN_ON(early_work_area_in_use);
>> +	early_work_area_in_use = true;
>> +	memset(early_work_area.buf, 0, early_work_area.size);
>> +	return &early_work_area;
>> +}
>> +
>> +static void __init rtas_work_area_free_early(struct rtas_work_area *work_area)
>> +{
>> +	WARN_ON(work_area != &early_work_area);
>> +	WARN_ON(!early_work_area_in_use);
>> +	early_work_area_in_use = false;
>> +}
>> +
>> +struct rtas_work_area * __ref rtas_work_area_alloc(size_t size)
>> +{
>> +	struct rtas_work_area *area;
>> +	unsigned long addr;
>> +
>> +	might_sleep();
>> +
>> +	WARN_ON(size > RTAS_WORK_AREA_MAX_ALLOC_SZ);
>> +	size = min_t(size_t, size, RTAS_WORK_AREA_MAX_ALLOC_SZ);
>
> This seems unsafe.
>
> If you return a buffer smaller than the caller asks for they're likely
> to read/write past the end of it and corrupt memory.

OK, let's figure out another way to handle this.

> AFAIK genalloc doesn't have guard pages or anything fancy to save us
> from that - but maybe I'm wrong, I've never used it.

Yeah we would have to build our own thing on top of it. And I don't
think it could be something that traps on access, it would have to be a
check in rtas_work_area_free(), after the fact.

> There's only three callers in the end, seems like we should just return
> NULL if the size is too large and have callers check the return value.

There are more conversions to do, and a property I hope to maintain is
that requests can't fail. Existing users of rtas_data_buf don't have
error paths for failure to acquire the buffer.

I believe the allocation size passed to rtas_work_area_alloc() can be
known at build time in all cases. Maybe we could prevent inappropriate
requests from being built with a compile-time assertion (untested):

/* rtas-work-area.h */

static inline struct rtas_work_area *rtas_work_area_alloc(size_t sz)
{
	static_assert(sz < RTAS_WORK_AREA_MAX_ALLOC_SZ);
        return __rtas_work_area_alloc(sz);
}

I think this would be OK? If I can't make it work I'll fall back to
returning NULL as you suggest, but it will make for more churn (and
risk) in the conversions.


>> +	if (!rwa_state->available) {
>> +		area = rtas_work_area_alloc_early(size);
>> +		goto out;
>> +	}
>> +
>> +	/*
>> +	 * To ensure FCFS behavior and prevent a high rate of smaller
>> +	 * requests from starving larger ones, use the mutex to queue
>> +	 * allocations.
>> +	 */
>> +	mutex_lock(&rwa_state->mutex);
>> +	wait_event(rwa_state->wqh,
>> +		   (addr = gen_pool_alloc(rwa_state->gen_pool, size)) != 0);
>> +	mutex_unlock(&rwa_state->mutex);
>> +
>> +	area = mempool_alloc(&rwa_state->descriptor_pool, GFP_KERNEL);
>> +	*area = (typeof(*area)){
>> +		.size = size,
>> +		.buf = (char *)addr,
>> +	};
>
> That is an odd way to write that :)

yeah I'll change it.

>
>> +out:
>> +	pr_devel("%ps -> %s() -> buf=%p size=%zu\n",
>> +		 (void *)_RET_IP_, __func__, area->buf, area->size);
>
> Can we drop those? They need a recompile to enable, so if someone needs
> debugging they can just rewrite them - or use some sort of tracing
> instead.

Sure.


>> +machine_arch_initcall(pseries, rtas_work_area_allocator_init);
>
> Should it live in platforms/pseries then?

Yeah it probably ought to. I am pretty sure the "work area" construct is
PAPR-specific, and I haven't found any evidence that it's used on
non-pseries.


>> +/**
>> + * rtas_work_area_reserve_arena() - reserve memory suitable for RTAS work areas.
>> + */
>> +int __init rtas_work_area_reserve_arena(const phys_addr_t limit)
>> +{
>> +	const phys_addr_t align = RTAS_WORK_AREA_ARENA_ALIGN;
>> +	const phys_addr_t size = RTAS_WORK_AREA_ARENA_SZ;
>> +	const phys_addr_t min = MEMBLOCK_LOW_LIMIT;
>> +	const int nid = NUMA_NO_NODE;
>
> This should probably also be restricted to pseries?

Yes.

  reply	other threads:[~2023-02-08 14:49 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-02-06 18:54 [PATCH v2 00/19] RTAS maintenance Nathan Lynch via B4 Submission Endpoint
2023-02-06 18:54 ` [PATCH v2 01/19] powerpc/rtas: handle extended delays safely in early boot Nathan Lynch via B4 Submission Endpoint
2023-02-08 11:20   ` Michael Ellerman
2023-02-08 13:14     ` Nathan Lynch
2023-02-10  5:54       ` Michael Ellerman
2023-02-06 18:54 ` [PATCH v2 02/19] powerpc/perf/hv-24x7: add missing RTAS retry status handling Nathan Lynch via B4 Submission Endpoint
2023-02-06 18:54 ` [PATCH v2 03/19] powerpc/pseries/lpar: " Nathan Lynch via B4 Submission Endpoint
2023-02-06 18:54 ` [PATCH v2 04/19] powerpc/pseries/lparcfg: " Nathan Lynch via B4 Submission Endpoint
2023-02-06 18:54 ` [PATCH v2 05/19] powerpc/pseries/setup: " Nathan Lynch via B4 Submission Endpoint
2023-02-06 18:54 ` [PATCH v2 06/19] powerpc/pseries: drop RTAS-based timebase synchronization Nathan Lynch via B4 Submission Endpoint
2023-02-06 18:54 ` [PATCH v2 07/19] powerpc/rtas: improve function information lookups Nathan Lynch via B4 Submission Endpoint
2023-02-08 11:57   ` Michael Ellerman
2023-02-08 13:16     ` Nathan Lynch
2023-02-06 18:54 ` [PATCH v2 08/19] powerpc/rtas: strengthen do_enter_rtas() type safety, drop inline Nathan Lynch via B4 Submission Endpoint
2023-02-06 18:54 ` [PATCH v2 09/19] powerpc/tracing: tracepoints for RTAS entry and exit Nathan Lynch via B4 Submission Endpoint
2023-02-06 18:54 ` [PATCH v2 10/19] powerpc/rtas: add tracepoints around RTAS entry Nathan Lynch via B4 Submission Endpoint
2023-02-06 18:54 ` [PATCH v2 11/19] powerpc/rtas: add work area allocator Nathan Lynch via B4 Submission Endpoint
2023-02-08 11:58   ` Michael Ellerman
2023-02-08 14:48     ` Nathan Lynch [this message]
2023-02-10  6:07       ` Michael Ellerman
2023-02-06 18:54 ` [PATCH v2 12/19] powerpc/pseries/dlpar: use RTAS work area API Nathan Lynch via B4 Submission Endpoint
2023-02-06 18:54 ` [PATCH v2 13/19] powerpc/pseries: PAPR system parameter API Nathan Lynch via B4 Submission Endpoint
2023-02-06 18:54 ` [PATCH v2 14/19] powerpc/pseries: convert CMO probe to papr_sysparm API Nathan Lynch via B4 Submission Endpoint
2023-02-06 18:54 ` [PATCH v2 15/19] powerpc/pseries/lparcfg: convert " Nathan Lynch via B4 Submission Endpoint
2023-02-06 18:54 ` [PATCH v2 16/19] powerpc/pseries/hv-24x7: " Nathan Lynch via B4 Submission Endpoint
2023-02-06 18:54 ` [PATCH v2 17/19] powerpc/pseries/lpar: " Nathan Lynch via B4 Submission Endpoint
2023-02-06 18:54 ` [PATCH v2 18/19] powerpc/rtas: introduce rtas_function_token() API Nathan Lynch via B4 Submission Endpoint
2023-02-08 12:09   ` Michael Ellerman
2023-02-08 15:44     ` Nathan Lynch
2023-02-06 18:54 ` [PATCH v2 19/19] powerpc/rtas: arch-wide function token lookup conversions Nathan Lynch via B4 Submission Endpoint

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87r0v0nrha.fsf@linux.ibm.com \
    --to=nathanl@linux.ibm.com \
    --cc=ajd@linux.ibm.com \
    --cc=christophe.leroy@csgroup.eu \
    --cc=devnull+nathanl.linux.ibm.com@kernel.org \
    --cc=kjain@linux.ibm.com \
    --cc=ldufour@linux.ibm.com \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=mahesh@linux.ibm.com \
    --cc=mpe@ellerman.id.au \
    --cc=nnac123@linux.ibm.com \
    --cc=npiggin@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).