From: "Alex Bennée" <alex.bennee@linaro.org>
To: Richard Henderson <richard.henderson@linaro.org>
Cc: qemu-devel@nongnu.org, laurent@vivier.eu, pbonzini@redhat.com,
imp@bsdimp.com, f4bug@amsat.org
Subject: Re: [PATCH 22/24] accel/tcg: Use interval tree for user-only page tracking
Date: Thu, 27 Oct 2022 16:59:11 +0100 [thread overview]
Message-ID: <87y1t1yw32.fsf@linaro.org> (raw)
In-Reply-To: <4df39234-6697-61b8-6c56-1bd17b4f9fa8@linaro.org>
Richard Henderson <richard.henderson@linaro.org> writes:
> On 10/26/22 23:36, Alex Bennée wrote:
>> Richard Henderson <richard.henderson@linaro.org> writes:
>>
>>> Finish weaning user-only away from PageDesc.
>>>
>>> Using an interval tree to track page permissions means that
>>> we can represent very large regions efficiently.
>>>
>>> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/290
>>> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/967
>>> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1214
>>> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
>>> ---
>>> accel/tcg/internal.h | 4 +-
>>> accel/tcg/tb-maint.c | 20 +-
>>> accel/tcg/user-exec.c | 614 ++++++++++++++++++++++-----------
>>> tests/tcg/multiarch/test-vma.c | 22 ++
>>> 4 files changed, 451 insertions(+), 209 deletions(-)
>>> create mode 100644 tests/tcg/multiarch/test-vma.c
>>>
>>> diff --git a/accel/tcg/internal.h b/accel/tcg/internal.h
>>> index 250f0daac9..c7e157d1cd 100644
>>> --- a/accel/tcg/internal.h
>>> +++ b/accel/tcg/internal.h
>>> @@ -24,9 +24,7 @@
>>> #endif
>>> typedef struct PageDesc {
>>> -#ifdef CONFIG_USER_ONLY
>>> - unsigned long flags;
>>> -#else
>>> +#ifndef CONFIG_USER_ONLY
>>> QemuSpin lock;
>>> /* list of TBs intersecting this ram page */
>>> uintptr_t first_tb;
>>> diff --git a/accel/tcg/tb-maint.c b/accel/tcg/tb-maint.c
>>> index 14e8e47a6a..694440cb4a 100644
>>> --- a/accel/tcg/tb-maint.c
>>> +++ b/accel/tcg/tb-maint.c
>>> @@ -68,15 +68,23 @@ static void page_flush_tb(void)
>> <snip>
>>> int page_get_flags(target_ulong address)
>>> {
>>> - PageDesc *p;
>>> + PageFlagsNode *p = pageflags_find(address, address);
>>> - p = page_find(address >> TARGET_PAGE_BITS);
>>> + /*
>>> + * See util/interval-tree.c re lockless lookups: no false positives but
>>> + * there are false negatives. If we find nothing, retry with the mmap
>>> + * lock acquired.
>>> + */
>>> if (!p) {
>>> - return 0;
>>> + if (have_mmap_lock()) {
>>> + return 0;
>>> + }
>>> + mmap_lock();
>>> + p = pageflags_find(address, address);
>>> + mmap_unlock();
>>> + if (!p) {
>>> + return 0;
>>> + }
>>> }
>>> return p->flags;
>> To avoid the brain twisting following locks and multiple return legs
>> how about this:
>> int page_get_flags(target_ulong address)
>> {
>> PageFlagsNode *p = pageflags_find(address, address);
>> /*
>> * See util/interval-tree.c re lockless lookups: no false positives but
>> * there are false negatives. If we had the lock and found
>> * nothing we are done, otherwise retry with the mmap lock acquired.
>> */
>> if (have_mmap_lock()) {
>> return p ? p->flags : 0;
>> }
>> mmap_lock();
>> p = pageflags_find(address, address);
>> mmap_unlock();
>> return p ? p->flags : 0;
>> }
>
> I'm unwilling to put an expensive test like a function call
> (have_mmap_lock) before an inexpensive test like pointer != NULL.
Is it really that more expensive?
> I don't see what's so brain twisting about the code as is. The lock
> tightly surrounds a single statement, with a couple of pointer tests.
Sure, I guess I'm just trying to avoid having so many returns out of
the code at various levels of nesting. The page_get_target_data code is
harder to follow. What about:
int page_get_flags(target_ulong address)
{
PageFlagsNode *p = pageflags_find(address, address);
/*
* See util/interval-tree.c re lockless lookups: no false positives but
* there are false negatives. If we had the lock and found
* nothing we are done, otherwise retry with the mmap lock acquired.
*/
if (p) {
return p->flags;
} else if (have_mmap_lock()) {
return 0;
}
mmap_lock();
p = pageflags_find(address, address);
mmap_unlock();
return p ? p->flags : 0;
}
and:
static IntervalTreeNode * new_target_data_locked(target_ulong region)
{
IntervalTreeNode *n;
TargetPageDataNode *t;
t = g_new0(TargetPageDataNode, 1);
n = &t->itree;
n->start = region;
n->last = region | ~TBD_MASK;
interval_tree_insert(n, &targetdata_root);
return n;
}
static inline void * get_target_data(IntervalTreeNode *n,
target_ulong region, target_ulong page)
{
TargetPageDataNode *t;
t = container_of(n, TargetPageDataNode, itree);
return t->data[(page - region) >> TARGET_PAGE_BITS];
}
void *page_get_target_data(target_ulong address)
{
IntervalTreeNode *n;
target_ulong page, region;
page = address & TARGET_PAGE_MASK;
region = address & TBD_MASK;
n = interval_tree_iter_first(&targetdata_root, page, page);
if (n) {
return get_target_data(n, region, page);
}
/*
* See util/interval-tree.c re lockless lookups: no false positives but
* there are false negatives. If we find nothing but had the lock
* then we need a new node, otherwise try again under lock and
* potentially allocate a new node.
*/
if (have_mmap_lock()) {
n = new_target_data_locked(region);
return get_target_data(n, region, page);
} else {
mmap_lock();
n = interval_tree_iter_first(&targetdata_root, page, page);
if (!n) {
n = new_target_data_locked(region);
}
mmap_unlock();
}
return get_target_data(n, region, page);
}
>
>>> +/*
>>> + * Test very large vma allocations.
>>> + * The qemu out-of-memory condition was within the mmap syscall itself.
>>> + * If the syscall actually returns with MAP_FAILED, the test succeeded.
>>> + */
>>> +#include <sys/mman.h>
>>> +
>>> +int main()
>>> +{
>>> + int n = sizeof(size_t) == 4 ? 32 : 45;
>>> +
>>> + for (int i = 28; i < n; i++) {
>>> + size_t l = (size_t)1 << i;
>>> + void *p = mmap(0, l, PROT_NONE,
>>> + MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
>>> + if (p == MAP_FAILED) {
>>> + break;
>>> + }
>>> + munmap(p, l);
>>> + }
>>> + return 0;
>>> +}
>> So is the failure mode here we actually seg or bus out?
>
> SEGV or KILL (via oom) depending on the state of the system. If the
> host is *really* beefy, it may even complete but with an unreasonable
> timeout.
>
> r~
--
Alex Bennée
next prev parent reply other threads:[~2022-10-27 18:13 UTC|newest]
Thread overview: 55+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-10-06 3:10 [PATCH 00/24] accel/tcg: Rewrite user-only vma tracking Richard Henderson
2022-10-06 3:10 ` [PATCH 01/24] util: Add interval-tree.c Richard Henderson
2022-10-25 8:40 ` Alex Bennée
2022-10-25 10:36 ` Richard Henderson
2022-10-06 3:10 ` [PATCH 02/24] accel/tcg: Make page_alloc_target_data allocation constant Richard Henderson
2022-10-25 8:45 ` Alex Bennée
2022-10-06 3:10 ` [PATCH 03/24] accel/tcg: Remove disabled debug in translate-all.c Richard Henderson
2022-10-25 8:46 ` Alex Bennée
2022-10-06 3:10 ` [PATCH 04/24] accel/tcg: Split out PageDesc to internal.h Richard Henderson
2022-10-25 8:47 ` Alex Bennée
2022-10-06 3:10 ` [PATCH 05/24] accel/tcg: Split out tb-maint.c Richard Henderson
2022-10-25 8:49 ` Alex Bennée
2022-10-06 3:10 ` [PATCH 06/24] accel/tcg: Move assert_no_pages_locked to internal.h Richard Henderson
2022-10-25 8:49 ` Alex Bennée
2022-10-06 3:10 ` [PATCH 07/24] accel/tcg: Drop cpu_get_tb_cpu_state from TARGET_HAS_PRECISE_SMC Richard Henderson
2022-10-25 8:50 ` Alex Bennée
2022-10-06 3:10 ` [PATCH 08/24] accel/tcg: Remove duplicate store to tb->page_addr[] Richard Henderson
2022-10-25 9:12 ` Alex Bennée
2022-10-06 3:10 ` [PATCH 09/24] accel/tcg: Introduce tb_{set_}page_addr{0,1} Richard Henderson
2022-10-25 9:47 ` Alex Bennée
2022-10-06 3:10 ` [PATCH 10/24] accel/tcg: Rename tb_invalidate_phys_page Richard Henderson
2022-10-25 9:49 ` Alex Bennée
2022-10-06 3:11 ` [PATCH 11/24] accel/tcg: Rename tb_invalidate_phys_page_range and drop end parameter Richard Henderson
2022-10-25 13:21 ` Alex Bennée
2022-10-06 3:11 ` [PATCH 12/24] accel/tcg: Unify declarations of tb_invalidate_phys_range Richard Henderson
2022-10-25 13:24 ` Alex Bennée
2022-10-06 3:11 ` [PATCH 13/24] accel/tcg: Use tb_invalidate_phys_page in page_set_flags Richard Henderson
2022-10-06 3:11 ` [PATCH 14/24] accel/tcg: Call tb_invalidate_phys_page for PAGE_RESET Richard Henderson
2022-10-25 15:42 ` Alex Bennée
2022-10-25 20:55 ` Richard Henderson
2022-10-06 3:11 ` [PATCH 15/24] accel/tcg: Use interval tree for TBs in user-only mode Richard Henderson
2022-10-25 15:58 ` Alex Bennée
2022-10-25 21:19 ` Richard Henderson
2022-10-06 3:11 ` [PATCH 16/24] accel/tcg: Use page_reset_target_data in page_set_flags Richard Henderson
2022-10-25 16:12 ` Alex Bennée
2022-10-06 3:11 ` [PATCH 17/24] accel/tcg: Use tb_invalidate_phys_range " Richard Henderson
2022-10-25 16:14 ` Alex Bennée
2022-10-06 3:11 ` [PATCH 18/24] accel/tcg: Move TARGET_PAGE_DATA_SIZE impl to user-exec.c Richard Henderson
2022-10-25 16:15 ` Alex Bennée
2022-10-06 3:11 ` [PATCH 19/24] accel/tcg: Simplify page_get/alloc_target_data Richard Henderson
2022-10-25 16:19 ` Alex Bennée
2022-10-06 3:11 ` [PATCH 20/24] accel/tcg: Use interval tree for TARGET_PAGE_DATA_SIZE Richard Henderson
2022-10-25 19:30 ` Alex Bennée
2022-10-25 21:29 ` Richard Henderson
2022-10-06 3:11 ` [PATCH 21/24] accel/tcg: Move page_{get,set}_flags to user-exec.c Richard Henderson
2022-10-26 12:35 ` [PATCH 21/24] accel/tcg: Move page_{get, set}_flags " Alex Bennée
2022-10-06 3:11 ` [PATCH 22/24] accel/tcg: Use interval tree for user-only page tracking Richard Henderson
2022-10-26 13:36 ` Alex Bennée
2022-10-27 10:20 ` Richard Henderson
2022-10-27 15:59 ` Alex Bennée [this message]
2022-10-27 21:38 ` Richard Henderson
2022-10-06 3:11 ` [PATCH 23/24] accel/tcg: Move PageDesc tree into tb-maint.c for system Richard Henderson
2022-10-06 3:11 ` [PATCH 24/24] accel/tcg: Move remainder of page locking to tb-maint.c Richard Henderson
2022-10-26 13:48 ` Alex Bennée
2022-10-24 23:05 ` [PATCH 00/24] accel/tcg: Rewrite user-only vma tracking Richard Henderson
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87y1t1yw32.fsf@linaro.org \
--to=alex.bennee@linaro.org \
--cc=f4bug@amsat.org \
--cc=imp@bsdimp.com \
--cc=laurent@vivier.eu \
--cc=pbonzini@redhat.com \
--cc=qemu-devel@nongnu.org \
--cc=richard.henderson@linaro.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.