* [PATCH v4 0/5] Some fixes about vgic-its @ 2024-11-07 21:41 Jing Zhang 2024-11-07 21:41 ` [PATCH v4 1/5] KVM: selftests: aarch64: Add VGIC selftest for save/restore ITS table mappings Jing Zhang ` (5 more replies) 0 siblings, 6 replies; 12+ messages in thread From: Jing Zhang @ 2024-11-07 21:41 UTC (permalink / raw) To: KVM, KVMARM, ARMLinux, Marc Zyngier, Oliver Upton, Joey Gouly, Zenghui Yu, Suzuki K Poulose, Kunkun Jiang Cc: Paolo Bonzini, Andre Przywara, Colton Lewis, Raghavendra Rao Ananta, Shusen Li, Eric Auger, Jing Zhang This patch series addresses a critical issue in the VGIC ITS tables' save/restore mechanism, accompanied by a comprehensive selftest for bug reproduction and verification. The fix is originally from Kunkun Jiang at [1]. The identified bug manifests as a failure in VM suspend/resume operations. The root cause lies in the repeated suspend attempts often required for successful VM suspension, coupled with concurrent device interrupt registration and freeing. This concurrency leads to inconsistencies in ITS mappings before the save operation, potentially leaving orphaned Device Translation Entries (DTEs) and Interrupt Translation Entries (ITEs) in the respective tables. During the subsequent restore operation, encountering these orphaned entries can result in two error scenarios: * EINVAL Error: If an orphaned entry lacks a corresponding collection ID, the restore operation fails with an EINVAL error. * Mapping Corruption: If an orphaned entry possesses a valid collection ID, the restore operation may succeed but with incorrect or lost mappings, compromising system integrity. The provided selftest facilitates the reproduction of both error scenarios: * EINVAL Reproduction: Execute ./vgic_its_tables without any options. * Mapping Corruption Reproduction: Execute ./vgic_its_tables -s The -s option enforces identical collection IDs for all mappings. * A workaround within the selftest involves clearing the tables before the save operation using the command ./vgic_its_tables -c. With this, we can run the the selftest successfully on host w/o the fix. --- * v3 -> v4: - Added two helper functions for table entry read/write in guest memory. - Move selftest as the first patch to easily run on a host without the fix. * v2 -> v3: - Rebased to v6.12-rc6 - Fixed some typos - Added a selftest for bug reproduction and verification * v1 -> v2: - Replaced BUG_ON() with KVM_BUG_ON() [1] https://lore.kernel.org/linux-arm-kernel/20240704142319.728-1-jiangkunkun@huawei.com --- Jing Zhang (2): KVM: selftests: aarch64: Add VGIC selftest for save/restore ITS table mappings KVM: arm64: vgic-its: Add read/write helpers on ITS table entries. Kunkun Jiang (3): KVM: arm64: vgic-its: Add a data length check in vgic_its_save_* KVM: arm64: vgic-its: Clear DTE when MAPD unmaps a device KVM: arm64: vgic-its: Clear ITE when DISCARD frees an ITE arch/arm64/kvm/vgic/vgic-its.c | 31 +- arch/arm64/kvm/vgic/vgic.h | 23 + tools/testing/selftests/kvm/Makefile | 1 + .../selftests/kvm/aarch64/vgic_its_tables.c | 565 ++++++++++++++++++ .../kvm/include/aarch64/gic_v3_its.h | 3 +- .../testing/selftests/kvm/include/kvm_util.h | 4 +- .../selftests/kvm/lib/aarch64/gic_v3_its.c | 24 +- 7 files changed, 631 insertions(+), 20 deletions(-) create mode 100644 tools/testing/selftests/kvm/aarch64/vgic_its_tables.c base-commit: 59b723cd2adbac2a34fc8e12c74ae26ae45bf230 -- 2.47.0.277.g8800431eea-goog ^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH v4 1/5] KVM: selftests: aarch64: Add VGIC selftest for save/restore ITS table mappings 2024-11-07 21:41 [PATCH v4 0/5] Some fixes about vgic-its Jing Zhang @ 2024-11-07 21:41 ` Jing Zhang 2024-11-07 21:41 ` [PATCH v4 2/5] KVM: arm64: vgic-its: Add read/write helpers on ITS table entries Jing Zhang ` (4 subsequent siblings) 5 siblings, 0 replies; 12+ messages in thread From: Jing Zhang @ 2024-11-07 21:41 UTC (permalink / raw) To: KVM, KVMARM, ARMLinux, Marc Zyngier, Oliver Upton, Joey Gouly, Zenghui Yu, Suzuki K Poulose, Kunkun Jiang Cc: Paolo Bonzini, Andre Przywara, Colton Lewis, Raghavendra Rao Ananta, Shusen Li, Eric Auger, Jing Zhang This patch adds a selftest to verify the integrity of GIC ITS table mappings after save/restore operations, triggered via the KVM_DEV_ARM_ITS_[SAVE|RESTORE]_TABLES ioctls. The current save/restore mechanism can exhibit inconsistencies due to concurrent device interrupt registration and freeing during repeated suspend attempts. This can lead to orphaned Device Translation Entries (DTEs) and Interrupt Translation Entries (ITEs), causing EINVAL errors or mapping corruption during restore. EINVAL Error: If an orphaned entry lacks a corresponding collection ID, the restore operation fails with an EINVAL error. Mapping Corruption: If an orphaned entry possesses a valid collection ID, the restore operation may succeed but with incorrect or lost mappings, compromising system integrity. This selftest reproduces these error scenarios: * EINVAL: `./vgic_its_tables` * Mapping corruption: `./vgic_its_tables -s` The -s option enforces identical collection IDs for all mappings. The `-c` option demonstrates a potential fix by clearing the tables before saving. Signed-off-by: Jing Zhang <jingzhangos@google.com> --- tools/testing/selftests/kvm/Makefile | 1 + .../selftests/kvm/aarch64/vgic_its_tables.c | 565 ++++++++++++++++++ .../kvm/include/aarch64/gic_v3_its.h | 3 +- .../testing/selftests/kvm/include/kvm_util.h | 4 +- .../selftests/kvm/lib/aarch64/gic_v3_its.c | 24 +- 5 files changed, 591 insertions(+), 6 deletions(-) create mode 100644 tools/testing/selftests/kvm/aarch64/vgic_its_tables.c diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile index 156fbfae940f..9cba573c23f3 100644 --- a/tools/testing/selftests/kvm/Makefile +++ b/tools/testing/selftests/kvm/Makefile @@ -163,6 +163,7 @@ TEST_GEN_PROGS_aarch64 += aarch64/smccc_filter TEST_GEN_PROGS_aarch64 += aarch64/vcpu_width_config TEST_GEN_PROGS_aarch64 += aarch64/vgic_init TEST_GEN_PROGS_aarch64 += aarch64/vgic_irq +TEST_GEN_PROGS_aarch64 += aarch64/vgic_its_tables TEST_GEN_PROGS_aarch64 += aarch64/vgic_lpi_stress TEST_GEN_PROGS_aarch64 += aarch64/vpmu_counter_access TEST_GEN_PROGS_aarch64 += aarch64/no-vgic-v3 diff --git a/tools/testing/selftests/kvm/aarch64/vgic_its_tables.c b/tools/testing/selftests/kvm/aarch64/vgic_its_tables.c new file mode 100644 index 000000000000..fd4e7cbf7bc0 --- /dev/null +++ b/tools/testing/selftests/kvm/aarch64/vgic_its_tables.c @@ -0,0 +1,565 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * vgic_its_tables - VGIC selftest for save/restore ITS table mappings + * + * Copyright (c) 2024 Google LLC + */ + +#include <linux/sizes.h> +#include <pthread.h> +#include <stdatomic.h> +#include <sys/sysinfo.h> + +#include "kvm_util.h" +#include "gic.h" +#include "gic_v3.h" +#include "gic_v3_its.h" +#include "processor.h" +#include "ucall.h" +#include "vgic.h" +#include "kselftest.h" + + +#define GIC_LPI_OFFSET 8192 +#define TEST_MEMSLOT_INDEX 1 +#define TABLE_SIZE SZ_64K +#define DEFAULT_NR_L2 4ULL +#define DTE_SIZE 8ULL +#define ITE_SIZE 8ULL +#define NR_EVENTS (TABLE_SIZE / ITE_SIZE) +/* We only have 64K PEND/PROP tables */ +#define MAX_NR_L2 ((TABLE_SIZE - GIC_LPI_OFFSET) * DTE_SIZE / TABLE_SIZE) + +static vm_paddr_t gpa_base; + +static struct kvm_vm *vm; +static struct kvm_vcpu *vcpu; +static int gic_fd, its_fd; +static u32 collection_id = 0; + +struct event_id_block { + u32 start; + u32 size; +}; + +static struct mappings_tracker { + struct event_id_block *devices; + struct event_id_block *devices_va; +} mtracker; + +static struct test_data { + vm_paddr_t l1_device_table; + vm_paddr_t l2_device_tables; + vm_paddr_t collection_table; + vm_paddr_t cmdq_base; + void *cmdq_base_va; + vm_paddr_t itt_tables; + + vm_paddr_t lpi_prop_table; + vm_paddr_t lpi_pend_tables; + + int control_cmd; + bool clear_before_save; + bool same_coll_id; + size_t nr_l2_tables; + size_t nr_devices; +} td = { + .clear_before_save = false, + .same_coll_id = false, + .nr_l2_tables = DEFAULT_NR_L2, + .nr_devices = DEFAULT_NR_L2 * TABLE_SIZE / DTE_SIZE, +}; + +static void guest_its_mappings_clear(void) +{ + memset((void *)td.l2_device_tables, 0, TABLE_SIZE * td.nr_l2_tables); + memset((void *)td.collection_table, 0, TABLE_SIZE); + memset((void *)td.itt_tables, 0, td.nr_devices * TABLE_SIZE); +} + +static void guest_its_unmap_all(bool update_tracker) +{ + u32 device_id, event_id; + + for (device_id = 0; device_id < td.nr_devices; device_id++) { + vm_paddr_t itt_base = td.itt_tables + (device_id * TABLE_SIZE); + u32 start_id = mtracker.devices[device_id].start; + u32 end_id = start_id + mtracker.devices[device_id].size; + + for (event_id = start_id; event_id < end_id ; event_id++) + its_send_discard_cmd(td.cmdq_base_va, + device_id, event_id); + + if (end_id - start_id > 0) + its_send_mapd_cmd(td.cmdq_base_va, device_id, + itt_base, TABLE_SIZE, false); + + if (update_tracker) { + mtracker.devices[device_id].start = 0; + mtracker.devices[device_id].size = 0; + } + + } + + for (u32 i= 0; i <= collection_id; i++) + its_send_mapc_cmd(td.cmdq_base_va, 0, i, false); +} + +static void guest_its_map_single_event(u32 device_id, u32 event_id, u32 coll_id) +{ + u32 intid = GIC_LPI_OFFSET; + + guest_its_unmap_all(true); + + its_send_mapc_cmd(td.cmdq_base_va, guest_get_vcpuid(), coll_id, true); + its_send_mapd_cmd(td.cmdq_base_va, device_id, + td.itt_tables + (device_id * TABLE_SIZE), TABLE_SIZE, true); + its_send_mapti_cmd(td.cmdq_base_va, device_id, + event_id, coll_id, intid); + + + mtracker.devices[device_id].start = event_id; + mtracker.devices[device_id].size = 1; +} + +static void guest_its_map_event_per_device(u32 event_id, u32 coll_id) +{ + u32 device_id, intid = GIC_LPI_OFFSET; + + guest_its_unmap_all(true); + + its_send_mapc_cmd(td.cmdq_base_va, guest_get_vcpuid(), coll_id, true); + + for (device_id = 0; device_id < td.nr_devices; device_id++) { + vm_paddr_t itt_base = td.itt_tables + (device_id * TABLE_SIZE); + + its_send_mapd_cmd(td.cmdq_base_va, device_id, + itt_base, TABLE_SIZE, true); + + its_send_mapti_cmd(td.cmdq_base_va, device_id, + event_id, coll_id, intid++); + + mtracker.devices[device_id].start = event_id; + mtracker.devices[device_id].size = 1; + + } +} + +static void guest_setup_gic(void) +{ + u32 cpuid = guest_get_vcpuid(); + + gic_init(GIC_V3, 1); + gic_rdist_enable_lpis(td.lpi_prop_table, TABLE_SIZE, + td.lpi_pend_tables + (cpuid * TABLE_SIZE)); + + guest_its_mappings_clear(); + + its_init(td.collection_table, TABLE_SIZE, + td.l1_device_table, TABLE_SIZE, + td.cmdq_base, TABLE_SIZE, true); +} + +enum { + GUEST_EXIT, + MAP_INIT, + MAP_INIT_DONE, + MAP_DONE, + PREPARE_FOR_SAVE, + PREPARE_DONE, + MAP_EMPTY, + MAP_SINGLE_EVENT_FIRST, + MAP_SINGLE_EVENT_LAST, + MAP_FIRST_EVENT_PER_DEVICE, + MAP_LAST_EVENT_PER_DEVICE, +}; + +static void guest_code(size_t nr_lpis) +{ + int cmd; + + guest_setup_gic(); + GUEST_SYNC1(MAP_INIT_DONE); + + while ((cmd = READ_ONCE(td.control_cmd)) != GUEST_EXIT) { + switch (cmd) { + case MAP_INIT: + guest_its_unmap_all(true); + if (td.clear_before_save) + guest_its_mappings_clear(); + GUEST_SYNC1(MAP_INIT_DONE); + break; + case PREPARE_FOR_SAVE: + its_init(td.collection_table, TABLE_SIZE, + td.l1_device_table, TABLE_SIZE, + td.cmdq_base, TABLE_SIZE, true); + GUEST_SYNC1(PREPARE_DONE); + break; + case MAP_EMPTY: + guest_its_mappings_clear(); + GUEST_SYNC1(MAP_DONE); + break; + case MAP_SINGLE_EVENT_FIRST: + guest_its_map_single_event(1, 1, collection_id); + if (!td.same_coll_id) + collection_id++; + GUEST_SYNC1(MAP_DONE); + break; + case MAP_SINGLE_EVENT_LAST: + guest_its_map_single_event(td.nr_devices - 2, NR_EVENTS - 2, + collection_id); + if (!td.same_coll_id) + collection_id++; + GUEST_SYNC1(MAP_DONE); + break; + case MAP_FIRST_EVENT_PER_DEVICE: + guest_its_map_event_per_device(2, collection_id); + if (!td.same_coll_id) + collection_id++; + GUEST_SYNC1(MAP_DONE); + break; + case MAP_LAST_EVENT_PER_DEVICE: + guest_its_map_event_per_device(NR_EVENTS - 3, + collection_id); + if (!td.same_coll_id) + collection_id++; + GUEST_SYNC1(MAP_DONE); + break; + default: + break; + } + } + + GUEST_DONE(); +} + +static void setup_memslot(void) +{ + size_t pages; + size_t sz; + + /* + * For the ITS: + * - A single l1 level device table + * - td.nr_l2_tables l2 level device tables + * - A single level collection table + * - The command queue + * - An ITT for each device + */ + sz = (3 + td.nr_l2_tables + td.nr_devices) * TABLE_SIZE; + + /* + * For the redistributors: + * - A shared LPI configuration table + * - An LPI pending table for the vCPU + */ + sz += 2 * TABLE_SIZE; + + /* + * For the mappings tracker + */ + sz += sizeof(*mtracker.devices) * td.nr_devices; + + pages = sz / vm->page_size; + gpa_base = ((vm_compute_max_gfn(vm) + 1) * vm->page_size) - sz; + vm_userspace_mem_region_add(vm, VM_MEM_SRC_ANONYMOUS, gpa_base, + TEST_MEMSLOT_INDEX, pages, 0); +} + +#define KVM_ITS_L1E_VALID_MASK BIT_ULL(63) +#define KVM_ITS_L1E_ADDR_MASK GENMASK_ULL(51, 16) + +static void setup_test_data(void) +{ + size_t pages_per_table = vm_calc_num_guest_pages(vm->mode, TABLE_SIZE); + size_t pages_mt = sizeof(*mtracker.devices) * td.nr_devices / vm->page_size; + + mtracker.devices = (void *)vm_phy_pages_alloc(vm, pages_mt, gpa_base, + TEST_MEMSLOT_INDEX); + virt_map(vm, (vm_paddr_t)mtracker.devices, + (vm_paddr_t)mtracker.devices, pages_mt); + mtracker.devices_va = (void *)addr_gpa2hva(vm, (vm_paddr_t)mtracker.devices); + + td.l2_device_tables = vm_phy_pages_alloc(vm, + pages_per_table * td.nr_l2_tables, + gpa_base, TEST_MEMSLOT_INDEX); + td.l1_device_table = vm_phy_pages_alloc(vm, pages_per_table, + gpa_base, + TEST_MEMSLOT_INDEX); + td.collection_table = vm_phy_pages_alloc(vm, pages_per_table, + gpa_base, + TEST_MEMSLOT_INDEX); + td.itt_tables = vm_phy_pages_alloc(vm, pages_per_table * td.nr_devices, + gpa_base, TEST_MEMSLOT_INDEX); + td.lpi_prop_table = vm_phy_pages_alloc(vm, pages_per_table, + gpa_base, TEST_MEMSLOT_INDEX); + td.lpi_pend_tables = vm_phy_pages_alloc(vm, pages_per_table, + gpa_base, TEST_MEMSLOT_INDEX); + td.cmdq_base = vm_phy_pages_alloc(vm, pages_per_table, gpa_base, + TEST_MEMSLOT_INDEX); + + u64 *l1_tbl = addr_gpa2hva(vm, td.l1_device_table); + for (int i = 0; i < td.nr_l2_tables; i++) { + u64 l2_addr = ((u64)td.l2_device_tables + i * TABLE_SIZE); + *(l1_tbl + i) = cpu_to_le64(l2_addr | KVM_ITS_L1E_VALID_MASK); + } + + virt_map(vm, td.l2_device_tables, td.l2_device_tables, + pages_per_table * td.nr_l2_tables); + virt_map(vm, td.l1_device_table, + td.l1_device_table, pages_per_table); + virt_map(vm, td.collection_table, + td.collection_table, pages_per_table); + virt_map(vm, td.itt_tables, + td.itt_tables, pages_per_table * td.nr_devices); + virt_map(vm, td.cmdq_base, td.cmdq_base, pages_per_table); + td.cmdq_base_va = (void *)td.cmdq_base; + + sync_global_to_guest(vm, mtracker); + sync_global_to_guest(vm, td); +} + +static void setup_gic(void) +{ + gic_fd = vgic_v3_setup(vm, 1, 64); + __TEST_REQUIRE(gic_fd >= 0, "Failed to create GICv3"); + + its_fd = vgic_its_setup(vm); +} + +static bool is_mapped(u32 device_id, u32 event_id) +{ + vm_paddr_t db_addr = GITS_BASE_GPA + GITS_TRANSLATER; + + struct kvm_msi msi = { + .address_lo = db_addr, + .address_hi = db_addr >> 32, + .data = event_id, + .devid = device_id, + .flags = KVM_MSI_VALID_DEVID, + }; + + /* + * KVM_SIGNAL_MSI returns 1 if the MSI wasn't 'blocked' by the VM, + * which for arm64 implies having a valid translation in the ITS. + */ + return __vm_ioctl(vm, KVM_SIGNAL_MSI, &msi); +} + +static bool restored_mappings_sanity_check(void) +{ + u64 lost_count = 0, wrong_count = 0; + bool pass = true; + + sync_global_from_guest(vm, mtracker); + + ksft_print_msg("\tChecking restored ITS mappings ...\n"); + for(u32 dev_id = 0; dev_id < td.nr_devices; dev_id++) { + u32 start_id = mtracker.devices_va[dev_id].start; + u32 end_id = start_id + mtracker.devices_va[dev_id].size; + + for (u32 eid = 0; eid < NR_EVENTS; eid++) { + bool save_mapped = eid >= start_id && eid < end_id; + bool restore_mapped = is_mapped(dev_id, eid); + + if(save_mapped && !restore_mapped && ++lost_count < 6) { + ksft_print_msg("\t\tMapping lost for device:%u, event:%u\n", + dev_id, eid); + pass = false; + } else if (!save_mapped && restore_mapped && ++wrong_count < 6) { + ksft_print_msg("\t\tWrong mapping from device:%u, event:%u\n", + dev_id, eid); + pass = false; + } + /* + * For test purpose, we only use the first and last 3 events + * per device. + */ + if (eid == 2) + eid = NR_EVENTS - 4; + } + if (lost_count > 5 || wrong_count > 5) { + ksft_print_msg("\tThere are more lost/wrong mappings found.\n"); + break; + } + } + + return pass; +} + +static void run_its_tables_save_restore_test(int test_cmd) +{ + struct timespec start, delta; + struct ucall uc; + bool done = false; + double duration; + bool pass = true; + + write_guest_global(vm, td.control_cmd, MAP_INIT); + while (!done) { + vcpu_run(vcpu); + + switch (get_ucall(vcpu, &uc)) { + case UCALL_SYNC: + switch (uc.args[0]) { + case MAP_INIT_DONE: + write_guest_global(vm, td.control_cmd, test_cmd); + break; + case MAP_DONE: + clock_gettime(CLOCK_MONOTONIC, &start); + + kvm_device_attr_set(its_fd, KVM_DEV_ARM_VGIC_GRP_CTRL, + KVM_DEV_ARM_ITS_SAVE_TABLES, NULL); + + delta = timespec_elapsed(start); + duration = (double)delta.tv_sec * USEC_PER_SEC; + duration += (double)delta.tv_nsec / NSEC_PER_USEC; + ksft_print_msg("\tITS tables save time: %.2f (us)\n", duration); + + /* Prepare for restoring */ + kvm_device_attr_set(its_fd, KVM_DEV_ARM_VGIC_GRP_CTRL, + KVM_DEV_ARM_ITS_CTRL_RESET, NULL); + write_guest_global(vm, td.control_cmd, PREPARE_FOR_SAVE); + break; + case PREPARE_DONE: + done = true; + break; + } + break; + case UCALL_DONE: + done = true; + break; + case UCALL_ABORT: + REPORT_GUEST_ASSERT(uc); + break; + default: + TEST_FAIL("Unknown ucall: %lu", uc.cmd); + } + } + + + clock_gettime(CLOCK_MONOTONIC, &start); + + int ret = __kvm_device_attr_set(its_fd, KVM_DEV_ARM_VGIC_GRP_CTRL, + KVM_DEV_ARM_ITS_RESTORE_TABLES, NULL); + if (ret) { + ksft_print_msg("\t"); + ksft_print_msg(KVM_IOCTL_ERROR(KVM_SET_DEVICE_ATTR, ret)); + ksft_print_msg("\n"); + ksft_print_msg("\tFailed to restore ITS tables.\n"); + pass = false; + } + + delta = timespec_elapsed(start); + duration = (double)delta.tv_sec * USEC_PER_SEC; + duration += (double)delta.tv_nsec / NSEC_PER_USEC; + ksft_print_msg("\tITS tables restore time: %.2f (us)\n", duration); + + if (restored_mappings_sanity_check() && pass) + ksft_test_result_pass("*** PASSED ***\n"); + else + ksft_test_result_fail("*** FAILED ***\n"); + +} + +static void setup_vm(void) +{ + vm = __vm_create_with_one_vcpu(&vcpu, 1024*1024, guest_code); + + setup_memslot(); + + setup_gic(); + + setup_test_data(); +} + +static void destroy_vm(void) +{ + close(its_fd); + close(gic_fd); + kvm_vm_free(vm); +} + +static void run_test(int test_cmd) +{ + pr_info("------------------------------------------------------------------------------\n"); + switch (test_cmd) { + case MAP_EMPTY: + pr_info("Test ITS save/restore with empty mapping\n"); + break; + case MAP_SINGLE_EVENT_FIRST: + pr_info("Test ITS save/restore with one mapping (device:1, event:1)\n"); + break; + case MAP_SINGLE_EVENT_LAST: + pr_info("Test ITS save/restore with one mapping (device:%zu, event:%llu)\n", + td.nr_devices - 2, NR_EVENTS - 2); + break; + case MAP_FIRST_EVENT_PER_DEVICE: + pr_info("Test ITS save/restore with one small event per device (device:[0-%zu], event:2)\n", + td.nr_devices - 1); + break; + case MAP_LAST_EVENT_PER_DEVICE: + pr_info("Test ITS save/restore with one big event per device (device:[0-%zu], event:%llu)\n", + td.nr_devices - 1, NR_EVENTS - 3); + break; + } + pr_info("------------------------------------------------------------------------------\n"); + + run_its_tables_save_restore_test(test_cmd); + + ksft_print_msg("\n"); +} + +static void pr_usage(const char *name) +{ + pr_info("%s -c -s -h\n", name); + pr_info(" -c:\tclear ITS tables entries before saving\n"); + pr_info(" -s:\tuse the same collection ID for all mappings\n"); + pr_info(" -n:\tnumber of L2 device tables (default: %zu, range: [1 - %llu])\n", + td.nr_l2_tables, MAX_NR_L2); +} + +int main(int argc, char **argv) +{ + int c; + + while ((c = getopt(argc, argv, "hcsn:")) != -1) { + switch (c) { + case 'c': + td.clear_before_save = true; + break; + case 's': + td.same_coll_id = true; + break; + case 'n': + td.nr_l2_tables = atoi(optarg); + if (td.nr_l2_tables > 0 && td.nr_l2_tables <= MAX_NR_L2) { + td.nr_devices = td.nr_l2_tables * TABLE_SIZE / DTE_SIZE; + break; + } + pr_info("The specified number of L2 device tables is out of range!\n"); + case 'h': + default: + pr_usage(argv[0]); + return 1; + } + } + + ksft_print_header(); + + setup_vm(); + + ksft_set_plan(5); + + run_test(MAP_EMPTY); + run_test(MAP_SINGLE_EVENT_FIRST); + run_test(MAP_SINGLE_EVENT_LAST); + run_test(MAP_FIRST_EVENT_PER_DEVICE); + run_test(MAP_LAST_EVENT_PER_DEVICE); + + destroy_vm(); + + ksft_finished(); + + return 0; +} diff --git a/tools/testing/selftests/kvm/include/aarch64/gic_v3_its.h b/tools/testing/selftests/kvm/include/aarch64/gic_v3_its.h index 3722ed9c8f96..ecf1eb955471 100644 --- a/tools/testing/selftests/kvm/include/aarch64/gic_v3_its.h +++ b/tools/testing/selftests/kvm/include/aarch64/gic_v3_its.h @@ -7,7 +7,7 @@ void its_init(vm_paddr_t coll_tbl, size_t coll_tbl_sz, vm_paddr_t device_tbl, size_t device_tbl_sz, - vm_paddr_t cmdq, size_t cmdq_size); + vm_paddr_t cmdq, size_t cmdq_size, bool indirect_device_tbl); void its_send_mapd_cmd(void *cmdq_base, u32 device_id, vm_paddr_t itt_base, size_t itt_size, bool valid); @@ -15,5 +15,6 @@ void its_send_mapc_cmd(void *cmdq_base, u32 vcpu_id, u32 collection_id, bool val void its_send_mapti_cmd(void *cmdq_base, u32 device_id, u32 event_id, u32 collection_id, u32 intid); void its_send_invall_cmd(void *cmdq_base, u32 collection_id); +void its_send_discard_cmd(void *cmdq_base, u32 device_id, u32 event_id); #endif // __SELFTESTS_GIC_V3_ITS_H__ diff --git a/tools/testing/selftests/kvm/include/kvm_util.h b/tools/testing/selftests/kvm/include/kvm_util.h index bc7c242480d6..3abe06ad1f85 100644 --- a/tools/testing/selftests/kvm/include/kvm_util.h +++ b/tools/testing/selftests/kvm/include/kvm_util.h @@ -27,7 +27,9 @@ #define KVM_DEV_PATH "/dev/kvm" #define KVM_MAX_VCPUS 512 -#define NSEC_PER_SEC 1000000000L +#define NSEC_PER_USEC 1000L +#define USEC_PER_SEC 1000000L +#define NSEC_PER_SEC 1000000000L struct userspace_mem_region { struct kvm_userspace_memory_region2 region; diff --git a/tools/testing/selftests/kvm/lib/aarch64/gic_v3_its.c b/tools/testing/selftests/kvm/lib/aarch64/gic_v3_its.c index 09f270545646..cd3c65d762d2 100644 --- a/tools/testing/selftests/kvm/lib/aarch64/gic_v3_its.c +++ b/tools/testing/selftests/kvm/lib/aarch64/gic_v3_its.c @@ -52,7 +52,8 @@ static unsigned long its_find_baser(unsigned int type) return -1; } -static void its_install_table(unsigned int type, vm_paddr_t base, size_t size) +static void its_install_table(unsigned int type, vm_paddr_t base, + size_t size, bool indirect) { unsigned long offset = its_find_baser(type); u64 baser; @@ -64,6 +65,9 @@ static void its_install_table(unsigned int type, vm_paddr_t base, size_t size) GITS_BASER_RaWaWb | GITS_BASER_VALID; + if (indirect) + baser |= GITS_BASER_INDIRECT; + its_write_u64(offset, baser); } @@ -82,12 +86,13 @@ static void its_install_cmdq(vm_paddr_t base, size_t size) void its_init(vm_paddr_t coll_tbl, size_t coll_tbl_sz, vm_paddr_t device_tbl, size_t device_tbl_sz, - vm_paddr_t cmdq, size_t cmdq_size) + vm_paddr_t cmdq, size_t cmdq_size, bool indirect_device_tbl) { u32 ctlr; - its_install_table(GITS_BASER_TYPE_COLLECTION, coll_tbl, coll_tbl_sz); - its_install_table(GITS_BASER_TYPE_DEVICE, device_tbl, device_tbl_sz); + its_install_table(GITS_BASER_TYPE_COLLECTION, coll_tbl, coll_tbl_sz, false); + its_install_table(GITS_BASER_TYPE_DEVICE, device_tbl, device_tbl_sz, + indirect_device_tbl); its_install_cmdq(cmdq, cmdq_size); ctlr = its_read_u32(GITS_CTLR); @@ -237,6 +242,17 @@ void its_send_mapti_cmd(void *cmdq_base, u32 device_id, u32 event_id, its_send_cmd(cmdq_base, &cmd); } +void its_send_discard_cmd(void *cmdq_base, u32 device_id, u32 event_id) +{ + struct its_cmd_block cmd = {}; + + its_encode_cmd(&cmd, GITS_CMD_DISCARD); + its_encode_devid(&cmd, device_id); + its_encode_event_id(&cmd, event_id); + + its_send_cmd(cmdq_base, &cmd); +} + void its_send_invall_cmd(void *cmdq_base, u32 collection_id) { struct its_cmd_block cmd = {}; -- 2.47.0.277.g8800431eea-goog ^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH v4 2/5] KVM: arm64: vgic-its: Add read/write helpers on ITS table entries. 2024-11-07 21:41 [PATCH v4 0/5] Some fixes about vgic-its Jing Zhang 2024-11-07 21:41 ` [PATCH v4 1/5] KVM: selftests: aarch64: Add VGIC selftest for save/restore ITS table mappings Jing Zhang @ 2024-11-07 21:41 ` Jing Zhang 2024-11-12 8:25 ` Marc Zyngier 2024-11-07 21:41 ` [PATCH v4 3/5] KVM: arm64: vgic-its: Add a data length check in vgic_its_save_* Jing Zhang ` (3 subsequent siblings) 5 siblings, 1 reply; 12+ messages in thread From: Jing Zhang @ 2024-11-07 21:41 UTC (permalink / raw) To: KVM, KVMARM, ARMLinux, Marc Zyngier, Oliver Upton, Joey Gouly, Zenghui Yu, Suzuki K Poulose, Kunkun Jiang Cc: Paolo Bonzini, Andre Przywara, Colton Lewis, Raghavendra Rao Ananta, Shusen Li, Eric Auger, Jing Zhang To simplify read/write operations on ITS table entries, two helper functions have been implemented. These functions incorporate the necessary entry size validation. Signed-off-by: Jing Zhang <jingzhangos@google.com> --- arch/arm64/kvm/vgic/vgic.h | 23 +++++++++++++++++++++++ 1 file changed, 23 insertions(+) diff --git a/arch/arm64/kvm/vgic/vgic.h b/arch/arm64/kvm/vgic/vgic.h index f2486b4d9f95..309295f5e1b0 100644 --- a/arch/arm64/kvm/vgic/vgic.h +++ b/arch/arm64/kvm/vgic/vgic.h @@ -146,6 +146,29 @@ static inline int vgic_write_guest_lock(struct kvm *kvm, gpa_t gpa, return ret; } +static inline int vgic_its_read_entry_lock(struct vgic_its *its, gpa_t eaddr, + u64 *eval, unsigned long esize) +{ + struct kvm *kvm = its->dev->kvm; + + if (KVM_BUG_ON(esize != sizeof(*eval), kvm)) + return -EINVAL; + + return kvm_read_guest_lock(kvm, eaddr, eval, esize); + +} + +static inline int vgic_its_write_entry_lock(struct vgic_its *its, gpa_t eaddr, + u64 eval, unsigned long esize) +{ + struct kvm *kvm = its->dev->kvm; + + if (KVM_BUG_ON(esize != sizeof(eval), kvm)) + return -EINVAL; + + return vgic_write_guest_lock(kvm, eaddr, &eval, esize); +} + /* * This struct provides an intermediate representation of the fields contained * in the GICH_VMCR and ICH_VMCR registers, such that code exporting the GIC -- 2.47.0.277.g8800431eea-goog ^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH v4 2/5] KVM: arm64: vgic-its: Add read/write helpers on ITS table entries. 2024-11-07 21:41 ` [PATCH v4 2/5] KVM: arm64: vgic-its: Add read/write helpers on ITS table entries Jing Zhang @ 2024-11-12 8:25 ` Marc Zyngier 0 siblings, 0 replies; 12+ messages in thread From: Marc Zyngier @ 2024-11-12 8:25 UTC (permalink / raw) To: Jing Zhang Cc: KVM, KVMARM, ARMLinux, Oliver Upton, Joey Gouly, Zenghui Yu, Suzuki K Poulose, Kunkun Jiang, Paolo Bonzini, Andre Przywara, Colton Lewis, Raghavendra Rao Ananta, Shusen Li, Eric Auger On Thu, 07 Nov 2024 21:41:34 +0000, Jing Zhang <jingzhangos@google.com> wrote: > > To simplify read/write operations on ITS table entries, two helper > functions have been implemented. These functions incorporate the > necessary entry size validation. > > Signed-off-by: Jing Zhang <jingzhangos@google.com> > --- > arch/arm64/kvm/vgic/vgic.h | 23 +++++++++++++++++++++++ > 1 file changed, 23 insertions(+) > > diff --git a/arch/arm64/kvm/vgic/vgic.h b/arch/arm64/kvm/vgic/vgic.h > index f2486b4d9f95..309295f5e1b0 100644 > --- a/arch/arm64/kvm/vgic/vgic.h > +++ b/arch/arm64/kvm/vgic/vgic.h > @@ -146,6 +146,29 @@ static inline int vgic_write_guest_lock(struct kvm *kvm, gpa_t gpa, > return ret; > } > > +static inline int vgic_its_read_entry_lock(struct vgic_its *its, gpa_t eaddr, > + u64 *eval, unsigned long esize) > +{ > + struct kvm *kvm = its->dev->kvm; > + > + if (KVM_BUG_ON(esize != sizeof(*eval), kvm)) > + return -EINVAL; > + > + return kvm_read_guest_lock(kvm, eaddr, eval, esize); > + > +} > + > +static inline int vgic_its_write_entry_lock(struct vgic_its *its, gpa_t eaddr, > + u64 eval, unsigned long esize) > +{ > + struct kvm *kvm = its->dev->kvm; > + > + if (KVM_BUG_ON(esize != sizeof(eval), kvm)) > + return -EINVAL; > + > + return vgic_write_guest_lock(kvm, eaddr, &eval, esize); > +} > + I don't think this is right. Or at least not what I had in mind. What I wanted was to abstract both the element size and the ABI, and check for the type of 'eval'. Here, you implicitly case the caller's data on a u64, which C will happily do without warnings. So this KVM_BUG_ON() checks very little on writes. Also, you force the caller to explicitly extract the element-size from the ABI. Yes, it is available most of the time. But this is about checking consistency, and you are missing this opportunity. So while this code isn't wrong, it really doesn't make anything more robust. It's just another indirection. M. -- Without deviation from the norm, progress is not possible. ^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH v4 3/5] KVM: arm64: vgic-its: Add a data length check in vgic_its_save_* 2024-11-07 21:41 [PATCH v4 0/5] Some fixes about vgic-its Jing Zhang 2024-11-07 21:41 ` [PATCH v4 1/5] KVM: selftests: aarch64: Add VGIC selftest for save/restore ITS table mappings Jing Zhang 2024-11-07 21:41 ` [PATCH v4 2/5] KVM: arm64: vgic-its: Add read/write helpers on ITS table entries Jing Zhang @ 2024-11-07 21:41 ` Jing Zhang 2024-11-08 5:13 ` kernel test robot 2024-11-07 21:41 ` [PATCH v4 4/5] KVM: arm64: vgic-its: Clear DTE when MAPD unmaps a device Jing Zhang ` (2 subsequent siblings) 5 siblings, 1 reply; 12+ messages in thread From: Jing Zhang @ 2024-11-07 21:41 UTC (permalink / raw) To: KVM, KVMARM, ARMLinux, Marc Zyngier, Oliver Upton, Joey Gouly, Zenghui Yu, Suzuki K Poulose, Kunkun Jiang Cc: Paolo Bonzini, Andre Przywara, Colton Lewis, Raghavendra Rao Ananta, Shusen Li, Eric Auger, Jing Zhang From: Kunkun Jiang <jiangkunkun@huawei.com> In all the vgic_its_save_*() functinos, they do not check whether the data length is 8 bytes before calling vgic_write_guest_lock. This patch adds the check. To prevent the kernel from being blown up when the fault occurs, KVM_BUG_ON() is used. And the other BUG_ON()s are replaced together. Signed-off-by: Kunkun Jiang <jiangkunkun@huawei.com> [Jing: Update with the new entry read/write helpers] Signed-off-by: Jing Zhang <jingzhangos@google.com> --- arch/arm64/kvm/vgic/vgic-its.c | 19 ++++++++----------- 1 file changed, 8 insertions(+), 11 deletions(-) diff --git a/arch/arm64/kvm/vgic/vgic-its.c b/arch/arm64/kvm/vgic/vgic-its.c index ba945ba78cc7..9ccf00731ad2 100644 --- a/arch/arm64/kvm/vgic/vgic-its.c +++ b/arch/arm64/kvm/vgic/vgic-its.c @@ -2086,7 +2086,6 @@ static int scan_its_table(struct vgic_its *its, gpa_t base, int size, u32 esz, static int vgic_its_save_ite(struct vgic_its *its, struct its_device *dev, struct its_ite *ite, gpa_t gpa, int ite_esz) { - struct kvm *kvm = its->dev->kvm; u32 next_offset; u64 val; @@ -2095,7 +2094,8 @@ static int vgic_its_save_ite(struct vgic_its *its, struct its_device *dev, ((u64)ite->irq->intid << KVM_ITS_ITE_PINTID_SHIFT) | ite->collection->collection_id; val = cpu_to_le64(val); - return vgic_write_guest_lock(kvm, gpa, &val, ite_esz); + + return vgic_its_write_entry_lock(its, gpa, val, ite_esz); } /** @@ -2239,7 +2239,6 @@ static int vgic_its_restore_itt(struct vgic_its *its, struct its_device *dev) static int vgic_its_save_dte(struct vgic_its *its, struct its_device *dev, gpa_t ptr, int dte_esz) { - struct kvm *kvm = its->dev->kvm; u64 val, itt_addr_field; u32 next_offset; @@ -2250,7 +2249,8 @@ static int vgic_its_save_dte(struct vgic_its *its, struct its_device *dev, (itt_addr_field << KVM_ITS_DTE_ITTADDR_SHIFT) | (dev->num_eventid_bits - 1)); val = cpu_to_le64(val); - return vgic_write_guest_lock(kvm, ptr, &val, dte_esz); + + return vgic_its_write_entry_lock(its, ptr, val, dte_esz); } /** @@ -2437,7 +2437,8 @@ static int vgic_its_save_cte(struct vgic_its *its, ((u64)collection->target_addr << KVM_ITS_CTE_RDBASE_SHIFT) | collection->collection_id); val = cpu_to_le64(val); - return vgic_write_guest_lock(its->dev->kvm, gpa, &val, esz); + + return vgic_its_write_entry_lock(its, gpa, val, esz); } /* @@ -2453,8 +2454,7 @@ static int vgic_its_restore_cte(struct vgic_its *its, gpa_t gpa, int esz) u64 val; int ret; - BUG_ON(esz > sizeof(val)); - ret = kvm_read_guest_lock(kvm, gpa, &val, esz); + ret = vgic_its_read_entry_lock(its, gpa, &val, esz); if (ret) return ret; val = le64_to_cpu(val); @@ -2516,10 +2516,7 @@ static int vgic_its_save_collection_table(struct vgic_its *its) * table is not fully filled, add a last dummy element * with valid bit unset */ - val = 0; - BUG_ON(cte_esz > sizeof(val)); - ret = vgic_write_guest_lock(its->dev->kvm, gpa, &val, cte_esz); - return ret; + return vgic_its_write_entry_lock(its, gpa, 0, cte_esz); } /* -- 2.47.0.277.g8800431eea-goog ^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH v4 3/5] KVM: arm64: vgic-its: Add a data length check in vgic_its_save_* 2024-11-07 21:41 ` [PATCH v4 3/5] KVM: arm64: vgic-its: Add a data length check in vgic_its_save_* Jing Zhang @ 2024-11-08 5:13 ` kernel test robot 0 siblings, 0 replies; 12+ messages in thread From: kernel test robot @ 2024-11-08 5:13 UTC (permalink / raw) To: Jing Zhang, KVM, KVMARM, ARMLinux, Marc Zyngier, Oliver Upton, Joey Gouly, Zenghui Yu, Suzuki K Poulose, Kunkun Jiang Cc: oe-kbuild-all, Paolo Bonzini, Andre Przywara, Colton Lewis, Raghavendra Rao Ananta, Shusen Li, Eric Auger, Jing Zhang Hi Jing, kernel test robot noticed the following build warnings: [auto build test WARNING on 59b723cd2adbac2a34fc8e12c74ae26ae45bf230] url: https://github.com/intel-lab-lkp/linux/commits/Jing-Zhang/KVM-selftests-aarch64-Add-VGIC-selftest-for-save-restore-ITS-table-mappings/20241108-054433 base: 59b723cd2adbac2a34fc8e12c74ae26ae45bf230 patch link: https://lore.kernel.org/r/20241107214137.428439-4-jingzhangos%40google.com patch subject: [PATCH v4 3/5] KVM: arm64: vgic-its: Add a data length check in vgic_its_save_* config: arm64-randconfig-004-20241108 (https://download.01.org/0day-ci/archive/20241108/202411081338.47eReEKu-lkp@intel.com/config) compiler: aarch64-linux-gcc (GCC) 14.2.0 reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20241108/202411081338.47eReEKu-lkp@intel.com/reproduce) If you fix the issue in a separate patch/commit (i.e. not just a new version of the same patch/commit), kindly add following tags | Reported-by: kernel test robot <lkp@intel.com> | Closes: https://lore.kernel.org/oe-kbuild-all/202411081338.47eReEKu-lkp@intel.com/ All warnings (new ones prefixed by >>): arch/arm64/kvm/vgic/vgic-its.c: In function 'vgic_its_save_collection_table': >> arch/arm64/kvm/vgic/vgic-its.c:2495:13: warning: unused variable 'val' [-Wunused-variable] 2495 | u64 val; | ^~~ vim +/val +2495 arch/arm64/kvm/vgic/vgic-its.c ea1ad53e1e31a3 virt/kvm/arm/vgic/vgic-its.c Eric Auger 2017-01-09 2484 0aa34b37a78d06 arch/arm64/kvm/vgic/vgic-its.c Sebastian Ott 2024-07-23 2485 /* 3b65808f4b2914 virt/kvm/arm/vgic/vgic-its.c Eric Auger 2016-12-24 2486 * vgic_its_save_collection_table - Save the collection table into 3b65808f4b2914 virt/kvm/arm/vgic/vgic-its.c Eric Auger 2016-12-24 2487 * guest RAM 3b65808f4b2914 virt/kvm/arm/vgic/vgic-its.c Eric Auger 2016-12-24 2488 */ 3b65808f4b2914 virt/kvm/arm/vgic/vgic-its.c Eric Auger 2016-12-24 2489 static int vgic_its_save_collection_table(struct vgic_its *its) 3b65808f4b2914 virt/kvm/arm/vgic/vgic-its.c Eric Auger 2016-12-24 2490 { ea1ad53e1e31a3 virt/kvm/arm/vgic/vgic-its.c Eric Auger 2017-01-09 2491 const struct vgic_its_abi *abi = vgic_its_get_abi(its); c2385eaa6c5a87 virt/kvm/arm/vgic/vgic-its.c Eric Auger 2017-10-26 2492 u64 baser = its->baser_coll_table; 8ad50c8985d805 virt/kvm/arm/vgic/vgic-its.c Kristina Martsenko 2018-09-26 2493 gpa_t gpa = GITS_BASER_ADDR_48_to_52(baser); ea1ad53e1e31a3 virt/kvm/arm/vgic/vgic-its.c Eric Auger 2017-01-09 2494 struct its_collection *collection; ea1ad53e1e31a3 virt/kvm/arm/vgic/vgic-its.c Eric Auger 2017-01-09 @2495 u64 val; ea1ad53e1e31a3 virt/kvm/arm/vgic/vgic-its.c Eric Auger 2017-01-09 2496 size_t max_size, filled = 0; ea1ad53e1e31a3 virt/kvm/arm/vgic/vgic-its.c Eric Auger 2017-01-09 2497 int ret, cte_esz = abi->cte_esz; ea1ad53e1e31a3 virt/kvm/arm/vgic/vgic-its.c Eric Auger 2017-01-09 2498 c2385eaa6c5a87 virt/kvm/arm/vgic/vgic-its.c Eric Auger 2017-10-26 2499 if (!(baser & GITS_BASER_VALID)) ea1ad53e1e31a3 virt/kvm/arm/vgic/vgic-its.c Eric Auger 2017-01-09 2500 return 0; ea1ad53e1e31a3 virt/kvm/arm/vgic/vgic-its.c Eric Auger 2017-01-09 2501 c2385eaa6c5a87 virt/kvm/arm/vgic/vgic-its.c Eric Auger 2017-10-26 2502 max_size = GITS_BASER_NR_PAGES(baser) * SZ_64K; ea1ad53e1e31a3 virt/kvm/arm/vgic/vgic-its.c Eric Auger 2017-01-09 2503 ea1ad53e1e31a3 virt/kvm/arm/vgic/vgic-its.c Eric Auger 2017-01-09 2504 list_for_each_entry(collection, &its->collection_list, coll_list) { ea1ad53e1e31a3 virt/kvm/arm/vgic/vgic-its.c Eric Auger 2017-01-09 2505 ret = vgic_its_save_cte(its, collection, gpa, cte_esz); ea1ad53e1e31a3 virt/kvm/arm/vgic/vgic-its.c Eric Auger 2017-01-09 2506 if (ret) ea1ad53e1e31a3 virt/kvm/arm/vgic/vgic-its.c Eric Auger 2017-01-09 2507 return ret; ea1ad53e1e31a3 virt/kvm/arm/vgic/vgic-its.c Eric Auger 2017-01-09 2508 gpa += cte_esz; ea1ad53e1e31a3 virt/kvm/arm/vgic/vgic-its.c Eric Auger 2017-01-09 2509 filled += cte_esz; ea1ad53e1e31a3 virt/kvm/arm/vgic/vgic-its.c Eric Auger 2017-01-09 2510 } ea1ad53e1e31a3 virt/kvm/arm/vgic/vgic-its.c Eric Auger 2017-01-09 2511 ea1ad53e1e31a3 virt/kvm/arm/vgic/vgic-its.c Eric Auger 2017-01-09 2512 if (filled == max_size) ea1ad53e1e31a3 virt/kvm/arm/vgic/vgic-its.c Eric Auger 2017-01-09 2513 return 0; ea1ad53e1e31a3 virt/kvm/arm/vgic/vgic-its.c Eric Auger 2017-01-09 2514 ea1ad53e1e31a3 virt/kvm/arm/vgic/vgic-its.c Eric Auger 2017-01-09 2515 /* ea1ad53e1e31a3 virt/kvm/arm/vgic/vgic-its.c Eric Auger 2017-01-09 2516 * table is not fully filled, add a last dummy element ea1ad53e1e31a3 virt/kvm/arm/vgic/vgic-its.c Eric Auger 2017-01-09 2517 * with valid bit unset ea1ad53e1e31a3 virt/kvm/arm/vgic/vgic-its.c Eric Auger 2017-01-09 2518 */ c4380c338c9607 arch/arm64/kvm/vgic/vgic-its.c Kunkun Jiang 2024-11-07 2519 return vgic_its_write_entry_lock(its, gpa, 0, cte_esz); 3b65808f4b2914 virt/kvm/arm/vgic/vgic-its.c Eric Auger 2016-12-24 2520 } 3b65808f4b2914 virt/kvm/arm/vgic/vgic-its.c Eric Auger 2016-12-24 2521 -- 0-DAY CI Kernel Test Service https://github.com/intel/lkp-tests/wiki ^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH v4 4/5] KVM: arm64: vgic-its: Clear DTE when MAPD unmaps a device 2024-11-07 21:41 [PATCH v4 0/5] Some fixes about vgic-its Jing Zhang ` (2 preceding siblings ...) 2024-11-07 21:41 ` [PATCH v4 3/5] KVM: arm64: vgic-its: Add a data length check in vgic_its_save_* Jing Zhang @ 2024-11-07 21:41 ` Jing Zhang 2024-11-07 21:41 ` [PATCH v4 5/5] KVM: arm64: vgic-its: Clear ITE when DISCARD frees an ITE Jing Zhang 2024-11-11 20:40 ` [PATCH v4 0/5] Some fixes about vgic-its Oliver Upton 5 siblings, 0 replies; 12+ messages in thread From: Jing Zhang @ 2024-11-07 21:41 UTC (permalink / raw) To: KVM, KVMARM, ARMLinux, Marc Zyngier, Oliver Upton, Joey Gouly, Zenghui Yu, Suzuki K Poulose, Kunkun Jiang Cc: Paolo Bonzini, Andre Przywara, Colton Lewis, Raghavendra Rao Ananta, Shusen Li, Eric Auger, Jing Zhang From: Kunkun Jiang <jiangkunkun@huawei.com> vgic_its_save_device_tables will traverse its->device_list to save DTE for each device. vgic_its_restore_device_tables will traverse each entry of device table and check if it is valid. Restore if valid. But when MAPD unmaps a device, it does not invalidate the corresponding DTE. In the scenario of continuous saves and restores, there may be a situation where a device's DTE is not saved but is restored. This is unreasonable and may cause restore to fail. This patch clears the corresponding DTE when MAPD unmaps a device. Co-developed-by: Shusen Li <lishusen2@huawei.com> Signed-off-by: Shusen Li <lishusen2@huawei.com> Signed-off-by: Kunkun Jiang <jiangkunkun@huawei.com> [Jing: Update with entry write helper] Signed-off-by: Jing Zhang <jingzhangos@google.com> --- arch/arm64/kvm/vgic/vgic-its.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/arch/arm64/kvm/vgic/vgic-its.c b/arch/arm64/kvm/vgic/vgic-its.c index 9ccf00731ad2..7f931e33a425 100644 --- a/arch/arm64/kvm/vgic/vgic-its.c +++ b/arch/arm64/kvm/vgic/vgic-its.c @@ -1139,9 +1139,11 @@ static int vgic_its_cmd_handle_mapd(struct kvm *kvm, struct vgic_its *its, bool valid = its_cmd_get_validbit(its_cmd); u8 num_eventid_bits = its_cmd_get_size(its_cmd); gpa_t itt_addr = its_cmd_get_ittaddr(its_cmd); + int dte_esz = vgic_its_get_abi(its)->dte_esz; struct its_device *device; + gpa_t gpa; - if (!vgic_its_check_id(its, its->baser_device_table, device_id, NULL)) + if (!vgic_its_check_id(its, its->baser_device_table, device_id, &gpa)) return E_ITS_MAPD_DEVICE_OOR; if (valid && num_eventid_bits > VITS_TYPER_IDBITS) @@ -1162,7 +1164,7 @@ static int vgic_its_cmd_handle_mapd(struct kvm *kvm, struct vgic_its *its, * is an error, so we are done in any case. */ if (!valid) - return 0; + return vgic_its_write_entry_lock(its, gpa, 0, dte_esz); device = vgic_its_alloc_device(its, device_id, itt_addr, num_eventid_bits); -- 2.47.0.277.g8800431eea-goog ^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH v4 5/5] KVM: arm64: vgic-its: Clear ITE when DISCARD frees an ITE 2024-11-07 21:41 [PATCH v4 0/5] Some fixes about vgic-its Jing Zhang ` (3 preceding siblings ...) 2024-11-07 21:41 ` [PATCH v4 4/5] KVM: arm64: vgic-its: Clear DTE when MAPD unmaps a device Jing Zhang @ 2024-11-07 21:41 ` Jing Zhang 2025-05-12 14:09 ` David Sauerwein 2024-11-11 20:40 ` [PATCH v4 0/5] Some fixes about vgic-its Oliver Upton 5 siblings, 1 reply; 12+ messages in thread From: Jing Zhang @ 2024-11-07 21:41 UTC (permalink / raw) To: KVM, KVMARM, ARMLinux, Marc Zyngier, Oliver Upton, Joey Gouly, Zenghui Yu, Suzuki K Poulose, Kunkun Jiang Cc: Paolo Bonzini, Andre Przywara, Colton Lewis, Raghavendra Rao Ananta, Shusen Li, Eric Auger, Jing Zhang From: Kunkun Jiang <jiangkunkun@huawei.com> When DISCARD frees an ITE, it does not invalidate the corresponding ITE. In the scenario of continuous saves and restores, there may be a situation where an ITE is not saved but is restored. This is unreasonable and may cause restore to fail. This patch clears the corresponding ITE when DISCARD frees an ITE. Signed-off-by: Kunkun Jiang <jiangkunkun@huawei.com> [Jing: Update with entry write helper] Signed-off-by: Jing Zhang <jingzhangos@google.com> --- arch/arm64/kvm/vgic/vgic-its.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/arch/arm64/kvm/vgic/vgic-its.c b/arch/arm64/kvm/vgic/vgic-its.c index 7f931e33a425..5d5104af8768 100644 --- a/arch/arm64/kvm/vgic/vgic-its.c +++ b/arch/arm64/kvm/vgic/vgic-its.c @@ -782,6 +782,9 @@ static int vgic_its_cmd_handle_discard(struct kvm *kvm, struct vgic_its *its, ite = find_ite(its, device_id, event_id); if (ite && its_is_collection_mapped(ite->collection)) { + struct its_device *device = find_its_device(its, device_id); + int ite_esz = vgic_its_get_abi(its)->ite_esz; + gpa_t gpa = device->itt_addr + ite->event_id * ite_esz; /* * Though the spec talks about removing the pending state, we * don't bother here since we clear the ITTE anyway and the @@ -790,7 +793,8 @@ static int vgic_its_cmd_handle_discard(struct kvm *kvm, struct vgic_its *its, vgic_its_invalidate_cache(its); its_free_ite(kvm, ite); - return 0; + + return vgic_its_write_entry_lock(its, gpa, 0, ite_esz); } return E_ITS_DISCARD_UNMAPPED_INTERRUPT; -- 2.47.0.277.g8800431eea-goog ^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH v4 5/5] KVM: arm64: vgic-its: Clear ITE when DISCARD frees an ITE 2024-11-07 21:41 ` [PATCH v4 5/5] KVM: arm64: vgic-its: Clear ITE when DISCARD frees an ITE Jing Zhang @ 2025-05-12 14:09 ` David Sauerwein 2025-05-16 9:52 ` Marc Zyngier 0 siblings, 1 reply; 12+ messages in thread From: David Sauerwein @ 2025-05-12 14:09 UTC (permalink / raw) To: jingzhangos Cc: andre.przywara, coltonlewis, eauger, jiangkunkun, joey.gouly, kvm, kvmarm, linux-arm-kernel, lishusen2, maz, oupton, pbonzini, rananta, suzuki.poulose, yuzenghui, graf, nh-open-source Hi Jing, After pulling this patch in via the v6.6.64 and v5.10.226 LTS releases, I see NULL pointer dereferences in some guests. The dereference happens in different parts of the kernel outside of the GIC driver (file systems, NVMe driver, etc.). The issue only appears once every few hundred DISCARDs / guest boots. Reverting the commit does fix the problem. I have seen multiple different guest kernel versions (4.14, 5.15) and distributions exhibit this issue. The issue looks like some kind of race. I think the guest re-uses the memory allocated for the ITT before the hypervisor is actually done with the DISCARD command, i.e. before it zeros the ITE. From what I can tell, the guest should wait for the command to finish via its_wait_for_range_completion(). I tried locking reads to its->cwriter in vgic_mmio_read_its_cwriter() and its->creadr in vgic_mmio_read_its_creadr() with its->cmd_lock in the hypervisor kernel, but that did not help. I also instrumented the guest kernel both via printk() and trace events. In both cases the issue disappears once the instrumentation is in place, so I'm not able to fully observe what is happening on the guest side. Do you have an idea of what might cause the issue? David Amazon Web Services Development Center Germany GmbH Tamara-Danz-Str. 13 10243 Berlin Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss Eingetragen am Amtsgericht Charlottenburg unter HRB 257764 B Sitz: Berlin Ust-ID: DE 365 538 597 ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v4 5/5] KVM: arm64: vgic-its: Clear ITE when DISCARD frees an ITE 2025-05-12 14:09 ` David Sauerwein @ 2025-05-16 9:52 ` Marc Zyngier 2025-08-11 12:40 ` David Woodhouse 0 siblings, 1 reply; 12+ messages in thread From: Marc Zyngier @ 2025-05-16 9:52 UTC (permalink / raw) To: David Sauerwein Cc: jingzhangos, andre.przywara, coltonlewis, eauger, jiangkunkun, joey.gouly, kvm, kvmarm, linux-arm-kernel, lishusen2, oupton, pbonzini, rananta, suzuki.poulose, yuzenghui, graf, nh-open-source On Mon, 12 May 2025 15:09:09 +0100, David Sauerwein <dssauerw@amazon.de> wrote: > > Hi Jing, > > After pulling this patch in via the v6.6.64 and v5.10.226 LTS releases, I see > NULL pointer dereferences in some guests. The dereference happens in different > parts of the kernel outside of the GIC driver (file systems, NVMe driver, > etc.). The issue only appears once every few hundred DISCARDs / guest boots. > Reverting the commit does fix the problem. I have seen multiple different guest > kernel versions (4.14, 5.15) and distributions exhibit this issue. Where is the guest stack trace? > The issue looks like some kind of race. I think the guest re-uses the memory > allocated for the ITT before the hypervisor is actually done with the DISCARD > command, i.e. before it zeros the ITE. From what I can tell, the guest should > wait for the command to finish via its_wait_for_range_completion(). I tried > locking reads to its->cwriter in vgic_mmio_read_its_cwriter() and its->creadr > in vgic_mmio_read_its_creadr() with its->cmd_lock in the hypervisor kernel, but > that did not help. I also instrumented the guest kernel both via printk() and > trace events. In both cases the issue disappears once the instrumentation is in > place, so I'm not able to fully observe what is happening on the guest side. > > Do you have an idea of what might cause the issue? I'm a bit sceptical of this analysis, because KVM makes no use of the guest's owned memory outside of a save/restore event, and otherwise shadows everything. So what are you *exactly* doing here? Have you reproduced this with an upstream, current KVM host? Thanks, M. -- Without deviation from the norm, progress is not possible. ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v4 5/5] KVM: arm64: vgic-its: Clear ITE when DISCARD frees an ITE 2025-05-16 9:52 ` Marc Zyngier @ 2025-08-11 12:40 ` David Woodhouse 0 siblings, 0 replies; 12+ messages in thread From: David Woodhouse @ 2025-08-11 12:40 UTC (permalink / raw) To: Marc Zyngier, David Sauerwein Cc: jingzhangos, andre.przywara, coltonlewis, eauger, jiangkunkun, joey.gouly, kvm, kvmarm, linux-arm-kernel, lishusen2, oupton, pbonzini, rananta, suzuki.poulose, yuzenghui, graf, nh-open-source [-- Attachment #1: Type: text/plain, Size: 5999 bytes --] On Fri, 2025-05-16 at 10:52 +0100, Marc Zyngier wrote: > On Mon, 12 May 2025 15:09:09 +0100, > David Sauerwein <dssauerw@amazon.de> wrote: > > > > Hi Jing, > > > > After pulling this patch in via the v6.6.64 and v5.10.226 LTS releases, I see > > NULL pointer dereferences in some guests. The dereference happens in different > > parts of the kernel outside of the GIC driver (file systems, NVMe driver, > > etc.). The issue only appears once every few hundred DISCARDs / guest boots. > > Reverting the commit does fix the problem. I have seen multiple different guest > > kernel versions (4.14, 5.15) and distributions exhibit this issue. > > Where is the guest stack trace? [ 157.126835] Unable to handle kernel NULL pointer dereference at virtual address 000002e8 [ 157.128248] Mem abort info: [ 157.128745] Exception class = DABT (current EL), IL = 32 bits [ 157.129736] SET = 0, FnV = 0 [ 157.130266] EA = 0, S1PTW = 0 [ 157.130794] Data abort info: [ 157.131273] ISV = 0, ISS = 0x00000004 [ 157.131933] CM = 0, WnR = 0 [ 157.132451] user pgtable: 4k pages, 48-bit VAs, pgd = ffff8003f5d4d000 [ 157.133556] [00000000000002e8] *pgd=0000000000000000 [ 157.134414] Internal error: Oops: 96000004 [#1] SMP [ 157.135238] Modules linked in: sunrpc vfat fat dm_mirror dm_region_hash dm_log dm_mod crc32_ce ghash_ce sha2_ce sha256_arm64 sha1_ce ena ptp pps_core [ 157.137452] Process kworker/0:1 (pid: 28, stack limit = 0xffff000009dd8000) [ 157.138741] CPU: 0 PID: 28 Comm: kworker/0:1 Not tainted 4.14.336-253.554.amzn2.aarch64 #1 [ 157.140276] Hardware name: Amazon EC2 c7g.medium/, BIOS 1.0 11/1/2018 [ 157.141502] Workqueue: xfs-reclaim/nvme0n1p1 xfs_reclaim_worker [ 157.142629] task: ffff8003f91de600 task.stack: ffff000009dd8000 [ 157.143757] pc : xfs_perag_clear_reclaim_tag+0x4c/0x120 [ 157.144747] lr : 0x0 [ 157.145188] sp : ffff000009ddbb50 pstate : 80c00145 [ 157.146118] x29: ffff000009ddbb50 x28: ffff8003f8052e00 [ 157.147126] x27: 0000000000000000 x26: 000000000007bc78 [ 157.148165] x25: ffff000008d36000 x24: ffff8003f8052e00 [ 157.149151] x23: ffff000009ddbb80 x22: 0000000000000000 [ 157.150139] x21: ffff00000843bd5c x20: 0000000000000000 [ 157.151135] x19: ffff8003f8052e00 x18: 0000000000000038 [ 157.152146] x17: 0000ffff8b577980 x16: ffff0000083255c8 [ 157.153132] x15: 0000000000000000 x14: ffff8003f8052e70 [ 157.154137] x13: ffff8003f8f85b60 x12: ffff8003f8f85d49 [ 157.155144] x11: ffff8003f8f85b88 x10: 0000000000000000 [ 157.156158] x9 : 0000000000000039 x8 : 0000000000000007 [ 157.157180] x7 : 000000000000003e x6 : 0000000000000038 [ 157.158199] x5 : 0000000000000000 x4 : 0000000000000000 [ 157.159205] x3 : 00000000000002e8 x2 : 0000000000000001 [ 157.160211] x1 : 0000000000000000 x0 : 00000000000002e8 [ 157.161209] Call trace: [ 157.161709] xfs_perag_clear_reclaim_tag+0x4c/0x120 [ 157.162644] xfs_reclaim_inode+0x314/0x49c [ 157.163432] xfs_reclaim_inodes_ag+0x1ac/0x2fc [ 157.164290] xfs_reclaim_worker+0x4c/0x80 [ 157.165065] process_one_work+0x198/0x3e0 [ 157.165841] worker_thread+0x4c/0x458 [ 157.166544] kthread+0x138/0x13c [ 157.167172] ret_from_fork+0x10/0x2c [ 157.167858] Code: d2800001 52800022 aa0303e0 2a0103fe (88fe7c62) [ 157.169013] ---[ end trace 0a6955946156d7d5 ]--- [ 157.169905] Kernel panic - not syncing: Fatal exception [ 157.170894] Kernel Offset: disabled [ 157.171583] CPU features: 0x2,28002238 [ 157.172300] Memory Limit: none [ 157.172893] Rebooting in 30 seconds.. Hypervisor debug logs show a DISCARD command with a gpa which might match x28/x24 in the above? vgic_its_cmd_handle_discard gpa=438052e00 ite=00000000facc1299 event_id=0 device_id=32 ite_esz=8 vgic_its_base=10080000 vgic_its_check_event_id()=1 David, did we ever establish whether Ilias's patch from https://lore.kernel.org/all/20250414111244.153528-1-ilstam@amazon.com/ makes the problem go away? It serializes the GIC state to userspace like KVM does for most other devices, instead of doing the dubious thing that the GIC specification *permits* and scribbling it to guest memory. If we look at the GIC specification, it says that behaviour is UNPREDICTABLE in various cases where software writes to tables that the GIC owns, or if those tables aren't zeroed when given to the GIC. It would perhaps be useful to add a mode to QEMU which *enforces* that, taking the affected pages out of the guest's memory map (and emulating writes to parts of those pages which the guest *is* still allowed to touch, etc.). If this is indeed a guest bug, as I suspect, it should show up fairly quickly in such a setup. Would be useful for catching other guest bugs caused by this GIC feature too, like https://lore.kernel.org/all/c69938cffd4002a93a95a396affaa945e0f69206.camel@infradead.org/ > > The issue looks like some kind of race. I think the guest re-uses the memory > > allocated for the ITT before the hypervisor is actually done with the DISCARD > > command, i.e. before it zeros the ITE. From what I can tell, the guest should > > wait for the command to finish via its_wait_for_range_completion(). I tried > > locking reads to its->cwriter in vgic_mmio_read_its_cwriter() and its->creadr > > in vgic_mmio_read_its_creadr() with its->cmd_lock in the hypervisor kernel, but > > that did not help. I also instrumented the guest kernel both via printk() and > > trace events. In both cases the issue disappears once the instrumentation is in > > place, so I'm not able to fully observe what is happening on the guest side. > > > > Do you have an idea of what might cause the issue? > > I'm a bit sceptical of this analysis, because KVM makes no use of the > guest's owned memory outside of a save/restore event, and otherwise > shadows everything. Hypervisor live updates or live migration could trigger precisely that save/restore event at any time, surely? [-- Attachment #2: smime.p7s --] [-- Type: application/pkcs7-signature, Size: 5069 bytes --] ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v4 0/5] Some fixes about vgic-its 2024-11-07 21:41 [PATCH v4 0/5] Some fixes about vgic-its Jing Zhang ` (4 preceding siblings ...) 2024-11-07 21:41 ` [PATCH v4 5/5] KVM: arm64: vgic-its: Clear ITE when DISCARD frees an ITE Jing Zhang @ 2024-11-11 20:40 ` Oliver Upton 5 siblings, 0 replies; 12+ messages in thread From: Oliver Upton @ 2024-11-11 20:40 UTC (permalink / raw) To: ARMLinux, Suzuki K Poulose, KVMARM, Oliver Upton, Kunkun Jiang, Jing Zhang, KVM, Marc Zyngier, Zenghui Yu, Joey Gouly Cc: Colton Lewis, Andre Przywara, Shusen Li, Raghavendra Rao Ananta, Paolo Bonzini, Eric Auger On Thu, 7 Nov 2024 13:41:32 -0800, Jing Zhang wrote: > This patch series addresses a critical issue in the VGIC ITS tables' > save/restore mechanism, accompanied by a comprehensive selftest for bug > reproduction and verification. > > The fix is originally from Kunkun Jiang at [1]. > > The identified bug manifests as a failure in VM suspend/resume operations. > The root cause lies in the repeated suspend attempts often required for > successful VM suspension, coupled with concurrent device interrupt registration > and freeing. This concurrency leads to inconsistencies in ITS mappings before > the save operation, potentially leaving orphaned Device Translation Entries > (DTEs) and Interrupt Translation Entries (ITEs) in the respective tables. > > [...] Taking the immediate fixes for now, selftest might need a bit more work (will review soon). Note that I squashed patch 2 + 3 together as well. Applied to kvmarm/next, thanks! [3/5] KVM: arm64: vgic-its: Add a data length check in vgic_its_save_* https://git.kernel.org/kvmarm/kvmarm/c/7fe28d7e68f9 [4/5] KVM: arm64: vgic-its: Clear DTE when MAPD unmaps a device https://git.kernel.org/kvmarm/kvmarm/c/e9649129d33d [5/5] KVM: arm64: vgic-its: Clear ITE when DISCARD frees an ITE https://git.kernel.org/kvmarm/kvmarm/c/7602ffd1d5e8 -- Best, Oliver ^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2025-08-11 12:44 UTC | newest] Thread overview: 12+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2024-11-07 21:41 [PATCH v4 0/5] Some fixes about vgic-its Jing Zhang 2024-11-07 21:41 ` [PATCH v4 1/5] KVM: selftests: aarch64: Add VGIC selftest for save/restore ITS table mappings Jing Zhang 2024-11-07 21:41 ` [PATCH v4 2/5] KVM: arm64: vgic-its: Add read/write helpers on ITS table entries Jing Zhang 2024-11-12 8:25 ` Marc Zyngier 2024-11-07 21:41 ` [PATCH v4 3/5] KVM: arm64: vgic-its: Add a data length check in vgic_its_save_* Jing Zhang 2024-11-08 5:13 ` kernel test robot 2024-11-07 21:41 ` [PATCH v4 4/5] KVM: arm64: vgic-its: Clear DTE when MAPD unmaps a device Jing Zhang 2024-11-07 21:41 ` [PATCH v4 5/5] KVM: arm64: vgic-its: Clear ITE when DISCARD frees an ITE Jing Zhang 2025-05-12 14:09 ` David Sauerwein 2025-05-16 9:52 ` Marc Zyngier 2025-08-11 12:40 ` David Woodhouse 2024-11-11 20:40 ` [PATCH v4 0/5] Some fixes about vgic-its Oliver Upton
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).