* [Qemu-devel] [PATCH 04/99] nbd/client: Fix error messages during NBD_INFO_BLOCK_SIZE
From: Michael Roth @ 2018-07-23 20:16 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-stable, Eric Blake
In-Reply-To: <20180723201748.25573-1-mdroth@linux.vnet.ibm.com>
From: Eric Blake <eblake@redhat.com>
A missing space makes for poor error messages, and sizes can't
go negative. Also, we missed diagnosing a server that sends
a maximum block size less than the minimum.
Fixes: 081dd1fe
CC: qemu-stable@nongnu.org
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20180501154654.943782-1-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
(cherry picked from commit e475d108f1b3d3163f0affea67cdedbe5fc9752b)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
---
nbd/client.c | 14 ++++++++++----
1 file changed, 10 insertions(+), 4 deletions(-)
diff --git a/nbd/client.c b/nbd/client.c
index b9e175d1c2..3523c863fe 100644
--- a/nbd/client.c
+++ b/nbd/client.c
@@ -435,8 +435,8 @@ static int nbd_opt_go(QIOChannel *ioc, const char *wantname,
}
be32_to_cpus(&info->min_block);
if (!is_power_of_2(info->min_block)) {
- error_setg(errp, "server minimum block size %" PRId32
- "is not a power of two", info->min_block);
+ error_setg(errp, "server minimum block size %" PRIu32
+ " is not a power of two", info->min_block);
nbd_send_opt_abort(ioc);
return -1;
}
@@ -450,8 +450,8 @@ static int nbd_opt_go(QIOChannel *ioc, const char *wantname,
be32_to_cpus(&info->opt_block);
if (!is_power_of_2(info->opt_block) ||
info->opt_block < info->min_block) {
- error_setg(errp, "server preferred block size %" PRId32
- "is not valid", info->opt_block);
+ error_setg(errp, "server preferred block size %" PRIu32
+ " is not valid", info->opt_block);
nbd_send_opt_abort(ioc);
return -1;
}
@@ -462,6 +462,12 @@ static int nbd_opt_go(QIOChannel *ioc, const char *wantname,
return -1;
}
be32_to_cpus(&info->max_block);
+ if (info->max_block < info->min_block) {
+ error_setg(errp, "server maximum block size %" PRIu32
+ " is not valid", info->max_block);
+ nbd_send_opt_abort(ioc);
+ return -1;
+ }
trace_nbd_opt_go_info_block_size(info->min_block, info->opt_block,
info->max_block);
break;
--
2.17.1
^ permalink raw reply related
* [Qemu-devel] [PATCH 50/99] usb: correctly handle Zero Length Packets
From: Michael Roth @ 2018-07-23 20:16 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-stable, Philippe Mathieu-Daudé, Gerd Hoffmann
In-Reply-To: <20180723201748.25573-1-mdroth@linux.vnet.ibm.com>
From: Philippe Mathieu-Daudé <f4bug@amsat.org>
USB Specification Revision 2.0, §5.5.3:
The Data stage of a control transfer from an endpoint to the host is complete when the endpoint does one of the following:
• Has transferred exactly the amount of data specified during the Setup stage
• Transfers a packet with a payload size less than wMaxPacketSize or transfers a zero-length packet"
hw/usb/redirect.c:802:9: warning: Declared variable-length array (VLA) has zero size
uint8_t buf[size];
^~~~~~~~~~~ ~~~~
Reported-by: Clang Static Analyzer
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 20180604151421.23385-2-f4bug@amsat.org
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
(cherry picked from commit bf78fb1c1b61a819a47f7a1dbecf9934b9f32a0d)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
---
hw/usb/redirect.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/hw/usb/redirect.c b/hw/usb/redirect.c
index 65a9196c1a..58e8f7f5bd 100644
--- a/hw/usb/redirect.c
+++ b/hw/usb/redirect.c
@@ -795,7 +795,7 @@ static void usbredir_handle_bulk_data(USBRedirDevice *dev, USBPacket *p,
usbredirparser_peer_has_cap(dev->parser,
usb_redir_cap_32bits_bulk_length));
- if (ep & USB_DIR_IN) {
+ if (ep & USB_DIR_IN || size == 0) {
usbredirparser_send_bulk_packet(dev->parser, p->id,
&bulk_packet, NULL, 0);
} else {
--
2.17.1
^ permalink raw reply related
* [Qemu-devel] [PATCH 49/99] arm_gicv3_kvm: kvm_dist_get/put_priority: skip the registers banked by GICR_IPRIORITYR
From: Michael Roth @ 2018-07-23 20:16 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-stable, Shannon Zhao, Peter Maydell
In-Reply-To: <20180723201748.25573-1-mdroth@linux.vnet.ibm.com>
From: Shannon Zhao <zhaoshenglong@huawei.com>
While for_each_dist_irq_reg loop starts from GIC_INTERNAL, it forgot to
offset the date array and index. This will overlap the GICR registers
value and leave the last GIC_INTERNAL irq's registers out of update.
Fixes: 367b9f527becdd20ddf116e17a3c0c2bbc486920
Cc: qemu-stable@nongnu.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Shannon Zhao <zhaoshenglong@huawei.com>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
(cherry picked from commit 1dcf3675196a1cec616ce71b067d9498590a60a6)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
---
hw/intc/arm_gicv3_kvm.c | 18 ++++++++++++++++--
1 file changed, 16 insertions(+), 2 deletions(-)
diff --git a/hw/intc/arm_gicv3_kvm.c b/hw/intc/arm_gicv3_kvm.c
index 81cbd16817..bc6fa415b2 100644
--- a/hw/intc/arm_gicv3_kvm.c
+++ b/hw/intc/arm_gicv3_kvm.c
@@ -135,7 +135,14 @@ static void kvm_dist_get_priority(GICv3State *s, uint32_t offset, uint8_t *bmp)
uint32_t reg, *field;
int irq;
- field = (uint32_t *)bmp;
+ /* For the KVM GICv3, affinity routing is always enabled, and the first 8
+ * GICD_IPRIORITYR<n> registers are always RAZ/WI. The corresponding
+ * functionality is replaced by GICR_IPRIORITYR<n>. It doesn't need to
+ * sync them. So it needs to skip the field of GIC_INTERNAL irqs in bmp and
+ * offset.
+ */
+ field = (uint32_t *)(bmp + GIC_INTERNAL);
+ offset += (GIC_INTERNAL * 8) / 8;
for_each_dist_irq_reg(irq, s->num_irq, 8) {
kvm_gicd_access(s, offset, ®, false);
*field = reg;
@@ -149,7 +156,14 @@ static void kvm_dist_put_priority(GICv3State *s, uint32_t offset, uint8_t *bmp)
uint32_t reg, *field;
int irq;
- field = (uint32_t *)bmp;
+ /* For the KVM GICv3, affinity routing is always enabled, and the first 8
+ * GICD_IPRIORITYR<n> registers are always RAZ/WI. The corresponding
+ * functionality is replaced by GICR_IPRIORITYR<n>. It doesn't need to
+ * sync them. So it needs to skip the field of GIC_INTERNAL irqs in bmp and
+ * offset.
+ */
+ field = (uint32_t *)(bmp + GIC_INTERNAL);
+ offset += (GIC_INTERNAL * 8) / 8;
for_each_dist_irq_reg(irq, s->num_irq, 8) {
reg = *field;
kvm_gicd_access(s, offset, ®, true);
--
2.17.1
^ permalink raw reply related
* [Qemu-devel] [PATCH 48/99] iotests: Add test 221 to catch qemu-img map regression
From: Michael Roth @ 2018-07-23 20:16 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-stable, Eric Blake, Kevin Wolf
In-Reply-To: <20180723201748.25573-1-mdroth@linux.vnet.ibm.com>
From: Eric Blake <eblake@redhat.com>
Although qemu-img creates aligned files (by rounding up), it
must also gracefully handle files that are not sector-aligned.
Test that the bug fixed in the previous patch does not recur.
It's a bit annoying that we can see the (implicit) hole past
the end of the file on to the next sector boundary, so if we
ever reach the point where we report a byte-accurate size rather
than our current behavior of always rounding up, this test will
probably need a slight modification.
Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit c6a9d2f6f9bc0c163b3a3073126464a2446bad5f)
Conflicts:
tests/qemu-iotests/group
* drop context dep on tests not present in 2.12
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
---
tests/qemu-iotests/221 | 60 ++++++++++++++++++++++++++++++++++++++
tests/qemu-iotests/221.out | 16 ++++++++++
tests/qemu-iotests/group | 1 +
3 files changed, 77 insertions(+)
create mode 100755 tests/qemu-iotests/221
create mode 100644 tests/qemu-iotests/221.out
diff --git a/tests/qemu-iotests/221 b/tests/qemu-iotests/221
new file mode 100755
index 0000000000..41c4e4bdf8
--- /dev/null
+++ b/tests/qemu-iotests/221
@@ -0,0 +1,60 @@
+#!/bin/bash
+#
+# Test qemu-img vs. unaligned images
+#
+# Copyright (C) 2018 Red Hat, Inc.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program. If not, see <http://www.gnu.org/licenses/>.
+#
+
+seq="$(basename $0)"
+echo "QA output created by $seq"
+
+here="$PWD"
+status=1 # failure is the default!
+
+_cleanup()
+{
+ _cleanup_test_img
+}
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+# get standard environment, filters and checks
+. ./common.rc
+. ./common.filter
+
+_supported_fmt raw
+_supported_proto file
+_supported_os Linux
+
+echo
+echo "=== Check mapping of unaligned raw image ==="
+echo
+
+_make_test_img 43009 # qemu-img create rounds size up
+$QEMU_IMG map --output=json "$TEST_IMG" | _filter_qemu_img_map
+
+truncate --size=43009 "$TEST_IMG" # so we resize it and check again
+$QEMU_IMG map --output=json "$TEST_IMG" | _filter_qemu_img_map
+
+$QEMU_IO -c 'w 43008 1' "$TEST_IMG" | _filter_qemu_io # writing also rounds up
+$QEMU_IMG map --output=json "$TEST_IMG" | _filter_qemu_img_map
+
+truncate --size=43009 "$TEST_IMG" # so we resize it and check again
+$QEMU_IMG map --output=json "$TEST_IMG" | _filter_qemu_img_map
+
+# success, all done
+echo '*** done'
+rm -f $seq.full
+status=0
diff --git a/tests/qemu-iotests/221.out b/tests/qemu-iotests/221.out
new file mode 100644
index 0000000000..a9c0190aad
--- /dev/null
+++ b/tests/qemu-iotests/221.out
@@ -0,0 +1,16 @@
+QA output created by 221
+
+=== Check mapping of unaligned raw image ===
+
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=43009
+[{ "start": 0, "length": 43520, "depth": 0, "zero": true, "data": false, "offset": OFFSET}]
+[{ "start": 0, "length": 43520, "depth": 0, "zero": true, "data": false, "offset": OFFSET}]
+wrote 1/1 bytes at offset 43008
+1 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+[{ "start": 0, "length": 40960, "depth": 0, "zero": true, "data": false, "offset": OFFSET},
+{ "start": 40960, "length": 2049, "depth": 0, "zero": false, "data": true, "offset": OFFSET},
+{ "start": 43009, "length": 511, "depth": 0, "zero": true, "data": false, "offset": OFFSET}]
+[{ "start": 0, "length": 40960, "depth": 0, "zero": true, "data": false, "offset": OFFSET},
+{ "start": 40960, "length": 2049, "depth": 0, "zero": false, "data": true, "offset": OFFSET},
+{ "start": 43009, "length": 511, "depth": 0, "zero": true, "data": false, "offset": OFFSET}]
+*** done
diff --git a/tests/qemu-iotests/group b/tests/qemu-iotests/group
index 52a80f3f9e..fc10a72192 100644
--- a/tests/qemu-iotests/group
+++ b/tests/qemu-iotests/group
@@ -212,3 +212,4 @@
211 rw auto quick
212 rw auto quick
213 rw auto quick
+221 rw auto quick
--
2.17.1
^ permalink raw reply related
* [Qemu-devel] [PATCH 47/99] qemu-img: Fix assert when mapping unaligned raw file
From: Michael Roth @ 2018-07-23 20:16 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-stable, Eric Blake, Kevin Wolf
In-Reply-To: <20180723201748.25573-1-mdroth@linux.vnet.ibm.com>
From: Eric Blake <eblake@redhat.com>
Commit a290f085 exposed a latent bug in qemu-img map introduced
during the conversion of block status to be byte-based. Earlier in
commit 5e344dd8, the internal interface get_block_status() switched
to take byte-based parameters, but still called a sector-based
block layer function; as such, rounding was added in the lone
caller to obey the contract. However, commit 237d78f8 changed
get_block_status() to truly be byte-based, at which point rounding
to sector boundaries can result in calling bdrv_block_status() with
'bytes == 0' (a coding error) when the boundary between data and a
hole falls mid-sector (true for the past-EOF implicit hole present
in POSIX files). Fix things by removing the rounding that is now
no longer necessary.
See also https://bugzilla.redhat.com/1589738
Fixes: 237d78f8
Reported-by: Dan Kenigsberg <danken@redhat.com>
Reported-by: Nir Soffer <nsoffer@redhat.com>
Reported-by: Maor Lipchuk <mlipchuk@redhat.com>
CC: qemu-stable@nongnu.org
Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit e0b371ed5e2db079051139136fd0478728b6a58f)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
---
qemu-img.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/qemu-img.c b/qemu-img.c
index b422fda6f3..a8e2b53dc6 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -2827,7 +2827,7 @@ static int img_map(int argc, char **argv)
int64_t n;
/* Probe up to 1 GiB at a time. */
- n = QEMU_ALIGN_DOWN(MIN(1 << 30, length - offset), BDRV_SECTOR_SIZE);
+ n = MIN(1 << 30, length - offset);
ret = get_block_status(bs, offset, n, &next);
if (ret < 0) {
--
2.17.1
^ permalink raw reply related
* [Qemu-devel] [PATCH 46/99] vhost-user: delete net client if necessary
From: Michael Roth @ 2018-07-23 20:16 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-stable, linzhecheng, Jason Wang
In-Reply-To: <20180723201748.25573-1-mdroth@linux.vnet.ibm.com>
From: linzhecheng <linzhecheng@huawei.com>
As qemu_new_net_client create new ncs but error happens later,
ncs will be left in global net_clients list and we can't use them any
more, so we need to cleanup them.
Cc: qemu-stable@nongnu.org
Signed-off-by: linzhecheng <linzhecheng@huawei.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
(cherry picked from commit c67daf4a24442d1bb404a11a6a54dc45ea10f234)
Conflicts:
net/vhost-user.c
* drop functional dep on 4d0cf552
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
---
net/vhost-user.c | 11 +++++++++--
1 file changed, 9 insertions(+), 2 deletions(-)
diff --git a/net/vhost-user.c b/net/vhost-user.c
index e0f16c895b..62745c06dd 100644
--- a/net/vhost-user.c
+++ b/net/vhost-user.c
@@ -299,7 +299,7 @@ static int net_vhost_user_init(NetClientState *peer, const char *device,
s = DO_UPCAST(VhostUserState, nc, nc);
if (!qemu_chr_fe_init(&s->chr, chr, &err)) {
error_report_err(err);
- return -1;
+ goto err;
}
}
@@ -309,7 +309,7 @@ static int net_vhost_user_init(NetClientState *peer, const char *device,
do {
if (qemu_chr_fe_wait_connected(&s->chr, &err) < 0) {
error_report_err(err);
- return -1;
+ goto err;
}
qemu_chr_fe_set_handlers(&s->chr, NULL, NULL,
net_vhost_user_event, NULL, nc0->name, NULL,
@@ -319,6 +319,13 @@ static int net_vhost_user_init(NetClientState *peer, const char *device,
assert(s->vhost_net);
return 0;
+
+err:
+ if (nc0) {
+ qemu_del_net_client(nc0);
+ }
+
+ return -1;
}
static Chardev *net_vhost_claim_chardev(
--
2.17.1
^ permalink raw reply related
* [Qemu-devel] [PATCH 45/99] tap: set vhostfd passed from qemu cli to non-blocking
From: Michael Roth @ 2018-07-23 20:16 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-stable, Brijesh Singh, Michael S . Tsirkin, Jason Wang
In-Reply-To: <20180723201748.25573-1-mdroth@linux.vnet.ibm.com>
From: Brijesh Singh <brijesh.singh@amd.com>
A guest boot hangs while probing the network interface when
iommu_platform=on is used.
The following qemu cli hangs without this patch:
# $QEMU \
-netdev tap,fd=3,id=hostnet0,vhost=on,vhostfd=4 3<>/dev/tap67 4<>/dev/host-net \
-device virtio-net-pci,netdev=hostnet0,id=net0,iommu_platform=on,disable-legacy=on \
...
Commit: c471ad0e9bd46 (vhost_net: device IOTLB support) took care of
setting vhostfd to non-blocking when QEMU opens /dev/host-net but if
the fd is passed from qemu cli then we need to ensure that fd is set
to non-blocking.
Fixes: c471ad0e9bd46 ("vhost_net: device IOTLB support")
Cc: qemu-stable@nongnu.org
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
(cherry picked from commit d542800d1edc62f63f8a29cfa6bdd1a9536ae11c)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
---
net/tap.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/net/tap.c b/net/tap.c
index 2b3a36f9b5..89c4e19162 100644
--- a/net/tap.c
+++ b/net/tap.c
@@ -40,6 +40,7 @@
#include "qemu-common.h"
#include "qemu/cutils.h"
#include "qemu/error-report.h"
+#include "qemu/sockets.h"
#include "net/tap.h"
@@ -693,6 +694,7 @@ static void net_init_tap_one(const NetdevTapOptions *tap, NetClientState *peer,
}
return;
}
+ qemu_set_nonblock(vhostfd);
} else {
vhostfd = open("/dev/vhost-net", O_RDWR);
if (vhostfd < 0) {
--
2.17.1
^ permalink raw reply related
* [Qemu-devel] [PATCH 44/99] i386: define the AMD 'virt-ssbd' CPUID feature bit (CVE-2018-3639)
From: Michael Roth @ 2018-07-23 20:16 UTC (permalink / raw)
To: qemu-devel
Cc: qemu-stable, Konrad Rzeszutek Wilk, Daniel P . Berrangé,
Eduardo Habkost
In-Reply-To: <20180723201748.25573-1-mdroth@linux.vnet.ibm.com>
From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
AMD Zen expose the Intel equivalant to Speculative Store Bypass Disable
via the 0x80000008_EBX[25] CPUID feature bit.
This needs to be exposed to guest OS to allow them to protect
against CVE-2018-3639.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20180521215424.13520-3-berrange@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
(cherry picked from commit 403503b162ffc33fb64cfefdf7b880acf41772cd)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
---
target/i386/cpu.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 2f5263e22f..2e305ab689 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -541,7 +541,7 @@ static FeatureWordInfo feature_word_info[FEATURE_WORDS] = {
"ibpb", NULL, NULL, NULL,
NULL, NULL, NULL, NULL,
NULL, NULL, NULL, NULL,
- NULL, NULL, NULL, NULL,
+ NULL, "virt-ssbd", NULL, NULL,
NULL, NULL, NULL, NULL,
},
.cpuid_eax = 0x80000008,
--
2.17.1
^ permalink raw reply related
* [Qemu-devel] [PATCH 43/99] i386: Define the Virt SSBD MSR and handling of it (CVE-2018-3639)
From: Michael Roth @ 2018-07-23 20:16 UTC (permalink / raw)
To: qemu-devel
Cc: qemu-stable, Konrad Rzeszutek Wilk, Daniel P . Berrangé,
Eduardo Habkost
In-Reply-To: <20180723201748.25573-1-mdroth@linux.vnet.ibm.com>
From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
"Some AMD processors only support a non-architectural means of enabling
speculative store bypass disable (SSBD). To allow a simplified view of
this to a guest, an architectural definition has been created through a new
CPUID bit, 0x80000008_EBX[25], and a new MSR, 0xc001011f. With this, a
hypervisor can virtualize the existence of this definition and provide an
architectural method for using SSBD to a guest.
Add the new CPUID feature, the new MSR and update the existing SSBD
support to use this MSR when present." (from x86/speculation: Add virtualized
speculative store bypass disable support in Linux).
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20180521215424.13520-4-berrange@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
(cherry picked from commit cfeea0c021db6234c154dbc723730e81553924ff)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
---
target/i386/cpu.h | 2 ++
target/i386/kvm.c | 16 ++++++++++++++--
target/i386/machine.c | 20 ++++++++++++++++++++
3 files changed, 36 insertions(+), 2 deletions(-)
diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index 970ab96e54..75e821cefe 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -351,6 +351,7 @@ typedef enum X86Seg {
#define MSR_IA32_FEATURE_CONTROL 0x0000003a
#define MSR_TSC_ADJUST 0x0000003b
#define MSR_IA32_SPEC_CTRL 0x48
+#define MSR_VIRT_SSBD 0xc001011f
#define MSR_IA32_TSCDEADLINE 0x6e0
#define FEATURE_CONTROL_LOCKED (1<<0)
@@ -1150,6 +1151,7 @@ typedef struct CPUX86State {
uint32_t pkru;
uint64_t spec_ctrl;
+ uint64_t virt_ssbd;
/* End of state preserved by INIT (dummy marker). */
struct {} end_init_save;
diff --git a/target/i386/kvm.c b/target/i386/kvm.c
index 6c49954e68..19e6aa320d 100644
--- a/target/i386/kvm.c
+++ b/target/i386/kvm.c
@@ -92,6 +92,7 @@ static bool has_msr_hv_stimer;
static bool has_msr_hv_frequencies;
static bool has_msr_xss;
static bool has_msr_spec_ctrl;
+static bool has_msr_virt_ssbd;
static bool has_msr_smi_count;
static uint32_t has_architectural_pmu_version;
@@ -1218,6 +1219,9 @@ static int kvm_get_supported_msrs(KVMState *s)
case MSR_IA32_SPEC_CTRL:
has_msr_spec_ctrl = true;
break;
+ case MSR_VIRT_SSBD:
+ has_msr_virt_ssbd = true;
+ break;
}
}
}
@@ -1706,6 +1710,10 @@ static int kvm_put_msrs(X86CPU *cpu, int level)
if (has_msr_spec_ctrl) {
kvm_msr_entry_add(cpu, MSR_IA32_SPEC_CTRL, env->spec_ctrl);
}
+ if (has_msr_virt_ssbd) {
+ kvm_msr_entry_add(cpu, MSR_VIRT_SSBD, env->virt_ssbd);
+ }
+
#ifdef TARGET_X86_64
if (lm_capable_kernel) {
kvm_msr_entry_add(cpu, MSR_CSTAR, env->cstar);
@@ -2077,8 +2085,9 @@ static int kvm_get_msrs(X86CPU *cpu)
if (has_msr_spec_ctrl) {
kvm_msr_entry_add(cpu, MSR_IA32_SPEC_CTRL, 0);
}
-
-
+ if (has_msr_virt_ssbd) {
+ kvm_msr_entry_add(cpu, MSR_VIRT_SSBD, 0);
+ }
if (!env->tsc_valid) {
kvm_msr_entry_add(cpu, MSR_IA32_TSC, 0);
env->tsc_valid = !runstate_is_running();
@@ -2444,6 +2453,9 @@ static int kvm_get_msrs(X86CPU *cpu)
case MSR_IA32_SPEC_CTRL:
env->spec_ctrl = msrs[i].data;
break;
+ case MSR_VIRT_SSBD:
+ env->virt_ssbd = msrs[i].data;
+ break;
case MSR_IA32_RTIT_CTL:
env->msr_rtit_ctrl = msrs[i].data;
break;
diff --git a/target/i386/machine.c b/target/i386/machine.c
index bd2d82e91b..f0a835c292 100644
--- a/target/i386/machine.c
+++ b/target/i386/machine.c
@@ -893,6 +893,25 @@ static const VMStateDescription vmstate_msr_intel_pt = {
}
};
+static bool virt_ssbd_needed(void *opaque)
+{
+ X86CPU *cpu = opaque;
+ CPUX86State *env = &cpu->env;
+
+ return env->virt_ssbd != 0;
+}
+
+static const VMStateDescription vmstate_msr_virt_ssbd = {
+ .name = "cpu/virt_ssbd",
+ .version_id = 1,
+ .minimum_version_id = 1,
+ .needed = virt_ssbd_needed,
+ .fields = (VMStateField[]){
+ VMSTATE_UINT64(env.virt_ssbd, X86CPU),
+ VMSTATE_END_OF_LIST()
+ }
+};
+
VMStateDescription vmstate_x86_cpu = {
.name = "cpu",
.version_id = 12,
@@ -1015,6 +1034,7 @@ VMStateDescription vmstate_x86_cpu = {
&vmstate_spec_ctrl,
&vmstate_mcg_ext_ctl,
&vmstate_msr_intel_pt,
+ &vmstate_msr_virt_ssbd,
NULL
}
};
--
2.17.1
^ permalink raw reply related
* [Qemu-devel] [PATCH 42/99] i386: define the 'ssbd' CPUID feature bit (CVE-2018-3639)
From: Michael Roth @ 2018-07-23 20:16 UTC (permalink / raw)
To: qemu-devel
Cc: qemu-stable, Daniel P. Berrangé, Konrad Rzeszutek Wilk,
Eduardo Habkost
In-Reply-To: <20180723201748.25573-1-mdroth@linux.vnet.ibm.com>
From: Daniel P. Berrangé <berrange@redhat.com>
New microcode introduces the "Speculative Store Bypass Disable"
CPUID feature bit. This needs to be exposed to guest OS to allow
them to protect against CVE-2018-3639.
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Message-Id: <20180521215424.13520-2-berrange@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
(cherry picked from commit d19d1f965904a533998739698020ff4ee8a103da)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
---
target/i386/cpu.c | 2 +-
target/i386/cpu.h | 1 +
2 files changed, 2 insertions(+), 1 deletion(-)
diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index a20fe26573..2f5263e22f 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -510,7 +510,7 @@ static FeatureWordInfo feature_word_info[FEATURE_WORDS] = {
NULL, NULL, NULL, NULL,
NULL, NULL, NULL, NULL,
NULL, NULL, "spec-ctrl", NULL,
- NULL, NULL, NULL, NULL,
+ NULL, NULL, NULL, "ssbd",
},
.cpuid_eax = 7,
.cpuid_needs_ecx = true, .cpuid_ecx = 0,
diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index 1b219fafc4..970ab96e54 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -684,6 +684,7 @@ typedef uint32_t FeatureWordArray[FEATURE_WORDS];
#define CPUID_7_0_EDX_AVX512_4VNNIW (1U << 2) /* AVX512 Neural Network Instructions */
#define CPUID_7_0_EDX_AVX512_4FMAPS (1U << 3) /* AVX512 Multiply Accumulation Single Precision */
#define CPUID_7_0_EDX_SPEC_CTRL (1U << 26) /* Speculation Control */
+#define CPUID_7_0_EDX_SPEC_CTRL_SSBD (1U << 31) /* Speculative Store Bypass Disable */
#define KVM_HINTS_DEDICATED (1U << 0)
--
2.17.1
^ permalink raw reply related
* [Qemu-devel] [PATCH 41/99] throttle: Fix crash on reopen
From: Michael Roth @ 2018-07-23 20:16 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-stable, Alberto Garcia, Max Reitz
In-Reply-To: <20180723201748.25573-1-mdroth@linux.vnet.ibm.com>
From: Alberto Garcia <berto@igalia.com>
The throttle block filter can be reopened, and with this it is
possible to change the throttle group that the filter belongs to.
The way the code does that is the following:
- On throttle_reopen_prepare(): create a new ThrottleGroupMember
and attach it to the new throttle group.
- On throttle_reopen_commit(): detach the old ThrottleGroupMember,
delete it and replace it with the new one.
The problem with this is that by replacing the ThrottleGroupMember the
previous value of io_limits_disabled is lost, causing an assertion
failure in throttle_co_drain_end().
This problem can be reproduced by reopening a throttle node:
$QEMU -monitor stdio
-object throttle-group,id=tg0,x-iops-total=1000 \
-blockdev node-name=hd0,driver=qcow2,file.driver=file,file.filename=hd.qcow2 \
-blockdev node-name=root,driver=throttle,throttle-group=tg0,file=hd0,read-only=on
(qemu) block_stream root
block/throttle.c:214: throttle_co_drain_end: Assertion `tgm->io_limits_disabled' failed.
Since we only want to change the throttle group on reopen there's no
need to create a ThrottleGroupMember and discard the old one. It's
easier if we simply detach it from its current group and attach it to
the new one.
Signed-off-by: Alberto Garcia <berto@igalia.com>
Message-id: 20180608151536.7378-1-berto@igalia.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
(cherry picked from commit bc33c047d1ec0b35c9cd8be62bcefae2da28654f)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
---
block/throttle.c | 54 +++++++++++++++++++++++++++++-------------------
1 file changed, 33 insertions(+), 21 deletions(-)
diff --git a/block/throttle.c b/block/throttle.c
index 95ed06acd8..026f408aa6 100644
--- a/block/throttle.c
+++ b/block/throttle.c
@@ -36,9 +36,12 @@ static QemuOptsList throttle_opts = {
},
};
-static int throttle_configure_tgm(BlockDriverState *bs,
- ThrottleGroupMember *tgm,
- QDict *options, Error **errp)
+/*
+ * If this function succeeds then the throttle group name is stored in
+ * @group and must be freed by the caller.
+ * If there's an error then @group remains unmodified.
+ */
+static int throttle_parse_options(QDict *options, char **group, Error **errp)
{
int ret;
const char *group_name;
@@ -63,8 +66,7 @@ static int throttle_configure_tgm(BlockDriverState *bs,
goto fin;
}
- /* Register membership to group with name group_name */
- throttle_group_register_tgm(tgm, group_name, bdrv_get_aio_context(bs));
+ *group = g_strdup(group_name);
ret = 0;
fin:
qemu_opts_del(opts);
@@ -75,6 +77,8 @@ static int throttle_open(BlockDriverState *bs, QDict *options,
int flags, Error **errp)
{
ThrottleGroupMember *tgm = bs->opaque;
+ char *group;
+ int ret;
bs->file = bdrv_open_child(NULL, options, "file", bs,
&child_file, false, errp);
@@ -84,7 +88,14 @@ static int throttle_open(BlockDriverState *bs, QDict *options,
bs->supported_write_flags = bs->file->bs->supported_write_flags;
bs->supported_zero_flags = bs->file->bs->supported_zero_flags;
- return throttle_configure_tgm(bs, tgm, options, errp);
+ ret = throttle_parse_options(options, &group, errp);
+ if (ret == 0) {
+ /* Register membership to group with name group_name */
+ throttle_group_register_tgm(tgm, group, bdrv_get_aio_context(bs));
+ g_free(group);
+ }
+
+ return ret;
}
static void throttle_close(BlockDriverState *bs)
@@ -160,35 +171,36 @@ static void throttle_attach_aio_context(BlockDriverState *bs,
static int throttle_reopen_prepare(BDRVReopenState *reopen_state,
BlockReopenQueue *queue, Error **errp)
{
- ThrottleGroupMember *tgm;
+ int ret;
+ char *group = NULL;
assert(reopen_state != NULL);
assert(reopen_state->bs != NULL);
- reopen_state->opaque = g_new0(ThrottleGroupMember, 1);
- tgm = reopen_state->opaque;
-
- return throttle_configure_tgm(reopen_state->bs, tgm, reopen_state->options,
- errp);
+ ret = throttle_parse_options(reopen_state->options, &group, errp);
+ reopen_state->opaque = group;
+ return ret;
}
static void throttle_reopen_commit(BDRVReopenState *reopen_state)
{
- ThrottleGroupMember *old_tgm = reopen_state->bs->opaque;
- ThrottleGroupMember *new_tgm = reopen_state->opaque;
+ BlockDriverState *bs = reopen_state->bs;
+ ThrottleGroupMember *tgm = bs->opaque;
+ char *group = reopen_state->opaque;
+
+ assert(group);
- throttle_group_unregister_tgm(old_tgm);
- g_free(old_tgm);
- reopen_state->bs->opaque = new_tgm;
+ if (strcmp(group, throttle_group_get_name(tgm))) {
+ throttle_group_unregister_tgm(tgm);
+ throttle_group_register_tgm(tgm, group, bdrv_get_aio_context(bs));
+ }
+ g_free(reopen_state->opaque);
reopen_state->opaque = NULL;
}
static void throttle_reopen_abort(BDRVReopenState *reopen_state)
{
- ThrottleGroupMember *tgm = reopen_state->opaque;
-
- throttle_group_unregister_tgm(tgm);
- g_free(tgm);
+ g_free(reopen_state->opaque);
reopen_state->opaque = NULL;
}
--
2.17.1
^ permalink raw reply related
* [Qemu-devel] [PATCH 03/99] ccid: Fix dwProtocols advertisement of T=0
From: Michael Roth @ 2018-07-23 20:16 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-stable, Jason Andryuk, Gerd Hoffmann
In-Reply-To: <20180723201748.25573-1-mdroth@linux.vnet.ibm.com>
From: Jason Andryuk <jandryuk@gmail.com>
Commit d7d218ef02d87c637d20d64da8f575d434ff6f78 attempted to change
dwProtocols to only advertise support for T=0 and not T=1. The change
was incorrect as it changed 0x00000003 to 0x00010000.
lsusb -v in a linux guest shows:
"dwProtocols 65536 (Invalid values detected)", though the
smart card could still be accessed. Windows 7 does not detect inserted
smart cards and logs the the following Error in the Event Logs:
Source: Smart Card Service
Event ID: 610
Smart Card Reader 'QEMU QEMU USB CCID 0' rejected IOCTL SET_PROTOCOL:
Incorrect function. If this error persists, your smart card or reader
may not be functioning correctly
Command Header: 03 00 00 00
Setting to 0x00000001 fixes the Windows issue.
Signed-off-by: Jason Andryuk <jandryuk@gmail.com>
Message-id: 20180420183219.20722-1-jandryuk@gmail.com
Cc: qemu-stable@nongnu.org
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
(cherry picked from commit 0ee86bb6c5beb6498488850104f7557c376d0bef)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
---
hw/usb/dev-smartcard-reader.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/hw/usb/dev-smartcard-reader.c b/hw/usb/dev-smartcard-reader.c
index e6468057a0..cabb564788 100644
--- a/hw/usb/dev-smartcard-reader.c
+++ b/hw/usb/dev-smartcard-reader.c
@@ -329,8 +329,8 @@ static const uint8_t qemu_ccid_descriptor[] = {
*/
0x07, /* u8 bVoltageSupport; 01h - 5.0v, 02h - 3.0, 03 - 1.8 */
- 0x00, 0x00, /* u32 dwProtocols; RRRR PPPP. RRRR = 0000h.*/
- 0x01, 0x00, /* PPPP: 0001h = Protocol T=0, 0002h = Protocol T=1 */
+ 0x01, 0x00, /* u32 dwProtocols; RRRR PPPP. RRRR = 0000h.*/
+ 0x00, 0x00, /* PPPP: 0001h = Protocol T=0, 0002h = Protocol T=1 */
/* u32 dwDefaultClock; in kHZ (0x0fa0 is 4 MHz) */
0xa0, 0x0f, 0x00, 0x00,
/* u32 dwMaximumClock; */
--
2.17.1
^ permalink raw reply related
* [Qemu-devel] [PATCH 40/99] iotests: Add case for a corrupted inactive image
From: Michael Roth @ 2018-07-23 20:16 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-stable, Max Reitz
In-Reply-To: <20180723201748.25573-1-mdroth@linux.vnet.ibm.com>
From: Max Reitz <mreitz@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Tested-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20180606193702.7113-4-mreitz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
(cherry picked from commit c50abd175a88cd41c2c08339de91f6f6e4a7b162)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
---
tests/qemu-iotests/060 | 30 ++++++++++++++++++++++++++++++
tests/qemu-iotests/060.out | 14 ++++++++++++++
2 files changed, 44 insertions(+)
diff --git a/tests/qemu-iotests/060 b/tests/qemu-iotests/060
index 6c7407f499..7bdf609f3f 100755
--- a/tests/qemu-iotests/060
+++ b/tests/qemu-iotests/060
@@ -440,6 +440,36 @@ echo "{'execute': 'qmp_capabilities'}
-drive if=none,node-name=drive,file="$TEST_IMG",driver=qcow2 \
| _filter_qmp | _filter_qemu_io
+echo
+echo "=== Testing incoming inactive corrupted image ==="
+echo
+
+_make_test_img 64M
+# Create an unaligned L1 entry, so qemu will signal a corruption when
+# reading from the covered area
+poke_file "$TEST_IMG" "$l1_offset" "\x00\x00\x00\x00\x2a\x2a\x2a\x2a"
+
+# Inactive images are effectively read-only images, so this should be a
+# non-fatal corruption (which does not modify the image)
+echo "{'execute': 'qmp_capabilities'}
+ {'execute': 'human-monitor-command',
+ 'arguments': {'command-line': 'qemu-io drive \"read 0 512\"'}}
+ {'execute': 'quit'}" \
+ | $QEMU -qmp stdio -nographic -nodefaults \
+ -blockdev "{'node-name': 'drive',
+ 'driver': 'qcow2',
+ 'file': {
+ 'driver': 'file',
+ 'filename': '$TEST_IMG'
+ }}" \
+ -incoming exec:'cat /dev/null' \
+ 2>&1 \
+ | _filter_qmp | _filter_qemu_io
+
+echo
+# Image should not have been marked corrupt
+_img_info --format-specific | grep 'corrupt:'
+
# success, all done
echo "*** done"
rm -f $seq.full
diff --git a/tests/qemu-iotests/060.out b/tests/qemu-iotests/060.out
index 25d5c3938b..99234fbfc2 100644
--- a/tests/qemu-iotests/060.out
+++ b/tests/qemu-iotests/060.out
@@ -420,4 +420,18 @@ write failed: Input/output error
{"return": ""}
{"return": {}}
{"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP}, "event": "SHUTDOWN", "data": {"guest": false}}
+
+=== Testing incoming inactive corrupted image ===
+
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67108864
+QMP_VERSION
+{"return": {}}
+qcow2: Image is corrupt: L2 table offset 0x2a2a2a00 unaligned (L1 index: 0); further non-fatal corruption events will be suppressed
+{"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP}, "event": "BLOCK_IMAGE_CORRUPTED", "data": {"device": "", "msg": "L2 table offset 0x2a2a2a00 unaligned (L1 index: 0)", "node-name": "drive", "fatal": false}}
+read failed: Input/output error
+{"return": ""}
+{"return": {}}
+{"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP}, "event": "SHUTDOWN", "data": {"guest": false}}
+
+ corrupt: false
*** done
--
2.17.1
^ permalink raw reply related
* [PATCH] tpm: add support for partial reads
From: Jarkko Sakkinen @ 2018-07-23 20:19 UTC (permalink / raw)
To: linux-security-module
In-Reply-To: <153201555276.20155.1352499992826895966.stgit@tstruk-mobl1.jf.intel.com>
On Thu, Jul 19, 2018 at 08:52:32AM -0700, Tadeusz Struk wrote:
> Currently to read a response from the TPM device an application needs
> provide "big enough" buffer for the whole response and read it in one go.
> The application doesn't know how big the response it beforehand so it
> always needs to maintain a 4K buffer and read the max (4K).
> In case if the user of the TSS library doesn't provide big enough buffer
> the TCTI spec says that the library should set the required size and return
> TSS2_TCTI_RC_INSUFFICIENT_BUFFER error code so that the application could
> allocate a bigger buffer and call receive again.
> To make it possible in the TSS library this requires being able to do
> partial reads from the driver.
> The library would read the header first to get the actual size of the
> response from the header and then read the rest of the response.
> This patch adds support for partial reads.
>
> The usecase is implemented in this TSS commit:
> https://github.com/tpm2-software/tpm2-tss/commit/ce982f67a67dc08e24683d30b05800648d8a264c
>
> Signed-off-by: Tadeusz Struk <tadeusz.struk@intel.com>
For non-blocking operation I see the benefit because it does not break
the ABI and it really simplifies threading in the user space.
In this case I do not have any major evidence of any major benefit *and*
the change breaks the ABI.
Linux does not *have* to implement in kernel level every tidbit of the
TCG spec but it *can* provide support in places where it makes sense
and things do not break.
/Jarkko
--
To unsubscribe from this list: send the line "unsubscribe linux-security-module" in
the body of a message to majordomo at vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply
* [Qemu-devel] [PATCH 39/99] qcow2: Do not mark inactive images corrupt
From: Michael Roth @ 2018-07-23 20:16 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-stable, Max Reitz
In-Reply-To: <20180723201748.25573-1-mdroth@linux.vnet.ibm.com>
From: Max Reitz <mreitz@redhat.com>
When signaling a corruption on a read-only image, qcow2 already makes
fatal events non-fatal (i.e., they will not result in the image being
closed, and the image header's corrupt flag will not be set). This is
necessary because we cannot set the corrupt flag on read-only images,
and it is possible because further corruption of read-only images is
impossible.
Inactive images are effectively read-only, too, so we should do the same
for them. bdrv_is_writable() can tell us whether an image can actually
be written to, so use its result instead of !bs->read_only.
(Otherwise, the assert(!(bs->open_flags & BDRV_O_INACTIVE)) in
bdrv_co_pwritev() will fail, crashing qemu.)
Cc: qemu-stable@nongnu.org
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20180606193702.7113-3-mreitz@redhat.com
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
(cherry picked from commit ddf3b47ef4b5ed0bf6558d4c2c8ae130b8d8a580)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
---
block/qcow2.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/block/qcow2.c b/block/qcow2.c
index ef68772aca..a2b2e5d4cd 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -4386,7 +4386,7 @@ void qcow2_signal_corruption(BlockDriverState *bs, bool fatal, int64_t offset,
char *message;
va_list ap;
- fatal = fatal && !bs->read_only;
+ fatal = fatal && bdrv_is_writable(bs);
if (s->signaled_corruption &&
(!fatal || (s->incompatible_features & QCOW2_INCOMPAT_CORRUPT)))
--
2.17.1
^ permalink raw reply related
* [Qemu-devel] [PATCH 37/99] arm_gicv3_kvm: kvm_dist_get/put: skip the registers banked by GICR
From: Michael Roth @ 2018-07-23 20:16 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-stable, Shannon Zhao, Peter Maydell
In-Reply-To: <20180723201748.25573-1-mdroth@linux.vnet.ibm.com>
From: Shannon Zhao <zhaoshenglong@huawei.com>
While we skip the GIC_INTERNAL irqs, we don't change the register offset
accordingly. This will overlap the GICR registers value and leave the
last GIC_INTERNAL irq's registers out of update.
Fix this by skipping the registers banked by GICR.
Also for migration compatibility if the migration source (old version
qemu) doesn't send gicd_no_migration_shift_bug = 1 to destination, then
we shift the data of PPI to get the right data for SPI.
Fixes: 367b9f527becdd20ddf116e17a3c0c2bbc486920
Cc: qemu-stable@nongnu.org
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Shannon Zhao <zhaoshenglong@huawei.com>
Message-id: 1527816987-16108-1-git-send-email-zhaoshenglong@huawei.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
(cherry picked from commit 910e204841954b95c051b2ee49ab0f5c735ff93c)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
---
hw/intc/arm_gicv3_common.c | 79 ++++++++++++++++++++++++++++++
hw/intc/arm_gicv3_kvm.c | 38 ++++++++++++++
include/hw/intc/arm_gicv3_common.h | 1 +
3 files changed, 118 insertions(+)
diff --git a/hw/intc/arm_gicv3_common.c b/hw/intc/arm_gicv3_common.c
index 7b54d52376..864b7c6515 100644
--- a/hw/intc/arm_gicv3_common.c
+++ b/hw/intc/arm_gicv3_common.c
@@ -27,6 +27,7 @@
#include "hw/intc/arm_gicv3_common.h"
#include "gicv3_internal.h"
#include "hw/arm/linux-boot-if.h"
+#include "sysemu/kvm.h"
static int gicv3_pre_save(void *opaque)
{
@@ -141,6 +142,79 @@ static const VMStateDescription vmstate_gicv3_cpu = {
}
};
+static int gicv3_gicd_no_migration_shift_bug_pre_load(void *opaque)
+{
+ GICv3State *cs = opaque;
+
+ /*
+ * The gicd_no_migration_shift_bug flag is used for migration compatibility
+ * for old version QEMU which may have the GICD bmp shift bug under KVM mode.
+ * Strictly, what we want to know is whether the migration source is using
+ * KVM. Since we don't have any way to determine that, we look at whether the
+ * destination is using KVM; this is close enough because for the older QEMU
+ * versions with this bug KVM -> TCG migration didn't work anyway. If the
+ * source is a newer QEMU without this bug it will transmit the migration
+ * subsection which sets the flag to true; otherwise it will remain set to
+ * the value we select here.
+ */
+ if (kvm_enabled()) {
+ cs->gicd_no_migration_shift_bug = false;
+ }
+
+ return 0;
+}
+
+static int gicv3_gicd_no_migration_shift_bug_post_load(void *opaque,
+ int version_id)
+{
+ GICv3State *cs = opaque;
+
+ if (cs->gicd_no_migration_shift_bug) {
+ return 0;
+ }
+
+ /* Older versions of QEMU had a bug in the handling of state save/restore
+ * to the KVM GICv3: they got the offset in the bitmap arrays wrong,
+ * so that instead of the data for external interrupts 32 and up
+ * starting at bit position 32 in the bitmap, it started at bit
+ * position 64. If we're receiving data from a QEMU with that bug,
+ * we must move the data down into the right place.
+ */
+ memmove(cs->group, (uint8_t *)cs->group + GIC_INTERNAL / 8,
+ sizeof(cs->group) - GIC_INTERNAL / 8);
+ memmove(cs->grpmod, (uint8_t *)cs->grpmod + GIC_INTERNAL / 8,
+ sizeof(cs->grpmod) - GIC_INTERNAL / 8);
+ memmove(cs->enabled, (uint8_t *)cs->enabled + GIC_INTERNAL / 8,
+ sizeof(cs->enabled) - GIC_INTERNAL / 8);
+ memmove(cs->pending, (uint8_t *)cs->pending + GIC_INTERNAL / 8,
+ sizeof(cs->pending) - GIC_INTERNAL / 8);
+ memmove(cs->active, (uint8_t *)cs->active + GIC_INTERNAL / 8,
+ sizeof(cs->active) - GIC_INTERNAL / 8);
+ memmove(cs->edge_trigger, (uint8_t *)cs->edge_trigger + GIC_INTERNAL / 8,
+ sizeof(cs->edge_trigger) - GIC_INTERNAL / 8);
+
+ /*
+ * While this new version QEMU doesn't have this kind of bug as we fix it,
+ * so it needs to set the flag to true to indicate that and it's necessary
+ * for next migration to work from this new version QEMU.
+ */
+ cs->gicd_no_migration_shift_bug = true;
+
+ return 0;
+}
+
+const VMStateDescription vmstate_gicv3_gicd_no_migration_shift_bug = {
+ .name = "arm_gicv3/gicd_no_migration_shift_bug",
+ .version_id = 1,
+ .minimum_version_id = 1,
+ .pre_load = gicv3_gicd_no_migration_shift_bug_pre_load,
+ .post_load = gicv3_gicd_no_migration_shift_bug_post_load,
+ .fields = (VMStateField[]) {
+ VMSTATE_BOOL(gicd_no_migration_shift_bug, GICv3State),
+ VMSTATE_END_OF_LIST()
+ }
+};
+
static const VMStateDescription vmstate_gicv3 = {
.name = "arm_gicv3",
.version_id = 1,
@@ -165,6 +239,10 @@ static const VMStateDescription vmstate_gicv3 = {
VMSTATE_STRUCT_VARRAY_POINTER_UINT32(cpu, GICv3State, num_cpu,
vmstate_gicv3_cpu, GICv3CPUState),
VMSTATE_END_OF_LIST()
+ },
+ .subsections = (const VMStateDescription * []) {
+ &vmstate_gicv3_gicd_no_migration_shift_bug,
+ NULL
}
};
@@ -364,6 +442,7 @@ static void arm_gicv3_common_reset(DeviceState *dev)
gicv3_gicd_group_set(s, i);
}
}
+ s->gicd_no_migration_shift_bug = true;
}
static void arm_gic_common_linux_init(ARMLinuxBootIf *obj,
diff --git a/hw/intc/arm_gicv3_kvm.c b/hw/intc/arm_gicv3_kvm.c
index 3536795501..81cbd16817 100644
--- a/hw/intc/arm_gicv3_kvm.c
+++ b/hw/intc/arm_gicv3_kvm.c
@@ -164,6 +164,14 @@ static void kvm_dist_get_edge_trigger(GICv3State *s, uint32_t offset,
uint32_t reg;
int irq;
+ /* For the KVM GICv3, affinity routing is always enabled, and the first 2
+ * GICD_ICFGR<n> registers are always RAZ/WI. The corresponding
+ * functionality is replaced by GICR_ICFGR<n>. It doesn't need to sync
+ * them. So it should increase the offset to skip GIC_INTERNAL irqs.
+ * This matches the for_each_dist_irq_reg() macro which also skips the
+ * first GIC_INTERNAL irqs.
+ */
+ offset += (GIC_INTERNAL * 2) / 8;
for_each_dist_irq_reg(irq, s->num_irq, 2) {
kvm_gicd_access(s, offset, ®, false);
reg = half_unshuffle32(reg >> 1);
@@ -181,6 +189,14 @@ static void kvm_dist_put_edge_trigger(GICv3State *s, uint32_t offset,
uint32_t reg;
int irq;
+ /* For the KVM GICv3, affinity routing is always enabled, and the first 2
+ * GICD_ICFGR<n> registers are always RAZ/WI. The corresponding
+ * functionality is replaced by GICR_ICFGR<n>. It doesn't need to sync
+ * them. So it should increase the offset to skip GIC_INTERNAL irqs.
+ * This matches the for_each_dist_irq_reg() macro which also skips the
+ * first GIC_INTERNAL irqs.
+ */
+ offset += (GIC_INTERNAL * 2) / 8;
for_each_dist_irq_reg(irq, s->num_irq, 2) {
reg = *gic_bmp_ptr32(bmp, irq);
if (irq % 32 != 0) {
@@ -222,6 +238,15 @@ static void kvm_dist_getbmp(GICv3State *s, uint32_t offset, uint32_t *bmp)
uint32_t reg;
int irq;
+ /* For the KVM GICv3, affinity routing is always enabled, and the
+ * GICD_IGROUPR0/GICD_IGRPMODR0/GICD_ISENABLER0/GICD_ISPENDR0/
+ * GICD_ISACTIVER0 registers are always RAZ/WI. The corresponding
+ * functionality is replaced by the GICR registers. It doesn't need to sync
+ * them. So it should increase the offset to skip GIC_INTERNAL irqs.
+ * This matches the for_each_dist_irq_reg() macro which also skips the
+ * first GIC_INTERNAL irqs.
+ */
+ offset += (GIC_INTERNAL * 1) / 8;
for_each_dist_irq_reg(irq, s->num_irq, 1) {
kvm_gicd_access(s, offset, ®, false);
*gic_bmp_ptr32(bmp, irq) = reg;
@@ -235,6 +260,19 @@ static void kvm_dist_putbmp(GICv3State *s, uint32_t offset,
uint32_t reg;
int irq;
+ /* For the KVM GICv3, affinity routing is always enabled, and the
+ * GICD_IGROUPR0/GICD_IGRPMODR0/GICD_ISENABLER0/GICD_ISPENDR0/
+ * GICD_ISACTIVER0 registers are always RAZ/WI. The corresponding
+ * functionality is replaced by the GICR registers. It doesn't need to sync
+ * them. So it should increase the offset and clroffset to skip GIC_INTERNAL
+ * irqs. This matches the for_each_dist_irq_reg() macro which also skips the
+ * first GIC_INTERNAL irqs.
+ */
+ offset += (GIC_INTERNAL * 1) / 8;
+ if (clroffset != 0) {
+ clroffset += (GIC_INTERNAL * 1) / 8;
+ }
+
for_each_dist_irq_reg(irq, s->num_irq, 1) {
/* If this bitmap is a set/clear register pair, first write to the
* clear-reg to clear all bits before using the set-reg to write
diff --git a/include/hw/intc/arm_gicv3_common.h b/include/hw/intc/arm_gicv3_common.h
index bccdfe17c6..d75b49d558 100644
--- a/include/hw/intc/arm_gicv3_common.h
+++ b/include/hw/intc/arm_gicv3_common.h
@@ -217,6 +217,7 @@ struct GICv3State {
uint32_t revision;
bool security_extn;
bool irq_reset_nonsecure;
+ bool gicd_no_migration_shift_bug;
int dev_fd; /* kvm device fd if backed by kvm vgic support */
Error *migration_blocker;
--
2.17.1
^ permalink raw reply related
* [Qemu-devel] [PATCH 38/99] block: Make bdrv_is_writable() public
From: Michael Roth @ 2018-07-23 20:16 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-stable, Max Reitz
In-Reply-To: <20180723201748.25573-1-mdroth@linux.vnet.ibm.com>
From: Max Reitz <mreitz@redhat.com>
This is a useful function for the whole block layer, so make it public.
At the same time, users outside of block.c probably do not need to make
use of the reopen functionality, so rename the current function to
bdrv_is_writable_after_reopen() create a new bdrv_is_writable() function
that just passes NULL to it for the reopen queue.
Cc: qemu-stable@nongnu.org
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20180606193702.7113-2-mreitz@redhat.com
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
(cherry picked from commit cc022140972f8b6ac3973c12ccf9dd6b1d2fd200)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
---
block.c | 17 ++++++++++++++---
include/block/block.h | 1 +
2 files changed, 15 insertions(+), 3 deletions(-)
diff --git a/block.c b/block.c
index a2caadf0a0..9af22179ce 100644
--- a/block.c
+++ b/block.c
@@ -1620,13 +1620,24 @@ static int bdrv_reopen_get_flags(BlockReopenQueue *q, BlockDriverState *bs)
/* Returns whether the image file can be written to after the reopen queue @q
* has been successfully applied, or right now if @q is NULL. */
-static bool bdrv_is_writable(BlockDriverState *bs, BlockReopenQueue *q)
+static bool bdrv_is_writable_after_reopen(BlockDriverState *bs,
+ BlockReopenQueue *q)
{
int flags = bdrv_reopen_get_flags(q, bs);
return (flags & (BDRV_O_RDWR | BDRV_O_INACTIVE)) == BDRV_O_RDWR;
}
+/*
+ * Return whether the BDS can be written to. This is not necessarily
+ * the same as !bdrv_is_read_only(bs), as inactivated images may not
+ * be written to but do not count as read-only images.
+ */
+bool bdrv_is_writable(BlockDriverState *bs)
+{
+ return bdrv_is_writable_after_reopen(bs, NULL);
+}
+
static void bdrv_child_perm(BlockDriverState *bs, BlockDriverState *child_bs,
BdrvChild *c, const BdrvChildRole *role,
BlockReopenQueue *reopen_queue,
@@ -1664,7 +1675,7 @@ static int bdrv_check_perm(BlockDriverState *bs, BlockReopenQueue *q,
/* Write permissions never work with read-only images */
if ((cumulative_perms & (BLK_PERM_WRITE | BLK_PERM_WRITE_UNCHANGED)) &&
- !bdrv_is_writable(bs, q))
+ !bdrv_is_writable_after_reopen(bs, q))
{
error_setg(errp, "Block node is read-only");
return -EPERM;
@@ -1956,7 +1967,7 @@ void bdrv_format_default_perms(BlockDriverState *bs, BdrvChild *c,
&perm, &shared);
/* Format drivers may touch metadata even if the guest doesn't write */
- if (bdrv_is_writable(bs, reopen_queue)) {
+ if (bdrv_is_writable_after_reopen(bs, reopen_queue)) {
perm |= BLK_PERM_WRITE | BLK_PERM_RESIZE;
}
diff --git a/include/block/block.h b/include/block/block.h
index cdec3639a3..68a667a742 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -400,6 +400,7 @@ bool bdrv_is_read_only(BlockDriverState *bs);
int bdrv_can_set_read_only(BlockDriverState *bs, bool read_only,
bool ignore_allow_rdw, Error **errp);
int bdrv_set_read_only(BlockDriverState *bs, bool read_only, Error **errp);
+bool bdrv_is_writable(BlockDriverState *bs);
bool bdrv_is_sg(BlockDriverState *bs);
bool bdrv_is_inserted(BlockDriverState *bs);
void bdrv_lock_medium(BlockDriverState *bs, bool locked);
--
2.17.1
^ permalink raw reply related
* [Qemu-devel] [PATCH 35/99] Fix libusb-1.0.22 deprecated libusb_set_debug with libusb_set_option
From: Michael Roth @ 2018-07-23 20:16 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-stable, John Thomson, Gerd Hoffmann
In-Reply-To: <20180723201748.25573-1-mdroth@linux.vnet.ibm.com>
From: John Thomson <git@johnthomson.fastmail.com.au>
libusb-1.0.22 marked libusb_set_debug deprecated
it is replaced with
libusb_set_option(libusb_context, LIBUSB_OPTION_LOG_LEVEL, libusb_log_level);
details here: https://github.com/libusb/libusb/commit/539f22e2fd916558d11ab9a66f10f461c5593168
Warning here:
CC hw/usb/host-libusb.o
/builds/xen/src/qemu-xen/hw/usb/host-libusb.c: In function 'usb_host_init':
/builds/xen/src/qemu-xen/hw/usb/host-libusb.c:250:5: error: 'libusb_set_debug' is deprecated: Use libusb_set_option instead [-Werror=deprecated-declarations]
libusb_set_debug(ctx, loglevel);
^~~~~~~~~~~~~~~~
In file included from /builds/xen/src/qemu-xen/hw/usb/host-libusb.c:40:0:
/usr/include/libusb-1.0/libusb.h:1300:18: note: declared here
void LIBUSB_CALL libusb_set_debug(libusb_context *ctx, int level);
^~~~~~~~~~~~~~~~
cc1: all warnings being treated as errors
make: *** [/builds/xen/src/qemu-xen/rules.mak:66: hw/usb/host-libusb.o] Error 1
make: Leaving directory '/builds/xen/src/xen/tools/qemu-xen-build'
Signed-off-by: John Thomson <git@johnthomson.fastmail.com.au>
Message-id: 20180405132046.4968-1-git@johnthomson.fastmail.com.au
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
(cherry picked from commit 9d8fa0df49af16a208fa961c2968fba4daffcc07)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
---
hw/usb/host-libusb.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/hw/usb/host-libusb.c b/hw/usb/host-libusb.c
index 1b0be071cc..dc0a8fe295 100644
--- a/hw/usb/host-libusb.c
+++ b/hw/usb/host-libusb.c
@@ -247,7 +247,11 @@ static int usb_host_init(void)
if (rc != 0) {
return -1;
}
+#if LIBUSB_API_VERSION >= 0x01000106
+ libusb_set_option(ctx, LIBUSB_OPTION_LOG_LEVEL, loglevel);
+#else
libusb_set_debug(ctx, loglevel);
+#endif
#ifdef CONFIG_WIN32
/* FIXME: add support for Windows. */
#else
--
2.17.1
^ permalink raw reply related
* [Qemu-devel] [PATCH 36/99] ahci: fix PxCI register race
From: Michael Roth @ 2018-07-23 20:16 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-stable, John Snow
In-Reply-To: <20180723201748.25573-1-mdroth@linux.vnet.ibm.com>
From: John Snow <jsnow@redhat.com>
Fixes: https://bugs.launchpad.net/qemu/+bug/1769189
AHCI presently signals completion prior to the PxCI register being
cleared to indicate completion. If a guest driver attempts to issue
a new command in its IRQ handler, it might be surprised to learn there
is still a command pending.
In the case of Windows 10's boot driver, it will actually poll the IRQ
register hoping to find out when the command is done running -- which
will never happen, as there isn't a command running.
Fix this: clear PxCI in ahci_cmd_done and not in the asynchronous BH.
Because it now runs synchronously, we don't need to check if the command
is actually done by spying on the ATA registers. We know it's done.
CC: qemu-stable <qemu-stable@nongnu.org>
Reported-by: François Guerraz <kubrick@fgv6.net>
Tested-by: Bruce Rogers <brogers@suse.com>
Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Message-id: 20180531004323.4611-3-jsnow@redhat.com
Signed-off-by: John Snow <jsnow@redhat.com>
(cherry picked from commit 5694c7eacce6b263ad7497cc1bb76aad746cfd4e)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
---
hw/ide/ahci.c | 13 ++++++-------
1 file changed, 6 insertions(+), 7 deletions(-)
diff --git a/hw/ide/ahci.c b/hw/ide/ahci.c
index e22d7be05f..18b9a9c18b 100644
--- a/hw/ide/ahci.c
+++ b/hw/ide/ahci.c
@@ -532,13 +532,6 @@ static void ahci_check_cmd_bh(void *opaque)
qemu_bh_delete(ad->check_bh);
ad->check_bh = NULL;
- if ((ad->busy_slot != -1) &&
- !(ad->port.ifs[0].status & (BUSY_STAT|DRQ_STAT))) {
- /* no longer busy */
- ad->port_regs.cmd_issue &= ~(1 << ad->busy_slot);
- ad->busy_slot = -1;
- }
-
check_cmd(ad->hba, ad->port_no);
}
@@ -1425,6 +1418,12 @@ static void ahci_cmd_done(IDEDMA *dma)
trace_ahci_cmd_done(ad->hba, ad->port_no);
+ /* no longer busy */
+ if (ad->busy_slot != -1) {
+ ad->port_regs.cmd_issue &= ~(1 << ad->busy_slot);
+ ad->busy_slot = -1;
+ }
+
/* update d2h status */
ahci_write_fis_d2h(ad);
--
2.17.1
^ permalink raw reply related
* [Qemu-devel] [PATCH 33/99] intel-iommu: rework the page walk logic
From: Michael Roth @ 2018-07-23 20:16 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-stable, Peter Xu, Michael S . Tsirkin
In-Reply-To: <20180723201748.25573-1-mdroth@linux.vnet.ibm.com>
From: Peter Xu <peterx@redhat.com>
This patch fixes a potential small window that the DMA page table might
be incomplete or invalid when the guest sends domain/context
invalidations to a device. This can cause random DMA errors for
assigned devices.
This is a major change to the VT-d shadow page walking logic. It
includes but is not limited to:
- For each VTDAddressSpace, now we maintain what IOVA ranges we have
mapped and what we have not. With that information, now we only send
MAP or UNMAP when necessary. Say, we don't send MAP notifies if we
know we have already mapped the range, meanwhile we don't send UNMAP
notifies if we know we never mapped the range at all.
- Introduce vtd_sync_shadow_page_table[_range] APIs so that we can call
in any places to resync the shadow page table for a device.
- When we receive domain/context invalidation, we should not really run
the replay logic, instead we use the new sync shadow page table API to
resync the whole shadow page table without unmapping the whole
region. After this change, we'll only do the page walk once for each
domain invalidations (before this, it can be multiple, depending on
number of notifiers per address space).
While at it, the page walking logic is also refactored to be simpler.
CC: QEMU Stable <qemu-stable@nongnu.org>
Reported-by: Jintack Lim <jintack@cs.columbia.edu>
Tested-by: Jintack Lim <jintack@cs.columbia.edu>
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit 63b88968f139b6a77f2f81e6f1eedf70c0170a85)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
---
hw/i386/intel_iommu.c | 213 +++++++++++++++++++++++++---------
hw/i386/trace-events | 3 +-
include/hw/i386/intel_iommu.h | 2 +
3 files changed, 159 insertions(+), 59 deletions(-)
diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 61bb3d31e7..b5a09b7908 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -769,10 +769,77 @@ typedef struct {
static int vtd_page_walk_one(IOMMUTLBEntry *entry, vtd_page_walk_info *info)
{
+ VTDAddressSpace *as = info->as;
vtd_page_walk_hook hook_fn = info->hook_fn;
void *private = info->private;
+ DMAMap target = {
+ .iova = entry->iova,
+ .size = entry->addr_mask,
+ .translated_addr = entry->translated_addr,
+ .perm = entry->perm,
+ };
+ DMAMap *mapped = iova_tree_find(as->iova_tree, &target);
+
+ if (entry->perm == IOMMU_NONE && !info->notify_unmap) {
+ trace_vtd_page_walk_one_skip_unmap(entry->iova, entry->addr_mask);
+ return 0;
+ }
assert(hook_fn);
+
+ /* Update local IOVA mapped ranges */
+ if (entry->perm) {
+ if (mapped) {
+ /* If it's exactly the same translation, skip */
+ if (!memcmp(mapped, &target, sizeof(target))) {
+ trace_vtd_page_walk_one_skip_map(entry->iova, entry->addr_mask,
+ entry->translated_addr);
+ return 0;
+ } else {
+ /*
+ * Translation changed. Normally this should not
+ * happen, but it can happen when with buggy guest
+ * OSes. Note that there will be a small window that
+ * we don't have map at all. But that's the best
+ * effort we can do. The ideal way to emulate this is
+ * atomically modify the PTE to follow what has
+ * changed, but we can't. One example is that vfio
+ * driver only has VFIO_IOMMU_[UN]MAP_DMA but no
+ * interface to modify a mapping (meanwhile it seems
+ * meaningless to even provide one). Anyway, let's
+ * mark this as a TODO in case one day we'll have
+ * a better solution.
+ */
+ IOMMUAccessFlags cache_perm = entry->perm;
+ int ret;
+
+ /* Emulate an UNMAP */
+ entry->perm = IOMMU_NONE;
+ trace_vtd_page_walk_one(info->domain_id,
+ entry->iova,
+ entry->translated_addr,
+ entry->addr_mask,
+ entry->perm);
+ ret = hook_fn(entry, private);
+ if (ret) {
+ return ret;
+ }
+ /* Drop any existing mapping */
+ iova_tree_remove(as->iova_tree, &target);
+ /* Recover the correct permission */
+ entry->perm = cache_perm;
+ }
+ }
+ iova_tree_insert(as->iova_tree, &target);
+ } else {
+ if (!mapped) {
+ /* Skip since we didn't map this range at all */
+ trace_vtd_page_walk_one_skip_unmap(entry->iova, entry->addr_mask);
+ return 0;
+ }
+ iova_tree_remove(as->iova_tree, &target);
+ }
+
trace_vtd_page_walk_one(info->domain_id, entry->iova,
entry->translated_addr, entry->addr_mask,
entry->perm);
@@ -834,45 +901,34 @@ static int vtd_page_walk_level(dma_addr_t addr, uint64_t start,
*/
entry_valid = read_cur | write_cur;
- entry.target_as = &address_space_memory;
- entry.iova = iova & subpage_mask;
- entry.perm = IOMMU_ACCESS_FLAG(read_cur, write_cur);
- entry.addr_mask = ~subpage_mask;
-
- if (vtd_is_last_slpte(slpte, level)) {
- /* NOTE: this is only meaningful if entry_valid == true */
- entry.translated_addr = vtd_get_slpte_addr(slpte, info->aw);
- if (!entry_valid && !info->notify_unmap) {
- trace_vtd_page_walk_skip_perm(iova, iova_next);
- goto next;
- }
- ret = vtd_page_walk_one(&entry, info);
- if (ret < 0) {
- return ret;
- }
- } else {
- if (!entry_valid) {
- if (info->notify_unmap) {
- /*
- * The whole entry is invalid; unmap it all.
- * Translated address is meaningless, zero it.
- */
- entry.translated_addr = 0x0;
- ret = vtd_page_walk_one(&entry, info);
- if (ret < 0) {
- return ret;
- }
- } else {
- trace_vtd_page_walk_skip_perm(iova, iova_next);
- }
- goto next;
- }
+ if (!vtd_is_last_slpte(slpte, level) && entry_valid) {
+ /*
+ * This is a valid PDE (or even bigger than PDE). We need
+ * to walk one further level.
+ */
ret = vtd_page_walk_level(vtd_get_slpte_addr(slpte, info->aw),
iova, MIN(iova_next, end), level - 1,
read_cur, write_cur, info);
- if (ret < 0) {
- return ret;
- }
+ } else {
+ /*
+ * This means we are either:
+ *
+ * (1) the real page entry (either 4K page, or huge page)
+ * (2) the whole range is invalid
+ *
+ * In either case, we send an IOTLB notification down.
+ */
+ entry.target_as = &address_space_memory;
+ entry.iova = iova & subpage_mask;
+ entry.perm = IOMMU_ACCESS_FLAG(read_cur, write_cur);
+ entry.addr_mask = ~subpage_mask;
+ /* NOTE: this is only meaningful if entry_valid == true */
+ entry.translated_addr = vtd_get_slpte_addr(slpte, info->aw);
+ ret = vtd_page_walk_one(&entry, info);
+ }
+
+ if (ret < 0) {
+ return ret;
}
next:
@@ -964,6 +1020,58 @@ static int vtd_dev_to_context_entry(IntelIOMMUState *s, uint8_t bus_num,
return 0;
}
+static int vtd_sync_shadow_page_hook(IOMMUTLBEntry *entry,
+ void *private)
+{
+ memory_region_notify_iommu((IOMMUMemoryRegion *)private, *entry);
+ return 0;
+}
+
+/* If context entry is NULL, we'll try to fetch it on our own. */
+static int vtd_sync_shadow_page_table_range(VTDAddressSpace *vtd_as,
+ VTDContextEntry *ce,
+ hwaddr addr, hwaddr size)
+{
+ IntelIOMMUState *s = vtd_as->iommu_state;
+ vtd_page_walk_info info = {
+ .hook_fn = vtd_sync_shadow_page_hook,
+ .private = (void *)&vtd_as->iommu,
+ .notify_unmap = true,
+ .aw = s->aw_bits,
+ .as = vtd_as,
+ };
+ VTDContextEntry ce_cache;
+ int ret;
+
+ if (ce) {
+ /* If the caller provided context entry, use it */
+ ce_cache = *ce;
+ } else {
+ /* If the caller didn't provide ce, try to fetch */
+ ret = vtd_dev_to_context_entry(s, pci_bus_num(vtd_as->bus),
+ vtd_as->devfn, &ce_cache);
+ if (ret) {
+ /*
+ * This should not really happen, but in case it happens,
+ * we just skip the sync for this time. After all we even
+ * don't have the root table pointer!
+ */
+ trace_vtd_err("Detected invalid context entry when "
+ "trying to sync shadow page table");
+ return 0;
+ }
+ }
+
+ info.domain_id = VTD_CONTEXT_ENTRY_DID(ce_cache.hi);
+
+ return vtd_page_walk(&ce_cache, addr, addr + size, &info);
+}
+
+static int vtd_sync_shadow_page_table(VTDAddressSpace *vtd_as)
+{
+ return vtd_sync_shadow_page_table_range(vtd_as, NULL, 0, UINT64_MAX);
+}
+
/*
* Fetch translation type for specific device. Returns <0 if error
* happens, otherwise return the shifted type to check against
@@ -1296,7 +1404,7 @@ static void vtd_iommu_replay_all(IntelIOMMUState *s)
VTDAddressSpace *vtd_as;
QLIST_FOREACH(vtd_as, &s->vtd_as_with_notifiers, next) {
- memory_region_iommu_replay_all(&vtd_as->iommu);
+ vtd_sync_shadow_page_table(vtd_as);
}
}
@@ -1371,14 +1479,13 @@ static void vtd_context_device_invalidate(IntelIOMMUState *s,
vtd_switch_address_space(vtd_as);
/*
* So a device is moving out of (or moving into) a
- * domain, a replay() suites here to notify all the
- * IOMMU_NOTIFIER_MAP registers about this change.
+ * domain, resync the shadow page table.
* This won't bring bad even if we have no such
* notifier registered - the IOMMU notification
* framework will skip MAP notifications if that
* happened.
*/
- memory_region_iommu_replay_all(&vtd_as->iommu);
+ vtd_sync_shadow_page_table(vtd_as);
}
}
}
@@ -1436,18 +1543,11 @@ static void vtd_iotlb_domain_invalidate(IntelIOMMUState *s, uint16_t domain_id)
if (!vtd_dev_to_context_entry(s, pci_bus_num(vtd_as->bus),
vtd_as->devfn, &ce) &&
domain_id == VTD_CONTEXT_ENTRY_DID(ce.hi)) {
- memory_region_iommu_replay_all(&vtd_as->iommu);
+ vtd_sync_shadow_page_table(vtd_as);
}
}
}
-static int vtd_page_invalidate_notify_hook(IOMMUTLBEntry *entry,
- void *private)
-{
- memory_region_notify_iommu((IOMMUMemoryRegion *)private, *entry);
- return 0;
-}
-
static void vtd_iotlb_page_invalidate_notify(IntelIOMMUState *s,
uint16_t domain_id, hwaddr addr,
uint8_t am)
@@ -1462,21 +1562,12 @@ static void vtd_iotlb_page_invalidate_notify(IntelIOMMUState *s,
vtd_as->devfn, &ce);
if (!ret && domain_id == VTD_CONTEXT_ENTRY_DID(ce.hi)) {
if (vtd_as_has_map_notifier(vtd_as)) {
- vtd_page_walk_info info = {
- .hook_fn = vtd_page_invalidate_notify_hook,
- .private = (void *)&vtd_as->iommu,
- .notify_unmap = true,
- .aw = s->aw_bits,
- .as = vtd_as,
- .domain_id = domain_id,
- };
-
/*
* As long as we have MAP notifications registered in
* any of our IOMMU notifiers, we need to sync the
* shadow page table.
*/
- vtd_page_walk(&ce, addr, addr + size, &info);
+ vtd_sync_shadow_page_table_range(vtd_as, &ce, addr, size);
} else {
/*
* For UNMAP-only notifiers, we don't need to walk the
@@ -2806,6 +2897,7 @@ VTDAddressSpace *vtd_find_add_as(IntelIOMMUState *s, PCIBus *bus, int devfn)
vtd_dev_as->devfn = (uint8_t)devfn;
vtd_dev_as->iommu_state = s;
vtd_dev_as->context_cache_entry.context_cache_gen = 0;
+ vtd_dev_as->iova_tree = iova_tree_new();
/*
* Memory region relationships looks like (Address range shows
@@ -2858,6 +2950,7 @@ static void vtd_address_space_unmap(VTDAddressSpace *as, IOMMUNotifier *n)
hwaddr start = n->start;
hwaddr end = n->end;
IntelIOMMUState *s = as->iommu_state;
+ DMAMap map;
/*
* Note: all the codes in this function has a assumption that IOVA
@@ -2902,6 +2995,10 @@ static void vtd_address_space_unmap(VTDAddressSpace *as, IOMMUNotifier *n)
VTD_PCI_FUNC(as->devfn),
entry.iova, size);
+ map.iova = entry.iova;
+ map.size = entry.addr_mask;
+ iova_tree_remove(as->iova_tree, &map);
+
memory_region_notify_one(n, &entry);
}
diff --git a/hw/i386/trace-events b/hw/i386/trace-events
index ca23ba9fad..e14d06ec83 100644
--- a/hw/i386/trace-events
+++ b/hw/i386/trace-events
@@ -40,8 +40,9 @@ vtd_replay_ce_valid(uint8_t bus, uint8_t dev, uint8_t fn, uint16_t domain, uint6
vtd_replay_ce_invalid(uint8_t bus, uint8_t dev, uint8_t fn) "replay invalid context device %02"PRIx8":%02"PRIx8".%02"PRIx8
vtd_page_walk_level(uint64_t addr, uint32_t level, uint64_t start, uint64_t end) "walk (base=0x%"PRIx64", level=%"PRIu32") iova range 0x%"PRIx64" - 0x%"PRIx64
vtd_page_walk_one(uint16_t domain, uint64_t iova, uint64_t gpa, uint64_t mask, int perm) "domain 0x%"PRIu16" iova 0x%"PRIx64" -> gpa 0x%"PRIx64" mask 0x%"PRIx64" perm %d"
+vtd_page_walk_one_skip_map(uint64_t iova, uint64_t mask, uint64_t translated) "iova 0x%"PRIx64" mask 0x%"PRIx64" translated 0x%"PRIx64
+vtd_page_walk_one_skip_unmap(uint64_t iova, uint64_t mask) "iova 0x%"PRIx64" mask 0x%"PRIx64
vtd_page_walk_skip_read(uint64_t iova, uint64_t next) "Page walk skip iova 0x%"PRIx64" - 0x%"PRIx64" due to unable to read"
-vtd_page_walk_skip_perm(uint64_t iova, uint64_t next) "Page walk skip iova 0x%"PRIx64" - 0x%"PRIx64" due to perm empty"
vtd_page_walk_skip_reserve(uint64_t iova, uint64_t next) "Page walk skip iova 0x%"PRIx64" - 0x%"PRIx64" due to rsrv set"
vtd_switch_address_space(uint8_t bus, uint8_t slot, uint8_t fn, bool on) "Device %02x:%02x.%x switching address space (iommu enabled=%d)"
vtd_as_unmap_whole(uint8_t bus, uint8_t slot, uint8_t fn, uint64_t iova, uint64_t size) "Device %02x:%02x.%x start 0x%"PRIx64" size 0x%"PRIx64
diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h
index 156f35e919..fbfedcb1c0 100644
--- a/include/hw/i386/intel_iommu.h
+++ b/include/hw/i386/intel_iommu.h
@@ -27,6 +27,7 @@
#include "hw/i386/ioapic.h"
#include "hw/pci/msi.h"
#include "hw/sysbus.h"
+#include "qemu/iova-tree.h"
#define TYPE_INTEL_IOMMU_DEVICE "intel-iommu"
#define INTEL_IOMMU_DEVICE(obj) \
@@ -95,6 +96,7 @@ struct VTDAddressSpace {
QLIST_ENTRY(VTDAddressSpace) next;
/* Superset of notifier flags that this address space has */
IOMMUNotifierFlag notifier_flags;
+ IOVATree *iova_tree; /* Traces mapped IOVA ranges */
};
struct VTDBus {
--
2.17.1
^ permalink raw reply related
* [Qemu-devel] [PATCH 34/99] arm_gicv3_kvm: increase clroffset accordingly
From: Michael Roth @ 2018-07-23 20:16 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-stable, Shannon Zhao, Peter Maydell
In-Reply-To: <20180723201748.25573-1-mdroth@linux.vnet.ibm.com>
From: Shannon Zhao <zhaoshenglong@huawei.com>
It forgot to increase clroffset during the loop. So it only clear the
first 4 bytes.
Fixes: 367b9f527becdd20ddf116e17a3c0c2bbc486920
Cc: qemu-stable@nongnu.org
Signed-off-by: Shannon Zhao <zhaoshenglong@huawei.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Message-id: 1527047633-12368-1-git-send-email-zhaoshenglong@huawei.com
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
(cherry picked from commit 34ffacae085914fce54590ea84bae9c6ad95e2a4)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
---
hw/intc/arm_gicv3_kvm.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/hw/intc/arm_gicv3_kvm.c b/hw/intc/arm_gicv3_kvm.c
index ec371772b3..3536795501 100644
--- a/hw/intc/arm_gicv3_kvm.c
+++ b/hw/intc/arm_gicv3_kvm.c
@@ -243,6 +243,7 @@ static void kvm_dist_putbmp(GICv3State *s, uint32_t offset,
if (clroffset != 0) {
reg = 0;
kvm_gicd_access(s, clroffset, ®, true);
+ clroffset += 4;
}
reg = *gic_bmp_ptr32(bmp, irq);
kvm_gicd_access(s, offset, ®, true);
--
2.17.1
^ permalink raw reply related
* [Qemu-devel] [PATCH 32/99] util: implement simple iova tree
From: Michael Roth @ 2018-07-23 20:16 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-stable, Peter Xu, Michael S . Tsirkin
In-Reply-To: <20180723201748.25573-1-mdroth@linux.vnet.ibm.com>
From: Peter Xu <peterx@redhat.com>
Introduce a simplest iova tree implementation based on GTree.
CC: QEMU Stable <qemu-stable@nongnu.org>
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit eecf5eedbdc0fc04f39abcf3afeedfbf21b25ca4)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
---
MAINTAINERS | 6 ++
include/qemu/iova-tree.h | 134 +++++++++++++++++++++++++++++++++++++++
util/Makefile.objs | 1 +
util/iova-tree.c | 114 +++++++++++++++++++++++++++++++++
4 files changed, 255 insertions(+)
create mode 100644 include/qemu/iova-tree.h
create mode 100644 util/iova-tree.c
diff --git a/MAINTAINERS b/MAINTAINERS
index 24b70169bc..ada7c33485 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1781,6 +1781,12 @@ F: include/sysemu/replay.h
F: docs/replay.txt
F: stubs/replay.c
+IOVA Tree
+M: Peter Xu <peterx@redhat.com>
+S: Maintained
+F: include/qemu/iova-tree.h
+F: util/iova-tree.c
+
Usermode Emulation
------------------
Overall
diff --git a/include/qemu/iova-tree.h b/include/qemu/iova-tree.h
new file mode 100644
index 0000000000..b061932097
--- /dev/null
+++ b/include/qemu/iova-tree.h
@@ -0,0 +1,134 @@
+/*
+ * An very simplified iova tree implementation based on GTree.
+ *
+ * Copyright 2018 Red Hat, Inc.
+ *
+ * Authors:
+ * Peter Xu <peterx@redhat.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ */
+#ifndef IOVA_TREE_H
+#define IOVA_TREE_H
+
+/*
+ * Currently the iova tree will only allow to keep ranges
+ * information, and no extra user data is allowed for each element. A
+ * benefit is that we can merge adjacent ranges internally within the
+ * tree. It can save a lot of memory when the ranges are splitted but
+ * mostly continuous.
+ *
+ * Note that current implementation does not provide any thread
+ * protections. Callers of the iova tree should be responsible
+ * for the thread safety issue.
+ */
+
+#include "qemu/osdep.h"
+#include "exec/memory.h"
+#include "exec/hwaddr.h"
+
+#define IOVA_OK (0)
+#define IOVA_ERR_INVALID (-1) /* Invalid parameters */
+#define IOVA_ERR_OVERLAP (-2) /* IOVA range overlapped */
+
+typedef struct IOVATree IOVATree;
+typedef struct DMAMap {
+ hwaddr iova;
+ hwaddr translated_addr;
+ hwaddr size; /* Inclusive */
+ IOMMUAccessFlags perm;
+} QEMU_PACKED DMAMap;
+typedef gboolean (*iova_tree_iterator)(DMAMap *map);
+
+/**
+ * iova_tree_new:
+ *
+ * Create a new iova tree.
+ *
+ * Returns: the tree pointer when succeeded, or NULL if error.
+ */
+IOVATree *iova_tree_new(void);
+
+/**
+ * iova_tree_insert:
+ *
+ * @tree: the iova tree to insert
+ * @map: the mapping to insert
+ *
+ * Insert an iova range to the tree. If there is overlapped
+ * ranges, IOVA_ERR_OVERLAP will be returned.
+ *
+ * Return: 0 if succeeded, or <0 if error.
+ */
+int iova_tree_insert(IOVATree *tree, DMAMap *map);
+
+/**
+ * iova_tree_remove:
+ *
+ * @tree: the iova tree to remove range from
+ * @map: the map range to remove
+ *
+ * Remove mappings from the tree that are covered by the map range
+ * provided. The range does not need to be exactly what has inserted,
+ * all the mappings that are included in the provided range will be
+ * removed from the tree. Here map->translated_addr is meaningless.
+ *
+ * Return: 0 if succeeded, or <0 if error.
+ */
+int iova_tree_remove(IOVATree *tree, DMAMap *map);
+
+/**
+ * iova_tree_find:
+ *
+ * @tree: the iova tree to search from
+ * @map: the mapping to search
+ *
+ * Search for a mapping in the iova tree that overlaps with the
+ * mapping range specified. Only the first found mapping will be
+ * returned.
+ *
+ * Return: DMAMap pointer if found, or NULL if not found. Note that
+ * the returned DMAMap pointer is maintained internally. User should
+ * only read the content but never modify or free the content. Also,
+ * user is responsible to make sure the pointer is valid (say, no
+ * concurrent deletion in progress).
+ */
+DMAMap *iova_tree_find(IOVATree *tree, DMAMap *map);
+
+/**
+ * iova_tree_find_address:
+ *
+ * @tree: the iova tree to search from
+ * @iova: the iova address to find
+ *
+ * Similar to iova_tree_find(), but it tries to find mapping with
+ * range iova=iova & size=0.
+ *
+ * Return: same as iova_tree_find().
+ */
+DMAMap *iova_tree_find_address(IOVATree *tree, hwaddr iova);
+
+/**
+ * iova_tree_foreach:
+ *
+ * @tree: the iova tree to iterate on
+ * @iterator: the interator for the mappings, return true to stop
+ *
+ * Iterate over the iova tree.
+ *
+ * Return: 1 if found any overlap, 0 if not, <0 if error.
+ */
+void iova_tree_foreach(IOVATree *tree, iova_tree_iterator iterator);
+
+/**
+ * iova_tree_destroy:
+ *
+ * @tree: the iova tree to destroy
+ *
+ * Destroy an existing iova tree.
+ *
+ * Return: None.
+ */
+void iova_tree_destroy(IOVATree *tree);
+
+#endif
diff --git a/util/Makefile.objs b/util/Makefile.objs
index 728c3541db..e1c3fed4dc 100644
--- a/util/Makefile.objs
+++ b/util/Makefile.objs
@@ -47,4 +47,5 @@ util-obj-y += qht.o
util-obj-y += range.o
util-obj-y += stats64.o
util-obj-y += systemd.o
+util-obj-y += iova-tree.o
util-obj-$(CONFIG_LINUX) += vfio-helpers.o
diff --git a/util/iova-tree.c b/util/iova-tree.c
new file mode 100644
index 0000000000..2d9cebfc89
--- /dev/null
+++ b/util/iova-tree.c
@@ -0,0 +1,114 @@
+/*
+ * IOVA tree implementation based on GTree.
+ *
+ * Copyright 2018 Red Hat, Inc.
+ *
+ * Authors:
+ * Peter Xu <peterx@redhat.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ */
+
+#include <glib.h>
+#include "qemu/iova-tree.h"
+
+struct IOVATree {
+ GTree *tree;
+};
+
+static int iova_tree_compare(gconstpointer a, gconstpointer b, gpointer data)
+{
+ const DMAMap *m1 = a, *m2 = b;
+
+ if (m1->iova > m2->iova + m2->size) {
+ return 1;
+ }
+
+ if (m1->iova + m1->size < m2->iova) {
+ return -1;
+ }
+
+ /* Overlapped */
+ return 0;
+}
+
+IOVATree *iova_tree_new(void)
+{
+ IOVATree *iova_tree = g_new0(IOVATree, 1);
+
+ /* We don't have values actually, no need to free */
+ iova_tree->tree = g_tree_new_full(iova_tree_compare, NULL, g_free, NULL);
+
+ return iova_tree;
+}
+
+DMAMap *iova_tree_find(IOVATree *tree, DMAMap *map)
+{
+ return g_tree_lookup(tree->tree, map);
+}
+
+DMAMap *iova_tree_find_address(IOVATree *tree, hwaddr iova)
+{
+ DMAMap map = { .iova = iova, .size = 0 };
+
+ return iova_tree_find(tree, &map);
+}
+
+static inline void iova_tree_insert_internal(GTree *gtree, DMAMap *range)
+{
+ /* Key and value are sharing the same range data */
+ g_tree_insert(gtree, range, range);
+}
+
+int iova_tree_insert(IOVATree *tree, DMAMap *map)
+{
+ DMAMap *new;
+
+ if (map->iova + map->size < map->iova || map->perm == IOMMU_NONE) {
+ return IOVA_ERR_INVALID;
+ }
+
+ /* We don't allow to insert range that overlaps with existings */
+ if (iova_tree_find(tree, map)) {
+ return IOVA_ERR_OVERLAP;
+ }
+
+ new = g_new0(DMAMap, 1);
+ memcpy(new, map, sizeof(*new));
+ iova_tree_insert_internal(tree->tree, new);
+
+ return IOVA_OK;
+}
+
+static gboolean iova_tree_traverse(gpointer key, gpointer value,
+ gpointer data)
+{
+ iova_tree_iterator iterator = data;
+ DMAMap *map = key;
+
+ g_assert(key == value);
+
+ return iterator(map);
+}
+
+void iova_tree_foreach(IOVATree *tree, iova_tree_iterator iterator)
+{
+ g_tree_foreach(tree->tree, iova_tree_traverse, iterator);
+}
+
+int iova_tree_remove(IOVATree *tree, DMAMap *map)
+{
+ DMAMap *overlap;
+
+ while ((overlap = iova_tree_find(tree, map))) {
+ g_tree_remove(tree->tree, overlap);
+ }
+
+ return IOVA_OK;
+}
+
+void iova_tree_destroy(IOVATree *tree)
+{
+ g_tree_destroy(tree->tree);
+ g_free(tree);
+}
--
2.17.1
^ permalink raw reply related
* [Qemu-devel] [PATCH 31/99] intel-iommu: trace domain id during page walk
From: Michael Roth @ 2018-07-23 20:16 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-stable, Peter Xu, Michael S . Tsirkin
In-Reply-To: <20180723201748.25573-1-mdroth@linux.vnet.ibm.com>
From: Peter Xu <peterx@redhat.com>
This patch only modifies the trace points.
Previously we were tracing page walk levels. They are redundant since
we have page mask (size) already. Now we trace something much more
useful which is the domain ID of the page walking. That can be very
useful when we trace more than one devices on the same system, so that
we can know which map is for which domain.
CC: QEMU Stable <qemu-stable@nongnu.org>
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit d118c06ebbee2d23ddf873cae4a809311aa61310)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
---
hw/i386/intel_iommu.c | 16 ++++++++++------
hw/i386/trace-events | 2 +-
2 files changed, 11 insertions(+), 7 deletions(-)
diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index a882894f49..61bb3d31e7 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -756,6 +756,7 @@ typedef int (*vtd_page_walk_hook)(IOMMUTLBEntry *entry, void *private);
* @notify_unmap: whether we should notify invalid entries
* @as: VT-d address space of the device
* @aw: maximum address width
+ * @domain: domain ID of the page walk
*/
typedef struct {
VTDAddressSpace *as;
@@ -763,17 +764,18 @@ typedef struct {
void *private;
bool notify_unmap;
uint8_t aw;
+ uint16_t domain_id;
} vtd_page_walk_info;
-static int vtd_page_walk_one(IOMMUTLBEntry *entry, int level,
- vtd_page_walk_info *info)
+static int vtd_page_walk_one(IOMMUTLBEntry *entry, vtd_page_walk_info *info)
{
vtd_page_walk_hook hook_fn = info->hook_fn;
void *private = info->private;
assert(hook_fn);
- trace_vtd_page_walk_one(level, entry->iova, entry->translated_addr,
- entry->addr_mask, entry->perm);
+ trace_vtd_page_walk_one(info->domain_id, entry->iova,
+ entry->translated_addr, entry->addr_mask,
+ entry->perm);
return hook_fn(entry, private);
}
@@ -844,7 +846,7 @@ static int vtd_page_walk_level(dma_addr_t addr, uint64_t start,
trace_vtd_page_walk_skip_perm(iova, iova_next);
goto next;
}
- ret = vtd_page_walk_one(&entry, level, info);
+ ret = vtd_page_walk_one(&entry, info);
if (ret < 0) {
return ret;
}
@@ -856,7 +858,7 @@ static int vtd_page_walk_level(dma_addr_t addr, uint64_t start,
* Translated address is meaningless, zero it.
*/
entry.translated_addr = 0x0;
- ret = vtd_page_walk_one(&entry, level, info);
+ ret = vtd_page_walk_one(&entry, info);
if (ret < 0) {
return ret;
}
@@ -1466,6 +1468,7 @@ static void vtd_iotlb_page_invalidate_notify(IntelIOMMUState *s,
.notify_unmap = true,
.aw = s->aw_bits,
.as = vtd_as,
+ .domain_id = domain_id,
};
/*
@@ -2947,6 +2950,7 @@ static void vtd_iommu_replay(IOMMUMemoryRegion *iommu_mr, IOMMUNotifier *n)
.notify_unmap = false,
.aw = s->aw_bits,
.as = vtd_as,
+ .domain_id = VTD_CONTEXT_ENTRY_DID(ce.hi),
};
vtd_page_walk(&ce, 0, ~0ULL, &info);
diff --git a/hw/i386/trace-events b/hw/i386/trace-events
index 22d44648af..ca23ba9fad 100644
--- a/hw/i386/trace-events
+++ b/hw/i386/trace-events
@@ -39,7 +39,7 @@ vtd_fault_disabled(void) "Fault processing disabled for context entry"
vtd_replay_ce_valid(uint8_t bus, uint8_t dev, uint8_t fn, uint16_t domain, uint64_t hi, uint64_t lo) "replay valid context device %02"PRIx8":%02"PRIx8".%02"PRIx8" domain 0x%"PRIx16" hi 0x%"PRIx64" lo 0x%"PRIx64
vtd_replay_ce_invalid(uint8_t bus, uint8_t dev, uint8_t fn) "replay invalid context device %02"PRIx8":%02"PRIx8".%02"PRIx8
vtd_page_walk_level(uint64_t addr, uint32_t level, uint64_t start, uint64_t end) "walk (base=0x%"PRIx64", level=%"PRIu32") iova range 0x%"PRIx64" - 0x%"PRIx64
-vtd_page_walk_one(uint32_t level, uint64_t iova, uint64_t gpa, uint64_t mask, int perm) "detected page level 0x%"PRIx32" iova 0x%"PRIx64" -> gpa 0x%"PRIx64" mask 0x%"PRIx64" perm %d"
+vtd_page_walk_one(uint16_t domain, uint64_t iova, uint64_t gpa, uint64_t mask, int perm) "domain 0x%"PRIu16" iova 0x%"PRIx64" -> gpa 0x%"PRIx64" mask 0x%"PRIx64" perm %d"
vtd_page_walk_skip_read(uint64_t iova, uint64_t next) "Page walk skip iova 0x%"PRIx64" - 0x%"PRIx64" due to unable to read"
vtd_page_walk_skip_perm(uint64_t iova, uint64_t next) "Page walk skip iova 0x%"PRIx64" - 0x%"PRIx64" due to perm empty"
vtd_page_walk_skip_reserve(uint64_t iova, uint64_t next) "Page walk skip iova 0x%"PRIx64" - 0x%"PRIx64" due to rsrv set"
--
2.17.1
^ permalink raw reply related
* ✗ Fi.CI.CHECKPATCH: warning for drm/i915: Skip repeated calls to i915_gem_set_wedged() (rev4)
From: Patchwork @ 2018-07-23 20:19 UTC (permalink / raw)
To: Chris Wilson; +Cc: intel-gfx
In-Reply-To: <20180723145335.24579-1-chris@chris-wilson.co.uk>
== Series Details ==
Series: drm/i915: Skip repeated calls to i915_gem_set_wedged() (rev4)
URL : https://patchwork.freedesktop.org/series/47067/
State : warning
== Summary ==
$ dim checkpatch origin/drm-tip
a85ab6631205 drm/i915: Skip repeated calls to i915_gem_set_wedged()
-:61: WARNING:MEMORY_BARRIER: memory barrier without comment
#61: FILE: drivers/gpu/drm/i915/i915_gem.c:3389:
+ smp_mb__before_atomic();
total: 0 errors, 1 warnings, 0 checks, 74 lines checked
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply
* [Qemu-devel] [PATCH 29/99] intel-iommu: introduce vtd_page_walk_info
From: Michael Roth @ 2018-07-23 20:16 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-stable, Peter Xu, Michael S . Tsirkin
In-Reply-To: <20180723201748.25573-1-mdroth@linux.vnet.ibm.com>
From: Peter Xu <peterx@redhat.com>
During the recursive page walking of IOVA page tables, some stack
variables are constant variables and never changed during the whole page
walking procedure. Isolate them into a struct so that we don't need to
pass those contants down the stack every time and multiple times.
CC: QEMU Stable <qemu-stable@nongnu.org>
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit fe215b0cbb8c1f4b4af0a64aa5c02042080dd537)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
---
hw/i386/intel_iommu.c | 84 ++++++++++++++++++++++++++-----------------
1 file changed, 52 insertions(+), 32 deletions(-)
diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 38ccc741f9..e247269659 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -748,9 +748,27 @@ static int vtd_iova_to_slpte(VTDContextEntry *ce, uint64_t iova, bool is_write,
typedef int (*vtd_page_walk_hook)(IOMMUTLBEntry *entry, void *private);
+/**
+ * Constant information used during page walking
+ *
+ * @hook_fn: hook func to be called when detected page
+ * @private: private data to be passed into hook func
+ * @notify_unmap: whether we should notify invalid entries
+ * @aw: maximum address width
+ */
+typedef struct {
+ vtd_page_walk_hook hook_fn;
+ void *private;
+ bool notify_unmap;
+ uint8_t aw;
+} vtd_page_walk_info;
+
static int vtd_page_walk_one(IOMMUTLBEntry *entry, int level,
- vtd_page_walk_hook hook_fn, void *private)
+ vtd_page_walk_info *info)
{
+ vtd_page_walk_hook hook_fn = info->hook_fn;
+ void *private = info->private;
+
assert(hook_fn);
trace_vtd_page_walk_one(level, entry->iova, entry->translated_addr,
entry->addr_mask, entry->perm);
@@ -763,17 +781,13 @@ static int vtd_page_walk_one(IOMMUTLBEntry *entry, int level,
* @addr: base GPA addr to start the walk
* @start: IOVA range start address
* @end: IOVA range end address (start <= addr < end)
- * @hook_fn: hook func to be called when detected page
- * @private: private data to be passed into hook func
* @read: whether parent level has read permission
* @write: whether parent level has write permission
- * @notify_unmap: whether we should notify invalid entries
- * @aw: maximum address width
+ * @info: constant information for the page walk
*/
static int vtd_page_walk_level(dma_addr_t addr, uint64_t start,
- uint64_t end, vtd_page_walk_hook hook_fn,
- void *private, uint32_t level, bool read,
- bool write, bool notify_unmap, uint8_t aw)
+ uint64_t end, uint32_t level, bool read,
+ bool write, vtd_page_walk_info *info)
{
bool read_cur, write_cur, entry_valid;
uint32_t offset;
@@ -823,24 +837,24 @@ static int vtd_page_walk_level(dma_addr_t addr, uint64_t start,
if (vtd_is_last_slpte(slpte, level)) {
/* NOTE: this is only meaningful if entry_valid == true */
- entry.translated_addr = vtd_get_slpte_addr(slpte, aw);
- if (!entry_valid && !notify_unmap) {
+ entry.translated_addr = vtd_get_slpte_addr(slpte, info->aw);
+ if (!entry_valid && !info->notify_unmap) {
trace_vtd_page_walk_skip_perm(iova, iova_next);
goto next;
}
- ret = vtd_page_walk_one(&entry, level, hook_fn, private);
+ ret = vtd_page_walk_one(&entry, level, info);
if (ret < 0) {
return ret;
}
} else {
if (!entry_valid) {
- if (notify_unmap) {
+ if (info->notify_unmap) {
/*
* The whole entry is invalid; unmap it all.
* Translated address is meaningless, zero it.
*/
entry.translated_addr = 0x0;
- ret = vtd_page_walk_one(&entry, level, hook_fn, private);
+ ret = vtd_page_walk_one(&entry, level, info);
if (ret < 0) {
return ret;
}
@@ -849,10 +863,9 @@ static int vtd_page_walk_level(dma_addr_t addr, uint64_t start,
}
goto next;
}
- ret = vtd_page_walk_level(vtd_get_slpte_addr(slpte, aw), iova,
- MIN(iova_next, end), hook_fn, private,
- level - 1, read_cur, write_cur,
- notify_unmap, aw);
+ ret = vtd_page_walk_level(vtd_get_slpte_addr(slpte, info->aw),
+ iova, MIN(iova_next, end), level - 1,
+ read_cur, write_cur, info);
if (ret < 0) {
return ret;
}
@@ -871,28 +884,24 @@ next:
* @ce: context entry to walk upon
* @start: IOVA address to start the walk
* @end: IOVA range end address (start <= addr < end)
- * @hook_fn: the hook that to be called for each detected area
- * @private: private data for the hook function
- * @aw: maximum address width
+ * @info: page walking information struct
*/
static int vtd_page_walk(VTDContextEntry *ce, uint64_t start, uint64_t end,
- vtd_page_walk_hook hook_fn, void *private,
- bool notify_unmap, uint8_t aw)
+ vtd_page_walk_info *info)
{
dma_addr_t addr = vtd_ce_get_slpt_base(ce);
uint32_t level = vtd_ce_get_level(ce);
- if (!vtd_iova_range_check(start, ce, aw)) {
+ if (!vtd_iova_range_check(start, ce, info->aw)) {
return -VTD_FR_ADDR_BEYOND_MGAW;
}
- if (!vtd_iova_range_check(end, ce, aw)) {
+ if (!vtd_iova_range_check(end, ce, info->aw)) {
/* Fix end so that it reaches the maximum */
- end = vtd_iova_limit(ce, aw);
+ end = vtd_iova_limit(ce, info->aw);
}
- return vtd_page_walk_level(addr, start, end, hook_fn, private,
- level, true, true, notify_unmap, aw);
+ return vtd_page_walk_level(addr, start, end, level, true, true, info);
}
/* Map a device to its corresponding domain (context-entry) */
@@ -1449,14 +1458,19 @@ static void vtd_iotlb_page_invalidate_notify(IntelIOMMUState *s,
vtd_as->devfn, &ce);
if (!ret && domain_id == VTD_CONTEXT_ENTRY_DID(ce.hi)) {
if (vtd_as_has_map_notifier(vtd_as)) {
+ vtd_page_walk_info info = {
+ .hook_fn = vtd_page_invalidate_notify_hook,
+ .private = (void *)&vtd_as->iommu,
+ .notify_unmap = true,
+ .aw = s->aw_bits,
+ };
+
/*
* As long as we have MAP notifications registered in
* any of our IOMMU notifiers, we need to sync the
* shadow page table.
*/
- vtd_page_walk(&ce, addr, addr + size,
- vtd_page_invalidate_notify_hook,
- (void *)&vtd_as->iommu, true, s->aw_bits);
+ vtd_page_walk(&ce, addr, addr + size, &info);
} else {
/*
* For UNMAP-only notifiers, we don't need to walk the
@@ -2924,8 +2938,14 @@ static void vtd_iommu_replay(IOMMUMemoryRegion *iommu_mr, IOMMUNotifier *n)
ce.hi, ce.lo);
if (vtd_as_has_map_notifier(vtd_as)) {
/* This is required only for MAP typed notifiers */
- vtd_page_walk(&ce, 0, ~0ULL, vtd_replay_hook, (void *)n, false,
- s->aw_bits);
+ vtd_page_walk_info info = {
+ .hook_fn = vtd_replay_hook,
+ .private = (void *)n,
+ .notify_unmap = false,
+ .aw = s->aw_bits,
+ };
+
+ vtd_page_walk(&ce, 0, ~0ULL, &info);
}
} else {
trace_vtd_replay_ce_invalid(bus_n, PCI_SLOT(vtd_as->devfn),
--
2.17.1
^ permalink raw reply related
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.