Discussion:
[RFC v4 00/16] VIRTIO-IOMMU device
(too old to reply)
Eric Auger
2017-09-19 07:46:32 UTC
Permalink
This series implements the virtio-iommu device.

This v4 is an upgrade to v0.4 spec [1] and applies on QEMU v2.10.0.
- probe request support although no reserved region is returned at
the moment
- unmap semantics less strict, as specified in v0.4
- device registration, attach/detach revisited
- split into smaller patches to ease review
- propose a way to inform the IOMMU mr about the page_size_mask
of underlying HW IOMMU, if any
- remove warning associated with the translation of the MSI doorbell

The device gets instantiated using the "-device virtio-iommu-device"
option. It currently works with ARM virt machine only, as the machine
must handle the dt binding between the virtio-mmio "iommu" node and
the PCI host bridge node.

The associated VHOST/VFIO adaptation is available on the branch
below but is not officially delivered as part of this series.

Best Regards

Eric

This series can be found at:
https://github.com/eauger/qemu/tree/v2.10.0-virtio-iommu-v4

References:
[1] [RFC] virtio-iommu version 0.4
git://linux-arm.org/virtio-iommu.git branch viommu/v0.4

Testing:
- guest kernel with v0.4 virtio-iommu driver (4kB page)
- tested with guest using virtio-pci-net and vhost-net
(,vhost=on/off,iommu_platform,disable-modern=off,disable-legacy=on )
- tested with VFIO and guest assigned with 2 VFs
- tested with DPDK on guest with 2 assigned VFs

Not tested:
- hot-plug/hot-unplug of EP: not implemented

History:
v2-> v4:
- see above

v2 -> v3:
- rebase on top of 2.10-rc0 and especially
[PATCH qemu v9 0/2] memory/iommu: QOM'fy IOMMU MemoryRegion
- add mutex init
- fix as->mappings deletion using g_tree_ref/unref
- when a dev is attached whereas it is already attached to
another address space, first detach it
- fix some error values
- page_sizes = TARGET_PAGE_MASK;
- I haven't changed the unmap() semantics yet, waiting for the
next virtio-iommu spec revision.

v1 -> v2:
- fix redifinition of viommu_as typedef

Eric Auger (16):
update-linux-headers: import virtio_iommu.h
linux-headers: Update for virtio-iommu
virtio-iommu: add skeleton
virtio-iommu: Decode the command payload
virtio-iommu: Add the iommu regions
virtio-iommu: Register attached devices
virtio-iommu: Implement attach/detach command
virtio-iommu: Implement map/unmap
virtio-iommu: Implement translate
virtio-iommu: Implement probe request
hw/arm/virt: Add 2.11 machine type
hw/arm/virt: Add virtio-iommu to the virt board
memory.h: Add set_page_size_mask IOMMUMemoryRegion callback
hw/vfio/common: Set the IOMMUMemoryRegion supported page sizes
virtio-iommu: Implement set_page_size_mask
hw/vfio/common: Do not print error when viommu translates into an mmio
region

hw/arm/virt.c | 115 +++-
hw/vfio/common.c | 7 +-
hw/virtio/Makefile.objs | 1 +
hw/virtio/trace-events | 24 +
hw/virtio/virtio-iommu.c | 923 ++++++++++++++++++++++++++
include/exec/memory.h | 4 +
include/hw/arm/virt.h | 5 +
include/hw/vfio/vfio-common.h | 1 +
include/hw/virtio/virtio-iommu.h | 61 ++
include/standard-headers/linux/virtio_ids.h | 1 +
include/standard-headers/linux/virtio_iommu.h | 177 +++++
linux-headers/linux/virtio_iommu.h | 1 +
scripts/update-linux-headers.sh | 3 +
13 files changed, 1312 insertions(+), 11 deletions(-)
create mode 100644 hw/virtio/virtio-iommu.c
create mode 100644 include/hw/virtio/virtio-iommu.h
create mode 100644 include/standard-headers/linux/virtio_iommu.h
create mode 100644 linux-headers/linux/virtio_iommu.h
--
2.5.5
Eric Auger
2017-09-19 07:46:33 UTC
Permalink
Update the script to update the virtio_iommu.h header.

Signed-off-by: Eric Auger <***@redhat.com>
---
scripts/update-linux-headers.sh | 3 +++
1 file changed, 3 insertions(+)

diff --git a/scripts/update-linux-headers.sh b/scripts/update-linux-headers.sh
index 2f906c4..03f6712 100755
--- a/scripts/update-linux-headers.sh
+++ b/scripts/update-linux-headers.sh
@@ -134,6 +134,9 @@ EOF
cat <<EOF >$output/linux-headers/linux/virtio_config.h
#include "standard-headers/linux/virtio_config.h"
EOF
+cat <<EOF >$output/linux-headers/linux/virtio_iommu.h
+#include "standard-headers/linux/virtio_iommu.h"
+EOF
cat <<EOF >$output/linux-headers/linux/virtio_ring.h
#include "standard-headers/linux/virtio_ring.h"
EOF
--
2.5.5
Eric Auger
2017-09-19 07:46:36 UTC
Permalink
This patch adds the command payload decoding and
introduces the functions that will do the actual
command handling. Those functions are not yet implemented.

Signed-off-by: Eric Auger <***@redhat.com>

---
v3 -> v4:
- no flags field anymore in struct virtio_iommu_req_unmap
- test reserved on attach/detach, change trace proto

v3 -> v4:
- rebase on v2.10.0.
---
hw/virtio/trace-events | 4 ++
hw/virtio/virtio-iommu.c | 104 +++++++++++++++++++++++++++++++++++++++++++++--
2 files changed, 104 insertions(+), 4 deletions(-)

diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events
index 98cf9a1..a35dc62 100644
--- a/hw/virtio/trace-events
+++ b/hw/virtio/trace-events
@@ -32,3 +32,7 @@ virtio_iommu_set_features(uint64_t features) "features accepted by the driver =0
virtio_iommu_device_reset(void) "reset!"
virtio_iommu_device_status(uint8_t status) "driver status = %d"
virtio_iommu_get_config(uint64_t page_size_mask, uint64_t start, uint64_t end, uint8_t ioasid_bits, uint32_t probe_size) "page_size_mask=0x%"PRIx64" start=0x%"PRIx64" end=0x%"PRIx64" ioasid_bits=%d probe_size=0x%x"
+virtio_iommu_attach(uint32_t as, uint32_t dev) "as=%d dev=%d"
+virtio_iommu_detach(uint32_t dev) "dev=%d"
+virtio_iommu_map(uint32_t as, uint64_t phys_addr, uint64_t virt_addr, uint64_t size, uint32_t flags) "as= %d phys_addr=0x%"PRIx64" virt_addr=0x%"PRIx64" size=0x%"PRIx64" flags=%d"
+virtio_iommu_unmap(uint32_t as, uint64_t virt_addr, uint64_t size) "as= %d virt_addr=0x%"PRIx64" size=0x%"PRIx64
diff --git a/hw/virtio/virtio-iommu.c b/hw/virtio/virtio-iommu.c
index 36e5b5f..51f9003 100644
--- a/hw/virtio/virtio-iommu.c
+++ b/hw/virtio/virtio-iommu.c
@@ -35,29 +35,125 @@
/* Max size */
#define VIOMMU_DEFAULT_QUEUE_SIZE 256

+static int virtio_iommu_attach(VirtIOIOMMU *s,
+ struct virtio_iommu_req_attach *req)
+{
+ uint32_t asid = le32_to_cpu(req->address_space);
+ uint32_t devid = le32_to_cpu(req->device);
+ uint32_t reserved = le32_to_cpu(req->reserved);
+
+ trace_virtio_iommu_attach(asid, devid);
+
+ if (reserved) {
+ return VIRTIO_IOMMU_S_INVAL;
+ }
+
+ return VIRTIO_IOMMU_S_UNSUPP;
+}
+
+static int virtio_iommu_detach(VirtIOIOMMU *s,
+ struct virtio_iommu_req_detach *req)
+{
+ uint32_t devid = le32_to_cpu(req->device);
+ uint32_t reserved = le32_to_cpu(req->reserved);
+
+ trace_virtio_iommu_detach(devid);
+
+ if (reserved) {
+ return VIRTIO_IOMMU_S_INVAL;
+ }
+
+ return VIRTIO_IOMMU_S_UNSUPP;
+}
+
+static int virtio_iommu_map(VirtIOIOMMU *s,
+ struct virtio_iommu_req_map *req)
+{
+ uint32_t asid = le32_to_cpu(req->address_space);
+ uint64_t phys_addr = le64_to_cpu(req->phys_addr);
+ uint64_t virt_addr = le64_to_cpu(req->virt_addr);
+ uint64_t size = le64_to_cpu(req->size);
+ uint32_t flags = le32_to_cpu(req->flags);
+
+ trace_virtio_iommu_map(asid, phys_addr, virt_addr, size, flags);
+
+ return VIRTIO_IOMMU_S_UNSUPP;
+}
+
+static int virtio_iommu_unmap(VirtIOIOMMU *s,
+ struct virtio_iommu_req_unmap *req)
+{
+ uint32_t asid = le32_to_cpu(req->address_space);
+ uint64_t virt_addr = le64_to_cpu(req->virt_addr);
+ uint64_t size = le64_to_cpu(req->size);
+
+ trace_virtio_iommu_unmap(asid, virt_addr, size);
+
+ return VIRTIO_IOMMU_S_UNSUPP;
+}
+
+#define get_payload_size(req) (\
+sizeof((req)) - sizeof(struct virtio_iommu_req_tail))
+
static int virtio_iommu_handle_attach(VirtIOIOMMU *s,
struct iovec *iov,
unsigned int iov_cnt)
{
- return -ENOENT;
+ struct virtio_iommu_req_attach req;
+ size_t sz, payload_sz;
+
+ payload_sz = get_payload_size(req);
+
+ sz = iov_to_buf(iov, iov_cnt, 0, &req, payload_sz);
+ if (sz != payload_sz) {
+ return VIRTIO_IOMMU_S_INVAL;
+ }
+ return virtio_iommu_attach(s, &req);
}
static int virtio_iommu_handle_detach(VirtIOIOMMU *s,
struct iovec *iov,
unsigned int iov_cnt)
{
- return -ENOENT;
+ struct virtio_iommu_req_detach req;
+ size_t sz, payload_sz;
+
+ payload_sz = get_payload_size(req);
+
+ sz = iov_to_buf(iov, iov_cnt, 0, &req, payload_sz);
+ if (sz != payload_sz) {
+ return VIRTIO_IOMMU_S_INVAL;
+ }
+ return virtio_iommu_detach(s, &req);
}
static int virtio_iommu_handle_map(VirtIOIOMMU *s,
struct iovec *iov,
unsigned int iov_cnt)
{
- return -ENOENT;
+ struct virtio_iommu_req_map req;
+ size_t sz, payload_sz;
+
+ payload_sz = get_payload_size(req);
+
+ sz = iov_to_buf(iov, iov_cnt, 0, &req, payload_sz);
+ if (sz != payload_sz) {
+ return VIRTIO_IOMMU_S_INVAL;
+ }
+ return virtio_iommu_map(s, &req);
}
static int virtio_iommu_handle_unmap(VirtIOIOMMU *s,
struct iovec *iov,
unsigned int iov_cnt)
{
- return -ENOENT;
+ struct virtio_iommu_req_unmap req;
+ size_t sz, payload_sz;
+
+ payload_sz = get_payload_size(req);
+
+ sz = iov_to_buf(iov, iov_cnt, 0, &req, payload_sz);
+ if (sz != payload_sz) {
+ return VIRTIO_IOMMU_S_INVAL;
+ }
+ return virtio_iommu_unmap(s, &req);
}

static void virtio_iommu_handle_command(VirtIODevice *vdev, VirtQueue *vq)
--
2.5.5
Eric Auger
2017-09-19 07:46:34 UTC
Permalink
This is a partial linux header update against Jean-Philippe's branch:
git://linux-arm.org/linux-jpb.git virtio-iommu/v0.4 (unstable)

Signed-off-by: Eric Auger <***@redhat.com>

---

v3- >v3:
- upgrade the API from v0.1 to v0.4
---
include/standard-headers/linux/virtio_ids.h | 1 +
include/standard-headers/linux/virtio_iommu.h | 177 ++++++++++++++++++++++++++
linux-headers/linux/virtio_iommu.h | 1 +
3 files changed, 179 insertions(+)
create mode 100644 include/standard-headers/linux/virtio_iommu.h
create mode 100644 linux-headers/linux/virtio_iommu.h

diff --git a/include/standard-headers/linux/virtio_ids.h b/include/standard-headers/linux/virtio_ids.h
index 6d5c3b2..934ed3d 100644
--- a/include/standard-headers/linux/virtio_ids.h
+++ b/include/standard-headers/linux/virtio_ids.h
@@ -43,5 +43,6 @@
#define VIRTIO_ID_INPUT 18 /* virtio input */
#define VIRTIO_ID_VSOCK 19 /* virtio vsock transport */
#define VIRTIO_ID_CRYPTO 20 /* virtio crypto */
+#define VIRTIO_ID_IOMMU 61216 /* virtio IOMMU (temporary) */

#endif /* _LINUX_VIRTIO_IDS_H */
diff --git a/include/standard-headers/linux/virtio_iommu.h b/include/standard-headers/linux/virtio_iommu.h
new file mode 100644
index 0000000..7f486cf
--- /dev/null
+++ b/include/standard-headers/linux/virtio_iommu.h
@@ -0,0 +1,177 @@
+/*
+ * Virtio-iommu definition v0.4
+ *
+ * Copyright (C) 2017 ARM Ltd.
+ *
+ * This header is BSD licensed so anyone can use the definitions
+ * to implement compatible drivers/servers:
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in the
+ * documentation and/or other materials provided with the distribution.
+ * 3. Neither the name of ARM Ltd. nor the names of its contributors
+ * may be used to endorse or promote products derived from this software
+ * without specific prior written permission.
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+ * FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL IBM OR
+ * CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
+ * USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
+ * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
+ * OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
+ * OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+ * SUCH DAMAGE.
+ */
+#ifndef _LINUX_VIRTIO_IOMMU_H
+#define _LINUX_VIRTIO_IOMMU_H
+
+/* Feature bits */
+#define VIRTIO_IOMMU_F_INPUT_RANGE 0
+#define VIRTIO_IOMMU_F_IOASID_BITS 1
+#define VIRTIO_IOMMU_F_MAP_UNMAP 2
+#define VIRTIO_IOMMU_F_BYPASS 3
+#define VIRTIO_IOMMU_F_PROBE 4
+
+struct virtio_iommu_config {
+ /* Supported page sizes */
+ uint64_t page_size_mask;
+ struct virtio_iommu_range {
+ uint64_t start;
+ uint64_t end;
+ } input_range;
+ uint8_t ioasid_bits;
+ uint8_t padding[3];
+ uint32_t probe_size;
+} QEMU_PACKED;
+
+/* Request types */
+#define VIRTIO_IOMMU_T_ATTACH 0x01
+#define VIRTIO_IOMMU_T_DETACH 0x02
+#define VIRTIO_IOMMU_T_MAP 0x03
+#define VIRTIO_IOMMU_T_UNMAP 0x04
+#define VIRTIO_IOMMU_T_PROBE 0x05
+
+/* Status types */
+#define VIRTIO_IOMMU_S_OK 0x00
+#define VIRTIO_IOMMU_S_IOERR 0x01
+#define VIRTIO_IOMMU_S_UNSUPP 0x02
+#define VIRTIO_IOMMU_S_DEVERR 0x03
+#define VIRTIO_IOMMU_S_INVAL 0x04
+#define VIRTIO_IOMMU_S_RANGE 0x05
+#define VIRTIO_IOMMU_S_NOENT 0x06
+#define VIRTIO_IOMMU_S_FAULT 0x07
+
+struct virtio_iommu_req_head {
+ uint8_t type;
+ uint8_t reserved[3];
+} QEMU_PACKED;
+
+struct virtio_iommu_req_tail {
+ uint8_t status;
+ uint8_t reserved[3];
+} QEMU_PACKED;
+
+struct virtio_iommu_req_attach {
+ struct virtio_iommu_req_head head;
+
+ uint32_t address_space;
+ uint32_t device;
+ uint32_t reserved;
+
+ struct virtio_iommu_req_tail tail;
+} QEMU_PACKED;
+
+struct virtio_iommu_req_detach {
+ struct virtio_iommu_req_head head;
+
+ uint32_t device;
+ uint32_t reserved;
+
+ struct virtio_iommu_req_tail tail;
+} QEMU_PACKED;
+
+#define VIRTIO_IOMMU_MAP_F_READ (1 << 0)
+#define VIRTIO_IOMMU_MAP_F_WRITE (1 << 1)
+#define VIRTIO_IOMMU_MAP_F_EXEC (1 << 2)
+
+#define VIRTIO_IOMMU_MAP_F_MASK (VIRTIO_IOMMU_MAP_F_READ | \
+ VIRTIO_IOMMU_MAP_F_WRITE | \
+ VIRTIO_IOMMU_MAP_F_EXEC)
+
+struct virtio_iommu_req_map {
+ struct virtio_iommu_req_head head;
+
+ uint32_t address_space;
+ uint64_t virt_addr;
+ uint64_t phys_addr;
+ uint64_t size;
+ uint32_t flags;
+
+ struct virtio_iommu_req_tail tail;
+} QEMU_PACKED;
+
+QEMU_PACKED
+struct virtio_iommu_req_unmap {
+ struct virtio_iommu_req_head head;
+
+ uint32_t address_space;
+ uint64_t virt_addr;
+ uint64_t size;
+ uint32_t reserved;
+
+ struct virtio_iommu_req_tail tail;
+} QEMU_PACKED;
+
+#define VIRTIO_IOMMU_PROBE_RESV_MEM_T_ABORT 0
+#define VIRTIO_IOMMU_PROBE_RESV_MEM_T_BYPASS 1
+
+#define VIRTIO_IOMMU_PROBE_RESV_MEM_F_MSI (1 << 0)
+
+struct virtio_iommu_probe_resv_mem {
+ uint8_t subtype;
+ uint8_t reserved[3];
+ uint64_t addr;
+ uint64_t size;
+ uint32_t flags;
+} QEMU_PACKED;
+
+#define VIRTIO_IOMMU_PROBE_T_NONE 0
+#define VIRTIO_IOMMU_PROBE_T_RESV_MEM 2
+
+#define VIRTIO_IOMMU_PROBE_T_MASK 0xfff
+
+struct virtio_iommu_probe_property {
+ uint16_t type;
+ uint16_t length;
+ uint8_t value[];
+} QEMU_PACKED;
+
+struct virtio_iommu_req_probe {
+ struct virtio_iommu_req_head head;
+ uint32_t device;
+ uint8_t reserved[64];
+
+ uint8_t properties[];
+
+ /* Tail follows the variable-length properties array (no padding) */
+} QEMU_PACKED;
+
+union virtio_iommu_req {
+ struct virtio_iommu_req_head head;
+
+ struct virtio_iommu_req_attach attach;
+ struct virtio_iommu_req_detach detach;
+ struct virtio_iommu_req_map map;
+ struct virtio_iommu_req_unmap unmap;
+ struct virtio_iommu_req_probe probe;
+};
+
+#endif
diff --git a/linux-headers/linux/virtio_iommu.h b/linux-headers/linux/virtio_iommu.h
new file mode 100644
index 0000000..2dc4609
--- /dev/null
+++ b/linux-headers/linux/virtio_iommu.h
@@ -0,0 +1 @@
+#include "standard-headers/linux/virtio_iommu.h"
--
2.5.5
Eric Auger
2017-09-19 07:46:37 UTC
Permalink
This patch initializes the iommu memory regions so that
PCIe end point transactions get translated. The translation function
is not yet implemented at that stage.

Signed-off-by: Eric Auger <***@redhat.com>

---
v3 -> v4:
- add trace_virtio_iommu_init_iommu_mr

v2 -> v3:
- use IOMMUMemoryRegion
- iommu mr name built with BDF
- rename smmu_get_sid into virtio_iommu_get_sid and use PCI_BUILD_BDF
---
hw/virtio/trace-events | 2 +
hw/virtio/virtio-iommu.c | 119 +++++++++++++++++++++++++++++++++++++++
include/hw/virtio/virtio-iommu.h | 2 +
3 files changed, 123 insertions(+)

diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events
index a35dc62..bc65356 100644
--- a/hw/virtio/trace-events
+++ b/hw/virtio/trace-events
@@ -36,3 +36,5 @@ virtio_iommu_attach(uint32_t as, uint32_t dev) "as=%d dev=%d"
virtio_iommu_detach(uint32_t dev) "dev=%d"
virtio_iommu_map(uint32_t as, uint64_t phys_addr, uint64_t virt_addr, uint64_t size, uint32_t flags) "as= %d phys_addr=0x%"PRIx64" virt_addr=0x%"PRIx64" size=0x%"PRIx64" flags=%d"
virtio_iommu_unmap(uint32_t as, uint64_t virt_addr, uint64_t size) "as= %d virt_addr=0x%"PRIx64" size=0x%"PRIx64
+virtio_iommu_translate(const char *name, uint32_t rid, uint64_t iova, int flag) "mr=%s rid=%d addr=0x%"PRIx64" flag=%d"
+virtio_iommu_init_iommu_mr(char *iommu_mr) "init %s"
diff --git a/hw/virtio/virtio-iommu.c b/hw/virtio/virtio-iommu.c
index 51f9003..f4cb76f 100644
--- a/hw/virtio/virtio-iommu.c
+++ b/hw/virtio/virtio-iommu.c
@@ -23,6 +23,7 @@
#include "hw/virtio/virtio.h"
#include "sysemu/kvm.h"
#include "qapi-event.h"
+#include "qemu/error-report.h"
#include "trace.h"

#include "standard-headers/linux/virtio_ids.h"
@@ -35,6 +36,66 @@
/* Max size */
#define VIOMMU_DEFAULT_QUEUE_SIZE 256

+static inline uint16_t virtio_iommu_get_sid(IOMMUDevice *dev)
+{
+ return PCI_BUILD_BDF(pci_bus_num(dev->bus), dev->devfn);
+}
+
+static AddressSpace *virtio_iommu_find_add_as(PCIBus *bus, void *opaque,
+ int devfn)
+{
+ VirtIOIOMMU *s = opaque;
+ uintptr_t key = (uintptr_t)bus;
+ IOMMUPciBus *sbus = g_hash_table_lookup(s->as_by_busptr, &key);
+ IOMMUDevice *sdev;
+
+ if (!sbus) {
+ uintptr_t *new_key = g_malloc(sizeof(*new_key));
+
+ *new_key = (uintptr_t)bus;
+ sbus = g_malloc0(sizeof(IOMMUPciBus) +
+ sizeof(IOMMUDevice *) * IOMMU_PCI_DEVFN_MAX);
+ sbus->bus = bus;
+ g_hash_table_insert(s->as_by_busptr, new_key, sbus);
+ }
+
+ sdev = sbus->pbdev[devfn];
+ if (!sdev) {
+ char *name = g_strdup_printf("%s-%d-%d",
+ TYPE_VIRTIO_IOMMU_MEMORY_REGION,
+ pci_bus_num(bus), devfn);
+ sdev = sbus->pbdev[devfn] = g_malloc0(sizeof(IOMMUDevice));
+
+ sdev->viommu = s;
+ sdev->bus = bus;
+ sdev->devfn = devfn;
+
+ trace_virtio_iommu_init_iommu_mr(name);
+
+ memory_region_init_iommu(&sdev->iommu_mr, sizeof(sdev->iommu_mr),
+ TYPE_VIRTIO_IOMMU_MEMORY_REGION,
+ OBJECT(s), name,
+ UINT64_MAX);
+ address_space_init(&sdev->as,
+ MEMORY_REGION(&sdev->iommu_mr), TYPE_VIRTIO_IOMMU);
+ }
+
+ return &sdev->as;
+
+}
+
+static void virtio_iommu_init_as(VirtIOIOMMU *s)
+{
+ PCIBus *pcibus = pci_find_primary_bus();
+
+ if (pcibus) {
+ pci_setup_iommu(pcibus, virtio_iommu_find_add_as, s);
+ } else {
+ error_report("No PCI bus, virtio-iommu is not registered");
+ }
+}
+
+
static int virtio_iommu_attach(VirtIOIOMMU *s,
struct virtio_iommu_req_attach *req)
{
@@ -215,6 +276,26 @@ static void virtio_iommu_handle_command(VirtIODevice *vdev, VirtQueue *vq)
}
}

+static IOMMUTLBEntry virtio_iommu_translate(IOMMUMemoryRegion *mr, hwaddr addr,
+ IOMMUAccessFlags flag)
+{
+ IOMMUDevice *sdev = container_of(mr, IOMMUDevice, iommu_mr);
+ uint32_t sid;
+
+ IOMMUTLBEntry entry = {
+ .target_as = &address_space_memory,
+ .iova = addr,
+ .translated_addr = addr,
+ .addr_mask = ~(hwaddr)0,
+ .perm = IOMMU_NONE,
+ };
+
+ sid = virtio_iommu_get_sid(sdev);
+
+ trace_virtio_iommu_translate(mr->parent_obj.name, sid, addr, flag);
+ return entry;
+}
+
static void virtio_iommu_get_config(VirtIODevice *vdev, uint8_t *config_data)
{
VirtIOIOMMU *dev = VIRTIO_IOMMU(vdev);
@@ -265,6 +346,21 @@ static const VMStateDescription vmstate_virtio_iommu_device = {
},
};

+/*****************************
+ * Hash Table
+ *****************************/
+
+static inline gboolean as_uint64_equal(gconstpointer v1, gconstpointer v2)
+{
+ return *((const uint64_t *)v1) == *((const uint64_t *)v2);
+}
+
+static inline guint as_uint64_hash(gconstpointer v)
+{
+ return (guint)*(const uint64_t *)v;
+}
+
+
static void virtio_iommu_device_realize(DeviceState *dev, Error **errp)
{
VirtIODevice *vdev = VIRTIO_DEVICE(dev);
@@ -278,6 +374,13 @@ static void virtio_iommu_device_realize(DeviceState *dev, Error **errp)

s->config.page_size_mask = TARGET_PAGE_MASK;
s->config.input_range.end = -1UL;
+
+ memset(s->as_by_bus_num, 0, sizeof(s->as_by_bus_num));
+ s->as_by_busptr = g_hash_table_new_full(as_uint64_hash,
+ as_uint64_equal,
+ g_free, g_free);
+
+ virtio_iommu_init_as(s);
}

static void virtio_iommu_device_unrealize(DeviceState *dev, Error **errp)
@@ -335,6 +438,14 @@ static void virtio_iommu_class_init(ObjectClass *klass, void *data)
vdc->vmsd = &vmstate_virtio_iommu_device;
}

+static void virtio_iommu_memory_region_class_init(ObjectClass *klass,
+ void *data)
+{
+ IOMMUMemoryRegionClass *imrc = IOMMU_MEMORY_REGION_CLASS(klass);
+
+ imrc->translate = virtio_iommu_translate;
+}
+
static const TypeInfo virtio_iommu_info = {
.name = TYPE_VIRTIO_IOMMU,
.parent = TYPE_VIRTIO_DEVICE,
@@ -343,9 +454,17 @@ static const TypeInfo virtio_iommu_info = {
.class_init = virtio_iommu_class_init,
};

+static const TypeInfo virtio_iommu_memory_region_info = {
+ .parent = TYPE_IOMMU_MEMORY_REGION,
+ .name = TYPE_VIRTIO_IOMMU_MEMORY_REGION,
+ .class_init = virtio_iommu_memory_region_class_init,
+};
+
+
static void virtio_register_types(void)
{
type_register_static(&virtio_iommu_info);
+ type_register_static(&virtio_iommu_memory_region_info);
}

type_init(virtio_register_types)
diff --git a/include/hw/virtio/virtio-iommu.h b/include/hw/virtio/virtio-iommu.h
index 6716cdb..f9c988f 100644
--- a/include/hw/virtio/virtio-iommu.h
+++ b/include/hw/virtio/virtio-iommu.h
@@ -28,6 +28,8 @@
#define VIRTIO_IOMMU(obj) \
OBJECT_CHECK(VirtIOIOMMU, (obj), TYPE_VIRTIO_IOMMU)

+#define TYPE_VIRTIO_IOMMU_MEMORY_REGION "virtio-iommu-memory-region"
+
#define IOMMU_PCI_BUS_MAX 256
#define IOMMU_PCI_DEVFN_MAX 256
--
2.5.5
Eric Auger
2017-09-19 07:46:39 UTC
Permalink
This patch implements the device attach/detach to an
address space.

Signed-off-by: Eric Auger <***@redhat.com>
---
hw/virtio/virtio-iommu.c | 38 ++++++++++++++++++++++++++++++++------
1 file changed, 32 insertions(+), 6 deletions(-)

diff --git a/hw/virtio/virtio-iommu.c b/hw/virtio/virtio-iommu.c
index 41a4bbc..bd859fb 100644
--- a/hw/virtio/virtio-iommu.c
+++ b/hw/virtio/virtio-iommu.c
@@ -110,8 +110,7 @@ static void virtio_iommu_put_dev(gpointer data)
g_free(dev);
}

-viommu_as *virtio_iommu_get_as(VirtIOIOMMU *s, uint32_t asid);
-viommu_as *virtio_iommu_get_as(VirtIOIOMMU *s, uint32_t asid)
+static viommu_as *virtio_iommu_get_as(VirtIOIOMMU *s, uint32_t asid)
{
viommu_as *as;

@@ -205,6 +204,8 @@ static int virtio_iommu_attach(VirtIOIOMMU *s,
uint32_t asid = le32_to_cpu(req->address_space);
uint32_t devid = le32_to_cpu(req->device);
uint32_t reserved = le32_to_cpu(req->reserved);
+ viommu_as *as;
+ viommu_dev *dev;

trace_virtio_iommu_attach(asid, devid);

@@ -212,7 +213,22 @@ static int virtio_iommu_attach(VirtIOIOMMU *s,
return VIRTIO_IOMMU_S_INVAL;
}

- return VIRTIO_IOMMU_S_UNSUPP;
+ dev = virtio_iommu_get_dev(s, devid);
+ if (dev->as) {
+ /*
+ * the device is already attached to an address space,
+ * detach it first
+ */
+ virtio_iommu_detach_dev_from_as(dev);
+ }
+
+ as = virtio_iommu_get_as(s, asid);
+ QLIST_INSERT_HEAD(&as->device_list, dev, next);
+
+ dev->as = as;
+ g_tree_ref(as->mappings);
+
+ return VIRTIO_IOMMU_S_OK;
}

static int virtio_iommu_detach(VirtIOIOMMU *s,
@@ -220,14 +236,24 @@ static int virtio_iommu_detach(VirtIOIOMMU *s,
{
uint32_t devid = le32_to_cpu(req->device);
uint32_t reserved = le32_to_cpu(req->reserved);
-
- trace_virtio_iommu_detach(devid);
+ viommu_dev *dev;

if (reserved) {
return VIRTIO_IOMMU_S_INVAL;
}

- return VIRTIO_IOMMU_S_UNSUPP;
+ dev = g_tree_lookup(s->devices, GUINT_TO_POINTER(devid));
+ if (!dev) {
+ return VIRTIO_IOMMU_S_NOENT;
+ }
+
+ if (!dev->as) {
+ return VIRTIO_IOMMU_S_INVAL;
+ }
+
+ virtio_iommu_detach_dev_from_as(dev);
+ trace_virtio_iommu_detach(devid);
+ return VIRTIO_IOMMU_S_OK;
}

static int virtio_iommu_map(VirtIOIOMMU *s,
--
2.5.5
Eric Auger
2017-09-19 07:46:41 UTC
Permalink
This patch implements the translate callback

Signed-off-by: Eric Auger <***@redhat.com>
---
hw/virtio/trace-events | 1 +
hw/virtio/virtio-iommu.c | 39 +++++++++++++++++++++++++++++++++++++--
2 files changed, 38 insertions(+), 2 deletions(-)

diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events
index da298c1..9010fbd 100644
--- a/hw/virtio/trace-events
+++ b/hw/virtio/trace-events
@@ -45,3 +45,4 @@ virtio_iommu_put_as(uint32_t asid) "Free asid=%d"
virtio_iommu_unmap_left_interval(uint64_t low, uint64_t high, uint64_t next_low, uint64_t next_high) "Unmap left [0x%"PRIx64",0x%"PRIx64"], new interval=[0x%"PRIx64",0x%"PRIx64"]"
virtio_iommu_unmap_right_interval(uint64_t low, uint64_t high, uint64_t next_low, uint64_t next_high) "Unmap right [0x%"PRIx64",0x%"PRIx64"], new interval=[0x%"PRIx64",0x%"PRIx64"]"
virtio_iommu_unmap_inc_interval(uint64_t low, uint64_t high) "Unmap inc [0x%"PRIx64",0x%"PRIx64"]"
+virtio_iommu_translate_out(uint64_t virt_addr, uint64_t phys_addr, uint32_t sid) "0x%"PRIx64" -> 0x%"PRIx64 " for sid=%d"
diff --git a/hw/virtio/virtio-iommu.c b/hw/virtio/virtio-iommu.c
index 6f1a7d1..db46a91 100644
--- a/hw/virtio/virtio-iommu.c
+++ b/hw/virtio/virtio-iommu.c
@@ -496,19 +496,54 @@ static IOMMUTLBEntry virtio_iommu_translate(IOMMUMemoryRegion *mr, hwaddr addr,
IOMMUAccessFlags flag)
{
IOMMUDevice *sdev = container_of(mr, IOMMUDevice, iommu_mr);
+ VirtIOIOMMU *s = sdev->viommu;
uint32_t sid;
+ viommu_dev *dev;
+ viommu_mapping *mapping;
+ viommu_interval interval;
+
+ interval.low = addr;
+ interval.high = addr + 1;

IOMMUTLBEntry entry = {
.target_as = &address_space_memory,
.iova = addr,
.translated_addr = addr,
- .addr_mask = ~(hwaddr)0,
- .perm = IOMMU_NONE,
+ .addr_mask = (1 << ctz32(s->config.page_size_mask)) - 1,
+ .perm = flag,
};

sid = virtio_iommu_get_sid(sdev);

trace_virtio_iommu_translate(mr->parent_obj.name, sid, addr, flag);
+ qemu_mutex_lock(&s->mutex);
+
+ dev = g_tree_lookup(s->devices, GUINT_TO_POINTER(sid));
+ if (!dev) {
+ /* device cannot be attached to another as */
+ printf("%s sid=%d is not known!!\n", __func__, sid);
+ goto unlock;
+ }
+
+ mapping = g_tree_lookup(dev->as->mappings, (gpointer)(&interval));
+ if (!mapping) {
+ printf("%s no mapping for 0x%"PRIx64" for sid=%d\n", __func__,
+ addr, sid);
+ goto unlock;
+ }
+
+ if (((flag & IOMMU_RO) && !(mapping->flags & VIRTIO_IOMMU_MAP_F_READ)) ||
+ ((flag & IOMMU_WO) && !(mapping->flags & VIRTIO_IOMMU_MAP_F_WRITE))) {
+ error_report("Permission error on 0x%"PRIx64"(%d): allowed=%d",
+ addr, flag, mapping->flags);
+ entry.perm = IOMMU_NONE;
+ goto unlock;
+ }
+ entry.translated_addr = addr - mapping->virt_addr + mapping->phys_addr,
+ trace_virtio_iommu_translate_out(addr, entry.translated_addr, sid);
+
+unlock:
+ qemu_mutex_unlock(&s->mutex);
return entry;
}
--
2.5.5
Bharat Bhushan
2017-09-22 06:52:18 UTC
Permalink
-----Original Message-----
Sent: Tuesday, September 19, 2017 1:17 PM
Subject: [RFC v4 09/16] virtio-iommu: Implement translate
This patch implements the translate callback
---
hw/virtio/trace-events | 1 +
hw/virtio/virtio-iommu.c | 39
+++++++++++++++++++++++++++++++++++++--
2 files changed, 38 insertions(+), 2 deletions(-)
diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events index
da298c1..9010fbd 100644
--- a/hw/virtio/trace-events
+++ b/hw/virtio/trace-events
@@ -45,3 +45,4 @@ virtio_iommu_put_as(uint32_t asid) "Free asid=%d"
virtio_iommu_unmap_left_interval(uint64_t low, uint64_t high, uint64_t
next_low, uint64_t next_high) "Unmap left [0x%"PRIx64",0x%"PRIx64"],
new interval=[0x%"PRIx64",0x%"PRIx64"]"
virtio_iommu_unmap_right_interval(uint64_t low, uint64_t high, uint64_t
next_low, uint64_t next_high) "Unmap right [0x%"PRIx64",0x%"PRIx64"],
new interval=[0x%"PRIx64",0x%"PRIx64"]"
virtio_iommu_unmap_inc_interval(uint64_t low, uint64_t high) "Unmap inc
[0x%"PRIx64",0x%"PRIx64"]"
+virtio_iommu_translate_out(uint64_t virt_addr, uint64_t phys_addr,
uint32_t sid) "0x%"PRIx64" -> 0x%"PRIx64 " for sid=%d"
diff --git a/hw/virtio/virtio-iommu.c b/hw/virtio/virtio-iommu.c index
6f1a7d1..db46a91 100644
--- a/hw/virtio/virtio-iommu.c
+++ b/hw/virtio/virtio-iommu.c
@@ -496,19 +496,54 @@ static IOMMUTLBEntry
virtio_iommu_translate(IOMMUMemoryRegion *mr, hwaddr addr,
IOMMUAccessFlags flag) {
IOMMUDevice *sdev = container_of(mr, IOMMUDevice, iommu_mr);
+ VirtIOIOMMU *s = sdev->viommu;
uint32_t sid;
+ viommu_dev *dev;
+ viommu_mapping *mapping;
+ viommu_interval interval;
+
+ interval.low = addr;
+ interval.high = addr + 1;
IOMMUTLBEntry entry = {
.target_as = &address_space_memory,
.iova = addr,
.translated_addr = addr,
- .addr_mask = ~(hwaddr)0,
- .perm = IOMMU_NONE,
+ .addr_mask = (1 << ctz32(s->config.page_size_mask)) - 1,
+ .perm = flag,
};
sid = virtio_iommu_get_sid(sdev);
trace_virtio_iommu_translate(mr->parent_obj.name, sid, addr, flag);
+ qemu_mutex_lock(&s->mutex);
+
+ dev = g_tree_lookup(s->devices, GUINT_TO_POINTER(sid));
+ if (!dev) {
+ /* device cannot be attached to another as */
+ printf("%s sid=%d is not known!!\n", __func__, sid);
+ goto unlock;
+ }
+
+ mapping = g_tree_lookup(dev->as->mappings, (gpointer)(&interval));
Should check of !dev->as before accessing this.

Thanks
-Bharat
+ if (!mapping) {
+ printf("%s no mapping for 0x%"PRIx64" for sid=%d\n", __func__,
+ addr, sid);
+ goto unlock;
+ }
+
+ if (((flag & IOMMU_RO) && !(mapping->flags &
VIRTIO_IOMMU_MAP_F_READ)) ||
+ ((flag & IOMMU_WO) && !(mapping->flags &
VIRTIO_IOMMU_MAP_F_WRITE))) {
+ error_report("Permission error on 0x%"PRIx64"(%d): allowed=%d",
+ addr, flag, mapping->flags);
+ entry.perm = IOMMU_NONE;
+ goto unlock;
+ }
+ entry.translated_addr = addr - mapping->virt_addr + mapping-
Post by Eric Auger
phys_addr,
+ trace_virtio_iommu_translate_out(addr, entry.translated_addr, sid);
+
+ qemu_mutex_unlock(&s->mutex);
return entry;
}
--
2.5.5
Eric Auger
2017-09-19 07:46:35 UTC
Permalink
This patchs adds the skeleton for the virtio-iommu device.

Signed-off-by: Eric Auger <***@redhat.com>

---

v3 -> v4:
- use page_size_mask instead of page_sizes
- added set_features()
- added some traces (reset, set_status, set_features)
- empty virtio_iommu_set_config() as the driver MUST NOT
write to device configuration fields
- add get_config trace

v2 -> v3:
- rebase on 2.10-rc0, ie. use IOMMUMemoryRegion and remove
iommu_ops.
- advertise VIRTIO_IOMMU_F_MAP_UNMAP feature
- page_sizes set to TARGET_PAGE_SIZE
---
hw/virtio/Makefile.objs | 1 +
hw/virtio/trace-events | 7 ++
hw/virtio/virtio-iommu.c | 255 +++++++++++++++++++++++++++++++++++++++
include/hw/virtio/virtio-iommu.h | 59 +++++++++
4 files changed, 322 insertions(+)
create mode 100644 hw/virtio/virtio-iommu.c
create mode 100644 include/hw/virtio/virtio-iommu.h

diff --git a/hw/virtio/Makefile.objs b/hw/virtio/Makefile.objs
index 765d363..8967a4a 100644
--- a/hw/virtio/Makefile.objs
+++ b/hw/virtio/Makefile.objs
@@ -6,6 +6,7 @@ common-obj-y += virtio-mmio.o

obj-y += virtio.o virtio-balloon.o
obj-$(CONFIG_LINUX) += vhost.o vhost-backend.o vhost-user.o
+obj-$(CONFIG_LINUX) += virtio-iommu.o
obj-$(CONFIG_VHOST_VSOCK) += vhost-vsock.o
obj-y += virtio-crypto.o
obj-$(CONFIG_VIRTIO_PCI) += virtio-crypto-pci.o
diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events
index 775461a..98cf9a1 100644
--- a/hw/virtio/trace-events
+++ b/hw/virtio/trace-events
@@ -25,3 +25,10 @@ virtio_balloon_handle_output(const char *name, uint64_t gpa) "section name: %s g
virtio_balloon_get_config(uint32_t num_pages, uint32_t actual) "num_pages: %d actual: %d"
virtio_balloon_set_config(uint32_t actual, uint32_t oldactual) "actual: %d oldactual: %d"
virtio_balloon_to_target(uint64_t target, uint32_t num_pages) "balloon target: 0x%"PRIx64" num_pages: %d"
+
+# hw/virtio/virtio-iommu.c
+#
+virtio_iommu_set_features(uint64_t features) "features accepted by the driver =0x%"PRIx64
+virtio_iommu_device_reset(void) "reset!"
+virtio_iommu_device_status(uint8_t status) "driver status = %d"
+virtio_iommu_get_config(uint64_t page_size_mask, uint64_t start, uint64_t end, uint8_t ioasid_bits, uint32_t probe_size) "page_size_mask=0x%"PRIx64" start=0x%"PRIx64" end=0x%"PRIx64" ioasid_bits=%d probe_size=0x%x"
diff --git a/hw/virtio/virtio-iommu.c b/hw/virtio/virtio-iommu.c
new file mode 100644
index 0000000..36e5b5f
--- /dev/null
+++ b/hw/virtio/virtio-iommu.c
@@ -0,0 +1,255 @@
+/*
+ * virtio-iommu device
+ *
+ * Copyright (c) 2017 Red Hat, Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2 or later, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program. If not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/iov.h"
+#include "qemu-common.h"
+#include "hw/virtio/virtio.h"
+#include "sysemu/kvm.h"
+#include "qapi-event.h"
+#include "trace.h"
+
+#include "standard-headers/linux/virtio_ids.h"
+#include <linux/virtio_iommu.h>
+
+#include "hw/virtio/virtio-bus.h"
+#include "hw/virtio/virtio-access.h"
+#include "hw/virtio/virtio-iommu.h"
+
+/* Max size */
+#define VIOMMU_DEFAULT_QUEUE_SIZE 256
+
+static int virtio_iommu_handle_attach(VirtIOIOMMU *s,
+ struct iovec *iov,
+ unsigned int iov_cnt)
+{
+ return -ENOENT;
+}
+static int virtio_iommu_handle_detach(VirtIOIOMMU *s,
+ struct iovec *iov,
+ unsigned int iov_cnt)
+{
+ return -ENOENT;
+}
+static int virtio_iommu_handle_map(VirtIOIOMMU *s,
+ struct iovec *iov,
+ unsigned int iov_cnt)
+{
+ return -ENOENT;
+}
+static int virtio_iommu_handle_unmap(VirtIOIOMMU *s,
+ struct iovec *iov,
+ unsigned int iov_cnt)
+{
+ return -ENOENT;
+}
+
+static void virtio_iommu_handle_command(VirtIODevice *vdev, VirtQueue *vq)
+{
+ VirtIOIOMMU *s = VIRTIO_IOMMU(vdev);
+ VirtQueueElement *elem;
+ struct virtio_iommu_req_head head;
+ struct virtio_iommu_req_tail tail;
+ unsigned int iov_cnt;
+ struct iovec *iov;
+ size_t sz;
+
+ for (;;) {
+ elem = virtqueue_pop(vq, sizeof(VirtQueueElement));
+ if (!elem) {
+ return;
+ }
+
+ if (iov_size(elem->in_sg, elem->in_num) < sizeof(tail) ||
+ iov_size(elem->out_sg, elem->out_num) < sizeof(head)) {
+ virtio_error(vdev, "virtio-iommu erroneous head or tail");
+ virtqueue_detach_element(vq, elem, 0);
+ g_free(elem);
+ break;
+ }
+
+ iov_cnt = elem->out_num;
+ iov = g_memdup(elem->out_sg, sizeof(struct iovec) * elem->out_num);
+ sz = iov_to_buf(iov, iov_cnt, 0, &head, sizeof(head));
+ if (sz != sizeof(head)) {
+ tail.status = VIRTIO_IOMMU_S_UNSUPP;
+ }
+ qemu_mutex_lock(&s->mutex);
+ switch (head.type) {
+ case VIRTIO_IOMMU_T_ATTACH:
+ tail.status = virtio_iommu_handle_attach(s, iov, iov_cnt);
+ break;
+ case VIRTIO_IOMMU_T_DETACH:
+ tail.status = virtio_iommu_handle_detach(s, iov, iov_cnt);
+ break;
+ case VIRTIO_IOMMU_T_MAP:
+ tail.status = virtio_iommu_handle_map(s, iov, iov_cnt);
+ break;
+ case VIRTIO_IOMMU_T_UNMAP:
+ tail.status = virtio_iommu_handle_unmap(s, iov, iov_cnt);
+ break;
+ default:
+ tail.status = VIRTIO_IOMMU_S_UNSUPP;
+ }
+ qemu_mutex_unlock(&s->mutex);
+
+ sz = iov_from_buf(elem->in_sg, elem->in_num, 0,
+ &tail, sizeof(tail));
+ assert(sz == sizeof(tail));
+
+ virtqueue_push(vq, elem, sizeof(tail));
+ virtio_notify(vdev, vq);
+ g_free(elem);
+ }
+}
+
+static void virtio_iommu_get_config(VirtIODevice *vdev, uint8_t *config_data)
+{
+ VirtIOIOMMU *dev = VIRTIO_IOMMU(vdev);
+ struct virtio_iommu_config *config = &dev->config;
+
+ trace_virtio_iommu_get_config(config->page_size_mask,
+ config->input_range.start,
+ config->input_range.end,
+ config->ioasid_bits,
+ config->probe_size);
+ memcpy(config_data, &dev->config, sizeof(struct virtio_iommu_config));
+}
+
+static void virtio_iommu_set_config(VirtIODevice *vdev,
+ const uint8_t *config_data)
+{
+}
+
+static uint64_t virtio_iommu_get_features(VirtIODevice *vdev, uint64_t f,
+ Error **errp)
+{
+ VirtIOIOMMU *dev = VIRTIO_IOMMU(vdev);
+ f |= dev->host_features;
+ virtio_add_feature(&f, VIRTIO_RING_F_EVENT_IDX);
+ virtio_add_feature(&f, VIRTIO_RING_F_INDIRECT_DESC);
+ virtio_add_feature(&f, VIRTIO_IOMMU_F_INPUT_RANGE);
+ virtio_add_feature(&f, VIRTIO_IOMMU_F_MAP_UNMAP);
+ return f;
+}
+
+static void virtio_iommu_set_features(VirtIODevice *vdev, uint64_t val)
+{
+ trace_virtio_iommu_set_features(val);
+}
+
+static int virtio_iommu_post_load_device(void *opaque, int version_id)
+{
+ return 0;
+}
+
+static const VMStateDescription vmstate_virtio_iommu_device = {
+ .name = "virtio-iommu-device",
+ .version_id = 1,
+ .minimum_version_id = 1,
+ .post_load = virtio_iommu_post_load_device,
+ .fields = (VMStateField[]) {
+ VMSTATE_END_OF_LIST()
+ },
+};
+
+static void virtio_iommu_device_realize(DeviceState *dev, Error **errp)
+{
+ VirtIODevice *vdev = VIRTIO_DEVICE(dev);
+ VirtIOIOMMU *s = VIRTIO_IOMMU(dev);
+
+ virtio_init(vdev, "virtio-iommu", VIRTIO_ID_IOMMU,
+ sizeof(struct virtio_iommu_config));
+
+ s->vq = virtio_add_queue(vdev, VIOMMU_DEFAULT_QUEUE_SIZE,
+ virtio_iommu_handle_command);
+
+ s->config.page_size_mask = TARGET_PAGE_MASK;
+ s->config.input_range.end = -1UL;
+}
+
+static void virtio_iommu_device_unrealize(DeviceState *dev, Error **errp)
+{
+ VirtIODevice *vdev = VIRTIO_DEVICE(dev);
+
+ virtio_cleanup(vdev);
+}
+
+static void virtio_iommu_device_reset(VirtIODevice *vdev)
+{
+ trace_virtio_iommu_device_reset();
+}
+
+static void virtio_iommu_set_status(VirtIODevice *vdev, uint8_t status)
+{
+ trace_virtio_iommu_device_status(status);
+}
+
+static void virtio_iommu_instance_init(Object *obj)
+{
+}
+
+static const VMStateDescription vmstate_virtio_iommu = {
+ .name = "virtio-iommu",
+ .minimum_version_id = 1,
+ .version_id = 1,
+ .fields = (VMStateField[]) {
+ VMSTATE_VIRTIO_DEVICE,
+ VMSTATE_END_OF_LIST()
+ },
+};
+
+static Property virtio_iommu_properties[] = {
+ DEFINE_PROP_END_OF_LIST(),
+};
+
+static void virtio_iommu_class_init(ObjectClass *klass, void *data)
+{
+ DeviceClass *dc = DEVICE_CLASS(klass);
+ VirtioDeviceClass *vdc = VIRTIO_DEVICE_CLASS(klass);
+
+ dc->props = virtio_iommu_properties;
+ dc->vmsd = &vmstate_virtio_iommu;
+
+ set_bit(DEVICE_CATEGORY_MISC, dc->categories);
+ vdc->realize = virtio_iommu_device_realize;
+ vdc->unrealize = virtio_iommu_device_unrealize;
+ vdc->reset = virtio_iommu_device_reset;
+ vdc->get_config = virtio_iommu_get_config;
+ vdc->set_config = virtio_iommu_set_config;
+ vdc->get_features = virtio_iommu_get_features;
+ vdc->set_features = virtio_iommu_set_features;
+ vdc->set_status = virtio_iommu_set_status;
+ vdc->vmsd = &vmstate_virtio_iommu_device;
+}
+
+static const TypeInfo virtio_iommu_info = {
+ .name = TYPE_VIRTIO_IOMMU,
+ .parent = TYPE_VIRTIO_DEVICE,
+ .instance_size = sizeof(VirtIOIOMMU),
+ .instance_init = virtio_iommu_instance_init,
+ .class_init = virtio_iommu_class_init,
+};
+
+static void virtio_register_types(void)
+{
+ type_register_static(&virtio_iommu_info);
+}
+
+type_init(virtio_register_types)
diff --git a/include/hw/virtio/virtio-iommu.h b/include/hw/virtio/virtio-iommu.h
new file mode 100644
index 0000000..6716cdb
--- /dev/null
+++ b/include/hw/virtio/virtio-iommu.h
@@ -0,0 +1,59 @@
+/*
+ * virtio-iommu device
+ *
+ * Copyright (c) 2017 Red Hat, Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2 or later, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program. If not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+#ifndef QEMU_VIRTIO_IOMMU_H
+#define QEMU_VIRTIO_IOMMU_H
+
+#include "standard-headers/linux/virtio_iommu.h"
+#include "hw/virtio/virtio.h"
+#include "hw/pci/pci.h"
+
+#define TYPE_VIRTIO_IOMMU "virtio-iommu-device"
+#define VIRTIO_IOMMU(obj) \
+ OBJECT_CHECK(VirtIOIOMMU, (obj), TYPE_VIRTIO_IOMMU)
+
+#define IOMMU_PCI_BUS_MAX 256
+#define IOMMU_PCI_DEVFN_MAX 256
+
+typedef struct IOMMUDevice {
+ void *viommu;
+ PCIBus *bus;
+ int devfn;
+ IOMMUMemoryRegion iommu_mr;
+ AddressSpace as;
+} IOMMUDevice;
+
+typedef struct IOMMUPciBus {
+ PCIBus *bus;
+ IOMMUDevice *pbdev[0]; /* Parent array is sparse, so dynamically alloc */
+} IOMMUPciBus;
+
+typedef struct VirtIOIOMMU {
+ VirtIODevice parent_obj;
+ VirtQueue *vq;
+ struct virtio_iommu_config config;
+ uint32_t host_features;
+ GHashTable *as_by_busptr;
+ IOMMUPciBus *as_by_bus_num[IOMMU_PCI_BUS_MAX];
+ GTree *address_spaces;
+ QemuMutex mutex;
+ GTree *devices;
+} VirtIOIOMMU;
+
+#endif
--
2.5.5
Eric Auger
2017-09-19 07:46:38 UTC
Permalink
This patch introduce address space and device internal
datatypes. Both are stored in RB trees. The address space
owns a list of devices attached to it.

It is assumed the devid corresponds to the PCI BDF.

Signed-off-by: Eric Auger <***@redhat.com>

---
v3 -> v4:
- new separate patch
---
hw/virtio/trace-events | 4 ++
hw/virtio/virtio-iommu.c | 120 +++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 124 insertions(+)

diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events
index bc65356..74b92d3 100644
--- a/hw/virtio/trace-events
+++ b/hw/virtio/trace-events
@@ -38,3 +38,7 @@ virtio_iommu_map(uint32_t as, uint64_t phys_addr, uint64_t virt_addr, uint64_t s
virtio_iommu_unmap(uint32_t as, uint64_t virt_addr, uint64_t size) "as= %d virt_addr=0x%"PRIx64" size=0x%"PRIx64
virtio_iommu_translate(const char *name, uint32_t rid, uint64_t iova, int flag) "mr=%s rid=%d addr=0x%"PRIx64" flag=%d"
virtio_iommu_init_iommu_mr(char *iommu_mr) "init %s"
+virtio_iommu_get_dev(uint32_t devid) "Alloc devid=%d"
+virtio_iommu_put_dev(uint32_t devid) "Free devid=%d"
+virtio_iommu_get_as(uint32_t asid) "Alloc asid=%d"
+virtio_iommu_put_as(uint32_t asid) "Free asid=%d"
diff --git a/hw/virtio/virtio-iommu.c b/hw/virtio/virtio-iommu.c
index f4cb76f..41a4bbc 100644
--- a/hw/virtio/virtio-iommu.c
+++ b/hw/virtio/virtio-iommu.c
@@ -32,15 +32,116 @@
#include "hw/virtio/virtio-bus.h"
#include "hw/virtio/virtio-access.h"
#include "hw/virtio/virtio-iommu.h"
+#include "hw/pci/pci_bus.h"
+#include "hw/pci/pci.h"

/* Max size */
#define VIOMMU_DEFAULT_QUEUE_SIZE 256

+typedef struct viommu_as {
+ uint32_t id;
+ GTree *mappings;
+ QLIST_HEAD(, viommu_dev) device_list;
+} viommu_as;
+
+typedef struct viommu_dev {
+ uint32_t id;
+ viommu_as *as;
+ QLIST_ENTRY(viommu_dev) next;
+ VirtIOIOMMU *viommu;
+} viommu_dev;
+
+typedef struct viommu_interval {
+ uint64_t low;
+ uint64_t high;
+} viommu_interval;
+
static inline uint16_t virtio_iommu_get_sid(IOMMUDevice *dev)
{
return PCI_BUILD_BDF(pci_bus_num(dev->bus), dev->devfn);
}

+static gint interval_cmp(gconstpointer a, gconstpointer b, gpointer user_data)
+{
+ viommu_interval *inta = (viommu_interval *)a;
+ viommu_interval *intb = (viommu_interval *)b;
+
+ if (inta->high <= intb->low) {
+ return -1;
+ } else if (intb->high <= inta->low) {
+ return 1;
+ } else {
+ return 0;
+ }
+}
+
+static void virtio_iommu_detach_dev_from_as(viommu_dev *dev)
+{
+ QLIST_REMOVE(dev, next);
+ dev->as = NULL;
+}
+
+static viommu_dev *virtio_iommu_get_dev(VirtIOIOMMU *s, uint32_t devid)
+{
+ viommu_dev *dev;
+
+ dev = g_tree_lookup(s->devices, GUINT_TO_POINTER(devid));
+ if (dev) {
+ return dev;
+ }
+ dev = g_malloc0(sizeof(*dev));
+ dev->id = devid;
+ dev->viommu = s;
+ trace_virtio_iommu_get_dev(devid);
+ g_tree_insert(s->devices, GUINT_TO_POINTER(devid), dev);
+ return dev;
+}
+
+static void virtio_iommu_put_dev(gpointer data)
+{
+ viommu_dev *dev = (viommu_dev *)data;
+
+ if (dev->as) {
+ virtio_iommu_detach_dev_from_as(dev);
+ g_tree_unref(dev->as->mappings);
+ }
+
+ trace_virtio_iommu_put_dev(dev->id);
+ g_free(dev);
+}
+
+viommu_as *virtio_iommu_get_as(VirtIOIOMMU *s, uint32_t asid);
+viommu_as *virtio_iommu_get_as(VirtIOIOMMU *s, uint32_t asid)
+{
+ viommu_as *as;
+
+ as = g_tree_lookup(s->address_spaces, GUINT_TO_POINTER(asid));
+ if (as) {
+ return as;
+ }
+ as = g_malloc0(sizeof(*as));
+ as->id = asid;
+ as->mappings = g_tree_new_full((GCompareDataFunc)interval_cmp,
+ NULL, (GDestroyNotify)g_free,
+ (GDestroyNotify)g_free);
+ g_tree_insert(s->address_spaces, GUINT_TO_POINTER(asid), as);
+ trace_virtio_iommu_get_as(asid);
+ return as;
+}
+
+static void virtio_iommu_put_as(gpointer data)
+{
+ viommu_as *as = (viommu_as *)data;
+ viommu_dev *iter, *tmp;
+
+ QLIST_FOREACH_SAFE(iter, &as->device_list, next, tmp) {
+ virtio_iommu_detach_dev_from_as(iter);
+ }
+ g_tree_destroy(as->mappings);
+ trace_virtio_iommu_put_as(as->id);
+ g_free(as);
+}
+
static AddressSpace *virtio_iommu_find_add_as(PCIBus *bus, void *opaque,
int devfn)
{
@@ -70,6 +171,8 @@ static AddressSpace *virtio_iommu_find_add_as(PCIBus *bus, void *opaque,
sdev->bus = bus;
sdev->devfn = devfn;

+ virtio_iommu_get_dev(s, PCI_BUILD_BDF(pci_bus_num(bus), devfn));
+
trace_virtio_iommu_init_iommu_mr(name);

memory_region_init_iommu(&sdev->iommu_mr, sizeof(sdev->iommu_mr),
@@ -360,6 +463,12 @@ static inline guint as_uint64_hash(gconstpointer v)
return (guint)*(const uint64_t *)v;
}

+static gint int_cmp(gconstpointer a, gconstpointer b, gpointer user_data)
+{
+ uint ua = GPOINTER_TO_UINT(a);
+ uint ub = GPOINTER_TO_UINT(b);
+ return (ua > ub) - (ua < ub);
+}

static void virtio_iommu_device_realize(DeviceState *dev, Error **errp)
{
@@ -375,17 +484,28 @@ static void virtio_iommu_device_realize(DeviceState *dev, Error **errp)
s->config.page_size_mask = TARGET_PAGE_MASK;
s->config.input_range.end = -1UL;

+ qemu_mutex_init(&s->mutex);
+
memset(s->as_by_bus_num, 0, sizeof(s->as_by_bus_num));
s->as_by_busptr = g_hash_table_new_full(as_uint64_hash,
as_uint64_equal,
g_free, g_free);

+ s->address_spaces = g_tree_new_full((GCompareDataFunc)int_cmp,
+ NULL, NULL, virtio_iommu_put_as);
+ s->devices = g_tree_new_full((GCompareDataFunc)int_cmp,
+ NULL, NULL, virtio_iommu_put_dev);
+
virtio_iommu_init_as(s);
}

static void virtio_iommu_device_unrealize(DeviceState *dev, Error **errp)
{
VirtIODevice *vdev = VIRTIO_DEVICE(dev);
+ VirtIOIOMMU *s = VIRTIO_IOMMU(dev);
+
+ g_tree_destroy(s->address_spaces);
+ g_tree_destroy(s->devices);

virtio_cleanup(vdev);
}
--
2.5.5
Bharat Bhushan
2017-09-22 07:29:55 UTC
Permalink
-----Original Message-----
Sent: Tuesday, September 19, 2017 1:17 PM
Subject: [RFC v4 06/16] virtio-iommu: Register attached devices
This patch introduce address space and device internal
datatypes. Both are stored in RB trees. The address space
owns a list of devices attached to it.
It is assumed the devid corresponds to the PCI BDF.
---
- new separate patch
---
hw/virtio/trace-events | 4 ++
hw/virtio/virtio-iommu.c | 120
+++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 124 insertions(+)
diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events
index bc65356..74b92d3 100644
--- a/hw/virtio/trace-events
+++ b/hw/virtio/trace-events
@@ -38,3 +38,7 @@ virtio_iommu_map(uint32_t as, uint64_t phys_addr,
uint64_t virt_addr, uint64_t s
virtio_iommu_unmap(uint32_t as, uint64_t virt_addr, uint64_t size) "as= %d
virt_addr=0x%"PRIx64" size=0x%"PRIx64
virtio_iommu_translate(const char *name, uint32_t rid, uint64_t iova, int
flag) "mr=%s rid=%d addr=0x%"PRIx64" flag=%d"
virtio_iommu_init_iommu_mr(char *iommu_mr) "init %s"
+virtio_iommu_get_dev(uint32_t devid) "Alloc devid=%d"
+virtio_iommu_put_dev(uint32_t devid) "Free devid=%d"
+virtio_iommu_get_as(uint32_t asid) "Alloc asid=%d"
+virtio_iommu_put_as(uint32_t asid) "Free asid=%d"
diff --git a/hw/virtio/virtio-iommu.c b/hw/virtio/virtio-iommu.c
index f4cb76f..41a4bbc 100644
--- a/hw/virtio/virtio-iommu.c
+++ b/hw/virtio/virtio-iommu.c
@@ -32,15 +32,116 @@
#include "hw/virtio/virtio-bus.h"
#include "hw/virtio/virtio-access.h"
#include "hw/virtio/virtio-iommu.h"
+#include "hw/pci/pci_bus.h"
+#include "hw/pci/pci.h"
/* Max size */
#define VIOMMU_DEFAULT_QUEUE_SIZE 256
+typedef struct viommu_as {
+ uint32_t id;
+ GTree *mappings;
+ QLIST_HEAD(, viommu_dev) device_list;
+} viommu_as;
+
+typedef struct viommu_dev {
+ uint32_t id;
+ viommu_as *as;
+ QLIST_ENTRY(viommu_dev) next;
+ VirtIOIOMMU *viommu;
+} viommu_dev;
+
+typedef struct viommu_interval {
+ uint64_t low;
+ uint64_t high;
+} viommu_interval;
+
static inline uint16_t virtio_iommu_get_sid(IOMMUDevice *dev)
{
return PCI_BUILD_BDF(pci_bus_num(dev->bus), dev->devfn);
}
+static gint interval_cmp(gconstpointer a, gconstpointer b, gpointer user_data)
+{
+ viommu_interval *inta = (viommu_interval *)a;
+ viommu_interval *intb = (viommu_interval *)b;
+
+ if (inta->high <= intb->low) {
+ return -1;
+ } else if (intb->high <= inta->low) {
+ return 1;
+ } else {
+ return 0;
+ }
+}
+
+static void virtio_iommu_detach_dev_from_as(viommu_dev *dev)
+{
+ QLIST_REMOVE(dev, next);
+ dev->as = NULL;
+}
+
+static viommu_dev *virtio_iommu_get_dev(VirtIOIOMMU *s, uint32_t devid)
+{
+ viommu_dev *dev;
+
+ dev = g_tree_lookup(s->devices, GUINT_TO_POINTER(devid));
+ if (dev) {
+ return dev;
+ }
+ dev = g_malloc0(sizeof(*dev));
+ dev->id = devid;
+ dev->viommu = s;
+ trace_virtio_iommu_get_dev(devid);
+ g_tree_insert(s->devices, GUINT_TO_POINTER(devid), dev);
+ return dev;
+}
+
+static void virtio_iommu_put_dev(gpointer data)
+{
+ viommu_dev *dev = (viommu_dev *)data;
+
+ if (dev->as) {
+ virtio_iommu_detach_dev_from_as(dev);
+ g_tree_unref(dev->as->mappings);
+ }
+
+ trace_virtio_iommu_put_dev(dev->id);
+ g_free(dev);
+}
+
+viommu_as *virtio_iommu_get_as(VirtIOIOMMU *s, uint32_t asid);
+viommu_as *virtio_iommu_get_as(VirtIOIOMMU *s, uint32_t asid)
+{
+ viommu_as *as;
+
+ as = g_tree_lookup(s->address_spaces, GUINT_TO_POINTER(asid));
+ if (as) {
+ return as;
+ }
+ as = g_malloc0(sizeof(*as));
+ as->id = asid;
I see now you maintain device-list in address-space in your base patches, but I do not see where it is initialized?
Should not that be initialized here (vfio integration patches initialized here).

Thanks
-Bharat
+ as->mappings = g_tree_new_full((GCompareDataFunc)interval_cmp,
+ NULL, (GDestroyNotify)g_free,
+ (GDestroyNotify)g_free);
+ g_tree_insert(s->address_spaces, GUINT_TO_POINTER(asid), as);
+ trace_virtio_iommu_get_as(asid);
+ return as;
+}
+
+static void virtio_iommu_put_as(gpointer data)
+{
+ viommu_as *as = (viommu_as *)data;
+ viommu_dev *iter, *tmp;
+
+ QLIST_FOREACH_SAFE(iter, &as->device_list, next, tmp) {
+ virtio_iommu_detach_dev_from_as(iter);
+ }
+ g_tree_destroy(as->mappings);
+ trace_virtio_iommu_put_as(as->id);
+ g_free(as);
+}
+
static AddressSpace *virtio_iommu_find_add_as(PCIBus *bus, void *opaque,
int devfn)
{
@@ -70,6 +171,8 @@ static AddressSpace
*virtio_iommu_find_add_as(PCIBus *bus, void *opaque,
sdev->bus = bus;
sdev->devfn = devfn;
+ virtio_iommu_get_dev(s, PCI_BUILD_BDF(pci_bus_num(bus), devfn));
+
trace_virtio_iommu_init_iommu_mr(name);
memory_region_init_iommu(&sdev->iommu_mr, sizeof(sdev-
Post by Eric Auger
iommu_mr),
@@ -360,6 +463,12 @@ static inline guint as_uint64_hash(gconstpointer v)
return (guint)*(const uint64_t *)v;
}
+static gint int_cmp(gconstpointer a, gconstpointer b, gpointer user_data)
+{
+ uint ua = GPOINTER_TO_UINT(a);
+ uint ub = GPOINTER_TO_UINT(b);
+ return (ua > ub) - (ua < ub);
+}
static void virtio_iommu_device_realize(DeviceState *dev, Error **errp)
{
@@ -375,17 +484,28 @@ static void
virtio_iommu_device_realize(DeviceState *dev, Error **errp)
s->config.page_size_mask = TARGET_PAGE_MASK;
s->config.input_range.end = -1UL;
+ qemu_mutex_init(&s->mutex);
+
memset(s->as_by_bus_num, 0, sizeof(s->as_by_bus_num));
s->as_by_busptr = g_hash_table_new_full(as_uint64_hash,
as_uint64_equal,
g_free, g_free);
+ s->address_spaces = g_tree_new_full((GCompareDataFunc)int_cmp,
+ NULL, NULL, virtio_iommu_put_as);
+ s->devices = g_tree_new_full((GCompareDataFunc)int_cmp,
+ NULL, NULL, virtio_iommu_put_dev);
+
virtio_iommu_init_as(s);
}
static void virtio_iommu_device_unrealize(DeviceState *dev, Error **errp)
{
VirtIODevice *vdev = VIRTIO_DEVICE(dev);
+ VirtIOIOMMU *s = VIRTIO_IOMMU(dev);
+
+ g_tree_destroy(s->address_spaces);
+ g_tree_destroy(s->devices);
virtio_cleanup(vdev);
}
--
2.5.5
Eric Auger
2017-09-19 07:46:43 UTC
Permalink
The new machine type allows virtio-iommu instantiation.

Signed-off-by: Eric Auger <***@redhat.com>

---
---
hw/arm/virt.c | 23 +++++++++++++++++++++--
include/hw/arm/virt.h | 1 +
2 files changed, 22 insertions(+), 2 deletions(-)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index cfd834d..9737a5f 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -1648,7 +1648,7 @@ static void machvirt_machine_init(void)
}
type_init(machvirt_machine_init);

-static void virt_2_10_instance_init(Object *obj)
+static void virt_2_11_instance_init(Object *obj)
{
VirtMachineState *vms = VIRT_MACHINE(obj);
VirtMachineClass *vmc = VIRT_MACHINE_GET_CLASS(vms);
@@ -1708,10 +1708,29 @@ static void virt_2_10_instance_init(Object *obj)
vms->irqmap = a15irqmap;
}

+static void virt_machine_2_11_options(MachineClass *mc)
+{
+}
+DEFINE_VIRT_MACHINE_AS_LATEST(2, 11)
+
+#define VIRT_COMPAT_2_10 \
+ HW_COMPAT_2_10
+
+static void virt_2_10_instance_init(Object *obj)
+{
+ virt_2_11_instance_init(obj);
+}
+
static void virt_machine_2_10_options(MachineClass *mc)
{
+ VirtMachineClass *vmc = VIRT_MACHINE_CLASS(OBJECT_CLASS(mc));
+
+ virt_machine_2_11_options(mc);
+ SET_MACHINE_COMPAT(mc, VIRT_COMPAT_2_10);
+
+ vmc->no_iommu = true;
}
-DEFINE_VIRT_MACHINE_AS_LATEST(2, 10)
+DEFINE_VIRT_MACHINE(2, 10)

#define VIRT_COMPAT_2_9 \
HW_COMPAT_2_9
diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h
index 33b0ff3..ff27551 100644
--- a/include/hw/arm/virt.h
+++ b/include/hw/arm/virt.h
@@ -84,6 +84,7 @@ typedef struct {
bool disallow_affinity_adjustment;
bool no_its;
bool no_pmu;
+ bool no_iommu;
bool claim_edge_triggered_timers;
} VirtMachineClass;
--
2.5.5
Eric Auger
2017-09-19 07:46:40 UTC
Permalink
This patch implements virtio_iommu_map/unmap.

Signed-off-by: Eric Auger <***@redhat.com>

---

v3 -> v4:
- implement unmap semantics as specified in v0.4
---
hw/virtio/trace-events | 3 ++
hw/virtio/virtio-iommu.c | 91 ++++++++++++++++++++++++++++++++++++++++++++++--
2 files changed, 92 insertions(+), 2 deletions(-)

diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events
index 74b92d3..da298c1 100644
--- a/hw/virtio/trace-events
+++ b/hw/virtio/trace-events
@@ -42,3 +42,6 @@ virtio_iommu_get_dev(uint32_t devid) "Alloc devid=%d"
virtio_iommu_put_dev(uint32_t devid) "Free devid=%d"
virtio_iommu_get_as(uint32_t asid) "Alloc asid=%d"
virtio_iommu_put_as(uint32_t asid) "Free asid=%d"
+virtio_iommu_unmap_left_interval(uint64_t low, uint64_t high, uint64_t next_low, uint64_t next_high) "Unmap left [0x%"PRIx64",0x%"PRIx64"], new interval=[0x%"PRIx64",0x%"PRIx64"]"
+virtio_iommu_unmap_right_interval(uint64_t low, uint64_t high, uint64_t next_low, uint64_t next_high) "Unmap right [0x%"PRIx64",0x%"PRIx64"], new interval=[0x%"PRIx64",0x%"PRIx64"]"
+virtio_iommu_unmap_inc_interval(uint64_t low, uint64_t high) "Unmap inc [0x%"PRIx64",0x%"PRIx64"]"
diff --git a/hw/virtio/virtio-iommu.c b/hw/virtio/virtio-iommu.c
index bd859fb..6f1a7d1 100644
--- a/hw/virtio/virtio-iommu.c
+++ b/hw/virtio/virtio-iommu.c
@@ -56,6 +56,13 @@ typedef struct viommu_interval {
uint64_t high;
} viommu_interval;

+typedef struct viommu_mapping {
+ uint64_t virt_addr;
+ uint64_t phys_addr;
+ uint64_t size;
+ uint32_t flags;
+} viommu_mapping;
+
static inline uint16_t virtio_iommu_get_sid(IOMMUDevice *dev)
{
return PCI_BUILD_BDF(pci_bus_num(dev->bus), dev->devfn);
@@ -264,10 +271,37 @@ static int virtio_iommu_map(VirtIOIOMMU *s,
uint64_t virt_addr = le64_to_cpu(req->virt_addr);
uint64_t size = le64_to_cpu(req->size);
uint32_t flags = le32_to_cpu(req->flags);
+ viommu_as *as;
+ viommu_interval *interval;
+ viommu_mapping *mapping;
+
+ interval = g_malloc0(sizeof(*interval));
+
+ interval->low = virt_addr;
+ interval->high = virt_addr + size - 1;
+
+ as = g_tree_lookup(s->address_spaces, GUINT_TO_POINTER(asid));
+ if (!as) {
+ return VIRTIO_IOMMU_S_NOENT;
+ }
+
+ mapping = g_tree_lookup(as->mappings, (gpointer)interval);
+ if (mapping) {
+ g_free(interval);
+ return VIRTIO_IOMMU_S_INVAL;
+ }

trace_virtio_iommu_map(asid, phys_addr, virt_addr, size, flags);

- return VIRTIO_IOMMU_S_UNSUPP;
+ mapping = g_malloc0(sizeof(*mapping));
+ mapping->virt_addr = virt_addr;
+ mapping->phys_addr = phys_addr;
+ mapping->size = size;
+ mapping->flags = flags;
+
+ g_tree_insert(as->mappings, interval, mapping);
+
+ return VIRTIO_IOMMU_S_OK;
}

static int virtio_iommu_unmap(VirtIOIOMMU *s,
@@ -276,10 +310,63 @@ static int virtio_iommu_unmap(VirtIOIOMMU *s,
uint32_t asid = le32_to_cpu(req->address_space);
uint64_t virt_addr = le64_to_cpu(req->virt_addr);
uint64_t size = le64_to_cpu(req->size);
+ viommu_mapping *mapping;
+ viommu_interval interval;
+ viommu_as *as;

trace_virtio_iommu_unmap(asid, virt_addr, size);

- return VIRTIO_IOMMU_S_UNSUPP;
+ as = g_tree_lookup(s->address_spaces, GUINT_TO_POINTER(asid));
+ if (!as) {
+ error_report("%s: no as", __func__);
+ return VIRTIO_IOMMU_S_NOENT;
+ }
+ interval.low = virt_addr;
+ interval.high = virt_addr + size - 1;
+
+ mapping = g_tree_lookup(as->mappings, (gpointer)(&interval));
+
+ while (mapping) {
+ viommu_interval current;
+ uint64_t low = mapping->virt_addr;
+ uint64_t high = mapping->virt_addr + mapping->size - 1;
+
+ current.low = low;
+ current.high = high;
+
+ if (low == interval.low && size >= mapping->size) {
+ g_tree_remove(as->mappings, (gpointer)(&current));
+ interval.low = high + 1;
+ trace_virtio_iommu_unmap_left_interval(current.low, current.high,
+ interval.low, interval.high);
+ } else if (high == interval.high && size >= mapping->size) {
+ trace_virtio_iommu_unmap_right_interval(current.low, current.high,
+ interval.low, interval.high);
+ g_tree_remove(as->mappings, (gpointer)(&current));
+ interval.high = low - 1;
+ } else if (low > interval.low && high < interval.high) {
+ trace_virtio_iommu_unmap_inc_interval(current.low, current.high);
+ g_tree_remove(as->mappings, (gpointer)(&current));
+ } else {
+ break;
+ }
+ if (interval.low >= interval.high) {
+ return VIRTIO_IOMMU_S_OK;
+ } else {
+ mapping = g_tree_lookup(as->mappings, (gpointer)(&interval));
+ }
+ }
+
+ if (mapping) {
+ error_report("****** %s: Unmap 0x%"PRIx64" size=0x%"PRIx64
+ " from 0x%"PRIx64" size=0x%"PRIx64" is not supported",
+ __func__, interval.low, size,
+ mapping->virt_addr, mapping->size);
+ } else {
+ return VIRTIO_IOMMU_S_OK;
+ }
+
+ return VIRTIO_IOMMU_S_INVAL;
}

#define get_payload_size(req) (\
--
2.5.5
Eric Auger
2017-09-19 07:46:42 UTC
Permalink
This patch implements the PROBE request. At the moment,
no reserved regions are returned.

At the moment reserved regions are stored per device.

Signed-off-by: Eric Auger <***@redhat.com>

---

Waiting for clarifications on v0.4 spec
---
hw/virtio/trace-events | 2 +
hw/virtio/virtio-iommu.c | 173 ++++++++++++++++++++++++++++++++++++++++++++++-
2 files changed, 173 insertions(+), 2 deletions(-)

diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events
index 9010fbd..9ccfad1 100644
--- a/hw/virtio/trace-events
+++ b/hw/virtio/trace-events
@@ -46,3 +46,5 @@ virtio_iommu_unmap_left_interval(uint64_t low, uint64_t high, uint64_t next_low,
virtio_iommu_unmap_right_interval(uint64_t low, uint64_t high, uint64_t next_low, uint64_t next_high) "Unmap right [0x%"PRIx64",0x%"PRIx64"], new interval=[0x%"PRIx64",0x%"PRIx64"]"
virtio_iommu_unmap_inc_interval(uint64_t low, uint64_t high) "Unmap inc [0x%"PRIx64",0x%"PRIx64"]"
virtio_iommu_translate_out(uint64_t virt_addr, uint64_t phys_addr, uint32_t sid) "0x%"PRIx64" -> 0x%"PRIx64 " for sid=%d"
+virtio_iommu_fill_resv_property(uint32_t devid, uint8_t subtype, uint64_t addr, uint64_t size, uint32_t flags, size_t filled) "dev= %d, subtype=%d addr=0x%"PRIx64" size=0x%"PRIx64" flags=%d filled=0x%lx"
+virtio_iommu_fill_none_property(uint32_t devid) "devid=%d"
diff --git a/hw/virtio/virtio-iommu.c b/hw/virtio/virtio-iommu.c
index db46a91..281b0f8 100644
--- a/hw/virtio/virtio-iommu.c
+++ b/hw/virtio/virtio-iommu.c
@@ -37,6 +37,11 @@

/* Max size */
#define VIOMMU_DEFAULT_QUEUE_SIZE 256
+#define VIOMMU_PROBE_SIZE 512
+
+#define SUPPORTED_PROBE_PROPERTIES (\
+ VIRTIO_IOMMU_PROBE_T_NONE | \
+ VIRTIO_IOMMU_PROBE_T_RESV_MEM)

typedef struct viommu_as {
uint32_t id;
@@ -49,6 +54,7 @@ typedef struct viommu_dev {
viommu_as *as;
QLIST_ENTRY(viommu_dev) next;
VirtIOIOMMU *viommu;
+ GTree *reserved_regions;
} viommu_dev;

typedef struct viommu_interval {
@@ -63,6 +69,13 @@ typedef struct viommu_mapping {
uint32_t flags;
} viommu_mapping;

+typedef struct viommu_property_buffer {
+ viommu_dev *dev;
+ size_t filled;
+ uint8_t *start;
+ bool error;
+} viommu_property_buffer;
+
static inline uint16_t virtio_iommu_get_sid(IOMMUDevice *dev)
{
return PCI_BUILD_BDF(pci_bus_num(dev->bus), dev->devfn);
@@ -101,6 +114,9 @@ static viommu_dev *virtio_iommu_get_dev(VirtIOIOMMU *s, uint32_t devid)
dev->viommu = s;
trace_virtio_iommu_get_dev(devid);
g_tree_insert(s->devices, GUINT_TO_POINTER(devid), dev);
+ dev->reserved_regions = g_tree_new_full((GCompareDataFunc)interval_cmp,
+ NULL, (GDestroyNotify)g_free,
+ (GDestroyNotify)g_free);
return dev;
}

@@ -114,6 +130,7 @@ static void virtio_iommu_put_dev(gpointer data)
}

trace_virtio_iommu_put_dev(dev->id);
+ g_tree_destroy(dev->reserved_regions);
g_free(dev);
}

@@ -369,6 +386,123 @@ static int virtio_iommu_unmap(VirtIOIOMMU *s,
return VIRTIO_IOMMU_S_INVAL;
}

+static gboolean virtio_iommu_fill_resv_mem_prop(gpointer key,
+ gpointer value,
+ gpointer data)
+{
+ struct virtio_iommu_probe_resv_mem *resv =
+ (struct virtio_iommu_probe_resv_mem *)value;
+ struct virtio_iommu_probe_property *prop;
+ struct virtio_iommu_probe_resv_mem *current;
+ viommu_property_buffer *bufstate = (viommu_property_buffer *)data;
+ size_t size = sizeof(*resv), total_size;
+
+ total_size = size + 4;
+
+ if (bufstate->filled >= VIOMMU_PROBE_SIZE) {
+ bufstate->error = true;
+ return true;
+ }
+ prop = (struct virtio_iommu_probe_property *)
+ (bufstate->start + bufstate->filled);
+ prop->type = cpu_to_le16(VIRTIO_IOMMU_PROBE_T_RESV_MEM) &
+ VIRTIO_IOMMU_PROBE_T_MASK;
+ prop->length = size;
+
+ current = (struct virtio_iommu_probe_resv_mem *)prop->value;
+ *current = *resv;
+ bufstate->filled += total_size;
+ trace_virtio_iommu_fill_resv_property(bufstate->dev->id,
+ resv->subtype, resv->addr,
+ resv->size, resv->flags,
+ bufstate->filled);
+ return false;
+}
+
+static int virtio_iommu_fill_none_prop(viommu_property_buffer *bufstate)
+{
+ struct virtio_iommu_probe_property *prop;
+
+ prop = (struct virtio_iommu_probe_property *)
+ (bufstate->start + bufstate->filled);
+ prop->type = cpu_to_le16(VIRTIO_IOMMU_PROBE_T_NONE)
+ & VIRTIO_IOMMU_PROBE_T_MASK;
+ prop->length = 0;
+ bufstate->filled += 4;
+ trace_virtio_iommu_fill_none_property(bufstate->dev->id);
+ return 0;
+}
+
+static int virtio_iommu_fill_property(int devid, int type,
+ viommu_property_buffer *bufstate)
+{
+ int ret = -ENOSPC;
+
+ if (bufstate->filled + 4 >= VIOMMU_PROBE_SIZE) {
+ bufstate->error = true;
+ goto out;
+ }
+
+ switch (type) {
+ case VIRTIO_IOMMU_PROBE_T_NONE:
+ ret = virtio_iommu_fill_none_prop(bufstate);
+ break;
+ case VIRTIO_IOMMU_PROBE_T_RESV_MEM:
+ {
+ viommu_dev *dev = bufstate->dev;
+
+ g_tree_foreach(dev->reserved_regions,
+ virtio_iommu_fill_resv_mem_prop,
+ bufstate);
+ if (!bufstate->error) {
+ ret = 0;
+ }
+ break;
+ }
+ default:
+ ret = -ENOENT;
+ break;
+ }
+out:
+ if (ret) {
+ error_report("%s property of type=%d could not be filled (%d),"
+ " remaining size = 0x%lx",
+ __func__, type, ret, bufstate->filled);
+ }
+ return ret;
+}
+
+static int virtio_iommu_probe(VirtIOIOMMU *s,
+ struct virtio_iommu_req_probe *req,
+ uint8_t *buf)
+{
+ uint32_t devid = le32_to_cpu(req->device);
+ int16_t prop_types = SUPPORTED_PROBE_PROPERTIES, type;
+ viommu_property_buffer bufstate;
+ viommu_dev *dev;
+ int ret;
+
+ dev = g_tree_lookup(s->devices, GUINT_TO_POINTER(devid));
+ if (!dev) {
+ return -EINVAL;
+ }
+
+ bufstate.start = buf;
+ bufstate.filled = 0;
+ bufstate.dev = dev;
+
+ while ((type = ctz32(prop_types)) != 32) {
+ ret = virtio_iommu_fill_property(devid, 1 << type, &bufstate);
+ if (ret) {
+ break;
+ }
+ prop_types &= ~(1 << type);
+ }
+ virtio_iommu_fill_property(devid, VIRTIO_IOMMU_PROBE_T_NONE, &bufstate);
+
+ return VIRTIO_IOMMU_S_OK;
+}
+
#define get_payload_size(req) (\
sizeof((req)) - sizeof(struct virtio_iommu_req_tail))

@@ -433,6 +567,24 @@ static int virtio_iommu_handle_unmap(VirtIOIOMMU *s,
return virtio_iommu_unmap(s, &req);
}

+static int virtio_iommu_handle_probe(VirtIOIOMMU *s,
+ struct iovec *iov,
+ unsigned int iov_cnt,
+ uint8_t *buf)
+{
+ struct virtio_iommu_req_probe req;
+ size_t sz, payload_sz;
+
+ payload_sz = get_payload_size(req);
+
+ sz = iov_to_buf(iov, iov_cnt, 0, &req, payload_sz);
+ if (sz != payload_sz) {
+ return VIRTIO_IOMMU_S_INVAL;
+ }
+
+ return virtio_iommu_probe(s, &req, buf);
+}
+
static void virtio_iommu_handle_command(VirtIODevice *vdev, VirtQueue *vq)
{
VirtIOIOMMU *s = VIRTIO_IOMMU(vdev);
@@ -477,16 +629,31 @@ static void virtio_iommu_handle_command(VirtIODevice *vdev, VirtQueue *vq)
case VIRTIO_IOMMU_T_UNMAP:
tail.status = virtio_iommu_handle_unmap(s, iov, iov_cnt);
break;
+ case VIRTIO_IOMMU_T_PROBE:
+ {
+ struct virtio_iommu_req_tail *ptail;
+ uint8_t *buf = g_malloc0(s->config.probe_size + sizeof(tail));
+
+ ptail = (struct virtio_iommu_req_tail *)buf + s->config.probe_size;
+ ptail->status = virtio_iommu_handle_probe(s, iov, iov_cnt, buf);
+
+ sz = iov_from_buf(elem->in_sg, elem->in_num, 0,
+ buf, s->config.probe_size + sizeof(tail));
+ g_free(buf);
+ assert(sz == s->config.probe_size + sizeof(tail));
+ goto push;
+ }
default:
tail.status = VIRTIO_IOMMU_S_UNSUPP;
}
- qemu_mutex_unlock(&s->mutex);

sz = iov_from_buf(elem->in_sg, elem->in_num, 0,
&tail, sizeof(tail));
assert(sz == sizeof(tail));

- virtqueue_push(vq, elem, sizeof(tail));
+push:
+ qemu_mutex_unlock(&s->mutex);
+ virtqueue_push(vq, elem, sz);
virtio_notify(vdev, vq);
g_free(elem);
}
@@ -574,6 +741,7 @@ static uint64_t virtio_iommu_get_features(VirtIODevice *vdev, uint64_t f,
virtio_add_feature(&f, VIRTIO_RING_F_INDIRECT_DESC);
virtio_add_feature(&f, VIRTIO_IOMMU_F_INPUT_RANGE);
virtio_add_feature(&f, VIRTIO_IOMMU_F_MAP_UNMAP);
+ virtio_add_feature(&f, VIRTIO_IOMMU_F_PROBE);
return f;
}

@@ -631,6 +799,7 @@ static void virtio_iommu_device_realize(DeviceState *dev, Error **errp)

s->config.page_size_mask = TARGET_PAGE_MASK;
s->config.input_range.end = -1UL;
+ s->config.probe_size = VIOMMU_PROBE_SIZE;

qemu_mutex_init(&s->mutex);
--
2.5.5
Tomasz Nowicki
2017-09-27 10:53:24 UTC
Permalink
Hi Eric,
Post by Eric Auger
This patch implements the PROBE request. At the moment,
no reserved regions are returned.
At the moment reserved regions are stored per device.
---
[...]
Post by Eric Auger
+
+static int virtio_iommu_fill_property(int devid, int type,
+ viommu_property_buffer *bufstate)
+{
+ int ret = -ENOSPC;
+
+ if (bufstate->filled + 4 >= VIOMMU_PROBE_SIZE) {
+ bufstate->error = true;
+ goto out;
+ }
+
+ switch (type) {
+ ret = virtio_iommu_fill_none_prop(bufstate);
+ break;
+ {
+ viommu_dev *dev = bufstate->dev;
+
+ g_tree_foreach(dev->reserved_regions,
+ virtio_iommu_fill_resv_mem_prop,
+ bufstate);
+ if (!bufstate->error) {
+ ret = 0;
+ }
+ break;
+ }
+ ret = -ENOENT;
+ break;
+ }
+ if (ret) {
+ error_report("%s property of type=%d could not be filled (%d),"
+ " remaining size = 0x%lx",
+ __func__, type, ret, bufstate->filled);
+ }
+ return ret;
+}
+
+static int virtio_iommu_probe(VirtIOIOMMU *s,
+ struct virtio_iommu_req_probe *req,
+ uint8_t *buf)
+{
+ uint32_t devid = le32_to_cpu(req->device);
+ int16_t prop_types = SUPPORTED_PROBE_PROPERTIES, type;
+ viommu_property_buffer bufstate;
+ viommu_dev *dev;
+ int ret;
+
+ dev = g_tree_lookup(s->devices, GUINT_TO_POINTER(devid));
+ if (!dev) {
+ return -EINVAL;
+ }
+
+ bufstate.start = buf;
+ bufstate.filled = 0;
+ bufstate.dev = dev;
bufstate.error is not initialized which may cause false alarm in
virtio_iommu_fill_property()
Post by Eric Auger
+
+ while ((type = ctz32(prop_types)) != 32) {
+ ret = virtio_iommu_fill_property(devid, 1 << type, &bufstate);
+ if (ret) {
+ break;
+ }
+ prop_types &= ~(1 << type);
+ }
+ virtio_iommu_fill_property(devid, VIRTIO_IOMMU_PROBE_T_NONE, &bufstate);
+
+ return VIRTIO_IOMMU_S_OK;
+}
+
#define get_payload_size(req) (\
sizeof((req)) - sizeof(struct virtio_iommu_req_tail))
@@ -433,6 +567,24 @@ static int virtio_iommu_handle_unmap(VirtIOIOMMU *s,
return virtio_iommu_unmap(s, &req);
}
Thanks,
Tomasz
Bharat Bhushan
2017-09-27 11:00:52 UTC
Permalink
-----Original Message-----
Sent: Wednesday, September 27, 2017 4:23 PM
Subject: Re: [RFC v4 10/16] virtio-iommu: Implement probe request
Hi Eric,
This patch implements the PROBE request. At the moment, no reserved
regions are returned.
At the moment reserved regions are stored per device.
---
[...]
+
+static int virtio_iommu_fill_property(int devid, int type,
+ viommu_property_buffer
+*bufstate) {
+ int ret = -ENOSPC;
+
+ if (bufstate->filled + 4 >= VIOMMU_PROBE_SIZE) {
+ bufstate->error = true;
+ goto out;
+ }
+
+ switch (type) {
+ ret = virtio_iommu_fill_none_prop(bufstate);
+ break;
+ {
+ viommu_dev *dev = bufstate->dev;
+
+ g_tree_foreach(dev->reserved_regions,
+ virtio_iommu_fill_resv_mem_prop,
+ bufstate);
+ if (!bufstate->error) {
+ ret = 0;
+ }
+ break;
+ }
+ ret = -ENOENT;
+ break;
+ }
+ if (ret) {
+ error_report("%s property of type=%d could not be filled (%d),"
+ " remaining size = 0x%lx",
+ __func__, type, ret, bufstate->filled);
+ }
+ return ret;
+}
+
+static int virtio_iommu_probe(VirtIOIOMMU *s,
+ struct virtio_iommu_req_probe *req,
+ uint8_t *buf) {
+ uint32_t devid = le32_to_cpu(req->device);
+ int16_t prop_types = SUPPORTED_PROBE_PROPERTIES, type;
+ viommu_property_buffer bufstate;
+ viommu_dev *dev;
+ int ret;
+
+ dev = g_tree_lookup(s->devices, GUINT_TO_POINTER(devid));
+ if (!dev) {
+ return -EINVAL;
+ }
+
+ bufstate.start = buf;
+ bufstate.filled = 0;
+ bufstate.dev = dev;
bufstate.error is not initialized which may cause false alarm in
virtio_iommu_fill_property()
I observed below prints
"qemu-system-aarch64: virtio_iommu_fill_property property of type=2 could not be filled (-28), remaining size = 0x0 "

When I initialized the bufstate.error = 0,it goes.

Thanks
-Bharat
+
+ while ((type = ctz32(prop_types)) != 32) {
+ ret = virtio_iommu_fill_property(devid, 1 << type, &bufstate);
+ if (ret) {
+ break;
+ }
+ prop_types &= ~(1 << type);
+ }
+ virtio_iommu_fill_property(devid, VIRTIO_IOMMU_PROBE_T_NONE,
+ &bufstate);
+
+ return VIRTIO_IOMMU_S_OK;
+}
+
#define get_payload_size(req) (\
sizeof((req)) - sizeof(struct virtio_iommu_req_tail))
@@ -433,6 +567,24 @@ static int
virtio_iommu_handle_unmap(VirtIOIOMMU *s,
return virtio_iommu_unmap(s, &req);
}
Th
Auger Eric
2017-09-27 15:44:13 UTC
Permalink
Hi Bharat,
Post by Bharat Bhushan
-----Original Message-----
Sent: Wednesday, September 27, 2017 4:23 PM
Subject: Re: [RFC v4 10/16] virtio-iommu: Implement probe request
Hi Eric,
This patch implements the PROBE request. At the moment, no reserved
regions are returned.
At the moment reserved regions are stored per device.
---
[...]
+
+static int virtio_iommu_fill_property(int devid, int type,
+ viommu_property_buffer
+*bufstate) {
+ int ret = -ENOSPC;
+
+ if (bufstate->filled + 4 >= VIOMMU_PROBE_SIZE) {
+ bufstate->error = true;
+ goto out;
+ }
+
+ switch (type) {
+ ret = virtio_iommu_fill_none_prop(bufstate);
+ break;
+ {
+ viommu_dev *dev = bufstate->dev;
+
+ g_tree_foreach(dev->reserved_regions,
+ virtio_iommu_fill_resv_mem_prop,
+ bufstate);
+ if (!bufstate->error) {
+ ret = 0;
+ }
+ break;
+ }
+ ret = -ENOENT;
+ break;
+ }
+ if (ret) {
+ error_report("%s property of type=%d could not be filled (%d),"
+ " remaining size = 0x%lx",
+ __func__, type, ret, bufstate->filled);
+ }
+ return ret;
+}
+
+static int virtio_iommu_probe(VirtIOIOMMU *s,
+ struct virtio_iommu_req_probe *req,
+ uint8_t *buf) {
+ uint32_t devid = le32_to_cpu(req->device);
+ int16_t prop_types = SUPPORTED_PROBE_PROPERTIES, type;
+ viommu_property_buffer bufstate;
+ viommu_dev *dev;
+ int ret;
+
+ dev = g_tree_lookup(s->devices, GUINT_TO_POINTER(devid));
+ if (!dev) {
+ return -EINVAL;
+ }
+
+ bufstate.start = buf;
+ bufstate.filled = 0;
+ bufstate.dev = dev;
bufstate.error is not initialized which may cause false alarm in
virtio_iommu_fill_property()
I observed below prints
"qemu-system-aarch64: virtio_iommu_fill_property property of type=2 could not be filled (-28), remaining size = 0x0 "
When I initialized the bufstate.error = 0,it goes.
Sure, I will fix that soon.

Best Regards

Eric
Post by Bharat Bhushan
Thanks
-Bharat
+
+ while ((type = ctz32(prop_types)) != 32) {
+ ret = virtio_iommu_fill_property(devid, 1 << type, &bufstate);
+ if (ret) {
+ break;
+ }
+ prop_types &= ~(1 << type);
+ }
+ virtio_iommu_fill_property(devid, VIRTIO_IOMMU_PROBE_T_NONE,
+ &bufstate);
+
+ return VIRTIO_IOMMU_S_OK;
+}
+
#define get_payload_size(req) (\
sizeof((req)) - sizeof(struct virtio_iommu_req_tail))
@@ -433,6 +567,24 @@ static int
virtio_iommu_handle_unmap(VirtIOIOMMU *s,
return virtio_iommu_unmap(s, &req);
}
Thanks,
Tomasz
Auger Eric
2017-09-27 15:40:43 UTC
Permalink
Hi Tomasz,
Post by Tomasz Nowicki
Hi Eric,
Post by Eric Auger
This patch implements the PROBE request. At the moment,
no reserved regions are returned.
At the moment reserved regions are stored per device.
---
[...]
Post by Eric Auger
+
+static int virtio_iommu_fill_property(int devid, int type,
+ viommu_property_buffer *bufstate)
+{
+ int ret = -ENOSPC;
+
+ if (bufstate->filled + 4 >= VIOMMU_PROBE_SIZE) {
+ bufstate->error = true;
+ goto out;
+ }
+
+ switch (type) {
+ ret = virtio_iommu_fill_none_prop(bufstate);
+ break;
+ {
+ viommu_dev *dev = bufstate->dev;
+
+ g_tree_foreach(dev->reserved_regions,
+ virtio_iommu_fill_resv_mem_prop,
+ bufstate);
+ if (!bufstate->error) {
+ ret = 0;
+ }
+ break;
+ }
+ ret = -ENOENT;
+ break;
+ }
+ if (ret) {
+ error_report("%s property of type=%d could not be filled (%d),"
+ " remaining size = 0x%lx",
+ __func__, type, ret, bufstate->filled);
+ }
+ return ret;
+}
+
+static int virtio_iommu_probe(VirtIOIOMMU *s,
+ struct virtio_iommu_req_probe *req,
+ uint8_t *buf)
+{
+ uint32_t devid = le32_to_cpu(req->device);
+ int16_t prop_types = SUPPORTED_PROBE_PROPERTIES, type;
+ viommu_property_buffer bufstate;
+ viommu_dev *dev;
+ int ret;
+
+ dev = g_tree_lookup(s->devices, GUINT_TO_POINTER(devid));
+ if (!dev) {
+ return -EINVAL;
+ }
+
+ bufstate.start = buf;
+ bufstate.filled = 0;
+ bufstate.dev = dev;
bufstate.error is not initialized which may cause false alarm in
virtio_iommu_fill_property()
thanks for spotting that. I owe you several fixes in both vsmmuv3 and
virtio-iommu. Thank you for testing, again!

Best Regards

Eric
Post by Tomasz Nowicki
Post by Eric Auger
+
+ while ((type = ctz32(prop_types)) != 32) {
+ ret = virtio_iommu_fill_property(devid, 1 << type, &bufstate);
+ if (ret) {
+ break;
+ }
+ prop_types &= ~(1 << type);
+ }
+ virtio_iommu_fill_property(devid, VIRTIO_IOMMU_PROBE_T_NONE, &bufstate);
+
+ return VIRTIO_IOMMU_S_OK;
+}
+
#define get_payload_size(req) (\
sizeof((req)) - sizeof(struct virtio_iommu_req_tail))
@@ -433,6 +567,24 @@ static int
virtio_iommu_handle_unmap(VirtIOIOMMU *s,
return virtio_iommu_unmap(s, &req);
}
Thanks,
Tomasz
Eric Auger
2017-09-19 07:46:46 UTC
Permalink
We store the page_size_mask in the container and on
vfio_listener_region_add(), the information is conveyed
to the IOMMUMemoryRegion.

Signed-off-by: Eric Auger <***@redhat.com>
---
hw/vfio/common.c | 5 +++++
include/hw/vfio/vfio-common.h | 1 +
2 files changed, 6 insertions(+)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 7b2924c..468a259 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -480,6 +480,7 @@ static void vfio_listener_region_add(MemoryListener *listener,
if (memory_region_is_iommu(section->mr)) {
VFIOGuestIOMMU *giommu;
IOMMUMemoryRegion *iommu_mr = IOMMU_MEMORY_REGION(section->mr);
+ IOMMUMemoryRegionClass *imrc = IOMMU_MEMORY_REGION_GET_CLASS(iommu_mr);

trace_vfio_listener_region_add_iommu(iova, end);
/*
@@ -500,6 +501,9 @@ static void vfio_listener_region_add(MemoryListener *listener,
IOMMU_NOTIFIER_ALL,
section->offset_within_region,
int128_get64(llend));
+ if (imrc->set_page_size_mask) {
+ imrc->set_page_size_mask(iommu_mr, container->page_size_mask);
+ }
QLIST_INSERT_HEAD(&container->giommu_list, giommu, giommu_next);

memory_region_register_iommu_notifier(section->mr, &giommu->n);
@@ -1024,6 +1028,7 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
/* Assume 4k IOVA page size */
info.iova_pgsizes = 4096;
}
+ container->page_size_mask = info.iova_pgsizes;
vfio_host_win_add(container, 0, (hwaddr)-1, info.iova_pgsizes);
} else if (ioctl(fd, VFIO_CHECK_EXTENSION, VFIO_SPAPR_TCE_IOMMU) ||
ioctl(fd, VFIO_CHECK_EXTENSION, VFIO_SPAPR_TCE_v2_IOMMU)) {
diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index f3a2ac9..221cc30 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -79,6 +79,7 @@ typedef struct VFIOContainer {
int fd; /* /dev/vfio/vfio, empowered by the attached groups */
MemoryListener listener;
MemoryListener prereg_listener;
+ uint64_t page_size_mask; /* page sizes supported for this container */
unsigned iommu_type;
int error;
bool initialized;
--
2.5.5
Eric Auger
2017-09-19 07:46:45 UTC
Permalink
This callback allows to inform the IOMMU memory region about
restrictions on the supported page sizes.

Signed-off-by: Eric Auger <***@redhat.com>
---
include/exec/memory.h | 4 ++++
1 file changed, 4 insertions(+)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index 1dcd312..0c91dca 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -204,6 +204,10 @@ typedef struct IOMMUMemoryRegionClass {
IOMMUAccessFlags flag);
/* Returns minimum supported page size */
uint64_t (*get_min_page_size)(IOMMUMemoryRegion *iommu);
+
+ /* Limits the supported page sizes to @pgsizes */
+ void (*set_page_size_mask)(IOMMUMemoryRegion *iommu, uint64_t pgsizes);
+
/* Called when IOMMU Notifier flag changed */
void (*notify_flag_changed)(IOMMUMemoryRegion *iommu,
IOMMUNotifierFlag old_flags,
--
2.5.5
Eric Auger
2017-09-19 07:46:44 UTC
Permalink
The specific virtio-mmio node is inconditionally added on
machine init while the binding between this latter and the
PCIe host bridge is done on machine init done notifier, only
if -device virtio-iommu-device was added to the qemu command
line.

Signed-off-by: Eric Auger <***@redhat.com>

---
---
hw/arm/virt.c | 92 +++++++++++++++++++++++++++++++++++++++++++++++----
include/hw/arm/virt.h | 4 +++
2 files changed, 89 insertions(+), 7 deletions(-)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 9737a5f..f057d7a 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -52,6 +52,7 @@
#include "hw/arm/fdt.h"
#include "hw/intc/arm_gic.h"
#include "hw/intc/arm_gicv3_common.h"
+#include "hw/virtio/virtio-iommu.h"
#include "kvm_arm.h"
#include "hw/smbios/smbios.h"
#include "qapi/visitor.h"
@@ -139,6 +140,7 @@ static const MemMapEntry a15memmap[] = {
[VIRT_FW_CFG] = { 0x09020000, 0x00000018 },
[VIRT_GPIO] = { 0x09030000, 0x00001000 },
[VIRT_SECURE_UART] = { 0x09040000, 0x00001000 },
+ [VIRT_IOMMU] = { 0x09050000, 0x00000200 },
[VIRT_MMIO] = { 0x0a000000, 0x00000200 },
/* ...repeating for a total of NUM_VIRTIO_TRANSPORTS, each of that size */
[VIRT_PLATFORM_BUS] = { 0x0c000000, 0x02000000 },
@@ -159,6 +161,7 @@ static const int a15irqmap[] = {
[VIRT_SECURE_UART] = 8,
[VIRT_MMIO] = 16, /* ...to 16 + NUM_VIRTIO_TRANSPORTS - 1 */
[VIRT_GIC_V2M] = 48, /* ...to 48 + NUM_GICV2M_SPIS - 1 */
+ [VIRT_IOMMU] = 74,
[VIRT_PLATFORM_BUS] = 112, /* ...to 112 + PLATFORM_BUS_NUM_IRQS -1 */
};

@@ -999,7 +1002,81 @@ static void create_pcie_irq_map(const VirtMachineState *vms,
0x7 /* PCI irq */);
}

-static void create_pcie(const VirtMachineState *vms, qemu_irq *pic)
+static int bind_virtio_iommu_device(Object *obj, void *opaque)
+{
+ VirtMachineState *vms = (VirtMachineState *)opaque;
+ struct arm_boot_info *info = &vms->bootinfo;
+ int dtb_size;
+ void *fdt = info->get_dtb(info, &dtb_size);
+ Object *dev;
+
+ dev = object_dynamic_cast(obj, TYPE_VIRTIO_IOMMU);
+
+ if (!dev) {
+ /* Container, traverse it for children */
+ return object_child_foreach(obj, bind_virtio_iommu_device, opaque);
+ }
+
+ qemu_fdt_setprop_cells(fdt, vms->pcie_host_nodename, "iommu-map",
+ 0x0, vms->iommu_phandle, 0x0, 0x10000);
+
+ return true;
+}
+
+static
+void virtio_iommu_notifier(Notifier *notifier, void *data)
+{
+ VirtMachineState *vms = container_of(notifier, VirtMachineState,
+ virtio_iommu_done);
+ VirtMachineClass *vmc = VIRT_MACHINE_GET_CLASS(vms);
+ Object *container;
+
+
+ if (vmc->no_iommu) {
+ return;
+ }
+
+ container = container_get(qdev_get_machine(), "/peripheral");
+ bind_virtio_iommu_device(container, vms);
+ container = container_get(qdev_get_machine(), "/peripheral-anon");
+ bind_virtio_iommu_device(container, vms);
+}
+
+static void create_virtio_iommu(VirtMachineState *vms, qemu_irq *pic)
+{
+ char *smmu;
+ const char compat[] = "virtio,mmio";
+ int irq = vms->irqmap[VIRT_IOMMU];
+ hwaddr base = vms->memmap[VIRT_IOMMU].base;
+ hwaddr size = vms->memmap[VIRT_IOMMU].size;
+ VirtMachineClass *vmc = VIRT_MACHINE_GET_CLASS(vms);
+
+ if (vmc->no_iommu) {
+ return;
+ }
+
+ vms->iommu_phandle = qemu_fdt_alloc_phandle(vms->fdt);
+
+ sysbus_create_simple("virtio-mmio", base, pic[irq]);
+
+ smmu = g_strdup_printf("/virtio_mmio@%" PRIx64, base);
+ qemu_fdt_add_subnode(vms->fdt, smmu);
+ qemu_fdt_setprop(vms->fdt, smmu, "compatible", compat, sizeof(compat));
+ qemu_fdt_setprop_sized_cells(vms->fdt, smmu, "reg", 2, base, 2, size);
+
+ qemu_fdt_setprop_cells(vms->fdt, smmu, "interrupts",
+ GIC_FDT_IRQ_TYPE_SPI, irq, GIC_FDT_IRQ_FLAGS_EDGE_LO_HI);
+
+ qemu_fdt_setprop(vms->fdt, smmu, "dma-coherent", NULL, 0);
+ qemu_fdt_setprop_cell(vms->fdt, smmu, "#iommu-cells", 1);
+ qemu_fdt_setprop_cell(vms->fdt, smmu, "phandle", vms->iommu_phandle);
+ g_free(smmu);
+
+ vms->virtio_iommu_done.notify = virtio_iommu_notifier;
+ qemu_add_machine_init_done_notifier(&vms->virtio_iommu_done);
+}
+
+static void create_pcie(VirtMachineState *vms, qemu_irq *pic)
{
hwaddr base_mmio = vms->memmap[VIRT_PCIE_MMIO].base;
hwaddr size_mmio = vms->memmap[VIRT_PCIE_MMIO].size;
@@ -1073,7 +1150,8 @@ static void create_pcie(const VirtMachineState *vms, qemu_irq *pic)
}
}

- nodename = g_strdup_printf("/pcie@%" PRIx64, base);
+ vms->pcie_host_nodename = g_strdup_printf("/pcie@%" PRIx64, base);
+ nodename = vms->pcie_host_nodename;
qemu_fdt_add_subnode(vms->fdt, nodename);
qemu_fdt_setprop_string(vms->fdt, nodename,
"compatible", "pci-host-ecam-generic");
@@ -1112,7 +1190,6 @@ static void create_pcie(const VirtMachineState *vms, qemu_irq *pic)
qemu_fdt_setprop_cell(vms->fdt, nodename, "#interrupt-cells", 1);
create_pcie_irq_map(vms, vms->gic_phandle, irq, nodename);

- g_free(nodename);
}

static void create_platform_bus(VirtMachineState *vms, qemu_irq *pic)
@@ -1457,16 +1534,16 @@ static void machvirt_init(MachineState *machine)

create_rtc(vms, pic);

- create_pcie(vms, pic);
-
- create_gpio(vms, pic);
-
/* Create mmio transports, so the user can create virtio backends
* (which will be automatically plugged in to the transports). If
* no backend is created the transport will just sit harmlessly idle.
*/
create_virtio_devices(vms, pic);

+ create_pcie(vms, pic);
+
+ create_gpio(vms, pic);
+
vms->fw_cfg = create_fw_cfg(vms, &address_space_memory);
rom_set_fw(vms->fw_cfg);

@@ -1491,6 +1568,7 @@ static void machvirt_init(MachineState *machine)
* Notifiers are executed in registration reverse order.
*/
create_platform_bus(vms, pic);
+ create_virtio_iommu(vms, pic);
}

static bool virt_get_secure(Object *obj, Error **errp)
diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h
index ff27551..e00fa65 100644
--- a/include/hw/arm/virt.h
+++ b/include/hw/arm/virt.h
@@ -59,6 +59,7 @@ enum {
VIRT_GIC_V2M,
VIRT_GIC_ITS,
VIRT_GIC_REDIST,
+ VIRT_IOMMU,
VIRT_UART,
VIRT_MMIO,
VIRT_RTC,
@@ -91,6 +92,7 @@ typedef struct {
typedef struct {
MachineState parent;
Notifier machine_done;
+ Notifier virtio_iommu_done;
FWCfgState *fw_cfg;
bool secure;
bool highmem;
@@ -106,6 +108,8 @@ typedef struct {
uint32_t clock_phandle;
uint32_t gic_phandle;
uint32_t msi_phandle;
+ uint32_t iommu_phandle;
+ char *pcie_host_nodename;
int psci_conduit;
} VirtMachineState;
--
2.5.5
Eric Auger
2017-09-19 07:46:48 UTC
Permalink
On ARM, the MSI doorbell is translated by the virtual IOMMU.
As such address_space_translate() returns the MSI controller
MMIO region and we get an "iommu map to non memory area"
message. Let's remove this latter.

Signed-off-by: Eric Auger <***@redhat.com>
---
hw/vfio/common.c | 2 --
1 file changed, 2 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 468a259..fb6f397 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -326,8 +326,6 @@ static bool vfio_get_vaddr(IOMMUTLBEntry *iotlb, void **vaddr,
iotlb->translated_addr,
&xlat, &len, writable);
if (!memory_region_is_ram(mr)) {
- error_report("iommu map to non memory area %"HWADDR_PRIx"",
- xlat);
return false;
}
--
2.5.5
Eric Auger
2017-09-19 07:46:47 UTC
Permalink
We implement the set_page_size_mask callback to allow the
virtio-iommu to be aware of any restrictions on the page size
mask due to an underlying HW IOMMU.

Signed-off-by: Eric Auger <***@redhat.com>
---
hw/virtio/trace-events | 1 +
hw/virtio/virtio-iommu.c | 16 ++++++++++++++++
2 files changed, 17 insertions(+)

diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events
index 9ccfad1..2793604 100644
--- a/hw/virtio/trace-events
+++ b/hw/virtio/trace-events
@@ -48,3 +48,4 @@ virtio_iommu_unmap_inc_interval(uint64_t low, uint64_t high) "Unmap inc [0x%"PRI
virtio_iommu_translate_out(uint64_t virt_addr, uint64_t phys_addr, uint32_t sid) "0x%"PRIx64" -> 0x%"PRIx64 " for sid=%d"
virtio_iommu_fill_resv_property(uint32_t devid, uint8_t subtype, uint64_t addr, uint64_t size, uint32_t flags, size_t filled) "dev= %d, subtype=%d addr=0x%"PRIx64" size=0x%"PRIx64" flags=%d filled=0x%lx"
virtio_iommu_fill_none_property(uint32_t devid) "devid=%d"
+virtio_iommu_set_page_size_mask(const char *iommu_mr, uint64_t mask) "mr=%s page_size_mask=0x%"PRIx64
diff --git a/hw/virtio/virtio-iommu.c b/hw/virtio/virtio-iommu.c
index 281b0f8..1873b9a 100644
--- a/hw/virtio/virtio-iommu.c
+++ b/hw/virtio/virtio-iommu.c
@@ -714,6 +714,21 @@ unlock:
return entry;
}

+static void virtio_iommu_set_page_size_mask(IOMMUMemoryRegion *mr,
+ uint64_t page_size_mask)
+{
+ IOMMUDevice *sdev = container_of(mr, IOMMUDevice, iommu_mr);
+ VirtIOIOMMU *s = sdev->viommu;
+
+ s->config.page_size_mask &= page_size_mask;
+ if (!s->config.page_size_mask) {
+ error_setg(&error_fatal,
+ "No compatible page size between guest and host iommus");
+ }
+
+ trace_virtio_iommu_set_page_size_mask(mr->parent_obj.name, page_size_mask);
+}
+
static void virtio_iommu_get_config(VirtIODevice *vdev, uint8_t *config_data)
{
VirtIOIOMMU *dev = VIRTIO_IOMMU(vdev);
@@ -881,6 +896,7 @@ static void virtio_iommu_memory_region_class_init(ObjectClass *klass,
IOMMUMemoryRegionClass *imrc = IOMMU_MEMORY_REGION_CLASS(klass);

imrc->translate = virtio_iommu_translate;
+ imrc->set_page_size_mask = virtio_iommu_set_page_size_mask;
}

static const TypeInfo virtio_iommu_info = {
--
2.5.5
Tomasz Nowicki
2017-09-27 11:07:20 UTC
Permalink
Hi Eric,

Out of curiosity, I compared your SMMUv3 full emulation against
virtio-iommu. For virtio-net device behind the virtio-iommu I get 50%
performance of what I can get for SMMUv3 emulation.

Do you have similar observations? Since there is no need to emulate HW
behaviour in QEMU I expected virtio-iommu to be faster. I will run some
more benchmarks.

Thanks,
Tomasz
Post by Eric Auger
This series implements the virtio-iommu device.
This v4 is an upgrade to v0.4 spec [1] and applies on QEMU v2.10.0.
- probe request support although no reserved region is returned at
the moment
- unmap semantics less strict, as specified in v0.4
- device registration, attach/detach revisited
- split into smaller patches to ease review
- propose a way to inform the IOMMU mr about the page_size_mask
of underlying HW IOMMU, if any
- remove warning associated with the translation of the MSI doorbell
The device gets instantiated using the "-device virtio-iommu-device"
option. It currently works with ARM virt machine only, as the machine
must handle the dt binding between the virtio-mmio "iommu" node and
the PCI host bridge node.
The associated VHOST/VFIO adaptation is available on the branch
below but is not officially delivered as part of this series.
Best Regards
Eric
https://github.com/eauger/qemu/tree/v2.10.0-virtio-iommu-v4
[1] [RFC] virtio-iommu version 0.4
git://linux-arm.org/virtio-iommu.git branch viommu/v0.4
- guest kernel with v0.4 virtio-iommu driver (4kB page)
- tested with guest using virtio-pci-net and vhost-net
(,vhost=on/off,iommu_platform,disable-modern=off,disable-legacy=on )
- tested with VFIO and guest assigned with 2 VFs
- tested with DPDK on guest with 2 assigned VFs
- hot-plug/hot-unplug of EP: not implemented
- see above
- rebase on top of 2.10-rc0 and especially
[PATCH qemu v9 0/2] memory/iommu: QOM'fy IOMMU MemoryRegion
- add mutex init
- fix as->mappings deletion using g_tree_ref/unref
- when a dev is attached whereas it is already attached to
another address space, first detach it
- fix some error values
- page_sizes = TARGET_PAGE_MASK;
- I haven't changed the unmap() semantics yet, waiting for the
next virtio-iommu spec revision.
- fix redifinition of viommu_as typedef
update-linux-headers: import virtio_iommu.h
linux-headers: Update for virtio-iommu
virtio-iommu: add skeleton
virtio-iommu: Decode the command payload
virtio-iommu: Add the iommu regions
virtio-iommu: Register attached devices
virtio-iommu: Implement attach/detach command
virtio-iommu: Implement map/unmap
virtio-iommu: Implement translate
virtio-iommu: Implement probe request
hw/arm/virt: Add 2.11 machine type
hw/arm/virt: Add virtio-iommu to the virt board
memory.h: Add set_page_size_mask IOMMUMemoryRegion callback
hw/vfio/common: Set the IOMMUMemoryRegion supported page sizes
virtio-iommu: Implement set_page_size_mask
hw/vfio/common: Do not print error when viommu translates into an mmio
region
hw/arm/virt.c | 115 +++-
hw/vfio/common.c | 7 +-
hw/virtio/Makefile.objs | 1 +
hw/virtio/trace-events | 24 +
hw/virtio/virtio-iommu.c | 923 ++++++++++++++++++++++++++
include/exec/memory.h | 4 +
include/hw/arm/virt.h | 5 +
include/hw/vfio/vfio-common.h | 1 +
include/hw/virtio/virtio-iommu.h | 61 ++
include/standard-headers/linux/virtio_ids.h | 1 +
include/standard-headers/linux/virtio_iommu.h | 177 +++++
linux-headers/linux/virtio_iommu.h | 1 +
scripts/update-linux-headers.sh | 3 +
13 files changed, 1312 insertions(+), 11 deletions(-)
create mode 100644 hw/virtio/virtio-iommu.c
create mode 100644 include/hw/virtio/virtio-iommu.h
create mode 100644 include/standard-headers/linux/virtio_iommu.h
create mode 100644 linux-headers/linux/virtio_iommu.h
Auger Eric
2017-09-27 15:38:47 UTC
Permalink
Hi Tomasz
Post by Tomasz Nowicki
Hi Eric,
Out of curiosity, I compared your SMMUv3 full emulation against
virtio-iommu. For virtio-net device behind the virtio-iommu I get 50%
performance of what I can get for SMMUv3 emulation.
Do you have similar observations? Since there is no need to emulate HW
behaviour in QEMU I expected virtio-iommu to be faster. I will run some
more benchmarks.
No I don't have formal comparative figures at the moment. Benchmarking
is also on the top of my todo list though.

One reason could be that virtio-iommu sends requests for each map
operations whereas this does not happen for vsmmuv3 (in non caching
mode). You could try to turn the vsmmuv3 caching-mode option and I guess
you should have more similar perf figures between virtio-iommu and
vsmuv3.

Thanks

Eric
Post by Tomasz Nowicki
Thanks,
Tomasz
Post by Eric Auger
This series implements the virtio-iommu device.
This v4 is an upgrade to v0.4 spec [1] and applies on QEMU v2.10.0.
- probe request support although no reserved region is returned at
the moment
- unmap semantics less strict, as specified in v0.4
- device registration, attach/detach revisited
- split into smaller patches to ease review
- propose a way to inform the IOMMU mr about the page_size_mask
of underlying HW IOMMU, if any
- remove warning associated with the translation of the MSI doorbell
The device gets instantiated using the "-device virtio-iommu-device"
option. It currently works with ARM virt machine only, as the machine
must handle the dt binding between the virtio-mmio "iommu" node and
the PCI host bridge node.
The associated VHOST/VFIO adaptation is available on the branch
below but is not officially delivered as part of this series.
Best Regards
Eric
https://github.com/eauger/qemu/tree/v2.10.0-virtio-iommu-v4
[1] [RFC] virtio-iommu version 0.4
git://linux-arm.org/virtio-iommu.git branch viommu/v0.4
- guest kernel with v0.4 virtio-iommu driver (4kB page)
- tested with guest using virtio-pci-net and vhost-net
(,vhost=on/off,iommu_platform,disable-modern=off,disable-legacy=on )
- tested with VFIO and guest assigned with 2 VFs
- tested with DPDK on guest with 2 assigned VFs
- hot-plug/hot-unplug of EP: not implemented
- see above
- rebase on top of 2.10-rc0 and especially
[PATCH qemu v9 0/2] memory/iommu: QOM'fy IOMMU MemoryRegion
- add mutex init
- fix as->mappings deletion using g_tree_ref/unref
- when a dev is attached whereas it is already attached to
another address space, first detach it
- fix some error values
- page_sizes = TARGET_PAGE_MASK;
- I haven't changed the unmap() semantics yet, waiting for the
next virtio-iommu spec revision.
- fix redifinition of viommu_as typedef
update-linux-headers: import virtio_iommu.h
linux-headers: Update for virtio-iommu
virtio-iommu: add skeleton
virtio-iommu: Decode the command payload
virtio-iommu: Add the iommu regions
virtio-iommu: Register attached devices
virtio-iommu: Implement attach/detach command
virtio-iommu: Implement map/unmap
virtio-iommu: Implement translate
virtio-iommu: Implement probe request
hw/arm/virt: Add 2.11 machine type
hw/arm/virt: Add virtio-iommu to the virt board
memory.h: Add set_page_size_mask IOMMUMemoryRegion callback
hw/vfio/common: Set the IOMMUMemoryRegion supported page sizes
virtio-iommu: Implement set_page_size_mask
hw/vfio/common: Do not print error when viommu translates into an mmio
region
hw/arm/virt.c | 115 +++-
hw/vfio/common.c | 7 +-
hw/virtio/Makefile.objs | 1 +
hw/virtio/trace-events | 24 +
hw/virtio/virtio-iommu.c | 923
++++++++++++++++++++++++++
include/exec/memory.h | 4 +
include/hw/arm/virt.h | 5 +
include/hw/vfio/vfio-common.h | 1 +
include/hw/virtio/virtio-iommu.h | 61 ++
include/standard-headers/linux/virtio_ids.h | 1 +
include/standard-headers/linux/virtio_iommu.h | 177 +++++
linux-headers/linux/virtio_iommu.h | 1 +
scripts/update-linux-headers.sh | 3 +
13 files changed, 1312 insertions(+), 11 deletions(-)
create mode 100644 hw/virtio/virtio-iommu.c
create mode 100644 include/hw/virtio/virtio-iommu.h
create mode 100644 include/standard-headers/linux/virtio_iommu.h
create mode 100644 linux-headers/linux/virtio_iommu.h
Peter Maydell
2017-10-11 14:56:59 UTC
Permalink
Post by Eric Auger
This series implements the virtio-iommu device.
This v4 is an upgrade to v0.4 spec [1] and applies on QEMU v2.10.0.
- probe request support although no reserved region is returned at
the moment
- unmap semantics less strict, as specified in v0.4
- device registration, attach/detach revisited
- split into smaller patches to ease review
- propose a way to inform the IOMMU mr about the page_size_mask
of underlying HW IOMMU, if any
- remove warning associated with the translation of the MSI doorbell
The device gets instantiated using the "-device virtio-iommu-device"
option. It currently works with ARM virt machine only, as the machine
must handle the dt binding between the virtio-mmio "iommu" node and
the PCI host bridge node.
Could this work on x86, or is it inherently arm-only?

thanks
-- PMM
Auger Eric
2017-10-11 16:08:58 UTC
Permalink
Hi Peter,
Post by Peter Maydell
Post by Eric Auger
This series implements the virtio-iommu device.
This v4 is an upgrade to v0.4 spec [1] and applies on QEMU v2.10.0.
- probe request support although no reserved region is returned at
the moment
- unmap semantics less strict, as specified in v0.4
- device registration, attach/detach revisited
- split into smaller patches to ease review
- propose a way to inform the IOMMU mr about the page_size_mask
of underlying HW IOMMU, if any
- remove warning associated with the translation of the MSI doorbell
The device gets instantiated using the "-device virtio-iommu-device"
option. It currently works with ARM virt machine only, as the machine
must handle the dt binding between the virtio-mmio "iommu" node and
the PCI host bridge node.
Could this work on x86, or is it inherently arm-only?
Yes this is the goal. At the moment the ACPI probing is not yet properly
specified but a Q35 prototype was developed in the Red Hat Virt team.
This will be presented at the KVM forum.

Thanks

Eric
Post by Peter Maydell
thanks
-- PMM
Peter Maydell
2017-10-12 09:54:25 UTC
Permalink
Post by Auger Eric
Hi Peter,
Post by Peter Maydell
Post by Eric Auger
This series implements the virtio-iommu device.
This v4 is an upgrade to v0.4 spec [1] and applies on QEMU v2.10.0.
- probe request support although no reserved region is returned at
the moment
- unmap semantics less strict, as specified in v0.4
- device registration, attach/detach revisited
- split into smaller patches to ease review
- propose a way to inform the IOMMU mr about the page_size_mask
of underlying HW IOMMU, if any
- remove warning associated with the translation of the MSI doorbell
The device gets instantiated using the "-device virtio-iommu-device"
option. It currently works with ARM virt machine only, as the machine
must handle the dt binding between the virtio-mmio "iommu" node and
the PCI host bridge node.
Could this work on x86, or is it inherently arm-only?
Yes this is the goal. At the moment the ACPI probing is not yet properly
specified but a Q35 prototype was developed in the Red Hat Virt team.
This will be presented at the KVM forum.
Since I have very little familiarity with virtio or iommu code,
I'd be much happier if this was reviewed as a generic virtio-iommu
by the x86/virtio devs and then the arm specific parts done second...

I'm also not clear on what we're expecting the recommended or normal
way to do device passthrough is going to be -- this virtio-mmio,
or presenting the guest with an SMMUv3 interface? Do we really
need to implement both ?

thanks
-- PMM
Auger Eric
2017-10-12 10:09:40 UTC
Permalink
Hi Peter,
Post by Peter Maydell
Post by Auger Eric
Hi Peter,
Post by Peter Maydell
Post by Eric Auger
This series implements the virtio-iommu device.
This v4 is an upgrade to v0.4 spec [1] and applies on QEMU v2.10.0.
- probe request support although no reserved region is returned at
the moment
- unmap semantics less strict, as specified in v0.4
- device registration, attach/detach revisited
- split into smaller patches to ease review
- propose a way to inform the IOMMU mr about the page_size_mask
of underlying HW IOMMU, if any
- remove warning associated with the translation of the MSI doorbell
The device gets instantiated using the "-device virtio-iommu-device"
option. It currently works with ARM virt machine only, as the machine
must handle the dt binding between the virtio-mmio "iommu" node and
the PCI host bridge node.
Could this work on x86, or is it inherently arm-only?
Yes this is the goal. At the moment the ACPI probing is not yet properly
specified but a Q35 prototype was developed in the Red Hat Virt team.
This will be presented at the KVM forum.
Since I have very little familiarity with virtio or iommu code,
I'd be much happier if this was reviewed as a generic virtio-iommu
by the x86/virtio devs and then the arm specific parts done second...
Understood. I was rather expecting you to review the smmuv3 emulation
code which you did, in a comprehensive manner ;-), and many thanks for that.

Note sure this is time yet to get this RFC reviewed as
- the v0.4 virtio-iommu driver it relies on was not officially submitted,
- the virtio-iommu specification review has not really been reviewed,
- the ACPI probing method has not been discussed yet.

Jean-Philippe, please correct me if I am wrong.

So to me, this is pure RFC at the moment.
Post by Peter Maydell
I'm also not clear on what we're expecting the recommended or normal
way to do device passthrough is going to be -- this virtio-mmio,
or presenting the guest with an SMMUv3 interface? Do we really
need to implement both ?
I think the KVM forum is the right place to sync as both approaches will
be presented and some pros/cons + performance figures will be given.

As we talk about choosing, there is one alternative that was suggested
on the ML by Alex & Michael but never really get considered yet and
maybe should be: using intel iommu emulation code for ARM. I aknowledge
this deserves a thorough impact study on kernel and FW side but I would
be happy to get your opinion about the QEMU side. Would you have a by-
principle rejection of this idea to instantiate such an Intel device in
mach virt or would it be something you would be ready to consider?

Thanks

Eric
Post by Peter Maydell
thanks
-- PMM
Jean-Philippe Brucker
2017-10-12 10:46:26 UTC
Permalink
Post by Auger Eric
Hi Peter,
Post by Peter Maydell
Post by Auger Eric
Hi Peter,
Post by Peter Maydell
Post by Eric Auger
This series implements the virtio-iommu device.
This v4 is an upgrade to v0.4 spec [1] and applies on QEMU v2.10.0.
- probe request support although no reserved region is returned at
the moment
- unmap semantics less strict, as specified in v0.4
- device registration, attach/detach revisited
- split into smaller patches to ease review
- propose a way to inform the IOMMU mr about the page_size_mask
of underlying HW IOMMU, if any
- remove warning associated with the translation of the MSI doorbell
The device gets instantiated using the "-device virtio-iommu-device"
option. It currently works with ARM virt machine only, as the machine
must handle the dt binding between the virtio-mmio "iommu" node and
the PCI host bridge node.
Could this work on x86, or is it inherently arm-only?
Yes this is the goal. At the moment the ACPI probing is not yet properly
specified but a Q35 prototype was developed in the Red Hat Virt team.
This will be presented at the KVM forum.
Since I have very little familiarity with virtio or iommu code,
I'd be much happier if this was reviewed as a generic virtio-iommu
by the x86/virtio devs and then the arm specific parts done second...
Understood. I was rather expecting you to review the smmuv3 emulation
code which you did, in a comprehensive manner ;-), and many thanks for that.
Note sure this is time yet to get this RFC reviewed as
- the v0.4 virtio-iommu driver it relies on was not officially submitted,
- the virtio-iommu specification review has not really been reviewed,
- the ACPI probing method has not been discussed yet.
Jean-Philippe, please correct me if I am wrong.
So to me, this is pure RFC at the moment.
Yes, having it as an RFC for the moment allowed to break compatibility
between versions of the specification. The downside is that it probably
doesn't get as many eyeballs as it would without an RFC tag (although I
did receive tonnes of helpful comments). Maybe v0.5, that should be
published shortly, can be a candidate for mainline.

Thanks,
Jean
Post by Auger Eric
Post by Peter Maydell
I'm also not clear on what we're expecting the recommended or normal
way to do device passthrough is going to be -- this virtio-mmio,
or presenting the guest with an SMMUv3 interface? Do we really
need to implement both ?
I think the KVM forum is the right place to sync as both approaches will
be presented and some pros/cons + performance figures will be given.
As we talk about choosing, there is one alternative that was suggested
on the ML by Alex & Michael but never really get considered yet and
maybe should be: using intel iommu emulation code for ARM. I aknowledge
this deserves a thorough impact study on kernel and FW side but I would
be happy to get your opinion about the QEMU side. Would you have a by-
principle rejection of this idea to instantiate such an Intel device in
mach virt or would it be something you would be ready to consider?
Thanks
Eric
Post by Peter Maydell
thanks
-- PMM
Tian, Kevin
2017-10-13 07:01:54 UTC
Permalink
Sent: Thursday, October 12, 2017 6:10 PM
Hi Peter,
Post by Peter Maydell
Post by Auger Eric
Hi Peter,
Post by Peter Maydell
Post by Eric Auger
This series implements the virtio-iommu device.
This v4 is an upgrade to v0.4 spec [1] and applies on QEMU v2.10.0.
- probe request support although no reserved region is returned at
the moment
- unmap semantics less strict, as specified in v0.4
- device registration, attach/detach revisited
- split into smaller patches to ease review
- propose a way to inform the IOMMU mr about the page_size_mask
of underlying HW IOMMU, if any
- remove warning associated with the translation of the MSI doorbell
The device gets instantiated using the "-device virtio-iommu-device"
option. It currently works with ARM virt machine only, as the machine
must handle the dt binding between the virtio-mmio "iommu" node
and
Post by Peter Maydell
Post by Auger Eric
Post by Peter Maydell
Post by Eric Auger
the PCI host bridge node.
Could this work on x86, or is it inherently arm-only?
Yes this is the goal. At the moment the ACPI probing is not yet properly
specified but a Q35 prototype was developed in the Red Hat Virt team.
This will be presented at the KVM forum.
Since I have very little familiarity with virtio or iommu code,
I'd be much happier if this was reviewed as a generic virtio-iommu
by the x86/virtio devs and then the arm specific parts done second...
Understood. I was rather expecting you to review the smmuv3 emulation
code which you did, in a comprehensive manner ;-), and many thanks for that.
Note sure this is time yet to get this RFC reviewed as
- the v0.4 virtio-iommu driver it relies on was not officially submitted,
- the virtio-iommu specification review has not really been reviewed,
- the ACPI probing method has not been discussed yet.
Jean-Philippe, please correct me if I am wrong.
So to me, this is pure RFC at the moment.
Post by Peter Maydell
I'm also not clear on what we're expecting the recommended or normal
way to do device passthrough is going to be -- this virtio-mmio,
or presenting the guest with an SMMUv3 interface? Do we really
need to implement both ?
I think the KVM forum is the right place to sync as both approaches will
be presented and some pros/cons + performance figures will be given.
As we talk about choosing, there is one alternative that was suggested
on the ML by Alex & Michael but never really get considered yet and
maybe should be: using intel iommu emulation code for ARM. I
aknowledge
this deserves a thorough impact study on kernel and FW side but I would
be happy to get your opinion about the QEMU side. Would you have a by-
principle rejection of this idea to instantiate such an Intel device in
mach virt or would it be something you would be ready to consider?
bear posting a link to Alex/Michael's comment? interesting to know
the rationale...

Tha
Auger Eric
2017-10-13 07:43:52 UTC
Permalink
Hi Kevin,
Post by Tian, Kevin
Sent: Thursday, October 12, 2017 6:10 PM
Hi Peter,
Post by Peter Maydell
Post by Auger Eric
Hi Peter,
Post by Peter Maydell
Post by Eric Auger
This series implements the virtio-iommu device.
This v4 is an upgrade to v0.4 spec [1] and applies on QEMU v2.10.0.
- probe request support although no reserved region is returned at
the moment
- unmap semantics less strict, as specified in v0.4
- device registration, attach/detach revisited
- split into smaller patches to ease review
- propose a way to inform the IOMMU mr about the page_size_mask
of underlying HW IOMMU, if any
- remove warning associated with the translation of the MSI doorbell
The device gets instantiated using the "-device virtio-iommu-device"
option. It currently works with ARM virt machine only, as the machine
must handle the dt binding between the virtio-mmio "iommu" node
and
Post by Peter Maydell
Post by Auger Eric
Post by Peter Maydell
Post by Eric Auger
the PCI host bridge node.
Could this work on x86, or is it inherently arm-only?
Yes this is the goal. At the moment the ACPI probing is not yet properly
specified but a Q35 prototype was developed in the Red Hat Virt team.
This will be presented at the KVM forum.
Since I have very little familiarity with virtio or iommu code,
I'd be much happier if this was reviewed as a generic virtio-iommu
by the x86/virtio devs and then the arm specific parts done second...
Understood. I was rather expecting you to review the smmuv3 emulation
code which you did, in a comprehensive manner ;-), and many thanks for that.
Note sure this is time yet to get this RFC reviewed as
- the v0.4 virtio-iommu driver it relies on was not officially submitted,
- the virtio-iommu specification review has not really been reviewed,
- the ACPI probing method has not been discussed yet.
Jean-Philippe, please correct me if I am wrong.
So to me, this is pure RFC at the moment.
Post by Peter Maydell
I'm also not clear on what we're expecting the recommended or normal
way to do device passthrough is going to be -- this virtio-mmio,
or presenting the guest with an SMMUv3 interface? Do we really
need to implement both ?
I think the KVM forum is the right place to sync as both approaches will
be presented and some pros/cons + performance figures will be given.
As we talk about choosing, there is one alternative that was suggested
on the ML by Alex & Michael but never really get considered yet and
maybe should be: using intel iommu emulation code for ARM. I
aknowledge
this deserves a thorough impact study on kernel and FW side but I would
be happy to get your opinion about the QEMU side. Would you have a by-
principle rejection of this idea to instantiate such an Intel device in
mach virt or would it be something you would be ready to consider?
bear posting a link to Alex/Michael's comment? interesting to know
the rationale...
This was suggested here for instance:

https://lkml.org/lkml/2017/7/12/579

I think rationale was
- this is an emulated platform so there is more freedom
- vtd emulation code is rather stable
- it has pieces missing on smmu: cachine mode, IOTLB invalidation
command with addr_mask,
- intel iommu driver implements deferred IOTLB Invalidation which boosts
perf

But problems I foresee are:
- MSI handling. ARM MSI doorbells can be anywhere in the GPA address
space whereas Intel has MSIs within the APIC window: [FEE0_0000h –
FEF0_000h]. So I suspect the MSI handling may not work at kernel level.
- intel IOMMU input/output address ranges and page sizes may be
different from ARM ones.
- ARM uses ACPI IORT for binding RC <-> IOMMU <-> MSI controller whereas
Intel uses other tables
- Intel's dmar kernel code must be compiled/enabled on ARM

So personally I don't think this solution is viable but I prefer this
gets discussed on the ML.

Thanks

Eric
Post by Tian, Kevin
Thanks
Kevin
Continue reading on narkive:
Loading...