Discussion:
[Qemu-devel] [RESEND PATCH 0/6] Introduce new iommu notifier framework
Liu, Yi L
2017-11-03 12:01:50 UTC
Hi,

Resending due to the build error reported by the auto-test. There is no
functional change compared with the previous posting.

This patchset is a follow-up of Peter Xu's patchset linked below.
In brief, Peter's patchset introduces a common IOMMU object which
does not depend on the platform (x86/ppc/...) or the bus (PCI/...). On
top of it, an IOMMUObject-based notifier framework is introduced, and
AddressSpaceOps is added to provide methods such as getting the
IOMMUObject behind an AddressSpace. This can be used to detect whether
a vIOMMU is exposed.

https://lists.gnu.org/archive/html/qemu-devel/2017-04/msg05360.html

Let me try to explain why we need such a change.

I'm working on virt-SVM enabling for passthrough devices on the Intel
platform. This work extends the existing Intel IOMMU emulator in QEMU.
Among the extensions, there are two requirements related to the topic
discussed here.

* The Intel IOMMU emulator needs to propagate a guest pasid table pointer
to the host through VFIO, so that the host Intel IOMMU driver can set it
in its context table. With the guest pasid table pointer set, the host is
able to get the guest CR3 table after the guest calls intel_svm_bind_mm().
The HW IOMMU can then do nested translation, GVA->GPA followed by GPA->HPA,
which enables Shared Virtual Memory in the guest.

* The Intel IOMMU emulator needs to propagate the guest's iotlb (1st level
cache) flush to the host through VFIO.

Since both requirements need to talk to VFIO, notifiers are needed.
Meanwhile, the notifiers should be registered as long as a vIOMMU is
exposed to the guest.

QEMU has an existing notifier framework based on MemoryRegion, and we
are using it for MAP/UNMAP. However, we cannot use it here, for the
following reasons:

* IOMMU MemoryRegion notifiers depend on an IOMMU MemoryRegion. If the guest
iommu driver configures the device to bypass IOVA address translation, the
address space is the system RAM address space and the MemoryRegion is the
RAM MemoryRegion. Details can be found in Peter's patch allowing dynamic
switching of the IOMMU region.
https://lists.gnu.org/archive/html/qemu-devel/2016-12/msg02690.html

* virt-SVM requires the guest to configure the device to bypass IOVA address
translation. With such a config, we can make sure the host has a GPA->HPA
mapping, and meanwhile the Intel IOMMU emulator can propagate the guest CR3
table (GVA->GPA) to the host. With nested translation, we are able to achieve
GVA->GPA and then GPA->HPA translation. However, in that case the IOMMU
MemoryRegion notifiers would not be registered, which means virt-SVM needs
another notifier framework.

Based on Peter's patch, I did some cleanup, completed the notifier
framework based on IOMMUObject, and also provided an example of the newly
introduced notifier framework. The notifier framework introduced here
is going to be used in my virt-SVM patchset.
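
To make the usage concrete, here is a rough sketch of the consumer side
(error handling omitted; vdev->pasidtbl_n is only a placeholder field for
illustration, while the real registration lives in the vfio/pci patches of
this series):

    AddressSpace *as = pci_device_iommu_address_space(pdev);
    IOMMUObject *iommu = address_space_iommu_get(as);

    if (iommu) {
        /* A vIOMMU is exposed to the guest, so listen for guest pasid
         * table bind events and forward them to the host through VFIO. */
        iommu_notifier_register(iommu, &vdev->pasidtbl_n,
                                vfio_iommu_bind_pasidtbl_notify,
                                IOMMU_EVENT_BIND_PASIDT);
    }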

For virt-SVM design details, you may refer to the SVM RFC patches:
https://lists.gnu.org/archive/html/qemu-devel/2017-04/msg04925.html

Liu, Yi L (3):
vfio: rename GuestIOMMU to be GuestIOMMUMR
vfio/pci: add notify framework based on IOMMUObject
vfio/pci: register vfio_iommu_bind_pasidtbl_notify notifier

Peter Xu (3):
memory: rename existing iommu notifier to be iommu mr notifier
memory: introduce AddressSpaceOps and IOMMUObject
intel_iommu: provide AddressSpaceOps.iommu_get instance

hw/core/Makefile.objs | 1 +
hw/core/iommu.c | 58 ++++++++++++++++++++++++++++++++
hw/i386/amd_iommu.c | 6 ++--
hw/i386/intel_iommu.c | 41 +++++++++++++----------
hw/ppc/spapr_iommu.c | 8 ++---
hw/s390x/s390-pci-bus.c | 2 +-
hw/vfio/common.c | 25 +++++++-------
hw/vfio/pci.c | 53 ++++++++++++++++++++++++++++-
hw/virtio/vhost.c | 10 +++---
include/exec/memory.h | 77 ++++++++++++++++++++++++++++---------------
include/hw/core/iommu.h | 73 ++++++++++++++++++++++++++++++++++++++++
include/hw/i386/intel_iommu.h | 10 +++---
include/hw/vfio/vfio-common.h | 16 ++++++---
include/hw/virtio/vhost.h | 4 +--
memory.c | 47 +++++++++++++++-----------
15 files changed, 331 insertions(+), 100 deletions(-)
create mode 100644 hw/core/iommu.c
create mode 100644 include/hw/core/iommu.h
--
1.9.1
Liu, Yi L
2017-11-03 12:01:51 UTC
From: Peter Xu <***@redhat.com>

IOMMU notifiers have so far mostly been used for [dev-]IOTLB work. They are
not suitable for other kinds of notifiers (one example would be the future
virt-SVM support). Considering that the current notifiers target per-memory-
region events, rename the iommu notifier definitions:

* rename all the notifier types from the IOMMU_NOTIFIER_* prefix to
IOMMU_MR_EVENT_* to better reflect their usage (for memory regions)
* rename IOMMUNotifier to IOMMUMRNotifier
* rename iommu_notifier to iommu_mr_notifier
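
For example, after the conversion a typical caller (taken from the
hw/vfio/common.c hunk below) reads:

    iommu_mr_notifier_init(&giommu->n, vfio_iommu_map_notify,
                           IOMMU_MR_EVENT_ALL,
                           section->offset_within_region,
                           int128_get64(llend));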

Signed-off-by: Peter Xu <***@redhat.com>
Signed-off-by: Liu, Yi L <***@linux.intel.com>
---
hw/i386/amd_iommu.c | 6 ++---
hw/i386/intel_iommu.c | 34 +++++++++++++-------------
hw/ppc/spapr_iommu.c | 8 +++----
hw/s390x/s390-pci-bus.c | 2 +-
hw/vfio/common.c | 10 ++++----
hw/virtio/vhost.c | 10 ++++----
include/exec/memory.h | 55 ++++++++++++++++++++++---------------------
include/hw/i386/intel_iommu.h | 8 +++----
include/hw/vfio/vfio-common.h | 2 +-
include/hw/virtio/vhost.h | 4 ++--
memory.c | 37 +++++++++++++++--------------
11 files changed, 89 insertions(+), 87 deletions(-)

diff --git a/hw/i386/amd_iommu.c b/hw/i386/amd_iommu.c
index ad8155c..8f756e8 100644
--- a/hw/i386/amd_iommu.c
+++ b/hw/i386/amd_iommu.c
@@ -1072,12 +1072,12 @@ static const MemoryRegionOps mmio_mem_ops = {
};

static void amdvi_iommu_notify_flag_changed(IOMMUMemoryRegion *iommu,
- IOMMUNotifierFlag old,
- IOMMUNotifierFlag new)
+ IOMMUMREventFlag old,
+ IOMMUMREventFlag new)
{
AMDVIAddressSpace *as = container_of(iommu, AMDVIAddressSpace, iommu);

- if (new & IOMMU_NOTIFIER_MAP) {
+ if (new & IOMMU_MR_EVENT_MAP) {
error_report("device %02x.%02x.%x requires iommu notifier which is not "
"currently supported", as->bus_num, PCI_SLOT(as->devfn),
PCI_FUNC(as->devfn));
diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 3a5bb0b..e81c706 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -1234,7 +1234,7 @@ static void vtd_interrupt_remap_table_setup(IntelIOMMUState *s)

static void vtd_iommu_replay_all(IntelIOMMUState *s)
{
- IntelIOMMUNotifierNode *node;
+ IntelIOMMUMRNotifierNode *node;

QLIST_FOREACH(node, &s->notifiers_list, next) {
memory_region_iommu_replay_all(&node->vtd_as->iommu);
@@ -1308,7 +1308,7 @@ static void vtd_context_device_invalidate(IntelIOMMUState *s,
/*
* So a device is moving out of (or moving into) a
* domain, a replay() suites here to notify all the
- * IOMMU_NOTIFIER_MAP registers about this change.
+ * IOMMU_MR_EVENT_MAP registers about this change.
* This won't bring bad even if we have no such
* notifier registered - the IOMMU notification
* framework will skip MAP notifications if that
@@ -1358,7 +1358,7 @@ static void vtd_iotlb_global_invalidate(IntelIOMMUState *s)

static void vtd_iotlb_domain_invalidate(IntelIOMMUState *s, uint16_t domain_id)
{
- IntelIOMMUNotifierNode *node;
+ IntelIOMMUMRNotifierNode *node;
VTDContextEntry ce;
VTDAddressSpace *vtd_as;

@@ -1388,7 +1388,7 @@ static void vtd_iotlb_page_invalidate_notify(IntelIOMMUState *s,
uint16_t domain_id, hwaddr addr,
uint8_t am)
{
- IntelIOMMUNotifierNode *node;
+ IntelIOMMUMRNotifierNode *node;
VTDContextEntry ce;
int ret;

@@ -2318,21 +2318,21 @@ static IOMMUTLBEntry vtd_iommu_translate(IOMMUMemoryRegion *iommu, hwaddr addr,
}

static void vtd_iommu_notify_flag_changed(IOMMUMemoryRegion *iommu,
- IOMMUNotifierFlag old,
- IOMMUNotifierFlag new)
+ IOMMUMREventFlag old,
+ IOMMUMREventFlag new)
{
VTDAddressSpace *vtd_as = container_of(iommu, VTDAddressSpace, iommu);
IntelIOMMUState *s = vtd_as->iommu_state;
- IntelIOMMUNotifierNode *node = NULL;
- IntelIOMMUNotifierNode *next_node = NULL;
+ IntelIOMMUMRNotifierNode *node = NULL;
+ IntelIOMMUMRNotifierNode *next_node = NULL;

- if (!s->caching_mode && new & IOMMU_NOTIFIER_MAP) {
+ if (!s->caching_mode && new & IOMMU_MR_EVENT_MAP) {
error_report("We need to set cache_mode=1 for intel-iommu to enable "
"device assignment with IOMMU protection.");
exit(1);
}

- if (old == IOMMU_NOTIFIER_NONE) {
+ if (old == IOMMU_MR_EVENT_NONE) {
node = g_malloc0(sizeof(*node));
node->vtd_as = vtd_as;
QLIST_INSERT_HEAD(&s->notifiers_list, node, next);
@@ -2342,7 +2342,7 @@ static void vtd_iommu_notify_flag_changed(IOMMUMemoryRegion *iommu,
/* update notifier node with new flags */
QLIST_FOREACH_SAFE(node, &s->notifiers_list, next, next_node) {
if (node->vtd_as == vtd_as) {
- if (new == IOMMU_NOTIFIER_NONE) {
+ if (new == IOMMU_MR_EVENT_NONE) {
QLIST_REMOVE(node, next);
g_free(node);
}
@@ -2759,7 +2759,7 @@ VTDAddressSpace *vtd_find_add_as(IntelIOMMUState *s, PCIBus *bus, int devfn)
}

/* Unmap the whole range in the notifier's scope. */
-static void vtd_address_space_unmap(VTDAddressSpace *as, IOMMUNotifier *n)
+static void vtd_address_space_unmap(VTDAddressSpace *as, IOMMUMRNotifier *n)
{
IOMMUTLBEntry entry;
hwaddr size;
@@ -2814,13 +2814,13 @@ static void vtd_address_space_unmap(VTDAddressSpace *as, IOMMUNotifier *n)

static void vtd_address_space_unmap_all(IntelIOMMUState *s)
{
- IntelIOMMUNotifierNode *node;
+ IntelIOMMUMRNotifierNode *node;
VTDAddressSpace *vtd_as;
- IOMMUNotifier *n;
+ IOMMUMRNotifier *n;

QLIST_FOREACH(node, &s->notifiers_list, next) {
vtd_as = node->vtd_as;
- IOMMU_NOTIFIER_FOREACH(n, &vtd_as->iommu) {
+ IOMMU_MR_NOTIFIER_FOREACH(n, &vtd_as->iommu) {
vtd_address_space_unmap(vtd_as, n);
}
}
@@ -2828,11 +2828,11 @@ static void vtd_address_space_unmap_all(IntelIOMMUState *s)

static int vtd_replay_hook(IOMMUTLBEntry *entry, void *private)
{
- memory_region_notify_one((IOMMUNotifier *)private, entry);
+ memory_region_notify_one((IOMMUMRNotifier *)private, entry);
return 0;
}

-static void vtd_iommu_replay(IOMMUMemoryRegion *iommu_mr, IOMMUNotifier *n)
+static void vtd_iommu_replay(IOMMUMemoryRegion *iommu_mr, IOMMUMRNotifier *n)
{
VTDAddressSpace *vtd_as = container_of(iommu_mr, VTDAddressSpace, iommu);
IntelIOMMUState *s = vtd_as->iommu_state;
diff --git a/hw/ppc/spapr_iommu.c b/hw/ppc/spapr_iommu.c
index 5ccd785..088f614 100644
--- a/hw/ppc/spapr_iommu.c
+++ b/hw/ppc/spapr_iommu.c
@@ -161,14 +161,14 @@ static uint64_t spapr_tce_get_min_page_size(IOMMUMemoryRegion *iommu)
}

static void spapr_tce_notify_flag_changed(IOMMUMemoryRegion *iommu,
- IOMMUNotifierFlag old,
- IOMMUNotifierFlag new)
+ IOMMUMREventFlag old,
+ IOMMUMREventFlag new)
{
struct sPAPRTCETable *tbl = container_of(iommu, sPAPRTCETable, iommu);

- if (old == IOMMU_NOTIFIER_NONE && new != IOMMU_NOTIFIER_NONE) {
+ if (old == IOMMU_MR_EVENT_NONE && new != IOMMU_MR_EVENT_NONE) {
spapr_tce_set_need_vfio(tbl, true);
- } else if (old != IOMMU_NOTIFIER_NONE && new == IOMMU_NOTIFIER_NONE) {
+ } else if (old != IOMMU_MR_EVENT_NONE && new == IOMMU_MR_EVENT_NONE) {
spapr_tce_set_need_vfio(tbl, false);
}
}
diff --git a/hw/s390x/s390-pci-bus.c b/hw/s390x/s390-pci-bus.c
index e7a58e8..10b7020 100644
--- a/hw/s390x/s390-pci-bus.c
+++ b/hw/s390x/s390-pci-bus.c
@@ -398,7 +398,7 @@ static IOMMUTLBEntry s390_translate_iommu(IOMMUMemoryRegion *mr, hwaddr addr,
}

static void s390_pci_iommu_replay(IOMMUMemoryRegion *iommu,
- IOMMUNotifier *notifier)
+ IOMMUMRNotifier *notifier)
{
/* It's impossible to plug a pci device on s390x that already has iommu
* mappings which need to be replayed, that is due to the "one iommu per
diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 7b2924c..1f7d516 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -346,7 +346,7 @@ static bool vfio_get_vaddr(IOMMUTLBEntry *iotlb, void **vaddr,
return true;
}

-static void vfio_iommu_map_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
+static void vfio_iommu_map_notify(IOMMUMRNotifier *n, IOMMUTLBEntry *iotlb)
{
VFIOGuestIOMMU *giommu = container_of(n, VFIOGuestIOMMU, n);
VFIOContainer *container = giommu->container;
@@ -496,10 +496,10 @@ static void vfio_listener_region_add(MemoryListener *listener,
llend = int128_add(int128_make64(section->offset_within_region),
section->size);
llend = int128_sub(llend, int128_one());
- iommu_notifier_init(&giommu->n, vfio_iommu_map_notify,
- IOMMU_NOTIFIER_ALL,
- section->offset_within_region,
- int128_get64(llend));
+ iommu_mr_notifier_init(&giommu->n, vfio_iommu_map_notify,
+ IOMMU_MR_EVENT_ALL,
+ section->offset_within_region,
+ int128_get64(llend));
QLIST_INSERT_HEAD(&container->giommu_list, giommu, giommu_next);

memory_region_register_iommu_notifier(section->mr, &giommu->n);
diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index ddc42f0..e2c1228 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -719,7 +719,7 @@ static void vhost_region_del(MemoryListener *listener,
}
}

-static void vhost_iommu_unmap_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
+static void vhost_iommu_unmap_notify(IOMMUMRNotifier *n, IOMMUTLBEntry *iotlb)
{
struct vhost_iommu *iommu = container_of(n, struct vhost_iommu, n);
struct vhost_dev *hdev = iommu->hdev;
@@ -747,10 +747,10 @@ static void vhost_iommu_region_add(MemoryListener *listener,
end = int128_add(int128_make64(section->offset_within_region),
section->size);
end = int128_sub(end, int128_one());
- iommu_notifier_init(&iommu->n, vhost_iommu_unmap_notify,
- IOMMU_NOTIFIER_UNMAP,
- section->offset_within_region,
- int128_get64(end));
+ iommu_mr_notifier_init(&iommu->n, vhost_iommu_unmap_notify,
+ IOMMU_MR_EVENT_UNMAP,
+ section->offset_within_region,
+ int128_get64(end));
iommu->mr = section->mr;
iommu->iommu_offset = section->offset_within_address_space -
section->offset_within_region;
diff --git a/include/exec/memory.h b/include/exec/memory.h
index 5ed4042..03595e3 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -75,36 +75,36 @@ struct IOMMUTLBEntry {
};

/*
- * Bitmap for different IOMMUNotifier capabilities. Each notifier can
+ * Bitmap for different IOMMUMRNotifier capabilities. Each notifier can
* register with one or multiple IOMMU Notifier capability bit(s).
*/
typedef enum {
- IOMMU_NOTIFIER_NONE = 0,
+ IOMMU_MR_EVENT_NONE = 0,
/* Notify cache invalidations */
- IOMMU_NOTIFIER_UNMAP = 0x1,
+ IOMMU_MR_EVENT_UNMAP = 0x1,
/* Notify entry changes (newly created entries) */
- IOMMU_NOTIFIER_MAP = 0x2,
-} IOMMUNotifierFlag;
+ IOMMU_MR_EVENT_MAP = 0x2,
+} IOMMUMREventFlag;

-#define IOMMU_NOTIFIER_ALL (IOMMU_NOTIFIER_MAP | IOMMU_NOTIFIER_UNMAP)
+#define IOMMU_MR_EVENT_ALL (IOMMU_MR_EVENT_MAP | IOMMU_MR_EVENT_UNMAP)

-struct IOMMUNotifier;
-typedef void (*IOMMUNotify)(struct IOMMUNotifier *notifier,
+struct IOMMUMRNotifier;
+typedef void (*IOMMUMRNotify)(struct IOMMUMRNotifier *notifier,
IOMMUTLBEntry *data);

-struct IOMMUNotifier {
- IOMMUNotify notify;
- IOMMUNotifierFlag notifier_flags;
+struct IOMMUMRNotifier {
+ IOMMUMRNotify notify;
+ IOMMUMREventFlag notifier_flags;
/* Notify for address space range start <= addr <= end */
hwaddr start;
hwaddr end;
- QLIST_ENTRY(IOMMUNotifier) node;
+ QLIST_ENTRY(IOMMUMRNotifier) node;
};
-typedef struct IOMMUNotifier IOMMUNotifier;
+typedef struct IOMMUMRNotifier IOMMUMRNotifier;

-static inline void iommu_notifier_init(IOMMUNotifier *n, IOMMUNotify fn,
- IOMMUNotifierFlag flags,
- hwaddr start, hwaddr end)
+static inline void iommu_mr_notifier_init(IOMMUMRNotifier *n, IOMMUMRNotify fn,
+ IOMMUMREventFlag flags,
+ hwaddr start, hwaddr end)
{
n->notify = fn;
n->notifier_flags = flags;
@@ -206,10 +206,10 @@ typedef struct IOMMUMemoryRegionClass {
uint64_t (*get_min_page_size)(IOMMUMemoryRegion *iommu);
/* Called when IOMMU Notifier flag changed */
void (*notify_flag_changed)(IOMMUMemoryRegion *iommu,
- IOMMUNotifierFlag old_flags,
- IOMMUNotifierFlag new_flags);
+ IOMMUMREventFlag old_flags,
+ IOMMUMREventFlag new_flags);
/* Set this up to provide customized IOMMU replay function */
- void (*replay)(IOMMUMemoryRegion *iommu, IOMMUNotifier *notifier);
+ void (*replay)(IOMMUMemoryRegion *iommu, IOMMUMRNotifier *notifier);
} IOMMUMemoryRegionClass;

typedef struct CoalescedMemoryRange CoalescedMemoryRange;
@@ -259,11 +259,11 @@ struct MemoryRegion {
struct IOMMUMemoryRegion {
MemoryRegion parent_obj;

- QLIST_HEAD(, IOMMUNotifier) iommu_notify;
- IOMMUNotifierFlag iommu_notify_flags;
+ QLIST_HEAD(, IOMMUMRNotifier) iommu_notify;
+ IOMMUMREventFlag iommu_notify_flags;
};

-#define IOMMU_NOTIFIER_FOREACH(n, mr) \
+#define IOMMU_MR_NOTIFIER_FOREACH(n, mr) \
QLIST_FOREACH((n), &(mr)->iommu_notify, node)

/**
@@ -879,7 +879,7 @@ void memory_region_notify_iommu(IOMMUMemoryRegion *iommu_mr,
* replaces all old entries for the same virtual I/O address range.
* Deleted entries have ***@perm == 0.
*/
-void memory_region_notify_one(IOMMUNotifier *notifier,
+void memory_region_notify_one(IOMMUMRNotifier *notifier,
IOMMUTLBEntry *entry);

/**
@@ -887,12 +887,12 @@ void memory_region_notify_one(IOMMUNotifier *notifier,
* IOMMU translation entries.
*
* @mr: the memory region to observe
- * @n: the IOMMUNotifier to be added; the notify callback receives a
+ * @n: the IOMMUMRNotifier to be added; the notify callback receives a
* pointer to an #IOMMUTLBEntry as the opaque value; the pointer
* ceases to be valid on exit from the notifier.
*/
void memory_region_register_iommu_notifier(MemoryRegion *mr,
- IOMMUNotifier *n);
+ IOMMUMRNotifier *n);

/**
* memory_region_iommu_replay: replay existing IOMMU translations to
@@ -902,7 +902,8 @@ void memory_region_register_iommu_notifier(MemoryRegion *mr,
* @iommu_mr: the memory region to observe
* @n: the notifier to which to replay iommu mappings
*/
-void memory_region_iommu_replay(IOMMUMemoryRegion *iommu_mr, IOMMUNotifier *n);
+void memory_region_iommu_replay(IOMMUMemoryRegion *iommu_mr,
+ IOMMUMRNotifier *n);

/**
* memory_region_iommu_replay_all: replay existing IOMMU translations
@@ -921,7 +922,7 @@ void memory_region_iommu_replay_all(IOMMUMemoryRegion *iommu_mr);
* @n: the notifier to be removed.
*/
void memory_region_unregister_iommu_notifier(MemoryRegion *mr,
- IOMMUNotifier *n);
+ IOMMUMRNotifier *n);

/**
* memory_region_name: get a memory region's name
diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h
index ac15e6b..c85f9ff 100644
--- a/include/hw/i386/intel_iommu.h
+++ b/include/hw/i386/intel_iommu.h
@@ -65,7 +65,7 @@ typedef union VTD_IR_TableEntry VTD_IR_TableEntry;
typedef union VTD_IR_MSIAddress VTD_IR_MSIAddress;
typedef struct VTDIrq VTDIrq;
typedef struct VTD_MSIMessage VTD_MSIMessage;
-typedef struct IntelIOMMUNotifierNode IntelIOMMUNotifierNode;
+typedef struct IntelIOMMUMRNotifierNode IntelIOMMUMRNotifierNode;

/* Context-Entry */
struct VTDContextEntry {
@@ -251,9 +251,9 @@ struct VTD_MSIMessage {
/* When IR is enabled, all MSI/MSI-X data bits should be zero */
#define VTD_IR_MSI_DATA (0)

-struct IntelIOMMUNotifierNode {
+struct IntelIOMMUMRNotifierNode {
VTDAddressSpace *vtd_as;
- QLIST_ENTRY(IntelIOMMUNotifierNode) next;
+ QLIST_ENTRY(IntelIOMMUMRNotifierNode) next;
};

/* The iommu (DMAR) device state struct */
@@ -293,7 +293,7 @@ struct IntelIOMMUState {
GHashTable *vtd_as_by_busptr; /* VTDBus objects indexed by PCIBus* reference */
VTDBus *vtd_as_by_bus_num[VTD_PCI_BUS_MAX]; /* VTDBus objects indexed by bus number */
/* list of registered notifiers */
- QLIST_HEAD(, IntelIOMMUNotifierNode) notifiers_list;
+ QLIST_HEAD(, IntelIOMMUMRNotifierNode) notifiers_list;

/* interrupt remapping */
bool intr_enabled; /* Whether guest enabled IR */
diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index f3a2ac9..865e3e7 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -97,7 +97,7 @@ typedef struct VFIOGuestIOMMU {
VFIOContainer *container;
IOMMUMemoryRegion *iommu;
hwaddr iommu_offset;
- IOMMUNotifier n;
+ IOMMUMRNotifier n;
QLIST_ENTRY(VFIOGuestIOMMU) giommu_next;
} VFIOGuestIOMMU;

diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h
index 467dc77..ffe9d9f 100644
--- a/include/hw/virtio/vhost.h
+++ b/include/hw/virtio/vhost.h
@@ -42,7 +42,7 @@ struct vhost_iommu {
struct vhost_dev *hdev;
MemoryRegion *mr;
hwaddr iommu_offset;
- IOMMUNotifier n;
+ IOMMUMRNotifier n;
QLIST_ENTRY(vhost_iommu) iommu_next;
};

@@ -75,7 +75,7 @@ struct vhost_dev {
struct vhost_log *log;
QLIST_ENTRY(vhost_dev) entry;
QLIST_HEAD(, vhost_iommu) iommu_list;
- IOMMUNotifier n;
+ IOMMUMRNotifier n;
};

int vhost_dev_init(struct vhost_dev *hdev, void *opaque,
diff --git a/memory.c b/memory.c
index e26e5a3..77fb3ef 100644
--- a/memory.c
+++ b/memory.c
@@ -1689,7 +1689,7 @@ void memory_region_init_iommu(void *_iommu_mr,
iommu_mr = IOMMU_MEMORY_REGION(mr);
mr->terminates = true; /* then re-forwards */
QLIST_INIT(&iommu_mr->iommu_notify);
- iommu_mr->iommu_notify_flags = IOMMU_NOTIFIER_NONE;
+ iommu_mr->iommu_notify_flags = IOMMU_MR_EVENT_NONE;
}

static void memory_region_finalize(Object *obj)
@@ -1786,12 +1786,12 @@ bool memory_region_is_logging(MemoryRegion *mr, uint8_t client)

static void memory_region_update_iommu_notify_flags(IOMMUMemoryRegion *iommu_mr)
{
- IOMMUNotifierFlag flags = IOMMU_NOTIFIER_NONE;
- IOMMUNotifier *iommu_notifier;
+ IOMMUMREventFlag flags = IOMMU_MR_EVENT_NONE;
+ IOMMUMRNotifier *iommu_mr_notifier;
IOMMUMemoryRegionClass *imrc = IOMMU_MEMORY_REGION_GET_CLASS(iommu_mr);

- IOMMU_NOTIFIER_FOREACH(iommu_notifier, iommu_mr) {
- flags |= iommu_notifier->notifier_flags;
+ IOMMU_MR_NOTIFIER_FOREACH(iommu_mr_notifier, iommu_mr) {
+ flags |= iommu_mr_notifier->notifier_flags;
}

if (flags != iommu_mr->iommu_notify_flags && imrc->notify_flag_changed) {
@@ -1804,7 +1804,7 @@ static void memory_region_update_iommu_notify_flags(IOMMUMemoryRegion *iommu_mr)
}

void memory_region_register_iommu_notifier(MemoryRegion *mr,
- IOMMUNotifier *n)
+ IOMMUMRNotifier *n)
{
IOMMUMemoryRegion *iommu_mr;

@@ -1815,7 +1815,7 @@ void memory_region_register_iommu_notifier(MemoryRegion *mr,

/* We need to register for at least one bitfield */
iommu_mr = IOMMU_MEMORY_REGION(mr);
- assert(n->notifier_flags != IOMMU_NOTIFIER_NONE);
+ assert(n->notifier_flags != IOMMU_MR_EVENT_NONE);
assert(n->start <= n->end);
QLIST_INSERT_HEAD(&iommu_mr->iommu_notify, n, node);
memory_region_update_iommu_notify_flags(iommu_mr);
@@ -1831,7 +1831,8 @@ uint64_t memory_region_iommu_get_min_page_size(IOMMUMemoryRegion *iommu_mr)
return TARGET_PAGE_SIZE;
}

-void memory_region_iommu_replay(IOMMUMemoryRegion *iommu_mr, IOMMUNotifier *n)
+void memory_region_iommu_replay(IOMMUMemoryRegion *iommu_mr,
+ IOMMUMRNotifier *n)
{
MemoryRegion *mr = MEMORY_REGION(iommu_mr);
IOMMUMemoryRegionClass *imrc = IOMMU_MEMORY_REGION_GET_CLASS(iommu_mr);
@@ -1862,15 +1863,15 @@ void memory_region_iommu_replay(IOMMUMemoryRegion *iommu_mr, IOMMUNotifier *n)

void memory_region_iommu_replay_all(IOMMUMemoryRegion *iommu_mr)
{
- IOMMUNotifier *notifier;
+ IOMMUMRNotifier *notifier;

- IOMMU_NOTIFIER_FOREACH(notifier, iommu_mr) {
+ IOMMU_MR_NOTIFIER_FOREACH(notifier, iommu_mr) {
memory_region_iommu_replay(iommu_mr, notifier);
}
}

void memory_region_unregister_iommu_notifier(MemoryRegion *mr,
- IOMMUNotifier *n)
+ IOMMUMRNotifier *n)
{
IOMMUMemoryRegion *iommu_mr;

@@ -1883,10 +1884,10 @@ void memory_region_unregister_iommu_notifier(MemoryRegion *mr,
memory_region_update_iommu_notify_flags(iommu_mr);
}

-void memory_region_notify_one(IOMMUNotifier *notifier,
+void memory_region_notify_one(IOMMUMRNotifier *notifier,
IOMMUTLBEntry *entry)
{
- IOMMUNotifierFlag request_flags;
+ IOMMUMREventFlag request_flags;

/*
* Skip the notification if the notification does not overlap
@@ -1898,9 +1899,9 @@ void memory_region_notify_one(IOMMUNotifier *notifier,
}

if (entry->perm & IOMMU_RW) {
- request_flags = IOMMU_NOTIFIER_MAP;
+ request_flags = IOMMU_MR_EVENT_MAP;
} else {
- request_flags = IOMMU_NOTIFIER_UNMAP;
+ request_flags = IOMMU_MR_EVENT_UNMAP;
}

if (notifier->notifier_flags & request_flags) {
@@ -1911,12 +1912,12 @@ void memory_region_notify_one(IOMMUNotifier *notifier,
void memory_region_notify_iommu(IOMMUMemoryRegion *iommu_mr,
IOMMUTLBEntry entry)
{
- IOMMUNotifier *iommu_notifier;
+ IOMMUMRNotifier *iommu_mr_notifier;

assert(memory_region_is_iommu(MEMORY_REGION(iommu_mr)));

- IOMMU_NOTIFIER_FOREACH(iommu_notifier, iommu_mr) {
- memory_region_notify_one(iommu_notifier, &entry);
+ IOMMU_MR_NOTIFIER_FOREACH(iommu_mr_notifier, iommu_mr) {
+ memory_region_notify_one(iommu_mr_notifier, &entry);
}
}
--
1.9.1
Liu, Yi L
2017-11-03 12:01:53 UTC
From: Peter Xu <***@redhat.com>

Provide AddressSpaceOps.iommu_get() in Intel IOMMU emulator.

Signed-off-by: Peter Xu <***@redhat.com>
Signed-off-by: Liu, Yi L <***@linux.intel.com>
---
hw/i386/intel_iommu.c | 7 +++++++
include/hw/i386/intel_iommu.h | 2 ++
2 files changed, 9 insertions(+)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index e81c706..54343e5 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -2687,6 +2687,12 @@ static const MemoryRegionOps vtd_mem_ir_ops = {
},
};

+static IOMMUObject *vtd_as_iommu_get(AddressSpace *as)
+{
+ VTDAddressSpace *vtd_dev_as = container_of(as, VTDAddressSpace, as);
+ return &vtd_dev_as->iommu_object;
+}
+
VTDAddressSpace *vtd_find_add_as(IntelIOMMUState *s, PCIBus *bus, int devfn)
{
uintptr_t key = (uintptr_t)bus;
@@ -2748,6 +2754,7 @@ VTDAddressSpace *vtd_find_add_as(IntelIOMMUState *s, PCIBus *bus, int devfn)
VTD_INTERRUPT_ADDR_FIRST,
&vtd_dev_as->iommu_ir, 64);
address_space_init(&vtd_dev_as->as, &vtd_dev_as->root, name);
+ vtd_dev_as->as.as_ops.iommu_get = vtd_as_iommu_get;
memory_region_add_subregion_overlap(&vtd_dev_as->root, 0,
&vtd_dev_as->sys_alias, 1);
memory_region_add_subregion_overlap(&vtd_dev_as->root, 0,
diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h
index c85f9ff..a3c6d45 100644
--- a/include/hw/i386/intel_iommu.h
+++ b/include/hw/i386/intel_iommu.h
@@ -27,6 +27,7 @@
#include "hw/i386/ioapic.h"
#include "hw/pci/msi.h"
#include "hw/sysbus.h"
+#include "hw/core/iommu.h"

#define TYPE_INTEL_IOMMU_DEVICE "intel-iommu"
#define INTEL_IOMMU_DEVICE(obj) \
@@ -90,6 +91,7 @@ struct VTDAddressSpace {
MemoryRegion sys_alias;
MemoryRegion iommu_ir; /* Interrupt region: 0xfeeXXXXX */
IntelIOMMUState *iommu_state;
+ IOMMUObject iommu_object;
VTDContextCacheEntry context_cache_entry;
};
--
1.9.1
Liu, Yi L
2017-11-03 12:01:52 UTC
From: Peter Xu <***@redhat.com>

AddressSpaceOps is similar to MemoryRegionOps, it's just for address
spaces to store arch-specific hooks.

The first hook I would like to introduce is iommu_get(). Return an
IOMMUObject behind the AddressSpace.

For systems that have IOMMUs, we will create a special address
space per device which is different from system default address
space for it (please refer to pci_device_iommu_address_space()).
Normally when that happens, there will be one specific IOMMU (or
say, translation unit) stands right behind that new address space.

This iommu_get() fetches that guy behind the address space. Here,
the guy is defined as IOMMUObject, which includes a notifier_list
so far, may extend in future. Along with IOMMUObject, a new iommu
notifier mechanism is introduced. It would be used for virt-svm.
Also IOMMUObject can further have a IOMMUObjectOps which is similar
to MemoryRegionOps. The difference is IOMMUObjectOps is not relied
on MemoryRegion.
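
As an illustration of the intended flow, an emulator holding an
IOMMUObject would fire an event roughly like below (vtd_dev_as->iommu_object
comes from the intel_iommu patch in this series; pasidt_info is only a
placeholder for a payload structure defined in later patches):

    /* Sketch: the vIOMMU emulator traps the guest programming its pasid
     * table pointer and notifies all registered listeners. */
    IOMMUEventData event_data = {
        .event  = IOMMU_EVENT_BIND_PASIDT,
        .length = sizeof(pasidt_info),
        .data   = &pasidt_info,
    };

    iommu_notify(&vtd_dev_as->iommu_object, &event_data);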

Signed-off-by: Peter Xu <***@redhat.com>
Signed-off-by: Liu, Yi L <***@linux.intel.com>
---
hw/core/Makefile.objs | 1 +
hw/core/iommu.c | 58 +++++++++++++++++++++++++++++++++++++++
include/exec/memory.h | 22 +++++++++++++++
include/hw/core/iommu.h | 73 +++++++++++++++++++++++++++++++++++++++++++++++++
memory.c | 10 +++++--
5 files changed, 162 insertions(+), 2 deletions(-)
create mode 100644 hw/core/iommu.c
create mode 100644 include/hw/core/iommu.h

diff --git a/hw/core/Makefile.objs b/hw/core/Makefile.objs
index f8d7a4a..d688412 100644
--- a/hw/core/Makefile.objs
+++ b/hw/core/Makefile.objs
@@ -5,6 +5,7 @@ common-obj-y += fw-path-provider.o
# irq.o needed for qdev GPIO handling:
common-obj-y += irq.o
common-obj-y += hotplug.o
+common-obj-y += iommu.o
common-obj-y += nmi.o

common-obj-$(CONFIG_EMPTY_SLOT) += empty_slot.o
diff --git a/hw/core/iommu.c b/hw/core/iommu.c
new file mode 100644
index 0000000..7c4fcfe
--- /dev/null
+++ b/hw/core/iommu.c
@@ -0,0 +1,58 @@
+/*
+ * QEMU emulation of IOMMU logic
+ *
+ * Copyright (C) 2017 Red Hat Inc.
+ *
+ * Authors: Peter Xu <***@redhat.com>,
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "qemu/osdep.h"
+#include "hw/core/iommu.h"
+
+void iommu_notifier_register(IOMMUObject *iommu,
+ IOMMUNotifier *n,
+ IOMMUNotifyFn fn,
+ IOMMUEvent event)
+{
+ n->event = event;
+ n->iommu_notify = fn;
+ QLIST_INSERT_HEAD(&iommu->iommu_notifiers, n, node);
+ return;
+}
+
+void iommu_notifier_unregister(IOMMUObject *iommu,
+ IOMMUNotifier *notifier)
+{
+ IOMMUNotifier *cur, *next;
+
+ QLIST_FOREACH_SAFE(cur, &iommu->iommu_notifiers, node, next) {
+ if (cur == notifier) {
+ QLIST_REMOVE(cur, node);
+ break;
+ }
+ }
+}
+
+void iommu_notify(IOMMUObject *iommu, IOMMUEventData *event_data)
+{
+ IOMMUNotifier *cur;
+
+ QLIST_FOREACH(cur, &iommu->iommu_notifiers, node) {
+ if ((cur->event == event_data->event) && cur->iommu_notify) {
+ cur->iommu_notify(cur, event_data);
+ }
+ }
+}
diff --git a/include/exec/memory.h b/include/exec/memory.h
index 03595e3..8350973 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -26,6 +26,7 @@
#include "qom/object.h"
#include "qemu/rcu.h"
#include "hw/qdev-core.h"
+#include "hw/core/iommu.h"

#define RAM_ADDR_INVALID (~(ram_addr_t)0)

@@ -301,6 +302,19 @@ struct MemoryListener {
};

/**
+ * AddressSpaceOps: callbacks structure for address space specific operations
+ *
+ * @iommu_get: returns an IOMMU object that backs the address space.
+ * Normally this should be NULL for generic address
+ * spaces, and it's only used when there is one
+ * translation unit behind this address space.
+ */
+struct AddressSpaceOps {
+ IOMMUObject *(*iommu_get)(AddressSpace *as);
+};
+typedef struct AddressSpaceOps AddressSpaceOps;
+
+/**
* AddressSpace: describes a mapping of addresses to #MemoryRegion objects
*/
struct AddressSpace {
@@ -316,6 +330,7 @@ struct AddressSpace {
struct MemoryRegionIoeventfd *ioeventfds;
QTAILQ_HEAD(memory_listeners_as, MemoryListener) listeners;
QTAILQ_ENTRY(AddressSpace) address_spaces_link;
+ AddressSpaceOps as_ops;
};

FlatView *address_space_to_flatview(AddressSpace *as);
@@ -1988,6 +2003,13 @@ address_space_write_cached(MemoryRegionCache *cache, hwaddr addr,
address_space_write(cache->as, cache->xlat + addr, MEMTXATTRS_UNSPECIFIED, buf, len);
}

+/**
+ * address_space_iommu_get: Get the backend IOMMU for the address space
+ *
+ * @as: the address space to fetch IOMMU from
+ */
+IOMMUObject *address_space_iommu_get(AddressSpace *as);
+
#endif

#endif
diff --git a/include/hw/core/iommu.h b/include/hw/core/iommu.h
new file mode 100644
index 0000000..34387c0
--- /dev/null
+++ b/include/hw/core/iommu.h
@@ -0,0 +1,73 @@
+/*
+ * QEMU emulation of IOMMU logic
+ *
+ * Copyright (C) 2017 Red Hat Inc.
+ *
+ * Authors: Peter Xu <***@redhat.com>,
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef HW_CORE_IOMMU_H
+#define HW_CORE_IOMMU_H
+
+#include "qemu/queue.h"
+
+enum IOMMUEvent {
+ IOMMU_EVENT_BIND_PASIDT,
+};
+typedef enum IOMMUEvent IOMMUEvent;
+
+struct IOMMUEventData {
+ IOMMUEvent event;
+ uint64_t length;
+ void *data;
+};
+typedef struct IOMMUEventData IOMMUEventData;
+
+typedef struct IOMMUNotifier IOMMUNotifier;
+
+typedef void (*IOMMUNotifyFn)(IOMMUNotifier *notifier,
+ IOMMUEventData *event_data);
+
+struct IOMMUNotifier {
+ IOMMUNotifyFn iommu_notify;
+ /*
+ * What events we are listening to. Let's allow multiple event
+ * registrations from beginning.
+ */
+ IOMMUEvent event;
+ QLIST_ENTRY(IOMMUNotifier) node;
+};
+
+typedef struct IOMMUObject IOMMUObject;
+
+/*
+ * This stands for an IOMMU unit. Any translation device should have
+ * this struct inside its own structure to make sure it can leverage
+ * common IOMMU functionalities.
+ */
+struct IOMMUObject {
+ QLIST_HEAD(, IOMMUNotifier) iommu_notifiers;
+};
+
+void iommu_notifier_register(IOMMUObject *iommu,
+ IOMMUNotifier *n,
+ IOMMUNotifyFn fn,
+ IOMMUEvent event);
+void iommu_notifier_unregister(IOMMUObject *iommu,
+ IOMMUNotifier *notifier);
+void iommu_notify(IOMMUObject *iommu, IOMMUEventData *event_data);
+
+#endif
diff --git a/memory.c b/memory.c
index 77fb3ef..307f665 100644
--- a/memory.c
+++ b/memory.c
@@ -235,8 +235,6 @@ struct FlatView {
MemoryRegion *root;
};

-typedef struct AddressSpaceOps AddressSpaceOps;
-
#define FOR_EACH_FLAT_RANGE(var, view) \
for (var = (view)->ranges; var < (view)->ranges + (view)->nr; ++var)

@@ -2793,6 +2791,14 @@ static void do_address_space_destroy(AddressSpace *as)
memory_region_unref(as->root);
}

+IOMMUObject *address_space_iommu_get(AddressSpace *as)
+{
+ if (!as->as_ops.iommu_get) {
+ return NULL;
+ }
+ return as->as_ops.iommu_get(as);
+}
+
void address_space_destroy(AddressSpace *as)
{
MemoryRegion *root = as->root;
--
1.9.1
Liu, Yi L
2017-11-14 14:20:26 UTC
Hi Eric,
Hi Yi L,
Post by Liu, Yi L
AddressSpaceOps is similar to MemoryRegionOps, it's just for address
spaces to store arch-specific hooks.
The first hook I would like to introduce is iommu_get(). Return an
IOMMUObject behind the AddressSpace.
David had an objection in the past about this method, saying that
several IOMMUs could translate a single AS?
https://lists.gnu.org/archive/html/qemu-devel/2017-05/msg01610.html
In
https://github.com/torvalds/linux/blob/master/Documentation/devicetree/bindings/pci/pci-iommu.txt,
it is said
"a given PCI device can only master through one IOMMU"
Post by Liu, Yi L
For systems that have IOMMUs, we will create a special address
space per device which is different from system default address
space for it (please refer to pci_device_iommu_address_space()).
Normally when that happens, there will be one specific IOMMU (or
say, translation unit) stands right behind that new address space.
standing
Post by Liu, Yi L
This iommu_get() fetches that guy behind the address space. Here,
the guy is defined as IOMMUObject, which includes a notifier_list
so far, may extend in future. Along with IOMMUObject, a new iommu
notifier mechanism is introduced. It would be used for virt-svm.
Also IOMMUObject can further have a IOMMUObjectOps which is similar
to MemoryRegionOps. The difference is IOMMUObjectOps is not relied
relying
Post by Liu, Yi L
on MemoryRegion.
I think I would split this patch into a first patch introducing the
iommu object and a second one adding the AS ops and address_space_iommu_get().
Good point.
Post by Liu, Yi L
---
hw/core/Makefile.objs | 1 +
hw/core/iommu.c | 58 +++++++++++++++++++++++++++++++++++++++
include/exec/memory.h | 22 +++++++++++++++
include/hw/core/iommu.h | 73 +++++++++++++++++++++++++++++++++++++++++++++++++
memory.c | 10 +++++--
5 files changed, 162 insertions(+), 2 deletions(-)
create mode 100644 hw/core/iommu.c
create mode 100644 include/hw/core/iommu.h
diff --git a/hw/core/Makefile.objs b/hw/core/Makefile.objs
index f8d7a4a..d688412 100644
--- a/hw/core/Makefile.objs
+++ b/hw/core/Makefile.objs
@@ -5,6 +5,7 @@ common-obj-y += fw-path-provider.o
common-obj-y += irq.o
common-obj-y += hotplug.o
+common-obj-y += iommu.o
common-obj-y += nmi.o
common-obj-$(CONFIG_EMPTY_SLOT) += empty_slot.o
diff --git a/hw/core/iommu.c b/hw/core/iommu.c
new file mode 100644
index 0000000..7c4fcfe
--- /dev/null
+++ b/hw/core/iommu.c
@@ -0,0 +1,58 @@
+/*
+ * QEMU emulation of IOMMU logic
This may be rephrased, as it does not really explain what the iommu object
exposes as an API.
Yes, it may need to be rephrased to make this clearer.
Post by Liu, Yi L
+ *
+ * Copyright (C) 2017 Red Hat Inc.
+ *
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "qemu/osdep.h"
+#include "hw/core/iommu.h"
+
+void iommu_notifier_register(IOMMUObject *iommu,
+ IOMMUNotifier *n,
+ IOMMUNotifyFn fn,
+ IOMMUEvent event)
+{
+ n->event = event;
+ n->iommu_notify = fn;
+ QLIST_INSERT_HEAD(&iommu->iommu_notifiers, n, node);
+ return;
+}
+
+void iommu_notifier_unregister(IOMMUObject *iommu,
+ IOMMUNotifier *notifier)
+{
+ IOMMUNotifier *cur, *next;
+
+ QLIST_FOREACH_SAFE(cur, &iommu->iommu_notifiers, node, next) {
+ if (cur == notifier) {
+ QLIST_REMOVE(cur, node);
+ break;
+ }
+ }
+}
+
+void iommu_notify(IOMMUObject *iommu, IOMMUEventData *event_data)
+{
+ IOMMUNotifier *cur;
+
+ QLIST_FOREACH(cur, &iommu->iommu_notifiers, node) {
+ if ((cur->event == event_data->event) && cur->iommu_notify) {
can cur->iommu_notify be NULL if registered as above?
cur->event is always initialized together with an iommu_notify callback at
registration time, so if cur->event matches the requested event type, the
callback should be non-NULL.
Post by Liu, Yi L
+ cur->iommu_notify(cur, event_data);
+ }
+ }
+}
diff --git a/include/exec/memory.h b/include/exec/memory.h
index 03595e3..8350973 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -26,6 +26,7 @@
#include "qom/object.h"
#include "qemu/rcu.h"
#include "hw/qdev-core.h"
+#include "hw/core/iommu.h"
#define RAM_ADDR_INVALID (~(ram_addr_t)0)
@@ -301,6 +302,19 @@ struct MemoryListener {
};
/**
+ * AddressSpaceOps: callbacks structure for address space specific operations
+ *
+ * Normally this should be NULL for generic address
+ * spaces, and it's only used when there is one
+ * translation unit behind this address space.
+ */
+struct AddressSpaceOps {
+ IOMMUObject *(*iommu_get)(AddressSpace *as);
+};
+typedef struct AddressSpaceOps AddressSpaceOps;
+
+/**
* AddressSpace: describes a mapping of addresses to #MemoryRegion objects
*/
struct AddressSpace {
@@ -316,6 +330,7 @@ struct AddressSpace {
struct MemoryRegionIoeventfd *ioeventfds;
QTAILQ_HEAD(memory_listeners_as, MemoryListener) listeners;
QTAILQ_ENTRY(AddressSpace) address_spaces_link;
+ AddressSpaceOps as_ops;
};
FlatView *address_space_to_flatview(AddressSpace *as);
@@ -1988,6 +2003,13 @@ address_space_write_cached(MemoryRegionCache *cache, hwaddr addr,
address_space_write(cache->as, cache->xlat + addr, MEMTXATTRS_UNSPECIFIED, buf, len);
}
+/**
+ * address_space_iommu_get: Get the backend IOMMU for the address space
+ *
+ */
+IOMMUObject *address_space_iommu_get(AddressSpace *as);
+
#endif
#endif
diff --git a/include/hw/core/iommu.h b/include/hw/core/iommu.h
new file mode 100644
index 0000000..34387c0
--- /dev/null
+++ b/include/hw/core/iommu.h
@@ -0,0 +1,73 @@
+/*
+ * QEMU emulation of IOMMU logic
+ *
+ * Copyright (C) 2017 Red Hat Inc.
+ *
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef HW_CORE_IOMMU_H
+#define HW_CORE_IOMMU_H
+
+#include "qemu/queue.h"
+
+enum IOMMUEvent {
+ IOMMU_EVENT_BIND_PASIDT,
+};
+typedef enum IOMMUEvent IOMMUEvent;
+
+struct IOMMUEventData {
+ IOMMUEvent event;
/* length and opaque data passed to notifiers */ ?
Yes, it is. But the "void *data" will be replaced with a well-defined
structure; there is no plan to keep it opaque. Once this patchset is
accepted, a later patchset will define it as:

struct IOMMUEventData {
    IOMMUEvent event;
    uint64_t length;
    union {
        struct pasid_table_config pasidt_info;
        struct tlb_invalidate_info tlb_info;
    };
};
typedef struct IOMMUEventData IOMMUEventData;

This is in a later patchset. Currently, we just want to show the idea of
introducing this new notifier framework.
Post by Liu, Yi L
+ uint64_t length;
+ void *data;
+};
+typedef struct IOMMUEventData IOMMUEventData;
+
+typedef struct IOMMUNotifier IOMMUNotifier;
+
+typedef void (*IOMMUNotifyFn)(IOMMUNotifier *notifier,
+ IOMMUEventData *event_data);
+
+struct IOMMUNotifier {
+ IOMMUNotifyFn iommu_notify;
+ /*
+ * What events we are listening to. Let's allow multiple event
+ * registrations from beginning.
+ */
+ IOMMUEvent event;
/* the event the notifier is sensitive to ? */
Yes, events (aka operations) like binding the pasid table, flushing the 1st level tlb, etc.
Post by Liu, Yi L
+ QLIST_ENTRY(IOMMUNotifier) node;
+};
+
+typedef struct IOMMUObject IOMMUObject;
+
+/*
+ * This stands for an IOMMU unit. Any translation device should have
+ * this struct inside its own structure to make sure it can leverage
+ * common IOMMU functionalities.
+ */
+struct IOMMUObject {
+ QLIST_HEAD(, IOMMUNotifier) iommu_notifiers;
where is the QLIST_INIT supposed to be done?
Yeah, I need to add it accordingly.

Thanks,
Yi L
Thanks
Eric
Post by Liu, Yi L
+};
+
+void iommu_notifier_register(IOMMUObject *iommu,
+ IOMMUNotifier *n,
+ IOMMUNotifyFn fn,
+ IOMMUEvent event);
+void iommu_notifier_unregister(IOMMUObject *iommu,
+ IOMMUNotifier *notifier);
+void iommu_notify(IOMMUObject *iommu, IOMMUEventData *event_data);
+
+#endif
diff --git a/memory.c b/memory.c
index 77fb3ef..307f665 100644
--- a/memory.c
+++ b/memory.c
@@ -235,8 +235,6 @@ struct FlatView {
MemoryRegion *root;
};
-typedef struct AddressSpaceOps AddressSpaceOps;
-
#define FOR_EACH_FLAT_RANGE(var, view) \
for (var = (view)->ranges; var < (view)->ranges + (view)->nr; ++var)
@@ -2793,6 +2791,14 @@ static void do_address_space_destroy(AddressSpace *as)
memory_region_unref(as->root);
}
+IOMMUObject *address_space_iommu_get(AddressSpace *as)
+{
+ if (!as->as_ops.iommu_get) {
+ return NULL;
+ }
+ return as->as_ops.iommu_get(as);
+}
+
void address_space_destroy(AddressSpace *as)
{
MemoryRegion *root = as->root;
Liu, Yi L
2017-11-14 13:59:04 UTC
Permalink
On Tue, Nov 14, 2017 at 09:53:07AM +0100, Auger Eric wrote:
Hi Eric,
Hi Yi L,
Post by Liu, Yi L
AddressSpaceOps is similar to MemoryRegionOps, it's just for address
spaces to store arch-specific hooks.
The first hook I would like to introduce is iommu_get(). Return an
IOMMUObject behind the AddressSpace.
For systems that have IOMMUs, we will create a special address
space per device which is different from system default address
space for it (please refer to pci_device_iommu_address_space()).
Normally when that happens, there will be one specific IOMMU (or
say, translation unit) stands right behind that new address space.
This iommu_get() fetches that guy behind the address space. Here,
the guy is defined as IOMMUObject, which includes a notifier_list
so far, may extend in future. Along with IOMMUObject, a new iommu
notifier mechanism is introduced. It would be used for virt-svm.
Also IOMMUObject can further have a IOMMUObjectOps which is similar
to MemoryRegionOps. The difference is IOMMUObjectOps is not relied
on MemoryRegion.
Hi, sorry I didn't reply to the earlier postings of this after our
discussion in China. I've been sick several times and very busy.
Hi David,
Fully understood. I'll try my best to address your questions. Also,
feel free to raise further questions; the more we discuss, the better
the result will be.
I still don't feel like there's an adequate explanation of exactly
what an IOMMUObject represents. Obviously it can represent more than
IOMMUObject is meant to represent the iommu itself, e.g. the iommu-specific
operations. One of the key purposes of IOMMUObject is to introduce a
notifier framework that lets the iommu emulator do iommu operations other
than MAP/UNMAP. As IOMMUs grow more and more features, MAP/UNMAP is no
longer the only operation the iommu emulator needs to handle, e.g. shared
virtual memory. As far as I know, AMD/ARM also have it; please correct me
if I'm wrong. As my cover letter mentioned, the MR-based notifier framework
doesn't work for the newly added IOMMU operations, like binding the guest
pasid table pointer to the host and propagating the guest's iotlb flushes
to the host.
a single translation window - since that's represented by the
IOMMUMR. But what exactly do all the MRs - or whatever else - that
are represented by the IOMMUObject have in common, from a functional
point of view.
Let me take virt-SVM as an example. As far as I know, for virt-SVM the
implementations of different vendors are similar. The key design is nested
translation (aka two-stage translation): the guest maintains the gVA->gPA
mapping and the hypervisor builds the gPA->hPA mapping, similar to the
EPT-based virt-MMU solution.
In QEMU, the gPA->hPA mapping is done through the MAP/UNMAP notifiers, and
that can keep working. But only the guest knows the gVA->gPA mapping, so
the hypervisor needs to trap the specific guest iommu operation and pass
the gVA->gPA mapping knowledge to the host through a (newly added)
notifier. In VT-d, this is called binding the guest pasid table to the host.
What I don't get is that the PASID table is per extended context entry. I
understand the latter is indexed by PCI device function, and today MRs
are created per PCIe device, if I am not wrong.
In my understanding, the MR is more related to the AddressSpace; it is not
exactly tagged with a PCIe device.
So why can't we have one
new MR notifier dedicated to PASID table passing? My understanding is
that the MR, having a 1-1 correspondence with a PCIe device and thus a
context, could be the right granularity. Then I understand the only flags
I didn't quite get your point regarding the "granularity" here. Could you
elaborate a bit more?
we currently have are NONE, MAP and UNMAP but couldn't we add a new one
for PASID TABLE passing? So this is not crystal clear to me why MR
notifiers are not adapted to PASID table passing.
This was also my initial plan. You can find some details in the link below.
https://lists.gnu.org/archive/html/qemu-devel/2017-04/msg05295.html

In brief, the new notifier I want to add is not really MR related; it is
more like a behaviour of the translation unit. Also, as my cover letter
mentioned, the MR notifiers would not be registered for VT-d (virt-SVM)
in region_add(), so they would have to be registered outside of
region_add() (also pasted below). I think that more or less breaks the
consistency of the MR notifiers. In addition, I think the existing MR
notifiers are more related to IOVA address translation (on VT-d it is the
2nd level translation; for ARM is it stage 2?), and some existing code
relies heavily on this assumption, e.g. the memory replay introduced by
Peter, where the notifier node affects the replay. Adding a non-MAP/UNMAP
notifier would break that logic. So that's also a reason to add a separate
framework instead of just adding a new flag to the existing MR notifier
framework.

https://lists.gnu.org/archive/html/qemu-devel/2017-04/msg04931.html

+ /* Check if vIOMMU exists */
+ QTAILQ_FOREACH(subregion, &as->root->subregions, subregions_link) {
+ if (memory_region_is_iommu(subregion)) {
+ IOMMUNotifier n1;
+
+ /*
+ FIXME: current iommu notifier is actually designed for
+ IOMMUTLB MAP/UNMAP. However, vIOMMU emulator may need
+ notifiers other than MAP/UNMAP, so it'll be better to
+ split the non-IOMMUTLB notifier from the current IOMMUTLB
+ notifier framewrok.
+ */
+ iommu_notifier_init(&n1, vfio_iommu_bind_pasid_tbl_notify,
+ IOMMU_NOTIFIER_SVM_PASIDT_BIND,
+ 0,
+ 0);
+ vfio_register_notifier(group->container,
+ subregion,
+ 0,
+ &n1);
+ }
+ }
+
Also, for the gVA iotlb flushing, only the guest knows about it, so the
hypervisor needs to propagate it to the host. Here, MAP/UNMAP is not
suitable since this gVA iotlb flush does not require modifying the host
iommu translation table.
I don't really get this argument. IOMMUNotifier is just a notifier that
is attached to an IOMMU MR and calls an IOMMUNotify function, right?
Yes, it is.
Then the role of the function is currently tied to the existing flags,
MAP and UNMAP. This is not linked to an action on the
physical IOMMU, right?
Doesn't the MAP/UNMAP notifier ultimately talk to the physical IOMMU? My
point here is that MAP/UNMAP ultimately talks to the physical IOMMU to
change the translation page table in memory. However, in the virt-SVM
case, the translation page table is the process vaddr page table (the I/O
page table is also used, but we don't need to discuss it since the
hypervisor owns it). The process vaddr page table is owned by the guest,
and changes to it are made by the guest. So for such a cache, we only need
to flush the cache on the iommu side; there is no need to modify the
translation page table.
By the time the gVA iotlb flush is issued, the gVA->gPA
mapping has already been modified. The host iommu only needs to reference
it when performing address translation, but before the host iommu performs
the translation, it needs to flush the old gVA cache. In VT-d, this is
called 1st level cache flushing.
The fact that MR notifiers may not be relevant could be linked to the nature
of the tagging of the structures you want to flush. My understanding is
that if you want to flush by source-id, MR granularity could be fine. Could
you please clarify why you need an iommu-wide operation in that case?
The flush is not limited to source-id granularity; it can also be
page-selective and others. As I mentioned, it does not require modifying
the translation page table, so it is more like a translation unit behavior.
Neither of the two notifiers (operations) has a direct relationship with
the MR; instead, they depend heavily on the virt-iommu itself.
As described above, this is not obvious to me. Could you please clarify
why source-id granularity (which I understand has a 1-1 correspondence with
the MR/AS) is not the correct granularity? Of course, please correct me if
my understanding of the MR mapping is not correct.
It's correct that for a PCIe device the iova address space (aka PCI address
space) has roughly a 1-1 relationship with an MR. But for virt-SVM the
address space is not limited to the iova address space; it also includes
process address spaces, and how could such an address space relate to an MR...

Thanks,
Yi L
Thanks
Eric
If a virt-iommu exists,
then the two notifiers are needed; if not, they are not.
Even understanding the SVM stuff better than I did, I don't really see
why an AddressSpace is an obvious unit to have an IOMMUObject
associated with it.
This benefits the notifier registration. As per my comments above, the
IOMMUObject represents the iommu. Associating an AddressSpace with an
IOMMUObject makes it easy to check whether it is necessary to register the
notifiers:
if there is no IOMMUObject, no virt-iommu is exposed to the guest, so
there is no need to register notifiers. For this, I also considered using
MemoryRegion.iommu_ops; e.g. for VT-d, it could be a loop checking all the
subregions and registering notifiers for each iommu MemoryRegion. Peter
mentioned it may not work for sPAPR, so he proposed associating an
AddressSpace with an IOMMUObject. I think it works and is easier, so I
didn't object to it.
https://lists.gnu.org/archive/html/qemu-devel/2017-04/msg04931.html
+ /* Check if vIOMMU exists */
+ QTAILQ_FOREACH(subregion, &as->root->subregions, subregions_link) {
+ if (memory_region_is_iommu(subregion)) {
+ IOMMUNotifier n1;
+
+ /*
+ FIXME: current iommu notifier is actually designed for
+ IOMMUTLB MAP/UNMAP. However, vIOMMU emulator may need
+ notifiers other than MAP/UNMAP, so it'll be better to
+ split the non-IOMMUTLB notifier from the current IOMMUTLB
+ notifier framewrok.
+ */
+ iommu_notifier_init(&n1, vfio_iommu_bind_pasid_tbl_notify,
+ IOMMU_NOTIFIER_SVM_PASIDT_BIND,
+ 0,
+ 0);
+ vfio_register_notifier(group->container,
+ subregion,
+ 0,
+ &n1);
+ }
+ }
+
Thanks,
Yi L
Post by Liu, Yi L
---
hw/core/Makefile.objs | 1 +
hw/core/iommu.c | 58 +++++++++++++++++++++++++++++++++++++++
include/exec/memory.h | 22 +++++++++++++++
include/hw/core/iommu.h | 73 +++++++++++++++++++++++++++++++++++++++++++++++++
memory.c | 10 +++++--
5 files changed, 162 insertions(+), 2 deletions(-)
create mode 100644 hw/core/iommu.c
create mode 100644 include/hw/core/iommu.h
diff --git a/hw/core/Makefile.objs b/hw/core/Makefile.objs
index f8d7a4a..d688412 100644
--- a/hw/core/Makefile.objs
+++ b/hw/core/Makefile.objs
@@ -5,6 +5,7 @@ common-obj-y += fw-path-provider.o
common-obj-y += irq.o
common-obj-y += hotplug.o
+common-obj-y += iommu.o
common-obj-y += nmi.o
common-obj-$(CONFIG_EMPTY_SLOT) += empty_slot.o
diff --git a/hw/core/iommu.c b/hw/core/iommu.c
new file mode 100644
index 0000000..7c4fcfe
--- /dev/null
+++ b/hw/core/iommu.c
@@ -0,0 +1,58 @@
+/*
+ * QEMU emulation of IOMMU logic
+ *
+ * Copyright (C) 2017 Red Hat Inc.
+ *
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "qemu/osdep.h"
+#include "hw/core/iommu.h"
+
+void iommu_notifier_register(IOMMUObject *iommu,
+ IOMMUNotifier *n,
+ IOMMUNotifyFn fn,
+ IOMMUEvent event)
+{
+ n->event = event;
+ n->iommu_notify = fn;
+ QLIST_INSERT_HEAD(&iommu->iommu_notifiers, n, node);
+ return;
+}
+
+void iommu_notifier_unregister(IOMMUObject *iommu,
+ IOMMUNotifier *notifier)
+{
+ IOMMUNotifier *cur, *next;
+
+ QLIST_FOREACH_SAFE(cur, &iommu->iommu_notifiers, node, next) {
+ if (cur == notifier) {
+ QLIST_REMOVE(cur, node);
+ break;
+ }
+ }
+}
+
+void iommu_notify(IOMMUObject *iommu, IOMMUEventData *event_data)
+{
+ IOMMUNotifier *cur;
+
+ QLIST_FOREACH(cur, &iommu->iommu_notifiers, node) {
+ if ((cur->event == event_data->event) && cur->iommu_notify) {
+ cur->iommu_notify(cur, event_data);
+ }
+ }
+}
diff --git a/include/exec/memory.h b/include/exec/memory.h
index 03595e3..8350973 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -26,6 +26,7 @@
#include "qom/object.h"
#include "qemu/rcu.h"
#include "hw/qdev-core.h"
+#include "hw/core/iommu.h"
#define RAM_ADDR_INVALID (~(ram_addr_t)0)
@@ -301,6 +302,19 @@ struct MemoryListener {
};
/**
+ * AddressSpaceOps: callbacks structure for address space specific operations
+ *
+ * Normally this should be NULL for generic address
+ * spaces, and it's only used when there is one
+ * translation unit behind this address space.
+ */
+struct AddressSpaceOps {
+ IOMMUObject *(*iommu_get)(AddressSpace *as);
+};
+typedef struct AddressSpaceOps AddressSpaceOps;
+
+/**
* AddressSpace: describes a mapping of addresses to #MemoryRegion objects
*/
struct AddressSpace {
@@ -316,6 +330,7 @@ struct AddressSpace {
struct MemoryRegionIoeventfd *ioeventfds;
QTAILQ_HEAD(memory_listeners_as, MemoryListener) listeners;
QTAILQ_ENTRY(AddressSpace) address_spaces_link;
+ AddressSpaceOps as_ops;
};
FlatView *address_space_to_flatview(AddressSpace *as);
@@ -1988,6 +2003,13 @@ address_space_write_cached(MemoryRegionCache *cache, hwaddr addr,
address_space_write(cache->as, cache->xlat + addr, MEMTXATTRS_UNSPECIFIED, buf, len);
}
+/**
+ * address_space_iommu_get: Get the backend IOMMU for the address space
+ *
+ */
+IOMMUObject *address_space_iommu_get(AddressSpace *as);
+
#endif
#endif
diff --git a/include/hw/core/iommu.h b/include/hw/core/iommu.h
new file mode 100644
index 0000000..34387c0
--- /dev/null
+++ b/include/hw/core/iommu.h
@@ -0,0 +1,73 @@
+/*
+ * QEMU emulation of IOMMU logic
+ *
+ * Copyright (C) 2017 Red Hat Inc.
+ *
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef HW_CORE_IOMMU_H
+#define HW_CORE_IOMMU_H
+
+#include "qemu/queue.h"
+
+enum IOMMUEvent {
+ IOMMU_EVENT_BIND_PASIDT,
+};
+typedef enum IOMMUEvent IOMMUEvent;
+
+struct IOMMUEventData {
+ IOMMUEvent event;
+ uint64_t length;
+ void *data;
+};
+typedef struct IOMMUEventData IOMMUEventData;
+
+typedef struct IOMMUNotifier IOMMUNotifier;
+
+typedef void (*IOMMUNotifyFn)(IOMMUNotifier *notifier,
+ IOMMUEventData *event_data);
+
+struct IOMMUNotifier {
+ IOMMUNotifyFn iommu_notify;
+ /*
+ * What events we are listening to. Let's allow multiple event
+ * registrations from beginning.
+ */
+ IOMMUEvent event;
+ QLIST_ENTRY(IOMMUNotifier) node;
+};
+
+typedef struct IOMMUObject IOMMUObject;
+
+/*
+ * This stands for an IOMMU unit. Any translation device should have
+ * this struct inside its own structure to make sure it can leverage
+ * common IOMMU functionalities.
+ */
+struct IOMMUObject {
+ QLIST_HEAD(, IOMMUNotifier) iommu_notifiers;
+};
+
+void iommu_notifier_register(IOMMUObject *iommu,
+ IOMMUNotifier *n,
+ IOMMUNotifyFn fn,
+ IOMMUEvent event);
+void iommu_notifier_unregister(IOMMUObject *iommu,
+ IOMMUNotifier *notifier);
+void iommu_notify(IOMMUObject *iommu, IOMMUEventData *event_data);
+
+#endif
diff --git a/memory.c b/memory.c
index 77fb3ef..307f665 100644
--- a/memory.c
+++ b/memory.c
@@ -235,8 +235,6 @@ struct FlatView {
MemoryRegion *root;
};
-typedef struct AddressSpaceOps AddressSpaceOps;
-
#define FOR_EACH_FLAT_RANGE(var, view) \
for (var = (view)->ranges; var < (view)->ranges + (view)->nr; ++var)
@@ -2793,6 +2791,14 @@ static void do_address_space_destroy(AddressSpace *as)
memory_region_unref(as->root);
}
+IOMMUObject *address_space_iommu_get(AddressSpace *as)
+{
+ if (!as->as_ops.iommu_get) {
+ return NULL;
+ }
+ return as->as_ops.iommu_get(as);
+}
+
void address_space_destroy(AddressSpace *as)
{
MemoryRegion *root = as->root;
--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson
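To make the intended usage of the framework above concrete, here is a rough,
self-contained sketch of how a vIOMMU emulator and a consumer such as VFIO
could fit together. Only IOMMUObject, IOMMUEventData, IOMMU_EVENT_BIND_PASIDT,
iommu_notifier_register(), iommu_notify() and address_space_iommu_get() come
from this patch; every Foo*/consumer_* name is made up, and how as_ops is
actually wired up belongs to later patches of the series, so it is only
assumed here.

/*
 * Illustrative sketch only. "FooIOMMUState", "foo_*" and "consumer_*"
 * are hypothetical; the IOMMUObject API is the one introduced above.
 */
#include "qemu/osdep.h"
#include "exec/memory.h"
#include "hw/core/iommu.h"

typedef struct FooIOMMUState {
    IOMMUObject iommu;       /* embedded common IOMMU object */
    AddressSpace dma_as;     /* DMA address space exposed to the device */
} FooIOMMUState;

/* AddressSpaceOps.iommu_get hook: return the IOMMUObject behind the AS */
static IOMMUObject *foo_as_iommu_get(AddressSpace *as)
{
    FooIOMMUState *s = container_of(as, FooIOMMUState, dma_as);

    return &s->iommu;
}

static void foo_iommu_init(FooIOMMUState *s)
{
    QLIST_INIT(&s->iommu.iommu_notifiers);
    /* Wiring up as_ops is done elsewhere in the series for intel_iommu;
     * assumed here to look roughly like this. */
    s->dma_as.as_ops.iommu_get = foo_as_iommu_get;
}

/* Emulator side: the guest programmed its PASID table pointer */
static void foo_iommu_bind_pasid_table(FooIOMMUState *s, uint64_t pasidt_gpa)
{
    IOMMUEventData event_data = {
        .event  = IOMMU_EVENT_BIND_PASIDT,
        .length = sizeof(pasidt_gpa),
        .data   = &pasidt_gpa,   /* payload layout is up to the two ends */
    };

    iommu_notify(&s->iommu, &event_data);
}

/* Consumer side (e.g. VFIO): register only when a vIOMMU is present */
static void consumer_bind_pasidt_notify(IOMMUNotifier *n,
                                        IOMMUEventData *event_data)
{
    /* e.g. pass event_data->data down to the host IOMMU through VFIO */
}

static IOMMUNotifier consumer_notifier;

static void consumer_setup(AddressSpace *as)
{
    IOMMUObject *iommu = address_space_iommu_get(as);

    if (!iommu) {
        return;   /* no vIOMMU behind this address space */
    }
    iommu_notifier_register(iommu, &consumer_notifier,
                            consumer_bind_pasidt_notify,
                            IOMMU_EVENT_BIND_PASIDT);
}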
Auger Eric
2017-11-14 21:52:54 UTC
Permalink
Hi Yi L,
Post by Liu, Yi L
Hi Eric,
Hi Yi L,
Post by Liu, Yi L
AddressSpaceOps is similar to MemoryRegionOps; it is just a place for
address spaces to store arch-specific hooks.
The first hook I would like to introduce is iommu_get(), which returns
the IOMMUObject behind the AddressSpace.
For systems that have IOMMUs, we will create a special address space
per device which is different from the system default address space
for it (please refer to pci_device_iommu_address_space()).
Normally when that happens, there will be one specific IOMMU (or
say, translation unit) standing right behind that new address space.
This iommu_get() fetches that unit behind the address space. Here,
the unit is defined as IOMMUObject, which includes a notifier_list
so far and may be extended in future. Along with IOMMUObject, a new
iommu notifier mechanism is introduced. It would be used for virt-svm.
Also, IOMMUObject can further have an IOMMUObjectOps which is similar
to MemoryRegionOps. The difference is that IOMMUObjectOps does not
rely on MemoryRegion.
Hi, sorry I didn't reply to the earlier postings of this after our
discussion in China. I've been sick several times and very busy.
Hi David,
Fully understood. I'll try my best to address your question. Also,
feel free to input further questions, anyhow, the more we discuss the
better work we done.
I still don't feel like there's an adequate explanation of exactly
what an IOMMUObject represents. Obviously it can represent more than
The IOMMUObject is aimed at representing the iommu itself, e.g. the
iommu-specific operations. One of the key purposes of IOMMUObject is to
introduce a notifier framework so that an iommu emulator can perform
iommu operations other than MAP/UNMAP. As IOMMUs grow more and more
features, MAP/UNMAP is not the only operation an iommu emulator needs
to handle, e.g. shared virtual memory. As far as I know, AMD/ARM also
have it; please correct me if I am wrong. As my cover letter mentioned,
the MR-based notifier framework doesn't work for the newly added IOMMU
operations, such as binding the guest PASID table pointer to the host
and propagating the guest's iotlb flushes to the host.
a single translation window - since that's represented by the
IOMMUMR. But what exactly do all the MRs - or whatever else - that
are represented by the IOMMUObject have in common, from a functional
point of view.
Let me take virt-SVM as an example. As far as I know, the virt-SVM
implementations of different vendors are similar. The key design is to
have a nested translation (aka two-stage translation): the guest
maintains the gVA->gPA mapping and the hypervisor builds the gPA->hPA
mapping, similar to the EPT-based virt-MMU solution.
In QEMU, the gPA->hPA mapping is done through the MAP/UNMAP notifiers,
and that can keep going. But for the gVA->gPA mapping, only the guest
knows it, so the hypervisor needs to trap the relevant guest iommu
operation and pass the gVA->gPA mapping knowledge to the host through a
(newly added) notifier. In VT-d, this is called binding the guest PASID
table to the host.
What I don't get is that the PASID table is per extended context entry.
I understand the latter is indexed by PCI device/function. And today MRs
are created per PCIe device, if I am not wrong.
In my understanding, the MR is more related to the AddressSpace; it is
not exactly tagged with a PCIe device.
I meant, in the current intel_iommu code, vtd_find_add_as() creates 1
IOMMU MR and 1 AS per PCIe device, right?
Post by Liu, Yi L
So why can't we have 1
new MR notifier dedicated to PASID table passing? My understanding is
the MR, having a 1-1 correspondence with a PCIe device and thus a
context could be of right granularity. Then I understand the only flags
I didn't quite get your point regarding the "granularity" here. Could
you elaborate a little bit more?
The PASID table is per device (contained by the extended context, which
is dev/fn indexed). The "record_device" notifier is also attached to a
specific PCIe device. So we can't really say they have an iommu-wide
scope (PCIe device granularity would fit). However, I understand from
the explanation below that the TLB invalidate notifier is not especially
tied to a given source-id, as we are going to invalidate by PASID/page.
I think the main justification behind introducing this new framework is
that PT is set along with SVM and in this case the IOMMU MR notifiers
are not registered since the IOMMU MR is disabled.
So new notifiers do not fit nicely in the current framework.
Post by Liu, Yi L
we currently have are NONE, MAP and UNMAP but couldn't we add a new one
for PASID TABLE passing? So this is not crystal clear to me why MR
notifiers are not adapted to PASID table passing.
This was also my initial plan. You may find some details in the link below.
https://lists.gnu.org/archive/html/qemu-devel/2017-04/msg05295.html
In brief, the new notifier I want to add is not really MR-related; it is
more like a behaviour of a translation unit. Also, as my cover letter
mentioned, the MR notifiers would not be registered for VT-d (virt-SVM)
in region_add(), so it would require registering MR notifiers outside of
region_add().
Yes, to me this is the main issue: IOMMU MRs are not registered in PT
mode. The commit message of "[RFC PATCH 06/20] VFIO: add new notifier for
binding PASID table" helped me understand the problem, although I did
not quite understand what you call "vtd_address_space".
Also paste below. I think it more or less breaks the
Post by Liu, Yi L
consistency of MR notifiers. Also, I think existing MR notifiers are more
related IOVA address translation(on VT-d it is 2nd level translation, for ARM
is it stage 2?),
Well, level 1 input address is IOVA as well.
On ARM, IOVA -> IPA*=GPA is stage1 and GPA -> HPA is stage2. So I would
rather say that for ARM, MR relates to stage1 which is the one emulated
by vSMMU along with VFIO.
*IPA = intermediate physical address.
and there is some existing code that relies heavily on this
Post by Liu, Yi L
assumption, e.g. the memory replay introduced by Peter: the notifier node
would affect the replay. Adding a non-MAP/UNMAP notifier would break
that logic. So that is also a reason to add a separate framework instead
of just adding a new flag to the existing MR notifier framework.
https://lists.gnu.org/archive/html/qemu-devel/2017-04/msg04931.html
+ /* Check if vIOMMU exists */
+ QTAILQ_FOREACH(subregion, &as->root->subregions, subregions_link) {
+ if (memory_region_is_iommu(subregion)) {
+ IOMMUNotifier n1;
+
+ /*
+ FIXME: current iommu notifier is actually designed for
+ IOMMUTLB MAP/UNMAP. However, vIOMMU emulator may need
+ notifiers other than MAP/UNMAP, so it'll be better to
+ split the non-IOMMUTLB notifier from the current IOMMUTLB
+              notifier framework.
+ */
+ iommu_notifier_init(&n1, vfio_iommu_bind_pasid_tbl_notify,
+ IOMMU_NOTIFIER_SVM_PASIDT_BIND,
+ 0,
+ 0);
+ vfio_register_notifier(group->container,
+ subregion,
+ 0,
+ &n1);
+ }
+ }
+
Also, for the gVA iotlb flushing, only the guest knows about it, so the
hypervisor needs to propagate it to the host. Here, MAP/UNMAP is not
suitable, since this gVA iotlb flush does not require modifying the host
iommu translation table.
I don't really get this argument. IOMMUNotifier just is a notifier that
is attached to an IOMMU MR and calls an IOMMUNotify function, right?
yes, it is.
Then the role of the function is currently tied to the existing flags,
MAP and UNMAP. This is not linked to an action on the physical IOMMU,
right?
The MAP/UNMAP notifiers finally talk to the physical IOMMU, don't they?
My point here is that MAP/UNMAP would eventually talk to the physical
IOMMU to change the translation page table in memory.
yes it programs the pIOMMU with combined translation stages.
However, in the virt-SVM case, the translation page
Post by Liu, Yi L
table is the process vaddr page table (though the I/O page table is also
used, we don't need to discuss it since the hypervisor owns it). The
process vaddr page table is owned by the guest, and changes to it are
made by the guest. So for such a cache, we just need to flush the cache
on the iommu side; there is no need to modify the translation page table.
agreed, but the MR notifier is not necessarily supposed to touch the
table, right?
Post by Liu, Yi L
At the time the gVA iotlb flush is issued, the gVA->gPA mapping has
already been modified. The host iommu only needs to reference it when
performing address translation. But before the host iommu performs the
translation, it needs to flush the old gVA cache. In VT-d, this is called
1st-level cache flushing.
The fact that MR notifiers may not be relevant could be linked to the
nature of the tagging of the structures you want to flush. My
understanding is that if you want to flush by source-id, MR granularity
could be fine. Could you please clarify why you need an iommu-wide
operation in that case.
The flush is not limited to source-id granularity; it can also be
page-selective, among others. As I mentioned, it does not require
modifying the translation page table, so it is more a behaviour of the
translation unit.
we could also enumerate all MR notifiers - assuming they are registered!
- and call notifiers for each of them, right? Maybe this is not the
strongest argument.
Post by Liu, Yi L
Both of the two notifiers (operations) have no direct relationship with
the MR; instead they depend heavily on the virt-iommu itself.
As described above, this is not obvious to me. Could you please clarify
why source-id granularity (which I understand has a 1-1 correspondence
with the MR/AS) is not the correct granularity. Of course, please correct
me if my understanding of the MR mapping is not correct.
It's correct that for a PCIe device, the iova address space (aka the PCI
address space) has roughly a 1-1 relationship with the MR. But for
virt-SVM, the address space is not limited to the iova address space; it
also has process address spaces, and how can such an address space
relate to an MR...
So, this discussion, reading again the cover letter, and digging into
your original series helped me understand the need for those new
notifiers. What confused me originally was the "iommu" wide notifiers.
Maybe you should clarify your cover letter by explicitly saying that
- SVM works along with PT = 1
- if PT = 1 IOMMU MR are disabled so MR notifier are not registered
- new notifiers do not fit nicely in this framework as they need to be
registered even if PT = 1
- having those notifiers attached to the whole IOMMU object abstraction
is not an issue
Please correct me if I said anything wrong, and please accept my
apologies if I brought any further confusion.
Thanks
Eric
Post by Liu, Yi L
Thanks,
Yi L
Thanks
Eric
If a virt-iommu exists,
then the two notifiers are needed; if not, they are not.
Even understanding the SVM stuff better than I did, I don't really see
why an AddressSpace is an obvious unit to have an IOMMUObject
associated with it.
This will benefit the notifier registration. As in my comments above,
the IOMMUObject is meant to represent the iommu. Associating an
AddressSpace with an IOMMUObject makes it easy to check whether it is
necessary to register the notifiers: if there is no IOMMUObject, no
virt-iommu is exposed to the guest, so there is no need to register
notifiers. For this, I also considered using MemoryRegion.iommu_ops,
e.g. for VT-d it could be a loop that checks all the subregions and
registers notifiers for each iommu MemoryRegion. Peter mentioned it may
not work for sPAPR, so he proposed associating an AddressSpace with an
IOMMUObject. I think it works and is easier, so I didn't object to it.
https://lists.gnu.org/archive/html/qemu-devel/2017-04/msg04931.html
+ /* Check if vIOMMU exists */
+ QTAILQ_FOREACH(subregion, &as->root->subregions, subregions_link) {
+ if (memory_region_is_iommu(subregion)) {
+ IOMMUNotifier n1;
+
+ /*
+ FIXME: current iommu notifier is actually designed for
+ IOMMUTLB MAP/UNMAP. However, vIOMMU emulator may need
+ notifiers other than MAP/UNMAP, so it'll be better to
+ split the non-IOMMUTLB notifier from the current IOMMUTLB
+             notifier framework.
+ */
+ iommu_notifier_init(&n1, vfio_iommu_bind_pasid_tbl_notify,
+ IOMMU_NOTIFIER_SVM_PASIDT_BIND,
+ 0,
+ 0);
+ vfio_register_notifier(group->container,
+ subregion,
+ 0,
+ &n1);
+ }
+ }
+
Thanks,
Yi L
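To make the other use case discussed above (propagating the guest's
first-level iotlb flush) equally concrete, here is a hypothetical sketch of a
second event travelling through the same notifier list. Note that this series
only defines IOMMU_EVENT_BIND_PASIDT; the IOMMU_EVENT_TLB_INVALIDATE value and
the payload struct below are assumptions made purely for illustration.

#include "qemu/osdep.h"
#include "hw/core/iommu.h"

/* Hypothetical event id: a real extension would add a new member to
 * enum IOMMUEvent next to IOMMU_EVENT_BIND_PASIDT. */
#define IOMMU_EVENT_TLB_INVALIDATE ((IOMMUEvent)(IOMMU_EVENT_BIND_PASIDT + 1))

/* Hypothetical payload: which first-level (gVA) translations to drop */
typedef struct IOMMUTlbInvalidate {
    uint32_t pasid;     /* process address space to invalidate */
    uint64_t addr;      /* start address, gVA */
    uint64_t size;      /* 0 could mean "flush the whole PASID" */
} IOMMUTlbInvalidate;

/* vIOMMU emulator side: the guest queued a 1st-level cache invalidation.
 * No shadow page table needs to be rewritten; the registered notifier
 * (e.g. in VFIO) only has to tell the host IOMMU to drop its cached
 * gVA translations. */
static void viommu_flush_first_level(IOMMUObject *iommu, uint32_t pasid,
                                     uint64_t addr, uint64_t size)
{
    IOMMUTlbInvalidate inv = {
        .pasid = pasid,
        .addr  = addr,
        .size  = size,
    };
    IOMMUEventData event_data = {
        .event  = IOMMU_EVENT_TLB_INVALIDATE,
        .length = sizeof(inv),
        .data   = &inv,
    };

    iommu_notify(iommu, &event_data);
}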
Liu, Yi L
2017-11-15 02:36:19 UTC
Permalink
Hi Eric,
Hi Yi L,
Post by Liu, Yi L
Hi Eric,
Hi Yi L,
Post by Liu, Yi L
AddressSpaceOps is similar to MemoryRegionOps, it's just for address
spaces to store arch-specific hooks.
The first hook I would like to introduce is iommu_get(). Return an
IOMMUObject behind the AddressSpace.
For systems that have IOMMUs, we will create a special address
space per device which is different from system default address
space for it (please refer to pci_device_iommu_address_space()).
Normally when that happens, there will be one specific IOMMU (or
say, translation unit) stands right behind that new address space.
This iommu_get() fetches that guy behind the address space. Here,
the guy is defined as IOMMUObject, which includes a notifier_list
so far, may extend in future. Along with IOMMUObject, a new iommu
notifier mechanism is introduced. It would be used for virt-svm.
Also IOMMUObject can further have a IOMMUObjectOps which is similar
to MemoryRegionOps. The difference is IOMMUObjectOps is not relied
on MemoryRegion.
Hi, sorry I didn't reply to the earlier postings of this after our
discussion in China. I've been sick several times and very busy.
Hi David,
Fully understood. I'll try my best to address your question. Also,
feel free to input further questions, anyhow, the more we discuss the
better work we done.
I still don't feel like there's an adequate explanation of exactly
what an IOMMUObject represents. Obviously it can represent more than
IOMMUObject is aimed to represent the iommu itself. e.g. the iommu
specific operations. One of the key purpose of IOMMUObject is to
introduce a notifier framework to let iommu emulator to be able to
do iommu operations other than MAP/UNMAP. As IOMMU grows more and
more feature, MAP/UNMAP is not the only operation iommu emulator needs
to deal. e.g. shared virtual memory. So far, as I know AMD/ARM also
has it. may correct me on it. As my cover letter mentioned, MR based
notifier framework doesn’t work for the newly added IOMMU operations.
Like bind guest pasid table pointer to host and propagate guest's
iotlb flush to host.
a single translation window - since that's represented by the
IOMMUMR. But what exactly do all the MRs - or whatever else - that
are represented by the IOMMUObject have in common, from a functional
point of view.
Let me take virt-SVM as an example. As far as I know, for virt-SVM,
the implementation of different vendors are similar. The key design
is to have a nested translation(aka. two stage translation). It is to
have guest maintain gVA->gPA mapping and hypervisor builds gPA->hPA
mapping. Similar to EPT based virt-MMU solution.
In Qemu, gPA->hPA mapping is done through MAP/UNMAP notifier, it can
keep going. But for gVA->gPA mapping, only guest knows it, so hypervisor
needs to trap specific guest iommu operation and pass the gVA->gPA
mapping knowledge to host through a notifier(newly added one). In VT-d,
it is called bind guest pasid table to host.
What I don't get is the PASID table is per extended context entry. I
understand this latter is indexed by PCI device function. And today MR
are created per PCIe device if I am not wrong.
In my understanding, MR is more related to AddressSpace not exactly tagged
with PCIe device.
I meant, in the current intel_iommu code, vtd_find_add_as() creates 1
IOMMU MR and 1 AS per PCIe device, right?
yes, it is. This is the PCIe device address space; it can be the guest's
physical address space if no vIOMMU is exposed, or the guest IOVA address
space if a vIOMMU is exposed. Both of these address spaces are handled by
the 2nd-level translation in VT-d, which is different from the 1st-level
translation used for process vaddr spaces.
Post by Liu, Yi L
So why can't we have 1
new MR notifier dedicated to PASID table passing? My understanding is
the MR, having a 1-1 correspondence with a PCIe device and thus a
context could be of right granularity. Then I understand the only flags
I didn't quite get your point regards to the "granlarity" here. May talk
a little bit more here?
The PASID table is per device (contained by extended context which is
dev/fn indexed). The "record_device" notifier also is attached to a
specific PCIe device. So we can't really say they have an iommu wide
scope (PCIe device granularity would fit). However I understand from
below explanation that TLB invalidate notifier is not especially tight
to a given source-id as we are going to invalidate by PASID/page.
correct, it has no tight relationship to the "granularity" discussed
here for introducing this new notifier framework.
I think the main justification behind introducing this new framework is
that PT is set along with SVM and in this case the IOMMU MR notifiers
are not registered since the IOMMU MR is disabled.
So new notifiers do not fit nicely in current framework.
exactly, that's what I mean. Although we have a trick to fix it, a trick
is still a trick. Better to have a new, dedicated framework.
Post by Liu, Yi L
we currently have are NONE, MAP and UNMAP but couldn't we add a new one
for PASID TABLE passing? So this is not crystal clear to me why MR
notifiers are not adapted to PASID table passing.
This is also my initial plan. You may get some detail in the link below.
https://lists.gnu.org/archive/html/qemu-devel/2017-04/msg05295.html
In brief, the new notifier I want to add is not really MR related and
just more like a behaviour of a translation unit. Also, as my cover letter
mentioned that the MR notifiers would not be registered for VT-d(virt-svm)
in region_add(). Then it requires to register MR notifiers out of the
region_add().
Yes to me this is the main issue: IOMMU MR are not registered in PT
mode. Commit message of "[RFC PATCH 06/20] VFIO: add new notifier for
binding PASID table" helped me to introduce the problem, although I did
not quite understand what you call "vtd_address_space".
"vtd_address_space" is a wrapper of the DMA AddressSpace of a device behind
a virtual VT-d.
Also paste below. I think it more or less breaks the
Post by Liu, Yi L
consistency of MR notifiers. Also, I think existing MR notifiers are more
related IOVA address translation(on VT-d it is 2nd level translation, for ARM
is it stage 2?),
Well, level 1 input address is IOVA as well.
On ARM, IOVA -> IPA*=GPA is stage1 and GPA -> HPA is stage2. So I would
rather say that for ARM, MR relates to stage1 which is the one emulated
by vSMMU along with VFIO.
*IPA = intermediate physical address.
Thanks for the correction. BTW, if you have virt-SVM on ARM, will stage 1
cover the gVA->gPA/IPA translation?
and there is some existing codes highly rely on this
Post by Liu, Yi L
assumption. e.g. the memory_replay introduced by Peter, the notifier node
would affect the replay. If adding a non MAP/UNMAP notifier, it would break
the logic. So it's also a reason to add a separate framework instead of
just adding a new flag to the existing MR notifier framework.
https://lists.gnu.org/archive/html/qemu-devel/2017-04/msg04931.html
+ /* Check if vIOMMU exists */
+ QTAILQ_FOREACH(subregion, &as->root->subregions, subregions_link) {
+ if (memory_region_is_iommu(subregion)) {
+ IOMMUNotifier n1;
+
+ /*
+ FIXME: current iommu notifier is actually designed for
+ IOMMUTLB MAP/UNMAP. However, vIOMMU emulator may need
+ notifiers other than MAP/UNMAP, so it'll be better to
+ split the non-IOMMUTLB notifier from the current IOMMUTLB
+             notifier framework.
+ */
+ iommu_notifier_init(&n1, vfio_iommu_bind_pasid_tbl_notify,
+ IOMMU_NOTIFIER_SVM_PASIDT_BIND,
+ 0,
+ 0);
+ vfio_register_notifier(group->container,
+ subregion,
+ 0,
+ &n1);
+ }
+ }
+
Also, for the gVA iotlb flushing, only guest knows it. So hypervisor
needs to propagate it to host. Here, MAP/UNMAP is not suitable since
this gVA iotlb flush here doesn’t require to modify host iommu
translation table.
I don't really get this argument. IOMMUNotifier just is a notifier that
is attached to an IOMMU MR and calls a an IOMMUNotify function, right?
yes, it is.
Then the role of the function currently is attached to the currently
existing flags, MAP, UNMAP. This is not linked to an action on the
physical IOMMU, right?
The MAP/UNMAP notifier finally talks to physical IOMMU. is it? My point
here is MAP/UNMAP finally would talk to physical IOMMU change the translation
page table in memory.
yes it programs the pIOMMU with combined translation stages.
However, for virt-svm case, the translation page
Post by Liu, Yi L
table is the process vaddr page table(though the I/O page table is also used
we don't need to talk it since hypervisor owns it). process vaddr page table
is owned by guest, changes to the translation page table is by guest. So for
such cache, just need to flush the cache in iommu side. no need to modify
translation page table.
agreed but the MR notifier is not necessarily supposed to touch table,
right?
Not sure about other platforms; for VT-d, the existing MAP/UNMAP
notifiers do not touch the page table within QEMU, but they do touch it
through the VFIO APIs. And I think that is necessary when they are
registered, e.g. when a virtual VT-d is exposed the guest can use guest
IOVA, which requires the MAP/UNMAP APIs to shadow the mappings the guest
iommu driver builds.
Post by Liu, Yi L
At the time gVA iotlb flush is issued, the gVA->gPA
mapping has already modified. Host iommu only needs to reference it when
performing address translation. But before host iommu perform translation,
it needs to flush the old gVA cache. In VT-d, it is called 1st level cache
flushing.
The fact MR notifiers may not be relevant could be linked to the nature
of the tagging of the structures you want to flush. My understanding is
if you want to flush by source-id, MR granularity could be fine. Please
could you clarify why do you need an iommu wide operation in that case.
The flush is not limited to source-id granularity, it would be page selected
and others. As I mentioned, it has no requirement to modify the translation
page table, so it is more like a translation unit behavior.
we could also enumerate all MR notifiers - assuming they are registered!
- and call notifiers for each of them, right? Maybe this is not the
strongest argument.
yes, it's not the strongest argument for the implementation. But for the
design, it is a strong argument to address the necessity across different
platforms.
Post by Liu, Yi L
Both of the two notifiers(operations) has no direct relationship with MR,
instead they highly depends on virt-iommu itself.
As described above this is not obvious to me. Please could you clarify
why source-id granularity (which I understand has a 1-1 granularity with
MR/AS is not the correct granularity). Of course, please correct me if
my understanding of MR mapping is not correct.
It's correct that for the PCIe device, the iova address space(aka PCI address
space) has kind of 1-1 relationship with MR. But, for virt-SVM, the address
space is not limted to iova address space, it has process address space, how
can such an address space relate to a MR...
So, this discussion, reading again the cover letter, and digging into
your original series helped me understand the need for those new
notifiers. What confused me originally was the "iommu" wide notifiers.
Maybe you should clarify your cover letter by explicitly saying that
- SVM works along with PT = 1
- if PT = 1 IOMMU MR are disabled so MR notifier are not registered
- new notifiers do not fit nicely in this framework as they need to be
registered even if PT = 1
- having those notifiers attached to the whole IOMMU object abstraction
is not an issue
Please correct me if I said anything wrong and please apologize if I
brought any further confusion.
Yes, you're right; I should have written it in a better manner. I will
update the cover letter accordingly in the next version. Also, feel free
to let me know whether it meets the requirements of the pv work you're
doing.
Thanks,
Yi L
Thanks
Eric
Post by Liu, Yi L
Thanks,
Yi L
Thanks
Eric
If virt-iommu exists,
then the two notfiers are needed, if not, then it's not.
Even understanding the SVM stuff better than I did, I don't really see
why an AddressSpace is an obvious unit to have an IOMMUObject
associated with it.
This will benefit the notifier registration. As my comments above, the
IOMMUObject is to represent iommu. Associate an AddressSpace with an
IOMMUObject makes it easy to check if it is necessary to register the
notifiers.
If no IOMMUObject, means no virt-iommu exposed to guest, then
no need to register notifiers. For this, I also considered to use the
MemoryRegion.iommu_ops. e.g. for VT-d, it can be a loop to check all the
subregions and register notfiers if it is an iommu MemoryRegion. Peter
mentioned it may not work for SPAR. So he proposed associating an
AddressSpace with an IOMMUObject. I think it wroks and easier, so I
didn’t object it.
https://lists.gnu.org/archive/html/qemu-devel/2017-04/msg04931.html
+ /* Check if vIOMMU exists */
+ QTAILQ_FOREACH(subregion, &as->root->subregions, subregions_link) {
+ if (memory_region_is_iommu(subregion)) {
+ IOMMUNotifier n1;
+
+ /*
+ FIXME: current iommu notifier is actually designed for
+ IOMMUTLB MAP/UNMAP. However, vIOMMU emulator may need
+ notifiers other than MAP/UNMAP, so it'll be better to
+ split the non-IOMMUTLB notifier from the current IOMMUTLB
+             notifier framework.
+ */
+ iommu_notifier_init(&n1, vfio_iommu_bind_pasid_tbl_notify,
+ IOMMU_NOTIFIER_SVM_PASIDT_BIND,
+ 0,
+ 0);
+ vfio_register_notifier(group->container,
+ subregion,
+ 0,
+ &n1);
+ }
+ }
+
Thanks,
Yi L
Peter Xu
2017-11-15 07:16:32 UTC
Permalink
On Tue, Nov 14, 2017 at 10:52:54PM +0100, Auger Eric wrote:
[...]
Post by Auger Eric
I meant, in the current intel_iommu code, vtd_find_add_as() creates 1
IOMMU MR and 1 AS per PCIe device, right?
I think this is the trickiest point - in QEMU an IOMMU MR does not really
have a 1:1 relationship to devices. For Intel it's true; for Power it's
not. On Power guests, one device's DMA address space can be split into
different translation windows, and each window corresponds to one IOMMU
MR.

So IMHO the real 1:1 mapping is between the device and its DMA address
space, rather than MRs.

It's been a long time since I drafted the patches. I think at the least
this should be a more general notifier mechanism compared to the current
IOMMUNotifier thing, which is bound to IOTLB notifies only. AFAICT, if we
want to trap first-level translation changes, the current notifier is not
even close to that interface - just see the definition of IOMMUTLBEntry:
it is tailored only for MAP/UNMAP of translation addresses, nothing else.
And IMHO that's why it's tightly bound to MemoryRegions, and that's the
root problem. The dynamic IOMMU MR switching problem is related to this
issue as well.

I am not sure the current "get IOMMU object from address space" solution
would be best; maybe it's too big a scope. I think it depends on whether
in the future we'll have some requirement with such a bigger scope (say,
something we want to trap from the vIOMMU and deliver to the host IOMMU
which may not even be device-related? I don't know). Another alternative
I am thinking of is whether we can provide a per-device notifier; then it
can be bound to a PCIDevice rather than MemoryRegions, so it will be in
device scope.
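A very rough sketch of what that per-device alternative could look like, just
to show the shape of it; PCIBus and pci_setup_iommu() are real QEMU, but every
other name below is invented for this discussion and exists in no posted patch.

#include "qemu/osdep.h"
#include "hw/core/iommu.h"
#include "hw/pci/pci.h"

/* Hypothetical: let the vIOMMU hand out an IOMMUObject per (bus, devfn),
 * mirroring how pci_device_iommu_address_space() already resolves the
 * DMA address space per device. */
typedef IOMMUObject *(*PCIIOMMUObjectFunc)(PCIBus *bus, void *opaque,
                                           int devfn);

static struct {
    PCIIOMMUObjectFunc fn;
    void *opaque;
} iommu_object_hook;

/* Called by the vIOMMU at machine init time, like pci_setup_iommu() */
static void pci_setup_iommu_object(PCIIOMMUObjectFunc fn, void *opaque)
{
    iommu_object_hook.fn = fn;
    iommu_object_hook.opaque = opaque;
}

/* Called by a consumer such as VFIO when it realizes a device */
static IOMMUObject *pci_device_iommu_object(PCIBus *bus, int devfn)
{
    if (!iommu_object_hook.fn) {
        return NULL;   /* no vIOMMU: nothing to register notifiers on */
    }
    return iommu_object_hook.fn(bus, iommu_object_hook.opaque, devfn);
}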
Thanks,
--
Peter Xu
David Gibson
2017-12-18 11:35:31 UTC
Permalink
Post by Peter Xu
[...]
Post by Auger Eric
I meant, in the current intel_iommu code, vtd_find_add_as() creates 1
IOMMU MR and 1 AS per PCIe device, right?
I think this is the most tricky point - in QEMU IOMMU MR is not really
a 1:1 relationship to devices. For Intel, it's true; for Power, it's
not. On Power guests, one device's DMA address space can be splited
into different translation windows, while each window corresponds to
one IOMMU MR.
Right.
Post by Peter Xu
So IMHO the real 1:1 mapping is between the device and its DMA address
space, rather than MRs.
That's not true either. With both POWER and Intel, several devices
can share a DMA address space: on POWER if they are in the same PE, on
Intel if they are placed in the same IOMMU domain.
On x86 and on POWER bare metal we generally try to make the minimum
granularity for each PE/domain be a single function. However, that
may not be possible in the case of PCIe to PCI bridges, or
multifunction devices where the functions aren't properly isolated
from each other (e.g. function 0 debug registers which can affect
other functions are quite common).
For POWER guests we only have one PE/domain per virtual host bridge.
That's just a matter of implementation simplicity - if you want fine
grained isolation you can just create more virtual host bridges.
Post by Peter Xu
It's been a long time since when I drafted the patches. I think at
least that should be a more general notifier mechanism comparing to
current IOMMUNotifier thing, which was bound to IOTLB notifies only.
AFAICT if we want to trap first-level translation changes, current
notifier is not even close to that interface - just see the definition
of IOMMUTLBEntry, it is tailored only for MAP/UNMAP of translation
addresses, not anything else. And IMHO that's why it's tightly bound
to MemoryRegions, and that's the root problem. The dynamic IOMMU MR
switching problem is related to this issue as well.
So, having read and thought a bunch more, I think I know where you
need to start hooking this in. The thing is the current qemu PCI DMA
structure assumes that each device belongs to just a single PCI
address space - that's what pci_device_iommu_address_space() returns.

For virt-SVM that's just not true. IIUC, a virt-SVM capable device
could simultaneously write to multiple process address spaces, since
the process IDs actually go over the bus.

So trying to hook notifiers at the AddressSpace OR MemoryRegion level
just doesn't make sense - if we've picked a single address space for
the device, we've already made a wrong step.

Instead what you need I think is something like:
pci_device_virtsvm_context(). virt-SVM capable devices would need to
call that *before* calling pci_device_iommu_address_space(). Well,
rather, the virt-SVM capable DMA helpers would need to call that.

That would return a new VirtSVMContext (or something) object, which
would roughly correspond to a single PASID table. That's where the
methods and notifiers for managing that would need to go.
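Purely as an illustration of the shape being described, such an interface
might look like the following; every name here is sketched from the paragraph
above, not taken from any posted patch.

#include "qemu/osdep.h"
#include "qemu/queue.h"
#include "hw/pci/pci.h"

/* Roughly corresponds to one guest PASID table for one device. */
typedef struct VirtSVMContext VirtSVMContext;

typedef void (*VirtSVMNotifyFn)(VirtSVMContext *ctx, void *data);

typedef struct VirtSVMNotifier {
    VirtSVMNotifyFn notify;
    QLIST_ENTRY(VirtSVMNotifier) node;
} VirtSVMNotifier;

struct VirtSVMContext {
    PCIBus *bus;
    int devfn;
    uint64_t pasid_table_gpa;                 /* set when the guest binds it */
    QLIST_HEAD(, VirtSVMNotifier) notifiers;  /* e.g. VFIO listens here for
                                                 bind/invalidate events */
};

/* virt-SVM capable DMA helpers would call this *before*
 * pci_device_iommu_address_space(); NULL means the vIOMMU does not
 * expose SVM for this device. */
VirtSVMContext *pci_device_virtsvm_context(PCIBus *bus, int devfn);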
Post by Peter Xu
I am not sure current "get IOMMU object from address space" solution
would be best, maybe it's "too bigger a scope", I think it depends on
whether in the future we'll have some requirement in such a bigger
scope (say, something we want to trap from vIOMMU and deliver it to
host IOMMU which may not even be device-related? I don't know). Now
another alternative I am thinking is, whether we can provide a
per-device notifier, then it can be bound to PCIDevice rather than
MemoryRegions, then it will be in device scope.
I think that sounds like a version of what I've suggested above.
--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson
Liu, Yi L
2017-12-20 06:47:30 UTC
Permalink
Post by David Gibson
Post by Peter Xu
[...]
Post by Auger Eric
I meant, in the current intel_iommu code, vtd_find_add_as() creates 1
IOMMU MR and 1 AS per PCIe device, right?
I think this is the most tricky point - in QEMU IOMMU MR is not really
a 1:1 relationship to devices. For Intel, it's true; for Power, it's
not. On Power guests, one device's DMA address space can be splited
into different translation windows, while each window corresponds to
one IOMMU MR.
Right.
Post by Peter Xu
So IMHO the real 1:1 mapping is between the device and its DMA address
space, rather than MRs.
That's not true either. With both POWER and Intel, several devices
can share a DMA address space: on POWER if they are in the same PE, on
Intel if they are place in the same IOMMU domain.
On x86 and on POWER bare metal we generally try to make the minimum
granularity for each PE/domain be a single function. However, that
may not be possible in the case of PCIe to PCI bridges, or
multifunction devices where the functions aren't properly isolated
from each other (e.g. function 0 debug registers which can affect
other functions are quite common).
For POWER guests we only have one PE/domain per virtual host bridge.
That's just a matter of implementation simplicity - if you want fine
grained isolation you can just create more virtual host bridges.
Post by Peter Xu
It's been a long time since when I drafted the patches. I think at
least that should be a more general notifier mechanism comparing to
current IOMMUNotifier thing, which was bound to IOTLB notifies only.
AFAICT if we want to trap first-level translation changes, current
notifier is not even close to that interface - just see the definition
of IOMMUTLBEntry, it is tailored only for MAP/UNMAP of translation
addresses, not anything else. And IMHO that's why it's tightly bound
to MemoryRegions, and that's the root problem. The dynamic IOMMU MR
switching problem is related to this issue as well.
So, having read and thought a bunch more, I think I know where you
need to start hooking this in. The thing is the current qemu PCI DMA
structure assumes that each device belongs to just a single PCI
address space - that's what pci_device_iommu_address_space() returns.
For virt-SVM that's just not true. IIUC, a virt-SVM capable device
could simultaneously write to multiple process address spaces, since
the process IDs actually go over the bus.
Correct.
Post by David Gibson
So trying to hook notifiers at the AddressSpace OR MemoryRegion level
just doesn't make sense - if we've picked a single addresss space for
the device, we've already made a wrong step.
That's also why we want to have notifiers based on an IOMMUObject (maybe
not a suitable name; let me use it as the patch names it).
Post by David Gibson
pci_device_virtsvm_context(). virt-SVM capable devices would need to
call that *before* calling pci_device_iommu_address_space (). Well
rather the virt-SVM capable DMA helpers would need to call that.
That would return a new VirtSVMContext (or something) object, which
would roughly correspond to a single PASID table. That's where the
methods and notifiers for managing that would need to go.
Correct, pci_device_iommu_address_space() returns an AS, and it is a PCI
address space. And if pci_device_virtsvm_context() is also called in
vfio_realize(), it may not return an AS, since there may be no 1st-level
translation page table bound.

So, as you said, return a new VirtSVMContext; this VirtSVMContext can
hook some new notifiers. I think the IOMMUObject introduced in this patch
can meet the requirement, but it may be renamed.

So this addresses the concern you raised before, which is hooking the
IOMMUObject via a PCI address space. Regarding VirtSVMContext, it may be
a replacement for IOMMUObject. As it is related to PASID, I'm considering
naming it IOMMUPasidContext or IOMMUPasidObject, so it would be an
abstraction of all the IOMMU PASID related operations.
Regards,
Yi L
Post by David Gibson
Post by Peter Xu
I am not sure current "get IOMMU object from address space" solution
would be best, maybe it's "too bigger a scope", I think it depends on
whether in the future we'll have some requirement in such a bigger
scope (say, something we want to trap from vIOMMU and deliver it to
host IOMMU which may not even be device-related? I don't know). Now
another alternative I am thinking is, whether we can provide a
per-device notifier, then it can be bound to PCIDevice rather than
MemoryRegions, then it will be in device scope.
I think that sounds like a version of what I've suggested above.
--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson
David Gibson
2017-12-20 11:18:16 UTC
Permalink
Post by Liu, Yi L
Post by David Gibson
Post by Peter Xu
[...]
Post by Auger Eric
I meant, in the current intel_iommu code, vtd_find_add_as() creates 1
IOMMU MR and 1 AS per PCIe device, right?
I think this is the most tricky point - in QEMU IOMMU MR is not really
a 1:1 relationship to devices. For Intel, it's true; for Power, it's
not. On Power guests, one device's DMA address space can be splited
into different translation windows, while each window corresponds to
one IOMMU MR.
Right.
Post by Peter Xu
So IMHO the real 1:1 mapping is between the device and its DMA address
space, rather than MRs.
That's not true either. With both POWER and Intel, several devices
can share a DMA address space: on POWER if they are in the same PE, on
Intel if they are place in the same IOMMU domain.
On x86 and on POWER bare metal we generally try to make the minimum
granularity for each PE/domain be a single function. However, that
may not be possible in the case of PCIe to PCI bridges, or
multifunction devices where the functions aren't properly isolated
from each other (e.g. function 0 debug registers which can affect
other functions are quite common).
For POWER guests we only have one PE/domain per virtual host bridge.
That's just a matter of implementation simplicity - if you want fine
grained isolation you can just create more virtual host bridges.
Post by Peter Xu
It's been a long time since when I drafted the patches. I think at
least that should be a more general notifier mechanism comparing to
current IOMMUNotifier thing, which was bound to IOTLB notifies only.
AFAICT if we want to trap first-level translation changes, current
notifier is not even close to that interface - just see the definition
of IOMMUTLBEntry, it is tailored only for MAP/UNMAP of translation
addresses, not anything else. And IMHO that's why it's tightly bound
to MemoryRegions, and that's the root problem. The dynamic IOMMU MR
switching problem is related to this issue as well.
So, having read and thought a bunch more, I think I know where you
need to start hooking this in. The thing is the current qemu PCI DMA
structure assumes that each device belongs to just a single PCI
address space - that's what pci_device_iommu_address_space() returns.
For virt-SVM that's just not true. IIUC, a virt-SVM capable device
could simultaneously write to multiple process address spaces, since
the process IDs actually go over the bus.
Correct.
Post by David Gibson
So trying to hook notifiers at the AddressSpace OR MemoryRegion level
just doesn't make sense - if we've picked a single addresss space for
the device, we've already made a wrong step.
That's also why we want to have notifiers based on IOMMUObject(may be
not a suitable name, let me use it as the patch named).
Right, I think "IOMMUObject" is a misleading name.
Post by Liu, Yi L
Post by David Gibson
pci_device_virtsvm_context(). virt-SVM capable devices would need to
call that *before* calling pci_device_iommu_address_space (). Well
rather the virt-SVM capable DMA helpers would need to call that.
That would return a new VirtSVMContext (or something) object, which
would roughly correspond to a single PASID table. That's where the
methods and notifiers for managing that would need to go.
Correct, pci_device_iommu_address_space() returns an AS and it is
a PCI address space. And if pci_device_virtsvm_context() is also
called in vfio_realize(), it may not return an AS since there may
be no 1st level translation page table bound.
So as you said, return a new VirtSVMContext, this VirtSVMContext can
hook some new notifiers. I think the IOMMUObject introduced in this patch
can meet the requirement. But it may be re-named.
Ok.
Post by Liu, Yi L
So here it addressed the concern you raised before which is hook IOMMUObject
via a PCI address space. Regards to VirtSVMContext, it may be a replacement
of IOMMUObject. As it is related to PASID, I'm considering to name it as
IOMMUPasidContext or IOMMUPasidObject. So it would be an abstraction of all
the IOMMU PASID related operations.
I'm ok with calling it a "PASID context".

Thinking about this some more, here are some extra observations:

* I think each device needs both a PASID context and an ordinary
address space. The PASID context would be used for bus
transactions which include a process id, the address space for
those that don't.

* Theoretically, the PASID context could be modelled as an array/map
of AddressSpace objects for each process ID. However, creating all
those AddressSpace objects in advance might be too expensive. I
can see a couple of options to avoid this:

1) Have the PASID context class include a 'translate' method similar
to the one in IOMMUMemoryRegionClass, but taking a process ID as well
as an address. This would avoid creating extra AddressSpace objects,
but might require duplicating a bunch of the translation code that
already exists for AddressSpace.

2) "Lazily" create AddressSpace objects. The generic part of the
PASID aware DMA helper functions would use a cache of AddressSpace's
for particular process IDs, using the AddressSpace (and MemoryRegion
within) to translate accesses for a particular process ID. However,
these AddressSpace and MemoryRegion objects would only be created when
the device first accesses that address space. In the common case,
where a single device is just being used by a single process or a
small number, this should keep the number of AddressSpace objects
relatively small. Obviously the cache would need to be invalidated,
cleaning up the AddressSpace objects, when the PASID table is altered.

* I realize that the expected case here is with KVM, where the guest
controls the first level translation, but the host controls the
second level translation. However, we should also be able to model
the case where the guest controls both levels for the sake of full
system emulation. I think understanding this case will lead to a
better design even for the simpler case.
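
To make option (1) a bit more concrete, here is a very rough sketch of the
kind of interface I have in mind - all names below are hypothetical, none of
this is existing qemu API:

#include "qemu/osdep.h"
#include "exec/memory.h"
#include "hw/pci/pci.h"

/* Rough sketch only - hypothetical names, not existing qemu code. */
typedef struct PASIDContext PASIDContext;

typedef struct PASIDContextClass {
    /*
     * Translate (pasid, addr) for the device behind this context; similar
     * in spirit to IOMMUMemoryRegionClass::translate, but taking a process
     * ID as well as an address.
     */
    IOMMUTLBEntry (*translate)(PASIDContext *pc, uint32_t pasid,
                               hwaddr addr, bool is_write);
    /* PASID table binding, invalidation, etc. would also hang off here. */
} PASIDContextClass;

struct PASIDContext {
    PASIDContextClass *ops;
};

/*
 * Hypothetical per-device lookup, analogous to
 * pci_device_iommu_address_space(): returns the PASID context that the
 * device's PASID-tagged transactions go through.
 */
PASIDContext *pci_device_pasid_context(PCIDevice *dev);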

Do you have a plan for what the virt-SVM aware DMA functions will look
like?
--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson
Liu, Yi L
2017-12-21 08:40:19 UTC
Permalink
[...]
Post by David Gibson
Post by Liu, Yi L
Post by David Gibson
So, having read and thought a bunch more, I think I know where you
need to start hooking this in. The thing is the current qemu PCI DMA
structure assumes that each device belongs to just a single PCI
address space - that's what pci_device_iommu_address_space() returns.
For virt-SVM that's just not true. IIUC, a virt-SVM capable device
could simultaneously write to multiple process address spaces, since
the process IDs actually go over the bus.
Correct.
Post by David Gibson
So trying to hook notifiers at the AddressSpace OR MemoryRegion level
just doesn't make sense - if we've picked a single address space for
the device, we've already made a wrong step.
That's also why we want to have notifiers based on IOMMUObject(may be
not a suitable name, let me use it as the patch named).
Right, I think "IOMMUObject" is a misleading name.
Post by Liu, Yi L
Post by David Gibson
pci_device_virtsvm_context(). virt-SVM capable devices would need to
call that *before* calling pci_device_iommu_address_space (). Well
rather the virt-SVM capable DMA helpers would need to call that.
That would return a new VirtSVMContext (or something) object, which
would roughly correspond to a single PASID table. That's where the
methods and notifiers for managing that would need to go.
Correct, pci_device_iommu_address_space() returns an AS and it is
a PCI address space. And if pci_device_virtsvm_context() is also
called in vfio_realize(), it may not return an AS since there may
be no 1st level translation page table bound.
So as you said, return a new VirtSVMContext, this VirtSVMContext can
hook some new notifiers. I think the IOMMUObject introduced in this patch
can meet the requirement. But it may be re-named.
Ok.
Post by Liu, Yi L
So here it addressed the concern you raised before which is hook IOMMUObject
via a PCI address space. Regards to VirtSVMContext, it may be a replacement
of IOMMUObject. As it is related to PASID, I'm considering to name it as
IOMMUPasidContext or IOMMUPasidObject. So it would be an abstraction of all
the IOMMU PASID related operations.
I'm ok with calling it a "PASID context".
* I think each device needs both a PASID context and an ordinary
address space. The PASID context would be used for bus
transactions which include a process id, the address space for
those that don't.
Correct. Also virt-SVM still needs the PCI address space, and the PCI
address space == guest physical address space. For virt-SVM, pt mode is
required to ensure the nested translation.
Post by David Gibson
* Theoretically, the PASID context could be modelled as an array/map
of AddressSpace objects for each process ID. However, creating all
I also thought about creating AddressSpace objects for each process ID.
But I don't think we need to do it. My reasoning is as below:

In theory, it is correct to have an AddressSpace object for each process
virtual address space in Qemu, and this is what we are doing for the PCI
address space so far. However, this is only necessary when we want to mirror
the mapping to the host: each time a mapping changes within the PCI
address space, we need to mirror it to the host.

While for virt-SVM, we don't need to mirror the changes within the guest
process address space. The nested translation capability in HW brings us
a convenience. In nested translation, HW can access the guest PASID table
with a GPA (this is what Intel and AMD do; not sure about ARM, maybe or
maybe not). For VT-d and AMD-IOMMU, even if the guest PASID table changes,
it is not necessary to mirror it to the host. Based on the statements above,
there is a candidate function to be included in PASIDContext. It could be
bind_guest_pasid_table(), to be used to set the guest PASID table into the host
translation structure when the guest finds a device has SVM capability.
Post by David Gibson
those AddressSpace objects in advance might be too expensive. I
1) Have the PASID context class include a 'translate' method similar
to the one in IOMMUMemoryRegionClass, but taking a process ID as well
as an address. This would avoid creating extra AddressSpace objects,
but might require duplicating a bunch of the translation code that
already exists for AddressSpace.
The translation code would actually walk the guest CR3 table to get a
GVA->GPA map. But it is not really required so far, thanks to the HW nested
translation capability.
Post by David Gibson
2) "Lazily" create AddressSpace objects. The generic part of the
PASID aware DMA helper functions would use a cache of AddressSpace's
for particular process IDs, using the AddressSpace (and MemoryRegion
within) to translate accesses for a particular process ID. However,
these AddressSpace and MemoryRegion objects would only be created when
the device first accesses that address space. In the common case,
where a single device is just being used by a single process or a
small number, this should keep the number of AddressSpace objects
relatively small. Obviously the cache would need to be invalidated,
cleaning up the AddressSpace objects, when the PASID table is altered.
This "Lazily" mechanism is not required. But I agree with the last statement.
When PASID Table altered, the cache should be invalidated. Additionally, the
cache for the mappings(GVA) within the guest process address space should also
be invalidated when there is change. This also calls for a candidate function
in PASIDContext. It could be svm_invalidate_tlb(). And be used to invalidate
GVA cache and also PASID entry cache.
Post by David Gibson
* I realize that the expected case here is with KVM, where the guest
controls the first level translation, but the host controls the
second level translation. However, we should also be able to model
Yeah, this is the case. And to be accurate, both the 1st level and 2nd level
translation here happen on the HW IOMMU. The 1st level page table is
effectively "linked" from the guest, and the 2nd level page table is
created/controlled by Qemu (with MAP/UNMAP).
Post by David Gibson
the case where the guest controls both levels for the sake of full
system emulation. I think understanding this case will lead to a
better design even for the simpler case.
Not sure if I got your point accurately; let me try to reply based on
what I got. If the guest enables virt-IOVA (DMA isolation in guest scope) and
virt-SVM, I think that is a case in which the guest controls both 1st and 2nd
level translation.

For virt-IOVA, we need to mirror all the guest IOVA mappings to the host (aka
shadowing). The current MemoryRegion based notifiers (MAP/UNMAP) should be
enough. Peter has already upstreamed that.
Post by David Gibson
Do you have a plan for what the virt-SVM aware DMA functions will look
like?
I think there are two candidates so far. This should be enough for VT-d and
AMD-IOMMU; for ARM-SMMU, an extra function may be needed.
* bind_guest_pasid_table()
* svm_invalidate_tlb()
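
Just to make those two concrete, below is a very rough sketch of how they
might hang off the PASID context (hypothetical names and signatures, nothing
final, and not existing Qemu code):

#include <stdint.h>

typedef struct PASIDContext PASIDContext;

typedef struct PASIDContextOps {
    /*
     * Link the guest PASID table pointer (a GPA) into the host translation
     * structures, so that HW can do the nested translation.
     */
    int (*bind_guest_pasid_table)(PASIDContext *pc, uint64_t pasidt_gpa);

    /*
     * Propagate guest 1st level TLB / PASID cache invalidations down to
     * the host.
     */
    int (*svm_invalidate_tlb)(PASIDContext *pc, uint32_t pasid,
                              uint64_t addr, uint64_t size);
} PASIDContextOps;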
Post by David Gibson
--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson
Thanks,
Yi L
David Gibson
2018-01-03 00:28:17 UTC
Permalink
Post by Liu, Yi L
[...]
Post by David Gibson
Post by Liu, Yi L
Post by David Gibson
So, having read and thought a bunch more, I think I know where you
need to start hooking this in. The thing is the current qemu PCI DMA
structure assumes that each device belongs to just a single PCI
address space - that's what pci_device_iommu_address_space() returns.
For virt-SVM that's just not true. IIUC, a virt-SVM capable device
could simultaneously write to multiple process address spaces, since
the process IDs actually go over the bus.
Correct.
Post by David Gibson
So trying to hook notifiers at the AddressSpace OR MemoryRegion level
just doesn't make sense - if we've picked a single address space for
the device, we've already made a wrong step.
That's also why we want to have notifiers based on IOMMUObject(may be
not a suitable name, let me use it as the patch named).
Right, I think "IOMMUObject" is a misleading name.
Post by Liu, Yi L
Post by David Gibson
pci_device_virtsvm_context(). virt-SVM capable devices would need to
call that *before* calling pci_device_iommu_address_space (). Well
rather the virt-SVM capable DMA helpers would need to call that.
That would return a new VirtSVMContext (or something) object, which
would roughly correspond to a single PASID table. That's where the
methods and notifiers for managing that would need to go.
Correct, pci_device_iommu_address_space() returns an AS and it is
a PCI address space. And if pci_device_virtsvm_context() is also
called in vfio_realize(), it may not return an AS since there may
be no 1st level translation page table bound.
So as you said, return a new VirtSVMContext, this VirtSVMContext can
hook some new notifiers. I think the IOMMUObject introduced in this patch
can meet the requirement. But it may be re-named.
Ok.
Post by Liu, Yi L
So here it addressed the concern you raised before which is hook IOMMUObject
via a PCI address space. Regards to VirtSVMContext, it may be a replacement
of IOMMUObject. As it is related to PASID, I'm considering to name it as
IOMMUPasidContext or IOMMUPasidObject. So it would be an abstraction of all
the IOMMU PASID related operations.
I'm ok with calling it a "PASID context".
* I think each device needs both a PASID context and an ordinary
address space. The PASID context would be used for bus
transactions which include a process id, the address space for
those that don't.
Correct. Also virt-SVM still needs the PCI Address space. And the PCI
Address space == Guest physical Address space.
Not necessarily. That's true if you're only making the L1 translation
guest visible. But I think we need to at least think about the case
where both L1 and L2 translations are guest visible, in which case the
PCI address space is not the same as the guest physical address space.
Post by Liu, Yi L
For virt-SVM, requires
pt mode to ensure the nested translation.
What is pt mode?
Post by Liu, Yi L
Post by David Gibson
* Theoretically, the PASID context could be modelled as an array/map
of AddressSpace objects for each process ID. However, creating all
I also thought about creating AddressSpace objects for each process ID.
In theory, it is correct to have AddressSpace object for each process
virtual address space in Qemu, and this is what we are doing for PCI
address space so far. However, this is necessary when we want to mirror
the mapping to host. Each time there is mapping changed within the PCI
address space, we need to mirror it to host.
While for virt-SVM, we don't need to mirror the changes within the guest
process address space. The nested translation capability in HW brings us
a convenience. In nested translation, HW can access a guest PASID table
with a GPA(this is what Intel and AMD does, not sure about ARM, maybe or
maybe not). For VT-d and AMD-IOMMU, even any change in guest PASID table,
it is not necessary to mirror it to host. Based on the statements above,
there is a candidate function to be included in PASIDContext. It could be
bind_guest_pasid_table(). And be used to set the guest PASID Table to host
translation structure when guest finds a device has SVM
capability.
That's true for passthrough devices, but not for qemu emulated
devices. None of those support SVM yet, but there's no reason they
couldn't in future.

Even though that will never be the main production case, I think we'll
get a better model if we think about these edge cases carefully.
Post by Liu, Yi L
Post by David Gibson
those AddressSpace objects in advance might be too expensive. I
1) Have the PASID context class include a 'translate' method similar
to the one in IOMMUMemoryRegionClass, but taking a process ID as well
as an address. This would avoid creating extra AddressSpace objects,
but might require duplicating a bunch of the translation code that
already exists for AddressSpace.
Translation code actually walks the guest CR3 table to get a
GVA->GPA map.
Right, but that's clearly x86 specific.
Post by Liu, Yi L
But it is not really required so far due to HW capability.
Post by David Gibson
2) "Lazily" create AddressSpace objects. The generic part of the
PASID aware DMA helper functions would use a cache of AddressSpace's
for particular process IDs, using the AddressSpace (and MemoryRegion
within) to translate accesses for a particular process ID. However,
these AddressSpace and MemoryRegion objects would only be created when
the device first accesses that address space. In the common case,
where a single device is just being used by a single process or a
small number, this should keep the number of AddressSpace objects
relatively small. Obviously the cache would need to be invalidated,
cleaning up the AddressSpace objects, when the PASID table is altered.
This "Lazily" mechanism is not required. But I agree with the last statement.
When PASID Table altered, the cache should be invalidated. Additionally, the
cache for the mappings(GVA) within the guest process address space should also
be invalidated when there is change. This also calls for a candidate function
in PASIDContext. It could be svm_invalidate_tlb(). And be used to invalidate
GVA cache and also PASID entry cache.
Post by David Gibson
* I realize that the expected case here is with KVM, where the guest
controls the first level translation, but the host controls the
second level translation. However, we should also be able to model
Yeah, this is the case. And to be accurate, both the 1st level and 2nd level
translation here happens on HW IOMMU. The 1st level page table is actually
like "linked" from guest. And the 2nd level page table is created/conrolled
by Qemu(with MAP/UNMAP).
Post by David Gibson
the case where the guest controls both levels for the sake of full
system emulation. I think understanding this case will lead to a
better design even for the simpler case.
Not sure if I got your point accurately. Let me try to reply based on
what I got. If guest enables virt-IOVA (DMA isolation in guest scope) and
virt-SVM. I think this is a case which guest controls both 1st and 2nd
level translation.
That sounds right, from my limited knowledge of the x86 IOMMU.
Post by Liu, Yi L
For virt-IOVA, we need to mirror all the guest IOVA mapping to host(aka.
shadow). Current MemoryRegion based notifiers(MAP/UNMAP) should be enough.
Peter has already upstreamed it.
Right, but we also need to model the translations for emulated
devices.
Post by Liu, Yi L
Post by David Gibson
Do you have a plan for what the virt-SVM aware DMA functions will look
like?
I think there are two candidates so far. This should be enough for VT-d and
AMD-IOMMU. For ARM-SMMU, it may need extra function.
* bind_guest_pasid_table()
* svm_invalidate_tlb()
That's not really what I meant. What I was getting at is: if you were
writing an emulated device which supported SVM, what would the
functions to perform SVM-aware DMA look like?
--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson
Liu, Yi L
2018-01-04 09:40:54 UTC
Permalink
[...]
Post by David Gibson
Post by Liu, Yi L
Post by David Gibson
I'm ok with calling it a "PASID context".
* I think each device needs both a PASID context and an ordinary
address space. The PASID context would be used for bus
transactions which include a process id, the address space for
those that don't.
Correct. Also virt-SVM still needs the PCI Address space. And the PCI
Address space == Guest physical Address space.
Not necessarily. That's true if you're only making the L1 translation
guest visible. But I think we need to at least think about the case
where both L1 and L2 translations are guest visible, in which case the
PCI address space is not the same as the guest physical address space.
Post by Liu, Yi L
For virt-SVM, requires
pt mode to ensure the nested translation.
What is pt mode?
The pt mode here means the kernel parameter "iommu=pt", which makes the iommu
do 1:1 mapping for iova. For virt-SVM on VT-d, with the guest setting iommu=pt,
the 2nd level page table in the host would be a GPA->HPA mapping. If not, the
host 2nd level page table would be a GIOVA->HPA mapping, which is not expected
in nested translation.
Post by David Gibson
Post by Liu, Yi L
Post by David Gibson
* Theoretically, the PASID context could be modelled as an array/map
of AddressSpace objects for each process ID. However, creating all
I also thought about creating AddressSpace objects for each process ID.
In theory, it is correct to have AddressSpace object for each process
virtual address space in Qemu, and this is what we are doing for PCI
address space so far. However, this is necessary when we want to mirror
the mapping to host. Each time there is mapping changed within the PCI
address space, we need to mirror it to host.
While for virt-SVM, we don't need to mirror the changes within the guest
process address space. The nested translation capability in HW brings us
a convenience. In nested translation, HW can access a guest PASID table
with a GPA(this is what Intel and AMD does, not sure about ARM, maybe or
maybe not). For VT-d and AMD-IOMMU, even any change in guest PASID table,
it is not necessary to mirror it to host. Based on the statements above,
there is a candidate function to be included in PASIDContext. It could be
bind_guest_pasid_table(). And be used to set the guest PASID Table to host
translation structure when guest finds a device has SVM
capability.
That's true for passthrough devices, but not for qemu emulated
devices. None of those support SVM yet, but there's no reason they
couldn't in future.
Even though that will never be the main production case, I think we'll
get a better model if we think about these edge cases carefully.
Yeah, it's quite a good edge case. If an emulated device wants to do DMA
to a guest process virtual address space, Qemu needs to know it and do the
necessary address translation for it, just as with iova. If we want to support
an emulated SVM-capable device, I'm not sure the current AddressSpace in Qemu
fits well. Honestly, SVM-capable devices are mostly complicated
accelerators, so I think we need to balance the effort. But I agree it is
helpful to understand more about the edge cases.
Post by David Gibson
Post by Liu, Yi L
Post by David Gibson
those AddressSpace objects in advance might be too expensive. I
1) Have the PASID context class include a 'translate' method similar
to the one in IOMMUMemoryRegionClass, but taking a process ID as well
as an address. This would avoid creating extra AddressSpace objects,
but might require duplicating a bunch of the translation code that
already exists for AddressSpace.
Translation code actually walks the guest CR3 table to get a
GVA->GPA map.
Right, but that's clearly x86 specific.
Yes, the CR3 table is x86 specific. But the page table walking itself, I
think, can be made vendor-agnostic.
Post by David Gibson
Post by Liu, Yi L
But it is not really required so far due to HW capability.
Post by David Gibson
2) "Lazily" create AddressSpace objects. The generic part of the
PASID aware DMA helper functions would use a cache of AddressSpace's
for particular process IDs, using the AddressSpace (and MemoryRegion
within) to translate accesses for a particular process ID. However,
these AddressSpace and MemoryRegion objects would only be created when
the device first accesses that address space. In the common case,
where a single device is just being used by a single process or a
small number, this should keep the number of AddressSpace objects
relatively small. Obviously the cache would need to be invalidated,
cleaning up the AddressSpace objects, when the PASID table is altered.
This "Lazily" mechanism is not required. But I agree with the last statement.
When PASID Table altered, the cache should be invalidated. Additionally, the
cache for the mappings(GVA) within the guest process address space should also
be invalidated when there is change. This also calls for a candidate function
in PASIDContext. It could be svm_invalidate_tlb(). And be used to invalidate
GVA cache and also PASID entry cache.
Post by David Gibson
* I realize that the expected case here is with KVM, where the guest
controls the first level translation, but the host controls the
second level translation. However, we should also be able to model
Yeah, this is the case. And to be accurate, both the 1st level and 2nd level
translation here happens on HW IOMMU. The 1st level page table is actually
like "linked" from guest. And the 2nd level page table is created/conrolled
by Qemu(with MAP/UNMAP).
Post by David Gibson
the case where the guest controls both levels for the sake of full
system emulation. I think understanding this case will lead to a
better design even for the simpler case.
Not sure if I got your point accurately. Let me try to reply based on
what I got. If guest enables virt-IOVA (DMA isolation in guest scope) and
virt-SVM. I think this is a case which guest controls both 1st and 2nd
level translation.
That sounds right, from my limited knowledge of the x86 IOMMU.
Post by Liu, Yi L
For virt-IOVA, we need to mirror all the guest IOVA mapping to host(aka.
shadow). Current MemoryRegion based notifiers(MAP/UNMAP) should be enough.
Peter has already upstreamed it.
Right, but we also need to model the translations for emulated
devices.
For emulated devices, I think virt-IOVA is workable with the latest Qemu.
Post by David Gibson
Post by Liu, Yi L
Post by David Gibson
Do you have a plan for what the virt-SVM aware DMA functions will look
like?
I think there are two candidates so far. This should be enough for VT-d and
AMD-IOMMU. For ARM-SMMU, it may need extra function.
* bind_guest_pasid_table()
* svm_invalidate_tlb()
That's not really what I meant. What I was getting at is: if you were
writing an emulated device which supported SVM, what would the
functions to perform SVM-aware DMA look like?
Sorry for the misunderstanding, I'm not writing an emulated device with SVM
capability. For an SVM-capable device, in brief, it needs to be able to send
Memory Requests with a PASID. If an emulated device tries to access a process
virtual address space, Qemu needs to know it and do the translation for it.
That's what the SVM-aware DMA should look like. However, I haven't got a model
for it yet; it probably needs a pasid field added to AddressSpace.
Post by David Gibson
--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson
Regards,
Yi L
Liu, Yi L
2018-01-12 10:25:34 UTC
Permalink
[...]

Sorry for the delayed reply, spent some time on reconsidering your comments.
Post by David Gibson
I'm ok with calling it a "PASID context".
* I think each device needs both a PASID context and an ordinary
address space. The PASID context would be used for bus
transactions which include a process id, the address space for
those that don't.
* Theoretically, the PASID context could be modelled as an array/map
of AddressSpace objects for each process ID. However, creating all
those AddressSpace objects in advance might be too expensive. I
1) Have the PASID context class include a 'translate' method similar
to the one in IOMMUMemoryRegionClass, but taking a process ID as well
as an address. This would avoid creating extra AddressSpace objects,
but might require duplicating a bunch of the translation code that
already exists for AddressSpace.
2) "Lazily" create AddressSpace objects. The generic part of the
PASID aware DMA helper functions would use a cache of AddressSpace's
for particular process IDs, using the AddressSpace (and MemoryRegion
within) to translate accesses for a particular process ID. However,
these AddressSpace and MemoryRegion objects would only be created when
the device first accesses that address space. In the common case,
where a single device is just being used by a single process or a
small number, this should keep the number of AddressSpace objects
relatively small. Obviously the cache would need to be invalidated,
cleaning up the AddressSpace objects, when the PASID table is altered.
Sorry, a double check here. Does "AddressSpace objects" mean the existing
AddressSpace definition in Qemu?
Post by David Gibson
* I realize that the expected case here is with KVM, where the guest
controls the first level translation, but the host controls the
second level translation. However, we should also be able to model
the case where the guest controls both levels for the sake of full
system emulation. I think understanding this case will lead to a
better design even for the simpler case.
Do you have a plan for what the virt-SVM aware DMA functions will look
like?
The behaviour is device specific.
For an SVM-capable physical device, it would store the pasid value in a
register located in the device. E.g. a GPU context can be set to use SVM;
after the pasid is set, any DMA from that context targets a
process virtual address space.

So for a virt-SVM aware DMA device, the device model needs to figure out
the target address space. With the correct address space, it then consumes
the translate() callback provided by the iommu emulator, and then emulates
the DMA operation for the emulated device.

I'll try to get a new version with your suggestions.

Thanks,
Yi L
David Gibson
2018-01-16 06:04:25 UTC
Permalink
Post by Liu, Yi L
[...]
Sorry for the delayed reply, spent some time on reconsidering your comments.
Post by David Gibson
I'm ok with calling it a "PASID context".
* I think each device needs both a PASID context and an ordinary
address space. The PASID context would be used for bus
transactions which include a process id, the address space for
those that don't.
* Theoretically, the PASID context could be modelled as an array/map
of AddressSpace objects for each process ID. However, creating all
those AddressSpace objects in advance might be too expensive. I
1) Have the PASID context class include a 'translate' method similar
to the one in IOMMUMemoryRegionClass, but taking a process ID as well
as an address. This would avoid creating extra AddressSpace objects,
but might require duplicating a bunch of the translation code that
already exists for AddressSpace.
2) "Lazily" create AddressSpace objects. The generic part of the
PASID aware DMA helper functions would use a cache of AddressSpace's
for particular process IDs, using the AddressSpace (and MemoryRegion
within) to translate accesses for a particular process ID. However,
these AddressSpace and MemoryRegion objects would only be created when
the device first accesses that address space. In the common case,
where a single device is just being used by a single process or a
small number, this should keep the number of AddressSpace objects
relatively small. Obviously the cache would need to be invalidated,
cleaning up the AddressSpace objects, when the PASID table is altered.
Sorry, a double check here. Does "AddressSpace objects" mean the existing
AddressSpace definition in Qemu?
Yes.
Post by Liu, Yi L
Post by David Gibson
* I realize that the expected case here is with KVM, where the guest
controls the first level translation, but the host controls the
second level translation. However, we should also be able to model
the case where the guest controls both levels for the sake of full
system emulation. I think understanding this case will lead to a
better design even for the simpler case.
Do you have a plan for what the virt-SVM aware DMA functions will look
like?
The behaviour is device specific.
For an SVM-capable physical device, it would store the pasid value in a
register located in the device. E.g. a GPU context can be set to use SVM;
after the pasid is set, any DMA from that context targets a
process virtual address space.
That doesn't sound any more device specific than any DMA operation,
and we have helpers for that.
Post by Liu, Yi L
So for a virt-SVM aware DMA device, the device model needs to figure out
the target address space. With the correct address space, then consume
the translate() callback provided by iommu emulator. And then emulate the
DMA operation for the emulated device.
Nearly all of that sounds like something that belongs in a helper
function. Basically a variant of dma_memory_rw() (and related
functions) that takes a PASID as well as an address.
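
Something along these lines, say. This is a rough sketch only:
dma_memory_rw() is the existing helper, but the PASID context lookup and the
per-PASID address space resolution are entirely hypothetical:

#include "qemu/osdep.h"
#include "sysemu/dma.h"
#include "hw/pci/pci.h"

/* Hypothetical handles for the per-device PASID context discussed above. */
typedef struct PASIDContext PASIDContext;
PASIDContext *pci_device_pasid_context(PCIDevice *dev);      /* hypothetical */
AddressSpace *pasid_context_address_space(PASIDContext *pc,
                                          uint32_t pasid);   /* hypothetical */

/*
 * PASID-aware variant of dma_memory_rw(): resolve the process address
 * space for (device, pasid), then do ordinary DMA into it.
 */
static int pci_dma_rw_pasid(PCIDevice *dev, uint32_t pasid, dma_addr_t addr,
                            void *buf, dma_addr_t len, DMADirection dir)
{
    PASIDContext *pc = pci_device_pasid_context(dev);
    AddressSpace *as = pc ? pasid_context_address_space(pc, pasid) : NULL;

    if (!as) {
        return -1;   /* no translation bound for this PASID */
    }
    return dma_memory_rw(as, addr, buf, len, dir);
}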
Post by Liu, Yi L
I'll try to get a new version with your suggestions.
--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson
Peter Xu
2017-11-14 03:31:00 UTC
Permalink
Post by Liu, Yi L
AddressSpaceOps is similar to MemoryRegionOps, it's just for address
spaces to store arch-specific hooks.
The first hook I would like to introduce is iommu_get(). Return an
IOMMUObject behind the AddressSpace.
For systems that have IOMMUs, we will create a special address
space per device which is different from system default address
space for it (please refer to pci_device_iommu_address_space()).
Normally when that happens, there will be one specific IOMMU (or
say, translation unit) standing right behind that new address space.
This iommu_get() fetches that guy behind the address space. Here,
the guy is defined as IOMMUObject, which includes a notifier_list
so far, may extend in future. Along with IOMMUObject, a new iommu
notifier mechanism is introduced. It would be used for virt-svm.
Also IOMMUObject can further have a IOMMUObjectOps which is similar
to MemoryRegionOps. The difference is that IOMMUObjectOps does not rely
on MemoryRegion.
Hi, sorry I didn't reply to the earlier postings of this after our
discussion in China. I've been sick several times and very busy.
I still don't feel like there's an adequate explanation of exactly
what an IOMMUObject represents. Obviously it can represent more than
a single translation window - since that's represented by the
IOMMUMR. But what exactly do all the MRs - or whatever else - that
are represented by the IOMMUObject have in common, from a functional
point of view.
Even understanding the SVM stuff better than I did, I don't really see
why an AddressSpace is an obvious unit to have an IOMMUObject
associated with it.
Here's what I thought about it: IOMMUObject was planned to be the
abstraction of the hardware translation unit, which is a higher level
of the translated address spaces. Say, for each PCI device, it can
have its own translated address space. However for multiple PCI
devices, they can be sharing the same translation unit that handles
the translation requests from different devices. That's the case for
Intel platforms. We introduced this IOMMUObject because sometimes we
want to do something with that translation unit rather than a specific
device, in which case we need a general IOMMU device handle.
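
Roughly, the shape I had in mind is something like below - a simplified
sketch based on the description above, with illustrative names only, not the
exact code in the patch:

#include "qemu/queue.h"
#include "exec/memory.h"

typedef struct IOMMUObjectNotifier IOMMUObjectNotifier;   /* illustrative */

typedef struct IOMMUObject {
    /* Only a notifier list for now; it may be extended in the future. */
    QLIST_HEAD(, IOMMUObjectNotifier) notifier_list;
} IOMMUObject;

typedef struct AddressSpaceOps {
    /* Return the translation unit (if any) standing behind this AS. */
    IOMMUObject *(*iommu_get)(AddressSpace *as);
} AddressSpaceOps;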
Ok, but what does "hardware translation unit" mean in practice? The
guest neither knows nor cares which bits of IOMMU translation happen
to be included in the same bundle of silicon. It only cares what the
behaviour is. What behavioural characteristics does a single
IOMMUObject have?
In VT-d (I believe the same applies to ARM SMMUs), IMHO the special thing is
that the translation windows (and device address spaces in QEMU) are
only talking about second level translations, but not first level,
while virt-svm needs to play with first level translations. Until
now, AFAIU we don't really have a good interface for first level
translations at all (aka. the process address space).
IIRC one issue left over during last time's discussion was that there
could be more complicated IOMMU models. E.g., one device's DMA request
can be translated nestedly by two or more IOMMUs, and the current
proposal cannot really handle that complicated hierarchy. I'm just
thinking whether we can start from a simple model (say, we don't allow
nested IOMMUs, and actually we don't even allow multiple IOMMUs so
far), then we can evolve from that point in the future.
Also, I thought there was something you mentioned that this approach
is not correct for Power systems, but I can't really remember the
details... Anyways, I think this is not the only approach to solve
the problem, and I believe any new better idea would be greatly
welcomed as well. :)
So, some of my initial comments were based on a misunderstanding of
what was proposed here - since discussing this with Yi at LinuxCon
Beijing, I have a better idea of what's going on.
On POWER - or rather the "pseries" platform, which is paravirtualized -
we can have multiple vIOMMU windows (usually 2) for a single virtual
PCI host bridge. Because of the paravirtualization, the mapping to
hardware is fuzzy, but for passthrough devices they will both be
implemented by the IOMMU built into the physical host bridge. That
isn't important to the guest, though - all operations happen at the
window level.
Now I know that for Power it may not have anything like a "translation
unit" but everything is defined as "translation windows" in the
guests. However the problem still exists for some other platforms.
Say, for Intel we have emulated VT-d; for ARM, we have vSMMU. AFAIU
these platforms do have their translation units, and even for ARM it
should need such an interface (or any better interfaces, which are
always welcomed) for virt-svm to work. Otherwise I don't know a way
to configure the first level translation tables.

Meanwhile, IMO this abstraction should not really affect pseries - it
should be only useful for those platforms who would like to use it.
For pseries, we can just ignore that new interface if we don't really
even have such a translation unit.
The other thing that bothers me here is the way it's attached to an
AddressSpace. IIUC how SVM works, the whole point is that the device
no longer writes into a specific PCI address space. Instead, it
writes directly into a process address space. So it seems to me more
that SVM should operate at the PCI level, and disassociate the device
from the normal PCI address space entirely, rather than hooking up
something via that address space.
IMO the PCI address space is still used. For virt-svm, host IOMMU
will be working in nested translation mode, so we should be having two
mappings working in parallel:

1. DPDK process (in guest) address space mapping (GVA -> GPA)
2. guest direct memory mappings (GPA -> HPA)

And here AFAIU the 2nd mapping is working exactly like general PCI
devices, the only difference is that the 2nd level mapping is always
static, just like when IOMMU passthrough is enabled for that device.

So, IMHO virt-SVM is not really in parallel with PCI subsystem. For
the SVM in guest, it may be different, since it should only be using
first level translations. However to implement virt-SVM, IMHO we not
only need existing PCI address space translation logic, we also need
an extra way to configure the first level mappings, as discussed.

Thanks,
--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson
--
Peter Xu
David Gibson
2017-12-18 05:41:32 UTC
Permalink
Sorry I've taken so long to reply, I've been super busy with other
things.
Post by Peter Xu
Post by Liu, Yi L
AddressSpaceOps is similar to MemoryRegionOps, it's just for address
spaces to store arch-specific hooks.
The first hook I would like to introduce is iommu_get(). Return an
IOMMUObject behind the AddressSpace.
For systems that have IOMMUs, we will create a special address
space per device which is different from system default address
space for it (please refer to pci_device_iommu_address_space()).
Normally when that happens, there will be one specific IOMMU (or
say, translation unit) stands right behind that new address space.
This iommu_get() fetches that guy behind the address space. Here,
the guy is defined as IOMMUObject, which includes a notifier_list
so far, may extend in future. Along with IOMMUObject, a new iommu
notifier mechanism is introduced. It would be used for virt-svm.
Also IOMMUObject can further have a IOMMUObjectOps which is similar
to MemoryRegionOps. The difference is IOMMUObjectOps is not relied
on MemoryRegion.
Hi, sorry I didn't reply to the earlier postings of this after our
discussion in China. I've been sick several times and very busy.
I still don't feel like there's an adequate explanation of exactly
what an IOMMUObject represents. Obviously it can represent more than
a single translation window - since that's represented by the
IOMMUMR. But what exactly do all the MRs - or whatever else - that
are represented by the IOMMUObject have in common, from a functional
point of view.
Even understanding the SVM stuff better than I did, I don't really see
why an AddressSpace is an obvious unit to have an IOMMUObject
associated with it.
Here's what I thought about it: IOMMUObject was planned to be the
abstraction of the hardware translation unit, which is a higher level
of the translated address spaces. Say, for each PCI device, it can
have its own translated address space. However for multiple PCI
devices, they can be sharing the same translation unit that handles
the translation requests from different devices. That's the case for
Intel platforms. We introduced this IOMMUObject because sometimes we
want to do something with that translation unit rather than a specific
device, in which we need a general IOMMU device handle.
Ok, but what does "hardware translation unit" mean in practice. The
guest neither knows nor cares, which bits of IOMMU translation happen
to be included in the same bundle of silicon. It only cares what the
behaviour is. What behavioural characteristics does a single
IOMMUObject have?
In VT-d (I believe the same to ARM SMMUs), IMHO the special thing is
that the translation windows (and device address spaces in QEMU) are
only talking about second level translations, but not first level,
while virt-svm needs to play with first level translations. Until
now, AFAIU we don't really have a good interface for first level
translations at all (aka. the process address space).
Ok, that explains why you need some kind of different model than the
existing one, but that wasn't really disputed. It still doesn't say
why separating the unclearly defined IOMMUObject from the IOMMU MR is
a good way of modelling this.

If the issue is 1st vs. 2nd level translation, it really seems you
should be explicitly modelling the different 1st level vs. 2nd level
address spaces, rather than just splitting functions between the MR
and AS level with no clear rationale behind what goes where.
Post by Peter Xu
IIRC one issue left over during last time's discussion was that there
could be more complicated IOMMU models. E.g., one device's DMA request
can be translated nestedly by two or multiple IOMMUs, and current
proposal cannot really handle that complicated hierarchy. I'm just
thinking whether we can start from a simple model (say, we don't allow
nested IOMMUs, and actually we don't even allow multiple IOMMUs so
far), then we can evolve from that point in the future.
Also, I thought there was something you mentioned that this approach
is not correct for Power systems, but I can't really remember the
details... Anyways, I think this is not the only approach to solve
the problem, and I believe any new better idea would be greatly
welcomed as well. :)
So, some of my initial comments were based on a misunderstanding of
what was proposed here - since discussing this with Yi at LinuxCon
Beijing, I have a better idea of what's going on.
On POWER - or rather the "pseries" platform, which is paravirtualized.
We can have multiple vIOMMU windows (usually 2) for a single virtual
PCI host bridge. Because of the paravirtualization, the mapping to
hardware is fuzzy, but for passthrough devices they will both be
implemented by the IOMMU built into the physical host bridge. That
isn't important to the guest, though - all operations happen at the
window level.
Now I know that for Power it may not have anything like a "translation
unit" but everything is defined as "translation windows" in the
guests. However the problem still exist for some other platforms.
Say, for Intel we have emulated VT-d; for ARM, we have vSMMU. AFAIU
these platforms do have their translation units, and even for ARM it
should need such an interface (or any better interfaces, which are
always welcomed) for virt-svm to work. Otherwise I don't know a way
to configure the first level translation tables.
Meanwhile, IMO this abstraction should not really affect pseries - it
should be only useful for those platforms who would like to use it.
For pseries, we can just ignore that new interface if we don't really
even have such a translation unit.
But it's _still_ not clear to me what a "translation unit" means.
What is common across a translation unit that is not common across
different translation units?
Post by Peter Xu
The other thing that bothers me here is the way it's attached to an
AddressSpace. IIUC how SVM works, the whole point is that the device
no longer writes into a specific PCI address space. Instead, it
writes directly into a process address space. So it seems to me more
that SVM should operate at the PCI level, and disassociate the device
from the normal PCI address space entirely, rather than hooking up
something via that address space.
IMO the PCI address space is still used. For virt-svm, host IOMMU
will be working in nested translation mode, so we should be having two
1. DPDK process (in guest) address space mapping (GVA -> GPA)
2. guest direct memory mappings (GPA -> HPA)
And here AFAIU the 2nd mapping is working exactly like general PCI
devices, the only difference is that the 2nd level mapping is always
static, just like when IOMMU passthrough is enabled for that device.
Ok. IIUC the 2nd level mapping isn't visible to the guest, is that
right?

But this doesn't really affect my point - from the guest's point of
view, the "usual" address space for a PCI device is equal to the GPA
(i.e. no guest visible translation), but for SVM devices they instead
use an SVM address space depending on the process id, which is not the
same as the GPA space.
Post by Peter Xu
So, IMHO virt-SVM is not really in parallel with PCI subsystem.
Well, it's not clear to me if it's in parallel to, or on top of. But
the crucial point is that an SVM device does not access the "main" PCI
address space (whether or not that has a traditional IOMMU). Instead
it has access to a bunch of different address spaces, one for each
process ID.
Post by Peter Xu
For
the SVM in guest, it may be different, since it should only be using
first level translations. However to implement virt-SVM, IMHO we not
only need existing PCI address space translation logic, we also need
an extra way to configure the first level mappings, as discussed.
Right. I'm not disputing that a way to configure those first level
mappings is necessary. The proposed model just doesn't seem a good
fit. It still only represents a single "PCI address space", then
attaches things about the process table mapping to that, when in fact
they affect the process table and therefore *different* address spaces
from the one the methods are attached to.
--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson
Liu, Yi L
2017-11-16 08:57:09 UTC
Permalink
Hi David,
Post by Liu, Yi L
AddressSpaceOps is similar to MemoryRegionOps, it's just for address
spaces to store arch-specific hooks.
The first hook I would like to introduce is iommu_get(). Return an
IOMMUObject behind the AddressSpace.
For systems that have IOMMUs, we will create a special address
space per device which is different from system default address
space for it (please refer to pci_device_iommu_address_space()).
Normally when that happens, there will be one specific IOMMU (or
say, translation unit) stands right behind that new address space.
This iommu_get() fetches that guy behind the address space. Here,
the guy is defined as IOMMUObject, which includes a notifier_list
so far, may extend in future. Along with IOMMUObject, a new iommu
notifier mechanism is introduced. It would be used for virt-svm.
Also IOMMUObject can further have a IOMMUObjectOps which is similar
to MemoryRegionOps. The difference is IOMMUObjectOps is not relied
on MemoryRegion.
Hi, sorry I didn't reply to the earlier postings of this after our
discussion in China. I've been sick several times and very busy.
I still don't feel like there's an adequate explanation of exactly
what an IOMMUObject represents. Obviously it can represent more than
a single translation window - since that's represented by the
IOMMUMR. But what exactly do all the MRs - or whatever else - that
are represented by the IOMMUObject have in common, from a functional
point of view.
Even understanding the SVM stuff better than I did, I don't really see
why an AddressSpace is an obvious unit to have an IOMMUObject
associated with it.
Here's what I thought about it: IOMMUObject was planned to be the
abstraction of the hardware translation unit, which is a higher level
of the translated address spaces. Say, for each PCI device, it can
have its own translated address space. However for multiple PCI
devices, they can be sharing the same translation unit that handles
the translation requests from different devices. That's the case for
Intel platforms. We introduced this IOMMUObject because sometimes we
want to do something with that translation unit rather than a specific
device, in which we need a general IOMMU device handle.
Ok, but what does "hardware translation unit" mean in practice. The
guest neither knows nor cares, which bits of IOMMU translation happen
to be included in the same bundle of silicon. It only cares what the
behaviour is. What behavioural characteristics does a single
IOMMUObject have?
IIRC one issue left over during last time's discussion was that there
could be more complicated IOMMU models. E.g., one device's DMA request
can be translated nestedly by two or multiple IOMMUs, and current
proposal cannot really handle that complicated hierachy. I'm just
thinking whether we can start from a simple model (say, we don't allow
nested IOMMUs, and actually we don't even allow multiple IOMMUs so
far), then we can evolve from that point in the future.
Also, I thought there was something you mentioned that this approach
is not correct for Power systems, but I can't really remember the
details... Anyways, I think this is not the only approach to solve
the problem, and I believe any new better idea would be greatly
welcomed as well. :)
So, some of my initial comments were based on a misunderstanding of
what was proposed here - since discussing this with Yi at LinuxCon
Beijing, I have a better idea of what's going on.
On POWER - or rather the "pseries" platform, which is paravirtualized.
We can have multiple vIOMMU windows (usually 2) for a single virtual
On POWER, is the DMA isolation done by allocating different DMA windows
to different isolation domains? And a single isolation domain may include
multiple dma windows? So with or without an IOMMU, is there only a single
DMA address space shared by all the devices in the system? Is the isolation
mechanism as described above?
PCI host bridge. Because of the paravirtualization, the mapping to
hardware is fuzzy, but for passthrough devices they will both be
implemented by the IOMMU built into the physical host bridge. That
isn't important to the guest, though - all operations happen at the
window level.
On VT-d, with an IOMMU present, each isolation domain has its own address
space. That's why we talk more at the address space level, and the iommu makes
the difference. Those are the behavioural characteristics a single iommu
translation unit has, and thus what an IOMMUObject is going to have.
The other thing that bothers me here is the way it's attached to an
AddressSpace.
My consideration is that the iommu handles AddressSpaces. The dma address
space is also an address space managed by the iommu. That's why we believe it
is fine to associate the dma address space with an IOMMUObject.
IIUC how SVM works, the whole point is that the device
no longer writes into a specific PCI address space. Instead, it
writes directly into a process address space. So it seems to me more
that SVM should operate at the PCI level, and disassociate the device
from the normal PCI address space entirely, rather than hooking up
something via that address space.
As Peter replied, we still need the PCI address space, it would be used
to build up the 2nd level page table which would be used in nested
translation.

Thanks,
Yi L
--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson
David Gibson
2017-12-18 06:14:42 UTC
Permalink
Hi David,
Post by Liu, Yi L
AddressSpaceOps is similar to MemoryRegionOps, it's just for address
spaces to store arch-specific hooks.
The first hook I would like to introduce is iommu_get(). Return an
IOMMUObject behind the AddressSpace.
For systems that have IOMMUs, we will create a special address
space per device which is different from system default address
space for it (please refer to pci_device_iommu_address_space()).
Normally when that happens, there will be one specific IOMMU (or
say, translation unit) stands right behind that new address space.
This iommu_get() fetches that guy behind the address space. Here,
the guy is defined as IOMMUObject, which includes a notifier_list
so far, may extend in future. Along with IOMMUObject, a new iommu
notifier mechanism is introduced. It would be used for virt-svm.
Also IOMMUObject can further have a IOMMUObjectOps which is similar
to MemoryRegionOps. The difference is IOMMUObjectOps is not relied
on MemoryRegion.
Hi, sorry I didn't reply to the earlier postings of this after our
discussion in China. I've been sick several times and very busy.
I still don't feel like there's an adequate explanation of exactly
what an IOMMUObject represents. Obviously it can represent more than
a single translation window - since that's represented by the
IOMMUMR. But what exactly do all the MRs - or whatever else - that
are represented by the IOMMUObject have in common, from a functional
point of view.
Even understanding the SVM stuff better than I did, I don't really see
why an AddressSpace is an obvious unit to have an IOMMUObject
associated with it.
Here's what I thought about it: IOMMUObject was planned to be the
abstraction of the hardware translation unit, which is a higher level
of the translated address spaces. Say, for each PCI device, it can
have its own translated address space. However for multiple PCI
devices, they can be sharing the same translation unit that handles
the translation requests from different devices. That's the case for
Intel platforms. We introduced this IOMMUObject because sometimes we
want to do something with that translation unit rather than a specific
device, in which we need a general IOMMU device handle.
Ok, but what does "hardware translation unit" mean in practice. The
guest neither knows nor cares, which bits of IOMMU translation happen
to be included in the same bundle of silicon. It only cares what the
behaviour is. What behavioural characteristics does a single
IOMMUObject have?
IIRC one issue left over during last time's discussion was that there
could be more complicated IOMMU models. E.g., one device's DMA request
can be translated nestedly by two or multiple IOMMUs, and current
proposal cannot really handle that complicated hierarchy. I'm just
thinking whether we can start from a simple model (say, we don't allow
nested IOMMUs, and actually we don't even allow multiple IOMMUs so
far), then we can evolve from that point in the future.
Also, I thought there was something you mentioned that this approach
is not correct for Power systems, but I can't really remember the
details... Anyways, I think this is not the only approach to solve
the problem, and I believe any new better idea would be greatly
welcomed as well. :)
So, some of my initial comments were based on a misunderstanding of
what was proposed here - since discussing this with Yi at LinuxCon
Beijing, I have a better idea of what's going on.
On POWER - or rather the "pseries" platform, which is paravirtualized.
We can have multiple vIOMMU windows (usually 2) for a single virtual
On POWER, the DMA isolation is done by allocating different DMA window
to different isolation domains? And a single isolation domain may include
multiple dma windows? So with or without IOMMU, there is only a single
DMA address shared by all the devices in the system? The isolation
mechanism is as what described above?
No, the multiple windows are completely unrelated to how things are
isolated.

Just like on x86, each IOMMU domain has independent IOMMU mappings.
The only difference is that IBM calls the domains "partitionable
endpoints" (PEs) and they tend to be statically created at boot time,
rather than runtime generated.

The windows are about what addresses in PCI space are translated by
the IOMMU. If the device generates a PCI cycle, only certain
addresses will be mapped by the IOMMU to DMA - other addresses will
correspond to other devices' MMIOs, MSI vectors, maybe other things.

The set of addresses translated by the IOMMU need not be contiguous.
Or, there could be two IOMMUs on the bus, each accepting different
address ranges. These two situations are not distinguishable from the
guest's point of view.

So for a typical PAPR setup, the device can access system RAM either
via DMA in the range 0..1GiB (the "32-bit window") or in the range
2^59..2^59+<something> (the "64-bit window"). Typically the 32-bit
window has mappings dynamically created by drivers, and the 64-bit
window has all of system RAM mapped 1:1, but that's entirely up to the
OS, it can map each window however it wants.

32-bit devices (or "64 bit" devices which don't actually implement
enough of the address bits) will only be able to use the 32-bit window,
of course.

MMIOs of other devices, the "magic" MSI-X addresses belonging to the
host bridge and other things exist outside those ranges. Those are
just the ranges which are used to DMA to RAM.

Each PE (domain) can see a different version of what's in each
window.

In fact, if I understand the "IO hole" correctly, the situation on x86
isn't very different. It has a window below the IO hole and a second
window above the IO hole. The addresses within the IO hole go to
(32-bit) devices on the PCI bus, rather than being translated by the
IOMMU to RAM addresses. Because the gap is smaller between the two
windows, I think we get away without really modelling this detail in
qemu though.
PCI host bridge. Because of the paravirtualization, the mapping to
hardware is fuzzy, but for passthrough devices they will both be
implemented by the IOMMU built into the physical host bridge. That
isn't important to the guest, though - all operations happen at the
window level.
On VT-d, with an IOMMU present, each isolation domain has its own address
space. That's why we talked more at the address space level, and the iommu
makes the difference. Those are the behavioural characteristics a single
iommu translation unit has, and thus what an IOMMUObject is going to have.
Right, that's the same on POWER. But the IOMMU only translates *some*
addresses within the address space, not all of them. The rest will go
to other PCI devices or be unmapped, but won't go to RAM.

That's why the IOMMU should really be associated with an MR (or
several MRs), not an AddressSpace, it only translates some addresses.
The other thing that bothers me here is the way it's attached to an
AddressSpace.
My consideration is that the iommu handles AddressSpaces; the DMA address
space is also an address space managed by the iommu.
No, it's not. It's a region (or several) within the overall PCI
address space. Other things in the address space, such as other
devices' BARs, exist independent of the IOMMU.

It's not something that could really work with PCI-E, I think, but
with a more traditional PCI bus there's no reason you couldn't have
multiple IOMMUs listening on different regions of the PCI address
space.
That's why we believe it is fine to
associate the DMA address space with an IOMMUObject.
IIUC how SVM works, the whole point is that the device
no longer writes into a specific PCI address space. Instead, it
writes directly into a process address space. So it seems to me more
that SVM should operate at the PCI level, and disassociate the device
from the normal PCI address space entirely, rather than hooking up
something via that address space.
As Peter replied, we still need the PCI address space; it would be used
to build up the 2nd-level page table, which would be used in nested
translation.
Thanks,
Yi L
--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson
Liu, Yi L
2017-12-18 09:17:35 UTC
Permalink
Post by David Gibson
Hi David,
Post by Liu, Yi L
AddressSpaceOps is similar to MemoryRegionOps, it's just for address
spaces to store arch-specific hooks.
The first hook I would like to introduce is iommu_get(). Return an
IOMMUObject behind the AddressSpace.
For systems that have IOMMUs, we will create a special address
space per device which is different from system default address
space for it (please refer to pci_device_iommu_address_space()).
Normally when that happens, there will be one specific IOMMU (or
say, translation unit) stands right behind that new address space.
This iommu_get() fetches that guy behind the address space. Here,
the guy is defined as IOMMUObject, which includes a notifier_list
so far, may extend in future. Along with IOMMUObject, a new iommu
notifier mechanism is introduced. It would be used for virt-svm.
Also IOMMUObject can further have a IOMMUObjectOps which is similar
to MemoryRegionOps. The difference is IOMMUObjectOps is not relied
on MemoryRegion.
Hi, sorry I didn't reply to the earlier postings of this after our
discussion in China. I've been sick several times and very busy.
I still don't feel like there's an adequate explanation of exactly
what an IOMMUObject represents. Obviously it can represent more than
a single translation window - since that's represented by the
IOMMUMR. But what exactly do all the MRs - or whatever else - that
are represented by the IOMMUObject have in common, from a functional
point of view.
Even understanding the SVM stuff better than I did, I don't really see
why an AddressSpace is an obvious unit to have an IOMMUObject
associated with it.
Here's what I thought about it: IOMMUObject was planned to be the
abstraction of the hardware translation unit, which is a higher level
of the translated address spaces. Say, for each PCI device, it can
have its own translated address space. However for multiple PCI
devices, they can be sharing the same translation unit that handles
the translation requests from different devices. That's the case for
Intel platforms. We introduced this IOMMUObject because sometimes we
want to do something with that translation unit rather than a specific
device, in which we need a general IOMMU device handle.
Ok, but what does "hardware translation unit" mean in practice. The
guest neither knows nor cares, which bits of IOMMU translation happen
to be included in the same bundle of silicon. It only cares what the
behaviour is. What behavioural characteristics does a single
IOMMUObject have?
IIRC one issue left over during last time's discussion was that there
could be more complicated IOMMU models. E.g., one device's DMA request
can be translated nestedly by two or multiple IOMMUs, and current
proposal cannot really handle that complicated hierachy. I'm just
thinking whether we can start from a simple model (say, we don't allow
nested IOMMUs, and actually we don't even allow multiple IOMMUs so
far), then we can evolve from that point in the future.
Also, I thought there were something you mentioned that this approach
is not correct for Power systems, but I can't really remember the
details... Anyways, I think this is not the only approach to solve
the problem, and I believe any new better idea would be greatly
welcomed as well. :)
So, some of my initial comments were based on a misunderstanding of
what was proposed here - since discussing this with Yi at LinuxCon
Beijing, I have a better idea of what's going on.
On POWER - or rather the "pseries" platform, which is paravirtualized.
We can have multiple vIOMMU windows (usually 2) for a single virtual
On POWER, the DMA isolation is done by allocating different DMA window
to different isolation domains? And a single isolation domain may include
multiple dma windows? So with or withou IOMMU, there is only a single
DMA address shared by all the devices in the system? The isolation
mechanism is as what described above?
No, the multiple windows are completely unrelated to how things are
isolated.
I'm afraid I chose the wrong word by using "DMA window"...
Actually, when mentioning "DMA window", I meant address ranges in an IOVA
address space. Anyhow, let me re-shape my understanding of the POWER IOMMU
and make sure we are on the same page.
Post by David Gibson
Just like on x86, each IOMMU domain has independent IOMMU mappings.
The only difference is that IBM calls the domains "partitionable
endpoints" (PEs) and they tend to be statically created at boot time,
rather than runtime generated.
Does the POWER IOMMU also have an IOVA concept? A device can use an IOVA
to access memory, and the IOMMU translates the IOVA to an address within
the system physical address space?
Post by David Gibson
The windows are about what addresses in PCI space are translated by
the IOMMU. If the device generates a PCI cycle, only certain
addresses will be mapped by the IOMMU to DMA - other addresses will
correspond to other devices MMIOs, MSI vectors, maybe other things.
I guess the windows you mentioned here are the address ranges within the
system physical address space, since you also mentioned MMIOs etc.
Post by David Gibson
The set of addresses translated by the IOMMU need not be contiguous.
I suppose you mean the output addresses of the IOMMU need not be
contiguous?
Post by David Gibson
Or, there could be two IOMMUs on the bus, each accepting different
address ranges. These two situations are not distinguishable from the
guest's point of view.
So for a typical PAPR setup, the device can access system RAM either
via DMA in the range 0..1GiB (the "32-bit window") or in the range
2^59..2^59+<something> (the "64-bit window"). Typically the 32-bit
window has mappings dynamically created by drivers, and the 64-bit
window has all of system RAM mapped 1:1, but that's entirely up to the
OS, it can map each window however it wants.
32-bit devices (or "64 bit" devices which don't actually implement
enough the address bits) will only be able to use the 32-bit window,
of course.
MMIOs of other devices, the "magic" MSI-X addresses belonging to the
host bridge and other things exist outside those ranges. Those are
just the ranges which are used to DMA to RAM.
Each PE (domain) can see a different version of what's in each
window.
If I'm correct so far, a PE actually defines a mapping between an address
range of an address space (aka an IOVA address space) and an address range
of the system physical address space.

Then my question is: does each PE define a separate IOVA address space
which is flat from 0 to 2^AW - 1, where AW is the address width? As a
reference, a VT-d domain defines a flat address space for each domain.
Post by David Gibson
In fact, if I understand the "IO hole" correctly, the situation on x86
isn't very different. It has a window below the IO hole and a second
window above the IO hole. The addresses within the IO hole go to
(32-bit) devices on the PCI bus, rather than being translated by the
If you mean the "IO hole" within the system physical address space, I
think the answer is yes.
Post by David Gibson
IOMMU to RAM addresses. Because the gap is smaller between the two
windows, I think we get away without really modelling this detail in
qemu though.
PCI host bridge. Because of the paravirtualization, the mapping to
hardware is fuzzy, but for passthrough devices they will both be
implemented by the IOMMU built into the physical host bridge. That
isn't importat to the guest, though - all operations happen at the
window level.
On VT-d, with IOMMU presented, each isolation domain has its own address
space. That's why we talked more on address space level. And iommu makes
the difference. That's the behavioural characteristics a single iommu
translation unit has. And thus an IOMMUObject going to have.
Right, that's the same on POWER. But the IOMMU only translates *some*
addresses within the address space, not all of them. The rest will go
to other PCI devices or be unmapped, but won't go to RAM.
That's why the IOMMU should really be associated with an MR (or
several MRs), not an AddressSpace, it only translates some addresses.
If I'm correct so far, I do believe the major difference between VT-d and
the POWER IOMMU is that a VT-d isolation domain is a flat address space
while a PE on POWER is something different (need your input here as I'm
not sure about it). Maybe it's like there is a flat address space, and
each PE takes some address ranges and maps them to different system
physical address ranges.
Post by David Gibson
The other thing that bothers me here is the way it's attached to an
AddressSpace.
My consideration is iommu handles AddressSpaces. dma address space is also
an address space managed by iommu.
No, it's not. It's a region (or several) within the overall PCI
address space. Other things in the addressspace, such as other
device's BARs exist independent of the IOMMU.
It's not something that could really work with PCI-E, I think, but
with a more traditional PCI bus there's no reason you couldn't have
multiple IOMMUs listening on different regions of the PCI address
space.
I think the point here is that on POWER, the input addresses of the IOMMUs
are actually in the same address space? What the IOMMU does is map the
different ranges to different system physical address ranges. So it's as
you mentioned, multiple IOMMUs listen on different regions of a PCI
address space.

While for VT-d, that's not the case. The input addresses of the IOMMUs may
not be in the same address space. As I mentioned, each IOMMU domain on
VT-d is a separate address space. So for VT-d, the IOMMUs are actually
listening to different address spaces. That's the point why we want to
have an address space level abstraction of the IOMMU.
Post by David Gibson
That's why we believe it is fine to
associate dma address space with an IOMMUObject.
IIUC how SVM works, the whole point is that the device
no longer writes into a specific PCI address space. Instead, it
writes directly into a process address space. So it seems to me more
that SVM should operate at the PCI level, and disassociate the device
from the normal PCI address space entirely, rather than hooking up
something via that address space.
As Peter replied, we still need the PCI address space, it would be used
to build up the 2nd level page table which would be used in nested
translation.
Thanks,
Yi L
--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson
Regards,
Yi L
David Gibson
2017-12-18 11:22:18 UTC
Permalink
Post by Liu, Yi L
Post by David Gibson
Hi David,
Post by Liu, Yi L
AddressSpaceOps is similar to MemoryRegionOps, it's just for address
spaces to store arch-specific hooks.
The first hook I would like to introduce is iommu_get(). Return an
IOMMUObject behind the AddressSpace.
For systems that have IOMMUs, we will create a special address
space per device which is different from system default address
space for it (please refer to pci_device_iommu_address_space()).
Normally when that happens, there will be one specific IOMMU (or
say, translation unit) stands right behind that new address space.
This iommu_get() fetches that guy behind the address space. Here,
the guy is defined as IOMMUObject, which includes a notifier_list
so far, may extend in future. Along with IOMMUObject, a new iommu
notifier mechanism is introduced. It would be used for virt-svm.
Also IOMMUObject can further have a IOMMUObjectOps which is similar
to MemoryRegionOps. The difference is IOMMUObjectOps is not relied
on MemoryRegion.
Hi, sorry I didn't reply to the earlier postings of this after our
discussion in China. I've been sick several times and very busy.
I still don't feel like there's an adequate explanation of exactly
what an IOMMUObject represents. Obviously it can represent more than
a single translation window - since that's represented by the
IOMMUMR. But what exactly do all the MRs - or whatever else - that
are represented by the IOMMUObject have in common, from a functional
point of view.
Even understanding the SVM stuff better than I did, I don't really see
why an AddressSpace is an obvious unit to have an IOMMUObject
associated with it.
Here's what I thought about it: IOMMUObject was planned to be the
abstraction of the hardware translation unit, which is a higher level
of the translated address spaces. Say, for each PCI device, it can
have its own translated address space. However for multiple PCI
devices, they can be sharing the same translation unit that handles
the translation requests from different devices. That's the case for
Intel platforms. We introduced this IOMMUObject because sometimes we
want to do something with that translation unit rather than a specific
device, in which we need a general IOMMU device handle.
Ok, but what does "hardware translation unit" mean in practice. The
guest neither knows nor cares, which bits of IOMMU translation happen
to be included in the same bundle of silicon. It only cares what the
behaviour is. What behavioural characteristics does a single
IOMMUObject have?
IIRC one issue left over during last time's discussion was that there
could be more complicated IOMMU models. E.g., one device's DMA request
can be translated nestedly by two or multiple IOMMUs, and current
proposal cannot really handle that complicated hierachy. I'm just
thinking whether we can start from a simple model (say, we don't allow
nested IOMMUs, and actually we don't even allow multiple IOMMUs so
far), then we can evolve from that point in the future.
Also, I thought there were something you mentioned that this approach
is not correct for Power systems, but I can't really remember the
details... Anyways, I think this is not the only approach to solve
the problem, and I believe any new better idea would be greatly
welcomed as well. :)
So, some of my initial comments were based on a misunderstanding of
what was proposed here - since discussing this with Yi at LinuxCon
Beijing, I have a better idea of what's going on.
On POWER - or rather the "pseries" platform, which is paravirtualized.
We can have multiple vIOMMU windows (usually 2) for a single virtual
On POWER, the DMA isolation is done by allocating different DMA window
to different isolation domains? And a single isolation domain may include
multiple dma windows? So with or withou IOMMU, there is only a single
DMA address shared by all the devices in the system? The isolation
mechanism is as what described above?
No, the multiple windows are completely unrelated to how things are
isolated.
I'm afraid I chose a wrong word by using "DMA window"..
Actually, when mentioning "DMA window", I mean address ranges in an iova
address space.
Yes, so did I. By one window I mean one contiguous range of IOVA addresses.
Post by Liu, Yi L
Anyhow, let me re-shape my understanding of POWER IOMMU and
make sure we are in the same page.
Post by David Gibson
Just like on x86, each IOMMU domain has independent IOMMU mappings.
The only difference is that IBM calls the domains "partitionable
endpoints" (PEs) and they tend to be statically created at boot time,
rather than runtime generated.
Does POWER IOMMU also have iova concept? Device can use an iova to
access memory, and IOMMU translates the iova to an address within the
system physical address?
Yes. When I say the "PCI address space" I mean the IOVA space.
Post by Liu, Yi L
Post by David Gibson
The windows are about what addresses in PCI space are translated by
the IOMMU. If the device generates a PCI cycle, only certain
addresses will be mapped by the IOMMU to DMA - other addresses will
correspond to other devices MMIOs, MSI vectors, maybe other things.
I guess the windows you mentioned here is the address ranges within the
system physical address space as you also mentioned MMIOs and etc.
No. I mean ranges within the PCI space == IOVA space. It's simplest
to understand with traditional PCI. A cycle on the bus doesn't know
whether the destination is a device or memory, it just has an address
- a PCI memory address. Part of that address range is mapped to
system RAM, optionally with an IOMMU translating it. Other parts of
that address space are used for devices.

With PCI-E things get more complicated, but the conceptual model is
the same.
Post by Liu, Yi L
Post by David Gibson
The set of addresses translated by the IOMMU need not be contiguous.
I suppose you mean the output addresses of the IOMMU need not be
contiguous?
No. I mean the input addresses of the IOMMU.
Post by Liu, Yi L
Post by David Gibson
Or, there could be two IOMMUs on the bus, each accepting different
address ranges. These two situations are not distinguishable from the
guest's point of view.
So for a typical PAPR setup, the device can access system RAM either
via DMA in the range 0..1GiB (the "32-bit window") or in the range
2^59..2^59+<something> (the "64-bit window"). Typically the 32-bit
window has mappings dynamically created by drivers, and the 64-bit
window has all of system RAM mapped 1:1, but that's entirely up to the
OS, it can map each window however it wants.
32-bit devices (or "64 bit" devices which don't actually implement
enough the address bits) will only be able to use the 32-bit window,
of course.
MMIOs of other devices, the "magic" MSI-X addresses belonging to the
host bridge and other things exist outside those ranges. Those are
just the ranges which are used to DMA to RAM.
Each PE (domain) can see a different version of what's in each
window.
If I'm correct so far. PE actually defines a mapping between an address
range of an address space(aka. iova address space) and an address range
of the system physical address space.
No. A PE means several things, but basically it is an isolation
domain, like an Intel IOMMU domain. Each PE has an independent set of
IOMMU mappings which translate part of the PCI address space to system
memory space.
Post by Liu, Yi L
Then my question is: does each PE define a separate iova address sapce
which is flat from 0 - 2^AW -1, AW is address width? As a reference,
VT-d domain defines a flat address space for each domain.
Partly. Each PE has an address space which all devices in the PE see.
Only some of that address space is mapped to system memory though,
other parts are occupied by devices, others are unmapped.

Only the parts mapped by the IOMMU vary between PEs - the other parts
of the address space will be identical for all PEs on the host
bridge. However for POWER guests (not for hosts) there is exactly one
PE for each virtual host bridge.
Post by Liu, Yi L
Post by David Gibson
In fact, if I understand the "IO hole" correctly, the situation on x86
isn't very different. It has a window below the IO hole and a second
window above the IO hole. The addresses within the IO hole go to
(32-bit) devices on the PCI bus, rather than being translated by the
If you mean the "IO hole" within the system physcial address space, I think
it's yes.
Well, really I mean the IO hole in PCI address space. Because system
address space and PCI memory space were traditionally identity mapped
on x86 this is easy to confuse though.
Post by Liu, Yi L
Post by David Gibson
IOMMU to RAM addresses. Because the gap is smaller between the two
windows, I think we get away without really modelling this detail in
qemu though.
PCI host bridge. Because of the paravirtualization, the mapping to
hardware is fuzzy, but for passthrough devices they will both be
implemented by the IOMMU built into the physical host bridge. That
isn't importat to the guest, though - all operations happen at the
window level.
On VT-d, with IOMMU presented, each isolation domain has its own address
space. That's why we talked more on address space level. And iommu makes
the difference. That's the behavioural characteristics a single iommu
translation unit has. And thus an IOMMUObject going to have.
Right, that's the same on POWER. But the IOMMU only translates *some*
addresses within the address space, not all of them. The rest will go
to other PCI devices or be unmapped, but won't go to RAM.
That's why the IOMMU should really be associated with an MR (or
several MRs), not an AddressSpace, it only translates some addresses.
If I'm correct so far, I do believe the major difference between VT-d and
POWER IOMMU is that VT-d isolation domain is a flat address space while
PE of POWER is something different(need your input here as I'm not sure about
it). Maybe it's like there is a flat address space, each PE takes some address
ranges and maps the address ranges to different system physcial address ranges.
No, it's really not that different. In both cases (without virt-SVM)
there's a system memory address space, and a PCI address space for
each domain / PE. There are one or more "outbound" windows in system
memory space that map system memory cycles to PCI cycles (used by the
CPU to access MMIO) and one or more "inbound" (DMA) windows in PCI
memory space which map PCI cycles onto system memory cycles (used by
devices to access system memory).

On old-style PCs, both inbound and outbound windows were (mostly)
identity maps. On POWER they are not.
Post by Liu, Yi L
Post by David Gibson
The other thing that bothers me here is the way it's attached to an
AddressSpace.
My consideration is iommu handles AddressSpaces. dma address space is also
an address space managed by iommu.
No, it's not. It's a region (or several) within the overall PCI
address space. Other things in the addressspace, such as other
device's BARs exist independent of the IOMMU.
It's not something that could really work with PCI-E, I think, but
with a more traditional PCI bus there's no reason you couldn't have
multiple IOMMUs listening on different regions of the PCI address
space.
I think the point here is on POWER, the input addresses of IOMMUs are actaully
in the same address space?
I'm not sure what you mean, but I don't think so. Each PE has its own
IOMMU input address space.
Post by Liu, Yi L
What IOMMU does is mapping the different ranges to
different system physcial address ranges. So it's as you mentioned, multiple
IOMMUs listen on different regions of a PCI address space.
No. That could be the case in theory, but it's not the usual case.

Or rather it depends on what you mean by "an IOMMU". For PAPR guests,
both IOVA 0..1GiB and 2^59..(somewhere) are mapped to system memory,
but with separate page tables. You could consider that two IOMMUs (we
mostly treat it that way in qemu). However, all the mapping is
handled by the same host bridge with 2 sets of page tables per PE, so
you could also call it one IOMMU.

This is what I'm getting at when I say that "one IOMMU" is not a
clearly defined unit.
Post by Liu, Yi L
While for VT-d, it's not the case. The input addresses of IOMMUs may not
in the same address sapce. As I mentioned, each IOMMU domain on VT-d is a
separate address space. So for VT-d, IOMMUs are actually listening to different
address spaces. That's the point why we want to have address space level
abstraction of IOMMU.
Post by David Gibson
That's why we believe it is fine to
associate dma address space with an IOMMUObject.
IIUC how SVM works, the whole point is that the device
no longer writes into a specific PCI address space. Instead, it
writes directly into a process address space. So it seems to me more
that SVM should operate at the PCI level, and disassociate the device
from the normal PCI address space entirely, rather than hooking up
something via that address space.
As Peter replied, we still need the PCI address space, it would be used
to build up the 2nd level page table which would be used in nested
translation.
Thanks,
Yi L
Regards,
Yi L
--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson
Liu, Yi L
2017-12-20 06:32:42 UTC
Permalink
Post by David Gibson
Post by Liu, Yi L
Post by David Gibson
Hi David,
Post by Liu, Yi L
AddressSpaceOps is similar to MemoryRegionOps, it's just for address
spaces to store arch-specific hooks.
The first hook I would like to introduce is iommu_get(). Return an
IOMMUObject behind the AddressSpace.
For systems that have IOMMUs, we will create a special address
space per device which is different from system default address
space for it (please refer to pci_device_iommu_address_space()).
Normally when that happens, there will be one specific IOMMU (or
say, translation unit) stands right behind that new address space.
This iommu_get() fetches that guy behind the address space. Here,
the guy is defined as IOMMUObject, which includes a notifier_list
so far, may extend in future. Along with IOMMUObject, a new iommu
notifier mechanism is introduced. It would be used for virt-svm.
Also IOMMUObject can further have a IOMMUObjectOps which is similar
to MemoryRegionOps. The difference is IOMMUObjectOps is not relied
on MemoryRegion.
Hi, sorry I didn't reply to the earlier postings of this after our
discussion in China. I've been sick several times and very busy.
I still don't feel like there's an adequate explanation of exactly
what an IOMMUObject represents. Obviously it can represent more than
a single translation window - since that's represented by the
IOMMUMR. But what exactly do all the MRs - or whatever else - that
are represented by the IOMMUObject have in common, from a functional
point of view.
Even understanding the SVM stuff better than I did, I don't really see
why an AddressSpace is an obvious unit to have an IOMMUObject
associated with it.
Here's what I thought about it: IOMMUObject was planned to be the
abstraction of the hardware translation unit, which is a higher level
of the translated address spaces. Say, for each PCI device, it can
have its own translated address space. However for multiple PCI
devices, they can be sharing the same translation unit that handles
the translation requests from different devices. That's the case for
Intel platforms. We introduced this IOMMUObject because sometimes we
want to do something with that translation unit rather than a specific
device, in which we need a general IOMMU device handle.
Ok, but what does "hardware translation unit" mean in practice. The
guest neither knows nor cares, which bits of IOMMU translation happen
to be included in the same bundle of silicon. It only cares what the
behaviour is. What behavioural characteristics does a single
IOMMUObject have?
IIRC one issue left over during last time's discussion was that there
could be more complicated IOMMU models. E.g., one device's DMA request
can be translated nestedly by two or multiple IOMMUs, and current
proposal cannot really handle that complicated hierachy. I'm just
thinking whether we can start from a simple model (say, we don't allow
nested IOMMUs, and actually we don't even allow multiple IOMMUs so
far), then we can evolve from that point in the future.
Also, I thought there were something you mentioned that this approach
is not correct for Power systems, but I can't really remember the
details... Anyways, I think this is not the only approach to solve
the problem, and I believe any new better idea would be greatly
welcomed as well. :)
So, some of my initial comments were based on a misunderstanding of
what was proposed here - since discussing this with Yi at LinuxCon
Beijing, I have a better idea of what's going on.
On POWER - or rather the "pseries" platform, which is paravirtualized.
We can have multiple vIOMMU windows (usually 2) for a single virtual
On POWER, the DMA isolation is done by allocating different DMA window
to different isolation domains? And a single isolation domain may include
multiple dma windows? So with or withou IOMMU, there is only a single
DMA address shared by all the devices in the system? The isolation
mechanism is as what described above?
No, the multiple windows are completely unrelated to how things are
isolated.
I'm afraid I chose a wrong word by using "DMA window"..
Actually, when mentioning "DMA window", I mean address ranges in an iova
address space.
Yes, so did I. My one window I mean one contiguous range of IOVA addresses.
Post by Liu, Yi L
Anyhow, let me re-shape my understanding of POWER IOMMU and
make sure we are in the same page.
Post by David Gibson
Just like on x86, each IOMMU domain has independent IOMMU mappings.
The only difference is that IBM calls the domains "partitionable
endpoints" (PEs) and they tend to be statically created at boot time,
rather than runtime generated.
Does POWER IOMMU also have iova concept? Device can use an iova to
access memory, and IOMMU translates the iova to an address within the
system physical address?
Yes. When I say the "PCI address space" I mean the IOVA space.
Post by Liu, Yi L
Post by David Gibson
The windows are about what addresses in PCI space are translated by
the IOMMU. If the device generates a PCI cycle, only certain
addresses will be mapped by the IOMMU to DMA - other addresses will
correspond to other devices MMIOs, MSI vectors, maybe other things.
I guess the windows you mentioned here is the address ranges within the
system physical address space as you also mentioned MMIOs and etc.
No. I mean ranges within the PCI space == IOVA space. It's simplest
to understand with traditional PCI. A cycle on the bus doesn't know
whether the destination is a device or memory, it just has an address
- a PCI memory address. Part of that address range is mapped to
system RAM, optionally with an IOMMU translating it. Other parts of
that address space are used for devices.
With PCI-E things get more complicated, but the conceptual model is
the same.
Post by Liu, Yi L
Post by David Gibson
The set of addresses translated by the IOMMU need not be contiguous.
I suppose you mean the output addresses of the IOMMU need not be
contiguous?
No. I mean the input addresses of the IOMMU.
Post by Liu, Yi L
Post by David Gibson
Or, there could be two IOMMUs on the bus, each accepting different
address ranges. These two situations are not distinguishable from the
guest's point of view.
So for a typical PAPR setup, the device can access system RAM either
via DMA in the range 0..1GiB (the "32-bit window") or in the range
2^59..2^59+<something> (the "64-bit window"). Typically the 32-bit
window has mappings dynamically created by drivers, and the 64-bit
window has all of system RAM mapped 1:1, but that's entirely up to the
OS, it can map each window however it wants.
32-bit devices (or "64 bit" devices which don't actually implement
enough the address bits) will only be able to use the 32-bit window,
of course.
MMIOs of other devices, the "magic" MSI-X addresses belonging to the
host bridge and other things exist outside those ranges. Those are
just the ranges which are used to DMA to RAM.
Each PE (domain) can see a different version of what's in each
window.
If I'm correct so far. PE actually defines a mapping between an address
range of an address space(aka. iova address space) and an address range
of the system physical address space.
No. A PE means several things, but basically it is an isolation
domain, like an Intel IOMMU domain. Each PE has an independent set of
IOMMU mappings which translate part of the PCI address space to system
memory space.
Post by Liu, Yi L
Then my question is: does each PE define a separate iova address sapce
which is flat from 0 - 2^AW -1, AW is address width? As a reference,
VT-d domain defines a flat address space for each domain.
Partly. Each PE has an address space which all devices in the PE see.
Only some of that address space is mapped to system memory though,
other parts are occupied by devices, others are unmapped.
Only the parts mapped by the IOMMU vary between PEs - the other parts
of the address space will be identical for all PEs on the host
Thanks, this comment addresses my question well. This is different from
what we have on VT-d.
Post by David Gibson
bridge. However for POWER guests (not for hosts) there is exactly one
PE for each virtual host bridge.
Post by Liu, Yi L
Post by David Gibson
In fact, if I understand the "IO hole" correctly, the situation on x86
isn't very different. It has a window below the IO hole and a second
window above the IO hole. The addresses within the IO hole go to
(32-bit) devices on the PCI bus, rather than being translated by the
If you mean the "IO hole" within the system physcial address space, I think
it's yes.
Well, really I mean the IO hole in PCI address space. Because system
address space and PCI memory space were traditionally identity mapped
on x86 this is easy to confuse though.
Post by Liu, Yi L
Post by David Gibson
IOMMU to RAM addresses. Because the gap is smaller between the two
windows, I think we get away without really modelling this detail in
qemu though.
PCI host bridge. Because of the paravirtualization, the mapping to
hardware is fuzzy, but for passthrough devices they will both be
implemented by the IOMMU built into the physical host bridge. That
isn't importat to the guest, though - all operations happen at the
window level.
On VT-d, with IOMMU presented, each isolation domain has its own address
space. That's why we talked more on address space level. And iommu makes
the difference. That's the behavioural characteristics a single iommu
translation unit has. And thus an IOMMUObject going to have.
Right, that's the same on POWER. But the IOMMU only translates *some*
addresses within the address space, not all of them. The rest will go
to other PCI devices or be unmapped, but won't go to RAM.
That's why the IOMMU should really be associated with an MR (or
several MRs), not an AddressSpace, it only translates some addresses.
If I'm correct so far, I do believe the major difference between VT-d and
POWER IOMMU is that VT-d isolation domain is a flat address space while
PE of POWER is something different(need your input here as I'm not sure about
it). Maybe it's like there is a flat address space, each PE takes some address
ranges and maps the address ranges to different system physcial address ranges.
No, it's really not that different. In both cases (without virt-SVM)
there's a system memory address space, and a PCI address space for
each domain / PE. There are one or more "outbound" windows in system
memory space that map system memory cycles to PCI cycles (used by the
CPU to access MMIO) and one or more "inbound" (DMA) windows in PCI
memory space which map PCI cycles onto system memory cycles (used by
devices to access system memory).
On old-style PCs, both inbound and outbound windows were (mostly)
identity maps. On POWER they are not.
Post by Liu, Yi L
Post by David Gibson
The other thing that bothers me here is the way it's attached to an
AddressSpace.
My consideration is iommu handles AddressSpaces. dma address space is also
an address space managed by iommu.
No, it's not. It's a region (or several) within the overall PCI
address space. Other things in the addressspace, such as other
device's BARs exist independent of the IOMMU.
It's not something that could really work with PCI-E, I think, but
with a more traditional PCI bus there's no reason you couldn't have
multiple IOMMUs listening on different regions of the PCI address
space.
I think the point here is on POWER, the input addresses of IOMMUs are actaully
in the same address space?
I'm not sure what you mean, but I don't think so. Each PE has its own
IOMMU input address space.
Post by Liu, Yi L
What IOMMU does is mapping the different ranges to
different system physcial address ranges. So it's as you mentioned, multiple
IOMMUs listen on different regions of a PCI address space.
No. That could be the case in theory, but it's not the usual case.
Or rather it depends what you mean by "an IOMMU". For PAPR guests,
both IOVA 0..1GiB and 2^59..(somewhere) are mapped to system memory,
but with separate page tables. You could consider that two IOMMUs (we
mostly treat it that way in qemu). However, all the mapping is
handled by the same host bridge with 2 sets of page tables per PE, so
you could also call it one IOMMU.
This is what I'm getting at when I say that "one IOMMU" is not a
clearly defined unit.
Post by Liu, Yi L
While for VT-d, it's not the case. The input addresses of IOMMUs may not
in the same address sapce. As I mentioned, each IOMMU domain on VT-d is a
separate address space. So for VT-d, IOMMUs are actually listening to different
address spaces. That's the point why we want to have address space level
abstraction of IOMMU.
Post by David Gibson
That's why we believe it is fine to
associate dma address space with an IOMMUObject.
IIUC how SVM works, the whole point is that the device
no longer writes into a specific PCI address space. Instead, it
writes directly into a process address space. So it seems to me more
that SVM should operate at the PCI level, and disassociate the device
from the normal PCI address space entirely, rather than hooking up
something via that address space.
After thinking more, I agree that it is not suitable to hook up something
for the 1st level via the PCI address space. Once both 1st- and 2nd-level
translation are exposed to the guest, a device would write to multiple
address spaces; the PCI address space is only one of them. I think your
reply in another email is a good start, so let me reply with my thoughts
under that email.

Regards,
Yi L
Post by David Gibson
Post by Liu, Yi L
Post by David Gibson
As Peter replied, we still need the PCI address space, it would be used
to build up the 2nd level page table which would be used in nested
translation.
Thanks,
Yi L
Regards,
Yi L
--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson
David Gibson
2017-12-20 11:01:10 UTC
Permalink
[snip]
Post by Liu, Yi L
Post by David Gibson
Post by Liu, Yi L
Post by David Gibson
So for a typical PAPR setup, the device can access system RAM either
via DMA in the range 0..1GiB (the "32-bit window") or in the range
2^59..2^59+<something> (the "64-bit window"). Typically the 32-bit
window has mappings dynamically created by drivers, and the 64-bit
window has all of system RAM mapped 1:1, but that's entirely up to the
OS, it can map each window however it wants.
32-bit devices (or "64 bit" devices which don't actually implement
enough the address bits) will only be able to use the 32-bit window,
of course.
MMIOs of other devices, the "magic" MSI-X addresses belonging to the
host bridge and other things exist outside those ranges. Those are
just the ranges which are used to DMA to RAM.
Each PE (domain) can see a different version of what's in each
window.
If I'm correct so far. PE actually defines a mapping between an address
range of an address space(aka. iova address space) and an address range
of the system physical address space.
No. A PE means several things, but basically it is an isolation
domain, like an Intel IOMMU domain. Each PE has an independent set of
IOMMU mappings which translate part of the PCI address space to system
memory space.
Post by Liu, Yi L
Then my question is: does each PE define a separate iova address sapce
which is flat from 0 - 2^AW -1, AW is address width? As a reference,
VT-d domain defines a flat address space for each domain.
Partly. Each PE has an address space which all devices in the PE see.
Only some of that address space is mapped to system memory though,
other parts are occupied by devices, others are unmapped.
Only the parts mapped by the IOMMU vary between PEs - the other parts
of the address space will be identical for all PEs on the host
Thx, this comment addressed me well. This is different from what we have
on VT-d.
Really? That's hard to believe. I'm pretty sure the VT-d IOMMU must
have a range < 2^64, and anything on the bus outside that range I
expect would be common between all domains. In particular I'd expect
the BARs for other devices not to be remapped by the IOMMU (though
they may be inaccessible on PCI-E due to peer-to-peer transactions being
blocked). As well as things above the IOMMU's range, I'd expect the
region for 32-bit BARs to be common between all domains.

[snip]
Post by Liu, Yi L
Post by David Gibson
Post by Liu, Yi L
Post by David Gibson
Post by Liu, Yi L
IIUC how SVM works, the whole point is that the device
no longer writes into a specific PCI address space. Instead, it
writes directly into a process address space. So it seems to me more
that SVM should operate at the PCI level, and disassociate the device
from the normal PCI address space entirely, rather than hooking up
something via that address space.
After thinking more, I agree that it is not suitable to hook up something for
1st level via the PCI address space. In the time 1st and 2nd level translation
is exposed to guest, a device would write to multiple address spaces. PCI address
space is only one of them. I think your reply in another email is a good start,
let me reply my thoughts under that email.
Regards,
Yi L
Post by David Gibson
Post by Liu, Yi L
Post by David Gibson
Post by Liu, Yi L
As Peter replied, we still need the PCI address space, it would be used
to build up the 2nd level page table which would be used in nested
translation.
Thanks,
Yi L
Regards,
Yi L
--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson
Liu, Yi L
2017-12-22 06:47:28 UTC
Permalink
[snip]
Post by David Gibson
Post by Liu, Yi L
Post by David Gibson
Partly. Each PE has an address space which all devices in the PE see.
Only some of that address space is mapped to system memory though,
other parts are occupied by devices, others are unmapped.
Only the parts mapped by the IOMMU vary between PEs - the other parts
of the address space will be identical for all PEs on the host
Thx, this comment addressed me well. This is different from what we have
on VT-d.
Really? That's hard to believe. I'm pretty sure the VT-d IOMMU must
have a range < 2^64, and anything on the bus outside that range I
expect would be common between all domains. In particular I'd expect
the BARs for other devices not to be remapped by the IOMMU (though
they may be inaccessible on PCI-E due peer to peer transactions being
blocked). As well as things above the IOMMU's range, I'd expect the
region for 32-bit BARs to be common between all domains.
Sorry, I misunderstood you. In each IOVA space there is a reserved range:
the BARs' MMIO range. Such a reservation is to avoid unexpected
peer-to-peer transactions. So with regard to the IOVA space, all vendors
should be similar. So you are right~

Thanks,
Yi L
David Gibson
2017-12-18 06:30:55 UTC
Permalink
Hi Yi L,
Post by Liu, Yi L
AddressSpaceOps is similar to MemoryRegionOps, it's just for address
spaces to store arch-specific hooks.
The first hook I would like to introduce is iommu_get(). Return an
IOMMUObject behind the AddressSpace.
For systems that have IOMMUs, we will create a special address
space per device which is different from system default address
space for it (please refer to pci_device_iommu_address_space()).
Normally when that happens, there will be one specific IOMMU (or
say, translation unit) stands right behind that new address space.
This iommu_get() fetches that guy behind the address space. Here,
the guy is defined as IOMMUObject, which includes a notifier_list
so far, may extend in future. Along with IOMMUObject, a new iommu
notifier mechanism is introduced. It would be used for virt-svm.
Also IOMMUObject can further have a IOMMUObjectOps which is similar
to MemoryRegionOps. The difference is IOMMUObjectOps is not relied
on MemoryRegion.
Hi, sorry I didn't reply to the earlier postings of this after our
discussion in China. I've been sick several times and very busy.
Hi David,
Fully understood. I'll try my best to address your questions. Also,
feel free to raise further questions; the more we discuss, the better
the work we get done.
I still don't feel like there's an adequate explanation of exactly
what an IOMMUObject represents. Obviously it can represent more than
IOMMUObject is aimed at representing the iommu itself, e.g. the
iommu-specific operations. One of the key purposes of IOMMUObject is to
introduce a notifier framework that lets the iommu emulator do iommu
operations other than MAP/UNMAP. As IOMMUs grow more and more features,
MAP/UNMAP is not the only operation the iommu emulator needs to handle,
e.g. shared virtual memory. As far as I know, AMD/ARM also have it; you
may correct me on that. As my cover letter mentioned, the MR-based
notifier framework doesn't work for the newly added IOMMU operations,
like binding the guest pasid table pointer to the host and propagating
the guest's iotlb flushes to the host.
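
Not the code from this series, just a compilable toy model of that point, so
it is clear what the notifier_list on the IOMMUObject buys us; the event names
below are made up for illustration:

    /* Events beyond MAP/UNMAP that a vIOMMU emulator may need to signal. */
    enum {
        IOMMU_EVT_BIND_PASID_TABLE,    /* guest programmed a pasid table pointer */
        IOMMU_EVT_FLUSH_1ST_LEVEL_TLB  /* guest flushed a 1st-level (gVA) IOTLB  */
    };

    typedef struct Notifier {
        void (*notify)(struct Notifier *n, int event, void *data);
        struct Notifier *next;
    } Notifier;

    /* Stand-in for the IOMMUObject: it carries a notifier list that is not
     * tied to any MemoryRegion, so it keeps working even when the guest
     * bypasses IOVA translation and the IOMMU MR is switched away. */
    typedef struct IOMMUObjectModel {
        Notifier *notifier_list;
    } IOMMUObjectModel;

    static void iommu_object_notify(IOMMUObjectModel *io, int event, void *data)
    {
        for (Notifier *n = io->notifier_list; n; n = n->next) {
            n->notify(n, event, data);  /* e.g. VFIO forwards it to the host */
        }
    }

In the series itself the consumer is VFIO, which would forward such events to
the host through the container.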
a single translation window - since that's represented by the
IOMMUMR. But what exactly do all the MRs - or whatever else - that
are represented by the IOMMUObject have in common, from a functional
point of view.
Let me take virt-SVM as an example. As far as I know, for virt-SVM the
implementations of different vendors are similar. The key design is to
have nested translation (aka two-stage translation): the guest maintains
the gVA->gPA mapping and the hypervisor builds the gPA->hPA mapping,
similar to the EPT-based virt-MMU solution.
In Qemu, the gPA->hPA mapping is done through the MAP/UNMAP notifier, and
that can keep going. But for the gVA->gPA mapping, only the guest knows
it, so the hypervisor needs to trap the specific guest iommu operation and
pass the gVA->gPA mapping knowledge to the host through a (newly added)
notifier. In VT-d, this is called binding the guest pasid table to the
host.
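
As a conceptual sketch only (this is not VT-d code), the nested walk the
hardware ends up doing looks like:

    #include <stdint.h>

    /* Nested (two-stage) translation: stage 1 is owned by the guest
     * (gVA -> gPA, walked through the guest's PASID/CR3 tables), stage 2
     * is owned by the host (gPA -> hPA, kept in sync via MAP/UNMAP). */
    typedef uint64_t (*translate_fn)(uint64_t addr);

    static uint64_t nested_translate(uint64_t gva,
                                     translate_fn stage1_gva_to_gpa,
                                     translate_fn stage2_gpa_to_hpa)
    {
        uint64_t gpa = stage1_gva_to_gpa(gva);  /* guest-controlled tables   */
        return stage2_gpa_to_hpa(gpa);          /* host-controlled 2nd level */
    }

Binding the guest pasid table just tells the host IOMMU where to find the
stage-1 tables; the stage-2 tables are still the ones built from the
MAP/UNMAP notifications.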
What I don't get is that the PASID table is per extended context entry. I
understand the latter is indexed by PCI device/function. And today MRs are
created per PCIe device, if I am not wrong. So why can't we have one new
MR notifier dedicated to PASID table passing? My understanding is that the
MR, having a 1:1 correspondence with a PCIe device and thus a context,
could be of the right granularity.
Not really. The MR(s) and AS are created per group of devices which
will always see the same mappings. On Intel that's the IOMMU domain.
On PAPR that's a partitionable endpoint - except that we choose to
only have one PE per guest host bridge (but multiple host bridges are
standard for POWER).

There's a qemu hook to get the right AS for a device, which takes the
devfn as a parameter. Depending on the host bridge implementation,
though, it won't necessarily return a different AS for every
device.
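
Roughly, that hook looks like the sketch below. The includes and signatures
are quoted from memory of the current tree, so treat them as approximate,
and MyPHBState is a made-up host bridge type:

    #include "hw/pci/pci.h"     /* pci_setup_iommu(), PCIBus */
    #include "exec/memory.h"    /* AddressSpace              */

    static AddressSpace *my_phb_dma_as(PCIBus *bus, void *opaque, int devfn)
    {
        MyPHBState *s = opaque;  /* hypothetical bridge state holding dma_as */

        /* A PAPR-like bridge: every device behind the bridge shares one PE,
         * hence one DMA address space.  A VT-d-like bridge would instead
         * look up (or create) a per-devfn AddressSpace here. */
        return &s->dma_as;
    }

    /* Registered once at host-bridge realize time:
     *     pci_setup_iommu(bus, my_phb_dma_as, s);
     * pci_device_iommu_address_space(pdev) then returns whatever the hook
     * gives back for that device's devfn. */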
Then I understand the only flags we currently have are NONE, MAP and
UNMAP, but couldn't we add a new one for PASID TABLE passing? So it is
not crystal clear to me why MR notifiers are not suited to PASID table
passing.
Right, it's not clear to me either. Things get more complicated if both the 1st
level (per PASID) and 2nd level translations (per PCI RID) are visible
to the guest. Having level 1 owned by the guest and 2nd level owned
by the host is the typical mode of operation, but if we want to model
bare metal machines we do need to handle the case of both. Similarly,
implementing virt-SVM can't go and break our modelling of
"traditional" non-PASID aware IOMMUs. Those are not usually present
in x86 guests, although they can be, and they are *always* present for
PAPR guests.
Also, for the gVA iotlb flushing, only the guest knows it, so the
hypervisor needs to propagate it to the host. Here, MAP/UNMAP is not
suitable since this gVA iotlb flush doesn't require modifying the host
iommu translation table.
I don't really get this argument. IOMMUNotifier is just a notifier that
is attached to an IOMMU MR and calls an IOMMUNotify function, right?
Then the role of the function is currently tied to the existing flags,
MAP and UNMAP. This is not linked to an action on the physical IOMMU,
right?
Maybe, maybe not. In the case of emulated devices, it need not touch
the host MMU. However, for the case of VFIO devices, we need to
mirror mappings in the guest IOMMU to the host IOMMU.
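
For reference, the mirroring boils down to the VFIO type1 ioctl. A
stripped-down sketch of what happens on a guest MAP event (error handling
omitted, and this is not the literal hw/vfio/common.c code):

    #include <stdint.h>
    #include <sys/ioctl.h>
    #include <linux/vfio.h>

    /* Mirror one guest IOMMU mapping into the host IOMMU through the VFIO
     * container fd: iova is the guest IOVA, vaddr the QEMU virtual address
     * backing the guest-physical target. */
    static int mirror_map_to_host(int container_fd, uint64_t iova,
                                  uint64_t size, void *vaddr)
    {
        struct vfio_iommu_type1_dma_map map = {
            .argsz = sizeof(map),
            .flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE,
            .vaddr = (uintptr_t)vaddr,
            .iova  = iova,
            .size  = size,
        };
        return ioctl(container_fd, VFIO_IOMMU_MAP_DMA, &map);
    }

Emulated devices never reach this path; their DMA is translated entirely
inside QEMU.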
--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson
David Gibson
2017-12-18 11:38:26 UTC
Permalink
Hi Yi L,
Post by Liu, Yi L
AddressSpaceOps is similar to MemoryRegionOps, it's just for address
spaces to store arch-specific hooks.
The first hook I would like to introduce is iommu_get(). Return an
IOMMUObject behind the AddressSpace.
David had an objection in the past about this method, saying that
several IOMMUs could translate a single AS?
https://lists.gnu.org/archive/html/qemu-devel/2017-05/msg01610.html
In
https://github.com/torvalds/linux/blob/master/Documentation/devicetree/bindings/pci/pci-iommu.txt,
it is said
"a given PCI device can only master through one IOMMU"
That's using a platform specific meaning of what "one IOMMU" means.
In general what's several IOMMUs and what's one IOMMU which responds
to several address regions is not distinguishable from the device's
point of view.
--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson
Liu, Yi L
2017-11-03 12:01:54 UTC
Permalink
Rename GuestIOMMU to GuestIOMMUMR, as the existing GuestIOMMU is
for MemoryRegion-related notifiers.

Signed-off-by: Liu, Yi L <***@linux.intel.com>
---
hw/vfio/common.c | 15 ++++++++-------
include/hw/vfio/vfio-common.h | 8 ++++----
2 files changed, 12 insertions(+), 11 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 1f7d516..3d40bec 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -348,7 +348,7 @@ static bool vfio_get_vaddr(IOMMUTLBEntry *iotlb, void **vaddr,

static void vfio_iommu_map_notify(IOMMUMRNotifier *n, IOMMUTLBEntry *iotlb)
{
- VFIOGuestIOMMU *giommu = container_of(n, VFIOGuestIOMMU, n);
+ VFIOGuestIOMMUMR *giommu = container_of(n, VFIOGuestIOMMUMR, n);
VFIOContainer *container = giommu->container;
hwaddr iova = iotlb->iova + giommu->iommu_offset;
bool read_only;
@@ -478,7 +478,7 @@ static void vfio_listener_region_add(MemoryListener *listener,
memory_region_ref(section->mr);

if (memory_region_is_iommu(section->mr)) {
- VFIOGuestIOMMU *giommu;
+ VFIOGuestIOMMUMR *giommu;
IOMMUMemoryRegion *iommu_mr = IOMMU_MEMORY_REGION(section->mr);

trace_vfio_listener_region_add_iommu(iova, end);
@@ -500,7 +500,7 @@ static void vfio_listener_region_add(MemoryListener *listener,
IOMMU_MR_EVENT_ALL,
section->offset_within_region,
int128_get64(llend));
- QLIST_INSERT_HEAD(&container->giommu_list, giommu, giommu_next);
+ QLIST_INSERT_HEAD(&container->giommu_mr_list, giommu, giommu_next);

memory_region_register_iommu_notifier(section->mr, &giommu->n);
memory_region_iommu_replay(giommu->iommu, &giommu->n);
@@ -567,9 +567,9 @@ static void vfio_listener_region_del(MemoryListener *listener,
}

if (memory_region_is_iommu(section->mr)) {
- VFIOGuestIOMMU *giommu;
+ VFIOGuestIOMMUMR *giommu;

- QLIST_FOREACH(giommu, &container->giommu_list, giommu_next) {
+ QLIST_FOREACH(giommu, &container->giommu_mr_list, giommu_next) {
if (MEMORY_REGION(giommu->iommu) == section->mr &&
giommu->n.start == section->offset_within_region) {
memory_region_unregister_iommu_notifier(section->mr,
@@ -1163,12 +1163,13 @@ static void vfio_disconnect_container(VFIOGroup *group)

if (QLIST_EMPTY(&container->group_list)) {
VFIOAddressSpace *space = container->space;
- VFIOGuestIOMMU *giommu, *tmp;
+ VFIOGuestIOMMUMR *giommu, *tmp;

vfio_listener_release(container);
QLIST_REMOVE(container, next);

- QLIST_FOREACH_SAFE(giommu, &container->giommu_list, giommu_next, tmp) {
+ QLIST_FOREACH_SAFE(giommu, &container->giommu_mr_list,
+ giommu_next, tmp) {
memory_region_unregister_iommu_notifier(
MEMORY_REGION(giommu->iommu), &giommu->n);
QLIST_REMOVE(giommu, giommu_next);
diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index 865e3e7..702a085 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -87,19 +87,19 @@ typedef struct VFIOContainer {
* contiguous IOVA window. We may need to generalize that in
* future
*/
- QLIST_HEAD(, VFIOGuestIOMMU) giommu_list;
+ QLIST_HEAD(, VFIOGuestIOMMUMR) giommu_mr_list;
QLIST_HEAD(, VFIOHostDMAWindow) hostwin_list;
QLIST_HEAD(, VFIOGroup) group_list;
QLIST_ENTRY(VFIOContainer) next;
} VFIOContainer;

-typedef struct VFIOGuestIOMMU {
+typedef struct VFIOGuestIOMMUMR {
VFIOContainer *container;
IOMMUMemoryRegion *iommu;
hwaddr iommu_offset;
IOMMUMRNotifier n;
- QLIST_ENTRY(VFIOGuestIOMMU) giommu_next;
-} VFIOGuestIOMMU;
+ QLIST_ENTRY(VFIOGuestIOMMUMR) giommu_next;
+} VFIOGuestIOMMUMR;

typedef struct VFIOHostDMAWindow {
hwaddr min_iova;
--
1.9.1
Liu, Yi L
2017-11-03 12:01:55 UTC
Permalink
This patch introduces a notify framework for IOMMUObject.iommu_notifiers.
VFIOGuestIOMMUObject is introduced to link the VFIO Container and the new
IOMMUObject notifiers.

A VFIOGuestIOMMUObject instance is allocated when a device is assigned and,
at the same time, a vIOMMU is exposed to the guest.

If there is an IOMMUObject behind the device's AddressSpace (i.e. a vIOMMU is
exposed), the VFIOGuestIOMMUObject instance is allocated and inserted into
VFIOContainer.giommu_object_list.

Signed-off-by: Liu, Yi L <***@linux.intel.com>
---
hw/vfio/pci.c | 39 ++++++++++++++++++++++++++++++++++++++-
include/hw/vfio/vfio-common.h | 8 ++++++++
2 files changed, 46 insertions(+), 1 deletion(-)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index c977ee3..5b77c7e 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -2642,6 +2642,8 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
VFIOPCIDevice *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev);
VFIODevice *vbasedev_iter;
VFIOGroup *group;
+ AddressSpace *as;
+ IOMMUObject *iommu;
char *tmp, group_path[PATH_MAX], *group_name;
Error *err = NULL;
ssize_t len;
@@ -2694,7 +2696,8 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)

trace_vfio_realize(vdev->vbasedev.name, groupid);

- group = vfio_get_group(groupid, pci_device_iommu_address_space(pdev), errp);
+ as = pci_device_iommu_address_space(pdev);
+ group = vfio_get_group(groupid, as, errp);
if (!group) {
goto error;
}
@@ -2877,6 +2880,17 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
vfio_register_req_notifier(vdev);
vfio_setup_resetfn_quirk(vdev);

+ iommu = address_space_iommu_get(as);
+ if (iommu != NULL) {
+ VFIOGuestIOMMUObject *giommu;
+ giommu = g_malloc0(sizeof(*giommu));
+ giommu->iommu = iommu;
+ giommu->container = group->container;
+ QLIST_INSERT_HEAD(&group->container->giommu_object_list,
+ giommu,
+ giommu_next);
+ }
+
return;

out_teardown:
@@ -2907,6 +2921,28 @@ static void vfio_instance_finalize(Object *obj)
vfio_put_group(group);
}

+static void vfio_release_iommu_object(PCIDevice *pdev)
+{
+ VFIOPCIDevice *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev);
+ AddressSpace *as;
+ IOMMUObject *iommu;
+
+ as = pci_device_iommu_address_space(pdev);
+ iommu = address_space_iommu_get(as);
+ if (iommu != NULL) {
+ VFIOGuestIOMMUObject *giommu, *tmp;
+ VFIOGroup *group;
+ group = vdev->vbasedev.group;
+
+ QLIST_FOREACH_SAFE(giommu,
+ &group->container->giommu_object_list,
+ giommu_next, tmp) {
+ QLIST_REMOVE(giommu, giommu_next);
+ g_free(giommu);
+ }
+ }
+ return;
+}
static void vfio_exitfn(PCIDevice *pdev)
{
VFIOPCIDevice *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev);
@@ -2915,6 +2951,7 @@ static void vfio_exitfn(PCIDevice *pdev)
vfio_unregister_err_notifier(vdev);
pci_device_set_intx_routing_notifier(&vdev->pdev, NULL);
vfio_disable_interrupts(vdev);
+ vfio_release_iommu_object(pdev);
if (vdev->intx.mmap_timer) {
timer_free(vdev->intx.mmap_timer);
}
diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index 702a085..e4963cc 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -88,6 +88,7 @@ typedef struct VFIOContainer {
* future
*/
QLIST_HEAD(, VFIOGuestIOMMUMR) giommu_mr_list;
+ QLIST_HEAD(, VFIOGuestIOMMUObject) giommu_object_list;
QLIST_HEAD(, VFIOHostDMAWindow) hostwin_list;
QLIST_HEAD(, VFIOGroup) group_list;
QLIST_ENTRY(VFIOContainer) next;
@@ -101,6 +102,13 @@ typedef struct VFIOGuestIOMMUMR {
QLIST_ENTRY(VFIOGuestIOMMUMR) giommu_next;
} VFIOGuestIOMMUMR;

+typedef struct VFIOGuestIOMMUObject {
+ VFIOContainer *container;
+ IOMMUObject *iommu;
+ IOMMUNotifier n;
+ QLIST_ENTRY(VFIOGuestIOMMUObject) giommu_next;
+} VFIOGuestIOMMUObject;
+
typedef struct VFIOHostDMAWindow {
hwaddr min_iova;
hwaddr max_iova;
--
1.9.1
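
For readers skimming the series: below is a condensed, non-authoritative
sketch of the allocation/release flow the patch above introduces, folding
the vfio_realize() and vfio_exitfn() hunks together. The helper names
vfio_track_iommu_object() and vfio_untrack_iommu_objects() are invented
for illustration only; every other identifier comes from the patch itself.

/* Illustrative only: condenses the two hunks in the patch above. */
static void vfio_track_iommu_object(VFIOGroup *group, AddressSpace *as)
{
    IOMMUObject *iommu = address_space_iommu_get(as);
    VFIOGuestIOMMUObject *giommu;

    if (!iommu) {
        return;             /* no vIOMMU behind this device's AddressSpace */
    }
    giommu = g_malloc0(sizeof(*giommu));
    giommu->iommu = iommu;
    giommu->container = group->container;
    QLIST_INSERT_HEAD(&group->container->giommu_object_list, giommu,
                      giommu_next);
}

/* Symmetrically, on vfio_exitfn() the release path walks the same list
 * and frees each entry (this is what vfio_release_iommu_object() does). */
static void vfio_untrack_iommu_objects(VFIOContainer *container)
{
    VFIOGuestIOMMUObject *giommu, *tmp;

    QLIST_FOREACH_SAFE(giommu, &container->giommu_object_list,
                       giommu_next, tmp) {
        QLIST_REMOVE(giommu, giommu_next);
        g_free(giommu);
    }
}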
Liu, Yi L
2017-11-14 14:24:07 UTC
Permalink
Hi Eric,
Hi Yi L,
Post by Liu, Yi L
This patch introduce a notify framework for IOMMUObject.iommu_notifiers.
Introduce VFIOGuestIOMMUObject is to link VFIO Container and the new
IOMMUObject notififiers.
notifiers
Post by Liu, Yi L
VFIOGuestIOMMUObject instance is allocated when device is assigned and
meanwhile vIOMMU is exposed to guest.
If there is IOMMUObject behind the device AddressSpace(a.ka vIOMMU exposed).
The VFIOGuestIOMMUObject instance would be allocated and inserted to the
VFIOContainer.giommu_object_list.
---
hw/vfio/pci.c | 39 ++++++++++++++++++++++++++++++++++++++-
include/hw/vfio/vfio-common.h | 8 ++++++++
2 files changed, 46 insertions(+), 1 deletion(-)
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index c977ee3..5b77c7e 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -2642,6 +2642,8 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
VFIOPCIDevice *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev);
VFIODevice *vbasedev_iter;
VFIOGroup *group;
+ AddressSpace *as;
+ IOMMUObject *iommu;
char *tmp, group_path[PATH_MAX], *group_name;
Error *err = NULL;
ssize_t len;
@@ -2694,7 +2696,8 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
trace_vfio_realize(vdev->vbasedev.name, groupid);
- group = vfio_get_group(groupid, pci_device_iommu_address_space(pdev), errp);
+ as = pci_device_iommu_address_space(pdev);
+ group = vfio_get_group(groupid, as, errp);
if (!group) {
goto error;
}
@@ -2877,6 +2880,17 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
vfio_register_req_notifier(vdev);
vfio_setup_resetfn_quirk(vdev);
+ iommu = address_space_iommu_get(as);
+ if (iommu != NULL) {
+ VFIOGuestIOMMUObject *giommu;
nit: blank line needed
accepted.
Post by Liu, Yi L
+ giommu = g_malloc0(sizeof(*giommu));
+ giommu->iommu = iommu;
+ giommu->container = group->container;
+ QLIST_INSERT_HEAD(&group->container->giommu_object_list,
There is no QLIST_INIT anywhere for container's giommu_object_list?
Yes, I need to add it. Thanks for catching that. (A sketch of the missing
init follows the quoted patch below.)

Thanks,
Yi L
Thanks
Eric
Post by Liu, Yi L
+ giommu,
+ giommu_next);
+ }
+
return;
@@ -2907,6 +2921,28 @@ static void vfio_instance_finalize(Object *obj)
vfio_put_group(group);
}
+static void vfio_release_iommu_object(PCIDevice *pdev)
+{
+ VFIOPCIDevice *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev);
+ AddressSpace *as;
+ IOMMUObject *iommu;
+
+ as = pci_device_iommu_address_space(pdev);
+ iommu = address_space_iommu_get(as);
+ if (iommu != NULL) {
+ VFIOGuestIOMMUObject *giommu, *tmp;
+ VFIOGroup *group;
+ group = vdev->vbasedev.group;
+
+ QLIST_FOREACH_SAFE(giommu,
+ &group->container->giommu_object_list,
+ giommu_next, tmp) {
+ QLIST_REMOVE(giommu, giommu_next);
+ g_free(giommu);
+ }
+ }
+ return;
+}
static void vfio_exitfn(PCIDevice *pdev)
{
VFIOPCIDevice *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev);
@@ -2915,6 +2951,7 @@ static void vfio_exitfn(PCIDevice *pdev)
vfio_unregister_err_notifier(vdev);
pci_device_set_intx_routing_notifier(&vdev->pdev, NULL);
vfio_disable_interrupts(vdev);
+ vfio_release_iommu_object(pdev);
if (vdev->intx.mmap_timer) {
timer_free(vdev->intx.mmap_timer);
}
diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index 702a085..e4963cc 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -88,6 +88,7 @@ typedef struct VFIOContainer {
* future
*/
QLIST_HEAD(, VFIOGuestIOMMUMR) giommu_mr_list;
+ QLIST_HEAD(, VFIOGuestIOMMUObject) giommu_object_list;
QLIST_HEAD(, VFIOHostDMAWindow) hostwin_list;
QLIST_HEAD(, VFIOGroup) group_list;
QLIST_ENTRY(VFIOContainer) next;
@@ -101,6 +102,13 @@ typedef struct VFIOGuestIOMMUMR {
QLIST_ENTRY(VFIOGuestIOMMUMR) giommu_next;
} VFIOGuestIOMMUMR;
+typedef struct VFIOGuestIOMMUObject {
+ VFIOContainer *container;
+ IOMMUObject *iommu;
+ IOMMUNotifier n;
+ QLIST_ENTRY(VFIOGuestIOMMUObject) giommu_next;
+} VFIOGuestIOMMUObject;
+
typedef struct VFIOHostDMAWindow {
hwaddr min_iova;
hwaddr max_iova;
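
A minimal sketch of the fix discussed above, assuming the initialization is
added next to the existing QLIST_INIT calls in vfio_connect_container()
(hw/vfio/common.c); the exact placement is up to the respin of this series:

    QLIST_INIT(&container->giommu_mr_list);      /* existing list init */
    QLIST_INIT(&container->giommu_object_list);  /* init Eric points out is missing */

    /* Note: the container is allocated with g_malloc0(), so the zeroed list
     * head already behaves as an empty QLIST; the explicit init mainly keeps
     * the new list consistent with the existing ones. */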
Liu, Yi L
2017-11-03 12:01:56 UTC
Permalink
This is an example to show the usage of the IOMMUObject-based notifier.

For passthrough devices, if there is a vIOMMU exposed to the guest, the
guest issues iommu operations on the devices, and those operations need
to be propagated to the host iommu driver.

In the future, the IOMMUObject notifiers may include:
* a notifier for guest pasid table binding
* a notifier for guest iommu tlb invalidation
Both notifiers would be included in the upcoming virt-SVM patchset.

In the virt-SVM patchset, this notifier would be fully implemented; a
hedged sketch of its expected shape follows the patch.

Signed-off-by: Liu, Yi L <***@linux.intel.com>
---
hw/vfio/pci.c | 14 ++++++++++++++
1 file changed, 14 insertions(+)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 5b77c7e..3ed521e 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -2637,6 +2637,14 @@ static void vfio_unregister_req_notifier(VFIOPCIDevice *vdev)
vdev->req_enabled = false;
}

+static void vfio_iommu_bind_pasidtbl_notify(IOMMUNotifier *n,
+ IOMMUEventData *event_data)
+{
+/* Sample code, would be detailed in coming virt-SVM patchset.
+ VFIOGuestIOMMUObject *giommu = container_of(n, VFIOGuestIOMMUObject, n);
+ VFIOContainer *container = giommu->container;
+*/
+}
static void vfio_realize(PCIDevice *pdev, Error **errp)
{
VFIOPCIDevice *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev);
@@ -2889,6 +2897,12 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
QLIST_INSERT_HEAD(&group->container->giommu_object_list,
giommu,
giommu_next);
+ /* Register vfio_iommu_bind_pasidtbl_notify with event flag
+ IOMMU_EVENT_BIND_PASIDT */
+ iommu_notifier_register(iommu,
+ &giommu->n,
+ vfio_iommu_bind_pasidtbl_notify,
+ IOMMU_EVENT_BIND_PASIDT);
}

return;
--
1.9.1
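
A minimal sketch of how the stub above might eventually be filled in by the
virt-SVM series. The handler body here is an assumption spelled out as
placeholder comments only; the actual event payload layout and the VFIO
request used to push the PASID table to the host are defined by the virt-SVM
patches, not by this series.

static void vfio_iommu_bind_pasidtbl_notify(IOMMUNotifier *n,
                                            IOMMUEventData *event_data)
{
    VFIOGuestIOMMUObject *giommu = container_of(n, VFIOGuestIOMMUObject, n);
    VFIOContainer *container = giommu->container;

    /*
     * Expected shape of the real handler (virt-SVM series):
     *  1. pick the guest PASID table pointer out of event_data
     *     (field layout comes from the IOMMUObject framework patches);
     *  2. propagate it to the host IOMMU driver through container->fd,
     *     using whatever VFIO request the kernel side of virt-SVM defines.
     * Both steps are placeholders here, exactly like the commented-out
     * sample in the patch above.
     */
    (void)container;
}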