Peter Xu
2018-05-04 03:08:01 UTC
v2:
- fix patchew code style warnings
- interval tree: postpone malloc when inserting; simplify node removal
a bit where appropriate [Jason]
- fix up comment and commit message for iommu lock patch [Kevin]
- protect context cache too using the iommu lock [Kevin, Jason]
- add a long comment in patch 8 to explain the modify-PTE problem
[Jason, Kevin]
Online repo:
https://github.com/xzpeter/qemu/tree/fix-vtd-dma
This series fixes several major problems in the current code:
- Issue 1: when handling very big PSI UNMAP invalidations, the current
code is buggy in that it might skip the notification, while we
should always send it.
- Issue 2: the IOTLB is not thread safe, while block dataplane can be
accessing and updating it in parallel.
- Issue 3: for devices that registered UNMAP-only notifiers, we don't
really need to walk the page tables for PSIs; we can directly
deliver the notification down. vhost is one example.
- Issue 4: there is an unsafe window for MAP-notified devices like
vfio-pci (and in the future, vDPA as well). The problem is that for
domain invalidations we currently do the following to make sure the
shadow page tables are correctly synced:
1. unmap the whole address space
2. replay the whole address space, mapping existing pages
However, between step 1 and step 2 there is a window (short, but it
can be as long as 3ms) during which the shadow page table is either
invalid or incomplete (since we're rebuilding it). That's a fatal
error, since devices never know that this is happening and can still
DMA to memory.
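To make the window in issue 4 concrete, here is a minimal sketch that
contrasts "unmap all, then replay" with syncing only the differences.
It is a toy model under stated assumptions, not the code in this
series: the address space is a tiny array of pages, and every name
(Shadow, NPAGES, sync_unmap_all_then_replay, sync_by_diff) is
hypothetical. It only tracks whether a page is mapped, not where it
points.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NPAGES 8                    /* hypothetical tiny address space */

typedef struct {
    bool mapped[NPAGES];            /* shadow page table state per page */
    unsigned maps;                  /* MAP notifications sent so far */
    unsigned unmaps;                /* UNMAP notifications sent so far */
} Shadow;

/*
 * Old approach: drop everything, then replay the guest mappings.
 * Returns how many pages were valid both before and after the sync
 * but were unmapped in between (the unsafe window).
 */
unsigned sync_unmap_all_then_replay(Shadow *s, const bool guest[NPAGES])
{
    unsigned transiently_lost = 0;

    assert(s && guest);
    for (size_t i = 0; i < NPAGES; i++) {
        if (s->mapped[i]) {
            if (guest[i]) {
                transiently_lost++;
            }
            s->mapped[i] = false;
            s->unmaps++;
        }
    }
    for (size_t i = 0; i < NPAGES; i++) {
        if (guest[i]) {
            s->mapped[i] = true;
            s->maps++;
        }
    }
    return transiently_lost;
}

/* Alternative: only touch pages whose mapped state actually changed. */
void sync_by_diff(Shadow *s, const bool guest[NPAGES])
{
    assert(s && guest);
    for (size_t i = 0; i < NPAGES; i++) {
        if (guest[i] && !s->mapped[i]) {
            s->mapped[i] = true;
            s->maps++;
        } else if (!guest[i] && s->mapped[i]) {
            s->mapped[i] = false;
            s->unmaps++;
        }
    }
}
```

In this model both functions end in the same state, but only the
first one ever removes a page that the guest still has mapped.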
Patch 1 fixes issue 1. I put it first since it's picked from
an old post.
Patch 2 is a cleanup to remove the useless IntelIOMMUNotifierNode struct.
Patch 3 fixes issue 2.
Patch 4 fixes issue 3.
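As a rough illustration of the idea behind issue 3, the sketch below
skips the page walk when a notifier did not ask for MAP events. It is
a simplified model, not the patch itself: the flag names, the Notifier
struct and handle_psi() are hypothetical, and the walk treats every
page in the range as mapped.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum {
    NOTIFY_UNMAP = 1 << 0,          /* hypothetical flag: wants UNMAP events */
    NOTIFY_MAP   = 1 << 1,          /* hypothetical flag: wants MAP events */
};

typedef struct {
    unsigned flags;                 /* which events this notifier registered for */
    unsigned events;                /* notifications delivered */
    uint64_t pages_walked;          /* pages visited by the page walk */
} Notifier;

/* Handle a page-selective invalidation covering npages pages at iova. */
void handle_psi(Notifier *n, uint64_t iova, uint64_t npages)
{
    assert(n != NULL);
    (void)iova;                     /* the address is not needed in this model */

    if (!(n->flags & NOTIFY_MAP)) {
        /* UNMAP-only (e.g. vhost): one notification for the whole range. */
        n->events++;
        return;
    }

    /* MAP notifiers need the current mappings, so walk page by page. */
    for (uint64_t i = 0; i < npages; i++) {
        n->pages_walked++;
        n->events++;
    }
}
```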
Patches 5-9 fix issue 4. Here a very simple interval tree is
implemented based on GTree. It differs from a general interval tree
in that it does not allow the user to pass in private data (e.g.,
translated addresses). However, that benefits us in that we can
merge adjacent interval leaves, so that hopefully we won't consume much
memory even if there are a lot of mappings (that happens for nested
virt: when mapping the whole L2 guest RAM range, it can be at least GBs).
Patch 10 is another big cleanup that can only work after patch 9.
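The merging of adjacent ranges mentioned above can be sketched as
follows. This is only an illustration of why dropping the private data
keeps memory use low; it is not the interval tree from this series. It
swaps GTree for a fixed-size sorted array to stay self-contained, and
all names (Range, RangeSet, RANGE_SET_MAX, range_set_insert) are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define RANGE_SET_MAX 64            /* hypothetical capacity for this sketch */

typedef struct {
    uint64_t start;                 /* first IOVA in the range (inclusive) */
    uint64_t end;                   /* last IOVA in the range (inclusive) */
} Range;

typedef struct {
    Range r[RANGE_SET_MAX];         /* sorted by start; disjoint, non-adjacent */
    size_t n;                       /* number of ranges in use */
} RangeSet;

/*
 * Insert [start, end], merging it with any overlapping or adjacent
 * ranges. Returns 0 on success, -1 if the set is full.
 */
int range_set_insert(RangeSet *s, uint64_t start, uint64_t end)
{
    Range merged = { start, end };
    size_t lo = 0, hi;

    assert(start <= end);

    /* Skip ranges that end before start and do not even touch it. */
    while (lo < s->n && s->r[lo].end != UINT64_MAX &&
           s->r[lo].end + 1 < start) {
        lo++;
    }

    /* Absorb every range that overlaps or touches the new one. */
    for (hi = lo; hi < s->n &&
         (merged.end == UINT64_MAX || s->r[hi].start <= merged.end + 1);
         hi++) {
        if (s->r[hi].start < merged.start) {
            merged.start = s->r[hi].start;
        }
        if (s->r[hi].end > merged.end) {
            merged.end = s->r[hi].end;
        }
    }

    if (hi == lo && s->n == RANGE_SET_MAX) {
        return -1;                  /* nothing merged and no room left */
    }

    /* Replace entries [lo, hi) with the single merged range. */
    memmove(&s->r[lo + 1], &s->r[hi], (s->n - hi) * sizeof(Range));
    s->r[lo] = merged;
    s->n = s->n - (hi - lo) + 1;
    return 0;
}
```

With this scheme, mapping a large contiguous region page by page still
ends up as a single entry, since each new page touches the previous
range and is folded into it.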
Tests:
- device assignment to L1 and even L2 guests. With this series applied
(and the kernel IOMMU patches: https://lkml.org/lkml/2018/4/18/5),
we can even nest vIOMMU now, e.g., we can specify a vIOMMU in the L2
guest with assigned devices and things will work. We couldn't before.
- vhost smoke test for regression.
Please review. Thanks,
Peter Xu (10):
intel-iommu: send PSI always even if across PDEs
intel-iommu: remove IntelIOMMUNotifierNode
intel-iommu: add iommu lock
intel-iommu: only do page walk for MAP notifiers
intel-iommu: introduce vtd_page_walk_info
intel-iommu: pass in address space when page walk
util: implement simple interval tree logic
intel-iommu: maintain per-device iova ranges
intel-iommu: don't unmap all for shadow page table
intel-iommu: remove notify_unmap for page walk
include/hw/i386/intel_iommu.h | 19 ++-
include/qemu/interval-tree.h | 130 +++++++++++++++
hw/i386/intel_iommu.c | 306 +++++++++++++++++++++++++---------
util/interval-tree.c | 208 +++++++++++++++++++++++
hw/i386/trace-events | 3 +-
util/Makefile.objs | 1 +
6 files changed, 579 insertions(+), 88 deletions(-)
create mode 100644 include/qemu/interval-tree.h
create mode 100644 util/interval-tree.c
--
2.17.0