Commit c9888d95 authored by Jean-Philippe Brucker, committed by Will Deacon

vfio-pci: add MSI-X support

Add virtual MSI-X tables for PCI devices, and create IRQFD routes to let
the kernel inject MSIs from a physical device directly into the guest.

It would be tempting to create the MSI routes at init time before starting
vCPUs, when we can afford to exit gracefully. But some of it must be
initialized when the guest requests it.

* On the KVM side, MSIs must be enabled after devices allocate their IRQ
  lines and irqchips are operational, which can happen until late_init.

* On the VFIO side, hardware state of devices may be updated when setting
  up MSIs. For example, when passing a virtio-pci-legacy device to the
  guest:

  (1) The device-specific configuration layout (in BAR0) depends on
      whether MSIs are enabled or not in the device. If they are enabled,
      the device-specific configuration starts at offset 24, otherwise it
      starts at offset 20.
  (2) The Linux guest assumes that MSIs are initially disabled (it doesn't
      actually check the capability), so it reads the device config at
      offset 20.
  (3) Had we enabled MSIs early, the host would have enabled the MSI-X
      capability and the device would return the config at offset 24.
  (4) The guest would read junk and explode.
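
The layout dependency in steps (1) to (4) can be sketched as follows. This is
an illustration only, not code from this commit: the function and macro names
are hypothetical, and the sizes assume the legacy virtio-pci header (20 bytes,
plus two 16-bit MSI-X vector fields when MSI-X is enabled).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names; sizes follow the legacy virtio-pci BAR0 layout. */
#define LEGACY_COMMON_HDR_SIZE	20u	/* common header, always present */
#define LEGACY_MSIX_FIELDS_SIZE	4u	/* config vector + queue vector */

/* Offset in BAR0 where the device-specific configuration starts. */
static inline uint32_t legacy_device_config_offset(bool msix_enabled)
{
	return LEGACY_COMMON_HDR_SIZE +
	       (msix_enabled ? LEGACY_MSIX_FIELDS_SIZE : 0u);
}

/*
 * True when the guest's assumption about MSI-X and the physical state of
 * the device lead to different offsets, i.e. the guest would read junk.
 */
static inline bool legacy_config_mismatch(bool guest_assumes_msix,
					  bool device_msix_enabled)
{
	return legacy_device_config_offset(guest_assumes_msix) !=
	       legacy_device_config_offset(device_msix_enabled);
}
```

With the guest assuming MSI-X is off and the host having enabled it early,
legacy_config_mismatch(false, true) is true, which is the failure described
in step (4).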

Therefore we have to create MSI-X routes when the guest requests MSIs, and
enable/disable them in VFIO when the guest pokes the MSI-X capability. We
have to follow both physical and virtual state of the capability, which
makes the state machine a bit complex, but I think it works.
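
As a rough sketch of what following both states can look like, the helper
below compares the virtual state (what the guest wrote) with the physical
state (what VFIO has programmed) and picks the next step. It is a simplified
model, not the logic implemented by this commit: msi_next_action and the
action enum are hypothetical, the value of VFIO_PCI_MSI_STATE_ENABLED is
assumed to be (1 << 0), and treating an empty capability as "do not enable
yet" is also an assumption.

```c
#include <assert.h>
#include <stdint.h>

#define VFIO_PCI_MSI_STATE_ENABLED	(1 << 0)	/* assumed value */
#define VFIO_PCI_MSI_STATE_MASKED	(1 << 1)
#define VFIO_PCI_MSI_STATE_EMPTY	(1 << 2)

/* Hypothetical: what to ask VFIO to do next. */
enum msi_action {
	MSI_ACTION_NONE,
	MSI_ACTION_ENABLE,
	MSI_ACTION_DISABLE,
};

static enum msi_action msi_next_action(uint8_t virt_state, uint8_t phys_state)
{
	/* The guest wants interrupts: enabled, not masked, has vectors. */
	int want = (virt_state & VFIO_PCI_MSI_STATE_ENABLED) &&
		   !(virt_state & (VFIO_PCI_MSI_STATE_MASKED |
				   VFIO_PCI_MSI_STATE_EMPTY));
	/* The device currently has the capability enabled. */
	int have = (phys_state & VFIO_PCI_MSI_STATE_ENABLED) != 0;

	if (want && !have)
		return MSI_ACTION_ENABLE;
	if (!want && have)
		return MSI_ACTION_DISABLE;
	return MSI_ACTION_NONE;
}
```

Keeping the decision in one pure function makes each transition easy to check
in isolation; the real code also has to handle per-vector state and locking.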

An important missing feature is pending MSI handling. When
a vector or the function is masked, we should rewire the IRQFD to a
special thread that keeps note of pending interrupts (or just poll the
IRQFD before recreating the route?). And when the vector is unmasked, one
MSI should be injected if it was pending. At the moment no MSI is
injected: we simply disconnect the IRQFD and all messages are lost.
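
The "just poll the IRQFD" idea could look roughly like the sketch below. It
is not part of this commit. The helper name is hypothetical, and it assumes
the eventfd was created with EFD_NONBLOCK and is no longer attached to a KVM
route while it is being read, so that signals accumulate in its counter.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/*
 * Hypothetical helper: check whether the device signalled the eventfd
 * while the vector was masked. Reading a non-semaphore eventfd returns
 * the counter and resets it to zero.
 *
 * Returns 1 if at least one signal was pending (and consumes it),
 * 0 if nothing was pending, -1 on any other error.
 */
static int msi_eventfd_consume_pending(int fd)
{
	uint64_t count;
	ssize_t n = read(fd, &count, sizeof(count));

	if (n == (ssize_t)sizeof(count))
		return count > 0;
	if (n < 0 && errno == EAGAIN)
		return 0;	/* counter was zero, nothing pending */
	return -1;
}
```

On unmask, a caller would inject a single MSI if this returns 1 and then
recreate the route. Multiple signals received while masked collapse into one,
which matches the single pending bit per vector.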

Reviewed-by: Punit Agrawal <>
Signed-off-by: Jean-Philippe Brucker <>
Signed-off-by: Will Deacon <>
parent 6078a454
#ifndef KVM__VFIO_H
#define KVM__VFIO_H
#include "kvm/mutex.h"
#include "kvm/parse-options.h"
#include "kvm/pci.h"
@@ -24,8 +25,59 @@ enum vfio_device_type {
/* MSI/MSI-X capability enabled */
#define VFIO_PCI_MSI_STATE_ENABLED (1 << 0)
/* MSI/MSI-X capability or individual vector masked */
#define VFIO_PCI_MSI_STATE_MASKED (1 << 1)
/* MSI-X capability has no vector enabled yet */
#define VFIO_PCI_MSI_STATE_EMPTY (1 << 2)

struct vfio_pci_msi_entry {
	struct msix_table config;
	int gsi;
	int eventfd;
	u8 phys_state;
	u8 virt_state;
};

struct vfio_pci_msix_table {
	size_t size;
	unsigned int bar;
	u32 guest_phys_addr;
};

struct vfio_pci_msix_pba {
	size_t size;
	off_t offset; /* in VFIO device fd */
	unsigned int bar;
	u32 guest_phys_addr;
};

/* Common data for MSI and MSI-X */
struct vfio_pci_msi_common {
	off_t pos;
	u8 virt_state;
	u8 phys_state;
	struct mutex mutex;
	struct vfio_irq_info info;
	struct vfio_irq_set *irq_set;
	size_t nr_entries;
	struct vfio_pci_msi_entry *entries;
};

#define VFIO_PCI_IRQ_MODE_INTX (1 << 0)
#define VFIO_PCI_IRQ_MODE_MSI (1 << 1)
#define VFIO_PCI_IRQ_MODE_MSIX (1 << 2)

struct vfio_pci_device {
	struct pci_device_header hdr;

	unsigned long irq_modes;
	int intx_fd;
	unsigned int intx_gsi;
	struct vfio_pci_msi_common msix;
	struct vfio_pci_msix_table msix_table;
	struct vfio_pci_msix_pba msix_pba;
};

struct vfio_region {