1. 11 Dec, 2019 16 commits
    • vfio: Add support for BAR configuration · 6d0f8798
      Julien Thierry authored
      When a guest can reassign BARs, kvmtool needs to keep each vfio_region
      consistent with its corresponding BAR. Take the updated addresses from
      the PCI header read back from the vfio driver.
      
      Also, guests are expected to disable IO/Memory response in the PCI
      Command register before modifying the BARs. Support this by
      mapping/unmapping regions when the corresponding response gets
      enabled/disabled.
      Signed-off-by: Julien Thierry <julien.thierry@arm.com>
      [Fixed BAR selection]
      Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
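      The sketch below shows the enable/disable tracking described above. It
      assumes 32-bit BARs and the standard Command register bits (bit 0 is
      I/O Space, bit 1 is Memory Space). The struct bar_state type and
      update_bar_mappings() are hypothetical, not kvmtool's code. The flag
      updates stand in for the real vfio map/unmap calls. The point is that
      the BAR address is read back only at the moment the response is
      enabled.

      ```c
      #include <stdbool.h>
      #include <stdint.h>

      #define PCI_COMMAND_IO     0x1 /* Command bit 0: I/O Space */
      #define PCI_COMMAND_MEMORY 0x2 /* Command bit 1: Memory Space */

      /* Hypothetical per-BAR state; not kvmtool's struct vfio_region. */
      struct bar_state {
          uint32_t reg;         /* BAR value as currently programmed by the guest */
          bool is_io;
          bool mapped;
          uint32_t mapped_addr; /* guest address the region is mapped at */
      };

      /* Called on every guest write to the Command register. */
      static void update_bar_mappings(struct bar_state *bars, int nr_bars,
                                      uint16_t old_cmd, uint16_t new_cmd)
      {
          for (int i = 0; i < nr_bars; i++) {
              struct bar_state *bar = &bars[i];
              uint16_t bit = bar->is_io ? PCI_COMMAND_IO : PCI_COMMAND_MEMORY;
              bool was_enabled = (old_cmd & bit) != 0;
              bool now_enabled = (new_cmd & bit) != 0;

              if (!was_enabled && now_enabled) {
                  /* Response enabled: pick up the (possibly new) BAR address. */
                  uint32_t mask = bar->is_io ? ~0x3u : ~0xFu;
                  bar->mapped_addr = bar->reg & mask;
                  bar->mapped = true;  /* real code would map the region here */
              } else if (was_enabled && !now_enabled) {
                  bar->mapped = false; /* real code would unmap the region here */
              }
          }
      }
      ```

      While a response bit is clear, the guest can rewrite reg freely.
      Nothing is remapped until the bit is set again.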
    • vfio: pci: Don't assume that only even numbered BARs are 64bit · c10de1ca
      Alexandru Elisei authored
      Not all devices have the bottom 32 bits of a 64-bit BAR in an
      even-numbered BAR. For example, on an NVIDIA Quadro P400, BARs 1 and 3
      are 64-bit. Remove this assumption.
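      A BAR's type can be decoded from its low bits instead of from its
      index. The sketch below assumes the standard PCI encoding: bit 0 set
      means an I/O BAR, and for memory BARs, bits 2:1 equal to 0b10 mark a
      64-bit BAR whose upper half lives in the next BAR. The helper names are
      hypothetical.

      ```c
      #include <stdbool.h>
      #include <stdint.h>

      #define PCI_NUM_BARS      6
      #define BAR_IO_SPACE      0x1u /* bit 0: 1 = I/O BAR */
      #define BAR_MEM_TYPE_MASK 0x6u /* bits 2:1: memory BAR type */
      #define BAR_MEM_TYPE_64   0x4u /* type 0b10: 64-bit memory BAR */

      /* Hypothetical helper: is 'bar' the lower half of a 64-bit memory BAR? */
      static bool bar_is_64bit(const uint32_t bars[PCI_NUM_BARS], int bar)
      {
          uint32_t val = bars[bar];

          if (val & BAR_IO_SPACE)
              return false;
          return (val & BAR_MEM_TYPE_MASK) == BAR_MEM_TYPE_64;
      }

      /* Count logical BARs (unimplemented ones included), skipping upper halves. */
      static int count_logical_bars(const uint32_t bars[PCI_NUM_BARS])
      {
          int count = 0;

          for (int i = 0; i < PCI_NUM_BARS; i++) {
              count++;
              if (bar_is_64bit(bars, i))
                  i++; /* the next BAR holds bits 63:32, whatever its index */
          }
          return count;
      }
      ```

      The layout used for testing has 64-bit BARs at indices 1 and 3. The
      other values are illustrative. With this layout, BARs 0, 1, 3 and 5
      are the logical BARs.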
    • vfio: pci: Allocate correct size for MSIX table and PBA BARs · 667ac704
      Alexandru Elisei authored
      kvmtool assumes that the BAR holding the MSIX table and PBA structure
      has a size equal to the total size of the table and the PBA structure,
      and it allocates memory from MMIO space accordingly. However, when
      initializing the BARs, the BAR size is set according to the region size
      reported by VFIO. When the physical BAR size is greater than what
      kvmtool allocated, the BAR can overlap with another BAR, in which case
      kvmtool will fail to map the memory. This was found when trying to do
      PCI passthrough on a PCIe Realtek r8168 NIC. Let's fix this by
      allocating an amount of MMIO memory equal to the table + PBA size or the
      BAR size, whichever is greater.
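      The sizing rule can be sketched as follows, assuming the table and the
      PBA share one BAR. It uses the MSI-X layout from the PCI specification:
      16 bytes per table entry, and a PBA with one bit per vector stored in
      64-bit words. The helper names are hypothetical.

      ```c
      #include <stddef.h>

      #define MSIX_ENTRY_SIZE 16 /* each MSI-X table entry is 16 bytes */

      static size_t msix_table_size(size_t nr_vectors)
      {
          return nr_vectors * MSIX_ENTRY_SIZE;
      }

      /* One pending bit per vector, rounded up to whole 8-byte words. */
      static size_t msix_pba_size(size_t nr_vectors)
      {
          return ((nr_vectors + 63) / 64) * 8;
      }

      /* Hypothetical helper: MMIO space to reserve for the BAR holding both. */
      static size_t msix_bar_alloc_size(size_t nr_vectors, size_t bar_size)
      {
          size_t needed = msix_table_size(nr_vectors) + msix_pba_size(nr_vectors);

          return needed > bar_size ? needed : bar_size;
      }
      ```

      With 4 vectors, the table and PBA need only 72 bytes. If VFIO reports a
      16 KiB BAR, the full 16 KiB is reserved so the BAR cannot overlap its
      neighbours.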
    • virtio/pci: Add support for BAR configuration · e07a44d2
      Alexandru Elisei authored
      Software can write a device's Memory or I/O space address into a Base
      Address Register (BAR). Make the BARs programmable by registering the
      MMIO or ioport emulation when access is enabled for that region, not
      when the virtual machine is created.
    • Use independent read/write locks for ioport and mmio · 85f439ec
      Alexandru Elisei authored
      kvmtool uses brlock to protect accesses to the ioport and mmio
      red-black trees. brlock allows concurrent reads but only one writer,
      which is assumed not to be a VCPU thread. This is done by issuing a
      compiler barrier on reads and pausing the entire virtual machine on
      writes. When KVM_BRLOCK_DEBUG is defined, brlock instead uses a pthread
      read/write lock.
      
      Once reassignable BARs are implemented, the mmio or ioport mapping will
      be done as a result of a VCPU mmio access. When brlock is a read/write
      lock, this means we will try to acquire the write lock while the same
      VCPU already holds the read lock, and we will deadlock. When it is not,
      a VCPU will have to call kvm__pause, which means the virtual machine
      will stay paused forever.
      
      Let's avoid all this by using separate pthread_rwlock_t locks for the
      mmio and the ioport red-black trees and carefully choosing the read
      critical region so that a modification made as a result of a guest mmio
      access doesn't deadlock.
      
      In theory, this leaves a small window of opportunity for a VCPU to
      modify a node used by another VCPU. Insertion into the trees is done by
      the main thread before the virtual machine starts, and deletion is done
      after the virtual machine has been paused to be destroyed, so in
      practice this can only happen if the guest is buggy.
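      The sketch below shows the "carefully chosen read critical region". A
      flat array stands in for the red-black tree. The lookup holds the read
      lock only long enough to copy out the handler. The handler then runs
      unlocked, so it can take the write lock to register a new region
      without deadlocking. All names here are hypothetical. Only the
      pthread_rwlock_* calls are real POSIX APIs.

      ```c
      #include <pthread.h>
      #include <stdbool.h>
      #include <stddef.h>
      #include <stdint.h>

      typedef void (*mmio_handler_t)(uint64_t addr, void *ptr);

      /* Hypothetical stand-in for a node of the mmio red-black tree. */
      struct mmio_region {
          uint64_t start, len;
          mmio_handler_t handler;
          void *ptr;
          bool used;
      };

      #define MAX_REGIONS 16
      static struct mmio_region regions[MAX_REGIONS];
      /* The ioport tree would get its own, independent lock. */
      static pthread_rwlock_t mmio_lock = PTHREAD_RWLOCK_INITIALIZER;

      static int mmio_register(uint64_t start, uint64_t len,
                               mmio_handler_t handler, void *ptr)
      {
          int ret = -1;

          pthread_rwlock_wrlock(&mmio_lock);
          for (int i = 0; i < MAX_REGIONS; i++) {
              if (!regions[i].used) {
                  regions[i] = (struct mmio_region){ start, len, handler, ptr, true };
                  ret = 0;
                  break;
              }
          }
          pthread_rwlock_unlock(&mmio_lock);
          return ret;
      }

      static bool mmio_dispatch(uint64_t addr)
      {
          mmio_handler_t handler = NULL;
          void *ptr = NULL;

          /* The read critical region covers only the lookup... */
          pthread_rwlock_rdlock(&mmio_lock);
          for (int i = 0; i < MAX_REGIONS; i++) {
              struct mmio_region *r = &regions[i];

              if (r->used && addr >= r->start && addr - r->start < r->len) {
                  handler = r->handler;
                  ptr = r->ptr;
                  break;
              }
          }
          pthread_rwlock_unlock(&mmio_lock);

          if (!handler)
              return false;
          /* ...so the handler may take the write lock without deadlocking. */
          handler(addr, ptr);
          return true;
      }

      static void noop_handler(uint64_t addr, void *ptr) { (void)addr; (void)ptr; }

      /* Example handler that "reprograms a BAR" by registering a new region. */
      static void remap_handler(uint64_t addr, void *ptr)
      {
          (void)addr;
          mmio_register(*(uint64_t *)ptr, 0x100, noop_handler, NULL);
      }
      ```

      The window mentioned above is visible here. Between the unlock and the
      handler call, another thread could remove the region. That is
      acceptable only under the assumption that removal happens while the VM
      is paused.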
    • virtio/pci: Ignore MMIO and I/O accesses when they are disabled · a5b99219
      Alexandru Elisei authored
      A device's response to memory or I/O accesses is disabled when Memory
      Space, respectively I/O Space, is set to 0 in the Command register.
      According to the PCI Local Bus Specification Revision 3.0, both of
      these bits reset to 0.

      Let's respect the specification: set Memory Space and I/O Space to 0 on
      reset, and ignore accesses to a device's respective regions when they
      are disabled.
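      Both rules can be expressed as a reset hook and a gate checked before
      emulating an access. The bit values are the standard Command register
      bits. The struct pci_cfg type and the function names are hypothetical.

      ```c
      #include <stdbool.h>
      #include <stdint.h>

      #define PCI_COMMAND_IO     0x1 /* Command bit 0: I/O Space */
      #define PCI_COMMAND_MEMORY 0x2 /* Command bit 1: Memory Space */

      /* Hypothetical minimal config space: only the Command register. */
      struct pci_cfg {
          uint16_t command;
      };

      /* Both decode bits reset to 0, as the specification requires. */
      static void pci_cfg_reset(struct pci_cfg *cfg)
      {
          cfg->command &= (uint16_t)~(PCI_COMMAND_IO | PCI_COMMAND_MEMORY);
      }

      /* True if an access to an I/O (is_io) or memory BAR should be emulated. */
      static bool pci_access_enabled(const struct pci_cfg *cfg, bool is_io)
      {
          uint16_t bit = is_io ? PCI_COMMAND_IO : PCI_COMMAND_MEMORY;

          return (cfg->command & bit) != 0;
      }
      ```

      Other Command bits are left untouched by this reset. The commit only
      specifies the two decode bits.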
    • virtio/pci: Make memory and IO BARs independent · 830ffba7
      Julien Thierry authored
      Currently, callbacks for memory BAR 1 call the IO port emulation. This
      means that the memory BAR needs I/O Space to be enabled whenever Memory
      Space is enabled.

      Refactor the code so the two types of BARs are independent. Also, unify
      the ioport/mmio callback arguments so that they all receive a
      virtio_device.
      Signed-off-by: Julien Thierry <julien.thierry@arm.com>
    • arm/pci: Do not use first PCI IO space bytes for devices · be9790f1
      Julien Thierry authored
      Linux follows the convention that the lower 0x1000 bytes of the IO
      space should not be used (cf. PCIBIOS_MIN_IO).

      Allocate those bytes up front so that future allocations cannot assign
      them to devices.
      Signed-off-by: Julien Thierry <julien.thierry@arm.com>
    • arm/pci: Fix PCI IO region · 528920f4
      Julien Thierry authored
      The PCI IO region currently exposed through the DT contains ports that
      are reserved by non-PCI devices.

      Use the proper PCI IO start so that the region exposed through the DT
      can actually be used to reassign device BARs.
      Signed-off-by: Julien Thierry <julien.thierry@arm.com>
    • pci: Fix ioport allocation size · 215bfb2e
      Julien Thierry authored
      The PCI Local Bus Specification, Rev. 3.0,
      Section 6.2.5.1. "Address Maps" states:
      "Devices that map control functions into I/O Space must not consume more
      than 256 bytes per I/O Base Address register."
      
      Yet all PCI devices allocate IO port ranges of IOPORT_SIZE (= 1024
      bytes).

      Fix this by having PCI devices use 256-byte port ranges for IO BARs.
      
      There is no hard requirement on the size of the memory region described
      by memory BARs. However, the region must be big enough to hold the
      virtio common interface described in [1], which is 20 bytes, plus other
      MSI-X and/or device-specific configuration. To be consistent, let's also
      limit the memory region described by BAR1 to 256 bytes. This is the
      same size used by BAR2 for each of the two MSI-X vectors.
      
      [1] VIRTIO Version 1.0 Committee Specification 04, section 4.4.8.
      Signed-off-by: Julien Thierry <julien.thierry@arm.com>
      [Added rationale for changing BAR1 size to PCI_IO_SIZE]
      Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
    • ioport: pci: Move port allocations to PCI devices · 870836a9
      Julien Thierry authored
      The dynamic ioport allocation with IOPORT_EMPTY is currently only used
      by PCI devices. Other devices use fixed ports, which they register with
      the ioport API.

      PCI ports need to be in the PCI IO space, and there is no reason the
      ioport API should know that a PCI port is being allocated and needs to
      be placed in PCI IO space. Today the allocations just happen to land
      there.

      Move the responsibility for dynamic ioport allocation from the ioport
      API to PCI.

      In the future, if other types of devices also need dynamic ioport
      allocation, they'll have to figure out the range of ports they are
      allowed to use.
      Signed-off-by: Julien Thierry <julien.thierry@arm.com>
      [Renamed functions for clarity]
      Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
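      A PCI-owned allocator can be as simple as a bump allocator over the PCI
      IO window. The sketch below also folds in two rules listed in this log.
      Blocks are 256 bytes, the maximum size of an I/O BAR. The first 0x1000
      bytes can be held back, in the spirit of PCIBIOS_MIN_IO. The struct and
      function names are hypothetical, and the window bounds are parameters
      rather than kvmtool's actual constants.

      ```c
      #include <stdint.h>

      #define PCI_IO_BLOCK_SIZE 256u /* max I/O BAR size, PCI LB spec 3.0, 6.2.5.1 */

      /* Hypothetical bump allocator owned by the PCI code, not the ioport API. */
      struct pci_io_allocator {
          uint32_t next; /* next free port, always block-aligned */
          uint32_t end;  /* one past the last port of the PCI IO window */
      };

      static void pci_io_init(struct pci_io_allocator *a, uint32_t start,
                              uint32_t end, uint32_t reserved)
      {
          uint32_t first = start + reserved; /* e.g. reserved = 0x1000 */

          /* BARs are naturally aligned, so round up to the block size. */
          a->next = (first + PCI_IO_BLOCK_SIZE - 1) & ~(PCI_IO_BLOCK_SIZE - 1);
          a->end = end;
      }

      /* Hand out one 256-byte block; returns 0 on success, -1 if exhausted. */
      static int pci_io_alloc(struct pci_io_allocator *a, uint32_t *port)
      {
          if (a->next > a->end || a->end - a->next < PCI_IO_BLOCK_SIZE)
              return -1;
          *port = a->next;
          a->next += PCI_IO_BLOCK_SIZE;
          return 0;
      }
      ```

      Because the window lives in the PCI code, the ioport API only ever sees
      fixed, already-chosen ports to register.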
    • arm: pci.c: Advertise only PCI bus 0 in the DT · c2a8c2a7
      Alexandru Elisei authored
      The "bus-range" property encodes the first and last bus numbers.
      kvmtool uses bus 0 for PCI and bus 1 for MMIO. Advertise only the PCI
      bus in the PCI DT node by setting "bus-range" to <0, 0>.
    • Check that a PCI device's memory size is power of two · d7b6a894
      Alexandru Elisei authored
      According to the PCI local bus specification [1], a device's memory size
      must be a power of two. This is also implicit in the mechanism that a CPU
      uses to get the memory size requirement for a PCI device.
      
      The vesa device requests a memory size that isn't a power of two.
      According to the same spec [1], a device is allowed to consume more
      memory than it actually requires. As a result, the amount of memory
      reserved by the vesa device has been increased.
      
      To prevent slip-ups in the future, a few BUILD_BUG_ON statements were added
      in places where the memory size is known at compile time.
      
      [1] PCI Local Bus Specification Revision 3.0, section 6.2.5.1
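      The sketch below shows the runtime check, rounding up, and a
      compile-time guard. The compile-time guard uses C11 _Static_assert in
      place of kvmtool's BUILD_BUG_ON, and EXAMPLE_BAR_SIZE is a hypothetical
      constant. The framebuffer figures in the test (640x480 at 4 bytes per
      pixel) are illustrative, not the vesa device's actual size.

      ```c
      #include <stdbool.h>
      #include <stdint.h>

      static bool is_power_of_two(uint64_t x)
      {
          return x != 0 && (x & (x - 1)) == 0;
      }

      /* Smallest power of two >= x (assumes x <= 2^63). */
      static uint64_t round_up_pow2(uint64_t x)
      {
          uint64_t p = 1;

          while (p < x)
              p <<= 1;
          return p;
      }

      /* Compile-time check in the spirit of BUILD_BUG_ON. */
      #define EXAMPLE_BAR_SIZE 0x1000u /* hypothetical constant */
      _Static_assert((EXAMPLE_BAR_SIZE & (EXAMPLE_BAR_SIZE - 1)) == 0,
                     "BAR size must be a power of two");
      ```

      A 1228800-byte framebuffer is not a power of two. Rounding it up gives
      2 MiB, and the extra space is harmless because a device may consume
      more memory than it needs.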
    • Remove pci-shmem device · 5cb470e4
      Alexandru Elisei authored
      The pci-shmem emulated device ("ivshmem") was created by QEMU for
      cross-VM data sharing. The only Linux driver that uses this device is
      the Android Virtual System on a Chip staging driver. That driver also
      mentions a character device driver implemented on top of shmem, which
      has since been removed from Linux.

      On the kvmtool side, the only commits touching the pci-shmem device
      since it was introduced in 2012 were made when refactoring various
      kvmtool subsystems. Let's relieve the kvmtool maintainers of this
      maintenance burden and remove this unused device.
    • pci: Fix BAR resource sizing arbitration · 49f1e01f
      Sami Mujawar authored
      According to the 'PCI Local Bus Specification, Revision 3.0,
      February 3, 2004, Section 6.2.5.1, Implementation Notes, page 227':
      
          "Software saves the original value of the Base Address register,
          writes 0 FFFF FFFFh to the register, then reads it back. Size
          calculation can be done from the 32-bit value read by first
          clearing encoding information bits (bit 0 for I/O, bits 0-3 for
          memory), inverting all 32 bits (logical NOT), then incrementing
          by 1. The resultant 32-bit value is the memory/I/O range size
          decoded by the register. Note that the upper 16 bits of the result
          is ignored if the Base Address register is for I/O and bits 16-31
          returned zero upon read."
      
      kvmtool was returning the actual BAR resource size. This is incorrect,
      because software drivers invert all 32 bits (logical NOT) and then
      increment by 1. The result is a very large resource size (in some cases
      more than 4GB), which causes drivers to assert or fail to work.

      For example, if the BAR resource size was 0x1000, kvmtool would return
      0x1000 instead of 0xFFFFF00x (where x stands for the encoding bits).

      Fixed pci__config_wr() to return the size of the BAR in accordance with
      the PCI Local Bus Specification, Implementation Notes.
      Signed-off-by: Sami Mujawar <sami.mujawar@arm.com>
      Signed-off-by: Julien Thierry <julien.thierry@arm.com>
      [Reworked algorithm, removed power-of-two check]
      Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
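      Both sides of the sizing handshake quoted above can be sketched in a
      few lines. One function is the value a device should read back after
      0xFFFFFFFF is written to a BAR. The other is the decoding a driver
      applies to that value. Function names are hypothetical, and the size is
      assumed to be a power of two.

      ```c
      #include <stdint.h>

      #define BAR_IO_FLAG_MASK  0x1u /* encoding bit for I/O BARs */
      #define BAR_MEM_FLAG_MASK 0xFu /* encoding bits for memory BARs */

      /* Device side: value read back after software writes 0xFFFFFFFF.
       * Address bits below the size read as 0; encoding bits are kept. */
      static uint32_t bar_sizing_readback(uint32_t size, uint32_t flags)
      {
          return ~(size - 1) | flags;
      }

      /* Software side: clear encoding bits, invert, add 1. */
      static uint32_t bar_size_from_readback(uint32_t val, int is_io)
      {
          uint32_t mask = is_io ? BAR_IO_FLAG_MASK : BAR_MEM_FLAG_MASK;
          uint32_t size = ~(val & ~mask) + 1;

          /* Upper 16 bits are ignored for an I/O BAR that returned 0 there. */
          if (is_io && (val & 0xFFFF0000u) == 0)
              size &= 0xFFFFu;
          return size;
      }
      ```

      Feeding the buggy value 0x1000 into the decoder yields 0xFFFFF000. That
      is the huge size drivers were tripping over.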
    • Makefile: Use correct objcopy binary when cross-compiling for x86_64 · 4fbff78d
      Alexandru Elisei authored
      Use the compiler toolchain version of objcopy instead of the native one
      when cross-compiling for the x86_64 architecture.
  2. 02 Aug, 2019 1 commit
  3. 03 Jul, 2019 4 commits
  4. 10 Jun, 2019 2 commits
    • run: Check for ghost socket file upon VM creation · ef5b941f
      Andre Przywara authored
      Kvmtool creates a (debug) UNIX socket file for each VM, using its
      (possibly auto-generated) name as the filename. There is a check using
      access(), which bails out with an error message if a socket with that
      name already exists.
      
      Aside from being unnecessary, since the later bind() call would
      complain as well, this check is also racy. More annoyingly, the bail-out
      is not needed most of the time. An existing socket inode is most likely
      just an orphaned leftover from a previous kvmtool run that failed to
      remove the file, because of a crash, for instance.
      
      Upon finding such a collision, let's first try to connect to that
      socket to detect whether a kvmtool instance is still listening on the
      other end. If that fails, this socket will never come back to life, so
      we can safely clean it up and reuse the name for the new guest.
      However, if the connect() succeeds, there is an actual live kvmtool
      instance using this name, so not proceeding is the only option.
      This should never happen with the (PID-based) automatically generated
      names, though.
      
      This avoids an annoying (and not helpful) error message and helps
      automated kvmtool runs to proceed in more cases.
      Signed-off-by: Andre Przywara <andre.przywara@arm.com>
      Signed-off-by: Will Deacon <will.deacon@arm.com>
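      The connect-then-clean-up check can be sketched with plain POSIX
      sockets. The function name claim_socket_name() is hypothetical. One
      design choice differs from the description: the file is unlinked only
      when connect() fails with ECONNREFUSED, meaning no listener. Any other
      error leaves the file alone.

      ```c
      #include <errno.h>
      #include <string.h>
      #include <sys/socket.h>
      #include <sys/un.h>
      #include <unistd.h>

      /*
       * Returns 1 if a process is listening on 'path', 0 if the name is free
       * (a stale socket file is unlinked), or -1 on any other error.
       */
      static int claim_socket_name(const char *path)
      {
          struct sockaddr_un addr = { .sun_family = AF_UNIX };
          int fd, ret, err;

          if (strlen(path) >= sizeof(addr.sun_path))
              return -1;
          strcpy(addr.sun_path, path);

          fd = socket(AF_UNIX, SOCK_STREAM, 0);
          if (fd < 0)
              return -1;
          ret = connect(fd, (struct sockaddr *)&addr, sizeof(addr));
          err = errno;             /* close() may overwrite errno */
          close(fd);

          if (ret == 0)
              return 1;            /* a live instance owns this name */
          if (err == ENOENT)
              return 0;            /* no file at all */
          if (err == ECONNREFUSED) /* nobody listening: orphaned leftover */
              return unlink(path) == 0 ? 0 : -1;
          return -1;
      }
      ```

      A window still exists between this check and the later bind(). The
      bind() itself will then fail cleanly if another process grabbed the
      name in between.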
    • list: Clean up ghost socket files · 67f9f7b7
      Andre Przywara authored
      When kvmtool (or the host kernel) crashes or gets killed, we cannot
      automatically remove the socket file we created for that VM.
      A later call of "lkvm list" iterates over all those files and complains
      about these "ghost socket files", as no one is listening on the other
      side. Also, the automatic guest name generation sometimes happens to
      generate the same name again, so an unrelated "lkvm run" later
      complains and stops, which is bad for automation.

      As the only code doing a listen() on this socket is kvmtool upon VM
      *creation*, such an orphaned socket file will never come back to life,
      so we might as well unlink() those sockets in the code. This spares the
      user from doing it herself.
      We keep the message in the code to notify the user of this.
      Signed-off-by: Andre Przywara <andre.przywara@arm.com>
      Signed-off-by: Will Deacon <will.deacon@arm.com>
  5. 29 May, 2019 4 commits
  6. 26 Apr, 2019 13 commits