1. 27 May, 2021 1 commit
  2. 21 Apr, 2021 6 commits
    • Brijesh Singh's avatar
      KVM: SVM: Add KVM_SEV_RECEIVE_UPDATE_DATA command · 15fb7de1
      Brijesh Singh authored
      
      
      The command is used for copying the incoming buffer into the
      SEV guest memory space.
      
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: x86@kernel.org
      Cc: kvm@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      Reviewed-by: default avatarSteve Rutherford <srutherford@google.com>
      Signed-off-by: default avatarBrijesh Singh <brijesh.singh@amd.com>
      Signed-off-by: default avatarAshish Kalra <ashish.kalra@amd.com>
      Message-Id: <c5d0e3e719db7bb37ea85d79ed4db52e9da06257.1618498113.git.ashish.kalra@amd.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      15fb7de1
    • Brijesh Singh's avatar
      KVM: SVM: Add support for KVM_SEV_RECEIVE_START command · af43cbbf
      Brijesh Singh authored
      
      
      The command is used to create the encryption context for an incoming
      SEV guest. The encryption context can be later used by the hypervisor
      to import the incoming data into the SEV guest memory space.
      
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: x86@kernel.org
      Cc: kvm@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      Reviewed-by: default avatarSteve Rutherford <srutherford@google.com>
      Signed-off-by: default avatarBrijesh Singh <brijesh.singh@amd.com>
      Signed-off-by: default avatarAshish Kalra <ashish.kalra@amd.com>
      Message-Id: <c7400111ed7458eee01007c4d8d57cdf2cbb0fc2.1618498113.git.ashish.kalra@amd.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      af43cbbf
    • Steve Rutherford's avatar
      KVM: SVM: Add support for KVM_SEV_SEND_CANCEL command · 5569e2e7
      Steve Rutherford authored
      
      
      After completion of SEND_START, but before SEND_FINISH, the source VMM can
      issue the SEND_CANCEL command to stop a migration. This is necessary so
      that a cancelled migration can restart with a new target later.
      
      Reviewed-by: default avatarNathan Tempelman <natet@google.com>
      Reviewed-by: default avatarBrijesh Singh <brijesh.singh@amd.com>
      Signed-off-by: default avatarSteve Rutherford <srutherford@google.com>
      Message-Id: <20210412194408.2458827-1-srutherford@google.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      5569e2e7
    • Brijesh Singh's avatar
      KVM: SVM: Add KVM_SEND_UPDATE_DATA command · d3d1af85
      Brijesh Singh authored
      
      
      The command is used for encrypting the guest memory region using the encryption
      context created with KVM_SEV_SEND_START.
      
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: x86@kernel.org
      Cc: kvm@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      Reviewed-by : Steve Rutherford <srutherford@google.com>
      Signed-off-by: default avatarBrijesh Singh <brijesh.singh@amd.com>
      Signed-off-by: default avatarAshish Kalra <ashish.kalra@amd.com>
      Message-Id: <d6a6ea740b0c668b30905ae31eac5ad7da048bb3.1618498113.git.ashish.kalra@amd.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      d3d1af85
    • Brijesh Singh's avatar
      KVM: SVM: Add KVM_SEV SEND_START command · 4cfdd47d
      Brijesh Singh authored
      
      
      The command is used to create an outgoing SEV guest encryption context.
      
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: x86@kernel.org
      Cc: kvm@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      Reviewed-by: default avatarSteve Rutherford <srutherford@google.com>
      Reviewed-by: default avatarVenu Busireddy <venu.busireddy@oracle.com>
      Signed-off-by: default avatarBrijesh Singh <brijesh.singh@amd.com>
      Signed-off-by: default avatarAshish Kalra <ashish.kalra@amd.com>
      Message-Id: <2f1686d0164e0f1b3d6a41d620408393e0a48376.1618498113.git.ashish.kalra@amd.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      4cfdd47d
    • Nathan Tempelman's avatar
      KVM: x86: Support KVM VMs sharing SEV context · 54526d1f
      Nathan Tempelman authored
      
      
      Add a capability for userspace to mirror SEV encryption context from
      one vm to another. On our side, this is intended to support a
      Migration Helper vCPU, but it can also be used generically to support
      other in-guest workloads scheduled by the host. The intention is for
      the primary guest and the mirror to have nearly identical memslots.
      
      The primary benefits of this are that:
      1) The VMs do not share KVM contexts (think APIC/MSRs/etc), so they
      can't accidentally clobber each other.
      2) The VMs can have different memory-views, which is necessary for post-copy
      migration (the migration vCPUs on the target need to read and write to
      pages, when the primary guest would VMEXIT).
      
      This does not change the threat model for AMD SEV. Any memory involved
      is still owned by the primary guest and its initial state is still
      attested to through the normal SEV_LAUNCH_* flows. If userspace wanted
      to circumvent SEV, they could achieve the same effect by simply attaching
      a vCPU to the primary VM.
      This patch deliberately leaves userspace in charge of the memslots for the
      mirror, as it already has the power to mess with them in the primary guest.
      
      This patch does not support SEV-ES (much less SNP), as it does not
      handle handing off attested VMSAs to the mirror.
      
      For additional context, we need a Migration Helper because SEV PSP
      migration is far too slow for our live migration on its own. Using
      an in-guest migrator lets us speed this up significantly.
      
      Signed-off-by: default avatarNathan Tempelman <natet@google.com>
      Message-Id: <20210408223214.2582277-1-natet@google.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      54526d1f
  3. 20 Apr, 2021 1 commit
    • Sean Christopherson's avatar
      KVM: x86: Add capability to grant VM access to privileged SGX attribute · fe7e9488
      Sean Christopherson authored
      
      
      Add a capability, KVM_CAP_SGX_ATTRIBUTE, that can be used by userspace
      to grant a VM access to a priveleged attribute, with args[0] holding a
      file handle to a valid SGX attribute file.
      
      The SGX subsystem restricts access to a subset of enclave attributes to
      provide additional security for an uncompromised kernel, e.g. to prevent
      malware from using the PROVISIONKEY to ensure its nodes are running
      inside a geniune SGX enclave and/or to obtain a stable fingerprint.
      
      To prevent userspace from circumventing such restrictions by running an
      enclave in a VM, KVM restricts guest access to privileged attributes by
      default.
      
      Cc: Andy Lutomirski <luto@amacapital.net>
      Signed-off-by: default avatarSean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: default avatarKai Huang <kai.huang@intel.com>
      Message-Id: <0b099d65e933e068e3ea934b0523bab070cb8cea.1618196135.git.kai.huang@intel.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      fe7e9488
  4. 17 Apr, 2021 1 commit
  5. 07 Apr, 2021 1 commit
  6. 02 Mar, 2021 1 commit
    • David Woodhouse's avatar
      KVM: x86/xen: Add support for vCPU runstate information · 30b5c851
      David Woodhouse authored
      
      
      This is how Xen guests do steal time accounting. The hypervisor records
      the amount of time spent in each of running/runnable/blocked/offline
      states.
      
      In the Xen accounting, a vCPU is still in state RUNSTATE_running while
      in Xen for a hypercall or I/O trap, etc. Only if Xen explicitly schedules
      does the state become RUNSTATE_blocked. In KVM this means that even when
      the vCPU exits the kvm_run loop, the state remains RUNSTATE_running.
      
      The VMM can explicitly set the vCPU to RUNSTATE_blocked by using the
      KVM_XEN_VCPU_ATTR_TYPE_RUNSTATE_CURRENT attribute, and can also use
      KVM_XEN_VCPU_ATTR_TYPE_RUNSTATE_ADJUST to retrospectively add a given
      amount of time to the blocked state and subtract it from the running
      state.
      
      The state_entry_time corresponds to get_kvmclock_ns() at the time the
      vCPU entered the current state, and the total times of all four states
      should always add up to state_entry_time.
      
      Co-developed-by: default avatarJoao Martins <joao.m.martins@oracle.com>
      Signed-off-by: default avatarJoao Martins <joao.m.martins@oracle.com>
      Signed-off-by: default avatarDavid Woodhouse <dwmw@amazon.co.uk>
      Message-Id: <20210301125309.874953-2-dwmw2@infradead.org>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      30b5c851
  7. 10 Feb, 2021 1 commit
  8. 04 Feb, 2021 11 commits
  9. 07 Jan, 2021 1 commit
    • Tom Lendacky's avatar
      KVM: SVM: Add support for booting APs in an SEV-ES guest · 647daca2
      Tom Lendacky authored
      
      
      Typically under KVM, an AP is booted using the INIT-SIPI-SIPI sequence,
      where the guest vCPU register state is updated and then the vCPU is VMRUN
      to begin execution of the AP. For an SEV-ES guest, this won't work because
      the guest register state is encrypted.
      
      Following the GHCB specification, the hypervisor must not alter the guest
      register state, so KVM must track an AP/vCPU boot. Should the guest want
      to park the AP, it must use the AP Reset Hold exit event in place of, for
      example, a HLT loop.
      
      First AP boot (first INIT-SIPI-SIPI sequence):
        Execute the AP (vCPU) as it was initialized and measured by the SEV-ES
        support. It is up to the guest to transfer control of the AP to the
        proper location.
      
      Subsequent AP boot:
        KVM will expect to receive an AP Reset Hold exit event indicating that
        the vCPU is being parked and will require an INIT-SIPI-SIPI sequence to
        awaken it. When the AP Reset Hold exit event is received, KVM will place
        the vCPU into a simulated HLT mode. Upon receiving the INIT-SIPI-SIPI
        sequence, KVM will make the vCPU runnable. It is again up to the guest
        to then transfer control of the AP to the proper location.
      
        To differentiate between an actual HLT and an AP Reset Hold, a new MP
        state is introduced, KVM_MP_STATE_AP_RESET_HOLD, which the vCPU is
        placed in upon receiving the AP Reset Hold exit event. Additionally, to
        communicate the AP Reset Hold exit event up to userspace (if needed), a
        new exit reason is introduced, KVM_EXIT_AP_RESET_HOLD.
      
      A new x86 ops function is introduced, vcpu_deliver_sipi_vector, in order
      to accomplish AP booting. For VMX, vcpu_deliver_sipi_vector is set to the
      original SIPI delivery function, kvm_vcpu_deliver_sipi_vector(). SVM adds
      a new function that, for non SEV-ES guests, invokes the original SIPI
      delivery function, kvm_vcpu_deliver_sipi_vector(), but for SEV-ES guests,
      implements the logic above.
      
      Signed-off-by: default avatarTom Lendacky <thomas.lendacky@amd.com>
      Message-Id: <e8fbebe8eb161ceaabdad7c01a5859a78b424d5e.1609791600.git.thomas.lendacky@amd.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      647daca2
  10. 15 Nov, 2020 2 commits
    • Peter Xu's avatar
      KVM: X86: Implement ring-based dirty memory tracking · fb04a1ed
      Peter Xu authored
      This patch is heavily based on previous work from Lei Cao
      <lei.cao@stratus.com> and Paolo Bonzini <pbonzini@redhat.com>. [1]
      
      KVM currently uses large bitmaps to track dirty memory.  These bitmaps
      are copied to userspace when userspace queries KVM for its dirty page
      information.  The use of bitmaps is mostly sufficient for live
      migration, as large parts of memory are be dirtied from one log-dirty
      pass to another.  However, in a checkpointing system, the number of
      dirty pages is small and in fact it is often bounded---the VM is
      paused when it has dirtied a pre-defined number of pages. Traversing a
      large, sparsely populated bitmap to find set bits is time-consuming,
      as is copying the bitmap to user-space.
      
      A similar issue will be there for live migration when the guest memory
      is huge while the page dirty procedure is trivial.  In that case for
      each dirty sync we need to pull the whole dirty bitmap to userspace
      and analyse every bit even if it's mostly zeros.
      
      The preferred data structure for above scenarios is a dense list of
      guest frame numbers (GFN).  This patch series stores the dirty list in
      kernel memory that can be memory mapped into userspace to allow speedy
      harvesting.
      
      This patch enables dirty ring for X86 only.  However it should be
      easily extended to other archs as well.
      
      [1] https://patchwork.kernel.org/patch/10471409/
      
      
      
      Signed-off-by: default avatarLei Cao <lei.cao@stratus.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarPeter Xu <peterx@redhat.com>
      Message-Id: <20201001012222.5767-1-peterx@redhat.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      fb04a1ed
    • Vitaly Kuznetsov's avatar
      KVM: x86: hyper-v: allow KVM_GET_SUPPORTED_HV_CPUID as a system ioctl · c21d54f0
      Vitaly Kuznetsov authored
      
      
      KVM_GET_SUPPORTED_HV_CPUID is a vCPU ioctl but its output is now
      independent from vCPU and in some cases VMMs may want to use it as a system
      ioctl instead. In particular, QEMU doesn CPU feature expansion before any
      vCPU gets created so KVM_GET_SUPPORTED_HV_CPUID can't be used.
      
      Convert KVM_GET_SUPPORTED_HV_CPUID to 'dual' system/vCPU ioctl with the
      same meaning.
      
      Signed-off-by: default avatarVitaly Kuznetsov <vkuznets@redhat.com>
      Message-Id: <20200929150944.1235688-2-vkuznets@redhat.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      c21d54f0
  11. 21 Oct, 2020 1 commit
  12. 28 Sep, 2020 2 commits
    • Alexander Graf's avatar
      KVM: x86: Introduce MSR filtering · 1a155254
      Alexander Graf authored
      
      
      It's not desireable to have all MSRs always handled by KVM kernel space. Some
      MSRs would be useful to handle in user space to either emulate behavior (like
      uCode updates) or differentiate whether they are valid based on the CPU model.
      
      To allow user space to specify which MSRs it wants to see handled by KVM,
      this patch introduces a new ioctl to push filter rules with bitmaps into
      KVM. Based on these bitmaps, KVM can then decide whether to reject MSR access.
      With the addition of KVM_CAP_X86_USER_SPACE_MSR it can also deflect the
      denied MSR events to user space to operate on.
      
      If no filter is populated, MSR handling stays identical to before.
      
      Signed-off-by: default avatarAlexander Graf <graf@amazon.com>
      
      Message-Id: <20200925143422.21718-8-graf@amazon.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      1a155254
    • Alexander Graf's avatar
      KVM: x86: Allow deflecting unknown MSR accesses to user space · 1ae09954
      Alexander Graf authored
      
      
      MSRs are weird. Some of them are normal control registers, such as EFER.
      Some however are registers that really are model specific, not very
      interesting to virtualization workloads, and not performance critical.
      Others again are really just windows into package configuration.
      
      Out of these MSRs, only the first category is necessary to implement in
      kernel space. Rarely accessed MSRs, MSRs that should be fine tunes against
      certain CPU models and MSRs that contain information on the package level
      are much better suited for user space to process. However, over time we have
      accumulated a lot of MSRs that are not the first category, but still handled
      by in-kernel KVM code.
      
      This patch adds a generic interface to handle WRMSR and RDMSR from user
      space. With this, any future MSR that is part of the latter categories can
      be handled in user space.
      
      Furthermore, it allows us to replace the existing "ignore_msrs" logic with
      something that applies per-VM rather than on the full system. That way you
      can run productive VMs in parallel to experimental ones where you don't care
      about proper MSR handling.
      
      Signed-off-by: default avatarAlexander Graf <graf@amazon.com>
      Reviewed-by: default avatarJim Mattson <jmattson@google.com>
      
      Message-Id: <20200925143422.21718-3-graf@amazon.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      1ae09954
  13. 11 Sep, 2020 1 commit
    • Huacai Chen's avatar
      KVM: MIPS: Change the definition of kvm type · 15e9e35c
      Huacai Chen authored
      MIPS defines two kvm types:
      
       #define KVM_VM_MIPS_TE          0
       #define KVM_VM_MIPS_VZ          1
      
      In Documentation/virt/kvm/api.rst it is said that "You probably want to
      use 0 as machine type", which implies that type 0 be the "automatic" or
      "default" type. And, in user-space libvirt use the null-machine (with
      type 0) to detect the kvm capability, which returns "KVM not supported"
      on a VZ platform.
      
      I try to fix it in QEMU but it is ugly:
      https://lists.nongnu.org/archive/html/qemu-devel/2020-08/msg05629.html
      
      And Thomas Huth suggests me to change the definition of kvm type:
      https://lists.nongnu.org/archive/html/qemu-devel/2020-09/msg03281.html
      
      
      
      So I define like this:
      
       #define KVM_VM_MIPS_AUTO        0
       #define KVM_VM_MIPS_VZ          1
       #define KVM_VM_MIPS_TE          2
      
      Since VZ and TE cannot co-exists, using type 0 on a TE platform will
      still return success (so old user-space tools have no problems on new
      kernels); the advantage is that using type 0 on a VZ platform will not
      return failure. So, the only problem is "new user-space tools use type
      2 on old kernels", but if we treat this as a kernel bug, we can backport
      this patch to old stable kernels.
      
      Signed-off-by: default avatarHuacai Chen <chenhc@lemote.com>
      Message-Id: <1599734031-28746-1-git-send-email-chenhc@lemote.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      15e9e35c
  14. 21 Aug, 2020 1 commit
  15. 10 Jul, 2020 1 commit
    • Mohammed Gamal's avatar
      KVM: x86: Add a capability for GUEST_MAXPHYADDR < HOST_MAXPHYADDR support · 3edd6839
      Mohammed Gamal authored
      
      
      This patch adds a new capability KVM_CAP_SMALLER_MAXPHYADDR which
      allows userspace to query if the underlying architecture would
      support GUEST_MAXPHYADDR < HOST_MAXPHYADDR and hence act accordingly
      (e.g. qemu can decide if it should warn for -cpu ..,phys-bits=X)
      
      The complications in this patch are due to unexpected (but documented)
      behaviour we see with NPF vmexit handling in AMD processor.  If
      SVM is modified to add guest physical address checks in the NPF
      and guest #PF paths, we see the followning error multiple times in
      the 'access' test in kvm-unit-tests:
      
                  test pte.p pte.36 pde.p: FAIL: pte 2000021 expected 2000001
                  Dump mapping: address: 0x123400000000
                  ------L4: 24c3027
                  ------L3: 24c4027
                  ------L2: 24c5021
                  ------L1: 1002000021
      
      This is because the PTE's accessed bit is set by the CPU hardware before
      the NPF vmexit. This is handled completely by hardware and cannot be fixed
      in software.
      
      Therefore, availability of the new capability depends on a boolean variable
      allow_smaller_maxphyaddr which is set individually by VMX and SVM init
      routines. On VMX it's always set to true, on SVM it's only set to true
      when NPT is not enabled.
      
      CC: Tom Lendacky <thomas.lendacky@amd.com>
      CC: Babu Moger <babu.moger@amd.com>
      Signed-off-by: default avatarMohammed Gamal <mgamal@redhat.com>
      Message-Id: <20200710154811.418214-10-mgamal@redhat.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      3edd6839
  16. 08 Jul, 2020 1 commit
  17. 23 Jun, 2020 1 commit
  18. 01 Jun, 2020 3 commits
  19. 24 Apr, 2020 1 commit
  20. 20 Apr, 2020 1 commit
  21. 26 Mar, 2020 1 commit
    • Paul Mackerras's avatar
      KVM: PPC: Book3S HV: Add a capability for enabling secure guests · 9a5788c6
      Paul Mackerras authored
      
      
      At present, on Power systems with Protected Execution Facility
      hardware and an ultravisor, a KVM guest can transition to being a
      secure guest at will.  Userspace (QEMU) has no way of knowing
      whether a host system is capable of running secure guests.  This
      will present a problem in future when the ultravisor is capable of
      migrating secure guests from one host to another, because
      virtualization management software will have no way to ensure that
      secure guests only run in domains where all of the hosts can
      support secure guests.
      
      This adds a VM capability which has two functions: (a) userspace
      can query it to find out whether the host can support secure guests,
      and (b) userspace can enable it for a guest, which allows that
      guest to become a secure guest.  If userspace does not enable it,
      KVM will return an error when the ultravisor does the hypercall
      that indicates that the guest is starting to transition to a
      secure guest.  The ultravisor will then abort the transition and
      the guest will terminate.
      
      Signed-off-by: default avatarPaul Mackerras <paulus@ozlabs.org>
      Reviewed-by: default avatarDavid Gibson <david@gibson.dropbear.id.au>
      Reviewed-by: default avatarRam Pai <linuxram@us.ibm.com>
      9a5788c6