1. 27 May, 2021 1 commit
  2. 07 May, 2021 7 commits
  3. 03 May, 2021 1 commit
    • KVM: nSVM: fix a few bugs in the vmcb02 caching logic · c74ad08f
      Maxim Levitsky authored
      * Define and use an invalid GPA (all ones) for init value of last
        and current nested vmcb physical addresses.
      
      * Reset the current vmcb12 gpa to the invalid value when leaving
        the nested mode, similar to what is done on nested vmexit.
      
      * Reset the last seen vmcb12 address when disabling nested SVM,
        as it relies on vmcb02 fields which are freed at that point.
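      The reset logic above can be sketched as a small stand-alone C model
      (all names are illustrative, not the kernel's; the point is the all-ones
      "invalid GPA" sentinel and when each cached address is invalidated):

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative stand-in for an "invalid GPA" sentinel: all ones, which can
 * never be a valid guest physical address on x86. */
#define INVALID_GPA (~(uint64_t)0)

struct nested_state {
    uint64_t current_vmcb12_gpa;
    uint64_t last_vmcb12_gpa;
};

/* On init, and again when nested SVM is disabled (since the "last seen"
 * value relies on vmcb02 fields freed at that point), both cached
 * addresses are reset to the invalid sentinel. */
static void nested_reset(struct nested_state *n)
{
    n->current_vmcb12_gpa = INVALID_GPA;
    n->last_vmcb12_gpa = INVALID_GPA;
}

/* Leaving nested mode invalidates only the current vmcb12 GPA, mirroring
 * what is done on nested vmexit; the "last seen" address survives so the
 * caching logic can still detect a re-entry with the same vmcb12. */
static void leave_nested_mode(struct nested_state *n)
{
    n->current_vmcb12_gpa = INVALID_GPA;
}
```

      A stale zero-initialized address can no longer be mistaken for a cached
      vmcb12 at GPA 0.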
      
      Fixes: 4995a368 ("KVM: SVM: Use a separate vmcb for the nested L2 guest")
      Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
      Message-Id: <20210503125446.1353307-3-mlevitsk@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  4. 21 Apr, 2021 1 commit
    • KVM: x86: Support KVM VMs sharing SEV context · 54526d1f
      Nathan Tempelman authored

      Add a capability for userspace to mirror SEV encryption context from
      one vm to another. On our side, this is intended to support a
      Migration Helper vCPU, but it can also be used generically to support
      other in-guest workloads scheduled by the host. The intention is for
      the primary guest and the mirror to have nearly identical memslots.
      
      The primary benefits of this are that:
      1) The VMs do not share KVM contexts (think APIC/MSRs/etc), so they
      can't accidentally clobber each other.
      2) The VMs can have different memory-views, which is necessary for post-copy
      migration (the migration vCPUs on the target need to read and write to
      pages, when the primary guest would VMEXIT).
      
      This does not change the threat model for AMD SEV. Any memory involved
      is still owned by the primary guest and its initial state is still
      attested to through the normal SEV_LAUNCH_* flows. If userspace wanted
      to circumvent SEV, they could achieve the same effect by simply attaching
      a vCPU to the primary VM.

      This patch deliberately leaves userspace in charge of the memslots for the
      mirror, as it already has the power to mess with them in the primary guest.
      
      This patch does not support SEV-ES (much less SNP), as it does not
      handle handing off attested VMSAs to the mirror.
      
      For additional context, we need a Migration Helper because SEV PSP
      migration is far too slow for our live migration on its own. Using
      an in-guest migrator lets us speed this up significantly.
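      The sharing described above can be sketched as a toy C model (struct and
      function names are illustrative, not KVM's): the mirror VM takes a
      reference to the primary VM's encryption context rather than creating
      its own, while keeping separate per-VM state otherwise.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of an SEV encryption context shared by two VMs. */
struct sev_context {
    int asid;      /* encryption ASID, common to primary and mirror */
    int refcount;  /* context must outlive either VM alone */
};

struct vm {
    struct sev_context *sev;
};

/* The mirror shares the primary's context; nothing else (KVM contexts,
 * memslots, launch state) is copied -- userspace manages those, and the
 * memory remains owned and attested by the primary guest. */
static void vm_mirror_enc_context(struct vm *mirror, struct vm *primary)
{
    mirror->sev = primary->sev;
    mirror->sev->refcount++;
}
```

      The refcount reflects why this does not weaken the threat model: the
      mirror only gains access userspace could already grant by attaching a
      vCPU to the primary VM.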
      Signed-off-by: Nathan Tempelman <natet@google.com>
      Message-Id: <20210408223214.2582277-1-natet@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  5. 20 Apr, 2021 3 commits
  6. 19 Apr, 2021 2 commits
  7. 17 Apr, 2021 4 commits
  8. 18 Mar, 2021 3 commits
    • KVM: x86: Protect userspace MSR filter with SRCU, and set atomically-ish · b318e8de
      Sean Christopherson authored
      Fix a plethora of issues with MSR filtering by installing the resulting
      filter as an atomic bundle instead of updating the live filter one range
      at a time.  The KVM_X86_SET_MSR_FILTER ioctl() isn't truly atomic, as
      the hardware MSR bitmaps won't be updated until the next VM-Enter, but
      the relevant software struct is atomically updated, which is what KVM
      really needs.
      
      Similar to the approach used for modifying memslots, make arch.msr_filter
      a SRCU-protected pointer, do all the work configuring the new filter
      outside of kvm->lock, and then acquire kvm->lock only when the new filter
      has been vetted and created.  That way vCPU readers either see the old
      filter or the new filter in their entirety, not some half-baked state.
      
      Yuan Yao pointed out a use-after-free in kvm_msr_allowed()[*] due to a
      TOCTOU bug, but that's just the tip of the iceberg...
      
        - Nothing is __rcu annotated, making it nigh impossible to audit the
          code for correctness.
        - kvm_add_msr_filter() has an unpaired smp_wmb().  Violation of kernel
          coding style aside, the lack of an smp_rmb() anywhere casts all code
          into doubt.
        - kvm_clear_msr_filter() has a double free TOCTOU bug, as it grabs
          count before taking the lock.
        - kvm_clear_msr_filter() also has a memory leak due to the same TOCTOU bug.
      
      The entire approach of updating the live filter is also flawed.  While
      installing a new filter is inherently racy if vCPUs are running, fixing
      the above issues also makes it trivial to ensure certain behavior is
      deterministic, e.g. KVM can provide deterministic behavior for MSRs with
      identical settings in the old and new filters.  An atomic update of the
      filter also prevents KVM from getting into a half-baked state, e.g. if
      installing a filter fails, the existing approach would leave the filter
      in a half-baked state, having already committed whatever bits of the
      filter were already processed.
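      The publish-after-vetting pattern can be modeled in plain C11 (the real
      code uses SRCU and kvm->lock; here an atomic pointer exchange stands in
      for the publication step, and all names are illustrative):

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>
#include <string.h>

/* Simplified filter: readers must never see a half-built instance. */
struct msr_filter {
    int count;
    unsigned ranges[16];  /* range base MSRs only, for illustration */
};

static _Atomic(struct msr_filter *) live_filter;

/* Build and vet the whole new filter off to the side, then install it
 * with a single atomic pointer swap. A failure before the swap leaves
 * the old filter fully intact, unlike range-at-a-time updates. */
static int set_filter(const unsigned *ranges, int count)
{
    if (count < 0 || count > 16)
        return -1;                       /* vetting before publication */

    struct msr_filter *f = calloc(1, sizeof(*f));
    if (!f)
        return -1;
    f->count = count;
    memcpy(f->ranges, ranges, count * sizeof(*ranges));

    struct msr_filter *old = atomic_exchange(&live_filter, f);
    free(old);  /* in KVM, the free instead waits out an SRCU grace period */
    return 0;
}
```

      Readers dereference the pointer once and therefore see either the old
      or the new filter in its entirety, never a mix, which is what makes the
      behavior deterministic for MSRs with identical settings in both.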
      
      [*] https://lkml.kernel.org/r/20210312083157.25403-1-yaoyuan0329os@gmail.com
      
      Fixes: 1a155254 ("KVM: x86: Introduce MSR filtering")
      Cc: stable@vger.kernel.org
      Cc: Alexander Graf <graf@amazon.com>
      Reported-by: Yuan Yao <yaoyuan0329os@gmail.com>
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210316184436.2544875-2-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • x86: Fix various typos in comments · d9f6e12f
      Ingo Molnar authored

      Fix ~144 single-word typos in arch/x86/ code comments.
      
      Doing this in a single commit should reduce the churn.
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Bjorn Helgaas <bhelgaas@google.com>
      Cc: linux-kernel@vger.kernel.org
    • KVM: x86: hyper-v: Track Hyper-V TSC page status · cc9cfddb
      Vitaly Kuznetsov authored

      Create an infrastructure for tracking Hyper-V TSC page status, i.e. if it
      was updated from guest/host side or if we've failed to set it up (because
      e.g. guest wrote some garbage to HV_X64_MSR_REFERENCE_TSC) and there's no
      need to retry.
      
      Also, in a hypothetical situation when we are in 'always catchup' mode for
      TSC we can now avoid contending 'hv->hv_lock' on every guest enter by
      setting the state to HV_TSC_PAGE_BROKEN after compute_tsc_page_parameters()
      returns false.
      
      Check for HV_TSC_PAGE_SET state instead of '!hv->tsc_ref.tsc_sequence' in
      get_time_ref_counter() to properly handle the situation when we failed to
      write the updated TSC page values to the guest.
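      The status tracking can be sketched as a small state check in C; the
      HV_TSC_PAGE_SET and HV_TSC_PAGE_BROKEN names come from the text above,
      while the remaining enumerators are assumptions about the full state set:

```c
#include <assert.h>

/* Hypothetical status set for the Hyper-V TSC page. */
enum hv_tsc_page_status {
    HV_TSC_PAGE_UNSET,           /* never set up */
    HV_TSC_PAGE_GUEST_CHANGED,   /* guest wrote HV_X64_MSR_REFERENCE_TSC */
    HV_TSC_PAGE_HOST_CHANGED,    /* host-side update pending */
    HV_TSC_PAGE_SET,             /* page contents are valid */
    HV_TSC_PAGE_BROKEN,          /* setup failed, do not retry */
};

/* get_time_ref_counter() can check the explicit status instead of
 * '!tsc_ref.tsc_sequence', which misreads the case where writing the
 * updated TSC page values to the guest failed. */
static int tsc_page_usable(enum hv_tsc_page_status s)
{
    return s == HV_TSC_PAGE_SET;
}
```

      The BROKEN state is also what lets 'always catchup' mode skip taking
      'hv->hv_lock' on every guest enter once setup has already failed.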
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Message-Id: <20210316143736.964151-4-vkuznets@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  9. 15 Mar, 2021 8 commits
    • KVM: x86: Get active PCID only when writing a CR3 value · e83bc09c
      Sean Christopherson authored

      Retrieve the active PCID only when writing a guest CR3 value, i.e. don't
      get the PCID when using EPT or NPT.  The PCID is especially problematic
      for EPT as the bits have different meaning, and so the PCID must be
      manually stripped, which is annoying and unnecessary.  And on VMX,
      getting the active PCID also involves reading the guest's CR3 and
      CR4.PCIDE, i.e. may add pointless VMREADs.
      
      Opportunistically rename the pgd/pgd_level params to root_hpa and
      root_level to better reflect their new roles.  Keep the function names,
      as "load the guest PGD" is still accurate/correct.
      
      Last, and probably least, pass root_hpa as a hpa_t/u64 instead of an
      unsigned long.  The EPTP holds a 64-bit value, even in 32-bit mode, so
      in theory EPT could support HIGHMEM for 32-bit KVM.  Never mind that
      doing so would require changing the MMU page allocators and reworking
      the MMU to use kmap().
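      The distinction can be illustrated with a minimal C sketch (function
      names and the TDP flag are illustrative; the architectural fact used is
      that the PCID occupies CR3 bits 11:0 when CR4.PCIDE=1):

```c
#include <assert.h>
#include <stdint.h>

#define CR3_PCID_MASK ((uint64_t)0xFFF)

/* Reading the active PCID requires the guest's CR3 and CR4.PCIDE, which
 * on VMX may mean extra VMREADs -- so only do it when actually needed. */
static uint64_t active_pcid(uint64_t guest_cr3, int cr4_pcide)
{
    return cr4_pcide ? (guest_cr3 & CR3_PCID_MASK) : 0;
}

/* With TDP (EPT/NPT) the root is a host physical address whose low bits
 * mean something else entirely, so it is loaded as-is; only the path that
 * writes a real guest CR3 value merges the PCID back in. */
static uint64_t make_root(uint64_t root_hpa, uint64_t guest_cr3,
                          int cr4_pcide, int tdp_enabled)
{
    if (tdp_enabled)
        return root_hpa;  /* no PCID lookup, no stripping needed */
    return (root_hpa & ~CR3_PCID_MASK) | active_pcid(guest_cr3, cr4_pcide);
}
```

      Note root_hpa is a u64 throughout, matching the commit's point that the
      EPTP holds a 64-bit value even in 32-bit mode.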
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210305183123.3978098-2-seanjc@google.com>
      Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86/mmu: Move logic for setting SPTE masks for EPT into the MMU proper · e7b7bdea
      Sean Christopherson authored

      Let the MMU deal with the SPTE masks to avoid splitting the logic and
      knowledge across the MMU and VMX.
      
      The SPTE masks that are used for EPT are very, very tightly coupled to
      the MMU implementation.  The use of available bits, the existence of A/D
      types, the fact that shadow_x_mask even exists, and so on and so forth
      are all baked into the MMU implementation.  Cross referencing the params
      to the masks is also a nightmare, as pretty much every param is a u64.
      
      A future patch will make the location of the MMU_WRITABLE and
      HOST_WRITABLE bits MMU specific, to free up bit 11 for an MMU_PRESENT bit.
      Doing that change with the current kvm_mmu_set_mask_ptes() would be an
      absolute mess.
      
      No functional change intended.
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210225204749.1512652-18-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: Move RDPMC emulation to common code · c483c454
      Sean Christopherson authored

      Move the entirety of the accelerated RDPMC emulation to x86.c, and assign
      the common handler directly to the exit handler array for VMX.  SVM has
      bizarre nrips behavior that prevents it from directly invoking the common
      handler.  The nrips goofiness will be addressed in a future patch.
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210205005750.3841462-8-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: Move trivial instruction-based exit handlers to common code · 5ff3a351
      Sean Christopherson authored

      Move the trivial exit handlers, e.g. for instructions that KVM
      "emulates" as nops, to common x86 code.  Assign the common handlers
      directly to the exit handler arrays and drop the vendor trampolines.
      
      Opportunistically use pr_warn_once() where appropriate.
      
      No functional change intended.
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210205005750.3841462-7-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: Move XSETBV emulation to common code · 92f9895c
      Sean Christopherson authored

      Move the entirety of XSETBV emulation to x86.c, and assign the
      function directly to both VMX's and SVM's exit handlers, i.e. drop the
      unnecessary trampolines.
      
      No functional change intended.
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210205005750.3841462-6-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: Handle triple fault in L2 without killing L1 · cb6a32c2
      Sean Christopherson authored

      Synthesize a nested VM-Exit if L2 triggers an emulated triple fault
      instead of exiting to userspace, which likely will kill L1.  Any flow
      that does KVM_REQ_TRIPLE_FAULT is suspect, but the most common scenario
      for L2 killing L1 is if L0 (KVM) intercepts a contributory exception that
      is _not_ intercepted by L1.  E.g. if KVM is intercepting #GPs for the
      VMware backdoor, a #GP that occurs in L2 while vectoring an injected #DF
      will cause KVM to emulate triple fault.
      
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Jim Mattson <jmattson@google.com>
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210302174515.2812275-2-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86/mmu: Unexport MMU load/unload functions · 61a1773e
      Sean Christopherson authored

      Unexport the MMU load and unload helpers now that they are no longer
      used (incorrectly) in vendor code.
      
      Opportunistically move the kvm_mmu_sync_roots() declaration into mmu.h;
      it should not be exposed to vendor code.
      
      No functional change intended.
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210305011101.3597423-16-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: to track if L1 is running L2 VM · 43c11d91
      Dongli Zhang authored
      The new per-cpu stat 'nested_run' is introduced in order to track whether
      an L1 VM is running, or has been used to run, an L2 VM.

      One use of 'nested_run' is to help the host administrator easily tell
      whether any L1 VM has been used to run an L2 VM. If an issue may be
      related to nested virtualization, the administrator can quickly narrow
      down and confirm via 'nested_run' whether nested virtualization is
      involved, for example whether a fix like
      commit 88dddc11 ("KVM: nVMX: do not use dangling shadow VMCS after
      guest reset") is required.
      
      Cc: Joe Jin <joe.jin@oracle.com>
      Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
      Message-Id: <20210305225747.7682-1-dongli.zhang@oracle.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  10. 12 Mar, 2021 1 commit
    • kvm: x86: annotate RCU pointers · 6fcd9cbc
      Muhammad Usama Anjum authored

      This patch adds the annotation to fix the following sparse errors:
      arch/x86/kvm//x86.c:8147:15: error: incompatible types in comparison expression (different address spaces):
      arch/x86/kvm//x86.c:8147:15:    struct kvm_apic_map [noderef] __rcu *
      arch/x86/kvm//x86.c:8147:15:    struct kvm_apic_map *
      arch/x86/kvm//x86.c:10628:16: error: incompatible types in comparison expression (different address spaces):
      arch/x86/kvm//x86.c:10628:16:    struct kvm_apic_map [noderef] __rcu *
      arch/x86/kvm//x86.c:10628:16:    struct kvm_apic_map *
      arch/x86/kvm//x86.c:10629:15: error: incompatible types in comparison expression (different address spaces):
      arch/x86/kvm//x86.c:10629:15:    struct kvm_pmu_event_filter [noderef] __rcu *
      arch/x86/kvm//x86.c:10629:15:    struct kvm_pmu_event_filter *
      arch/x86/kvm//lapic.c:267:15: error: incompatible types in comparison expression (different address spaces):
      arch/x86/kvm//lapic.c:267:15:    struct kvm_apic_map [noderef] __rcu *
      arch/x86/kvm//lapic.c:267:15:    struct kvm_apic_map *
      arch/x86/kvm//lapic.c:269:9: error: incompatible types in comparison expression (different address spaces):
      arch/x86/kvm//lapic.c:269:9:    struct kvm_apic_map [noderef] __rcu *
      arch/x86/kvm//lapic.c:269:9:    struct kvm_apic_map *
      arch/x86/kvm//lapic.c:637:15: error: incompatible types in comparison expression (different address spaces):
      arch/x86/kvm//lapic.c:637:15:    struct kvm_apic_map [noderef] __rcu *
      arch/x86/kvm//lapic.c:637:15:    struct kvm_apic_map *
      arch/x86/kvm//lapic.c:994:15: error: incompatible types in comparison expression (different address spaces):
      arch/x86/kvm//lapic.c:994:15:    struct kvm_apic_map [noderef] __rcu *
      arch/x86/kvm//lapic.c:994:15:    struct kvm_apic_map *
      arch/x86/kvm//lapic.c:1036:15: error: incompatible types in comparison expression (different address spaces):
      arch/x86/kvm//lapic.c:1036:15:    struct kvm_apic_map [noderef] __rcu *
      arch/x86/kvm//lapic.c:1036:15:    struct kvm_apic_map *
      arch/x86/kvm//lapic.c:1173:15: error: incompatible types in comparison expression (different address spaces):
      arch/x86/kvm//lapic.c:1173:15:    struct kvm_apic_map [noderef] __rcu *
      arch/x86/kvm//lapic.c:1173:15:    struct kvm_apic_map *
      arch/x86/kvm//pmu.c:190:18: error: incompatible types in comparison expression (different address spaces):
      arch/x86/kvm//pmu.c:190:18:    struct kvm_pmu_event_filter [noderef] __rcu *
      arch/x86/kvm//pmu.c:190:18:    struct kvm_pmu_event_filter *
      arch/x86/kvm//pmu.c:251:18: error: incompatible types in comparison expression (different address spaces):
      arch/x86/kvm//pmu.c:251:18:    struct kvm_pmu_event_filter [noderef] __rcu *
      arch/x86/kvm//pmu.c:251:18:    struct kvm_pmu_event_filter *
      arch/x86/kvm//pmu.c:522:18: error: incompatible types in comparison expression (different address spaces):
      arch/x86/kvm//pmu.c:522:18:    struct kvm_pmu_event_filter [noderef] __rcu *
      arch/x86/kvm//pmu.c:522:18:    struct kvm_pmu_event_filter *
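      The fix itself is just the annotation. A minimal sketch of what it looks
      like (the attribute expansion shown is an assumption about the sparse
      setup; outside the kernel tree the macro compiles away entirely):

```c
#include <assert.h>
#include <stddef.h>

/* Only the sparse static checker sees the address-space attribute; the
 * compiler sees an empty macro, so no generated code changes. */
#ifdef __CHECKER__
# define __rcu __attribute__((noderef, address_space(__rcu)))
#else
# define __rcu
#endif

struct kvm_apic_map { int max_apic_id; };

/* Annotating the RCU-managed pointer tells sparse that plain comparisons
 * and dereferences (the "incompatible types in comparison expression"
 * errors above) must instead go through rcu_dereference()/rcu_assign_pointer(). */
struct arch_state {
    struct kvm_apic_map __rcu *apic_map;
};
```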
      Signed-off-by: Muhammad Usama Anjum <musamaanjum@gmail.com>
      Message-Id: <20210305191123.GA497469@LEGION>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  11. 02 Mar, 2021 1 commit
    • KVM: x86/xen: Add support for vCPU runstate information · 30b5c851
      David Woodhouse authored

      This is how Xen guests do steal time accounting. The hypervisor records
      the amount of time spent in each of running/runnable/blocked/offline
      states.
      
      In the Xen accounting, a vCPU is still in state RUNSTATE_running while
      in Xen for a hypercall or I/O trap, etc. Only if Xen explicitly schedules
      does the state become RUNSTATE_blocked. In KVM this means that even when
      the vCPU exits the kvm_run loop, the state remains RUNSTATE_running.
      
      The VMM can explicitly set the vCPU to RUNSTATE_blocked by using the
      KVM_XEN_VCPU_ATTR_TYPE_RUNSTATE_CURRENT attribute, and can also use
      KVM_XEN_VCPU_ATTR_TYPE_RUNSTATE_ADJUST to retrospectively add a given
      amount of time to the blocked state and subtract it from the running
      state.
      
      The state_entry_time corresponds to get_kvmclock_ns() at the time the
      vCPU entered the current state, and the total times of all four states
      should always add up to state_entry_time.
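      The accounting invariant above can be modeled with a toy C struct (field
      names are illustrative, not the exact Xen ABI; the clock starts at zero
      so the per-state totals always sum to state_entry_time):

```c
#include <assert.h>
#include <stdint.h>

enum { RUNSTATE_running, RUNSTATE_runnable, RUNSTATE_blocked, RUNSTATE_offline };

struct runstate {
    int state;
    uint64_t state_entry_time;  /* clock value at the last transition */
    uint64_t time[4];           /* ns accumulated in each of the 4 states */
};

/* On a state change, charge the elapsed time to the state being left and
 * stamp the entry time of the new state. Since every nanosecond since
 * clock zero is charged to exactly one state, the four totals always add
 * up to state_entry_time. */
static void runstate_transition(struct runstate *rs, int new_state, uint64_t now)
{
    rs->time[rs->state] += now - rs->state_entry_time;
    rs->state = new_state;
    rs->state_entry_time = now;
}
```

      A RUNSTATE_ADJUST-style correction that moves time from 'running' to
      'blocked' preserves the same invariant, since the sum is unchanged.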
      Co-developed-by: Joao Martins <joao.m.martins@oracle.com>
      Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
      Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
      Message-Id: <20210301125309.874953-2-dwmw2@infradead.org>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  12. 26 Feb, 2021 1 commit
  13. 19 Feb, 2021 5 commits
  14. 09 Feb, 2021 2 commits
    • KVM: x86: hyper-v: Make Hyper-V emulation enablement conditional · 8f014550
      Vitaly Kuznetsov authored

      Hyper-V emulation is enabled in KVM unconditionally. This is bad at least
      from a security standpoint as it is an extra attack surface. Ideally,
      there should be a per-VM capability explicitly enabled by the VMM, but
      currently there is none and we can't mandate one without breaking
      backwards compatibility. We can, however, check guest-visible CPUIDs and
      only enable Hyper-V emulation when the "Hv#1" interface is exposed in
      HYPERV_CPUID_INTERFACE.
      
      Note, VMMs are free to act in any sequence they like, e.g. they can try
      to set MSRs first and CPUIDs later so we still need to allow the host
      to read/write Hyper-V specific MSRs unconditionally.
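      The check can be sketched in C. The leaf number and the little-endian
      encoding of "Hv#1" are assumptions from the Hyper-V TLFS conventions,
      not quoted from the patch:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define HYPERV_CPUID_INTERFACE      0x40000001u
#define HYPERV_CPUID_SIGNATURE_EAX  0x31237648u  /* "Hv#1", little-endian */

/* Hyper-V emulation is considered guest-enabled only once the VMM has
 * exposed the "Hv#1" interface signature in the guest's CPUID. MSR
 * accesses from the *host* side must still work unconditionally, since
 * VMMs may set MSRs before CPUIDs. */
static int hv_guest_enabled(uint32_t cpuid_interface_eax)
{
    return cpuid_interface_eax == HYPERV_CPUID_SIGNATURE_EAX;
}
```

      On a little-endian x86 host, the 32-bit constant is exactly the bytes
      'H' 'v' '#' '1'.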
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Message-Id: <20210126134816.1880136-14-vkuznets@redhat.com>
      [Add selftest vcpu_set_hv_cpuid API to avoid breaking xen_vmcall_test. - Paolo]
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: hyper-v: Allocate 'struct kvm_vcpu_hv' dynamically · 4592b7ea
      Vitaly Kuznetsov authored

      Hyper-V context is only needed for guests which use Hyper-V emulation in
      KVM (e.g. Windows/Hyper-V guests). 'struct kvm_vcpu_hv' is, however, quite
      big; it accounts for more than 1/4 of the total 'struct kvm_vcpu_arch',
      which is itself already quite big. This all looks like a waste.
      
      Allocate 'struct kvm_vcpu_hv' dynamically. This patch does not bring any
      (intentional) functional change as we still allocate the context
      unconditionally but it paves the way to doing that only when needed.
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Message-Id: <20210126134816.1880136-13-vkuznets@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>