1. 23 Mar, 2017 1 commit
  2. 01 Mar, 2017 1 commit
  3. 17 Feb, 2017 1 commit
  4. 16 Feb, 2017 1 commit
  5. 30 Jan, 2017 1 commit
  6. 28 Nov, 2016 1 commit
    • KVM: Export kvm module parameter variables · ec76d819
      Suraj Jitindar Singh authored
      
      
      The kvm module has the parameters halt_poll_ns, halt_poll_ns_grow, and
      halt_poll_ns_shrink. Halt polling was recently added to the powerpc kvm-hv
      module and these parameters were essentially duplicated for that. There is
      no benefit to this duplication and it can lead to confusion when trying to
      tune halt polling.
      
      Thus move the definition of these variables to kvm_host.h and export them.
      This will allow the kvm-hv module to use the same module parameters by
      accessing these variables, which will be implemented in the next patch,
      meaning that they will no longer be duplicated.
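
As a rough illustration of why one shared set of tunables is easier to reason about, the sketch below models a single definition of the three parameters consumed by grow/shrink helpers. It is a userspace model, not kernel code: the default values, the 10 us starting window, and the helper names `grow_poll_window` and `shrink_poll_window` are assumptions made for illustration.

```c
#include <stdint.h>

/* Single definition of the tunables. Another module would only declare
 * them `extern` (in the kernel, via a shared header plus an export).
 * The default values are assumptions for this sketch. */
unsigned int halt_poll_ns = 500000;   /* upper bound of the poll window, in ns */
unsigned int halt_poll_ns_grow = 2;   /* multiplicative grow factor */
unsigned int halt_poll_ns_shrink = 0; /* divisor; 0 means "reset to 0" */

/* Hypothetical helper: widen a per-vCPU poll window after a useful poll. */
unsigned int grow_poll_window(unsigned int cur)
{
    uint64_t val = cur;

    if (val == 0 && halt_poll_ns_grow)
        val = 10000;                  /* assumed 10 us starting window */
    else
        val *= halt_poll_ns_grow;

    if (val > halt_poll_ns)           /* never exceed the global bound */
        val = halt_poll_ns;
    return (unsigned int)val;
}

/* Hypothetical helper: narrow the window after a wasted poll. */
unsigned int shrink_poll_window(unsigned int cur)
{
    if (halt_poll_ns_shrink == 0)
        return 0;
    return cur / halt_poll_ns_shrink;
}
```

Because both helpers read the same three variables, changing one parameter affects every consumer consistently, which is the tuning problem the commit message describes.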
      
      Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
  7. 22 Nov, 2016 2 commits
  8. 07 Oct, 2016 1 commit
    • x86/fpu, kvm: Remove KVM vcpu->fpu_counter · 3d42de25
      Rik van Riel authored
      
      
      With the removal of the lazy FPU code, this field is no longer used.
      Get rid of it.
      
      Signed-off-by: Rik van Riel <riel@redhat.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: pbonzini@redhat.com
      Link: http://lkml.kernel.org/r/1475627678-20788-7-git-send-email-riel@redhat.com
      
      
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  9. 16 Sep, 2016 2 commits
    • kvm: create per-vcpu dirs in debugfs · 45b5939e
      Luiz Capitulino authored
      
      
      This commit adds the ability for archs to export
      per-vcpu information via a new per-vcpu dir in
      the VM's debugfs directory.
      
      If kvm_arch_has_vcpu_debugfs() returns true, then KVM
      will create a vcpu dir for each vCPU in the VM's
      debugfs directory. Then kvm_arch_create_vcpu_debugfs()
      is responsible for populating each vcpu directory
      with arch specific entries.
      
      The per-vcpu path in debugfs will look like:
      
      /sys/kernel/debug/kvm/29162-10/vcpu0
      /sys/kernel/debug/kvm/29162-10/vcpu1
      
      This is all arch specific for now because the only
      user of this interface (x86) wants to export x86-specific
      per-vcpu information to user-space.
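
The path layout above can be sketched as a small formatting helper. This is only an illustration: the function name `vcpu_debugfs_path` is hypothetical, and reading `29162-10` as `<pid>-<fd>` follows the per-VM directory naming described in the "Create debugfs dir and stat files for each VM" commit in this log.

```c
#include <stdio.h>

/* Build "<root>/<pid>-<fd>/vcpu<id>" into buf.
 * Returns 0 on success, -1 if the path does not fit or formatting fails. */
int vcpu_debugfs_path(char *buf, size_t len, const char *root,
                      int pid, int vm_fd, int vcpu_id)
{
    int n = snprintf(buf, len, "%s/%d-%d/vcpu%d", root, pid, vm_fd, vcpu_id);

    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

Calling it with root `/sys/kernel/debug/kvm`, pid 29162, fd 10 and vCPU ids 0 and 1 reproduces the two example paths.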
      
      Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • kvm: add stubs for arch specific debugfs support · 235539b4
      Luiz Capitulino authored
      
      
      Two stubs are added:
      
       o kvm_arch_has_vcpu_debugfs(): must return true if the arch
         supports creating debugfs entries in the vcpu debugfs dir
         (which will be implemented by the next commit)
      
       o kvm_arch_create_vcpu_debugfs(): code that creates debugfs
         entries in the vcpu debugfs dir
      
      For x86, this commit introduces a new file to avoid growing
      arch/x86/kvm/x86.c even more.
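
A minimal userspace sketch of the stub pattern follows. The two hook names come from the commit message; their exact signatures, the `vcpu_model` type, and the generic caller `create_vcpu_debugfs` are assumptions for illustration only.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's per-vCPU structure. */
struct vcpu_model {
    int id;
    void *debugfs_dir;   /* NULL until a directory has been created */
};

/* Default stubs: an arch with nothing to export keeps these as they are. */
bool kvm_arch_has_vcpu_debugfs(void)
{
    return false;
}

int kvm_arch_create_vcpu_debugfs(struct vcpu_model *vcpu)
{
    (void)vcpu;          /* nothing to populate by default */
    return 0;
}

/* Generic side (hypothetical): only involve the arch hook when it opts in. */
int create_vcpu_debugfs(struct vcpu_model *vcpu)
{
    if (!kvm_arch_has_vcpu_debugfs())
        return 0;        /* no directory, no entries */

    vcpu->debugfs_dir = vcpu;   /* placeholder for a real directory handle */
    return kvm_arch_create_vcpu_debugfs(vcpu);
}
```

With the default stubs the generic code does nothing per vCPU; an arch that wants entries would override both hooks.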
      
      Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  10. 12 Aug, 2016 2 commits
  11. 22 Jul, 2016 3 commits
  12. 18 Jul, 2016 1 commit
  13. 14 Jul, 2016 1 commit
  14. 01 Jul, 2016 1 commit
  15. 28 Jun, 2016 1 commit
  16. 15 Jun, 2016 2 commits
    • KVM: remove kvm_vcpu_compatible · 557abc40
      Paolo Bonzini authored
      
      
      The new created_vcpus field makes it possible to avoid the race between
      irqchip and VCPU creation in a much nicer way; just check under kvm->lock
      whether a VCPU has already been created.
      
      We can then remove KVM_APIC_ARCHITECTURE too, because at this point the
      symbol is only governing the default definition of kvm_vcpu_compatible.
      
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: introduce kvm->created_vcpus · 6c7caebc
      Paolo Bonzini authored
      
      
      The race between creating the irqchip and the first VCPU is
      currently fixed by checking the presence of an irqchip before
      updating kvm->online_vcpus, and undoing the whole VCPU creation
      if someone created the irqchip in the meanwhile.
      
      Instead, introduce a new field in struct kvm that will count VCPUs
      under a mutex, without the atomic access and memory ordering that we
      need elsewhere to protect the vcpus array.  This also plugs the race
      and is more easily applicable in all similar circumstances.
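
The idea of counting VCPUs under a mutex can be sketched in userspace C with pthreads. This is a model under stated assumptions, not the kernel implementation: `vm_model`, `vm_create_vcpu`, `vm_create_irqchip`, and the chosen error codes are all hypothetical.

```c
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical model of the relevant VM state. */
struct vm_model {
    pthread_mutex_t lock;
    int created_vcpus;   /* plain int: only read or written with lock held */
    bool has_irqchip;
};

/* Reserve a VCPU slot; the count is bumped before any heavy setup. */
int vm_create_vcpu(struct vm_model *vm, int max_vcpus)
{
    int ret = 0;

    pthread_mutex_lock(&vm->lock);
    if (vm->created_vcpus >= max_vcpus)
        ret = -EINVAL;
    else
        vm->created_vcpus++;
    pthread_mutex_unlock(&vm->lock);
    return ret;
}

/* Creating the irqchip is refused once any VCPU has been counted. */
int vm_create_irqchip(struct vm_model *vm)
{
    int ret = 0;

    pthread_mutex_lock(&vm->lock);
    if (vm->has_irqchip)
        ret = -EEXIST;
    else if (vm->created_vcpus > 0)
        ret = -EINVAL;
    else
        vm->has_irqchip = true;
    pthread_mutex_unlock(&vm->lock);
    return ret;
}
```

Because both paths take the same lock, whichever runs first wins and the other sees a consistent count, with no need to undo a half-created VCPU.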
      
      Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  17. 25 May, 2016 1 commit
    • KVM: Create debugfs dir and stat files for each VM · 536a6f88
      Janosch Frank authored
      
      
      This patch adds a kvm debugfs subdirectory for each VM, which is named
      after its pid and file descriptor. The directories contain the same
      kind of files that are already in the kvm debugfs directory, but the
      data exported through them is now VM specific.
      
      This makes the debugfs kvm data a convenient alternative to the
      tracepoints which already have per VM data. The debugfs data is easy
      to read and low overhead.
      
      CC: Dan Carpenter <dan.carpenter@oracle.com> [includes fixes by Dan Carpenter]
      Signed-off-by: Janosch Frank <frankja@linux.vnet.ibm.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  18. 18 May, 2016 1 commit
  19. 13 May, 2016 1 commit
    • KVM: halt_polling: provide a way to qualify wakeups during poll · 3491caf2
      Christian Borntraeger authored
      
      
      Some wakeups should not be considered a successful poll. For example, on
      s390 I/O interrupts are usually floating, which means that _ALL_ CPUs
      would be considered runnable, letting all vCPUs poll all the time for
      transaction-like workloads, even if one vCPU would be enough.
      This can result in huge CPU usage for large guests.
      This patch lets architectures provide a way to qualify wakeups as
      good or bad with regard to polling.
      
      For s390 the implementation will fence off halt polling for anything but
      known good, single vCPU events. The s390 implementation for floating
      interrupts does a wakeup for one vCPU, but the interrupt will be delivered
      by whatever CPU checks first for a pending interrupt. We prefer the
      woken up CPU by marking the poll of this CPU as a "good" poll.
      This code will also mark several other wakeup reasons like IPI or
      expired timers as "good". This will of course also mark some events as
      not successful. As KVM on z always runs as a 2nd level hypervisor,
      we prefer not to poll unless we are really sure, though.
      
      This patch successfully limits the CPU usage for cases like the uperf
      1-byte transactional ping-pong workload or wakeup-heavy workloads like
      OLTP, while still providing a proper speedup.
      
      This also introduced a new vcpu stat "halt_poll_no_tuning" that marks
      wakeups that are considered not good for polling.
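
The qualification step can be pictured with the following userspace sketch. Only the stat name `halt_poll_no_tuning` is taken from the commit message; the struct, the `valid_wakeup` flag, and the helper `account_poll_wakeup` are assumptions for illustration.

```c
#include <stdbool.h>

/* Hypothetical per-vCPU polling state. */
struct vcpu_poll_model {
    bool valid_wakeup;                  /* set by arch code for "good" wakeup reasons */
    unsigned long halt_successful_poll;
    unsigned long halt_poll_no_tuning;  /* stat name from the commit message */
};

/* Run after a poll ended because the vCPU became runnable.
 * Returns true if this wakeup may be used to tune (grow) the poll window. */
bool account_poll_wakeup(struct vcpu_poll_model *v)
{
    bool good = v->valid_wakeup;

    v->valid_wakeup = false;            /* the qualification is consumed once */
    if (good)
        v->halt_successful_poll++;
    else
        v->halt_poll_no_tuning++;
    return good;
}
```

An architecture that never sets the flag would see every wakeup counted as not suitable for tuning, which matches the conservative "do not poll unless sure" stance described for s390.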
      
      Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
      Acked-by: Radim Krčmář <rkrcmar@redhat.com> (for an earlier version)
      Cc: David Matlack <dmatlack@google.com>
      Cc: Wanpeng Li <kernellwp@gmail.com>
      [Rename config symbol. - Paolo]
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  20. 11 May, 2016 3 commits
  21. 20 Apr, 2016 1 commit
    • KVM: add missing memory barrier in kvm_{make,check}_request · 2e4682ba
      Paolo Bonzini authored
      
      
      kvm_make_request and kvm_check_request imply a producer-consumer
      relationship; add implicit memory barriers to them.  There was indeed
      already a place that was adding an explicit smp_mb() to order between
      kvm_check_request and the processing of the request.  That memory
      barrier can be removed (as an added benefit, kvm_check_request can use
      smp_mb__after_atomic which is free on x86).
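
The producer-consumer relationship can be modelled with C11 atomics. This is a userspace analogy, not the kernel code: `vcpu_req_model`, `make_request`, and `check_request` are hypothetical names, release/acquire orderings stand in for the kernel barriers, and a single consumer per request word is assumed.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model: a request bitmap plus data published with a request. */
struct vcpu_req_model {
    atomic_ulong requests;   /* one bit per pending request */
    int payload;             /* written by the producer before the request */
};

/* Producer: everything written before this call (e.g. payload) becomes
 * visible to a consumer that later observes the request bit. */
void make_request(struct vcpu_req_model *v, unsigned int req)
{
    atomic_fetch_or_explicit(&v->requests, 1UL << req, memory_order_release);
}

/* Consumer: test and clear the bit. The acquire ordering on the clearing
 * read-modify-write pairs with the release above, so the processing that
 * follows a `true` return sees the producer's earlier writes. */
bool check_request(struct vcpu_req_model *v, unsigned int req)
{
    unsigned long bit = 1UL << req;

    if (!(atomic_load_explicit(&v->requests, memory_order_relaxed) & bit))
        return false;
    atomic_fetch_and_explicit(&v->requests, ~bit, memory_order_acquire);
    return true;
}
```

Placing the ordering inside the two helpers means callers no longer need their own explicit barrier between checking a request and acting on it.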
      
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  22. 25 Feb, 2016 1 commit
    • KVM: Use simple waitqueue for vcpu->wq · 8577370f
      Marcelo Tosatti authored
      The problem:
      
      On -rt, an emulated LAPIC timer instance has the following path:
      
      1) hard interrupt
      2) ksoftirqd is scheduled
      3) ksoftirqd wakes up vcpu thread
      4) vcpu thread is scheduled
      
      This extra context switch introduces unnecessary latency in the
      LAPIC path for a KVM guest.
      
      The solution:
      
      Allow waking up vcpu thread from hardirq context,
      thus avoiding the need for ksoftirqd to be scheduled.
      
      Normal waitqueues make use of spinlocks, which on -RT
      are sleepable locks. Therefore, waking up a waitqueue
      waiter involves locking a sleeping lock, which
      is not allowed from hard interrupt context.
      
      cyclictest command line:
      
      This patch reduces the average latency in my tests from 14us to 11us.
      
      Daniel writes:
      Paolo asked for numbers from kvm-unit-tests/tscdeadline_latency
      benchmark on mainline. The test was run 1000 times on
      tip/sched/core 4.4.0-rc8-01134-g0905f04e:
      
        ./x86-run x86/tscdeadline_latency.flat -cpu host
      
      with idle=poll.
      
      The test does not seem to deliver very stable numbers, though most of
      them are smaller. Paolo writes:
      
      "Anything above ~10000 cycles means that the host went to C1 or
      lower---the number means more or less nothing in that case.
      
      The mean shows an improvement indeed."
      
      Before:
      
                     min             max         mean           std
      count  1000.000000     1000.000000  1000.000000   1000.000000
      mean   5162.596000  2019270.084000  5824.491541  20681.645558
      std      75.431231   622607.723969    89.575700   6492.272062
      min    4466.000000    23928.000000  5537.926500    585.864966
      25%    5163.000000  1613252.750000  5790.132275  16683.745433
      50%    5175.000000  2281919.000000  5834.654000  23151.990026
      75%    5190.000000  2382865.750000  5861.412950  24148.206168
      max    5228.000000  4175158.000000  6254.827300  46481.048691
      
      After:

                     min            max         mean           std
      count  1000.000000     1000.00000  1000.000000   1000.000000
      mean   5143.511000  2076886.10300  5813.312474  21207.357565
      std      77.668322   610413.09583    86.541500   6331.915127
      min    4427.000000    25103.00000  5529.756600    559.187707
      25%    5148.000000  1691272.75000  5784.889825  17473.518244
      50%    5160.000000  2308328.50000  5832.025000  23464.837068
      75%    5172.000000  2393037.75000  5853.177675  24223.969976
      max    5222.000000  3922458.00000  6186.720500  42520.379830
      
      [Patch was originally based on the swait implementation found in the -rt
       tree. Daniel ported it to mainline's version and gathered the
       benchmark numbers for tscdeadline_latency test.]
      
      Signed-off-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: linux-rt-users@vger.kernel.org
      Cc: Boqun Feng <boqun.feng@gmail.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Link: http://lkml.kernel.org/r/1455871601-27484-4-git-send-email-wagi@monom.org
      
      
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
  23. 16 Jan, 2016 1 commit
    • kvm: rename pfn_t to kvm_pfn_t · ba049e93
      Dan Williams authored
      To date, we have implemented two I/O usage models for persistent memory,
      PMEM (a persistent "ram disk") and DAX (mmap persistent memory into
      userspace).  This series adds a third, DAX-GUP, that allows DAX mappings
      to be the target of direct-i/o.  It allows userspace to coordinate
      DMA/RDMA from/to persistent memory.
      
      The implementation leverages the ZONE_DEVICE mm-zone that went into
      4.3-rc1 (also discussed at kernel summit) to flag pages that are owned
      and dynamically mapped by a device driver.  The pmem driver, after
      mapping a persistent memory range into the system memmap via
      devm_memremap_pages(), arranges for DAX to distinguish pfn-only versus
      page-backed pmem-pfns via flags in the new pfn_t type.
      
      The DAX code, upon seeing a PFN_DEV+PFN_MAP flagged pfn, flags the
      resulting pte(s) inserted into the process page tables with a new
      _PAGE_DEVMAP flag.  Later, when get_user_pages() is walking ptes it keys
      off _PAGE_DEVMAP to pin the device hosting the page range active.
      Finally, get_page() and put_page() are modified to take references
      against the device driver established page mapping.
      
      Finally, this need for "struct page" for persistent memory requires
      memory capacity to store the memmap array.  Given that the memmap array
      for a large pool of persistent memory may exhaust available DRAM,
      introduce a mechanism to allocate the memmap from persistent memory.  The
      new "struct vmem_altmap *" parameter to devm_memremap_pages() enables
      arch_add_memory() to use reserved pmem capacity rather than the page
      allocator.
      
      This patch (of 18):
      
      The core has developed a need for a "pfn_t" type [1].  Move the existing
      pfn_t in KVM to kvm_pfn_t [2].
      
      [1]: https://lists.01.org/pipermail/linux-nvdimm/2015-September/002199.html
      [2]: https://lists.01.org/pipermail/linux-nvdimm/2015-September/002218.html
      
      
      
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  24. 08 Jan, 2016 4 commits
  25. 16 Dec, 2015 3 commits
    • kvm: Dump guest rIP when the guest tried something unsupported · 671d9ab3
      Borislav Petkov authored
      
      
      It looks like this in action:
      
        kvm [5197]: vcpu0, guest rIP: 0xffffffff810187ba unhandled rdmsr: 0xc001102
      
      and helps to pinpoint quickly where in the guest we did the unsupported
      thing.
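
For illustration, the message layout shown above can be reproduced with a small formatting helper. The function `format_unimpl` and its parameters are hypothetical; only the output format is taken from the example line.

```c
#include <stdio.h>

/* Format "kvm [<pid>]: vcpu<id>, guest rIP: 0x<rip> unhandled <what>: 0x<msr>".
 * Returns what snprintf returns: the length the full message needs. */
int format_unimpl(char *buf, size_t len, int pid, int vcpu_id,
                  unsigned long long rip, const char *what, unsigned int msr)
{
    return snprintf(buf, len,
                    "kvm [%d]: vcpu%d, guest rIP: 0x%llx unhandled %s: 0x%x",
                    pid, vcpu_id, rip, what, msr);
}
```

Putting the guest instruction pointer ahead of the specific complaint keeps the prefix uniform, so the offending guest address can be picked out of any such message.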
      
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • kvm/x86: Hyper-V SynIC timers · 1f4b34f8
      Andrey Smetanin authored
      
      
      Per Hyper-V specification (and as required by Hyper-V-aware guests),
      SynIC provides 4 per-vCPU timers.  Each timer is programmed via a pair
      of MSRs, and signals expiration by delivering a special format message
      to the configured SynIC message slot and triggering the corresponding
      synthetic interrupt.
      
      Note: as implemented by this patch, all periodic timers are "lazy"
      (i.e. if the vCPU wasn't scheduled for more than the timer period the
      timer events are lost), regardless of the corresponding configuration
      MSR.  If deemed necessary, the "catch up" mode (the timer period is
      shortened until the timer catches up) will be implemented later.
      
      Changes v2:
      * Use remainder to calculate periodic timer expiration time
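
One way to read "use remainder" for a lazy periodic timer is sketched below: missed periods are skipped, and the next expiration stays on the original period grid. The function name and the time units are assumptions; this is not the patch's code.

```c
#include <stdint.h>

/* Next expiration of a "lazy" periodic timer.
 * exp_time: the expiration that was due, now: current time, period > 0.
 * All values are in the same (arbitrary) time unit. */
uint64_t next_periodic_expiry(uint64_t exp_time, uint64_t now, uint64_t period)
{
    uint64_t remainder;

    if (now < exp_time)
        return exp_time;                 /* not expired yet */

    /* How far into the current period we are, counted from exp_time. */
    remainder = (now - exp_time) % period;

    /* Skip any missed periods; land on the next grid point after now. */
    return now + (period - remainder);
}
```

For example, with an expiration due at 100, a period of 50, and the vCPU only running again at 260, the next expiration is 300: the events at 150, 200, and 250 are lost, as the note above describes.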
      
      Signed-off-by: Andrey Smetanin <asmetanin@virtuozzo.com>
      Reviewed-by: Roman Kagan <rkagan@virtuozzo.com>
      CC: Gleb Natapov <gleb@kernel.org>
      CC: Paolo Bonzini <pbonzini@redhat.com>
      CC: "K. Y. Srinivasan" <kys@microsoft.com>
      CC: Haiyang Zhang <haiyangz@microsoft.com>
      CC: Vitaly Kuznetsov <vkuznets@redhat.com>
      CC: Roman Kagan <rkagan@virtuozzo.com>
      CC: Denis V. Lunev <den@openvz.org>
      CC: qemu-devel@nongnu.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • kvm/x86: Hyper-V SynIC message slot pending clearing at SINT ack · 765eaa0f
      Andrey Smetanin authored
      
      
      The SynIC message protocol mandates that the message slot is claimed
      by atomically setting message type to something other than HVMSG_NONE.
      If another message is to be delivered while the slot is still busy,
      message pending flag is asserted to indicate to the guest that the
      hypervisor wants to be notified when the slot is released.
      
      To make sure the protocol works regardless of where the message
      sources are (kernel or userspace), clear the pending flag on SINT ACK
      notification, and let the message sources compete for the slot again.
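
The slot protocol can be modelled with C11 atomics as below. This is a simplified userspace sketch: the struct layout, the function names, and the value 0 for `HVMSG_NONE` are assumptions for illustration, and the real message slot carries more state than a type and a flag.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define HVMSG_NONE 0u   /* assumed value for "slot is free" */

/* Hypothetical, simplified message slot. */
struct msg_slot_model {
    _Atomic uint32_t message_type;
    atomic_bool msg_pending;
};

/* A message source tries to claim the slot by atomically changing the type
 * from HVMSG_NONE. If the slot is busy, it raises the pending flag so the
 * guest will notify the hypervisor when the slot is released. */
bool slot_try_claim(struct msg_slot_model *s, uint32_t type)
{
    uint32_t expected = HVMSG_NONE;

    if (atomic_compare_exchange_strong(&s->message_type, &expected, type))
        return true;
    atomic_store(&s->msg_pending, true);
    return false;
}

/* Guest side of the model: free the slot; returns true if a notification
 * to the hypervisor is required because someone is waiting. */
bool slot_release(struct msg_slot_model *s)
{
    atomic_store(&s->message_type, HVMSG_NONE);
    return atomic_load(&s->msg_pending);
}

/* On the SINT ACK notification: clear the flag so that all sources,
 * wherever they live, compete for the slot again. */
void slot_sint_ack(struct msg_slot_model *s)
{
    atomic_store(&s->msg_pending, false);
}
```

Clearing the flag on the acknowledgement rather than on a successful delivery keeps the rule independent of which source, kernel or userspace, ends up winning the slot next.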
      
      Signed-off-by: Andrey Smetanin <asmetanin@virtuozzo.com>
      Reviewed-by: Roman Kagan <rkagan@virtuozzo.com>
      CC: Gleb Natapov <gleb@kernel.org>
      CC: Paolo Bonzini <pbonzini@redhat.com>
      CC: "K. Y. Srinivasan" <kys@microsoft.com>
      CC: Haiyang Zhang <haiyangz@microsoft.com>
      CC: Vitaly Kuznetsov <vkuznets@redhat.com>
      CC: Roman Kagan <rkagan@virtuozzo.com>
      CC: Denis V. Lunev <den@openvz.org>
      CC: qemu-devel@nongnu.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  26. 30 Nov, 2015 2 commits