1. 14 Sep, 2019 4 commits
    • KVM: x86/mmu: Reintroduce fast invalidate/zap for flushing memslot · 002c5f73
      Sean Christopherson authored
      James Harvey reported a livelock that was introduced by commit
      d012a06a ("Revert "KVM: x86/mmu: Zap only the relevant pages when
      removing a memslot"").
      
      The livelock occurs because kvm_mmu_zap_all() as it exists today will
      voluntarily reschedule and drop KVM's mmu_lock, which allows other vCPUs
      to add shadow pages.  With enough vCPUs, kvm_mmu_zap_all() can get stuck
      in an infinite loop as it can never zap all pages before observing lock
      contention or the need to reschedule.  The equivalent of kvm_mmu_zap_all()
      that was in use at the time of the reverted commit (4e103134, "KVM:
      x86/mmu: Zap only the relevant pages when removing a memslot") employed
      a fast invalidate mechanism and was not susceptible to the above livelock.
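
      For context, a minimal sketch of the problematic loop, with names
      following arch/x86/kvm/mmu.c but the body simplified (not the literal
      upstream code):

        /*
         * Sketch of the livelock-prone zap loop: dropping mmu_lock to
         * reschedule lets other vCPUs instantiate new shadow pages, so with
         * enough vCPUs the restarts below may never stop firing.
         */
        static void kvm_mmu_zap_all(struct kvm *kvm)
        {
                struct kvm_mmu_page *sp, *node;
                LIST_HEAD(invalid_list);

                spin_lock(&kvm->mmu_lock);
        restart:
                list_for_each_entry_safe(sp, node, &kvm->arch.active_mmu_pages, link) {
                        if (sp->role.invalid && sp->root_count)
                                continue;
                        if (kvm_mmu_prepare_zap_page(kvm, sp, &invalid_list))
                                goto restart;
                        if (cond_resched_lock(&kvm->mmu_lock)) /* drops mmu_lock */
                                goto restart;
                }
                kvm_mmu_commit_zap_page(kvm, &invalid_list);
                spin_unlock(&kvm->mmu_lock);
        }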
      
      There are three ways to fix the livelock:
      
      - Reverting the revert (commit d012a06a) is not a viable option, as
        the revert is needed to fix a regression that occurs when the guest has
        one or more assigned devices.  It's unlikely the device assignment
        regression will be root-caused soon enough to deliver a timely fix.
      
      - Remove the conditional reschedule from kvm_mmu_zap_all().  Although
        this would be the smaller code change, it's less safe in the sense that
        the resulting kvm_mmu_zap_all() hasn't been used in the wild for
        flushing memslots since the fast invalidate mechanism was introduced by
        commit 6ca18b69 ("KVM: x86: use the fast way to invalidate all
        pages"), back in 2013.
      
      - Reintroduce the fast invalidate mechanism and use it when zapping shadow
        pages in response to a memslot being deleted/moved, which is what this
        patch does (a sketch follows this list).
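
      A hedged sketch of the reintroduced fast-invalidate path (function and
      field names follow the 5.3-era code; details elided):

        /*
         * Bumping the generation under mmu_lock instantly obsoletes every
         * existing shadow page; pages created afterwards carry the new
         * generation, so the zap loop below cannot livelock against them.
         */
        static void kvm_mmu_zap_all_fast(struct kvm *kvm)
        {
                spin_lock(&kvm->mmu_lock);
                kvm->arch.mmu_valid_gen++;

                /* Zaps only pages with the stale generation; it may drop
                 * mmu_lock to reschedule, but that is now safe. */
                kvm_zap_obsolete_pages(kvm);
                spin_unlock(&kvm->mmu_lock);
        }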
      
      For all intents and purposes, this is a revert of commit ea145aac
      ("Revert "KVM: MMU: fast invalidate all pages"") and a partial revert of
      commit 7390de1e ("Revert "KVM: x86: use the fast way to invalidate
      all pages""), i.e. restores the behavior of commit 5304b8d3 ("KVM:
      MMU: fast invalidate all pages") and commit 6ca18b69 ("KVM: x86:
      use the fast way to invalidate all pages") respectively.
      
      Fixes: d012a06a ("Revert "KVM: x86/mmu: Zap only the relevant pages when removing a memslot"")
      Reported-by: James Harvey <jamespharvey20@gmail.com>
      Cc: Alex Williamson <alex.williamson@redhat.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: work around leak of uninitialized stack contents · 541ab2ae
      Fuqian Huang authored
      
      
      Emulation of VMPTRST can incorrectly inject a page fault
      when passed an operand that points to an MMIO address.
      The page fault will use uninitialized kernel stack memory
      as the CR2 and error code.
      
      The right behavior would be to abort the VM with a KVM_EXIT_INTERNAL_ERROR
      exit to userspace; however, it is not an easy fix, so for now just ensure
      that the error code and CR2 are zero.
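
      The workaround amounts to clearing the exception struct up front; a
      hedged sketch of the change to kvm_write_guest_virt_system():

        int kvm_write_guest_virt_system(struct kvm_vcpu *vcpu, gva_t addr,
                                        void *val, unsigned int bytes,
                                        struct x86_exception *exception)
        {
                /*
                 * FIXME: this should really be a KVM_EXIT_INTERNAL_ERROR exit
                 * to userspace; until then, make sure a spuriously injected
                 * page fault at least carries CR2 = 0 and error code = 0
                 * instead of uninitialized kernel stack contents.
                 */
                memset(exception, 0, sizeof(*exception));
                return kvm_write_guest_virt_helper(addr, val, bytes, vcpu,
                                                   PFERR_WRITE_MASK, exception);
        }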
      
      Signed-off-by: Fuqian Huang <huangfq.daxian@gmail.com>
      Cc: stable@vger.kernel.org
      [add comment]
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: nVMX: handle page fault in vmread · f7eea636
      Paolo Bonzini authored
      
      
      The implementation of vmread to memory is still incomplete: just like
      vmptrst, it lacks the ability to handle vmread to I/O memory.
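
      A hedged sketch of the corresponding change in handle_vmread(): pass a
      real exception struct to the emulated write and inject whatever fault
      it reports:

        struct x86_exception e;

        /* The write can now fault, e.g. on an MMIO operand; report the
         * fault to the guest instead of silently ignoring it. */
        if (kvm_write_guest_virt_system(vcpu, gva, &field_value,
                                        (is_long_mode(vcpu) ? 8 : 4), &e))
                kvm_inject_page_fault(vcpu, &e);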
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • riscv: modify the Image header to improve compatibility with the ARM64 header · 474efecb
      Paul Walmsley authored
      
      
      Part of the intention during the definition of the RISC-V kernel image
      header was to lay the groundwork for a future merge with the ARM64
      image header.  One error during my original review was not noticing
      that the RISC-V header's "magic" field was at a different size and
      position than the ARM64 header's "magic" field.  If the existing ARM64
      Image header parsing code were to attempt to parse the existing RISC-V
      kernel image header, it would see a magic number of 0.  This is
      undesirable, since it's our intention to align as closely as possible
      with the ARM64 header format.  Another problem was that the original
      "res3" field was not being initialized correctly to zero.
      
      Address these issues by creating a 32-bit "magic2" field in the RISC-V
      header which matches the ARM64 "magic" field.  RISC-V binaries will
      store "RSC\x05" in this field.  The intention is that the use of the
      existing 64-bit "magic" field in the RISC-V header will be deprecated
      over time.  Increment the minor version number of the file format to
      indicate this change, and update the documentation accordingly.  Fix
      the assembler directives in head.S to ensure that reserved fields are
      properly zero-initialized.
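
      For reference, a sketch of the resulting layout, assuming the v0.2
      header from Documentation/riscv/boot-image-header.txt:

        struct riscv_image_header {
                u32 code0;              /* executable code */
                u32 code1;
                u64 text_offset;        /* image load offset, little endian */
                u64 image_size;         /* effective image size, little endian */
                u64 flags;              /* kernel flags, little endian */
                u32 version;            /* format version; minor bumped here */
                u32 res1;
                u64 res2;
                u64 magic;              /* "RISCV\0\0\0" - to be deprecated */
                u32 magic2;             /* "RSC\x05" - same offset as ARM64 "magic" */
                u32 res3;
        };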
      
      Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com>
      Reported-by: Palmer Dabbelt <palmer@sifive.com>
      Reviewed-by: Palmer Dabbelt <palmer@sifive.com>
      Cc: Atish Patra <atish.patra@wdc.com>
      Cc: Karsten Merker <merker@debian.org>
      Link: https://lore.kernel.org/linux-riscv/194c2f10c9806720623430dbf0cc59a965e50448.camel@wdc.com/T/#u
      Link: https://lore.kernel.org/linux-riscv/mhng-755b14c4-8f35-4079-a7ff-e421fd1b02bc@palmer-si-x1e/T/#t
  2. 12 Sep, 2019 2 commits
  3. 08 Sep, 2019 1 commit
    • x86/timer: Force PIT initialization when !X86_FEATURE_ARAT · afa8b475
      Jan Stancek authored
      KVM guests with commit c8c40767 ("x86/timer: Skip PIT initialization on
      modern chipsets") applied to the guest kernel have been observed to have
      unusually high CPU usage, with symptoms of an increase in VM exits for
      HLT and MSR_WRITE (MSR_IA32_TSCDEADLINE).
      
      This is caused by older QEMUs lacking support for X86_FEATURE_ARAT: the
      LAPIC clockevent retains CLOCK_EVT_FEAT_C3STOP, NOHZ stays inactive, and
      there is no usable broadcast device either.
      
      Do the PIT initialization if the guest CPU lacks X86_FEATURE_ARAT.  On real
      hardware it shouldn't matter as ARAT and DEADLINE come together.
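
      A hedged sketch of the check, following the apic_needs_pit() helper
      introduced by the earlier PIT-skip work:

        static bool apic_needs_pit(void)
        {
                /*
                 * Without ARAT the lapic timer may stop in deep C-states and
                 * there is no usable broadcast device, so keep the PIT.
                 */
                if (!boot_cpu_has(X86_FEATURE_ARAT))
                        return true;

                /* remaining conditions of the original helper elided */
                return false;
        }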
      
      Fixes: c8c40767 ("x86/timer: Skip PIT initialization on modern chipsets")
      Signed-off-by: Jan Stancek <jstancek@redhat.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
  4. 07 Sep, 2019 2 commits
    • Revert "x86/apic: Include the LDR when clearing out APIC registers" · 950b07c1
      Linus Torvalds authored
      This reverts commit 558682b5.
      
      Chris Wilson reports that it breaks his CPU hotplug test scripts.  In
      particular, it breaks offlining and then re-onlining the boot CPU, which
      we treat specially (and the BIOS does too).
      
      The symptoms are that we can offline the CPU, but it then does not come
      back online again:
      
          smpboot: CPU 0 is now offline
          smpboot: Booting Node 0 Processor 0 APIC 0x0
          smpboot: do_boot_cpu failed(-1) to wakeup CPU#0
      
      Thomas says he knows why it's broken (my personal suspicion: our magic
      handling of the "cpu0_logical_apicid" thing), but for 5.3 the right fix
      is to just revert it, since we've never touched the LDR bits before, and
      it's not worth the risk to do anything else at this stage.
      
      [ Hotplugging of the boot CPU is special anyway, and should be off by
        default. See the "BOOTPARAM_HOTPLUG_CPU0" config option and the
        cpu0_hotplug kernel parameter.
      
        In general you should not do it, and it has various known limitations
        (hibernate and suspend require the boot CPU, for example).
      
        But it should work, even if the boot CPU is special and needs careful
        treatment       - Linus ]
      
      Link: https://lore.kernel.org/lkml/156785100521.13300.14461504732265570003@skylake-alporthouse-com/
      
      
      Reported-by: Chris Wilson <chris@chris-wilson.co.uk>
      Acked-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Bandan Das <bsd@redhat.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • ipc: fix sparc64 ipc() wrapper · fb377eb8
      Arnd Bergmann authored
      
      
      Matt bisected a sparc64-specific issue with semctl, shmctl and msgctl
      to a commit from my y2038 series in linux-5.1, as I missed the custom
      sys_ipc() wrapper that sparc64 uses in place of the generic version that
      I patched.
      
      The problem is that the sys_{sem,shm,msg}ctl() functions in the kernel
      no longer allow being called with the IPC_64 flag, resulting in a
      -EINVAL error when they don't recognize the command.
      
      Instead, the correct way to do this now is to call the internal
      ksys_old_{sem,shm,msg}ctl() functions to select the API version.
      
      As we generally move towards these functions anyway, change all of
      sparc_ipc() to consistently use those in place of the sys_*() versions,
      and move the required ksys_*() declarations into linux/syscalls.h.
      
      The IS_ENABLED(CONFIG_SYSVIPC) check is required to avoid link
      errors when ipc is disabled.
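
      An abridged, hedged sketch of the resulting wrapper shape (only the
      ctl cases shown):

        SYSCALL_DEFINE6(sparc_ipc, unsigned int, call, int, first,
                        unsigned long, second, unsigned long, third,
                        void __user *, ptr, long, fifth)
        {
                if (!IS_ENABLED(CONFIG_SYSVIPC))
                        return -ENOSYS; /* avoids link errors when ipc is off */

                switch (call) {
                case SEMCTL:
                        /* ksys_old_semctl() selects IPC_64 vs. the old ABI
                         * itself, so IPC_64 must no longer be ORed in here */
                        return ksys_old_semctl(first, second, third,
                                               (unsigned long)ptr);
                case MSGCTL:
                        return ksys_old_msgctl(first, second,
                                               (struct msqid_ds __user *)ptr);
                case SHMCTL:
                        return ksys_old_shmctl(first, second,
                                               (struct shmid_ds __user *)ptr);
                default:
                        return -EINVAL;
                }
        }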
      
      Reported-by: Matt Turner <mattst88@gmail.com>
      Fixes: 275f2214 ("ipc: rename old-style shmctl/semctl/msgctl syscalls")
      Cc: stable@vger.kernel.org
      Tested-by: Matt Turner <mattst88@gmail.com>
      Tested-by: Anatoly Pugachev <matorola@gmail.com>
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
  5. 06 Sep, 2019 1 commit
    • x86/purgatory: Change compiler flags from -mcmodel=kernel to -mcmodel=large to fix kexec relocation errors · e16c2983
      Steve Wahl authored
      
      The last change to this Makefile caused relocation errors when loading
      a kdump kernel.  Restore -mcmodel=large (not -mcmodel=kernel),
      -ffreestanding, and -fno-zero-initialized-bss, without reverting to
      the former practice of resetting KBUILD_CFLAGS.
      
      Purgatory.ro is a standalone binary that is not linked against the
      rest of the kernel.  Its image is copied into an array that is linked
      to the kernel, and from there kexec relocates it wherever it desires.
      
      With the previous change to compiler flags, the error "kexec: Overflow
      in relocation type 11 value 0x11fffd000" was encountered when trying
      to load the crash kernel.  This is from kexec code trying to relocate
      the purgatory.ro object.
      
      From the error message, relocation type 11 is R_X86_64_32S.  The
      x86_64 ABI says:
      
        "The R_X86_64_32 and R_X86_64_32S relocations truncate the
         computed value to 32-bits.  The linker must verify that the
         generated value for the R_X86_64_32 (R_X86_64_32S) relocation
         zero-extends (sign-extends) to the original 64-bit value."
      
      This type of relocation doesn't work when kexec chooses to place the
      purgatory binary in memory that is not reachable with 32 bit
      addresses.
      
      The compiler flag -mcmodel=kernel allows this type of relocation to be
      emitted, so revert to using -mcmodel=large as was done before.
      
      Also restore the -ffreestanding and -fno-zero-initialized-bss flags
      because they are appropriate for a standalone piece of object code
      which doesn't explicitly zero the bss, and one other report has said
      that undefined symbols are encountered without -ffreestanding.
      
      These identical compiler flag changes need to happen for every object
      that becomes part of the purgatory.ro object, so gather them together
      first into PURGATORY_CFLAGS_REMOVE and PURGATORY_CFLAGS, and then
      apply them to each of the objects that have C source.  Do not apply
      any of these flags to kexec-purgatory.o, which is not part of the
      standalone object but part of the kernel proper.
      
      Tested-by: Vaibhav Rustagi <vaibhavrustagi@google.com>
      Tested-by: Andreas Smas <andreas@lonelycoder.com>
      Signed-off-by: Steve Wahl <steve.wahl@hpe.com>
      Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: clang-built-linux@googlegroups.com
      Cc: dimitri.sivanich@hpe.com
      Cc: mike.travis@hpe.com
      Cc: russ.anderson@hpe.com
      Fixes: b059f801 ("x86/purgatory: Use CFLAGS_REMOVE rather than reset KBUILD_CFLAGS")
      Link: https://lkml.kernel.org/r/20190905202346.GA26595@swahl-linux
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  6. 05 Sep, 2019 7 commits
  7. 04 Sep, 2019 4 commits
    • riscv: Add perf callchain support · dbeb90b0
      Mao Han authored
      
      
      This patch adds support for perf callchain sampling on riscv platforms.
      The return address of a leaf function is retrieved from pt_regs, as it
      is not saved in the outermost frame.
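
      A hedged sketch of the user-side frame walk (simplified from the patch;
      riscv pt_regs field names assumed):

        struct stackframe {
                unsigned long fp;
                unsigned long ra;
        };

        static void walk_user_frames(struct perf_callchain_entry_ctx *entry,
                                     struct pt_regs *regs)
        {
                unsigned long fp = regs->s0;    /* frame pointer */
                struct stackframe frame;

                perf_callchain_store(entry, regs->sepc);
                perf_callchain_store(entry, regs->ra); /* leaf return address */

                while (fp && !(fp & 0x7) && entry->nr < entry->max_stack) {
                        /* the callee spills {fp, ra} right below its fp */
                        if (copy_from_user(&frame,
                                           (void __user *)(fp - sizeof(frame)),
                                           sizeof(frame)))
                                break;
                        fp = frame.fp;
                        perf_callchain_store(entry, frame.ra);
                }
        }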
      
      Signed-off-by: Mao Han <han_mao@c-sky.com>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Greentime Hu <green.hu@gmail.com>
      Cc: Palmer Dabbelt <palmer@sifive.com>
      Cc: linux-riscv <linux-riscv@lists.infradead.org>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Guo Ren <guoren@kernel.org>
      Tested-by: Greentime Hu <greentime.hu@sifive.com>
      [paul.walmsley@sifive.com: fixed some 'checkpatch.pl --strict' issues;
       fixed patch description spelling]
      Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com>
    • powerpc/tm: Fix restoring FP/VMX facility incorrectly on interrupts · a8318c13
      Gustavo Romero authored
      When in userspace and MSR FP=0 the hardware FP state is unrelated to
      the current process. This is extended for transactions where if tbegin
      is run with FP=0, the hardware checkpoint FP state will also be
      unrelated to the current process. Due to this, we need to ensure this
      hardware checkpoint is updated with the correct state before we enable
      FP for this process.
      
      Unfortunately we get this wrong when returning to a process from a
      hardware interrupt. A process that starts a transaction with FP=0 can
      take an interrupt. When the kernel returns back to that process, we
      change to FP=1 but with hardware checkpoint FP state not updated. If
      this transaction is then rolled back, the FP registers now contain the
      wrong state.
      
      The process looks like this:
         Userspace:                      Kernel
      
                     Start userspace
                      with MSR FP=0 TM=1
                        < -----
         ...
         tbegin
         bne
                     Hardware interrupt
                         ---- >
                                          <do_IRQ...>
                                          ....
                                          ret_from_except
                                             restore_math()
                                               /* sees FP=0 */
                                               restore_fp()
                                                 tm_active_with_fp()
                                                   /* sees FP=1 (Incorrect) */
                                                 load_fp_state()
                                             FP = 0 -> 1
                        < -----
                     Return to userspace
                       with MSR TM=1 FP=1
                       with junk in the FP TM checkpoint
         TM rollback
         reads FP junk
      
      When returning from the hardware exception, tm_active_with_fp() is
      incorrectly making restore_fp() call load_fp_state() which is setting
      FP=1.
      
      The fix is to remove tm_active_with_fp().
      
      tm_active_with_fp() is attempting to handle the case where FP state
      has been changed inside a transaction. In this case the checkpointed
      and transactional FP state is different and hence we must restore the
      FP state (ie. we can't do lazy FP restore inside a transaction that's
      used FP). It's safe to remove tm_active_with_fp() as this case is
      handled by restore_tm_state(). restore_tm_state() detects if FP has
      been used inside a transaction and will set load_fp and call
      restore_math() to ensure the FP state (checkpointed and transactional)
      is restored.
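
      After the fix, restore_fp() keys off load_fp alone; a hedged sketch:

        static int restore_fp(struct task_struct *tsk)
        {
                /* The transactional case is handled by restore_tm_state();
                 * with tm_active_with_fp() gone, stale checkpointed state
                 * can no longer be loaded behind an active transaction. */
                if (tsk->thread.load_fp) {
                        load_fp_state(&current->thread.fp_state);
                        current->thread.load_fp++;
                        return 1;
                }
                return 0;
        }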
      
      This is a data integrity problem for the current process as the FP
      registers are corrupted. It's also a security problem as the FP
      registers from one process may be leaked to another.
      
      Similarly for VMX.
      
      A simple testcase to replicate this will be posted to
      tools/testing/selftests/powerpc/tm/tm-poison.c
      
      This fixes CVE-2019-15031.
      
      Fixes: a7771176 ("powerpc: Don't enable FP/Altivec if not checkpointed")
      Cc: stable@vger.kernel.org # 4.15+
      Signed-off-by: Gustavo Romero <gromero@linux.ibm.com>
      Signed-off-by: Michael Neuling <mikey@neuling.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20190904045529.23002-2-gromero@linux.vnet.ibm.com
    • powerpc/tm: Fix FP/VMX unavailable exceptions inside a transaction · 8205d5d9
      Gustavo Romero authored
      When we take an FP unavailable exception in a transaction we have to
      account for the hardware FP TM checkpointed registers being
      incorrect. In this case for this process we know the current and
      checkpointed FP registers must be the same (since FP wasn't used
      inside the transaction) hence in the thread_struct we copy the current
      FP registers to the checkpointed ones.
      
      This copy is done in tm_reclaim_thread(). We use thread->ckpt_regs.msr
      to determine if FP was on when in userspace. thread->ckpt_regs.msr
      represents the state of the MSR when exiting userspace. This is set up
      by check_if_tm_restore_required().
      
      Unfortunately there is an optimisation in giveup_all() which returns
      early if tsk->thread.regs->msr (via local variable `usermsr`) has
      FP=VEC=VSX=SPE=0. This optimisation means that
      check_if_tm_restore_required() is not called and hence
      thread->ckpt_regs.msr is not updated and will contain an old value.
      
      This can happen if, due to load_fp=255, we start a userspace process
      with MSR FP=1 and then we are context switched out. In this case
      thread->ckpt_regs.msr will contain FP=1. If that same process is then
      context switched in and load_fp overflows, MSR will have FP=0. If that
      process now enters a transaction and does an FP instruction, the FP
      unavailable exception will not update thread->ckpt_regs.msr (the bug) and
      FP=1 will be retained in thread->ckpt_regs.msr.  tm_reclaim_thread()
      will then not perform the required memcpy and the checkpointed FP regs
      in the thread struct will contain the wrong values.
      
      The code path for this happening is:
      
             Userspace:                      Kernel
                         Start userspace
                          with MSR FP/VEC/VSX/SPE=0 TM=1
                            < -----
             ...
             tbegin
             bne
             fp instruction
                         FP unavailable
                             ---- >
                                           fp_unavailable_tm()
                                             tm_reclaim_current()
                                               tm_reclaim_thread()
                                                 giveup_all()
                                                   return early since FP/VMX/VSX=0
                                                   /* ckpt MSR not updated (Incorrect) */
                                                 tm_reclaim()
                                                   /* thread_struct ckpt FP regs contain junk (OK) */
                                                 /* Sees ckpt MSR FP=1 (Incorrect) */
                                                 no memcpy() performed
                                                   /* thread_struct ckpt FP regs not fixed (Incorrect) */
                                               tm_recheckpoint()
                                                 /* Put junk in hardware checkpoint FP regs */
                                               ....
                            < -----
                         Return to userspace
                           with MSR TM=1 FP=1
                           with junk in the FP TM checkpoint
             TM rollback
             reads FP junk
      
      This is a data integrity problem for the current process as the FP
      registers are corrupted. It's also a security problem as the FP
      registers from one process may be leaked to another.
      
      This patch moves up check_if_tm_restore_required() in giveup_all() to
      ensure thread->ckpt_regs.msr is updated correctly.
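
      An abridged, hedged sketch of the reordering in giveup_all():

        void giveup_all(struct task_struct *tsk)
        {
                unsigned long usermsr;

                if (!tsk->thread.regs)
                        return;

                check_if_tm_restore_required(tsk); /* moved above the early return */

                usermsr = tsk->thread.regs->msr;
                if ((usermsr & msr_all_available) == 0)
                        return; /* ckpt_regs.msr is now recorded regardless */

                msr_check_and_set(msr_all_available);
                if (usermsr & MSR_FP)
                        __giveup_fpu(tsk);
                if (usermsr & MSR_VEC)
                        __giveup_altivec(tsk);
                msr_check_and_clear(msr_all_available);
        }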
      
      A simple testcase to replicate this will be posted to
      tools/testing/selftests/powerpc/tm/tm-poison.c
      
      Similarly for VMX.
      
      This fixes CVE-2019-15030.
      
      Fixes: f48e91e8 ("powerpc/tm: Fix FP and VMX register corruption")
      Cc: stable@vger.kernel.org # 4.12+
      Signed-off-by: Gustavo Romero <gromero@linux.vnet.ibm.com>
      Signed-off-by: Michael Neuling <mikey@neuling.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20190904045529.23002-1-gromero@linux.vnet.ibm.com
    • arm64: remove __iounmap · e376897f
      Christoph Hellwig authored
      
      
      No need to indirect iounmap for arm64.
      
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Will Deacon <will@kernel.org>
  8. 03 Sep, 2019 1 commit
    • x86/amd_nb: Add PCI device IDs for family 17h, model 70h · af4e1c5e
      Marcel Bocu authored
      
      
      The AMD Ryzen gen 3 processors came with different PCI IDs for
      functions 3 & 4, which are used to access the SMN interface. The root
      PCI address, however, remained the same as on model 30h.
      
      Adding the F3/F4 PCI IDs respectively to the misc and link ids appears
      to be sufficient for k10temp, so let's add them and follow up on the
      patch if other functions need more tweaking.
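
      A hedged sketch of the additions; the macro names follow the existing
      amd_nb pattern and the F3/F4 values shown are assumptions, not quoted
      from the patch:

        /* include/linux/pci_ids.h */
        #define PCI_DEVICE_ID_AMD_17H_M70H_DF_F3        0x1443
        #define PCI_DEVICE_ID_AMD_17H_M70H_DF_F4        0x1444

        /* arch/x86/kernel/amd_nb.c: hook F3 into the misc table (and,
         * analogously, F4 into the link table) */
        static const struct pci_device_id amd_nb_misc_ids[] = {
                /* ... existing entries ... */
                { PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_17H_M70H_DF_F3) },
                {}
        };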
      
      Vicki Pfau sent an identical patch after I had checked that no-one had
      already written one. I would have been happy to drop my patch, but
      unlike her series, mine had already been Cc:ed to the x86 people and
      reviewed by them. Since Vicki has not answered any email since her
      initial series, let's assume she is on vacation, avoid duplicating the
      maintainers' reviews, and merge my series. To acknowledge Vicki's
      precedence, I added her S-o-b to the patch.
      
      v2, suggested by Guenter Roeck and Brian Woods:
       - rename from 71h to 70h
      
      Signed-off-by: Vicki Pfau <vi@endrift.com>
      Signed-off-by: Marcel Bocu <marcel.p.bocu@gmail.com>
      Tested-by: Marcel Bocu <marcel.p.bocu@gmail.com>
      Acked-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Brian Woods <brian.woods@amd.com>
      Acked-by: Bjorn Helgaas <bhelgaas@google.com>	# pci_ids.h
      
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: x86@kernel.org
      Cc: "Woods, Brian" <Brian.Woods@amd.com>
      Cc: Clemens Ladisch <clemens@ladisch.de>
      Cc: Jean Delvare <jdelvare@suse.com>
      Cc: Guenter Roeck <linux@roeck-us.net>
      Cc: linux-hwmon@vger.kernel.org
      Link: https://lore.kernel.org/r/20190722174510.2179-1-marcel.p.bocu@gmail.com
      
      
      Signed-off-by: Guenter Roeck <linux@roeck-us.net>
  9. 02 Sep, 2019 6 commits
  10. 31 Aug, 2019 3 commits
  11. 30 Aug, 2019 9 commits
    • RISC-V: Implement sparsemem · d95f1a54
      Logan Gunthorpe authored
      
      
      Implement sparsemem support for RISC-V, which helps pave the way for
      memory hotplug and eventually P2P support.
      
      Introduce Kconfig options for virtual and physical address bits which
      are used to calculate the size of the vmemmap and set the
      MAX_PHYSMEM_BITS.
      
      The vmemmap is located directly before the VMALLOC region and sized
      such that we can allocate enough pages to populate all the virtual
      address space in the system (similar to the way it's done in arm64).
      
      During initialization, call memblocks_present() and sparse_init(),
      and provide a stub for vmemmap_populate() (all of which is similar to
      arm64).
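
      A hedged sketch of the sizing arithmetic, assuming the 5.3-era
      definitions in arch/riscv/include/asm/pgtable.h:

        /* Reserve enough virtual space directly below VMALLOC for one
         * struct page per page of the maximum addressable physical memory. */
        #define VMEMMAP_SHIFT \
                (CONFIG_VA_BITS - PAGE_SHIFT - 1 + STRUCT_PAGE_MAX_SHIFT)
        #define VMEMMAP_SIZE    BIT(VMEMMAP_SHIFT)
        #define VMEMMAP_END     (VMALLOC_START - 1)
        #define VMEMMAP_START   (VMALLOC_START - VMEMMAP_SIZE)

        #define vmemmap         ((struct page *)VMEMMAP_START)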
      
      [greentime.hu@sifive.com: fixed pfn_valid, FIXADDR_TOP and fixed a bug
       rebasing onto v5.3]
      Signed-off-by: Greentime Hu <greentime.hu@sifive.com>
      Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
      Reviewed-by: Palmer Dabbelt <palmer@sifive.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Cc: Albert Ou <aou@eecs.berkeley.edu>
      Cc: Andrew Waterman <andrew@sifive.com>
      Cc: Olof Johansson <olof@lixom.net>
      Cc: Michael Clark <michaeljclark@mac.com>
      Cc: Rob Herring <robh@kernel.org>
      Cc: Zong Li <zong@andestech.com>
      Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
      [paul.walmsley@sifive.com: updated to apply; minor commit message
       reformat]
      Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com>
    • riscv: Using CSR numbers to access CSRs · 4f3f9008
      Bin Meng authored
      Since commit a3182c91 ("RISC-V: Access CSRs using CSR numbers"),
      we should prefer accessing CSRs using their CSR numbers, but there
      are several leftovers like sstatus / sptbr we missed.
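
      The conversion pattern, as a small hedged example (illustrative call
      site, not quoted from the patch):

        /* before: legacy symbolic CSR name */
        csr_write(sptbr, PFN_DOWN(__pa(swapper_pg_dir)) | SATP_MODE);

        /* after: the CSR number macro from asm/csr.h */
        csr_write(CSR_SATP, PFN_DOWN(__pa(swapper_pg_dir)) | SATP_MODE);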
      
      Signed-off-by: Bin Meng <bmeng.cn@gmail.com>
      Reviewed-by: Anup Patel <anup@brainfault.org>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com>
    • perf/x86/amd/ibs: Fix sample bias for dispatched micro-ops · 0f4cd769
      Kim Phillips authored
      When counting dispatched micro-ops with cnt_ctl=1, in order to prevent
      sample bias, IBS hardware preloads the least significant 7 bits of
      current count (IbsOpCurCnt) with random values, such that, after the
      interrupt is handled and counting resumes, the next sample taken
      will be slightly perturbed.
      
      The current count bitfield is in the IBS execution control h/w register,
      alongside the maximum count field.
      
      Currently, the IBS driver writes that register with the maximum count,
      leaving zeroes to fill the current count field, thereby overwriting
      the random bits the hardware preloaded for itself.
      
      Fix the driver to actually retain and carry those random bits from the
      read of the IBS control register, through to its write, instead of
      overwriting the lower current count bits with zeroes.
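
      A conceptual, hedged sketch of the corrected restart path
      (IBS_OP_CUR_CNT_RAND is the mask for the randomized bits; the real
      driver code differs in detail):

        u64 ctl, max_cnt = hwc->config & IBS_OP_MAX_CNT;

        rdmsrl(MSR_AMD64_IBSOPCTL, ctl);
        /* keep the 7 hardware-randomized low IbsOpCurCnt bits rather than
         * overwriting them with zeroes */
        ctl &= IBS_OP_CUR_CNT_RAND;
        ctl |= max_cnt | IBS_OP_ENABLE;
        wrmsrl(MSR_AMD64_IBSOPCTL, ctl);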
      
      Tested with:
      
      perf record -c 100001 -e ibs_op/cnt_ctl=1/pp -a -C 0 taskset -c 0 <workload>
      
      'perf annotate' output before:
      
       15.70  65:   addsd     %xmm0,%xmm1
       17.30        add       $0x1,%rax
       15.88        cmp       %rdx,%rax
                    je        82
       17.32  72:   test      $0x1,%al
                    jne       7c
        7.52        movapd    %xmm1,%xmm0
        5.90        jmp       65
        8.23  7c:   sqrtsd    %xmm1,%xmm0
       12.15        jmp       65
      
      'perf annotate' output after:
      
       16.63  65:   addsd     %xmm0,%xmm1
       16.82        add       $0x1,%rax
       16.81        cmp       %rdx,%rax
                    je        82
       16.69  72:   test      $0x1,%al
                    jne       7c
        8.30        movapd    %xmm1,%xmm0
        8.13        jmp       65
        8.24  7c:   sqrtsd    %xmm1,%xmm0
        8.39        jmp       65
      
      Tested on Family 15h and 17h machines.
      
      Machines prior to family 10h Rev. C don't have the RDWROPCNT capability,
      and have the IbsOpCurCnt bitfield reserved, so this patch shouldn't
      affect their operation.
      
      It is unknown why commit db98c5fa ("perf/x86: Implement 64-bit
      counter support for IBS") ignored the lower 4 bits of the IbsOpCurCnt
      field; the number of preloaded random bits has always been 7, AFAICT.
      
      Signed-off-by: Kim Phillips <kim.phillips@amd.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: "Arnaldo Carvalho de Melo" <acme@kernel.org>
      Cc: <x86@kernel.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "Borislav Petkov" <bp@alien8.de>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: "Namhyung Kim" <namhyung@kernel.org>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Link: https://lkml.kernel.org/r/20190826195730.30614-1-kim.phillips@amd.com
    • perf/x86/intel: Restrict period on Nehalem · 44d3bbb6
      Josh Hunt authored
      
      
      We see our Nehalem machines reporting 'perfevents: irq loop stuck!' in
      some cases when using perf:
      
      perfevents: irq loop stuck!
      WARNING: CPU: 0 PID: 3485 at arch/x86/events/intel/core.c:2282 intel_pmu_handle_irq+0x37b/0x530
      ...
      RIP: 0010:intel_pmu_handle_irq+0x37b/0x530
      ...
      Call Trace:
      <NMI>
      ? perf_event_nmi_handler+0x2e/0x50
      ? intel_pmu_save_and_restart+0x50/0x50
      perf_event_nmi_handler+0x2e/0x50
      nmi_handle+0x6e/0x120
      default_do_nmi+0x3e/0x100
      do_nmi+0x102/0x160
      end_repeat_nmi+0x16/0x50
      ...
      ? native_write_msr+0x6/0x20
      ? native_write_msr+0x6/0x20
      </NMI>
      intel_pmu_enable_event+0x1ce/0x1f0
      x86_pmu_start+0x78/0xa0
      x86_pmu_enable+0x252/0x310
      __perf_event_task_sched_in+0x181/0x190
      ? __switch_to_asm+0x41/0x70
      ? __switch_to_asm+0x35/0x70
      ? __switch_to_asm+0x41/0x70
      ? __switch_to_asm+0x35/0x70
      finish_task_switch+0x158/0x260
      __schedule+0x2f6/0x840
      ? hrtimer_start_range_ns+0x153/0x210
      schedule+0x32/0x80
      schedule_hrtimeout_range_clock+0x8a/0x100
      ? hrtimer_init+0x120/0x120
      ep_poll+0x2f7/0x3a0
      ? wake_up_q+0x60/0x60
      do_epoll_wait+0xa9/0xc0
      __x64_sys_epoll_wait+0x1a/0x20
      do_syscall_64+0x4e/0x110
      entry_SYSCALL_64_after_hwframe+0x44/0xa9
      RIP: 0033:0x7fdeb1e96c03
      ...
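
      The message does not spell out the fix; per the subject line it
      restricts the minimum programmable period on Nehalem, presumably along
      these lines (hedged reconstruction, not quoted from the patch):

        /* Nehalem's PMU can wedge in an NMI loop when the period is very
         * small; clamp it to a safe minimum. */
        static u64 nhm_limit_period(struct perf_event *event, u64 left)
        {
                return max(32ULL, left);
        }

        /* wired up for the Nehalem case in intel_pmu_init(): */
        x86_pmu.limit_period = nhm_limit_period;
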
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: acme@kernel.org
      Cc: Josh Hunt <johunt@akamai.com>
      Cc: bpuranda@akamai.com
      Cc: mingo@redhat.com
      Cc: jolsa@redhat.com
      Cc: tglx@linutronix.de
      Cc: namhyung@kernel.org
      Cc: alexander.shishkin@linux.intel.com
      Link: https://lkml.kernel.org/r/1566256411-18820-1-git-send-email-johunt@akamai.com
    • arm64: atomics: Use K constraint when toolchain appears to support it · 03adcbd9
      Will Deacon authored
      
      
      The 'K' constraint is a documented AArch64 machine constraint supported
      by GCC for matching integer constants that can be used with a 32-bit
      logical instruction. Unfortunately, some released compilers erroneously
      accept the immediate '4294967295' for this constraint, which is later
      refused by GAS at assembly time. This had led us to avoid the use of
      the 'K' constraint altogether.
      
      Instead, detect whether the compiler is up to the job when building the
      kernel and pass the 'K' constraint to our 32-bit atomic macros when it
      appears to be supported.
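
      A hedged sketch of how the probe result is consumed: the build defines
      CONFIG_CC_HAS_K_CONSTRAINT only when a test program using the
      constraint compiles, and the atomic macros then splice the constraint
      in (macro shape assumed, not the literal header):

        #ifdef CONFIG_CC_HAS_K_CONSTRAINT
        #define K       "K"     /* allow 32-bit logical immediates directly */
        #else
        #define K               /* force the constant into a register */
        #endif

        static inline void example_and(u32 *p, u32 mask)
        {
                /* K "r" concatenates to the multi-alternative constraint
                 * "Kr": an immediate if the toolchain is trustworthy,
                 * otherwise a register operand */
                asm volatile("and %w0, %w0, %w1"
                             : "+r" (*p)
                             : K "r" (mask));
        }

        #undef K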
      
      Signed-off-by: Will Deacon <will@kernel.org>
    • arm64: atomics: Undefine internal macros after use · 5aad6cda
      Will Deacon authored
      
      
      We use a bunch of internal macros when constructing our atomic and
      cmpxchg routines in order to save on boilerplate. Avoid exposing these
      directly to users of the header files.
      
      Reviewed-by: Andrew Murray <andrew.murray@arm.com>
      Signed-off-by: Will Deacon <will@kernel.org>
    • arm64: lse: Make ARM64_LSE_ATOMICS depend on JUMP_LABEL · b32baf91
      Will Deacon authored
      
      
      Support for LSE atomic instructions (CONFIG_ARM64_LSE_ATOMICS) relies on
      a static key to select between the legacy LL/SC implementation which is
      available on all arm64 CPUs and the super-duper LSE implementation which
      is available on CPUs implementing v8.1 and later.
      
      Unfortunately, when building a kernel with CONFIG_JUMP_LABEL disabled
      (e.g. because the toolchain doesn't support 'asm goto'), the static key
      inside the atomics code tries to use atomics itself. This results in a
      mess of circular includes and a build failure:
      
      In file included from ./arch/arm64/include/asm/lse.h:11,
                       from ./arch/arm64/include/asm/atomic.h:16,
                       from ./include/linux/atomic.h:7,
                       from ./include/asm-generic/bitops/atomic.h:5,
                       from ./arch/arm64/include/asm/bitops.h:26,
                       from ./include/linux/bitops.h:19,
                       from ./include/linux/kernel.h:12,
                       from ./include/asm-generic/bug.h:18,
                       from ./arch/arm64/include/asm/bug.h:26,
                       from ./include/linux/bug.h:5,
                       from ./include/linux/page-flags.h:10,
                       from kernel/bounds.c:10:
      ./include/linux/jump_label.h: In function ‘static_key_count’:
      ./include/linux/jump_label.h:254:9: error: implicit declaration of function ‘atomic_read’ [-Werror=implicit-function-declaration]
        return atomic_read(&key->enabled);
               ^~~~~~~~~~~
      
      [ ... more of the same ... ]
      
      Since LSE atomic instructions are not critical to the operation of the
      kernel, make them depend on JUMP_LABEL at compile time.
      
      Reviewed-by: Andrew Murray <andrew.murray@arm.com>
      Signed-off-by: Will Deacon <will@kernel.org>
    • arm64: asm: Kill 'asm/atomic_arch.h' · 0533f97b
      Will Deacon authored
      
      
      The contents of 'asm/atomic_arch.h' can be split across some of our
      other 'asm/' headers. Remove it.
      
      Reviewed-by: Andrew Murray <andrew.murray@arm.com>
      Signed-off-by: Will Deacon <will@kernel.org>
    • arm64: lse: Remove unused 'alt_lse' assembly macro · 0ca98b24
      Will Deacon authored
      The 'alt_lse' assembly macro has been unused since commit 7c8fc35d
      ("locking/atomics/arm64: Replace our atomic/lock bitop implementations
      with asm-generic").

      Remove it.
      
      Reviewed-by: Andrew Murray <andrew.murray@arm.com>
      Signed-off-by: Will Deacon <will@kernel.org>