1. 14 Sep, 2019 1 commit
  2. 08 Sep, 2019 1 commit
    • x86/timer: Force PIT initialization when !X86_FEATURE_ARAT · afa8b475
      Jan Stancek authored
      KVM guests with commit c8c40767 ("x86/timer: Skip PIT initialization on
      modern chipsets") applied to guest kernel have been observed to have
      unusually higher CPU usage with symptoms of increase in vm exits for HLT
      This is caused by older QEMUs lacking support for X86_FEATURE_ARAT.  lapic
      clock retains CLOCK_EVT_FEAT_C3STOP and nohz stays inactive.  There's no
      usable broadcast device either.
      Do the PIT initialization if guest CPU lacks X86_FEATURE_ARAT.  On real
      hardware it shouldn't matter as ARAT and DEADLINE come together.
      Fixes: c8c40767 ("x86/timer: Skip PIT initialization on modern chipsets")
      Signed-off-by: Jan Stancek <jstancek@redhat.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
  3. 07 Sep, 2019 2 commits
    • Revert "x86/apic: Include the LDR when clearing out APIC registers" · 950b07c1
      Linus Torvalds authored
      This reverts commit 558682b5.
      Chris Wilson reports that it breaks his CPU hotplug test scripts.  In
      particular, it breaks offlining and then re-onlining the boot CPU, which
      we treat specially (and the BIOS does too).
      The symptoms are that we can offline the CPU, but it then does not come
      back online again:
          smpboot: CPU 0 is now offline
          smpboot: Booting Node 0 Processor 0 APIC 0x0
          smpboot: do_boot_cpu failed(-1) to wakeup CPU#0
      Thomas says he knows why it's broken (my personal suspicion: our magic
      handling of the "cpu0_logical_apicid" thing), but for 5.3 the right fix
      is to just revert it, since we've never touched the LDR bits before, and
      it's not worth the risk to do anything else at this stage.
      [ Hotplugging of the boot CPU is special anyway, and should be off by
        default. See the "BOOTPARAM_HOTPLUG_CPU0" config option and the
        cpu0_hotplug kernel parameter.
        In general you should not do it, and it has various known limitations
        (hibernate and suspend require the boot CPU, for example).
        But it should work, even if the boot CPU is special and needs careful
        treatment       - Linus ]
      Link: https://lore.kernel.org/lkml/156785100521.13300.14461504732265570003@skylake-alporthouse-com/
      Reported-by: Chris Wilson <chris@chris-wilson.co.uk>
      Acked-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Bandan Das <bsd@redhat.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • ipc: fix sparc64 ipc() wrapper · fb377eb8
      Arnd Bergmann authored
      Matt bisected a sparc64 specific issue with semctl, shmctl and msgctl
      to a commit from my y2038 series in linux-5.1, as I missed the custom
      sys_ipc() wrapper that sparc64 uses in place of the generic version that
      I patched.
      The problem is that the sys_{sem,shm,msg}ctl() functions in the kernel
      now do not allow being called with the IPC_64 flag any more, resulting
      in a -EINVAL error when they don't recognize the command.
      Instead, the correct way to do this now is to call the internal
      ksys_old_{sem,shm,msg}ctl() functions to select the API version.
      As we generally move towards these functions anyway, change all of
      sparc_ipc() to consistently use those in place of the sys_*() versions,
      and move the required ksys_*() declarations into linux/syscalls.h
      The IS_ENABLED(CONFIG_SYSVIPC) check is required to avoid link
      errors when ipc is disabled.
      Reported-by: Matt Turner <mattst88@gmail.com>
      Fixes: 275f2214 ("ipc: rename old-style shmctl/semctl/msgctl syscalls")
      Cc: stable@vger.kernel.org
      Tested-by: Matt Turner <mattst88@gmail.com>
      Tested-by: Anatoly Pugachev <matorola@gmail.com>
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
  4. 06 Sep, 2019 1 commit
    • x86/purgatory: Change compiler flags from -mcmodel=kernel to -mcmodel=large to fix kexec relocation errors · e16c2983
      Steve Wahl authored
      The last change to this Makefile caused relocation errors when loading
      a kdump kernel.  Restore -mcmodel=large (not -mcmodel=kernel),
      -ffreestanding, and -fno-zero-initialized-bss, without reverting to
      the former practice of resetting KBUILD_CFLAGS.
      Purgatory.ro is a standalone binary that is not linked against the
      rest of the kernel.  Its image is copied into an array that is linked
      to the kernel, and from there kexec relocates it wherever it desires.
      With the previous change to compiler flags, the error "kexec: Overflow
      in relocation type 11 value 0x11fffd000" was encountered when trying
      to load the crash kernel.  This is from kexec code trying to relocate
      the purgatory.ro object.
      From the error message, relocation type 11 is R_X86_64_32S.  The
      x86_64 ABI says:
        "The R_X86_64_32 and R_X86_64_32S relocations truncate the
         computed value to 32-bits.  The linker must verify that the
         generated value for the R_X86_64_32 (R_X86_64_32S) relocation
         zero-extends (sign-extends) to the original 64-bit value."
      This type of relocation doesn't work when kexec chooses to place the
      purgatory binary in memory that is not reachable with 32 bit addresses.
      The compiler flag -mcmodel=kernel allows those type of relocations to
      be emitted, so revert to using -mcmodel=large as was done before.
      Also restore the -ffreestanding and -fno-zero-initialized-bss flags
      because they are appropriate for a stand alone piece of object code
      which doesn't explicitly zero the bss, and one other report has said
      undefined symbols are encountered without -ffreestanding.
      These identical compiler flag changes need to happen for every object
      that becomes part of the purgatory.ro object, so gather them together
      and apply them to each of the objects that have C source.  Do not apply
      any of these flags to kexec-purgatory.o, which is not part of the
      standalone object but part of the kernel proper.
      Tested-by: Vaibhav Rustagi <vaibhavrustagi@google.com>
      Tested-by: Andreas Smas <andreas@lonelycoder.com>
      Signed-off-by: Steve Wahl <steve.wahl@hpe.com>
      Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: None
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: clang-built-linux@googlegroups.com
      Cc: dimitri.sivanich@hpe.com
      Cc: mike.travis@hpe.com
      Cc: russ.anderson@hpe.com
      Fixes: b059f801 ("x86/purgatory: Use CFLAGS_REMOVE rather than reset KBUILD_CFLAGS")
      Link: https://lkml.kernel.org/r/20190905202346.GA26595@swahl-linux
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
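The overflow quoted in the purgatory commit above can be checked with plain arithmetic. The following is a minimal sketch in Python (not kernel or kexec code) of the rule in the quoted ABI text: a value fits R_X86_64_32S only if sign-extending its low 32 bits gives back the original 64-bit value, and fits R_X86_64_32 only if zero-extending does. The function names are hypothetical.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF
MASK32 = 0xFFFFFFFF


def fits_r_x86_64_32s(value: int) -> bool:
    """True if the low 32 bits sign-extend back to the full 64-bit value."""
    value &= MASK64
    low = value & MASK32
    if low & 0x80000000:
        # Bit 31 set: sign extension fills the upper 32 bits with ones.
        extended = low | 0xFFFFFFFF00000000
    else:
        extended = low
    return extended == value


def fits_r_x86_64_32(value: int) -> bool:
    """True if the low 32 bits zero-extend back to the full 64-bit value."""
    value &= MASK64
    return (value & MASK32) == value


# The value from the error message does not survive either round trip.
print(fits_r_x86_64_32s(0x11FFFD000))
# An address in the top 2 GiB of the address space does sign-extend correctly.
print(fits_r_x86_64_32s(0xFFFFFFFF81000000))
```

This is why a load address just above 4 GiB, such as the one in the error message, cannot be expressed by a type 11 relocation.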
  5. 04 Sep, 2019 2 commits
    • powerpc/tm: Fix restoring FP/VMX facility incorrectly on interrupts · a8318c13
      Gustavo Romero authored
      When in userspace and MSR FP=0 the hardware FP state is unrelated to
      the current process. This is extended for transactions where if tbegin
      is run with FP=0, the hardware checkpoint FP state will also be
      unrelated to the current process. Due to this, we need to ensure this
      hardware checkpoint is updated with the correct state before we enable
      FP for this process.
      Unfortunately we get this wrong when returning to a process from a
      hardware interrupt. A process that starts a transaction with FP=0 can
      take an interrupt. When the kernel returns back to that process, we
      change to FP=1 but with hardware checkpoint FP state not updated. If
      this transaction is then rolled back, the FP registers now contain the
      wrong state.
      The process looks like this:
         Userspace:                      Kernel
                     Start userspace
                      with MSR FP=0 TM=1
                        < -----
                     Hardware interrupt
                         ---- >
      				        /* sees FP=0 */
      					    /* sees FP=1 (Incorrect) */
                                              FP = 0 -> 1
                        < -----
                     Return to userspace
                       with MSR TM=1 FP=1
                       with junk in the FP TM checkpoint
         TM rollback
         reads FP junk
      When returning from the hardware exception, tm_active_with_fp() is
      incorrectly making restore_fp() call load_fp_state() which is setting
      The fix is to remove tm_active_with_fp().
      tm_active_with_fp() is attempting to handle the case where FP state
      has been changed inside a transaction. In this case the checkpointed
      and transactional FP state is different and hence we must restore the
      FP state (ie. we can't do lazy FP restore inside a transaction that's
      used FP). It's safe to remove tm_active_with_fp() as this case is
      handled by restore_tm_state(). restore_tm_state() detects if FP has
      been used inside a transaction and will set load_fp and call
      restore_math() to ensure the FP state (checkpoint and transaction) is
      This is a data integrity problem for the current process as the FP
      registers are corrupted. It's also a security problem as the FP
      registers from one process may be leaked to another.
      Similarly for VMX.
      A simple testcase to replicate this will be posted to
      This fixes CVE-2019-15031.
      Fixes: a7771176 ("powerpc: Don't enable FP/Altivec if not checkpointed")
      Cc: stable@vger.kernel.org # 4.15+
      Signed-off-by: Gustavo Romero <gromero@linux.ibm.com>
      Signed-off-by: Michael Neuling <mikey@neuling.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20190904045529.23002-2-gromero@linux.vnet.ibm.com
    • powerpc/tm: Fix FP/VMX unavailable exceptions inside a transaction · 8205d5d9
      Gustavo Romero authored
      When we take an FP unavailable exception in a transaction we have to
      account for the hardware FP TM checkpointed registers being
      incorrect. In this case for this process we know the current and
      checkpointed FP registers must be the same (since FP wasn't used
      inside the transaction) hence in the thread_struct we copy the current
      FP registers to the checkpointed ones.
      This copy is done in tm_reclaim_thread(). We use thread->ckpt_regs.msr
      to determine if FP was on when in userspace. thread->ckpt_regs.msr
      represents the state of the MSR when exiting userspace. This is setup
      by check_if_tm_restore_required().
      Unfortunately there is an optimisation in giveup_all() which returns
      early if tsk->thread.regs->msr (via local variable `usermsr`) has
      FP=VEC=VSX=SPE=0. This optimisation means that
      check_if_tm_restore_required() is not called and hence
      thread->ckpt_regs.msr is not updated and will contain an old value.
      This can happen if due to load_fp=255 we start a userspace process
      with MSR FP=1 and then we are context switched out. In this case
      thread->ckpt_regs.msr will contain FP=1. If that same process is then
      context switched in and load_fp overflows, MSR will have FP=0. If that
      process now enters a transaction and does an FP instruction, the FP
      unavailable will not update thread->ckpt_regs.msr (the bug) and MSR
      FP=1 will be retained in thread->ckpt_regs.msr.  tm_reclaim_thread()
      will then not perform the required memcpy and the checkpointed FP regs
      in the thread struct will contain the wrong values.
      The code path for this happening is:
             Userspace:                      Kernel
                         Start userspace
                          with MSR FP/VEC/VSX/SPE=0 TM=1
                            < -----
             fp instruction
                         FP unavailable
                             ---- >
      					        return early since FP/VMX/VSX=0
      						/* ckpt MSR not updated (Incorrect) */
      					        /* thread_struct ckpt FP regs contain junk (OK) */
                                                    /* Sees ckpt MSR FP=1 (Incorrect) */
      					      no memcpy() performed
      					        /* thread_struct ckpt FP regs not fixed (Incorrect) */
      					     /* Put junk in hardware checkpoint FP regs */
                            < -----
                         Return to userspace
                           with MSR TM=1 FP=1
                           with junk in the FP TM checkpoint
             TM rollback
             reads FP junk
      This is a data integrity problem for the current process as the FP
      registers are corrupted. It's also a security problem as the FP
      registers from one process may be leaked to another.
      This patch moves up check_if_tm_restore_required() in giveup_all() to
      ensure thread->ckpt_regs.msr is updated correctly.
      A simple testcase to replicate this will be posted to
      Similarly for VMX.
      This fixes CVE-2019-15030.
      Fixes: f48e91e8 ("powerpc/tm: Fix FP and VMX register corruption")
      Cc: stable@vger.kernel.org # 4.12+
      Signed-off-by: Gustavo Romero <gromero@linux.vnet.ibm.com>
      Signed-off-by: Michael Neuling <mikey@neuling.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20190904045529.23002-1-gromero@linux.vnet.ibm.com
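The ordering problem in giveup_all() described in the commit above can be illustrated with a toy model. This is a simplified Python sketch, not the kernel code: the bit positions, the Thread class, and the reduced body of check_if_tm_restore_required() are assumptions made for illustration. It only shows why returning early before the check leaves a stale checkpointed MSR.

```python
from dataclasses import dataclass

# Illustrative bit positions only; these are not the real MSR layout.
MSR_FP = 1 << 0
MSR_TM_ACTIVE = 1 << 1


@dataclass
class Thread:
    msr: int       # stands in for tsk->thread.regs->msr
    ckpt_msr: int  # stands in for thread->ckpt_regs.msr


def check_if_tm_restore_required(t: Thread) -> None:
    # Simplified: record the user MSR when a transaction is active.
    if t.msr & MSR_TM_ACTIVE:
        t.ckpt_msr = t.msr


def giveup_all_before(t: Thread) -> None:
    # Early return when no facility is enabled skips the check below.
    if not (t.msr & MSR_FP):
        return
    check_if_tm_restore_required(t)


def giveup_all_after(t: Thread) -> None:
    # The check is moved up so it runs before the early return.
    check_if_tm_restore_required(t)
    if not (t.msr & MSR_FP):
        return


# A process in a transaction with FP=0, carrying a stale ckpt MSR with FP=1.
stale = Thread(msr=MSR_TM_ACTIVE, ckpt_msr=MSR_FP)
giveup_all_before(stale)
print(bool(stale.ckpt_msr & MSR_FP))  # stale FP=1 is retained

fixed = Thread(msr=MSR_TM_ACTIVE, ckpt_msr=MSR_FP)
giveup_all_after(fixed)
print(bool(fixed.ckpt_msr & MSR_FP))  # ckpt MSR now reflects FP=0
```

In the model, only the reordered variant refreshes the checkpointed MSR, which is the condition tm_reclaim_thread() relies on according to the commit message.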
  6. 02 Sep, 2019 3 commits
  7. 31 Aug, 2019 2 commits
  8. 30 Aug, 2019 2 commits
    • perf/x86/amd/ibs: Fix sample bias for dispatched micro-ops · 0f4cd769
      Kim Phillips authored
      When counting dispatched micro-ops with cnt_ctl=1, in order to prevent
      sample bias, IBS hardware preloads the least significant 7 bits of
      current count (IbsOpCurCnt) with random values, such that, after the
      interrupt is handled and counting resumes, the next sample taken
      will be slightly perturbed.
      The current count bitfield is in the IBS execution control h/w register,
      alongside the maximum count field.
      Currently, the IBS driver writes that register with the maximum count,
      leaving zeroes to fill the current count field, thereby overwriting
      the random bits the hardware preloaded for itself.
      Fix the driver to actually retain and carry those random bits from the
      read of the IBS control register, through to its write, instead of
      overwriting the lower current count bits with zeroes.
      Tested with:
      perf record -c 100001 -e ibs_op/cnt_ctl=1/pp -a -C 0 taskset -c 0 <workload>
      'perf annotate' output before:
       15.70  65:   addsd     %xmm0,%xmm1
       17.30        add       $0x1,%rax
       15.88        cmp       %rdx,%rax
                    je        82
       17.32  72:   test      $0x1,%al
                    jne       7c
        7.52        movapd    %xmm1,%xmm0
        5.90        jmp       65
        8.23  7c:   sqrtsd    %xmm1,%xmm0
       12.15        jmp       65
      'perf annotate' output after:
       16.63  65:   addsd     %xmm0,%xmm1
       16.82        add       $0x1,%rax
       16.81        cmp       %rdx,%rax
                    je        82
       16.69  72:   test      $0x1,%al
                    jne       7c
        8.30        movapd    %xmm1,%xmm0
        8.13        jmp       65
        8.24  7c:   sqrtsd    %xmm1,%xmm0
        8.39        jmp       65
      Tested on Family 15h and 17h machines.
      Machines prior to family 10h Rev. C don't have the RDWROPCNT capability,
      and have the IbsOpCurCnt bitfield reserved, so this patch shouldn't
      affect their operation.
      It is unknown why commit db98c5fa ("perf/x86: Implement 64-bit
      counter support for IBS") ignored the lower 4 bits of the IbsOpCurCnt
      field; the number of preloaded random bits has always been 7, AFAICT.
      Signed-off-by: Kim Phillips <kim.phillips@amd.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: "Arnaldo Carvalho de Melo" <acme@kernel.org>
      Cc: <x86@kernel.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "Borislav Petkov" <bp@alien8.de>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: "Namhyung Kim" <namhyung@kernel.org>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Link: https://lkml.kernel.org/r/20190826195730.30614-1-kim.phillips@amd.com
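The IBS fix above comes down to a read-modify-write that keeps a few bits of the value read from the control register. Below is a rough Python sketch of that idea, not the driver code. The field positions (maximum count in the low 16 bits, current count starting at bit 32) and all names are assumptions for illustration.

```python
# Assumed field layout, for illustration only.
IBS_OP_MAX_CNT = 0xFFFF            # maximum-count field
IBS_OP_CUR_CNT_RAND = 0x7F << 32   # low 7 bits of the current-count field


def next_ctl_old(read_value: int, max_cnt: int) -> int:
    """Write only the maximum count; the current-count field becomes zero."""
    return max_cnt & IBS_OP_MAX_CNT


def next_ctl_new(read_value: int, max_cnt: int) -> int:
    """Carry the hardware-preloaded random bits from the read to the write."""
    return (max_cnt & IBS_OP_MAX_CNT) | (read_value & IBS_OP_CUR_CNT_RAND)


# Hardware left the random pattern 0x55 in the low bits of the current count.
value_read = (0x55 << 32) | 0x1234
print(hex(next_ctl_old(value_read, 0x0100)))
print(hex(next_ctl_new(value_read, 0x0100)))
```

With the old behaviour every sampling period restarts from the same count, which matches the uneven 'perf annotate' percentages shown in the "before" output.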
    • perf/x86/intel: Restrict period on Nehalem · 44d3bbb6
      Josh Hunt authored
      We see our Nehalem machines reporting 'perfevents: irq loop stuck!' in
      some cases when using perf:
      perfevents: irq loop stuck!
      WARNING: CPU: 0 PID: 3485 at arch/x86/events/intel/core.c:2282 intel_pmu_handle_irq+0x37b/0x530
      RIP: 0010:intel_pmu_handle_irq+0x37b/0x530
      Call Trace:
      ? perf_event_nmi_handler+0x2e/0x50
      ? intel_pmu_save_and_restart+0x50/0x50
      ? native_write_msr+0x6/0x20
      ? native_write_msr+0x6/0x20
      ? __switch_to_asm+0x41/0x70
      ? __switch_to_asm+0x35/0x70
      ? __switch_to_asm+0x41/0x70
      ? __switch_to_asm+0x35/0x70
      ? hrtimer_start_range_ns+0x153/0x210
      ? hrtimer_init+0x120/0x120
      ? wake_up_q+0x60/0x60
      RIP: 0033:0x7fdeb1e96c03
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: acme@kernel.org
      Cc: Josh Hunt <johunt@akamai.com>
      Cc: bpuranda@akamai.com
      Cc: mingo@redhat.com
      Cc: jolsa@redhat.com
      Cc: tglx@linutronix.de
      Cc: namhyung@kernel.org
      Cc: alexander.shishkin@linux.intel.com
      Link: https://lkml.kernel.org/r/1566256411-18820-1-git-send-email-johunt@akamai.com
  9. 29 Aug, 2019 3 commits
    • x86/mm/cpa: Prevent large page split when ftrace flips RW on kernel text · 7af01450
      Thomas Gleixner authored
      ftrace does not use text_poke() for enabling trace functionality. It uses
      its own mechanism and flips the whole kernel text to RW and back to RO.
      The CPA rework removed a loop based check of 4k pages which tried to
      preserve a large page by checking each 4k page whether the change would
      actually cover all pages in the large page.
      This resulted in endless loops for nothing as in testing it turned out that
      it actually never preserved anything. Of course the testing failed to include
      ftrace, which is the one and only case which benefitted from the 4k loop.
      As a consequence enabling function tracing or ftrace based kprobes results
      in a full 4k split of the kernel text, which affects iTLB performance.
      The kernel RO protection is the only valid case where this can actually
      preserve large pages.
      All other static protections (RO data, data NX, PCI, BIOS) are truly
      static.  So a conflict with those protections which results in a split
      should only ever happen when a change of memory next to a protected region
      is attempted. But these conflicts are rightfully splitting the large page
      to preserve the protected regions. In fact a change to the protected
      regions itself is a bug and is warned about.
      Add an exception for the static protection check for kernel text RO when
      the to-be-changed region spans a full large page, which allows preserving
      the large mappings. This also prevents the syslog from being spammed about CPA
      violations when ftrace is used.
      The exception needs to be removed once ftrace has switched over to text_poke(),
      which avoids the whole issue.
      Fixes: 585948f4 ("x86/mm/cpa: Avoid the 4k pages check completely")
      Reported-by: Song Liu <songliubraving@fb.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Tested-by: Song Liu <songliubraving@fb.com>
      Reviewed-by: Song Liu <songliubraving@fb.com>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1908282355340.1938@nanos.tec.linutronix.de
    • nds32: Mark expected switch fall-throughs · 7c9eb2db
      Gustavo A. R. Silva authored
      Mark switch cases where we are expecting to fall through.
      This patch fixes the following warnings (Building: allmodconfig nds32):
      include/math-emu/soft-fp.h:124:8: warning: this statement may fall through [-Wimplicit-fallthrough=]
      arch/nds32/kernel/signal.c:362:20: warning: this statement may fall through [-Wimplicit-fallthrough=]
      arch/nds32/kernel/signal.c:315:7: warning: this statement may fall through [-Wimplicit-fallthrough=]
      include/math-emu/op-common.h:417:11: warning: this statement may fall through [-Wimplicit-fallthrough=]
      include/math-emu/op-common.h:430:11: warning: this statement may fall through [-Wimplicit-fallthrough=]
      include/math-emu/op-common.h:310:11: warning: this statement may fall through [-Wimplicit-fallthrough=]
      include/math-emu/op-common.h:320:11: warning: this statement may fall through [-Wimplicit-fallthrough=]
      include/math-emu/op-common.h:310:11: warning: this statement may fall through [-Wimplicit-fallthrough=]
      include/math-emu/op-common.h:320:11: warning: this statement may fall through [-Wimplicit-fallthrough=]
      include/math-emu/soft-fp.h:124:8: warning: this statement may fall through [-Wimplicit-fallthrough=]
      include/math-emu/op-common.h:417:11: warning: this statement may fall through [-Wimplicit-fallthrough=]
      include/math-emu/op-common.h:430:11: warning: this statement may fall through [-Wimplicit-fallthrough=]
      include/math-emu/op-common.h:310:11: warning: this statement may fall through [-Wimplicit-fallthrough=]
      include/math-emu/op-common.h:320:11: warning: this statement may fall through [-Wimplicit-fallthrough=]
      include/math-emu/op-common.h:310:11: warning: this statement may fall through [-Wimplicit-fallthrough=]
      include/math-emu/op-common.h:320:11: warning: this statement may fall through [-Wimplicit-fallthrough=]
      Reported-by: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
    • ARC: unwind: Mark expected switch fall-through · 00a0c845
      Gustavo A. R. Silva authored
      Mark switch cases where we are expecting to fall through.
      This patch fixes the following warnings (Building: haps_hs_defconfig arc):
      arch/arc/kernel/unwind.c: In function ‘read_pointer’:
      ./include/linux/compiler.h:328:5: warning: this statement may fall through [-Wimplicit-fallthrough=]
        do {        \
      ./include/linux/compiler.h:338:2: note: in expansion of macro ‘__compiletime_assert’
        __compiletime_assert(condition, msg, prefix, suffix)
      ./include/linux/compiler.h:350:2: note: in expansion of macro ‘_compiletime_assert’
        _compiletime_assert(condition, msg, __compiletime_assert_, __LINE__)
      ./include/linux/build_bug.h:39:37: note: in expansion of macro ‘compiletime_assert’
       #define BUILD_BUG_ON_MSG(cond, msg) compiletime_assert(!(cond), msg)
      ./include/linux/build_bug.h:50:2: note: in expansion of macro ‘BUILD_BUG_ON_MSG’
        BUILD_BUG_ON_MSG(condition, "BUILD_BUG_ON failed: " #condition)
      arch/arc/kernel/unwind.c:573:3: note: in expansion of macro ‘BUILD_BUG_ON’
         BUILD_BUG_ON(sizeof(u32) != sizeof(value));
      arch/arc/kernel/unwind.c:575:2: note: here
        case DW_EH_PE_native:
      Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
  10. 28 Aug, 2019 3 commits
    • ARM: 8901/1: add a criteria for pfn_valid of arm · 5b3efa4f
      zhaoyang authored
      pfn_valid can be wrong when parsing an invalid pfn whose phys address
      exceeds BITS_PER_LONG as the MSB will be trimmed when shifted.
      The issue originally arose from the call stack below, which corresponds to
      an access of /proc/kpageflags from userspace with an invalid pfn parameter
      and leads to a kernel panic.
      [46886.723249] c7 [<c031ff98>] (stable_page_flags) from [<c03203f8>]
      [46886.723264] c7 [<c0320368>] (kpageflags_read) from [<c0312030>]
      [46886.723280] c7 [<c0311fb0>] (proc_reg_read) from [<c02a6e6c>]
      [46886.723290] c7 [<c02a6e24>] (__vfs_read) from [<c02a7018>]
      [46886.723301] c7 [<c02a6f74>] (vfs_read) from [<c02a778c>]
      [46886.723315] c7 [<c02a770c>] (SyS_pread64) from [<c0108620>]
      Signed-off-by: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
      Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
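The truncation described in the pfn_valid commit above is easy to demonstrate numerically. The sketch below is Python, not the ARM code; it models a 32-bit physical address with a 4 KiB page size (both assumptions) and shows a round-trip comparison as one plausible guard. It is not claimed to be the exact check the patch adds.

```python
PAGE_SHIFT = 12   # assumed 4 KiB pages
PHYS_BITS = 32    # assumed width of the physical address type in this model
PHYS_MASK = (1 << PHYS_BITS) - 1


def pfn_to_phys(pfn: int) -> int:
    """Shift a pfn into an address, dropping bits that do not fit."""
    return (pfn << PAGE_SHIFT) & PHYS_MASK


def phys_to_pfn(addr: int) -> int:
    return addr >> PAGE_SHIFT


def pfn_round_trips(pfn: int) -> bool:
    """False when the shift lost high bits, i.e. the pfn cannot be valid."""
    return phys_to_pfn(pfn_to_phys(pfn)) == pfn


# 0x180000 loses its top bit and aliases the address of pfn 0x80000.
print(hex(pfn_to_phys(0x80000)), hex(pfn_to_phys(0x180000)))
print(pfn_round_trips(0x80000), pfn_round_trips(0x180000))
```

Without such a guard, a lookup keyed on the truncated address could report an out-of-range pfn as valid because it aliases a real page.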
    • RISC-V: Fix FIXMAP area corruption on RV32 systems · a256f2e3
      Anup Patel authored
      Currently, the various virtual memory areas of Linux RISC-V are organized
      in increasing order of their virtual addresses as follows:
      1. User space area (This is lowest area and starts at 0x0)
      2. FIXMAP area
      3. VMALLOC area
      4. Kernel area (This is highest area and starts at PAGE_OFFSET)
      The maximum size of the user space area is represented by TASK_SIZE.
      On RV32 systems, TASK_SIZE is defined as VMALLOC_START which causes the
      user space area to overlap the FIXMAP area. This allows user space apps
      to potentially corrupt the FIXMAP area and kernel OF APIs will crash
      whenever they access corrupted FDT in the FIXMAP area.
      On RV64 systems, TASK_SIZE is set to fixed 256GB and no other areas
      happen to overlap so we don't see any FIXMAP area corruptions.
      This patch fixes FIXMAP area corruption on RV32 systems by setting
      and FIXADDR_START defines to asm/pgtable.h so that we can avoid cyclic
      header includes.
      Signed-off-by: Anup Patel <anup.patel@wdc.com>
      Tested-by: Alistair Francis <alistair.francis@wdc.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com>
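The RV32 overlap described in the commit above follows directly from the area ordering it lists. Here is a small Python sketch with made-up addresses (every numeric value is hypothetical) that checks whether the user space range intersects the FIXMAP range. Bounding user space by the start of FIXMAP is shown as one way to remove the overlap; the exact value the patch assigns is not reproduced in the text above.

```python
# Hypothetical RV32 layout; only the ordering follows the commit message.
FIXADDR_SIZE = 0x0020_0000
VMALLOC_START = 0xA000_0000
FIXADDR_TOP = VMALLOC_START            # FIXMAP sits directly below VMALLOC
FIXADDR_START = FIXADDR_TOP - FIXADDR_SIZE


def overlaps(a_start: int, a_end: int, b_start: int, b_end: int) -> bool:
    """True if half-open ranges [a_start, a_end) and [b_start, b_end) intersect."""
    return a_start < b_end and b_start < a_end


def user_overlaps_fixmap(task_size: int) -> bool:
    # User space is the lowest area and starts at 0x0.
    return overlaps(0x0, task_size, FIXADDR_START, FIXADDR_TOP)


print(user_overlaps_fixmap(VMALLOC_START))  # user space reaches into FIXMAP
print(user_overlaps_fixmap(FIXADDR_START))  # user space ends where FIXMAP begins
```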
    • x86/build: Add -Wnoaddress-of-packed-member to REALMODE_CFLAGS, to silence GCC9 build warning · 42e0e954
      Linus Torvalds authored
      One of the very few warnings I have in the current build comes from
      arch/x86/boot/edd.c, where I get the following with a gcc9 build:
         arch/x86/boot/edd.c: In function ‘query_edd’:
         arch/x86/boot/edd.c:148:11: warning: taking address of packed member of ‘struct boot_params’ may result in an unaligned pointer value [-Waddress-of-packed-member]
           148 |  mbrptr = boot_params.edd_mbr_sig_buffer;
               |           ^~~~~~~~~~~
      This warning triggers because we throw away all the CFLAGS and then make
      a new set for REALMODE_CFLAGS, so the -Wno-address-of-packed-member we
      added in the following commit is not present:
       ("gcc-9: silence 'address-of-packed-member' warning")
      The simplest solution for now is to adjust the warning for this version
      of CFLAGS as well, but it would definitely make sense to examine whether
      REALMODE_CFLAGS could be derived from CFLAGS, so that it picks up changes
      in the compiler flags environment automatically.
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Acked-by: Borislav Petkov <bp@alien8.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  11. 27 Aug, 2019 4 commits
  12. 26 Aug, 2019 6 commits
    • x86/apic: Include the LDR when clearing out APIC registers · 558682b5
      Bandan Das authored
      Although APIC initialization will typically clear out the LDR before
      setting it, the APIC cleanup code should reset the LDR.
      This was discovered with a 32-bit KVM guest jumping into a kdump
      kernel. The stale bits in the LDR triggered a bug in the KVM APIC
      implementation which caused the destination mapping for VCPUs to be
      Note that this isn't intended to paper over the KVM APIC bug. The kernel
      has to clear the LDR when resetting the APIC registers except when X2APIC
      is enabled.
      This lacks a Fixes tag because the failure to clear the LDR goes way back into
      pre-git history.
      [ tglx: Made x2apic_enabled a function call as required ]
      Signed-off-by: Bandan Das <bsd@redhat.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20190826101513.5080-3-bsd@redhat.com
    • x86/apic: Do not initialize LDR and DFR for bigsmp · bae3a8d3
      Bandan Das authored
      Legacy apic init uses bigsmp for smp systems with 8 and more CPUs. The
      bigsmp APIC implementation uses physical destination mode, but it
      nevertheless initializes LDR and DFR. The LDR even ends up incorrectly with
      multiple bits being set.
      This does not cause a functional problem because LDR and DFR are ignored
      when physical destination mode is active, but it triggered a problem on a
      32-bit KVM guest which jumps into a kdump kernel.
      The multiple bits set unearthed a bug in the KVM APIC implementation. The
      code which creates the logical destination map for VCPUs ignores the
      disabled state of the APIC and ends up overwriting an existing valid entry
      and as a result, APIC calibration hangs in the guest during kdump
      Remove the bogus LDR/DFR initialization.
      This is not intended to work around the KVM APIC bug. The LDR/DFR
      initialization is wrong on its own.
      The issue goes back into the pre git history. The fixes tag is the commit
      in the bitkeeper import which introduced bigsmp support in 2003.
      Fixes: db7b9e9f26b8 ("[PATCH] Clustered APIC setup for >8 CPU systems")
      Suggested-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Bandan Das <bsd@redhat.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20190826101513.5080-2-bsd@redhat.com
    • ARCv2: IDU-intc: Add support for edge-triggered interrupts · 174ae4e9
      Mischa Jonker authored
      This adds support for an optional extra interrupt cell to specify edge
      vs level triggered. It is backward compatible with dts files with only
      one cell, and will default to level-triggered in such a case.
      Note that I had to make a change to idu_irq_set_affinity as well, as
      this function was setting the interrupt type to "level" unconditionally,
      since this was the only type supported previously.
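      The one-or-two-cell handling described above can be sketched roughly as
      follows. This is a standalone illustration, not the driver code: the
      names idu_xlate_sketch, struct irq_spec and the TRIGGER_* values are
      hypothetical, and the flag encoding of the second cell (1 = edge,
      4 = level) is an assumption made for the example.

```c
#include <stdint.h>

/* Hypothetical trigger types used only by this sketch. */
enum trigger { TRIGGER_LEVEL, TRIGGER_EDGE };

struct irq_spec {
    uint32_t hwirq;     /* interrupt number taken from the first cell */
    enum trigger type;  /* trigger type, defaulted or taken from cell 2 */
};

/*
 * Translate an interrupt specifier of one or two cells.
 * One cell: level-triggered by default (backward compatible).
 * Two cells: the second cell selects the trigger type
 * (assumed encoding: 1 = edge, 4 = level).
 * Returns 0 on success, -1 on an unsupported specifier.
 */
int idu_xlate_sketch(const uint32_t *cells, unsigned int ncells,
                     struct irq_spec *out)
{
    if (ncells < 1 || ncells > 2)
        return -1;

    out->hwirq = cells[0];
    out->type = TRIGGER_LEVEL;

    if (ncells == 2) {
        if (cells[1] == 1)
            out->type = TRIGGER_EDGE;
        else if (cells[1] != 4)
            return -1;
    }
    return 0;
}
```

      With this shape, an existing one-cell specifier keeps its level-triggered
      behaviour, and only a two-cell specifier can request edge triggering.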
      Signed-off-by: Mischa Jonker <mischa.jonker@synopsys.com>
      Reviewed-by: Vineet Gupta <vgupta@synopsys.com>
      Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
    • Sebastian Mayr's avatar
      uprobes/x86: Fix detection of 32-bit user mode · 9212ec7d
      Sebastian Mayr authored
      32-bit processes running on a 64-bit kernel are not always detected
      correctly, causing the process to crash when uretprobes are installed.
      The reason for the crash is that in_ia32_syscall() is used to determine the
      process's mode, which only works correctly when called from a syscall.
      In the case of uretprobes, however, the function is called from an exception
      and always returns 'false' on a 64-bit kernel. As a consequence, the
      process's return address gets corrupted.
      Fix this by using user_64bit_mode() instead of in_ia32_syscall(), which
      is correct in any situation.
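      The difference between the two checks can be modelled in a few lines of
      userspace C. This is only a sketch under stated assumptions: the struct
      and the *_sketch helpers are hypothetical stand-ins for the kernel
      functions, the syscall-time state is reduced to a single boolean, and
      the mode check is reduced to comparing the saved code segment against
      the 64-bit user selector (0x33 on x86-64 Linux).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_USER_CS   0x33  /* 64-bit user code segment selector */
#define SKETCH_USER32_CS 0x23  /* 32-bit (compat) user code segment selector */

/* Hypothetical snapshot of the state visible at exception or syscall entry. */
struct probe_ctx {
    uint16_t cs;             /* code segment saved in the register frame */
    bool in_compat_syscall;  /* set only while a compat syscall is running */
};

/* Old check: only meaningful when called from a syscall. */
bool in_ia32_syscall_sketch(const struct probe_ctx *c)
{
    return c->in_compat_syscall;
}

/* New check: looks at the mode the user code was actually running in. */
bool user_64bit_mode_sketch(const struct probe_ctx *c)
{
    return c->cs == SKETCH_USER_CS;
}

/* Size of a return address slot as computed before the fix. */
size_t sizeof_long_old(const struct probe_ctx *c)
{
    return in_ia32_syscall_sketch(c) ? 4 : 8;
}

/* Size of a return address slot as computed after the fix. */
size_t sizeof_long_new(const struct probe_ctx *c)
{
    return user_64bit_mode_sketch(c) ? 8 : 4;
}
```

      For a 32-bit process that hits a probe via an exception, the old helper
      reports 8 bytes because no compat syscall is in progress, while the new
      one reports 4.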
      [ tglx: Add a comment and the following historical info ]
      This should have been detected by the rename which happened in commit
        abfb9498 ("x86/entry: Rename is_{ia32,x32}_task() to in_{ia32,x32}_syscall()")
      which states in the changelog:
          The is_ia32_task()/is_x32_task() function names are a big misnomer: they
          suggests that the compat-ness of a system call is a task property, which
          is not true, the compatness of a system call purely depends on how it
          was invoked through the system call layer.
      and then it went and blindly renamed every call site.
      Sadly enough this was already mentioned here:
         8faaed1b ("uprobes/x86: Introduce sizeof_long(), cleanup adjust_ret_addr() and
      where the changelog says:
          TODO: is_ia32_task() is not what we actually want, TS_COMPAT does
          not necessarily mean 32bit. Fortunately syscall-like insns can't be
          probed so it actually works, but it would be better to rename and
          use is_ia32_frame().
      and goes all the way back to:
          0326f5a9 ("uprobes/core: Handle breakpoint and singlestep exceptions")
      Oh well. 7+ years until someone actually tried a uretprobe on a 32bit
      process on a 64bit kernel....
      Fixes: 0326f5a9 ("uprobes/core: Handle breakpoint and singlestep exceptions")
      Signed-off-by: Sebastian Mayr <me@sam.st>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Dmitry Safonov <dsafonov@virtuozzo.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20190728152617.7308-1-me@sam.st
    • Thomas Gleixner's avatar
      x86/apic: Fix arch_dynirq_lower_bound() bug for DT enabled machines · 3e5bedc2
      Thomas Gleixner authored
      Rahul Tanwar reported the following bug on DT systems:
      > 'ioapic_dynirq_base' contains the virtual IRQ base number. Presently, it is
      > updated to the end of hardware IRQ numbers but this is done only when IOAPIC
      > configuration type is IOAPIC_DOMAIN_LEGACY or IOAPIC_DOMAIN_STRICT. There is
      > a third type IOAPIC_DOMAIN_DYNAMIC which applies when IOAPIC configuration
      > comes from devicetree.
      > See dtb_add_ioapic() in arch/x86/kernel/devicetree.c
      > In case of IOAPIC_DOMAIN_DYNAMIC (DT/OF based system), 'ioapic_dynirq_base'
      > remains to zero initialized value. This means that for OF based systems,
      > virtual IRQ base will get set to zero.
      Such systems will very likely not even boot.
      For DT enabled machines ioapic_dynirq_base is irrelevant and not
      updated, so simply map the IRQ base 1:1 instead.
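      The 1:1 fallback can be sketched as below. This is a simplified model
      rather than the kernel function: dynirq_lower_bound_sketch is a
      hypothetical name, and the IOAPIC state is reduced to a single
      dynirq_base parameter that stays zero when nothing ever updated it.

```c
/*
 * Return the lowest IRQ number that may be allocated dynamically.
 * dynirq_base: the recorded dynamic IRQ base, 0 if it was never updated
 *              (the DT/OF case described above).
 * from:        the lower bound requested by the caller.
 */
unsigned int dynirq_lower_bound_sketch(unsigned int dynirq_base,
                                       unsigned int from)
{
    /* A zero base carries no information, so map the request 1:1. */
    return dynirq_base ? dynirq_base : from;
}
```

      With a recorded base the function behaves as before; with a zero base
      the caller's own lower bound is returned instead of zero.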
      Reported-by: Rahul Tanwar <rahul.tanwar@linux.intel.com>
      Tested-by: Rahul Tanwar <rahul.tanwar@linux.intel.com>
      Tested-by: Andy Shevchenko <andriy.shevchenko@intel.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: alan@linux.intel.com
      Cc: bp@alien8.de
      Cc: cheol.yong.kim@intel.com
      Cc: qi-ming.wu@intel.com
      Cc: rahul.tanwar@intel.com
      Cc: rppt@linux.ibm.com
      Cc: tony.luck@intel.com
      Link: http://lkml.kernel.org/r/20190821081330.1187-1-rahul.tanwar@linux.intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  13. 25 Aug, 2019 1 commit
  14. 23 Aug, 2019 3 commits
    • Sean Christopherson's avatar
      x86/retpoline: Don't clobber RFLAGS during CALL_NOSPEC on i386 · b63f20a7
      Sean Christopherson authored
      Use 'lea' instead of 'add' when adjusting %esp in CALL_NOSPEC so as to
      avoid clobbering flags.
      KVM's emulator makes indirect calls into a jump table of sorts, where
      the destination of the CALL_NOSPEC is a small blob of code that performs
      fast emulation by executing the target instruction with fixed operands.
           0x000339f8 <+0>:   adc    %dl,%al
           0x000339fa <+2>:   ret
      A major motivation for doing fast emulation is to leverage the CPU to
      handle consumption and manipulation of arithmetic flags, i.e. RFLAGS is
      both an input and output to the target of CALL_NOSPEC.  Clobbering flags
      results in all sorts of incorrect emulation, e.g. Jcc instructions often
      take the wrong path.  Sans the nops...
        asm("push %[flags]; popf; " CALL_NOSPEC " ; pushf; pop %[flags]\n"
           0x0003595a <+58>:  mov    0xc0(%ebx),%eax
           0x00035960 <+64>:  mov    0x60(%ebx),%edx
           0x00035963 <+67>:  mov    0x90(%ebx),%ecx
           0x00035969 <+73>:  push   %edi
           0x0003596a <+74>:  popf
           0x0003596b <+75>:  call   *%esi
           0x000359a0 <+128>: pushf
           0x000359a1 <+129>: pop    %edi
           0x000359a2 <+130>: mov    %eax,0xc0(%ebx)
           0x000359b1 <+145>: mov    %edx,0x60(%ebx)
        ctxt->eflags = (ctxt->eflags & ~EFLAGS_MASK) | (flags & EFLAGS_MASK);
           0x000359a8 <+136>: mov    -0x10(%ebp),%eax
           0x000359ab <+139>: and    $0x8d5,%edi
           0x000359b4 <+148>: and    $0xfffff72a,%eax
           0x000359b9 <+153>: or     %eax,%edi
           0x000359bd <+157>: mov    %edi,0x4(%ebx)
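      The flag merge in the listing above can be reproduced in plain C to
      see why a clobbered flags value matters. This is a sketch only: the
      mask value 0x8d5 is taken from the disassembly, the individual bit
      names are the architectural EFLAGS bits, and merge_eflags_sketch is a
      hypothetical helper name.

```c
#include <stdint.h>

/* Architectural EFLAGS arithmetic status bits. */
#define SK_FLAG_CF 0x001u
#define SK_FLAG_PF 0x004u
#define SK_FLAG_AF 0x010u
#define SK_FLAG_ZF 0x040u
#define SK_FLAG_SF 0x080u
#define SK_FLAG_OF 0x800u

/* Together these give the 0x8d5 constant seen in the disassembly. */
#define SK_EFLAGS_MASK (SK_FLAG_OF | SK_FLAG_SF | SK_FLAG_ZF | \
                        SK_FLAG_AF | SK_FLAG_PF | SK_FLAG_CF)

/*
 * Keep every non-arithmetic bit of the saved guest flags and take the
 * arithmetic bits from the value captured after the emulated instruction.
 */
uint32_t merge_eflags_sketch(uint32_t ctxt_eflags, uint32_t flags)
{
    return (ctxt_eflags & ~SK_EFLAGS_MASK) | (flags & SK_EFLAGS_MASK);
}
```

      Whatever the captured value holds in the masked bits ends up in the
      guest's flags, so any instruction that alters those bits between the
      emulated instruction and the pushf changes the emulation result.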
      For the most part this has gone unnoticed as emulation of guest code
      that can trigger fast emulation is effectively limited to MMIO when
      running on modern hardware, and MMIO is rarely, if ever, accessed by
      instructions that affect or consume flags.
      Breakage is almost instantaneous when running with unrestricted guest
      disabled, in which case KVM must emulate all instructions when the guest
      has invalid state, e.g. when the guest is in Big Real Mode during early
      Fixes: 776b043848fd2 ("x86/retpoline: Add initial retpoline support")
      Fixes: 1a29b5b7 ("KVM: x86: Make indirect calls in emulator speculation safe")
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20190822211122.27579-1-sean.j.christopherson@intel.com
    • Lvqiang Huang's avatar
      ARM: 8897/1: check stmfd instruction using right shift · 69389837
      Lvqiang Huang authored
      Commit ef41b5c9 ("ARM: make kernel oops easier to read") changed the
      stmfd opcode pattern:
      -               .word   0xe92d0000 >> 10        @ stmfd sp!, {}
      +               .word   0xe92d0000 >> 11        @ stmfd sp!, {}
      so the shift used when checking the instruction needs to change to 11 too.
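      Why the two shifts have to agree can be shown with a small C model of
      the comparison. This is an illustration, not the assembly from the
      patch: is_stmfd_sp_sketch is a hypothetical helper, and the sample
      instruction 0xe92d000f (stmfd sp!, {r0-r3}) is chosen for the example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Base encoding of "stmfd sp!, {}" and the shift used to build the pattern. */
#define SK_STMFD_SP_BASE 0xe92d0000u
#define SK_PATTERN_SHIFT 11

/*
 * Compare an instruction word against the pre-shifted pattern.
 * check_shift is the shift applied to the instruction before comparing;
 * it only matches when it equals the shift used for the pattern.
 */
bool is_stmfd_sp_sketch(uint32_t insn, unsigned int check_shift)
{
    const uint32_t pattern = SK_STMFD_SP_BASE >> SK_PATTERN_SHIFT;

    return (insn >> check_shift) == pattern;
}
```

      With the pattern shifted by 11, checking the instruction with a shift
      of 10 can never match, so a stmfd instruction goes undetected.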
      Signed-off-by: Lvqiang Huang <Lvqiang.Huang@unisoc.com>
      Signed-off-by: Chunyan Zhang <zhang.lyra@gmail.com>
      Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
    • Doug Berger's avatar
      ARM: 8874/1: mm: only adjust sections of valid mm structures · c51bc12d
      Doug Berger authored
      A timing hazard exists when an early fork/exec thread begins
      exiting and sets its mm pointer to NULL while a separate core
      tries to update the section information.
      This commit ensures that the mm pointer is not NULL before
      setting its section parameters. The arguments provided by
      commit 11ce4b33 ("ARM: 8672/1: mm: remove tasklist locking
      from update_sections_early()") apply equally here, so the
      task_lock does not need to be taken around this check.
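      The shape of the check can be sketched as follows. This is a
      single-threaded userspace model with hypothetical types and names
      (struct mm_model, struct task_model, update_sections_sketch); it shows
      only the NULL test, not the actual race between cores or the real
      section update.

```c
#include <stddef.h>

/* Hypothetical stand-in for the per-address-space state. */
struct mm_model {
    int sections_updated;
};

/* Hypothetical stand-in for a task; mm is NULL once the task is exiting. */
struct task_model {
    struct mm_model *mm;
};

/*
 * Apply the section update to every task that still has an mm.
 * Returns the number of address spaces that were updated.
 */
int update_sections_sketch(struct task_model *tasks, size_t ntasks)
{
    int updated = 0;

    for (size_t i = 0; i < ntasks; i++) {
        struct mm_model *mm = tasks[i].mm;

        /* An exiting task has already dropped its mm: skip it. */
        if (!mm)
            continue;

        mm->sections_updated = 1;
        updated++;
    }
    return updated;
}
```

      Reading the pointer once into a local variable and testing that copy
      keeps the check and the use consistent within one iteration.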
      Fixes: 08925c2f ("ARM: 8464/1: Update all mm structures with section adjustments")
      Signed-off-by: Doug Berger <opendmb@gmail.com>
      Acked-by: Laura Abbott <labbott@redhat.com>
      Cc: Mike Rapoport <rppt@linux.ibm.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Florian Fainelli <f.fainelli@gmail.com>
      Cc: Rob Herring <robh@kernel.org>
      Cc: "Steven Rostedt (VMware)" <rostedt@goodmis.org>
      Cc: Peng Fan <peng.fan@nxp.com>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
  15. 22 Aug, 2019 1 commit
    • Johannes Berg's avatar
      um: fix time travel mode · e0917f87
      Johannes Berg authored
      Unfortunately, my build fix for when time travel mode isn't
      enabled broke time travel mode, because I forgot that we need
      to use the timer time after the timer has been marked disabled,
      and thus need to leave the time stored instead of zeroing it.
      Fix that by splitting the inline into two, so we can call only
      the _mode() one in the relevant code path.
      Fixes: b482e48d ("um: fix build without CONFIG_UML_TIME_TRAVEL_SUPPORT")
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      Signed-off-by: Richard Weinberger <richard@nod.at>
  16. 21 Aug, 2019 4 commits
  17. 20 Aug, 2019 1 commit