1. 01 Feb, 2018 1 commit
  2. 25 Jan, 2018 1 commit
  3. 15 Jan, 2018 2 commits
  4. 14 Jan, 2018 2 commits
  5. 13 Jan, 2018 1 commit
    • Masami Hiramatsu's avatar
      error-injection: Separate error-injection from kprobe · 540adea3
      Masami Hiramatsu authored
      
      
      Since error-injection framework is not limited to be used
      by kprobes, nor bpf. Other kernel subsystems can use it
      freely for checking safeness of error-injection, e.g.
      livepatch, ftrace etc.
      So this separate error-injection framework from kprobes.
      
      Some differences has been made:
      
      - "kprobe" word is removed from any APIs/structures.
      - BPF_ALLOW_ERROR_INJECTION() is renamed to
        ALLOW_ERROR_INJECTION() since it is not limited for BPF too.
      - CONFIG_FUNCTION_ERROR_INJECTION is the config item of this
        feature. It is automatically enabled if the arch supports
        error injection feature for kprobe or ftrace etc.
      
      Signed-off-by: default avatarMasami Hiramatsu <mhiramat@kernel.org>
      Reviewed-by: default avatarJosef Bacik <jbacik@fb.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      540adea3
  6. 11 Jan, 2018 1 commit
    • David Woodhouse's avatar
      x86/retpoline: Add initial retpoline support · 76b04384
      David Woodhouse authored
      
      
      Enable the use of -mindirect-branch=thunk-extern in newer GCC, and provide
      the corresponding thunks. Provide assembler macros for invoking the thunks
      in the same way that GCC does, from native and inline assembler.
      
      This adds X86_FEATURE_RETPOLINE and sets it by default on all CPUs. In
      some circumstances, IBRS microcode features may be used instead, and the
      retpoline can be disabled.
      
      On AMD CPUs if lfence is serialising, the retpoline can be dramatically
      simplified to a simple "lfence; jmp *\reg". A future patch, after it has
      been verified that lfence really is serialising in all circumstances, can
      enable this by setting the X86_FEATURE_RETPOLINE_AMD feature bit in addition
      to X86_FEATURE_RETPOLINE.
      
      Do not align the retpoline in the altinstr section, because there is no
      guarantee that it stays aligned when it's copied over the oldinstr during
      alternative patching.
      
      [ Andi Kleen: Rename the macros, add CONFIG_RETPOLINE option, export thunks]
      [ tglx: Put actual function CALL/JMP in front of the macros, convert to
        	symbolic labels ]
      [ dwmw2: Convert back to numeric labels, merge objtool fixes ]
      
      Signed-off-by: default avatarDavid Woodhouse <dwmw@amazon.co.uk>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Acked-by: default avatarArjan van de Ven <arjan@linux.intel.com>
      Acked-by: default avatarIngo Molnar <mingo@kernel.org>
      Cc: gnomes@lxorguk.ukuu.org.uk
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: thomas.lendacky@amd.com
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Jiri Kosina <jikos@kernel.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Kees Cook <keescook@google.com>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Greg Kroah-Hartman <gregkh@linux-foundation.org>
      Cc: Paul Turner <pjt@google.com>
      Link: https://lkml.kernel.org/r/1515707194-20531-4-git-send-email-dwmw@amazon.co.uk
      76b04384
  7. 10 Jan, 2018 1 commit
    • Christoph Hellwig's avatar
      dma-mapping: move swiotlb arch helpers to a new header · ea8c64ac
      Christoph Hellwig authored
      
      
      phys_to_dma, dma_to_phys and dma_capable are helpers published by
      architecture code for use of swiotlb and xen-swiotlb only.  Drivers are
      not supposed to use these directly, but use the DMA API instead.
      
      Move these to a new asm/dma-direct.h helper, included by a
      linux/dma-direct.h wrapper that provides the default linear mapping
      unless the architecture wants to override it.
      
      In the MIPS case the existing dma-coherent.h is reused for now as
      untangling it will take a bit of work.
      
      Signed-off-by: default avatarChristoph Hellwig <hch@lst.de>
      Acked-by: Robin Murphy's avatarRobin Murphy <robin.murphy@arm.com>
      ea8c64ac
  8. 08 Jan, 2018 2 commits
  9. 22 Dec, 2017 1 commit
    • Thomas Gleixner's avatar
      x86/Kconfig: Limit NR_CPUS on 32-bit to a sane amount · 7bbcbd3d
      Thomas Gleixner authored
      
      
      The recent cpu_entry_area changes fail to compile on 32-bit when BIGSMP=y
      and NR_CPUS=512, because the fixmap area becomes too big.
      
      Limit the number of CPUs with BIGSMP to 64, which is already way to big for
      32-bit, but it's at least a working limitation.
      
      We performed a quick survey of 32-bit-only machines that might be affected
      by this change negatively, but found none.
      
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      7bbcbd3d
  10. 17 Dec, 2017 1 commit
    • Andrey Ryabinin's avatar
      x86/mm/kasan: Don't use vmemmap_populate() to initialize shadow · 2aeb0736
      Andrey Ryabinin authored
      [ Note, this is a Git cherry-pick of the following commit:
      
          d17a1d97: ("x86/mm/kasan: don't use vmemmap_populate() to initialize shadow")
      
        ... for easier x86 PTI code testing and back-porting. ]
      
      The KASAN shadow is currently mapped using vmemmap_populate() since that
      provides a semi-convenient way to map pages into init_top_pgt.  However,
      since that no longer zeroes the mapped pages, it is not suitable for
      KASAN, which requires zeroed shadow memory.
      
      Add kasan_populate_shadow() interface and use it instead of
      vmemmap_populate().  Besides, this allows us to take advantage of
      gigantic pages and use them to populate the shadow, which should save us
      some memory wasted on page tables and reduce TLB pressure.
      
      Link: http://lkml.kernel.org/r/20171103185147.2688-2-pasha.tatashin@oracle.com
      
      
      Signed-off-by: default avatarAndrey Ryabinin <aryabinin@virtuozzo.com>
      Signed-off-by: default avatarPavel Tatashin <pasha.tatashin@oracle.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Steven Sistare <steven.sistare@oracle.com>
      Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
      Cc: Bob Picco <bob.picco@oracle.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Sam Ravnborg <sam@ravnborg.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Will Deacon <will.deacon@arm.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      2aeb0736
  11. 12 Dec, 2017 1 commit
    • Josef Bacik's avatar
      bpf: add a bpf_override_function helper · 9802d865
      Josef Bacik authored
      
      
      Error injection is sloppy and very ad-hoc.  BPF could fill this niche
      perfectly with it's kprobe functionality.  We could make sure errors are
      only triggered in specific call chains that we care about with very
      specific situations.  Accomplish this with the bpf_override_funciton
      helper.  This will modify the probe'd callers return value to the
      specified value and set the PC to an override function that simply
      returns, bypassing the originally probed function.  This gives us a nice
      clean way to implement systematic error injection for all of our code
      paths.
      
      Acked-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Acked-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarJosef Bacik <jbacik@fb.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      9802d865
  12. 22 Nov, 2017 1 commit
    • Andrey Ryabinin's avatar
      x86/mm/kasan: Don't use vmemmap_populate() to initialize shadow · f68d62a5
      Andrey Ryabinin authored
      [ Note, this commit is a cherry-picked version of:
      
          d17a1d97: ("x86/mm/kasan: don't use vmemmap_populate() to initialize shadow")
      
        ... for easier x86 entry code testing and back-porting. ]
      
      The KASAN shadow is currently mapped using vmemmap_populate() since that
      provides a semi-convenient way to map pages into init_top_pgt.  However,
      since that no longer zeroes the mapped pages, it is not suitable for
      KASAN, which requires zeroed shadow memory.
      
      Add kasan_populate_shadow() interface and use it instead of
      vmemmap_populate().  Besides, this allows us to take advantage of
      gigantic pages and use them to populate the shadow, which should save us
      some memory wasted on page tables and reduce TLB pressure.
      
      Link: http://lkml.kernel.org/r/20171103185147.2688-2-pasha.tatashin@oracle.com
      
      
      Signed-off-by: default avatarAndrey Ryabinin <aryabinin@virtuozzo.com>
      Signed-off-by: default avatarPavel Tatashin <pasha.tatashin@oracle.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Steven Sistare <steven.sistare@oracle.com>
      Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
      Cc: Bob Picco <bob.picco@oracle.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Sam Ravnborg <sam@ravnborg.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Will Deacon <will.deacon@arm.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      f68d62a5
  13. 16 Nov, 2017 2 commits
  14. 14 Nov, 2017 1 commit
  15. 11 Nov, 2017 2 commits
  16. 08 Nov, 2017 1 commit
    • Ricardo Neri's avatar
      x86/umip: Enable User-Mode Instruction Prevention at runtime · aa35f896
      Ricardo Neri authored
      
      
      User-Mode Instruction Prevention (UMIP) is enabled by setting/clearing a
      bit in %cr4.
      
      It makes sense to enable UMIP at some point while booting, before user
      spaces come up. Like SMAP and SMEP, is not critical to have it enabled
      very early during boot. This is because UMIP is relevant only when there is
      a user space to be protected from. Given these similarities, UMIP can be
      enabled along with SMAP and SMEP.
      
      At the moment, UMIP is disabled by default at build time. It can be enabled
      at build time by selecting CONFIG_X86_INTEL_UMIP. If enabled at build time,
      it can be disabled at run time by adding clearcpuid=514 to the kernel
      parameters.
      
      Signed-off-by: default avatarRicardo Neri <ricardo.neri-calderon@linux.intel.com>
      Reviewed-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Chen Yucong <slaoub@gmail.com>
      Cc: Chris Metcalf <cmetcalf@mellanox.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Huang Rui <ray.huang@amd.com>
      Cc: Jiri Slaby <jslaby@suse.cz>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Michael S. Tsirkin <mst@redhat.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ravi V. Shankar <ravi.v.shankar@intel.com>
      Cc: Shuah Khan <shuah@kernel.org>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: ricardo.neri@intel.com
      Link: http://lkml.kernel.org/r/1509935277-22138-10-git-send-email-ricardo.neri-calderon@linux.intel.com
      
      
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      aa35f896
  17. 02 Nov, 2017 1 commit
    • Greg Kroah-Hartman's avatar
      License cleanup: add SPDX GPL-2.0 license identifier to files with no license · b2441318
      Greg Kroah-Hartman authored
      
      
      Many source files in the tree are missing licensing information, which
      makes it harder for compliance tools to determine the correct license.
      
      By default all files without license information are under the default
      license of the kernel, which is GPL version 2.
      
      Update the files which contain no license information with the 'GPL-2.0'
      SPDX license identifier.  The SPDX identifier is a legally binding
      shorthand, which can be used instead of the full boiler plate text.
      
      This patch is based on work done by Thomas Gleixner and Kate Stewart and
      Philippe Ombredanne.
      
      How this work was done:
      
      Patches were generated and checked against linux-4.14-rc6 for a subset of
      the use cases:
       - file had no licensing information it it.
       - file was a */uapi/* one with no licensing information in it,
       - file was a */uapi/* one with existing licensing information,
      
      Further patches will be generated in subsequent months to fix up cases
      where non-standard license headers were used, and references to license
      had to be inferred by heuristics based on keywords.
      
      The analysis to determine which SPDX License Identifier to be applied to
      a file was done in a spreadsheet of side by side results from of the
      output of two independent scanners (ScanCode & Windriver) producing SPDX
      tag:value files created by Philippe Ombredanne.  Philippe prepared the
      base worksheet, and did an initial spot review of a few 1000 files.
      
      The 4.13 kernel was the starting point of the analysis with 60,537 files
      assessed.  Kate Stewart did a file by file comparison of the scanner
      results in the spreadsheet to determine which SPDX license identifier(s)
      to be applied to the file. She confirmed any determination that was not
      immediately clear with lawyers working with the Linux Foundation.
      
      Criteria used to select files for SPDX license identifier tagging was:
       - Files considered eligible had to be source code files.
       - Make and config files were included as candidates if they contained >5
         lines of source
       - File already had some variant of a license header in it (even if <5
         lines).
      
      All documentation files were explicitly excluded.
      
      The following heuristics were used to determine which SPDX license
      identifiers to apply.
      
       - when both scanners couldn't find any license traces, file was
         considered to have no license information in it, and the top level
         COPYING file license applied.
      
         For non */uapi/* files that summary was:
      
         SPDX license identifier                            # files
         ---------------------------------------------------|-------
         GPL-2.0                                              11139
      
         and resulted in the first patch in this series.
      
         If that file was a */uapi/* path one, it was "GPL-2.0 WITH
         Linux-syscall-note" otherwise it was "GPL-2.0".  Results of that was:
      
         SPDX license identifier                            # files
         ---------------------------------------------------|-------
         GPL-2.0 WITH Linux-syscall-note                        930
      
         and resulted in the second patch in this series.
      
       - if a file had some form of licensing information in it, and was one
         of the */uapi/* ones, it was denoted with the Linux-syscall-note if
         any GPL family license was found in the file or had no licensing in
         it (per prior point).  Results summary:
      
         SPDX license identifier                            # files
         ---------------------------------------------------|------
         GPL-2.0 WITH Linux-syscall-note                       270
         GPL-2.0+ WITH Linux-syscall-note                      169
         ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause)    21
         ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)    17
         LGPL-2.1+ WITH Linux-syscall-note                      15
         GPL-1.0+ WITH Linux-syscall-note                       14
         ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause)    5
         LGPL-2.0+ WITH Linux-syscall-note                       4
         LGPL-2.1 WITH Linux-syscall-note                        3
         ((GPL-2.0 WITH Linux-syscall-note) OR MIT)              3
         ((GPL-2.0 WITH Linux-syscall-note) AND MIT)             1
      
         and that resulted in the third patch in this series.
      
       - when the two scanners agreed on the detected license(s), that became
         the concluded license(s).
      
       - when there was disagreement between the two scanners (one detected a
         license but the other didn't, or they both detected different
         licenses) a manual inspection of the file occurred.
      
       - In most cases a manual inspection of the information in the file
         resulted in a clear resolution of the license that should apply (and
         which scanner probably needed to revisit its heuristics).
      
       - When it was not immediately clear, the license identifier was
         confirmed with lawyers working with the Linux Foundation.
      
       - If there was any question as to the appropriate license identifier,
         the file was flagged for further research and to be revisited later
         in time.
      
      In total, over 70 hours of logged manual review was done on the
      spreadsheet to determine the SPDX license identifiers to apply to the
      source files by Kate, Philippe, Thomas and, in some cases, confirmation
      by lawyers working with the Linux Foundation.
      
      Kate also obtained a third independent scan of the 4.13 code base from
      FOSSology, and compared selected files where the other two scanners
      disagreed against that SPDX file, to see if there was new insights.  The
      Windriver scanner is based on an older version of FOSSology in part, so
      they are related.
      
      Thomas did random spot checks in about 500 files from the spreadsheets
      for the uapi headers and agreed with SPDX license identifier in the
      files he inspected. For the non-uapi files Thomas did random spot checks
      in about 15000 files.
      
      In initial set of patches against 4.14-rc6, 3 files were found to have
      copy/paste license identifier errors, and have been fixed to reflect the
      correct identifier.
      
      Additionally Philippe spent 10 hours this week doing a detailed manual
      inspection and review of the 12,461 patched files from the initial patch
      version early this week with:
       - a full scancode scan run, collecting the matched texts, detected
         license ids and scores
       - reviewing anything where there was a license detected (about 500+
         files) to ensure that the applied SPDX license was correct
       - reviewing anything where there was no detection but the patch license
         was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
         SPDX license was correct
      
      This produced a worksheet with 20 files needing minor correction.  This
      worksheet was then exported into 3 different .csv files for the
      different types of files to be modified.
      
      These .csv files were then reviewed by Greg.  Thomas wrote a script to
      parse the csv files and add the proper SPDX tag to the file, in the
      format that the file expected.  This script was further refined by Greg
      based on the output to detect more types of files automatically and to
      distinguish between header and source .c files (which need different
      comment types.)  Finally Greg ran the script using the .csv files to
      generate the patches.
      
      Reviewed-by: default avatarKate Stewart <kstewart@linuxfoundation.org>
      Reviewed-by: default avatarPhilippe Ombredanne <pombredanne@nexb.com>
      Reviewed-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b2441318
  18. 20 Oct, 2017 1 commit
    • Andrey Ryabinin's avatar
      x86/kasan: Use the same shadow offset for 4- and 5-level paging · 12a8cc7f
      Andrey Ryabinin authored
      
      
      We are going to support boot-time switching between 4- and 5-level
      paging. For KASAN it means we cannot have different KASAN_SHADOW_OFFSET
      for different paging modes: the constant is passed to gcc to generate
      code and cannot be changed at runtime.
      
      This patch changes KASAN code to use 0xdffffc0000000000 as shadow offset
      for both 4- and 5-level paging.
      
      For 5-level paging it means that shadow memory region is not aligned to
      PGD boundary anymore and we have to handle unaligned parts of the region
      properly.
      
      In addition, we have to exclude paravirt code from KASAN instrumentation
      as we now use set_pgd() before KASAN is fully ready.
      
      [kirill.shutemov@linux.intel.com: clenaup, changelog message]
      Signed-off-by: default avatarAndrey Ryabinin <aryabinin@virtuozzo.com>
      Signed-off-by: default avatarKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Cyrill Gorcunov <gorcunov@openvz.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-mm@kvack.org
      Link: http://lkml.kernel.org/r/20170929140821.37654-4-kirill.shutemov@linux.intel.com
      
      
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      12a8cc7f
  19. 18 Oct, 2017 1 commit
    • Thomas Gleixner's avatar
      x86/vector/msi: Select CONFIG_GENERIC_IRQ_RESERVATION_MODE · c201c917
      Thomas Gleixner authored
      Select CONFIG_GENERIC_IRQ_RESERVATION_MODE so PCI/MSI domains get the
      MSI_FLAG_MUST_REACTIVATE flag set in pci_msi_create_irq_domain().
      
      Remove the explicit setters of this flag in the apic/msi code as they are
      not longer required.
      
      Fixes: 4900be83
      
       ("x86/vector/msi: Switch to global reservation mode")
      Reported-and-tested-by: default avatarDexuan Cui <decui@microsoft.com>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: Josh Poulson <jopoulso@microsoft.com>
      Cc: Mihai Costache <v-micos@microsoft.com>
      Cc: Stephen Hemminger <sthemmin@microsoft.com>
      Cc: Marc Zyngier <marc.zyngier@arm.com>
      Cc: linux-pci@vger.kernel.org
      Cc: Haiyang Zhang <haiyangz@microsoft.com>
      Cc: Simon Xiao <sixiao@microsoft.com>
      Cc: Saeed Mahameed <saeedm@mellanox.com>
      Cc: Jork Loeser <Jork.Loeser@microsoft.com>
      Cc: Bjorn Helgaas <bhelgaas@google.com>
      Cc: devel@linuxdriverproject.org
      Cc: KY Srinivasan <kys@microsoft.com>
      Link: https://lkml.kernel.org/r/20171017075600.527569354@linutronix.de
      c201c917
  20. 14 Oct, 2017 1 commit
  21. 28 Sep, 2017 1 commit
  22. 25 Sep, 2017 1 commit
  23. 09 Sep, 2017 2 commits
  24. 07 Sep, 2017 1 commit
    • Rik van Riel's avatar
      x86,mpx: make mpx depend on x86-64 to free up VMA flag · df3735c5
      Rik van Riel authored
      Patch series "mm,fork,security: introduce MADV_WIPEONFORK", v4.
      
      If a child process accesses memory that was MADV_WIPEONFORK, it will get
      zeroes.  The address ranges are still valid, they are just empty.
      
      If a child process accesses memory that was MADV_DONTFORK, it will get a
      segmentation fault, since those address ranges are no longer valid in
      the child after fork.
      
      Since MADV_DONTFORK also seems to be used to allow very large programs
      to fork in systems with strict memory overcommit restrictions, changing
      the semantics of MADV_DONTFORK might break existing programs.
      
      The use case is libraries that store or cache information, and want to
      know that they need to regenerate it in the child process after fork.
      
      Examples of this would be:
       - systemd/pulseaudio API checks (fail after fork) (replacing a getpid
         check, which is too slow without a PID cache)
       - PKCS#11 API reinitialization check (mandated by specification)
       - glibc's upcoming PRNG (reseed after fork)
       - OpenSSL PRNG (reseed after fork)
      
      The security benefits of a forking server having a re-inialized PRNG in
      every child process are pretty obvious.  However, due to libraries
      having all kinds of internal state, and programs getting compiled with
      many different versions of each library, it is unreasonable to expect
      calling programs to re-initialize everything manually after fork.
      
      A further complication is the proliferation of clone flags, programs
      bypassing glibc's functions to call clone directly, and programs calling
      unshare, causing the glibc pthread_atfork hook to not get called.
      
      It would be better to have the kernel take care of this automatically.
      
      The patchset also adds MADV_KEEPONFORK, to undo the effects of a prior
      MADV_WIPEONFORK.
      
      This is similar to the OpenBSD minherit syscall with MAP_INHERIT_ZERO:
      
          https://man.openbsd.org/minherit.2
      
      This patch (of 2):
      
      MPX only seems to be available on 64 bit CPUs, starting with Skylake and
      Goldmont.  Move VM_MPX into the 64 bit only portion of vma->vm_flags, in
      order to free up a VMA flag.
      
      Link: http://lkml.kernel.org/r/20170811212829.29186-2-riel@redhat.com
      
      
      Signed-off-by: default avatarRik van Riel <riel@redhat.com>
      Acked-by: default avatarDave Hansen <dave.hansen@intel.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Florian Weimer <fweimer@redhat.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Will Drewry <wad@chromium.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Colm MacCártaigh <colm@allcosts.net>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      df3735c5
  25. 31 Aug, 2017 2 commits
    • Robin Murphy's avatar
      libnvdimm, nd_blk: remove mmio_flush_range() · 5deb67f7
      Robin Murphy authored
      mmio_flush_range() suffers from a lack of clearly-defined semantics,
      and is somewhat ambiguous to port to other architectures where the
      scope of the writeback implied by "flush" and ordering might matter,
      but MMIO would tend to imply non-cacheable anyway. Per the rationale
      in 67a3e8fe
      
       ("nd_blk: change aperture mapping from WC to WB"), the
      only existing use is actually to invalidate clean cache lines for
      ARCH_MEMREMAP_PMEM type mappings *without* writeback. Since the recent
      cleanup of the pmem API, that also now happens to be the exact purpose
      of arch_invalidate_pmem(), which would be a far more well-defined tool
      for the job.
      
      Rather than risk potentially inconsistent implementations of
      mmio_flush_range() for the sake of one callsite, streamline things by
      removing it entirely and instead move the ARCH_MEMREMAP_PMEM related
      definitions up to the libnvdimm level, so they can be shared by NFIT
      as well. This allows NFIT to be enabled for arm64.
      
      Signed-off-by: Robin Murphy's avatarRobin Murphy <robin.murphy@arm.com>
      Signed-off-by: default avatarDan Williams <dan.j.williams@intel.com>
      5deb67f7
    • Vitaly Kuznetsov's avatar
      x86/mm: Enable RCU based page table freeing (CONFIG_HAVE_RCU_TABLE_FREE=y) · 9e52fc2b
      Vitaly Kuznetsov authored
      
      
      There's a subtle bug in how some of the paravirt guest code handles
      page table freeing on x86:
      
      On x86 software page table walkers depend on the fact that remote TLB flush
      does an IPI: walk is performed lockless but with interrupts disabled and in
      case the page table is freed the freeing CPU will get blocked as remote TLB
      flush is required. On other architectures which don't require an IPI to do
      remote TLB flush we have an RCU-based mechanism (see
      include/asm-generic/tlb.h for more details).
      
      In virtualized environments we may want to override the ->flush_tlb_others
      callback in pv_mmu_ops and use a hypercall asking the hypervisor to do a
      remote TLB flush for us. This breaks the assumption about IPIs. Xen PV has
      been doing this for years and the upcoming remote TLB flush for Hyper-V will
      do it too.
      
      This is not safe, as software page table walkers may step on an already
      freed page.
      
      Fix the bug by enabling the RCU-based page table freeing mechanism,
      CONFIG_HAVE_RCU_TABLE_FREE=y.
      
      Testing with kernbench and mmap/munmap microbenchmarks, and neither showed
      any noticeable performance impact.
      
      Suggested-by: default avatarPeter Zijlstra <peterz@infradead.org>
      Signed-off-by: default avatarVitaly Kuznetsov <vkuznets@redhat.com>
      Acked-by: default avatarPeter Zijlstra <peterz@infradead.org>
      Acked-by: default avatarJuergen Gross <jgross@suse.com>
      Acked-by: default avatarKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Andrew Cooper <andrew.cooper3@citrix.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Jork Loeser <Jork.Loeser@microsoft.com>
      Cc: KY Srinivasan <kys@microsoft.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Stephen Hemminger <sthemmin@microsoft.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: xen-devel@lists.xenproject.org
      Link: http://lkml.kernel.org/r/20170828082251.5562-1-vkuznets@redhat.com
      
      
      [ Rewrote/fixed/clarified the changelog. ]
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      9e52fc2b
  26. 29 Aug, 2017 1 commit
    • Ingo Molnar's avatar
      locking/refcounts, x86/asm: Disable CONFIG_ARCH_HAS_REFCOUNT for the time being · 7b3d61cc
      Ingo Molnar authored
      Mike Galbraith bisected a boot crash back to the following commit:
      
        7a46ec0e
      
       ("locking/refcounts, x86/asm: Implement fast refcount overflow protection")
      
      The crash/hang pattern is:
      
       > Symptom is a few splats as below, with box finally hanging.  Network
       > comes up, but neither ssh nor console login is possible.
       >
       >  ------------[ cut here ]------------
       >  WARNING: CPU: 4 PID: 0 at net/netlink/af_netlink.c:374 netlink_sock_destruct+0x82/0xa0
       >  ...
       >  __sk_destruct()
       >  rcu_process_callbacks()
       >  __do_softirq()
       >  irq_exit()
       >  smp_apic_timer_interrupt()
       >  apic_timer_interrupt()
      
      We are at -rc7 already, and the code has grown some dependencies, so
      instead of a plain revert disable the config temporarily, in the hope
      of getting real fixes.
      
      Reported-by: default avatarMike Galbraith <efault@gmx.de>
      Tested-by: default avatarMike Galbraith <efault@gmx.de>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Elena Reshetova <elena.reshetova@intel.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/tip-7a46ec0e2f4850407de5e1d19a44edee6efa58ec@git.kernel.org
      
      
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      7b3d61cc
  27. 24 Aug, 2017 1 commit
  28. 18 Aug, 2017 2 commits
    • Nicholas Piggin's avatar
      kernel/watchdog: fix Kconfig constraints for perf hardlockup watchdog · 92e5aae4
      Nicholas Piggin authored
      Commit 05a4a952 ("kernel/watchdog: split up config options") lost
      the perf-based hardlockup detector's dependency on PERF_EVENTS, which
      can result in broken builds with some powerpc configurations.
      
      Restore the dependency.  Add it in for x86 too, despite x86 always
      selecting PERF_EVENTS it seems reasonable to make the dependency
      explicit.
      
      Link: http://lkml.kernel.org/r/20170810114452.6673-1-npiggin@gmail.com
      Fixes: 05a4a952
      
       ("kernel/watchdog: split up config options")
      Signed-off-by: default avatarNicholas Piggin <npiggin@gmail.com>
      Acked-by: default avatarDon Zickus <dzickus@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      92e5aae4
    • Thomas Gleixner's avatar
      kernel/watchdog: Prevent false positives with turbo modes · 7edaeb68
      Thomas Gleixner authored
      The hardlockup detector on x86 uses a performance counter based on unhalted
      CPU cycles and a periodic hrtimer. The hrtimer period is about 2/5 of the
      performance counter period, so the hrtimer should fire 2-3 times before the
      performance counter NMI fires. The NMI code checks whether the hrtimer
      fired since the last invocation. If not, it assumess a hard lockup.
      
      The calculation of those periods is based on the nominal CPU
      frequency. Turbo modes increase the CPU clock frequency and therefore
      shorten the period of the perf/NMI watchdog. With extreme Turbo-modes (3x
      nominal frequency) the perf/NMI period is shorter than the hrtimer period
      which leads to false positives.
      
      A simple fix would be to shorten the hrtimer period, but that comes with
      the side effect of more frequent hrtimer and softlockup thread wakeups,
      which is not desired.
      
      Implement a low pass filter, which checks the perf/NMI period against
      kernel time. If the perf/NMI fires before 4/5 of the watchdog period has
      elapsed then the event is ignored and postponed to the next perf/NMI.
      
      That solves the problem and avoids the overhead of shorter hrtimer periods
      and more frequent softlockup thread wakeups.
      
      Fixes: 58687acb
      
       ("lockup_detector: Combine nmi_watchdog and softlockup detector")
      Reported-and-tested-by: default avatarKan Liang <Kan.liang@intel.com>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: dzickus@redhat.com
      Cc: prarit@redhat.com
      Cc: ak@linux.intel.com
      Cc: babu.moger@oracle.com
      Cc: peterz@infradead.org
      Cc: eranian@google.com
      Cc: acme@redhat.com
      Cc: stable@vger.kernel.org
      Cc: atomlin@redhat.com
      Cc: akpm@linux-foundation.org
      Cc: torvalds@linux-foundation.org
      Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1708150931310.1886@nanos
      7edaeb68
  29. 17 Aug, 2017 1 commit
    • Kees Cook's avatar
      locking/refcounts, x86/asm: Implement fast refcount overflow protection · 7a46ec0e
      Kees Cook authored
      This implements refcount_t overflow protection on x86 without a noticeable
      performance impact, though without the fuller checking of REFCOUNT_FULL.
      
      This is done by duplicating the existing atomic_t refcount implementation
      but with normally a single instruction added to detect if the refcount
      has gone negative (e.g. wrapped past INT_MAX or below zero). When detected,
      the handler saturates the refcount_t to INT_MIN / 2. With this overflow
      protection, the erroneous reference release that would follow a wrap back
      to zero is blocked from happening, avoiding the class of refcount-overflow
      use-after-free vulnerabilities entirely.
      
      Only the overflow case of refcounting can be perfectly protected, since
      it can be detected and stopped before the reference is freed and left to
      be abused by an attacker. There isn't a way to block early decrements,
      and while REFCOUNT_FULL stops increment-from-zero cases (which would
      be the state _after_ an early decrement and stops potential double-free
      conditions), this fast implementation does not, since it would require
      the more expensive cmpxchg loops. Since the overflow case is much more
      common (e.g. missing a "put" during an error path), this protection
      provides real-world protection. For example, the two public refcount
      overflow use-after-free exploits published in 2016 would have been
      rendered unexploitable:
      
        http://perception-point.io/2016/01/14/analysis-and-exploitation-of-a-linux-kernel-vulnerability-cve-2016-0728/
      
        http://cyseclabs.com/page?n=02012016
      
      
      
      This implementation does, however, notice an unchecked decrement to zero
      (i.e. caller used refcount_dec() instead of refcount_dec_and_test() and it
      resulted in a zero). Decrements under zero are noticed (since they will
      have resulted in a negative value), though this only indicates that a
      use-after-free may have already happened. Such notifications are likely
      avoidable by an attacker that has already exploited a use-after-free
      vulnerability, but it's better to have them reported than allow such
      conditions to remain universally silent.
      
      On first overflow detection, the refcount value is reset to INT_MIN / 2
      (which serves as a saturation value) and a report and stack trace are
      produced. When operations detect only negative value results (such as
      changing an already saturated value), saturation still happens but no
      notification is performed (since the value was already saturated).
      
      On the matter of races, since the entire range beyond INT_MAX but before
      0 is negative, every operation at INT_MIN / 2 will trap, leaving no
      overflow-only race condition.
      
      As for performance, this implementation adds a single "js" instruction
      to the regular execution flow of a copy of the standard atomic_t refcount
      operations. (The non-"and_test" refcount_dec() function, which is uncommon
      in regular refcount design patterns, has an additional "jz" instruction
      to detect reaching exactly zero.) Since this is a forward jump, it is by
      default the non-predicted path, which will be reinforced by dynamic branch
      prediction. The result is this protection having virtually no measurable
      change in performance over standard atomic_t operations. The error path,
      located in .text.unlikely, saves the refcount location and then uses UD0
      to fire a refcount exception handler, which resets the refcount, handles
      reporting, and returns to regular execution. This keeps the changes to
      .text size minimal, avoiding return jumps and open-coded calls to the
      error reporting routine.
      
      Example assembly comparison:
      
      refcount_inc() before:
      
        .text:
        ffffffff81546149:       f0 ff 45 f4             lock incl -0xc(%rbp)
      
      refcount_inc() after:
      
        .text:
        ffffffff81546149:       f0 ff 45 f4             lock incl -0xc(%rbp)
        ffffffff8154614d:       0f 88 80 d5 17 00       js     ffffffff816c36d3
        ...
        .text.unlikely:
        ffffffff816c36d3:       48 8d 4d f4             lea    -0xc(%rbp),%rcx
        ffffffff816c36d7:       0f ff                   (bad)
      
      These are the cycle counts comparing a loop of refcount_inc() from 1
      to INT_MAX and back down to 0 (via refcount_dec_and_test()), between
      unprotected refcount_t (atomic_t), fully protected REFCOUNT_FULL
      (refcount_t-full), and this overflow-protected refcount (refcount_t-fast):
      
        2147483646 refcount_inc()s and 2147483647 refcount_dec_and_test()s:
      		    cycles		protections
        atomic_t           82249267387	none
        refcount_t-fast    82211446892	overflow, untested dec-to-zero
        refcount_t-full   144814735193	overflow, untested dec-to-zero, inc-from-zero
      
      This code is a modified version of the x86 PAX_REFCOUNT atomic_t
      overflow defense from the last public patch of PaX/grsecurity, based
      on my understanding of the code. Changes or omissions from the original
      code are mine and don't reflect the original grsecurity/PaX code. Thanks
      to PaX Team for various suggestions for improvement for repurposing this
      code to be a refcount-only protection.
      
      Signed-off-by: default avatarKees Cook <keescook@chromium.org>
      Reviewed-by: default avatarJosh Poimboeuf <jpoimboe@redhat.com>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Elena Reshetova <elena.reshetova@intel.com>
      Cc: Eric Biggers <ebiggers3@gmail.com>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: Hans Liljestrand <ishkamiel@gmail.com>
      Cc: James Bottomley <James.Bottomley@hansenpartnership.com>
      Cc: Jann Horn <jannh@google.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Manfred Spraul <manfred@colorfullife.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Serge E. Hallyn <serge@hallyn.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: arozansk@redhat.com
      Cc: axboe@kernel.dk
      Cc: kernel-hardening@lists.openwall.com
      Cc: linux-arch <linux-arch@vger.kernel.org>
      Link: http://lkml.kernel.org/r/20170815161924.GA133115@beast
      
      
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      7a46ec0e
  30. 01 Aug, 2017 1 commit
  31. 26 Jul, 2017 2 commits
    • Josh Poimboeuf's avatar
      x86/kconfig: Consolidate unwinders into multiple choice selection · 81d38719
      Josh Poimboeuf authored
      
      
      There are three mutually exclusive unwinders.  Make that more obvious by
      combining them into a multiple-choice selection:
      
        CONFIG_FRAME_POINTER_UNWINDER
        CONFIG_ORC_UNWINDER
        CONFIG_GUESS_UNWINDER (if CONFIG_EXPERT=y)
      
      Frame pointers are still the default (for now).
      
      The old CONFIG_FRAME_POINTER option is still used in some
      arch-independent places, so keep it around, but make it
      invisible to the user on x86 - it's now selected by
      CONFIG_FRAME_POINTER_UNWINDER=y.
      
      Suggested-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarJosh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Jiri Slaby <jslaby@suse.cz>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: live-patching@vger.kernel.org
      Link: http://lkml.kernel.org/r/20170725135424.zukjmgpz3plf5pmt@treble
      
      
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      81d38719
    • Josh Poimboeuf's avatar
      x86/unwind: Add the ORC unwinder · ee9f8fce
      Josh Poimboeuf authored
      
      
      Add the new ORC unwinder which is enabled by CONFIG_ORC_UNWINDER=y.
      It plugs into the existing x86 unwinder framework.
      
      It relies on objtool to generate the needed .orc_unwind and
      .orc_unwind_ip sections.
      
      For more details on why ORC is used instead of DWARF, see
      Documentation/x86/orc-unwinder.txt - but the short version is
      that it's a simplified, fundamentally more robust debugninfo
      data structure, which also allows up to two orders of magnitude
      faster lookups than the DWARF unwinder - which matters to
      profiling workloads like perf.
      
      Thanks to Andy Lutomirski for the performance improvement ideas:
      splitting the ORC unwind table into two parallel arrays and creating a
      fast lookup table to search a subset of the unwind table.
      
      Signed-off-by: default avatarJosh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Jiri Slaby <jslaby@suse.cz>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: live-patching@vger.kernel.org
      Link: http://lkml.kernel.org/r/0a6cbfb40f8da99b7a45a1a8302dc6aef16ec812.1500938583.git.jpoimboe@redhat.com
      
      
      [ Extended the changelog. ]
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      ee9f8fce