1. 15 Aug, 2020 1 commit
    • Xiaoming Ni's avatar
      all arch: remove system call sys_sysctl · 88db0aa2
      Xiaoming Ni authored
      Since commit 61a47c1a ("sysctl: Remove the sysctl system call"),
      sys_sysctl is actually unavailable: any input can only return an error.
      
      We have been warning about people using the sysctl system call for years
      and believe there are no more users.  Even if there are users of this
      interface if they have not complained or fixed their code by now they
      probably are not going to, so there is no point in warning them any
      longer.
      
      So completely remove sys_sysctl on all architectures.
      
      [nixiaoming@huawei.com: s390: fix build error for sys_call_table_emu]
       Link: http://lkml.kernel.org/r/20200618141426.16884-1-nixiaoming@huawei.com
      
      
      
      Signed-off-by: default avatarXiaoming Ni <nixiaoming@huawei.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Acked-by: Will Deacon <will@kernel.org>		[arm/arm64]
      Acked-by: default avatar"Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Aleksa Sarai <cyphar@cyphar.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Bin Meng <bin.meng@windriver.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: chenzefeng <chenzefeng2@huawei.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Christian Brauner <christian@brauner.io>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: David Howells <dhowells@redhat.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Diego Elio Pettenò <flameeyes@flameeyes.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Dominik Brodowski <linux@dominikbrodowski.net>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Iurii Zaikin <yzaikin@google.com>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Kars de Jong <jongk@linux-m68k.org>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Krzysztof Kozlowski <krzk@kernel.org>
      Cc: Luis Chamberlain <mcgrof@kernel.org>
      Cc: Marco Elver <elver@google.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Martin K. Petersen <martin.petersen@oracle.com>
      Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Miklos Szeredi <mszeredi@redhat.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      Cc: Nick Piggin <npiggin@gmail.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Olof Johansson <olof@lixom.net>
      Cc: Paul Burton <paulburton@kernel.org>
      Cc: "Paul E. McKenney" <paulmck@kernel.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Randy Dunlap <rdunlap@infradead.org>
      Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Sami Tolvanen <samitolvanen@google.com>
      Cc: Sargun Dhillon <sargun@sargun.me>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Cc: Sudeep Holla <sudeep.holla@arm.com>
      Cc: Sven Schnelle <svens@stackframe.org>
      Cc: Thiago Jung Bauermann <bauerman@linux.ibm.com>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: Zhou Yanjie <zhouyanjie@wanyeetech.com>
      Link: http://lkml.kernel.org/r/20200616030734.87257-1-nixiaoming@huawei.com
      
      
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      88db0aa2
  2. 14 Aug, 2020 4 commits
  3. 13 Aug, 2020 1 commit
  4. 12 Aug, 2020 5 commits
    • Peter Xu's avatar
      mm/x86: use general page fault accounting · 968614fc
      Peter Xu authored
      
      
      Use the general page fault accounting by passing regs into
      handle_mm_fault().
      
      Signed-off-by: default avatarPeter Xu <peterx@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Link: http://lkml.kernel.org/r/20200707225021.200906-23-peterx@redhat.com
      
      
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      968614fc
    • Peter Xu's avatar
      mm: do page fault accounting in handle_mm_fault · bce617ed
      Peter Xu authored
      Patch series "mm: Page fault accounting cleanups", v5.
      
      This is v5 of the pf accounting cleanup series.  It originates from Gerald
      Schaefer's report on an issue a week ago regarding to incorrect page fault
      accountings for retried page fault after commit 4064b982 ("mm: allow
      VM_FAULT_RETRY for multiple times"):
      
        https://lore.kernel.org/lkml/20200610174811.44b94525@thinkpad/
      
      
      
      What this series did:
      
        - Correct page fault accounting: we do accounting for a page fault
          (no matter whether it's from #PF handling, or gup, or anything else)
          only with the one that completed the fault.  For example, page fault
          retries should not be counted in page fault counters.  Same to the
          perf events.
      
        - Unify definition of PERF_COUNT_SW_PAGE_FAULTS: currently this perf
          event is used in an adhoc way across different archs.
      
          Case (1): for many archs it's done at the entry of a page fault
          handler, so that it will also cover e.g.  errornous faults.
      
          Case (2): for some other archs, it is only accounted when the page
          fault is resolved successfully.
      
          Case (3): there're still quite some archs that have not enabled
          this perf event.
      
          Since this series will touch merely all the archs, we unify this
          perf event to always follow case (1), which is the one that makes most
          sense.  And since we moved the accounting into handle_mm_fault, the
          other two MAJ/MIN perf events are well taken care of naturally.
      
        - Unify definition of "major faults": the definition of "major
          fault" is slightly changed when used in accounting (not
          VM_FAULT_MAJOR).  More information in patch 1.
      
        - Always account the page fault onto the one that triggered the page
          fault.  This does not matter much for #PF handlings, but mostly for
          gup.  More information on this in patch 25.
      
      Patchset layout:
      
      Patch 1:     Introduced the accounting in handle_mm_fault(), not enabled.
      Patch 2-23:  Enable the new accounting for arch #PF handlers one by one.
      Patch 24:    Enable the new accounting for the rest outliers (gup, iommu, etc.)
      Patch 25:    Cleanup GUP task_struct pointer since it's not needed any more
      
      This patch (of 25):
      
      This is a preparation patch to move page fault accountings into the
      general code in handle_mm_fault().  This includes both the per task
      flt_maj/flt_min counters, and the major/minor page fault perf events.  To
      do this, the pt_regs pointer is passed into handle_mm_fault().
      
      PERF_COUNT_SW_PAGE_FAULTS should still be kept in per-arch page fault
      handlers.
      
      So far, all the pt_regs pointer that passed into handle_mm_fault() is
      NULL, which means this patch should have no intented functional change.
      
      Suggested-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarPeter Xu <peterx@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Cc: Albert Ou <aou@eecs.berkeley.edu>
      Cc: Alexander Gordeev <agordeev@linux.ibm.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Cain <bcain@codeaurora.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
      Cc: Greentime Hu <green.hu@gmail.com>
      Cc: Guo Ren <guoren@kernel.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: James E.J. Bottomley <James.Bottomley@HansenPartnership.com>
      Cc: John Hubbard <jhubbard@nvidia.com>
      Cc: Jonas Bonn <jonas@southpole.se>
      Cc: Ley Foon Tan <ley.foon.tan@intel.com>
      Cc: "Luck, Tony" <tony.luck@intel.com>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Nick Hu <nickhu@andestech.com>
      Cc: Palmer Dabbelt <palmer@dabbelt.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Stafford Horne <shorne@gmail.com>
      Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Vincent Chen <deanbo422@gmail.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Link: http://lkml.kernel.org/r/20200707225021.200906-1-peterx@redhat.com
      Link: http://lkml.kernel.org/r/20200707225021.200906-2-peterx@redhat.com
      
      
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      bce617ed
    • Christoph Hellwig's avatar
      uaccess: remove segment_eq · 428e2976
      Christoph Hellwig authored
      
      
      segment_eq is only used to implement uaccess_kernel.  Just open code
      uaccess_kernel in the arch uaccess headers and remove one layer of
      indirection.
      
      Signed-off-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Acked-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Acked-by: default avatarGreentime Hu <green.hu@gmail.com>
      Acked-by: default avatarGeert Uytterhoeven <geert@linux-m68k.org>
      Cc: Nick Hu <nickhu@andestech.com>
      Cc: Vincent Chen <deanbo422@gmail.com>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Palmer Dabbelt <palmer@dabbelt.com>
      Link: http://lkml.kernel.org/r/20200710135706.537715-5-hch@lst.de
      
      
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      428e2976
    • justin-he's avatar
      mm/memory_hotplug: introduce default dummy memory_add_physaddr_to_nid() · d622ecec
      justin-he authored
      
      
      This is to introduce a general dummy helper.  memory_add_physaddr_to_nid()
      is a fallback option to get the nid in case NUMA_NO_NID is detected.
      
      After this patch, arm64/sh/s390 can simply use the general dummy version.
      PowerPC/x86/ia64 will still use their specific version.
      
      This is the preparation to set a fallback value for dev_dax->target_node.
      
      Signed-off-by: justin-he's avatarJia He <justin.he@arm.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Reviewed-by: default avatarDavid Hildenbrand <david@redhat.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Vishal Verma <vishal.l.verma@intel.com>
      Cc: Dave Jiang <dave.jiang@intel.com>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Chuhong Yuan <hslester96@gmail.com>
      Cc: Mike Rapoport <rppt@linux.ibm.com>
      Cc: Logan Gunthorpe <logang@deltatee.com>
      Cc: Masahiro Yamada <masahiroy@kernel.org>
      Cc: Jonathan Cameron <Jonathan.Cameron@Huawei.com>
      Cc: Kaly Xin <Kaly.Xin@arm.com>
      Link: http://lkml.kernel.org/r/20200710031619.18762-2-justin.he@arm.com
      
      
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      d622ecec
    • Daniel Jordan's avatar
      x86/mm: use max memory block size on bare metal · fe124c95
      Daniel Jordan authored
      Some of our servers spend significant time at kernel boot initializing
      memory block sysfs directories and then creating symlinks between them and
      the corresponding nodes.  The slowness happens because the machines get
      stuck with the smallest supported memory block size on x86 (128M), which
      results in 16,288 directories to cover the 2T of installed RAM.  The
      search for each memory block is noticeable even with commit 4fb6eabf
      ("drivers/base/memory.c: cache memory blocks in xarray to accelerate
      lookup").
      
      Commit 078eb6aa ("x86/mm/memory_hotplug: determine block size based on
      the end of boot memory") chooses the block size based on alignment with
      memory end.  That addresses hotplug failures in qemu guests, but for bare
      metal systems whose memory end isn't aligned to even the smallest size, it
      leaves them at 128M.
      
      Make kernels that aren't running on a hypervisor use the largest supported
      size (2G) to minimize overhead on big machines.  Kernel boot goes 7%
      faster on the aforementioned servers, shaving off half a second.
      
      [daniel.m.jordan@oracle.com: v3]
        Link: http://lkml.kernel.org/r/20200714205450.945834-1-daniel.m.jordan@oracle.com
      
      
      
      Signed-off-by: default avatarDaniel Jordan <daniel.m.jordan@oracle.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Acked-by: default avatarDavid Hildenbrand <david@redhat.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Sistare <steven.sistare@oracle.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20200609225451.3542648-1-daniel.m.jordan@oracle.com
      
      
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      fe124c95
  5. 11 Aug, 2020 4 commits
  6. 10 Aug, 2020 1 commit
    • Jon Doron's avatar
      x86/kvm/hyper-v: Synic default SCONTROL MSR needs to be enabled · 99b48ecc
      Jon Doron authored
      
      
      Based on an analysis of the HyperV firmwares (Gen1 and Gen2) it seems
      like the SCONTROL is not being set to the ENABLED state as like we have
      thought.
      
      Also from a test done by Vitaly Kuznetsov, running a nested HyperV it
      was concluded that the first access to the SCONTROL MSR with a read
      resulted with the value of 0x1, aka HV_SYNIC_CONTROL_ENABLE.
      
      It's important to note that this diverges from the value states in the
      HyperV TLFS of 0.
      
      Signed-off-by: default avatarJon Doron <arilou@gmail.com>
      Message-Id: <20200717125238.1103096-2-arilou@gmail.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      99b48ecc
  7. 09 Aug, 2020 2 commits
    • Paolo Bonzini's avatar
      x86: Expose SERIALIZE for supported cpuid · 43bd9ef4
      Paolo Bonzini authored
      
      
      The SERIALIZE instruction is supported by Tntel processors, like
      Sapphire Rapids.  SERIALIZE is a faster serializing instruction which
      does not modify registers, arithmetic flags or memory, will not cause VM
      exit. It's availability is indicated by CPUID.(EAX=7,ECX=0):ECX[bit 14].
      
      Expose it in KVM supported CPUID.  This way, KVM could pass this
      information to guests and they can make use of these features accordingly.
      
      Signed-off-by: default avatarCathy Zhang <cathy.zhang@intel.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      43bd9ef4
    • Sean Christopherson's avatar
      KVM: x86: Don't attempt to load PDPTRs when 64-bit mode is enabled · 05487215
      Sean Christopherson authored
      Don't attempt to load PDPTRs if EFER.LME=1, i.e. if 64-bit mode is
      enabled.  A recent change to reload the PDTPRs when CR0.CD or CR0.NW is
      toggled botched the EFER.LME handling and sends KVM down the PDTPR path
      when is_paging() is true, i.e. when the guest toggles CD/NW in 64-bit
      mode.
      
      Split the CR0 checks for 64-bit vs. 32-bit PAE into separate paths.  The
      64-bit path is specifically checking state when paging is toggled on,
      i.e. CR0.PG transititions from 0->1.  The PDPTR path now needs to run if
      the new CR0 state has paging enabled, irrespective of whether paging was
      already enabled.  Trying to shave a few cycles to make the PDPTR path an
      "else if" case is a mess.
      
      Fixes: d42e3fae
      
       ("kvm: x86: Read PDPTEs on CR0.CD and CR0.NW changes")
      Cc: Jim Mattson <jmattson@google.com>
      Cc: Oliver Upton <oupton@google.com>
      Cc: Peter Shier <pshier@google.com>
      Signed-off-by: default avatarSean Christopherson <sean.j.christopherson@intel.com>
      Reviewed-by: default avatarJim Mattson <jmattson@google.com>
      Reviewed-by: default avatarMaxim Levitsky <mlevitsk@redhat.com>
      Message-Id: <20200714015732.32426-1-sean.j.christopherson@intel.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      05487215
  8. 07 Aug, 2020 9 commits
    • Mike Rapoport's avatar
      mm/sparse: cleanup the code surrounding memory_present() · c89ab04f
      Mike Rapoport authored
      
      
      After removal of CONFIG_HAVE_MEMBLOCK_NODE_MAP we have two equivalent
      functions that call memory_present() for each region in memblock.memory:
      sparse_memory_present_with_active_regions() and membocks_present().
      
      Moreover, all architectures have a call to either of these functions
      preceding the call to sparse_init() and in the most cases they are called
      one after the other.
      
      Mark the regions from memblock.memory as present during sparce_init() by
      making sparse_init() call memblocks_present(), make memblocks_present()
      and memory_present() functions static and remove redundant
      sparse_memory_present_with_active_regions() function.
      
      Also remove no longer required HAVE_MEMORY_PRESENT configuration option.
      
      Signed-off-by: default avatarMike Rapoport <rppt@linux.ibm.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Link: http://lkml.kernel.org/r/20200712083130.22919-1-rppt@kernel.org
      
      
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      c89ab04f
    • Anshuman Khandual's avatar
      mm/sparsemem: enable vmem_altmap support in vmemmap_alloc_block_buf() · 56993b4e
      Anshuman Khandual authored
      
      
      There are many instances where vmemap allocation is often switched between
      regular memory and device memory just based on whether altmap is available
      or not.  vmemmap_alloc_block_buf() is used in various platforms to
      allocate vmemmap mappings.  Lets also enable it to handle altmap based
      device memory allocation along with existing regular memory allocations.
      This will help in avoiding the altmap based allocation switch in many
      places.  To summarize there are two different methods to call
      vmemmap_alloc_block_buf().
      
      vmemmap_alloc_block_buf(size, node, NULL)   /* Allocate from system RAM */
      vmemmap_alloc_block_buf(size, node, altmap) /* Allocate from altmap */
      
      This converts altmap_alloc_block_buf() into a static function, drops it's
      entry from the header and updates Documentation/vm/memory-model.rst.
      
      Suggested-by: Robin Murphy's avatarRobin Murphy <robin.murphy@arm.com>
      Signed-off-by: Anshuman Khandual's avatarAnshuman Khandual <anshuman.khandual@arm.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Tested-by: justin-he's avatarJia He <justin.he@arm.com>
      Reviewed-by: Catalin Marinas's avatarCatalin Marinas <catalin.marinas@arm.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Will Deacon <will@kernel.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Hsin-Yi Wang <hsinyi@chromium.org>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Mike Rapoport <rppt@linux.ibm.com>
      Cc: Palmer Dabbelt <palmer@dabbelt.com>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Steve Capper <steve.capper@arm.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Yu Zhao <yuzhao@google.com>
      Link: http://lkml.kernel.org/r/1594004178-8861-3-git-send-email-anshuman.khandual@arm.com
      
      
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      56993b4e
    • Anshuman Khandual's avatar
      mm/sparsemem: enable vmem_altmap support in vmemmap_populate_basepages() · 1d9cfee7
      Anshuman Khandual authored
      
      
      Patch series "arm64: Enable vmemmap mapping from device memory", v4.
      
      This series enables vmemmap backing memory allocation from device memory
      ranges on arm64.  But before that, it enables vmemmap_populate_basepages()
      and vmemmap_alloc_block_buf() to accommodate struct vmem_altmap based
      alocation requests.
      
      This patch (of 3):
      
      vmemmap_populate_basepages() is used across platforms to allocate backing
      memory for vmemmap mapping.  This is used as a standard default choice or
      as a fallback when intended huge pages allocation fails.  This just
      creates entire vmemmap mapping with base pages (PAGE_SIZE).
      
      On arm64 platforms, vmemmap_populate_basepages() is called instead of the
      platform specific vmemmap_populate() when ARM64_SWAPPER_USES_SECTION_MAPS
      is not enabled as in case for ARM64_16K_PAGES and ARM64_64K_PAGES configs.
      
      At present vmemmap_populate_basepages() does not support allocating from
      driver defined struct vmem_altmap while trying to create vmemmap mapping
      for a device memory range.  It prevents ARM64_16K_PAGES and
      ARM64_64K_PAGES configs on arm64 from supporting device memory with
      vmemap_altmap request.
      
      This enables vmem_altmap support in vmemmap_populate_basepages() unlocking
      device memory allocation for vmemap mapping on arm64 platforms with 16K or
      64K base page configs.
      
      Each architecture should evaluate and decide on subscribing device memory
      based base page allocation through vmemmap_populate_basepages().  Hence
      lets keep it disabled on all archs in order to preserve the existing
      semantics.  A subsequent patch enables it on arm64.
      
      Signed-off-by: Anshuman Khandual's avatarAnshuman Khandual <anshuman.khandual@arm.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Tested-by: justin-he's avatarJia He <justin.he@arm.com>
      Reviewed-by: default avatarDavid Hildenbrand <david@redhat.com>
      Acked-by: default avatarWill Deacon <will@kernel.org>
      Acked-by: Catalin Marinas's avatarCatalin Marinas <catalin.marinas@arm.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Palmer Dabbelt <palmer@dabbelt.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Mike Rapoport <rppt@linux.ibm.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Hsin-Yi Wang <hsinyi@chromium.org>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Robin Murphy <robin.murphy@arm.com>
      Cc: Steve Capper <steve.capper@arm.com>
      Cc: Yu Zhao <yuzhao@google.com>
      Link: http://lkml.kernel.org/r/1594004178-8861-1-git-send-email-anshuman.khandual@arm.com
      Link: http://lkml.kernel.org/r/1594004178-8861-2-git-send-email-anshuman.khandual@arm.com
      
      
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      1d9cfee7
    • Mike Rapoport's avatar
      asm-generic: pgalloc: provide generic pgd_free() · f9cb654c
      Mike Rapoport authored
      
      
      Most architectures define pgd_free() as a wrapper for free_page().
      
      Provide a generic version in asm-generic/pgalloc.h and enable its use for
      most architectures.
      
      Signed-off-by: default avatarMike Rapoport <rppt@linux.ibm.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Reviewed-by: default avatarPekka Enberg <penberg@kernel.org>
      Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>	[m68k]
      Cc: Abdul Haleem <abdhalee@linux.vnet.ibm.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: Joerg Roedel <jroedel@suse.de>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com>
      Cc: Stafford Horne <shorne@gmail.com>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Matthew Wilcox <willy@infradead.org>
      Link: http://lkml.kernel.org/r/20200627143453.31835-7-rppt@kernel.org
      
      
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      f9cb654c
    • Mike Rapoport's avatar
      asm-generic: pgalloc: provide generic pud_alloc_one() and pud_free_one() · d9e8b929
      Mike Rapoport authored
      
      
      Several architectures define pud_alloc_one() as a wrapper for
      __get_free_page() and pud_free() as a wrapper for free_page().
      
      Provide a generic implementation in asm-generic/pgalloc.h and use it where
      appropriate.
      
      Signed-off-by: default avatarMike Rapoport <rppt@linux.ibm.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Reviewed-by: default avatarPekka Enberg <penberg@kernel.org>
      Cc: Abdul Haleem <abdhalee@linux.vnet.ibm.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: Joerg Roedel <jroedel@suse.de>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com>
      Cc: Stafford Horne <shorne@gmail.com>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Matthew Wilcox <willy@infradead.org>
      Link: http://lkml.kernel.org/r/20200627143453.31835-6-rppt@kernel.org
      
      
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      d9e8b929
    • Mike Rapoport's avatar
      asm-generic: pgalloc: provide generic pmd_alloc_one() and pmd_free_one() · 1355c31e
      Mike Rapoport authored
      
      
      For most architectures that support >2 levels of page tables,
      pmd_alloc_one() is a wrapper for __get_free_pages(), sometimes with
      __GFP_ZERO and sometimes followed by memset(0) instead.
      
      More elaborate versions on arm64 and x86 account memory for the user page
      tables and call to pgtable_pmd_page_ctor() as the part of PMD page
      initialization.
      
      Move the arm64 version to include/asm-generic/pgalloc.h and use the
      generic version on several architectures.
      
      The pgtable_pmd_page_ctor() is a NOP when ARCH_ENABLE_SPLIT_PMD_PTLOCK is
      not enabled, so there is no functional change for most architectures
      except of the addition of __GFP_ACCOUNT for allocation of user page
      tables.
      
      The pmd_free() is a wrapper for free_page() in all the cases, so no
      functional change here.
      
      Signed-off-by: default avatarMike Rapoport <rppt@linux.ibm.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Reviewed-by: default avatarPekka Enberg <penberg@kernel.org>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Abdul Haleem <abdhalee@linux.vnet.ibm.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: Joerg Roedel <jroedel@suse.de>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com>
      Cc: Stafford Horne <shorne@gmail.com>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Link: http://lkml.kernel.org/r/20200627143453.31835-5-rppt@kernel.org
      
      
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      1355c31e
    • Mike Rapoport's avatar
      mm: remove unneeded includes of <asm/pgalloc.h> · ca15ca40
      Mike Rapoport authored
      
      
      Patch series "mm: cleanup usage of <asm/pgalloc.h>"
      
      Most architectures have very similar versions of pXd_alloc_one() and
      pXd_free_one() for intermediate levels of page table.  These patches add
      generic versions of these functions in <asm-generic/pgalloc.h> and enable
      use of the generic functions where appropriate.
      
      In addition, functions declared and defined in <asm/pgalloc.h> headers are
      used mostly by core mm and early mm initialization in arch and there is no
      actual reason to have the <asm/pgalloc.h> included all over the place.
      The first patch in this series removes unneeded includes of
      <asm/pgalloc.h>
      
      In the end it didn't work out as neatly as I hoped and moving
      pXd_alloc_track() definitions to <asm-generic/pgalloc.h> would require
      unnecessary changes to arches that have custom page table allocations, so
      I've decided to move lib/ioremap.c to mm/ and make pgalloc-track.h local
      to mm/.
      
      This patch (of 8):
      
      In most cases <asm/pgalloc.h> header is required only for allocations of
      page table memory.  Most of the .c files that include that header do not
      use symbols declared in <asm/pgalloc.h> and do not require that header.
      
      As for the other header files that used to include <asm/pgalloc.h>, it is
      possible to move that include into the .c file that actually uses symbols
      from <asm/pgalloc.h> and drop the include from the header file.
      
      The process was somewhat automated using
      
      	sed -i -E '/[<"]asm\/pgalloc\.h/d' \
                      $(grep -L -w -f /tmp/xx \
                              $(git grep -E -l '[<"]asm/pgalloc\.h'))
      
      where /tmp/xx contains all the symbols defined in
      arch/*/include/asm/pgalloc.h.
      
      [rppt@linux.ibm.com: fix powerpc warning]
      
      Signed-off-by: default avatarMike Rapoport <rppt@linux.ibm.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Reviewed-by: default avatarPekka Enberg <penberg@kernel.org>
      Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>	[m68k]
      Cc: Abdul Haleem <abdhalee@linux.vnet.ibm.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com>
      Cc: Stafford Horne <shorne@gmail.com>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Joerg Roedel <jroedel@suse.de>
      Cc: Matthew Wilcox <willy@infradead.org>
      Link: http://lkml.kernel.org/r/20200627143453.31835-1-rppt@kernel.org
      Link: http://lkml.kernel.org/r/20200627143453.31835-2-rppt@kernel.org
      
      
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      ca15ca40
    • Waiman Long's avatar
      mm, treewide: rename kzfree() to kfree_sensitive() · 453431a5
      Waiman Long authored
      
      
      As said by Linus:
      
        A symmetric naming is only helpful if it implies symmetries in use.
        Otherwise it's actively misleading.
      
        In "kzalloc()", the z is meaningful and an important part of what the
        caller wants.
      
        In "kzfree()", the z is actively detrimental, because maybe in the
        future we really _might_ want to use that "memfill(0xdeadbeef)" or
        something. The "zero" part of the interface isn't even _relevant_.
      
      The main reason that kzfree() exists is to clear sensitive information
      that should not be leaked to other future users of the same memory
      objects.
      
      Rename kzfree() to kfree_sensitive() to follow the example of the recently
      added kvfree_sensitive() and make the intention of the API more explicit.
      In addition, memzero_explicit() is used to clear the memory to make sure
      that it won't get optimized away by the compiler.
      
      The renaming is done by using the command sequence:
      
        git grep -w --name-only kzfree |\
        xargs sed -i 's/kzfree/kfree_sensitive/'
      
      followed by some editing of the kfree_sensitive() kerneldoc and adding
      a kzfree backward compatibility macro in slab.h.
      
      [akpm@linux-foundation.org: fs/crypto/inline_crypt.c needs linux/slab.h]
      [akpm@linux-foundation.org: fix fs/crypto/inline_crypt.c some more]
      
      Suggested-by: default avatarJoe Perches <joe@perches.com>
      Signed-off-by: default avatarWaiman Long <longman@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Acked-by: default avatarDavid Howells <dhowells@redhat.com>
      Acked-by: default avatarMichal Hocko <mhocko@suse.com>
      Acked-by: default avatarJohannes Weiner <hannes@cmpxchg.org>
      Cc: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
      Cc: James Morris <jmorris@namei.org>
      Cc: "Serge E. Hallyn" <serge@hallyn.com>
      Cc: Joe Perches <joe@perches.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Dan Carpenter <dan.carpenter@oracle.com>
      Cc: "Jason A . Donenfeld" <Jason@zx2c4.com>
      Link: http://lkml.kernel.org/r/20200616154311.12314-3-longman@redhat.com
      
      
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      453431a5
    • Joerg Roedel's avatar
      x86/mm/64: Do not dereference non-present PGD entries · 995909a4
      Joerg Roedel authored
      
      
      The code for preallocate_vmalloc_pages() was written under the
      assumption that the p4d_offset() and pud_offset() functions will perform
      present checks before dereferencing the parent entries.
      
      This assumption is wrong an leads to a bug in the code which causes the
      physical address found in the PGD be used as a page-table page, even if
      the PGD is not present.
      
      So the code flow currently is:
      
      	pgd = pgd_offset_k(addr);
      	p4d = p4d_offset(pgd, addr);
      	if (p4d_none(*p4d))
      		p4d = p4d_alloc(&init_mm, pgd, addr);
      
      This lacks a check for pgd_none() at least, the correct flow would be:
      
      	pgd = pgd_offset_k(addr);
      	if (pgd_none(*pgd))
      		p4d = p4d_alloc(&init_mm, pgd, addr);
      	else
      		p4d = p4d_offset(pgd, addr);
      
      But this is the same flow that the p4d_alloc() and the pud_alloc()
      functions use internally, so there is no need to duplicate them.
      
      Remove the p?d_none() checks from the function and just call into
      p4d_alloc() and pud_alloc() to correctly pre-allocate the PGD entries.
      
      Reported-and-tested-by: default avatarJason A. Donenfeld <Jason@zx2c4.com>
      Reviewed-by: default avatarMike Rapoport <rppt@linux.ibm.com>
      Fixes: 6eb82f99
      
       ("x86/mm: Pre-allocate P4D/PUD pages for vmalloc area")
      Signed-off-by: default avatarJoerg Roedel <jroedel@suse.de>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      995909a4
  9. 06 Aug, 2020 12 commits
  10. 05 Aug, 2020 1 commit