  1. 13 Sep, 2019 1 commit
    • powerpc/mm/radix: remove useless kernel messages · ec5b705c
      Qian Cai authored
      
      
      Booting a POWER9 PowerNV system generates a few messages, shown
      below, containing "____ptrval____", because pointers printed without
      a specifier extension (i.e. unadorned %p) are hashed to prevent
      leaking information about the kernel memory layout.
      
      radix-mmu: Initializing Radix MMU
      radix-mmu: Partition table (____ptrval____)
      radix-mmu: Mapped 0x0000000000000000-0x0000000040000000 with 1.00 GiB pages (exec)
      radix-mmu: Mapped 0x0000000040000000-0x0000002000000000 with 1.00 GiB pages
      radix-mmu: Mapped 0x0000200000000000-0x0000202000000000 with 1.00 GiB pages
      radix-mmu: Process table (____ptrval____) and radix root for kernel: (____ptrval____)
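
      For illustration, the kind of printk that produces this output looks
      roughly like the following (a sketch, not the exact lines this patch
      removes). Early in boot, before the pointer-hashing key is set up,
      an unadorned %p prints the "(____ptrval____)" placeholder rather
      than a hashed value:

        /* unadorned %p: the pointer value is hashed (or, early in boot,
         * replaced by the "(____ptrval____)" placeholder) */
        pr_info("radix-mmu: Partition table %p\n", partition_tb);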
      Signed-off-by: Qian Cai <cai@lca.pw>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/1566570120-16529-1-git-send-email-cai@lca.pw
  2. 19 Jun, 2019 2 commits
    • powerpc/64s/radix: Enable HAVE_ARCH_HUGE_VMAP · d909f910
      Nicholas Piggin authored
      
      
      This sets the HAVE_ARCH_HUGE_VMAP option, and defines the required
      page table functions.
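
      The arch hooks behind HAVE_ARCH_HUGE_VMAP have roughly this shape
      (a simplified sketch of the generic interface, not the radix
      implementation itself):

        /* install/remove a 2MB mapping at the PMD level */
        int pmd_set_huge(pmd_t *pmd, phys_addr_t addr, pgprot_t prot);
        int pmd_clear_huge(pmd_t *pmd);

        /* install/remove a 1GB mapping at the PUD level */
        int pud_set_huge(pud_t *pud, phys_addr_t addr, pgprot_t prot);
        int pud_clear_huge(pud_t *pud);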
      
      This enables huge (2MB and 1GB) ioremap mappings. I don't have a
      benchmark for this change on its own, but huge vmap will be used by
      a later core kernel change to enable huge vmalloc memory mappings.
      That later change improves cached `git diff` performance by about 5%
      on a 2-node POWER9 with a 32MB dentry cache hash.
      
        Profiling git diff dTLB misses with a vanilla kernel:
      
        81.75%  git      [kernel.vmlinux]    [k] __d_lookup_rcu
         7.21%  git      [kernel.vmlinux]    [k] strncpy_from_user
         1.77%  git      [kernel.vmlinux]    [k] find_get_entry
         1.59%  git      [kernel.vmlinux]    [k] kmem_cache_free
      
                  40,168      dTLB-miss
             0.100342754 seconds time elapsed
      
        With powerpc huge vmalloc:
      
                   2,987      dTLB-miss
             0.095933138 seconds time elapsed
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/64s/radix: ioremap use ioremap_page_range · d38153f9
      Nicholas Piggin authored
      
      
      Radix can use ioremap_page_range() for ioremap once the slab
      allocator is available. This makes it possible to enable huge
      ioremap mapping support.
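
      A minimal sketch of the call shape, assuming a vmap area has
      already been reserved (the variable names here are illustrative):

        unsigned long va = (unsigned long)area->addr;

        if (ioremap_page_range(va, va + size, pa, prot)) {
                unmap_kernel_range(va, size);  /* tear down partial mapping */
                return NULL;
        }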
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  3. 12 Mar, 2019 1 commit
    • treewide: add checks for the return value of memblock_alloc*() · 8a7f97b9
      Mike Rapoport authored
      Add a check for the return value of the memblock_alloc*() functions
      and call panic() in case of error. The panic message repeats the one
      used by the panicking memblock allocators, adjusted to include only
      the relevant parameters.

      The replacement was mostly automated with semantic patches like the
      one below, with manual massaging of the format strings.
      
        @@
        expression ptr, size, align;
        @@
        ptr = memblock_alloc(size, align);
        + if (!ptr)
        + 	panic("%s: Failed to allocate %lu bytes align=0x%lx\n", __func__, size, align);
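
      After the conversion, a typical call site therefore ends up looking
      like this (an illustrative example of the pattern above, not a
      specific site from the patch):

        ptr = memblock_alloc(size, align);
        if (!ptr)
                panic("%s: Failed to allocate %lu bytes align=0x%lx\n",
                      __func__, size, align);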
      
      [anders.roxell@linaro.org: use '%pa' with 'phys_addr_t' type]
        Link: http://lkml.kernel.org/r/20190131161046.21886-1-anders.roxell@linaro.org
      [rppt@linux.ibm.com: fix format strings for panics after memblock_alloc]
        Link: http://lkml.kernel.org/r/1548950940-15145-1-git-send-email-rppt@linux.ibm.com
      [rppt@linux.ibm.com: don't panic if the allocation in sparse_buffer_init fails]
        Link: http://lkml.kernel.org/r/20190131074018.GD28876@rapoport-lnx
      [akpm@linux-foundation.org: fix xtensa printk warning]
      Link: http://lkml.kernel.org/r/1548057848-15136-20-git-send-email-rppt@linux.ibm.com
      
      Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
      Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
      Reviewed-by: Guo Ren <ren_guo@c-sky.com>		[c-sky]
      Acked-by: Paul Burton <paul.burton@mips.com>		[MIPS]
      Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>	[s390]
      Reviewed-by: Juergen Gross <jgross@suse.com>		[Xen]
      Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org>	[m68k]
      Acked-by: Max Filippov <jcmvbkbc@gmail.com>		[xtensa]
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christophe Leroy <christophe.leroy@c-s.fr>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Dennis Zhou <dennis@kernel.org>
      Cc: Greentime Hu <green.hu@gmail.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Guan Xuetao <gxt@pku.edu.cn>
      Cc: Guo Ren <guoren@kernel.org>
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Petr Mladek <pmladek@suse.com>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Rob Herring <robh+dt@kernel.org>
      Cc: Rob Herring <robh@kernel.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Stafford Horne <shorne@gmail.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  4. 08 Mar, 2019 1 commit
    • powerpc: prefer memblock APIs returning virtual address · f806714f
      Mike Rapoport authored
      Patch series "memblock: simplify several early memory allocation", v4.
      
      These patches simplify some of the early memory allocations by replacing
      usage of older memblock APIs with newer and shinier ones.
      
      Quite a few places in the arch/ code allocated memory using a memblock
      API that returns a physical address of the allocated area, then
      converted this physical address to a virtual one and then used memset(0)
      to clear the allocated range.
      
      More recent memblock APIs do all three steps in one call, and their
      usage simplifies the code.
      
      It's important to note that regardless of the API used, the core
      allocation is nearly identical for any set of memblock allocators:
      first it tries to find free memory with all the constraints
      specified by the caller, and then it falls back to an allocation
      with some or all of the constraints disabled.
      
      The first three patches perform the conversion of call sites that have
      exact requirements for the node and the possible memory range.
      
      The fourth patch is a bit of a one-off, as it simplifies openrisc's
      implementation of pte_alloc_one_kernel() and not only its memblock
      usage.
      
      The fifth patch takes care of simpler cases when the allocation can be
      satisfied with a simple call to memblock_alloc().
      
      The sixth patch removes one-liner wrappers for memblock_alloc on arm and
      unicore32, as suggested by Christoph.
      
      This patch (of 6):
      
      There are several places that allocate memory using memblock APIs
      that return a physical address, convert the returned address to a
      virtual address, and frequently also memset(0) the allocated range.
      
      Update these places to use memblock allocators already returning a
      virtual address.  Use memblock functions that clear the allocated memory
      instead of calling memset(0) where appropriate.
      
      The calls to memblock_alloc_base() that were not followed by memset(0)
      are replaced with memblock_alloc_try_nid_raw().  Since the latter does
      not panic() when the allocation fails, the appropriate panic() calls are
      added to the call sites.
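
      As a hedged before/after sketch of this conversion (an illustrative
      call site, not one taken from the patch):

        /* before: physical-address API, manual conversion and clearing */
        pa = memblock_alloc_base(size, align, MEMBLOCK_ALLOC_ACCESSIBLE);
        ptr = __va(pa);
        memset(ptr, 0, size);

        /* after: one call that returns a virtual address and zeroes the
         * memory; allocation failure is handled explicitly */
        ptr = memblock_alloc(size, align);
        if (!ptr)
                panic("%s: Failed to allocate %lu bytes\n", __func__, size);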
      
      Link: http://lkml.kernel.org/r/1546248566-14910-2-git-send-email-rppt@linux.ibm.com
      
      Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Guan Xuetao <gxt@pku.edu.cn>
      Cc: Greentime Hu <green.hu@gmail.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Jonas Bonn <jonas@southpole.se>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi>
      Cc: Stafford Horne <shorne@gmail.com>
      Cc: Vincent Chen <deanbo422@gmail.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Michal Simek <michal.simek@xilinx.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  5. 20 Oct, 2018 6 commits
    • powerpc/mm/radix: Display if mappings are exec or not · afb6d064
      Michael Ellerman authored
      
      
      At boot we print the ranges we've mapped for the linear mapping and
      what page size we've used. Also track whether the range is mapped
      executable or not and display that as well.
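
      The resulting print has roughly this shape (a sketch; `exec` stands
      for whatever tracks whether the range was mapped executable):

        pr_info("Mapped 0x%016lx-0x%016lx with %s pages%s\n",
                start, end, size_str, exec ? " (exec)" : "");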
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/mm/radix: Simplify split mapping logic · 232aa407
      Michael Ellerman authored
      
      
      If we look closely at the logic in create_physical_mapping(), when
      we're doing STRICT_KERNEL_RWX, we do the following steps:
        - determine the gap from where we are to the end of the range
        - choose an appropriate mapping_size based on the gap
        - check if that mapping_size would overlap the __init_begin
          boundary, and if not choose an appropriate mapping_size
      
      We can simplify the logic by taking the __init_begin boundary into
      account when we calculate the initial gap.
      
      So add a next_boundary() function which tells us what the next
      boundary is, either the __init_begin boundary or end. In future we can
      add more boundaries.
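
      A sketch of what such a helper can look like (simplified to handle
      only the __init_begin boundary):

        static unsigned long next_boundary(unsigned long addr, unsigned long end)
        {
        #ifdef CONFIG_STRICT_KERNEL_RWX
                /* stop a mapping at the text/data boundary */
                if (addr < __pa_symbol(__init_begin))
                        return __pa_symbol(__init_begin);
        #endif
                return end;
        }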
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/mm/radix: Remove the retry in the split mapping logic · 57306c66
      Michael Ellerman authored
      
      
      When we have CONFIG_STRICT_KERNEL_RWX enabled, we want to split the
      linear mapping at the text/data boundary so we can map the kernel
      text read only.
      
      The current logic uses a goto inside the for loop, which works, but is
      hard to reason about.
      
      When we hit the goto retry case we set max_mapping_size to PMD_SIZE
      and go back to the start.
      
      Setting max_mapping_size means we skip the PUD case and go to the PMD
      case.
      
      We know we will pass the alignment and gap checks because the only
      reason we are there is we hit the goto retry, and that is guarded by
      mapping_size == PUD_SIZE, which means addr is PUD aligned and gap is
      greater or equal to PUD_SIZE.
      
      So the only part of the check that can fail is the mmu_psize_defs
      check for the 2M page size.
      
      If we just duplicate that check we can avoid the goto, and we get the
      same result.
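
      Sketched, the retry collapses into an explicit fallback from 1G to
      2M pages (the shape of the result, not the exact diff):

        if (gap >= PUD_SIZE && IS_ALIGNED(addr, PUD_SIZE) &&
            mmu_psize_defs[MMU_PAGE_1G].shift)
                mapping_size = PUD_SIZE;
        else if (gap >= PMD_SIZE && IS_ALIGNED(addr, PMD_SIZE) &&
                 mmu_psize_defs[MMU_PAGE_2M].shift)
                mapping_size = PMD_SIZE;
        else
                mapping_size = PAGE_SIZE;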
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/mm/radix: Fix small page at boundary when splitting · 81d1b54d
      Michael Ellerman authored
      
      
      When we have CONFIG_STRICT_KERNEL_RWX enabled, we want to split the
      linear mapping at the text/data boundary so we can map the kernel
      text read only.
      
      Currently we always use a small page at the text/data boundary, even
      when that's not necessary:
      
        Mapped 0x0000000000000000-0x0000000000e00000 with 2.00 MiB pages
        Mapped 0x0000000000e00000-0x0000000001000000 with 64.0 KiB pages
        Mapped 0x0000000001000000-0x0000000040000000 with 2.00 MiB pages
      
      This is because the check that the mapping crosses the __init_begin
      boundary is too strict, it also returns true when we map exactly up to
      the boundary.
      
      So fix it to check that the mapping would actually map past
      __init_begin, and with that we see:
      
        Mapped 0x0000000000000000-0x0000000040000000 with 2.00 MiB pages
        Mapped 0x0000000040000000-0x0000000100000000 with 1.00 GiB pages
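
      The change amounts to tightening one comparison (a sketch, using
      the physical address of the boundary symbol):

        /* before: also fires when the mapping ends exactly on the boundary */
        split = addr < __pa_symbol(__init_begin) &&
                addr + mapping_size >= __pa_symbol(__init_begin);

        /* after: fires only when the mapping would extend past it */
        split = addr < __pa_symbol(__init_begin) &&
                addr + mapping_size > __pa_symbol(__init_begin);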
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/mm/radix: Fix overuse of small pages in splitting logic · 3b5657ed
      Michael Ellerman authored
      
      
      When we have CONFIG_STRICT_KERNEL_RWX enabled, we want to split the
      linear mapping at the text/data boundary so we can map the kernel text
      read only.
      
      But the current logic uses small pages for the entire text section,
      regardless of whether a larger page size would fit. eg. with the
      boundary at 16M we could use 2M pages, but instead we use 64K pages up
      to the 16M boundary:
      
        Mapped 0x0000000000000000-0x0000000001000000 with 64.0 KiB pages
        Mapped 0x0000000001000000-0x0000000040000000 with 2.00 MiB pages
        Mapped 0x0000000040000000-0x0000000100000000 with 1.00 GiB pages
      
      This is because the test is checking if addr is < __init_begin
      and addr + mapping_size is >= _stext. But that is true for all pages
      between _stext and __init_begin.
      
      Instead what we want to check is if we are crossing the text/data
      boundary, which is at __init_begin. With that fixed we see:
      
        Mapped 0x0000000000000000-0x0000000000e00000 with 2.00 MiB pages
        Mapped 0x0000000000e00000-0x0000000001000000 with 64.0 KiB pages
        Mapped 0x0000000001000000-0x0000000040000000 with 2.00 MiB pages
        Mapped 0x0000000040000000-0x0000000100000000 with 1.00 GiB pages
      
      ie. we're correctly using 2MB pages below __init_begin, but we still
      drop down to 64K pages unnecessarily at the boundary.
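
      Sketched as code, the test changes from matching every page in the
      text section to matching only a crossing of the boundary
      (illustrative; the remaining ">=" boundary case is what the
      81d1b54d fix above tightens to ">"):

        /* before: true for every candidate page in the text section */
        split = addr < __pa_symbol(__init_begin) &&
                addr + mapping_size >= __pa_symbol(_stext);

        /* after: only a mapping that reaches __init_begin forces a split */
        split = addr < __pa_symbol(__init_begin) &&
                addr + mapping_size >= __pa_symbol(__init_begin);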
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/mm/radix: Fix off-by-one in split mapping logic · 5c6499b7
      Michael Ellerman authored
      
      
      When we have CONFIG_STRICT_KERNEL_RWX enabled, we try to split the
      kernel linear (1:1) mapping so that the kernel text is in a separate
      page to kernel data, so we can mark the former read-only.
      
      We could achieve that just by always using 64K pages for the linear
      mapping, but we try to be smarter. Instead we use huge pages when
      possible, and only switch to smaller pages when necessary.
      
      However we have an off-by-one bug in that logic, which causes us to
      calculate the wrong boundary between text and data.
      
      For example with the end of the kernel text at 16M we see:
      
        radix-mmu: Mapped 0x0000000000000000-0x0000000001200000 with 64.0 KiB pages
        radix-mmu: Mapped 0x0000000001200000-0x0000000040000000 with 2.00 MiB pages
        radix-mmu: Mapped 0x0000000040000000-0x0000000100000000 with 1.00 GiB pages
      
      ie. we mapped from 0 to 18M with 64K pages, even though the boundary
      between text and data is at 16M.
      
      With the fix we see we're correctly hitting the 16M boundary:
      
        radix-mmu: Mapped 0x0000000000000000-0x0000000001000000 with 64.0 KiB pages
        radix-mmu: Mapped 0x0000000001000000-0x0000000040000000 with 2.00 MiB pages
        radix-mmu: Mapped 0x0000000040000000-0x0000000100000000 with 1.00 GiB pages
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  6. 13 Aug, 2018 1 commit
    • powerpc/mm/book3s/radix: Add mapping statistics · a2dc009a
      Aneesh Kumar K.V authored
      Add statistics that show how memory is mapped within the kernel linear mapping.
      This is similar to commit 37cd944c ("s390/pgtable: add mapping
      statistics").
      
      We don't do this for Hash translation mode. Hash uses a single size
      (mmu_linear_psize) to map the kernel linear mapping, and we print
      the linear psize during boot as below.
      
      "Page orders: linear mapping = 24, virtual = 16, io = 16, vmemmap = 24"
      
      A sample output looks like:
      
      DirectMap4k:           0 kB
      DirectMap64k:       18432 kB
      DirectMap2M:     1030144 kB
      DirectMap1G:    11534336 kB
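
      A minimal sketch of how such counters can be kept (assuming a
      per-page-size atomic counter array; the names are illustrative):

        static atomic_long_t direct_pages_count[MMU_PAGE_COUNT];

        static void update_page_count(int psize, long count)
        {
                atomic_long_add(count, &direct_pages_count[psize]);
        }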
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  7. 03 Jun, 2018 4 commits
    • powerpc/64s/radix: avoid ptesync after set_pte and ptep_set_access_flags · f1cb8f9b
      Nicholas Piggin authored
      
      
      The ISA suggests ptesync after setting a pte, to prevent a table
      walk initiated by a subsequent access from missing that store and
      causing a spurious fault. This is an architectural allowance that
      permits an implementation's page table walker to be incoherent with
      the store queue.
      
      However there is no correctness problem in taking a spurious fault in
      userspace -- the kernel copes with these at any time, so the updated
      pte will be found eventually. Spurious kernel faults on vmap memory
      must be avoided, so a ptesync is put into flush_cache_vmap.
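
      On radix that can be as simple as the following (a sketch of the
      idea, not necessarily the final macro):

        /* order the vmap pte stores before any access to the mapping */
        #define flush_cache_vmap(start, end) asm volatile("ptesync" ::: "memory")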
      
      On POWER9 so far I have not found a measurable window where this can
      result in more minor faults, so as an optimisation, remove the costly
      ptesync from pte updates. If an implementation benefits from ptesync,
      it would be better to add it back in update_mmu_cache, so it's not
      done for things like fork(2).
      
      fork --fork --exec benchmark improved 5.2% (12400->13100).
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/64s/radix: do not flush TLB when relaxing access · e5f7cb58
      Nicholas Piggin authored
      
      
      Radix flushes the TLB when updating ptes to increase permissiveness
      of protection (increase access authority). Book3S does not require
      TLB flushing in this case, and it is not done on hash. This patch
      avoids the flush for radix.
      
      From Power ISA v3.0B, p.1090:
      
          Setting a Reference or Change Bit or Upgrading Access Authority
          (PTE Subject to Atomic Hardware Updates)
      
          If the only change being made to a valid PTE that is subject to
          atomic hardware updates is to set the Reference or Change bit to 1
          or to add access authorities, a simpler sequence suffices because
          the translation hardware will refetch the PTE if an access is
          attempted for which the only problems were reference and/or change
          bits needing to be set or insufficient access authority.
      
      The nest MMU on POWER9 does not re-fetch the PTE after such an access
      attempt before faulting, so address spaces with a coprocessor
      attached will continue to flush in these cases.
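
      A sketch of the resulting condition when relaxing access (assuming
      the mm tracks attached coprocessors in a counter, as book3s64
      does):

        /* the core MMU refetches the PTE on its own; only the nest MMU
         * needs an explicit invalidation */
        if (atomic_read(&mm->context.copros) > 0)
                radix__flush_tlb_page_psize(mm, address, psize);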
      
      This reduces tlbies for a kernel compile workload from 1.28M to
      0.95M, and tlbiels from 20.17M to 19.68M.
      
      fork --fork --exec benchmark improved 2.77% (12000->12300).
      Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/mm/radix: Change pte relax sequence to handle nest MMU hang · bd5050e3
      Aneesh Kumar K.V authored
      
      
      When relaxing access (read -> read_write update), the pte needs to
      be marked invalid to handle a nest MMU bug. We also need to do a tlb
      flush after the pte is marked invalid, before updating the pte with
      the new access bits.

      We also move the tlb flush into the platform specific
      __ptep_set_access_flags. This will help us get rid of unnecessary
      tlb flushes on BOOK3S 64 later; we don't do that in this patch. This
      also helps in avoiding multiple tlbies with a coprocessor attached.
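
      The sequence described above, sketched (illustrative pseudo-steps;
      the helper names are not the exact ones used):

        /* 1. mark the pte invalid so the nest MMU stops caching it */
        pte_update(mm, addr, ptep, _PAGE_PRESENT, _PAGE_INVALID, 0);

        /* 2. flush the now-stale translation */
        flush_tlb_page(vma, addr);

        /* 3. install the pte with the relaxed access bits */
        pte_update(mm, addr, ptep, _PAGE_INVALID, pte_val(entry), 0);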
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/mm: Change function prototype · e4c1112c
      Aneesh Kumar K.V authored
      
      
      In a later patch, we use the vma and psize to do the tlb flush. Do
      the prototype update in a separate patch to make the review easy.
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>