1. 16 Feb, 2017 1 commit
    • Paul Mackerras's avatar
      powerpc/64: Disable use of radix under a hypervisor · 3f91a89d
      Paul Mackerras authored
      Currently, if the kernel is running on a POWER9 processor under a
      hypervisor, it may try to use the radix MMU even though it doesn't have
      the necessary code to do so (it doesn't negotiate use of radix, and it
      doesn't do the H_REGISTER_PROC_TBL hcall).  If the hypervisor supports
      both radix and HPT, then it will set up the guest to use HPT (since the
      guest doesn't request radix in the CAS call), but if the radix feature
      bit is set in the ibm,pa-features property (which is valid, since
      ibm,pa-features is defined to represent the capabilities of the
      processor) the guest will try to use radix, resulting in a crash when
      it turns the MMU on.
      This makes the minimal fix for the current code, which is to disable
      radix unless we are running in hypervisor mode.
      Fixes: 2bfd65e4
       ("powerpc/mm/radix: Add radix callbacks for early init routines")
      Cc: stable@vger.kernel.org # v4.7+
      Signed-off-by: default avatarPaul Mackerras <paulus@ozlabs.org>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
  2. 24 Dec, 2016 1 commit
  3. 10 Dec, 2016 1 commit
    • Christophe Leroy's avatar
      powerpc: port 64 bits pgtable_cache to 32 bits · 9b081e10
      Christophe Leroy authored
      Today powerpc64 uses a set of pgtable_caches while powerpc32 uses
      standard pages when using 4k pages and a single pgtable_cache
      if using other size pages.
      In preparation of implementing huge pages on the 8xx, this patch
      replaces the specific powerpc32 handling by the 64 bits approach.
      This is done by:
      * moving 64 bits pgtable_cache_add() and pgtable_cache_init()
      in a new file called init-common.c
      * modifying pgtable_cache_init() to also handle the case
      without PMD
      * removing the 32 bits version of pgtable_cache_add() and
      * copying related header contents from 64 bits into both the
      book3s/32 and nohash/32 header files
      On the 8xx, the following cache sizes will be used:
      * 4k pages mode:
      - PGT_CACHE(10) for PGD
      - PGT_CACHE(3) for 512k hugepage tables
      * 16k pages mode:
      - PGT_CACHE(6) for PGD
      - PGT_CACHE(7) for 512k hugepage tables
      - PGT_CACHE(3) for 8M hugepage tables
      Signed-off-by: default avatarChristophe Leroy <christophe.leroy@c-s.fr>
      Reviewed-by: default avatarAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: default avatarScott Wood <oss@buserror.net>
  4. 01 Aug, 2016 6 commits
  5. 01 May, 2016 3 commits
  6. 03 Mar, 2016 1 commit
  7. 01 Mar, 2016 2 commits
    • David Gibson's avatar
      powerpc/mm: Clean up memory hotplug failure paths · 1dace6c6
      David Gibson authored
      This makes a number of cleanups to handling of mapping failures during
      memory hotplug on Power:
      For errors creating the linear mapping for the hot-added region:
        * This is now reported with EFAULT which is more appropriate than the
          previous EINVAL (the failure is unlikely to be related to the
          function's parameters)
        * An error in this path now prints a warning message, rather than just
          silently failing to add the extra memory.
        * Previously a failure here could result in the region being partially
          mapped.  We now clean up any partial mapping before failing.
      For errors creating the vmemmap for the hot-added region:
         * This is now reported with EFAULT instead of causing a BUG() - this
           could happen for external reason (e.g. full hash table) so it's better
           to handle this non-fatally
         * An error message is also printed, so the failure won't be silent
         * As above a failure could cause a partially mapped region, we now
           clean this up. [mpe: move htab_remove_mapping() out of #ifdef
           CONFIG_MEMORY_HOTPLUG to enable this]
      Signed-off-by: default avatarDavid Gibson <david@gibson.dropbear.id.au>
      Reviewed-by: default avatarPaul Mackerras <paulus@samba.org>
      Reviewed-by: default avatarAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
    • David Gibson's avatar
      powerpc/mm: Handle removing maybe-present bolted HPTEs · 27828f98
      David Gibson authored
      At the moment the hpte_removebolted callback in ppc_md returns void and
      will BUG_ON() if the hpte it's asked to remove doesn't exist in the first
      place.  This is awkward for the case of cleaning up a mapping which was
      partially made before failing.
      So, we add a return value to hpte_removebolted, and have it return ENOENT
      in the case that the HPTE to remove didn't exist in the first place.
      In the (sole) caller, we propagate errors in hpte_removebolted to its
      caller to handle.  However, we handle ENOENT specially, continuing to
      complete the unmapping over the specified range before returning the error
      to the caller.
      This means that htab_remove_mapping() will work sanely on a partially
      present mapping, removing any HPTEs which are present, while also returning
      ENOENT to its caller in case it's important there.
      There are two callers of htab_remove_mapping():
         - In remove_section_mapping() we already WARN_ON() any error return,
           which is reasonable - in this case the mapping should be fully
         - In vmemmap_remove_mapping() we BUG_ON() any error.  We change that to
           just a WARN_ON() in the case of ENOENT, since failing to remove a
           mapping that wasn't there in the first place probably shouldn't be
      Signed-off-by: default avatarDavid Gibson <david@gibson.dropbear.id.au>
      Reviewed-by: default avatarAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
  8. 14 Dec, 2015 1 commit
  9. 26 Mar, 2015 1 commit
    • Yanjiang Jin's avatar
      powerpc/mm: Free string after creating kmem cache · e77553cb
      Yanjiang Jin authored
      kmem_cache_create()->kmem_cache_create_memcg()->kstrdup() allocates new
      space and copys name's content, so it is safe to free name memory after
      calling kmem_cache_create(). Else kmemleak will report the below
      unreferenced object 0xc0000000f9002160 (size 16):
        comm "swapper/0", pid 0, jiffies 4294892296 (age 1386.640s)
        hex dump (first 16 bytes):
          70 67 74 61 62 6c 65 2d 32 5e 39 00 de ad be ef  pgtable-2^9.....
          [<c0000000004e03ec>] .kvasprintf+0x5c/0xa0
          [<c0000000004e045c>] .kasprintf+0x2c/0x50
          [<c00000000002e36c>] .pgtable_cache_add+0xac/0x100
          [<c00000000002e3e4>] .pgtable_cache_init+0x24/0x80
          [<c000000000c6c67c>] .start_kernel+0x228/0x4c8
          [<c000000000000594>] .start_here_common+0x24/0x90
      Signed-off-by: default avatarYanjiang Jin <yanjiang.jin@windriver.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
  10. 09 Nov, 2014 1 commit
  11. 25 Sep, 2014 1 commit
  12. 05 Aug, 2014 4 commits
  13. 11 Oct, 2013 1 commit
    • Alexey Kardashevskiy's avatar
      powerpc: Prepare to support kernel handling of IOMMU map/unmap · 8e0861fa
      Alexey Kardashevskiy authored
      The current VFIO-on-POWER implementation supports only user mode
      driven mapping, i.e. QEMU is sending requests to map/unmap pages.
      However this approach is really slow, so we want to move that to KVM.
      Since H_PUT_TCE can be extremely performance sensitive (especially with
      network adapters where each packet needs to be mapped/unmapped) we chose
      to implement that as a "fast" hypercall directly in "real
      mode" (processor still in the guest context but MMU off).
      To be able to do that, we need to provide some facilities to
      access the struct page count within that real mode environment as things
      like the sparsemem vmemmap mappings aren't accessible.
      This adds an API function realmode_pfn_to_page() to get page struct when
      MMU is off.
      This adds to MM a new function put_page_unless_one() which drops a page
      if counter is bigger than 1. It is going to be used when MMU is off
      (for example, real mode on PPC64) and we want to make sure that page
      release will not happen in real mode as it may crash the kernel in
      a horrible way.
      Cc: linux-mm@kvack.org
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: default avatarPaul Mackerras <paulus@samba.org>
      Signed-off-by: default avatarPaul Mackerras <paulus@samba.org>
      Signed-off-by: default avatarAlexey Kardashevskiy <aik@ozlabs.ru>
      Signed-off-by: default avatarBenjamin Herrenschmidt <benh@kernel.crashing.org>
  14. 03 Oct, 2013 1 commit
    • Nathan Fontenot's avatar
      powerpc: Fix memory hotplug with sparse vmemmap · f7e3334a
      Nathan Fontenot authored
      Previous commit 46723bfa
      ... introduced a new config option
      HAVE_BOOTMEM_INFO_NODE that ended up breaking memory hot-remove for ppc
      when sparse vmemmap is not defined.
      This patch defines HAVE_BOOTMEM_INFO_NODE for ppc and adds the call to
      register_page_bootmem_info_node. Without this we get a BUG_ON for memory
      hot remove in put_page_bootmem().
      This also adds a stub for register_page_bootmem_memmap to allow ppc to build
      with sparse vmemmap defined. Leaving this as a stub is fine since the same
      vmemmap addresses are also handled in vmemmap_populate and as such are
      properly mapped.
      Signed-off-by: default avatarNathan Fontenot <nfont@linux.vnet.ibm.com>
      Signed-off-by: default avatarBenjamin Herrenschmidt <benh@kernel.crashing.org>
      CC: <stable@vger.kernel.org> [v3.9+]
  15. 21 Jun, 2013 1 commit
  16. 14 May, 2013 1 commit
  17. 30 Apr, 2013 1 commit
  18. 29 Apr, 2013 1 commit
    • Johannes Weiner's avatar
      sparse-vmemmap: specify vmemmap population range in bytes · 0aad818b
      Johannes Weiner authored
      The sparse code, when asking the architecture to populate the vmemmap,
      specifies the section range as a starting page and a number of pages.
      This is an awkward interface, because none of the arch-specific code
      actually thinks of the range in terms of 'struct page' units and always
      translates it to bytes first.
      In addition, later patches mix huge page and regular page backing for
      the vmemmap.  For this, they need to call vmemmap_populate_basepages()
      on sub-section ranges with PAGE_SIZE and PMD_SIZE in mind.  But these
      are not necessarily multiples of the 'struct page' size and so this unit
      is too coarse.
      Just translate the section range into bytes once in the generic sparse
      code, then pass byte ranges down the stack.
      Signed-off-by: default avatarJohannes Weiner <hannes@cmpxchg.org>
      Cc: Ben Hutchings <ben@decadent.org.uk>
      Cc: Bernhard Schmidt <Bernhard.Schmidt@lrz.de>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Russell King <rmk@arm.linux.org.uk>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: "Luck, Tony" <tony.luck@intel.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Acked-by: default avatarDavid S. Miller <davem@davemloft.net>
      Tested-by: default avatarDavid S. Miller <davem@davemloft.net>
      Cc: Wu Fengguang <fengguang.wu@intel.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
  19. 24 Feb, 2013 2 commits
    • Tang Chen's avatar
      memory-hotplug: remove memmap of sparse-vmemmap · 0197518c
      Tang Chen authored
      Introduce a new API vmemmap_free() to free and remove vmemmap
      pagetables.  Since pagetable implements are different, each architecture
      has to provide its own version of vmemmap_free(), just like
      Note: vmemmap_free() is not implemented for ia64, ppc, s390, and sparc.
      [mhocko@suse.cz: fix implicit declaration of remove_pagetable]
      Signed-off-by: default avatarYasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Signed-off-by: default avatarJianguo Wu <wujianguo@huawei.com>
      Signed-off-by: default avatarWen Congyang <wency@cn.fujitsu.com>
      Signed-off-by: default avatarTang Chen <tangchen@cn.fujitsu.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Jiang Liu <jiang.liu@huawei.com>
      Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      Cc: Wu Jianguo <wujianguo@huawei.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Signed-off-by: default avatarMichal Hocko <mhocko@suse.cz>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
    • Yasuaki Ishimatsu's avatar
      memory-hotplug: implement register_page_bootmem_info_section of sparse-vmemmap · 46723bfa
      Yasuaki Ishimatsu authored
      For removing memmap region of sparse-vmemmap which is allocated bootmem,
      memmap region of sparse-vmemmap needs to be registered by
      get_page_bootmem().  So the patch searches pages of virtual mapping and
      registers the pages by get_page_bootmem().
      NOTE: register_page_bootmem_memmap() is not implemented for ia64,
            ppc, s390, and sparc.  So introduce CONFIG_HAVE_BOOTMEM_INFO_NODE
            and revert register_page_bootmem_info_node() when platform doesn't
            support it.
            It's implemented by adding a new Kconfig option named
            CONFIG_HAVE_BOOTMEM_INFO_NODE, which will be automatically selected
            by memory-hotplug feature fully supported archs(currently only on
            Since we have 2 config options called MEMORY_HOTPLUG and
            MEMORY_HOTREMOVE used for memory hot-add and hot-remove separately,
            and codes in function register_page_bootmem_info_node() are only
            used for collecting infomation for hot-remove, so reside it under
            Besides page_isolation.c selected by MEMORY_ISOLATION under
            MEMORY_HOTPLUG is also such case, move it too.
      [mhocko@suse.cz: put register_page_bootmem_memmap inside CONFIG_MEMORY_HOTPLUG_SPARSE]
      [linfeng@cn.fujitsu.com: introduce CONFIG_HAVE_BOOTMEM_INFO_NODE and revert register_page_bootmem_info_node()]
      [mhocko@suse.cz: remove the arch specific functions without any implementation]
      [linfeng@cn.fujitsu.com: mm/Kconfig: move auto selects from MEMORY_HOTPLUG to MEMORY_HOTREMOVE as needed]
      [rientjes@google.com: fix defined but not used warning]
      Signed-off-by: default avatarWen Congyang <wency@cn.fujitsu.com>
      Signed-off-by: default avatarYasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Signed-off-by: default avatarTang Chen <tangchen@cn.fujitsu.com>
      Reviewed-by: default avatarWu Jianguo <wujianguo@huawei.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Jiang Liu <jiang.liu@huawei.com>
      Cc: Jianguo Wu <wujianguo@huawei.com>
      Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Signed-off-by: default avatarMichal Hocko <mhocko@suse.cz>
      Signed-off-by: default avatarLin Feng <linfeng@cn.fujitsu.com>
      Signed-off-by: default avatarDavid Rientjes <rientjes@google.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
  20. 05 Sep, 2012 1 commit
  21. 28 Mar, 2012 1 commit
  22. 30 Jun, 2011 1 commit
  23. 09 Jun, 2011 1 commit
    • Benjamin Herrenschmidt's avatar
      powerpc: Force page alignment for initrd reserved memory · 307cfe71
      Benjamin Herrenschmidt authored
      When using 64K pages with a separate cpio rootfs, U-Boot will align
      the rootfs on a 4K page boundary. When the memory is reserved, and
      subsequent early memblock_alloc is called, it will allocate memory
      between the 64K page alignment and reserved memory. When the reserved
      memory is subsequently freed, it is done so by pages, causing the
      early memblock_alloc requests to be re-used, which in my case, caused
      the device-tree to be clobbered.
      This patch forces the reserved memory for initrd to be kernel page
      aligned, and will move the device tree if it overlaps with the range
      extension of initrd. This patch will also consolidate the identical
      function free_initrd_mem() from mm/init_32.c, init_64.c to mm/mem.c,
      and adds the same range extension when freeing initrd. free_initrd_mem()
      is also moved to the __init section.
      Many thanks to Milton Miller for his input on this patch.
      [BenH: Fixed build without CONFIG_BLK_DEV_INITRD]
      Signed-off-by: default avatarDave Carroll <dcarroll@astekcorp.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: default avatarBenjamin Herrenschmidt <benh@kernel.crashing.org>
  24. 24 Aug, 2010 1 commit
  25. 05 Aug, 2010 1 commit
    • Benjamin Herrenschmidt's avatar
      memblock: Remove rmo_size, burry it in arch/powerpc where it belongs · cd3db0c4
      Benjamin Herrenschmidt authored
      The RMA (RMO is a misnomer) is a concept specific to ppc64 (in fact
      server ppc64 though I hijack it on embedded ppc64 for similar purposes)
      and represents the area of memory that can be accessed in real mode
      (aka with MMU off), or on embedded, from the exception vectors (which
      is bolted in the TLB) which pretty much boils down to the same thing.
      We take that out of the generic MEMBLOCK data structure and move it into
      arch/powerpc where it belongs, renaming it to "RMA" while at it.
      Signed-off-by: default avatarBenjamin Herrenschmidt <benh@kernel.crashing.org>
  26. 14 Jul, 2010 1 commit
  27. 06 May, 2010 1 commit
    • Mark Nelson's avatar
      powerpc/mm: Track backing pages allocated by vmemmap_populate() · 91eea67c
      Mark Nelson authored
      We need to keep track of the backing pages that get allocated by
      vmemmap_populate() so that when we use kdump, the dump-capture kernel knows
      where these pages are.
      We use a simple linked list of structures that contain the physical address
      of the backing page and corresponding virtual address to track the backing
      To save space, we just use a pointer to the next struct vmemmap_backing. We
      can also do this because we never remove nodes.  We call the pointer "list"
      to be compatible with changes made to the crash utility.
      vmemmap_populate() is called either at boot-time or on a memory hotplug
      operation. We don't have to worry about the boot-time calls because they
      will be inherently single-threaded, and for a memory hotplug operation
      vmemmap_populate() is called through:
      and in sparse_add_one_section() we're protected by pgdat_resize_lock().
      So, we don't need a spinlock to protect the vmemmap_list.
      We allocate space for the vmemmap_backing structs by allocating whole pages
      in vmemmap_list_alloc() and then handing out chunks of this to
      This means that we waste at most just under one page, but this keeps the code
      is simple.
      Signed-off-by: default avatarMark Nelson <markn@au1.ibm.com>
      Signed-off-by: default avatarBenjamin Herrenschmidt <benh@kernel.crashing.org>
  28. 30 Mar, 2010 1 commit
    • Tejun Heo's avatar
      include cleanup: Update gfp.h and slab.h includes to prepare for breaking... · 5a0e3ad6
      Tejun Heo authored
      include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h
      percpu.h is included by sched.h and module.h and thus ends up being
      included when building most .c files.  percpu.h includes slab.h which
      in turn includes gfp.h making everything defined by the two files
      universally available and complicating inclusion dependencies.
      percpu.h -> slab.h dependency is about to be removed.  Prepare for
      this change by updating users of gfp and slab facilities include those
      headers directly instead of assuming availability.  As this conversion
      needs to touch large number of source files, the following script is
      used as the basis of conversion.
      The script does the followings.
      * Scan files for gfp and slab usages and update includes such that
        only the necessary includes are there.  ie. if only gfp is used,
        gfp.h, if slab is used, slab.h.
      * When the script inserts a new include, it looks at the include
        blocks and try to put the new include such that its order conforms
        to its surrounding.  It's put in the include block which contains
        core kernel includes, in the same order that the rest are ordered -
        alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
        doesn't seem to be any matching order.
      * If the script can't find a place to put a new include (mostly
        because the file doesn't have fitting include block), it prints out
        an error message indicating which .h file needs to be added to the
      The conversion was done in the following steps.
      1. The initial automatic conversion of all .c files updated slightly
         over 4000 files, deleting around 700 includes and adding ~480 gfp.h
         and ~3000 slab.h inclusions.  The script emitted errors for ~400
      2. Each error was manually checked.  Some didn't need the inclusion,
         some needed manual addition while adding it to implementation .h or
         embedding .c file was more appropriate for others.  This step added
         inclusions to around 150 files.
      3. The script was run again and the output was compared to the edits
         from #2 to make sure no file was left behind.
      4. Several build tests were done and a couple of problems were fixed.
         e.g. lib/decompress_*.c used malloc/free() wrappers around slab
         APIs requiring slab.h to be added manually.
      5. The script was run on all .h files but without automatically
         editing them as sprinkling gfp.h and slab.h inclusions around .h
         files could easily lead to inclusion dependency hell.  Most gfp.h
         inclusion directives were ignored as stuff from gfp.h was usually
         wildly available and often used in preprocessor macros.  Each
         slab.h inclusion directive was examined and added manually as
      6. percpu.h was updated not to include slab.h.
      7. Build test were done on the following configurations and failures
         were fixed.  CONFIG_GCOV_KERNEL was turned off for all tests (as my
         distributed build env didn't work with gcov compiles) and a few
         more options had to be turned off depending on archs to make things
         build (like ipr on powerpc/64 which failed due to missing writeq).
         * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
         * powerpc and powerpc64 SMP allmodconfig
         * sparc and sparc64 SMP allmodconfig
         * ia64 SMP allmodconfig
         * s390 SMP allmodconfig
         * alpha SMP allmodconfig
         * um on x86_64 SMP allmodconfig
      8. percpu.h modifications were reverted so that it could be applied as
         a separate patch and serve as bisection point.
      Given the fact that I had only a couple of failures from tests on step
      6, I'm fairly confident about the coverage of this conversion patch.
      If there is a breakage, it's likely to be something in one of the arch
      headers which should be easily discoverable easily on most builds of
      the specific arch.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Guess-its-ok-by: default avatarChristoph Lameter <cl@linux-foundation.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>