1. 24 Sep, 2019 2 commits
    • libnvdimm/altmap: Track namespace boundaries in altmap · cf387d96
      Aneesh Kumar K.V authored
      With a PFN_MODE_PMEM namespace, the memmap area is allocated from the device
      area. Some architectures map the memmap area with a large page size. On
      architectures like ppc64, a 16MB page used for the memmap mapping can map
      262144 pfns, which covers a namespace size of 16G.
      
      When populating the memmap region with 16MB pages from the device area,
      make sure the allocated space is not used to map resources outside this
      namespace. Such usage of the device area will prevent the namespace from
      being destroyed.
      
      Add the resource end pfn to the altmap and use it to check whether the memmap
      area allocation can map pfns outside the namespace. On ppc64, in such a case
      we fall back to allocation from memory.
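      A minimal userspace sketch of the boundary check, assuming a 64-byte
      struct page and the 64K base page size shown in the ppc64 log below. The
      names (struct altmap_sketch, memmap_pfns_per_block(),
      memmap_block_crosses_boundary()) are hypothetical and only illustrate the
      idea; they are not the kernel's vmem_altmap API.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed sizes (ppc64 hash MMU example): 64-byte struct page, 64K pages. */
#define SKETCH_STRUCT_PAGE_SIZE 64ULL
#define SKETCH_PAGE_SIZE (64ULL * 1024)
#define SKETCH_MEMMAP_BLOCK (16ULL * 1024 * 1024) /* one 16MB memmap page */

/* Hypothetical stand-in for the altmap, with the new end pfn added. */
struct altmap_sketch {
	uint64_t base_pfn; /* first pfn of the namespace */
	uint64_t end_pfn;  /* last pfn of the namespace, inclusive */
};

/* How many pfns one memmap block of block_size bytes can describe. */
static uint64_t memmap_pfns_per_block(uint64_t block_size)
{
	return block_size / SKETCH_STRUCT_PAGE_SIZE;
}

/*
 * True if a memmap block whose first struct page describes start_pfn would
 * also describe pfns outside [base_pfn, end_pfn]. In that case the block
 * should come from regular memory rather than from the device area.
 */
static bool memmap_block_crosses_boundary(const struct altmap_sketch *a,
					  uint64_t start_pfn,
					  uint64_t block_size)
{
	uint64_t nr_pfns = memmap_pfns_per_block(block_size);

	if (start_pfn < a->base_pfn)
		return true;
	return start_pfn + nr_pfns - 1 > a->end_pfn;
}
```

      Under these assumptions one 16MB block describes 262144 pfns, i.e. 16G of
      64K pages, so any namespace smaller than 16G triggers the fallback for a
      block that starts at its base.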
      
      This fixes the kernel crash reported below:
      
      [  132.034989] WARNING: CPU: 13 PID: 13719 at mm/memremap.c:133 devm_memremap_pages_release+0x2d8/0x2e0
      [  133.464754] BUG: Unable to handle kernel data access at 0xc00c00010b204000
      [  133.464760] Faulting instruction address: 0xc00000000007580c
      [  133.464766] Oops: Kernel access of bad area, sig: 11 [#1]
      [  133.464771] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
      .....
      [  133.464901] NIP [c00000000007580c] vmemmap_free+0x2ac/0x3d0
      [  133.464906] LR [c0000000000757f8] vmemmap_free+0x298/0x3d0
      [  133.464910] Call Trace:
      [  133.464914] [c000007cbfd0f7b0] [c0000000000757f8] vmemmap_free+0x298/0x3d0 (unreliable)
      [  133.464921] [c000007cbfd0f8d0] [c000000000370a44] section_deactivate+0x1a4/0x240
      [  133.464928] [c000007cbfd0f980] [c000000000386270] __remove_pages+0x3a0/0x590
      [  133.464935] [c000007cbfd0fa50] [c000000000074158] arch_remove_memory+0x88/0x160
      [  133.464942] [c000007cbfd0fae0] [c0000000003be8c0] devm_memremap_pages_release+0x150/0x2e0
      [  133.464949] [c000007cbfd0fb70] [c000000000738ea0] devm_action_release+0x30/0x50
      [  133.464955] [c000007cbfd0fb90] [c00000000073a5a4] release_nodes+0x344/0x400
      [  133.464961] [c000007cbfd0fc40] [c00000000073378c] device_release_driver_internal+0x15c/0x250
      [  133.464968] [c000007cbfd0fc80] [c00000000072fd14] unbind_store+0x104/0x110
      [  133.464973] [c000007cbfd0fcd0] [c00000000072ee24] drv_attr_store+0x44/0x70
      [  133.464981] [c000007cbfd0fcf0] [c0000000004a32bc] sysfs_kf_write+0x6c/0xa0
      [  133.464987] [c000007cbfd0fd10] [c0000000004a1dfc] kernfs_fop_write+0x17c/0x250
      [  133.464993] [c000007cbfd0fd60] [c0000000003c348c] __vfs_write+0x3c/0x70
      [  133.464999] [c000007cbfd0fd80] [c0000000003c75d0] vfs_write+0xd0/0x250
      
      djbw: Aneesh notes that this crash can likely be triggered in any kernel that
      supports 'papr_scm', so flagging that commit for -stable consideration.
      
      Fixes: b5beae5e ("powerpc/pseries: Add driver for PAPR SCM regions")
      Cc: <stable@vger.kernel.org>
      Reported-by: Sachin Sant <sachinp@linux.vnet.ibm.com>
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Reviewed-by: Pankaj Gupta <pagupta@redhat.com>
      Tested-by: Santosh Sivaraj <santosh@fossix.org>
      Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
      Link: https://lore.kernel.org/r/20190910062826.10041-1-aneesh.kumar@linux.ibm.com

      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
    • libnvdimm/dax: Pick the right alignment default when creating dax devices · f5376699
      Aneesh Kumar K.V authored
      Allow the arch to provide the supported alignments and use hugepage alignment
      only if hugepages are supported. Right now we depend on compile-time configs,
      whereas this patch switches to runtime discovery.
      
      Architectures like ppc64 can have THP enabled in code but still have the
      hugepage size disabled by the hypervisor. This allows us to create dax devices
      with PAGE_SIZE alignment in this case.
      
      Existing dax namespaces with alignment larger than PAGE_SIZE will fail to
      initialize in this specific case. We still allow fsdax namespace initialization.
      
      To decide whether to enable hugepage faults for a dax device: if THP is
      enabled at compile time, we default to taking hugepage faults, and if the dax
      fault handler finds that the fault size is larger than the alignment, it
      retries with a PAGE_SIZE fault size.
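      A hedged sketch of the two runtime decisions described above: choosing a
      default alignment from a list discovered at runtime, and falling back to a
      PAGE_SIZE fault when the fault size exceeds the device alignment. The
      function names are hypothetical; the 64K/16M values mirror the ppc64
      example below.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical: pick the default dax alignment from the alignments the
 * platform reports at runtime, instead of from compile-time config.
 */
static unsigned long pick_default_align(const unsigned long *supported,
					size_t n, unsigned long page_size,
					unsigned long hpage_size)
{
	/* Use the hugepage size only if the platform actually supports it. */
	for (size_t i = 0; i < n; i++)
		if (supported[i] == hpage_size)
			return hpage_size;
	return page_size;
}

/* Fault fallback: never map with a size larger than the device alignment. */
static unsigned long effective_fault_size(unsigned long fault_size,
					  unsigned long align,
					  unsigned long page_size)
{
	return fault_size > align ? page_size : fault_size;
}
```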
      
      This also addresses the following failure scenario on ppc64:
      
      ndctl create-namespace --mode=devdax  | grep align
       "align":16777216,
       "align":16777216
      
      cat /sys/devices/ndbus0/region0/dax0.0/supported_alignments
       65536 16777216
      
      daxio.static-debug  -z -o /dev/dax0.0
        Bus error (core dumped)
      
        $ dmesg | tail
         lpar: Failed hash pte insert with error -4
         hash-mmu: mm: Hashing failure ! EA=0x7fff17000000 access=0x8000000000000006 current=daxio
         hash-mmu:     trap=0x300 vsid=0x22cb7a3a ssize=1 base psize=2 psize 10 pte=0xc000000501002b86
         daxio[3860]: bus error (7) at 7fff17000000 nip 7fff973c007c lr 7fff973bff34 code 2 in libpmem.so.1.0.0[7fff973b0000+20000]
         daxio[3860]: code: 792945e4 7d494b78 e95f0098 7d494b78 f93f00a0 4800012c e93f0088 f93f0120
         daxio[3860]: code: e93f00a0 f93f0128 e93f0120 e95f0128 <f9490000> e93f0088 39290008 f93f0110
      
      The failure was due to the guest kernel using the wrong page size.
      
      Namespaces created with 16M alignment will appear as below on a config with
      the 16M page size disabled.
      
      $ ndctl list -Ni
      [
        {
          "dev":"namespace0.1",
          "mode":"fsdax",
          "map":"dev",
          "size":5351931904,
          "uuid":"fc6e9667-461a-4718-82b4-69b24570bddb",
          "align":16777216,
          "blockdev":"pmem0.1",
          "supported_alignments":[
            65536
          ]
        },
        {
          "dev":"namespace0.0",
          "mode":"fsdax",    <==== devdax 16M alignment marked disabled.
          "map":"mem",
          "size":5368709120,
          "uuid":"a4bdf81a-f2ee-4bc6-91db-7b87eddd0484",
          "state":"disabled"
        }
      ]
      
      Cc: linux-mm@kvack.org
      Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Link: https://lore.kernel.org/r/20190905154603.10349-8-aneesh.kumar@linux.ibm.com

      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
  2. 05 Sep, 2019 2 commits
  3. 28 Aug, 2019 1 commit
  4. 19 Jul, 2019 2 commits
    • libnvdimm/pfn: stop padding pmem namespaces to section alignment · a3619190
      Dan Williams authored
      Now that the mm core supports section-unaligned hotplug of ZONE_DEVICE
      memory, we no longer need to add padding at pfn/dax device creation
      time.  The kernel will still honor padding established by older kernels.
      
      Link: http://lkml.kernel.org/r/156092356588.979959.6793371748950931916.stgit@dwillia2-desk3.amr.corp.intel.com
      
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Reported-by: Jeff Moyer <jmoyer@redhat.com>
      Tested-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>	[ppc64]
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Jane Chu <jane.chu@oracle.com>
      Cc: Jérôme Glisse <jglisse@redhat.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Logan Gunthorpe <logang@deltatee.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Mike Rapoport <rppt@linux.ibm.com>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Toshi Kani <toshi.kani@hpe.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Wei Yang <richardw.yang@linux.intel.com>
      Cc: Jason Gunthorpe <jgg@mellanox.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • libnvdimm/pfn: fix fsdax-mode namespace info-block zero-fields · 7e3e888d
      Dan Williams authored
      At namespace creation time there is the potential for the "expected to
      be zero" fields of a 'pfn' info-block to be filled with indeterminate
      data.  While the kernel buffer is zeroed on allocation it is immediately
      overwritten by nd_pfn_validate() filling it with the current contents of
      the on-media info-block location.  For fields like 'flags' and the
      'padding', it potentially means that future implementations cannot rely
      on those fields being zero.
      
      In preparation to stop using the 'start_pad' and 'end_trunc' fields for
      section alignment, arrange for fields that are not explicitly
      initialized to be guaranteed zero.  Bump the minor version to indicate
      it is safe to assume the 'padding' and 'flags' are zero.  Otherwise,
      this corruption is expected to be benign since all other critical fields
      are explicitly initialized.
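      A minimal sketch of the "guaranteed zero" policy, using a simplified,
      hypothetical info-block layout (not the real on-media format): clear the
      whole buffer immediately before filling it, so fields the code never sets,
      such as 'flags' and 'padding', cannot inherit stale on-media contents.

```c
#include <stdint.h>
#include <string.h>

/* Simplified, hypothetical info-block layout for illustration only. */
struct info_block_sketch {
	uint16_t version_major;
	uint16_t version_minor;
	uint32_t flags;      /* expected to be zero */
	uint64_t dataoff;
	uint64_t npfns;
	uint8_t padding[64]; /* expected to be zero */
};

/*
 * Fill an info block for a new namespace. The buffer may still hold whatever
 * a previous validate step read from media, so zero it before setting fields.
 */
static void init_info_block(struct info_block_sketch *sb, uint64_t dataoff,
			    uint64_t npfns, uint16_t minor)
{
	memset(sb, 0, sizeof(*sb));
	sb->version_major = 1;
	sb->version_minor = minor; /* a bumped minor advertises the policy */
	sb->dataoff = dataoff;
	sb->npfns = npfns;
}
```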
      
      Note: the Cc: stable tag is about spreading this new policy to as many
      kernels as possible, not about fixing an issue in those kernels.  It is
      not until the change titled "libnvdimm/pfn: Stop padding pmem namespaces
      to section alignment" that this improper initialization becomes a
      problem.  So if someone decides to backport "libnvdimm/pfn: Stop padding
      pmem namespaces to section alignment" (which is not tagged for stable),
      make sure this prerequisite is flagged.
      
      Link: http://lkml.kernel.org/r/156092356065.979959.6681003754765958296.stgit@dwillia2-desk3.amr.corp.intel.com
      Fixes: 32ab0a3f ("libnvdimm, pmem: 'struct page' for pmem")
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Tested-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>	[ppc64]
      Cc: <stable@vger.kernel.org>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Jane Chu <jane.chu@oracle.com>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Jérôme Glisse <jglisse@redhat.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Logan Gunthorpe <logang@deltatee.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Mike Rapoport <rppt@linux.ibm.com>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Toshi Kani <toshi.kani@hpe.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Wei Yang <richardw.yang@linux.intel.com>
      Cc: Jason Gunthorpe <jgg@mellanox.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  5. 18 Jul, 2019 1 commit
    • driver-core, libnvdimm: Let device subsystems add local lockdep coverage · 87a30e1f
      Dan Williams authored
      
      
      For good reason, the standard device_lock() is marked
      lockdep_set_novalidate_class() because there is simply no sane way to
      describe the myriad ways the device_lock() is ordered with other locks.
      However, that leaves subsystems that know their own local device_lock()
      ordering rules to find lock ordering mistakes manually. Instead,
      introduce an optional / additional lockdep-enabled lock that a subsystem
      can acquire in all the same paths that the device_lock() is acquired.
      
      A conversion of the NFIT driver and NVDIMM subsystem to a
      lockdep-validated device_lock() scheme is included. The
      debug_nvdimm_lock() implementation implements the correct lock-class and
      stacking order for the libnvdimm device topology hierarchy.
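      The core idea, checking that locks are taken in a fixed parent-to-child
      order for the device topology, can be sketched in userspace as a simple
      hierarchy checker. This is a deliberately simplified, hypothetical model:
      real lockdep learns dependencies between lock classes dynamically, and
      the class names below are illustrative only.

```c
#include <stdbool.h>

/* Hypothetical lock classes, ordered from parent to child in the topology. */
enum sketch_lock_class {
	SKETCH_LOCK_BUS,
	SKETCH_LOCK_REGION,
	SKETCH_LOCK_NAMESPACE,
	SKETCH_LOCK_CLAIM,
};

#define SKETCH_MAX_HELD 8

/* Per-context record of held lock classes, innermost last. */
struct held_locks_sketch {
	enum sketch_lock_class held[SKETCH_MAX_HELD];
	int depth;
};

/*
 * Record acquisition of a lock of class c. Returns false (an ordering
 * violation) if a lock of the same or a deeper class is already held.
 */
static bool sketch_lock_acquire(struct held_locks_sketch *h,
				enum sketch_lock_class c)
{
	if (h->depth == SKETCH_MAX_HELD)
		return false;
	if (h->depth > 0 && h->held[h->depth - 1] >= c)
		return false;
	h->held[h->depth++] = c;
	return true;
}

/* Release the innermost lock; locks must be dropped in reverse order. */
static bool sketch_lock_release(struct held_locks_sketch *h,
				enum sketch_lock_class c)
{
	if (h->depth == 0 || h->held[h->depth - 1] != c)
		return false;
	h->depth--;
	return true;
}
```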
      
      Yes, this is a hack, but hopefully it is a useful hack for other
      subsystems' device_lock() debug sessions. Quoting Greg:
      
          "Yeah, it feels a bit hacky but it's really up to a subsystem to mess up
           using it as much as anything else, so user beware :)
      
           I don't object to it if it makes things easier for you to debug."
      
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Ira Weiny <ira.weiny@intel.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Dave Jiang <dave.jiang@intel.com>
      Cc: Keith Busch <keith.busch@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Vishal Verma <vishal.l.verma@intel.com>
      Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Reviewed-by: Ira Weiny <ira.weiny@intel.com>
      Link: https://lore.kernel.org/r/156341210661.292348.7014034644265455704.stgit@dwillia2-desk3.amr.corp.intel.com
  6. 02 Jul, 2019 1 commit
  7. 05 Jun, 2019 1 commit
  8. 06 Apr, 2019 1 commit
    • block: remove CONFIG_LBDAF · 72deb455
      Christoph Hellwig authored
      
      
      Currently support for 64-bit sector_t and blkcnt_t is optional on 32-bit
      architectures.  These types are required to support block device and/or
      file sizes larger than 2 TiB, and have generally defaulted to on for
      a long time.  Enabling the option only increases the i386 tinyconfig
      size by 145 bytes, and many data structures already always use
      64-bit values for their in-core and on-disk data structures anyway,
      so there should not be a large change in dynamic memory usage either.
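      The 2 TiB figure follows from a 32-bit sector count with 512-byte
      sectors. A quick check of the arithmetic, with a hypothetical helper name:

```c
#include <stdint.h>

/*
 * Bytes addressable with an unsigned sector count of the given width,
 * assuming 512-byte sectors. Valid for bits <= 54 so the result fits.
 */
static uint64_t max_bytes_for_sector_bits(unsigned int bits)
{
	return (UINT64_C(1) << bits) * 512;
}
```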
      
      Dropping this option removes a somewhat weird non-default config that
      has caused various bugs or compiler warnings when actually used.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  9. 22 Feb, 2019 1 commit
  10. 12 Feb, 2019 3 commits
    • libnvdimm/pfn: Account for PAGE_SIZE > info-block-size in nd_pfn_init() · 11a35810
      Dan Williams authored
      
      
      Similar to "libnvdimm: Fix altmap reservation size calculation", provide
      for a reservation of a full page's worth of info-block space at info-block
      establishment time.  Typically there is already slack in the padding
      from honoring the default 2MB alignment, but provide for a reservation
      for corner case configurations that would otherwise fit.
      
      Cc: Oliver O'Halloran <oohall@gmail.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
    • libnvdimm: Fix altmap reservation size calculation · 07464e88
      Oliver O'Halloran authored
      Libnvdimm reserves the first 8K of pfn and devicedax namespaces to
      store a superblock describing the namespace. This 8K reservation
      is contained within the altmap area which the kernel uses for the
      vmemmap backing for the pages within the namespace. The altmap
      allows for some pages at the start of the altmap area to be reserved
      and that mechanism is used to protect the superblock from being
      re-used as vmemmap backing.
      
      The number of PFNs to reserve is calculated using:
      
      	PHYS_PFN(SZ_8K)
      
      Which is implemented as:
      
       #define PHYS_PFN(x) ((unsigned long)((x) >> PAGE_SHIFT))
      
      So on systems where PAGE_SIZE is greater than 8K the reservation
      size is truncated to zero and the superblock area is re-used as
      vmemmap backing. As a result all the namespace information stored
      in the superblock (i.e. if it's a PFN or DAX namespace) is lost
      and the namespace needs to be re-created to get access to the
      contents.
      
      This patch fixes this by using PFN_UP() rather than PHYS_PFN() to ensure
      that at least one page is reserved. On systems with a 4K page size this
      patch should have no effect.
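      The truncation is easy to reproduce outside the kernel. This sketch
      re-creates both macros with the page shift as a parameter; the SKETCH_
      names are hypothetical, and a shift of 12 corresponds to 4K pages while 16
      corresponds to 64K pages.

```c
/* Userspace copies of the two conversions, parameterized by page shift. */
#define SZ_8K 0x2000UL

/* Round down: bytes -> pfn count (as PHYS_PFN does). */
#define SKETCH_PHYS_PFN(x, shift) ((unsigned long)((x) >> (shift)))

/* Round up: bytes -> pfn count, so any partial page counts as one. */
#define SKETCH_PFN_UP(x, shift) \
	((unsigned long)(((x) + (1UL << (shift)) - 1) >> (shift)))

/* Pages reserved for the 8K superblock area at a given page shift. */
static unsigned long info_block_reserve_pfns(unsigned int page_shift)
{
	return SKETCH_PFN_UP(SZ_8K, page_shift);
}
```

      With 4K pages both forms reserve 2 pfns; with 64K pages PHYS_PFN(SZ_8K)
      yields 0 while PFN_UP(SZ_8K) yields 1.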
      
      Cc: stable@vger.kernel.org
      Cc: Dan Williams <dan.j.williams@intel.com>
      Fixes: ac515c08 ("libnvdimm, pmem, pfn: move pfn setup to the core")
      Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
      Reviewed-by: Vishal Verma <vishal.l.verma@intel.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
    • libnvdimm, pfn: Fix over-trim in trim_pfn_device() · f101ada7
      Wei Yang authored
      When checking whether the current nd_region intersects with others,
      trim_pfn_device() has already calculated *size* as expanded to SECTION
      size.
      
      Do not double append 'adjust' to 'size' when calculating whether the end
      of a region collides with the next pmem region.
      
      Fixes: ae86cbfe ("libnvdimm, pfn: Pad pfn namespaces relative to other regions")
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
  11. 05 Dec, 2018 1 commit
  12. 28 Sep, 2018 1 commit
    • libnvdimm, pfn: during init, clear errors in the metadata area · 48af2f7e
      Vishal Verma authored
      
      
      If there are badblocks present in the 'struct page' area for pfn
      namespaces, until now, the only way to clear them has been to force the
      namespace into raw mode, clear the errors, and re-enable the fsdax mode.
      This is clunky, given that it should be easy enough for the pfn driver
      to do the same.
      
      Add a new helper that uses the most recently available badblocks list to
      check whether there are any badblocks that lie in the volatile struct
      page area. If so, before initializing the struct pages, send down
      targeted writes via nvdimm_write_bytes to write zeroes to the affected
      blocks, and thus clear errors.
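      A hedged sketch of the overlap test such a helper needs: clip each
      badblock range to the metadata area and issue a zeroing write only for the
      overlapping part. The types and the writer callback are hypothetical
      stand-ins for the badblocks list and nvdimm_write_bytes(); the example
      writer only tallies bytes instead of writing.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical badblock record: byte offset and length in the namespace. */
struct bb_range {
	uint64_t start;
	uint64_t len;
};

/* Hypothetical writer that zeroes [offset, offset + len). */
typedef int (*zero_fn)(uint64_t offset, uint64_t len, void *ctx);

/* Example writer: tallies zeroed bytes instead of issuing real writes. */
static int tally_zeroes(uint64_t offset, uint64_t len, void *ctx)
{
	(void)offset;
	*(uint64_t *)ctx += len;
	return 0;
}

/*
 * Zero the part of every badblock that overlaps [meta_start, meta_end).
 * Returns the number of ranges cleared, or the writer's negative error.
 */
static int clear_meta_badblocks(const struct bb_range *bb, size_t n,
				uint64_t meta_start, uint64_t meta_end,
				zero_fn write_zeroes, void *ctx)
{
	int cleared = 0;

	for (size_t i = 0; i < n; i++) {
		uint64_t s = bb[i].start;
		uint64_t e = bb[i].start + bb[i].len;

		if (e <= meta_start || s >= meta_end)
			continue; /* outside the struct page area */
		if (s < meta_start)
			s = meta_start;
		if (e > meta_end)
			e = meta_end;

		int rc = write_zeroes(s, e - s, ctx);
		if (rc < 0)
			return rc;
		cleared++;
	}
	return cleared;
}
```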
      
      Cc: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
  13. 22 May, 2018 1 commit
    • mm: introduce MEMORY_DEVICE_FS_DAX and CONFIG_DEV_PAGEMAP_OPS · e7638488
      Dan Williams authored
      
      
      In preparation for fixing dax-dma-vs-unmap issues, filesystems need to
      be able to rely on the fact that they will get wakeups on dev_pagemap
      page-idle events. Introduce MEMORY_DEVICE_FS_DAX and
      generic_dax_page_free() as common indicator / infrastructure for dax
      filesystems to require. With this change there are no users of the
      MEMORY_DEVICE_HOST designation, so remove it.
      
      The HMM sub-system extended dev_pagemap to arrange a callback when a
      dev_pagemap managed page is freed. Since a dev_pagemap page is free /
      idle when its reference count is 1, it requires an additional branch to
      check the page-type at put_page() time. Given that put_page() is a hot
      path, we do not want to incur that check if HMM is not in use, so a
      static branch is used to avoid that overhead when not necessary.
      
      Now, the FS_DAX implementation wants to reuse this mechanism for
      receiving dev_pagemap ->page_free() callbacks. Rework the HMM-specific
      static-key into a generic mechanism that either HMM or FS_DAX code paths
      can enable.
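      A userspace sketch of the put_page() fast path described above, assuming
      a plain global flag in place of a real static key (a static branch avoids
      even the flag load when disabled, which plain C cannot model). All names
      are hypothetical.

```c
#include <stdbool.h>

/* Stand-in for the static key that HMM or FS_DAX would enable. */
static bool devmap_managed_enabled;

enum sketch_page_type { SKETCH_PAGE_NORMAL, SKETCH_PAGE_DEVMAP };

struct sketch_page {
	int refcount;
	enum sketch_page_type type;
	void (*page_free)(struct sketch_page *page); /* owner's callback */
};

static int sketch_free_calls;

/* Example ->page_free() callback: just counts invocations. */
static void count_free(struct sketch_page *page)
{
	(void)page;
	sketch_free_calls++;
}

static void sketch_put_page(struct sketch_page *page)
{
	/* Only pay for the page-type check when some user enabled it. */
	if (devmap_managed_enabled && page->type == SKETCH_PAGE_DEVMAP) {
		/* A dev_pagemap page is idle once its count drops to 1. */
		if (--page->refcount == 1 && page->page_free)
			page->page_free(page);
		return;
	}
	--page->refcount; /* normal path; freeing at zero is omitted here */
}
```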
      
      For ARCH=um builds, and any other arch that lacks ZONE_DEVICE support,
      care must be taken to compile out the DEV_PAGEMAP_OPS infrastructure.
      However, we still need to support FS_DAX in the FS_DAX_LIMITED case
      implemented by the s390/dcssblk driver.
      
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Reported-by: kbuild test robot <lkp@intel.com>
      Reported-by: Thomas Meyer <thomas@m3y3r.de>
      Reported-by: Dave Jiang <dave.jiang@intel.com>
      Cc: "Jérôme Glisse" <jglisse@redhat.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
  14. 14 Mar, 2018 1 commit
  15. 06 Mar, 2018 1 commit
  16. 08 Jan, 2018 1 commit
  17. 19 Dec, 2017 2 commits
    • libnvdimm, dax: fix 1GB-aligned namespaces vs physical misalignment · 41fce90f
      Dan Williams authored
      The following namespace configuration attempt:
      
          # ndctl create-namespace -e namespace0.0 -m devdax -a 1G -f
          libndctl: ndctl_dax_enable: dax0.1: failed to enable
            Error: namespace0.0: failed to enable
      
          failed to reconfigure namespace: No such device or address
      
      ...fails when the backing memory range is not physically aligned to 1G:
      
          # cat /proc/iomem | grep Persistent
          210000000-30fffffff : Persistent Memory (legacy)
      
      In the above example the 4G persistent memory range starts and ends on a
      256MB boundary.
      
      We already handle this correctly for ranges that violate section
      alignment (128MB) by colliding with "System RAM"; we simply need to
      extend that padding/truncation to the 1GB alignment use case.
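      The padding arithmetic for the /proc/iomem range above can be checked
      with a small sketch (names hypothetical; the end address is inclusive, as
      /proc/iomem prints it). At 128MB section alignment this range needs no
      padding, but at 1G it needs 768MB of start padding and 256MB of end
      truncation, leaving 3G usable.

```c
#include <stdint.h>

#define SKETCH_ALIGN_UP(x, a)   (((x) + (a) - 1) & ~((uint64_t)(a) - 1))
#define SKETCH_ALIGN_DOWN(x, a) ((x) & ~((uint64_t)(a) - 1))

struct pad_result {
	uint64_t start_pad; /* bytes skipped at the front */
	uint64_t end_trunc; /* bytes dropped at the back */
	uint64_t usable;    /* aligned capacity left in between */
};

/* Pad/truncate [start, end] (end inclusive) to a power-of-two 'align'. */
static struct pad_result pad_to_align(uint64_t start, uint64_t end,
				      uint64_t align)
{
	struct pad_result r = { 0, 0, 0 };
	uint64_t first = SKETCH_ALIGN_UP(start, align);
	uint64_t last = SKETCH_ALIGN_DOWN(end + 1, align);

	r.start_pad = first - start;
	r.end_trunc = end + 1 - last;
	r.usable = last > first ? last - first : 0;
	return r;
}
```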
      
      Cc: <stable@vger.kernel.org>
      Fixes: 315c5625 ("libnvdimm, pfn: add 'align' attribute...")
      Reported-and-tested-by: Jane Chu <jane.chu@oracle.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
    • libnvdimm, pfn: fix start_pad handling for aligned namespaces · 19deaa21
      Dan Williams authored
      The alignment checks at pfn driver startup fail to properly account for
      the 'start_pad' in the case where the namespace is misaligned relative
      to its internal alignment. This is typically triggered with 1G-aligned
      namespaces, but could theoretically trigger with smaller namespace
      alignments. When this triggers, the kernel reports messages of the form:
      
          dax2.1: bad offset: 0x3c000000 dax disabled align: 0x40000000
      
      Cc: <stable@vger.kernel.org>
      Fixes: 1ee6667c ("libnvdimm, pfn, dax: fix initialization vs autodetect...")
      Reported-by: Jane Chu <jane.chu@oracle.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
  18. 28 Sep, 2017 1 commit
  19. 15 Aug, 2017 1 commit
  20. 12 Aug, 2017 1 commit
  21. 25 Jul, 2017 1 commit
    • libnvdimm: Stop using HPAGE_SIZE · 0dd69643
      Oliver O'Halloran authored
      
      
      Currently libnvdimm uses HPAGE_SIZE as the default alignment for DAX and
      PFN devices. HPAGE_SIZE is the default hugetlbfs page size and when
      hugetlbfs is disabled it defaults to PAGE_SIZE. Given DAX has more
      in common with THP than hugetlbfs we should probably be using
      HPAGE_PMD_SIZE, but this is undefined when THP is disabled, so let's just
      give it a new name.
      
      The other usage of HPAGE_SIZE in libnvdimm is when determining how large
      the altmap should be. For the reasons mentioned above it doesn't really
      make sense to use HPAGE_SIZE here either. PMD_SIZE seems to be safe to
      use in generic code and it happens to match the vmemmap allocation block
      on x86 and Power. It's still a hack, but it's a slightly nicer hack.
      Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
  22. 27 Jun, 2017 1 commit
    • libnvdimm, nfit: enable support for volatile ranges · c9e582aa
      Dan Williams authored
      
      
      Allow volatile nfit ranges to participate in all the same infrastructure
      provided for persistent memory regions. The resulting namespace
      device will still be called "pmem", but the parent region type will be
      "nd_volatile". This is in preparation for disabling the dax ->flush()
      operation in the pmem driver when it is hosted on a volatile range.
      
      Cc: Jan Kara <jack@suse.cz>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Matthew Wilcox <mawilcox@microsoft.com>
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
  23. 15 Jun, 2017 1 commit
    • libnvdimm, label: add address abstraction identifiers · b3fde74e
      Dan Williams authored
      
      
      Starting with v1.2 labels, 'address abstractions' can be hinted via an
      address abstraction id that implies an info-block format. The standard
      address abstraction in the specification is the v2 format of the
      Block-Translation-Table (BTT). Support for that is saved for a later
      patch; for now we add support for the Linux-supported address
      abstractions BTT (v1), PFN, and DAX.
      
      The new 'holder_class' attribute for namespace devices is added for
      tooling to specify the 'abstraction_guid' to store in the namespace label.
      For v1.1 labels this field is undefined and any setting of
      'holder_class' away from the default 'none' value will only have effect
      until the driver is unloaded. Setting 'holder_class' requires that
      whatever device tries to claim the namespace must be of the specified
      class.
      
      Cc: Vishal Verma <vishal.l.verma@intel.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
  24. 11 May, 2017 1 commit
    • libnvdimm: add an atomic vs process context flag to rw_bytes · 3ae3d67b
      Vishal Verma authored
      
      
      nsio_rw_bytes can clear media errors, but this cannot be done while we
      are in an atomic context due to locking within ACPI. From the BTT,
      ->rw_bytes may be called either from atomic or process context depending
      on whether the calls happen during initialization or during IO.
      
      During init, we want to ensure error clearing happens, and the flag
      marking process context allows nsio_rw_bytes to do that. When called
      during IO, we're in atomic context, and error clearing can be skipped.
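      A hedged model of the flag's effect, with hypothetical names
      (SKETCH_IO_ATOMIC, struct sketch_ns) rather than the driver's own: error
      clearing, which may sleep, is attempted only outside atomic context;
      otherwise the error stays recorded and the call reports it.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical flag: caller is in atomic context and must not sleep. */
#define SKETCH_IO_ATOMIC 0x1UL

struct sketch_ns {
	uint8_t *media;
	size_t size;
	bool has_error;      /* one known media error ... */
	size_t error_offset; /* ... at this byte offset */
};

/* Write len bytes at off; returns 0, or -1 on range or uncleared error. */
static int sketch_rw_bytes(struct sketch_ns *ns, size_t off,
			   const void *buf, size_t len, unsigned long flags)
{
	int rc = 0;

	if (off > ns->size || len > ns->size - off)
		return -1; /* out of range */

	bool hits_error = ns->has_error && ns->error_offset >= off &&
			  ns->error_offset < off + len;
	if (hits_error) {
		if (flags & SKETCH_IO_ATOMIC)
			rc = -1; /* clearing may sleep: skip it, report error */
		else
			ns->has_error = false; /* process context: clear it */
	}
	memcpy(ns->media + off, buf, len);
	return rc;
}
```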
      
      Cc: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
  25. 05 May, 2017 1 commit
    • libnvdimm, pfn: fix 'npfns' vs section alignment · d5483fed
      Dan Williams authored
      Fix failures to create namespaces due to the vmem_altmap not advertising
      enough free space to store the memmap.
      
       WARNING: CPU: 15 PID: 8022 at arch/x86/mm/init_64.c:656 arch_add_memory+0xde/0xf0
       [..]
       Call Trace:
        dump_stack+0x63/0x83
        __warn+0xcb/0xf0
        warn_slowpath_null+0x1d/0x20
        arch_add_memory+0xde/0xf0
        devm_memremap_pages+0x244/0x440
        pmem_attach_disk+0x37e/0x490 [nd_pmem]
        nd_pmem_probe+0x7e/0xa0 [nd_pmem]
        nvdimm_bus_probe+0x71/0x120 [libnvdimm]
        driver_probe_device+0x2bb/0x460
        bind_store+0x114/0x160
        drv_attr_store+0x25/0x30
      
      In commit 658922e5 "libnvdimm, pfn: fix memmap reservation sizing"
      we arranged for the capacity to be allocated, but failed to also update
      the 'npfns' parameter. This leads to cases where there is enough
      capacity reserved to hold all the allocated sections, but
      vmemmap_populate_hugepages() still encounters -ENOMEM from
      altmap_alloc_block_buf().
      
      This fix is a stop-gap until we can teach the core memory hotplug
      implementation to permit sub-section hotplug.
      
      Cc: <stable@vger.kernel.org>
      Fixes: 658922e5 ("libnvdimm, pfn: fix memmap reservation sizing")
      Reported-by: Anisha Allada <anisha.allada@intel.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
  26. 01 May, 2017 1 commit
    • libnvdimm: fix nvdimm_bus_lock() vs device_lock() ordering · 452bae0a
      Dan Williams authored
      A debug patch to turn the standard device_lock() into something that
      lockdep can analyze yielded the following:
      
       ======================================================
       [ INFO: possible circular locking dependency detected ]
       4.11.0-rc4+ #106 Tainted: G           O
       -------------------------------------------------------
       lt-libndctl/1898 is trying to acquire lock:
        (&dev->nvdimm_mutex/3){+.+.+.}, at: [<ffffffffc023c948>] nd_attach_ndns+0x178/0x1b0 [libnvdimm]
      
       but task is already holding lock:
        (&nvdimm_bus->reconfig_mutex){+.+.+.}, at: [<ffffffffc022e0b1>] nvdimm_bus_lock+0x21/0x30 [libnvdimm]
      
       which lock already depends on the new lock.
      
       the existing dependency chain (in reverse order) is:
      
       -> #1 (&nvdimm_bus->reconfig_mutex){+.+.+.}:
              lock_acquire+0xf6/0x1f0
              __mutex_lock+0x88/0x980
              mutex_lock_nested+0x1b/0x20
              nvdimm_bus_lock+0x21/0x30 [libnvdimm]
              nvdimm_namespace_capacity+0x1b/0x40 [libnvdimm]
              nvdimm_namespace_common_probe+0x230/0x510 [libnvdimm]
              nd_pmem_probe+0x14/0x180 [nd_pmem]
              nvdimm_bus_probe+0xa9/0x260 [libnvdimm]
      
       -> #0 (&dev->nvdimm_mutex/3){+.+.+.}:
              __lock_acquire+0x1107/0x1280
              lock_acquire+0xf6/0x1f0
              __mutex_lock+0x88/0x980
              mutex_lock_nested+0x1b/0x20
              nd_attach_ndns+0x178/0x1b0 [libnvdimm]
              nd_namespace_store+0x308/0x3c0 [libnvdimm]
              namespace_store+0x87/0x220 [libnvdimm]
      
      In this case '&dev->nvdimm_mutex/3' mirrors '&dev->mutex'.
      
      Fix this by replacing the use of device_lock() with nvdimm_bus_lock() to protect
      nd_{attach,detach}_ndns() operations.
      
      Cc: <stable@vger.kernel.org>
      Fixes: 8c2f7e86 ("libnvdimm: infrastructure for btt devices")
      Reported-by: Yi Zhang <yizhan@redhat.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
  27. 04 Feb, 2017 1 commit
    • libnvdimm, pfn: fix memmap reservation size versus 4K alignment · bfb34527
      Dan Williams authored
      When vmemmap_populate() allocates space for the memmap it does so in 2MB
      sized chunks. The libnvdimm-pfn driver incorrectly accounts for this
      when the alignment of the device is set to 4K. When this happens we
      trigger memory allocation failures in altmap_alloc_block_buf() and
      trigger warnings of the form:
      
       WARNING: CPU: 0 PID: 3376 at arch/x86/mm/init_64.c:656 arch_add_memory+0xe4/0xf0
       [..]
       Call Trace:
        dump_stack+0x86/0xc3
        __warn+0xcb/0xf0
        warn_slowpath_null+0x1d/0x20
        arch_add_memory+0xe4/0xf0
        devm_memremap_pages+0x29b/0x4e0
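      The sizing rule can be sketched as follows, assuming a 64-byte struct
      page and 4K pages (names hypothetical): whatever the device alignment,
      the memmap reservation has to be rounded up to whole 2MB chunks, because
      that is the unit vmemmap_populate() allocates in.

```c
#include <stdint.h>

#define SKETCH_STRUCT_PAGE_SIZE 64ULL      /* assumed sizeof(struct page) */
#define SKETCH_VMEMMAP_CHUNK (2ULL << 20)  /* 2MB vmemmap allocation unit */

/* Raw memmap bytes needed to describe npfns pages. */
static uint64_t memmap_bytes(uint64_t npfns)
{
	return npfns * SKETCH_STRUCT_PAGE_SIZE;
}

/* Reservation vmemmap_populate() can consume: whole 2MB chunks. */
static uint64_t memmap_reserve(uint64_t npfns)
{
	uint64_t bytes = memmap_bytes(npfns);

	return (bytes + SKETCH_VMEMMAP_CHUNK - 1) / SKETCH_VMEMMAP_CHUNK *
	       SKETCH_VMEMMAP_CHUNK;
}
```

      For 262145 pfns the raw memmap is 16MB + 64 bytes; rounding that to 4K
      would reserve 16MB + 4K, while the populate step needs 18MB.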
      
      Fixes: 315c5625 ("libnvdimm, pfn: add 'align' attribute, default to HPAGE_SIZE")
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
  28. 10 Dec, 2016 1 commit
  29. 24 Jun, 2016 1 commit
    • libnvdimm, pfn, dax: fix initialization vs autodetect for mode + alignment · 1ee6667c
      Dan Williams authored
      
      
      The updated ndctl unit tests discovered that if a pfn configuration with
      a 4K alignment is read from the namespace, that alignment will be
      ignored in favor of the default 2M alignment.  The result is that the
      configuration will fail initialization with a message like:
      
          dax6.1: bad offset: 0x22000 dax disabled align: 0x200000
      
      Fix this by allowing the alignment read from the info block to override
      the default, which is 2M, not 0, in the autodetect path.  This also fixes a
      similar problem with the mode and alignment settings silently being
      overwritten by the kernel when userspace has changed it.  We now will
      either overwrite the info block if userspace changes the uuid or fail
      and warn if a live setting disagrees with the info block.
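      A sketch of the intended precedence, with hypothetical names: a recorded
      info-block alignment beats the 2M default, a uuid change means userspace
      is establishing a new configuration, and otherwise a disagreeing live
      setting fails instead of being silently overwritten.

```c
#include <stdbool.h>

#define SKETCH_DEFAULT_ALIGN (2UL << 20) /* 2M default */

/* Autodetect: a recorded alignment wins; 0 means nothing was recorded. */
static unsigned long resolve_align(unsigned long info_block_align)
{
	return info_block_align ? info_block_align : SKETCH_DEFAULT_ALIGN;
}

/*
 * Reconcile a live (userspace-set) alignment with the info block.
 * Returns true if the device may be enabled; *rewrite is set when the
 * info block should be overwritten because userspace changed the uuid.
 */
static bool reconcile_config(bool uuid_changed, unsigned long live_align,
			     unsigned long info_align, bool *rewrite)
{
	*rewrite = false;
	if (uuid_changed) {
		*rewrite = true; /* new configuration: rewrite the info block */
		return true;
	}
	/* Same uuid: disagreement is an error, not a silent overwrite. */
	return live_align == info_align;
}
```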
      
      Cc: <stable@vger.kernel.org>
      Cc: Micah Parrish <micah.parrish@hpe.com>
      Cc: Toshi Kani <toshi.kani@hpe.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
  30. 21 May, 2016 3 commits
    • libnvdimm, dax: fix deletion · 03dca343
      Dan Williams authored
      
      
      The ndctl unit tests discovered that the dax enabling omitted updates to
      nd_detach_and_reset().  This routine clears the device configuration
      when the namespace is detached.  Without this clearing userspace may
      assume that the device is in the process of being configured by another
      agent in the system.
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
    • libnvdimm, dax: fix alignment validation · 5e24c9fd
      Dan Williams authored
      
      
      Testing the dax-device autodetect support revealed a probe failure with
      the following result:
      
          dax0.1: bad offset: 0x8200000 dax disabled
      
      The original pfn-device implementation inferred the alignment from
      ilog2(offset); now that the alignment is explicit, the is_power_of_2()
      check needs replacing with a real sanity check against the recorded alignment.
      Otherwise the alignment check is useless in the implicit case and only
      the minimum size of the offset matters.
      
      This self-consistency check is further validated by the probe path that
      will re-check that the offset is large enough to contain all the
      metadata required to enable the device.
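      The difference between the two checks, using the offset from the failure
      above and assuming a recorded 2M (0x200000) alignment, which the log does
      not state (helper names hypothetical):

```c
#include <stdbool.h>

static bool sketch_is_power_of_2(unsigned long n)
{
	return n != 0 && (n & (n - 1)) == 0;
}

/* Old implicit check: the offset itself had to be a power of two. */
static bool offset_ok_implicit(unsigned long offset)
{
	return sketch_is_power_of_2(offset);
}

/* Explicit check: the offset must be a multiple of the recorded alignment. */
static bool offset_ok_explicit(unsigned long offset, unsigned long align)
{
	return sketch_is_power_of_2(align) && (offset & (align - 1)) == 0;
}
```

      0x8200000 is not a power of two, so the implicit check rejects it, yet it
      is an exact multiple of 0x200000 and passes the explicit check.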
      
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
    • libnvdimm, dax: autodetect support · c5ed9268
      Dan Williams authored
      
      
      For autodetecting a previously established dax configuration we need the
      info block to indicate block-device vs device-dax mode, and we need to
      have the default namespace probe hand-off the configuration to the
      dax_pmem driver.
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
  31. 09 May, 2016 2 commits