1. 24 Sep, 2019 1 commit
    • Aneesh Kumar K.V's avatar
      libnvdimm/region: Initialize bad block for volatile namespaces · c42adf87
      Aneesh Kumar K.V authored
      We do check for a bad block during namespace init and that use
      region bad block list. We need to initialize the bad block
      for volatile regions for this to work. We also observe a lockdep
      warning as below because the lock is not initialized correctly
      since we skip bad block init for volatile regions.
       INFO: trying to register non-static key.
       the code is fine but needs lockdep annotation.
       turning off the locking correctness validator.
       CPU: 2 PID: 1 Comm: swapper/0 Not tainted 5.3.0-rc1-15699-g3dee241c937e #149
       Call Trace:
       [c0000000f95cb250] [c00000000147dd84] dump_stack+0xe8/0x164 (unreliable)
       [c0000000f95cb2a0] [c00000000022ccd8] register_lock_class+0x308/0xa60
       [c0000000f95cb3a0] [c000000000229cc0] __lock_acquire+0x170/0x1ff0
       [c0000000f95cb4c0] [c00000000022c740] lock_acquire+0x220/0x270
       [c0000000f95cb580] [c000000000a93230] badblocks_check+0xc0/0x290
       [c0000000f95cb5f0] [c000000000d97540] nd_pfn_validate+0x5c0/0x7f0
       [c0000000f95cb6d0] [c000000000d98300] nd_dax_probe+0xd0/0x1f0
       [c0000000f95cb760] [c000000000d9b66c] nd_pmem_probe+0x10c/0x160
       [c0000000f95cb790] [c000000000d7f5ec] nvdimm_bus_probe+0x10c/0x240
       [c0000000f95cb820] [c000000000d0f844] really_probe+0x254/0x4e0
       [c0000000f95cb8b0] [c000000000d0fdfc] driver_probe_device+0x16c/0x1e0
       [c0000000f95cb930] [c000000000d10238] device_driver_attach+0x68/0xa0
       [c0000000f95cb970] [c000000000d1040c] __driver_attach+0x19c/0x1c0
       [c0000000f95cb9f0] [c000000000d0c4c4] bus_for_each_dev+0x94/0x130
       [c0000000f95cba50] [c000000000d0f014] driver_attach+0x34/0x50
       [c0000000f95cba70] [c000000000d0e208] bus_add_driver+0x178/0x2f0
       [c0000000f95cbb00] [c000000000d117c8] driver_register+0x108/0x170
       [c0000000f95cbb70] [c000000000d7edb0] __nd_driver_register+0xe0/0x100
       [c0000000f95cbbd0] [c000000001a6baa4] nd_pmem_driver_init+0x34/0x48
       [c0000000f95cbbf0] [c0000000000106f4] do_one_initcall+0x1d4/0x4b0
       [c0000000f95cbcd0] [c0000000019f499c] kernel_init_freeable+0x544/0x65c
       [c0000000f95cbdb0] [c000000000010d6c] kernel_init+0x2c/0x180
       [c0000000f95cbe20] [c00000000000b954] ret_from_kernel_thread+0x5c/0x68
      Signed-off-by: default avatarAneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Link: https://lore.kernel.org/r/20190919083355.26340-1-aneesh.kumar@linux.ibm.com
      Signed-off-by: default avatarDan Williams <dan.j.williams@intel.com>
  2. 18 Jul, 2019 2 commits
    • Dan Williams's avatar
      driver-core, libnvdimm: Let device subsystems add local lockdep coverage · 87a30e1f
      Dan Williams authored
      For good reason, the standard device_lock() is marked
      lockdep_set_novalidate_class() because there is simply no sane way to
      describe the myriad ways the device_lock() ordered with other locks.
      However, that leaves subsystems that know their own local device_lock()
      ordering rules to find lock ordering mistakes manually. Instead,
      introduce an optional / additional lockdep-enabled lock that a subsystem
      can acquire in all the same paths that the device_lock() is acquired.
      A conversion of the NFIT driver and NVDIMM subsystem to a
      lockdep-validate device_lock() scheme is included. The
      debug_nvdimm_lock() implementation implements the correct lock-class and
      stacking order for the libnvdimm device topology hierarchy.
      Yes, this is a hack, but hopefully it is a useful hack for other
      subsystems device_lock() debug sessions. Quoting Greg:
          "Yeah, it feels a bit hacky but it's really up to a subsystem to mess up
           using it as much as anything else, so user beware :)
           I don't object to it if it makes things easier for you to debug."
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Ira Weiny <ira.weiny@intel.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Dave Jiang <dave.jiang@intel.com>
      Cc: Keith Busch <keith.busch@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Vishal Verma <vishal.l.verma@intel.com>
      Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: default avatarDan Williams <dan.j.williams@intel.com>
      Acked-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Reviewed-by: default avatarIra Weiny <ira.weiny@intel.com>
      Link: https://lore.kernel.org/r/156341210661.292348.7014034644265455704.stgit@dwillia2-desk3.amr.corp.intel.com
    • Dan Williams's avatar
      libnvdimm/region: Register badblocks before namespaces · 700cd033
      Dan Williams authored
      Namespace activation expects to be able to reference region badblocks.
      The following warning sometimes triggers when asynchronous namespace
      activation races in front of the completion of namespace probing. Move
      all possible namespace probing after region badblocks initialization.
      Otherwise, lockdep sometimes catches the uninitialized state of the
      badblocks seqlock with stack trace signatures like:
          INFO: trying to register non-static key.
          pmem2: detected capacity change from 0 to 136365211648
          the code is fine but needs lockdep annotation.
          turning off the locking correctness validator.
          CPU: 9 PID: 358 Comm: kworker/u80:5 Tainted: G           OE     5.2.0-rc4+ #3382
          Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.0.0 02/06/2015
          Workqueue: events_unbound async_run_entry_fn
          Call Trace:
          pmem1.12: detected capacity change from 0 to 8589934592
           ? check_object+0x140/0x270
           ? __mutex_lock+0x39d/0x910
           ? nd_pfn_validate+0x28f/0x440 [libnvdimm]
           ? nd_pfn_validate+0x28f/0x440 [libnvdimm]
           nd_pfn_validate+0x28f/0x440 [libnvdimm]
           ? lockdep_hardirqs_on+0xf0/0x180
           nd_dax_probe+0x9a/0x120 [libnvdimm]
           nd_pmem_probe+0x6d/0x180 [nd_pmem]
           nvdimm_bus_probe+0x90/0x2c0 [libnvdimm]
      Fixes: 48af2f7e
       ("libnvdimm, pfn: during init, clear errors...")
      Cc: <stable@vger.kernel.org>
      Cc: Vishal Verma <vishal.l.verma@intel.com>
      Reviewed-by: default avatarVishal Verma <vishal.l.verma@intel.com>
      Link: https://lore.kernel.org/r/156341208365.292348.1547528796026249120.stgit@dwillia2-desk3.amr.corp.intel.com
      Signed-off-by: default avatarDan Williams <dan.j.williams@intel.com>
  3. 05 Jun, 2019 1 commit
  4. 07 Apr, 2018 1 commit
  5. 01 Jul, 2017 1 commit
    • Dan Williams's avatar
      libnvdimm, region, pmem: fix 'badblocks' sysfs_get_dirent() reference lifetime · 6aa734a2
      Dan Williams authored
      We need to hold a reference on the 'dirent' until we are sure there are
      no more notifications that will be sent. As noted in the new comments we
      take advantage of the fact that the references are taken and dropped
      under device_lock() and that nd_device_notify() holds device_lock() over
      new badblocks notifications. The notifications that happen when
      badblocks are cleared only occur while the device is active.
      Also take the opportunity to fix up the error messages to report the
      user visible effect of a sysfs_get_dirent() failure.
      Fixes: 975750a9
       ("libnvdimm, pmem: Add sysfs notifications to badblocks")
      Cc: Toshi Kani <toshi.kani@hpe.com>
      Signed-off-by: default avatarDan Williams <dan.j.williams@intel.com>
  6. 15 Jun, 2017 1 commit
    • Toshi Kani's avatar
      libnvdimm, pmem: Add sysfs notifications to badblocks · 975750a9
      Toshi Kani authored
      Sysfs "badblocks" information may be updated during run-time that:
       - MCE, SCI, and sysfs "scrub" may add new bad blocks
       - Writes and ioctl() may clear bad blocks
      Add support to send sysfs notifications to sysfs "badblocks" file
      under region and pmem directories when their badblocks information
      is re-evaluated (but is not necessarily changed) during run-time.
      Signed-off-by: default avatarToshi Kani <toshi.kani@hpe.com>
      Cc: Vishal Verma <vishal.l.verma@intel.com>
      Cc: Linda Knippers <linda.knippers@hpe.com>
      Signed-off-by: default avatarDan Williams <dan.j.williams@intel.com>
  7. 29 Apr, 2017 1 commit
    • Dan Williams's avatar
      libnvdimm: rework region badblocks clearing · 23f49844
      Dan Williams authored
      Toshi noticed that the new support for a region-level badblocks missed
      the case where errors are cleared due to BTT I/O.
      An initial attempt to fix this ran into a "sleeping while atomic"
      warning due to taking the nvdimm_bus_lock() in the BTT I/O path to
      satisfy the locking requirements of __nvdimm_bus_badblocks_clear().
      However, that lock is not needed since we are not acting on any data that
      is subject to change under that lock. The badblocks instance has its own
      internal lock to handle mutations of the error list.
      So, in order to make it clear that we are just acting on region devices,
      rename __nvdimm_bus_badblocks_clear() to nvdimm_clear_badblocks_regions().
      Eliminate the lock and consolidate all support routines for the new
      nvdimm_account_cleared_poison() in drivers/nvdimm/bus.c. Finally, to the
      opportunity to cleanup to some unnecessary casts, make the calling
      convention of nvdimm_clear_badblocks_regions() clearer by replacing struct
      resource with the minimal struct clear_badblocks_context, and use the
      DEVICE_ATTR macro.
      Cc: Dave Jiang <dave.jiang@intel.com>
      Cc: Vishal Verma <vishal.l.verma@intel.com>
      Reported-by: default avatarToshi Kani <toshi.kani@hpe.com>
      Signed-off-by: default avatarDan Williams <dan.j.williams@intel.com>
  8. 13 Apr, 2017 2 commits
  9. 11 Jul, 2016 2 commits
  10. 09 May, 2016 1 commit
    • Dan Williams's avatar
      libnvdimm, dax: introduce device-dax infrastructure · cd03412a
      Dan Williams authored
      Device DAX is the device-centric analogue of Filesystem DAX
      (CONFIG_FS_DAX).  It allows persistent memory ranges to be allocated and
      mapped without need of an intervening file system.  This initial
      infrastructure arranges for a libnvdimm pfn-device to be represented as
      a different device-type so that it can be attached to a driver other
      than the pmem driver.
      Signed-off-by: default avatarDan Williams <dan.j.williams@intel.com>
  11. 05 Mar, 2016 1 commit
  12. 29 Aug, 2015 1 commit
    • Dan Williams's avatar
      libnvdimm, pfn: 'struct page' provider infrastructure · e1455744
      Dan Williams authored
      Implement the base infrastructure for libnvdimm PFN devices. Similar to
      BTT devices they take a namespace as a backing device and layer
      functionality on top. In this case the functionality is reserving space
      for an array of 'struct page' entries to be handed out through
      pfn_to_page(). For now this is just the basic libnvdimm-device-model for
      configuring the base PFN device.
      As the namespace claiming mechanism for PFN devices is mostly identical
      to BTT devices drivers/nvdimm/claim.c is created to house the common
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Signed-off-by: default avatarDan Williams <dan.j.williams@intel.com>
  13. 26 Jun, 2015 2 commits
    • Ross Zwisler's avatar
      libnvdimm, nfit, nd_blk: driver for BLK-mode access persistent memory · 047fc8a1
      Ross Zwisler authored
      The libnvdimm implementation handles allocating dimm address space (DPA)
      between PMEM and BLK mode interfaces.  After DPA has been allocated from
      a BLK-region to a BLK-namespace the nd_blk driver attaches to handle I/O
      as a struct bio based block device. Unlike PMEM, BLK is required to
      handle platform specific details like mmio register formats and memory
      controller interleave.  For this reason the libnvdimm generic nd_blk
      driver calls back into the bus provider to carry out the I/O.
      This initial implementation handles the BLK interface defined by the
      ACPI 6 NFIT [1] and the NVDIMM DSM Interface Example [2] composed from
      DCR (dimm control region), BDW (block data window), IDT (interleave
      descriptor) NFIT structures and the hardware register format.
      [1]: http://www.uefi.org/sites/default/files/resources/ACPI_6.0.pdf
      [2]: http://pmem.io/documents/NVDIMM_DSM_Interface_Example.pdf
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Boaz Harrosh <boaz@plexistor.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Jens Axboe <axboe@fb.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarRoss Zwisler <ross.zwisler@linux.intel.com>
      Acked-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: default avatarDan Williams <dan.j.williams@intel.com>
    • Vishal Verma's avatar
      nd_btt: atomic sector updates · 5212e11f
      Vishal Verma authored
      BTT stands for Block Translation Table, and is a way to provide power
      fail sector atomicity semantics for block devices that have the ability
      to perform byte granularity IO. It relies on the capability of libnvdimm
      namespace devices to do byte aligned IO.
      The BTT works as a stacked blocked device, and reserves a chunk of space
      from the backing device for its accounting metadata. It is a bio-based
      driver because all IO is done synchronously, and there is no queuing or
      asynchronous completions at either the device or the driver level.
      The BTT uses 'lanes' to index into various 'on-disk' data structures,
      and lanes also act as a synchronization mechanism in case there are more
      CPUs than available lanes. We did a comparison between two lane lock
      strategies - first where we kept an atomic counter around that tracked
      which was the last lane that was used, and 'our' lane was determined by
      atomically incrementing that. That way, for the nr_cpus > nr_lanes case,
      theoretically, no CPU would be blocked waiting for a lane. The other
      strategy was to use the cpu number we're scheduled on to and hash it to
      a lane number. Theoretically, this could block an IO that could've
      otherwise run using a different, free lane. But some fio workloads
      showed that the direct cpu -> lane hash performed faster than tracking
      'last lane' - my reasoning is the cache thrash caused by moving the
      atomic variable made that approach slower than simply waiting out the
      in-progress IO. This supports the conclusion that the driver can be a
      very simple bio-based one that does synchronous IOs instead of queuing.
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Boaz Harrosh <boaz@plexistor.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Jens Axboe <axboe@fb.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Neil Brown <neilb@suse.de>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      [jmoyer: fix nmi watchdog timeout in btt_map_init]
      [jmoyer: move btt initialization to module load path]
      [jmoyer: fix memory leak in the btt initialization path]
      [jmoyer: Don't overwrite corrupted arenas]
      Signed-off-by: default avatarVishal Verma <vishal.l.verma@linux.intel.com>
      Signed-off-by: default avatarDan Williams <dan.j.williams@intel.com>
  14. 25 Jun, 2015 3 commits
    • Dan Williams's avatar
      libnvdimm: infrastructure for btt devices · 8c2f7e86
      Dan Williams authored
      NVDIMM namespaces, in addition to accepting "struct bio" based requests,
      also have the capability to perform byte-aligned accesses.  By default
      only the bio/block interface is used.  However, if another driver can
      make effective use of the byte-aligned capability it can claim namespace
      interface and use the byte-aligned ->rw_bytes() interface.
      The BTT driver is the initial first consumer of this mechanism to allow
      adding atomic sector update semantics to a pmem or blk namespace.  This
      patch is the sysfs infrastructure to allow configuring a BTT instance
      for a namespace.  Enabling that BTT and performing i/o is in a
      subsequent patch.
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: Neil Brown <neilb@suse.de>
      Signed-off-by: default avatarDan Williams <dan.j.williams@intel.com>
    • Dan Williams's avatar
      libnvdimm: pmem label sets and namespace instantiation. · bf9bccc1
      Dan Williams authored
      A complete label set is a PMEM-label per-dimm per-interleave-set where
      all the UUIDs match and the interleave set cookie matches the hosting
      interleave set.
      Present sysfs attributes for manipulation of a PMEM-namespace's
      'alt_name', 'uuid', and 'size' attributes.  A later patch will make
      these settings persistent by writing back the label.
      Note that PMEM allocations grow forwards from the start of an interleave
      set (lowest dimm-physical-address (DPA)).  BLK-namespaces that alias
      with a PMEM interleave set will grow allocations backward from the
      highest DPA.
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: Neil Brown <neilb@suse.de>
      Acked-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarDan Williams <dan.j.williams@intel.com>
    • Dan Williams's avatar
      libnvdimm: support for legacy (non-aliasing) nvdimms · 3d88002e
      Dan Williams authored
      The libnvdimm region driver is an intermediary driver that translates
      non-volatile "region"s into "namespace" sub-devices that are surfaced by
      persistent memory block-device drivers (PMEM and BLK).
      ACPI 6 introduces the concept that a given nvdimm may simultaneously
      offer multiple access modes to its media through direct PMEM load/store
      access, or windowed BLK mode.  Existing nvdimms mostly implement a PMEM
      interface, some offer a BLK-like mode, but never both as ACPI 6 defines.
      If an nvdimm is single interfaced, then there is no need for dimm
      metadata labels.  For these devices we can take the region boundaries
      directly to create a child namespace device (nd_namespace_io).
      Acked-by: default avatarChristoph Hellwig <hch@lst.de>
      Tested-by: default avatarToshi Kani <toshi.kani@hp.com>
      Signed-off-by: default avatarDan Williams <dan.j.williams@intel.com>