1. 24 Sep, 2019 1 commit
    • Aneesh Kumar K.V's avatar
      libnvdimm/region: Initialize bad block for volatile namespaces · c42adf87
      Aneesh Kumar K.V authored
      
      
      We do check for a bad block during namespace init and that use
      region bad block list. We need to initialize the bad block
      for volatile regions for this to work. We also observe a lockdep
      warning as below because the lock is not initialized correctly
      since we skip bad block init for volatile regions.
      
       INFO: trying to register non-static key.
       the code is fine but needs lockdep annotation.
       turning off the locking correctness validator.
       CPU: 2 PID: 1 Comm: swapper/0 Not tainted 5.3.0-rc1-15699-g3dee241c937e #149
       Call Trace:
       [c0000000f95cb250] [c00000000147dd84] dump_stack+0xe8/0x164 (unreliable)
       [c0000000f95cb2a0] [c00000000022ccd8] register_lock_class+0x308/0xa60
       [c0000000f95cb3a0] [c000000000229cc0] __lock_acquire+0x170/0x1ff0
       [c0000000f95cb4c0] [c00000000022c740] lock_acquire+0x220/0x270
       [c0000000f95cb580] [c000000000a93230] badblocks_check+0xc0/0x290
       [c0000000f95cb5f0] [c000000000d97540] nd_pfn_validate+0x5c0/0x7f0
       [c0000000f95cb6d0] [c000000000d98300] nd_dax_probe+0xd0/0x1f0
       [c0000000f95cb760] [c000000000d9b66c] nd_pmem_probe+0x10c/0x160
       [c0000000f95cb790] [c000000000d7f5ec] nvdimm_bus_probe+0x10c/0x240
       [c0000000f95cb820] [c000000000d0f844] really_probe+0x254/0x4e0
       [c0000000f95cb8b0] [c000000000d0fdfc] driver_probe_device+0x16c/0x1e0
       [c0000000f95cb930] [c000000000d10238] device_driver_attach+0x68/0xa0
       [c0000000f95cb970] [c000000000d1040c] __driver_attach+0x19c/0x1c0
       [c0000000f95cb9f0] [c000000000d0c4c4] bus_for_each_dev+0x94/0x130
       [c0000000f95cba50] [c000000000d0f014] driver_attach+0x34/0x50
       [c0000000f95cba70] [c000000000d0e208] bus_add_driver+0x178/0x2f0
       [c0000000f95cbb00] [c000000000d117c8] driver_register+0x108/0x170
       [c0000000f95cbb70] [c000000000d7edb0] __nd_driver_register+0xe0/0x100
       [c0000000f95cbbd0] [c000000001a6baa4] nd_pmem_driver_init+0x34/0x48
       [c0000000f95cbbf0] [c0000000000106f4] do_one_initcall+0x1d4/0x4b0
       [c0000000f95cbcd0] [c0000000019f499c] kernel_init_freeable+0x544/0x65c
       [c0000000f95cbdb0] [c000000000010d6c] kernel_init+0x2c/0x180
       [c0000000f95cbe20] [c00000000000b954] ret_from_kernel_thread+0x5c/0x68
      Signed-off-by: default avatarAneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Link: https://lore.kernel.org/r/20190919083355.26340-1-aneesh.kumar@linux.ibm.com
      
      Signed-off-by: default avatarDan Williams <dan.j.williams@intel.com>
      c42adf87
  2. 05 Sep, 2019 2 commits
  3. 29 Aug, 2019 1 commit
  4. 18 Jul, 2019 5 commits
  5. 05 Jun, 2019 1 commit
  6. 20 May, 2019 1 commit
    • Qian Cai's avatar
      libnvdimm: Fix compilation warnings with W=1 · c01dafad
      Qian Cai authored
      
      
      Several places (dimm_devs.c, core.c etc) include label.h but only
      label.c uses NSINDEX_SIGNATURE, so move its definition to label.c
      instead.
      
      In file included from drivers/nvdimm/dimm_devs.c:23:
      drivers/nvdimm/label.h:41:19: warning: 'NSINDEX_SIGNATURE' defined but
      not used [-Wunused-const-variable=]
      
      Also, some places abuse "/**" which is only reserved for the kernel-doc.
      
      drivers/nvdimm/bus.c:648: warning: cannot understand function prototype:
      'struct attribute_group nd_device_attribute_group = '
      drivers/nvdimm/bus.c:677: warning: cannot understand function prototype:
      'struct attribute_group nd_numa_attribute_group = '
      
      Those are just some member assignments for the "struct attribute_group"
      instances and it can't be expressed in the kernel-doc.
      Reviewed-by: default avatarVishal Verma <vishal.l.verma@intel.com>
      Signed-off-by: default avatarQian Cai <cai@lca.pw>
      Signed-off-by: default avatarDan Williams <dan.j.williams@intel.com>
      c01dafad
  7. 09 Apr, 2019 1 commit
    • Sakari Ailus's avatar
      treewide: Switch printk users from %pf and %pF to %ps and %pS, respectively · d75f773c
      Sakari Ailus authored
      %pF and %pf are functionally equivalent to %pS and %ps conversion
      specifiers. The former are deprecated, therefore switch the current users
      to use the preferred variant.
      
      The changes have been produced by the following command:
      
      	git grep -l '%p[fF]' | grep -v '^\(tools\|Documentation\)/' | \
      	while read i; do perl -i -pe 's/%pf/%ps/g; s/%pF/%pS/g;' $i; done
      
      And verifying the result.
      
      Link: http://lkml.kernel.org/r/20190325193229.23390-1-sakari.ailus@linux.intel.com
      
      
      Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: sparclinux@vger.kernel.org
      Cc: linux-um@lists.infradead.org
      Cc: xen-devel@lists.xenproject.org
      Cc: linux-acpi@vger.kernel.org
      Cc: linux-pm@vger.kernel.org
      Cc: drbd-dev@lists.linbit.com
      Cc: linux-block@vger.kernel.org
      Cc: linux-mmc@vger.kernel.org
      Cc: linux-nvdimm@lists.01.org
      Cc: linux-pci@vger.kernel.org
      Cc: linux-scsi@vger.kernel.org
      Cc: linux-btrfs@vger.kernel.org
      Cc: linux-f2fs-devel@lists.sourceforge.net
      Cc: linux-mm@kvack.org
      Cc: ceph-devel@vger.kernel.org
      Cc: netdev@vger.kernel.org
      Signed-off-by: default avatarSakari Ailus <sakari.ailus@linux.intel.com>
      Acked-by: David Sterba <dsterba@suse.com> (for btrfs)
      Acked-by: Mike Rapoport <rppt@linux.ibm.com> (for mm/memblock.c)
      Acked-by: Bjorn Helgaas <bhelgaas@google.com> (for drivers/pci)
      Acked-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: default avatarPetr Mladek <pmladek@suse.com>
      d75f773c
  8. 31 Jan, 2019 1 commit
  9. 21 Dec, 2018 1 commit
    • Dave Jiang's avatar
      acpi/nfit, libnvdimm/security: Add security DSM overwrite support · 7d988097
      Dave Jiang authored
      
      
      Add support for the NVDIMM_FAMILY_INTEL "ovewrite" capability as
      described by the Intel DSM spec v1.7. This will allow triggering of
      overwrite on Intel NVDIMMs. The overwrite operation can take tens of
      minutes. When the overwrite DSM is issued successfully, the NVDIMMs will
      be unaccessible. The kernel will do backoff polling to detect when the
      overwrite process is completed. According to the DSM spec v1.7, the 128G
      NVDIMMs can take up to 15mins to perform overwrite and larger DIMMs will
      take longer.
      
      Given that overwrite puts the DIMM in an indeterminate state until it
      completes introduce the NDD_SECURITY_OVERWRITE flag to prevent other
      operations from executing when overwrite is happening. The
      NDD_WORK_PENDING flag is added to denote that there is a device reference
      on the nvdimm device for an async workqueue thread context.
      Signed-off-by: default avatarDave Jiang <dave.jiang@intel.com>
      Signed-off-by: default avatarDan Williams <dan.j.williams@intel.com>
      7d988097
  10. 14 Dec, 2018 1 commit
    • Dave Jiang's avatar
      acpi/nfit, libnvdimm: Introduce nvdimm_security_ops · f2989396
      Dave Jiang authored
      
      
      Some NVDIMMs, like the ones defined by the NVDIMM_FAMILY_INTEL command
      set, expose a security capability to lock the DIMMs at poweroff and
      require a passphrase to unlock them. The security model is derived from
      ATA security. In anticipation of other DIMMs implementing a similar
      scheme, and to abstract the core security implementation away from the
      device-specific details, introduce nvdimm_security_ops.
      
      Initially only a status retrieval operation, ->state(), is defined,
      along with the base infrastructure and definitions for future
      operations.
      Signed-off-by: default avatarDave Jiang <dave.jiang@intel.com>
      Co-developed-by: default avatarDan Williams <dan.j.williams@intel.com>
      Signed-off-by: default avatarDan Williams <dan.j.williams@intel.com>
      f2989396
  11. 10 Dec, 2018 1 commit
  12. 04 Dec, 2018 1 commit
    • Dave Jiang's avatar
      acpi/nfit: Add support for Intel DSM 1.8 commands · b3ed2ce0
      Dave Jiang authored
      Add command definition for security commands defined in Intel DSM
      specification v1.8 [1]. This includes "get security state", "set
      passphrase", "unlock unit", "freeze lock", "secure erase", "overwrite",
      "overwrite query", "master passphrase enable/disable", and "master
      erase", . Since this adds several Intel definitions, move the relevant
      bits to their own header.
      
      These commands mutate physical data, but that manipulation is not cache
      coherent. The requirement to flush and invalidate caches makes these
      commands unsuitable to be called from userspace, so extra logic is added
      to detect and block these commands from being submitted via the ioctl
      command submission path.
      
      Lastly, the commands may contain sensitive key material that should not
      be dumped in a standard debug session. Update the nvdimm-command
      payload-dump facility to move security command payloads behind a
      default-off compile time switch.
      
      [1]: http://pmem.io/documents/NVDIMM_DSM_Interface-V1.8.pdf
      
      Signed-off-by: default avatarDave Jiang <dave.jiang@intel.com>
      Signed-off-by: default avatarDan Williams <dan.j.williams@intel.com>
      b3ed2ce0
  13. 26 Sep, 2018 2 commits
  14. 20 Aug, 2018 1 commit
    • Vishal Verma's avatar
      libnvdimm: fix ars_status output length calculation · 286e8771
      Vishal Verma authored
      Commit efda1b5d ("acpi, nfit, libnvdimm: fix / harden ars_status output length handling")
      Introduced additional hardening for ambiguity in the ACPI spec for
      ars_status output sizing. However, it had a couple of cases mixed up.
      Where it should have been checking for (and returning) "out_field[1] -
      4" it was using "out_field[1] - 8" and vice versa.
      
      This caused a four byte discrepancy in the buffer size passed on to
      the command handler, and in some cases, this caused memory corruption
      like:
      
        ./daxdev-errors.sh: line 76: 24104 Aborted   (core dumped) ./daxdev-errors $busdev $region
        malloc(): memory corruption
        Program received signal SIGABRT, Aborted.
        [...]
        #5  0x00007ffff7865a2e in calloc () from /lib64/libc.so.6
        #6  0x00007ffff7bc2970 in ndctl_bus_cmd_new_ars_status (ars_cap=ars_cap@entry=0x6153b0) at ars.c:136
        #7  0x0000000000401644 in check_ars_status (check=0x7fffffffdeb0, bus=0x604c20) at daxdev-errors.c:144
        #8  test_daxdev_clear_error (region_name=<optimized out>, bus_name=<optimized out>)
            at daxdev-errors.c:332
      
      Cc: <stable@vger.kernel.org>
      Cc: Dave Jiang <dave.jiang@intel.com>
      Cc: Keith Busch <keith.busch@intel.com>
      Cc: Lukasz Dorau <lukasz.dorau@intel.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Fixes: efda1b5d
      
       ("acpi, nfit, libnvdimm: fix / harden ars_status output length handling")
      Signed-off-by: default avatarVishal Verma <vishal.l.verma@intel.com>
      Reviewed-by: default avatarKeith Busch <keith.busch@intel.com>
      Signed-of-by: default avatarDave Jiang <dave.jiang@intel.com>
      286e8771
  15. 03 Jun, 2018 1 commit
  16. 01 Jun, 2018 1 commit
    • Robert Elliott's avatar
      linvdimm, pmem: Preserve read-only setting for pmem devices · 254a4cd5
      Robert Elliott authored
      The pmem driver does not honor a forced read-only setting for very long:
      	$ blockdev --setro /dev/pmem0
      	$ blockdev --getro /dev/pmem0
      	1
      
      followed by various commands like these:
      	$ blockdev --rereadpt /dev/pmem0
      	or
      	$ mkfs.ext4 /dev/pmem0
      
      results in this in the kernel serial log:
      	 nd_pmem namespace0.0: region0 read-write, marking pmem0 read-write
      
      with the read-only setting lost:
      	$ blockdev --getro /dev/pmem0
      	0
      
      That's from bus.c nvdimm_revalidate_disk(), which always applies the
      setting from nd_region (which is initially based on the ACPI NFIT
      NVDIMM state flags not_armed bit).
      
      In contrast, commit 20bd1d02 ("scsi: sd: Keep disk read-only when
      re-reading partition") fixed this issue for SCSI devices to preserve
      the previous setting if it was set to read-only.
      
      This patch modifies bus.c to preserve any previous read-only setting.
      It also eliminates the kernel serial log print except for cases where
      read-write is changed to read-only, so it doesn't print read-only to
      read-only non-changes.
      
      Cc: <stable@vger.kernel.org>
      Fixes: 58138820
      
       ("libnvdimm, nfit: handle unarmed dimms, mark namespaces read-only")
      Signed-off-by: default avatarRobert Elliott <elliott@hpe.com>
      Signed-off-by: default avatarDan Williams <dan.j.williams@intel.com>
      254a4cd5
  17. 07 Apr, 2018 1 commit
  18. 06 Mar, 2018 1 commit
  19. 04 Dec, 2017 1 commit
    • Dan Williams's avatar
      nfit, libnvdimm: deprecate the generic SMART ioctl · cdd77d3e
      Dan Williams authored
      The kernel's ND_IOCTL_SMART_THRESHOLD command is based on a payload
      definition that has become broken / out-of-sync with recent versions of
      the NVDIMM_FAMILY_INTEL definition. Deprecate the use of the
      ND_IOCTL_SMART_THRESHOLD command in favor of the ND_CMD_CALL approach
      taken by NVDIMM_FAMILY_{HPE,MSFT}, where we can manage the per-vendor
      variance in userspace.
      
      In a couple years, when the new scheme is widely deployed in userspace
      packages, the ND_IOCTL_SMART_THRESHOLD support can be removed. For now
      we prevent new binaries from compiling against the kernel header
      definitions, but kernel still compatible with old binaries. The
      libndctl.h [1] header is now the authoritative interface definition for
      NVDIMM SMART.
      
      [1]: https://github.com/pmem/ndctl
      
      Signed-off-by: default avatarDan Williams <dan.j.williams@intel.com>
      cdd77d3e
  20. 02 Nov, 2017 1 commit
  21. 04 Sep, 2017 1 commit
    • Meng Xu's avatar
      libnvdimm, nfit: move the check on nd_reserved2 to the endpoint · 9edcad53
      Meng Xu authored
      
      
      Delay the check of nd_reserved2 to the actual endpoint (acpi_nfit_ctl)
      that uses it, as a prevention of a potential double-fetch bug.
      
      While examining the kernel source code, I found a dangerous operation that
      could turn into a double-fetch situation (a race condition bug) where
      the same userspace memory region are fetched twice into kernel with sanity
      checks after the first fetch while missing checks after the second fetch.
      
      In the case of _IOC_NR(ioctl_cmd) == ND_CMD_CALL:
      
      1. The first fetch happens in line 935 copy_from_user(&pkg, p, sizeof(pkg)
      
      2. subsequently `pkg.nd_reserved2` is asserted to be all zeroes
      (line 984 to 986).
      
      3. The second fetch happens in line 1022 copy_from_user(buf, p, buf_len)
      
      4. Given that `p` can be fully controlled in userspace, an attacker can
      race condition to override the header part of `p`, say,
      `((struct nd_cmd_pkg *)p)->nd_reserved2` to arbitrary value
      (say nine 0xFFFFFFFF for `nd_reserved2`) after the first fetch but before the
      second fetch. The changed value will be copied to `buf`.
      
      5. There is no checks on the second fetches until the use of it in
      line 1034: nd_cmd_clear_to_send(nvdimm_bus, nvdimm, cmd, buf) and
      line 1038: nd_desc->ndctl(nd_desc, nvdimm, cmd, buf, buf_len, &cmd_rc)
      which means that the assumed relation, `p->nd_reserved2` are all zeroes might
      not hold after the second fetch. And once the control goes to these functions
      we lose the context to assert the assumed relation.
      
      6. Based on my manual analysis, `p->nd_reserved2` is not used in function
      `nd_cmd_clear_to_send` and potential implementations of `nd_desc->ndctl`
      so there is no working exploit against it right now. However, this could
      easily turns to an exploitable one if careless developers start to use
      `p->nd_reserved2` later and assume that they are all zeroes.
      
      Move the validation of the nd_reserved2 field to the ->ndctl()
      implementation where it has a stable buffer to evaluate.
      Signed-off-by: default avatarMeng Xu <mengxu.gatech@gmail.com>
      Signed-off-by: default avatarDan Williams <dan.j.williams@intel.com>
      9edcad53
  22. 31 Aug, 2017 2 commits
    • Dan Williams's avatar
      libnvdimm: fix integer overflow static analysis warning · 58738c49
      Dan Williams authored
      Dan reports:
          The patch 62232e45: "libnvdimm: control (ioctl) messages for
          nvdimm_bus and nvdimm devices" from Jun 8, 2015, leads to the
          following static checker warning:
      
                  drivers/nvdimm/bus.c:1018 __nd_ioctl()
                  warn: integer overflows 'buf_len'
      
          From a casual review, this seems like it might be a real bug.  On
          the first iteration we load some data into in_env[].  On the second
          iteration we read a use controlled "in_size" from nd_cmd_in_size().
          It can go up to UINT_MAX - 1.  A high number means we will fill the
          whole in_env[] buffer.  But we potentially keep looping and adding
          more to in_len so now it can be any value.
      
          It simple enough to change, but it feels weird that we keep looping
          even though in_env is totally full.  Shouldn't we just return an
          error if we don't have space for desc->in_num.
      
      We keep looping because the size of the total input is allowed to be
      bigger than the 'envelope' which is a subset of the payload that tells
      us how much data to expect. For safety explicitly check that buf_len
      does not overflow which is what the checker flagged.
      
      Cc: <stable@vger.kernel.org>
      Fixes: 62232e45
      
      : "libnvdimm: control (ioctl) messages for nvdimm_bus..."
      Reported-by: default avatarDan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: default avatarDan Williams <dan.j.williams@intel.com>
      58738c49
    • Vishal Verma's avatar
      libnvdimm: fix potential deadlock while clearing errors · 0930a750
      Vishal Verma authored
      
      
      With the ACPI NFIT 'DSM' methods, acpi can be called from IO paths.
      Specifically, the DSM to clear media errors is called during writes, so
      that we can provide a writes-fix-errors model.
      
      However it is easy to imagine a scenario like:
       -> write through the nvdimm driver
         -> acpi allocation
           -> writeback, causes more IO through the nvdimm driver
             -> deadlock
      
      Fix this by using memalloc_noio_{save,restore}, which sets the GFP_NOIO
      flag for the current scope when issuing commands/IOs that are expected
      to clear errors.
      
      Cc: <linux-acpi@vger.kernel.org>
      Cc: <linux-nvdimm@lists.01.org>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Robert Moore <robert.moore@intel.com>
      Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: default avatarVishal Verma <vishal.l.verma@intel.com>
      Signed-off-by: default avatarDan Williams <dan.j.williams@intel.com>
      0930a750
  23. 01 Jul, 2017 1 commit
  24. 27 Jun, 2017 1 commit
    • Dan Williams's avatar
      libnvdimm, nfit: enable support for volatile ranges · c9e582aa
      Dan Williams authored
      
      
      Allow volatile nfit ranges to participate in all the same infrastructure
      provided for persistent memory regions. A resulting resulting namespace
      device will still be called "pmem", but the parent region type will be
      "nd_volatile". This is in preparation for disabling the dax ->flush()
      operation in the pmem driver when it is hosted on a volatile range.
      
      Cc: Jan Kara <jack@suse.cz>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Matthew Wilcox <mawilcox@microsoft.com>
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Signed-off-by: default avatarDan Williams <dan.j.williams@intel.com>
      c9e582aa
  25. 15 Jun, 2017 1 commit
    • Toshi Kani's avatar
      libnvdimm, pmem: Add sysfs notifications to badblocks · 975750a9
      Toshi Kani authored
      
      
      Sysfs "badblocks" information may be updated during run-time that:
       - MCE, SCI, and sysfs "scrub" may add new bad blocks
       - Writes and ioctl() may clear bad blocks
      
      Add support to send sysfs notifications to sysfs "badblocks" file
      under region and pmem directories when their badblocks information
      is re-evaluated (but is not necessarily changed) during run-time.
      Signed-off-by: default avatarToshi Kani <toshi.kani@hpe.com>
      Cc: Vishal Verma <vishal.l.verma@intel.com>
      Cc: Linda Knippers <linda.knippers@hpe.com>
      Signed-off-by: default avatarDan Williams <dan.j.williams@intel.com>
      975750a9
  26. 29 Apr, 2017 1 commit
    • Dan Williams's avatar
      libnvdimm: rework region badblocks clearing · 23f49844
      Dan Williams authored
      
      
      Toshi noticed that the new support for a region-level badblocks missed
      the case where errors are cleared due to BTT I/O.
      
      An initial attempt to fix this ran into a "sleeping while atomic"
      warning due to taking the nvdimm_bus_lock() in the BTT I/O path to
      satisfy the locking requirements of __nvdimm_bus_badblocks_clear().
      However, that lock is not needed since we are not acting on any data that
      is subject to change under that lock. The badblocks instance has its own
      internal lock to handle mutations of the error list.
      
      So, in order to make it clear that we are just acting on region devices,
      rename __nvdimm_bus_badblocks_clear() to nvdimm_clear_badblocks_regions().
      Eliminate the lock and consolidate all support routines for the new
      nvdimm_account_cleared_poison() in drivers/nvdimm/bus.c. Finally, to the
      opportunity to cleanup to some unnecessary casts, make the calling
      convention of nvdimm_clear_badblocks_regions() clearer by replacing struct
      resource with the minimal struct clear_badblocks_context, and use the
      DEVICE_ATTR macro.
      
      Cc: Dave Jiang <dave.jiang@intel.com>
      Cc: Vishal Verma <vishal.l.verma@intel.com>
      Reported-by: default avatarToshi Kani <toshi.kani@hpe.com>
      Signed-off-by: default avatarDan Williams <dan.j.williams@intel.com>
      23f49844
  27. 28 Apr, 2017 1 commit
  28. 13 Apr, 2017 2 commits
    • Dave Jiang's avatar
      libnvdimm: fix clear poison locking with spinlock and GFP_NOWAIT allocation · b3b454f6
      Dave Jiang authored
      
      
      The following warning results from holding a lane spinlock,
      preempt_disable(), or the btt map spinlock and then trying to take the
      reconfig_mutex to walk the poison list and potentially add new entries.
      
      BUG: sleeping function called from invalid context at kernel/locking/mutex.
      c:747
      in_atomic(): 1, irqs_disabled(): 0, pid: 17159, name: dd
      [..]
      Call Trace:
      dump_stack+0x85/0xc8
      ___might_sleep+0x184/0x250
      __might_sleep+0x4a/0x90
      __mutex_lock+0x58/0x9b0
      ? nvdimm_bus_lock+0x21/0x30 [libnvdimm]
      ? __nvdimm_bus_badblocks_clear+0x2f/0x60 [libnvdimm]
      ? acpi_nfit_forget_poison+0x79/0x80 [nfit]
      ? _raw_spin_unlock+0x27/0x40
      mutex_lock_nested+0x1b/0x20
      nvdimm_bus_lock+0x21/0x30 [libnvdimm]
      nvdimm_forget_poison+0x25/0x50 [libnvdimm]
      nvdimm_clear_poison+0x106/0x140 [libnvdimm]
      nsio_rw_bytes+0x164/0x270 [libnvdimm]
      btt_write_pg+0x1de/0x3e0 [nd_btt]
      ? blk_queue_enter+0x30/0x290
      btt_make_request+0x11a/0x310 [nd_btt]
      ? blk_queue_enter+0xb7/0x290
      ? blk_queue_enter+0x30/0x290
      generic_make_request+0x118/0x3b0
      
      A spinlock is introduced to protect the poison list. This allows us to not
      having to acquire the reconfig_mutex for touching the poison list. The
      add_poison() function has been broken out into two helper functions. One to
      allocate the poison entry and the other to apppend the entry. This allows us
      to unlock the poison_lock in non-I/O path and continue to be able to allocate
      the poison entry with GFP_KERNEL. We will use GFP_NOWAIT in the I/O path in
      order to satisfy being in atomic context.
      Reviewed-by: default avatarVishal Verma <vishal.l.verma@intel.com>
      Signed-off-by: default avatarDave Jiang <dave.jiang@intel.com>
      Signed-off-by: default avatarDan Williams <dan.j.williams@intel.com>
      b3b454f6
    • Dave Jiang's avatar
      libnvdimm: add support for clear poison list and badblocks for device dax · 006358b3
      Dave Jiang authored
      
      
      Providing mechanism to clear poison list via the ndctl ND_CMD_CLEAR_ERROR
      call. We will update the poison list and also the badblocks at region level
      if the region is in dax mode or in pmem mode and not active. In other
      words we force badblocks to be cleared through write requests if the
      address is currently accessed through a block device, otherwise it can
      only be done via the ioctl+dsm path.
      Signed-off-by: default avatarDave Jiang <dave.jiang@intel.com>
      Reviewed-by: default avatarJohannes Thumshirn <jthumshirn@suse.de>
      Signed-off-by: default avatarDan Williams <dan.j.williams@intel.com>
      006358b3
  29. 11 Apr, 2017 1 commit
    • Dan Williams's avatar
      libnvdimm: fix reconfig_mutex, mmap_sem, and jbd2_handle lockdep splat · 0beb2012
      Dan Williams authored
      Holding the reconfig_mutex over a potential userspace fault sets up a
      lockdep dependency chain between filesystem-DAX and the libnvdimm ioctl
      path. Move the user access outside of the lock.
      
           [ INFO: possible circular locking dependency detected ]
           4.11.0-rc3+ #13 Tainted: G        W  O
           -------------------------------------------------------
           fallocate/16656 is trying to acquire lock:
            (&nvdimm_bus->reconfig_mutex){+.+.+.}, at: [<ffffffffa00080b1>] nvdimm_bus_lock+0x21/0x30 [libnvdimm]
           but task is already holding lock:
            (jbd2_handle){++++..}, at: [<ffffffff813b4944>] start_this_handle+0x104/0x460
      
          which lock already depends on the new lock.
      
          the existing dependency chain (in reverse order) is:
      
          -> #2 (jbd2_handle){++++..}:
                  lock_acquire+0xbd/0x200
                  start_this_handle+0x16a/0x460
                  jbd2__journal_start+0xe9/0x2d0
                  __ext4_journal_start_sb+0x89/0x1c0
                  ext4_dirty_inode+0x32/0x70
                  __mark_inode_dirty+0x235/0x670
                  generic_update_time+0x87/0xd0
                  touch_atime+0xa9/0xd0
                  ext4_file_mmap+0x90/0xb0
                  mmap_region+0x370/0x5b0
                  do_mmap+0x415/0x4f0
                  vm_mmap_pgoff+0xd7/0x120
                  SyS_mmap_pgoff+0x1c5/0x290
                  SyS_mmap+0x22/0x30
                  entry_SYSCALL_64_fastpath+0x1f/0xc2
      
          -> #1 (&mm->mmap_sem){++++++}:
                  lock_acquire+0xbd/0x200
                  __might_fault+0x70/0xa0
                  __nd_ioctl+0x683/0x720 [libnvdimm]
                  nvdimm_ioctl+0x8b/0xe0 [libnvdimm]
                  do_vfs_ioctl+0xa8/0x740
                  SyS_ioctl+0x79/0x90
                  do_syscall_64+0x6c/0x200
                  return_from_SYSCALL_64+0x0/0x7a
      
          -> #0 (&nvdimm_bus->reconfig_mutex){+.+.+.}:
                  __lock_acquire+0x16b6/0x1730
                  lock_acquire+0xbd/0x200
                  __mutex_lock+0x88/0x9b0
                  mutex_lock_nested+0x1b/0x20
                  nvdimm_bus_lock+0x21/0x30 [libnvdimm]
                  nvdimm_forget_poison+0x25/0x50 [libnvdimm]
                  nvdimm_clear_poison+0x106/0x140 [libnvdimm]
                  pmem_do_bvec+0x1c2/0x2b0 [nd_pmem]
                  pmem_make_request+0xf9/0x270 [nd_pmem]
                  generic_make_request+0x118/0x3b0
                  submit_bio+0x75/0x150
      
      Cc: <stable@vger.kernel.org>
      Fixes: 62232e45
      
       ("libnvdimm: control (ioctl) messages for nvdimm_bus and nvdimm devices")
      Cc: Dave Jiang <dave.jiang@intel.com>
      Reported-by: default avatarVishal Verma <vishal.l.verma@intel.com>
      Signed-off-by: default avatarDan Williams <dan.j.williams@intel.com>
      0beb2012
  30. 07 Dec, 2016 1 commit
    • Dan Williams's avatar
      acpi, nfit, libnvdimm: fix / harden ars_status output length handling · efda1b5d
      Dan Williams authored
      Given ambiguities in the ACPI 6.1 definition of the "Output (Size)"
      field of the ARS (Address Range Scrub) Status command, a firmware
      implementation may in practice return 0, 4, or 8 to indicate that there
      is no output payload to process.
      
      The specification states "Size of Output Buffer in bytes, including this
      field.". However, 'Output Buffer' is also the name of the entire
      payload, and earlier in the specification it states "Max Query ARS
      Status Output Buffer Size: Maximum size of buffer (including the Status
      and Extended Status fields)".
      
      Without this fix if the BIOS happens to return 0 it causes memory
      corruption as evidenced by this result from the acpi_nfit_ctl() unit
      test.
      
       ars_status00000000: 00020000 00000000                    ........
       BUG: stack guard page was hit at ffffc90001750000 (stack is ffffc9000174c000..ffffc9000174ffff)
       kernel stack overflow (page fault): 0000 [#1] SMP DEBUG_PAGEALLOC
       task: ffff8803332d2ec0 task.stack: ffffc9000174c000
       RIP: 0010:[<ffffffff814cfe72>]  [<ffffffff814cfe72>] __memcpy+0x12/0x20
       RSP: 0018:ffffc9000174f9a8  EFLAGS: 00010246
       RAX: ffffc9000174fab8 RBX: 0000000000000000 RCX: 000000001fffff56
       RDX: 0000000000000000 RSI: ffff8803231f5a08 RDI: ffffc90001750000
       RBP: ffffc9000174fa88 R08: ffffc9000174fab0 R09: ffff8803231f54b8
       R10: 0000000000000008 R11: 0000000000000001 R12: 0000000000000000
       R13: 0000000000000000 R14: 0000000000000003 R15: ffff8803231f54a0
       FS:  00007f3a611af640(0000) GS:ffff88033ed00000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       CR2: ffffc90001750000 CR3: 0000000325b20000 CR4: 00000000000406e0
       Stack:
        ffffffffa00bc60d 0000000000000008 ffffc90000000001 ffffc9000174faac
        0000000000000292 ffffffffa00c24e4 ffffffffa00c2914 0000000000000000
        0000000000000000 ffffffff00000003 ffff880331ae8ad0 0000000800000246
       Call Trace:
        [<ffffffffa00bc60d>] ? acpi_nfit_ctl+0x49d/0x750 [nfit]
        [<ffffffffa01f4fe0>] nfit_test_probe+0x670/0xb1b [nfit_test]
      
      Cc: <stable@vger.kernel.org>
      Fixes: 747ffe11
      
       ("libnvdimm, tools/testing/nvdimm: fix 'ars_status' output buffer sizing")
      Signed-off-by: default avatarDan Williams <dan.j.williams@intel.com>
      efda1b5d
  31. 01 Oct, 2016 1 commit
    • Vishal Verma's avatar
      libnvdimm: clear the internal poison_list when clearing badblocks · e046114a
      Vishal Verma authored
      nvdimm_clear_poison cleared the user-visible badblocks, and sent
      commands to the NVDIMM to clear the areas marked as 'poison', but it
      neglected to clear the same areas from the internal poison_list which is
      used to marshal ARS results before sorting them by namespace. As a
      result, once on-demand ARS functionality was added:
      
      37b137ff nfit, libnvdimm: allow an ARS scrub to be triggered on demand
      
      A scrub triggered from either sysfs or an MCE was found to be adding
      stale entries that had been cleared from gendisk->badblocks, but were
      still present in nvdimm_bus->poison_list. Additionally, the stale entries
      could be triggered into producing stale disk->badblocks by simply disabling
      and re-enabling the namespace or region.
      
      This adds the missing step of clearing poison_list entries when clearing
      poison, so that it is always in sync with badblocks.
      
      Fixes: 37b137ff
      
       ("nfit, libnvdimm: allow an ARS scrub to be triggered on demand")
      Signed-off-by: default avatarVishal Verma <vishal.l.verma@intel.com>
      Signed-off-by: default avatarDan Williams <dan.j.williams@intel.com>
      e046114a
  32. 10 Sep, 2016 1 commit