1. 24 Sep, 2019 1 commit
  2. 05 Jun, 2019 1 commit
  3. 28 Feb, 2019 2 commits
    • libnvdimm/btt: Fix LBA masking during 'free list' population · 9dedc73a
      Vishal Verma authored
      The Linux BTT implementation assumes that log entries will never have
      the 'zero' flag set, and indeed it never sets that flag for log entries.
      However, the UEFI spec is ambiguous on the exact format of the LBA field
      of a log entry, specifically as to whether it should include the
      additional flag bits or not. While a zero bit doesn't make sense in the
      context of a log entry, other BTT implementations might still have it set.
      If an implementation does happen to have it set, we would happily read
      it in as the next block to write to for writes. Since a high bit is set,
      it pushes the block number out of the range of an 'arena', and we fail
      such a write with an EIO.
      Follow the robustness principle, and tolerate such implementations by
      stripping out the zero flag when populating the free list during
      initialization. Additionally, use the same stripped out entries for
      detection of incomplete writes and map restoration that happens at this
      stage.
      Add a sysfs file 'log_zero_flags' that indicates the ability to accept
      such a layout to userspace applications. This enables 'ndctl
      check-namespace' to recognize whether the kernel is able to handle zero
      flags, or whether it should attempt a fix-up under the --repair option.
      Cc: Dan Williams <dan.j.williams@intel.com>
      Reported-by: Dexuan Cui <decui@microsoft.com>
      Reported-by: Pedro d'Aquino Filocre F S Barbuda <pbarbuda@microsoft.com>
      Tested-by: Dexuan Cui <decui@microsoft.com>
      Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
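The masking described in 9dedc73a can be illustrated with a small userspace C sketch. It assumes the 32-bit LBA field reserves bit 31 for the 'zero' flag and bit 30 for the error flag, following the BTT map entry layout. The names `LOG_Z_FLAG`, `log_ent_lba()` and `freelist_lba_ok()` are hypothetical and do not come from the driver. The sketch strips both flag bits. That is a choice made for this sketch; the commit text only mentions the zero flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout of the 32-bit LBA field: bit 31 = 'z' (zero) flag,
 * bit 30 = 'e' (error) flag, remaining bits = block number. */
#define LOG_Z_FLAG   (UINT32_C(1) << 31)
#define LOG_E_FLAG   (UINT32_C(1) << 30)
#define LOG_LBA_MASK (~(LOG_Z_FLAG | LOG_E_FLAG))

/* Hypothetical helper: drop the flag bits before the value is used
 * as a free-list block number. */
static uint32_t log_ent_lba(uint32_t raw)
{
	return raw & LOG_LBA_MASK;
}

/* Hypothetical range check: a free block must lie inside the arena's
 * internal block range [0, internal_nlba). */
static bool freelist_lba_ok(uint32_t lba, uint32_t internal_nlba)
{
	return lba < internal_nlba;
}
```

Suppose an entry carries the zero flag on block 5. Without the mask, it decodes as 0x80000005. That value falls outside any arena, and the write fails with the EIO described in the commit.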
    • libnvdimm/btt: Remove unnecessary code in btt_freelist_init · 2f8c9011
      Vishal Verma authored
      We call btt_log_read() twice, once to get the 'old' log entry, and again
      to get the 'new' entry. However, we have no use for the 'old' entry, so
      remove it.
      Cc: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
  4. 28 Sep, 2018 1 commit
  5. 18 Jul, 2018 1 commit
    • block: make bdev_ops->rw_page() take a REQ_OP instead of bool · 3f289dcb
      Tejun Heo authored
      An earlier commit ("block/mm: make bdev_ops->rw_page() take a bool for
      read/write") replaced @op with boolean @is_write, which limited the
      amount of information going into ->rw_page() and more importantly
      page_endio(), which removed the need to expose block internals to mm.
      Unfortunately, we want to track discards separately and @is_write
      isn't enough information.  This patch updates bdev_ops->rw_page() to
      take REQ_OP instead but leaves page_endio() to take bool @is_write.
      This allows the block part of operations to have enough information
      while not leaking it to mm.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Mike Christie <mchristi@redhat.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
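The split in 3f289dcb can be modelled roughly in userspace. The block-level hook receives the full operation, so it can count discards separately. The mm-facing completion still sees only a read/write boolean. The enum values are assumed to mirror the kernel's `REQ_OP_READ`, `REQ_OP_WRITE` and `REQ_OP_DISCARD` numbering (0, 1, 3), along with the "odd value means write" rule of `op_is_write()`. Every identifier ending in `_sketch` is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed to mirror the kernel's numbering: REQ_OP_READ = 0,
 * REQ_OP_WRITE = 1, REQ_OP_DISCARD = 3 (odd values are writes). */
enum req_op_sketch {
	OP_READ    = 0,
	OP_WRITE   = 1,
	OP_DISCARD = 3,
};

static bool op_is_write_sketch(enum req_op_sketch op)
{
	return op & 1;
}

struct dev_stats_sketch {
	unsigned int reads, writes, discards;
};

/* mm side: page completion only needs to know the direction. */
static bool last_endio_is_write;

static void page_endio_sketch(bool is_write)
{
	last_endio_is_write = is_write;
}

/* Block side: ->rw_page() now receives the full op, so a discard can be
 * accounted separately from an ordinary write. */
static int rw_page_sketch(struct dev_stats_sketch *st, enum req_op_sketch op)
{
	if (op == OP_DISCARD)
		st->discards++;
	else if (op_is_write_sketch(op))
		st->writes++;
	else
		st->reads++;

	page_endio_sketch(op_is_write_sketch(op));
	return 0;
}
```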
  6. 08 Mar, 2018 1 commit
  7. 07 Mar, 2018 1 commit
  8. 20 Jan, 2018 1 commit
    • libnvdimm, btt: fix uninitialized err_lock · d08cd5e0
      Jeff Moyer authored
      When a sector mode namespace is initially created, the arena's err_lock
      is not initialized.  If, on the other hand, the namespace already
      exists, the mutex is initialized.  To fix the issue, I moved the mutex
      initialization into the arena_alloc, which is called by both
      discover_arenas and create_arenas.
      This was discovered on an older kernel where mutex_trylock checks the
      count to determine whether the lock is held.  Because the data structure
      is kzalloc-d, that count was 0 (held), and I/O to the device would hang
      forever waiting for the lock to be released (see btt_write_pg, for
      example).  Current kernels have a different mutex implementation that
      checks for a non-null owner, and so this doesn't show up as a problem.
      If that lock were ever contended, it might cause issues, but you'd have
      to be really unlucky, I think.
      Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
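The shape of the d08cd5e0 fix can be sketched in userspace C, with a pthread mutex standing in for the kernel mutex. Lock initialization moves into the single allocation helper that both the discovery and the creation paths call. All names here are hypothetical. A zeroed `pthread_mutex_t` does not behave like the old kernel mutex encoding, so this sketch shows the structure of the fix; it does not reproduce the hang.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical per-arena state. */
struct arena_sketch {
	pthread_mutex_t err_lock;
	/* ... other per-arena state ... */
};

/* Hypothetical single allocation helper: whether the arena is found on
 * media or freshly created, its lock is initialized here and nowhere else. */
static struct arena_sketch *arena_alloc_sketch(void)
{
	struct arena_sketch *arena = calloc(1, sizeof(*arena)); /* cf. kzalloc */

	if (!arena)
		return NULL;
	pthread_mutex_init(&arena->err_lock, NULL);
	return arena;
}

/* Both paths go through the same helper. */
static struct arena_sketch *discover_arena_sketch(void)
{
	/* ... would read existing metadata from media ... */
	return arena_alloc_sketch();
}

static struct arena_sketch *create_arena_sketch(void)
{
	/* ... would lay out fresh metadata ... */
	return arena_alloc_sketch();
}

static void arena_free_sketch(struct arena_sketch *arena)
{
	pthread_mutex_destroy(&arena->err_lock);
	free(arena);
}
```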
  9. 21 Dec, 2017 1 commit
  10. 16 Nov, 2017 1 commit
  11. 09 Sep, 2017 1 commit
    • libnvdimm, btt: fix format string warnings · 04c3c982
      Randy Dunlap authored
      Fix format warnings (seen on i386) in nvdimm/btt.c:
      ../drivers/nvdimm/btt.c: In function ‘btt_map_init’:
      ../drivers/nvdimm/btt.c:430:3: warning: format ‘%lx’ expects argument of type ‘long unsigned int’, but argument 4 has type ‘size_t’ [-Wformat=]
         dev_WARN_ONCE(to_dev(arena), size < 512,
      ../drivers/nvdimm/btt.c: In function ‘btt_log_init’:
      ../drivers/nvdimm/btt.c:474:3: warning: format ‘%lx’ expects argument of type ‘long unsigned int’, but argument 4 has type ‘size_t’ [-Wformat=]
         dev_WARN_ONCE(to_dev(arena), size < 512,
      Fixes: 86652d2e ("libnvdimm, btt: clean up warning and error messages")
      Reported-by: Arnd Bergmann <arnd@arndb.de>
      Reported-by: kbuild test robot <fengguang.wu@intel.com>
      Cc: Vishal Verma <vishal.l.verma@intel.com>
      Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
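The usual fix for this class of warning is to print `size_t` with the C99 `z` length modifier. On i386 `size_t` is `unsigned int`, while on x86_64 it is `unsigned long`. As a result, `%lx` matches `size_t` only on some ABIs. The `z` modifier is defined to match `size_t` on every ABI. `format_size()` is a hypothetical helper, not the driver's code.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper. The 'z' length modifier is defined to match
 * size_t on every ABI, unlike "%lx". */
static int format_size(char *buf, size_t len, size_t size)
{
	return snprintf(buf, len, "size: %#zx", size);
}
```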
  12. 07 Sep, 2017 2 commits
  13. 31 Aug, 2017 5 commits
    • libnvdimm, btt: rework error clearing · d9b83c75
      Vishal Verma authored
      Clearing errors or badblocks during a BTT write requires sending an ACPI
      DSM, which means potentially sleeping. Since a BTT IO happens in atomic
      context (preemption disabled, spinlocks may be held), we cannot perform
      error clearing in the course of an IO. Due to this, error clearing for
      BTT IOs has hitherto been disabled.
      In this patch we move error clearing out of the atomic section, and thus
      re-enable error clearing with BTTs. When we are about to add a block to
      the free list, we check if it was previously marked as an error, and if
      it was, we add it to the freelist, but also set a flag that says error
      clearing will be required. We then drop the lane (ending the atomic
      context), and send a zero buffer so that the error can be cleared. The
      error flag in the free list is protected by the nd 'lane', and is set
      only by a thread while it holds that lane. When the error is cleared,
      the flag is cleared, but while holding a mutex for that freelist index.
      When writing, we check for two things -
      1/ If the freelist mutex is held or if the error flag is set. If so,
      this is an error block that is being (or about to be) cleared.
      2/ If the block is a known badblock based on nsio->bb
      The second check is required because the BTT map error flag for a map
      entry only gets set when an error LBA is read. If we write to a new
      location that may not have the map error flag set, but still might be in
      the region's badblock list, we can trigger an EIO on the write, which is
      undesirable and completely avoidable.
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Toshi Kani <toshi.kani@hpe.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
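The two write-time checks in d9b83c75 can be sketched in userspace. A pthread mutex stands in for the per-freelist-index mutex, and a plain array stands in for `nsio->bb`. The struct and function names are hypothetical. `pthread_mutex_trylock()` is used here only as a non-blocking way to ask whether the lock is currently held.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical free-list entry. */
struct free_entry_sketch {
	uint32_t block;
	bool has_err;             /* set only while holding the lane */
	pthread_mutex_t err_lock; /* held while the error is being cleared */
};

/* Hypothetical stand-in for a lookup in the namespace's badblocks list. */
static bool is_badblock(const uint32_t *bb, size_t nbb, uint32_t block)
{
	for (size_t i = 0; i < nbb; i++)
		if (bb[i] == block)
			return true;
	return false;
}

/* Returns true if a write to this free block should not go ahead yet. */
static bool write_must_wait(struct free_entry_sketch *fe,
			    const uint32_t *bb, size_t nbb)
{
	/* Check 1: clearing in progress (mutex held) or pending (flag set). */
	if (pthread_mutex_trylock(&fe->err_lock) != 0)
		return true;
	bool pending = fe->has_err;
	pthread_mutex_unlock(&fe->err_lock);
	if (pending)
		return true;

	/* Check 2: a known bad block that no map error flag points at yet. */
	return is_badblock(bb, nbb, fe->block);
}
```

A write is deferred in three cases:
- clearing is pending (`has_err` is set);
- clearing is in progress (the lock is held);
- the target block appears in the badblocks list, even though no map error flag was ever set for it.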
    • libnvdimm, btt: cache sector_size in arena_info · 75892004
      Vishal Verma authored
      In preparation for the error clearing rework, add sector_size in the
      arena_info struct.
      Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
    • libnvdimm, btt: ensure that flags were also unchanged during a map_read · 1398199d
      Vishal Verma authored
      In btt_map_read, we read the map twice to make sure that the map entry
      didn't change after we added it to the read tracking table. In
      anticipation of expanding the use of the error bit, also make sure that
      the error and zero flags are constant across the two map reads.
      Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
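The double-read check in 1398199d can be sketched as follows. The key detail is that the comparison uses the whole raw entry, so a change in only the 'e' or 'z' flag still triggers a retry. Several pieces are assumptions made for this sketch: the flag position (bit 30 for 'e'), the callback type, and the `fake_read()` demo source.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed flag position: bit 30 = 'e' (error) flag of a map entry. */
#define MAP_E_FLAG (UINT32_C(1) << 30)

/* Hypothetical accessor returning the raw 32-bit map entry. */
typedef uint32_t (*read_map_fn)(void *ctx, uint32_t premap);

static uint32_t stable_map_read(read_map_fn read_map, void *ctx,
				uint32_t premap)
{
	for (;;) {
		uint32_t first = read_map(ctx, premap);
		/* (the driver adds the block to the read tracking table here) */
		uint32_t second = read_map(ctx, premap);

		/* Compare the whole raw entry, not only the LBA bits, so that
		 * a change in just the 'e' or 'z' flag also forces a retry. */
		if (first == second)
			return first;
	}
}

/* Demo source whose entry gains the error flag during the first pass. */
struct fake_map {
	uint32_t ent;
	int reads;
};

static uint32_t fake_read(void *ctx, uint32_t premap)
{
	struct fake_map *m = ctx;

	(void)premap;
	if (++m->reads == 2)
		m->ent |= MAP_E_FLAG; /* simulated concurrent flag update */
	return m->ent;
}
```

With the demo source, the first pass sees the entry change between reads, so the function reads twice more. It then returns the entry with the error flag set. A comparison of the LBA bits alone would have accepted the first pass and missed the flag change.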
    • libnvdimm, btt: refactor map entry operations with macros · 0595d539
      Vishal Verma authored
      Add helpers for converting a raw map entry to just the block number, or
      either of the 'e' or 'z' flags in preparation for actually using the
      error flag to mark blocks with media errors.
      Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
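Helpers of the kind 0595d539 describes might look like the macros below. The bit positions follow the BTT map entry layout: bit 31 for 'z' and bit 30 for 'e'. The macro names and exact definitions are illustrative assumptions, not a copy of the driver's header.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed bit layout of a 32-bit map entry (bit 31 = 'z', bit 30 = 'e'). */
#define MAP_TRIM_MASK (UINT32_C(1) << 31)
#define MAP_ERR_MASK  (UINT32_C(1) << 30)
#define MAP_LBA_MASK  (~(MAP_TRIM_MASK | MAP_ERR_MASK))

/* Illustrative helper names: block number, each flag, and setting 'e'. */
#define ent_lba(ent)    ((ent) & MAP_LBA_MASK)
#define ent_e_flag(ent) (!!((ent) & MAP_ERR_MASK))
#define ent_z_flag(ent) (!!((ent) & MAP_TRIM_MASK))
#define set_e_flag(ent) ((ent) |= MAP_ERR_MASK)
```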
    • libnvdimm, btt: fix a missed NVDIMM_IO_ATOMIC case in the write path · 1db1f3ce
      Vishal Verma authored
      The IO context conversion for rw_bytes missed a case in the BTT write
      path (btt_map_write) which should've been marked as atomic.
      In reality this should not cause a problem, because map writes are too
      small for nsio_rw_bytes to attempt error clearing, but it should be
      fixed for posterity.
      Add a might_sleep() in the non-atomic section of nsio_rw_bytes so that
      things like the nfit unit tests, which don't actually sleep, can catch
      bugs like this.
      Cc: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
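The debugging idea in 1db1f3ce can be modelled in userspace. A global counter stands in for the atomic-context state. A `might_sleep()`-style check in the non-atomic branch records a violation whenever that branch is reached from atomic context. This is how a caller that forgot the atomic flag gets caught even when nothing actually sleeps. `IO_ATOMIC`, `atomic_depth` and the other names are hypothetical stand-ins.

```c
#include <assert.h>

#define IO_ATOMIC 1 /* hypothetical stand-in for the atomic-context flag */

/* Stand-ins: a global "atomic depth" (e.g. a held BTT lane) and a
 * might_sleep()-style check that counts violations instead of warning. */
static int atomic_depth;
static int might_sleep_violations;

static void might_sleep_sketch(void)
{
	if (atomic_depth > 0)
		might_sleep_violations++;
}

static int rw_bytes_sketch(unsigned long flags)
{
	if (!(flags & IO_ATOMIC)) {
		/* Non-atomic path: error clearing may sleep, so flag callers
		 * that are really in atomic context, even if nothing sleeps. */
		might_sleep_sketch();
		/* ... error clearing would go here ... */
	}
	/* ... copy to or from the media ... */
	return 0;
}
```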
  14. 30 Aug, 2017 1 commit
  15. 03 Jul, 2017 2 commits
  16. 01 Jul, 2017 1 commit
  17. 30 Jun, 2017 1 commit
  18. 29 Jun, 2017 1 commit
  19. 27 Jun, 2017 1 commit
  20. 09 Jun, 2017 1 commit
  21. 11 May, 2017 2 commits
    • libnvdimm, btt: ensure that initializing metadata clears poison · b177fe85
      Vishal Verma authored
      If we had badblocks/poison in the metadata area of a BTT, recreating the
      BTT would not clear the poison in all cases, notably the flog area. This
      is because rw_bytes will only clear errors if the request being sent
      down is 512B aligned and sized.
      Make sure that when writing the map and info blocks, the rw_bytes being
      sent are of the correct size/alignment. For the flog, instead of doing
      the smaller log_entry writes only, first do a 'wipe' of the entire area
      by writing zeroes in large enough chunks so that errors get cleared.
      Cc: Andy Rudoff <andy.rudoff@intel.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
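The wipe in b177fe85 can be sketched against a toy media model. As the commit describes for `rw_bytes`, an error is cleared only by a write that is 512-byte aligned and sized. `struct media_sketch`, `write_bytes()` and `wipe_area()` are hypothetical. The sketch writes one sector at a time, whereas the commit only says the chunks must be large enough.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SECTOR 512u

/* Hypothetical toy media: 4 sectors, one poison flag per sector. */
struct media_sketch {
	uint8_t data[4 * SECTOR];
	uint8_t poisoned[4];
};

/* Models the rule from the commit: an error is cleared only by a write
 * that is 512-byte aligned and a multiple of 512 bytes long. */
static void write_bytes(struct media_sketch *m, size_t off,
			const void *buf, size_t len)
{
	if (off % SECTOR == 0 && len % SECTOR == 0) {
		for (size_t s = off / SECTOR; s < (off + len) / SECTOR; s++)
			m->poisoned[s] = 0;
	}
	memcpy(m->data + off, buf, len);
}

/* Zero [off, off + len) one whole sector at a time so every sector's
 * error gets cleared; off and len are assumed to be sector aligned. */
static void wipe_area(struct media_sketch *m, size_t off, size_t len)
{
	static const uint8_t zero[SECTOR];

	for (size_t done = 0; done < len; done += SECTOR)
		write_bytes(m, off + done, zero, SECTOR);
}
```

A small, log-entry-sized write leaves the poison in place. The sector-sized wipe clears it.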
    • libnvdimm: add an atomic vs process context flag to rw_bytes · 3ae3d67b
      Vishal Verma authored
      nsio_rw_bytes can clear media errors, but this cannot be done while we
      are in an atomic context due to locking within ACPI. From the BTT,
      ->rw_bytes may be called either from atomic or process context depending
      on whether the calls happen during initialization or during IO.
      During init, we want to ensure error clearing happens, and the flag
      marking process context allows nsio_rw_bytes to do that. When called
      during IO, we're in atomic context, and error clearing can be skipped.
      Cc: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
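The contract in 3ae3d67b can be summarised with a toy model. The writer performs error clearing, which may sleep, only when the caller has not marked the call as atomic. Initialization writes therefore clear poison, while I/O-path writes skip clearing. `IO_ATOMIC` stands in for the real flag, and every function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define IO_ATOMIC 1 /* hypothetical stand-in for the atomic-context flag */

/* Hypothetical namespace state. */
struct ns_sketch {
	bool poisoned; /* a single poisoned range, for simplicity */
	int clears;    /* number of times error clearing ran */
};

/* Writer side: clearing may sleep (it involves ACPI), so it only runs
 * when the caller has not marked the call as atomic. */
static void ns_write(struct ns_sketch *ns, unsigned long flags)
{
	if (ns->poisoned && !(flags & IO_ATOMIC)) {
		ns->poisoned = false; /* would issue the clear-error request */
		ns->clears++;
	}
	/* ... copy data ... */
}

/* BTT side: each caller passes the flag that matches its context. */
static void btt_init_write(struct ns_sketch *ns) { ns_write(ns, 0); }
static void btt_io_write(struct ns_sketch *ns)   { ns_write(ns, IO_ATOMIC); }
```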
  22. 08 Aug, 2016 1 commit
  23. 07 Aug, 2016 1 commit
  24. 04 Aug, 2016 1 commit
  25. 27 Jun, 2016 1 commit
    • block: convert to device_add_disk() · 0d52c756
      Dan Williams authored
      For block drivers that specify a parent device, convert them to use
      device_add_disk().
      This conversion was done with the following semantic patch:
          @@
          struct gendisk *disk;
          expression E;
          @@
          - disk->driverfs_dev = E;
          ...
          - add_disk(disk);
          + device_add_disk(E, disk);
          @@
          struct gendisk *disk;
          expression E1, E2;
          @@
          - disk->driverfs_dev = E1;
          ...
          E2 = disk;
          ...
          - add_disk(E2);
          + device_add_disk(E1, E2);
      ...plus some manual fixups for a few missed conversions.
      Cc: Jens Axboe <axboe@fb.com>
      Cc: Keith Busch <keith.busch@intel.com>
      Cc: Michael S. Tsirkin <mst@redhat.com>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: James Bottomley <James.Bottomley@hansenpartnership.com>
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Martin K. Petersen <martin.petersen@oracle.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
  26. 22 Apr, 2016 3 commits
  27. 04 Apr, 2016 1 commit
    • mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros · 09cbfeaf
      Kirill A. Shutemov authored
      PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced a *long* time
      ago with the promise that one day it would be possible to implement the
      page cache with bigger chunks than PAGE_SIZE.
      This promise never materialized, and it is unlikely it ever will.
      We have many places where PAGE_CACHE_SIZE is assumed to be equal to
      PAGE_SIZE, and it's a constant source of confusion whether the
      PAGE_CACHE_* or PAGE_* constants should be used in a particular case,
      especially on the border between fs and mm.
      Globally switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause too much
      breakage to be doable.
      Let's stop pretending that pages in page cache are special.  They are
      not.
      The changes are pretty straight-forward:
       - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
       - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
       - page_cache_get() -> get_page();
       - page_cache_release() -> put_page();
      This patch contains automated changes generated with coccinelle using
      script below.  For some reason, coccinelle doesn't patch header files.
      I've called spatch for them manually.
      The only adjustment after coccinelle is reverting the changes to the
      PAGE_CACHE_ALIGN definition: we are going to drop it later.
      There are a few places in the code where coccinelle didn't reach.  I'll
      fix them manually in a separate patch.  Comments and documentation will
      also be addressed in a separate patch.
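      virtual patch
      expression E;
      - E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
      + E
      expression E;
      - E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
      + E
      - PAGE_CACHE_SHIFT
      + PAGE_SHIFT
      - PAGE_CACHE_SIZE
      + PAGE_SIZE
      - PAGE_CACHE_MASK
      + PAGE_MASK
      expression E;
      - PAGE_CACHE_ALIGN(E)
      + PAGE_ALIGN(E)
      expression E;
      - page_cache_get(E)
      + get_page(E)
      expression E;
      - page_cache_release(E)
      + put_page(E)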
      Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
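The two shift rewrites in 09cbfeaf are safe for an arithmetic reason. `PAGE_CACHE_SHIFT` was defined as `PAGE_SHIFT`, so the shift distance is zero and each expression is the identity. The value 12 (4 KiB pages) below is only an illustrative choice, and the function names are hypothetical.

```c
#include <assert.h>

/* Illustrative values: PAGE_CACHE_SHIFT was defined in terms of
 * PAGE_SHIFT, so the difference used in these shifts is always 0. */
#define PAGE_SHIFT       12
#define PAGE_CACHE_SHIFT PAGE_SHIFT

/* Hypothetical helpers reproducing the two patterns being removed. */
static unsigned long scale_old(unsigned long idx)
{
	return idx << (PAGE_CACHE_SHIFT - PAGE_SHIFT); /* shift by zero */
}

static unsigned long unscale_old(unsigned long idx)
{
	return idx >> (PAGE_CACHE_SHIFT - PAGE_SHIFT); /* shift by zero */
}
```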
  28. 09 Mar, 2016 1 commit
  29. 07 Nov, 2015 1 commit
  30. 21 Oct, 2015 1 commit