1. 28 Dec, 2018 2 commits
    • kasan: add CONFIG_KASAN_GENERIC and CONFIG_KASAN_SW_TAGS · 2bd926b4
      Andrey Konovalov authored
      This commit splits the current CONFIG_KASAN config option into two:
      1. CONFIG_KASAN_GENERIC, that enables the generic KASAN mode (the one
         that exists now);
      2. CONFIG_KASAN_SW_TAGS, that enables the software tag-based KASAN mode.
      
      The name CONFIG_KASAN_SW_TAGS is chosen as in the future we will have
      another hardware tag-based KASAN mode, that will rely on hardware memory
      tagging support in arm64.
      
      With CONFIG_KASAN_SW_TAGS enabled, compiler options are changed to
      instrument kernel files with -fsanitize=kernel-hwaddress (except the
      ones for which KASAN_SANITIZE := n is set).
      
      Both CONFIG_KASAN_GENERIC and CONFIG_KASAN_SW_TAGS support both
      CONFIG_KASAN_INLINE and CONFIG_KASAN_OUTLINE instrumentation modes.
      
      This commit also adds empty placeholder (for now) implementation of
      tag-based KASAN specific hooks inserted by the compiler and adjusts
      common hooks implementation.
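
      As an illustration, the compiler-inserted hooks for the tag-based mode
      are of the __hwasan_* family that -fsanitize=kernel-hwaddress emits
      calls to in outline mode. A minimal sketch of what such a placeholder
      stub might look like (the exact set of stubs is an assumption, not a
      quote from the patch):

       /* mm/kasan/tags.c (sketch): empty hook, tag checking added later */
       void __hwasan_load8_noabort(unsigned long addr)
       {
       }
       EXPORT_SYMBOL(__hwasan_load8_noabort);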
      
      While this commit adds the CONFIG_KASAN_SW_TAGS config option, this
      option is not selectable, as it depends on HAVE_ARCH_KASAN_SW_TAGS,
      which we will enable once all the infrastructure code has been added.
      
      Link: http://lkml.kernel.org/r/b2550106eb8a68b10fefbabce820910b115aa853.1544099024.git.andreyknvl@google.com
      
      Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
      Reviewed-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      2bd926b4
    • kasan, mm: change hooks signatures · 0116523c
      Andrey Konovalov authored
      Patch series "kasan: add software tag-based mode for arm64", v13.
      
      This patchset adds a new software tag-based mode to KASAN [1].
      (Initially this mode was called KHWASAN, but it got renamed; see the
      naming rationale at the end of this section.)
      
      The plan is to implement HWASan [2] for the kernel, with the
      incentive that it will have performance comparable to KASAN while
      consuming much less memory, trading that off for somewhat imprecise
      bug detection and support for arm64 only.
      
      The underlying ideas of the approach used by software tag-based KASAN are:
      
      1. By using the Top Byte Ignore (TBI) arm64 CPU feature, we can store
         pointer tags in the top byte of each kernel pointer.
      
      2. Using shadow memory, we can store memory tags for each chunk of kernel
         memory.
      
      3. On each memory allocation, we can generate a random tag, embed it into
         the returned pointer and set the memory tags that correspond to this
         chunk of memory to the same value.
      
      4. By using compiler instrumentation, before each memory access we can add
         a check that the pointer tag matches the tag of the memory that is being
         accessed.
      
      5. On a tag mismatch we report an error.
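
      A minimal C sketch of this scheme (helper names such as set_tag(),
      random_tag() and shadow_of() are invented for illustration, not taken
      from the patch):

       #define KASAN_TAG_SHIFT 56   /* TBI: top byte of a 64-bit pointer */

       static inline void *set_tag(void *ptr, u8 tag)
       {
               return (void *)(((u64)ptr & ~(0xffULL << KASAN_TAG_SHIFT)) |
                               ((u64)tag << KASAN_TAG_SHIFT));
       }

       static inline u8 get_tag(const void *ptr)
       {
               return (u64)ptr >> KASAN_TAG_SHIFT;
       }

       /* on allocation: tag the pointer and poison its shadow to match */
       tag = random_tag();
       ptr = set_tag(ptr, tag);
       memset(shadow_of(ptr), tag, size >> 4);   /* 1:16 shadow scale */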
      
      With this patchset the existing KASAN mode gets renamed to generic KASAN,
      with the word "generic" meaning that the implementation can be supported
      by any architecture as it is purely software.
      
      The new mode this patchset adds is called software tag-based KASAN.  The
      word "tag-based" refers to the fact that this mode uses tags embedded into
      the top byte of kernel pointers and the TBI arm64 CPU feature that
      allows such pointers to be dereferenced.  The word "software" here
      means that shadow
      memory manipulation and tag checking on pointer dereference is done in
      software.  As it is the only tag-based implementation right now, "software
      tag-based" KASAN is sometimes referred to as simply "tag-based" in this
      patchset.
      
      A potential expansion of this mode is a hardware tag-based mode, which
      would use hardware memory tagging support (announced by Arm [3]) instead
      of compiler instrumentation and manual shadow memory manipulation.
      
      Same as generic KASAN, software tag-based KASAN is strictly a debugging
      feature.
      
      [1] https://www.kernel.org/doc/html/latest/dev-tools/kasan.html
      
      [2] http://clang.llvm.org/docs/HardwareAssistedAddressSanitizerDesign.html
      
      [3] https://community.arm.com/processors/b/blog/posts/arm-a-profile-architecture-2018-developments-armv85a
      
      ====== Rationale
      
      On mobile devices generic KASAN's memory usage is a significant problem.
      One of the main reasons to have tag-based KASAN is to be able to perform a
      similar set of checks as the generic one does, but with lower memory
      requirements.
      
      Comment from Vishwath Mohan <vishwath@google.com>:
      
      I don't have data on-hand, but anecdotally both ASAN and KASAN have proven
      problematic to enable for environments that don't tolerate the increased
      memory pressure well.  This includes
      
      (a) Low-memory form factors - Wear, TV, Things, lower-tier phones like Go,
      (b) Connected components like Pixel's visual core [1].
      
      These are both places I'd love to have a low(er) memory footprint option at
      my disposal.
      
      Comment from Evgenii Stepanov <eugenis@google.com>:
      
      Looking at a live Android device under load, slab (according to
      /proc/meminfo) + kernel stack take 8-10% of available RAM (~350MB).  KASAN's
      overhead of 2x - 3x on top of it is not insignificant.
      
      Not having this overhead enables near-production use - e.g. running
      KASAN/KHWASAN kernel on a personal, daily-use device to catch bugs that do
      not reproduce in test configuration.  These are the ones that often cost
      the most engineering time to track down.
      
      CPU overhead is bad, but generally tolerable.  RAM is critical, in our
      experience.  Once it gets low enough, OOM-killer makes your life
      miserable.
      
      [1] https://www.blog.google/products/pixel/pixel-visual-core-image-processing-and-machine-learning-pixel-2/
      
      ====== Technical details
      
      Software tag-based KASAN mode is implemented in a very similar way to the
      generic one. This patchset essentially does the following:
      
      1. TCR_TBI1 is set to enable Top Byte Ignore.
      
      2. Shadow memory is used (with a different scale, 1:16, so each shadow
         byte corresponds to 16 bytes of kernel memory) to store memory tags.
      
      3. All slab objects are aligned to shadow scale, which is 16 bytes.
      
      4. All pointers returned from the slab allocator are tagged with a random
         tag and the corresponding shadow memory is poisoned with the same value.
      
      5. Compiler instrumentation is used to insert tag checks, either by
         calling callbacks or by inlining them (the CONFIG_KASAN_OUTLINE
         and CONFIG_KASAN_INLINE flags are reused).
      
      6. When a tag mismatch is detected in callback instrumentation mode
         KASAN simply prints a bug report. In case of inline instrumentation,
      clang inserts a brk instruction, and KASAN has its own brk handler,
         which reports the bug.
      
      7. The memory in between slab objects is marked with a reserved tag, and
         acts as a redzone.
      
      8. When a slab object is freed it's marked with a reserved tag.
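
      Conceptually, each instrumented access then boils down to something
      like the following (sketch; get_tag() as in the earlier sketch, and
      report_tag_mismatch() is a hypothetical name for the reporting path):

       u8 *shadow = kasan_mem_to_shadow(ptr);    /* (addr >> 4) + offset */
       if (*shadow != get_tag(ptr))              /* tag from top byte */
               report_tag_mismatch(ptr);         /* brk, in inline mode */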
      
      Bug detection is imprecise for two reasons:
      
      1. We won't catch some small out-of-bounds accesses that fall into
         the same shadow cell as the last byte of a slab object.
      
      2. We only have 1 byte to store tags, which means we have a 1/256
         probability of a tag match for an incorrect access (actually even
         slightly less due to reserved tag values).
      
      Despite that, there is a particular class of bugs that tag-based
      KASAN can detect and generic KASAN cannot: a use-after-free after the
      object has been reallocated to someone else.
      
      ====== Testing
      
      Some kernel developers voiced a concern that changing the top byte of
      kernel pointers may lead to subtle bugs that are difficult to discover.
      To address this concern, deliberate testing has been performed.
      
      It doesn't seem feasible to do some kind of static checking to find
      potential issues with pointer tagging, so a dynamic approach was taken.
      All pointer comparisons/subtractions were instrumented in an LLVM
      compiler pass, and a kernel module was used that prints a bug report
      whenever two pointers with different tags are compared/subtracted
      (ignoring comparisons with NULL pointers and with pointers obtained
      by casting an error code to a pointer type).  The kernel was then
      booted in QEMU and on an Odroid C2 board, and syzkaller was run.
      
      This yielded the following results.
      
      The two places that look interesting are:
      
      is_vmalloc_addr in include/linux/mm.h
      is_kernel_rodata in mm/util.c
      
      Here we compare a pointer with some fixed untagged values to make sure
      that the pointer lies in a particular part of the kernel address space.
      Since tag-based KASAN doesn't add tags to pointers that belong to rodata
      or vmalloc regions, this should work as is.  To make sure, debug
      checks have been added to those two functions to verify that the
      result doesn't change whether we operate on tagged or untagged
      pointers.
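
      The added debug check is of roughly this shape (sketch; the
      tag-stripping helper name is an assumption):

       /* the result must not depend on the tag bits */
       WARN_ON(is_vmalloc_addr(ptr) !=
               is_vmalloc_addr(kasan_reset_tag(ptr)));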
      
      A few other cases that don't look that interesting:
      
      Comparing pointers to achieve a unique sorting order of pointee
      objects (e.g. sorting lock addresses before performing a double
      lock):
      
      tty_ldisc_lock_pair_timeout in drivers/tty/tty_ldisc.c
      pipe_double_lock in fs/pipe.c
      unix_state_double_lock in net/unix/af_unix.c
      lock_two_nondirectories in fs/inode.c
      mutex_lock_double in kernel/events/core.c
      
      ep_cmp_ffd in fs/eventpoll.c
      fsnotify_compare_groups in fs/notify/mark.c
      
      Nothing needs to be done here, since the tags embedded into pointers
      don't change, so the sorting order would still be unique.
      
      Checks that a pointer belongs to some particular allocation:
      
      is_sibling_entry in lib/radix-tree.c
      object_is_on_stack in include/linux/sched/task_stack.h
      
      Nothing needs to be done here either, since two pointers can only belong
      to the same allocation if they have the same tag.
      
      Overall, since the kernel boots and works, there are no critical bugs.
      As for the rest, the traditional kernel testing way (use until it
      fails) is the only one that looks feasible.
      
      Another point here is that tag-based KASAN is available under a separate
      config option that needs to be deliberately enabled. Even though it might
      be used in a "near-production" environment to find bugs that are not found
      during fuzzing or running tests, it is still a debug tool.
      
      ====== Benchmarks
      
      The following numbers were collected on Odroid C2 board. Both generic and
      tag-based KASAN were used in inline instrumentation mode.
      
      Boot time [1]:
      * ~1.7 sec for clean kernel
      * ~5.0 sec for generic KASAN
      * ~5.0 sec for tag-based KASAN
      
      Network performance [2]:
      * 8.33 Gbits/sec for clean kernel
      * 3.17 Gbits/sec for generic KASAN
      * 2.85 Gbits/sec for tag-based KASAN
      
      Slab memory usage after boot [3]:
      * ~40 kb for clean kernel
      * ~105 kb (~260% overhead) for generic KASAN
      * ~47 kb (~20% overhead) for tag-based KASAN
      
      KASAN memory overhead consists of three main parts:
      1. Increased slab memory usage due to redzones.
      2. Shadow memory (the whole of it is reserved once during boot).
      3. Quarantine (grows gradually up to some preset limit; the larger
         the limit, the higher the chance to detect a use-after-free).
      
      Comparing tag-based vs generic KASAN for each of these points:
      1. 20% vs 260% overhead.
      2. 1/16th vs 1/8th of physical memory.
      3. Tag-based KASAN doesn't require quarantine.
      
      [1] Time before the ext4 driver is initialized.
      [2] Measured as `iperf -s & iperf -c 127.0.0.1 -t 30`.
      [3] Measured as `cat /proc/meminfo | grep Slab`.
      
      ====== Some notes
      
      A few notes:
      
      1. The patchset can be found here:
         https://github.com/xairy/kasan-prototype/tree/khwasan
      
      2. Building requires a recent Clang version (7.0.0 or later).
      
      3. Stack instrumentation is not supported yet and will be added later.
      
      This patch (of 25):
      
      Tag-based KASAN changes the value of the top byte of pointers returned
      from the kernel allocation functions (such as kmalloc).  This patch
      updates KASAN hooks signatures and their usage in SLAB and SLUB code to
      reflect that.
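
      The shape of the change, shown on one hook (illustrative; the patch
      converts several hooks the same way):

       /* before: the hook could not change the pointer */
       void kasan_kmalloc(struct kmem_cache *s, const void *object,
                          size_t size, gfp_t flags);

       /* after: callers must use the returned (possibly tagged) pointer */
       void *kasan_kmalloc(struct kmem_cache *s, const void *object,
                           size_t size, gfp_t flags);

       /* SLAB/SLUB call sites become: */
       object = kasan_kmalloc(s, object, size, flags);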
      
      Link: http://lkml.kernel.org/r/aec2b5e3973781ff8a6bb6760f8543643202c451.1544099024.git.andreyknvl@google.com
      
      Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
      Reviewed-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      0116523c
  2. 23 Dec, 2018 2 commits
  3. 22 Dec, 2018 1 commit
  4. 21 Dec, 2018 5 commits
  5. 20 Dec, 2018 5 commits
    • bpf: sk_msg, sock{map|hash} redirect through ULP · 0608c69c
      John Fastabend authored
      A sockmap program that redirects through a kTLS ULP enabled socket
      will not work correctly because the ULP layer is skipped. This
      fixes the behavior to call through the ULP layer on redirect to
      ensure any operations required on the data stream at the ULP layer
      continue to be applied.
      
      To do this we add an internal flag MSG_SENDPAGE_NOPOLICY to avoid
      calling the BPF layer on a redirected message. This is
      required to avoid calling the BPF layer multiple times (possibly
      recursively) which is not the current/expected behavior without
      ULPs. In the future we may add a redirect flag if users _do_
      want the policy applied again but this would need to work for both
      ULP and non-ULP sockets and be opt-in to avoid breaking existing
      programs.
      
      Also, to avoid polluting the flag space with an internal flag, we
      reuse the flag space, overlapping MSG_SENDPAGE_NOPOLICY with
      MSG_WAITFORONE.  Here WAITFORONE is specific to the recv path and
      SENDPAGE_NOPOLICY is only used for sendpage hooks.  The last thing
      to verify is that the user space API masks the flag correctly to
      ensure it can not be set by user space.  (Note this needs to be true
      regardless, because we already have internal flags in use that user
      space should not be able to set.)  For completeness, there are two
      UAPI paths into sendpage: sendfile and splice.
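
      The flag reuse boils down to a single overlapping definition (values
      as in include/linux/socket.h; shown as a sketch):

       #define MSG_WAITFORONE        0x10000 /* recvmmsg(): recv path only */
       #define MSG_SENDPAGE_NOPOLICY 0x10000 /* sendpage-internal: skip BPF */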
      
      In the sendfile case the function do_sendfile() zeroes the flags,
      
      ./fs/read_write.c:
       static ssize_t do_sendfile(int out_fd, int in_fd, loff_t *ppos,
      		   	    size_t count, loff_t max)
       {
         ...
         fl = 0;
      #if 0
         /*
          * We need to debate whether we can enable this or not. The
          * man page documents EAGAIN return for the output at least,
          * and the application is arguably buggy if it doesn't expect
          * EAGAIN on a non-blocking file descriptor.
          */
          if (in.file->f_flags & O_NONBLOCK)
      	fl = SPLICE_F_NONBLOCK;
      #endif
          file_start_write(out.file);
          retval = do_splice_direct(in.file, &pos, out.file, &out_pos, count, fl);
       }
      
      In the splice case the pipe_to_sendpage "actor" is used which
      masks flags with SPLICE_F_MORE.
      
      ./fs/splice.c:
       static int pipe_to_sendpage(struct pipe_inode_info *pipe,
      			    struct pipe_buffer *buf, struct splice_desc *sd)
       {
         ...
         more = (sd->flags & SPLICE_F_MORE) ? MSG_MORE : 0;
         ...
       }
      
      This confirms what we expect: internal flags are in fact internal
      to the socket side.
      
      Fixes: d3b18ad3 ("tls: add bpf support to sk_msg handling")
      Signed-off-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      0608c69c
    • bpf: sk_msg, fix socket data_ready events · 552de910
      John Fastabend authored
      When a skb verdict program is in use and either another BPF program
      redirects to that socket or the new SK_PASS support is used, the
      data_ready callback does not wake up the application.  Instead,
      because the stream parser/verdict is using the sk data_ready
      callback, we wake up the stream parser/verdict block.
      
      Fix this by adding a helper to check if the stream parser block is
      enabled on the sk and, if so, call the saved pointer, which is the
      upper layer's wake-up function.
      
      This fixes application stalls observed when an application is waiting
      for data in a blocking read().
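
      A sketch of the helper described above (field names follow the
      sk_psock parser layout of this era; treat the details as
      illustrative):

       static inline void sk_psock_data_ready(struct sock *sk,
                                              struct sk_psock *psock)
       {
               if (psock->parser.enabled)
                       /* wake the application, not the strparser */
                       psock->parser.saved_data_ready(sk);
               else
                       sk->sk_data_ready(sk);
       }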
      
      Fixes: d829e9c4 ("tls: convert to generic sk_msg interface")
      Signed-off-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      552de910
    • bpf: skmsg, replace comments with BUILD bug · 7a69c0f2
      John Fastabend authored
      
      
      Enforce comment on structure layout dependency with a BUILD_BUG_ON
      to ensure the condition is maintained.
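
      The pattern, with illustrative struct names (the real check pins the
      layout rule the comment used to describe):

       struct first  { void *data; void *data_end; };
       struct second { void *data; void *data_end; u32 extra; };

       /* fails the build if the documented layout rule is ever broken */
       BUILD_BUG_ON(offsetof(struct first, data_end) !=
                    offsetof(struct second, data_end));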
      Suggested-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      7a69c0f2
    • powerpc: use mm zones more sensibly · 25078dc1
      Christoph Hellwig authored
      
      
      Powerpc has somewhat odd usage where ZONE_DMA is used for all memory
      on common 64-bit configs, and ZONE_DMA32 is used for 31-bit schemes.
      
      Move to a scheme closer to what other architectures use (and, I dare
      say, the intent of the system):
      
       - ZONE_DMA: optionally for memory < 31-bit (64-bit embedded only)
       - ZONE_NORMAL: everything addressable by the kernel
       - ZONE_HIGHMEM: memory > 32-bit for 32-bit kernels
      
      Also provide information on how ZONE_DMA is used by defining
      ARCH_ZONE_DMA_BITS.
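
      That information is a single definition (sketch; the value 31 matches
      the ZONE_DMA limit described above):

       /* arch/powerpc: ZONE_DMA covers the low 31 bits of address space */
       #define ARCH_ZONE_DMA_BITS 31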
      
      Contains various fixes from Benjamin Herrenschmidt.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      25078dc1
    • PCI/ACPI: Allow ACPI to be built without CONFIG_PCI set · 5d32a665
      Sinan Kaya authored
      
      
      We are compiling PCI code today for systems with ACPI and no PCI
      device present. Remove the useless code and reduce the tight
      dependency.
      Signed-off-by: Sinan Kaya <okaya@kernel.org>
      Acked-by: Bjorn Helgaas <bhelgaas@google.com> # PCI parts
      Acked-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      5d32a665
  6. 19 Dec, 2018 12 commits
    • net: switch secpath to use skb extension infrastructure · 4165079b
      Florian Westphal authored
      
      
      Remove skb->sp and allocate secpath storage via extension
      infrastructure.  This also reduces sk_buff by 8 bytes on x86_64.
      
      Total size of allyesconfig kernel is reduced slightly, as there is
      less inlined code (one conditional atomic op instead of two on
      skb_clone).
      
      No differences in throughput in following ipsec performance tests:
      - transport mode with aes on 10GB link
      - tunnel mode between two network namespaces with aes and null cipher
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      4165079b
    • net: use skb_sec_path helper in more places · 2294be0f
      Florian Westphal authored
      
      
      skb_sec_path gains 'const' qualifier to avoid
      xt_policy.c: 'skb_sec_path' discards 'const' qualifier from pointer target type
      
      same reasoning as previous conversions: Won't need to touch these
      spots anymore when skb->sp is removed.
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      2294be0f
    • net: move secpath_exist helper to sk_buff.h · 7af8f4ca
      Florian Westphal authored
      
      
      Future patch will remove skb->sp pointer.
      To reduce noise in those patches, move existing helper to
      sk_buff and use it in more places to ease skb->sp replacement later.
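
      The moved helper is essentially (a sketch of its shape in this era,
      before skb->sp goes away):

       static inline bool secpath_exists(const struct sk_buff *skb)
       {
       #ifdef CONFIG_XFRM
               return skb->sp != NULL;
       #else
               return false;
       #endif
       }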
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      7af8f4ca
    • net: convert bridge_nf to use skb extension infrastructure · de8bda1d
      Florian Westphal authored
      
      
      This converts the bridge netfilter (calling iptables hooks from bridge)
      facility to use the extension infrastructure.
      
      The bridge_nf specific hooks in skb clone and free paths are removed, they
      have been replaced by the skb_ext hooks that do the same as the bridge nf
      allocations hooks did.
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      de8bda1d
    • sk_buff: add skb extension infrastructure · df5042f4
      Florian Westphal authored
      
      
      This adds an optional extension infrastructure, with ipsec (xfrm)
      and bridge netfilter as first users.
      objdiff shows no changes if the kernel is built without xfrm and
      br_netfilter support.
      
      The third (planned future) user is Multipath TCP which is still
      out-of-tree.
      MPTCP needs to map logical mptcp sequence numbers to the tcp sequence
      numbers used by individual subflows.
      
      This DSS mapping is read from tcp option space on receive and
      written to tcp option space on transmitted tcp packets that are part
      of an MPTCP connection.
      
      Extending skb_shared_info or adding a private data field to skb
      fclones doesn't work for incoming skbs, so a different DSS
      propagation method would be required for the receive side.
      
      mptcp has the same requirements as secpath/bridge netfilter:
      
      1. extension memory is released when the sk_buff is freed.
      2. data is shared after cloning an skb (the clone inherits the
         extension).
      3. adding an extension to an skb will COW the extension buffer if
         needed.
      
      The "MPTCP upstreaming" effort adds SKB_EXT_MPTCP extension to store the
      mapping for tx and rx processing.
      
      Two new members are added to sk_buff:
      1. 'active_extensions' byte (filling a hole), telling which extensions
         are available for this skb.
         This has two purposes:
         a) it avoids the need to initialize the pointer.
         b) it allows an extension to be "deleted" by clearing its bit
            value in ->active_extensions.
      
         While it would be possible to store the active_extensions byte
         in the extension struct instead of sk_buff, there is one problem
         with this:
          When an extension has to be disabled, we can always clear the
          bit in skb->active_extensions.  But if it were stored in the
          extension buffer itself, we might have to COW that buffer first
          when dealing with a cloned skb, and on kmalloc failure we would
          be unable to turn an extension off.
      
      2. extension pointer, located at the end of the sk_buff.
         If the active_extensions byte is 0, the pointer is undefined; it
         is not initialized on skb allocation.
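
      Usage then follows the skb_ext_* helpers this patch introduces
      (sketch; shown with the secpath extension id used later in the
      series):

       /* attach (COWs the extension area of a cloned skb if needed) */
       struct sec_path *sp = skb_ext_add(skb, SKB_EXT_SEC_PATH);

       /* look up: NULL if the bit in ->active_extensions is clear */
       sp = skb_ext_find(skb, SKB_EXT_SEC_PATH);

       /* "delete": clears the bit, no allocation required */
       skb_ext_del(skb, SKB_EXT_SEC_PATH);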
      
      This adds extra code to skb clone and free paths (to deal with
      refcount/free of extension area) but this replaces similar code that
      manages skb->nf_bridge and skb->sp structs in the followup patches of
      the series.
      
      It is possible to add support for extensions that are not preserved
      on clones/copies.
      
      To do this, one would need to define a bitmask of all extensions
      that need copy/cow semantics, and change __skb_ext_copy() to check
      ->active_extensions & SKB_EXT_PRESERVE_ON_CLONE, then just set
      ->active_extensions to 0 on the new clone.
      
      This isn't done here because all extensions that get added here
      need the copy/cow semantics.
      
      v2:
      Allocate the entire extension space using a kmem_cache.
      The upside is that this allows better tracking of used memory; the
      downside is that we will allocate more space than strictly needed in
      most cases (it's unlikely that all extensions are active/needed at
      the same time for the same skb).
      The allocated memory (except the small extension header) is not
      cleared, so there is no additional overhead aside from memory usage.
      
      Avoid the atomic_dec_and_test operation on skb_ext_put() by using a
      similar trick to the one kfree_skbmem() does with fclone_ref:
      if the refcount is 1, there is no concurrent user and we can free
      right away.
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      df5042f4
    • netfilter: avoid using skb->nf_bridge directly · c4b0e771
      Florian Westphal authored
      
      
      This pointer is going to be removed soon, so use the existing helpers in
      more places to avoid noise when the removal happens.
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c4b0e771
    • regmap: irq: add an option to clear status registers on unmask · c82ea33e
      Bartosz Golaszewski authored and Mark Brown committed
      
      
      Some interrupt controllers whose interrupts are acked on read will set
      the status bits for masked interrupts without changing the state of
      the IRQ line.
      
      Some chips have an additional "feature" where, if those set bits are
      not cleared before their respective interrupts are unmasked, the IRQ
      line will change state and we'll interpret this as an interrupt even
      though it actually fired while it was masked.
      
      Add a new field to the irq chip struct that tells the regmap irq chip
      code to always clear the status registers before actually changing the
      irq mask values.
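
      For a driver this is a one-flag opt-in on its chip description
      (sketch; the MY_REG_* register names are hypothetical):

       static const struct regmap_irq_chip my_irq_chip = {
               .name            = "my-chip",
               .status_base     = MY_REG_INT_STATUS,
               .mask_base       = MY_REG_INT_MASK,
               .num_regs        = 1,
               /* new flag: clear stale status before unmasking */
               .clear_on_unmask = true,
       };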
      Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
      Signed-off-by: Mark Brown <broonie@kernel.org>
      c82ea33e
    • regmap: regmap-irq/gpio-max77620: add level-irq support · 1c2928e3
      Matti Vaittinen authored and Mark Brown committed
      
      
      Add level-active IRQ support to the regmap-irq irqchip. This change
      breaks the existing regmap-irq type setting. Convert the existing
      drivers which use regmap-irq with trigger type setting (gpio-max77620)
      to work with this new approach. So we do not magically support
      level-active IRQs on gpio-max77620 - but add support to regmap-irq
      for chips which support them =)
      
      We do not support distinguishing the situation where HW supports
      rising and falling edge detection but not both. Separating this would
      require inventing yet another set of flags for IRQ types.
      Signed-off-by: Matti Vaittinen <matti.vaittinen@fi.rohmeurope.com>
      Signed-off-by: Mark Brown <broonie@kernel.org>
      1c2928e3
    • Revert "x86/objtool: Use asm macros to work around GCC inlining bugs" · 96af6cd0
      Ingo Molnar authored
      This reverts commit c06c4d80.
      
      See this commit for details about the revert:
      
        e769742d ("Revert "x86/jump-labels: Macrofy inline assembly code
        to work around GCC inlining bugs"")
      Reported-by: Masahiro Yamada <yamada.masahiro@socionext.com>
      Reviewed-by: Borislav Petkov <bp@alien8.de>
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Richard Biener <rguenther@suse.de>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Segher Boessenkool <segher@kernel.crashing.org>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Nadav Amit <namit@vmware.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      96af6cd0
    • genirq/affinity: Add is_managed to struct irq_affinity_desc · c410abbb
      Dou Liyang authored
      
      
      Devices which use managed interrupts usually have two classes of
      interrupts:
      
        - Interrupts for multiple device queues
        - Interrupts for general device management
      
      Currently both classes are treated the same way, i.e. as managed
      interrupts. The general interrupts get the default affinity mask assigned
      while the device queue interrupts are spread out over the possible CPUs.
      
      Treating the general interrupts as managed is both a limitation and under
      certain circumstances a bug. Assume the following situation:
      
       default_irq_affinity = 4..7
      
      If CPUs 4-7 are offlined, the core code will shut down the device
      management interrupts because the last CPU in their affinity mask
      went offline.
      
      It's also a limitation because it's desired to allow manual placement of
      the general device interrupts for various reasons. If they are marked
      managed then the interrupt affinity setting from both user and kernel space
      is disabled. That limitation was reported by Kashyap and Sumit.
      
      Expand struct irq_affinity_desc with a new bit 'is_managed' which is set
      for truly managed interrupts (queue interrupts) and cleared for the general
      device interrupts.
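
      The expanded structure then looks like this (a sketch combining this
      patch with the previous one in the series):

       struct irq_affinity_desc {
               struct cpumask  mask;
               unsigned int    is_managed : 1; /* queue interrupts only */
       };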
      
      [ tglx: Simplify code and massage changelog ]
      Reported-by: Kashyap Desai <kashyap.desai@broadcom.com>
      Reported-by: Sumit Saxena <sumit.saxena@broadcom.com>
      Signed-off-by: Dou Liyang <douliyangs@gmail.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-pci@vger.kernel.org
      Cc: shivasharan.srikanteshwara@broadcom.com
      Cc: ming.lei@redhat.com
      Cc: hch@lst.de
      Cc: bhelgaas@google.com
      Cc: douliyang1@huawei.com
      Link: https://lkml.kernel.org/r/20181204155122.6327-3-douliyangs@gmail.com
      c410abbb
    • genirq/core: Introduce struct irq_affinity_desc · bec04037
      Dou Liyang authored
      
      
      The interrupt affinity management uses straight cpumask pointers to convey
      the automatically assigned affinity masks for managed interrupts. The core
      interrupt descriptor allocation also decides, based on the pointer
      being non-NULL, whether an interrupt is managed or not.
      
      Devices which use managed interrupts usually have two classes of
      interrupts:
      
        - Interrupts for multiple device queues
        - Interrupts for general device management
      
      Currently both classes are treated the same way, i.e. as managed
      interrupts. The general interrupts get the default affinity mask assigned
      while the device queue interrupts are spread out over the possible CPUs.
      
      Treating the general interrupts as managed is both a limitation and under
      certain circumstances a bug. Assume the following situation:
      
       default_irq_affinity = 4..7
      
      If CPUs 4-7 are offlined, the core code will shut down the device
      management interrupts because the last CPU in their affinity mask
      went offline.
      
      It's also a limitation because it's desired to allow manual placement of
      the general device interrupts for various reasons. If they are marked
      managed then the interrupt affinity setting from both user and kernel space
      is disabled.
      
      To remedy that situation it's required to convey more information than the
      cpumasks through various interfaces related to interrupt descriptor
      allocation.
      
      Instead of adding yet another argument, create a new data structure
      'irq_affinity_desc' which for now just contains the cpumask. This
      struct can be expanded to convey auxiliary information in the next
      step.
      
      No functional change, just preparatory work.
      
      [ tglx: Simplified logic and clarified changelog ]
      Suggested-by: Thomas Gleixner <tglx@linutronix.de>
      Suggested-by: Bjorn Helgaas <bhelgaas@google.com>
      Signed-off-by: Dou Liyang <douliyangs@gmail.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-pci@vger.kernel.org
      Cc: kashyap.desai@broadcom.com
      Cc: shivasharan.srikanteshwara@broadcom.com
      Cc: sumit.saxena@broadcom.com
      Cc: ming.lei@redhat.com
      Cc: hch@lst.de
      Cc: douliyang1@huawei.com
      Link: https://lkml.kernel.org/r/20181204155122.6327-2-douliyangs@gmail.com
      bec04037
    • PM-runtime: Switch autosuspend over to using hrtimers · 8234f673
      vingu-linaro authored
      PM-runtime uses the timer infrastructure for autosuspend. This implies
      that the minimum time before autosuspending a device is in the range
      of 1 tick (included) to 2 ticks (excluded):
       - On arm64 this means between 4ms and 8ms with the default jiffies
         configuration
       - And on arm, it is between 10ms and 20ms
      
      These values are quite high for embedded systems which sometimes want
      the duration to be in the range of 1 ms.
      
      It is possible to switch autosuspend over to using hrtimers to get
      finer granularity for short durations and take advantage of slack to
      retain some margins and get long timeouts with minimum wakeups.
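
      Driver-facing usage is unchanged; the existing autosuspend API simply
      gains ms-level accuracy (usage sketch):

       /* probe(): opt in to autosuspend with a short timeout */
       pm_runtime_set_autosuspend_delay(dev, 1); /* 1 ms, viable w/ hrtimers */
       pm_runtime_use_autosuspend(dev);
       pm_runtime_enable(dev);

       /* after I/O: record activity, drop the reference lazily */
       pm_runtime_mark_last_busy(dev);
       pm_runtime_put_autosuspend(dev);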
      
      On an arm64 platform that uses 1ms for the autosuspend timeout of its
      GPU, idle power is reduced by 10% with hrtimers.
      
      The latency impact on the arm64 hikey octo core is:
       - mark_last_busy: from 1.11 us to 1.25 us
       - rpm_suspend: from 15.54 us to 15.38 us
      [Only the code path of rpm_suspend() that starts the hrtimer has been
      measured.]
      
      The arm64 image (arm64 default defconfig) decreases by around 3KB,
      with the following details:
      
      $ size vmlinux-timer
         text	   data	    bss	    dec	    hex	filename
      12034646	6869268	 386840	19290754	1265a82	vmlinux
      
      $ size vmlinux-hrtimer
         text	   data	    bss	    dec	    hex	filename
      12030550	6870164	 387032	19287746	1264ec2	vmlinux
      
      The latency impact on the arm 32-bit snowball dual core is:
       - mark_last_busy: from 0.31 us to 0.77 us
       - rpm_suspend: from 6.83 us to 6.67 us
      
      The increase of the image for the snowball platform, which I used
      for testing the performance impact, is negligible (244B).
      
      $ size vmlinux-timer
         text	   data	    bss	    dec	    hex	filename
      7157961	2119580	 264120	9541661	 91981d	build-ux500/vmlinux
      
      $ size vmlinux-hrtimer
         text	   data	    bss	    dec	    hex	filename
      7157773	2119884	 264248	9541905	 919911	vmlinux-hrtimer
      
      And the arm 32-bit image (multi_v7_defconfig) increases by around
      1.7KB, with the following details:
      
      $ size vmlinux-timer
         text	   data	    bss	    dec	    hex	filename
      13304443	6803420	 402768	20510631	138f7a7	vmlinux
      
      $ size vmlinux-hrtimer
         text	   data	    bss	    dec	    hex	filename
      13304299	6805276	 402768	20512343	138fe57	vmlinux
      Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
      Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      8234f673
  7. 18 Dec, 2018 8 commits
  8. 17 Dec, 2018 4 commits
  9. 16 Dec, 2018 1 commit