1. 02 Aug, 2019 1 commit
  2. 24 May, 2019 1 commit
    • Douglas Anderson's avatar
      platform/chrome: cros_ec_spi: Request the SPI thread be realtime · ac5bdfdc
      Douglas Anderson authored
      
      
      All currently known ECs in the wild are very sensitive to timing.
      Specifically the ECs are known to drop a transfer if more than 8 ms
      passes from the assertion of the chip select until the transfer
      finishes.
      
      Let's use the new feature introduced in the patch (spi: Allow SPI
      devices to request the pumping thread be realtime") to request the SPI
      pumping thread be realtime.  This means that if we get shunted off to
      the SPI thread for whatever reason we won't get downgraded to low
      priority.
      
      NOTES:
      - We still need to keep ourselves as high priority since the SPI core
        doesn't guarantee that all transfers end up on the pumping thread
        (in fact, it tries pretty hard to do them in the calling context).
      - If future Chrome OS ECs ever fix themselves to be less sensitive
        then we could consider adding a property (or compatible string) to
        not set this property.  For now we need it across the board.
      Signed-off-by: default avatarDouglas Anderson <dianders@chromium.org>
      Reviewed-by: default avatarGuenter Roeck <groeck@chromium.org>
      Signed-off-by: default avatarEnric Balletbo i Serra <enric.balletbo@collabora.com>
      ac5bdfdc
  3. 20 May, 2019 2 commits
    • Douglas Anderson's avatar
      platform/chrome: cros_ec_spi: Move to real time priority for transfers · 7dadf88f
      Douglas Anderson authored
      In commit 37a18622 ("platform/chrome: cros_ec_spi: Transfer
      messages at high priority") we moved transfers to a high priority
      workqueue.  This helped make them much more reliable.
      
      ...but, we still saw failures.
      
      We were actually finding ourselves competing for time with dm-crypt
      which also scheduled work on HIGHPRI workqueues.  While we can
      consider reverting the change that made dm-crypt run its work at
      HIGHPRI, the argument in commit a1b89132
      
       ("dm crypt: use
      WQ_HIGHPRI for the IO and crypt workqueues") is somewhat compelling.
      It does make sense for IO to be scheduled at a priority that's higher
      than the default user priority.  It also turns out that dm-crypt isn't
      alone in using high priority like this.  loop_prepare_queue() does
      something similar for loopback devices.
      
      Looking in more detail, it can be seen that the high priority
      workqueue isn't actually that high of a priority.  It runs at MIN_NICE
      which is _fairly_ high priority but still below all real time
      priority.
      
      Should we move cros_ec_spi to real time priority to fix our problems,
      or is this just escalating a priority war?  I'll argue here that
      cros_ec_spi _does_ belong at real time priority.  Specifically
      cros_ec_spi actually needs to run quickly for correctness.  As I
      understand this is exactly what real time priority is for.
      
      There currently doesn't appear to be any way to use the standard
      workqueue APIs with a real time priority, so we'll switch over to
      using using a kthread worker.  We'll match the priority that the SPI
      core uses when it wants to do things on a realtime thread and just use
      "MAX_RT_PRIO - 1".
      
      This commit plus the patch ("platform/chrome: cros_ec_spi: Request the
      SPI thread be realtime") are enough to get communications very close
      to 100% reliable (the only known problem left is when serial console
      is turned on, which isn't something that happens in shipping devices).
      Specifically this test case now passes (tested on rk3288-veyron-jerry):
      
        dd if=/dev/zero of=/var/log/foo.txt bs=4M count=512&
        while true; do
          ectool version > /dev/null;
        done
      
      It should be noted that "/var/log" is encrypted (and goes through
      dm-crypt) and also passes through a loopback device.
      Signed-off-by: default avatarDouglas Anderson <dianders@chromium.org>
      Reviewed-by: default avatarGuenter Roeck <groeck@chromium.org>
      Signed-off-by: default avatarEnric Balletbo i Serra <enric.balletbo@collabora.com>
      7dadf88f
    • Evan Green's avatar
      platform/chrome: cros_ec_spi: Always add of_match_table · 75501d2e
      Evan Green authored
      
      
      The Chrome OS EC driver attaches to devices using the of_match_table
      even when ACPI is the underlying firmware. It does this using the
      magic PRP0001 ACPI HID, which tells ACPI to go find an OF compatible
      string under the hood and match on that.
      
      The cros_ec_spi driver needs to provide the of_match_table regardless
      of whether CONFIG_OF is enabled or not, since the table is used by
      ACPI for PRP0001 devices.
      Signed-off-by: default avatarEvan Green <evgreen@chromium.org>
      Reviewed-by: default avatarBenson Leung <bleung@chromium.org>
      Signed-off-by: default avatarEnric Balletbo i Serra <enric.balletbo@collabora.com>
      75501d2e
  4. 15 Apr, 2019 1 commit
    • Douglas Anderson's avatar
      platform/chrome: cros_ec_spi: Transfer messages at high priority · 37a18622
      Douglas Anderson authored
      The software running on the Chrome OS Embedded Controller (cros_ec)
      handles SPI transfers in a bit of a wonky way.  Specifically if the EC
      sees too long of a delay in a SPI transfer it will give up and the
      transfer will be counted as failed.  Unfortunately the timeout is
      fairly short, though the actual number may be different for different
      EC codebases.
      
      We can end up tripping the timeout pretty easily if we happen to
      preempt the task running the SPI transfer and don't get back to it for
      a little while.
      
      Historically this hasn't been a _huge_ deal because:
      1. On old devices Chrome OS used to run PREEMPT_VOLUNTARY.  That meant
         we were pretty unlikely to take a big break from the transfer.
      2. On recent devices we had faster / more processors.
      3. Recent devices didn't use "cros-ec-spi-pre-delay".  Using that
         delay makes us more likely to trip this use case.
      4. For whatever reasons (I didn't dig) old kernels seem to be less
         likely to trip this.
      5. For the most part it's kinda OK if a few transfers to the EC fail.
         Mostly we're just polling the battery or doing some other task
         where we'll try again.
      
      Even with the above things, this issue has reared its ugly head
      periodically.  We could solve this in a nice way by adding reliable
      retries to the EC protocol [1] or by re-designing the code in the EC
      codebase to allow it to wait longer, but that code doesn't ever seem
      to get changed.  ...and even if it did, it wouldn't help old devices.
      
      It's now time to finally take a crack at making this a little better.
      This patch isn't guaranteed to make every cros_ec SPI transfer
      perfect, but it should improve things by a few orders of magnitude.
      Specifically you can try this on a rk3288-veyron Chromebook (which is
      slower and also _does_ need "cros-ec-spi-pre-delay"):
        md5sum /dev/zero &
        md5sum /dev/zero &
        md5sum /dev/zero &
        md5sum /dev/zero &
        while true; do
          cat /sys/class/power_supply/sbs-20-000b/charge_now > /dev/null;
        done
      ...before this patch you'll see boatloads of errors.  After this patch I
      don't see any in the testing I did.
      
      The way this patch works is by effectively boosting the priority of
      the cros_ec transfers.  As far as I know there is no simple way to
      just boost the priority of the current process temporarily so the way
      we accomplish this is by queuing the work on the system_highpri_wq.
      
      NOTE: this patch relies on the fact that the SPI framework attempts to
      push the messages out on the calling context (which is the one that is
      boosted to high priority).  As I understand from earlier (long ago)
      discussions with Mark Brown this should be a fine assumption.  Even if
      it isn't true sometimes this patch will still not make things worse.
      
      [1] https://crbug.com/678675
      
      Signed-off-by: default avatarDouglas Anderson <dianders@chromium.org>
      Reviewed-by: default avatarMatthias Kaehlcke <mka@chromium.org>
      Reviewed-by: default avatarBrian Norris <briannorris@chromium.org>
      Signed-off-by: default avatarEnric Balletbo i Serra <enric.balletbo@collabora.com>
      37a18622
  5. 01 Feb, 2019 2 commits
  6. 03 Jul, 2018 1 commit
  7. 23 May, 2018 1 commit
    • Brian Norris's avatar
      mfd: cros_ec: Retry commands when EC is known to be busy · 11799564
      Brian Norris authored
      Commit 001dde94 ("mfd: cros ec: spi: Fix "in progress" error
      signaling") pointed out some bad code, but its analysis and conclusion
      was not 100% correct.
      
      It *is* correct that we should not propagate result==EC_RES_IN_PROGRESS
      for transport errors, because this has a special meaning -- that we
      should follow up with EC_CMD_GET_COMMS_STATUS until the EC is no longer
      busy. This is definitely the wrong thing for many commands, because
      among other problems, EC_CMD_GET_COMMS_STATUS doesn't actually retrieve
      any RX data from the EC, so commands that expected some data back will
      instead start processing junk.
      
      For such commands, the right answer is to either propagate the error
      (and return that error to the caller) or resend the original command
      (*not* EC_CMD_GET_COMMS_STATUS).
      
      Unfortunately, commit 001dde94 forgets a crucial point: that for
      some long-running operations, the EC physically cannot respond to
      commands any more. For example, with EC_CMD_FLASH_ERASE, the EC may be
      re-flashing its own code regions, so it can't respond to SPI interrupts.
      Instead, the EC prepares us ahead of time for being busy for a "long"
      time, and fills its hardware buffer with EC_SPI_PAST_END. Thus, we
      expect to see several "transport" errors (or, messages filled with
      EC_SPI_PAST_END). So we should really translate that to a retryable
      error (-EAGAIN) and continue sending EC_CMD_GET_COMMS_STATUS until we
      get a ready status.
      
      IOW, it is actually important to treat some of these "junk" values as
      retryable errors.
      
      Together with commit 001dde94, this resolves bugs like the
      following:
      
      1. EC_CMD_FLASH_ERASE now works again (with commit 001dde94, we
         would abort the first time we saw EC_SPI_PAST_END)
      2. Before commit 001dde94, transport errors (e.g.,
         EC_SPI_RX_BAD_DATA) seen in other commands (e.g.,
         EC_CMD_RTC_GET_VALUE) used to yield junk data in the RX buffer; they
         will now yield -EAGAIN return values, and tools like 'hwclock' will
         simply fail instead of retrieving and re-programming undefined time
         values
      
      Fixes: 001dde94
      
       ("mfd: cros ec: spi: Fix "in progress" error signaling")
      Signed-off-by: default avatarBrian Norris <briannorris@chromium.org>
      Signed-off-by: default avatarLee Jones <lee.jones@linaro.org>
      11799564
  8. 08 Jan, 2018 1 commit
  9. 29 Nov, 2017 2 commits
  10. 27 Apr, 2017 1 commit
    • Doug Anderson's avatar
      mfd: cros ec: spi: Increase wait time to 200ms · ca691f71
      Doug Anderson authored
      
      
      This is a sucky change to bump up the time we'll wait for the EC.  Why
      is it sucky?  If 200ms for a transfer is a common thing it will have a
      massively bad impact on keyboard responsiveness.
      
      It still seems like a good idea to do this, though, because we have a
      gas gauge that claims that in an extreme case it could stretch the i2c
      clock for 144ms.  It's not a common case so it shouldn't affect
      responsiveness, but it can happen.  It's much better to have a single
      slow keyboard response than to start returning errors when we don't
      have to.
      
      In newer EC designs we should probably implement a virtual battery to
      respond to the kernel to insulate the kernel from these types of
      issues.
      Signed-off-by: default avatarDoug Anderson <dianders@chromium.org>
      Signed-off-by: default avatarEnric Balletbo i Serra <enric.balletbo@collabora.com>
      Signed-off-by: default avatarLee Jones <lee.jones@linaro.org>
      ca691f71
  11. 04 Oct, 2016 1 commit
  12. 14 Jan, 2016 1 commit
    • Lee Jones's avatar
      mfd: cros_ec_spi: Repair comparison ordering issue · 8827a642
      Lee Jones authored
      
      
      WARNING: Comparisons should place the constant on the right side of the test
      +       BUG_ON(EC_MSG_PREAMBLE_COUNT > ec_dev->din_size);
      
      WARNING: Comparisons should place the constant on the right side of the test
      +       BUG_ON(EC_MSG_PREAMBLE_COUNT > ec_dev->din_size);
      
      total: 0 errors, 2 warnings, 731 lines checked
      Signed-off-by: default avatarLee Jones <lee.jones@linaro.org>
      8827a642
  13. 11 Jan, 2016 1 commit
    • Nicolas Boichat's avatar
      mfd: cros ec: Lock the SPI bus while holding chipselect · 6d6e44a9
      Nicolas Boichat authored
      
      
      cros_ec_cmd_xfer_spi and cros_ec_pkt_xfer_spi generally work like
      this:
       - Pull CS down (active), wait a bit, then send a command
       - Wait for response (multiple requests)
       - Wait a while, pull CS up (inactive)
      
      These operations, individually, lock the SPI bus, but there is
      nothing preventing the SPI framework from interleaving messages
      intended for other devices as the bus is unlocked in between.
      
      This is a problem as the EC expects CS to be held low for the
      whole duration.
      
      Solution: Lock the SPI bus during the whole transaction, to make
      sure that no other messages can be interleaved.
      Signed-off-by: default avatarNicolas Boichat <drinkcat@chromium.org>
      Reviewed-by: default avatarGwendal Grignou <gwendal@chromium.org>
      Signed-off-by: default avatarLee Jones <lee.jones@linaro.org>
      6d6e44a9
  14. 28 Oct, 2015 1 commit
  15. 24 Aug, 2015 1 commit
  16. 15 Jun, 2015 6 commits
  17. 06 Oct, 2014 2 commits
  18. 23 Jul, 2014 1 commit
  19. 09 Jul, 2014 8 commits
  20. 03 Jun, 2014 4 commits
  21. 21 Jan, 2014 1 commit