1. 12 Dec, 2018 1 commit
    • PCI / PM: Allow runtime PM without callback functions · c5eb1190
      Jarkko Nikula authored
      a9c8088c ("i2c: i801: Don't restore config registers on runtime PM")
      nullified the runtime PM suspend/resume callback pointers while keeping the
      runtime PM enabled.
      This caused the SMBus PCI device to stay in D0 with
      /sys/devices/.../power/runtime_status showing "error" when the runtime PM
      framework attempted to autosuspend the device.  This is due to PCI bus
      runtime PM, which checks for driver runtime PM callbacks and returns
      -ENOSYS if they are not set.
      Since i2c-i801.c doesn't need to do anything device-specific for runtime
      PM, Jean Delvare proposed this be fixed in the PCI core rather than adding
      dummy runtime PM callback functions in the PCI drivers.
      Change pci_pm_runtime_suspend()/pci_pm_runtime_resume() so they allow
      changing the PCI device power state during runtime PM transitions even if
      the driver supplies no runtime PM callbacks.
      This fixes the runtime PM regression on i2c-i801.c.
      It is not obvious why the code previously required the runtime PM
      callbacks.  The test has been there since the code was introduced by
      6cbf8214 ("PCI PM: Run-time callbacks for PCI bus type").
      On the other hand, a similar change was done to generic runtime PM
      callbacks in 05aa55dd ("PM / Runtime: Lenient generic runtime pm
      Fixes: a9c8088c ("i2c: i801: Don't restore config registers on runtime PM")
      Reported-by: Mika Westerberg <mika.westerberg@linux.intel.com>
      Signed-off-by: Jarkko Nikula <jarkko.nikula@linux.intel.com>
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      Reviewed-by: Jean Delvare <jdelvare@suse.de>
      Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Cc: stable@vger.kernel.org	# v4.18+
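The following is a minimal userspace model of the behavior change, for illustration only. It reduces a device to an optional driver callback and a power state. `struct model_dev`, `runtime_suspend_old()` and `runtime_suspend_new()` are hypothetical names. The real pci_pm_runtime_suspend() also saves config space and handles wakeup.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified device model (not the kernel's struct pci_dev). */
enum model_power { MODEL_D0, MODEL_D3HOT };

struct model_dev {
	int (*runtime_suspend)(struct model_dev *dev); /* driver callback, may be NULL */
	enum model_power state;
};

/* Before the fix: no driver callback means -ENOSYS and the device stays in D0. */
int runtime_suspend_old(struct model_dev *dev)
{
	if (!dev->runtime_suspend)
		return -ENOSYS;
	int ret = dev->runtime_suspend(dev);
	if (ret)
		return ret;
	dev->state = MODEL_D3HOT;
	return 0;
}

/* After the fix: the callback is optional, but the power state change still happens. */
int runtime_suspend_new(struct model_dev *dev)
{
	if (dev->runtime_suspend) {
		int ret = dev->runtime_suspend(dev);
		if (ret)
			return ret;
	}
	dev->state = MODEL_D3HOT;
	return 0;
}
```

Consider a NULL callback, as in i2c-i801 after a9c8088c. The old path fails with -ENOSYS, while the new path moves the model device to D3hot.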
  2. 30 Jul, 2018 1 commit
  3. 29 Jun, 2018 1 commit
    • PCI/IOV: Reset total_VFs limit after detaching PF driver · 38972375
      Jakub Kicinski authored
      The TotalVFs register in the SR-IOV capability is the hardware limit on the
      number of VFs.  A PF driver can limit the number of VFs further with
      pci_sriov_set_totalvfs().  When the PF driver is removed, reset any VF
      limit that was imposed by the driver because that limit may not apply to
      other drivers.
      Before 8d85a7a4 ("PCI/IOV: Allow PF drivers to limit total_VFs to 0"),
      pci_sriov_set_totalvfs(pdev, 0) meant "we can enable TotalVFs virtual
      functions", and the nfp driver used that to remove the VF limit when the
      driver unloads.
      8d85a7a4 broke that because instead of removing the VF limit,
      pci_sriov_set_totalvfs(pdev, 0) actually sets the limit to zero, and that
      limit persists even if another driver is loaded.
      We could fix that by making the nfp driver reset the limit when it unloads,
      but it seems more robust to do it in the PCI core instead of relying on the
      The regression scenario is:
        nfp_pci_probe (driver 1)
          pci_sriov_set_totalvfs(pf->pdev, 0)   # limits VFs to 0
        nfp_pci_probe (driver 2)
          # no VF limit from firmware
      Now driver 2 is broken because the VF limit is still 0 from driver 1.
      Fixes: 8d85a7a4 ("PCI/IOV: Allow PF drivers to limit total_VFs to 0")
      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      [bhelgaas: changelog, rename functions]
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
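The two limits can be sketched as follows. This assumes a simplified per-PF record: `struct model_sriov` and the `model_*` functions are hypothetical. The real pci_sriov_set_totalvfs() performs additional checks that are not modeled here.

```c
#include <errno.h>

/* Hypothetical per-PF SR-IOV bookkeeping (not the kernel's struct pci_sriov). */
struct model_sriov {
	unsigned int total_vfs;      /* TotalVFs from the SR-IOV capability (hardware) */
	unsigned int driver_max_vfs; /* limit imposed by the bound PF driver */
};

void model_sriov_init(struct model_sriov *iov, unsigned int hw_total)
{
	iov->total_vfs = hw_total;
	iov->driver_max_vfs = hw_total;
}

/* Since 8d85a7a4, 0 is a real limit rather than "remove the limit". */
int model_set_totalvfs(struct model_sriov *iov, unsigned int numvfs)
{
	if (numvfs > iov->total_vfs)
		return -EINVAL;
	iov->driver_max_vfs = numvfs;
	return 0;
}

unsigned int model_get_totalvfs(const struct model_sriov *iov)
{
	return iov->driver_max_vfs;
}

/* The fix: drop any driver-imposed limit when the PF driver detaches. */
void model_pf_driver_detached(struct model_sriov *iov)
{
	iov->driver_max_vfs = iov->total_vfs;
}
```

In the regression scenario, driver 1 sets the limit to 0. Without model_pf_driver_detached(), driver 2 would still see 0; with it, driver 2 sees the hardware TotalVFs again.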
  4. 24 May, 2018 1 commit
    • PCI / PM: Do not clear state_saved for devices that remain suspended · 656088aa
      Rafael J. Wysocki authored
      The state_saved flag should not be cleared in pci_pm_suspend() if the
      given device is going to remain suspended, or the device's config
      space will not be restored properly during the subsequent resume.
      Namely, if the device is going to stay in suspend, both the late
      and noirq callbacks return early for it, so if its state_saved flag
      is cleared in pci_pm_suspend(), it will remain unset throughout the
      remaining part of suspend and resume and pci_restore_state() called
      for the device going forward will return without doing anything.
      For this reason, change pci_pm_suspend() to only clear state_saved
      if the given device is not going to remain suspended.  [This is
      analogous to what commit ae860a19 (PCI / PM: Do not clear
      state_saved in pci_pm_freeze() when smart suspend is set) did for
      pci_pm_freeze().]
      Fixes: c4b65157 (PCI / PM: Take SMART_SUSPEND driver flag into account)
      Cc: 4.15+ <stable@vger.kernel.org> # 4.15+
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
      Acked-by: Bjorn Helgaas <bhelgaas@google.com>
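Below is a sketch of why clearing state_saved too early loses the restore. It is a simplified model that assumes a config-space snapshot was already taken when the device was runtime suspended. `struct model_dev` and the `model_*` functions are hypothetical stand-ins for pci_pm_suspend(), the noirq phase and pci_restore_state().

```c
#include <stdbool.h>

/* Hypothetical model of the flags involved (not struct pci_dev). */
struct model_dev {
	bool state_saved;      /* a valid config-space snapshot exists */
	bool remain_suspended; /* device stays in runtime suspend across the transition */
	int restores;          /* number of times config space was actually restored */
};

/* Like pci_restore_state(): does nothing if no snapshot is marked valid. */
void model_restore_state(struct model_dev *d)
{
	if (!d->state_saved)
		return;
	d->restores++;
}

/* "suspend" phase: before the fix state_saved was always cleared; after it,
 * only for devices that the late/noirq phases will process (and save again). */
void model_pm_suspend(struct model_dev *d, bool fixed)
{
	if (!fixed || !d->remain_suspended)
		d->state_saved = false;
}

/* "noirq" phase: returns early for devices that remain suspended. */
void model_pm_suspend_noirq(struct model_dev *d)
{
	if (d->remain_suspended)
		return;
	d->state_saved = true; /* stands in for pci_save_state() */
}
```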
  5. 17 May, 2018 1 commit
  6. 03 May, 2018 2 commits
  7. 23 Apr, 2018 1 commit
    • PCI / PM: Do not clear state_saved in pci_pm_freeze() when smart suspend is set · ae860a19
      Mika Westerberg authored
      If a driver uses DPM_FLAG_SMART_SUSPEND and the device is already
      runtime suspended when hibernation starts, the PCI core skips runtime
      resuming the device but still clears pci_dev->state_saved.  After the
      hibernation image is written, pci_pm_thaw_noirq() makes sure subsequent
      thaw phases for the device are also skipped, leaving it runtime suspended
      with pci_dev->state_saved == false.
      When the device is eventually runtime resumed, pci_pm_runtime_resume()
      restores config space by calling pci_restore_standard_config().  However,
      because pci_dev->state_saved == false, pci_restore_state() never actually
      restores the config space, leaving the device in a state that is not what
      the driver might expect.
      For example here is what happens for intel-lpss I2C devices once the
      hibernation snapshot is taken:
        intel-lpss 0000:00:15.0: power state changed by ACPI to D0
        intel-lpss 0000:00:1e.0: power state changed by ACPI to D3cold
        video LNXVIDEO:00: Restoring backlight state
        PM: hibernation exit
        i2c_designware i2c_designware.1: Unknown Synopsys component type: 0xffffffff
        i2c_designware i2c_designware.0: Unknown Synopsys component type: 0xffffffff
        i2c_designware i2c_designware.1: timeout in disabling adapter
        i2c_designware i2c_designware.0: timeout in disabling adapter
      Since PCI config space is not restored, the device is still in D3hot,
      making MMIO register reads return 0xffffffff.
      Fix this by clearing pci_dev->state_saved only if we actually end up
      runtime resuming the device.
      Fixes: c4b65157 (PCI / PM: Take SMART_SUSPEND driver flag into account)
      Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
      Cc: 4.15+ <stable@vger.kernel.org> # 4.15+
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
  8. 30 Mar, 2018 2 commits
    • PCI/portdrv: Remove pcie_port_bus_type link order dependency · c6c889d9
      Bjorn Helgaas authored
      The pcie_port_bus_type must be registered before drivers that depend on it
      can be registered.  Those drivers include:
        pcied_init()                # PCIe native hotplug driver
        aer_service_init()          # AER driver
        dpc_service_init()          # DPC driver
        pcie_pme_service_init()     # PME driver
      Previously we registered pcie_port_bus_type from pcie_portdrv_init(), a
      device_initcall.  The callers of pcie_port_service_register() (above) are
      also device_initcalls.  This is fragile because the device_initcall
      ordering depends on link order, which is not explicit.
      Register pcie_port_bus_type from pci_driver_init() along with pci_bus_type.
      This removes the link order dependency between portdrv and the pciehp, AER,
      DPC, and PCIe PME drivers.
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
    • PCI/PM: Clear PCIe PME Status bit for Root Complex Event Collectors · 3620c714
      Bjorn Helgaas authored
      Per PCIe r4.0, sec 6.1.6, Root Complex Event Collectors can generate PME
      interrupts on behalf of Root Complex Integrated Endpoints.
      Linux does not currently enable PME interrupts from RC Event Collectors,
      but fe31e697 ("PCI/PCIe: Clear Root PME Status bits early during system
      resume") suggests PME interrupts may be enabled by the platform for ACPI-
      based runtime wakeup.
      Clear the PCIe PME Status bit for Root Complex Event Collectors during
      resume, just like we already do for Root Ports.
      If the BIOS enables PME interrupts for an event collector and neglects to
      clear the status bit on resume, this change should fix the same bug as
       (PMEs not working after waking from a sleep state), but for
      Root Complex Integrated Endpoints.
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
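Below is a sketch of the device-type selection, assuming a port model with just a PCIe Device/Port Type and a Root Status value. The type codes and the PME Status bit follow the PCIe specification, but `struct model_port` and model_resume_early() are hypothetical. Clearing the RW1C bit is modeled as simply zeroing it.

```c
#include <stdbool.h>
#include <stdint.h>

/* PCIe Device/Port Type values (PCI Express Capabilities register). */
#define MODEL_TYPE_ENDPOINT  0x0
#define MODEL_TYPE_ROOT_PORT 0x4
#define MODEL_TYPE_RC_END    0x9 /* Root Complex Integrated Endpoint */
#define MODEL_TYPE_RC_EC     0xa /* Root Complex Event Collector */

#define MODEL_RTSTA_PME 0x00010000u /* PME Status, bit 16 of Root Status (RW1C) */

/* Hypothetical port model. */
struct model_port {
	int type;
	uint32_t rtsta; /* Root Status register value */
};

/* Root Ports and RC Event Collectors have a Root Status register. */
static bool model_has_root_status(int type)
{
	return type == MODEL_TYPE_ROOT_PORT || type == MODEL_TYPE_RC_EC;
}

void model_resume_early(struct model_port *p)
{
	if (model_has_root_status(p->type))
		p->rtsta &= ~MODEL_RTSTA_PME; /* models writing 1 to the RW1C bit */
}
```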
  9. 19 Mar, 2018 1 commit
  10. 13 Mar, 2018 1 commit
    • PCI: Restore config space on runtime resume despite being unbound · 5775b843
      Rafael J. Wysocki authored
      We leave PCI devices not bound to a driver in D0 during runtime suspend.
      But they may have a parent which is bound and can be transitioned to
      D3cold at runtime.  Once the parent goes to D3cold, the unbound child
      may go to D3cold as well.  When the child goes to D3cold, its internal
      state, including configuration of BARs, MSI, ASPM, MPS, etc., is lost.
      One example is recent hybrid graphics laptops, which cut power to the
      discrete GPU when the root port above it goes to ACPI power state D3.
      Users may provoke this by unbinding the GPU driver and allowing runtime
      PM on the GPU via sysfs:  The PM core will then treat the GPU as
      "suspended", which in turn allows the root port to runtime suspend,
      causing the power resources listed in its _PR3 object to be powered off.
      The GPU's BARs will be uninitialized when a driver later probes it.
      Another example is hybrid graphics laptops where the GPU itself (rather
      than the root port) is capable of runtime suspending to D3cold.  If the
      GPU's integrated HDA controller is not bound and the GPU's driver
      decides to runtime suspend to D3cold, the HDA controller's BARs will be
      uninitialized when a driver later probes it.
      Fix by saving and restoring config space over a runtime suspend cycle
      even if the device is not bound.
      Acked-by: Bjorn Helgaas <bhelgaas@google.com>
      Tested-by: Peter Wu <peter@lekensteyn.nl>              # Nvidia Optimus
      Tested-by: Lukas Wunner <lukas@wunner.de>              # MacBook Pro
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      [lukas: add commit message, bikeshed code comments for clarity]
      Signed-off-by: Lukas Wunner <lukas@wunner.de>
      Link: https://patchwork.freedesktop.org/patch/msgid/92fb6e6ae2730915eb733c08e2f76c6a313e3860.1520068884.git.lukas@wunner.de
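Below is a minimal model of the failure and the fix, assuming config space is reduced to a single BAR value that a D3cold power cut wipes. `struct model_dev`, the `fixed` switch and the `model_*` functions are hypothetical. In the kernel, the save and restore are pci_save_state() and pci_restore_state() in the runtime PM callbacks.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical device model (not struct pci_dev). */
struct model_dev {
	bool bound;          /* a driver is bound */
	uint32_t bar0;       /* live config register, lost in D3cold */
	uint32_t saved_bar0; /* config-space snapshot */
	bool state_saved;
};

/* Runtime suspend: before the fix, unbound devices were not saved. */
void model_runtime_suspend(struct model_dev *d, bool fixed)
{
	if (d->bound || fixed) {
		d->saved_bar0 = d->bar0;
		d->state_saved = true;
	}
}

/* The parent goes to D3cold and the unbound child loses its config space. */
void model_power_cut(struct model_dev *d)
{
	d->bar0 = 0;
}

void model_runtime_resume(struct model_dev *d, bool fixed)
{
	if ((d->bound || fixed) && d->state_saved) {
		d->bar0 = d->saved_bar0;
		d->state_saved = false;
	}
}
```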
  11. 12 Mar, 2018 1 commit
    • PCI/PM: Clear PCIe PME Status bit in core, not PCIe port driver · a39bd851
      Bjorn Helgaas authored
      fe31e697 ("PCI/PCIe: Clear Root PME Status bits early during system
      resume") added a .resume_noirq() callback to the PCIe port driver to clear
      the PME Status bit during resume to work around a BIOS issue.
      The BIOS evidently enabled PME interrupts for ACPI-based runtime wakeups
      but did not clear the PME Status bit during resume, which meant PMEs after
      resume did not trigger interrupts because PME Status did not transition
      from cleared to set.
      The fix was in the PCIe port driver, so it worked when CONFIG_PCIEPORTBUS
      was set.  But I think we *always* want the fix because the platform may use
      PME interrupts even if Linux is built without the PCIe port driver.
      Move the fix from the port driver to the PCI core so we can work around
      this "PME doesn't work after waking from a sleep state" issue regardless of
      [bhelgaas: folded in warning fix from Arnd Bergmann <arnd@arndb.de>:
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
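The "cleared to set" behavior described above can be sketched as an edge-triggered model. This assumes an interrupt is counted only on a 0 -> 1 transition of PME Status. `struct model_root`, model_pme_event() and model_clear_pme_status() are hypothetical; the last one stands in for the clear the PCI core now does during resume.

```c
#include <stdbool.h>

/* Hypothetical Root Port model: only the PME Status bit and an interrupt count. */
struct model_root {
	bool pme_status;
	int interrupts;
};

/* A PME message arrives: an interrupt fires only on a cleared-to-set transition. */
void model_pme_event(struct model_root *r)
{
	if (!r->pme_status) {
		r->pme_status = true;
		r->interrupts++;
	}
	/* already set (e.g. left set by the BIOS): no new interrupt */
}

/* Stands in for clearing PME Status during resume, independent of CONFIG_PCIEPORTBUS. */
void model_clear_pme_status(struct model_root *r)
{
	r->pme_status = false;
}
```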
  12. 22 Feb, 2018 1 commit
  13. 28 Jan, 2018 1 commit
  14. 18 Jan, 2018 1 commit
  15. 18 Dec, 2017 1 commit
  16. 11 Dec, 2017 1 commit
    • PM / sleep: Avoid excess pm_runtime_enable() calls in device_resume() · 3487972d
      Rafael J. Wysocki authored
      Middle-layer code doing suspend-time optimizations for devices with
      the DPM_FLAG_SMART_SUSPEND flag set (currently, the PCI bus type and
      the ACPI PM domain) needs to make the core skip ->thaw_early and
      ->thaw callbacks for those devices in some cases and it sets the
      power.direct_complete flag for them for this purpose.
      However, it turns out that setting power.direct_complete outside of
      the PM core is a bad idea as it triggers an excess invocation of
      pm_runtime_enable() in device_resume().
      For this reason, provide a helper to clear power.is_late_suspended
      and power.is_suspended to be invoked by the middle-layer code in
      question instead of setting power.direct_complete and make that code
      call the new helper.
      Fixes: c4b65157 (PCI / PM: Take SMART_SUSPEND driver flag into account)
      Fixes: 05087360 (ACPI / PM: Take SMART_SUSPEND driver flag into account)
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
      Acked-by: Bjorn Helgaas <bhelgaas@google.com>
  17. 27 Nov, 2017 1 commit
  18. 06 Nov, 2017 3 commits
    • PCI / PM: Take SMART_SUSPEND driver flag into account · c4b65157
      Rafael J. Wysocki authored
      Make the PCI bus type take DPM_FLAG_SMART_SUSPEND into account in its
      system-wide PM callbacks and make sure that all code that should not
      run in parallel with pci_pm_runtime_resume() is executed in the "late"
      phases of system suspend, freeze and poweroff transitions.
      [Note that the pm_runtime_suspended() check in pci_dev_keep_suspended()
      is an optimization, because if it is not passed, all of the subsequent
      checks may be skipped and some of them are much more overhead in
      Also use the observation that if the device is in runtime suspend
      at the beginning of the "late" phase of a system-wide suspend-like
      transition, its state cannot change going forward (runtime PM is
      disabled for it at that time) until the transition is over and the
      subsequent system-wide PM callbacks should be skipped for it (as
      they generally assume the device to not be suspended), so add checks
      for that in pci_pm_suspend_late/noirq(), pci_pm_freeze_late/noirq()
      and pci_pm_poweroff_late/noirq().
      Moreover, if pci_pm_resume_noirq() or pci_pm_restore_noirq() is
      called during the subsequent system-wide resume transition and if
      the device was left in runtime suspend previously, its runtime PM
      status needs to be changed to "active" as it is going to be put
      into the full-power state, so add checks for that too to these
      In turn, if pci_pm_thaw_noirq() runs after the device has been
      left in runtime suspend, the subsequent "thaw" callbacks need
      to be skipped for it (as they may not work correctly with a
      suspended device), so set the power.direct_complete flag for the
      device then to make the PM core skip those callbacks.
      In addition to the above add a core helper for checking if
      DPM_FLAG_SMART_SUSPEND is set and the device runtime PM status is
      "suspended" at the same time, which is done quite often in the new
      code (and will be done elsewhere going forward too).
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Acked-by: Bjorn Helgaas <bhelgaas@google.com>
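The core helper and the "late" phase check can be sketched as below. The flag value and all `model_*` names are hypothetical and chosen only for this model. The kernel helper described in the changelog combines the same two conditions: DPM_FLAG_SMART_SUSPEND is set and the runtime PM status is "suspended".

```c
#include <stdbool.h>

#define MODEL_DPM_FLAG_SMART_SUSPEND (1u << 2) /* hypothetical bit for this model */

/* Hypothetical device model. */
struct model_dev {
	unsigned int driver_flags;
	bool runtime_suspended;
};

/* SMART_SUSPEND is set and the device is runtime suspended at the same time. */
bool model_smart_suspend_and_suspended(const struct model_dev *d)
{
	return (d->driver_flags & MODEL_DPM_FLAG_SMART_SUSPEND) && d->runtime_suspended;
}

/* "late" phase: returns true if the late suspend work actually ran. Runtime PM
 * is disabled by now, so a suspended device stays suspended and is skipped. */
bool model_pm_suspend_late(const struct model_dev *d)
{
	if (model_smart_suspend_and_suspended(d))
		return false;
	/* ... device-specific late suspend work would go here ... */
	return true;
}
```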
    • PCI / PM: Drop unnecessary invocations of pcibios_pm_ops callbacks · 302666d8
      Rafael J. Wysocki authored
      The only user of non-empty pcibios_pm_ops is s390 and it only uses
      "noirq" callbacks, so drop the invocations of the other pcibios_pm_ops
      callbacks from the PCI PM code.
      That will allow subsequent changes to be somewhat simpler.
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Acked-by: Bjorn Helgaas <bhelgaas@google.com>
      Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
    • PM / core: Add NEVER_SKIP and SMART_PREPARE driver flags · 08810a41
      Rafael J. Wysocki authored
      The motivation for this change is to provide a way to work around
      a problem with the direct-complete mechanism used for avoiding
      system suspend/resume handling for devices in runtime suspend.
      The problem is that some middle layer code (the PCI bus type and
      the ACPI PM domain in particular) returns positive values from its
      system suspend ->prepare callbacks regardless of whether the driver's
      ->prepare returns a positive value or 0, which effectively prevents
      drivers from being able to control the direct-complete feature.
      Some drivers need that control, however, and the PCI bus type has
      grown its own flag to deal with this issue, but since it is not
      limited to PCI, it is better to address it by adding driver flags at
      the core level.
      To that end, add a driver_flags field to struct dev_pm_info for flags
      that can be set by device drivers at probe time to inform the PM
      core and/or bus types, PM domains and so on about the capabilities and/or
      preferences of device drivers.  Also add two static inline helpers
      for setting that field and testing it against a given set of flags
      and make the driver core clear it automatically on driver remove
      and probe failures.
      Define and document two PM driver flags related to the direct-
      complete feature: NEVER_SKIP and SMART_PREPARE that can be used,
      respectively, to indicate to the PM core that the direct-complete
      mechanism should never be used for the device and to inform the
      middle layer code (bus types, PM domains etc) that it can only
      request the PM core to use the direct-complete mechanism for
      the device (by returning a positive value from its ->prepare
      callback) if it also has been requested by the driver.
      While at it, make the core check pm_runtime_suspended() when
      setting power.direct_complete so that it doesn't need to be
      checked by ->prepare callbacks.
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Acked-by: Bjorn Helgaas <bhelgaas@google.com>
      Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
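The decision described above can be modeled roughly as follows. This is a sketch, not the PM core. It assumes the legacy middle-layer ->prepare always returns 1, which simplifies "regardless of the driver's return value". The flag bits and all `model_*` names are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical flag bits for this model only. */
#define MODEL_FLAG_NEVER_SKIP    (1u << 0)
#define MODEL_FLAG_SMART_PREPARE (1u << 1)

struct model_dev {
	unsigned int driver_flags; /* set by the driver at probe time */
	bool runtime_suspended;
	int driver_prepare_ret;    /* value the driver's ->prepare returns */
};

/* Middle-layer ->prepare: legacy behavior ignores the driver (modeled as 1);
 * with SMART_PREPARE it requests direct-complete only if the driver did. */
int model_bus_prepare(const struct model_dev *d)
{
	if (d->driver_prepare_ret < 0)
		return d->driver_prepare_ret;
	if (d->driver_flags & MODEL_FLAG_SMART_PREPARE)
		return d->driver_prepare_ret > 0;
	return 1;
}

/* Core decision for power.direct_complete, including the
 * pm_runtime_suspended()-style check moved into the core. */
bool model_direct_complete(const struct model_dev *d)
{
	if (d->driver_flags & MODEL_FLAG_NEVER_SKIP)
		return false;
	return model_bus_prepare(d) > 0 && d->runtime_suspended;
}
```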
  19. 19 Oct, 2017 1 commit
    • drivers: flag buses which demand DMA configuration · d89e2378
      Robin Murphy authored
      We do not want the common dma_configure() pathway to apply
      indiscriminately to all devices, since there are plenty of buses which
      do not have DMA capability, and if their child devices were used for
      DMA API calls it would only be indicative of a driver bug. However,
      there are a number of buses for which DMA is implicitly expected even
      when not described by firmware - those we whitelist with an automatic
      opt-in to dma_configure(), assuming that the DMA address space and the
      physical address space are equivalent if not otherwise specified.
      Commit 72328883 ("of: restrict DMA configuration") introduced a
      short-term fix by comparing explicit bus types, but this approach is far
      from pretty, doesn't scale well, and fails to cope at all with bus
      drivers which may be built as modules, like host1x. Let's refine things
      by making that opt-in a property of the bus type, which neatly addresses
      those problems and lets the decision of whether firmware description of
      DMA capability should be optional or mandatory stay internal to the bus
      drivers themselves.
      Signed-off-by: Robin Murphy <robin.murphy@arm.com>
      Acked-by: Rob Herring <robh@kernel.org>
      Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Acked-by: Thierry Reding <treding@nvidia.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
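Below is a minimal sketch of making the opt-in a property of the bus type. It assumes a bus descriptor with a single boolean. `struct model_bus_type`, its `force_dma` field, `struct model_device` and model_should_configure_dma() are illustrative names. The real dma_configure() path is firmware-specific (OF or ACPI) and does more than answer yes or no.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical bus descriptor carrying the opt-in. */
struct model_bus_type {
	const char *name;
	bool force_dma; /* DMA is implied even without a firmware description */
};

/* Hypothetical device model. */
struct model_device {
	const struct model_bus_type *bus;
	bool fw_describes_dma; /* firmware explicitly describes DMA capability */
};

/* Configure DMA if firmware describes it, or if the bus type opts in. */
bool model_should_configure_dma(const struct model_device *dev)
{
	if (dev->fw_describes_dma)
		return true;
	return dev->bus != NULL && dev->bus->force_dma;
}
```

The check no longer compares against a list of known bus types. A modular bus driver such as host1x only needs to set the flag in its own bus type.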
  20. 03 Oct, 2017 1 commit
  21. 28 Sep, 2017 1 commit
  22. 18 Aug, 2017 1 commit
    • PCI/IB: add support for pci driver attribute groups · 92d50fc1
      Greg Kroah-Hartman authored
      Some drivers (specifically the nes IB driver) want to create a lot of
      sysfs driver attributes.  Instead of open-coding the creation and
      removal of these files (and getting it wrong btw), it's a better idea to
      let the driver core handle all of this logic for us.
      So add a new field to the pci driver structure, **groups, that allows
      pci drivers to specify an attribute group list it wishes to have created
      when it is registered with the driver core.
      A big bonus is that the driver no longer races with userspace between
      creating the sysfs files and announcing the kobject, so any script/tool
      that actually wanted to use these files will not have to poll waiting
      for them to show up.
      Cc: Faisal Latif <faisal.latif@intel.com>
      Cc: Doug Ledford <dledford@redhat.com>
      Cc: Sean Hefty <sean.hefty@intel.com>
      Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
      Cc: Bjorn Helgaas <bhelgaas@google.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Acked-by: Bjorn Helgaas <bhelgaas@google.com>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
  23. 01 Aug, 2017 1 commit
  24. 12 Jul, 2017 1 commit
    • PCI / PM: Restore PME Enable after config space restoration · 0ce3fcaf
      Rafael J. Wysocki authored
      Commit dc15e71e (PCI / PM: Restore PME Enable if skipping wakeup
      setup) introduced a mechanism by which the PME Enable bit can be
      restored by pci_enable_wake() if dev->wakeup_prepared is set in
      case it has been overwritten by PCI config space restoration.
      However, that commit overlooked the fact that on some systems (Dell
      XPS13 9360 in particular) the AML handling wakeup events checks PME
      Status and PME Enable and it won't trigger a Notify() for devices
      where those bits are not set while it is running.
      That happens during resume from suspend-to-idle when pci_restore_state()
      invoked by pci_pm_default_resume_early() clears PME Enable before the
      wakeup events are processed by AML, effectively causing those wakeup
      events to be ignored.
      Fix this issue by restoring the PME Enable configuration right after
      pci_restore_state() has been called instead of doing that in
      Fixes: dc15e71e (PCI / PM: Restore PME Enable if skipping wakeup setup)
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Acked-by: Bjorn Helgaas <bhelgaas@google.com>
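The ordering problem can be sketched with a single PMCSR value. This assumes the saved snapshot has PME_En clear, so a plain restore clears it, and that the AML check looks only at PME Enable (the real AML also checks PME Status). The PME_En bit position follows the PCI PM specification; `struct model_dev` and the `model_*` functions are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define MODEL_PM_CTRL_PME_ENABLE 0x0100u /* PME_En, bit 8 of PMCSR */

/* Hypothetical device model. */
struct model_dev {
	uint16_t pmcsr;       /* live PM Control/Status register */
	uint16_t saved_pmcsr; /* value captured by the config-space save */
	bool wakeup_prepared; /* wakeup was set up for this suspend */
};

/* Stands in for pci_restore_state(): may overwrite (clear) PME_En. */
void model_restore_state(struct model_dev *d)
{
	d->pmcsr = d->saved_pmcsr;
}

/* Fixed ordering: re-set PME_En right after the restore, before AML runs. */
void model_resume_early(struct model_dev *d)
{
	model_restore_state(d);
	if (d->wakeup_prepared)
		d->pmcsr |= MODEL_PM_CTRL_PME_ENABLE;
}

/* AML-style check, reduced to PME Enable for this model. */
bool model_aml_would_notify(const struct model_dev *d)
{
	return (d->pmcsr & MODEL_PM_CTRL_PME_ENABLE) != 0;
}
```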
  25. 02 Jul, 2017 1 commit
    • PCI: Add a call to pci_assign_irq() in pci_device_probe() · 30fdfb92
      Matthew Minter authored
      The pci_assign_irq() function allows assignment of an IRQ to devices during
      device enable time rather than only at boot.  Therefore call it in the
      pci_device_probe() function during the enable device code path so this
      assignment can be performed.
      This patch does nothing on arches that do not set the IRQ mapping
      function pointers, so it is currently a no-op.  However, as support for
      these function pointers is added to arch-specific code, IRQ assignment
      will migrate to device enable time, allowing the new code paths to be
      used.
      Signed-off-by: Matthew Minter <matt@masarand.com>
      [lorenzo.pieralisi@arm.com: moved pci_assign_irq() call site]
      Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
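The "no-op unless the arch provides hooks" behavior can be sketched as below. This is a hypothetical model, not the kernel's pci_assign_irq(). `struct model_host_bridge`, its `map_irq` pointer, `struct model_dev`, `model_example_map_irq()` (an invented slot/pin mapping) and the other `model_*` names are illustrative assumptions.

```c
#include <stddef.h>

/* Hypothetical host bridge with an optional IRQ mapping hook. */
struct model_host_bridge {
	int (*map_irq)(int slot, int pin); /* NULL on arches without the hooks */
};

/* Hypothetical device model. */
struct model_dev {
	const struct model_host_bridge *bridge;
	int slot, pin;
	int irq;
};

/* Invented example mapping, for illustration only. */
int model_example_map_irq(int slot, int pin)
{
	return 32 + slot * 4 + (pin - 1);
}

void model_assign_irq(struct model_dev *d)
{
	if (!d->bridge || !d->bridge->map_irq)
		return; /* no hook: keep whatever IRQ was set at boot */
	d->irq = d->bridge->map_irq(d->slot, d->pin);
}

/* The probe path now assigns the IRQ at device enable time. */
int model_device_probe(struct model_dev *d)
{
	model_assign_irq(d);
	return 0;
}
```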
  26. 30 Jun, 2017 1 commit
    • PCI/PM: Restore the status of PCI devices across hibernation · e60514bd
      Chen Yu authored
      We saw many "No irq handler" errors during hibernation, which
      eventually caused the system to hang:
        ata4.00: qc timeout (cmd 0xec)
        ata4.00: failed to IDENTIFY (I/O error, err_mask=0x4)
        ata4.00: revalidation failed (errno=-5)
        ata4: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
        do_IRQ: 31.151 No irq handler for vector
      According to the above logs, an interrupt was triggered and
      dispatched to CPU31 with vector number 151, but there is no handler for
      it, thus this IRQ will not get acked and will cause an IRQ flood which
      kills the system.  To be more specific, the 31.151 is an interrupt from the
      AHCI host controller.
      After some investigation, we found that this issue is triggered because
      the thaw_noirq() function does not restore the MSI/MSI-X settings across
      hibernation.
      The scenario is illustrated below:
        1. Before hibernation, IRQ 34 is the handler for the AHCI device, which
           is bound to CPU31.
        2. Hibernation starts, the AHCI device is put into low power state.
        3. All the nonboot CPUs are put offline, so IRQ 34 has to be migrated to
           the last alive one - CPU0.
        4. After the snapshot has been created, all the nonboot CPUs are brought
           up again; IRQ 34 remains bound to CPU0.
        5. AHCI devices are put into D0.
        6. The snapshot is written to the disk.
      The issue is triggered in step 6.  The AHCI interrupt should be delivered
      to CPU0, however it is delivered to the original CPU31 instead, which
      causes the "No irq handler" issue.
      Ying Huang has provided a clue that, in step 3 it is possible that writing
      to the register might not take effect as the PCI devices have been
      In step 3, the IRQ 34 affinity should be modified from CPU31 to CPU0, but
      in fact it is not.  In __pci_write_msi_msg(), if the device is already in
      a low power state, the low-level MSI message entry is not updated but
      cached.  During the device restore process after a normal suspend/resume,
      pci_restore_msi_state() writes the cached MSI back to the hardware.
      But this is not the case for hibernation.  pci_restore_msi_state() is not
      currently called in pci_pm_thaw_noirq(), although pci_save_state() has
      saved the necessary PCI cached information in pci_pm_freeze_noirq().
      Restore the PCI status for the device during hibernation.  Otherwise the
      status might be lost across hibernation (for example, settings for MSI,
      MSI-X, ATS, ACS, IOV, etc.), which might cause problems during hibernation.
      Suggested-by: Ying Huang <ying.huang@intel.com>
      Suggested-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: Chen Yu <yu.c.chen@intel.com>
      [bhelgaas: changelog]
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Cc: stable@vger.kernel.org
      Cc: Len Brown <len.brown@intel.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Rui Zhang <rui.zhang@intel.com>
      Cc: Ying Huang <ying.huang@intel.com>
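The caching behavior can be sketched with an MSI message reduced to a target CPU number. This is a simplification; real messages carry an address and data. `struct model_msi_dev` and the `model_*` functions are hypothetical stand-ins for __pci_write_msi_msg(), pci_restore_msi_state() and pci_pm_thaw_noirq().

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical MSI device model; the "message" is just a target CPU number. */
struct model_msi_dev {
	bool in_d0;
	uint32_t hw_msg;     /* message actually programmed in the device */
	uint32_t cached_msg; /* software copy */
};

/* Like __pci_write_msi_msg(): always cache, touch hardware only in D0. */
void model_write_msi_msg(struct model_msi_dev *d, uint32_t msg)
{
	d->cached_msg = msg;
	if (d->in_d0)
		d->hw_msg = msg;
}

/* Like pci_restore_msi_state(): write the cached message back. */
void model_restore_msi_state(struct model_msi_dev *d)
{
	d->hw_msg = d->cached_msg;
}

/* Thaw: the fix restores saved state (including MSI) after returning to D0. */
void model_thaw_noirq(struct model_msi_dev *d, bool fixed)
{
	d->in_d0 = true;
	if (fixed)
		model_restore_msi_state(d);
}
```

Following steps 1 to 5 above, the affinity change to CPU0 in step 3 lands only in the cache. Without the restore, the hardware keeps targeting CPU31.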
  27. 27 Jun, 2017 1 commit
  28. 12 Jun, 2017 1 commit
  29. 26 May, 2017 2 commits
    • PCI: Replace the racy recursion prevention · 0b2c2a71
      Thomas Gleixner authored
      pci_call_probe() can be called recursively when a physical function is
      probed and the probing creates virtual functions, which are populated via
      pci_bus_add_device(), which in turn can end up calling pci_call_probe()
      again.
      The code has an interesting way to prevent recursing into the workqueue
      code: it checks whether the current task already runs on the NUMA node
      associated with the device.
      While that works to prevent the recursion into the workqueue code, it's
      racy versus normal execution as there is no guarantee that the node does
      not vanish after the check.
      There is another issue with this code. It dereferences cpumask_of_node()
      unconditionally without checking whether the node is available.
      Make the detection reliable by:
       - Mark a probed device as 'is_probed' in pci_call_probe()
       - Check in pci_call_probe for a virtual function. If it's a virtual
         function and the associated physical function device is marked
         'is_probed' then this is a recursive call, so the call can be invoked in
         the calling context.
       - Add a check whether the node is online before dereferencing it.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Ingo Molnar <mingo@kernel.org>
      Acked-by: Bjorn Helgaas <bhelgaas@google.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: linux-pci@vger.kernel.org
      Cc: Sebastian Siewior <bigeasy@linutronix.de>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Link: http://lkml.kernel.org/r/20170524081548.771457199@linutronix.de
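The new decision logic can be sketched as a pure function. This assumes a fixed number of NUMA nodes and an explicit online map. `struct model_pci_dev`, MODEL_MAX_NODES and the `model_*` functions are hypothetical. The real pci_call_probe() also sets and clears is_probed around the probe and picks a CPU on the node.

```c
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_NODES 4 /* hypothetical node count for this model */

/* Hypothetical device model. */
struct model_pci_dev {
	bool is_virtfn;
	struct model_pci_dev *physfn; /* the PF, for a VF */
	bool is_probed;               /* set while this device's probe runs */
	int node;                     /* NUMA node, -1 if none */
};

static bool model_physfn_is_probed(const struct model_pci_dev *dev)
{
	return dev->is_virtfn && dev->physfn && dev->physfn->is_probed;
}

/* true: queue the probe on a CPU of dev->node; false: run it in the
 * calling context (no usable node, or a recursive VF probe from the PF). */
bool model_probe_on_node(const struct model_pci_dev *dev,
			 const bool online[MODEL_MAX_NODES])
{
	int node = dev->node;

	if (node < 0 || node >= MODEL_MAX_NODES || !online[node])
		return false; /* never dereference an offline node */
	if (model_physfn_is_probed(dev))
		return false;
	return true;
}
```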
    • PCI: Use cpu_hotplug_disable() instead of get_online_cpus() · 1ddd45f8
      Thomas Gleixner authored
      Converting the hotplug locking, i.e. get_online_cpus(), to a percpu rwsem
      unearthed a circular lock dependency which was hidden from lockdep due to
      the lockdep annotation of get_online_cpus() which prevents lockdep from
      creating full dependency chains.  There are several variants of this.  An
      example is:
      Chain exists of:
      cpu_hotplug_lock.rw_sem --> drm_global_mutex --> &item->mutex
      because there are dependencies through workqueues. The call chain is:
      This is not a problem of get_online_cpus() recursion, it's a possible
      deadlock undetected by lockdep so far.
      The cure is to use cpu_hotplug_disable() instead of get_online_cpus() to
      protect the PCI probing.
      There is a side effect to this: cpu_hotplug_disable() makes a concurrent
      CPU hotplug attempt via the sysfs interfaces fail with -EBUSY, but PCI
      probing usually happens during the boot process where no interaction is
      possible. Any later invocations are infrequent enough and concurrent
      hotplug attempts are so unlikely that the danger of user-space-visible
      regressions is very close to zero. Anyway, that's preferable to a real
      deadlock.
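      The side effect described above can be sketched as a nesting counter that
      makes hotplug requests fail instead of wait. This is a simplified,
      hypothetical model; the sim_ names are invented for illustration. While
      PCI probing holds the disable count, a concurrent sysfs hotplug attempt
      gets -EBUSY. The probe itself does not hold the hotplug lock for read, so
      no lock-ordering edge from it to driver mutexes is recorded.

```c
#include <errno.h>

/*
 * Hypothetical single-threaded model. In the kernel the counter and the
 * hotplug operation are serialized by a mutex; that is omitted here.
 */
static int sim_hotplug_disabled;  /* nesting count of disable calls */

void sim_cpu_hotplug_disable(void)
{
	sim_hotplug_disabled++;
}

void sim_cpu_hotplug_enable(void)
{
	if (sim_hotplug_disabled > 0)  /* guard against unbalanced enables */
		sim_hotplug_disabled--;
}

/* A sysfs-triggered online/offline request: fails fast, never waits. */
int sim_cpu_hotplug_request(void)
{
	if (sim_hotplug_disabled)
		return -EBUSY;
	/* ...the actual CPU state change would happen here... */
	return 0;
}
```

      Because the count nests, hotplug stays refused until every disable has
      been matched by an enable.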
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Ingo Molnar <mingo@kernel.org>
      Acked-by: Bjorn Helgaas <bhelgaas@google.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: linux-pci@vger.kernel.org
      Cc: Sebastian Siewior <bigeasy@linutronix.de>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Link: http://lkml.kernel.org/r/20170524081548.691198590@linutronix.de
  30. 20 Apr, 2017 1 commit
  31. 09 Mar, 2017 1 commit
    • PCI/MSI: Stop disabling MSI/MSI-X in pci_device_shutdown() · fda78d7a
      Prarit Bhargava authored
      The pci_bus_type .shutdown method, pci_device_shutdown(), is called from
      device_shutdown() in the kernel restart and shutdown paths.
      Previously, pci_device_shutdown() called pci_msi_shutdown() and
      pci_msix_shutdown().  This disables MSI and MSI-X, which causes the device
      to fall back to raising interrupts via INTx.  But the driver is still bound
      to the device, it doesn't know about this change, and it likely doesn't
      have an INTx handler, so these INTx interrupts cause "nobody cared"
      warnings like this:
        irq 16: nobody cared (try booting with the "irqpoll" option)
        CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.8.2-1.el7_UNSUPPORTED.x86_64 #1
        Hardware name: Hewlett-Packard HP Z820 Workstation/158B, BIOS J63 v03.90 06/
      The MSI disabling code was added by d52877c7 ("pci/irq: let
      pci_device_shutdown to call pci_msi_shutdown v2") because a driver left MSI
      enabled and kdump failed because the kexeced kernel wasn't prepared to
      receive the MSI interrupts.
      Subsequent commits 1851617c ("PCI/MSI: Disable MSI at enumeration even
      if kernel doesn't support MSI") and e80e7edc ("PCI/MSI: Initialize MSI
      capability for all architectures") changed the kexeced kernel to disable
      all MSIs itself so it no longer depends on the crashed kernel to clean up
      after itself.
      Stop disabling MSI/MSI-X in pci_device_shutdown().  This resolves the
      "nobody cared" unhandled IRQ issue above.  It also allows PCI serial
      devices, which may rely on the MSI interrupts, to continue outputting
      messages during reboot/shutdown.
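      The failure mode described above can be reduced to a toy model of where an
      interrupt goes once MSI is switched off under a bound driver. Everything
      here is hypothetical: struct sim_dev and the sim_ functions are not kernel
      APIs. The model assumes INTx is not masked, so a device with MSI disabled
      falls back to its INTx line.

```c
#include <stdbool.h>

/* Hypothetical model of one PCI function's interrupt delivery. */
struct sim_dev {
	bool msi_enabled;             /* MSI/MSI-X enable state */
	bool driver_has_msi_handler;  /* bound driver registered for MSI */
	bool driver_has_intx_handler; /* bound driver registered for INTx */
	unsigned int unhandled;       /* interrupts no handler claimed */
};

/* Old behaviour: shutdown turns MSI off behind the bound driver's back. */
void sim_shutdown_old(struct sim_dev *d)
{
	d->msi_enabled = false;
}

/* New behaviour: the bus shutdown path leaves MSI state alone. */
void sim_shutdown_new(struct sim_dev *d)
{
	(void)d;
}

/* Device raises an interrupt; with MSI off it falls back to INTx. */
bool sim_raise_irq(struct sim_dev *d)
{
	bool handled = d->msi_enabled ? d->driver_has_msi_handler
				      : d->driver_has_intx_handler;
	if (!handled)
		d->unhandled++;  /* many of these lead to "nobody cared" */
	return handled;
}
```

      A driver that only set up an MSI handler keeps working after
      sim_shutdown_new(). After sim_shutdown_old(), the same interrupt lands on
      INTx and goes unclaimed.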
      [bhelgaas: changelog, drop pci_msi_shutdown() and pci_msix_shutdown() calls]
      Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=187351
      Signed-off-by: Prarit Bhargava <prarit@redhat.com>
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      CC: Alex Williamson <alex.williamson@redhat.com>
      CC: David Arcari <darcari@redhat.com>
      CC: Myron Stowe <mstowe@redhat.com>
      CC: Lukas Wunner <lukas@wunner.de>
      CC: Keith Busch <keith.busch@intel.com>
      CC: Mika Westerberg <mika.westerberg@linux.intel.com>
  32. 10 Feb, 2017 1 commit
  33. 20 Jan, 2017 1 commit
  34. 28 Sep, 2016 1 commit
    • PCI: Avoid unnecessary resume after direct-complete · a0d2a959
      Lukas Wunner authored
      Commit 58a1fbbb ("PM / PCI / ACPI: Kick devices that might have been
      reset by firmware") added a runtime resume for devices that were runtime
      suspended when the system entered sleep.
      The motivation was that devices might be in a reset-power-on state after
      waking from system sleep, so their power state as perceived by Linux
      (stored in pci_dev->current_state) would no longer reflect reality.  By
      resuming such devices, we allow them to return to a low-power state via
      autosuspend and also bring their current_state in sync with reality.
      However, for devices that are *not* in a reset-power-on state, doing an
      unconditional resume wastes energy.  A more refined approach is called for:
      issue a runtime resume only if the power state after direct-complete is
      shallower than it was before.  To achieve this, update the device's
      current_state and compare it to its pre-sleep value.
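      The refined check can be sketched as a comparison of power states before
      and after direct-complete. This assumes the kernel's pci_power_t ordering,
      in which D0 < D1 < D2 < D3hot < D3cold, so a numerically smaller value is a
      shallower state. struct sim_pm_dev, sim_complete() and its hw_state
      parameter (the state the hardware reports after wake-up) are hypothetical.

```c
#include <stdbool.h>

/* Ordering assumed to mirror pci_power_t: deeper state = larger value. */
enum sim_pci_power { SIM_D0, SIM_D1, SIM_D2, SIM_D3HOT, SIM_D3COLD };

/* Hypothetical device record. */
struct sim_pm_dev {
	enum sim_pci_power current_state;  /* state as perceived by Linux */
	bool resume_requested;             /* runtime resume was queued */
};

/* Called after waking from system sleep via direct-complete. */
void sim_complete(struct sim_pm_dev *dev, enum sim_pci_power hw_state)
{
	enum sim_pci_power pre_sleep = dev->current_state;

	/* Bring the perceived state back in sync with the hardware. */
	dev->current_state = hw_state;

	/* Resume only if firmware left the device shallower than before,
	 * so that autosuspend can return it to a low-power state. */
	if (dev->current_state < pre_sleep)
		dev->resume_requested = true;
}
```

      A device suspended in D3hot that wakes up in D0 gets a resume request.
      One that is still in D3hot does not, which avoids the wasted resume.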
      Signed-off-by: Lukas Wunner <lukas@wunner.de>
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
  35. 09 Aug, 2016 1 commit