  • PCI/AER: Handle ERR_FATAL with removal and re-enumeration of devices · 7e9084b3
    Oza Pawandeep authored
    PCIe ERR_FATAL errors mean the Link is unreliable.  Components on the Link
    may need to be reset to return to reliable operation (PCIe r4.0, sec
    6.2.2).  We previously handled these errors much differently depending on
    whether the platform supports Downstream Port Containment (DPC) (PCIe r4.0,
    sec 6.2.10) or not.
    
    The AER driver has historically logged the error details, called
    driver-supplied pci_error_handlers callbacks, and reset the Link.  The Link
    reset also reset downstream devices, but it did not remove them from the
    PCI subsystem, re-enumerate them, or call their driver .remove() or
    .probe() methods.
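
    The historical AER sequence described above can be sketched as a small
    user-space model.  This is only an illustration of the ordering, not
    kernel code: struct aer_model_dev and aer_model_handle_fatal() are
    hypothetical names invented for this sketch, and the counters stand in
    for the real callbacks and reset logic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical model of one downstream device; not a kernel structure. */
struct aer_model_dev {
	bool bound;         /* driver is still bound to the device */
	int  callbacks_run; /* times error-handler callbacks were invoked */
	int  resets;        /* times the device was reset via a Link reset */
	int  removes;       /* times the driver .remove() was called */
	int  probes;        /* times the driver .probe() was called */
};

/* Model of the historical AER path: log, call callbacks, reset the Link. */
static void aer_model_handle_fatal(struct aer_model_dev *dev)
{
	printf("AER model: ERR_FATAL logged\n");
	dev->callbacks_run++; /* stands in for pci_error_handlers callbacks */
	dev->resets++;        /* the Link reset also resets the device */
	/* No removal, re-enumeration, .remove(), or .probe() on this path. */
}
```

    In this model the device stays bound across the error: after one call,
    callbacks_run and resets are both 1 while removes and probes remain 0.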
    
    DPC is different because the hardware automatically disables the Link when
    it detects ERR_FATAL, which resets downstream devices.  There's no
    opportunity for pci_error_handlers callbacks before resetting the Link.
    The DPC driver removes affected devices (which calls their driver .remove()
    methods), brings the Link back up, and re-enumerates (which cal...
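
    For contrast, the DPC sequence described above can be modelled the same
    way.  Again this is a hedged sketch of the ordering only: struct
    dpc_model_port, dpc_model_hw_trigger(), and dpc_model_recover() are
    hypothetical names, and the model covers only the steps named in the
    text (hardware disables the Link, devices are removed, the Link is
    brought back up, and the bus is re-enumerated).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical model of a DPC-capable port and the device below it. */
struct dpc_model_port {
	bool link_up;       /* state of the Link below the port */
	bool dev_present;   /* device is known to the modelled PCI subsystem */
	int  callbacks_run; /* pci_error_handlers callbacks invoked */
	int  resets;        /* device resets caused by the Link going down */
	int  removes;       /* driver .remove() calls */
	int  enumerations;  /* re-enumeration passes */
};

/* Hardware step: the Link is disabled before software gets involved. */
static void dpc_model_hw_trigger(struct dpc_model_port *p)
{
	p->link_up = false;
	p->resets++; /* disabling the Link resets the downstream device */
	/* No chance to run error-handler callbacks before this point. */
}

/* Software step: remove devices, bring the Link up, re-enumerate. */
static void dpc_model_recover(struct dpc_model_port *p)
{
	if (p->dev_present) {
		p->removes++; /* removal calls the driver .remove() method */
		p->dev_present = false;
	}
	p->link_up = true;
	p->enumerations++;
	p->dev_present = true; /* device rediscovered by the model */
	printf("DPC model: Link up, bus re-enumerated\n");
}
```

    Calling dpc_model_hw_trigger() and then dpc_model_recover() on a port
    with one present device leaves callbacks_run at 0 and removes at 1,
    which is the difference from the AER model that the text highlights.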