    PCI: Reprogram bridge prefetch registers on resume · 08387454
    Daniel Drake authored
    On 38+ Intel-based ASUS products, the NVIDIA GPU becomes unusable after S3
    suspend/resume.  The affected products include multiple generations of
    NVIDIA GPUs and Intel SoCs.  After resume, nouveau logs many errors such
    as:
    
      fifo: fault 00 [READ] at 0000005555555000 engine 00 [GR] client 04
            [HUB/FE] reason 4a [] on channel -1 [007fa91000 unknown]
      DRM: failed to idle channel 0 [DRM]
    
    Similarly, the NVIDIA proprietary driver also fails after resume (black
    screen, 100% CPU usage in Xorg process).  We shipped a sample to NVIDIA for
    diagnosis, and their response indicated that it's a problem with the parent
    PCI bridge (on the Intel SoC), not the GPU.
    
    Runtime suspend/resume works fine; only S3 suspend is affected.
    
    We found a workaround: on resume, rewrite the Intel PCI bridge
    'Prefetchable Base Upper 32 Bits' register (PCI_PREF_BASE_UPPER32).  In the
    cases that I checked, this register has value 0 and we just have to rewrite
    that value.
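    
    For orientation, here is a minimal user-space sketch of how this register
    contributes to the bridge's prefetchable window base.  The register offsets
    and the 64-bit type encoding follow <linux/pci_regs.h>; the helper names
    (rd16, rd32, pref_window_base) and the raw byte-array view of config space
    are assumptions made for illustration, not kernel code.

```c
#include <stdint.h>

/* Offsets and encodings as found in <linux/pci_regs.h>. */
#define PCI_PREF_MEMORY_BASE    0x24  /* 16-bit: low part of window base */
#define PCI_PREF_BASE_UPPER32   0x28  /* 32-bit: bits 63:32 of window base */
#define PCI_PREF_RANGE_TYPE_64  0x01  /* low nibble: window is 64-bit */

/* Hypothetical helpers: little-endian reads from a config-space dump. */
static uint16_t rd16(const uint8_t *cfg, unsigned int off)
{
    return (uint16_t)(cfg[off] | (cfg[off + 1] << 8));
}

static uint32_t rd32(const uint8_t *cfg, unsigned int off)
{
    return (uint32_t)cfg[off] |
           ((uint32_t)cfg[off + 1] << 8) |
           ((uint32_t)cfg[off + 2] << 16) |
           ((uint32_t)cfg[off + 3] << 24);
}

/* Compute the prefetchable window base from a type-1 (bridge) header. */
static uint64_t pref_window_base(const uint8_t *cfg)
{
    uint16_t lo = rd16(cfg, PCI_PREF_MEMORY_BASE);

    /* Bits 15:4 of the 16-bit register hold address bits 31:20. */
    uint64_t base = (uint64_t)(lo & 0xfff0u) << 16;

    /* The upper 32 bits only apply when the bridge reports a 64-bit window. */
    if ((lo & 0x0f) == PCI_PREF_RANGE_TYPE_64)
        base |= (uint64_t)rd32(cfg, PCI_PREF_BASE_UPPER32) << 32;

    return base;
}
```

    With PCI_PREF_BASE_UPPER32 equal to 0, as on the machines described above,
    the window simply sits below 4 GiB.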
    
    Linux already saves and restores PCI config space during suspend/resume,
    but the restore code only writes a register when its current value differs
    from the saved one.  This register was therefore skipped: upon resume it
    already reads 0 (the correct, pre-suspend value).
    
    Intel appear to have previously acknowledged this behaviour and the
    requirement to rewrite this register:
    https://bugzilla.kernel.org/show_bug.cgi?id=116851#c23
    
    Based on that, rewrite the prefetch register values even when that appears
    unnecessary.
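    
    The idea can be sketched in user space as follows.  This is a simplified
    model under stated assumptions, not the patch itself: struct fake_dev, the
    write counters and all function names are hypothetical, and the only detail
    taken from the PCI header layout is that config dwords 9..11 cover offsets
    0x24..0x2f, i.e. the prefetchable base/limit registers and their upper
    32 bits.

```c
#include <stdbool.h>
#include <stdint.h>

#define NUM_DWORDS 16   /* standard 64-byte config header */

/* Hypothetical device model: live registers, saved copy, write counters. */
struct fake_dev {
    uint32_t regs[NUM_DWORDS];
    uint32_t saved[NUM_DWORDS];
    unsigned int writes[NUM_DWORDS];
};

static void cfg_write(struct fake_dev *d, int idx, uint32_t val)
{
    d->regs[idx] = val;
    d->writes[idx]++;
}

/* Restore one dword; skip the write if it looks redundant, unless forced. */
static void restore_dword(struct fake_dev *d, int idx, bool force)
{
    if (!force && d->regs[idx] == d->saved[idx])
        return;
    cfg_write(d, idx, d->saved[idx]);
}

/* Restore dwords end..start, highest first. */
static void restore_range(struct fake_dev *d, int start, int end, bool force)
{
    for (int i = end; i >= start; i--)
        restore_dword(d, i, force);
}

/* Bridge restore: always rewrite the prefetch registers (dwords 9..11). */
static void restore_bridge(struct fake_dev *d)
{
    restore_range(d, 12, 15, false);
    restore_range(d, 9, 11, true);
    restore_range(d, 0, 8, false);
}
```

    In this model, a bridge whose registers all already match the saved copy
    still receives exactly one write to each of dwords 9..11 and none
    elsewhere, which is the behaviour the workaround relies on.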
    
    We have confirmed this solution on all the affected models we have in hand
    (X542UQ, UX533FD, X530UN, V272UN).
    
    Additionally, this solves an issue where r8169 MSI-X interrupts were broken
    after S3 suspend/resume on ASUS X441UAR.  This issue was recently worked
    around in commit 7bb05b85 ("r8169: don't use MSI-X on RTL8106e").  It
    also fixes the same issue on RTL6186evl/8111evl on an Aimfor-tech laptop
    that we had not yet patched.  I suspect it will also fix the issue that was
    worked around in commit 7c53a722 ("r8169: don't use MSI-X on
    RTL8168g").
    
    Thomas Martitz reports that this change also solves an issue where the AMD
    Radeon Polaris 10 GPU on the HP Zbook 14u G5 is unresponsive after S3
    suspend/resume.
    
    Link: https://bugzilla.kernel.org/show_bug.cgi?id=201069
    
    
    Signed-off-by: Daniel Drake <drake@endlessm.com>
    Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
    Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
    Reviewed-By: Peter Wu <peter@lekensteyn.nl>
    CC: stable@vger.kernel.org