Skip to content
  • Tony Luck's avatar
    x86/mce: Don't check for the overflow bit on action optional machine checks · aaefca8e
    Tony Luck authored
    
    
    We currently do not process SRAO (Software Recoverable Action Optional)
    machine checks if they are logged with the overflow bit set to 1 in the
    machine check bank status register. This is overly conservative.
    
    There are two cases where we could end up with an SRAO+OVER log based
    on the SDM volume 3 overwrite rules in "Table 15-8. Overwrite Rules for
    UC, CE, and UCR Errors"
    
    1) First a corrected error is logged, then the SRAO error overwrites.
       The second error overwrites the first because uncorrected errors
       have a higher severity than corrected errors.
    2) The SRAO error was logged first, followed by a correcetd error.
       In this case the first error is retained in the bank.
    
    So in either case the machine check bank will contain the address
    of the SRAO error. So we can process that even if the overflow bit
    was set.
    
    Reported-by: default avatarYongkai Wu <yongkaiwu@tencent.com>
    Signed-off-by: default avatarTony Luck <tony.luck@intel.com>
    Signed-off-by: default avatarBorislav Petkov <bp@suse.de>
    Cc: "H. Peter Anvin" <hpa@zytor.com>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: linux-edac <linux-edac@vger.kernel.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: x86-ml <x86@kernel.org>
    Link: https://lkml.kernel.org/r/20190718182920.32621-1-tony.luck@intel.com
    aaefca8e