x86/mm: Optimize boot-time paging mode switching cost

Kirill A. Shutemov authored
    
    
By this point we have functioning boot-time switching between 4- and
5-level paging modes. But the naive approach comes with a cost.
    
The numbers below are for kernel builds, allmodconfig, measured 5 times.
    
    CONFIG_X86_5LEVEL=n:
    
     Performance counter stats for 'sh -c make -j100 -B -k >/dev/null' (5 runs):
    
       17308719.892691      task-clock:u (msec)       #   26.772 CPUs utilized            ( +-  0.11% )
                     0      context-switches:u        #    0.000 K/sec
                     0      cpu-migrations:u          #    0.000 K/sec
           331,993,164      page-faults:u             #    0.019 M/sec                    ( +-  0.01% )
    43,614,978,867,455      cycles:u                  #    2.520 GHz                      ( +-  0.01% )
    39,371,534,575,126      stalled-cycles-frontend:u #   90.27% frontend cycles idle     ( +-  0.09% )
    28,363,350,152,428      instructions:u            #    0.65  insn per cycle
                                                      #    1.39  stalled cycles per insn  ( +-  0.00% )
     6,316,784,066,413      branches:u                #  364.948 M/sec                    ( +-  0.00% )
       250,808,144,781      branch-misses:u           #    3.97% of all branches          ( +-  0.01% )
    
         646.531974142 seconds time elapsed                                          ( +-  1.15% )
    
    CONFIG_X86_5LEVEL=y:
    
     Performance counter stats for 'sh -c make -j100 -B -k >/dev/null' (5 runs):
    
       17411536.780625      task-clock:u (msec)       #   26.426 CPUs utilized            ( +-  0.10% )
                     0      context-switches:u        #    0.000 K/sec
                     0      cpu-migrations:u          #    0.000 K/sec
           331,868,663      page-faults:u             #    0.019 M/sec                    ( +-  0.01% )
    43,865,909,056,301      cycles:u                  #    2.519 GHz                      ( +-  0.01% )
    39,740,130,365,581      stalled-cycles-frontend:u #   90.59% frontend cycles idle     ( +-  0.05% )
    28,363,358,997,959      instructions:u            #    0.65  insn per cycle
                                                      #    1.40  stalled cycles per insn  ( +-  0.00% )
     6,316,784,937,460      branches:u                #  362.793 M/sec                    ( +-  0.00% )
       251,531,919,485      branch-misses:u           #    3.98% of all branches          ( +-  0.00% )
    
         658.886307752 seconds time elapsed                                          ( +-  0.92% )
    
The patch tries to fix the performance regression by using
cpu_feature_enabled(X86_FEATURE_LA57) instead of pgtable_l5_enabled in
all hot code paths. These checks are statically patched into the target
code at boot, so the hot paths avoid a memory load and a conditional
branch on every check.
    
    CONFIG_X86_5LEVEL=y + the patch:
    
     Performance counter stats for 'sh -c make -j100 -B -k >/dev/null' (5 runs):
    
       17381990.268506      task-clock:u (msec)       #   26.907 CPUs utilized            ( +-  0.19% )
                     0      context-switches:u        #    0.000 K/sec
                     0      cpu-migrations:u          #    0.000 K/sec
           331,862,625      page-faults:u             #    0.019 M/sec                    ( +-  0.01% )
    43,697,726,320,051      cycles:u                  #    2.514 GHz                      ( +-  0.03% )
    39,480,408,690,401      stalled-cycles-frontend:u #   90.35% frontend cycles idle     ( +-  0.05% )
    28,363,394,221,388      instructions:u            #    0.65  insn per cycle
                                                      #    1.39  stalled cycles per insn  ( +-  0.00% )
     6,316,794,985,573      branches:u                #  363.410 M/sec                    ( +-  0.00% )
       251,013,232,547      branch-misses:u           #    3.97% of all branches          ( +-  0.01% )
    
         645.991174661 seconds time elapsed                                          ( +-  1.19% )
    
    Unfortunately, this approach doesn't help with text size:
    
      vmlinux.before .text size:	8190319
      vmlinux.after .text size:	8200623
    
The .text section is increased by about 10k (8200623 - 8190319 = 10304
bytes). Not sure if we can do anything about this.
    
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
    Cc: Andy Lutomirski <luto@amacapital.net>
    Cc: Andy Lutomirski <luto@kernel.org>
    Cc: Arjan van de Ven <arjan@linux.intel.com>
    Cc: Borislav Petkov <bp@alien8.de>
    Cc: Borislav Petkov <bp@suse.de>
    Cc: Dan Williams <dan.j.williams@intel.com>
    Cc: Dave Hansen <dave.hansen@linux.intel.com>
    Cc: David Woodhouse <dwmw2@infradead.org>
    Cc: Josh Poimboeuf <jpoimboe@redhat.com>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: linux-mm@kvack.org
    Link: http://lkml.kernel.org/r/20180216114948.68868-4-kirill.shutemov@linux.intel.com
    
    
Signed-off-by: Ingo Molnar <mingo@kernel.org>