Skip to content
  • Stefano Stabellini's avatar
    x86,xen: introduce x86_init.mapping.pagetable_reserve · 279b706b
    Stefano Stabellini authored
    Introduce a new x86_init hook called pagetable_reserve that at the end
    of init_memory_mapping is used to reserve a range of memory addresses for
    the kernel pagetable pages we used and free the other ones.
    
    On native it just calls memblock_x86_reserve_range while on xen it also
    takes care of setting the spare memory previously allocated
    for kernel pagetable pages from RO to RW, so that it can be used for
    other purposes.
    
    A detailed explanation of the reason why this hook is needed follows.
    
    As a consequence of the commit:
    
    commit 4b239f45
    
    
    Author: Yinghai Lu <yinghai@kernel.org>
    Date:   Fri Dec 17 16:58:28 2010 -0800
    
        x86-64, mm: Put early page table high
    
    at some point init_memory_mapping is going to reach the pagetable pages
    area and map those pages too (mapping them as normal memory that falls
    in the range of addresses passed to init_memory_mapping as argument).
    Some of those pages are already pagetable pages (they are in the range
    pgt_buf_start-pgt_buf_end) therefore they are going to be mapped RO and
    everything is fine.
    Some of these pages are not pagetable pages yet (they fall in the range
    pgt_buf_end-pgt_buf_top; for example the page at pgt_buf_end) so they
    are going to be mapped RW.  When these pages become pagetable pages and
    are hooked into the pagetable, xen will find that the guest has already
    a RW mapping of them somewhere and fail the operation.
    The reason Xen requires pagetables to be RO is that the hypervisor needs
    to verify that the pagetables are valid before using them. The validation
    operations are called "pinning" (more details in arch/x86/xen/mmu.c).
    
    In order to fix the issue we mark all the pages in the entire range
    pgt_buf_start-pgt_buf_top as RO, however when the pagetable allocation
    is completed only the range pgt_buf_start-pgt_buf_end is reserved by
    init_memory_mapping. Hence the kernel is going to crash as soon as one
    of the pages in the range pgt_buf_end-pgt_buf_top is reused (b/c those
    ranges are RO).
    
    For this reason we need a hook to reserve the kernel pagetable pages we
    used and free the other ones so that they can be reused for other
    purposes.
    On native it just means calling memblock_x86_reserve_range, on Xen it
    also means marking RW the pagetable pages that we allocated before but
    that haven't been used before.
    
    Another way to fix this is without using the hook is by adding a 'if
    (xen_pv_domain)' in the 'init_memory_mapping' code and calling the Xen
    counterpart, but that is just nasty.
    
    Signed-off-by: default avatarStefano Stabellini <stefano.stabellini@eu.citrix.com>
    Acked-by: default avatarYinghai Lu <yinghai@kernel.org>
    Acked-by: default avatarH. Peter Anvin <hpa@zytor.com>
    Cc: Ingo Molnar <mingo@elte.hu>
    Signed-off-by: default avatarKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
    279b706b