1. 06 Apr, 2012 1 commit
    • xen/x86: Workaround 'x86/ioapic: Add register level checks to detect bogus io-apic entries' · 2531d64b
      Konrad Rzeszutek Wilk authored
      
      
      The above mentioned patch checks the IOAPIC and if it contains
      -1, then it unmaps said IOAPIC. But under Xen we get this:
      
      BUG: unable to handle kernel NULL pointer dereference at 0000000000000040
      IP: [<ffffffff8134e51f>] xen_irq_init+0x1f/0xb0
      PGD 0
      Oops: 0002 [#1] SMP
      CPU 0
      Modules linked in:
      
      Pid: 1, comm: swapper/0 Not tainted 3.2.10-3.fc16.x86_64 #1 Dell Inc. Inspiron
      1525                  /0U990C
      RIP: e030:[<ffffffff8134e51f>]  [<ffffffff8134e51f>] xen_irq_init+0x1f/0xb0
      RSP: e02b: ffff8800d42cbb70  EFLAGS: 00010202
      RAX: 0000000000000000 RBX: 00000000ffffffef RCX: 0000000000000001
      RDX: 0000000000000040 RSI: 00000000ffffffef RDI: 0000000000000001
      RBP: ffff8800d42cbb80 R08: ffff8800d6400000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000000 R12: 00000000ffffffef
      R13: 0000000000000001 R14: 0000000000000001 R15: 0000000000000010
      FS:  0000000000000000(0000) GS:ffff8800df5fe000(0000) knlGS:0000000000000000
      CS:  e033 DS: 0000 ES: 0000 CR0:000000008005003b
      CR2: 0000000000000040 CR3: 0000000001a05000 CR4: 0000000000002660
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      Process swapper/0 (pid: 1, threadinfo ffff8800d42ca000, task ffff8800d42d0000)
      Stack:
       00000000ffffffef 0000000000000010 ffff8800d42cbbe0 ffffffff8134f157
       ffffffff8100a9b2 ffffffff8182ffd1 00000000000000a0 00000000829e7384
       0000000000000002 0000000000000010 00000000ffffffff 0000000000000000
      Call Trace:
       [<ffffffff8134f157>] xen_bind_pirq_gsi_to_irq+0x87/0x230
       [<ffffffff8100a9b2>] ? check_events+0x12/0x20
       [<ffffffff814bab42>] xen_register_pirq+0x82/0xe0
       [<ffffffff814bac1a>] xen_register_gsi.part.2+0x4a/0xd0
       [<ffffffff814bacc0>] acpi_register_gsi_xen+0x20/0x30
       [<ffffffff8103036f>] acpi_register_gsi+0xf/0x20
       [<ffffffff8131abdb>] acpi_pci_irq_enable+0x12e/0x202
       [<ffffffff814bc849>] pcibios_enable_device+0x39/0x40
       [<ffffffff812dc7ab>] do_pci_enable_device+0x4b/0x70
       [<ffffffff812dc878>] __pci_enable_device_flags+0xa8/0xf0
       [<ffffffff812dc8d3>] pci_enable_device+0x13/0x20
      
      The reason we are dying is that acpi_get_override_irq() is called to
      return the polarity and trigger for the IRQs. That function calls
      mp_find_ioapics to get the 'struct ioapic' structure, which along with
      mp_irq[x] is used to figure out the default values and the polarity/trigger
      overrides. Since mp_find_ioapics now returns -1 [because the IOAPIC is filled
      with 0xffffffff], acpi_get_override_irq() stops trying to look up the proper
      INT_SRC_OVR in mp_irq[x], and we can't install the SCI interrupt.

      The proper fix for this is going in v3.5 and adds an x86_io_apic_ops
      struct so that platforms can override it. But for v3.4 let's carry this
      work-around. This patch does that by providing a slightly different variant
      of the fake IOAPIC entries.
      
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
  2. 20 Feb, 2012 1 commit
    • xen/pat: Disable PAT support for now. · 8eaffa67
      Konrad Rzeszutek Wilk authored
      [Pls also look at https://lkml.org/lkml/2012/2/10/228]
      
      Using PAT to change pages from WB to WC works quite nicely.
      Changing them back to WB - not so much. The crux of the matter is
      that the code that does this (__change_page_attr_set_clr) has only
      limited information, so when it tries to make the change it gets
      the "raw" unfiltered information instead of the properly filtered one -
      and the "raw" one tells it that the PSE bit is on (while in fact it
      is not).  As a result, when the PTE is set to be WB from WC, we get
      tons of:
      
      :WARNING: at arch/x86/xen/mmu.c:475 xen_make_pte+0x67/0xa0()
      :Hardware name: HP xw4400 Workstation
      .. snip..
      :Pid: 27, comm: kswapd0 Tainted: G        W    3.2.2-1.fc16.x86_64 #1
      :Call Trace:
      : [<ffffffff8106dd1f>] warn_slowpath_common+0x7f/0xc0
      : [<ffffffff8106dd7a>] warn_slowpath_null+0x1a/0x20
      : [<ffffffff81005a17>] xen_make_pte+0x67/0xa0
      : [<ffffffff810051bd>] __raw_callee_save_xen_make_pte+0x11/0x1e
      : [<ffffffff81040e15>] ? __change_page_attr_set_clr+0x9d5/0xc00
      : [<ffffffff8114c2e8>] ? __purge_vmap_area_lazy+0x158/0x1d0
      : [<ffffffff8114cca5>] ? vm_unmap_aliases+0x175/0x190
      : [<ffffffff81041168>] change_page_attr_set_clr+0x128/0x4c0
      : [<ffffffff81041542>] set_pages_array_wb+0x42/0xa0
      : [<ffffffff8100a9b2>] ? check_events+0x12/0x20
      : [<ffffffffa0074d4c>] ttm_pages_put+0x1c/0x70 [ttm]
      : [<ffffffffa0074e98>] ttm_page_pool_free+0xf8/0x180 [ttm]
      : [<ffffffffa0074f78>] ttm_pool_mm_shrink+0x58/0x90 [ttm]
      : [<ffffffff8112ba04>] shrink_slab+0x154/0x310
      : [<ffffffff8112f17a>] balance_pgdat+0x4fa/0x6c0
      : [<ffffffff8112f4b8>] kswapd+0x178/0x3d0
      : [<ffffffff815df134>] ? __schedule+0x3d4/0x8c0
      : [<ffffffff81090410>] ? remove_wait_queue+0x50/0x50
      : [<ffffffff8112f340>] ? balance_pgdat+0x6c0/0x6c0
      : [<ffffffff8108fb6c>] kthread+0x8c/0xa0
      
      for every page. The proper fix for this has been posted at
      https://lkml.org/lkml/2012/2/10/228 as
      "x86/cpa: Use pte_attrs instead of pte_flags on CPA/set_p.._wb/wc operations."
      along with a detailed description of the problem and solution.
      
      But since that posting has gone nowhere I am proposing
      this band-aid solution so that at least users don't get
      the page corruption (the pages that are WC don't get changed to WB
      and end up being recycled for filesystem or other things causing
      mysterious crashes).
      
      The negative impact of this patch is that users of WC flag
      (which are InfiniBand, radeon, nouveau drivers) won't be able
      to set that flag - so they are going to see performance degradation.
      But stability is more important here.
      
      Fixes RH BZ# 742032, 787403, and 745574
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
  3. 24 Jan, 2012 1 commit
  4. 09 Jan, 2012 1 commit
  5. 24 Sep, 2011 1 commit
  6. 15 Sep, 2011 1 commit
  7. 17 Aug, 2011 1 commit
    • xen/x86: replace order-based range checking of M2P table by linear one · ccbcdf7c
      Jan Beulich authored
      
      
      The order-based approach is not only less efficient (requiring a shift
      and a compare, typical generated code looking like this
      
      	mov	eax, [machine_to_phys_order]
      	mov	ecx, eax
      	shr	ebx, cl
      	test	ebx, ebx
      	jnz	...
      
      whereas a direct check requires just a compare, like in
      
      	cmp	ebx, [machine_to_phys_nr]
      	jae	...
      
      ), but also slightly dangerous in the 32-on-64 case - the element
      address calculation can wrap if the next power of two boundary is
      sufficiently far away from the actual upper limit of the table, and
      hence can result in user space addresses being accessed (with it being
      unknown what may actually be mapped there).
      
      Additionally, the elimination of the mistaken use of fls() here (should
      have been __fls()) fixes a latent issue on x86-64 that would trigger
      if the code was run on a system with memory extending beyond the 44-bit
      boundary.
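
      The difference between the two checks can be sketched in user-space C.
      This is only an illustration under stated assumptions: struct m2p_limits
      and the helper names are hypothetical stand-ins for
      machine_to_phys_order and machine_to_phys_nr, not the kernel's code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for machine_to_phys_order / machine_to_phys_nr. */
struct m2p_limits {
    unsigned int order;  /* smallest order with (1 << order) >= nr */
    uint64_t nr;         /* exact number of entries in the table */
};

/* Smallest order such that (1 << order) >= nr, for nr >= 1. */
static inline unsigned int order_for(uint64_t nr)
{
    unsigned int order = 0;

    while (order < 63 && ((uint64_t)1 << order) < nr)
        order++;
    return order;
}

/* Order-based check: accepts every mfn below the next power of two. */
static inline bool in_range_by_order(const struct m2p_limits *l, uint64_t mfn)
{
    return (mfn >> l->order) == 0;
}

/* Linear check: accepts only mfns that index a real table entry. */
static inline bool in_range_linear(const struct m2p_limits *l, uint64_t mfn)
{
    return mfn < l->nr;
}
```

      With a table of 0x90000 entries the order is 20, so the order-based
      check still accepts mfn 0xA0000 although it lies past the end of the
      table, while the linear check rejects it.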
      
      CC: stable@kernel.org
      Signed-off-by: Jan Beulich <jbeulich@novell.com>
      [v1: Based on Jeremy's feedback]
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
  8. 09 Aug, 2011 1 commit
  9. 04 Aug, 2011 1 commit
  10. 18 Jul, 2011 6 commits
  11. 14 Jul, 2011 1 commit
  12. 30 Jun, 2011 1 commit
  13. 15 Jun, 2011 1 commit
    • xen: support CONFIG_MAXSMP · 900cba88
      Andrew Jones authored
      
      
      The MAXSMP config option requires CPUMASK_OFFSTACK, which in turn
      requires we init the memory for the maps while we bring up the cpus.
      MAXSMP also increases NR_CPUS to 4096. This increase in size exposed an
      issue in the argument construction for multicalls from
      xen_flush_tlb_others. The args should only need space for the actual
      number of cpus.
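
      To see why the mask size matters, here is a rough sketch of how many
      bytes a cpu bitmap needs for a given number of bits. The helper
      cpumask_bytes is hypothetical and only mirrors the usual
      round-up-to-longs arithmetic; it is not the code touched by the patch.

```c
#include <limits.h>
#include <stddef.h>

/* Bits held by one unsigned long on the build machine. */
#define SKETCH_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Bytes needed for a bitmap of nbits bits, rounded up to whole longs. */
static inline size_t cpumask_bytes(size_t nbits)
{
    size_t longs = (nbits + SKETCH_BITS_PER_LONG - 1) / SKETCH_BITS_PER_LONG;

    return longs * sizeof(unsigned long);
}
```

      A mask sized for 4096 cpus takes 512 bytes per multicall argument,
      whereas one sized for, say, 8 online cpus fits in a single long.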
      
      Also in 2.6.39 it exposes a bootup problem.
      
      BUG: unable to handle kernel NULL pointer dereference at           (null)
      IP: [<ffffffff8157a1d3>] set_cpu_sibling_map+0x123/0x30d
      ...
      Call Trace:
      [<ffffffff81039a3f>] ? xen_restore_fl_direct_reloc+0x4/0x4
      [<ffffffff819dc4db>] xen_smp_prepare_cpus+0x36/0x135
      ..
      
      CC: stable@kernel.org
      Signed-off-by: Andrew Jones <drjones@redhat.com>
      [v2: Updated to compile on 3.0]
      [v3: Updated to compile when CONFIG_SMP is not defined]
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
  14. 09 Jun, 2011 1 commit
  15. 20 May, 2011 9 commits
  16. 19 May, 2011 1 commit
  17. 12 May, 2011 3 commits
    • xen mmu: fix a race window causing leave_mm BUG() · 7899891c
      Tian, Kevin authored
      There's a race window in xen_drop_mm_ref, where the remote cpu may exit
      the dirty bitmap between the check on this cpu and the point where the
      remote cpu handles the drop request. So in drop_other_mm_ref we need to
      check whether the TLB state is still lazy before calling into leave_mm.
      This bug was rarely observed in earlier kernels, but is exacerbated by
      commit 831d52bc
      ("x86, mm: avoid possible bogus tlb entries by clearing prev mm_cpumask after switching mm")
      which clears the bitmap after changing the TLB state. The call trace is as below:
      
      ---------------------------------
      kernel BUG at arch/x86/mm/tlb.c:61!
      invalid opcode: 0000 [#1] SMP
      last sysfs file: /sys/devices/system/xen_memory/xen_memory0/info/current_kb
      CPU 1
      Modules linked in: 8021q garp xen_netback xen_blkback blktap blkback_pagemap nbd bridge stp llc autofs4 ipmi_devintf ipmi_si ipmi_msghandler lockd sunrpc bonding ipv6 xenfs dm_multipath video output sbs sbshc parport_pc lp parport ses enclosure snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device serio_raw bnx2 snd_pcm_oss snd_mixer_oss snd_pcm snd_timer iTCO_wdt snd soundcore snd_page_alloc i2c_i801 iTCO_vendor_support i2c_core pcs pkr pata_acpi ata_generic ata_piix shpchp mptsas mptscsih mptbase [last unloaded: freq_table]
      Pid: 25581, comm: khelper Not tainted 2.6.32.36fixxen #1 Tecal RH2285
      RIP: e030:[<ffffffff8103a3cb>]  [<ffffffff8103a3cb>] leave_mm+0x15/0x46
      RSP: e02b:ffff88002805be48  EFLAGS: 00010046
      RAX: 0000000000000000 RBX: 0000000000000001 RCX: ffff88015f8e2da0
      RDX: ffff88002805be78 RSI: 0000000000000000 RDI: 0000000000000001
      RBP: ffff88002805be48 R08: ffff88009d662000 R09: dead000000200200
      R10: dead000000100100 R11: ffffffff814472b2 R12: ffff88009bfc1880
      R13: ffff880028063020 R14: 00000000000004f6 R15: 0000000000000000
      FS:  00007f62362d66e0(0000) GS:ffff880028058000(0000) knlGS:0000000000000000
      CS:  e033 DS: 0000 ES: 0000 CR0: 000000008005003b
      CR2: 0000003aabc11909 CR3: 000000009b8ca000 CR4: 0000000000002660
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      Process khelper (pid: 25581, threadinfo ffff88007691e000, task ffff88009b92db40)
      Stack:
       ffff88002805be68 ffffffff8100e4ae 0000000000000001 ffff88009d733b88
      <0> ffff88002805be98 ffffffff81087224 ffff88002805be78 ffff88002805be78
      <0> ffff88015f808360 00000000000004f6 ffff88002805bea8 ffffffff81010108
      Call Trace:
       <IRQ>
       [<ffffffff8100e4ae>] drop_other_mm_ref+0x2a/0x53
       [<ffffffff81087224>] generic_smp_call_function_single_interrupt+0xd8/0xfc
       [<ffffffff81010108>] xen_call_function_single_interrupt+0x13/0x28
       [<ffffffff810a936a>] handle_IRQ_event+0x66/0x120
       [<ffffffff810aac5b>] handle_percpu_irq+0x41/0x6e
       [<ffffffff8128c1c0>] __xen_evtchn_do_upcall+0x1ab/0x27d
       [<ffffffff8128dd11>] xen_evtchn_do_upcall+0x33/0x46
       [<ffffffff81013efe>] xen_do_hypervisor_callback+0x1e/0x30
       <EOI>
       [<ffffffff814472b2>] ? _spin_unlock_irqrestore+0x15/0x17
       [<ffffffff8100f8cf>] ? xen_restore_fl_direct_end+0x0/0x1
       [<ffffffff81113f71>] ? flush_old_exec+0x3ac/0x500
       [<ffffffff81150dc5>] ? load_elf_binary+0x0/0x17ef
       [<ffffffff81150dc5>] ? load_elf_binary+0x0/0x17ef
       [<ffffffff8115115d>] ? load_elf_binary+0x398/0x17ef
       [<ffffffff81042fcf>] ? need_resched+0x23/0x2d
       [<ffffffff811f4648>] ? process_measurement+0xc0/0xd7
       [<ffffffff81150dc5>] ? load_elf_binary+0x0/0x17ef
       [<ffffffff81113094>] ? search_binary_handler+0xc8/0x255
       [<ffffffff81114362>] ? do_execve+0x1c3/0x29e
       [<ffffffff8101155d>] ? sys_execve+0x43/0x5d
       [<ffffffff8106fc45>] ? __call_usermodehelper+0x0/0x6f
       [<ffffffff81013e28>] ? kernel_execve+0x68/0xd0
       [<ffffffff8106fc45>] ? __call_usermodehelper+0x0/0x6f
       [<ffffffff8100f8cf>] ? xen_restore_fl_direct_end+0x0/0x1
       [<ffffffff8106fb64>] ? ____call_usermodehelper+0x113/0x11e
       [<ffffffff81013daa>] ? child_rip+0xa/0x20
       [<ffffffff8106fc45>] ? __call_usermodehelper+0x0/0x6f
       [<ffffffff81012f91>] ? int_ret_from_sys_call+0x7/0x1b
       [<ffffffff8101371d>] ? retint_restore_args+0x5/0x6
       [<ffffffff81013da0>] ? child_rip+0x0/0x20
      Code: 41 5e 41 5f c9 c3 55 48 89 e5 0f 1f 44 00 00 e8 17 ff ff ff c9 c3 55 48 89 e5 0f 1f 44 00 00 65 8b 04 25 c8 55 01 00 ff c8 75 04 <0f> 0b eb fe 65 48 8b 34 25 c0 55 01 00 48 81 c6 b8 02 00 00 e8
      RIP  [<ffffffff8103a3cb>] leave_mm+0x15/0x46
       RSP <ffff88002805be48>
      ---[ end trace ce9cee6832a9c503 ]---
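
      The guard described in the commit message above can be modelled in
      user-space C. This is a sketch under stated assumptions: struct cpu_tlb
      and the model_* functions are hypothetical stand-ins for the per-cpu
      cpu_tlbstate, leave_mm and drop_other_mm_ref, and the BUG() is
      represented by returning false.

```c
#include <stdbool.h>

enum tlb_state { TLBSTATE_OK = 1, TLBSTATE_LAZY = 2 };

struct mm { int id; };

/* Hypothetical per-cpu record modelling cpu_tlbstate. */
struct cpu_tlb {
    struct mm *active_mm;
    enum tlb_state state;
    int leave_mm_calls;   /* how often the model left the mm */
};

/*
 * Model of leave_mm(): only legal while the cpu is in lazy TLB mode.
 * Returns false where the kernel would hit BUG().
 */
static inline bool model_leave_mm(struct cpu_tlb *cpu)
{
    if (cpu->state == TLBSTATE_OK)
        return false;
    cpu->leave_mm_calls++;
    return true;
}

/*
 * Model of the guarded handler: the remote cpu re-checks that it is
 * still lazy on this mm before dropping the reference.
 */
static inline bool model_drop_other_mm_ref(struct cpu_tlb *cpu, struct mm *mm)
{
    if (cpu->active_mm == mm && cpu->state != TLBSTATE_OK)
        return model_leave_mm(cpu);
    return true;  /* nothing to drop, and no BUG() */
}
```

      If the remote cpu has already switched back to TLBSTATE_OK by the time
      the request arrives, the guarded handler simply does nothing instead of
      tripping the check inside leave_mm.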
      
      Tested-by: Maoxiaoyun <tinnycloud@hotmail.com>
      Signed-off-by: Kevin Tian <kevin.tian@intel.com>
      [v1: Fleshed out the git description a bit]
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
    • x86,xen: introduce x86_init.mapping.pagetable_reserve · 279b706b
      Stefano Stabellini authored
      Introduce a new x86_init hook called pagetable_reserve that at the end
      of init_memory_mapping is used to reserve a range of memory addresses for
      the kernel pagetable pages we used and free the other ones.
      
      On native it just calls memblock_x86_reserve_range while on xen it also
      takes care of setting the spare memory previously allocated
      for kernel pagetable pages from RO to RW, so that it can be used for
      other purposes.
      
      A detailed explanation of the reason why this hook is needed follows.
      
      As a consequence of the commit:
      
      commit 4b239f45
      Author: Yinghai Lu <yinghai@kernel.org>
      Date:   Fri Dec 17 16:58:28 2010 -0800
      
          x86-64, mm: Put early page table high
      
      at some point init_memory_mapping is going to reach the pagetable pages
      area and map those pages too (mapping them as normal memory that falls
      in the range of addresses passed to init_memory_mapping as argument).
      Some of those pages are already pagetable pages (they are in the range
      pgt_buf_start-pgt_buf_end) therefore they are going to be mapped RO and
      everything is fine.
      Some of these pages are not pagetable pages yet (they fall in the range
      pgt_buf_end-pgt_buf_top; for example the page at pgt_buf_end) so they
      are going to be mapped RW.  When these pages become pagetable pages and
      are hooked into the pagetable, xen will find that the guest has already
      a RW mapping of them somewhere and fail the operation.
      The reason Xen requires pagetables to be RO is that the hypervisor needs
      to verify that the pagetables are valid before using them. The validation
      operations are called "pinning" (more details in arch/x86/xen/mmu.c).
      
      In order to fix the issue we mark all the pages in the entire range
      pgt_buf_start-pgt_buf_top as RO, however when the pagetable allocation
      is completed only the range pgt_buf_start-pgt_buf_end is reserved by
      init_memory_mapping. Hence the kernel is going to crash as soon as one
      of the pages in the range pgt_buf_end-pgt_buf_top is reused (b/c those
      ranges are RO).
      
      For this reason we need a hook to reserve the kernel pagetable pages we
      used and free the other ones so that they can be reused for other
      purposes.
      On native it just means calling memblock_x86_reserve_range, on Xen it
      also means marking RW the pagetable pages that we allocated before but
      that haven't been used before.
      
      Another way to fix this without using the hook is to add an 'if
      (xen_pv_domain)' in the 'init_memory_mapping' code and call the Xen
      counterpart, but that is just nasty.
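
      The shape of such a hook can be sketched in user-space C. Everything
      here is an assumption made for illustration: struct range_log, the
      function names and the explicit top argument are hypothetical, and the
      sketch only records which ranges would be reserved or set back to RW
      rather than touching real page tables.

```c
#include <stdint.h>

/* Hypothetical record of what a reserve hook decided. */
struct range_log {
    uint64_t reserved_start, reserved_end;  /* kept for page tables */
    uint64_t rw_start, rw_end;              /* handed back as RW */
};

typedef void (*pagetable_reserve_fn)(struct range_log *log, uint64_t start,
                                     uint64_t end, uint64_t top);

/* Native variant: only reserve the used range [start, end). */
static inline void native_reserve(struct range_log *log, uint64_t start,
                                  uint64_t end, uint64_t top)
{
    (void)top;
    log->reserved_start = start;
    log->reserved_end = end;
}

/* Xen variant: reserve [start, end) and mark the unused [end, top) RW. */
static inline void xen_reserve(struct range_log *log, uint64_t start,
                               uint64_t end, uint64_t top)
{
    native_reserve(log, start, end, top);
    log->rw_start = end;
    log->rw_end = top;
}

/* Hypothetical ops table through which a platform overrides the hook. */
struct init_mapping_ops {
    pagetable_reserve_fn pagetable_reserve;
};

/* Called once page-table allocation for a mapping has completed. */
static inline void finish_init_memory_mapping(const struct init_mapping_ops *ops,
                                              struct range_log *log,
                                              uint64_t buf_start,
                                              uint64_t buf_end,
                                              uint64_t buf_top)
{
    if (buf_end > buf_start)
        ops->pagetable_reserve(log, buf_start, buf_end, buf_top);
}
```

      The generic code calls through the ops table and never tests for Xen
      itself, which is the point of preferring a hook over an
      'if (xen_pv_domain)' in init_memory_mapping.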
      
      Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
      Acked-by: Yinghai Lu <yinghai@kernel.org>
      Acked-by: H. Peter Anvin <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
    • Revert "xen/mmu: Add workaround "x86-64, mm: Put early page table high"" · 92bdaef7
      Konrad Rzeszutek Wilk authored
      This reverts commit a3864783.
      
      It does not work with certain AMD machines.
      
      last_pfn = 0x100000 max_arch_pfn = 0x400000000
      initial memory mapped : 0 - 02c3a000
      Base memory trampoline at [ffff88000009b000] 9b000 size 20480
      init_memory_mapping: 0000000000000000-0000000100000000
       0000000000 - 0100000000 page 4k
      kernel direct mapping tables up to 100000000 @ ff7fb000-100000000
      init_memory_mapping: 0000000100000000-00000001e0800000
       0100000000 - 01e0800000 page 4k
      kernel direct mapping tables up to 1e0800000 @ 1df0f3000-1e0000000
      xen: setting RW the range fffdc000 - 100000000
      RAMDISK: 0203b000 - 02c3a000
      No NUMA configuration found
      Faking a node at 0000000000000000-00000001e0800000
      NUMA: Using 63 for the hash shift.
      Initmem setup node 0 0000000000000000-00000001e0800000
        NODE_DATA [00000001dfffb000 - 00000001dfffffff]
      BUG: unable to handle kernel NULL pointer dereference at           (null)
      IP: [<ffffffff81cf6a75>] setup_node_bootmem+0x18a/0x1ea
      PGD 0
      Oops: 0003 [#1] SMP
      last sysfs file:
      CPU 0
      Modules linked in:
      
      Pid: 0, comm: swapper Not tainted 2.6.39-0-virtual #6~smb1
      RIP: e030:[<ffffffff81cf6a75>]  [<ffffffff81cf6a75>] setup_node_bootmem+0x18a/0x1ea
      RSP: e02b:ffffffff81c01e38  EFLAGS: 00010046
      RAX: 0000000000000000 RBX: 00000001e0800000 RCX: 0000000000001040
      RDX: 0000000000004100 RSI: 0000000000000000 RDI: ffff8801dfffb000
      RBP: ffffffff81c01e58 R08: 0000000000000020 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000
      R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000bfe400
      FS:  0000000000000000(0000) GS:ffffffff81cca000(0000) knlGS:0000000000000000
      CS:  e033 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000000000000 CR3: 0000000001c03000 CR4: 0000000000000660
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      Process swapper (pid: 0, threadinfo ffffffff81c00000, task ffffffff81c0b020)
      Stack:
       0000000000000040 0000000000000001 0000000000000000 ffffffffffffffff
       ffffffff81c01e88 ffffffff81cf6c25 0000000000000000 0000000000000000
       ffffffff81cf687f 0000000000000000 ffffffff81c01ea8 ffffffff81cf6e45
      Call Trace:
       [<ffffffff81cf6c25>] numa_register_memblks.constprop.3+0x150/0x181
       [<ffffffff81cf687f>] ? numa_add_memblk+0x7c/0x7c
       [<ffffffff81cf6e45>] numa_init.part.2+0x1c/0x7c
       [<ffffffff81cf687f>] ? numa_add_memblk+0x7c/0x7c
       [<ffffffff81cf6f67>] numa_init+0x6c/0x70
       [<ffffffff81cf7057>] initmem_init+0x39/0x3b
       [<ffffffff81ce5865>] setup_arch+0x64e/0x769
       [<ffffffff815e43c1>] ? printk+0x51/0x53
       [<ffffffff81cdf92b>] start_kernel+0xd4/0x3f3
       [<ffffffff81cdf388>] x86_64_start_reservations+0x132/0x136
       [<ffffffff81ce2ed4>] xen_start_kernel+0x588/0x58f
      Code: 41 00 00 48 8b 3c c5 a0 24 cc 81 31 c0 40 f6 c7 01 74 05 aa 66 ba ff 40 40 f6 c7 02 74 05 66 ab 83 ea 02 89 d1 c1 e9 02 f6 c2 02 <f3> ab 74 02 66 ab 80 e2 01 74 01 aa 49 63 c4 48 c1 eb 0c 44 89
      RIP  [<ffffffff81cf6a75>] setup_node_bootmem+0x18a/0x1ea
       RSP <ffffffff81c01e38>
      CR2: 0000000000000000
      ---[ end trace a7919e7f17c0a725 ]---
      Kernel panic - not syncing: Attempted to kill the idle task!
      Pid: 0, comm: swapper Tainted: G      D     2.6.39-0-virtual #6~smb1
      
      Reported-by: Stefan Bader <stefan.bader@canonical.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
  18. 02 May, 2011 2 commits
    • xen: mask_rw_pte mark RO all pagetable pages up to pgt_buf_top · b9269dc7
      Stefano Stabellini authored
      
      
      mask_rw_pte is currently checking if a pfn is a pagetable page if it
      falls in the range pgt_buf_start - pgt_buf_end but that is incorrect
      because pgt_buf_end is a moving target: pgt_buf_top is the real
      boundary.
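
      The change of boundary can be illustrated with a small user-space
      sketch. struct pgt_buf and the two predicates are hypothetical names
      for this example; they model only the range test, not the rest of
      mask_rw_pte.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical snapshot of the early page-table buffer, in pfns. */
struct pgt_buf {
    uint64_t start;  /* first pfn of the buffer */
    uint64_t end;    /* next pfn to hand out; moves as tables are allocated */
    uint64_t top;    /* one past the last pfn of the buffer; fixed */
};

/* Old test: bounded by the moving pgt_buf_end. */
static inline bool must_map_ro_old(const struct pgt_buf *b, uint64_t pfn)
{
    return pfn >= b->start && pfn < b->end;
}

/* New test: bounded by the fixed pgt_buf_top. */
static inline bool must_map_ro_new(const struct pgt_buf *b, uint64_t pfn)
{
    return pfn >= b->start && pfn < b->top;
}
```

      A pfn between end and top is missed by the old test and would be
      mapped RW even though it may become a pagetable page later; the new
      test covers it.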
      
      Acked-by: "H. Peter Anvin" <hpa@zytor.com>
      Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
    • xen/mmu: Add workaround "x86-64, mm: Put early page table high" · a3864783
      Konrad Rzeszutek Wilk authored
      As a consequence of the commit:
      
      commit 4b239f45
      Author: Yinghai Lu <yinghai@kernel.org>
      Date:   Fri Dec 17 16:58:28 2010 -0800
      
          x86-64, mm: Put early page table high
      
      it causes the Linux kernel to crash under Xen:
      
      mapping kernel into physical memory
      Xen: setup ISA identity maps
      about to get started...
      (XEN) mm.c:2466:d0 Bad type (saw 7400000000000001 != exp 1000000000000000) for mfn b1d89 (pfn bacf7)
      (XEN) mm.c:3027:d0 Error while pinning mfn b1d89
      (XEN) traps.c:481:d0 Unhandled invalid opcode fault/trap [#6] on VCPU 0 [ec=0000]
      (XEN) domain_crash_sync called from entry.S
      (XEN) Domain 0 (vcpu#0) crashed on cpu#0:
      ...
      
      The reason is that at some point init_memory_mapping is going to reach
      the pagetable pages area and map those pages too (mapping them as normal
      memory that falls in the range of addresses passed to init_memory_mapping
      as argument). Some of those pages are already pagetable pages (they are
      in the range pgt_buf_start-pgt_buf_end) therefore they are going to be
      mapped RO and everything is fine.
      Some of these pages are not pagetable pages yet (they fall in the range
      pgt_buf_end-pgt_buf_top; for example the page at pgt_buf_end) so they
      are going to be mapped RW.  When these pages become pagetable pages and
      are hooked into the pagetable, xen will find that the guest has already
      a RW mapping of them somewhere and fail the operation.
      The reason Xen requires pagetables to be RO is that the hypervisor needs
      to verify that the pagetables are valid before using them. The validation
      operations are called "pinning" (more details in arch/x86/xen/mmu.c).
      
      In order to fix the issue we mark all the pages in the entire range
      pgt_buf_start-pgt_buf_top as RO, however when the pagetable allocation
      is completed only the range pgt_buf_start-pgt_buf_end is reserved by
      init_memory_mapping. Hence the kernel is going to crash as soon as one
      of the pages in the range pgt_buf_end-pgt_buf_top is reused (b/c those
      ranges are RO).
      
      For this reason, this function is introduced which is called _after_
      the init_memory_mapping has completed (in a perfect world we would
      call this function from init_memory_mapping, but let's ignore that).
      
      Because we are called _after_ init_memory_mapping the pgt_buf_[start,
      end,top] have all changed to new values (b/c another init_memory_mapping
      is called). Hence, the first time we enter this function, we save
      away the pgt_buf_start value and update the pgt_buf_[end,top].
      
      When we detect that the "old" pgt_buf_start through pgt_buf_end
      PFNs have been reserved (so memblock_x86_reserve_range has been called),
      we immediately set out to RW the "old" pgt_buf_end through pgt_buf_top.
      
      And then we update those "old" pgt_buf_[end|top] with the new ones
      so that we can redo this on the next pagetable.
      
      Acked-by: "H. Peter Anvin" <hpa@zytor.com>
      Reviewed-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      [v1: Updated with Jeremy's comments]
      [v2: Added the crash output]
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
  19. 20 Apr, 2011 1 commit
    • xen: mask_rw_pte: do not apply the early_ioremap checks on x86_32 · ee176455
      Stefano Stabellini authored
      
      
      The two "is_early_ioremap_ptep" checks in mask_rw_pte are only used on
      x86_64, in fact early_ioremap is not used at all to setup the initial
      pagetable on x86_32.
      Moreover on x86_32 the two checks are wrong because the range
      pgt_buf_start..pgt_buf_end initially should be mapped RW because
      the pages in the range are not pagetable pages yet and haven't been
      cleared yet. Afterwards considering the pgt_buf_start..pgt_buf_end is
      part of the initial mapping, xen_alloc_pte is capable of turning
      the ptes RO when they become pagetable pages.
      
      Fix the issue and improve the readability of the code by providing two
      different implementations of mask_rw_pte for x86_32 and x86_64.
      
      Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
  20. 04 Apr, 2011 1 commit
  21. 19 Mar, 2011 2 commits
    • xen: update mask_rw_pte after kernel page tables init changes · d8aa5ec3
      Stefano Stabellini authored
      After "x86-64, mm: Put early page table high" already existing kernel
      page table pages can be mapped using early_ioremap too so we need to
      update mask_rw_pte to make sure these pages are still mapped RO.
      The reason why we have to do that is explained by the commit message of
      fef5ba79:
      
      "Xen requires that all pages containing pagetable entries to be mapped
      read-only.  If pages used for the initial pagetable are already mapped
      then we can change the mapping to RO.  However, if they are initially
      unmapped, we need to make sure that when they are later mapped, they
      are also mapped RO.
      
      ..SNIP..
      
      the pagetable setup code early_ioremaps the pages to write their
      entries, so we must make sure that mappings created in the early_ioremap
      fixmap area are mapped RW.  (Those mappings are removed before the pages
      are presented to Xen as pagetable pages.)"
      
      We accomplish all this in mask_rw_pte by mapping RO all the pages mapped
      using early_ioremap apart from the last one that has been allocated
      because it is not a page table page yet (it has not been hooked into the
      page tables yet).
      
      Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
      Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      LKML-Reference: <alpine.DEB.2.00.1103171739050.3382@kaball-desktop>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
    • xen: set max_pfn_mapped to the last pfn mapped · 14988a4d
      Stefano Stabellini authored
      
      
      Do not set max_pfn_mapped to the end of the initial memory mappings,
      which also contain pages that don't belong in pfn space (like the mfn
      list).

      Set max_pfn_mapped to the last real pfn mapped in the initial memory
      mappings, that is, the pfn backing _end.
      
      Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
      Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      LKML-Reference: <alpine.DEB.2.00.1103171739050.3382@kaball-desktop>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
  22. 18 Mar, 2011 1 commit
  23. 14 Mar, 2011 1 commit