summaryrefslogtreecommitdiff
path: root/arch/x86
AgeCommit message (Collapse)Author
2013-02-06KVM: MMU: remove pt_access in mmu_set_spteXiao Guangrong
It is only used in debug code, so drop it Reviewed-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2013-02-06KVM: MMU: cleanup mapping-levelXiao Guangrong
Use min() to cleanup mapping_level Reviewed-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2013-02-06KVM: MMU: lazily drop large spteXiao Guangrong
Currently, kvm zaps the large spte if write-protected is needed, the later read can fault on that spte. Actually, we can make the large spte readonly instead of making them not present, the page fault caused by read access can be avoided The idea is from Avi: | As I mentioned before, write-protecting a large spte is a good idea, | since it moves some work from protect-time to fault-time, so it reduces | jitter. This removes the need for the return value. Reviewed-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2013-02-06KVM: VMX: cleanup vmx_set_cr0().Gleb Natapov
When calculating hw_cr0 teh current code masks bits that should be always on and re-adds them back immediately after. Cleanup the code by masking only those bits that should be dropped from hw_cr0. This allow us to get rid of some defines. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2013-02-05KVM: VMX: add missing exit names to VMX_EXIT_REASONS arrayGleb Natapov
Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2013-02-05KVM: VMX: disable SMEP feature when guest is in non-paging modeDongxiao Xu
SMEP is disabled if CPU is in non-paging mode in hardware. However KVM always uses paging mode to emulate guest non-paging mode with TDP. To emulate this behavior, SMEP needs to be manually disabled when guest switches to non-paging mode. We met an issue that, SMP Linux guest with recent kernel (enable SMEP support, for example, 3.5.3) would crash with triple fault if setting unrestricted_guest=0. This is because KVM uses an identity mapping page table to emulate the non-paging mode, where the page table is set with USER flag. If SMEP is still enabled in this case, guest will meet unhandlable page fault and then crash. Reviewed-by: Gleb Natapov <gleb@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Dongxiao Xu <dongxiao.xu@intel.com> Signed-off-by: Xiantao Zhang <xiantao.zhang@intel.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2013-02-05Revert "KVM: MMU: split kvm_mmu_free_page"Gleb Natapov
This reverts commit bd4c86eaa6ff10abc4e00d0f45d2a28b10b09df4. There is not user for kvm_mmu_isolate_page() any more. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2013-02-04KVM: MMU: drop superfluous is_present_gpte() check.Gleb Natapov
Gust page walker puts only present ptes into ptes[] array. No need to check it again. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2013-02-04KVM: MMU: drop superfluous min() call.Gleb Natapov
Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2013-02-04KVM: MMU: set base_role.nxe during mmu initialization.Gleb Natapov
Move base_role.nxe initialisation to where all other roles are initialized. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2013-02-04KVM: MMU: drop unneeded checks.Gleb Natapov
Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2013-02-04KVM: MMU: make spte_is_locklessly_modifiable() more clearGleb Natapov
spte_is_locklessly_modifiable() checks that both SPTE_HOST_WRITEABLE and SPTE_MMU_WRITEABLE are present on spte. Make it more explicit. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2013-01-29x86, apicv: add virtual interrupt delivery supportYang Zhang
Virtual interrupt delivery avoids KVM to inject vAPIC interrupts manually, which is fully taken care of by the hardware. This needs some special awareness into existing interrupr injection path: - for pending interrupt, instead of direct injection, we may need update architecture specific indicators before resuming to guest. - A pending interrupt, which is masked by ISR, should be also considered in above update action, since hardware will decide when to inject it at right time. Current has_interrupt and get_interrupt only returns a valid vector from injection p.o.v. Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-01-29x86, apicv: add virtual x2apic supportYang Zhang
basically to benefit from apicv, we need to enable virtualized x2apic mode. Currently, we only enable it when guest is really using x2apic. Also, clear MSR bitmap for corresponding x2apic MSRs when guest enabled x2apic: 0x800 - 0x8ff: no read intercept for apicv register virtualization, except APIC ID and TMCCT which need software's assistance to get right value. Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-01-29x86, apicv: add APICv register virtualization supportYang Zhang
- APIC read doesn't cause VM-Exit - APIC write becomes trap-like Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Yang Zhang <yang.z.zhang@intel.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-01-27KVM: x86 emulator: fix test_cc() build failure on i386Avi Kivity
'pushq' doesn't exist on i386. Replace with 'push', which should work since the operand is a register. Signed-off-by: Avi Kivity <avi.kivity@gmail.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-01-24KVM: VMX: set vmx->emulation_required only when needed.Gleb Natapov
If emulate_invalid_guest_state=false vmx->emulation_required is never actually used, but it ends up to be always set to true since handle_invalid_guest_state(), the only place it is reset back to false, is never called. This, besides been not very clean, makes vmexit and vmentry path to check emulate_invalid_guest_state needlessly. The patch fixes that by keeping emulation_required coherent with emulate_invalid_guest_state setting. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2013-01-24KVM: x86: fix use of uninitialized memory as segment descriptor in emulator.Gleb Natapov
If VMX reports segment as unusable, zero descriptor passed by the emulator before returning. Such descriptor will be considered not present by the emulator. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2013-01-24KVM: VMX: rename fix_pmode_dataseg to fix_pmode_seg.Gleb Natapov
The function deals with code segment too. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2013-01-24KVM: VMX: don't clobber segment AR of unusable segments.Gleb Natapov
Usability is returned in unusable field, so not need to clobber entire AR. Callers have to know how to deal with unusable segments already since if emulate_invalid_guest_state=true AR is not zeroed. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2013-01-24KVM: VMX: skip vmx->rmode.vm86_active check on cr0 write if unrestricted ↵Gleb Natapov
guest is enabled vmx->rmode.vm86_active is never true is unrestricted guest is enabled. Make it more explicit that neither enter_pmode() nor enter_rmode() is called in this case. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2013-01-24KVM: VMX: remove hack that disables emulation on vcpu reset/initGleb Natapov
There is no reason for it. If state is suitable for vmentry it will be detected during guest entry and no emulation will happen. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2013-01-24KVM: VMX: if unrestricted guest is enabled vcpu state is always valid.Gleb Natapov
Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2013-01-24KVM: VMX: reset CPL only on CS register write.Gleb Natapov
Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2013-01-24KVM: VMX: remove special CPL cache access during transition to real mode.Gleb Natapov
Since vmx_get_cpl() always returns 0 when VCPU is in real mode it is no longer needed. Also reset CPL cache to zero during transaction to protected mode since transaction may happen while CS.selectors & 3 != 0, but in reality CPL is 0. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2013-01-23KVM: x86 emulator: convert a few freestanding emulations to fastopAvi Kivity
Reviewed-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi.kivity@gmail.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2013-01-23KVM: x86 emulator: rearrange fastop definitionsAvi Kivity
Make fastop opcodes usable in other emulations. Reviewed-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi.kivity@gmail.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2013-01-23KVM: x86 emulator: convert 2-operand IMUL to fastopAvi Kivity
Reviewed-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi.kivity@gmail.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2013-01-23KVM: x86 emulator: convert BT/BTS/BTR/BTC/BSF/BSR to fastopAvi Kivity
Reviewed-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi.kivity@gmail.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2013-01-23KVM: x86 emulator: convert INC/DEC to fastopAvi Kivity
Reviewed-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi.kivity@gmail.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2013-01-23KVM: x86 emulator: covert SETCC to fastopAvi Kivity
This is a bit of a special case since we don't have the usual byte/word/long/quad switch; instead we switch on the condition code embedded in the instruction. Reviewed-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi.kivity@gmail.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2013-01-23KVM: x86 emulator: convert shift/rotate instructions to fastopAvi Kivity
SHL, SHR, ROL, ROR, RCL, RCR, SAR, SAL Reviewed-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi.kivity@gmail.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2013-01-23KVM: x86 emulator: Convert SHLD, SHRD to fastopAvi Kivity
Reviewed-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi.kivity@gmail.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2013-01-21KVM: x86: improve reexecute_instructionXiao Guangrong
The current reexecute_instruction can not well detect the failed instruction emulation. It allows guest to retry all the instructions except it accesses on error pfn For example, some cases are nested-write-protect - if the page we want to write is used as PDE but it chains to itself. Under this case, we should stop the emulation and report the case to userspace Reviewed-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2013-01-21KVM: x86: let reexecute_instruction work for tdpXiao Guangrong
Currently, reexecute_instruction refused to retry all instructions if tdp is enabled. If nested npt is used, the emulation may be caused by shadow page, it can be fixed by dropping the shadow page. And the only condition that tdp can not retry the instruction is the access fault on error pfn Reviewed-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2013-01-21KVM: x86: clean up reexecute_instructionXiao Guangrong
Little cleanup for reexecute_instruction, also use gpa_to_gfn in retry_instruction Reviewed-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2013-01-14KVM: MMU: Conditionally reschedule when kvm_mmu_slot_remove_write_access() ↵Takuya Yoshikawa
takes a long time If the userspace starts dirty logging for a large slot, say 64GB of memory, kvm_mmu_slot_remove_write_access() needs to hold mmu_lock for a long time such as tens of milliseconds. This patch controls the lock hold time by asking the scheduler if we need to reschedule for others. One penalty for this is that we need to flush TLBs before releasing mmu_lock. But since holding mmu_lock for a long time does affect not only the guest, vCPU threads in other words, but also the host as a whole, we should pay for that. In practice, the cost will not be so high because we can protect a fair amount of memory before being rescheduled: on my test environment, cond_resched_lock() was called only once for protecting 12GB of memory even without THP. We can also revisit Avi's "unlocked TLB flush" work later for completely suppressing extra TLB flushes if needed. Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp> Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-01-14KVM: Make kvm_mmu_slot_remove_write_access() take mmu_lock by itselfTakuya Yoshikawa
Better to place mmu_lock handling and TLB flushing code together since this is a self-contained function. Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp> Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-01-14KVM: Make kvm_mmu_change_mmu_pages() take mmu_lock by itselfTakuya Yoshikawa
No reason to make callers take mmu_lock since we do not need to protect kvm_mmu_change_mmu_pages() and kvm_mmu_slot_remove_write_access() together by mmu_lock in kvm_arch_commit_memory_region(): the former calls kvm_mmu_commit_zap_page() and flushes TLBs by itself. Note: we do not need to protect kvm->arch.n_requested_mmu_pages by mmu_lock as can be seen from the fact that it is read locklessly. Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp> Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-01-14KVM: Remove unused slot_bitmap from kvm_mmu_pageTakuya Yoshikawa
Not needed any more. Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp> Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-01-14KVM: MMU: Make kvm_mmu_slot_remove_write_access() rmap basedTakuya Yoshikawa
This makes it possible to release mmu_lock and reschedule conditionally in a later patch. Although this may increase the time needed to protect the whole slot when we start dirty logging, the kernel should not allow the userspace to trigger something that will hold a spinlock for such a long time as tens of milliseconds: actually there is no limit since it is roughly proportional to the number of guest pages. Another point to note is that this patch removes the only user of slot_bitmap which will cause some problems when we increase the number of slots further. Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp> Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-01-14KVM: MMU: Remove unused parameter level from __rmap_write_protect()Takuya Yoshikawa
No longer need to care about the mapping level in this function. Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp> Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-01-14KVM: Write protect the updated slot only when dirty logging is enabledTakuya Yoshikawa
Calling kvm_mmu_slot_remove_write_access() for a deleted slot does nothing but search for non-existent mmu pages which have mappings to that deleted memory; this is safe but a waste of time. Since we want to make the function rmap based in a later patch, in a manner which makes it unsafe to be called for a deleted slot, we makes the caller see if the slot is non-zero and being dirty logged. Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp> Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-01-10KVM: MMU: fix infinite fault access retryXiao Guangrong
We have two issues in current code: - if target gfn is used as its page table, guest will refault then kvm will use small page size to map it. We need two #PF to fix its shadow page table - sometimes, say a exception is triggered during vm-exit caused by #PF (see handle_exception() in vmx.c), we remove all the shadow pages shadowed by the target gfn before go into page fault path, it will cause infinite loop: delete shadow pages shadowed by the gfn -> try to use large page size to map the gfn -> retry the access ->... To fix these, we can adjust page size early if the target gfn is used as page table Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2013-01-10KVM: MMU: fix Dirty bit missed if CR0.WP = 0Xiao Guangrong
If the write-fault access is from supervisor and CR0.WP is not set on the vcpu, kvm will fix it by adjusting pte access - it sets the W bit on pte and clears U bit. This is the chance that kvm can change pte access from readonly to writable Unfortunately, the pte access is the access of 'direct' shadow page table, means direct sp.role.access = pte_access, then we will create a writable spte entry on the readonly shadow page table. It will cause Dirty bit is not tracked when two guest ptes point to the same large page. Note, it does not have other impact except Dirty bit since cr0.wp is encoded into sp.role It can be fixed by adjusting pte access before establishing shadow page table. Also, after that, no mmu specified code exists in the common function and drop two parameters in set_spte Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2013-01-09KVM: x86 emulator: convert basic ALU ops to fastopAvi Kivity
Opcodes: TEST CMP ADD ADC SUB SBB XOR OR AND Acked-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi.kivity@gmail.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2013-01-09KVM: x86 emulator: add macros for defining 2-operand fastop emulationAvi Kivity
Acked-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi.kivity@gmail.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2013-01-09KVM: x86 emulator: convert NOT, NEG to fastopAvi Kivity
Acked-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi.kivity@gmail.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2013-01-09KVM: x86 emulator: mark CMP, CMPS, SCAS, TEST as NoWriteAvi Kivity
Acked-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi.kivity@gmail.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2013-01-09KVM: x86 emulator: introduce NoWrite flagAvi Kivity
Instead of disabling writeback via OP_NONE, just specify NoWrite. Acked-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi.kivity@gmail.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>