summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2014-05-30KVM: PPC: PR: Fill pvinfo hcall instructions in big endianAlexander Graf
We expose a blob of hypercall instructions to user space that it gives to the guest via device tree again. That blob should contain a stream of instructions necessary to do a hypercall in big endian, as it just gets passed into the guest and old guests use them straight away. Signed-off-by: Alexander Graf <agraf@suse.de>
2014-05-30KVM: PPC: Book3S PR: PAPR: Access RTAS in big endianAlexander Graf
When the guest does an RTAS hypercall it keeps all RTAS variables inside a big endian data structure. To make sure we don't have to bother about endianness inside the actual RTAS handlers, let's just convert the whole structure to host endian before we call our RTAS handlers and back to big endian when we return to the guest. Signed-off-by: Alexander Graf <agraf@suse.de>
2014-05-30KVM: PPC: Book3S PR: PAPR: Access HTAB in big endianAlexander Graf
The HTAB on PPC is always in big endian. When we access it via hypercalls on behalf of the guest and we're running on a little endian host, we need to make sure we swap the bits accordingly. Signed-off-by: Alexander Graf <agraf@suse.de>
2014-05-30KVM: PPC: Book3S PR: Default to big endian guestAlexander Graf
The default MSR when user space does not define anything should be identical on little and big endian hosts, so remove MSR_LE from it. Signed-off-by: Alexander Graf <agraf@suse.de>
2014-05-30KVM: PPC: Book3S_64 PR: Access shadow slb in big endianAlexander Graf
The "shadow SLB" in the PACA is shared with the hypervisor, so it has to be big endian. We access the shadow SLB during world switch, so let's make sure we access it in big endian even when we're on a little endian host. Signed-off-by: Alexander Graf <agraf@suse.de>
2014-05-30KVM: PPC: Book3S_64 PR: Access HTAB in big endianAlexander Graf
The HTAB is always big endian. We access the guest's HTAB using copy_from/to_user, but don't yet take care of the fact that we might be running on an LE host. Wrap all accesses to the guest HTAB with big endian accessors. Signed-off-by: Alexander Graf <agraf@suse.de>
2014-05-30KVM: PPC: Book3S_32: PR: Access HTAB in big endianAlexander Graf
The HTAB is always big endian. We access the guest's HTAB using copy_from/to_user, but don't yet take care of the fact that we might be running on an LE host. Wrap all accesses to the guest HTAB with big endian accessors. Signed-off-by: Alexander Graf <agraf@suse.de>
2014-05-30KVM: PPC: Book3S: PR: Fix C/R bit settingAlexander Graf
Commit 9308ab8e2d made C/R HTAB updates go byte-wise into the target HTAB. However, it didn't update the guest's copy of the HTAB, but instead the host local copy of it. Write to the guest's HTAB instead. Signed-off-by: Alexander Graf <agraf@suse.de> CC: Paul Mackerras <paulus@samba.org> Acked-by: Paul Mackerras <paulus@samba.org>
2014-05-30KVM: PPC: BOOK3S: PR: Fix WARN_ON with debug options onAneesh Kumar K.V
With debug option "sleep inside atomic section checking" enabled we get the below WARN_ON during a PR KVM boot. This is because upstream now have PREEMPT_COUNT enabled even if we have preempt disabled. Fix the warning by adding preempt_disable/enable around floating point and altivec enable. WARNING: at arch/powerpc/kernel/process.c:156 Modules linked in: kvm_pr kvm CPU: 1 PID: 3990 Comm: qemu-system-ppc Tainted: G W 3.15.0-rc1+ #4 task: c0000000eb85b3a0 ti: c0000000ec59c000 task.ti: c0000000ec59c000 NIP: c000000000015c84 LR: d000000003334644 CTR: c000000000015c00 REGS: c0000000ec59f140 TRAP: 0700 Tainted: G W (3.15.0-rc1+) MSR: 8000000000029032 <SF,EE,ME,IR,DR,RI> CR: 42000024 XER: 20000000 CFAR: c000000000015c24 SOFTE: 1 GPR00: d000000003334644 c0000000ec59f3c0 c000000000e2fa40 c0000000e2f80000 GPR04: 0000000000000800 0000000000002000 0000000000000001 8000000000000000 GPR08: 0000000000000001 0000000000000001 0000000000002000 c000000000015c00 GPR12: d00000000333da18 c00000000fb80900 0000000000000000 0000000000000000 GPR16: 0000000000000000 0000000000000000 0000000000000000 00003fffce4e0fa1 GPR20: 0000000000000010 0000000000000001 0000000000000002 00000000100b9a38 GPR24: 0000000000000002 0000000000000000 0000000000000000 0000000000000013 GPR28: 0000000000000000 c0000000eb85b3a0 0000000000002000 c0000000e2f80000 NIP [c000000000015c84] .enable_kernel_fp+0x84/0x90 LR [d000000003334644] .kvmppc_handle_ext+0x134/0x190 [kvm_pr] Call Trace: [c0000000ec59f3c0] [0000000000000010] 0x10 (unreliable) [c0000000ec59f430] [d000000003334644] .kvmppc_handle_ext+0x134/0x190 [kvm_pr] [c0000000ec59f4c0] [d00000000324b380] .kvmppc_set_msr+0x30/0x50 [kvm] [c0000000ec59f530] [d000000003337cac] .kvmppc_core_emulate_op_pr+0x16c/0x5e0 [kvm_pr] [c0000000ec59f5f0] [d00000000324a944] .kvmppc_emulate_instruction+0x284/0xa80 [kvm] [c0000000ec59f6c0] [d000000003336888] .kvmppc_handle_exit_pr+0x488/0xb70 [kvm_pr] [c0000000ec59f790] [d000000003338d34] kvm_start_lightweight+0xcc/0xdc [kvm_pr] [c0000000ec59f960] [d000000003336288] .kvmppc_vcpu_run_pr+0xc8/0x190 [kvm_pr] [c0000000ec59f9f0] [d00000000324c880] .kvmppc_vcpu_run+0x30/0x50 [kvm] [c0000000ec59fa60] [d000000003249e74] .kvm_arch_vcpu_ioctl_run+0x54/0x1b0 [kvm] [c0000000ec59faf0] [d000000003244948] .kvm_vcpu_ioctl+0x478/0x760 [kvm] [c0000000ec59fcb0] [c000000000224e34] .do_vfs_ioctl+0x4d4/0x790 [c0000000ec59fd90] [c000000000225148] .SyS_ioctl+0x58/0xb0 [c0000000ec59fe30] [c00000000000a1e4] syscall_exit+0x0/0x98 Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Alexander Graf <agraf@suse.de>
2014-05-30KVM: PPC: BOOK3S: PR: Enable Little Endian PR guestAneesh Kumar K.V
This patch make sure we inherit the LE bit correctly in different case so that we can run Little Endian distro in PR mode Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Alexander Graf <agraf@suse.de>
2014-05-30KVM: PPC: E500: Add dcbtls emulationAlexander Graf
The dcbtls instruction is able to lock data inside the L1 cache. We don't want to give the guest actual access to hardware cache locks, as that could influence other VMs on the same system. But we can tell the guest that its locking attempt failed. By implementing the instruction we at least don't give the guest a program exception which it definitely does not expect. Signed-off-by: Alexander Graf <agraf@suse.de>
2014-05-30KVM: PPC: E500: Ignore L1CSR1_ICFI,ICLFRAlexander Graf
The L1 instruction cache control register contains bits that indicate that we're still handling a request. Mask those out when we set the SPR so that a read doesn't assume we're still doing something. Signed-off-by: Alexander Graf <agraf@suse.de>
2014-05-22KVM: vmx: DR7 masking on task switch emulation is wrongNadav Amit
The DR7 masking which is done on task switch emulation should be in hex format (clearing the local breakpoints enable bits 0,2,4 and 6). Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-05-22x86: fix page fault tracing when KVM guest support enabledDave Hansen
I noticed on some of my systems that page fault tracing doesn't work: cd /sys/kernel/debug/tracing echo 1 > events/exceptions/enable cat trace; # nothing shows up I eventually traced it down to CONFIG_KVM_GUEST. At least in a KVM VM, enabling that option breaks page fault tracing, and disabling fixes it. I tried on some old kernels and this does not appear to be a regression: it never worked. There are two page-fault entry functions today. One when tracing is on and another when it is off. The KVM code calls do_page_fault() directly instead of calling the traced version: > dotraplinkage void __kprobes > do_async_page_fault(struct pt_regs *regs, unsigned long > error_code) > { > enum ctx_state prev_state; > > switch (kvm_read_and_reset_pf_reason()) { > default: > do_page_fault(regs, error_code); > break; > case KVM_PV_REASON_PAGE_NOT_PRESENT: I'm also having problems with the page fault tracing on bare metal (same symptom of no trace output). I'm unsure if it's related. Steven had an alternative to this which has zero overhead when tracing is off where this includes the standard noops even when tracing is disabled. I'm unconvinced that the extra complexity of his apporach: http://lkml.kernel.org/r/20140508194508.561ed220@gandalf.local.home is worth it, expecially considering that the KVM code is already making page fault entry slower here. This solution is dirt-simple. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: x86@kernel.org Cc: Peter Zijlstra <peterz@infradead.org> Cc: Gleb Natapov <gleb@redhat.com> Cc: kvm@vger.kernel.org Cc: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Acked-by: "H. Peter Anvin" <hpa@zytor.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-05-22KVM: x86: get CPL from SS.DPLPaolo Bonzini
CS.RPL is not equal to the CPL in the few instructions between setting CR0.PE and reloading CS. And CS.DPL is also not equal to the CPL for conforming code segments. However, SS.DPL *is* always equal to the CPL except for the weird case of SYSRET on AMD processors, which sets SS.DPL=SS.RPL from the value in the STAR MSR, but force CPL=3 (Intel instead forces SS.DPL=SS.RPL=CPL=3). So this patch: - modifies SVM to update the CPL from SS.DPL rather than CS.RPL; the above case with SYSRET is not broken further, and the way to fix it would be to pass the CPL to userspace and back - modifies VMX to always return the CPL from SS.DPL (except forcing it to 0 if we are emulating real mode via vm86 mode; in vm86 mode all DPLs have to be 3, but real mode does allow privileged instructions). It also removes the CPL cache, which becomes a duplicate of the SS access rights cache. This fixes doing KVM_IOCTL_SET_SREGS exactly after setting CR0.PE=1 but before CS has been reloaded. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-05-22KVM: x86: check CS.DPL against RPL during task switchPaolo Bonzini
Table 7-1 of the SDM mentions a check that the code segment's DPL must match the selector's RPL. This was not done by KVM, fix it. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-05-22KVM: x86: drop set_rflags callbackPaolo Bonzini
Not needed anymore now that the CPL is computed directly during task switch. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-05-22KVM: x86: use new CS.RPL as CPL during task switchPaolo Bonzini
During task switch, all of CS.DPL, CS.RPL, SS.DPL must match (in addition to all the other requirements) and will be the new CPL. So far this worked by carefully setting the CS selector and flag before doing the task switch; setting CS.selector will already change the CPL. However, this will not work once we get the CPL from SS.DPL, because then you will have to set the full segment descriptor cache to change the CPL. ctxt->ops->cpl(ctxt) will then return the old CPL during the task switch, and the check that SS.DPL == CPL will fail. Temporarily assume that the CPL comes from CS.RPL during task switch to a protected-mode task. This is the same approach used in QEMU's emulation code, which (until version 2.0) manually tracks the CPL. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-05-16Merge tag 'kvm-s390-20140516' of ↵Paolo Bonzini
git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into kvm-next 1. Correct locking for lazy storage key handling A test loop with multiple CPUs triggered a race in the lazy storage key handling as introduced by commit 934bc131efc3e4be6a52f7dd6c4dbf (KVM: s390: Allow skeys to be enabled for the current process). This race should not happen with Linux guests, but let's fix it anyway. Patch touches !/kvm/ code, but is from the s390 maintainer. 2. Better handling of broken guests If we detect a program check loop we stop the guest instead of wasting CPU cycles. 3. Better handling on MVPG emulation The move page handling is improved to be architecturally correct. 3. Trace point rework Let's rework the kvm trace points to have a common header file (for later perf usage) and provided a table based instruction decoder. 4. Interpretive execution of SIGP external call Let the hardware handle most cases of SIGP external call (IPI) and wire up the fixup code for the corner cases. 5. Initial preparations for the IBC facility Prepare the code to handle instruction blocking
2014-05-16KVM: s390: split SIE state guest prefix fieldMichael Mueller
This patch splits the SIE state guest prefix at offset 4 into a prefix bit field. Additionally it provides the access functions: - kvm_s390_get_prefix() - kvm_s390_set_prefix() to access the prefix per vcpu. Signed-off-by: Michael Mueller <mimu@linux.vnet.ibm.com> Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-05-16s390/sclp: add sclp_get_ibc functionMichael Mueller
The patch adds functionality to retrieve the IBC configuration by means of function sclp_get_ibc(). Signed-off-by: Michael Mueller <mimu@linux.vnet.ibm.com> Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com> Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-05-16KVM: s390: interpretive execution of SIGP EXTERNAL CALLDavid Hildenbrand
If the sigp interpretation facility is installed, most SIGP EXTERNAL CALL operations will be interpreted instead of intercepted. A partial execution interception will occurr at the sending cpu only if the target cpu is in the wait state ("W" bit in the cpuflags set). Instruction interception will only happen in error cases (e.g. cpu addr invalid). As a sending cpu might set the external call interrupt pending flags at the target cpu at every point in time, we can't handle this kind of interrupt using our kvm interrupt injection mechanism. The injection will be done automatically by the SIE when preparing the start of the target cpu. Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> CC: Thomas Huth <thuth@linux.vnet.ibm.com> [Adopt external call injection to check for sigp interpretion] Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-05-16KVM: s390: Use intercept_insn decoder in trace eventAlexander Yarygin
The current trace definition doesn't work very well with the perf tool. Perf shows a "insn_to_mnemonic not found" message. Let's handle the decoding completely in a parseable format. Signed-off-by: Alexander Yarygin <yarygin@linux.vnet.ibm.com> Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-05-16KVM: s390: decoder of SIE intercepted instructionsAlexander Yarygin
This patch adds a new decoder of SIE intercepted instructions. The decoder implemented as a macro and potentially can be used in both kernelspace and userspace. Note that this simplified instruction decoder is only intended to be used with the subset of instructions that may cause a SIE intercept. Signed-off-by: Alexander Yarygin <yarygin@linux.vnet.ibm.com> Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-05-16KVM: s390: Use trace tables from sie.h.Alexander Yarygin
Use the symbolic translation tables from sie.h for decoding diag, sigp and sie exit codes. Signed-off-by: Alexander Yarygin <yarygin@linux.vnet.ibm.com> Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-05-16KVM: s390: add sie exit reasons tablesAlexander Yarygin
This patch defines tables of reasons for exiting from SIE mode in a new sie.h header file. Tables contain SIE intercepted codes, intercepted instructions and program interruptions codes. Signed-off-by: Alexander Yarygin <yarygin@linux.vnet.ibm.com> Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-05-16KVM: s390: Improved MVPG partial execution handlerThomas Huth
Use the new helper function kvm_arch_fault_in_page() for faulting-in the guest pages and only inject addressing errors when we've really hit a bad address (and return other error codes to userspace instead). Signed-off-by: Thomas Huth <thuth@linux.vnet.ibm.com> Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-05-16KVM: s390: Introduce helper function for faulting-in a guest pageThomas Huth
Rework the function kvm_arch_fault_in_sync() to become a proper helper function for faulting-in a guest page. Now it takes the guest address as a parameter and does not ignore the possible error code from gmap_fault() anymore (which could cause undetected error conditions before). Signed-off-by: Thomas Huth <thuth@linux.vnet.ibm.com> Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-05-16KVM: s390: Avoid endless loops of specification exceptionsThomas Huth
If the new PSW for program interrupts is invalid, the VM ends up in an endless loop of specification exceptions. Since there is not much left we can do in this case, we should better drop to userspace instead so that the crash can be reported to the user. Signed-off-by: Thomas Huth <thuth@linux.vnet.ibm.com> Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-05-16KVM: s390: Improve is_valid_psw()Thomas Huth
As a program status word is also invalid (and thus generates an specification exception) if the instruction address is not even, we should test this in is_valid_psw(), too. This patch also exports the function so that it becomes available for other parts of the S390 KVM code as well. Signed-off-by: Thomas Huth <thuth@linux.vnet.ibm.com> Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-05-16KVM: s390: correct locking for s390_enable_skeyMartin Schwidefsky
Use the mm semaphore to serialize multiple invocations of s390_enable_skey. The second CPU faulting on a storage key operation needs to wait for the completion of the page table update. Taking the mm semaphore writable has the positive side-effect that it prevents any host faults from taking place which does have implications on keys vs PGSTE. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-05-12KVM: x86: Fix CR3 reserved bits check in long modeJan Kiszka
Regression of 346874c9: PAE is set in long mode, but that does not mean we have valid PDPTRs. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-05-08kvm: x86: emulate monitor and mwait instructions as nopGabriel L. Somlo
Treat monitor and mwait instructions as nop, which is architecturally correct (but inefficient) behavior. We do this to prevent misbehaving guests (e.g. OS X <= 10.7) from crashing after they fail to check for monitor/mwait availability via cpuid. Since mwait-based idle loops relying on these nop-emulated instructions would keep the host CPU pegged at 100%, do NOT advertise their presence via cpuid, to prevent compliant guests from using them inadvertently. Signed-off-by: Gabriel L. Somlo <somlo@cmu.edu> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-05-07kvm/x86: implement hv EOI assistMichael S. Tsirkin
It seems that it's easy to implement the EOI assist on top of the PV EOI feature: simply convert the page address to the format expected by PV EOI. Notes: -"No EOI required" is set only if interrupt injected is edge triggered; this is true because level interrupts are going through IOAPIC which disables PV EOI. In any case, if guest triggers EOI the bit will get cleared on exit. -For migration, set of HV_X64_MSR_APIC_ASSIST_PAGE sets KVM_PV_EOI_EN internally, so restoring HV_X64_MSR_APIC_ASSIST_PAGE seems sufficient In any case, bit is cleared on exit so worst case it's never re-enabled -no handling of PV EOI data is performed at HV_X64_MSR_EOI write; HV_X64_MSR_EOI is a separate optimization - it's an X2APIC replacement that lets you do EOI with an MSR and not IO. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-05-07KVM: x86: Mark bit 7 in long-mode PDPTE according to 1GB pages supportNadav Amit
In long-mode, bit 7 in the PDPTE is not reserved only if 1GB pages are supported by the CPU. Currently the bit is considered by KVM as always reserved. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-05-07KVM: vmx: handle_dr does not handle RSP correctlyNadav Amit
The RSP register is not automatically cached, causing mov DR instruction with RSP to fail. Instead the regular register accessing interface should be used. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-05-06KVM: nVMX: move vmclear and vmptrld pre-checks to nested_vmx_check_vmptrBandan Das
Some checks are common to all, and moreover, according to the spec, the check for whether any bits beyond the physical address width are set are also applicable to all of them Signed-off-by: Bandan Das <bsd@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-05-06KVM: nVMX: fail on invalid vmclear/vmptrld pointerBandan Das
The spec mandates that if the vmptrld or vmclear address is equal to the vmxon region pointer, the instruction should fail with error "VMPTRLD with VMXON pointer" or "VMCLEAR with VMXON pointer" Signed-off-by: Bandan Das <bsd@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-05-06KVM: nVMX: additional checks on vmxon regionBandan Das
Currently, the vmxon region isn't used in the nested case. However, according to the spec, the vmxon instruction performs additional sanity checks on this region and the associated pointer. Modify emulated vmxon to better adhere to the spec requirements Signed-off-by: Bandan Das <bsd@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-05-06KVM: nVMX: rearrange get_vmx_mem_addressBandan Das
Our common function for vmptr checks (in 2/4) needs to fetch the memory address Signed-off-by: Bandan Das <bsd@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-05-06Merge tag 'kvm-s390-20140506' of ↵Paolo Bonzini
git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into kvm-next 1. Fixes an error return code for the breakpoint setup 2. External interrupt fixes 2.1. Some interrupt conditions like cpu timer or clock comparator stay pending even after the interrupt is injected. If the external new PSW is enabled for interrupts this will result in an endless loop. Usually this indicates a programming error in the guest OS. Lets detect such situations and go to userspace. We will provide a QEMU patch that sets the guest in panicked/crashed state to avoid wasting CPU cycles. 2.2 Resend external interrupts back to the guest if the HW could not do it. -
2014-05-06KVM: s390: Fix external interrupt interceptionThomas Huth
The external interrupt interception can only occur in rare cases, e.g. when the PSW of the interrupt handler has a bad value. The old handler for this interception simply ignored these events (except for increasing the exit_external_interrupt counter), but for proper operation we either have to inject the interrupts manually or we should drop to userspace in case of errors. Signed-off-by: Thomas Huth <thuth@linux.vnet.ibm.com> Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-05-06KVM: s390: Add clock comparator and CPU timer IRQ injectionThomas Huth
Add an interface to inject clock comparator and CPU timer interrupts into the guest. This is needed for handling the external interrupt interception. Signed-off-by: Thomas Huth <thuth@linux.vnet.ibm.com> Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-05-06KVM: s390: return -EFAULT if copy_from_user() failsDan Carpenter
When copy_from_user() fails, this code returns the number of bytes remaining instead of a negative error code. The positive number is returned to the user but otherwise it is harmless. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-05-05KVM: x86: improve the usability of the 'kvm_pio' tracepointUlrich Obergfell
This patch moves the 'kvm_pio' tracepoint to emulator_pio_in_emulated() and emulator_pio_out_emulated(), and it adds an argument (a pointer to the 'pio_data'). A single 8-bit or 16-bit or 32-bit data item is fetched from 'pio_data' (depending on 'size'), and the value is included in the trace record ('val'). If 'count' is greater than one, this is indicated by the string "(...)" in the trace output. Signed-off-by: Ulrich Obergfell <uobergfe@redhat.com> Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-05-05kvm/irqchip: Speed up KVM_SET_GSI_ROUTINGChristian Borntraeger
When starting lots of dataplane devices the bootup takes very long on Christian's s390 with irqfd patches. With larger setups he is even able to trigger some timeouts in some components. Turns out that the KVM_SET_GSI_ROUTING ioctl takes very long (strace claims up to 0.1 sec) when having multiple CPUs. This is caused by the synchronize_rcu and the HZ=100 of s390. By changing the code to use a private srcu we can speed things up. This patch reduces the boot time till mounting root from 8 to 2 seconds on my s390 guest with 100 disks. Uses of hlist_for_each_entry_rcu, hlist_add_head_rcu, hlist_del_init_rcu are fine because they do not have lockdep checks (hlist_for_each_entry_rcu uses rcu_dereference_raw rather than rcu_dereference, and write-sides do not do rcu lockdep at all). Note that we're hardly relying on the "sleepable" part of srcu. We just want SRCU's faster detection of grace periods. Testing was done by Andrew Theurer using netperf tests STREAM, MAERTS and RR. The difference between results "before" and "after" the patch has mean -0.2% and standard deviation 0.6%. Using a paired t-test on the data points says that there is a 2.5% probability that the patch is the cause of the performance difference (rather than a random fluctuation). (Restricting the t-test to RR, which is the most likely to be affected, changes the numbers to respectively -0.3% mean, 0.7% stdev, and 8% probability that the numbers actually say something about the patch. The probability increases mostly because there are fewer data points). Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Michael S. Tsirkin <mst@redhat.com> Tested-by: Christian Borntraeger <borntraeger@de.ibm.com> # s390 Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-04-30Merge tag 'kvm-s390-20140429' of ↵Paolo Bonzini
git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into kvm-next 1. Guest handling fixes The handling of MVPG, PFMF and Test Block is fixed to better follow the architecture. None of these fixes is critical for any current Linux guests, but let's play safe. 2. Optimization for single CPU guests We can enable the IBS facility if only one VCPU is running (!STOPPED state). We also enable this optimization for guest > 1 VCPU as soon as all but one VCPU is in stopped state. Thus will help guests that have tools like cpuplugd (from s390-utils) that do dynamic offline/ online of CPUs. 3. NOTES There is one non-s390 change in include/linux/kvm_host.h that introduces 2 defines for VCPU requests: define KVM_REQ_ENABLE_IBS 23 define KVM_REQ_DISABLE_IBS 24
2014-04-29KVM: x86: expose invariant tsc cpuid bit (v2)Marcelo Tosatti
Invariant TSC is a property of TSC, no additional support code necessary. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-04-29KVM: s390: enable IBS for single running VCPUsDavid Hildenbrand
This patch enables the IBS facility when a single VCPU is running. The facility is dynamically turned on/off as soon as other VCPUs enter/leave the stopped state. When this facility is operating, some instructions can be executed faster for single-cpu guests. Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Reviewed-by: Dominik Dingel <dingel@linux.vnet.ibm.com> Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-04-29KVM: s390: introduce kvm_s390_vcpu_{start,stop}David Hildenbrand
This patch introduces two new functions to set/clear the CPUSTAT_STOPPED bit and makes use of it at all applicable places. These functions prepare the additional execution of code when starting/stopping a vcpu. The CPUSTAT_STOPPED bit should not be touched outside of these functions. Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>