summaryrefslogtreecommitdiff
path: root/arch/x86
AgeCommit message (Collapse)Author
2008-07-08x86: move e820_resource_resources to e820.cYinghai Lu
and make 32-bit resource registration more like 64 bit. also move probe_roms back to setup_32.c Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08x86 boot: pass E820 memory map entries more than 128 via linked list of ↵Huang, Ying
setup data Because of the size limits of struct boot_params (zero page), the maximum number of E820 memory map entries can be passed to kernel is 128. As pointed by Paul Jackson, there is some machine produced by SGI with so many nodes that the number of E820 memory map entries is more than 128. To enabling Linux kernel on these system, a new setup data type named SETUP_E820_EXT is defined to pass additional memory map entries to Linux kernel. This patch is based on x86/auto-latest branch of git-x86 tree and has been tested on x86_64 and i386 platform. Signed-off-by: Huang Ying <ying.huang@intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08x86, mm: use add_highpages_with_active_regions() for high pages init v2Yinghai Lu
use early_node_map to init high pages, so we can remove page_is_ram() and page_is_reserved_early() in the big loop with add_one_highpage also remove page_is_reserved_early(), it is not needed anymore. v2: fix the build of other platforms Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08x86: rename two e820 related functionsYinghai Lu
rename update_memory_range to e820_update_range rename add_memory_region to e820_add_region to make it more clear that they are about e820 map operations. Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08x86: use dstapic in mp_config_acpi_legacy_irqsYinghai Lu
so we don't get the same value multiple times. also make mp_config_acpi_legacy_irqs more readable by moving assignments together. Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08x86: keep MP_intsrc_info untouched if we do not update mptableYinghai Lu
Daniel Exner reported IO-APIC enumeration breakage in linux-next. Alexey Starikovskiy found out that it might be related to commit 2944e16b25 "x86: update mptable". use enable_update_mptable to decide if need check before add mp_irqs array. Reported-by: Daniel Exner <webmaster@dragonslave.de> Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08x86: clean up relocate_initrdYinghai Lu
1. move that before zone_sizes_init ... 2. add free_early for one old one, otherwise it will be be reserved again when we init highmem. Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08x86: replace shrink_active_range() with remove_active_range()Yinghai Lu
in case we have kva before ramdisk on a node, we still need to use those ranges. v2: reserve_early kva ram area, in case there are holes in highmem, to avoid those area could be treat as free high pages. Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08x86: clean up reserve_bootmem_generic() and port it to 32-bitYinghai Lu
1. add reserve_bootmem_generic for 32bit 2. change len to unsigned long 3. make early_res_to_bootmem to use it Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08x86: make generic arch support NUMAQ, fix #2Yinghai Lu
we are checking mptable early for numaq, so don't need to reserve_bootmem for it. bootmem is not there yet. do the same thing as 64-bit. found it on 64g above system from 64-bit kernel kexec to 32 bit kernel with numaq support. Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08x86: make generic arch support NUMAQ, fixYinghai Lu
fix typo in bigsmp switching. Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08x86: e820 merge parsing of the mem=/memmap= boot parametersYinghai Lu
since we now have 32-bit support for e820_register_active_regions(), we can merge the parsing of the mem=/memmap= boot parameters. Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08x86: unify the reserve_bootmem() behavior of early_res_to_bootmem()Ingo Molnar
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08x86: use reserve_bootmem_generic() to reserve crashkernel memory on x86_64Bernhard Walle
This patch uses reserve_bootmem_generic() instead of reserve_bootmem() to reserve the crashkernel memory on x86_64. That's necessary for NUMA machines, see 00212fef814612245ed0261cbac8426d0c9a31a5: [PATCH] Fix kdump Crash Kernel boot memory reservation for NUMA machines This patch will fix a boot memory reservation bug that trashes memory on the ES7000 when loading the kdump crash kernel. The code in arch/x86_64/kernel/setup.c to reserve boot memory for the crash kernel uses the non-numa aware "reserve_bootmem" function instead of the NUMA aware "reserve_bootmem_generic". I checked to make sure that no other function was using "reserve_bootmem" and found none, except the ones that had NUMA ifdef'ed out. I have tested this patch only on an ES7000 with NUMA on and off (numa=off) in a single (non-NUMA) and multi-cell (NUMA) configurations. Signed-off-by: Amul Shah <amul.shah@unisys.com> Looks-good-to: Vivek Goyal <vgoyal@in.ibm.com> Cc: Andi Kleen <ak@muc.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org> The switch-back to reserve_bootmem() was accidentally introduced in 5c3391f9f749023a49c64d607da4fb49263690eb when adding the BOOTMEM_EXCLUSIVE parameter. Signed-off-by: Bernhard Walle <bwalle@suse.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08x86: add flags parameter to reserve_bootmem_generic()Bernhard Walle
This patch adds a 'flags' parameter to reserve_bootmem_generic() like it already has been added in reserve_bootmem() with commit 72a7fe3967dbf86cb34e24fbf1d957fe24d2f246. It also changes all users to use BOOTMEM_DEFAULT, which doesn't effectively change the behaviour. Since the change is x86-specific, I don't think it's necessary to add a new API for migration. There are only 4 users of that function. The change is necessary for the next patch, using reserve_bootmem_generic() for crashkernel reservation. Signed-off-by: Bernhard Walle <bwalle@suse.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08Merge branch 'linus' into tmp.x86.mpparse.newIngo Molnar
2008-07-05Merge branch 'x86/s2ram-fix' into x86/urgentIngo Molnar
2008-07-05x86 ACPI: fix resume from suspend to RAM on uniprocessor x86-64Rafael J. Wysocki
Since the trampoline code is now used for ACPI resume from suspend to RAM, the trampoline page tables have to be fixed up during boot not only on SMP systems, but also on UP systems that use the trampoline. Reference: http://bugzilla.kernel.org/show_bug.cgi?id=10923 Reported-by: Dionisus Torimens <djtm@gmx.net> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Cc: Andi Kleen <andi@firstfloor.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: pm list <linux-pm@lists.linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-05x86 ACPI: normalize segment descriptor register on resumeH. Peter Anvin
Some Dell laptops enter resume with apparent garbage in the segment descriptor registers (almost certainly the result of a botched transition from protected to real mode.) The only way to clean that up is to enter protected mode ourselves and clean out the descriptor registers. This fixes resume on Dell XPS M1210 and Dell D620. Reference: http://bugzilla.kernel.org/show_bug.cgi?id=10927 Signed-off-by: H. Peter Anvin <hpa@zytor.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Pavel Machek <pavel@ucw.cz> Cc: pm list <linux-pm@lists.linux-foundation.org> Cc: Len Brown <lenb@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu> Tested-by: Kirill A. Shutemov <kirill@shutemov.name> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-04Merge branch 'x86-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: xen: fix address truncation in pte mfn<->pfn conversion arch/x86/mm/init_64.c: early_memtest(): fix types x86: fix Intel Mac booting with EFI
2008-07-04xen: fix address truncation in pte mfn<->pfn conversionJeremy Fitzhardinge
When converting the page number in a pte/pmd/pud/pgd between machine and pseudo-physical addresses, the converted result was being truncated at 32-bits. This caused failures on machines with more than 4G of physical memory. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: "Christopher S. Aker" <caker@theshore.net> Cc: Ian Campbell <Ian.Campbell@eu.citrix.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-03arch/x86/mm/init_64.c: early_memtest(): fix typesAndrew Morton
fix this warning: arch/x86/mm/init_64.c: In function 'early_memtest': arch/x86/mm/init_64.c:524: warning: passing argument 2 of 'find_e820_area_size' from incompatible pointer type Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-03x86: fix Intel Mac booting with EFIHugh Dickins
Fedora reports that mem_init()'s zap_low_mappings(), extended to SMP in 61165d7a035f6571c7576e7f51e7230157724c8d x86: fix app crashes after SMP resume causes 32-bit Intel Mac machines to reboot very early when booting with EFI. The EFI code appears to manage low mappings for itself when needed; but like many before it, confuses PSE with PAE. So it has only been mapping half the space it needed when PSE but not PAE. This remained unnoticed until we moved the SMP zap_low_mappings() before efi_enter_virtual_mode(). Presumably could have been noticed years ago if anyone ran a UP kernel on such machines? Reported-by: Peter Jones <pjones@redhat.com> Signed-off-by: Hugh Dickins <hugh@veritas.com> Cc: Peter Jones <pjones@redhat.com> Cc: Glauber Costa <gcosta@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu> Tested-by: Peter Jones <pjones@redhat.com>
2008-07-02Merge branch 'x86-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86: fix NODES_SHIFT Kconfig range
2008-07-01x86: fix NODES_SHIFT Kconfig rangeThomas Gleixner
commit 4323838215184f5a2f081e0d17b8d60731b03164 x86: change size of node ids from u8 to s16 set the range for NODES_SHIFT to 1..15. The possible range is 1..9 Fixes Bugzilla #10726 Reported-by: Dave Jones <davej@codemonkey.org.uk> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-06-30Merge branch 'x86-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: ptrace GET/SET FPXREGS broken x86: fix cpu hotplug crash x86: section/warning fixes x86: shift bits the right way in native_read_tscp
2008-06-30ptrace GET/SET FPXREGS brokenTAKADA Yoshihito
When I update kernel 2.6.25 from 2.6.24, gdb does not work. On 2.6.25, ptrace(PTRACE_GETFPXREGS, ...) returns ENODEV. But 2.6.24 kernel's ptrace() returns EIO. It is issue of compatibility. I attached test program as pt.c and patch for fix it. #include <stdio.h> #include <stdlib.h> #include <unistd.h> #include <signal.h> #include <errno.h> #include <sys/ptrace.h> #include <sys/types.h> struct user_fxsr_struct { unsigned short cwd; unsigned short swd; unsigned short twd; unsigned short fop; long fip; long fcs; long foo; long fos; long mxcsr; long reserved; long st_space[32]; /* 8*16 bytes for each FP-reg = 128 bytes */ long xmm_space[32]; /* 8*16 bytes for each XMM-reg = 128 bytes */ long padding[56]; }; int main(void) { pid_t pid; pid = fork(); switch(pid){ case -1:/* error */ break; case 0:/* child */ child(); break; default: parent(pid); break; } return 0; } int child(void) { ptrace(PTRACE_TRACEME); kill(getpid(), SIGSTOP); sleep(10); return 0; } int parent(pid_t pid) { int ret; struct user_fxsr_struct fpxregs; ret = ptrace(PTRACE_GETFPXREGS, pid, 0, &fpxregs); if(ret < 0){ printf("%d: %s.\n", errno, strerror(errno)); } kill(pid, SIGCONT); wait(pid); return 0; } /* in the kerel, at kernel/i387.c get_fpxregs() */ Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-30x86: fix cpu hotplug crashZhang, Yanmin
Vegard Nossum reported crashes during cpu hotplug tests: http://marc.info/?l=linux-kernel&m=121413950227884&w=4 In function _cpu_up, the panic happens when calling __raw_notifier_call_chain at the second time. Kernel doesn't panic when calling it at the first time. If just say because of nr_cpu_ids, that's not right. By checking the source code, I found that function do_boot_cpu is the culprit. Consider below call chain: _cpu_up=>__cpu_up=>smp_ops.cpu_up=>native_cpu_up=>do_boot_cpu. So do_boot_cpu is called in the end. In do_boot_cpu, if boot_error==true, cpu_clear(cpu, cpu_possible_map) is executed. So later on, when _cpu_up calls __raw_notifier_call_chain at the second time to report CPU_UP_CANCELED, because this cpu is already cleared from cpu_possible_map, get_cpu_sysdev returns NULL. Many resources are related to cpu_possible_map, so it's better not to change it. Below patch against 2.6.26-rc7 fixes it by removing the bit clearing in cpu_possible_map. Signed-off-by: Zhang Yanmin <yanmin_zhang@linux.intel.com> Tested-by: Vegard Nossum <vegard.nossum@gmail.com> Acked-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-26x86: section/warning fixesDaniel J Blueman
WARNING: arch/x86/mm/built-in.o(.text+0x3a1): Section mismatch in reference from the function set_pte_phys() to the function .init.text:spp_getpage() The function set_pte_phys() references the function __init spp_getpage(). This is often because set_pte_phys lacks a __init annotation or the annotation of spp_getpage is wrong. arch/x86/mm/init_64.c: In function 'early_memtest': arch/x86/mm/init_64.c:520: warning: passing argument 2 of 'find_e820_area_size' from incompatible pointer type Signed-off-by: Daniel J Blueman <daniel.blueman@gmail.com> Cc: "Linus Torvalds" <torvalds@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-24Merge branch 'kvm-updates-2.6.26' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm * 'kvm-updates-2.6.26' of git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm: KVM: Remove now unused structs from kvm_para.h x86: KVM guest: Use the paravirt clocksource structs and functions KVM: Make kvm host use the paravirt clocksource structs x86: Make xen use the paravirt clocksource structs and functions x86: Add structs and functions for paravirt clocksource KVM: VMX: Fix host msr corruption with preemption enabled KVM: ioapic: fix lost interrupt when changing a device's irq KVM: MMU: Fix oops on guest userspace access to guest pagetable KVM: MMU: large page update_pte issue with non-PAE 32-bit guests (resend) KVM: MMU: Fix rmap_write_protect() hugepage iteration bug KVM: close timer injection race window in __vcpu_run KVM: Fix race between timer migration and vcpu migration
2008-06-24x86: KVM guest: Use the paravirt clocksource structs and functionsGerd Hoffmann
This patch updates the kvm host code to use the pvclock structs and functions, thereby making it compatible with Xen. The patch also fixes an initialization bug: on SMP systems the per-cpu has two different locations early at boot and after CPU bringup. kvmclock must take that in account when registering the physical address within the host. Signed-off-by: Gerd Hoffmann <kraxel@redhat.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-06-24KVM: Make kvm host use the paravirt clocksource structsGerd Hoffmann
This patch updates the kvm host code to use the pvclock structs. It also makes the paravirt clock compatible with Xen. Signed-off-by: Gerd Hoffmann <kraxel@redhat.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-06-24x86: Make xen use the paravirt clocksource structs and functionsGerd Hoffmann
This patch updates the xen guest to use the pvclock structs and helper functions. Signed-off-by: Gerd Hoffmann <kraxel@redhat.com> Acked-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-06-24x86: Add structs and functions for paravirt clocksourceGerd Hoffmann
This patch adds structs for the paravirt clocksource ABI used by both xen and kvm (pvclock-abi.h). It also adds some helper functions to read system time and wall clock time from a paravirtual clocksource (pvclock.[ch]). They are based on the xen code. They are enabled using CONFIG_PARAVIRT_CLOCK. Subsequent patches of this series will put the code in use. Signed-off-by: Gerd Hoffmann <kraxel@redhat.com> Acked-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-06-24xen: remove support for non-PAE 32-bitJeremy Fitzhardinge
Non-PAE operation has been deprecated in Xen for a while, and is rarely tested or used. xen-unstable has now officially dropped non-PAE support. Since Xen/pvops' non-PAE support has also been broken for a while, we may as well completely drop it altogether. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-24KVM: VMX: Fix host msr corruption with preemption enabledAvi Kivity
Switching msrs can occur either synchronously as a result of calls to the msr management functions (usually in response to the guest touching virtualized msrs), or asynchronously when preempting a kvm thread that has guest state loaded. If we're unlucky enough to have the two at the same time, host msrs are corrupted and the machine goes kaput on the next syscall. Most easily triggered by Windows Server 2008, as it does a lot of msr switching during bootup. Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-06-24KVM: MMU: Fix oops on guest userspace access to guest pagetableAvi Kivity
KVM has a heuristic to unshadow guest pagetables when userspace accesses them, on the assumption that most guests do not allow userspace to access pagetables directly. Unfortunately, in addition to unshadowing the pagetables, it also oopses. This never triggers on ordinary guests since sane OSes will clear the pagetables before assigning them to userspace, which will trigger the flood heuristic, unshadowing the pagetables before the first userspace access. One particular guest, though (Xenner) will run the kernel in userspace, triggering the oops. Since the heuristic is incorrect in this case, we can simply remove it. Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-06-24KVM: MMU: large page update_pte issue with non-PAE 32-bit guests (resend)Marcelo Tosatti
kvm_mmu_pte_write() does not handle 32-bit non-PAE large page backed guests properly. It will instantiate two 2MB sptes pointing to the same physical 2MB page when a guest large pte update is trapped. Instead of duplicating code to handle this, disallow directory level updates to happen through kvm_mmu_pte_write(), so the two 2MB sptes emulating one guest 4MB pte can be correctly created by the page fault handling path. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-06-24KVM: MMU: Fix rmap_write_protect() hugepage iteration bugMarcelo Tosatti
rmap_next() does not work correctly after rmap_remove(), as it expects the rmap chains not to change during iteration. Fix (for now) by restarting iteration from the beginning. Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-06-24KVM: close timer injection race window in __vcpu_runMarcelo Tosatti
If a timer fires after kvm_inject_pending_timer_irqs() but before local_irq_disable() the code will enter guest mode and only inject such timer interrupt the next time an unrelated event causes an exit. It would be simpler if the timer->pending irq conversion could be done with IRQ's disabled, so that the above problem cannot happen. For now introduce a new vcpu requests bit to cancel guest entry. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-06-24KVM: Fix race between timer migration and vcpu migrationMarcelo Tosatti
A guest vcpu instance can be scheduled to a different physical CPU between the test for KVM_REQ_MIGRATE_TIMER and local_irq_disable(). If that happens, the timer will only be migrated to the current pCPU on the next exit, meaning that guest LAPIC timer event can be delayed until a host interrupt is triggered. Fix it by cancelling guest entry if any vcpu request is pending. This has the side effect of nicely consolidating vcpu->requests checks. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-06-20xen: don't drop NX bitJeremy Fitzhardinge
Because NX is now enforced properly, we must put the hypercall page into the .text segment so that it is executable. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Stable Kernel <stable@kernel.org> Cc: the arch/x86 maintainers <x86@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-20xen: mask unwanted pte bits in __supported_pte_maskJeremy Fitzhardinge
[ Stable: this isn't a bugfix in itself, but it's a pre-requiste for "xen: don't drop NX bit" ] Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Stable Kernel <stable@kernel.org> Cc: the arch/x86 maintainers <x86@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-19x86, geode: add a VSA2 ID for General SoftwareJordan Crouse
General Software writes their own VSA2 module for their version of the Geode BIOS, which returns a different ID then the standard VSA2. This was causing the framebuffer driver to break for most GSW boards. Signed-off-by: Jordan Crouse <jordan.crouse@amd.com> Cc: tglx@linutronix.de Cc: linux-geode@lists.infradead.org Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-19x86: use BOOTMEM_EXCLUSIVE on 32-bitBernhard Walle
This patch uses the BOOTMEM_EXCLUSIVE for crashkernel reservation also for i386 and prints a error message on failure. The patch is still for 2.6.26 since it is only bug fixing. The unification of reserve_crashkernel() between i386 and x86_64 should be done for 2.6.27. Signed-off-by: Bernhard Walle <bwalle@suse.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: <stable@kernel.org>
2008-06-19x86, 32-bit: fix boot failure on TSC-less processorsMikael Pettersson
Booting 2.6.26-rc6 on my 486 DX/4 fails with a "BUG: Int 6" (invalid opcode) and a kernel halt immediately after the kernel has been uncompressed. The BUG shows EIP pointing to an rdtsc instruction in native_read_tsc(), invoked from native_sched_clock(). (This error occurs so early that not even the serial console can capture it.) A bisection showed that this bug first occurs in 2.6.26-rc3-git7, via commit 9ccc906c97e34fd91dc6aaf5b69b52d824386910: >x86: distangle user disabled TSC from unstable > >tsc_enabled is set to 0 from the command line switch "notsc" and from >the mark_tsc_unstable code. Seperate those functionalities and replace >tsc_enable with tsc_disable. This makes also the native_sched_clock() >decision when to use TSC understandable. > >Preparatory patch to solve the sched_clock() issue on 32 bit. > >Signed-off-by: Thomas Gleixner <tglx@linutronix.de> The core reason for this bug is that native_sched_clock() gets called before tsc_init(). Before the commit above, tsc_32.c used a "tsc_enabled" variable which defaulted to 0 == disabled, and which only got enabled late in tsc_init(). Thus early calls to native_sched_clock() would skip the TSC and use jiffies instead. After the commit above, tsc_32.c uses a "tsc_disabled" variable which defaults to 0, meaning that the TSC is Ok to use. Early calls to native_sched_clock() now erroneously try to use the TSC on !cpu_has_tsc processors, leading to invalid opcode exceptions. My proposed fix is to initialise tsc_disabled to a "soft disabled" state distinct from the hard disabled state set up by the "notsc" kernel option. This fixes the native_sched_clock() problem. It also allows tsc_init() to be simplified: instead of setting tsc_disabled = 1 on every error return, we just set tsc_disabled = 0 once when all checks have succeeded. I've verified that this lets my 486 boot again. I've also verified that a Core2 machine still uses the TSC as clocksource after the patch. Signed-off-by: Mikael Pettersson <mikpe@it.uu.se> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-19x86: fix NULL pointer deref in __switch_toSuresh Siddha
Patrick McHardy reported a crash: > > I get this oops once a day, its apparently triggered by something > > run by cron, but the process is a different one each time. > > > > Kernel is -git from yesterday shortly before the -rc6 release > > (last commit is the usb-2.6 merge, the x86 patches are missing), > > .config is attached. > > > > I'll retry with current -git, but the patches that have gone in > > since I last updated don't look related. > > > > [62060.043009] BUG: unable to handle kernel NULL pointer dereference at > > 000001ff > > [62060.043009] IP: [<c0102a9b>] __switch_to+0x2f/0x118 > > [62060.043009] *pde = 00000000 > > [62060.043009] Oops: 0002 [#1] PREEMPT Vegard Nossum analyzed it: > This decodes to > > 0: 0f ae 00 fxsave (%eax) > > so it's related to the floating-point context. This is the exact > location of the crash: > > $ addr2line -e arch/x86/kernel/process_32.o -i ab0 > include/asm/i387.h:232 > include/asm/i387.h:262 > arch/x86/kernel/process_32.c:595 > > ...so it looks like prev_task->thread.xstate->fxsave has become NULL. > Or maybe it never had any other value. Somehow (as described below) TS_USEDFPU is set but the fpu is not allocated or freed. Another possible FPU pre-emption issue with the sleazy FPU optimization which was benign before but not so anymore, with the dynamic FPU allocation patch. New task is getting exec'd and it is prempted at the below point. flush_thread() { ... /* * Forget coprocessor state.. */ clear_fpu(tsk); <----- Preemption point clear_used_math(); ... } Now when it context switches in again, as the used_math() is still set and fpu_counter can be > 5, we will do a math_state_restore() which sets the task's TS_USEDFPU. After it continues from the above preemption point it does clear_used_math() and much later free_thread_xstate(). Now, at the next context switch, it is quite possible that xstate is null, used_math() is not set and TS_USEDFPU is still set. This will trigger unlazy_fpu() causing kernel oops. Fix this by clearing tsk's fpu_counter before clearing task's fpu. Reported-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-17x86-64: Fix "bytes left to copy" return value for copy_from_user()Linus Torvalds
Most users by far do not care about the exact return value (they only really care about whether the copy succeeded in its entirety or not), but a few special core routines actually care deeply about exactly how many bytes were copied from user space. And the unrolled versions of the x86-64 user copy routines would sometimes report that it had copied more bytes than it actually had. Very few uses actually have partial copies to begin with, but to make this bug even harder to trigger, most x86 CPU's use the "rep string" instructions for normal user copies, and that version didn't have this issue. To make it even harder to hit, the one user of this that really cared about the return value (and used the uncached version of the copy that doesn't use the "rep string" instructions) was the generic write routine, which pre-populated its source, once more hiding the problem by avoiding the exception case that triggers the bug. In other words, very special thanks to Bron Gondwana who not only triggered this, but created a test-program to show it, and bisected the behavior down to commit 08291429cfa6258c4cd95d8833beb40f828b194e ("mm: fix pagecache write deadlocks") which changed the access pattern just enough that you can now trigger it with 'writev()' with multiple iovec's. That commit itself was not the cause of the bug, it just allowed all the stars to align just right that you could trigger the problem. [ Side note: this is just the minimal fix to make the copy routines (with __copy_from_user_inatomic_nocache as the particular version that was involved in showing this) have the right return values. We really should improve on the exceptional case further - to make the copy do a byte-accurate copy up to the exact page limit that causes it to fail. As it is, the callers have to do extra work to handle the limit case gracefully. ] Reported-by: Bron Gondwana <brong@fastmail.fm> Cc: Nick Piggin <npiggin@suse.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andi Kleen <andi@firstfloor.org> Cc: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> (which didn't have this problem), and since most users that do the carethis was very hard to trigger, but
2008-06-14Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6: PCI: fixup write combine comment in pci_mmap_resource x86: PAT export resource_wc in pci sysfs x86, pci-dma.c: don't always add __GFP_NORETRY to gfp suspend-vs-iommu: prevent suspend if we could not resume x86: pci-dma.c: use __GFP_NO_OOM instead of __GFP_NORETRY pci, x86: add workaround for bug in ASUS A7V600 BIOS (rev 1005) PCI: use dev_to_node in pci_call_probe PCI: Correct last two HP entries in the bfsort whitelist
2008-06-12provide rtc_cmos platform deviceStas Sergeev
Recently (around 2.6.25) I've noticed that RTC no longer works for me. It turned out this is because I use pnpacpi=off kernel option to work around the parport_pc bugs. I always did so, but RTC used to work fine in the past, and now it have regressed. The patch fixes the problem by creating the platform device for the RTC when PNP is disabled. This may also help running the PNP-enabled kernel on an older PCs. Signed-off-by: Stas Sergeev <stsp@aknet.ru> Cc: David Brownell <david-b@pacbell.net> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Bjorn Helgaas <bjorn.helgaas@hp.com> Cc: Adam Belay <ambx1@neo.rr.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>